Related
I'm running on R2012a version. I tried to write a function that imitates randi using rand (only rand), producing the same output when the same arguments are passed and the same seed is provided. I tried something with the command window and here's what I got:
>> s = rng;
>> R1 = randi([2 20], 3, 5)
R1 =
2 16 11 15 14
10 17 10 16 14
9 5 14 7 5
>> rng(s)
>> R2 = 2+18*rand(3, 5)
R2 =
2.6200 15.7793 10.8158 14.7686 14.2346
9.8974 16.3136 10.0206 15.5844 13.7918
8.8681 5.3637 13.6336 6.9685 4.9270
>>
A swift comparison led me to believe that there's some link between the two: each integer in R1 is within plus or minus unity from the corresponding element in R2. Nonetheless, I failed to go any further: I checked for ceiling, flooring, fixing and rounding but neither of them seems to work.
randi([2 20]) generates integers between 2 and 20, both included. That is, it can generate 19 different values, not 18.
19 * rand
generates values uniformly distributed within the half-open interval [0,19), flooring it gives you uniformly distributed integers in the range [0,18].
Thus, in general,
x = randi([a,b]]);
y = rand * (b-a+1) + a;
should yield numbers with the same property. From OP’s experiment it looks like they might generate the same sequence, but this cannot be guaranteed, and it likely doesn't.
Why? It is likely that randi is not implemented in terms of rand, but it’s underlying random generator, which produces integers. To go from a random integer x in a large range ([0,N-1]) to one in a small range ([0,n-1]), you would normally use the modulo operator (mod(x,N)) or a floored division like above, but remove a small subset of the values that skew the distribution. This other anser gives a detailed explanation. I like to think of it in terms of examples:
Say random values are in the range [0,2^16-1] (N=2^16) and you want values in the range [0,18] (n=19). mod(19,2^16)=5. That is, the largest 5 values that can be generated by the random number generator are mapped to the lowest 5 values of the output range (assuming the modulo method), leaving those numbers slightly more likely to be generated than the rest of the output range. These lowest 5 values have a chance floor(N/n)+1, whereas the rest has a chance floor(N/n). This is bad. [Using floored division instead of modulo yields a different distribution of the unevenness, but the end result is the same: some numbers are slightly more likely than others.]
To solve this issue, a correct implementation does as follows: if you get one of the values in the random generator that are floor(N/n)*n or higher, you need to throw it away and try again. This is a very small chance, of course, with a typical random number generator that uses N=2^64.
Though we don't know how randi is implemented, we can be fairly certain that it follows the correct implementation described here. So your sequence based on rand might be right for millions of numbers, but then start deviating.
Interestingly enough, Octave's randi is implemented as an M-file, so we can see how they do it. And it turns out it uses the wrong algorithm shown at the top of this answer, based on rand:
ri = imin + floor ( (imax-imin+1)*rand (varargin{:}) );
Thus, Octave's randi is biased!
How can one create an integer random number with Normal distribution in an interval in Matlab? Could anyone provide an answer?
I know how to create a random number ,say y, with Normal distribution:
std = 5;
mean = 500;
y = std.*randn + mean;
But it is not an integer number and also it is not in a specific interval
If you want integers, you can use randn and round the numbers. However, your second question is kind of weird.
Normal distribution does not have a definite interval. You can only define a "confidence interval" around the mean. For example, 99.7% of the distribution is contained within 3 standard deviations from the mean. But it does not mean that you have a strict interval, it means probability of seeing a number beyond 3xStandard deviations is just too low. Let's say I generated 10000 numbers with mean=100 and std.deviation=10 and rounded them. Then I expect to see numbers between 70 and 130. There might be numbers beyond this interval, but their frequencies(~probabilities) will be low.
mu=100; sigma=10; figure,hist(round(normrnd(mu,sigma,10000,1)),100)
Choose the number from a binomial(N, 0.5) distribution for large N. This will yield something that is as close as you might be able to get to a "normal distribution of integers". The mean will be N/2 and the std deviation N/4. Subtract N/2 to center it about 0.
Say N = 100. Then to generate a sample, you could do:
k = sum(randi(2, [100,1]) - 1);
or:
k = sum(rand(100,1) < 0.5);
You could use randn and convert to integer by rounding the output number. Repeat until the number is in range [a,b] you are interested in.
It will likely work fine for wide enough range around the middle, but you will be doing tons of attempts when you want to look at a narrow part of the tail.
Other option is to get any integer from whatever range with equal probability and convert that to gaussian-like in your range. Say numbers 0->10 would become a, 11-50 would be a+1 ... maxint-10:maxint would be b.
I'm looking for a function that will generate random values between 0 and 1, inclusive. I have generated 120,000 random values by using rand() function in octave, but haven't once got the values 0 or 1 as output. Does rand() ever produce such values? If not, is there any other function I can use to achieve the desired result?
If you read the documentation of rand in both Octave and MATLAB, it is an open interval between (0,1), so no, it shouldn't generate the numbers 0 or 1.
However, you can perhaps generate a set of random integers, then normalize the values so that they lie between [0,1]. So perhaps use something like randi (MATLAB docs, Octave docs) where it generates integer values from 1 up to a given maximum. With this, define this maximum number, then subtract by 1 and divide by this offset maximum to get values between [0,1] inclusive:
max_num = 10000; %// Define maximum number
N = 1000; %// Define size of vector
out = (randi(max_num, N, 1) - 1) / (max_num - 1); %// Output
If you want this to act more like rand but including 0 and 1, make the max_num variable quite large.
Mathematically, if you sample from a (continuous) uniform distribution on the closed interval [0 1], values 0 and 1 (or any value, in fact) have probability strictly zero.
Programmatically,
If you have a random generator that produces values of type double on the closed interval [0 1], the probability of getting the value 0, or 1, is not zero, but it's so small it can be neglected.
If the random generator produces values from the open interval (0, 1), the probability of getting a value 0, or 1, is strictly zero.
So the probability is either strictly zero or so small it can be neglected. Therefore, you shouldn't worry about that: in either case the probability is zero for practical purposes. Even if rand were of type (1) above, and thus could produce 0 and 1, it would produce them with probability so small that you would "never" see those values.
Does that sound strange? Well, that happens with any number. You "never" see rand ever outputting exactly 1/4, either. There are so many possible outputs, all of them equally likely, that the probability of any given output is virtually zero.
rand produces numbers from the open interval (0,1), which does not include 0 or 1, so you should never get those values.. This was more clearly documented in previous versions, but it's still stated in the help text for rand (type help rand rather than doc rand).
However, since it produces doubles, there are only a finite number of values that it will actually produce. The precise set varies depending on the RNG algorithm used. For Mersenne twister, the default algorithm, the possible values are all multiples of 2^(-53), within the open interval (0,1). (See doc RandStream.list, and then "Choosing a Random Number Generator" for info on other generators).
Note that 2^(-53) is eps/2. Therefore, it's equivalent to drawing from the closed interval [2^(-53), 1-2^(-53)], or [eps/2, 1-eps/2].
You can scale this interval to [0,1] by subtracting eps/2 and dividing by 1-eps. (Use format hex to display enough precision to check that at the bit level).
So x = (rand-eps/2)/(1-eps) should give you values on the closed interval [0,1].
But I should give a word of caution: they've put a lot of effort into making sure that output of rand gives an appropriate distribution of any given double within (0,1), and I don't think you're going to get the same nice properties on [0,1] if you apply the scaling I suggested. My knowledge of floating-point math and RNGs isn't up to explaining why, or what you might do about that.
I just tried this:
octave:1> max(rand(10000000,1))
ans = 1.00000
octave:2> min(rand(10000000,1))
ans = 3.3788e-08
Did not give me 0 strictly, so watch out for floating point operations.
Edit
Even though I said, watch out for floating point operations I did fall for that. As #eigenchris pointed out:
format long g
octave:1> a=max(rand(1000000,1))
a = 0.999999711020176
It yields a floating number close to one, not equal, as you can see now after changing the precision, as #rayryeng suggested.
Although not direct to the question here, I find it helpful to link to this SO post Octave - random generate number that has a one liner to generate 1s and 0s using r = rand > 0.5.
Having read carefully the previous question
Random numbers that add to 100: Matlab
I am struggling to solve a similar but slightly more complex problem.
I would like to create an array of n elements that sums to 1, however I want an added constraint that the minimum increment (or if you like number of significant figures) for each element is fixed.
For example if I want 10 numbers that sum to 1 without any constraint the following works perfectly:
num_stocks=10;
num_simulations=100000;
temp = [zeros(num_simulations,1),sort(rand(num_simulations,num_stocks-1),2),ones(num_simulations,1)];
weights = diff(temp,[],2);
I foolishly thought that by scaling this I could add the constraint as follows
num_stocks=10;
min_increment=0.001;
num_simulations=100000;
scaling=1/min_increment;
temp2 = [zeros(num_simulations,1),sort(round(rand(num_simulations,num_stocks-1)*scaling)/scaling,2),ones(num_simulations,1)];
weights2 = diff(temp2,[],2);
However though this works for small values of n & small values of increment, if for example n=1,000 & the increment is 0.1% then over a large number of trials the first and last numbers have a mean which is consistently below 0.1%.
I am sure there is a logical explanation/solution to this but I have been tearing my hair out to try & find it & wondered anybody would be so kind as to point me in the right direction. To put the problem into context create random stock portfolios (hence the sum to 1).
Thanks in advance
Thank you for the responses so far, just to clarify (as I think my initial question was perhaps badly phrased), it is the weights that have a fixed increment of 0.1% so 0%, 0.1%, 0.2% etc.
I did try using integers initially
num_stocks=1000;
min_increment=0.001;
num_simulations=100000;
scaling=1/min_increment;
temp = [zeros(num_simulations,1),sort(randi([0 scaling],num_simulations,num_stocks-1),2),ones(num_simulations,1)*scaling];
weights = (diff(temp,[],2)/scaling);
test=mean(weights);
but this was worse, the mean for the 1st & last weights is well below 0.1%.....
Edit to reflect excellent answer by Floris & clarify
The original code I was using to solve this problem (before finding this forum) was
function x = monkey_weights_original(simulations,stocks)
stockmatrix=1:stocks;
base_weight=1/stocks;
r=randi(stocks,stocks,simulations);
x=histc(r,stockmatrix)*base_weight;
end
This runs very fast, which was important considering I want to run a total of 10,000,000 simulations, 10,000 simulations on 1,000 stocks takes just over 2 seconds with a single core & I am running the whole code on an 8 core machine using the parallel toolbox.
It also gives exactly the distribution I was looking for in terms of means, and I think that it is just as likely to get a portfolio that is 100% in 1 stock as it is to geta portfolio that is 0.1% in every stock (though I'm happy to be corrected).
My issue issue is that although it works for 1,000 stocks & an increment of 0.1% and I guess it works for 100 stocks & an increment of 1%, as the number of stocks decreases then each pick becomes a very large percentage (in the extreme with 2 stocks you will always get a 50/50 portfolio).
In effect I think this solution is like the binomial solution Floris suggests (but more limited)
However my question has arrisen because I would like to make my approach more flexible & have the possibility of say 3 stocks & an increment of 1% which my current code will not handle correctly, hence how I stumbled accross the original question on stackoverflow
Floris's recursive approach will get to the right answer, but the speed will be a major issue considering the scale of the problem.
An example of the original research is here
http://www.huffingtonpost.com/2013/04/05/monkeys-stocks-study_n_3021285.html
I am currently working on extending it with more flexibility on portfolio weights & numbers of stock in the index, but it appears my programming & probability theory ability are a limiting factor.......
One problem I can see is that your formula allows for numbers to be zero - when the rounding operation results in two consecutive numbers to be the same after sorting. Not sure if you consider that a problem - but I suggest you think about it (it would mean your model portfolio has fewer than N stocks in it since the contribution of one of the stocks would be zero).
The other thing to note is that the probability of getting the extreme values in your distribution is half of what you want them to be: If you have uniformly distributed numbers from 0 to 1000, and you round them, the numbers that round to 0 were in the interval [0 0.5>; the ones that round to 1 came from [0.5 1.5> - twice as big. The last number (rounding to 1000) is again from a smaller interval: [999.5 1000]. Thus you will not get the first and last number as often as you think. If instead of round you use floor I think you will get the answer you expect.
EDIT
I thought about this some more, and came up with a slow but (I think) accurate method for doing this. The basic idea is this:
Think in terms of integers; rather than dividing the interval 0 - 1 in steps of 0.001, divide the interval 0 - 1000 in integer steps
If we try to divide N into m intervals, the mean size of a step should be N / m; but being integer, we would expect the intervals to be binomially distributed
This suggests an algorithm in which we choose the first interval as a binomially distributed variate with mean (N/m) - call the first value v1; then divide the remaining interval N - v1 into m-1 steps; we can do so recursively.
The following code implements this:
% random integers adding up to a definite sum
function r = randomInt(n, limit)
% returns an array of n random integers
% whose sum is limit
% calls itself recursively; slow but accurate
if n>1
v = binomialRandom(limit, 1 / n);
r = [v randomInt(n-1, limit - v)];
else
r = limit;
end
function b = binomialRandom(N, p)
b = sum(rand(1,N)<p); % slow but direct
To get 10000 instances, you run this as follows:
tic
portfolio = zeros(10000, 10);
for ii = 1:10000
portfolio(ii,:) = randomInt(10, 1000);
end
toc
This ran in 3.8 seconds on a modest machine (single thread) - of course the method for obtaining a binomially distributed random variate is the thing slowing it down; there are statistical toolboxes with more efficient functions but I don't have one. If you increase the granularity (for example, by setting limit=10000) it will slow down more since you increase the number of random number samples that are generated; with limit = 10000 the above loop took 13.3 seconds to complete.
As a test, I found mean(portfolio)' and std(portfolio)' as follows (with limit=1000):
100.20 9.446
99.90 9.547
100.09 9.456
100.00 9.548
100.01 9.356
100.00 9.484
99.69 9.639
100.06 9.493
99.94 9.599
100.11 9.453
This looks like a pretty convincing "flat" distribution to me. We would expect the numbers to be binomially distributed with a mean of 100, and standard deviation of sqrt(p*(1-p)*n). In this case, p=0.1 so we expect s = 9.4868. The values I actually got were again quite close.
I realize that this is inefficient for large values of limit, and I made no attempt at efficiency. I find that clarity trumps speed when you develop something new. But for instance you could pre-compute the cumulative binomial distributions for p=1./(1:10), then do a random lookup; but if you are just going to do this once, for 100,000 instances, it will run in under a minute; unless you intend to do it many times, I wouldn't bother. But if anyone wants to improve this code I'd be happy to hear from them.
Eventually I have solved this problem!
I found a paper by 2 academics at John Hopkins University "Sampling Uniformly From The Unit Simplex"
http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf
In the paper they outline how naive algorthms don't work, in a way very similar to woodchips answer to the Random numbers that add to 100 question. They then go on to show that the method suggested by David Schwartz can also be slightly biased and propose a modified algorithm which appear to work.
If you want x numbers that sum to y
Sample uniformly x-1 random numbers from the range 1 to x+y-1 without replacement
Sort them
Add a zero at the beginning & x+y at the end
difference them & subtract 1 from each value
If you want to scale them as I do, then divide by y
It took me a while to realise why this works when the original approach didn't and it come down to the probability of getting a zero weight (as highlighted by Floris in his answer). To get a zero weight in the original version for all but the 1st or last weights your random numbers had to have 2 values the same but for the 1st & last ones then a random number of zero or the maximum number would result in a zero weight which is more likely.
In the revised algorithm, zero & the maximum number are not in the set of random choices & a zero weight occurs only if you select two consecutive numbers which is equally likely for every position.
I coded it up in Matlab as follows
function weights = unbiased_monkey_weights(num_simulations,num_stocks,min_increment)
scaling=1/min_increment;
sample=NaN(num_simulations,num_stocks-1);
for i=1:num_simulations
allcomb=randperm(scaling+num_stocks-1);
sample(i,:)=allcomb(1:num_stocks-1);
end
temp = [zeros(num_simulations,1),sort(sample,2),ones(num_simulations,1)*(scaling+num_stocks)];
weights = (diff(temp,[],2)-1)/scaling;
end
Obviously the loop is a bit clunky and as I'm using the 2009 version the randperm function only allows you to generate permutations of the whole set, however despite this I can run 10,000 simulations for 1,000 numbers in 5 seconds on my clunky laptop which is fast enough.
The mean weights are now correct & as a quick test I replicated woodchips generating 3 numbers that sum to 1 with the minimum increment being 0.01% & it also look right
Thank you all for your help and I hope this solution is useful to somebody else in the future
The simple answer is to use the schemes that work well with NO minimum increment, then transform the problem. As always, be careful. Some methods do NOT yield uniform sets of numbers.
Thus, suppose I want 11 numbers that sum to 100, with a constraint of a minimum increment of 5. I would first find 11 numbers that sum to 45, with no lower bound on the samples (other than zero.) I could use a tool from the file exchange for this. Simplest is to simply sample 10 numbers in the interval [0,45]. Sort them, then find the differences.
X = diff([0,sort(rand(1,10)),1]*45);
The vector X is a sample of numbers that sums to 45. But the vector Y sums to 100, with a minimum value of 5.
Y = X + 5;
Of course, this is trivially vectorized if you wish to find multiple sets of numbers with the given constraint.
I need to generate random numbers with following properties.
Min must be 1
Max must be 9
Average (mean) is 6.00 (or something else)
Random number must be Integer (positive) only
I have tried several syntaxes but nothing works, for example
r=1+8.*rand(100,1);
This gives me a random number between 1-9 but it's not an integer (for example 5.607 or 4.391) and each time I calculate the mean it varies.
You may be able to define a function that satisfies your requirements based on Matlab's randi function. But be careful, it is easy to define functions of random number generators which do not produce random numbers.
Another approach might suit -- create a probability distribution to meet your requirements. In this case you need a vector of 9 floating-point numbers which sum to 1 and which, individually, express the probability of the i-th integer occurring. For example, a distribution might be described by the following vector:
[0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1]
These split the interval [0,1] into 9 parts. Then, take your favourite rng which generates floating-point numbers in the range [0,1) and generate a number, suppose it is 0.45. Read along the interval from 0 to 1 and you find that this is in the 5-th interval, so return the integer 5.
Obviously, I've been too lazy to give you a vector which gives 6 as the mean of the distribution, but that shouldn't be too hard for you to figure out.
Here is an algorithm with a loop to reach a required mean xmean (with required precision xeps) by regenerating a random number from one half of a vector to another according to mean at current iteration. With my tests it reached the mean pretty quick.
n = 100;
xmean = 6;
xmin = 1;
xmax = 9;
xeps = 0.01;
x = randi([xmin xmax],n,1);
while abs(xmean - mean(x)) >= xeps
if xmean > mean(x)
x(find(x < xmean,1)) = randi([xmean xmax]);
elseif xmean < mean(x)
x(find(x > xmean,1)) = randi([xmin xmean]);
end
end
x is the output you need.
You can use randi to get random integers
You could use floor to truncate your random numbers to integer values only:
r = 1 + floor(9 * rand(100,1));
Obtaining a specified mean is a little trickier; it depends what kind of distribution you're after.
If the distribution is not important and all you're interested in is the mean, then there's a particularly simple function that does that:
function x=myrand
x=6;
end
Before you can design your random number generator you need to specify the distribution it should draw from. You've only partially done that: i.e., you specified it draws from integers in [1,9] and that it has a mean that you want to be able to specify. That still leaves an infinity of distributions to chose among. What other properties do you want your distribution to have?
Edit following comment: The mean of any finite sample from a probability distribution - the so-called sample mean - will only approximate the distribution's mean. There is no way around that.
That having been said, the simplest (in the maximum entropy sense) distribution over the integers in the domain [1,9] is the exponential distribution: i.e.,
p = #(n,x)(exp(-x*n)./sum(exp(-x*(1:9))));
The parameter x determines the distribution mean. The corresponding cumulative distribution is
c = cumsum(p(1:9,x));
To draw from the distribution p you can draw a random number from [0,1] and find what sub-interval of c it falls in: i.e.,
samp = arrayfun(#(y)find(y<c,1),rand(n,m));
will return an [n,m] array of integers drawn from p.