Random numbers that add to 100: Matlab - matlab

[I'm splitting a population number into different matrices and want to test my code using random numbers for now.]
Quick question guys and thanks for your help in advance -
If I use;
100*rand(9,1)
What is the best way to make these 9 numbers add to 100?
I'd like 9 random numbers between 0 and 100 that add up to 100.
Is there an inbuilt command that does this because I can't seem to find it.

I see the mistake so often, the suggestion that to generate random numbers with a given sum, one just uses a uniform random set, and just scale them. But is the result truly uniformly random if you do it that way?
Try this simple test in two dimensions. Generate a huge random sample, then scale them to sum to 1. I'll use bsxfun to do the scaling.
xy = rand(10000000,2);
xy = bsxfun(@times,xy,1./sum(xy,2));
hist(xy(:,1),100)
If they were truly uniformly random, then the x coordinate would be uniform, as would the y coordinate. Any value would be equally likely to happen. In effect, for two points to sum to 1 they must lie along the line that connects the two points (0,1), (1,0) in the (x,y) plane. For the points to be uniform, any point along that line must be equally likely.
Clearly uniformity fails when I use the scaling solution. Any point on that line is NOT equally likely. We can see the same thing happening in 3-dimensions. See that in the 3-d figure here, the points in the center of the triangular region are more densely packed. This is a reflection of non-uniformity.
xyz = rand(10000,3);
xyz = bsxfun(@times,xyz,1./sum(xyz,2));
plot3(xyz(:,1),xyz(:,2),xyz(:,3),'.')
view(70,35)
box on
grid on
Again, the simple scaling solution fails. It simply does NOT produce truly uniform results over the domain of interest.
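The 2-d non-uniformity can also be checked numerically rather than by eye. A minimal sketch, assuming a sample size of 1e6 and the test interval [0.4, 0.6] (both arbitrary choices made for illustration): if x were uniform on [0,1], about 20% of the samples would land in that interval.

```matlab
% Fraction of scaled samples whose x coordinate falls in [0.4, 0.6].
% A uniform x on [0,1] would give about 0.20.
n = 1e6;                                  % sample size (arbitrary choice)
xy = rand(n,2);
xy = bsxfun(@times, xy, 1./sum(xy,2));    % scale each row to sum to 1
fracMiddle = mean(xy(:,1) >= 0.4 & xy(:,1) <= 0.6);
fprintf('fraction in [0.4, 0.6]: %.3f (uniform would give 0.200)\n', fracMiddle);
```

With the scaling approach the fraction comes out noticeably above 0.2, which is consistent with the peaked histogram.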
Can we do better? Well, yes. A simple solution in 2-d is to generate a single random number that designates the distance along the line connecting the points (0,1) and (1,0).
t = rand(10000000,1);
xy = t*[0 1] + (1-t)*[1 0];
hist(xy(:,1),100)
It can be shown that ANY point along the line defined by the equation x+y = 1, in the unit square, is now equally likely to have been chosen. This is reflected by the nice, flat histogram.
Does the sort trick suggested by David Schwartz work in n-dimensions? Clearly it does so in 2-d, and the figure below suggests that it does so in 3-dimensions. Without deep thought on the matter, I believe that it will work for this basic case in question, in n-dimensions.
n = 10000;
uv = [zeros(n,1),sort(rand(n,2),2),ones(n,1)];
xyz = diff(uv,[],2);
plot3(xyz(:,1),xyz(:,2),xyz(:,3),'.')
box on
grid on
view(70,35)
One can also download the function randfixedsum from the file exchange, Roger Stafford's contribution. This is a more general solution to generate truly uniform random sets in the unit hyper-cube, with any given fixed sum. Thus, to generate random sets of points that lie in the unit 3-cube, subject to the constraint they sum to 1.25...
xyz = randfixedsum(3,10000,1.25,0,1)';
plot3(xyz(:,1),xyz(:,2),xyz(:,3),'.')
view(70,35)
box on
grid on

One simple way is to pick 8 random numbers between 0 and 100. Add 0 and 100 to the list to give 10 numbers. Sort them. Then output the difference between each successive pair of numbers. For example, here's 8 random numbers between 0 and 100:
96, 38, 95, 5, 13, 57, 13, 20
So add 0 and 100 and sort.
0, 5, 13, 13, 20, 38, 57, 95, 96, 100
Now subtract:
5-0 = 5
13-5 = 8
13-13 = 0
20-13 = 7
38-20 = 18
57-38 = 19
95-57 = 38
96-95 = 1
100-96 = 4
And there you have it, nine numbers that sum to 100: 0, 1, 4, 5, 7, 8, 18, 19, 38. That I got a zero and a one was just a strange bit of luck.
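The same steps can be sketched in MATLAB. This is only an illustration: the variable names are made up here, and it draws real-valued numbers with rand rather than the integers used in the worked example.

```matlab
% Sort-and-diff: 9 non-negative numbers that sum to 100.
nParts = 9;                              % how many numbers are wanted
total  = 100;                            % required sum
cuts   = sort(total * rand(nParts-1,1)); % 8 sorted cut points in (0,100)
x      = diff([0; cuts; total]);         % 9 gaps between successive points
disp(x.');
fprintf('sum = %.10f\n', sum(x));
```

Each gap is non-negative because the cut points are sorted, and the gaps telescope to total.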

It is not too late to give the right answer
Let's talk about sampling X1...XN in the range [0...1] such that Sum(X1, ..., XN) is equal to 1. Then you could rescale it to 100.
This is called the Dirichlet distribution, and below is the code to sample from it. The simplest case is when all parameters are equal to 1: the joint distribution of X1, ..., XN is then uniform over the set of points that sum to 1 (each individual marginal is Beta(1, N-1), which is U(0,1) only for N = 2). In the general case, with parameters different from 1, the marginal distributions might have peaks.
----------------- taken from here ---------------------
The Dirichlet is a vector of unit-scale gamma random variables, normalized by their sum. So, with no error checking, this will get you that:
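a = [1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0]; % 9 numbers to sample
n = 10000;
r = drchrnd(a,n) % n-by-9 matrix, each row sums to 1
% gamrnd requires the Statistics and Machine Learning Toolbox
function r = drchrnd(a,n)
    p = length(a);
    r = gamrnd(repmat(a,n,1),1,n,p);   % unit-scale gamma draws
    r = r ./ repmat(sum(r,2),1,p);     % normalize each row by its sum
end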
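For the all-ones case the toolbox function is not strictly needed: a unit-scale gamma variable with shape 1 is a standard exponential variable, which can be drawn as -log(rand). A minimal sketch under that assumption (the variable names are illustrative), rescaled so that each row sums to 100:

```matlab
% Dirichlet(1,...,1) sample without gamrnd: Gamma(1,1) is Exp(1),
% and -log(U) with U uniform on (0,1) is Exp(1).
p = 9;                                   % numbers per set
n = 10000;                               % number of sets
e = -log(rand(n,p));                     % exponential draws
r = 100 * bsxfun(@rdivide, e, sum(e,2)); % each row now sums to 100
rowSums = sum(r,2);
fprintf('row sums range from %.6f to %.6f\n', min(rowSums), max(rowSums));
```

This shortcut only covers parameters equal to 1. For other parameter values a gamma generator such as gamrnd is still required.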

Take a list of N - 1 numbers, create a list of N + 1 numbers by inserting 0 and 100, sort the list, and diff them down to a total of N numbers.

Related

Draw non full matrix of random numbers

I am doing a Monte-Carlo simulation, where each repetition requires the sum or product of a random number of random variables. My problem is how to do this efficiently as the entire simulation should be as vectorized as possible.
For example, say we want to take the sum of 5, 10 and 3 random numbers, represented by the vector len = [5;10;3]. Then what I am currently doing is drawing a full matrix of random numbers:
A = randn(length(len),max(len));
Creating a mask of the non-needed numbers:
lenlen = repmat(len,1,max(len));
idx = repmat(1:max(len),length(len),1);
mask = idx>lenlen;
and then I can "pad" the matrix. As I am interested in the sum, the padding has to be zero (for the product, the padding would have to be 1):
A(mask)=0;
To obtain:
A =
1.7708 -1.4609 -1.5637 -0.0340 0.9796 0 0 0 0 0
1.8034 -1.5467 0.3938 0.8777 0.6813 1.0594 -0.3469 1.7472 -0.4697 -0.3635
1.5937 -0.1170 1.5629 0 0 0 0 0 0 0
Whereafter I can sum them together
B = sum(A,2);
However, I find it rather wasteful that I have to draw too many random numbers and then throw them away. In the real case, I need hundreds of thousands of repetitions and the vector len might vary a lot, i.e. it can easily be that I have to draw two or three times as many random numbers as are needed.
You can generate the exact amount of random numbers required, create a grouping variable with repelem, and compute the sum of each group using accumarray:
len = [5; 10; 3];
B = accumarray(repelem(1:numel(len), len).', randn(sum(len),1));
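For the product case mentioned in the question, accumarray accepts a function handle as its fourth argument. A sketch, assuming the same len vector, with a loop-based cross-check over the same draws (the names subs, vals, edges and Pcheck are illustrative):

```matlab
len  = [5; 10; 3];
subs = repelem(1:numel(len), len).';       % group index for every draw
vals = randn(sum(len),1);                  % exactly sum(len) random numbers
P = accumarray(subs, vals, [], @prod);     % product within each group

% Cross-check against an explicit loop over the same draws
edges = [0; cumsum(len)];
Pcheck = zeros(size(len));
for k = 1:numel(len)
    Pcheck(k) = prod(vals(edges(k)+1:edges(k+1)));
end
disp(max(abs(P - Pcheck)));
```

No padding with ones is needed here, because each group contains only the values that belong to it.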
You could just use arrayfun or a loop. You say "efficient" and "vectorized" in the same breath, but they are not necessarily the same thing - since the new(ish) JIT compiler, loops are pretty fast in MATLAB. arrayfun is basically a loop in disguise, but means you could create B like so:
len = [5;10;3];
B = arrayfun( @(x) sum( randn(x,1) ), len );
For each element in len, this creates a vector of length len(i) and takes the sum. The output is an array with one value for each value in len.
This will certainly be a lot more memory friendly for large values and largely different values within len. It may therefore be quicker, your mileage may vary but it cuts out a lot of the operations you're doing.
You mention wanting to take the product sometimes, in which case use prod in place of sum.
Edit: rough and ready benchmark to compare arrayfun and a loop...
len = randi([1e3, 1e7], 100, 1);
tic;
B = arrayfun( @(x) sum( randn(x,1) ), len );
toc % ~8.77 seconds
tic;
out=zeros(size(len));
for ii = 1:numel(len)
    out(ii) = sum(randn(len(ii),1));
end
toc % ~8.80 seconds
The "advantage" of the loop over arrayfun is that you can pre-generate all of the random numbers in one go, then index. This isn't necessarily quicker because you're addressing much bigger chunks of memory, and the call to randn is the main bottleneck anyway!
tic;
out = zeros(size(len));
rnd = randn(sum(len),1);
idx = [0; cumsum(len)]; % note: cumsum is very quick (~0.001sec here) so negligible
for ii = 1:numel(len)
    out(ii) = sum(rnd(idx(ii)+1:idx(ii+1)),1);
end
toc % ~10.2 sec! Slower because of massive call to randn and the indexing into large array.
As stated at the top, arrayfun and looping are basically the same under the hood, so no reason to expect a big time difference.
The sum of multiple random numbers drawn from a specific distribution is also a random number with a (different) specific distribution. Therefore you can just cut the middleman and draw directly from the latter distribution.
In your case you are summing 5, 10 and 3 numbers drawn from a N(0,1) distribution. As explained here, the resulting distributions therefore are N(0,5), N(0,10) and N(0,3), where the second parameter is the variance. This page explains how you can draw from non-standard normal distributions in Matlab. As such, we can in this case generate those numbers with randn(3,1).*sqrt([5; 10; 3]).
In case you would want 1000 triples, you could then use
randn(3,1000).*sqrt([5; 10; 3])
or pre Matlab2016b
bsxfun(@times, randn(3,1000), sqrt([5; 10; 3]))
which is of course very fast.
Different distributions have different summation rules, but as long as you are not summing up numbers drawn from different distributions the rules are usually quite simple and found quickly with google.
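The variance rule for normal sums can be checked empirically. A rough sketch, assuming 1e5 repetitions (an arbitrary choice) and comparing the shortcut draw against explicit sums; the names direct and summed are illustrative:

```matlab
% Empirical check: the sum of k independent N(0,1) draws has variance k.
k = [5; 10; 3];
nRep = 1e5;                                      % repetitions (arbitrary)
direct = bsxfun(@times, randn(3,nRep), sqrt(k)); % shortcut draw
summed = zeros(3,nRep);
for j = 1:3
    summed(j,:) = sum(randn(k(j),nRep), 1);      % explicit sums
end
disp([var(direct,0,2), var(summed,0,2), k]);     % columns should roughly agree
```

The two sample-variance columns should both sit close to the third column, up to sampling noise.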
You can do this using a combination of cumsum and diff. The plan is:
Create all the random numbers in a single call to randn up front
Then, use cumsum to produce a vector of cumulative summations
Use cumsum on the list of number-of-samples-per-result to work out where to read out the results
We also need diff to correct for the prior summations.
Note that this method might lose accuracy if you weren't using randn for the random samples, as cumsum would then build up arithmetic rounding errors.
% We want 100 sums of random numbers
numSamples = 100;
% Here's where we define how many random samples contribute to each sum
numRandsPerSample = randi(5, 1, numSamples);
% Let's make all the random numbers in one call
allRands = randn(1, sum(numRandsPerSample));
% Use CUMSUM to build up a cumulative sum of the whole of allRands. We also
% need a leading 0 for the first sum.
allRandsCS = [0, cumsum(allRands)];
% Use CUMSUM again to pick out the places we need to pick from
% allRandsCS
endIdxs = 1 + [0, cumsum(numRandsPerSample)];
% Use DIFF to subtract the prior sums from the result.
result = diff(allRandsCS(endIdxs))
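As a sanity check, the cumsum/diff result can be compared against summing each block directly. The sketch below repeats the setup so that it runs on its own; the reference loop is only there for verification and is not part of the method.

```matlab
numRandsPerSample = randi(5, 1, 100);
allRands = randn(1, sum(numRandsPerSample));
allRandsCS = [0, cumsum(allRands)];
endIdxs = 1 + [0, cumsum(numRandsPerSample)];
result = diff(allRandsCS(endIdxs));

% Reference: sum each block of allRands directly
reference = zeros(1, numel(numRandsPerSample));
for k = 1:numel(numRandsPerSample)
    reference(k) = sum(allRands(endIdxs(k):endIdxs(k+1)-1));
end
fprintf('largest discrepancy: %g\n', max(abs(result - reference)));
```

Any discrepancy should be at the level of floating-point rounding, which is the accuracy caveat noted above.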

Quantizing an image in matlab

So I'm trying to figure out why my code doesn't seem to display the image uniformly quantized into 4 levels.
Q1 =uint8(zeros(ROWS, COLS, CHANNELS));
for band = 1 : CHANNELS,
for x = 1 : ROWS,
for y = 1 : COLS,
Q1(ROWS,COLS,CHANNELS) = uint8(double(I1(ROWS,COLS,CHANNELS) / 2^4)*2^4);
end
end
end
No5 = figure;
imshow(Q1);
title('Part D: K = 4');
It is because you are not quantizing. You divide a double by 16, then multiply again by 16, then convert it to uint8. The right way to quantize is to divide by 16, throw away any decimals, then multiply by 16:
Q1 = uint8(floor(I1 / 16) * 16);
In the code snippet above, I assume I1 is a double. Convert it to double if it's not: I1=double(I1).
Note that you don't need the loops, MATLAB will apply the operation to each element in the matrix.
Note also that if I1 is an integer type, you can do something like this:
Q1 = (uint8(I1) / 16) * 16;
but this is actually equivalent to replacing the floor by round in the first example. This means you get an uneven distribution of values: 0-7 are mapped to 0, 8-23 are mapped to 16, etc. and 248-255 are all mapped to 255 (not a multiple of 16!). That is, 8 numbers are mapped to 0, and 8 are mapped to 255, instead of mapping 16 numbers to each possible multiple of 16 as the floor code does.
The 16 in the code above means that there will be 256/16=16 different grey levels in the output. If you want a different number, say n, use 256/n instead of 16.
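A small sketch of the floor-based rule for a general number of levels. It assumes a synthetic ramp in place of a real image, and the names nLevels and step are illustrative.

```matlab
% Uniform quantization to nLevels grey levels (floor-based).
% I1 here is a synthetic ramp standing in for a real uint8 image.
I1 = uint8(0:255);
nLevels = 4;                               % assumed to divide 256 evenly
step = 256 / nLevels;                      % width of each bin (64 for 4 levels)
Q1 = uint8(floor(double(I1) / step) * step);
disp(unique(Q1));                          % distinct output grey levels
```

For nLevels = 4 the distinct output values are 0, 64, 128 and 192, with 64 input values mapped to each.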
It's because you are using ROWS, COLS, CHANNELS as your index; it should be x,y,band. Also, the final multiplication by 2^4 has to be after the uint8 cast, otherwise no rounding ever takes place.
In practice you should avoid the for loops in Matlab since matrix operations are much faster. Replace your code with
Q1=uint8(double(I1/2^4))*2^4
No5 = figure;
imshow(Q1);
title('Part D: K = 4');

Generating random numbers with weighted distribution in Matlab?

I know how to generate random numbers in a certain range in Matlab. What I am trying to do now is generate random numbers in a range where there is more chance of getting certain ones.
For example: how could I use Matlab to generate random numbers between 0 and 2, where 50% of them will be less than 0.5?
To get numbers between 0 and 2 I would use (2-0)*rand+0. How can I do this but get a certain percentage of the numbers generated to be less than 0.5? Is there a way to do this using the rand function?
Here is a suggestion:
N = 10; % how many random numbers to generate
bounds = [0 0.5 1 2]; % define the ranges
prob = cumsum([0.5 0.3 0.2]); % define the probabilities
% pick a random range with probability from 'prob':
s = size(bounds,2)-cumsum(bsxfun(@lt,rand(N,1),prob),2);
% pick a random number in this range:
b = rand(1,N).*(bounds(s(:,end)+1)-bounds(s(:,end)))+bounds(s(:,end))
Here the k-th entry of [0.5 0.3 0.2] is the probability of drawing a number between bounds(k) and bounds(k+1); prob holds the cumulative sums of these probabilities. Basically we first draw a range with the defined probability, and then draw another number from that range. So we are interested only in b, but need s on the way (mainly for creating a lot of numbers in a vectorized manner).
so we get:
b =
Columns 1 through 5
0.5297 0.15791 0.88636 0.34822 0.062666
Columns 6 through 10
0.065076 0.54618 0.0039101 0.21155 0.82779
Or, for N = 100000 we can draw a histogram of b (figure not shown), so we can see how the values are distributed between the 3 ranges in bounds.
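The same distribution can also be written as a piecewise-linear inverse CDF and sampled with interp1. This is an alternative sketch, not the method above, and the name cumProb is illustrative. Printing the empirical shares gives a quick check of the 50% requirement.

```matlab
N = 100000;
bounds  = [0 0.5 1 2];
cumProb = [0 cumsum([0.5 0.3 0.2])];       % CDF values at the bounds
cumProb(end) = 1;                          % guard against rounding in cumsum
b = interp1(cumProb, bounds, rand(N,1));   % piecewise-linear inverse CDF
fprintf('share below 0.5:   %.3f\n', mean(b < 0.5));
fprintf('share in [0.5, 1): %.3f\n', mean(b >= 0.5 & b < 1));
```

Within each range the values are uniform, because the inverse CDF is linear between consecutive bounds.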
You can use a multinomial distribution to draw the ranges, and then compute the random numbers. Here's how:
N = 10;
bounds = [0 0.5 1 2]; % define the ranges
d = diff(bounds);
% pick a N random ranges from a multinomial distribution:
s = mnrnd(N,[0.5 0.3 0.2]);
% pick a random number in this range:
b = rand(1,N).*repelem(d,s)+repelem(bounds(1:end-1),s)
so you get s:
s =
50 39 11
that says you take 50 values from the first range, 39 from the second, and so on...
And you got the result in b:
b =
Columns 1 through 5
0.28212 0.074551 0.18166 0.035787 0.33316
Columns 6 through 10
0.12404 0.93468 1.9808 1.4522 1.6955
So basically it works the same as the first method I posted here, but it may be more accurate and/or readable. Also, I didn't test which method is faster.

determine the frequency of a number in a simulation

I have the following function:
I have to generate 2000 random numbers from this function and then make a histogram.
Then I have to determine how many of them are greater than 2, i.e. P(X>2).
this is my function:
%function [ output_args ] = Weibullverdeling( X )
%UNTITLED Summary of this function goes here
% Detailed explanation goes here
for i=1:2000
% x= rand*1000;
%x=ceil(x);
x=i;
Y(i) = 3*(log(x))^(6/5);
X(i)=x;
end
plot(X,Y)
and it gives me the following image:
How can I make it tell me how many values greater than 2 I have?
Very simple:
>> Y_greater_than_2 = Y(Y>2);
>> size(Y_greater_than_2)
ans =
1 1998
So that's 1998 values out of 2000 that are greater than 2.
EDIT
If you want to find the values between two other values, say between 1 and 4, you need to do something like:
>> Y_between = Y(Y>=1 & Y<=4);
>> size(Y_between)
ans =
1 2
This is what I think:
for i=1:2000
x=rand(1);
Y(i) = 3*(log(x))^(6/5);
X(i)=x;
end
plot(X,Y)
U is a uniform random variable from which you can get the X. So you need to use rand function in MATLAB.
After which you implement:
size(Y(Y>2),2);
You can implement the code directly (here k is your root, n is number of data points, y is the highest number of distribution, x is smallest number of distribution and lambda the lambda in your equation):
X=(log(x+rand(1,n).*(y-x)).*lambda).^(1/k);
result=numel(X(X>2));
Let's split it up and explain it in detail:
You want the k-th root of a number:
number.^(1/k)
you want the natural logarithm of a number:
log(number)
you want to multiply something:
numberA.*numberB
you want to get lets say 1000 random numbers between x and y:
(x+rand(1,1000).*(y-x))
you want to combine all of that:
x= lower_bound;
y= upper_bound;
n= No_Of_data;
lambda=wavelength; %my guess
k= No_of_the_root;
X=(log(x+rand(1,n).*(y-x)).*lambda).^(1/k);
So you just have to insert your x,y,n,lambda and k
and then check
bigger_2 = X(X>2);
which would return only the values bigger than 2 and if you want the number of elements bigger than 2
No_bigger_2=numel(bigger_2);
I'm going to go with the assumption that what you've presented is supposed to be a random variate generation algorithm based on inversion, and that you want real-valued (not complex) solutions so you've omitted a negative sign on the logarithm. If those assumptions are correct, there's no need to simulate to get your answer.
Under the stated assumptions, your formula is the inverse of the complementary cumulative distribution function (CCDF). It's complementary because smaller values of U give larger values of X, and vice-versa. Solve the (corrected) formula for U. Using the values from your Matlab implementation:
X = 3 * (-log(U))^(6/5)
X / 3 = (-log(U))^(6/5)
-log(U) = (X / 3)^(5/6)
U = exp(-((X / 3)^(5/6)))
Since this is the CCDF, plugging in a value for X gives the probability (or proportion) of outcomes greater than X. Solving for X=2 yields 0.49, i.e., 49% of your outcomes should be greater than 2.
Make suitable adjustments if lambda is inside the radical, but the algebra leading to solution is similar. Unless I messed up my arithmetic, the proportion would then be 55.22%.
If you still are required to simulate this, knowing the analytical answer should help you confirm the correctness of your simulation.
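A simulation along these lines might look as follows. It is a sketch under the same assumptions as the derivation above (negative sign on the logarithm, lambda = 3 outside the power), using the sample size of 2000 from the question.

```matlab
n = 2000;                                  % sample size from the question
U = rand(1,n);
X = 3 * (-log(U)).^(6/5);                  % inversion with the corrected sign
simulated  = mean(X > 2);                  % observed proportion above 2
analytical = exp(-((2/3)^(5/6)));          % CCDF evaluated at X = 2
fprintf('simulated %.3f, analytical %.3f\n', simulated, analytical);
```

With only 2000 draws the simulated proportion will fluctuate by a percentage point or so around the analytical value.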
