I want to generate biased random numbers in MATLAB. Let me explain what I mean by biased.
Let's say I have a defined upper bound and lower bound of 30 and 10 respectively.
I want to generate N random numbers biased towards the bounds, such that the numbers are more likely to lie close to 10 and 30 (the extremes) than somewhere in the middle.
How can I do this?
Any help is much appreciated :)
% Upper bound
UB = 30;
% Lower bound (10, as stated in the question)
LB = 10;
% Range
L = UB-LB;
% Std dev - you may want to relate it to L - maybe use sigma=sqrt(L)
sigma = L/6;
% Number of samples to generate
n = 1000000;
X = sigma*randn(1,n);
% Remove items that fall outside the bounds - if that's not what you want, comment out the two following lines
X(X<-L) = [];
X(X>L) = [];
% Take values above zero for lower bounds, other values for upper bound
ii = X > 0;
X(ii) = LB + X(ii);
X(~ii) = UB + X(~ii);
% plot histogram
hist(X, 100);
I used a normal distribution here, but you can adapt this to use others. You can also change sigma.
To generate random numbers from an arbitrary distribution, you have to define its inverse cumulative distribution function. Let's say you call it myICDF. Once you have this function, you can generate random samples using myICDF(rand(n,m)).
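As a minimal sketch of the inverse-CDF approach applied to the original question, one U-shaped choice is the arcsine distribution rescaled to [LB, UB], whose inverse CDF has a closed form. The arcsine distribution is an assumption made for illustration, not something the answer prescribes, and myICDF here is simply an anonymous function.

```matlab
% Bounds from the question
LB = 10;
UB = 30;

% Inverse CDF of the arcsine distribution on [0,1] is sin(pi*u/2)^2;
% rescale it to [LB, UB]. (Illustrative choice of distribution.)
myICDF = @(u) LB + (UB - LB) .* sin(pi .* u ./ 2).^2;

% Push uniform samples through the inverse CDF
n = 1e5;
X = myICDF(rand(1, n));

% Fraction of samples within 2 units of either bound
fracNearBounds = mean(X < LB + 2 | X > UB - 2);
```

Any other U-shaped distribution with a known inverse CDF can be swapped in by changing myICDF.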
MATLAB has the function randn to draw from a normal distribution, e.g.
x = 0.5 + 0.1*randn()
draws a pseudorandom number from a normal distribution of mean 0.5 and standard deviation 0.1.
Given this, is the following MATLAB code equivalent to sampling from a normal distribution truncated at 0 and 1?
while x <=0 || x > 1
x = 0.5 + 0.1*randn();
end
Using MATLAB's probability distribution objects makes sampling from truncated distributions very easy.
You can use makedist() to define the distribution object and truncate() to truncate it; the random() function then generates random variates from the truncated object.
% MATLAB R2017a
pd = makedist('Normal',0.5,0.1) % Normal(mu,sigma)
pdt = truncate(pd,0,1) % truncated to interval (0,1)
sample = random(pdt,numRows,numCols) % Sample from distribution `pdt`
Once the object is created (here it is pdt, the truncated version of pd), you can use it in a variety of function calls.
To generate samples, random(pdt,m,n) produces an m-by-n array of samples from pdt.
Further, if you want to avoid using toolboxes, this answer from @Luis Mendo is correct (proof below).
figure, hold on
h = histogram(cr,'Normalization','pdf','DisplayName','@Luis Mendo samples');
X = 0:.01:1;
p = plot(X,pdf(pdt,X),'b-','DisplayName','Theoretical (w/ truncation)');
You need the following steps:
1. Draw a random value u from the uniform distribution on (0,1).
2. Assuming the normal distribution is truncated at a and b, compute
u_bar = F(a)*u + F(b)*(1-u)
where F is the CDF of the untruncated normal distribution.
3. Apply the inverse of F:
epsilon = F^{-1}(u_bar)
epsilon is then a random value from the truncated normal distribution.
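The three steps above can be sketched without any toolbox by writing the normal CDF and its inverse in terms of erf and erfinv. The parameter values are taken from the question (mean 0.5, standard deviation 0.1, truncation at 0 and 1); the handle names F and Finv are hypothetical names chosen for this sketch.

```matlab
mu = 0.5;      % mean of the underlying normal
sigma = 0.1;   % standard deviation of the underlying normal
a = 0;         % lower truncation point
b = 1;         % upper truncation point
n = 1e5;       % number of samples

% CDF of the untruncated normal and its inverse, via erf / erfinv
F    = @(x) 0.5 .* (1 + erf((x - mu) ./ (sigma .* sqrt(2))));
Finv = @(p) mu + sigma .* sqrt(2) .* erfinv(2 .* p - 1);

u = rand(n, 1);                         % step 1: uniform draws
u_bar = F(a) .* u + F(b) .* (1 - u);    % step 2: map into [F(a), F(b)]
epsilon = Finv(u_bar);                  % step 3: invert the CDF
```

Unlike the rejection loop, this draws exactly n samples in one pass, which matters most when the truncation interval covers only a small part of the distribution.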
Why don't you vectorize? It will probably be faster:
N = 1e5; % desired number of samples
m = .5; % desired mean of underlying Gaussian
s = .1; % desired std of underlying Gaussian
lower = 0; % lower value for truncation
upper = 1; % upper value for truncation
remaining = 1:N;
while ~isempty(remaining) % loop until no out-of-range samples are left
result(remaining) = m + s*randn(1,numel(remaining)); % (pre)allocates the first time
remaining = find(result<=lower | result>upper);
end
I tried to generate 1000 random values from a normal distribution using the normrnd function.
A = normrnd(4,1,[1000 1]);
I would like to set the minimum value to 2. However, that function only lets me define the mean and standard deviation. How can I set the minimum value to 2?
You can't. Gaussian or normally distributed numbers are in a bell curve, with the tails tailing off to infinity. What you can do is "censor" them by eliminating every number beyond a cut-off.
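A minimal sketch of that censoring idea, with one added assumption: instead of only deleting the values below the cut-off, it redraws them so that the result still has 1000 elements. It uses randn in place of normrnd so that it runs without the Statistics Toolbox; the variable names are chosen for this example.

```matlab
n = 1000;      % number of values wanted
mu = 4;        % mean of the underlying normal
sigma = 1;     % standard deviation of the underlying normal
cutoff = 2;    % minimum allowed value

% Initial draw (same distribution as normrnd(mu, sigma, [n 1]))
A = mu + sigma .* randn(n, 1);

% Redraw only the entries that fall below the cut-off
bad = A < cutoff;
while any(bad)
    A(bad) = mu + sigma .* randn(nnz(bad), 1);
    bad = A < cutoff;
end
```

The result follows a normal distribution truncated at 2, not a true Gaussian.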
Since you chose mean = 4 and sigma = 1, about 95% of the elements of A will fall within the range [2,6]. About 2.3% of the elements will be smaller than 2. If you consider this fraction small, you can shift these elements up so that their minimum is 2. For example:
A = normrnd(4,1,[1000 1]);
A(A < 2) = A(A<2) + 2 - min(A(A<2))
Of course, this is technically not a Gaussian distribution. However, if you have full control of the mean and sigma, you can get a more Gaussian-like distribution by adding an offset to A:
A = A + 2 - min(A)
Note: This assumes you can have an arbitrarily set standard deviation, which may not be the case
As others have said, you cannot specify a lower bound for a true Gaussian. However, you can choose the standard deviation so that a fraction 1-p of the values is expected to fall above your cutoff, and then discard the fraction p of values that fall below it.
For example, in the following code, I am generating a Gaussian where 95% of data points fall above 2. Then I am removing all points below 2, knowing that about 5% of the data will be removed.
This works because, as p gets closer to zero, the chance of drawing an uncensored sample that follows your Gaussian curve and lies entirely above your cutoff goes to 100% (really it depends on how p relates to n, but for fixed n this is true).
n = 1000; % number of samples
cutoff = 2; % Cutoff point for min-value
mu = 4; % Mean
p = .05; % Percentile you would like to cutoff
z = -sqrt(2) * erfcinv(p*2); % compute z score
sigma = (cutoff - mu)/z; % compute standard deviation
A = normrnd(mu,sigma,[n 1]);
I would recommend removing values below the cutoff rather than re-attributing them to the lower bound of your distribution, but that is up to you.
A(A<cutoff) = []; % removes all values of A less than cutoff
If you want to be symmetrical (which you should, to prevent sample skew), the following should work.
A(A>(2*mu-cutoff)) = [];
I use both MATLAB and OpenCV to produce a grayscale histogram divided into 10 bins.
In OpenCV, each bin has equal range (i.e. [0,25], [26,51], [52,77], ...).
However, in MATLAB, the bin sizes are not equal (I guess it's related to some theory about different sensitivity to intensity changes between lower and higher values).
These different results cause big problems for me.
Is there an option to use calcHist with equal bin sizes? (Of course except for the option of implementing it myself...)
Answering my own question with a self-implemented function:
function h = fixedSizeBinnedHist(grayImg, numBins)
binSize = 256 / numBins;
binnedImg = floor(double(grayImg) / binSize);
maxVal = max(binnedImg(:));
numLeadingZeros = min(binnedImg(:));
numTrailingZeros = numBins - maxVal - 1;
% First, computing histogram for the existing range
h = hist(double(binnedImg(:)), maxVal - numLeadingZeros + 1);
leading = zeros(1, numLeadingZeros);
trailing = zeros(1, numTrailingZeros);
% Finally attaching needed zeros in both sides, so the histogram is in the requested size
h = [leading h trailing];
end
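As an alternative sketch (not the author's method), the same fixed-width binning can be done in one step with accumarray, which counts how many pixels land in each bin index and pads empty bins with zeros automatically. The small test image below is made up for illustration.

```matlab
% Hypothetical 2x4 grayscale image used only to demonstrate the binning
grayImg = uint8([0 25 26 255; 100 100 100 100]);
numBins = 10;
binSize = 256 / numBins;   % 25.6 intensity levels per bin

% Map each pixel to a 1-based bin index in 1..numBins
binIdx = floor(double(grayImg(:)) / binSize) + 1;

% Count pixels per bin; bins with no pixels stay at zero
h = accumarray(binIdx, 1, [numBins 1])';
```

Because the output size is fixed by [numBins 1], no leading or trailing zeros need to be attached by hand.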
My original problem was to create a scenario whereby there is a line of a specific length (x=100), and a barrier at specific position (pos=50). Multiple rounds of sampling are carried out, within which a specific amount of random numbers (p) are made. The numbers generated can either fall to left or right of the barrier. The program outputs the difference between the largest number generated to the left of the barrier and the smallest number generated to the right. This is much clearer to see here:
In this example, the system has created 4 numbers (a,b,c,d). It will ignore a and d and output the difference between b and c. Essentially, it will output the smallest possible fragment of the line that still contains the barrier.
The code I have been using to do this is:
x = 100; % length of the grid
pos = 50; % position of the barrier
len1 = 0; % left edge of the grid
len2 = x; % right edge of the grid
sample = 1000; % number of samples to make
nn = 1:12; % number of points to generate (will loop over these)
len = zeros(sample, length(nn)); % array to record the results
for n = 1:length(nn) % For each number of pts to generate
numpts = nn(n);
for i = 1:sample % For each round of sampling,
p = round(rand(numpts,1) * x); % generate 'numpts' random points.
if any(p>pos) % If any are to the right of the barrier,
pright = min(p(p>pos)); % pick the smallest.
else
pright = len2;
end
if any(p<pos) % If any are to the left of the barrier,
pleft = max(p(p<pos)); % pick the largest.
else
pleft = len1;
end
len(i,n) = pright - pleft; % Record the length of the interval.
end
end
My current problem: I'd like to make this more complex. For example, I would like to be able to use more than just one random number count in each round. Specifically I would like to relate this to Poisson distributions with different mean values:
% Create poisson distributions for λ = 1:12
range = 0:20;
for l=1:12;
y = poisspdf(range,l);
dist(:,l) = y;
end
From this, I'd like to take 1000 samples for each λ, but within each round of 1000 samples, the random number count is no longer the same for all 1000 samples. Instead it depends on the Poisson distribution. For example, for a mean value of 1, the probabilities are:
0 - 0.3678
1 - 0.3678
2 - 0.1839
3 - 0.0613
4 - 0.0153
5 - 0.0030
6 - 0.0005
7 - 0.0001
8 - 0.0000
9 - 0.0000
10 - 0.0000
11 - 0.0000
12 - 0.0000
So for the first round of 1000 samples, 367 of them would be carried out generating no numbers, 367 carried out generating 1 number, 183 carried out generating 2 numbers and so on. The program will then repeat this using new values it gains from a mean value of 2 and so on. I'd then like to simply collect together all the fragment sizes (pright-pleft) into a column of a matrix - a column for each value of λ.
I know I could do something like:
amount = dist*sample
to multiply the Poisson distributions by the sample size and get how many rounds of each point count it should do. However, I'm really stuck on how to incorporate this into the for-loop and alter the code to tackle this new problem. I am also not sure how to read down a column of a matrix to use each probability value to determine how many rounds of each point count it should do.
Any help would be greatly appreciated,
Anna.
You could generate a vector of random variables from a known pdf object using random, if you have the Statistics Toolbox. Better still, skip the PDF step and generate the random variables directly using poissrnd. Its output is already integer-valued, so the rounding in the example below is only a safeguard. Then call rand as you were doing already. In your loop, simply iterate over your generated vector of Poisson-distributed random numbers.
Example:
x = 100; % length of the grid
pos = 50; % position of the barrier
len1 = 0; % left edge of the grid
len2 = x; % right edge of the grid
sample = 1000; % number of samples to make
lambda = 1:12; % lambdas
Rrnd = round(poissrnd(repmat(lambda,sample,1)));
len = zeros(size(Rrnd)); % array to record the results
for n = lambda; % For each number of pts to generate
for i = 1:sample % For each round of sampling,
numpts = Rrnd(i,n);
p = round(rand(numpts,1) * x); % generate 'numpts' random points.
len(i,n) = min([p(p>pos);len2]) - max([p(p<pos);len1]); % Record the length
end
end
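If the Statistics Toolbox is not available, Poisson counts can be drawn with base MATLAB only. The following is a minimal sketch of Knuth's multiplication method, vectorized over all samples for a single mean; it is a swapped-in technique rather than what poissrnd does internally, and the variable names are chosen for this example.

```matlab
lambda = 3;          % Poisson mean (one of the values 1:12)
numSamples = 1e4;    % how many counts to draw

threshold = exp(-lambda);
counts = zeros(numSamples, 1);   % Poisson-distributed counts, filled in below
prodU = rand(numSamples, 1);     % running product of uniform draws

% Keep multiplying by fresh uniforms until the product drops to the
% threshold; the number of extra multiplications is the Poisson count.
active = prodU > threshold;
while any(active)
    counts(active) = counts(active) + 1;
    prodU(active) = prodU(active) .* rand(nnz(active), 1);
    active = prodU > threshold;
end
```

Each entry of counts can then play the role of numpts in the loop above. The method takes roughly lambda iterations, so it is only practical for small means such as the ones used here.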
I want to pick values between, say, 50 and 150 using an exponential random number generator (a flat hazard function). How do I implement bounds on the built-in exponential random number function in matlab?
A quick way is to generate a sequence longer than you need, and throw out values outside your desired range.
dist = exprnd(100,1,1000);
%# mean of 100 ---^ ^---^--- 1x1000 random numbers
dist(dist<50 | dist>150) = []; %# will be shorter than 1000
If you don't have enough values after pruning, you can repeat and append onto the vector, or however else you want to do it.
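The repeat-and-append idea can be sketched as a loop that keeps drawing batches until enough in-range values have been collected. To stay toolbox-free, this sketch draws exponentials as -mu*log(rand), which has the same distribution as exprnd(mu,...); the names N, lo, hi and out are assumptions made for this example.

```matlab
N = 1000;    % number of in-range values wanted
mu = 100;    % mean of the exponential distribution
lo = 50;     % lower bound
hi = 150;    % upper bound

out = zeros(1, 0);   % collected in-range values
while numel(out) < N
    % Draw a batch of exponential values (inverse-CDF form of exprnd)
    batch = -mu .* log(rand(1, 2 * N));
    % Keep only the values inside [lo, hi] and append them
    batch = batch(batch >= lo & batch <= hi);
    out = [out, batch];
end
out = out(1:N);      % trim any surplus
```

With these parameters roughly 38% of each batch survives the pruning, so the loop normally finishes after two or three batches.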
exprnd uses rand (see >> open exprnd.m), so you can bound the output of that instead by reversing the process and sampling uniformly within the range that maps to the desired interval [r1, r2].
sizeOut = [1, 1000]; % sample size
mu = 100; % parameter of exponential
r1 = 50; % lower bound
r2 = 150; % upper bound
r = exprndBounded(mu, sizeOut, r1, r2); % bounded output
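function r = exprndBounded(mu, sizeOut, r1, r2)
% Map the bounds r1 < r2 to the uniform variate; note that minE > maxE
minE = exp(-r1/mu);
maxE = exp(-r2/mu);
% Uniform sample between maxE and minE
randBounded = minE + (maxE-minE).*rand(sizeOut);
% Invert the exponential transform, giving values in [r1, r2]
r = -mu .* log(randBounded);
end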
The drawn densities (using a non-parametric kernel estimator) look like the following for 20K samples