Sampling random numbers from normal distribution with given probability (Matlab) - matlab

As seen in the code below, I am currently generating random numbers from a Normal Distribution and am selecting the ones within the -3*sigma and 3*sigma interval.
However, I now want to generate numbers such that there is a higher probability that I select numbers from outside the -3*sigma and 3*sigma interval. E.g. a number from [-4*sigma -3*sigma) should have 35% probability of being chosen and same for [3*sigma 4*sigma).
Basically, I'll be calling this function several times and am wondering if there is a way for me to select a higher proportion of random numbers from the "tails" of the normal distribution, without actually altering the shape of the normal distribution.
I have been told to use a "Rejection sampling algorithm" or the "Metropolis-Hastings algorithm" for this problem. I am struggling to understand how to implement either. Could someone provide a slight push in the right direction? I'm using
N = pdf('Normal',136e9-(3*9.067e9):1e8:136e9+(3*9.067e9),136e9,9.067e9)
to first generate a pdf to draw from. However, I am then unsure which I should take as my "target distribution" and which I should take as the "proposed distribution".
function [new_E11, new_E22] = elasticmodulusrng()
new_E11 = normrnd(136e9,9.067e9,[1 1]);
new_E22 = normrnd(8.9e9,2.373e9,[1 1]);
while new_E11<=-3*9.067e9 && new_E11>=3*9.067e9
new_E11 = normrnd(136e9,9.067e9,[1 1]);
end
while new_E11<=-3*2.373e9 && new_E11>=3*2.373e9
new_E22 = normrnd(8.9e9,2.373e9,[1 1]);
end
Thanks

Related

comparing generated data to measured data

we have measured data that we managed to determine the distribution type that it follows (Gamma) and its parameters (A,B)
And we generated n samples (10000) from the same distribution with the same parameters and in the same range (between 18.5 and 59) using for loop
for i=1:1:10000
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W(i,:) =random(tot,1,1);
end
Then we tried to fit the generated data using:
h1=histfit(W);
After this we tried to plot the Gamma curve to compare the two curves on the same figure uing:
hold on
h2=histfit(W,[],'Gamma');
h2(1).Visible='off';
The problem s the two curves are shifted as in the following figure "Figure 1 is the generated data from the previous code and Figure 2 is without truncating the generated data"
enter image description here
Any one knows why??
Thanks in advance
By default histfit fits a normal probability density function (PDF) on the histogram. I'm not sure what you were actually trying to do, but what you did is:
% fit a normal PDF
h1=histfit(W); % this is equal to h1 = histfit(W,[],'normal');
% fit a gamma PDF
h2=histfit(W,[],'Gamma');
Obviously that will result in different fits because a normal PDF != a gamma PDF. The only thing you see is that for the gamma PDF fits the curve better because you sampled the data from that distribution.
If you want to check whether the data follows a certain distribution you can also use a KS-test. In your case
% check if the data follows the distribution speccified in tot
[h p] = kstest(W,'CDF',tot)
If the data follows a gamma dist. then h = 0 and p > 0.05, else h = 1 and p < 0.05.
Now some general comments on your code:
Please look up preallocation of memory, it will speed up loops greatly. E.g.
W = zeros(10000,1);
for i=1:1:10000
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W(i,:) =random(tot,1,1);
end
Also,
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
is not depending in the loop index and can therefore be moved in front of the loop to speed things up further. It is also good practice to avoid using i as loop variable.
But you can actually skip the whole loop because random() allows to return multiple samples at once:
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W =random(tot,10000,1);

K-means Stopping Criteria in Matlab?

Im implementing the k-means algorithm on matlab without using the k-means built-in function, The stopping criteria is when the new centroids doesn't change by new iterations, but i cannot implement it in matlab , can anybody help?
Thanks
Setting no change as a stopping criteria is a bad idea. There are a few main reasons you shouldn't use a 0 change condition
even for a well behaved function the difference between 0 change and a very small change (say 1e-5 perhaps)could be 1000+ iterations, so you are wasting time trying to get them to be exactly the same. Especially because computers usually keep far more digits than we are interested in. IF you only need 1 digit accuracy, why wait for the computer to find an answer within 1e-31?
computers have floating point errors everywhere. Try doing some easily reversible matrix operations like a = rand(3,3); b = a*a*inv(a); a-b theoretically this should be 0 but you will see it isn't. So these errors alone could prevent your program from ever stopping
dithering. lets say we have a 1d k means problem with 3 numbers and we want to split them into 2 groups. One iteration the grouping can be a,b vs c. the next iteration could be a vs b,c the next could be a,b vs c the next.... This is of course a simplified example, but there can be instances where a few data points can dither between clusters, and you will end up with a never ending algorithm. Since those few points are reassigned, the change will never be 0
the solution is to use a delta threshold. basically you subtract the current values from the previous and if they are less than a threshold you are done. This on its own is powerful, but as with any loop, you need a backup escape plan. And that is setting a max_iterations variable. Look at matlabs documentation for kmeans, even they have a MaxIter variable (default is 100) so even if your kmeans doesn't converge, at least it wont run endlessly. Something like this might work
%problem specific
max_iter = 100;
%choose a small number appropriate to your problem
thresh = 1e-3;
%ensures it runs the first time
delta_mu = thresh + 1;
num_iter = 0;
%do your kmeans in the loop
while (delta_mu > thresh && num_iter < max_iter)
%save these right away
old_mu = curr_mu;
%calculate new means and variances, this is the standard kmeans iteration
%then store the values in a variable called curr_mu
curr_mu = newly_calculate_values;
%use the two norm to find the delta as a single number. no matter what
%the original dimensionality of mu was. If old_mu -new_mu was
% 0 the norm is still 0. so it behaves well as a distance measure.
delta_mu = norm(old_mu - curr_mu,2);
num_ter = num_iter + 1;
end
edit
if you don't know the 2 norm is essentially the euclidean distance

Generate Quasi-Random Samples from Distribution Object

I've got several distribution objects in Matlab like this one:
% gravity [m/s2]
Uncer.Param.gravity.LB = 9.801;
Uncer.Param.gravity.UB = 9.867;
Uncer.Param.gravity.pd = makedist('Uniform','Lower',Uncer.Param.gravity.LB, 'Upper', Uncer.Param.gravity.UB);
Uncer.Param.gravity.value=0;
I know I can generate a random sample with the random-function, but I want to generate a sample made of quasi-random-numbers (Sobol).
I get a matrix filled with these quasi-random-numbers like this:
set = net(sobolset(countParameter*2, 'Skip',1), countSimulation);
And I know furthermore, that I can interpolate the distribution values with the function interp1 and the corresponding CDF.
The problem is that my matrix-dimensions are round about 1000x20 and the interpolation would cost a ton ammount of time.
Is there a faster way to do this?
The search has come to an end and was way to simple: icdf

Interactive curve fitting with MATLAB using custom GUI?

I find examples the best way to demonstrate my question. I generate some data, add some random noise, and fit it to get back my chosen "generator" value...
x = linspace(0.01,1,50);
value = 3.82;
y = exp(-value.*x);
y = awgn(y,30);
options = optimset('MaxFunEvals',1000,'MaxIter',1000,'TolFun',1e-10,'Display','off');
model = #(p,x) exp(-p(1).*x);
startingVals = [5];
lb = [1];
ub = [10];
[fittedValue] = lsqcurvefit(model,startingVals,x,y,lb,ub,options)
fittedGraph = exp(-fittedValue.*x);
plot(x,y,'o');
hold on
plot(x,fittedGraph,'r-');
In this new example, I have generated the same data but this time added much more noise to the first 15 points. Because it is random sometimes it works out okay, but after a few runs I get a good example that illustrates my problem. Same code, except for these lines added under value = 3.82
y = exp(-value.*x);
y(1:15) = awgn(y(1:15),5);
y(15:end) = awgn(y(15:end),30);
As you can see, it has clearly not given a good fit to where the data seems reliable, because it is fitting from points 1-50. What I want to do is say, okay MATLAB, I can see we have some noisy data but it seems decent over a range, only fit your exponential from points 15 to the end. I could go back to my code and update it to do this, but I will be batch fitting graphs like this where each one will have different ranges of 'good' data.
So what I am after is a GUI callback mechanisms that allows me to click on two circles from the data and have them change color or something, which indicates the lsqcurvefit will only fit over that range. Internally all it has to change is inside the lsqcurvefit call e.g.
x(16:end),y(16:end)
But the range should update depending on the starting and ending circles I have clicked.
I hope my question is clear. Thanks.
You could use ginput to select the two points for your min and max in the plot.
[x,y]=ginput(2);
%this returns you the x and y coordinates of two points clicked after each other
%the min point is assumed to be clicked first
min=[x(1) y(1)];
max=[x(2) y(2)];
then you could fit your curve with the coordinates for min and max I guess.
You could also switch between a rightclick for the min and a leftclick for the max etc.
Hope this helps you.

Matlab: generate random numbers from normal distribution with given probability

As seen in the code below, I am currently generating random numbers from a Normal Distribution and am selecting the ones within the -3*sigma and 3*sigma interval. However, I now want to generate numbers such that there is a higher probability that I select numbers from outside the -3*sigma and 3*sigma interval. For eg. A number from [-4*sigma -3*sigma) should have 35% probability of being chosen and same for [3*sigma 4*sigma). Basically, I'll be calling this function several times and am wondering if there is a way for me to select a higher proportion of random numbers from the "tails" of the normal distribution, without actually altering the shape of the normal distribution. I'm struggling to do this.
function [new_E11, new_E22] = elasticmodulusrng()
new_E11 = normrnd(136e9,9.067e9,[1 1]);
new_E22 = normrnd(8.9e9,2.373e9,[1 1]);
while new_E11<=-3*9.067e9 && new_E11>=3*9.067e9
new_E11 = normrnd(136e9,9.067e9,[1 1]);
end
while new_E11<=-3*2.373e9 && new_E11>=3*2.373e9
new_E22 = normrnd(8.9e9,2.373e9,[1 1]);
end
Thanks
The question does not make much sense, as pointed out by Jojo: this is not a normal distribution anymore.
What you could do is to create your own Probability density functions pdf and draw from it.
For instance,
N = pdf('Normal',-5:0.2:5,0,1);
gives you the normal PDF with a good resolution.
You could alter it, say
Z = N;
Z(5:15)=3*Z(5:15);
Z(35:45)=3*Z(35:45);
and use Direct Methods, Inversion Methods, or Acceptance-Rejection Methods as explained here
There is an implementation in the FileExchange:
http://www.mathworks.com/matlabcentral/fileexchange/27590-simple-rejection-sampling