Matlab: generate random numbers from normal distribution with given probability - matlab

As seen in the code below, I am currently generating random numbers from a Normal Distribution and am selecting the ones within the -3*sigma and 3*sigma interval. However, I now want to generate numbers such that there is a higher probability that I select numbers from outside the -3*sigma and 3*sigma interval. For eg. A number from [-4*sigma -3*sigma) should have 35% probability of being chosen and same for [3*sigma 4*sigma). Basically, I'll be calling this function several times and am wondering if there is a way for me to select a higher proportion of random numbers from the "tails" of the normal distribution, without actually altering the shape of the normal distribution. I'm struggling to do this.
function [new_E11, new_E22] = elasticmodulusrng()
new_E11 = normrnd(136e9,9.067e9,[1 1]);
new_E22 = normrnd(8.9e9,2.373e9,[1 1]);
while new_E11<=-3*9.067e9 && new_E11>=3*9.067e9
new_E11 = normrnd(136e9,9.067e9,[1 1]);
end
while new_E11<=-3*2.373e9 && new_E11>=3*2.373e9
new_E22 = normrnd(8.9e9,2.373e9,[1 1]);
end
Thanks

The question does not make much sense, as pointed out by Jojo: this is not a normal distribution anymore.
What you could do is to create your own Probability density functions pdf and draw from it.
For instance,
N = pdf('Normal',-5:0.2:5,0,1);
gives you the normal PDF with a good resolution.
You could alter it, say
Z = N;
Z(5:15)=3*Z(5:15);
Z(35:45)=3*Z(35:45);
and use Direct Methods, Inversion Methods, or Acceptance-Rejection Methods as explained here
There is an implementation in the FileExchange:
http://www.mathworks.com/matlabcentral/fileexchange/27590-simple-rejection-sampling

Related

comparing generated data to measured data

we have measured data that we managed to determine the distribution type that it follows (Gamma) and its parameters (A,B)
And we generated n samples (10000) from the same distribution with the same parameters and in the same range (between 18.5 and 59) using for loop
for i=1:1:10000
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W(i,:) =random(tot,1,1);
end
Then we tried to fit the generated data using:
h1=histfit(W);
After this we tried to plot the Gamma curve to compare the two curves on the same figure uing:
hold on
h2=histfit(W,[],'Gamma');
h2(1).Visible='off';
The problem s the two curves are shifted as in the following figure "Figure 1 is the generated data from the previous code and Figure 2 is without truncating the generated data"
enter image description here
Any one knows why??
Thanks in advance
By default histfit fits a normal probability density function (PDF) on the histogram. I'm not sure what you were actually trying to do, but what you did is:
% fit a normal PDF
h1=histfit(W); % this is equal to h1 = histfit(W,[],'normal');
% fit a gamma PDF
h2=histfit(W,[],'Gamma');
Obviously that will result in different fits because a normal PDF != a gamma PDF. The only thing you see is that for the gamma PDF fits the curve better because you sampled the data from that distribution.
If you want to check whether the data follows a certain distribution you can also use a KS-test. In your case
% check if the data follows the distribution speccified in tot
[h p] = kstest(W,'CDF',tot)
If the data follows a gamma dist. then h = 0 and p > 0.05, else h = 1 and p < 0.05.
Now some general comments on your code:
Please look up preallocation of memory, it will speed up loops greatly. E.g.
W = zeros(10000,1);
for i=1:1:10000
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W(i,:) =random(tot,1,1);
end
Also,
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
is not depending in the loop index and can therefore be moved in front of the loop to speed things up further. It is also good practice to avoid using i as loop variable.
But you can actually skip the whole loop because random() allows to return multiple samples at once:
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W =random(tot,10000,1);

K-means Stopping Criteria in Matlab?

Im implementing the k-means algorithm on matlab without using the k-means built-in function, The stopping criteria is when the new centroids doesn't change by new iterations, but i cannot implement it in matlab , can anybody help?
Thanks
Setting no change as a stopping criteria is a bad idea. There are a few main reasons you shouldn't use a 0 change condition
even for a well behaved function the difference between 0 change and a very small change (say 1e-5 perhaps)could be 1000+ iterations, so you are wasting time trying to get them to be exactly the same. Especially because computers usually keep far more digits than we are interested in. IF you only need 1 digit accuracy, why wait for the computer to find an answer within 1e-31?
computers have floating point errors everywhere. Try doing some easily reversible matrix operations like a = rand(3,3); b = a*a*inv(a); a-b theoretically this should be 0 but you will see it isn't. So these errors alone could prevent your program from ever stopping
dithering. lets say we have a 1d k means problem with 3 numbers and we want to split them into 2 groups. One iteration the grouping can be a,b vs c. the next iteration could be a vs b,c the next could be a,b vs c the next.... This is of course a simplified example, but there can be instances where a few data points can dither between clusters, and you will end up with a never ending algorithm. Since those few points are reassigned, the change will never be 0
the solution is to use a delta threshold. basically you subtract the current values from the previous and if they are less than a threshold you are done. This on its own is powerful, but as with any loop, you need a backup escape plan. And that is setting a max_iterations variable. Look at matlabs documentation for kmeans, even they have a MaxIter variable (default is 100) so even if your kmeans doesn't converge, at least it wont run endlessly. Something like this might work
%problem specific
max_iter = 100;
%choose a small number appropriate to your problem
thresh = 1e-3;
%ensures it runs the first time
delta_mu = thresh + 1;
num_iter = 0;
%do your kmeans in the loop
while (delta_mu > thresh && num_iter < max_iter)
%save these right away
old_mu = curr_mu;
%calculate new means and variances, this is the standard kmeans iteration
%then store the values in a variable called curr_mu
curr_mu = newly_calculate_values;
%use the two norm to find the delta as a single number. no matter what
%the original dimensionality of mu was. If old_mu -new_mu was
% 0 the norm is still 0. so it behaves well as a distance measure.
delta_mu = norm(old_mu - curr_mu,2);
num_ter = num_iter + 1;
end
edit
if you don't know the 2 norm is essentially the euclidean distance

Generate Quasi-Random Samples from Distribution Object

I've got several distribution objects in Matlab like this one:
% gravity [m/s2]
Uncer.Param.gravity.LB = 9.801;
Uncer.Param.gravity.UB = 9.867;
Uncer.Param.gravity.pd = makedist('Uniform','Lower',Uncer.Param.gravity.LB, 'Upper', Uncer.Param.gravity.UB);
Uncer.Param.gravity.value=0;
I know I can generate a random sample with the random-function, but I want to generate a sample made of quasi-random-numbers (Sobol).
I get a matrix filled with these quasi-random-numbers like this:
set = net(sobolset(countParameter*2, 'Skip',1), countSimulation);
And I know furthermore, that I can interpolate the distribution values with the function interp1 and the corresponding CDF.
The problem is that my matrix-dimensions are round about 1000x20 and the interpolation would cost a ton ammount of time.
Is there a faster way to do this?
The search has come to an end and was way to simple: icdf

GUI solving equations project

I have a Matlab project in which I need to make a GUI that receives two mathematical functions from the user. I then need to find their intersection point, and then plot the two functions.
So, I have several questions:
Do you know of any algorithm I can use to find the intersection point? Of course I prefer one to which I can already find a Matlab code for in the internet. Also, I prefer it wouldn't be the Newton-Raphson method.
I should point out I'm not allowed to use built in Matlab functions.
I'm having trouble plotting the functions. What I basically did is this:
fun_f = get(handles.Function_f,'string');
fun_g = get(handles.Function_g,'string');
cla % To clear axes when plotting new functions
ezplot(fun_f);
hold on
ezplot(fun_g);
axis ([-20 20 -10 10]);
The problem is that sometimes, the axes limits do not allow me to see the other function. This will happen, if, for example, I will have one function as log10(x) and the other as y=1, the y=1 will not be shown.
I have already tried using all the axis commands but to no avail. If I set the limits myself, the functions only exist in certain limits. I have no idea why.
3 . How do I display numbers in a static text? Or better yet, string with numbers?
I want to display something like x0 = [root1]; x1 = [root2]. The only solution I found was turning the roots I found into strings but I prefer not to.
As for the equation solver, this is the code I have so far. I know it is very amateurish but it seemed like the most "intuitive" way. Also keep in mind it is very very not finished (for example, it will show me only two solutions, I'm not so sure how to display multiple roots in one static text as they are strings, hence question #3).
function [Sol] = SolveEquation(handles)
fun_f = get(handles.Function_f,'string');
fun_g = get(handles.Function_g,'string');
f = inline(fun_f);
g = inline(fun_g);
i = 1;
Sol = 0;
for x = -10:0.1:10;
if (g(x) - f(x)) >= 0 && (g(x) - f(x)) < 0.01
Sol(i) = x;
i = i + 1;
end
end
solution1 = num2str(Sol(1));
solution2 = num2str(Sol(2));
set(handles.roots1,'string',solution1);
set(handles.roots2,'string',solution2);
The if condition is because the subtraction will never give me an absolute zero, and this seems to somewhat solve it, though it's really not perfect, sometimes it will give me more than two very similar solutions (e.g 1.9 and 2).
The range of x is arbitrary, chosen by me.
I know this is a long question, so I really appreciate your patience.
Thank you very much in advance!
Question 1
I think this is a more robust method for finding the roots given data at discrete points. Looking for when the difference between the functions changes sign, which corresponds to them crossing over.
S=sign(g(x)-f(x));
h=find(diff(S)~=0)
Sol=x(h);
If you can evaluate the function wherever you want there are more methods you can use, but it depends on the size of the domain and the accuracy you want as to what is best. For example, if you don't need a great deal of accurac, your f and g functions are simple to calculate, and you can't or don't want to use derivatives, you can get a more accurate root using the same idea as the first code snippet, but do it iteratively:
G=inline('sin(x)');
F=inline('1');
g=vectorize(G);
f=vectorize(F);
tol=1e-9;
tic()
x = -2*pi:.001:pi;
S=sign(g(x)-f(x));
h=find(diff(S)~=0); % Find where two lines cross over
Sol=zeros(size(h));
Err=zeros(size(h));
if ~isempty(h) % There are some cross-over points
for i=1:length(h) % For each point, improve the approximation
xN=x(h(i):h(i)+1);
err=1;
while(abs(err)>tol) % Iteratively improve aproximation
S=sign(g(xN)-f(xN));
hF=find(diff(S)~=0);
xN=xN(hF:hF+1);
[~,I]=min(abs(f(xN)-g(xN)));
xG=xN(I);
err=f(xG)-g(xG);
xN=linspace(xN(1),xN(2),15);
end
Sol(i)=xG;
Err(i)=f(xG)-g(xG);
end
else % No crossover points - lines could meet at tangents
[h,I]=findpeaks(-abs(g(x)-f(x)));
Sol=x(I(abs(f(x(I))-g(x(I)))<1e-5));
Err=f(Sol)-g(Sol)
end
% We also have to check each endpoint
if abs(f(x(end))-g(x(end)))<tol && abs(Sol(end)-x(end))>1e-12
Sol=[Sol x(end)];
Err=[Err g(x(end))-f(x(end))];
end
if abs(f(x(1))-g(x(1)))<tol && abs(Sol(1)-x(1))>1e-12
Sol=[x(1) Sol];
Err=[g(x(1))-f(x(1)) Err];
end
toc()
Sol
Err
This will "zoom" in to the region around each suspected root, and iteratively improve the accuracy. You can tweak the parameters to see whether they give better behaviour (the tolerance tol, the 15, number of new points to generate, could be higher probably).
Question 2
You would probably be best off avoiding ezplot, and using plot, which gives you greater control. You can vectorise inline functions so that you can evaluate them like anonymous functions, as I did in the previous code snippet, using
f=inline('x^2')
F=vectorize(f)
F(1:5)
and this should make plotting much easier:
plot(x,f(x),'b',Sol,f(Sol),'ro',x,g(x),'k',Sol,G(Sol),'ro')
Question 3
I'm not sure why you don't want to display your roots as strings, what's wrong with this:
text(xPos,yPos,['x0=' num2str(Sol(1))]);

Sampling random numbers from normal distribution with given probability (Matlab)

As seen in the code below, I am currently generating random numbers from a Normal Distribution and am selecting the ones within the -3*sigma and 3*sigma interval.
However, I now want to generate numbers such that there is a higher probability that I select numbers from outside the -3*sigma and 3*sigma interval. E.g. a number from [-4*sigma -3*sigma) should have 35% probability of being chosen and same for [3*sigma 4*sigma).
Basically, I'll be calling this function several times and am wondering if there is a way for me to select a higher proportion of random numbers from the "tails" of the normal distribution, without actually altering the shape of the normal distribution.
I have been told to use a "Rejection sampling algorithm" or the "Metropolis-Hastings algorithm" for this problem. I am struggling to understand how to implement either. Could someone provide a slight push in the right direction? I'm using
N = pdf('Normal',136e9-(3*9.067e9):1e8:136e9+(3*9.067e9),136e9,9.067e9)
to first generate a pdf to draw from. However, I am then unsure which I should take as my "target distribution" and which I should take as the "proposed distribution".
function [new_E11, new_E22] = elasticmodulusrng()
new_E11 = normrnd(136e9,9.067e9,[1 1]);
new_E22 = normrnd(8.9e9,2.373e9,[1 1]);
while new_E11<=-3*9.067e9 && new_E11>=3*9.067e9
new_E11 = normrnd(136e9,9.067e9,[1 1]);
end
while new_E11<=-3*2.373e9 && new_E11>=3*2.373e9
new_E22 = normrnd(8.9e9,2.373e9,[1 1]);
end
Thanks