matlab code on geometric random variable - matlab

I am asked to Write a code to generate a geometric RV with p=0.25 and use it to calculate the probability that the RV takes a value greater than or equal to 4. Basically, I am not aware of matlab but I tried using help in matlab. And I came to know that I should use geornd function. Can anyone help me how to use the function and how I should enter the parameters to get the required results?

See the doc for this function: http://www.mathworks.es/es/help/stats/geornd.html.
For example, if you want a 1x10000 vector of geometric samples with parameter p=0.25, use
values = geornd(.25,1,10000);
To estimate the probability that the RV exceeds or equals 4:
mean(values>=4)
Explanation: values>=4 is a vector which contains 1 or 0 according to whether the condition is fulfilled or not. Its sample mean (function mean) is an estimation of the probability of that event.
Anyway, in this case it would be easier to compute that probability exactly:
>> p = .25; N = 4; 1 - p*sum((1-p).^[0:N-1])
ans =
0.3164
or using geocdf:
p = .25; N = 4; 1-geocdf(N-1,p)

Related

Small bug in MATLAB R2017B LogLikelihood after fitnlm?

Background: I am working on a problem similar to the nonlinear logistic regression described in the link [1] (my problem is more complicated, but link [1] is enough for the next sections of this post). Comparing my results with those obtained in parallel with a R package, I got similar results for the coefficients, but (very approximately) an opposite logLikelihood.
Hypothesis: The logLikelihood given by fitnlm in Matlab is in fact the negative LogLikelihood. (Note that this impairs consequently the BIC and AIC computation by Matlab)
Reasonning: in [1], the same problem is solved through two different approaches. ML-approach/ By defining the negative LogLikelihood and making an optimization with fminsearch. GLS-approach/ By using fitnlm.
The negative LogLikelihood after the ML-approach is:380
The negative LogLikelihood after the GLS-approach is:-406
I imagine the second one should be at least multiplied by (-1)?
Questions: Did I miss something? Is the (-1) coefficient enough, or would this simple correction not be enough?
Self-contained code:
%copy-pasting code from [1]
myf = #(beta,x) beta(1)*x./(beta(2) + x);
mymodelfun = #(beta,x) 1./(1 + exp(-myf(beta,x)));
rng(300,'twister');
x = linspace(-1,1,200)';
beta = [10;2];
beta0=[3;3];
mu = mymodelfun(beta,x);
n = 50;
z = binornd(n,mu);
y = z./n;
%ML Approach
mynegloglik = #(beta) -sum(log(binopdf(z,n,mymodelfun(beta,x))));
opts = optimset('fminsearch');
opts.MaxFunEvals = Inf;
opts.MaxIter = 10000;
betaHatML = fminsearch(mynegloglik,beta0,opts)
neglogLH_MLApproach = mynegloglik(betaHatML);
%GLS Approach
wfun = #(xx) n./(xx.*(1-xx));
nlm = fitnlm(x,y,mymodelfun,beta0,'Weights',wfun)
neglogLH_GLSApproach = - nlm.LogLikelihood;
Source:
[1] https://uk.mathworks.com/help/stats/examples/nonlinear-logistic-regression.html
This answer (now) only details which code is used. Please see Tom Lane's answer below for a substantive answer.
Basically, fitnlm.m is a call to NonLinearModel.fit.
When opening NonLinearModel.m, one gets in line 1209:
model.LogLikelihood = getlogLikelihood(model);
getlogLikelihood is itself described between lines 1234-1251.
For instance:
function L = getlogLikelihood(model)
(...)
L = -(model.DFE + model.NumObservations*log(2*pi) + (...) )/2;
(...)
Please also not that this notably impacts ModelCriterion.AIC and ModelCriterion.BIC, as they are computed using model.LogLikelihood ("thinking" it is the logLikelihood).
To get the corresponding formula for BIC/AIC/..., type:
edit classreg.regr.modelutils.modelcriterion
this is Tom from MathWorks. Take another look at the formula quoted:
L = -(model.DFE + model.NumObservations*log(2*pi) + (...) )/2;
Remember the normal distribution has a factor (1/sqrt(2*pi)), so taking logs of that gives us -log(2*pi)/2. So the minus sign comes from that and it is part of the log likelihood. The property value is not the negative log likelihood.
One reason for the difference in the two log likelihood values is that the "ML approach" value is computing something based on the discrete probabilities from the binomial distribution. Those are all between 0 and 1, and they add up to 1. The "GLS approach" is computing something based on the probability density of the continuous normal distribution. In this example, the standard deviation of the residuals is about 0.0462. That leads to density values that are much higher than 1 at the peak. So the two things are not really comparable. You would need to convert the normal values to probabilities on the same discrete intervals that correspond to individual outcomes from the binomial distribution.

Gaussian Random Process with Unit Mean

I want to generate a Gaussian Random Process with Unit Mean(mean=1) in MATLAB. I tried to do randn function but I later learned that it can be only used when mean is 0 so I tried to write the process by hand. I wanted to write the Gaussian function with mean = 1 and var = 1. I tried this code:
N = rand(1000,1);
g1 = (1/(sqrt(2*pi)))*exp(-((N-1).^2)/2);
plot(g1)
m = mean(g1)
v = var(g1)
However, when I check the mean and variance values I get m=0.3406 and v=0.0024. Can you help?
If you take the vector from randn() and then add one it will have the same standard deviation as before but now it'll also have a mean of 1.
v=randn(1000,1)+1

How to resolve MATLAB trapz function error?

I am working on an assignment that requires me to use the trapz function in MATLAB in order to evaluate an integral. I believe I have written the code correctly, but the program returns answers that are wildly incorrect. I am attempting to find the integral of e^(-x^2) from 0 to 1.
x = linspace(0,1,2000);
y = zeros(1,2000);
for iCnt = 1:2000
y(iCnt) = e.^(-(x(iCnt)^2));
end
a = trapz(y);
disp(a);
This code currently returns
1.4929e+03
What am I doing incorrectly?
You need to just specify also the x values:
x = linspace(0,1,2000);
y = exp(-x.^2);
a = trapz(x,y)
a =
0.7468
More details:
First of all, in MATLAB you can use vectors to avoid for-loops for performing operation on arrays (vectors). So the whole four lines of code
y = zeros(1,2000);
for iCnt = 1:2000
y(iCnt) = exp(-(x(iCnt)^2));
end
will be translated to one line:
y = exp(-x.^2)
You defined x = linspace(0,1,2000) it means that you need to calculate the integral of the given function in range [0 1]. So there is a mistake in the way you calculate y which returns it to be in range [1 2000] and that is why you got the big number as the result.
In addition, in MATLAB you should use exp there is not function as e in MATLAB.
Also, if you plot the function in the range, you will see that the result makes sense because the whole page has an area of 1x1.

How can I plot data to a “best fit” cos² graph in Matlab?

I’m currently a Physics student and for several weeks have been compiling data related to ‘Quantum Entanglement’. I’ve now got to a point where I have to plot my data (which should resemble a cos² graph - and does) to a sort of “best fit” cos² graph. The lab script says the following:
A more precise determination of the visibility V (this is basically how 'clean' the data is) follows from the best fit to the measured data using the function:
f(b) = A/2[1-Vsin(b-b(center)/P)]
Granted this probably doesn’t mean much out of context, but essentially A is the amplitude, b is an angle and P is the periodicity. Hence this is also a “wave” like the experimental data I have found.
From this I understand, as previously mentioned, I am making a “best fit” curve. However, I have been told that this isn’t possible with Excel and that the best approach is Matlab.
I know intermediate JavaScript but do not know Matlab and was hoping for some direction.
Is there a tutorial I can read for this? Is it possible for someone to go through it with me? I really have no idea what it entails, so any feed back would be greatly appreciated.
Thanks a lot!
Initial steps
I guess we should begin by getting a representation in Matlab of the function that you're trying to model. A direct translation of your formula looks like this:
function y = targetfunction(A,V,P,bc,b)
y = (A/2) * (1 - V * sin((b-bc) / P));
end
Getting hold of the data
My next step is going to be to generate some data to work with (you'll use your own data, naturally). So here's a function that generates some noisy data. Notice that I've supplied some values for the parameters.
function [y b] = generateData(npoints,noise)
A = 2;
V = 1;
P = 0.7;
bc = 0;
b = 2 * pi * rand(npoints,1);
y = targetfunction(A,V,P,bc,b) + noise * randn(npoints,1);
end
The function rand generates random points on the interval [0,1], and I multiplied those by 2*pi to get points randomly on the interval [0, 2*pi]. I then applied the target function at those points, and added a bit of noise (the function randn generates normally distributed random variables).
Fitting parameters
The most complicated function is the one that fits a model to your data. For this I use the function fminunc, which does unconstrained minimization. The routine looks like this:
function [A V P bc] = bestfit(y,b)
x0(1) = 1; %# A
x0(2) = 1; %# V
x0(3) = 0.5; %# P
x0(4) = 0; %# bc
f = #(x) norm(y - targetfunction(x(1),x(2),x(3),x(4),b));
x = fminunc(f,x0);
A = x(1);
V = x(2);
P = x(3);
bc = x(4);
end
Let's go through line by line. First, I define the function f that I want to minimize. This isn't too hard. To minimize a function in Matlab, it needs to take a single vector as a parameter. Therefore we have to pack our four parameters into a vector, which I do in the first four lines. I used values that are close, but not the same, as the ones that I used to generate the data.
Then I define the function I want to minimize. It takes a single argument x, which it unpacks and feeds to the targetfunction, along with the points b in our dataset. Hopefully these are close to y. We measure how far they are from y by subtracting from y and applying the function norm, which squares every component, adds them up and takes the square root (i.e. it computes the root mean square error).
Then I call fminunc with our function to be minimized, and the initial guess for the parameters. This uses an internal routine to find the closest match for each of the parameters, and returns them in the vector x.
Finally, I unpack the parameters from the vector x.
Putting it all together
We now have all the components we need, so we just want one final function to tie them together. Here it is:
function master
%# Generate some data (you should read in your own data here)
[f b] = generateData(1000,1);
%# Find the best fitting parameters
[A V P bc] = bestfit(f,b);
%# Print them to the screen
fprintf('A = %f\n',A)
fprintf('V = %f\n',V)
fprintf('P = %f\n',P)
fprintf('bc = %f\n',bc)
%# Make plots of the data and the function we have fitted
plot(b,f,'.');
hold on
plot(sort(b),targetfunction(A,V,P,bc,sort(b)),'r','LineWidth',2)
end
If I run this function, I see this being printed to the screen:
>> master
Local minimum found.
Optimization completed because the size of the gradient is less than
the default value of the function tolerance.
A = 1.991727
V = 0.979819
P = 0.695265
bc = 0.067431
And the following plot appears:
That fit looks good enough to me. Let me know if you have any questions about anything I've done here.
I am a bit surprised as you mention f(a) and your function does not contain an a, but in general, suppose you want to plot f(x) = cos(x)^2
First determine for which values of x you want to make a plot, for example
xmin = 0;
stepsize = 1/100;
xmax = 6.5;
x = xmin:stepsize:xmax;
y = cos(x).^2;
plot(x,y)
However, note that this approach works just as well in excel, you just have to do some work to get your x values and function in the right cells.

How can I speed up this call to quantile in Matlab?

I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels:
The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.
To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:
function [Y q] = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q)
q=transpose(q);
end
Y = zeros(size(X));
for i = 1:nLevels
% "The variables g and l indicate the entries that are respectively greater than
% or less than the relevant bucket limits. The line Y(g & l) = i is assigning the
% value i to any element that falls in this bucket."
if i ~= nLevels % "The default; doesnt include upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#lt,X,q(i+1,:));
else % "For the final level we include the upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#le,X,q(i+1,:));
end
Y(g & l) = i;
end
Is there anything I can do to speed this up? Can the code be vectorized?
If I understand correctly, you want to know how many items fell in each bucket.
Use:
n = hist(Y,nbins)
Though I am not sure that it will help in the speedup. It is just cleaner this way.
Edit : Following the comment:
You can use the second output parameter of histc
[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = >sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
How About this
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
Y = Y+ X>=q(i);
end
This results in the following:
>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)
Y =
1 1 2 2 2 1
q =
1 3.5 7
You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
I think you shoud use histc
[~,Y] = histc(X,q)
As you can see in matlab's doc:
Description
n = histc(x,edges) counts the number of values in vector x that fall
between the elements in the edges vector (which must contain
monotonically nondecreasing values). n is a length(edges) vector
containing these counts. No elements of x can be complex.
I made a couple of refinements (including one inspired by Aero Engy in another answer) that have resulted in some improvements. To test them out, I created a random matrix of a million rows and 100 columns to run the improved functions on:
>> x = randn(1000000,100);
First, I ran my unmodified code, with the following results:
Note that of the 40 seconds, around 14 of them are spent computing the quantiles - I can't expect to improve this part of the routine (I assume that Mathworks have already optimized it, though I guess that to assume makes an...)
Next, I modified the routine to the following, which should be faster and has the advantage of being fewer lines as well!
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
Y = ones(size(X));
for i = 2:nLevels
Y = Y + bsxfun(#ge,X,q(i,:));
end
The profiling results with this code are:
So it is 15 seconds faster, which represents a 150% speedup of the portion of code that is mine, rather than MathWorks.
Finally, following a suggestion of Andrey (again in another answer) I modified the code to use the second output of the histc function, which assigns entries to bins. It doesn't treat the columns independently, so I had to loop over the columns manually, but it seems to be performing really well. Here's the code:
function [Y q] = levels(X,nLevels)
p = linspace(0,1,nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
q(end,:) = 2 * q(end,:);
Y = zeros(size(X));
for k = 1:size(X,2)
[junk Y(:,k)] = histc(X(:,k),q(:,k));
end
And the profiling results:
We now spend only 4.3 seconds in codes outside the quantile function, which is around a 500% speedup over what I wrote originally. I've spent a bit of time writing this answer because I think it's turned into a nice example of how you can use the MATLAB profiler and StackExchange in combination to get much better performance from your code.
I'm happy with this result, although of course I'll continue to be pleased to hear other answers. At this stage the main performance increase will come from increasing the performance of the part of the code that currently calls quantile. I can't see how to do this immediately, but maybe someone else here can. Thanks again!
You can sort the columns and divide+round the inverse indexes:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
[S,IX]=sort(X);
[grid1,grid2]=ndgrid(1:size(IX,1),1:size(IX,2));
invIX=zeros(size(X));
invIX(sub2ind(size(X),IX(:),grid2(:)))=grid1;
Y=ceil(invIX/size(X,1)*nLevels);
Or you can use tiedrank:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
R=tiedrank(X);
Y=ceil(R/size(X,1)*nLevels);
Surprisingly, both these solutions are slightly slower than the quantile+histc solution.