Matlab logncdf function is not producing expected result

So this problem seems pretty straightforward: we are given
mean of X = 10,281 and sigma of X = 4112.4
and we are asked to determine P(X < 15,000)
Now I thought the code for this in matlab should be super straightforward
mu = 10281
sigma = 4112.4
p = logncdf(15000,10281,4112.4)
However this gives
p = .0063
The given answer is 0.8790, and just looking at p you can tell it is wrong: we are at 15000, which is above the mean, so the probability should be above 0.5. What is the deal with this function?
I saw somewhere that you might need to pass exp(15000) as x into the function, but that results in a probability of 1, which is too high.
Any pointers would be much appreciated

%If X is lognormally distributed with mean and standard deviation:
mu = 10281;
sigma = 4112.4;
%then log(X) is normally distributed with the following parameters:
mew_actual = log((mu^2)/sqrt(sigma^2+mu^2));
sigma_actual = sqrt(log((sigma^2)/(mu^2) +1));
Now you can use either of the following to compute the CDF:
p = cdf('Normal',log(15000),mew_actual,sigma_actual)
or
p=logncdf(15000,mew_actual,sigma_actual)
which gives 0.8796
(which I believe is the correct answer)
The answer given to you is 0.8790 because, if you solve the question by hand, you get something like z = 1.172759; when you look this value up in the table, you can only find z = 1.17 (without the remaining decimal places), for which Φ(z) = 0.8790.
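Here is a quick sketch (not from the original answer) that reproduces the hand calculation in MATLAB; normcdf is assumed to be available from the same Statistics Toolbox that provides logncdf:
mu = 10281; sigma = 4112.4;
mu_log = log(mu^2/sqrt(sigma^2 + mu^2));    % mean of log(X)
sigma_log = sqrt(log(sigma^2/mu^2 + 1));    % std of log(X)
z = (log(15000) - mu_log)/sigma_log         % ~1.1728
p_exact = normcdf(z)                        % ~0.8796
p_table = normcdf(1.17)                     % ~0.8790, the rounded table value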
You can verify the exact answer with an online lognormal CDF calculator.

Related

Definite integration using int command

Firstly, I'm quite new to Matlab.
I am currently trying to do a definite integral with respect to y of a particular function. The function that I want to integrate is the expression assigned to f in the code below (note that the big parenthesis multiplies the first factor; I couldn't get the LaTeX to not make it look like a power).
I have tried plugging the above integral into Desmos and it worked as intended. My plan is to vary the values of x and y using a for loop in Matlab.
However, after trying to use the int function to calculate the definite integral with the following code:
h = 5;
a = 2;
syms y
x = 3.8;
p = 2.*x.^2+2.*a.*y;
q = x.^2+y.^2;
r = x.^2+a.^2;
f = (-1./sqrt(1-(p.^2./(4.*q.*r)))).*(2.*sqrt(q).*sqrt(r).*2.*a-p.*2.*y.*sqrt(r)./sqrt(q))./(4.*q.*r);
theta = int(f,y,a+0.01,h) %the integral is undefined at y=2, hence the +0.01
the result is not quite as expected
theta =
int(-((8*461^(1/2)*(y^2 + 361/25)^(1/2))/5 - (461^(1/2)*y*(8*y + 1444/25))/(5*(y^2 + 361/25)^(1/2)))/((1 - (4*y + 722/25)^2/((1844*y^2)/25 + 665684/625))^(1/2)*((1844*y^2)/25 + 665684/625)), y, 21/10, 5)
After browsing through various posts, the common mistake seems to be integrating across a point where the integrand is undefined, but the +0.01 should have fixed that. Any guidance on what went wrong is much appreciated.
The Definite Integrals example in the docs shows exactly this type of output when a closed form cannot be computed. You can approximate it numerically using vpa, i.e.
F = int(f,y,a,h);
theta = vpa(F);
Or you can do a numerical computation directly
theta = vpaintegral(f,y,a,h);
From the docs:
The vpaintegral function is faster and provides control over integration tolerances.
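As a small, hedged sketch of that tolerance control (the tolerance value below is illustrative, not from the original answer; the limits follow the question's a+0.01 to stay clear of the singularity at y = a):
theta = vpaintegral(f, y, a+0.01, h, 'RelTol', 1e-10)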

Small bug in MATLAB R2017B LogLikelihood after fitnlm?

Background: I am working on a problem similar to the nonlinear logistic regression described in link [1] (my problem is more complicated, but link [1] is enough for the next sections of this post). Comparing my results with those obtained in parallel with an R package, I got similar results for the coefficients, but (very approximately) an opposite logLikelihood.
Hypothesis: The logLikelihood given by fitnlm in Matlab is in fact the negative LogLikelihood. (Note that this consequently impairs the BIC and AIC computed by Matlab.)
Reasoning: in [1], the same problem is solved through two different approaches. ML approach: by defining the negative LogLikelihood and optimizing with fminsearch. GLS approach: by using fitnlm.
The negative LogLikelihood from the ML approach is: 380
The negative LogLikelihood from the GLS approach is: -406
I imagine the second one should be at least multiplied by (-1)?
Questions: Did I miss something? Is multiplying by (-1) enough, or would a further correction be needed?
Self-contained code:
%copy-pasting code from [1]
myf = @(beta,x) beta(1)*x./(beta(2) + x);
mymodelfun = @(beta,x) 1./(1 + exp(-myf(beta,x)));
rng(300,'twister');
x = linspace(-1,1,200)';
beta = [10;2];
beta0=[3;3];
mu = mymodelfun(beta,x);
n = 50;
z = binornd(n,mu);
y = z./n;
%ML Approach
mynegloglik = @(beta) -sum(log(binopdf(z,n,mymodelfun(beta,x))));
opts = optimset('fminsearch');
opts.MaxFunEvals = Inf;
opts.MaxIter = 10000;
betaHatML = fminsearch(mynegloglik,beta0,opts)
neglogLH_MLApproach = mynegloglik(betaHatML);
%GLS Approach
wfun = @(xx) n./(xx.*(1-xx));
nlm = fitnlm(x,y,mymodelfun,beta0,'Weights',wfun)
neglogLH_GLSApproach = - nlm.LogLikelihood;
Source:
[1] https://uk.mathworks.com/help/stats/examples/nonlinear-logistic-regression.html
This answer (now) only details which code is used. Please see Tom Lane's answer below for a substantive answer.
Basically, fitnlm.m is a call to NonLinearModel.fit.
When opening NonLinearModel.m, one finds at line 1209:
model.LogLikelihood = getlogLikelihood(model);
getlogLikelihood is itself described between lines 1234-1251.
For instance:
function L = getlogLikelihood(model)
(...)
L = -(model.DFE + model.NumObservations*log(2*pi) + (...) )/2;
(...)
Please also note that this notably impacts ModelCriterion.AIC and ModelCriterion.BIC, as they are computed using model.LogLikelihood ("thinking" it is the logLikelihood).
To get the corresponding formula for BIC/AIC/..., type:
edit classreg.regr.modelutils.modelcriterion
This is Tom from MathWorks. Take another look at the quoted formula:
L = -(model.DFE + model.NumObservations*log(2*pi) + (...) )/2;
Remember the normal distribution has a factor (1/sqrt(2*pi)), so taking logs of that gives us -log(2*pi)/2. So the minus sign comes from that and it is part of the log likelihood. The property value is not the negative log likelihood.
One reason for the difference in the two log likelihood values is that the "ML approach" value is computing something based on the discrete probabilities from the binomial distribution. Those are all between 0 and 1, and they add up to 1. The "GLS approach" is computing something based on the probability density of the continuous normal distribution. In this example, the standard deviation of the residuals is about 0.0462. That leads to density values that are much higher than 1 at the peak. So the two things are not really comparable. You would need to convert the normal values to probabilities on the same discrete intervals that correspond to individual outcomes from the binomial distribution.
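A minimal illustration of that last point (the numbers here are assumed for demonstration, using the ~0.0462 residual standard deviation quoted above): a continuous density can exceed 1, so its log contributes positively to the log-likelihood, whereas a binomial probability is at most 1 and always contributes negatively.
sigma = 0.0462;              % residual standard deviation quoted above
normpdf(0, 0, sigma)         % ~8.6, a density value well above 1
log(normpdf(0, 0, sigma))    % positive contribution to the log-likelihood
binopdf(25, 50, 0.5)         % ~0.11, a probability, always <= 1
log(binopdf(25, 50, 0.5))    % negative contribution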

Matlab's solve: Solution not satisfying the equation

I am trying to solve equations with this code:
a = [-0.0008333 -0.025 -0.6667 -20];
length_OnePart = 7.3248;
xi = -6.4446;
yi = -16.5187;
syms x y
[sol_x,sol_y] = solve(y == poly2sym(a), ((x-xi)^2+(y-yi)^2) == length_OnePart^2,x,y,'Real',true);
sol_x = sym2poly(sol_x);
sol_y = sym2poly(sol_y);
The solution pairs it gives are (-23.9067, -8.7301) and (11.0333, -24.2209), which do not even satisfy the equation of the circle. How can I rectify this problem?
If you're trying to solve for the intersection of the cubic y == poly2sym(a) with the circle (x-xi)^2+(y-yi)^2 == length_OnePart^2, it looks like solve may get confused when the circle is given as an implicit equation rather than as single-valued functions. It might also have to do with the fact that x and y are not independent unknowns; rather, the latter depends on the former. It could also depend on the use of a numeric solver in this case. solve seems to work fine with similar inputs to yours, so you might report this behavior to the MathWorks to see what they think.
In any case, here is a better, more efficient way to tackle this as a root-solving problem (as opposed to simultaneous equations):
a = [-0.0008333 -0.025 -0.6667 -20];
length_OnePart = 7.3248;
xi = -6.4446;
yi = -16.5187;
syms x real
f(x) = poly2sym(a);
sol_x = solve((x-xi)^2+(f(x)-yi)^2==length_OnePart^2,x)
sol_y = f(sol_x)
which returns:
sol_x =
0.00002145831413371390464567553686047
-13.182825373861454619370838716408
sol_y =
-20.000014306269544436430325843024
-13.646590348358951818881695033728
Note that you might get slightly more accurate results (one solution is clearly at (0, -20)) if you represent your coefficients and parameters more precisely than just four decimal places, e.g., a = [-1/1200 -0.025 -2/3 -20]. In fact, solve might be able to find one or more solutions exactly if you provide exact representations.
Also, in your code, the calls to sym2poly are doing nothing other than converting back to floating-point (double can be used for this) as the inputs are not in the form of symbolic polynomial equations.
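For instance (a minimal sketch, assuming the sol_x and sol_y from the code above), double does the conversion directly:
sol_x_num = double(sol_x)    % numeric values of the symbolic solutions
sol_y_num = double(sol_y)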

How can I plot data to a “best fit” cos² graph in Matlab?

I’m currently a Physics student and for several weeks have been compiling data related to ‘Quantum Entanglement’. I’ve now got to the point where I have to fit my data (which should resemble a cos² graph - and does) with a sort of “best fit” cos² curve. The lab script says the following:
A more precise determination of the visibility V (this is basically how 'clean' the data is) follows from the best fit to the measured data using the function:
f(b) = (A/2) * [1 - V * sin((b - b_center)/P)]
Granted this probably doesn’t mean much out of context, but essentially A is the amplitude, b is an angle and P is the periodicity. Hence this is also a “wave” like the experimental data I have found.
From this I understand, as previously mentioned, I am making a “best fit” curve. However, I have been told that this isn’t possible with Excel and that the best approach is Matlab.
I know intermediate JavaScript but do not know Matlab and was hoping for some direction.
Is there a tutorial I can read for this? Is it possible for someone to go through it with me? I really have no idea what it entails, so any feedback would be greatly appreciated.
Thanks a lot!
Initial steps
I guess we should begin by getting a representation in Matlab of the function that you're trying to model. A direct translation of your formula looks like this:
function y = targetfunction(A,V,P,bc,b)
y = (A/2) * (1 - V * sin((b-bc) / P));
end
Getting hold of the data
My next step is going to be to generate some data to work with (you'll use your own data, naturally). So here's a function that generates some noisy data. Notice that I've supplied some values for the parameters.
function [y b] = generateData(npoints,noise)
A = 2;
V = 1;
P = 0.7;
bc = 0;
b = 2 * pi * rand(npoints,1);
y = targetfunction(A,V,P,bc,b) + noise * randn(npoints,1);
end
The function rand generates random points on the interval [0,1], and I multiplied those by 2*pi to get points randomly on the interval [0, 2*pi]. I then applied the target function at those points, and added a bit of noise (the function randn generates normally distributed random variables).
Fitting parameters
The most complicated function is the one that fits a model to your data. For this I use the function fminunc, which does unconstrained minimization. The routine looks like this:
function [A V P bc] = bestfit(y,b)
x0(1) = 1; %# A
x0(2) = 1; %# V
x0(3) = 0.5; %# P
x0(4) = 0; %# bc
f = @(x) norm(y - targetfunction(x(1),x(2),x(3),x(4),b));
x = fminunc(f,x0);
A = x(1);
V = x(2);
P = x(3);
bc = x(4);
end
Let's go through line by line. First, I define the function f that I want to minimize. This isn't too hard. To minimize a function in Matlab, it needs to take a single vector as a parameter. Therefore we have to pack our four parameters into a vector, which I do in the first four lines. I used values that are close, but not the same, as the ones that I used to generate the data.
Then I define the function I want to minimize. It takes a single argument x, which it unpacks and feeds to targetfunction, along with the points b in our dataset. Hopefully the results are close to y. We measure how far they are from y by subtracting them from y and applying the function norm, which squares every component, adds them up and takes the square root (i.e. it computes the Euclidean length of the residual vector; minimizing this is equivalent to minimizing the root mean square error).
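As a tiny illustration of what norm does to the residual vector (the numbers here are made up):
v = [3 4];
norm(v)              % 5
sqrt(sum(v.^2))      % the same value: the root of the sum of squares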
Then I call fminunc with our function to be minimized, and the initial guess for the parameters. This uses an internal routine to find the closest match for each of the parameters, and returns them in the vector x.
Finally, I unpack the parameters from the vector x.
Putting it all together
We now have all the components we need, so we just want one final function to tie them together. Here it is:
function master
%# Generate some data (you should read in your own data here)
[f b] = generateData(1000,1);
%# Find the best fitting parameters
[A V P bc] = bestfit(f,b);
%# Print them to the screen
fprintf('A = %f\n',A)
fprintf('V = %f\n',V)
fprintf('P = %f\n',P)
fprintf('bc = %f\n',bc)
%# Make plots of the data and the function we have fitted
plot(b,f,'.');
hold on
plot(sort(b),targetfunction(A,V,P,bc,sort(b)),'r','LineWidth',2)
end
If I run this function, I see this being printed to the screen:
>> master
Local minimum found.
Optimization completed because the size of the gradient is less than
the default value of the function tolerance.
A = 1.991727
V = 0.979819
P = 0.695265
bc = 0.067431
And the following plot appears:
That fit looks good enough to me. Let me know if you have any questions about anything I've done here.
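If you want to swap in your own measurements instead of the synthetic data, a minimal sketch could look like this (the file name and two-column layout are assumptions, and readmatrix requires R2019a or later):
data = readmatrix('entanglement_data.csv');   % column 1: angle b, column 2: measured signal
b = data(:,1);
f = data(:,2);
[A V P bc] = bestfit(f,b);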
I am a bit surprised as you mention f(a) and your function does not contain an a, but in general, suppose you want to plot f(x) = cos(x)^2
First determine for which values of x you want to make a plot, for example
xmin = 0;
stepsize = 1/100;
xmax = 6.5;
x = xmin:stepsize:xmax;
y = cos(x).^2;
plot(x,y)
However, note that this approach works just as well in Excel; you just have to do some work to get your x values and the function into the right cells.

Matlab Code To Approximate The Exponential Function

Does anyone know how to make the following Matlab code approximate the exponential function more accurately when dealing with large negative real numbers?
For example, when x = 1 the code works well, but when x = -100 it returns an answer of 8.7364e+31 when it should be closer to 3.7201e-44.
The code is as follows:
s=1
a=1;
y=1;
for k=1:40
a=a/k;
y=y*x;
s=s+a*y;
end
s
Any assistance is appreciated, cheers.
EDIT:
Ok so the question is as follows:
Which mathematical function does this code approximate? (I say the exponential function.) Does it work when x = 1? (Yes.) Unfortunately, using this when x = -100 produces the answer s = 8.7364e+31. Your colleague believes that there is a silly bug in the program, and asks for your assistance. Explain the behaviour carefully and give a simple fix which produces a better result. [You must suggest a modification to the above code, or its use. You must also check your simple fix works.]
So I somewhat understand that the problem involves large numbers: when there are 16 (or more) orders of magnitude between terms, precision is lost. But the solution eludes me.
Thanks
EDIT:
So in the end I went with this:
s = 1;
x = -100;
a = 1;
y = 1;
x1 = 1;
for k=1:40
x1 = x/10;
a = a/k;
y = y*x1;
s = s + a*y;
end
s = s^10;
s
Not sure if it's completely correct but it returns some good approximations.
exp(-100) = 3.720075976020836e-044
s = 3.722053303838800e-044
After further analysis (and unfortunately after submitting the assignment), I realised that increasing the number of terms, together with the smaller reduced argument, further improves the accuracy. In fact the following was even more accurate:
s = 1;
x = -100;
a = 1;
y = 1;
x1 = 1;
for k=1:200
x1 = x/200;
a = a/k;
y = y*x1;
s = s + a*y;
end
s = s^200;
s
Which gives:
exp(-100) = 3.720075976020836e-044
s = 3.720075976020701e-044
As John points out in a comment, you have an error inside the loop. The y = y*k line does not do what you need (the code as now shown already has this corrected to y = y*x). Look more carefully at the terms in the series for exp(x).
Anyway, I assume this is why you have been given this homework assignment, to learn that series like this don't converge very well for large values. Instead, you should consider how to do range reduction.
For example, can you use the identity
exp(x+y) = exp(x)*exp(y)
to your advantage? Suppose you store the value of exp(1) = 2.7182818284590452353...
Now, if I were to ask you to compute the value of exp(1.3), how would you use the above information?
exp(1.3) = exp(1)*exp(0.3)
But we KNOW the value of exp(1) already. In fact, with a little thought, this will let you reduce the range for an exponential down to needing the series to converge rapidly only for abs(x) <= 0.5.
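Here is a minimal sketch of that additive range reduction (not the original poster's code; the loop length and stored constant are my own choices). It splits x into an integer part k, handled with the stored value of exp(1), and a remainder |r| <= 0.5 that the series handles well; for x = -100 the remainder happens to be zero.
x = -100;
k = round(x);                  % integer part
r = x - k;                     % remainder in [-0.5, 0.5]
e1 = 2.718281828459045;        % stored value of exp(1)
s = 1; a = 1;
for n = 1:20                   % series for exp(r); converges fast for |r| <= 0.5
    a = a*r/n;
    s = s + a;
end
approx = s * e1^k              % ~3.7201e-44
exp(x)                         % built-in value, for comparison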
Edit: There is a second way one can do range reduction using a variation of the same identity.
exp(x) = exp(x/2)*exp(x/2) = exp(x/2)^2
Thus, suppose you wish to compute the exponential of a large number, perhaps 12.8. Getting this to converge acceptably fast will take many terms in the simple series, and there will be a great deal of subtractive cancellation happening, so you won't get good accuracy anyway. However, if we recognize that
12.8 = 2*6.4 = 2*2*3.2 = ... = 16*0.8
then IF you could efficiently compute the exponential of 0.8, then the desired value is easy to recover, perhaps by repeated squaring.
exp(12.8)
ans =
362217.449611248
a = exp(0.8)
a =
2.22554092849247
a = a*a;
a = a*a;
a = a*a;
a = a*a
362217.449611249
exp(0.8)^16
ans =
362217.449611249
Note that WHENEVER you do range reduction using methods like this, while you may incur numerical problems due to the additional computations necessary, you will usually come out way ahead due to the greatly enhanced convergence of your series.
Why do you think that's the wrong answer? Look at the last term of that sequence, and its size, and tell me why you expect you should have an answer that's close to 0.
My original answer stated that roundoff error was the problem. That will be a problem with this basic approach, but why do you think 40 terms is enough for the correct mathematical (as opposed to computer floating-point arithmetic) answer?
100^40 / 40! ~= 1.2e32.
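To see the sizes involved (a small illustration, not part of the original answer):
x = -100;
k = 0:40;
t = x.^k ./ factorial(k);      % the terms the original loop adds up
abs(t(end))                    % ~1.2e32: even the last term is enormous
sum(t)                         % a huge number (the original code got ~8.7e31), nowhere near exp(-100) ~ 3.7e-44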
Woodchips has the right idea with range reduction. That's the typical approach people use to implement these kinds of functions very quickly. Once you get that all figured out, you deal with the roundoff errors of an alternating series by summing adjacent terms within the loop, stepping with k = 1:2:40 (for instance). That doesn't work here until you use woodchips's idea, because for x = -100 the summands grow for a very long time. You need |x| < 1 to guarantee the intermediate terms are shrinking, and then such a rewrite will work.
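A hedged sketch of that pairing trick, on a small argument where it is appropriate (the value of x here is my own choice; combined with range reduction it would cover the x = -100 case):
x = -0.8;                      % small argument, |x| < 1
s = 1; term = 1;               % term holds x^k/k!
for k = 1:2:40
    t1 = term * x / k;         % term of order k
    t2 = t1 * x / (k+1);       % term of order k+1
    s = s + (t1 + t2);         % add the adjacent pair so the cancellation happens locally
    term = t2;
end
s
exp(x)                         % for comparison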