How to use MATLAB to fit data with a negative binomial distribution for a given p?
Looks like a job for mle in the statistics toolbox. You'll need to express the negative binomial distribution (or the log of it, which will probably be easier) as a function of p and whatever else, and invent some starting parameters to hand in.
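As a rough sketch of what "the log of it" could look like, the log-likelihood can be written with gammaln and minimized over r with p held fixed. The data vector, the value of p, and the search bracket below are made-up assumptions for illustration; the parameterization is meant to match nbinpdf(x,r,p). Only base MATLAB functions (gammaln, fminbnd) are used here, so this is a stand-in for the mle call, not a replacement for it.

```matlab
% Hypothetical count data; replace with your own observations.
x = [18 25 14 22 30 19 21 16 27 20];
p = 0.5;   % fixed success probability (assumed known)

% Negative log-likelihood as a function of r only, using
%   P(x) = gamma(r+x) / (gamma(r)*gamma(x+1)) * p^r * (1-p)^x
negLogLik = @(r) -sum(gammaln(r + x) - gammaln(r) - gammaln(x + 1) ...
                      + r*log(p) + x*log(1 - p));

% One-dimensional bounded search; the bracket [0.1, 200] is an assumption.
rHat = fminbnd(negLogLik, 0.1, 200);
fprintf('estimated r = %.3f for p = %.2f\n', rHat, p);
```

With p = 0.5 the distribution mean equals r, so the estimate should land close to mean(x), which gives a quick sanity check.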
If you mean fit as a function of the other parameter R with P fixed, the following shows how to use mle to fix the value of one parameter and estimate the other:
x = nbinrnd(20,.5,1000,1);
params = nbinfit(x) % unconstrained fit
r = mle(x,'pdf',@(x,r)nbinpdf(x,r,.5),'start',23) % constrain P=0.5
% Plot log likelihood as a function of R
rr = linspace(15,25);
yy = zeros(size(rr));
for j=1:length(rr)
yy(j) = sum(log(nbinpdf(x,rr(j),.5)));
end
plot(rr,yy,'-',...
params(1),sum(log(nbinpdf(x,params(1),params(2)))),'o')
legend(sprintf('r=%f,p=.5',r), sprintf('r=%f,p=%f',params),'location','sw')
I have a histogram that seems to fit a poisson distribution.
In order to fit it, I declare the function myself as follows
xdata; ydata; % Arrays in which I have stored the data.
%Ydata tell us how many times the xdata is repeated in the set.
fun= @(x,xdata) (exp(-x(1))*(x(1).^(xdata)) )/(factorial(xdata)) %Function I
% want to use in the fit. It is a poisson distribution.
x0=[1]; %Approximated value of the parameter lambda to help the fit
p=lsqcurvefit(fun,x0,xdata,ydata); % Fit in the least square sense
I find an error. It probably has to do with the "factorial". Any ideas?
Factorial outputs a vector from vector xdata. Why are you using .xdata in factorial?
For example:
data = [1 2 3];
factorial(data) is then [1! 2! 3!].
Try ./factorial(xdata) (I cannot recall if the dot is even necessary in this case.)
You need to use gamma(xdata+1) function instead of factorial(xdata) function. Gamma function is a generalized form of factorial function which can be used for real and complex numbers. Thus, your code would be:
fun = @(x,xdata) exp(-x(1))*x(1).^xdata./gamma(xdata+1);
x = lsqcurvefit(fun,1,xdata,ydata);
Alternatively, you can use MATLAB's fitdist function, which is already optimized and might give better results:
pd = fitdist(xdata,'Poisson','Frequency',ydata);
pd.lambda
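As a quick cross-check on either fit: for frequency data, the maximum-likelihood estimate of the Poisson lambda is just the count-weighted mean of xdata. The numbers below are invented for illustration. Note also that the model function returns probabilities, so it has to be scaled by sum(ydata) (or ydata normalized) before it is comparable to raw counts.

```matlab
% Hypothetical histogram: ydata(k) is how often the value xdata(k) occurred.
xdata = 0:6;
ydata = [14 27 27 18 9 4 1];

% Poisson MLE of lambda from frequency data: the count-weighted mean.
lambdaHat = sum(xdata .* ydata) / sum(ydata);

% Expected counts under the fitted model, on the same scale as ydata.
% gamma(xdata+1) is used in place of factorial(xdata).
expected = sum(ydata) * exp(-lambdaHat) * lambdaHat.^xdata ./ gamma(xdata + 1);

disp(lambdaHat)
disp([ydata; expected])
```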
I need to plot the probability density function p(z,phi) and integrate it, as shown in the attached eq. #1
[image: eq. #1, not reproduced]
where Af and Vf are constants,
phi is angle,
z is distance(numerical value, can be in decimals)
The P(z,phi) will be the force values along with respective different values of z and phi.
Could someone guide me, on MATLAB, how can I write these set of equations?
To integrate your function, you should create either an m-file or an anonymous function
f = @(z,phi) P(z,phi) .* p(z,phi)
where you construct P and p similarly. Then you will need to use one of the numerical integrators, such as integral2 (or ode45 applied twice), to integrate f: once over z and once over phi.
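A minimal sketch of the double integral with integral2 follows. Since the actual equation is not available here, both p and P below are placeholders I made up: p is uniform in z on [-lf/2, lf/2] times sin(phi) on [0, pi/2], and P is an arbitrary smooth "force". The value of lf is also an assumption. The handles must work element-wise, because integral2 passes arrays of z and phi.

```matlab
lf = 2;                                  % hypothetical length parameter

% Placeholder density: uniform in z, sin(phi) in phi (integrates to 1).
p = @(z,phi) sin(phi) ./ lf;

% Placeholder force; substitute the real P(z,phi).
P = @(z,phi) cos(phi) .* (lf/2 - z);

% Integrand, written element-wise so it accepts array inputs.
f = @(z,phi) P(z,phi) .* p(z,phi);

% integral2(fun, zmin, zmax, phimin, phimax)
total = integral2(p, -lf/2, lf/2, 0, pi/2);   % normalization check
Fmean = integral2(f, -lf/2, lf/2, 0, pi/2);   % expected force

fprintf('integral of p = %.6f, expected P = %.6f\n', total, Fmean);
```

Checking that p integrates to one before integrating P.*p is a cheap way to catch a wrong limit or a missing normalization constant.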
If I understand you correctly you multiply a uniform probability distribution between -lf/2 and lf/2 with another probability distribution that looks like the first quarter of a sine wave. You want to know the resulting probability distribution.
Basically, if lf/2 > pi/2 you end up with the same distribution: the sine distribution lies entirely inside the uniform distribution. If lf/2 < pi/2, the uniform distribution chops off part of your sine distribution. You then want to divide your probability distribution by the probability mass that remains, so the integral stays one. It must remain a probability distribution.
The antiderivative of sin(x) is -cos(x), so the integral of sin(x) from 0 to lf/2 is 1-cos(lf/2). In that case you divide by (1-cos(lf/2)).
Below is a script that makes it more visible:
lf=2;
xx = linspace(-lf,lf,1E4);
p1 = (xx>-lf/2&xx<lf/2)*(1/lf);
p2 = zeros(size(xx));
p2(xx>0&xx<pi/2) = sin(xx(xx>0&xx<pi/2));
p3 = p2.*p1.*lf;
if lf<pi
p3 = p3./(1-cos(lf/2));
end
plot(xx,p1,xx,p2,xx,p3)
legend({'uniform distribution','sine','result'})
%integrals (actually Riemann sums):
sum(p1.*(xx(2)-xx(1)))
sum(p2.*(xx(2)-xx(1)))
sum(p3.*(xx(2)-xx(1)))
Suppose I have a function in Matlab calc(x,a,b) which outputs a scalar. a and b are constants, x is treated as multivariate. How do I minimize calc(x,a,b) with respect to x in Matlab?
edit: The content of the function creates a vector $v(x)$ and a matrix $A(x)$ and then computes $v(x)'*A(x)^(-1)*v(x)$
This is a fairly general question with a million possible responses depending on what calc is. (For instance, Can you provide gradients for calc? Does x need to take on values in a specific range?)
But, as a start, go for fminunc. It finds an unconstrained minimum and works even when you have no gradient information available.
Sample Code:
Suppose you want to minimize dot(x,x).
calc = @(x,a,b) dot(x,x)
calc_to_pass_to_fminunc = @(x) calc(x,1,2)
X = fminunc(calc_to_pass_to_fminunc,ones(3,1))
Gives:
Warning: Gradient must be provided for trust-region algorithm;
using line-search algorithm instead.
> In fminunc at 383
Local minimum found.
Optimization completed because the size of the gradient is less than
the default value of the function tolerance.
<stopping criteria details>
X =
0
0
0
The easy answer is: if a and b are constants, and x is a one-dimensional variable, it's a 1-D optimization problem.
The previous answer suggests using fminunc, which is part of the MATLAB Optimization Toolbox. If you don't have it, you can use fminbnd instead, which works just as well for 1-D optimization in a given interval.
As an example, let's say your calc function is:
function [y] = calc(x,a,b)
y = x.^3-2*x-5+a-b;
end
This is what you should do to find the minimum in the interval x1 < x < x2:
% constants
a = 1;
b = 2;
% boundaries of search interval
x1 = 0;
x2 = 2;
x = fminbnd(@(x)calc(x,a,b), x1, x2);
% value of function at the minimum
y = calc(x,a,b);
If the x variable is not a scalar, you could use the analogue of fminbnd for a multidimensional variable: fminsearch, which performs an unconstrained search for the minimum of a multivariate function.
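For the quadratic form mentioned in the question's edit, v(x)'*A(x)^(-1)*v(x), a sketch with fminsearch might look like this. The particular v and A are invented purely for illustration (A is chosen to be positive definite), and the starting point is an assumption. A(x)\v(x) is used rather than forming the inverse explicitly.

```matlab
a = 1;
b = 2;

% Hypothetical v(x) and A(x); replace with the real definitions.
v = @(x,a,b) [x(1) - a; x(2) - b];
A = @(x) [2 + x(1)^2, 0; 0, 1 + x(2)^2];

% Scalar objective v' * inv(A) * v, computed with a linear solve.
calc = @(x,a,b) v(x,a,b)' * (A(x) \ v(x,a,b));

% Fix the constants and minimize over the vector x only.
x0 = [0.5; 1];                              % assumed starting point
[xMin, fMin] = fminsearch(@(x) calc(x,a,b), x0);

disp(xMin)
disp(fMin)
```

For this made-up example the minimum is at x = [a; b], where v vanishes, so the result is easy to verify by eye.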
Addendum
fminbnd is a nice tool, but sometimes it's hard to make it behave as you expect. Of course you can specify the desired accuracy and a maximum number of iterations for converging in the options, but in my experience fminbnd might have problems with highly non-linear functions.
In these situations it's desirable to have finer control over the optimization procedure, and especially over how the search interval is defined. Given the search interval, arrayfun provides an elegant way to evaluate the function over an array of sample points and find its minimum. Sample code:
% constants
a = 1;
b = 2;
% search interval
xi = linspace(0,2,1000);
yi = arrayfun(@(x)calc(x,a,b), xi);
% value of function at the minimum
[y, idx_m] = min(yi);
% location of minimum
x = xi(idx_m);
The drawback of this approach is that, in order to achieve high accuracy, you might need a very long array xi. The good thing is that there are several ways to mitigate this issue: for instance, one could use a vector of log-spaced sampling points, or perform a multi-step minimization, narrowing the interval and increasing the sampling density at each step until the desired accuracy is achieved.
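The multi-step idea could be sketched roughly as follows: sample coarsely, keep only the two grid cells around the best sample, and repeat. The number of samples per pass and the number of passes are arbitrary choices of mine, and the scheme assumes the function has a single minimum inside the interval; calc is written as an anonymous function here so the snippet runs on its own.

```matlab
calc = @(x,a,b) x.^3 - 2*x - 5 + a - b;   % same example function as above
a = 1;
b = 2;

lo = 0;        % initial search interval
hi = 2;
nPts = 21;     % samples per pass (assumed)

for pass = 1:8
    xi = linspace(lo, hi, nPts);
    yi = arrayfun(@(x) calc(x,a,b), xi);
    [y, k] = min(yi);          % best sample on the current grid
    x = xi(k);
    h = xi(2) - xi(1);         % current grid spacing
    lo = max(lo, x - h);       % shrink to the neighbouring cells,
    hi = min(hi, x + h);       % never leaving the previous interval
end

fprintf('x = %.8f, y = %.8f\n', x, y);
```

Each pass shrinks the interval by about a factor of ten, so eight passes of 21 evaluations do the work of one very long xi.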
Examples for optimizations with functions like fmincon.m and fminsearchbnd.m usually minimize objective functions that are relatively simple. With simple I mean that the objective function only consists of some algebraic expression, e.g. the Rosenbrock formula.
In my problem, on the other hand, the objective function consists of several steps, including
computing an L2-norm misfit between an observed data point and a set of n training data points (n~5e4)
selecting those data points from the training data set that give the lowest misfit
then using the row indices of this selected subset to compute the final distance that I intend to minimize.
i.e. I perform operations that cannot be formulated as a single mathematical expression. Can I use such an objective function with tools like fminsearchbnd.m or fmincon.m at all? My results so far are not very promising...
There is an easy and obvious solution for that. You can use fminsearch() to find a minimum of a self-defined function. In my example it fits a polynomial, which of course is easy, but the trick is that this could be anything. You can access the data if you make your objective function a nested function, so that they share the same variable scope.
You can start from the following code and fill in everything you want to do part by part and maybe ask followup questions, if any come up.
function main
verbose = 1; % some output
% optimize something, maybe a distorted polynomial
x = sort(rand(20,1));
p_original = [1.5, 3, 2, 1];
y = polyval(p_original,x) + 0.5*(rand(size(x))-0.5);
% optimize polynomial of order order. This is an example of how to pass
% a parameter to the fit function.
order = 3;
% obvious solution is this, but we want to do something else
p_polyfit = polyfit(x,y,order)
% we want to do it a bit more complex
pfit = optimize_something(x, y, order, verbose)
% what is happening?
figure
plot(x,polyval(p_original,x),'k-')
hold on
plot(x,y,'ko')
plot(x,polyval(p_polyfit,x),'rs-')
plot(x,fit_function(x,pfit),'gx-')
legend('original','noisy','polyfit','optimization')
end
function pfit = optimize_something(x,y, order, verbose)
% for polynomial of order order we need order+1 coefficients
p0 = ones(1,order+1); % initial guess: all coefficients are 1
if verbose
fprintf('optimize_something calling fminsearch(@objFun)\n');
end
% hand over only p0 to our objective function
pfit = fminsearch(@objFun, p0);
% ------------------------- NESTED objFUN --------------------------------%
function e = objFun(p)
% This function accepts only p as parameter and returns a value e, which
% will be minimized by some metric (maybe least squares).
% Since this function is nested, it can use also the predefined variables x, y (and also p0 and verbose).
% The magic is, we calculate a value yfitted out of x and p by a
% fit_function. This function can really be anything!
yfitted = fit_function(x, p);
e = sum((yfitted-y).^2);
% e = sum(abs(yfitted-y)); % another possibility
end
% ------------------------- NESTED objFUN --------------------------------%
if verbose
disp('pfit found')
end
end
function yfitted = fit_function(x, p)
% In our example we want to fit a polynomial, so we do so. We evaluate the
% polynomial p at x.
yfitted = polyval(p,x);
% But it could be anything, really.. each value in p could be something
% else, maybe the sum of an exponential function and a straight line
% yfitted = p(1)*exp(p(2)*x) + p(3)*x + p(4);
end
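To connect this to the multi-step objective from the question (misfit to every training point, keep the closest few, then use their row indices), the body of such an objective might be sketched as below. Everything here is a placeholder: the training data are random, k is arbitrary, and the "final distance" is just the mean misfit of the selected rows, since the question does not say what it really is. Because the selected subset changes in jumps, an objective like this is generally not smooth, which may be why a derivative-free method such as fminsearch is a more natural first try than a gradient-based one.

```matlab
rng(0);                       % reproducible placeholder data
Y = rand(500, 2);             % hypothetical training points, one per row
x = [0.3, 0.7];               % hypothetical observed data point
k = 5;                        % assumed size of the selected subset

% Step 1: L2-norm misfit between x and every training point.
d = sqrt(sum(bsxfun(@minus, Y, x).^2, 2));

% Step 2: select the k training points with the lowest misfit.
[~, order] = sort(d);         % ascending misfit
rows = order(1:k);            % row indices of the selected subset

% Step 3: use the row indices to compute the quantity to minimize.
% (Placeholder: mean misfit over the selected rows.)
finalDistance = mean(d(rows));

disp(rows')
disp(finalDistance)
```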
You can try to use CVX. It is an addon for Matlab that lets you describe your optimisation problem with normal Matlab code.
Alternatively, write down your objective function including any constraints. Your description is not clear to me, and it would help you too, if you would write this down in actual formulae.
I read your steps as this:
"Computing an L2-norm between an observed data point and a set of n training data points." It seems that there is a total of one (1) observed data points. Let's call the observed point x. Let's call the training data points y_i for i=1..n.
The L2-Norm is: |x-y_i|.
"Selecting those data points [multiple?] that give the lowest misfit". You haven't said how many data points you want, and how you'd combine multiple points to give a single L2-Norm. Let's assume you want exactly one such point (the closest to the observed data point x). Thus you get: argmin (over i) |x-y_i|. If you have multiple, you could greedily take the k closest points.
"Then using the row indices of this selected subset to compute the final distance that I intend to minimize." And what is the final distance that you intend to minimize?
Is it possible to easily compute the numerical Hessian matrix for this function with respect to W_i, C, epsilon_i in MATLAB? I have computed a Hessian by taking the derivatives manually, but I want to verify that my result is correct.
W = Nx1;
X = NxM;
X_i = Nx1;
y = 1xM;
C = 1x1;
DERIVEST on the File Exchange has a function for doing this. There are also tips for doing this, e.g. in Section 18 of this tutorial, and in many other places.
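If you only need a quick check and don't want an extra download, a plain central-difference Hessian is short to write. This is a generic sketch, not the DERIVEST method: the test function f, the point x0, and the step h are assumptions for illustration, and you would stack W_i, C and epsilon_i into one vector x for your own function.

```matlab
% Hypothetical test function with a known Hessian.
f  = @(x) x(1)^2 * x(2) + sin(x(2));
x0 = [1; 2];

n = numel(x0);
h = 1e-4;                 % finite-difference step (assumed)
H = zeros(n);

for i = 1:n
    for j = 1:n
        ei = zeros(n,1); ei(i) = h;
        ej = zeros(n,1); ej(j) = h;
        % Central difference for d^2 f / (dx_i dx_j)
        H(i,j) = (f(x0+ei+ej) - f(x0+ei-ej) ...
                - f(x0-ei+ej) + f(x0-ei-ej)) / (4*h^2);
    end
end

% Hand-derived Hessian of the test function, for comparison.
Hexact = [2*x0(2), 2*x0(1); 2*x0(1), -sin(x0(2))];

disp(H)
disp(max(abs(H(:) - Hexact(:))))
```

Comparing against a function whose Hessian you know first, as done here, helps separate errors in the hand derivation from errors in the finite-difference step size.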