Polynomial Constrained Least Squares curve fitting with MATLAB

I want to fit and draw a curve that is constrained with the following boundary condition:
diff (yfit)<=0
where yfit is the polynomial fitted function to degree n.
The condition ensures that the slope of the fitted polynomial, of whatever degree, is non-positive for all x.
How can I apply the condition using the "polyfit" function or any other polynomial fitting function?

From my limited mathematical point of view, a polynomial of degree 2, for example, by definition has a region with positive slope and a region with negative slope.
One thing you can try is using absolute values:
Build your own fit (a plain least-squares fit is easy to write yourself, which is essentially what polyfit does), and don't use polynomial basis functions directly, but absolute-value functions of them.
Least squares: set 0 = d/da( sum( (func - point)^2 ) ) for each coefficient a and solve the resulting system. Wikipedia and others provide in-depth descriptions.
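If you have the Optimization Toolbox, a more direct route (a sketch of my own, not something polyfit can do, since polyfit accepts no constraints) is to keep an ordinary polynomial basis and hand the slope condition to lsqlin as linear inequality constraints on the derivative, enforced on a grid of points. Here x and y are your data and n is the chosen degree; implicit expansion (R2016b or newer) is assumed:
n  = 3;                                             % example degree
V  = x(:) .^ (0:n);                                 % Vandermonde matrix, ascending powers
xg = linspace(min(x), max(x), 50)';                 % grid where the slope constraint is imposed
D  = [zeros(numel(xg),1), (1:n) .* xg .^ (0:n-1)];  % rows of D evaluate p'(xg)
c  = lsqlin(V, y(:), D, zeros(numel(xg),1));        % minimize ||V*c - y||^2 subject to D*c <= 0
yfit = V * c;                                       % non-increasing fitted values at the data points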

Related

Mixture of 1D Gaussians fit to data in Matlab / Python

I have a discrete curve y=f(x). I know the locations and amplitudes of its peaks. I want to approximate the curve by fitting a Gaussian at each peak. How should I go about finding the optimized Gaussian parameters? I would like to know if there is any built-in function that will make my task simpler.
Edit
I have fixed the means of the Gaussians and tried to optimize over sigma using
lsqcurvefit() in MATLAB. The MSE is small. However, I have an additional hard constraint: the value of the approximate curve should be equal to the original function at the peaks. This constraint is not satisfied by my model. I am pasting my current working code here. I would like a solution that obeys the hard constraint at the peaks and approximately fits the curve at the other points. The basic idea is that the approximate curve has fewer parameters but still closely resembles the original curve.
fun = @(x,xdata) myFun(x,xdata,pks,locs); % pks, locs are the peak amplitudes and locations already available
x0 = w(1:6)*0.25; % my initial guess based on domain knowledge
[sigma, resnorm] = lsqcurvefit(fun,x0,xdata,ydata); % xdata and ydata are the original curve data points
recons = myFun(sigma,xdata,pks,locs);
figure; plot(ydata,'r'); hold on; plot(recons);
function f = myFun(sigma,xdata,a,c)
% a holds the peak amplitudes, c the means of the individual gaussians
f = zeros(size(xdata));
for i = 1:6 % use 6 gaussians to approximate the function
    f = f + a(i) * exp(-(xdata-c(i)).^2 ./ (2*sigma(i)^2));
end
end
If you know your peak locations and amplitudes, then all you have left to do is find the width of each Gaussian. You can think of this as an optimization problem.
Say you have x and y, which are samples from the curve you want to approximate.
First, define a function g() that will construct the approximation for given values of the widths. g() takes a parameter vector sigma containing the width of each Gaussian. The locations and amplitudes of the Gaussians will be constrained to the values you already know. g() outputs the value of the sum-of-gaussians approximation at each point in x.
Now, define a loss function L(), which takes sigma as input. L(sigma) returns a scalar that measures the error--how badly the given approximation (using sigma) differs from the curve you're trying to approximate. The squared error is a common loss function for curve fitting:
L(sigma) = sum((y - g(sigma)) .^ 2)
The task now is to search over possible values of sigma, and find the choice that minimizes the error. This can be done using a variety of optimization routines.
If you have the MathWorks Optimization Toolbox, you can use the function lsqnonlin() (in this case you won't have to define L() yourself). The Curve Fitting Toolbox is probably an alternative. Otherwise, you can use an open-source optimization routine (check out cvxopt).
A couple of things to note. You need to impose the constraint that all values in sigma are greater than zero; you can tell the optimization algorithm about this constraint. Also, you'll need to specify an initial guess for the parameters (i.e. sigma). In this case, you could probably choose something reasonable by looking at the curve in the vicinity of each peak. It may be the case (when the loss function is nonconvex) that the final solution differs depending on the initial guess (i.e. you converge to a local minimum). There are many fancy techniques for dealing with this kind of situation, but a simple thing to do is to just try multiple different initial guesses and pick the best result.
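For instance, building on the asker's code above, a rough sketch of both points (positivity via lower bounds, several restarts to dodge poor local minima) might look like this; myFun, pks, locs, xdata and ydata come from the question, everything else is made up for illustration:
fun = @(sigma, xdata) myFun(sigma, xdata, pks, locs);
lb  = 1e-6 * ones(1, 6);                            % keep every width strictly positive
ub  = Inf(1, 6);
best = struct('sigma', [], 'resnorm', Inf);
for trial = 1:5                                     % a few random restarts
    x0 = lb + rand(1, 6) * (max(xdata) - min(xdata)) / 6;   % crude random initial widths
    [s, resnorm] = lsqcurvefit(fun, x0, xdata, ydata, lb, ub);
    if resnorm < best.resnorm
        best.sigma = s;
        best.resnorm = resnorm;
    end
end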
Edited to add:
In Python, you can use optimization routines in the scipy.optimize module, e.g. curve_fit().
Edit 2 (response to edited question):
If your Gaussians have much overlap with each other, then taking their sum may cause the height of the peaks to differ from your known values. In this case, you could take a weighted sum, and treat the weights as another parameter to optimize.
If you want the peak heights to be exactly equal to some specified values, you can enforce this constraint in the optimization problem. lsqcurvefit() won't be able to do it because it only handles bound constraints on the parameters. Take a look at fmincon().
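A rough sketch of that fmincon route, reusing myFun and the data from the question (x0 is the initial guess from the question; the nonlinear equality constraint pins the reconstructed curve to the known peak values):
obj     = @(sigma) sum((myFun(sigma, xdata, pks, locs) - ydata).^2);  % squared-error objective
nonlcon = @(sigma) deal([], myFun(sigma, locs, pks, locs) - pks);     % ceq = 0 forces exact peak heights
lb      = 1e-6 * ones(size(x0));                                      % widths must stay positive
sigmaFit = fmincon(obj, x0, [], [], [], [], lb, [], nonlcon);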
You can use the Expectation-Maximization algorithm to fit a mixture of Gaussians to your data; it doesn't care about the data dimension.
In the MATLAB documentation you can look up gmdistribution.fit or fitgmdist.

Exponential curve fit matlab

I have the following equation: y = exp(A*u^2 + B*a^3)
I want to do an exponential curve fit using MATLAB for the above equation, where y = f(u,a). y is my output while (u,a) are my inputs. I want to find the coefficients A and B for a set of provided data.
I know how to do this for simple polynomials by defining states. As an example, if states = [ones(size(u)), u, u.^2], this will give me L + M*u + N*u^2, with L, M and N being the regression coefficients.
However, this is not the case for the above equation. How could I do this in MATLAB?
Building on what @eigenchris said, simply take the natural logarithm (log in MATLAB) of both sides of the equation. If we do this, we are in fact linearizing the equation in log space. In other words, given your original equation y = exp(A*u^2 + B*a^3), we get:
log(y) = A*u^2 + B*a^3
However, this isn't exactly polynomial regression. This is more of a least-squares fitting of your points. Specifically, given a set of y values and a paired set of (u,a) points, you would build a system of equations and solve this system via least squares. In other words, given the set y = (y_0, y_1, y_2, ..., y_N) and (u,a) = ((u_0, a_0), (u_1, a_1), ..., (u_N, a_N)), where N is the number of points that you have, you would build your system of equations like so:
log(y_0) = A*u_0^2 + B*a_0^3
log(y_1) = A*u_1^2 + B*a_1^3
...
log(y_N) = A*u_N^2 + B*a_N^3
This can be written in matrix form:
[log(y_0); log(y_1); ...; log(y_N)] = [u_0^2, a_0^3; u_1^2, a_1^3; ...; u_N^2, a_N^3] * [A; B]
To solve for A and B, you simply need to find the least-squares solution. You can see that it's in the form of:
Y = AX
To solve for X, we use what is called the pseudoinverse. As such:
X = A^{*} * Y
A^{*} is the pseudoinverse. This can elegantly be done in MATLAB using the \ or mldivide operator. All you have to do is build a vector of y values with the log taken, as well as building the matrix of u and a values. Therefore, if your points (u,a) are stored in the vectors u and a, and the values of y are stored in y, you would simply do this:
x = [u.^2 a.^3] \ log(y);
x(1) will contain the coefficient for A, while x(2) will contain the coefficient for B. As A. Donda has noted in his answer (which I embarrassingly forgot about), the values of A and B are obtained assuming that the errors with respect to the exact curve you are trying to fit to are normally (Gaussian) distributed with a constant variance. The errors also need to be additive. If this is not the case, then your parameters achieved may not represent the best fit possible.
See this Wikipedia page for more details on what assumptions least-squares fitting takes:
http://en.wikipedia.org/wiki/Least_squares#Least_squares.2C_regression_analysis_and_statistics
One approach is to use a linear regression of log(y) with respect to u² and a³:
Assuming that u, a, and y are column vectors of the same length:
AB = [u .^ 2, a .^ 3] \ log(y)
After this, AB(1) is the fit value for A and AB(2) is the fit value for B. The computation uses Matlab's mldivide operator; an alternative would be to use the pseudo-inverse.
The fit values found this way are Maximum Likelihood estimates of the parameters under the assumption that deviations from the exact equation are constant-variance normally distributed errors additive to A u² + B a³. If the actual source of deviations differs from this, these estimates may not be optimal.
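As a quick sanity check of this approach (the model form y = exp(A*u^2 + B*a^3) is assumed, matching the regression above; the data and coefficient values below are made up for illustration), noiseless synthetic data recovers the coefficients essentially exactly:
u = rand(100, 1);  a = rand(100, 1);      % made-up inputs
A_true = 0.7;  B_true = -1.3;             % made-up coefficients
y = exp(A_true * u.^2 + B_true * a.^3);   % noiseless synthetic data
AB = [u.^2, a.^3] \ log(y);               % returns approximately [0.7; -1.3]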

Calculate cubic spline integral in matlab with 3 values

I am trying to calculate an integral using spline interpolation with matlab (version R2014a on windows 8).
I have the 3 values of the function (for x=0,0.5,1).
so I have 2 vectors, x and y, that contain the values of the function, and I'm executing
cube_spline = spline(x,y);
coefficients = cube_spline.coefs
And I'm expecting to get 2 polynomials, each of degree 3, i.e. I'm expecting coefficients to be a 2×4 matrix, but for some reason I'm getting a 1×4 matrix, which means only 1 polynomial for 2 panels.
On the other hand, if for example I'm using 4 points (i.e. 3 panels), then I'm getting that coefficients has size 3×4 as expected, which means 3 polynomials for 3 panels.
My question is: why does MATLAB return only 1 polynomial for a 2-panel spline, but 3 polynomials for a 3-panel spline (or any number of panels greater than 2)?
There are multiple possible boundary conditions for splines, e.g.:
second derivatives equals zero on the boundary
given first derivatives on the boundary
periodic conditions, i.e. same first and second derivatives on the boundary
not-a-knot: take the outermost three points to specify the boundary conditions.
It seems spline is using the not-a-knot condition by default. So for three points only a single cubic polynomial is necessary to interpolate your data (a quadratic one would be enough too if it weren't for the not-a-knot condition), so there's no reason for spline to return a separate polynomial for each of the two intervals. This is, however, not a bad thing.
By the way: if all you want is to interpolate the values and you don't need the polynomial coefficients, you could go with interp1 instead. You can specify in a simpler way which kind of continuity you want. You have the option to go with:
'pchip': C^1 continuity.
Shape-preserving piecewise cubic interpolation. The interpolated value at a query point is based on a shape-preserving piecewise cubic interpolation of the values at neighboring grid points.
integral(@(xs) interp1(x, y, xs, 'pchip'), xmin, xmax)
'spline': C^2 continuity. (Seems to be using the same not-a-knot end conditions as spline.)
Spline interpolation using not-a-knot end conditions. The interpolated value at a query point is based on a cubic interpolation of the values at neighboring grid points in each respective dimension.
integral(@(xs) interp1(x, y, xs, 'spline'), xmin, xmax)
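If you prefer to work with the piecewise-polynomial form directly, you can also integrate the spline exactly from its coefficients. A minimal sketch, assuming x and y are the data vectors from the question (each row of pp.coefs holds one cubic piece in the local variable s = x - pp.breaks(k)):
pp = spline(x, y);                        % piecewise polynomial form of the spline
I = 0;
for k = 1:pp.pieces
    h = pp.breaks(k+1) - pp.breaks(k);    % width of panel k
    c = pp.coefs(k, :);                   % cubic coefficients, highest power first
    I = I + c(1)*h^4/4 + c(2)*h^3/3 + c(3)*h^2/2 + c(4)*h;   % exact integral of this piece over [0, h]
end
This should agree with the integral(..., 'spline') call above up to the quadrature tolerance, since interp1's 'spline' method uses the same not-a-knot spline.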

weighted curve fitting with lsqcurvefit

I wanted to fit an arbitrary function to my data set. Therefore, I used lsqcurvefit in MATLAB. Now I want to give weights to the fit procedure, meaning that when the curve fitting function (lsqcurvefit) is calculating the residual of the fit, some data points are more important than others. To be more specific, I want to use the statistical weighting method:
w = 1 ./ y(x),
where w is a vector that contains the weight of each data point and y is the data set.
I cannot find any way to do weighted curve fitting with lsqcurvefit. Is there any trick I should follow, or is there another function rather than lsqcurvefit that does it for me?
For weighting, I find it much easier to use lsqnonlin, which is the function that lsqcurvefit calls to do the actual fitting.
You first have to define a function that you are trying to minimize, i.e. a cost function. You pass your weights into this cost function as a vector:
x = yourIndependentVariable;
y = yourData;
weightVector = sqrt(abs(1./y));
costFunction = @(A) weightVector .* (yourModelFunction(A, x) - y);  % weighted residual vector
aFit = lsqnonlin(costFunction,aGuess);
The reason for the square root in the weighting function definition is that lsqnonlin requires the residuals, not the squared residuals or their sum, so you need to apply the square root of the weights to the residuals (they get squared again internally).
Alternatively, if you have the Statistics Toolbox, you can use nlinfit which will accept a weighting vector/matrix as one of the optional inputs.
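A minimal sketch of the nlinfit route (Statistics Toolbox), assuming modelFun, beta0, x and y stand for your model function, initial parameter guess and data; note that nlinfit takes the weights themselves, not their square roots:
w = 1 ./ abs(y);                                           % statistical weighting, w = 1/y
beta = nlinfit(x, y, modelFun, beta0, [], 'Weights', w);   % nlinfit applies the weights internally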

Find approximation of sine using least squares

I am doing a project where I find an approximation of the sine function using the least squares method. Also, I can use 12 values of my own choice. Since I couldn't figure out how to solve it, I thought of using the Taylor series for sine and then solving it as a polynomial of order 5. Here is my code:
%% Find the sine of the 12 known values
x=[0,pi/8,pi/4,7*pi/2,3*pi/4,pi,4*pi/11,3*pi/2,2*pi,5*pi/4,3*pi/8,12*pi/20];
y=sin(x);
n=12;
j=5;
%% Find the sums to populate the matrix A and matrix B
s1=sum(x);s2=sum(x.^2);
s3=sum(x.^3);s4=sum(x.^4);
s5=sum(x.^5);s6=sum(x.^6);
s7=sum(x.^7);s8=sum(x.^8);
s9=sum(x.^9);s10=sum(x.^10);
sy=sum(y);
sxy=sum(x.*y);
sxy2=sum( (x.^2).*y);
sxy3=sum( (x.^3).*y);
sxy4=sum( (x.^4).*y);
sxy5=sum( (x.^5).*y);
A=[n,s1,s2,s3,s4,s5;s1,s2,s3,s4,s5,s6;s2,s3,s4,s5,s6,s7;
s3,s4,s5,s6,s7,s8;s4,s5,s6,s7,s8,s9;s5,s6,s7,s8,s9,s10];
B=[sy;sxy;sxy2;sxy3;sxy4;sxy5];
Then in MATLAB I get this result:
>> a=A^-1*B
a =
-0.0248
1.2203
-0.2351
-0.1408
0.0364
-0.0021
However, when I try to substitute the values of a into the Taylor series and evaluate, e.g. at t=pi/2, I get wrong results:
>> t=pi/2;
fun=t-t^3*a(4)+a(6)*t^5
fun =
2.0967
Am I doing something wrong when I substitute the values of a into the Taylor series, or is my initial thought flawed?
Note: I can't use any built-in function.
If you need a least-squares approximation, simply decide on a fixed interval that you want to approximate on and generate some x abscissae on that interval (possibly equally spaced abscissae using linspace - or non-uniformly spaced as you have in your example). Then evaluate your sine function at each point such that you have
y = sin(x)
Then simply use the polyfit function (documented here) to obtain least squares parameters
b = polyfit(x,y,n)
where n is the degree of the polynomial you want to approximate. You can then use polyval (documented here) to obtain the values of your approximation at other values of x.
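For example (purely illustrative values: twelve equally spaced points on [0, 2*pi] and a degree-5 fit):
x = linspace(0, 2*pi, 12);
y = sin(x);
b = polyfit(x, y, 5);                 % least-squares coefficients, highest power first
t = linspace(0, 2*pi, 200);
plot(t, sin(t), t, polyval(b, t));    % compare sin with its degree-5 approximation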
EDIT: As you can't use polyfit you can generate the Vandermonde matrix for the least-squares approximation directly (the below assumes x is a row vector).
A = ones(length(x),1);
x = x';
for i = 1:n
    A = [A x.^i];
end
then simply obtain the least squares parameters using
b = A\y;
You can clearly optimise the clumsy Vandermonde generation loop above; I have just written it that way to illustrate the concept. For better numerical stability you would also do better to use a nice orthogonal polynomial system like Chebyshev polynomials of the first kind. If you are not even allowed to use the matrix divide \ function, then you will need to code up your own implementation of a QR factorisation and solve the system that way (or use some other numerically stable method).
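One caution related to the error in the question: the coefficients b obtained from A\y above are in ascending order (b(1) is the constant term), and all of them belong to the fitted polynomial, so evaluate the fit like this instead of plugging individual entries into the Taylor series (the same applies to the vector a from the normal-equations code in the question):
t = pi/2;                             % evaluation point
s = 0;
for k = 1:numel(b)
    s = s + b(k) * t^(k-1);           % b(k) multiplies t^(k-1)
end
% s should now be close to sin(pi/2) = 1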