Estimating the error when fitting a curve with DCT and polyfit - matlab

I have a matlab script that performs curve fitting on a set of curves using polynomials of third, second and first order (using polyfit with the desired order) and also using DCT of 4,3 and 2 coefficients (invoking dct for the whole array and then truncating just the first 4,3 or 2 coeffs).
I'm able to get a graphical view of the accuracy of each curve fitting using polyval and idct for the 2 types of curve fitting, but I was wondering if there is any way of getting a numeric value of the accuracy that makes sense for both approaches (dct and polyfit).
I'm sure this is more a maths question rather than a Matlab question, but maybe there is some way to obtain a simple and elegant way in a array-based algorithm that I haven't thought of yet.
Thanks in advance for your comments!
EDIT: What about correlation? :D

In the cuve fitting tool there should be a residual that uses standard deviation. If you are interested in another way to do it maybe you should use rmse for the entire curve, scripting a function that does something like:
input args : y1 = (curve going to be fitted), y2 = (fitted curve)
For each value in y, sum up the difference y1-y2 squared
Divide with the number of entries
Provided you are now left with a number, return its square root
See http://en.wikipedia.org/wiki/Root-mean-square_deviation#Formula for more.

Related

Log-Log Graph, Curve Fit on Matlab

Im im trying to validate my engineering work using Matlab. I have a series of x and y data that I have plotted on a Log-Log Graph.
The result is a curve.
What I need to do is to apply a curve fit to this graph, and show what the equation of the fit is?
I have tried other answers on here and tried using polyfit and polyval but they aren't really doing what I need but what I lack is the forthwith understanding.
Kind regards
Apply polyfit to logx and logy instead of x and y, and then, to use the fitted result apply polyval to log(x) and use exp() on the result to get the actual fitted y:
logx = log(x);
logy = log(y);
fitp = polyfit(logx, logy, n);
newy = exp(polyval(fitp, log(newx)));
Fitting in the log-space may be undesirable. Most likely you want to show the equation that best fits the data, not a transformation of the data. As a result, I would fit the linear data, then transform it for visualization as necessary. If that's acceptable, polyfit and polyval should work.
If you believe fitting in the log-space is important, I've used lsqcurvefit before, but this requires both the optimization toolbox and some idea of which function you'd like to fit (i.e. is your data best represented by 10^x or x^2?). There's also the curve-fitting toolbox, which might be worth looking into if there are issues you could identify interactively with a GUI but not easily put into words. This provides a 'fit' function that could be useful too.

Utility of loglog plots in curve fitting inverse square relationship

I have a a bunch of data that I'd like to use to find an unknown parameter in a physical equation.
I'm trying to find a parameter k to characterise the output of a hall effect sensor as a function on input voltage and distance between the sensor and the magnet. However, I've found this function to be inversely proportional to the square of the distance.
I asked my professor about how to use MATLAB to find the unknown parameter, and he told me I could try to fit it by taking the logarithm of both sides of the equation and plotting that, seen as that would make the relationship linear and thus easier to plot.
I'd have to do this in MATLAB and I'm assuming the values I measured would have to be converted by hand before being able to perform any sort of curve fitting on them.
I was wondering if doing that was worth it, and if there is a faster way of doing this.
Thanks :)
In order to easily identify the relationship, for a set input voltage, I had to take the logarithm of the measured distance and the logarithm of the respective output voltages and plot those. Fitting a line through those points then enabled me to see that the coefficient was close enough to -2, confirming the inverse square relationship.
I then did the same for different input voltages and added everything together on the same plot.

Mixture of 1D Gaussians fit to data in Matlab / Python

I have a discrete curve y=f(x). I know the locations and amplitudes of peaks. I want to approximate the curve by fitting a gaussian at each peak. How should I go about finding the optimized gaussian parameters ? I would like to know if there is any inbuilt function which will make my task simpler.
Edit
I have fixed mean of gaussians and tried to optimize on sigma using
lsqcurvefit() in matlab. MSE is less. However, I have an additional hard constraint that the value of approximate curve should be equal to the original function at the peaks. This constraint is not satisfied by my model. I am pasting current working code here. I would like to have a solution which obeys the hard constraint at peaks and approximately fits the curve at other points. The basic idea is that the approximate curve has fewer parameters but still closely resembles the original curve.
fun = #(x,xdata)myFun(x,xdata,pks,locs); %pks,locs are the peak locations and amplitudes already available
x0=w(1:6)*0.25; % my initial guess based on domain knowledge
[sigma resnorm] = lsqcurvefit(fun,x0,xdata,ydata); %xdata and ydata are the original curve data points
recons = myFun(sigma,xdata,pks,locs);
figure;plot(ydata,'r');hold on;plot(recons);
function f=myFun(sigma,xdata,a,c)
% a is constant , c is mean of individual gaussians
f=zeros(size(xdata));
for i = 1:6 %use 6 gaussians to approximate function
f = f + a(i) * exp(-(xdata-c(i)).^2 ./ (2*sigma(i)^2));
end
end
If you know your peak locations and amplitudes, then all you have left to do is find the width of each Gaussian. You can think of this as an optimization problem.
Say you have x and y, which are samples from the curve you want to approximate.
First, define a function g() that will construct the approximation for given values of the widths. g() takes a parameter vector sigma containing the width of each Gaussian. The locations and amplitudes of the Gaussians will be constrained to the values you already know. g() outputs the value of the sum-of-gaussians approximation at each point in x.
Now, define a loss function L(), which takes sigma as input. L(sigma) returns a scalar that measures the error--how badly the given approximation (using sigma) differs from the curve you're trying to approximate. The squared error is a common loss function for curve fitting:
L(sigma) = sum((y - g(sigma)) .^ 2)
The task now is to search over possible values of sigma, and find the choice that minimizes the error. This can be done using a variety of optimization routines.
If you have the Mathworks optimization toolbox, you can use the function lsqnonlin() (in this case you won't have to define L() yourself). The curve fitting toolbox is probably an alternative. Otherwise, you can use an open source optimization routine (check out cvxopt).
A couple things to note. You need to impose the constraint that all values in sigma are greater than zero. You can tell the optimization algorithm about this constraint. Also, you'll need to specify an initial guess for the parameters (i.e. sigma). In this case, you could probably choose something reasonable by looking at the curve in the vicinity of each peak. It may be the case (when the loss function is nonconvex) that the final solution is different, depending on the initial guess (i.e. you converge to a local minimum). There are many fancy techniques for dealing with this kind of situation, but a simple thing to do is to just try with multiple different initial guesses, and pick the best result.
Edited to add:
In python, you can use optimization routines in the scipy.optimize module, e.g. curve_fit().
Edit 2 (response to edited question):
If your Gaussians have much overlap with each other, then taking their sum may cause the height of the peaks to differ from your known values. In this case, you could take a weighted sum, and treat the weights as another parameter to optimize.
If you want the peak heights to be exactly equal to some specified values, you can enforce this constraint in the optimization problem. lsqcurvefit() won't be able to do it because it only handles bound constraints on the parameters. Take a look at fmincon().
you can use Expectation–Maximization algorithm for fitting Mixture of Gaussians on your data. it don't care about data dimension.
in documentation of MATLAB you can lookup gmdistribution.fit or fitgmdist.

Polynomial data fitting in a special way in MATLAB

I have some data let's say the following vector:
[1.2 2.13 3.45 4.59 4.79]
And I want to get a polynomial function, say f to fit this data. Thus, I want to go with something like polyfit. However, what polyfit does is minimizing the sum of least square errors. But, what I want is to have
f(1)=1.2
f(2)=2.13
f(3)=3.45
f(4)=4.59
f(5)=4.79
That is to say, I want to manipulate the fitting algorithm so that it will give me the exact points that I already gave as well as some fitted values where exact values are not given.
How can I do that?
I think everyone is missing the point. You said that "That is to say, I want to manipulate the fitting algorithm so that I will give me the exact points as well as some fitted values where exact fits are not present. How can I do that?"
To me, this means you wish an exact (interpolatory) fit for a listed set, and for some other points, you want to do a least squares fit.
You COULD do that using LSQLIN, by setting a set of equality constraints on the points to be fit exactly, and then allowing the rest of the points to be fit in a least squares sense.
The problem is, this will require a high order polynomial. To be able to fit 5 points exactly, plus some others, the order of the polynomial will be quite a bit higher. And high order polynomials, especially those with constrained points, will do nasty things. But feel free to do what you will, just as long as you also expect a poor result.
Edit: I should add that a better choice is to use a least squares spline, which is something you CAN constrain to pass through a given set of points, while fitting other points in a least squares sense, and still not do something wild and crazy as a result.
Polyfit does what you want. An N-1 degree polynomial can fit N points exactly, thus, when it minimizes the sum of squared error, it gets 0 (which is what you want).
y=[1.2 2.13 3.45 4.59 4.79];
x=[1:5];
coeffs = polyfit(x,y,4);
Will get you a polynomial that goes through all of your points.
What you ask is known as Lagrange Interpolation . There is a MATLAB file exchange available. http://www.mathworks.com/matlabcentral/fileexchange/899-lagrange-polynomial-interpolation
However, you should note that least squares polynomial fitting is generally preferred to Lagrange Interpolation since the data you have in principle will have noise in it and Lagrange Interpolation will fit the noise as well as the data you have. So if you know that your data actually represents M dimensional polynomial and have N data, where N>>M, then you will have a order N polynomial with Lagrange.
You have options.
Use polyfit, just give it enough leeway to perform an exact fit. That is:
values = [1.2 2.13 3.45 4.59 4.79];
p = polyfit(1:length(values), values, length(values)-1);
Now
polyval(p,2) %returns 2.13
Use interpolation / extrapolation
values = [1.2 2.13 3.45 4.59 4.79];
xInterp = 0:0.1:6;
valueInterp = interp1(1:length(values), values, xInterp ,'linear','extrap');
Interpolation provides a lot of options for smoothing, extrapolation etc. For example, try:
valueInterp = interp1(1:length(values), values, xInterp ,'spline','extrap');

matlab interpolation

Starting from the plot of one curve, it is possible to obtain the parametric equation of that curve?
In particular, say x={1 2 3 4 5 6....} the x axis, and y = {a b c d e f....} the corresponding y axis. I have the plot(x,y).
Now, how i can obtain the equation that describe the plotted curve? it is possible to display the parametric equation starting from the spline interpolation?
Thank you
If you want to display a polynomial fit function alongside your graph, the following example should help:
x=-3:.1:3;
y=4*x.^3-5*x.^2-7.*x+2+10*rand(1,61);
p=polyfit(x,y,3); %# third order polynomial fit, p=[a,b,c,d] of ax^3+bx^2+cx+d
yfit=polyval(p,x); %# evaluate the curve fit over x
plot(x,y,'.')
hold on
plot(x,yfit,'-g')
equation=sprintf('y=%2.2gx^3+%2.2gx^2+%2.2gx+%2.2g',p); %# format string for equation
equation=strrep(equation,'+-','-'); %# replace any redundant signs
text(-1,-80,equation) %# place equation string on graph
legend('Data','Fit','Location','northwest')
Last year, I wrote up a set of three blogs for Loren, on the topic of modeling/interpolationg a curve. They may cover some of your questions, although I never did find the time to add another 3 blogs to finish the topic to my satisfaction. Perhaps one day I will get that done.
The problem is to recognize there are infinitely many curves that will interpolate a set of data points. A spline is a nice choice, because it can be made well behaved. However, that spline has no simple "equation" to write down. Instead, it has many polynomial segments, pieced together to be well behaved.
You're asking for the function/mapping between two data sets. Knowing the physics involved, the function can be derived by modeling the system. Write down the differential equations and solve it.
Left alone with just two data series, an input and an output with a 'black box' in between you may approximate the series with an arbitrary function. You may start with a polynomial function
y = a*x^2 + b*x + c
Given your input vector x and your output vector y, parameters a,b,c must be determined applying a fitness function.
There is an example of Polynomial Curve Fitting in the MathWorks documentation.
Curve Fitting Tool provides a flexible graphical user interfacewhere you can interactively fit curves and surfaces to data and viewplots. You can:
Create, plot, and compare multiple fits
Use linear or nonlinear regression, interpolation,local smoothing regression, or custom equations
View goodness-of-fit statistics, display confidenceintervals and residuals, remove outliers and assess fits with validationdata
Automatically generate code for fitting and plottingsurfaces, or export fits to workspace for further analysis