How to find the line of best fit for precision-recall curves in MATLAB

I have calculated points for a recall-precision curve by varying a threshold and calculating recall and precision. I have plotted these points in a scatter graph as follows:
scatter(recall', precision')
I am trying to find the curve of best fit, but am not sure of the best way. I have tried this:
p = polyfit(recall', precision', 5)
r = polyval(p, recall')
plot(recall', precision', 'x');
hold on
plot(recall', r, '-');
hold off
But the problem with this is I have to estimate the degree of the polynomial (in this case 5).

You can try the program Eureqa Formulize. It's a free and easy-to-use tool for symbolic regression developed at the Cornell Creative Machines Lab.
Regards,
Ben

You can try to take the logarithm of recall and precision variables and fit a line through them. The slope then should give a rough idea about the degree of the polynomial you might want to use, i.e.
p2 = polyfit(log(recall), log(precision), 1)

Related

Log-Log Graph, Curve Fit on Matlab

I'm trying to validate my engineering work using MATLAB. I have a series of x and y data that I have plotted on a log-log graph.
The result is a curve.
What I need to do is apply a curve fit to this graph and show the equation of the fit.
I have tried other answers on here, using polyfit and polyval, but they aren't really doing what I need; what I lack is a proper understanding of how to use them.
Kind regards
Apply polyfit to logx and logy instead of x and y, and then, to use the fitted result apply polyval to log(x) and use exp() on the result to get the actual fitted y:
logx = log(x);
logy = log(y);
fitp = polyfit(logx, logy, n);
newy = exp(polyval(fitp, log(newx)));
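As a minimal sketch of the steps above, assuming synthetic power-law data (the values 3 and 1.5 are made up for illustration): with a first-degree fit in log space, log(y) = m*log(x) + b is the same as y = exp(b)*x^m, so the fitted coefficients give the equation directly.

```matlab
% Synthetic data following y = 3*x^1.5 (illustrative values, not from the question)
x = logspace(0, 2, 20);
y = 3 * x.^1.5;

% Straight-line fit in log-log space: log(y) = m*log(x) + b
fitp = polyfit(log(x), log(y), 1);
m = fitp(1);          % exponent of the power law
A = exp(fitp(2));     % prefactor, since y = exp(b) * x^m

% Evaluate the fit at new x values and build the equation as text
newx = linspace(1, 100, 200);
newy = exp(polyval(fitp, log(newx)));
eqn = sprintf('y = %.3g * x^%.3g', A, m);
disp(eqn)
```

If the log-log plot is visibly curved rather than straight, a higher degree n in log space still works with the same exp(polyval(...)) evaluation, but the result is no longer a simple power law.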
Fitting in the log-space may be undesirable. Most likely you want to show the equation that best fits the data, not a transformation of the data. As a result, I would fit the linear data, then transform it for visualization as necessary. If that's acceptable, polyfit and polyval should work.
If you believe fitting in the log-space is important, I've used lsqcurvefit before, but this requires both the optimization toolbox and some idea of which function you'd like to fit (i.e. is your data best represented by 10^x or x^2?). There's also the curve-fitting toolbox, which might be worth looking into if there are issues you could identify interactively with a GUI but not easily put into words. This provides a 'fit' function that could be useful too.
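To illustrate fitting in the original (linear) space, here is a rough sketch that assumes a power-law model and synthetic data. It deliberately swaps in fminsearch, which ships with base MATLAB, for lsqcurvefit, so it is not the toolbox approach mentioned above; the model and the data values are assumptions made for the example.

```matlab
% Synthetic power-law data with a small deterministic perturbation (illustrative only)
x = linspace(1, 100, 50);
y = 3 * x.^1.5 .* (1 + 0.01 * sin(1:50));

% Assumed model and its sum-of-squares objective in the original (linear) space
model = @(b, x) b(1) * x.^b(2);
sse   = @(b) sum((y - model(b, x)).^2);

% Starting guess taken from a straight-line fit in log-log space
q  = polyfit(log(x), log(y), 1);
b0 = [exp(q(2)), q(1)];

% Refine in linear space; fminsearch is a derivative-free simplex search
bFit = fminsearch(sse, b0);
```

The log-space fit weights relative errors equally, while the linear-space objective is dominated by the largest y values, so the two sets of parameters generally differ slightly.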

Higher order polynomial fitting is not so handy surprisingly

I have a simple question but was not able to fix it by myself. I want to use the MATLAB curvefitting toolbox and fit higher order polynomials. It works if I want to fit polynomials of order 1 to 9. But, to my surprise it does not work for polynomials with degree higher than 9. To make it simple, can you just see the following simple code which does not work for me, unfortunately.
l=1:0.01:10;y=l.^10;
[xData, yData] = prepareCurveData(l,y);
ft = fittype( 'poly10' );
[Fit, gof] = fit( xData, yData, ft, 'Normalize', 'on' );
Thanks in advance,
Babak
It might be surprising, but it is documented: List of Library Models for Curve and Surface Fitting. You can always use polyfit, but as per the warning it issues, once you start getting polynomials of that degree, the fit is likely to be problematic anyway.
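A minimal sketch of the polyfit route on the question's data, using polyfit's three-output form, which centers and scales x before fitting. The coefficients p then refer to the scaled variable (x - mu(1))/mu(2), so polyval has to be given mu as well. The variable relErr is just an illustrative check.

```matlab
% Data from the question
l = 1:0.01:10;
y = l.^10;

% Degree-10 fit with centering and scaling; mu = [mean(l); std(l)]
[p, S, mu] = polyfit(l, y, 10);

% Evaluate the fit: pass mu so polyval applies the same scaling
yfit = polyval(p, l, [], mu);

% Largest deviation relative to the largest data value
relErr = max(abs(yfit - y)) / max(abs(y));
```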
This answer supplements Phil Goddard's answer.
The fit function has no poly10 library model. But there are at least two alternative ways to fit a polynomial of any degree, effectively a polyX for X = 1, 2, ..., M: a custom linear fittype, or polyfit.
clc; clear;
%%data
l=1:0.01:10;y=l.^10;
[xData, yData] = prepareCurveData(l,y);
%%High degree polynomial fitting
%set the degree of the polynomial
Degree=10;
%Fit with customize option
%generate the cell array of terms from 'x^Degree' down to '1'
Terms=[arrayfun(@(k) sprintf('x^%d',k),Degree:-1:1,'UniformOutput',false),{'1'}];
%set the fitting type & options, then call fit
HighPoly = fittype(Terms);
options = fitoptions('Normalize', 'off','Method','LinearLeastSquares','Robust','off');
[curve,gof] = fit(xData,yData,HighPoly,options)
%Polyfit with the degree of Degree
p = polyfit(xData,yData,Degree)
Both fit and polyfit issue warnings, though. In my humble opinion this is related to Runge's phenomenon, the oscillation at the edges of an interval that occurs when polynomials of high degree are used to interpolate a set of equispaced points.
Except in a situation like this one, where the true function really is a polynomial of high degree (that is, an element of Pn[R]), a high-degree polynomial is not recommended for fitting a complicated function.
Edit: generalized the code.

Estimating the error when fitting a curve with DCT and polyfit

I have a matlab script that performs curve fitting on a set of curves using polynomials of third, second and first order (using polyfit with the desired order) and also using DCT of 4,3 and 2 coefficients (invoking dct for the whole array and then truncating just the first 4,3 or 2 coeffs).
I'm able to get a graphical view of the accuracy of each curve fitting using polyval and idct for the 2 types of curve fitting, but I was wondering if there is any way of getting a numeric value of the accuracy that makes sense for both approaches (dct and polyfit).
I'm sure this is more a maths question rather than a Matlab question, but maybe there is some way to obtain a simple and elegant way in a array-based algorithm that I haven't thought of yet.
Thanks in advance for your comments!
EDIT: What about correlation? :D
In the Curve Fitting Tool there should be a residual measure based on the standard deviation. If you are interested in another way to do it, you could use the RMSE over the entire curve, scripting a function that does something like:
input args : y1 = (curve going to be fitted), y2 = (fitted curve)
For each entry, sum up the squared difference (y1 - y2)^2
Divide by the number of entries
Return the square root of the result
See http://en.wikipedia.org/wiki/Root-mean-square_deviation#Formula for more.
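The steps above can be sketched as a one-line function and applied to both kinds of fit. The data here is made up for illustration, and dct/idct are assumed to be available (they ship with the Signal Processing Toolbox, which the question already uses).

```matlab
% RMSE between the original samples y1 and the fitted samples y2
rmse = @(y1, y2) sqrt(mean((y1(:) - y2(:)).^2));

% Illustrative curve
x = linspace(0, 1, 50);
y = x.^2;

% Polynomial fit of first order
p = polyfit(x, y, 1);
errPoly = rmse(y, polyval(p, x));

% DCT fit keeping only the first 3 coefficients
c = dct(y);
c(4:end) = 0;
errDct = rmse(y, idct(c));
```

Because both errors are in the units of y, they can be compared directly across the two approaches, as long as the number of retained parameters is kept in mind.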

Find minimum distance between a point and a curve in MATLAB

I would like to use a MATLAB function to find the minimum distance between a point and a curve. The curve is described by a complicated function that is not quite smooth, so I hope to use an existing MATLAB tool to compute this. Do you happen to know one?
When someone says "it's complicated" the answer is always complicated too, since I never know exactly what you have. So I'll describe some basic ideas.
If the curve is a known nonlinear function, then use the symbolic toolbox to start with. For example, consider the function y=x^3-3*x+5, and the point (x0,y0) =(4,3) in the x,y plane.
Write down the square of the distance. Euclidean distance is easy to write.
(x - x0)^2 + (y - y0)^2 = (x - 4)^2 + (x^3 - 3*x + 5 - 3)^2
So, in MATLAB, I'll do this partly with the symbolic toolbox. The minimal distance must lie at a root of the first derivative.
syms x
distpoly = (x - 4)^2 + (x^3 - 3*x + 5 - 3)^2;
r = roots(sym2poly(diff(distpoly)))
r =
-1.9126
-1.2035
1.4629
0.82664 + 0.55369i
0.82664 - 0.55369i
I'm not interested in the complex roots.
r(imag(r) ~= 0) = []
r =
-1.9126
-1.2035
1.4629
Which one is a minimizer of the distance squared?
double(subs(distpoly, x, r(1)))
ans =
35.5086
double(subs(distpoly, x, r(2)))
ans =
42.0327
double(subs(distpoly, x, r(3)))
ans =
6.9875
That is the square of the distance, here minimized by the last root in the list. Given that minimal location for x, of course we can find y by substitution into the expression for y(x)=x^3-3*x+5.
double(subs(x^3 - 3*x + 5, x, r(3)))
ans =
3.7419
So it is fairly easy if the curve can be written in a simple functional form as above. For a curve that is known only from a set of points in the plane, you can use my distance2curve utility. It can find the point on a space curve spline interpolant in n-dimensions that is closest to a given point.
For other curves, say an ellipse, the solution is perhaps most easily solved by converting to polar coordinates, where the ellipse is easily written in parametric form as a function of polar angle. Once that is done, write the distance as I did before, and then solve for a root of the derivative.
A difficult case to solve is where the function is described as not quite smooth. Is this noise or is it a non-differentiable curve? For example, a cubic spline is "not quite smooth" at some level. A piecewise linear function is even less smooth at the breaks. If you actually just have a set of data points that have a bit of noise in them, you must decide whether to smooth out the noise or not. Do you wish to essentially find the closest point on a smoothed approximation, or are you looking for the closest point on an interpolated curve?
For a list of data points, if your goal is to not do any smoothing, then a good choice is again my distance2curve utility, using linear interpolation. If you wanted to do the computation yourself, if you have enough data points then you could find a good approximation by simply choosing the closest data point itself, but that may be a poor approximation if your data is not very closely spaced.
If your problem does not lie in one of these classes, you can still often solve it using a variety of methods, but I'd need to know more specifics about the problem to be of more help.
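For the polynomial example earlier in this answer, the same computation can also be sketched without the symbolic toolbox, using coefficient vectors with conv, polyder and roots. The variable names are illustrative only.

```matlab
% Curve y = x^3 - 3x + 5 as a coefficient vector, and the point (x0, y0)
c  = [1 0 -3 5];
x0 = 4;
y0 = 3;

% Squared distance (x - x0)^2 + (y(x) - y0)^2 as a degree-6 polynomial
e  = c;
e(end) = e(end) - y0;                  % y(x) - y0
d2 = conv(e, e);                       % (y(x) - y0)^2
dx = conv([1 -x0], [1 -x0]);           % (x - x0)^2
d2(end-2:end) = d2(end-2:end) + dx;    % add the three lowest-order terms

% Critical points are the real roots of the derivative
r = roots(polyder(d2));
r = real(r(abs(imag(r)) < 1e-9));

% Pick the critical point with the smallest squared distance
[minSq, k] = min(polyval(d2, r));
xMin = r(k);
yMin = polyval(c, xMin);
dMin = sqrt(minSq);
```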
There are two ways you could go about this.
The easy way, which will work if your curve is reasonably smooth and you don't need too high precision, is to evaluate your curve at a dense set of points and simply find the minimum distance:
t = (0:0.1:100)';
sqDist = sum(bsxfun(@minus, [x(t),y(t)], yourPoint).^2, 2);
[minSqDist, idx] = min(sqDist);
minDistance = sqrt(minSqDist);
The harder way is to minimize a function of t (or x) that describes the distance:
distance = @(t) sqrt(sum((yourPoint - [x(t),y(t)]).^2));
%# you can use the location of the minimum from above as a decent starting guess
tAtMin = fminsearch(distance, t(idx));
minDistanceFitted = distance(tAtMin);
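A self-contained sketch of both approaches, assuming the curve is given as function handles x(t) and y(t). The cubic and the point (4,3) are borrowed from the earlier answer purely as test data, and the sampling range is an assumption.

```matlab
% Illustrative curve and point (the cubic from the earlier answer, parameterized by t)
x = @(t) t;
y = @(t) t.^3 - 3*t + 5;
yourPoint = [4, 3];

% Coarse search: sample densely and take the closest sample
t = (-3:0.01:3)';
sqDist = sum(bsxfun(@minus, [x(t), y(t)], yourPoint).^2, 2);
[minSqDist, idx] = min(sqDist);
minDistance = sqrt(minSqDist);

% Refinement: minimize the distance over t, starting from the best sample
distance = @(t) sqrt(sum((yourPoint - [x(t), y(t)]).^2));
tAtMin = fminsearch(distance, t(idx));
minDistanceFitted = distance(tAtMin);
```

Starting fminsearch from the best grid sample matters when the distance has several local minima, since the simplex search only finds one near its starting point.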

matlab interpolation

Starting from the plot of a curve, is it possible to obtain the parametric equation of that curve?
In particular, say x = {1 2 3 4 5 6 ...} are the values on the x axis, and y = {a b c d e f ...} the corresponding values on the y axis. I have the plot(x,y).
Now, how can I obtain the equation that describes the plotted curve? Is it possible to display the parametric equation starting from the spline interpolation?
Thank you
If you want to display a polynomial fit function alongside your graph, the following example should help:
x=-3:.1:3;
y=4*x.^3-5*x.^2-7.*x+2+10*rand(1,61);
p=polyfit(x,y,3); %# third order polynomial fit, p=[a,b,c,d] of ax^3+bx^2+cx+d
yfit=polyval(p,x); %# evaluate the curve fit over x
plot(x,y,'.')
hold on
plot(x,yfit,'-g')
equation=sprintf('y=%2.2gx^3+%2.2gx^2+%2.2gx+%2.2g',p); %# format string for equation
equation=strrep(equation,'+-','-'); %# replace any redundant signs
text(-1,-80,equation) %# place equation string on graph
legend('Data','Fit','Location','northwest')
Last year, I wrote up a set of three blog posts for Loren, on the topic of modeling/interpolating a curve. They may cover some of your questions, although I never did find the time to add another three posts to finish the topic to my satisfaction. Perhaps one day I will get that done.
The problem is to recognize there are infinitely many curves that will interpolate a set of data points. A spline is a nice choice, because it can be made well behaved. However, that spline has no simple "equation" to write down. Instead, it has many polynomial segments, pieced together to be well behaved.
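To make the "many polynomial segments" concrete, here is a small sketch that extracts them from a cubic spline with unmkpp. The data values are made up; each row of coefs holds the cubic for one interval, written in powers of (x - breaks(k)).

```matlab
% Illustrative data (any x with distinct, increasing values works)
x = 1:6;
y = [2 3 5 4 6 8];

pp = spline(x, y);                 % cubic spline in piecewise-polynomial form
[breaks, coefs] = unmkpp(pp);      % one row of 4 coefficients per segment

% Segment k covers breaks(k) <= x <= breaks(k+1) and is a cubic in (x - breaks(k))
for k = 1:size(coefs, 1)
    fprintf('[%g, %g]: y = %.4g*(x-%g)^3 + %.4g*(x-%g)^2 + %.4g*(x-%g) + %.4g\n', ...
        breaks(k), breaks(k+1), coefs(k,1), breaks(k), coefs(k,2), breaks(k), ...
        coefs(k,3), breaks(k), coefs(k,4));
end

% Check one segment by hand against ppval
xq = 2.5;
yq = polyval(coefs(2,:), xq - breaks(2));
```

So the spline does have equations, one per interval, but there is no single closed-form expression for the whole curve.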
You're asking for the function/mapping between two data sets. If you know the physics involved, the function can be derived by modeling the system: write down the differential equations and solve them.
Left alone with just two data series, an input and an output with a 'black box' in between you may approximate the series with an arbitrary function. You may start with a polynomial function
y = a*x^2 + b*x + c
Given your input vector x and your output vector y, the parameters a, b, c are determined by minimizing a fitness function, typically the sum of squared residuals.
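A minimal sketch of that parameter estimation, assuming the fitness function is the sum of squared residuals and using made-up data. The backslash operator solves the least-squares problem, and polyfit gives the same coefficients.

```matlab
% Illustrative data generated from y = 2x^2 - 3x + 1 (values are made up)
x = (0:0.5:5)';
y = 2*x.^2 - 3*x + 1;

% Least-squares fit: minimize sum((y - (a*x.^2 + b*x + c)).^2)
A   = [x.^2, x, ones(size(x))];   % design matrix, one column per parameter
abc = A \ y;                      % least-squares solution for [a; b; c]
a = abc(1);
b = abc(2);
c = abc(3);

% polyfit solves the same problem and returns [a b c]
p = polyfit(x, y, 2);
```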
There is an example of Polynomial Curve Fitting in the MathWorks documentation.
Curve Fitting Tool provides a flexible graphical user interface where you can interactively fit curves and surfaces to data and view plots. You can:
Create, plot, and compare multiple fits
Use linear or nonlinear regression, interpolation, local smoothing regression, or custom equations
View goodness-of-fit statistics, display confidence intervals and residuals, remove outliers, and assess fits with validation data
Automatically generate code for fitting and plotting surfaces, or export fits to workspace for further analysis