Linear regression line in MATLAB scatter plot - matlab

I am trying to get the residuals for a scatter plot of two variables. I can get the least-squares linear regression line using MATLAB's lsline function. However, I also want the residuals. How can I get them in MATLAB? For that I need to know the parameters a and b of the linear regression line
ax+b

Use the function polyfit to obtain the regression parameters. You can then evaluate the fitted values and calculate your residuals accordingly.
Basically, polyfit performs least-squares regression for a specified degree N, which in your case is 1 for a straight-line fit. The regression parameters are returned by the function, and you can use polyval to compute the fitted values from them.
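For a straight line, a minimal sketch (assuming your data are in vectors x and y) could be:
% Fit a degree-1 polynomial y ~ a*x + b.
p = polyfit(x, y, 1);        % p(1) is the slope a, p(2) is the intercept b
yfit = polyval(p, x);        % fitted values at the original x
residuals = y - yfit;        % residuals of the least-squares line
The slope and intercept in p should match the line drawn by lsline.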

If you have the Curve Fitting Toolbox, type cftool and press Enter; the GUI will appear.
You can use this tool to find a linear polynomial fit for a given data set, as well as many other fits.

Related

MATLAB: polyval function for N greater than 1

I am trying to graph the polynomial fit of a 2D dataset in MATLAB.
This is what I tried:
rawTable = readtable('Test_data.xlsx','Sheet','Sheet1');
x = rawTable.A;
y = rawTable.B;
figure(1)
scatter(x,y)
c = polyfit(x,y,2);
y_fitted = polyval(c,x);
hold on
plot(x,y_fitted,'r','LineWidth',2)
rawTable.A and rawTable.B are randomly generated numbers (i.e. the x dataset cannot be represented in a form such as x=0:0.1:100).
The result: [figure: second-order polynomial fit]
But the result I expect looks like this (generated in Excel): [figure not shown]
How can I graph the second-order polynomial fit in MATLAB?
I sense some confusion regarding what the output of each of those MATLAB functions means, so I'll clarify. Some detail is needed as well, so expect some verbosity. A quick answer, however, is available at the end.
c = polyfit(x,y,2) gives the coefficient vectors of the polynomial fit. You can get the fit information such as error estimate following the documentation.
Call this polynomial P. In MATLAB, P is effectively the anonymous function P = @(x) c(1)*x.^2 + c(2)*x + c(3).
For a single point X, polyval(c,X) returns the value of P(X). If x is a vector, polyval(c,x) returns the vector [P(x(1)), P(x(2)), ...].
Now, plotting that vector against your unsorted x does not show what the fit looks like. As a quick hack to see something visually, you can try plot(sort(x),polyval(c,sort(x)),'r','LineWidth',2), i.e. first sort your data and plot at those x-values.
However, it is only a hack, because a) your data set may be so irregularly spaced that the connecting line does not represent the function well, or b) evaluating at every point of your data set is unnecessary and inefficient.
The robust and 'standard' way to plot a 2D function of known analytical form in Matlab is as follows:
Define some evenly spaced x-values over the interval on which you want to plot the function, for example x=1:0.1:10 or x=linspace(0,1,100).
Evaluate the function at these x-values.
Pass the two resulting vectors to plot(). plot() can either show the function as sampled points or connect the points with straight line segments, which is the default.
(If you want a single word for step 1, 'sampling' or 'discretization' describes the process.)
So, instead of using the x in your original data set, you should do something like:
t=linspace(min(x),max(x),100);
plot(t,polyval(c,t),'r','LineWidth',2)
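Putting the steps together, a minimal self-contained sketch (with randomly ordered x-values standing in for the spreadsheet columns, purely for illustration):
x = 100*rand(200,1);                    % randomly ordered x-values (stand-in for rawTable.A)
y = 0.02*x.^2 - x + 5 + randn(200,1);   % noisy quadratic data (stand-in for rawTable.B)
figure(1)
scatter(x,y)
hold on
c = polyfit(x,y,2);                     % second-order polynomial coefficients
t = linspace(min(x),max(x),100);        % evenly spaced evaluation grid
plot(t,polyval(c,t),'r','LineWidth',2)  % smooth fitted curve
hold off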

Confidence band around linear least-squares line (Matlab)

I am using the MATLAB function lsline to add a linear least-squares line to a scatter plot. I would like to add a 95% confidence band around that fit line, so that it looks like the shaded band produced by the Python library seaborn (figure not shown).
However, lsline returns no fit parameters from which to construct the 95% confidence band. The only MATLAB function I could find that returns such information is nlpredci, but that function is intended for something else (nonlinear regression prediction).
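One possible approach (a sketch, assuming your data are in vectors x and y): fit the line yourself with polyfit and use the error-estimate output of polyval to draw an approximate 95% band.
[p,S] = polyfit(x,y,1);            % S carries the information polyval needs for error estimates
t = linspace(min(x),max(x),200);   % fine grid for a smooth band
[yfit,delta] = polyval(p,t,S);     % fitted line and prediction standard error on the grid
scatter(x,y)
hold on
plot(t,yfit,'r','LineWidth',2)     % least-squares line (same line lsline would draw)
plot(t,yfit+2*delta,'r--')         % approximate 95% bounds
plot(t,yfit-2*delta,'r--')
hold off
Note that yfit ± 2*delta is roughly a 95% prediction interval for new observations; a confidence band for the fitted line itself is narrower, and with the Statistics Toolbox it can be obtained from fitlm followed by predict.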

Estimating the error when fitting a curve with DCT and polyfit

I have a MATLAB script that performs curve fitting on a set of curves using polynomials of third, second and first order (using polyfit with the desired order) and also using a DCT with 4, 3 and 2 coefficients (invoking dct on the whole array and then keeping just the first 4, 3 or 2 coefficients).
I can get a graphical view of the accuracy of each curve fit using polyval and idct for the two types of fitting, but I was wondering if there is a way to get a numeric measure of accuracy that makes sense for both approaches (DCT and polyfit).
I'm sure this is more of a maths question than a MATLAB question, but maybe there is a simple and elegant array-based way that I haven't thought of yet.
Thanks in advance for your comments!
EDIT: What about correlation? :D
In the Curve Fitting Tool there should be a residual measure based on the standard deviation. If you are interested in another way to do it, you could compute the RMSE over the entire curve, scripting a function that does something like the following (see the sketch after the link below):
Take as input arguments y1 (the curve being fitted) and y2 (the fitted curve).
For each point, take the difference y1 - y2 and square it, then sum these squared differences.
Divide by the number of entries.
Return the square root of the result.
See http://en.wikipedia.org/wiki/Root-mean-square_deviation#Formula for more.
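A minimal sketch of such a function (assuming y1 and y2 are numeric vectors of equal length):
function e = rmse(y1, y2)
% Root-mean-square error between the original curve y1 and the fitted curve y2.
d = y1(:) - y2(:);        % pointwise differences
e = sqrt(mean(d.^2));     % square, average, then take the square root
end
The same number is comparable for both approaches: compare y against polyval(p,x) for the polynomial fits and against the idct reconstruction for the DCT fits.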

weighted curve fitting with lsqcurvefit

I want to fit an arbitrary function to my data set, so I used lsqcurvefit in MATLAB. Now I want to add weights to the fitting procedure, meaning that when the curve-fitting function (lsqcurvefit) calculates the residuals of the fit, some data points are more important than others. To be more specific, I want to use the statistical weighting method
w = 1/y(x),
where w is a matrix containing the weight of each data point and y is the data set.
I cannot find any way to do weighted curve fitting with lsqcurvefit. Is there a trick I should follow, or is there another function besides lsqcurvefit that will do it for me?
For doing weighting, I find it much easier to use lsqnonlin, which is the function that lsqcurvefit calls to do the actual fitting.
You first have to define the function you are trying to minimize, i.e. a cost function, and pass your weights into it as a vector:
x = yourIndependentVariable;
y = yourData;
weightVector = sqrt(abs(1./y));    % square root of the statistical weights w = 1/y
costFunction = @(A) weightVector.*(yourModelFunction(A) - y);   % yourModelFunction(A) evaluates the model at x for parameters A
aFit = lsqnonlin(costFunction,aGuess);
The reason for the square root in the weighting function definition is that lsqnonlin requires the residuals, not the squared residuals or their sum, so you need to pre-unsquare the weights.
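As a concrete sketch with a made-up exponential model and synthetic data (the model, data and starting guess are assumptions for illustration only):
x = (0:0.1:5)';                              % hypothetical independent variable
y = 3*exp(-0.8*x) + 0.02*randn(size(x));     % hypothetical noisy data
model = @(A,x) A(1)*exp(A(2)*x);             % model to fit: A(1)*exp(A(2)*x)
weightVector = sqrt(abs(1./y));              % square root of the statistical weights w = 1/y
costFunction = @(A) weightVector.*(model(A,x) - y);   % weighted residual vector
aGuess = [1, -1];                            % initial parameter guess
aFit = lsqnonlin(costFunction, aGuess);      % requires the Optimization Toolbox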
Alternatively, if you have the Statistics Toolbox, you can use nlinfit which will accept a weighting vector/matrix as one of the optional inputs.

matlab interpolation

Starting from the plot of a curve, is it possible to obtain the parametric equation of that curve?
In particular, say x = [1 2 3 4 5 6 ...] are the x-axis values and y = [a b c d e f ...] the corresponding y-axis values, and I have plot(x,y).
Now, how can I obtain the equation that describes the plotted curve? Is it possible to display the parametric equation starting from the spline interpolation?
Thank you
If you want to display a polynomial fit function alongside your graph, the following example should help:
x=-3:.1:3;
y=4*x.^3-5*x.^2-7.*x+2+10*rand(1,61);
p=polyfit(x,y,3); % third-order polynomial fit, p=[a,b,c,d] of ax^3+bx^2+cx+d
yfit=polyval(p,x); % evaluate the curve fit over x
plot(x,y,'.')
hold on
plot(x,yfit,'-g')
equation=sprintf('y=%2.2gx^3+%2.2gx^2+%2.2gx+%2.2g',p); % format string for equation
equation=strrep(equation,'+-','-'); % replace any redundant signs
text(-1,-80,equation) % place equation string on graph
legend('Data','Fit','Location','northwest')
Last year, I wrote up a set of three blog posts for Loren on the topic of modeling/interpolating a curve. They may cover some of your questions, although I never did find the time to add another three posts to finish the topic to my satisfaction. Perhaps one day I will get that done.
The problem is to recognize there are infinitely many curves that will interpolate a set of data points. A spline is a nice choice, because it can be made well behaved. However, that spline has no simple "equation" to write down. Instead, it has many polynomial segments, pieced together to be well behaved.
You're asking for the function/mapping between two data sets. If you know the physics involved, the function can be derived by modeling the system: write down the differential equations and solve them.
Left with just two data series, an input and an output with a 'black box' in between, you can approximate the mapping with an arbitrary function. You may start with a polynomial function
y = a*x^2 + b*x + c
Given your input vector x and your output vector y, the parameters a, b, c are determined by minimizing a fitness (cost) function, typically the sum of squared residuals.
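A minimal sketch of that idea, assuming the vectors x and y are already in the workspace:
p = polyfit(x, y, 2);            % least-squares estimates: p = [a b c]
a = p(1); b = p(2); c = p(3);
yfit = polyval(p, x);            % fitted values to compare against y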
There is an example of Polynomial Curve Fitting in the MathWorks documentation.
The Curve Fitting Tool provides a flexible graphical user interface where you can interactively fit curves and surfaces to data and view plots. You can:
Create, plot, and compare multiple fits
Use linear or nonlinear regression, interpolation, local smoothing regression, or custom equations
View goodness-of-fit statistics, display confidence intervals and residuals, remove outliers and assess fits with validation data
Automatically generate code for fitting and plotting surfaces, or export fits to the workspace for further analysis