Fitting a parabola using Matlab polyfit - matlab

I am looking to fit a parabola to the following data.
x = [-10:2:16];
y = [0.0334,0.0230,0.0145,0.0079,0.0033,0.0009,0.0006,0.0026,0.0067,0.0130,0.0213,0.0317,0.0440,0.0580];
[p,~,~] = polyfit(x,y,2);
x2 = linspace(-10,16,100);
y2 = polyval(p,x2);
y3 = 0.0003.*x2.^2 -0.0006.*x2 + 0.0011;
figure
plot(x,y,'o',x2,y2,x2,y3)
However, the fit does not match the data at all. After putting the data into Excel and fitting with a 2nd-order polynomial there, I get a very nice fit: y = 0.0003x^2 - 0.0006x + 0.0011 (Excel truncating the coefficients skews the fit a bit). What is happening with polyfit on this data?

Solved.
MATLAB checks how many outputs the caller is requesting. Since I requested three outputs, even though I wasn't using them, polyfit returns coefficients mapped to a centered and scaled domain xhat = (x - mu(1))/mu(2), where mu(1) = mean(x) and mu(2) = std(x).
If I instead just did:
p = polyfit(x,y,2);
plot(x2,polyval(p,x2));
Then I would achieve the appropriate result. To recover the same answer using the three outputs:
[p2,S,mu] = polyfit(x,y,2);
xhat = (x2-mu(1))./mu(2);
y4 = polyval(p2,xhat);
plot(x2,y4)
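Alternatively, polyval accepts the centering and scaling parameters directly via its documented four-argument form, so you can skip computing xhat by hand:
y4 = polyval(p2,x2,S,mu); % polyval applies (x2-mu(1))./mu(2) internally
plot(x2,y4)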

I would solve this in matlab using least squares:
x = [-10:2:16]';
Y = [0.0334,0.0230,0.0145,0.0079,0.0033,0.0009,0.0006,0.0026,0.0067,0.0130,0.0213,0.0317,0.0440,0.0580]';
plot(x,Y,'.');
A=[ones(length(x),1) x x.^2];
beta=A\Y;
hold on
plot(x, beta(1)+beta(2)*x+beta(3)*x.^2)
leg_est=sprintf('Estimated (y=%.4f+%.4fx+%.4fx^2)',beta(1),beta(2),beta(3))
legend('Data',leg_est)
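For what it's worth, this should agree with polyfit, just with the coefficients in the opposite order (polyfit reports descending powers, while beta here is ascending):
p = polyfit(x,Y,2); % expect p to match [beta(3) beta(2) beta(1)]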

Related

find the line which best fits to the data

I'm trying to find the line which best fits the data. I use the code below, but now I want the data placed into an array, sorted so that the points closest to the line come first. How can I do this? Also, is polyfit the correct function to use for this?
x=[1,2,2.5,4,5];
y=[1,-1,-.9,-2,1.5];
n=1;
p = polyfit(x,y,n)
f = polyval(p,x);
plot(x,y,'o',x,f,'-')
PS: I'm using Octave 4.0, which is similar to Matlab.
You can first compute the error between the real value y and the predicted value f
err = abs(y-f);
Then sort the error vector
[val, idx] = sort(err);
And use the sorted indexes to have your y values sorted
y2 = y(idx);
Now y2 has the same values as y, but with the ones closest to the fitted line first.
Do the same for x to compute x2, so you keep the correspondence between x2 and y2:
x2 = x(idx);
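Putting it all together with the question's data, a minimal end-to-end script would be:
x = [1,2,2.5,4,5];
y = [1,-1,-.9,-2,1.5];
p = polyfit(x,y,1);     % first-degree (linear) fit
f = polyval(p,x);       % predicted values at the data points
err = abs(y-f);         % vertical distance of each point from the line
[val, idx] = sort(err); % indices ordered from closest to farthest
x2 = x(idx);
y2 = y(idx);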
Sembei Norimaki did a good job of explaining your primary question, so I will look at your secondary question: is polyfit the right function?
The best-fit line, in the least-squares sense, is the line that minimizes the sum of squared errors (as a side effect, its residuals average to zero).
If it must be a "line" we could use polyfit, which fits a polynomial. Of course, a "line" can be defined as a first-degree polynomial, and first-degree polynomials have some properties that make them easy to deal with. The first-order (linear) equation you are looking for comes in this form:
y = mx + b
where y is your dependent variable and x is your independent variable. So the challenge is this: find the m and b such that the modeled y is as close to the actual y as possible. As it turns out, the error associated with a linear fit is convex, meaning it has a single minimum. To calculate this minimum, it is simplest to combine the bias and the x vectors as follows:
Xcombined = [x.' ones(length(x),1)];
then use the normal equation, derived from the minimization of the error:
beta = inv(Xcombined.'*Xcombined)*(Xcombined.')*(y.')
Great, now our line is defined as Y = Xcombined*beta. To draw the line, simply sample from some range of x and include the bias term:
Xplot = [[0:.1:5].' ones(length([0:.1:5].'),1)];
Yplot = Xplot*beta;
plot(Xplot, Yplot);
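As an aside, forming the matrix inverse explicitly works here, but MATLAB's backslash operator solves the same least-squares problem more stably; an equivalent one-liner (same Xcombined as above) would be:
beta = Xcombined \ y.'; % least-squares solution without forming inv(Xcombined.'*Xcombined)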
So why does polyfit work so poorly? Well, I can't say for sure, but my hypothesis is that you need to transpose your x and y matrices. I would guess that would give you a much more reasonable line.
x = x.';
y = y.';
then try
p = polyfit(x,y,n)
I hope this helps. A wise man once told me (and as I learn every day), don't trust an algorithm you do not understand!
Here's some test code that may help someone else dealing with linear regression and least squares
%https://youtu.be/m8FDX1nALSE matlab code
%https://youtu.be/1C3olrs1CUw good video to work out by hand if you want to test
function [a0 a1] = rtlinreg(x,y)
x=x(:);
y=y(:);
n=length(x);
a1 = (n*sum(x.*y) - sum(x)*sum(y))/(n*sum(x.^2) - (sum(x))^2); %a1 is the slope of the linear model
a0 = mean(y) - a1*mean(x); %a0 is the y-intercept
end
x=[65,65,62,67,69,65,61,67]'
y=[105,125,110,120,140,135,95,130]'
[a0 a1] = rtlinreg(x,y); %a1 is the slope of linear model, a0 is the y-intercept
x_model =min(x):.001:max(x);
y_model = a0 + a1.*x_model; %y=-186.47 +4.70x
plot(x,y,'x',x_model,y_model)

Get the points from a MatLab fit object plot

I am a Matlab amateur so please bear with me -
I currently use Matlab to fit a complex equation to two-dimensional data. Right now I have a program which uses f = fit(xdata, ydata, function, options) to generate a fit object.
I can then use confint(f) and f.parameter etc. to get the fitted coefficients and confidence intervals, and I can use plot(f,x,y) to make a plot of the data and the fit.
From that point on, the only way I know how to get the points which were plotted is to use the brush(?) tool, select all of the line, then copy the data to the clipboard and paste it into Excel or some such thing. I would much rather get those points directly from Matlab, perhaps into an array, but I have no idea how.
Can any MatLab veteran tell me if what I want is even possible? It would be very difficult due to the complexity of my equation to plot those points myself, but I will if need be (it can take ~30 minutes to do this fit and my computer is no slouch).
Since you have not shared any code or data, I use an example from MATLAB documentation:
load franke
f = fit([x, y],z,'poly23');
plot(f,[x,y],z)
As you can see, it first loads a dataset containing x, y, and z vectors, then fits a surface to the data using 'poly23'. In your case the vectors and the model may differ, but, as you said, you will still get the fit object f.
Now we can take a look at the function f
>> f
Linear model Poly23:
f(x,y) = p00 + p10*x + p01*y + p20*x^2 + p11*x*y + p02*y^2 + p21*x^2*y
+ p12*x*y^2 + p03*y^3
Coefficients (with 95% confidence bounds):
p00 = 1.118 (0.9149, 1.321)
p10 = -0.0002941 (-0.000502, -8.623e-05)
p01 = 1.533 (0.7032, 2.364)
p20 = -1.966e-08 (-7.084e-08, 3.152e-08)
p11 = 0.0003427 (-0.0001009, 0.0007863)
p02 = -6.951 (-8.421, -5.481)
p21 = 9.563e-08 (6.276e-09, 1.85e-07)
p12 = -0.0004401 (-0.0007082, -0.0001721)
p03 = 4.999 (4.082, 5.917)
It shows you the form of the function and the coefficients. So you can use it as follows:
zz = f(x,y);
To make sure, you can plot the data again:
figure;
scatter3(x,y,zz,'.k');
hold on
scatter3(x,y,z,'.');
When you call f = fit(xdata, ydata, function, options), the function name determines the equation; see the list of official equations. You can then iterate through data points and compute results using the corresponding polynomial. So in your case, if function = poly2, you would compute as follows:
% Fit your data
f = fit(xdata,ydata,'poly2');
% Print names of the coefficients (just for verification)
coeffnames(f)
% Fetch values of the coefficients p1, p2, ...
p = coeffvalues(f);
% Compute output points from min(xdata) to max(xdata), spaced at deltaX
deltaX = 0.1;
x = min(xdata):deltaX:max(xdata);
Y = p(1)*x.^2 + p(2)*x + p(3); % this is the equation for poly2
I understand there may be more complex Java-based ways to iterate through the objects on a MATLAB figure and read their values, but using the equations is a quick and valid approach.

What interpolation technique does Matlab plot function use to show the data?

It seems to be a very basic question, but I wonder: when I plot x values against y values, what interpolation technique is used behind the scenes to show the discrete data as continuous? Consider the following example:
x = 0:pi/100:2*pi;
y = sin(x);
plot(x,y)
My guess is it is a Lagrangian interpolation?
No, it's just linear interpolation. Your example uses quite a long dataset, so you can't tell the difference. Try plotting a short dataset and you'll see it.
MATLAB's plot performs simple linear interpolation. For finer resolution you'd have to supply more sample points or interpolate between the given x values.
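For instance, a minimal sketch using a coarse version of the question's example:
x = 0:pi/4:2*pi;
y = sin(x);
xq = linspace(0,2*pi,200);                 % denser query points
plot(x,y,'o-',xq,interp1(x,y,xq,'spline')) % spline interpolation gives a smooth curve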
For example, taking the sine from FamousBlueRaincoat's answer, one can just create an x vector with more equidistant values. Note that the linearly interpolated values coincide with the original plot lines, as the original uses linear interpolation as well. Note also that the x_ip vector does not include (all of) the original points, which is why they do not coincide at the point (~0.8, ~0.7).
Code
x = 0:pi/4:2*pi;
y = sin(x);
x_ip = linspace(x(1),x(end),5*numel(x));
y_lin = interp1(x,y,x_ip,'linear');
y_pch = interp1(x,y,x_ip,'pchip');
y_v5c = interp1(x,y,x_ip,'v5cubic');
y_spl = interp1(x,y,x_ip,'spline');
plot(x,y,x_ip,y_lin,x_ip,y_pch,x_ip,y_v5c,x_ip,y_spl,'LineWidth',1.2)
set(gca,'xlim',[pi/5 pi/2],'ylim',[0.5 1],'FontSize',16)
hLeg = legend(...
'No Interpolation','Linear Interpolation',...
'PChip Interpolation','v5cubic Interpolation',...
'Spline Interpolation');
set(hLeg,'Location','south','Fontsize',16);
By the way, this also applies to mesh and other plotting functions:
[X,Y] = meshgrid(-8:2:8);
R = sqrt(X.^2 + Y.^2) + eps;
Z = sin(R)./R;
figure
mesh(Z)
No, Lagrangian interpolation with 200 equally spaced points would be an incredibly bad idea. (See: Runge's phenomenon).
The plot command simply connects the given (x,y) points by straight lines, in the order given. To see this for yourself, use fewer points:
x = 0:pi/4:2*pi;
y = sin(x);
plot(x,y)

3D curvefitting

I have a discrete regular grid of (a, b) points and their corresponding c values, and I interpolate them further to get a smooth curve. Now, from the interpolation data, I want to create a polynomial equation for curve fitting. How do I fit a 3D plot with a polynomial?
I try to do this in MATLAB. I used the Surface Fitting Toolbox in MATLAB (R2010a) to curve fit 3-dimensional data. But how does one find a formula that fits a set of data to the best advantage in MATLAB/MAPLE or any other software? Any advice would be welcome; most useful would be some real code examples to look at, PDF files on the web, etc.
This is just a small portion of my data.
a = [ 0.001 .. 0.011];
b = [1, .. 10];
c = [ -.304860225, .. .379710865];
Thanks in advance.
To fit a curve onto a set of points, we can use ordinary least-squares regression. There is a solution page by MathWorks describing the process.
As an example, let's start with some random data:
% some 3d points
data = mvnrnd([0 0 0], [1 -0.5 0.8; -0.5 1.1 0; 0.8 0 1], 50);
As @BasSwinckels showed, by constructing the desired design matrix, you can use mldivide or pinv to solve the overdetermined system expressed as Ax=b:
% best-fit plane
C = [data(:,1) data(:,2) ones(size(data,1),1)] \ data(:,3); % coefficients
% evaluate it on a regular grid covering the domain of the data
[xx,yy] = meshgrid(-3:.5:3, -3:.5:3);
zz = C(1)*xx + C(2)*yy + C(3);
% or expressed using matrix/vector product
%zz = reshape([xx(:) yy(:) ones(numel(xx),1)] * C, size(xx));
Next we visualize the result:
% plot points and surface
figure('Renderer','opengl')
line(data(:,1), data(:,2), data(:,3), 'LineStyle','none', ...
'Marker','.', 'MarkerSize',25, 'Color','r')
surface(xx, yy, zz, ...
'FaceColor','interp', 'EdgeColor','b', 'FaceAlpha',0.2)
grid on; axis tight equal;
view(9,9);
xlabel x; ylabel y; zlabel z;
colormap(cool(64))
As was mentioned, we can get higher-order polynomial fitting by adding more terms to the independent variables matrix (the A in Ax=b).
Say we want to fit a quadratic model with constant, linear, interaction, and squared terms (1, x, y, xy, x^2, y^2). We can do this manually:
% best-fit quadratic curve
C = [ones(50,1) data(:,1:2) prod(data(:,1:2),2) data(:,1:2).^2] \ data(:,3);
zz = [ones(numel(xx),1) xx(:) yy(:) xx(:).*yy(:) xx(:).^2 yy(:).^2] * C;
zz = reshape(zz, size(xx));
There is also a helper function x2fx in the Statistics Toolbox that helps in building the design matrix for a couple of model orders:
C = x2fx(data(:,1:2), 'quadratic') \ data(:,3);
zz = x2fx([xx(:) yy(:)], 'quadratic') * C;
zz = reshape(zz, size(xx));
Finally there is an excellent function polyfitn on the File Exchange by John D'Errico that allows you to specify all kinds of polynomial orders and terms involved:
model = polyfitn(data(:,1:2), data(:,3), 2);
zz = polyvaln(model, [xx(:) yy(:)]);
zz = reshape(zz, size(xx));
There might be some better functions on the file-exchange, but one way to do it by hand is this:
x = a(:); %make column vectors
y = b(:);
z = c(:);
%first order fit
M = [ones(size(x)), x, y];
k1 = M\z;
%least square solution of z = M * k1, so z = k1(1) + k1(2) * x + k1(3) * y
Similarly, you can do a second order fit:
%second order fit
M = [ones(size(x)), x, y, x.^2, x.*y, y.^2];
k2 = M\z;
which seems to have numerical problems for the limited dataset you gave. Type help mldivide for more details.
To make an interpolation over some regular grid, you can do it like so:
ngrid = 20;
[A,B] = meshgrid(linspace(min(a), max(a), ngrid), ...
linspace(min(b), max(b), ngrid));
M = [ones(numel(A),1), A(:), B(:), A(:).^2, A(:).*B(:), B(:).^2];
C2_fit = reshape(M * k2, size(A)); % = k2(1) + k2(2)*A + k2(3)*B + k2(4)*A.^2 + ...
%plot to compare fit with original data
surfl(A,B,C2_fit);shading flat;colormap gray
hold on
plot3(a,b,c, '.r')
A 3rd-order fit can be done using the formula given by TryHard below, but the formulas quickly become tedious as the order increases. If you have to do this more than once, it is better to write a function that constructs M given x, y, and the order, as sketched below.
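For instance, a small helper along these lines (polydesign is a hypothetical name; treat this as a sketch rather than tested code) builds M for any total order, with columns ordered 1, x, y, x.^2, x.*y, y.^2, ... so that order 2 reproduces the M above:
function M = polydesign(x, y, order)
%POLYDESIGN Design matrix with all monomials x.^i .* y.^j for i+j <= order
x = x(:);
y = y(:);
M = [];
for d = 0:order
    for i = d:-1:0
        M = [M, x.^i .* y.^(d-i)]; %#ok<AGROW> growing in a loop is fine for small orders
    end
end
end
With it, the second-order fit above becomes k2 = polydesign(a, b, 2) \ c(:);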
This sounds like more of a philosophical question than a specific implementation question, specifically the bit: "how does one find a formula that fits a set of data to the best advantage?" In my experience, that is a choice you have to make depending on what you're trying to achieve.
What defines "best" for you? For a data-fitting problem you can keep adding polynomial coefficients and get an ever better R^2 value... but you will eventually overfit the data. A downside of high-order polynomials is their behavior outside the bounds of the sample data you used to fit the response surface: the fit can quickly go off in some wild direction that may not be appropriate for whatever you're trying to model.
Do you have insight into the physical behavior of the system / data you're fitting? That can be used as a basis for what set of equations to use to create a math model. My recommendation would be to use the most economical (simple) model you can get away with.

MATLAB - How to calculate 2D least squares regression based on both x and y. (regression surface)

I have a set of data with independent variables x and y. Now I'm trying to build a two-dimensional regression model that has a regression surface cutting through my data points. However, I couldn't find a way to achieve this. Can anyone give me some assistance?
You could use my favorite, polyfitn, for linear or polynomial models. If you would like a different model, please edit your question or add a comment. HTH!
EDIT
Also, take a look here under Multiple Regression, likely can help you as well.
EDIT AGAIN
Sorry, I'm having too much fun with this; here's an example of multivariate regression using least squares with stock Matlab:
t = (1:10)';
x = t;
y = exp(-t);
A = [ y x ];
z = 10*y + 0.5*x;
A\z
ans =
10.0000
0.5000
If you are performing linear regression, the best tool is the regress function. Note that if you are fitting a model of the form y(x1,x2) = b1*f(x1) + b2*g(x2) + b3, this is still a linear regression, as long as you know the functions f and g.
Nsamp = 100; %number of samples
X1 = randn(Nsamp,1); %regressor 1 (could also be some computed f(x1) )
X2 = randn(Nsamp,1); %regressor 2 (could also be some computed g(x2) )
Y = X1 + X2 + randn(Nsamp,1); %generate some data to be regressed
%now run the regression
[b,bint,r,rint,stats] = regress(Y,[X1 X2 ones(Nsamp,1)]);
% 'b' contains the coefficients b1, b2, b3 of the fit (can be used to plot the regression surface)
% 'r' contains residuals of the fit
% 'stats' contains the overall regression R^2, F stat, p-value and error variance
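As a follow-up, here is a sketch (reusing the variable names above; an illustration rather than a definitive recipe) of how b can be used to draw the regression surface over the sampled range:
[X1g,X2g] = meshgrid(linspace(min(X1),max(X1),20), linspace(min(X2),max(X2),20));
Yg = b(1)*X1g + b(2)*X2g + b(3); % fitted model: y = b1*x1 + b2*x2 + b3
surf(X1g,X2g,Yg,'FaceAlpha',0.5)
hold on
plot3(X1,X2,Y,'.r','MarkerSize',12)
xlabel('x1'); ylabel('x2'); zlabel('y')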