Fitting data in least square sense to nonlinear equation - matlab

I need help fitting data in a least square sense to a nonlinear function. Given data, how do I proceed when I have following equation?
f(x) = 20 + ax + b*e^(c*2x)
So I want to find a,b and c. If it was products, I would linearize the function by takin the natural logaritm all over the function but I can not seem to do that in this case.
Thanks

You can use the nlinfit tool, which doesn't require the Curve Fitting Toolbox (I don't think...)
Something like
f = #(b,x)(20 + b(1)*x + b(2)*exp(b(3)*2*x));
beta0 = [1, 1, 1];
beta = nlinfit(x, Y, f, beta0);
When MATLAB solves this least-squares problem, it passes the coefficients into the anonymous function f in the vector b. nlinfit returns the final values of these coefficients in the beta vector. beta0 is an initial guess of the values of b(1), b(2), and b(3). x and Y are the vectors with the data that you want to fit.
Alternatively, you can define the function in its own file, if it is a little more complicated. For this case, you would have something like (in the file my_function.m)
function y = my_function(b,x)
y = 20 + b(1)*x + b(2)*exp(b(3)*2*x);
end
and the rest of the code would look like
beta0 = [1, 1, 1];
beta = nlinfit(x, Y, #my_function, beta0);
See also: Using nlinfit in Matlab?

You can try the cftool which is an interactive tool for fitting data. The second part I don't quite understand. It may help if you describe it in more detail.

Related

Fitting Data with Linear Combination of Vectors in MATLAB with Constraints

Working in MATLAB. I have the following equation:
S = aW + bX + cY + dZ
where S,W,X,Y, and Z are all known n x 1 vectors. I am trying to fit the data of S with a linear combination of the basis vectors W,X,Y, and Z with the constraint of the constants (a,b,c,d) being greater than 0. I have managed to do this in Excel's solver, and have attempted to figure it out on MATLAB, being directed towards functions like fmincon, but I am not overly familiar with MATLAB and feel I am misunderstanding the use of fmincon.
I am looking for help with understanding fmincon's use towards my problem, or redirection towards a more efficient method for solving.
Currently I have:
initials = [0.2 0.2 0.2 0.2];
fun = #(x)x(1)*W + x(2)*X + x(3)*Y + x(4)*Z;
lb = [0,0,0,0];
soln = fmincon(fun,initials,data,b,Aeq,beq,lb,ub);
I am receiving an error stating "A must have 4 column(s)". Where A is referring to my variable data which corresponds to my S in the above equation. I do not understand why it is expecting 4 columns. Also to note the variables that are in my above snippet that are not explicitly defined are defined as [], serving as space holders.
Using fmincon is a huge overkill in this case. It's like using big heavy microscope to crack nuts... or martian rover to tow a pushcart or... anyway :) May be it's OK if you don't have to process large sets of vectors. If you need to fit hundreds of thousands of such vectors it can take hours. But this classic solution will be faster by several orders of magnitude.
%first make a n x 4 matrix of your vectors;
P=[W,X,Y,Z];
% now your equation looks like this S = P*m where m is 4 x 1 vectro
% containing your coefficients ( m = [a,b,c,d] )
% so solution will be simply
m_1 = inv(P'*P)*P'*S;
% or you can use this form
m_2 = (P'*P)\P'*S;
% or even simpler
m_3 = (S'/P')';
% all three solutions should give exactly same resul
% third solution is the neatest but may not work in every version of matlab
% Your modeled vector will be
Sm = P*m_3; %you can use any of m_1, m_2 or m_3;
% And your residual
R = S-Sm;
If you need to procees many vectors don't use for cycle. For cycles are very slow in Matlab and you should use matrices instead, if possible. S can also be nk matrix, where k is number vectors you want to process. In this case m will be 4k matrix.
What you are trying to do is similar to the answer I gave at is there a python (or matlab) function that achieves the minimum MSE between given set of output vector and calculated set of vector?.
The procedure is similar to what you are doing in EXCEL with solver. You create an objective function that takes (a, b, c, d) as the input parameters and output a measure of fit (mse) and then use fmincon or similar solver to get the best (a, b, c, d) that minimize this mse. See the code below (no MATLAB to test it but it should work).
function [a, b, c, d] = optimize_abcd(W, X, Y, Z)
%=========================================================
%Second argument is the starting point, second to the last argument is the lower bound
%to ensure the solutions are all positive
res = fmincon(#MSE, [0,0,0,0], [], [], [], [], [0,0,0,0], []);
a=res(1);
b=res(2);
c=res(3);
d=res(4);
function mse = MSE(x_)
a_=x_(1);
b_=x_(2);
c_=x_(3);
d_=x_(4);
S_ = a_*W + b_*X + c_*Y + d_*Z
mse = norm(S_-S);
end
end

find the line which best fits to the data

I'm trying to find the line which best fits to the data. I use the following code below but now I want to have the data placed into an array sorted so it has the data which is closest to the line first how can I do this? Also is polyfit the correct function to use for this?
x=[1,2,2.5,4,5];
y=[1,-1,-.9,-2,1.5];
n=1;
p = polyfit(x,y,n)
f = polyval(p,x);
plot(x,y,'o',x,f,'-')
PS: I'm using Octave 4.0 which is similar to Matlab
You can first compute the error between the real value y and the predicted value f
err = abs(y-f);
Then sort the error vector
[val, idx] = sort(err);
And use the sorted indexes to have your y values sorted
y2 = y(idx);
Now y2 has the same values as y but the ones closer to the fitting value first.
Do the same for x to compute x2 so you have a correspondence between x2 and y2
x2 = x(idx);
Sembei Norimaki did a good job of explaining your primary question, so I will look at your secondary question = is polyfit the right function?
The best fit line is defined as the line that has a mean error of zero.
If it must be a "line" we could use polyfit, which will fit a polynomial. Of course, a "line" can be defined as first degree polynomial, but first degree polynomials have some properties that make it easy to deal with. The first order polynomial (or linear) equation you are looking for should come in this form:
y = mx + b
where y is your dependent variable and X is your independent variable. So the challenge is this: find the m and b such that the modeled y is as close to the actual y as possible. As it turns out, the error associated with a linear fit is convex, meaning it has one minimum value. In order to calculate this minimum value, it is simplest to combine the bias and the x vectors as follows:
Xcombined = [x.' ones(length(x),1)];
then utilized the normal equation, derived from the minimization of error
beta = inv(Xcombined.'*Xcombined)*(Xcombined.')*(y.')
great, now our line is defined as Y = Xcombined*beta. to draw a line, simply sample from some range of x and add the b term
Xplot = [[0:.1:5].' ones(length([0:.1:5].'),1)];
Yplot = Xplot*beta;
plot(Xplot, Yplot);
So why does polyfit work so poorly? well, I cant say for sure, but my hypothesis is that you need to transpose your x and y matrixies. I would guess that that would give you a much more reasonable line.
x = x.';
y = y.';
then try
p = polyfit(x,y,n)
I hope this helps. A wise man once told me (and as I learn every day), don't trust an algorithm you do not understand!
Here's some test code that may help someone else dealing with linear regression and least squares
%https://youtu.be/m8FDX1nALSE matlab code
%https://youtu.be/1C3olrs1CUw good video to work out by hand if you want to test
function [a0 a1] = rtlinreg(x,y)
x=x(:);
y=y(:);
n=length(x);
a1 = (n*sum(x.*y) - sum(x)*sum(y))/(n*sum(x.^2) - (sum(x))^2); %a1 this is the slope of linear model
a0 = mean(y) - a1*mean(x); %a0 is the y-intercept
end
x=[65,65,62,67,69,65,61,67]'
y=[105,125,110,120,140,135,95,130]'
[a0 a1] = rtlinreg(x,y); %a1 is the slope of linear model, a0 is the y-intercept
x_model =min(x):.001:max(x);
y_model = a0 + a1.*x_model; %y=-186.47 +4.70x
plot(x,y,'x',x_model,y_model)

Exponential curve fitting without the Curve Fitting toolbox?

I have some data points to which I need to fit an exponential curve of the form
y = B * exp(A/x)
(without the help of Curve Fitting Toolbox).
What I have tried so far to linearize the model by applying log, which results in
log(y/B) = A/x
log(y) = A/x + log(B)
I can then write it in the form
Y = AX + B
Now, if I neglect B, then I am able to solve it with
A = pseudoinverse (X) * Y
but I am stuck with values of B...
Fitting a curve of the form
y = b * exp(a / x)
to some data points (xi, yi) in the least-squares sense is difficult. You cannot use linear least-squares for that, because the model parameters (a and b) do not appear in an affine manner in the equation. Unless you're ready to use some nonlinear-least-squares method, an alternative approach is to modify the optimization problem so that the modified problem can be solved using linear least squares (this process is sometimes called "data linearization"). Let's do that.
Under the assumption that b and the yi's be positive, you can apply the natural logarithm to both sides of the equations:
log(y) = log(b) + a / x
or
a / x + log(b) = log(y)
By introducing a new parameter b2, defined as log(b), it becomes evident that parameters a and b2 appear in a linear (affine, really) manner in the new equation:
a / x + b2 = log(y)
Therefore, you can compute the optimal values of those parameters using least squares; all you have left to do is construct the right linear system and then solve it using MATLAB's backslash operator:
A = [1 ./ x, ones(size(x))];
B = log(y);
params_ls = A \ B;
(I'm assuming x and y are column vectors, here.)
Then, the optimal values (in the least-squares sense) for the modified problem are given by:
a_ls = params_ls(1);
b_ls = exp(params_ls(2));
Although those values are not, in general, optimal for the original problem, they are often "good enough" in practice. If needed, you can always use them as initial guesses for some iterative nonlinear-least-squares method.
Doing the log transform then using linear regression should do it. Wikipedia has a nice section on how to do this:
http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29#The_general_problem
%MATLAB code for finding the best fit line using least squares method
x=input('enter a') %input in the form of matrix, rows contain points
a=[1,x(1,1);1,x(2,1);1,x(3,1)] %forming A of Ax=b
b=[x(1,2);x(2,2);x(3,2)] %forming b of Ax=b
yy=inv(transpose(a)*a)*transpose(a)*b %computing projection of matrix A on b, giving x
%plotting the best fit line
xx=linspace(1,10,50);
y=yy(1)+yy(2)*xx;
plot(xx,y)
%plotting the points(data) for which we found the best fit line
hold on
plot(x(2,1),x(2,2),'x')
hold on
plot(x(1,1),x(1,2),'x')
hold on
plot(x(3,1),x(3,2),'x')
hold off
I'm sure the code can be cleaned up, but that's the gist of it.

How do I plot relations in matlab?

I want to plot relations like y^2=x^2(x+3) in MATLAB without using ezplot or doing algebra to find each branch of the function.
Does anyone know how I can do this? I usually create a linspace and then create a function over the linspace. For example
x=linspace(-pi,pi,1001);
f=sin(x);
plot(x,f)
Can I do something similar for the relation I have provided?
What you could do is use solve and allow MATLAB's symbolic solver to symbolically solve for an expression of y in terms of x. Once you do this, you can use subs to substitute values of x into the expression found from solve and plot all of these together. Bear in mind that you will need to cast the result of subs with double because you want the numerical result of the substitution. Not doing this will still leave the answer in MATLAB's symbolic format, and it is incompatible for use when you want to plot the final points on your graph.
Also, what you'll need to do is that given equations like what you have posted above, you may have to loop over each solution, substitute your values of x into each, then add them to the plot.
Something like the following. Here, you also have control over the domain as you have desired:
syms x y;
eqn = solve('y^2 == x^2*(x+3)', 'y'); %// Solve for y, as an expression of x
xval = linspace(-1, 1, 1000);
%// Spawn a blank figure and remember stuff as we throw it in
figure;
hold on;
%// For as many solutions as we have...
for idx = 1 : numel(eqn)
%// Substitute our values of x into each solution
yval = double(subs(eqn(idx), xval));
%// Plot the points
plot(xval, yval);
end
%// Add a grid
grid;
Take special care of how I used solve. I specified y because I want to solve for y, which will give me an expression in terms of x. x is our independent variable, and so this is important. I then specify a grid of x points from -1 to 1 - exactly 1000 points actually. I spawn a blank figure, then for as many solutions to the equation that we have, we determine the output y values for each solution we have given the x values that I made earlier. I then plot these on a graph of these points. Note that I used hold on to add more points with each invocation to plot. If I didn't do this, the figure would refresh itself and only remember the most recent call to plot. You want to put all of the points on here generated from all of the solution. For some neatness, I threw a grid in.
This is what I get:
Ok I was about to write my answer and I just saw that #rayryeng proposed a similar idea (Good job Ray!) but here it goes. The idea is also to use solve to get an expression for y, then convert the symbolic function to an anonymous function and then plot it. The code is general for any number of solutions you get from solve:
clear
clc
close all
syms x y
FunXY = y^2 == x^2*(x+3);
%//Use solve to solve for y.
Y = solve(FunXY,y);
%// Create anonymous functions, stored in a cell array.
NumSol = numel(Y); %// Number of solutions.
G = cell(1,NumSol);
for k = 1:NumSol
G{k} = matlabFunction(Y(k))
end
%// Plot the functions...
figure
hold on
for PlotCounter = 1:NumSol
fplot(G{PlotCounter},[-pi,pi])
end
hold off
The result is the following:
n = 1000;
[x y] = meshgrid(linspace(-3,3,n),linspace(-3,3,n));
z = nan(n,n);
z = (y .^ 2 <= x .^2 .* (x + 3) + .1);
z = z & (y .^ 2 >= x .^2 .* (x + 3) - .1);
contour(x,y,z)
It's probably not what you want, but I it's pretty cool!

3D curvefitting

I have discrete regular grid of a,b points and their corresponding c values and I interpolate it further to get a smooth curve. Now from interpolation data, I further want to create a polynomial equation for curve fitting. How to fit 3D plot in polynomial?
I try to do this in MATLAB. I used Surface fitting toolbox in MATLAB (r2010a) to curve fit 3-dimensional data. But, how does one find a formula that fits a set of data to the best advantage in MATLAB/MAPLE or any other software. Any advice? Also most useful would be some real code examples to look at, PDF files, on the web etc.
This is just a small portion of my data.
a = [ 0.001 .. 0.011];
b = [1, .. 10];
c = [ -.304860225, .. .379710865];
Thanks in advance.
To fit a curve onto a set of points, we can use ordinary least-squares regression. There is a solution page by MathWorks describing the process.
As an example, let's start with some random data:
% some 3d points
data = mvnrnd([0 0 0], [1 -0.5 0.8; -0.5 1.1 0; 0.8 0 1], 50);
As #BasSwinckels showed, by constructing the desired design matrix, you can use mldivide or pinv to solve the overdetermined system expressed as Ax=b:
% best-fit plane
C = [data(:,1) data(:,2) ones(size(data,1),1)] \ data(:,3); % coefficients
% evaluate it on a regular grid covering the domain of the data
[xx,yy] = meshgrid(-3:.5:3, -3:.5:3);
zz = C(1)*xx + C(2)*yy + C(3);
% or expressed using matrix/vector product
%zz = reshape([xx(:) yy(:) ones(numel(xx),1)] * C, size(xx));
Next we visualize the result:
% plot points and surface
figure('Renderer','opengl')
line(data(:,1), data(:,2), data(:,3), 'LineStyle','none', ...
'Marker','.', 'MarkerSize',25, 'Color','r')
surface(xx, yy, zz, ...
'FaceColor','interp', 'EdgeColor','b', 'FaceAlpha',0.2)
grid on; axis tight equal;
view(9,9);
xlabel x; ylabel y; zlabel z;
colormap(cool(64))
As was mentioned, we can get higher-order polynomial fitting by adding more terms to the independent variables matrix (the A in Ax=b).
Say we want to fit a quadratic model with constant, linear, interaction, and squared terms (1, x, y, xy, x^2, y^2). We can do this manually:
% best-fit quadratic curve
C = [ones(50,1) data(:,1:2) prod(data(:,1:2),2) data(:,1:2).^2] \ data(:,3);
zz = [ones(numel(xx),1) xx(:) yy(:) xx(:).*yy(:) xx(:).^2 yy(:).^2] * C;
zz = reshape(zz, size(xx));
There is also a helper function x2fx in the Statistics Toolbox that helps in building the design matrix for a couple of model orders:
C = x2fx(data(:,1:2), 'quadratic') \ data(:,3);
zz = x2fx([xx(:) yy(:)], 'quadratic') * C;
zz = reshape(zz, size(xx));
Finally there is an excellent function polyfitn on the File Exchange by John D'Errico that allows you to specify all kinds of polynomial orders and terms involved:
model = polyfitn(data(:,1:2), data(:,3), 2);
zz = polyvaln(model, [xx(:) yy(:)]);
zz = reshape(zz, size(xx));
There might be some better functions on the file-exchange, but one way to do it by hand is this:
x = a(:); %make column vectors
y = b(:);
z = c(:);
%first order fit
M = [ones(size(x)), x, y];
k1 = M\z;
%least square solution of z = M * k1, so z = k1(1) + k1(2) * x + k1(3) * y
Similarly, you can do a second order fit:
%second order fit
M = [ones(size(x)), x, y, x.^2, x.*y, y.^2];
k2 = M\z;
which seems to have numerical problems for the limited dataset you gave. Type help mldivide for more details.
To make an interpolation over some regular grid, you can do like so:
ngrid = 20;
[A,B] = meshgrid(linspace(min(a), max(a), ngrid), ...
linspace(min(b), max(b), ngrid));
M = [ones(numel(A),1), A(:), B(:), A(:).^2, A(:).*B(:), B(:).^2];
C2_fit = reshape(M * k2, size(A)); % = k2(1) + k2(2)*A + k2(3)*B + k2(4)*A.^2 + ...
%plot to compare fit with original data
surfl(A,B,C2_fit);shading flat;colormap gray
hold on
plot3(a,b,c, '.r')
A 3rd-order fit can be done using the formula given by TryHard below, but the formulas quickly become tedious when the order increases. Better write a function that can construct M given x, y and order if you have to do that more than once.
This sounds like more of a philosophical question than specific implementation, specifically to bit - "how does one find a formula that fits a set of data to the best advantage?" In my experience that is a choice you have to make depending on what you're trying to achieve.
What defines "best" for you? For a data fitting problem you can keep adding more and more polynomial coefficients and making a better R^2 value... but will eventually "over fit" the data. A downside of high order polynomials is behavior outside the bounds of the sample data which you've used to fit your response surface - it can quickly go off in some wild direction which may not be appropriate for whatever it is you're trying to model.
Do you have insight into the physical behavior of the system / data you're fitting? That can be used as a basis for what set of equations to use to create a math model. My recommendation would be to use the most economical (simple) model you can get away with.