MATLAB curve fitting - least squares method - wrong "fit" using high degrees

Could anyone help me with the following problem?
The following code calculates the best polynomial fit of a specified degree to a given data set.
Unfortunately, whatever the data set may be, MATLAB usually gets a totally wrong fit at degree 6 or higher. Usually the fit curves away from the data downwards in an exponential-looking manner (see the example: degree = 8).
x=[1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5] % experimental x-values
y=[4.3 6.2 10.1 13.5 19.8 22.6 24.7 29.2] % experimental y-values
degree=8; % specify the degree
A = zeros(length(x),degree+1); % one column per power 0..degree
for exponent=0:degree;
for data=1:length(x);
A(data,exponent+1)=x(data).^exponent; % create matrix A
end;
end;
a=inv((transpose(A)*A))*transpose(A)*y'; % a holds the coefficients of the polynomial
a=flipud(a);
fitpolynoom=polyval(a,x);
error=sum((y-fitpolynoom).^2); % calculates the fit-error using the least-squares method
fitpolynoom=polyval(a,x);
figure;
plot(x,y,'black*',x,fitpolynoom,'g-');
error % displays the value of the fit-error in the Matlab command window
Thanks in advance.

First, some remarks: for least-squares fitting of polynomials in MATLAB, you could use the existing polyfit function instead. Furthermore (this may depend on your application), you probably should not be fitting $8$th degree polynomials, especially when you have only $8$ data points: such a polynomial has $9$ coefficients, so the data cannot determine them uniquely and $A^\mathrm{T} A$ is singular. In this answer, I assume you have good reasons to fit polynomials to your data (e.g., just for self-study purposes).
The issue is a numerical problem arising from matrix inversion. For solving equations of the type $Ax=b$ where $A$ is a square matrix, actually inverting $A$ is not recommended (see the blog post 'Don't invert that matrix' by John D. Cook). In the least-squares case, instead of
\begin{equation}
a = (A^\mathrm{T} A)^{-1} A^\mathrm{T} y^\mathrm{T}
\end{equation}
it is better to solve
\begin{equation}
(A^\mathrm{T} A)a = A^\mathrm{T} y^\mathrm{T}
\end{equation}
by other means. In your MATLAB code, you may replace
a=inv((transpose(A)*A))*transpose(A)*y';
by
a = (transpose(A) * A) \ (transpose(A) * y');
By this modification to your code, I obtained a fit going through the data points.
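As a hedged illustration of the point above, the sketch below fits the same data without forming the normal equations at all, by applying the backslash operator directly to the rectangular matrix A. The degree of 3 is an assumption chosen for illustration so that the problem is overdetermined; it is not the degree used in the question. The comparison against polyfit and the condition-number ratio are there only as sanity checks.

```matlab
x = [1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5];        % experimental x-values
y = [4.3 6.2 10.1 13.5 19.8 22.6 24.7 29.2];  % experimental y-values
degree = 3;                                   % assumed degree, below the number of points

% Matrix A with columns x.^0, x.^1, ..., x.^degree
A = zeros(numel(x), degree+1);
for k = 0:degree
    A(:, k+1) = x(:).^k;
end

a = A \ y(:);          % least-squares solution, no explicit inverse and no A'*A
p = flipud(a).';       % highest power first, as polyval expects
resid = sum((y - polyval(p, x)).^2);

pref  = polyfit(x, y, degree);         % reference coefficients
ratio = cond(A'*A) / cond(A)^2;        % forming A'*A squares the condition number
```

The ratio close to 1 shows why the normal equations lose accuracy faster than working with A directly: their condition number is the square of that of A.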

Related

Higher order polynomial fitting is not so handy surprisingly

I have a simple question but was not able to solve it myself. I want to use the MATLAB Curve Fitting Toolbox to fit higher-order polynomials. It works for polynomials of order 1 to 9 but, to my surprise, it does not work for polynomials of degree higher than 9. To keep it simple, please look at the following code, which unfortunately does not work for me.
l=1:0.01:10;y=l.^10;
[xData, yData] = prepareCurveData(l,y);
ft = fittype( 'poly10' );
[Fit, gof] = fit( xData, yData, ft, 'Normalize', 'on' );
Thanks in advance,
Babak
It might be surprising, but it is documented: List of Library Models for Curve and Surface Fitting. You can always use polyfit, but as per the warning it issues, once you start getting polynomials of that degree, the fit is likely to be problematic anyway.
This answer supplements Phil Goddard's answer.
There is no poly10 library model in the function fit. However, there are at least two alternative ways to fit a polynomial of any degree M (something like polyM, if that is necessary): a custom linear fittype, or polyfit.
clc; clear;
%%data
l=1:0.01:10;y=l.^10;
[xData, yData] = prepareCurveData(l,y);
%%High degree polynomial fitting
%set the degree of the polynomial
Degree=10;
%Fit with customize option
%generate the cell array from 'x^Degree' to 'x^0'
syms x
Str=char(power(x,Degree:-1:0));
%set the fitting type & options, then call fit
HighPoly = fittype(strsplit(Str(10:end-3),','));
options = fitoptions('Normalize', 'off','Method','LinearLeastSquares','Robust','off');
[curve,gof] = fit(xData,yData,HighPoly,options)
%Polyfit with the degree of Degree
p = polyfit(xData,yData,Degree)
Both fit and polyfit show some warnings, however. In my humble opinion, this is due to Runge's phenomenon, a problem of oscillation at the edges of an interval that occurs when using polynomial interpolation with polynomials of high degree over a set of equispaced interpolation points.
Apart from data like this, or similar cases where the true function really is a high-degree polynomial (an element of Pn[R]), high-degree polynomials are not recommended for fitting complicated functions.
Edit: generalized the code.
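One option that may help with the conditioning warnings is the three-output form of polyfit, which centers and scales x before fitting. The sketch below applies it to the same data as in the question; the variable names yfit and relErr are introduced here only for illustration.

```matlab
l = 1:0.01:10;
y = l.^10;
Degree = 10;

% With three outputs, polyfit fits in the variable (l - mu(1))/mu(2),
% where mu = [mean(l); std(l)], which improves the conditioning.
[p, S, mu] = polyfit(l, y, Degree);

% The same S and mu must be passed to polyval when evaluating the fit.
yfit   = polyval(p, l, S, mu);
relErr = max(abs(yfit - y)) / max(abs(y));
```

Note that p then holds coefficients in the scaled variable, not in l itself, so it cannot be compared term by term with the coefficients of l.^10.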

Curve fitting in MATLAB, for a Sinusoidal function with more than 8 terms?

I'm trying to fit some data to a sum-of-sines function in MATLAB; however, the number of sine terms in MATLAB is limited to 1 ≤ n ≤ 8. I want more terms in my fit function, i.e. over 50 terms. Is there any way to make MATLAB fit my data to a sum of sines with more than 8 sinusoidal terms? Why is there such a constraint in MATLAB (is it technical or arbitrary)? Is there any toolbox for fitting sinusoidal functions (especially one that supports weighted data)?
>f = fit(X,Y, 'sin10')
>Error using fittype>iCreateFromLibrary (line 412)
>Library function sin10 not found.
It is o.k up to 'sin8' or 'sin9' parameters.
I appreciate any answer.
I've found a solution to my question accidentally, while browsing the MATLAB help. I am posting this answer in the hope of helping people who have the same problem.
As a first attempt to solve this, I tried the 'fit' instruction. For some reason, customized 'fit'-based fitting code like the one below didn't work out:
FitOptions = fitoptions('Method','NonlinearLeastSquares', 'Algorithm', 'Trust-Region', 'MaxIter');
FitType = fittype('a*sin(1*f) + b*sin(2*f) + c*sin(3*f) + d*sin(4*f) + e*sin(5*f) + g*sin(6*f) + h*sin(7*f) + k*sin(8*f) + l*sin(9*f) + m*sin(10*f) + n*sin(11*f)', 'independent', 'f');
[FittedModel, GOF] = fit(freq, data, FitType)
% In the above code, phase parameters are not included; they could be added.
What I found is that with the 'lsqcurvefit' instruction from the Optimization Toolbox, fitting a customized function is more feasible and easier than with the 'fit' function. I tested it by fitting my data to a sum of 12 (>8) sines in the code below:
clear;clc
xdata=1:0.1:10; % X or independent data
ydata=sin(xdata+0.2)+0.5*sin(0.3*xdata+0.3)+ 2*sin( 0.2*xdata+23 )+...
0.7*sin( 0.34*xdata+12 )+.76*sin( .23*xdata+.3 )+.98*sin(.76 *xdata+.56 )+...
+.34*sin( .87*xdata+.123 )+.234*sin(.234 *xdata+23 ); % Y or dependent data
x0 = randn(36,1); % Initial guess
fun = @(x,xdata)x(1)*sin(x(2)*xdata+x(3))+...
x(4)*sin(x(5)*xdata+x(6))+...
x(7)*sin(x(8)*xdata+x(9))+...
x(10)*sin(x(11)*xdata+x(12))+...
x(13)*sin(x(14)*xdata+x(15))+...
x(16)*sin(x(17)*xdata+x(18))+...
x(19)*sin(x(20)*xdata+x(21))+...
x(22)*sin(x(23)*xdata+x(24))+...
x(25)*sin(x(26)*xdata+x(27))+...
x(28)*sin(x(29)*xdata+x(30))+...
x(31)*sin(x(32)*xdata+x(33))+...
x(34)*sin(x(35)*xdata+x(36)); % Goal function which is Sum of 12 sines
options = optimoptions('lsqcurvefit','Algorithm','trust-region-reflective');% Options for fitting
x=lsqcurvefit(fun,x0,xdata,ydata,[],[],options) % the main instruction; options follow the (empty) lower and upper bounds
times = linspace(xdata(1),xdata(end));
plot(xdata,ydata,'ko',times,fun(x,times),'r-')
legend('Data','Fitted Sum of 12 Sines')
title('Data and Fitted Curve')
The result is satisfactory so far (figure not shown).
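Writing out every term by hand does not scale to 50 sines. As a minimal sketch, assuming the same parameter layout as above ([a1 b1 c1 a2 b2 c2 ...]) and a MATLAB release with implicit expansion (R2016b or later), the model can be generated for any number of terms. The names sumSines, nTerms and p0 are introduced here for illustration only.

```matlab
nTerms = 12;                 % number of sine terms; can be raised as needed
xdata  = 1:0.1:10;           % row vector of sample points

% p is a 3*nTerms vector laid out as [a1 b1 c1 a2 b2 c2 ...].
% Each row of the intermediate matrix is one sine term; the sum over rows
% returns a row vector with the same shape as xdata.
sumSines = @(p, xd) sum( ...
    reshape(p(1:3:end), [], 1) .* ...
    sin(reshape(p(2:3:end), [], 1) .* xd(:).' + reshape(p(3:3:end), [], 1)), 1);

p0  = randn(3*nTerms, 1);    % random initial guess, as in the code above
val = sumSines(p0, xdata);   % model evaluated at the initial guess
```

The handle sumSines has the (parameters, xdata) signature that lsqcurvefit expects, so it could be passed in place of the hand-written fun.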
The problem described above is that when I use MATLAB's fit function with a library sum-of-sines model (e.g. fit(xdata,ydata,'sin6')), it easily converges to an optimum solution and the fitting results are acceptable (figure not shown),
but when I tried to fit the same data using a custom-defined function, the results are not satisfactory at all (figure not shown):
fun=@(x,xdata)a1*sin(b1*xdata+c1)+...+a6*sin(b6*xdata+c6); % sum of six sines
f=fit(xdata,ydata,fun);
At first, I thought the fit instruction was to blame, so I tried other instructions like lsqcurvefit. It worked well for some data, but as soon as other data were used it started to misbehave.
From the MATLAB documentation, I figured out that sum-of-sines fitting and Fourier fitting are extremely sensitive to starting points (initial points), i.e. the values that the fitting algorithm assumes for the fitting parameters (amplitudes, frequencies and phases) in its first iteration. By inspecting the MATLAB fitting toolbox .m files, I noticed that MATLAB does some clever trick to obtain the starting point when you use a predefined library model (e.g. fit(x,y,'sin1') or fit(x,y,'sin2'), ...), but when you choose to enter your custom function the initial points are generated randomly! This is why MATLAB's built-in functions work and my custom function fitting does not (even though I enter the same function).
By the way, MATLAB computes the FFT of ydata and, through some (seemingly greedy) method, extracts initial points for amplitudes, frequencies and phases (a function called startpt.m does this).
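To illustrate the idea of an FFT-based starting point, here is a minimal sketch for a single sine term. It is not the algorithm in startpt.m, whose details are not reproduced here; it assumes uniformly sampled data and simply picks the strongest FFT bin. The synthetic signal and all variable names are assumptions made for the example.

```matlab
% Synthetic, uniformly sampled data: y = 2*sin(2*pi*1.5*t + 0.4)
t = (0:0.01:9.99).';
y = 2*sin(2*pi*1.5*t + 0.4);

N  = numel(t);
dt = t(2) - t(1);
Y  = fft(y - mean(y));

% Strongest positive-frequency bin, skipping the DC bin at index 1
[~, k] = max(abs(Y(2:floor(N/2))));
k = k + 1;

f0     = (k - 1) / (N*dt);        % frequency estimate in Hz
amp0   = 2*abs(Y(k)) / N;         % amplitude estimate
phase0 = angle(Y(k)) + pi/2;      % FFT phase refers to a cosine; shift to a sine

start = [amp0, 2*pi*f0, phase0];  % initial guess for a*sin(b*t + c)
```

A vector like start could then serve as x0 for lsqcurvefit, or as the 'StartPoint' option of fit. For several sine terms one might repeat the procedure on the residual, but that extension is not shown here.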

How to solve equations with complex coefficients using ode45 in MATLAB?

I am trying to solve two equations with complex coefficients using ode45, but I am getting the error message "Inputs must be floats, namely single or double."
X = sym(['[',sprintf('X(%d) ',1:2),']']);
Eqns=[-(X(1)*23788605396486326904946699391889*1i)/38685626227668133590597632 + (X(2)*23788605396486326904946699391889*1i)/38685626227668133590597632; (X(2)*23788605396486326904946699391889*1i)/38685626227668133590597632 + X(1)*(- 2500000 + (5223289665997855453060886952725538686654593059791*1i)/324518553658426726783156020576256)] ;
f=@(t,X)[Eqns];
[t,Xabc]=ode45(f,[0 300*10^-6],[0 1])
How can I fix this? Can somebody help me?
Per the MathWorks Support Team, the "ODE solvers in MATLAB 5 (R12) and later releases properly handle complex valued systems." So the complex numbers are not the issue.
The error "Inputs must be floats, namely single or double." stems from your definition of f using symbolic variables, which, unlike complex numbers, are not floats. The easiest way to get around this is to not use the Symbolic Toolbox at all; just make Eqns an anonymous function:
Eqns= @(t,X) [-(X(1)*23788605396486326904946699391889*1i)/38685626227668133590597632 + (X(2)*23788605396486326904946699391889*1i)/38685626227668133590597632; (X(2)*23788605396486326904946699391889*1i)/38685626227668133590597632 + X(1)*(- 2500000 + (5223289665997855453060886952725538686654593059791*1i)/324518553658426726783156020576256)] ;
[t,Xabc]=ode45(Eqns,[0 300*10^-6],[0 1]);
That being said, I'd like to point out that numerically time integrating this system over 300 microseconds (I assume without units given) will take a long time since your coefficient matrix has imaginary eigenvalues on the order of 10E+10. The extremely short wavelength of those oscillations will more than likely be resolved by Matlab's adaptive methods, and that will take a while to solve for a time span just a few orders greater than the wavelength.
I'd therefore suggest an analytical approach to this problem, unless it is a stepping stone to another problem that cannot be solved analytically.
Systems of ordinary differential equations of the form
\begin{equation}
\dot{X}(t) = A X(t), \qquad X(0) = X_0,
\end{equation}
which are linear, homogeneous systems with a constant coefficient matrix, have the general solution
\begin{equation}
X(t) = \mathrm{e}_\mathrm{m}^{At} X_0,
\end{equation}
where the m-subscripted exponential function is the matrix exponential.
Therefore, the analytical solution to the system can be calculated exactly assuming the matrix exponential can be calculated.
In MATLAB, the matrix exponential is calculated via the expm function.
The following code computes the analytical solution and compares it to the numerical one for a short time span:
% Set-up
A = [-23788605396486326904946699391889i/38685626227668133590597632,23788605396486326904946699391889i/38685626227668133590597632;...
-2500000+5223289665997855453060886952725538686654593059791i/324518553658426726783156020576256,23788605396486326904946699391889i/38685626227668133590597632];
Eqns = @(t,X) A*X;
X0 = [0;1];
% Numerical
options = odeset('RelTol',1E-8,'AbsTol',1E-8);
[t,Xabc]=ode45(Eqns,[0 1E-9],X0,options);
% Analytical
Xana = cell2mat(arrayfun(@(tk) expm(A*tk)*X0,t,'UniformOutput',false)').'; % .' transposes without conjugating
k = 1;
% Plots
figure(1);
subplot(3,1,1)
plot(t,abs(Xana(:,k)),t,abs(Xabc(:,k)),'--');
title('Magnitude');
subplot(3,1,2)
plot(t,real(Xana(:,k)),t,real(Xabc(:,k)),'--');
title('Real');
ylabel('Values');
subplot(3,1,3)
plot(t,imag(Xana(:,k)),t,imag(Xabc(:,k)),'--');
title('Imaginary');
xlabel('Time');
In the comparison plot (not shown), the output of ode45 matches the magnitude and real part of the solution very well.
Be careful with the transpose when building Xana: the plain ' operator is the complex conjugate transpose, and using it in place of .' flips the sign of the imaginary part. In the plot, that shows up as an imaginary part that is out of phase by exactly π even though the magnitude and real part still agree.
Such a discrepancy is an artifact of the conjugation, not an error that ode45's error estimator failed to notice.
It will be noted that while the matrix exponential solution is far more costly than ode45 for the same number of time vector elements, the analytical solution will produce the exact solution for any time vector of any density given to it. So for long time solutions, the matrix exponential can be viewed as an improvement in some sense.
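As a small self-contained check of the matrix-exponential formula, the sketch below uses a hypothetical 2-by-2 complex system whose exact solution is known in closed form: for A = i*[0 1; 1 0] and X0 = [0; 1], the solution is X(t) = [i*sin(t); cos(t)]. This system is an assumption chosen for illustration and is unrelated to the coefficients in the question.

```matlab
A  = 1i*[0 1; 1 0];              % hypothetical complex coefficient matrix
X0 = [0; 1];
t  = linspace(0, 2*pi, 50).';

% Evaluate X(t) = expm(A*t)*X0 at every time point.
Xana = zeros(numel(t), 2);
for k = 1:numel(t)
    Xana(k, :) = (expm(A*t(k))*X0).';   % .' keeps the sign of the imaginary part
end

% Compare with the closed-form solution [1i*sin(t), cos(t)].
maxErr = max(max(abs(Xana - [1i*sin(t), cos(t)])));
```

Because each time point is computed independently from X0, the error does not accumulate with t the way it can in a time-stepping solver.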

Find approximation of sine using least squares

I am doing a project where I find an approximation of the sine function using the least squares method. I can use 12 values of my own choice. Since I couldn't figure out how to solve it, I thought of using the Taylor series for sine and then solving it as a polynomial of order 5. Here is my code:
%% Find the sine of the 12 known values
x=[0,pi/8,pi/4,7*pi/2,3*pi/4,pi,4*pi/11,3*pi/2,2*pi,5*pi/4,3*pi/8,12*pi/20];
y=zeros(12,1);
for i=1:12
y=sin(x);
end
n=12;
j=5;
%% Find the sums to populate the matrix A and matrix B
s1=sum(x);s2=sum(x.^2);
s3=sum(x.^3);s4=sum(x.^4);
s5=sum(x.^5);s6=sum(x.^6);
s7=sum(x.^7);s8=sum(x.^8);
s9=sum(x.^9);s10=sum(x.^10);
sy=sum(y);
sxy=sum(x.*y);
sxy2=sum( (x.^2).*y);
sxy3=sum( (x.^3).*y);
sxy4=sum( (x.^4).*y);
sxy5=sum( (x.^5).*y);
A=[n,s1,s2,s3,s4,s5;s1,s2,s3,s4,s5,s6;s2,s3,s4,s5,s6,s7;
s3,s4,s5,s6,s7,s8;s4,s5,s6,s7,s8,s9;s5,s6,s7,s8,s9,s10];
B=[sy;sxy;sxy2;sxy3;sxy4;sxy5];
Then in MATLAB I get this result:
>> a=A^-1*B
a =
-0.0248
1.2203
-0.2351
-0.1408
0.0364
-0.0021
However, when I try to substitute the values of a into the Taylor series and evaluate it, e.g. at t=pi/2, I get wrong results:
>> t=pi/2;
fun=t-t^3*a(4)+a(6)*t^5
fun =
2.0967
Am I doing something wrong when I substitute the values of a into the Taylor series, or is my initial idea flawed?
Note: I can't use any built-in functions.
If you need a least-squares approximation, simply decide on a fixed interval that you want to approximate on and generate some x abscissae on that interval (possibly equally spaced abscissae using linspace - or non-uniformly spaced as you have in your example). Then evaluate your sine function at each point such that you have
y = sin(x)
Then simply use the polyfit function (documented here) to obtain least squares parameters
b = polyfit(x,y,n)
where n is the degree of the polynomial you want to approximate. You can then use polyval (documented here) to obtain the values of your approximation at other values of x.
EDIT: As you can't use polyfit you can generate the Vandermonde matrix for the least-squares approximation directly (the below assumes x is a row vector).
A = ones(length(x),1);
x = x';
for i=1:n
A = [A x.^i];
end
then simply obtain the least squares parameters (in ascending order of power, with y forced to a column vector) using
b = A\y(:);
You can clearly optimise the clumsy Vandermonde generation loop above I have just written to illustrate the concept. For better numerical stability you would also be better to use a nice orthogonal polynomial system like Chebyshev polynomials of the first kind. If you are not even allowed to use the matrix divide \ function then you will need to code up your own implementation of a QR factorisation and solve the system that way (or some other numerically stable method).
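To make the whole procedure concrete, here is a minimal sketch that fits a degree-5 polynomial to sin and then evaluates it using every coefficient, not only the odd powers of the Taylor series. The 12 sample points on [0, pi] are an assumption chosen for illustration, not the points from the question, and the sketch still relies on the backslash operator.

```matlab
% Assumed sample points: 12 equally spaced values on [0, pi]
x = linspace(0, pi, 12).';
y = sin(x);
n = 5;                          % polynomial degree

% Vandermonde matrix with columns x.^0, x.^1, ..., x.^n
A = ones(numel(x), n+1);
for k = 1:n
    A(:, k+1) = A(:, k) .* x;
end
b = A \ y;                      % coefficients in ascending order of power

% Horner evaluation of b(1) + b(2)*t + ... + b(n+1)*t^n at t = pi/2
t = pi/2;
val = b(end);
for k = n:-1:1
    val = val*t + b(k);
end
```

The least-squares coefficients are not the Taylor coefficients of sine, so all n+1 of them, including the constant and even-power terms, have to enter the evaluation.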

Double integration over a polygon in Matlab

I am given a function @f(x,y) and I want to evaluate the integral of this function over a certain convex polygon in MATLAB. The polygon is not necessarily a rectangle, which is why I can't use MATLAB's function "dblquad". The polygon is given by a set of vertices represented by the vectors X and Y, i.e. the vertices are (X(1),Y(1)),...,(X(n),Y(n)). Is there any function or method that I can use?
The trick is to use tools to integrate inside the region of interest. I've written a few tools for integration in a triangulated domain.
% Define a function to integrate.
% This function takes an nx2 array, where each row
% contains a single point to evaluate the kernel at.
% This computes x^2 + y^2 at each point.
fun = @(xy) sum(xy.^2,2);
% define the domain as a triangulated polygon
% this tool uses ear clipping to do so.
sc = poly2tri([1 4 3 1],[1 3 5 4]);
% Gauss-Legendre integration over the 2-d domain
[integ,fev]= quadgsc(fun,sc,2)
integ =
113.166666666667
fev =
8
% the triangulated polygon...
plotsc(sc,'facecolor','none','markerfacecolor','r')
axis equal
grid on
We can visualize the function itself, as a mapping z(x,y) over that polygonal domain. When a range field is supplied, the simplicial complex turns into a 2-1 mapping from the 2-d (x,y) domain.
sc2 = refinesc(sc,'max',.5);
sc2.range = fun(sc2.domain);
plotsc(sc2,'markerfacecolor','r')
grid on
view(17,12)
This is a simple polynomial function over the domain of interest, so the default low order Gaussian integration was adequate. The scheme used is a Gauss-Legendre one in a tensor product form over a triangle, not truly optimal, but viable. The problem with Gaussian quadrature, is it is not adaptive. It computes an estimate, based on implicit approximation by polynomials over a finite set of points.
The above estimate used 8 function evals to compute that estimate. Since the kernel is a low order polynomial, it should do perfectly. The problem is, you need to know if it is a correct solution. This is the problem with a Gaussian quadrature, there is no simple way to know if the answer is correct, except for resolving the problem with a higher order scheme until it seems to converge.
See that with 1 point per triangle at the barycenter, we get the wrong answer, but the higher order estimates all agree.
[integ,fev]= quadgsc(fun,sc,1)
integ =
107.777777777778
fev =
2
[integ,fev]= quadgsc(fun,sc,3)
integ =
113.166666666667
fev =
18
[integ,fev]= quadgsc(fun,sc,4)
integ =
113.166666666667
fev =
32
After writing quadgsc, I had to try an adaptive solver, that works in the same way as the other quad tools do in MATLAB. This does an adaptive refinement of the triangulation, looking for triangles where the solution is not stable. The problem is, I never did finish writing these tools to my satisfaction. There are many different methods one can employ for the cubature problem over a triangulated domain. quadrsc does a low order solution, then refines it, uses a Richardson extrapolation, then compares the results. For any triangles where the difference is too large, it refines them further until it converges.
For example,
[integ,fev]= quadrsc(fun,sc)
integ =
113.166666666667
fev =
16
So this works. The problem shows up on more complex kernels, where the issue becomes to know when to stop the refinement, and to do so before one has used up too many function evaluations. I never did get that fully working to my satisfaction, so I never posted these tools. I can send the toolbox to those who send me direct mail. The zip file is about 2.4 MB. One day I'll get around to finishing those tools, I hope...
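For readers without those tools, the following is a minimal sketch in base MATLAB of the same idea for a convex polygon. It is not the implementation of poly2tri or quadgsc: it uses a simple fan triangulation from the first vertex (valid only for convex polygons with vertices given in order) and the three-point edge-midpoint rule on each triangle, which is exact for polynomials up to total degree 2. Like any fixed rule it is not adaptive, so for a general integrand the result would still need to be checked against a higher-order rule or a refined triangulation.

```matlab
% Vertices of the convex polygon, in order (same polygon as in the example above)
X = [1 4 3 1];
Y = [1 3 5 4];
fun = @(xy) sum(xy.^2, 2);     % integrand x^2 + y^2, one point per row

integ = 0;
for k = 2:numel(X)-1
    % Fan triangle built from vertex 1, vertex k and vertex k+1
    P = [X(1) Y(1); X(k) Y(k); X(k+1) Y(k+1)];
    area = 0.5 * abs(det([P(2,:) - P(1,:); P(3,:) - P(1,:)]));

    % Edge-midpoint rule: area/3 times the sum of the integrand at the midpoints
    mid = (P + P([2 3 1], :)) / 2;
    integ = integ + area/3 * sum(fun(mid));
end
```

For this quadratic integrand the rule is exact, and integ agrees with the value 113.166666666667 reported above.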