I wish to fit a polynomial curve (4th or 5th degree) to my data. I did it with Excel and I get coefficients around 10^-13 for the 5th-degree term, 10^-9 for the 4th, and 10^-5 for the 3rd...
I would like to constrain all the coefficients to be no lower than 10^-2. The curve won't fit as well, but that is OK.
How can I do that with the polyfit function?
And then, from a mathematical point of view, does it make sense to constrain coefficients? Or is it useless, and should I rather stick with a second-degree polyfit (which has no coefficients below 10^-2)?
The reason I'm asking: I'm doing some research, and from a physical point of view it is interesting to test the 5th-degree fit, but I cannot use coefficients lower than 10^-2.
Thank you
Use fit rather than polyfit
% Degree of the polynomial (quartic)
polyDegree = 4;
% Set up the fit options
opts = fitoptions( 'Method', 'LinearLeastSquares' );
% Constrain every coefficient, from x^n down to x^0, to be at least 10^-2,
% with no upper bound
opts.Lower = 1e-2 * ones(1, polyDegree + 1);
opts.Upper = inf(1, polyDegree + 1);
% Do the fit using the specified polynomial degree
[fitresult, gof] = fit( x, y, ['poly', num2str(polyDegree)], opts );
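To see what the constraint costs you, inspect the fitted coefficients and the goodness of fit afterwards (coeffvalues and the gof structure are standard Curve Fitting Toolbox outputs; x and y stand for your own data vectors):
coeffvalues(fitresult)    % fitted coefficients, highest degree first
gof.rsquare               % compare with an unconstrained fit to judge the cost of the bounds
plot(fitresult, x, y)     % visual check of the constrained fit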
I am batch processing thousands of data sets. Sometimes the peak positions and magnitudes change drastically, and the program struggles to find these peaks with a single set of start point values. I have to divide my data into smaller batches so I can change the start point values, which is time-consuming.
Is it possible to try various start point values and select the one with the best rsquare?
ft = fittype('y0 + a*exp(-((x-xa)/(wa))^2)', 'independent', 'x', 'dependent', 'y');
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.StartPoint = [10 10 10 0]; % this is a, wa, xa and y0 - from the equation
[fitresult, gof] = fit(xData, yData, ft, opts);
alpha = gof.rsquare; % extract goodness of fit
if alpha < 0.98                                  % if rsquare (goodness of fit) is not good enough
    for x = 100:10:500                           % these numbers are not set in stone - can be any number
        for y = 10:1:50
            opts.StartPoint = [10+x 10 10+y 0];  % tweak the start point values for the fit
            [fitresult, gof] = fit(xData, yData, ft, opts);   % fit again
        end
    end
end
Then select the start point with the best rsquare and plot the results.
% plot
f = figure('Name', 'Gauss','Pointer','crosshair');
h = plot(fitresult, xData, yData, '-o');
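Putting the pieces above together, one rough sketch of "try a grid of start points and keep the best rsquare" could look like the following (the grid values and the 0.98 threshold are arbitrary placeholders, not tuned values):
bestGof = -inf;                                   % best rsquare found so far
bestFit = [];
for x = 100:10:500                                % candidate shifts for a
    for y = 10:1:50                               % candidate shifts for xa
        opts.StartPoint = [10+x 10 10+y 0];       % a, wa, xa, y0
        [fr, g] = fit(xData, yData, ft, opts);
        if g.rsquare > bestGof                    % keep the best fit seen so far
            bestGof = g.rsquare;
            bestFit = fr;
        end
        if bestGof >= 0.98                        % good enough, stop searching
            break
        end
    end
    if bestGof >= 0.98
        break
    end
end
plot(bestFit, xData, yData, '-o')                 % plot the winning fit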
If guessing the start values proves difficult, I suggest using a different method, one that is not iterative and does not need guessed parameter values to start the numerical calculation.
Since I have no representative data for your problem, I cannot check whether the method proposed below is suitable in your case. This depends on the scatter of the data and on the distribution of the points.
Try it and see. If the result is not correct, please let me know.
A numerical example with highly scattered data is shown below. With this example you can check if the method is correctly implemented.
NOTE: This method can be used to obtain approximate parameter values, which can then be supplied as "guessed" starting values to the usual non-linear regression software.
For information: the method is a linear regression with respect to an integral equation of which the Gaussian function is a solution.
For the general principle, see: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
Suppose I have a vector t = [0 0.1 0.9 1 1.4], and a vector x = [1 3 5 2 3]. How can I compute the derivative of x with respect to time that has the same length as the original vectors?
I should not use any symbolic operations. The command diff(x)./diff(t) does not produce a vector of the same length. Should I first interpolate the x(t) function and then take its derivative?
Different approaches exist to calculate the derivative at the same points as your initial data:
Finite differences: Use a central difference scheme at your inner points and a forward/backward scheme at your first/last point
or
Curve fitting: Fit a curve through your points, calculate the derivative of this fitted function and sample them at the same points as the original data. Typical fitting functions are polynomials or spline functions.
Note that the curve fitting approach gives better results, but needs more tuning options and is slower (~100x).
Demonstration
As an example, I will calculate the derivative of a sine function:
t = 0:0.1:1;
y = sin(t);
Its exact derivative is well known:
dy_dt_exact = cos(t);
The derivative can be calculated approximately as follows:
Finite differences:
dy_dt_approx = zeros(size(y));
dy_dt_approx(1) = (y(2) - y(1))/(t(2) - t(1)); % forward difference
dy_dt_approx(end) = (y(end) - y(end-1))/(t(end) - t(end-1)); % backward difference
dy_dt_approx(2:end-1) = (y(3:end) - y(1:end-2))./(t(3:end) - t(1:end-2)); % central difference
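For what it's worth, on uniformly spaced data like the t above, MATLAB's built-in gradient function reproduces this same scheme (central differences in the interior, one-sided differences at the ends), so the three lines can be collapsed into one call:
dy_dt_builtin = gradient(y, t);   % same forward/central/backward scheme in a single call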
or
Polynomial fitting:
p = polyfit(t,y,5); % fit fifth order polynomial
dp = polyder(p); % calculate derivative of polynomial
The results can be visualised as follows:
figure('Name', 'Derivative')
hold on
plot(t, dy_dt_exact, 'DisplayName', 'exact');
plot(t, dy_dt_approx, 'DisplayName', 'finite difference');
plot(t, polyval(dp, t), 'DisplayName', 'polynomial');
legend show
figure('Name', 'Error')
hold on
plot(t, abs(dy_dt_approx - dy_dt_exact)/max(dy_dt_exact), 'DisplayName', 'finite difference');
plot(t, abs(polyval(dp, t) - dy_dt_exact)/max(dy_dt_exact), 'DisplayName', 'polynomial');
legend show
The first graph shows the derivatives themselves, and the second plots the relative errors made by both methods.
Discussion
One clearly sees that the curve-fitting method gives better results than the finite differences, but it is ~100x slower. Here the curve-fitting method has a relative error on the order of 10^-5. Note that the finite-differences approach becomes better when your data is sampled more densely or when you use a higher-order scheme. The disadvantage of the curve-fitting approach is that one has to choose a good polynomial order; spline functions may be better suited in general, as sketched below.
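For completeness, here is one way to do the spline variant, assuming the Curve Fitting Toolbox is available for fnder (spline and ppval are base MATLAB):
pp  = spline(t, y);            % cubic spline through the data, in ppform
dpp = fnder(pp);               % differentiate the piecewise polynomial (Curve Fitting Toolbox)
dy_dt_spline = ppval(dpp, t);  % sample the derivative at the original points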
A dataset sampled 10x more densely, i.e. t = 0:0.01:1;, results in the following graphs:
I am using the MATLAB curve-fitting tool cftool to fit my data. The issue is that the y values vary over a very large range (strongly decreasing) with respect to x. A sample is given below:
x y
0.1 237.98
1 25.836
10 3.785
30 1.740
100 0.804
300 0.431
1000 0.230
2000 0.180
The fitted form is y = a/x^b + c/x^d, with a, b, c, and d as constants. The fit from MATLAB is quite accurate for large y values (that is, at the lower x range), with less than 0.1% deviation. However, at higher x values the accuracy of the fit is poor (around 11% deviation). I would also like to include the % deviation in the curve-fitting iteration, to make sure the data is captured well over the whole range. A plot of the fit against the data is given for reference.
Can anyone suggest better ways to fit the data?
The most common way to fit a curve is to do a least squares fit, which minimizes the sum of the square differences between the data and the fit. This is why your fit is tighter when y is large: an 11% deviation on a value of 0.18 is only a squared error of 0.000392, while a 0.1% deviation on a value of 240 is a squared error of 0.0576, much more significant.
If what you care about is relative deviations rather than absolute (squared) errors, then you can either reformulate the fitting objective or transform your data in a clever way. The second option is a common and useful tool to know.
One way to do this in your case is to fit log(y) instead of y. This has the effect of making errors on the small y values count much more heavily:
data = [0.1 237.98
1 25.836
10 3.785
30 1.740
100 0.804
300 0.431
1000 0.230
2000 0.180];
x = data(:,1);
y = data(:,2);
% Set up fittype and options.
ft = fittype( 'a/x^b + c/x^d', 'independent', 'x', 'dependent', 'y' );
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.StartPoint = [0.420712466925742 0.585539298834167 0.771799485946335 0.706046088019609];
%% Usual least-squares fit
[fitresult] = fit( x, y, ft, opts );
yhat = fitresult(x);
% Plot fit with data.
figure
semilogy( x, y );
hold on
semilogy( x, yhat);
deviation = abs((y-yhat))./y * 100
%% log-transformed fit
[fitresult] = fit( x, log(y), ft, opts );
yhat = exp(fitresult(x));
% Plot fit with data.
figure
semilogy( x, y );
hold on
semilogy( x, yhat );
deviation = abs((y-yhat))./y * 100
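A different route to the same goal (not shown above, but supported by fitoptions) is weighted least squares: weighting each point by 1/y^2 makes the objective approximately the sum of squared relative deviations:
%% Weighted fit: weight each residual by 1/y^2 so relative errors are minimized
opts.Weights = 1 ./ y.^2;
[fitresult] = fit( x, y, ft, opts );
yhat = fitresult(x);
deviation = abs((y-yhat))./y * 100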
One approach would be to fit to the lowest sum of squared relative error, rather than the lowest sum of squared absolute error. When I use the data posted in your question, fitting to the lowest sum of squared relative error yields roughly +/- 4 percent error, so this may be a useful option. To see whether you might want to consider this approach, here are the coefficients I determined from your posted data using this method:
a = 2.2254477037465399E+01
b = 1.0038013513610324E+00
c = 4.1544917994119190E+00
d = 4.2684956973959676E-01
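As a rough sketch of what "fit to the lowest sum of squared relative error" can mean in practice (this is not the poster's tool, just one possible implementation using base MATLAB's fminsearch and the x, y columns from the table above):
% minimize summed squared relative error directly, starting from the coefficients quoted above
model  = @(c, x) c(1)./x.^c(2) + c(3)./x.^c(4);
relSSE = @(c) sum( ((y - model(c, x)) ./ y).^2 );
c0     = [22.25, 1.004, 4.154, 0.4268];
cBest  = fminsearch(relSSE, c0);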
I have two 2D points p1, p2 in MATLAB, and each point has a normal n1, n2. I wish to find the (cubic) polynomial which joins the two points and agrees with the specified normals at each end. Is there something built-in to MATLAB to do this?
Of course, I could derive the equations for the polynomial manually, but MATLAB's curve fitting toolbox has so much built-in that I assumed it would be possible. I haven't been able to find any examples of curve, spline or polynomial fitting where the normals are specified.
As an extrapolation of this, I would like to fit splines where each data point has a normal specified.
1. If your points are points of a function, then you need cubic Hermite spline interpolation:
In numerical analysis, a cubic Hermite spline or cubic Hermite interpolator is a spline where each piece is a third-degree polynomial specified in Hermite form: that is, by its values and first derivatives at the end points of the corresponding domain interval.
Cubic Hermite splines are typically used for interpolation of numeric data specified at given argument values x(1), x(2), ..., x(n), to obtain a smooth continuous function. The data should consist of the desired function value and derivative at each x(k). (If only the values are provided, the derivatives must be estimated from them.) The Hermite formula is applied to each interval (x(k), x(k+1)) separately. The resulting spline will be continuous and will have a continuous first derivative.
Cubic polynomial splines can be specified in other ways, the Bézier form being the most common. However, these two methods provide the same set of splines, and data can easily be converted between the Bézier and Hermite forms; so the names are often used as if they were synonymous.
Specifying the normals at each point is the same as specifying the tangents (slopes, 1st derivatives), because the latter are perpendicular to the former.
In Matlab, the function for calculating the Piecewise Cubic Hermite Interpolating Polynomial is pchip. The only problem is that pchip is a bit too clever:
The careful reader will notice that pchip takes function values as input, but no derivative values. This is because pchip uses the function values f(x) to estimate the derivative values. [...] To do a good derivative approximation, the function has to use an approximation using 4 or more points [...] Luckily, using Matlab we can write our own functions to do interpolation using real cubic Hermite splines.
...the author shows how to do this, using the function mkpp.
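For reference, here is one way such a from-scratch Hermite interpolant could be built with mkpp. This is a sketch of the idea, not the linked author's code; hermite_pp is a made-up name and would live in its own hermite_pp.m file:
function pp = hermite_pp(x, y, d)
% Build a piecewise cubic Hermite interpolant from values y and
% prescribed first derivatives d at the sites x.
x = x(:); y = y(:); d = d(:);
h  = diff(x);                        % interval lengths
dy = diff(y) ./ h;                   % secant slopes on each interval
c3 = (d(1:end-1) + d(2:end) - 2*dy) ./ h.^2;
c2 = (3*dy - 2*d(1:end-1) - d(2:end)) ./ h;
coefs = [c3, c2, d(1:end-1), y(1:end-1)];   % descending powers in the local variable (x - x_k)
pp = mkpp(x, coefs);
end
% Example use: pp = hermite_pp([0 1 2], [0 1 0], [1 0 -1]); yy = ppval(pp, linspace(0, 2, 101));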
2. If your points are not necessarily points of a function, then each interval should be interpolated by a quadratic Bezier curve:
In this example, 3 points are given: the endpoints P(0) and P(2), and P(1), which is the intersection of the tangents at the endpoints. The position of P(1) can be easily calculated from the coordinates of P(0) and P(2), and the normals at these points.
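As an illustration of that computation (the numeric values here are made up, not taken from the question), the control point P(1) can be found by intersecting the two tangent lines:
P0 = [0; 1];  P2 = [2; 5];          % endpoints (example values)
n0 = [0; 1];  n2 = [1; 1];          % normals at the endpoints (example values)
d0 = [-n0(2); n0(1)];               % tangent direction at P0 (normal rotated by 90 degrees)
d2 = [-n2(2); n2(1)];               % tangent direction at P2
su = [d0, -d2] \ (P2 - P0);         % solve P0 + s*d0 = P2 + u*d2 for s and u
P1 = P0 + su(1) * d0;               % intersection point = Bezier control point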
In Matlab, you can use spmak, see the examples here and here.
You could do something like this:
function neumann_spline(p, m, q, n)
% example data
p = [0; 1];
q = [2; 5];
m = [0; 1];
n = [1; 1];
% slope of the tangent at each endpoint: the tangent is perpendicular
% to the given normal (nx, ny), so its slope is -nx/ny
if (m(2) ~= 0)
    s1 = -m(1)/m(2);
else
    error('Vertical tangent at p cannot be represented by a spline of the form y = f(x).');
end
if (n(2) ~= 0)
    s2 = -n(1)/n(2);
else
    error('Vertical tangent at q cannot be represented by a spline of the form y = f(x).');
end
hold on
grid on
axis equal
plot([p(1) p(1)+0.5*m(1)], [p(2) p(2)+0.5*m(2)], 'r', 'Linewidth', 1)
plot([q(1) q(1)+0.5*n(1)], [q(2) q(2)+0.5*n(2)], 'r', 'Linewidth', 1)
sp = csape([p(1) q(1)], [s1 p(2) q(2) s2], [1 1]);  % clamped cubic spline: end slopes s1, s2
fnplt(sp)
plot(p(1), p(2), 'k.', 'MarkerSize', 16)
plot(q(1), q(2), 'k.', 'MarkerSize', 16)
title('Cubic spline with prescribed normals at the endpoints')
end
The result is
I am trying to use Matlab's nlinfit function to estimate the best fitting Gaussian for x,y paired data. In this case, x is a range of 2D orientations and y is the probability of a "yes" response.
I have copied norm_func from related posts, and I'd like to return a smoothed normal distribution that best approximates the observed data in y, returning the magnitude, mean, and SD of the best-fitting PDF. At the moment, the fitted function appears to be incorrectly scaled and less than smooth - any help much appreciated!
x = -30:5:30;
y = [0,0.20,0.05,0.15,0.65,0.85,0.88,0.80,0.55,0.20,0.05,0,0;];
% plot raw data
figure(1)
plot(x, y, ':rs');
axis([-35 35 0 1]);
% initial parameter guesses (based on plot)
initGuess(1) = max(y); % amplitude
initGuess(2) = 0; % mean centred on 0 degrees
initGuess(3) = 10; % SD in degrees
% equation for Gaussian distribution
norm_func = @(p,x) p(1) .* exp(-((x - p(2))/p(3)).^2);
% use nlinfit to fit Gaussian using Least Squares
[bestfit,resid]=nlinfit(y, x, norm_func, initGuess);
% plot function
xFine = linspace(-30,30,100);
figure(2)
plot(x, y, 'ro', x, norm_func(xFine, y), '-b');
Many thanks
If your data actually represent probability estimates which you expect come from normally distributed data, then fitting a curve is not the right way to estimate the parameters of that normal distribution. There are different methods of different sophistication; one of the simplest is the method of moments, which means you choose the parameters such that the moments of the theoretical distribution match those of your sample. In the case of the normal distribution, these moments are simply mean and variance (or standard deviation). Here's the code:
% normalize y to be a probability (sum = 1)
p = y / sum(y);
% compute weighted mean and standard deviation
m = sum(x .* p);
s = sqrt(sum((x - m) .^ 2 .* p));
% compute theoretical probabilities
xs = -30:0.5:30;
pth = normpdf(xs, m, s);
% plot data and theoretical distribution
plot(x, p, 'o', xs, pth * 5)
The result shows a decent fit:
You'll notice the factor 5 in the last line. This is because you don't have probability (density) estimates for the full range of values, but only at points spaced 5 apart. In my treatment I assumed that they correspond to something like an integral over the probability density, e.g. over an interval [x - 2.5, x + 2.5], which can be roughly approximated by multiplying the density in the middle by the width of the interval. I don't know whether this interpretation is correct for your data.
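If that interpretation fits, the hard-coded 5 can be replaced by the spacing of the x grid, so the scaling adapts if the sampling changes (a small cosmetic tweak, not part of the answer above):
binWidth = x(2) - x(1);              % spacing of the probability estimates (5 degrees here)
plot(x, p, 'o', xs, pth * binWidth)  % scale the density by the bin width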
Your data follow a Gaussian curve and you describe them as probabilities. Are these numbers (y) your raw data – or did you generate them from e.g. a histogram over a larger data set? If the latter, the estimate of the distribution parameters could be improved by using the original full data.