I need to fit data e.g. x, y, CI (where CI is confidence index of y) in Matlab.
Now, I use this code:
pf = polyfit(x, y, 2);
x1 = min(x):.1:max(x);
y1 = polyval(pf, x1);
figure
hold on
errorbar(x, y, CI, 'ko');
plot(x1, y1, 'k');
hold off
Of course, the fit comes out of some errors bars, and it's correct.
I would like obtain a fit curve closer to the points with a low confidence index, and discard the points with a high confidence index.
Thank you and bye,
Giacomo
What you are looking for are Weighted Least Squares. You can compute them with the function lscov. There is a nice example in its help page, but I'll try to make it clearer.
Let us construct a simple parabola, with a corrupted point
x = (0:0.1:1)';
y = 0.5*x.^2;
y(5) = 3*y(5);
and give some weights
w = ones(size(y));
w(5) = 0.1;
Next build the Vandermonde matrix (see here for the code) and solve the system
%// V = [x.^2 x ones(size(x))];
V = bsxfun(#power, x, 2:-1:0);
coeff = lscov(V, y, w);
The estimated coefficients, with and without the weights, are
x^2 x 1
with weights [0.4797 0.0186 -0.0004]
no weights [0.3322 0.1533 -0.0034]
Note that in your case w will have to be inverted.
If you don't like to build the Vandermonde matrix, and you have a license for the Curve Fitting Toolbox, you can use the following code
ft = fittype('poly2');
opts = fitoptions('Method', 'LinearLeastSquares');
opts.Weights = w;
fitresult = fit(x, y, ft, opts);
and you'll obtain the same result.
Related
I am trying to fit a 3D surface polynomial of n-degrees to some data points in 3D space. My system requires the surface to be monotonically increasing in the area of interest, that is the partial derivatives must be non-negative. This can be achieved using Matlab's built in lsqlin function.
So here's what I've done to try and achieve this:
I have a function that takes in four parameters;
x1 and x2 are my explanatory variables and y is my dependent variable. Finally, I can specify order of polynomial fit. First I build the design matrix A using data from x1 and x2 and the degree of fit I want. Next I build the matrix D that is my container for the partial derivatives of my datapoints. NOTE: the matrix D is double the length of matrix A since all datapoints must be differentiated with respect to both x1 and x2. I specify that Dx >= 0 by setting b to be zeroes.
Finally, I call lsqlin. I use "-D" since Matlab defines the function as Dx <= b.
function w_mono = monotone_surface_fit(x1, x2, y, order_fit)
% Initialize design matrix
A = zeros(length(x1), 2*order_fit+2);
% Adjusting for bias term
A(:,1) = ones(length(x1),1);
% Building design matrix
for i = 2:order_fit+1
A(:,(i-1)*2:(i-1)*2+1) = [x1.^(i-1), x2.^(i-1)];
end
% Initialize matrix containing derivative constraint.
% NOTE: Partial derivatives must be non-negative
D = zeros(2*length(y), 2*order_fit+1);
% Filling matrix that holds constraints for partial derivatives
% NOTE: Matrix D will be double length of A since all data points will have a partial derivative constraint in both x1 and x2 directions.
for i = 2:order_fit+1
D(:,(i-1)*2:(i-1)*2+1) = [(i-1)*x1.^(i-2), zeros(length(x2),1); ...
zeros(length(x1),1), (i-1)*x2.^(i-2)];
end
% Limit of derivatives
b = zeros(2*length(y), 1);
% Constrained LSQ fit
options = optimoptions('lsqlin','Algorithm','interior-point');
% Final weights of polynomial
w_mono = lsqlin(A,y,-D,b,[],[],[],[],[], options);
end
So now I get some weights out, but unfortunately they do not at all capture the structure of the data. I've attached an image so you can just how bad it looks. .
I'll give you my plotting script with some dummy data, so you can try it.
%% Plot different order polynomials to data with constraints
x1 = [-5;12;4;9;18;-1;-8;13;0;7;-5;-8;-6;14;-1;1;9;14;12;1;-5;9;-10;-2;9;7;-1;19;-7;12;-6;3;14;0;-8;6;-2;-7;10;4;-5;-7;-4;-6;-1;18;5;-3;3;10];
x2 = [81.25;61;73;61.75;54.5;72.25;80;56.75;78;64.25;85.25;86;80.5;61.5;79.25;76.75;60.75;54.5;62;75.75;80.25;67.75;86.5;81.5;62.75;66.25;78.25;49.25;82.75;56;84.5;71.25;58.5;77;82;70.5;81.5;80.75;64.5;68;78.25;79.75;81;82.5;79.25;49.5;64.75;77.75;70.25;64.5];
y = [-6.52857142857143;-1.04736842105263;-5.18750000000000;-3.33157894736842;-0.117894736842105;-3.58571428571429;-5.61428571428572;0;-4.47142857142857;-1.75438596491228;-7.30555555555556;-8.82222222222222;-5.50000000000000;-2.95438596491228;-5.78571428571429;-5.15714285714286;-1.22631578947368;-0.340350877192983;-0.142105263157895;-2.98571428571429;-4.35714285714286;-0.963157894736842;-9.06666666666667;-4.27142857142857;-3.43684210526316;-3.97894736842105;-6.61428571428572;0;-4.98571428571429;-0.573684210526316;-8.22500000000000;-3.01428571428571;-0.691228070175439;-6.30000000000000;-6.95714285714286;-2.57232142857143;-5.27142857142857;-7.64285714285714;-2.54035087719298;-3.45438596491228;-5.01428571428571;-7.47142857142857;-5.38571428571429;-4.84285714285714;-6.78571428571429;0;-0.973684210526316;-4.72857142857143;-2.84285714285714;-2.54035087719298];
% Used to plot the surface in all points in the grid
X1 = meshgrid(-10:1:20);
X2 = flipud(meshgrid(30:2:90).');
figure;
for i = 1:4
w_mono = monotone_surface_fit(x1, x2, y, i);
y_nr = w_mono(1)*ones(size(X1)) + w_mono(2)*ones(size(X2));
for j = 1:i
y_nr = w_mono(j*2)*X1.^j + w_mono(j*2+1)*X2.^j;
end
subplot(2,2,i);
scatter3(x1, x2, y); hold on;
axis tight;
mesh(X1, X2, y_nr);
set(gca, 'ZDir','reverse');
xlabel('x1'); ylabel('x2');
zlabel('y');
% zlim([-10 0])
end
I think it may have something to do with the fact that I haven't specified anything about the region of interest, but really I don't know. Thanks in advance for any help.
Alright I figured it out.
The main problem was simply an error in the plotting script. The value of y_nr should be updated and not overwritten in the loop.
Also I figured out that the second derivative should be monotonically decreasing. Here's the updated code if anybody is interested.
%% Plot different order polynomials to data with constraints
x1 = [-5;12;4;9;18;-1;-8;13;0;7;-5;-8;-6;14;-1;1;9;14;12;1;-5;9;-10;-2;9;7;-1;19;-7;12;-6;3;14;0;-8;6;-2;-7;10;4;-5;-7;-4;-6;-1;18;5;-3;3;10];
x2 = [81.25;61;73;61.75;54.5;72.25;80;56.75;78;64.25;85.25;86;80.5;61.5;79.25;76.75;60.75;54.5;62;75.75;80.25;67.75;86.5;81.5;62.75;66.25;78.25;49.25;82.75;56;84.5;71.25;58.5;77;82;70.5;81.5;80.75;64.5;68;78.25;79.75;81;82.5;79.25;49.5;64.75;77.75;70.25;64.5];
y = [-6.52857142857143;-1.04736842105263;-5.18750000000000;-3.33157894736842;-0.117894736842105;-3.58571428571429;-5.61428571428572;0;-4.47142857142857;-1.75438596491228;-7.30555555555556;-8.82222222222222;-5.50000000000000;-2.95438596491228;-5.78571428571429;-5.15714285714286;-1.22631578947368;-0.340350877192983;-0.142105263157895;-2.98571428571429;-4.35714285714286;-0.963157894736842;-9.06666666666667;-4.27142857142857;-3.43684210526316;-3.97894736842105;-6.61428571428572;0;-4.98571428571429;-0.573684210526316;-8.22500000000000;-3.01428571428571;-0.691228070175439;-6.30000000000000;-6.95714285714286;-2.57232142857143;-5.27142857142857;-7.64285714285714;-2.54035087719298;-3.45438596491228;-5.01428571428571;-7.47142857142857;-5.38571428571429;-4.84285714285714;-6.78571428571429;0;-0.973684210526316;-4.72857142857143;-2.84285714285714;-2.54035087719298];
% Used to plot the surface in all points in the grid
X1 = meshgrid(-10:1:20);
X2 = flipud(meshgrid(30:2:90).');
figure;
for i = 1:4
w_mono = monotone_surface_fit(x1, x2, y, i);
% NOTE: Should only have 1 bias term
y_nr = w_mono(1)*ones(size(X1));
for j = 1:i
y_nr = y_nr + w_mono(j*2)*X1.^j + w_mono(j*2+1)*X2.^j;
end
subplot(2,2,i);
scatter3(x1, x2, y); hold on;
axis tight;
mesh(X1, X2, y_nr);
set(gca, 'ZDir','reverse');
xlabel('x1'); ylabel('x2');
zlabel('y');
% zlim([-10 0])
end
And here's the updated function
function [w_mono, w] = monotone_surface_fit(x1, x2, y, order_fit)
% Initialize design matrix
A = zeros(length(x1), 2*order_fit+1);
% Adjusting for bias term
A(:,1) = ones(length(x1),1);
% Building design matrix
for i = 2:order_fit+1
A(:,(i-1)*2:(i-1)*2+1) = [x1.^(i-1), x2.^(i-1)];
end
% Initialize matrix containing derivative constraint.
% NOTE: Partial derivatives must be non-negative
D = zeros(2*length(y), 2*order_fit+1);
for i = 2:order_fit+1
D(:,(i-1)*2:(i-1)*2+1) = [(i-1)*x1.^(i-2), zeros(length(x2),1); ...
zeros(length(x1),1), -(i-1)*x2.^(i-2)];
end
% Limit of derivatives
b = zeros(2*length(y), 1);
% Constrained LSQ fit
options = optimoptions('lsqlin','Algorithm','active-set');
w_mono = lsqlin(A,y,-D,b,[],[],[],[],[], options);
w = lsqlin(A,y);
end
Finally a plot of the fitting (Have used a new simulation, but fit also works on given dummy data).
I have 2 arrays: one with x-coordinates, the other with y-coordinates.
Both are a normal distribution as a result of a Monte-Carlo simulation. I know how to find the sigma and mu for both array's, and get a 95% confidence interval:
[mu,sigma]=normfit(x_array);
hist(x_array);
x=norminv([0.025 0.975],mu,sigma)
However, both array's are correlated with each other. To plot the probability distribution of the combined array's, i use the multivariate normal distribution. In MATLAB this gives me:
[MuX,SigmaX]=normfit(x_array);
[MuY,SigmaY]=normfit(y_array);
mu = [MuX MuY];
Sigma=cov(x_array,y_array);
x1 = MuX-4*SigmaX:5:MuX+4*SigmaX; x2 = MuY-4*SigmaY:5:MuY+4*SigmaY;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
surf(x1,x2,F);
caxis([min(F(:))-.5*range(F(:)),max(F(:))]);
set(gca,'Ydir','reverse')
xlabel('x0-as'); ylabel('y0-as'); zlabel('Probability Density');
So far so good. Now I want to calculate the 95% probability area. I'am looking for a function as mndinv, just as norminv. However, such a function doesn't exist in MATLAB, which makes sense because there are endless possibilities... Does somebody have a tip about how to get a 95% probability area? Thanks in advance.
For the bivariate case you can add the ellispe whose area corresponds to NORMINV(95%). This ellipse is uniquely identified and for proof see the first source in the link.
% Suppose you know the distribution params, or you got them from normfit()
mu = [3, 7];
sigma = [1, 2.5
2.5 9];
% X/Y values for plotting grid
x = linspace(mu(1)-3*sqrt(sigma(1)), mu(1)+3*sqrt(sigma(1)),100);
y = linspace(mu(2)-3*sqrt(sigma(end)), mu(2)+3*sqrt(sigma(end)),100);
% Z values
[X1,X2] = meshgrid(x,y);
Z = mvnpdf([X1(:) X2(:)],mu,sigma);
Z = reshape(Z,length(y),length(x));
% Plot
h = pcolor(x,y,Z);
set(h,'LineStyle','none')
hold on
% Add level set
alpha = 0.05;
r = sqrt(-2*log(alpha));
rho = sigma(2)/sqrt(sigma(1)*sigma(end));
M = [sqrt(sigma(1)) rho*sqrt(sigma(end))
0 sqrt(sigma(end)-sigma(end)*rho^2)];
theta = 0:0.1:2*pi;
f = bsxfun(#plus, r*[cos(theta)', sin(theta)']*M, mu);
plot(f(:,1), f(:,2),'--r')
Sources
https://upload.wikimedia.org/wikipedia/commons/a/a2/Cumulative_function_n_dimensional_Gaussians_12.2013.pdf
https://en.wikipedia.org/wiki/Multivariate_normal_distribution
To get the numerical value of F where the top part lies, you should use top5=prctile(F(:),95) . This will return the value of F that limits the bottom 95% of data with the top 5%.
Then you can get just the top 5% with
Ftop=zeros(size(F));
Ftop=F>top5;
Ftop=Ftop.*F;
%// optional: Ftop(Ftop==0)=NaN;
surf(x1,x2,Ftop,'LineStyle','none');
I have three data points through which I have to fit a straight line of the form Y=m*X+C. I want the line to have pre-determined slope 'm' but the constant'C' can change to get the least error while fitting using matlab. Can someone help me out?
Just do the math:
C= mean(Y)-m*mean(X)
assuming Y is the vector containing the y coordinates, and X the x coordinates.
Reference: http://hotmath.com/hotmath_help/topics/line-of-best-fit.html
If you opt to use the Curve Fitting Toolbox the solution is as follows.
To start generate some data
m = 3;
x = (1:10).';
y = m*x + 2 + randn(size(x));
then select the model to fit and set the bounds for its coefficients
ft = fittype('poly1');
opts = fitoptions('Method', 'LinearLeastSquares');
opts.Lower = [m -Inf];
opts.Upper = [m Inf];
finally call the fitting routine
[fitresult, gof] = fit(x, y, ft, opts);
The intercept is stored in fitresult.p2.
I have been trying to fit a parabola to parts of data where y is positive. I am told, that P1(x1,y1) is the first data point, Pg(xg,yg) is the last, and that the top point is at x=(x1+xg)/2. I have written the following:
x=data(:,1);
y=data(:,2);
filter=y>0;
xp=x(filter);
yp=y(filter);
P1=[xp(1) yp(1)];
Pg=[xp(end) yp(end)];
xT=(xp(1)+xp(end))/2;
A=[1 xp(1) power(xp(1),2) %as the equation is y = a0 + x*a1 + x^2 * a2
1 xp(end) power(xp(end),2)
0 1 2*xT]; % as the top point satisfies dy/dx = a1 + 2*x*a2 = 0
b=[yg(1) yg(end) 0]'; %'
p=A\b;
x_fit=[xp(1):0.1:xp(end)];
y_fit=power(x_fit,2).*p(3)+x_fit.*p(2)+p(1);
figure
plot(xp,yp,'*')
hold on
plot(x_fit,y_fit,'r')
And then I get this parabola which is completely wrong. It doesn't fit the data at all! Can someone please tell me what's wrong with the code?
My parabola
Well, I think the primary problem is some mistake in your calculation. I think you should use three points on the parabola to obtain a system of linear equations. There is no need to calculate the derivative of your function as you do with
dy/dx = a1 + 2*x*a2 = 0
Instead of a point on the derivative you choose another point in your scatter plot, e.g. the maximum: PT = [xp_max yp_max]; and use it for your matrix A and b.
The equation dy/dx = a1 + 2*x*a2 = 0 does not fulfill the basic scheme of your system of linear equations: a0 + a1*x + a2*x^2 = y;
By the way: If you don't have to calculate your parabola necessarily in this way, you can maybe have a look at the Matlab/Octave-function polyfit() which calculates the least squares solution for your problem. This would result in a simple implementation:
p = polyfit(x, y, 2);
y2 = polyval(p, x);
figure(); plot(x, y, '*'); hold on;
plot(x, y2, 'or');
Does anyone know how to obtain a mean curve having a matrix with the correspondent x,y points from the original plot? I mean, I pretend a medium single curve.
Any code or just ideas would be very very helpful for me since I am new with matlab.
Thank you very much!
Well, one thing you can do is fit a parametric curve. Here's an example on how to do this for a figure-8 with noise on it:
function findParamFit
clc, clf, hold on
%# some sample data
noise = #(t) 0.25*rand(size(t))-0.125;
x = #(t) cos(t) + noise(t);
y = #(t) sin(2*t) + noise(t);
t = linspace(-100*rand, +100*rand, 1e4);
%# initial data
plot(x(t), y(t), 'b.')
%# find fits
options = optimset(...
'tolfun', 1e-12,...
'tolx', 1e-12);
a = lsqcurvefit(#myFun_x, [1 1], t, x(t), -10,10, options);
b = lsqcurvefit(#myFun_y, [1 2], t, y(t), -10,10, options);
%# fitted curve
xx = myFun_x(a,t);
yy = myFun_y(b,t);
plot(xx, yy, 'r.')
end
function F = myFun_x(a, tt)
F = a(1)*cos(a(2)*tt);
end
function F = myFun_y(b, tt)
F = b(1)*sin(b(2)*tt);
end
Note that this is a particularly bad way to fit parametric curves, as is apparent here by the extreme sensitivity of the solution to the quality of the initial values to lsqcurvefit. Nevertheless, fitting a parametric curve will be the way to go.
There's your google query :)