Least squares fit, unknown intercept - matlab

I have three data points through which I have to fit a straight line of the form Y=m*X+C. I want the line to have a predetermined slope 'm', while the constant 'C' is free to change so that the fit has the least error in MATLAB. Can someone help me out?

Just do the math:
C= mean(Y)-m*mean(X)
assuming Y is the vector containing the y coordinates, and X the x coordinates.
Reference: http://hotmath.com/hotmath_help/topics/line-of-best-fit.html
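The formula follows from setting the derivative of sum((Y - m*X - C).^2) with respect to C to zero, which gives C = mean(Y - m*X). A minimal sketch that cross-checks the closed form against a generic least-squares solve; the three data points and the slope are made up for illustration:

```matlab
% Hypothetical data: three points and a prescribed slope.
m = 2;
X = [1; 2; 3];
Y = [2.9; 5.2; 7.1];

% Closed-form intercept from the answer above.
C = mean(Y) - m*mean(X);

% Cross-check: least-squares solution of ones*C = Y - m*X.
C_ls = ones(size(X)) \ (Y - m*X);

fprintf('closed form: %.4f, backslash: %.4f\n', C, C_ls);
```

Both values should agree to rounding error, since the backslash solve of a single column of ones is just the mean of the right-hand side.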

If you opt to use the Curve Fitting Toolbox the solution is as follows.
To start generate some data
m = 3;
x = (1:10).';
y = m*x + 2 + randn(size(x));
then select the model to fit and set the bounds for its coefficients
ft = fittype('poly1');
opts = fitoptions('Method', 'LinearLeastSquares');
opts.Lower = [m -Inf];
opts.Upper = [m Inf];
finally call the fitting routine
[fitresult, gof] = fit(x, y, ft, opts);
The intercept is stored in fitresult.p2.

Related

Non-rectangular meshgrids in MATLAB

I want to create a non-rectangular meshgrid in matlab.
Basically I have a polygon-shaped feasible set that I need to grid in order to interpolate 3D data points in this set. The interpolation function is given and requires finite (x, y, z) inputs, where x is nx1, y is 1xm and z is nxm. Right now I have the mesh set up with linspace and set all NaN (infeasible) values to 0 before using my function, which is wrong of course (third figure).
Is there a simple solution for this?
I added a picture illustrating what I'm currently doing: First plot is the feasible set, second plot are solved sample data points in this set and third plot is the interpolation (currently still with rectangular meshgrid and NaN = 0). What I need is a meshgrid looking like the first figure (red polygon) instead of a rectangular one. In the third plot you can see that the rectangular meshgrid in combination with setting NaN to 0 (=infeasible values, not included in the red polygon set) results in a wrong interpolation along the edges, because it includes infeasible regions.
Here is my code using a rectangular meshgrid:
figure (2) %sample data
plot3(X0(1,:), X0(2,:), U, 'x')
%X0(1,:) and X0(2,:) are vectors corresponding to the Z-Values (blue sample data)
%X0 and U are in the feasible set (red polygon)
xv = linspace(xLb(1), xUb(1), 100);
yv = linspace(xLb(2), xUb(2), 100); %xLb and xUb are upper and lower bounds for the rectangle mesh
[x1,x2] = meshgrid(xv, yv);
Z = griddata(X0(1,:), X0(2,:), U, x1, x2);
%This grid obviously includes values that are not in the feasible set (red polygon) by its rectangular nature
Z(isnan(Z))=0; %set infeasible values to 0, wrong of course
testMPC = someInterpolationFunction([0:length(Z)-1]',[0:length(Z)-1],Z);
testMPC.showInterpolation(20,20)
%this shows figure 3 in the attached picture
Try something like this:
nRows = 100;
nCols = 200;
x1 = @(x) max(0, x-50);
x2 = @(x) min(nCols, nCols - 50 + x);
RR = zeros(nRows, nCols);
CC = zeros(nRows, nCols);
for iRow = 1:nRows
c1 = x1(iRow);
c2 = x2(iRow);
colVec = linspace(c1, c2, nCols);
RR(iRow, :) = iRow;
CC(iRow, :) = colVec;
end
mesh(RR, CC, zeros(size(RR)))
You'd need to redefine the functions for x1 and x2 of course, as well as the scaling, but this should give you an idea of how to get started.

Multivariate Normal Distribution Matlab, probability area

I have 2 arrays: one with x-coordinates, the other with y-coordinates.
Both are normally distributed, as the result of a Monte Carlo simulation. I know how to find sigma and mu for both arrays, and get a 95% confidence interval:
[mu,sigma]=normfit(x_array);
hist(x_array);
x=norminv([0.025 0.975],mu,sigma)
However, both arrays are correlated with each other. To plot the probability distribution of the combined arrays, I use the multivariate normal distribution. In MATLAB this gives me:
[MuX,SigmaX]=normfit(x_array);
[MuY,SigmaY]=normfit(y_array);
mu = [MuX MuY];
Sigma=cov(x_array,y_array);
x1 = MuX-4*SigmaX:5:MuX+4*SigmaX; x2 = MuY-4*SigmaY:5:MuY+4*SigmaY;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
surf(x1,x2,F);
caxis([min(F(:))-.5*range(F(:)),max(F(:))]);
set(gca,'Ydir','reverse')
xlabel('x0-as'); ylabel('y0-as'); zlabel('Probability Density');
So far so good. Now I want to calculate the 95% probability area. I am looking for a function like mndinv, analogous to norminv. However, such a function doesn't exist in MATLAB, which makes sense because there are endless possibilities... Does somebody have a tip about how to get a 95% probability area? Thanks in advance.
For the bivariate case you can add the ellipse that encloses 95% of the probability mass, the two-dimensional counterpart of NORMINV at 95%. This ellipse is uniquely identified; for a proof see the first source listed under Sources.
% Suppose you know the distribution params, or you got them from normfit()
mu = [3, 7];
sigma = [1, 2.5
2.5 9];
% X/Y values for plotting grid
x = linspace(mu(1)-3*sqrt(sigma(1)), mu(1)+3*sqrt(sigma(1)),100);
y = linspace(mu(2)-3*sqrt(sigma(end)), mu(2)+3*sqrt(sigma(end)),100);
% Z values
[X1,X2] = meshgrid(x,y);
Z = mvnpdf([X1(:) X2(:)],mu,sigma);
Z = reshape(Z,length(y),length(x));
% Plot
h = pcolor(x,y,Z);
set(h,'LineStyle','none')
hold on
% Add level set
alpha = 0.05;
r = sqrt(-2*log(alpha));
rho = sigma(2)/sqrt(sigma(1)*sigma(end));
M = [sqrt(sigma(1)) rho*sqrt(sigma(end))
0 sqrt(sigma(end)-sigma(end)*rho^2)];
theta = 0:0.1:2*pi;
f = bsxfun(@plus, r*[cos(theta)', sin(theta)']*M, mu);
plot(f(:,1), f(:,2),'--r')
Sources
https://upload.wikimedia.org/wikipedia/commons/a/a2/Cumulative_function_n_dimensional_Gaussians_12.2013.pdf
https://en.wikipedia.org/wiki/Multivariate_normal_distribution
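The radius r = sqrt(-2*log(alpha)) comes from the fact that the squared Mahalanobis distance of a bivariate normal follows a chi-square distribution with 2 degrees of freedom, so P(d^2 <= r^2) = 1 - exp(-r^2/2). A rough sketch that checks this empirically, reusing the mu and sigma from the answer; the sample size and the sampling via chol are choices made here for illustration:

```matlab
% Same distribution parameters as in the answer above.
mu    = [3, 7];
sigma = [1, 2.5; 2.5, 9];
alpha = 0.05;
r     = sqrt(-2*log(alpha));      % ellipse radius in whitened coordinates

% Draw correlated samples without toolbox functions: rows of randn(n,2)*R
% have covariance R'*R = sigma, where R is the upper Cholesky factor.
n = 1e5;
R = chol(sigma);
samples = bsxfun(@plus, randn(n, 2)*R, mu);

% Squared Mahalanobis distance of every sample from the mean.
centered = bsxfun(@minus, samples, mu);
d2 = sum((centered / sigma) .* centered, 2);

fracInside = mean(d2 <= r^2);     % should be close to 1 - alpha
fprintf('fraction inside ellipse: %.4f\n', fracInside);
```

The same test, d2 <= r^2, can be applied to the Monte Carlo points themselves to see which ones fall inside the 95% region.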
To get the numerical value of F where the top part lies, you should use top5=prctile(F(:),95). This returns the value of F that separates the bottom 95% of the values from the top 5%.
Then you can get just the top 5% with
Ftop=zeros(size(F));
Ftop=F>top5;
Ftop=Ftop.*F;
%// optional: Ftop(Ftop==0)=NaN;
surf(x1,x2,Ftop,'LineStyle','none');

How to fit polynomial into some error bars data

I need to fit data e.g. x, y, CI (where CI is confidence index of y) in Matlab.
Now, I use this code:
pf = polyfit(x, y, 2);
x1 = min(x):.1:max(x);
y1 = polyval(pf, x1);
figure
hold on
errorbar(x, y, CI, 'ko');
plot(x1, y1, 'k');
hold off
Of course, the fit falls outside some of the error bars, and that is correct.
I would like to obtain a fit curve that stays closer to the points with a low confidence index, and discards the points with a high confidence index.
Thank you and bye,
Giacomo
What you are looking for is weighted least squares. You can compute it with the function lscov. There is a nice example in its help page, but I'll try to make it clearer.
Let us construct a simple parabola, with a corrupted point
x = (0:0.1:1)';
y = 0.5*x.^2;
y(5) = 3*y(5);
and give some weights
w = ones(size(y));
w(5) = 0.1;
Next build the Vandermonde matrix (see here for the code) and solve the system
%// V = [x.^2 x ones(size(x))];
V = bsxfun(@power, x, 2:-1:0);
coeff = lscov(V, y, w);
The estimated coefficients, with and without the weights, are
                 x^2       x         1
with weights  [0.4797   0.0186   -0.0004]
no weights    [0.3322   0.1533   -0.0034]
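Note that in your case w will have to be inverted: a large CI means low trust in a point, so the weight should shrink as CI grows (for example w = 1./CI).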
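A minimal sketch of that inversion, assuming CI holds error-bar half-widths that behave like standard deviations, in which case weights proportional to 1./CI.^2 are a common choice. The data and CI values below are made up for illustration:

```matlab
% Hypothetical sample data: a parabola with one unreliable point.
x = (0:0.1:1)';
y = 0.5*x.^2;
y(5) = 3*y(5);                  % corrupted observation

CI = 0.05*ones(size(y));        % assumed error-bar half-widths
CI(5) = 0.5;                    % wide error bar on the corrupted point

% Assumption: weights inversely proportional to the squared error bar.
w = 1 ./ CI.^2;

V = [x.^2, x, ones(size(x))];   % Vandermonde matrix for a quadratic
coeffW = lscov(V, y, w);        % weighted fit
coeffU = V \ y;                 % ordinary (unweighted) fit for comparison

disp([coeffW, coeffU]);         % columns: weighted, unweighted
```

The weighted quadratic coefficient should land closer to the true value 0.5 than the unweighted one, because the corrupted point barely contributes.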
If you don't like to build the Vandermonde matrix, and you have a license for the Curve Fitting Toolbox, you can use the following code
ft = fittype('poly2');
opts = fitoptions('Method', 'LinearLeastSquares');
opts.Weights = w;
fitresult = fit(x, y, ft, opts);
and you'll obtain the same result.

Parabola plot with data in Octave

I have been trying to fit a parabola to parts of data where y is positive. I am told, that P1(x1,y1) is the first data point, Pg(xg,yg) is the last, and that the top point is at x=(x1+xg)/2. I have written the following:
x=data(:,1);
y=data(:,2);
filter=y>0;
xp=x(filter);
yp=y(filter);
P1=[xp(1) yp(1)];
Pg=[xp(end) yp(end)];
xT=(xp(1)+xp(end))/2;
A=[1 xp(1) power(xp(1),2) %as the equation is y = a0 + x*a1 + x^2 * a2
1 xp(end) power(xp(end),2)
0 1 2*xT]; % as the top point satisfies dy/dx = a1 + 2*x*a2 = 0
b=[yg(1) yg(end) 0]'; %'
p=A\b;
x_fit=[xp(1):0.1:xp(end)];
y_fit=power(x_fit,2).*p(3)+x_fit.*p(2)+p(1);
figure
plot(xp,yp,'*')
hold on
plot(x_fit,y_fit,'r')
And then I get this parabola which is completely wrong. It doesn't fit the data at all! Can someone please tell me what's wrong with the code?
My parabola
Well, I think the primary problem is some mistake in your calculation. I think you should use three points on the parabola to obtain a system of linear equations. There is no need to calculate the derivative of your function as you do with
dy/dx = a1 + 2*x*a2 = 0
Instead of a point on the derivative you choose another point in your scatter plot, e.g. the maximum: PT = [xp_max yp_max]; and use it for your matrix A and b.
The equation dy/dx = a1 + 2*x*a2 = 0 does not fulfill the basic scheme of your system of linear equations: a0 + a1*x + a2*x^2 = y;
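A small sketch of the three-point approach described above, using the first point, the last point, and the maximum of the positive part. The data here is hypothetical, generated from y = -(x-1)*(x-5), so the recovered coefficients can be checked by hand:

```matlab
% Hypothetical positive part of a data set lying on y = -(x-1).*(x-5),
% i.e. y = -5 + 6*x - x.^2.
xp = (1.5:0.5:4.5)';
yp = -(xp - 1).*(xp - 5);

% Pick three points: first, highest, and last.
[~, iMax] = max(yp);
idx = [1; iMax; numel(xp)];

% Each row states a0 + a1*x + a2*x^2 = y for one chosen point.
A = [ones(3,1), xp(idx), xp(idx).^2];
b = yp(idx);
p = A \ b;                      % p = [a0; a1; a2]

x_fit = xp(1):0.1:xp(end);
y_fit = p(1) + p(2)*x_fit + p(3)*x_fit.^2;
```

With noisy data the result depends heavily on which three points are picked, which is one reason a least-squares fit over all points is usually more robust.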
By the way: If you don't have to calculate your parabola necessarily in this way, you can maybe have a look at the Matlab/Octave-function polyfit() which calculates the least squares solution for your problem. This would result in a simple implementation:
p = polyfit(x, y, 2);
y2 = polyval(p, x);
figure(); plot(x, y, '*'); hold on;
plot(x, y2, 'or');

matlab determine curve

Does anyone know how to obtain a mean curve from a matrix with the corresponding x,y points of the original plot? I mean, I want a single average curve.
Any code or just ideas would be very very helpful for me since I am new with matlab.
Thank you very much!
Well, one thing you can do is fit a parametric curve. Here's an example on how to do this for a figure-8 with noise on it:
function findParamFit
clc, clf, hold on
%# some sample data
noise = @(t) 0.25*rand(size(t))-0.125;
x = @(t) cos(t) + noise(t);
y = @(t) sin(2*t) + noise(t);
t = linspace(-100*rand, +100*rand, 1e4);
%# initial data
plot(x(t), y(t), 'b.')
%# find fits
options = optimset(...
'tolfun', 1e-12,...
'tolx', 1e-12);
a = lsqcurvefit(@myFun_x, [1 1], t, x(t), -10,10, options);
b = lsqcurvefit(@myFun_y, [1 2], t, y(t), -10,10, options);
%# fitted curve
xx = myFun_x(a,t);
yy = myFun_y(b,t);
plot(xx, yy, 'r.')
end
function F = myFun_x(a, tt)
F = a(1)*cos(a(2)*tt);
end
function F = myFun_y(b, tt)
F = b(1)*sin(b(2)*tt);
end
Note that this is a particularly bad way to fit parametric curves, as is apparent here by the extreme sensitivity of the solution to the quality of the initial values to lsqcurvefit. Nevertheless, fitting a parametric curve will be the way to go.
There's your google query :)