Confidence intervals for linear curve fit under constraints in MATLAB - matlab

I have fitted a straight line to a dataset with 68 samples, under the constraint that the line passes through (x0,y0) using the function lsqlin in MATLAB. How can I find the confidence intervals for this?
My code (Source):
I import the dataset containing x and y vectors from a mat file, which also contains the values of constraints x0 and y0.
n = 1; % Degree of polynomial to fit
V(:,n+1) = ones(length(x),1,class(x)); %V=Vandermonde matrix for 'x'
for j = n:-1:1
V(:,j) = x.*V(:,j+1);
end
d = y; % 'd' is the vector of target values, 'y'.
% There are no inequality constraints in this case, i.e.,
A = [];b = [];
% We use linear equality constraints to force the curve to hit the required point. In
% this case, 'Aeq' is the Vandermonde matrix for 'x0'
Aeq = x0.^(n:-1:0);
% and 'beq' is the value the curve should take at that point
beq = y0;
%%
[p, resnorm, residual, exitflag, output, lambda] = lsqlin(V, d, A, b, Aeq, beq);
%%
% We can then use POLYVAL to evaluate the fitted curve
yhat = polyval( p, x );

The function bootci can be used to find confidence intervals when using lsqlin. Here's how it can be used:
ci=bootci(68,{@(x,y)func(x,y),x,y},'type','student');
The first argument is the number of bootstrap resamples (nboot). Here it is set to 68, which happens to equal the number of data points (the length of the vector x); a larger value, such as 1000 or more, generally gives more stable intervals.
The function in the second argument computes the statistic for which you need confidence intervals. In this case, that statistic is the set of coefficients of the fitted line, so func(x,y) should return the regression coefficients returned by lsqlin. The inputs to this function are the dataset vectors x and y.
The third and fourth arguments, 'type' and 'student', select the studentized bootstrap interval. Which interval type is appropriate depends on the distribution of your data; you can get an idea of this by plotting a histogram of the residuals like this:
histogram(residual);
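A minimal sketch of the bootstrap for the straight-line case (n = 1) might look like the following. It makes several assumptions that are not in the question: the data are synthetic, the names slopefun and nboot are illustrative, and the percentile interval ('per') is swapped in for the studentized one to keep the run cheap. Because the line is forced through (x0,y0), only the slope is free, so the statistic is written in closed form; it should agree with the slope that lsqlin returns for the same data, up to solver tolerance.

```matlab
% Synthetic stand-in for the 68-sample dataset (assumption, not the asker's data)
rng(0);
x  = linspace(0,10,68).';
x0 = 0;  y0 = 1;                          % point the line must pass through
y  = y0 + 2*(x - x0) + 0.5*randn(size(x));

% Least-squares slope of a line constrained through (x0,y0).
% With the equality constraint the intercept is y0 - slope*x0, so the slope is the only free parameter.
slopefun = @(xs,ys) sum((xs - x0).*(ys - y0)) / sum((xs - x0).^2);

nboot = 2000;                             % number of bootstrap resamples
ci = bootci(nboot, {slopefun, x, y}, 'type', 'per');   % rows of x and y are resampled together

slope = slopefun(x, y);
fprintf('slope = %.4f, 95%% CI = [%.4f, %.4f]\n', slope, ci(1), ci(2));
```

bootci needs the Statistics and Machine Learning Toolbox. To bootstrap the lsqlin fit directly, slopefun could be replaced by a function that builds V from the resampled x and returns the coefficients from lsqlin.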

Related

Interpolation using Chebyshev points

Interpolate the Runge function of Example 10.6 at Chebyshev points for n from 10 to 170
in increments of 10. Calculate the maximum interpolation error on the uniform evaluation
mesh x = -1:.001:1 and plot the error vs. polynomial degree as in Figure 10.8 using
semilogy. Observe spectral accuracy.
The Runge function is given by: f(x) = 1 / (1 + 25x^2)
My code so far:
x = -1:0.001:1;
n = 170;
i = 10:10:170;
cx = cos(((2*i + 1)/(2*(n+1)))*pi); %chebyshev pts
y = 1 ./ (1 + 25*x.^2); %true fct
%chebyshev polynomial, don't know how to construct using matlab
yc = polyval(c, x); %graph of approx polynomial fct
plot(x, yc);
mErr = (1 / ((2.^n).*(n+1)!))*%n+1 derivative of f evaluated at max x in [-1,1], not sure how to do this
%plotting stuff
I know very little matlab, so I am struggling on creating the interpolating polynomial. I did some google work, but I was confused with the current functions as I didn't find one that just simply took in points and the polynomial to be interpolated. I am also a bit confused in this case of whether I should be doing i = 0:1:n and n=10:10:170 or if n is fixed here. Any help is appreciated, thank you
Since you know very little about MATLAB, I will try to explain everything step by step:
First, to visualize the Runge function, you can type:
f = @(x) 1./(1+25*x.^2); % Runge function
% plot Runge function over [-1,1];
x = -1:1e-3:1;
y = f(x);
figure;
plot(x,y); title('Runge function'); xlabel('x');ylabel('y');
The @(x) part of the code creates a function handle, a very useful feature of MATLAB. Notice that the function is properly vectorized (it uses the element-wise operators ./ and .^), so it accepts either a scalar or an array as its argument. The plot function is straightforward.
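As a small illustration of what "vectorized" means here, the sketch below calls the same handle with a scalar and with a vector. The variable names v0 and v are illustrative only.

```matlab
f = @(x) 1./(1 + 25*x.^2);   % element-wise ./ and .^ make the handle work on arrays

v0 = f(0);                   % scalar input, scalar output
v  = f([-1 0 1]);            % vector input, output has the same size as the input

disp(v0);
disp(v);

% Without the dots, 1/(1 + 25*x^2) raises an error for a vector x,
% because x^2 is a matrix power and requires a square matrix.
```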
To understand the Runge phenomenon, take 10 linearly spaced points on [-1,1] and use them to obtain the interpolating (Lagrange) polynomial. You get the following:
% 10 linearly spaced points
xc = linspace(-1,1,10);
yc = f(xc);
p = polyfit(xc,yc,9); % gives the coefficients of the polynomial of degree 9
hold on; plot(xc,yc,'o',x,polyval(p,x));
The polyfit function does polynomial curve fitting: it returns the coefficients of the polynomial, given the points x,y and the degree of the polynomial n. When the degree is one less than the number of points, as here, the result is the interpolating polynomial. You can easily evaluate the polynomial at other points with the polyval function.
Observe that, close to the ends of the domain, the polynomial oscillates and the interpolation is not a good approximation of the function. You can see this by plotting the absolute error between the function f(x) and the interpolating polynomial p(x):
plot(x,abs(y-polyval(p,x))); xlabel('x');ylabel('|f(x)-p(x)|');title('Error');
This error can be reduced if, instead of using a linearly spaced vector, you use other points to do the interpolation. A good choice is the Chebyshev nodes, which cluster towards the ends of the interval:
% find 10 Chebyshev nodes and mark them on the plot
n = 10;
k = 1:10; % iterator
xc = cos((2*k-1)/2/n*pi); % Chebyshev nodes
yc = f(xc); % function evaluated at Chebyshev nodes
hold on;
plot(xc,yc,'o')
% find polynomial to interpolate data using the Chebyshev nodes
p = polyfit(xc,yc,n-1); % gives the coefficients of the polynomial of degree n-1 = 9
plot(x,polyval(p,x),'--'); % plot polynomial
legend('Runge function','Chebyshev nodes','interpolating polynomial','location','best')
Notice how the error is reduced close to the ends of the domain. The interpolating polynomial no longer shows the strong oscillations seen before. If you plot the error, you will observe:
plot(x,abs(y-polyval(p,x))); xlabel('x');ylabel('|f(x)-p(x)|');title('Error');
If you now increase the number of Chebyshev nodes, you will get an even better approximation. A small modification of the code lets you run it for different numbers of nodes. You can store the maximum error and plot it as a function of the number of nodes:
n=1:20; % number of nodes
% pre-allocation for speed
e_ln = zeros(1,length(n)); % error for the linearly spaced interpolation
e_cn = zeros(1,length(n)); % error for the chebyshev nodes interpolation
for ii=1:length(n)
% linearly spaced vector
x_ln = linspace(-1,1,n(ii)); y_ln = f(x_ln);
p_ln = polyfit(x_ln,y_ln,n(ii)-1);
e_ln(ii) = max( abs( y-polyval(p_ln,x) ) );
% Chebyshev nodes
k = 1:n(ii); x_cn = cos((2*k-1)/2/n(ii)*pi); y_cn = f(x_cn);
p_cn = polyfit(x_cn,y_cn,n(ii)-1);
e_cn(ii) = max( abs( y-polyval(p_cn,x) ) );
end
figure
plot(n,e_ln,n,e_cn);
xlabel('no of points'); ylabel('maximum absolute error');
legend('linearly spaced','chebyshev nodes','location','best')
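The exercise itself asks for degrees n = 10 to 170 and a semilogy plot. polyfit works in the monomial basis, which becomes badly conditioned at such degrees, so the sketch below swaps in the barycentric interpolation formula, with the standard barycentric weights for Chebyshev points of the first kind. This is an alternative technique, not the code from the answer above, and the names degs, err, xe and fe are illustrative.

```matlab
f  = @(x) 1./(1 + 25*x.^2);        % Runge function
xe = (-1:0.001:1).';               % uniform evaluation mesh (column vector)
fe = f(xe);
degs = 10:10:170;                  % polynomial degrees n
err  = zeros(size(degs));          % maximum interpolation error per degree

for m = 1:numel(degs)
    n  = degs(m);
    k  = (0:n).';
    th = (2*k + 1)*pi/(2*(n + 1));
    xc = cos(th);                  % n+1 Chebyshev points
    w  = (-1).^k .* sin(th);       % barycentric weights for these points
    fc = f(xc);

    D = xe - xc.';                 % numel(xe)-by-(n+1), implicit expansion (R2016b or later)
    [r, c] = find(D == 0);         % mesh points that coincide exactly with a node
    D(D == 0) = 1;                 % placeholder to avoid division by zero
    C = w.' ./ D;
    p = (C*fc) ./ sum(C, 2);       % barycentric formula
    p(r) = fc(c);                  % use the exact nodal value where a mesh point hits a node

    err(m) = max(abs(p - fe));
end

figure
semilogy(degs, err, 'o-');
xlabel('polynomial degree n'); ylabel('maximum interpolation error');
```

On a semilogy plot the error should fall roughly along a straight line until it levels off near machine precision, which is the spectral accuracy the exercise refers to.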

Curvefitting for power law function y= a(x^b)+c

I am new to MATLAB, and I am trying to fit a power law through a dataset. I have been trying to use the lsqcurvefit function, but I am unsure how to proceed as the instructions found through Google are too convoluted for a beginner. I would like to derive the values b and c from the equation y = a(x^b)+c, and any suggestions would be greatly appreciated. Thanks.
You can use lsqcurvefit to fit a nonlinear curve through measured data points in the least-squares sense as follows:
% constant parameters
a = 1; % set the value of a
% initial guesses for fitted parameters
b_guess = 1; % provide an initial guess for b
c_guess = 0; % provide an initial guess for c
% Definition of the fitted function
f = @(x, xdata) a*(xdata.^x(1))+x(2);
% generate example data for the x and y data to fit (this should be replaced with your real measured data)
xdata = 1:10;
ydata = f([2 3], xdata); % create data with b=2 and c=3
% fit the data with the desired function
x = lsqcurvefit(f,[b_guess c_guess],xdata,ydata);
%result of the fit, i.e. the fitted parameters
b = x(1)
c = x(2)

Matlab's hist3, which axis corresponds to X and which one to Y

Suppose you pass some matrix N to hist3 in Matlab, which is an m-by-2 matrix, simply as an example, where the first column is your variable X and the second column is your variable Y.
When you run cnt = hist3(N, {bins_X bins_Y}), you would get an m-by-m matrix. Which variable do the rows correspond to, X or Y?
OP seems to have solved his problem. However, I am leaving a code snippet exemplifying hist3's output indexing in case anyone finds it useful.
% Simulate random 2-column matrix
X = randn(1e5,2);
% Scale x-axis data to see label distinction
X(:,1) = X(:,1)*10;
% Define bins
bin_x = linspace(-30,30,80);
bin_y = linspace(-3,3,100);
% Get frequency grid
cnt = hist3(X,{bin_x,bin_y});
% Plot frequency values with surf
[x,y] = meshgrid(bin_x,bin_y);
figure
surf(x,y,cnt')
title('Original hist3 output')
xlabel('First Column')
ylabel('Second Column')
zlabel('Frequency')
% Access and modify cnt, and plot again
cnt(end,1:10) = 60;
cnt(25:55,1:55)= 0;
figure
surf(x,y,cnt')
title('Modified hist3 output')
xlabel('First Column')
ylabel('Second Column')
zlabel('Frequency')
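A quick way to check the orientation is to use a different number of bins per variable and look at the size of the output. The sketch below uses made-up data; the names cnt and marg_x are illustrative. hist3 requires the Statistics and Machine Learning Toolbox.

```matlab
rng(1);
X = [10*randn(1000,1), randn(1000,1)];   % column 1 plays the role of X, column 2 of Y
bin_x = linspace(-30,30,80);             % 80 bin centers for the first column
bin_y = linspace(-3,3,100);              % 100 bin centers for the second column

cnt = hist3(X, {bin_x, bin_y});
disp(size(cnt));                         % numel(bin_x)-by-numel(bin_y)

% Rows index bins of the first column, columns index bins of the second column,
% so summing across columns gives the marginal histogram of the first column.
marg_x = sum(cnt, 2);
```

This is why the answer's code passes cnt' to surf: meshgrid(bin_x,bin_y) produces arrays with numel(bin_y) rows and numel(bin_x) columns, the transpose of the hist3 layout.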

3D points linear regression Matlab

I have a set of 3D points (x,y,z) and I would like to fit a straight line using Least absolute deviation method to those data.
I found a function from the internet which works pretty well with 2D data, how could I modify this to adapt 3D data points?
function B = L1LinearRegression(X,Y)
% Determine size of predictor data
[n m] = size(X);
% Initialize with least-squares fit
B = [ones(n,1) X] \ Y; % Least squares regression
BOld = B;
BOld(1) = BOld(1) + 1e-5; % Force divergence
% Repeat until convergence
while (max(abs(B - BOld)) > 1e-6)
% Move old coefficients
BOld = B;
% Calculate new observation weights (based on residuals from old coefficients)
W = sqrt(1 ./ max(abs((BOld(1) + (X * BOld(2:end))) - Y),1e-6)); % Floor to avoid division by zero
% Calculate new coefficients
B = (repmat(W,[1 m+1]) .* [ones(n,1) X]) \ (W .* Y);
end
Thank you very much!
I know that this is not an answer to the question as asked, but rather to the underlying problem that led to it.
We can use the fit function several times, once per coordinate.
% XYZ=[x(:),y(:),z(:)]; % suppose we have data in this format
M=size(XYZ,1); % read size of our data
t=((0:M-1)/(M-1))'; % create arbitrary parameter t
% fit all coordinates as function x_i=a_i*t+b_i
fitX=fit(t,XYZ(:,1),'poly1');
fitY=fit(t,XYZ(:,2),'poly1');
fitZ=fit(t,XYZ(:,3),'poly1');
temp=[0;1]; % define the interval where the line shall be plotted
%Evaluate and plot the line coordinates
Line=[fitX(temp),fitY(temp),fitZ(temp)]; % each fit object is evaluated at t = 0 and t = 1
plot3(Line(:,1),Line(:,2),Line(:,3))
The advantage is that this works for any point cloud, even one that is parallel to an axis. Another advantage is that you are not limited to first-order polynomials: you can choose a different function for each axis and fit any 3D curve.
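A toolbox-free sketch of the same idea is shown below, using polyfit and polyval in place of fit (which needs the Curve Fitting Toolbox). The data are synthetic and the names coef and tt are illustrative. Note that this is an ordinary least-squares fit per coordinate, not a least absolute deviation fit, and that the arbitrary parameter t assumes the points are ordered along the line and roughly evenly spaced.

```matlab
% Synthetic points along a 3D line with a little noise (assumption, for illustration)
rng(2);
M = 50;
t = ((0:M-1)/(M-1)).';                      % arbitrary parameter in [0,1]
XYZ = [1 + 2*t, -1 + 0.5*t, 3 - t] + 0.01*randn(M,3);

% Fit each coordinate as x_i = a_i*t + b_i
coef = zeros(3,2);                          % row k holds [a_k, b_k]
for k = 1:3
    coef(k,:) = polyfit(t, XYZ(:,k), 1);
end

% Evaluate the fitted line at its two end points and plot it with the data
tt = [0; 1];
Line = [polyval(coef(1,:),tt), polyval(coef(2,:),tt), polyval(coef(3,:),tt)];
figure
plot3(XYZ(:,1),XYZ(:,2),XYZ(:,3),'.', Line(:,1),Line(:,2),Line(:,3),'-');
grid on; xlabel('x'); ylabel('y'); zlabel('z');
```

To get a least absolute deviation fit, each polyfit call could be replaced by a call to the question's L1LinearRegression with t as the predictor and one coordinate as the response.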

Numerical integration over non-uniform grid in matlab. Is there any function?

I've got function values in a vector f and also the vector containing values of the argument x. I need to find the definite integral of f. But the argument vector x is not uniform. Is there any function in Matlab that deals with integration over non-uniform grids?
Taken from help :
Z = trapz(X,Y) computes the integral of Y with respect to X using
the trapezoidal method. X and Y must be vectors of the same
length, or X must be a column vector and Y an array whose first
non-singleton dimension is length(X). trapz operates along this
dimension.
As you can see x does not have to be uniform.
For instance:
x = sort(rand(100,1)); %# Create random values of x in [0,1]
y = x;
trapz( x, y)
Returns:
ans =
0.4990
Another example:
x = sort(rand(100,1)); %# Create random values of x in [0,1]
y = x.^2;
trapz( x, y)
returns:
ans =
0.3030
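To see why the spacing does not matter, the trapezoidal rule can be written out by hand: each interval contributes its own width times the average of the two end values. The sketch below uses a small made-up grid; the names I_manual and I_trapz are illustrative.

```matlab
x = [0 0.1 0.15 0.4 0.7 0.72 1].';   % deliberately non-uniform grid on [0,1]
y = x.^2;                            % exact integral over [0,1] is 1/3

% Sum over intervals of width * mean of the end-point values
I_manual = sum(diff(x) .* (y(1:end-1) + y(2:end)) / 2);
I_trapz  = trapz(x, y);

fprintf('manual: %.6f, trapz: %.6f, exact: %.6f\n', I_manual, I_trapz, 1/3);
```

The two computed values agree to rounding error; the difference from 1/3 comes from the coarse grid, not from the uneven spacing.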
Depending on your function (and how x is distributed), you might get more accuracy by doing a spline interpolation through your data first:
pp = spline(x,y);
quadgk(@(t) ppval(pp,t), x(1), x(end))
That's the quick-and-dirty way. There is a faster and more direct approach, but it is ugly and much less transparent:
result = sum(sum(...
bsxfun(@times, pp.coefs, 1./(4:-1:1)) .*... % coefficients of primitive
bsxfun(@power, diff(pp.breaks).', 4:-1:1)... % all 4 powers of shifted x-values
));
As an example why all this could be useful, I borrow the example from here. The exact answer should be
>> pi/2/sqrt(2)*(17-40^(3/4))
ans =
1.215778726893561e+00
Defining
>> x = [0 sort(3*rand(1,5)) 3];
>> y = (x.^3.*(3-x)).^(1/4)./(5-x);
we find
>> trapz(x,y)
ans =
1.142392438652055e+00
>> pp = spline(x,y);
>> tic; quadgk(@(t) ppval(pp,t), 0, 3), toc
ans =
1.213866446458034e+00
Elapsed time is 0.017472 seconds.
>> tic; result = sum(sum(...
bsxfun(@times, pp.coefs, 1./(4:-1:1)) .*... % coefficients of primitive
bsxfun(@power, diff(pp.breaks).', 4:-1:1)... % all 4 powers of shifted x-values
)), toc
result =
1.213866467945575e+00
Elapsed time is 0.002887 seconds.
So trapz underestimates the value by more than 0.07. With the latter two methods, the error is an order of magnitude less. Also, the less-readable version of the spline approach is an order of magnitude faster.
So, armed with this knowledge: choose wisely :)
You can do Gaussian quadrature over each piecewise pair of x and sum them up to get the complete integral.