MATLAB - How to calculate a 2D least squares regression based on both x and y (regression surface)

I have a set of data with independent variables x and y. Now I'm trying to build a two-dimensional regression model that has a regression surface cutting through my data points. However, I couldn't find a way to achieve this. Can anyone give me some assistance?

You could use my favorite, polyfitn, for linear or polynomial models. If you would like a different model, please edit your question or add a comment. HTH!
EDIT
Also, take a look here under Multiple Regression; it can likely help you as well.
EDIT AGAIN
Sorry, I'm having too much fun with this; here's an example of multivariate regression using least squares with stock MATLAB:
t = (1:10)';
x = t;
y = exp(-t);
A = [ y x ];       % regressor matrix: one column per regressor
z = 10*y + 0.5*x;  % synthetic response with known coefficients
A\z                % least-squares solution recovers the coefficients
ans =
   10.0000
    0.5000
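If your model also has a constant offset, append a column of ones to A (a small extension of the example above, with an assumed offset of 3):
A = [ y x ones(size(t)) ]; % add an intercept column
z = 10*y + 0.5*x + 3;
A\z                        % returns approximately [10; 0.5; 3]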

If you are performing linear regression, the best tool is the regress function. Note that if you are fitting a model of the form y(x1,x2) = b1*f(x1) + b2*g(x2) + b3, this is still a linear regression, as long as you know the functions f and g: the model is linear in the coefficients b1, b2, b3.
Nsamp = 100; %number of samples
X1 = randn(Nsamp,1); %regressor 1 (could also be some computed f(x1) )
X2 = randn(Nsamp,1); %regressor 2 (could also be some computed g(x2) )
Y = X1 + X2 + randn(Nsamp,1); %generate some data to be regressed
%now run the regression
[b,bint,r,rint,stats] = regress(Y,[X1 X2 ones(Nsamp,1)]);
% 'b' contains the coefficients b1,b2,b3 of the fit (can be used to plot the regression surface)
% 'r' contains residuals of the fit
% 'stats' contains the overall regression R^2, F stat, p-value and error variance
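For instance (a sketch reusing b, X1, X2, and Y from the snippet above), the fitted surface can be visualized on a grid:
[xg, yg] = meshgrid(linspace(-3,3,25));  % grid covering the regressor range
zg = b(1)*xg + b(2)*yg + b(3);           % evaluate the fitted plane
surf(xg, yg, zg, 'FaceAlpha', 0.5); hold on;
plot3(X1, X2, Y, '.r', 'MarkerSize', 12) % overlay the data points
xlabel('x1'); ylabel('x2'); zlabel('y');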

Related

Matlab: Convolution and deconvolution results are weird

Data x is input to an autoregressive (AR) model. The output of the AR model is corrupted with additive white Gaussian noise at SNR = 30 dB. The observations are denoted by noisy_y.
Let there be close estimates h_hat of the AR model coefficients (obtained from least squares estimation). I want to see how close the input recovered by deconvolving the measurements with h_hat is to the known x.
My confusion is which variable to use for deconvolution: the clean y or the noisy_y?
Upon deconvolution, I should get x_hat. I am not sure whether the correct way to perform deconvolution is using noisy_y or using y before adding noise. I have used the following code.
Can somebody please help with the correct method to plot x and x_hat?
Below is the plot of x vs x_hat. As can be seen, they do not match. Where is my understanding wrong? Please help.
The code is:
clear all
N = 200; %number of data points
a1=0.1650;
b1=-0.850;
h = [1 a1 b1]; %true coefficients
x = rand(1,N);
%%AR model
y = filter(1,h,x); %transmitted signal through AR channel
noisy_y = awgn(y,30,'measured');
hat_h= [1 0.133 0.653];
x_hat = filter(hat_h,1,noisy_y); %deconvolution
plot(1:50,x(1:50),'b');
hold on;
plot(1:50,x_hat(1:50),'-.rd');
A first issue is that the coefficients h of your AR model correspond to an unstable system since one of its poles is located outside the unit circle:
>> abs(roots(h))
ans =
   1.00814
   0.84314
Parameter estimation techniques are then quite likely to fail to converge given a diverging input sequence. Indeed, looking at the stated hat_h = [1 0.133 0.653] it is pretty clear that the parameter estimation did not converge anywhere near the actual coefficients. In your specific case you did not provide the code illustrating how you obtained hat_h (other than specifying that it was "obtained from Least Squares estimation"), so it isn't possible to further comment on what went wrong with your estimation.
That said, the standard formulation of Least Mean Squares (LMS) filters is given for an MA model. A common method for AR parameter estimation is to solve the Yule-Walker equations:
hat_h = aryule(noisy_y - mean(noisy_y), length(h)-1);
If we were to use this estimation method with the stable system defined by:
h = [1 -a1 -b1];
x = rand(1,N);
%%AR model
y = filter(1,h,x); %transmitted signal through AR channel
noisy_y = awgn(y,30,'measured');
hat_h = aryule(noisy_y - mean(noisy_y), length(h)-1);
x_hat = filter(hat_h,1,noisy_y); %deconvolution
The plot of x and x_hat would then show the two sequences closely tracking each other (figure omitted).
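As a quick sanity check (a sketch; exact values vary with the noise realization), you can compare the estimated coefficients against the true ones and confirm the estimate is stable:
disp([h(:) hat_h(:)])  % true vs. estimated AR coefficients; should be close
max(abs(roots(hat_h))) % should be < 1 for a stable estimated model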

Find the line which best fits the data

I'm trying to find the line which best fits the data. I use the code below, but now I want to have the data placed into an array, sorted so that the points closest to the line come first. How can I do this? Also, is polyfit the correct function to use for this?
x=[1,2,2.5,4,5];
y=[1,-1,-.9,-2,1.5];
n=1;
p = polyfit(x,y,n)
f = polyval(p,x);
plot(x,y,'o',x,f,'-')
PS: I'm using Octave 4.0 which is similar to Matlab
You can first compute the error between the real value y and the predicted value f:
err = abs(y-f);
Then sort the error vector:
[val, idx] = sort(err);
And use the sorted indices to have your y values sorted:
y2 = y(idx);
Now y2 has the same values as y, but with the ones closest to the fitted line first.
Do the same for x to compute x2, so you keep the correspondence between x2 and y2:
x2 = x(idx);
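Equivalently (a compact variant, not in the original answer), you can sort both coordinates in one step:
[~, idx] = sort(abs(y - f));    % vertical distance of each point from the line
xy_sorted = [x(idx); y(idx)].'; % one row per point, closest to the line first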
Sembei Norimaki did a good job of explaining your primary question, so I will look at your secondary question: is polyfit the right function?
The least-squares best-fit line is the line that minimizes the sum of squared errors (when an intercept is included, the residuals then sum to zero, i.e. the mean error is zero).
If it must be a "line" we could use polyfit, which fits a polynomial. Of course, a "line" is just a first-degree polynomial, and first-degree polynomials have some properties that make them easy to deal with. The first-order (linear) equation you are looking for comes in this form:
y = mx + b
where y is your dependent variable and x is your independent variable. So the challenge is this: find the m and b such that the modeled y is as close to the actual y as possible. As it turns out, the squared error associated with a linear fit is convex, meaning it has a single minimum. In order to calculate this minimum, it is simplest to combine the bias and the x vectors as follows:
Xcombined = [x.' ones(length(x),1)];
then use the normal equations, derived from the minimization of the error:
beta = (Xcombined.'*Xcombined) \ (Xcombined.'*y.') % backslash is more stable than inv()
Great, now our line is defined as Y = Xcombined*beta. To draw the line, simply sample from some range of x and add the b term:
Xplot = [[0:.1:5].' ones(length([0:.1:5].'),1)];
Yplot = Xplot*beta;
plot(Xplot, Yplot);
So why does polyfit work so poorly? Well, I can't say for sure, but my hypothesis is that you need to transpose your x and y vectors. I would guess that would give you a much more reasonable line:
x = x.';
y = y.';
then try
p = polyfit(x,y,n)
I hope this helps. A wise man once told me (and as I learn every day), don't trust an algorithm you do not understand!
Here's some test code that may help someone else dealing with linear regression and least squares
%https://youtu.be/m8FDX1nALSE matlab code
%https://youtu.be/1C3olrs1CUw good video to work out by hand if you want to test
function [a0 a1] = rtlinreg(x,y)
x=x(:);
y=y(:);
n=length(x);
a1 = (n*sum(x.*y) - sum(x)*sum(y))/(n*sum(x.^2) - (sum(x))^2); %a1 this is the slope of linear model
a0 = mean(y) - a1*mean(x); %a0 is the y-intercept
end
x=[65,65,62,67,69,65,61,67]'
y=[105,125,110,120,140,135,95,130]'
[a0 a1] = rtlinreg(x,y); %a1 is the slope of linear model, a0 is the y-intercept
x_model = min(x):.001:max(x);
y_model = a0 + a1.*x_model; %y=-186.47 +4.70x
plot(x,y,'x',x_model,y_model)
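As a cross-check (a sketch), polyfit with degree 1 should return the same two coefficients, slope first:
p = polyfit(x, y, 1) % p(1) ~ a1 (slope), p(2) ~ a0 (intercept)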

Gaussian iterative curve fitting [duplicate]

I have a set of frequency data with peaks to which I need to fit a Gaussian curve and then get the full width at half maximum (FWHM) from. The FWHM part I can do; I already have code for that, but I'm having trouble writing code to fit the Gaussian.
Does anyone know of any functions that'll do this for me, or would anyone be able to point me in the right direction? (I can do least squares fitting for lines and polynomials, but I can't get it to work for Gaussians.)
Also it would be helpful if it was compatible with both Octave and Matlab as I have Octave at the moment but don't get access to Matlab until next week.
Any help would be greatly appreciated!
Fitting a single 1D Gaussian directly is a non-linear fitting problem. You'll find ready-made implementations here, or here, or here for 2D, or here (if you have the statistics toolbox) (have you heard of Google? :)
Anyway, there might be a simpler solution. If you know for sure your data y will be well-described by a Gaussian, and is reasonably well-distributed over your entire x-range, you can linearize the problem (these are equations, not statements):
y = 1/(σ·√(2π)) · exp( -½ ( (x-μ)/σ )² )
ln y = ln( 1/(σ·√(2π)) ) - ½ ( (x-μ)/σ )²
     = P·x² + Q·x + R
where the substitutions
P = -1/(2σ²)
Q = +2μ/(2σ²)
R = ln( 1/(σ·√(2π)) ) - ½(μ/σ)²
have been made. Now, solve for the linear system Ax=b with (these are Matlab statements):
% design matrix for least squares fit
xdata = xdata(:);
A = [xdata.^2, xdata, ones(size(xdata))];
% log of your data
b = log(y(:));
% least-squares solution for x
x = A\b;
The vector x you found this way will equal
x == [P Q R]
which you then have to reverse-engineer to find the mean μ and the standard deviation σ:
mu = -x(2)/x(1)/2;
sigma = sqrt( -1/2/x(1) );
Which you can cross-check with x(3) == R (there should only be small differences).
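A quick end-to-end check of this linearization (a sketch with synthetic, noise-free data; the values mu0 = 2 and sigma0 = 0.5 are assumptions for the demo):
mu0 = 2; sigma0 = 0.5; % hypothetical true parameters
xdata = linspace(0, 4, 50).';
y = 1/(sigma0*sqrt(2*pi)) * exp(-0.5*((xdata - mu0)/sigma0).^2);
A = [xdata.^2, xdata, ones(size(xdata))];
x = A \ log(y(:));      % x == [P Q R]
mu = -x(2)/x(1)/2       % recovers ~2
sigma = sqrt(-1/2/x(1)) % recovers ~0.5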
Perhaps this has the thing you are looking for? Not sure about compatibility:
http://www.mathworks.com/matlabcentral/fileexchange/11733-gaussian-curve-fit
From its documentation:
[sigma,mu,A]=mygaussfit(x,y)
[sigma,mu,A]=mygaussfit(x,y,h)
this function fits the function
y = A * exp( -(x-mu)^2 / (2*sigma^2) )
the fitting is done by applying polyfit to the ln of the data.
h is the threshold: the fraction of the maximum y height above which data points are included in the fit.
h should be a number between 0 and 1.
if h is not given, it defaults to 0.2.
I had a similar problem.
This was the first result on Google, and some of the scripts linked here made my MATLAB crash.
Finally I found here that MATLAB has a built-in fit function that can fit Gaussians too.
It looks like this:
>> v=-30:30;
>> fit(v', exp(-v.^2)', 'gauss1')
ans =
General model Gauss1:
ans(x) = a1*exp(-((x-b1)/c1)^2)
Coefficients (with 95% confidence bounds):
a1 = 1 (1, 1)
b1 = -8.489e-17 (-3.638e-12, 3.638e-12)
c1 = 1 (1, 1)
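Note that gauss1 parameterizes the width as c1 rather than the usual standard deviation; if you need sigma (e.g., for the FWHM), convert it (a small follow-up, not in the original answer):
f = fit(v', exp(-v.^2)', 'gauss1');
sigma = f.c1/sqrt(2);          % a1*exp(-((x-b1)/c1)^2) has c1 = sigma*sqrt(2)
fwhm = 2*sqrt(2*log(2))*sigma; % full width at half maximum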
I found that the MATLAB "fit" function was slow, so I used "lsqcurvefit" with an anonymous Gaussian function. This is for fitting a Gaussian FUNCTION; if you just want to fit data to a normal distribution, use "normfit".
Check it out:
% % Generate synthetic data (for example) % % %
nPoints = 200; binSize = 1/nPoints ;
fauxMean = 47 ;fauxStd = 8;
faux = fauxStd.*randn(1,nPoints) + fauxMean; % REPLACE WITH YOUR ACTUAL DATA
xaxis = 1:length(faux) ;fauxData = histc(faux,xaxis);
yourData = fauxData; % replace with your actual distribution
xAxis = 1:length(yourData) ;
gausFun = @(hms,x) hms(1) .* exp(-(x-hms(2)).^2 ./ (2*hms(3)^2)); % Gaussian FUNCTION: hms = [height; mean; std]
% % Provide estimates for initial conditions (for lsqcurvefit) % %
height_est = max(fauxData)*rand ; mean_est = fauxMean*rand; std_est=fauxStd*rand;
x0 = [height_est;mean_est; std_est]; % parameters need to be in a single variable
options=optimset('Display','off'); % avoid pesky messages from lsqcurvefit (optional)
[params]=lsqcurvefit(gausFun,x0,xAxis,yourData,[],[],options); % meat and potatoes
lsq_mean = params(2); lsq_std = params(3) ; % what you want
% % % Plot data with fit % % %
myFit = gausFun(params,xAxis);
figure;hold on;plot(xAxis,yourData./sum(yourData),'k');
plot(xAxis,myFit./sum(myFit),'r','linewidth',3) % normalization optional
xlabel('Value');ylabel('Probability');legend('Data','Fit')
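If the fit misbehaves (e.g., converges to a negative width), lsqcurvefit also accepts lower and upper bounds in the slots left empty ([],[]) above; a sketch:
lb = [0; min(xAxis); eps]; % keep height, mean, and std in a sensible range
ub = [Inf; max(xAxis); Inf];
[params] = lsqcurvefit(gausFun,x0,xAxis,yourData,lb,ub,options);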

MATLAB Fit a line to a histogram

I am just wondering: how would I go about fitting a line to a histogram, using the z-counts as weights? An example of this is shown below (although that post just discusses overlaying multiple plots), taken from Scatter plot with density in Matlab.
My initial thought is to make an array consisting of each pixel from the density plot, repeated n times to make a scatter plot (n == the number of counts), then do a linear polyfit. This seems awfully redundant though.
The other approach is to do a weighted least squares solution. You need the (x,y) location of each pixel and the number of counts n within each pixel. Then, I think that you'd do the weighted least-squares this way:
%gather your known data...have x,y, and n all in the same order as each other
A = [x(:) ones(length(x),1)]; %here are the x values from your histogram
b = y(:); %here are the y-values from your histogram
C = diag(n(:)); %counts from each pixel in your 2D histogram
%Define polynomial coefficients as p = [slope; y_offset];
%usual least-squares solution...written here for reference
% b = A*p; %remember, p = [slope; y_offset];
% p = inv(A'*A)*(A'*b); %remember, p = [slope; y_offset];
%We want to apply a weighting matrix C, so incorporate it into the normal equations
% A' * C * b = A' * C * A * p;
p = (A' * C * A) \ (A' * C * b); %remember, p = [slope; y_offset]
The biggest uncertainty for me with this solution is whether the C matrix should be made up of n or n.^2; I can never remember. Hopefully, someone can correct me in the comments, if needed.
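Alternatively (not in the original answer), stock MATLAB's lscov accepts a weight vector directly and avoids forming the diagonal matrix:
p = lscov(A, b, n(:)); % weighted least squares with the counts as weights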
If you have the original data, which is a collection of (x,y) points, you simply do a polyfit on the original data:
p = polyfit(x(:),y(:),1); %linear fit
That will give you a best fit (in the least-squares sense) to the original data, which is what you want.
If you do not have the original data, and you only have the 2D histogram, the approach that you defined (which basically recreates a facsimile of the original data) will give a similar answer as if you did the polyfit on the original data.
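For reference, the "recreate the data" approach from the question can be written without loops using repelem (a sketch, assuming x, y, and n are matching per-pixel vectors):
xr = repelem(x(:), n(:)); % repeat each pixel's x-coordinate by its count
yr = repelem(y(:), n(:));
p = polyfit(xr, yr, 1);   % same weighted fit, built from the expanded data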

3D curvefitting

I have a discrete regular grid of (a,b) points and their corresponding c values, which I interpolate further to get a smooth curve. Now, from the interpolation data, I want to create a polynomial equation for curve fitting. How do I fit a 3D plot with a polynomial?
I am trying to do this in MATLAB. I used the Surface Fitting Toolbox in MATLAB (R2010a) to curve fit 3-dimensional data. But how does one find a formula that fits a set of data to the best advantage in MATLAB/Maple or any other software? Any advice? Also most useful would be some real code examples to look at, PDF files on the web, etc.
This is just a small portion of my data.
a = [ 0.001 .. 0.011];
b = [1, .. 10];
c = [ -.304860225, .. .379710865];
Thanks in advance.
To fit a curve onto a set of points, we can use ordinary least-squares regression. There is a solution page by MathWorks describing the process.
As an example, let's start with some random data:
% some 3d points
data = mvnrnd([0 0 0], [1 -0.5 0.8; -0.5 1.1 0; 0.8 0 1], 50);
As @BasSwinckels showed, by constructing the desired design matrix, you can use mldivide or pinv to solve the overdetermined system expressed as Ax=b:
% best-fit plane
C = [data(:,1) data(:,2) ones(size(data,1),1)] \ data(:,3); % coefficients
% evaluate it on a regular grid covering the domain of the data
[xx,yy] = meshgrid(-3:.5:3, -3:.5:3);
zz = C(1)*xx + C(2)*yy + C(3);
% or expressed using matrix/vector product
%zz = reshape([xx(:) yy(:) ones(numel(xx),1)] * C, size(xx));
Next we visualize the result:
% plot points and surface
figure('Renderer','opengl')
line(data(:,1), data(:,2), data(:,3), 'LineStyle','none', ...
'Marker','.', 'MarkerSize',25, 'Color','r')
surface(xx, yy, zz, ...
'FaceColor','interp', 'EdgeColor','b', 'FaceAlpha',0.2)
grid on; axis tight equal;
view(9,9);
xlabel x; ylabel y; zlabel z;
colormap(cool(64))
As was mentioned, we can get higher-order polynomial fitting by adding more terms to the independent variables matrix (the A in Ax=b).
Say we want to fit a quadratic model with constant, linear, interaction, and squared terms (1, x, y, xy, x^2, y^2). We can do this manually:
% best-fit quadratic curve
C = [ones(50,1) data(:,1:2) prod(data(:,1:2),2) data(:,1:2).^2] \ data(:,3);
zz = [ones(numel(xx),1) xx(:) yy(:) xx(:).*yy(:) xx(:).^2 yy(:).^2] * C;
zz = reshape(zz, size(xx));
There is also a helper function x2fx in the Statistics Toolbox that helps in building the design matrix for a couple of model orders:
C = x2fx(data(:,1:2), 'quadratic') \ data(:,3);
zz = x2fx([xx(:) yy(:)], 'quadratic') * C;
zz = reshape(zz, size(xx));
Finally there is an excellent function polyfitn on the File Exchange by John D'Errico that allows you to specify all kinds of polynomial orders and terms involved:
model = polyfitn(data(:,1:2), data(:,3), 2);
zz = polyvaln(model, [xx(:) yy(:)]);
zz = reshape(zz, size(xx));
There might be some better functions on the file-exchange, but one way to do it by hand is this:
x = a(:); %make column vectors
y = b(:);
z = c(:);
%first order fit
M = [ones(size(x)), x, y];
k1 = M\z;
%least square solution of z = M * k1, so z = k1(1) + k1(2) * x + k1(3) * y
Similarly, you can do a second order fit:
%second order fit
M = [ones(size(x)), x, y, x.^2, x.*y, y.^2];
k2 = M\z;
which seems to have numerical problems for the limited dataset you gave. Type help mldivide for more details.
To make an interpolation over some regular grid, you can do like so:
ngrid = 20;
[A,B] = meshgrid(linspace(min(a), max(a), ngrid), ...
linspace(min(b), max(b), ngrid));
M = [ones(numel(A),1), A(:), B(:), A(:).^2, A(:).*B(:), B(:).^2];
C2_fit = reshape(M * k2, size(A)); % = k2(1) + k2(2)*A + k2(3)*B + k2(4)*A.^2 + ...
%plot to compare fit with original data
surfl(A,B,C2_fit);shading flat;colormap gray
hold on
plot3(a,b,c, '.r')
A 3rd-order fit can be done using the formula given by TryHard below, but the formulas quickly become tedious as the order increases. It is better to write a function that can construct M given x, y, and the order if you have to do this more than once; a sketch of such a helper follows.
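Such a helper might look like this (a sketch; the name polydesign is hypothetical; it generates all monomials x^i .* y^j with i + j <= order, and reproduces the second-order M above for order = 2):
function M = polydesign(x, y, order)
%POLYDESIGN Design matrix with all polynomial terms x^i .* y^j, i+j <= order
x = x(:); y = y(:);
M = [];
for total = 0:order
    for i = total:-1:0
        M = [M, x.^i .* y.^(total - i)]; %#ok<AGROW>
    end
end
end
For example, k2 = polydesign(a, b, 2) \ c(:) reproduces the second-order fit above.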
This sounds like more of a philosophical question than a specific implementation question, specifically the bit "how does one find a formula that fits a set of data to the best advantage?" In my experience that is a choice you have to make depending on what you're trying to achieve.
What defines "best" for you? For a data-fitting problem you can keep adding more and more polynomial coefficients and improving the R^2 value... but you will eventually overfit the data. A downside of high-order polynomials is their behavior outside the bounds of the sample data used to fit the response surface: they can quickly go off in some wild direction which may not be appropriate for whatever it is you're trying to model.
Do you have insight into the physical behavior of the system / data you're fitting? That can be used as a basis for what set of equations to use to create a math model. My recommendation would be to use the most economical (simple) model you can get away with.