Line of best fit scatter plot - matlab

I'm trying to do a scatter plot with a line of best fit in matlab, I can get a scatter plot using either scatter(x1,x2) or scatterplot(x1,x2) but the basic fitting option is shadowed out and lsline returns the error 'No allowed line types found. Nothing done'
Any help would be great,
Thanks,
Jon.

lsline is only available in the Statistics Toolbox, do you have the statistics toolbox? A more general solution might be to use polyfit.
You need to use polyfit to fit a line to your data. Suppose you have some data in y and you have corresponding domain values in x, (ie you have data approximating y = f(x) for arbitrary f) then you can fit a linear curve as follows:
p = polyfit(x,y,1); % p returns 2 coefficients fitting r = a_1 * x + a_2
r = p(1) .* x + p(2); % compute a new vector r that has matching datapoints in x
% now plot both the points in y and the curve fit in r
plot(x, y, 'x');
hold on;
plot(x, r, '-');
hold off;
Note that if you want to fit an arbitrary polynomial to your data you can do so by changing the last parameter of polyfit to be the dimensionality of the curvefit. Suppose we call this dimension d, you'll receive back d+1 coefficients in p, which represent a polynomial conforming to an estimate of f(x):
f(x) = p(1) * x^d + p(2) * x^(d-1) + ... + p(d)*x + p(d+1)
Edit, as noted in a comment you can also use polyval to compute r, its syntax would like like this:
r = polyval(p, x);

Infs, NaNs, and imaginaryparts of complex numbers are ignored in the data.
Curve Fitting Tool provides a flexible graphical user interfacewhere you can interactively fit curves and surfaces to data and viewplots. You can:
Create, plot, and compare multiple fits
Use linear or nonlinear regression, interpolation,local smoothing regression, or custom equations
View goodness-of-fit statistics, display confidenceintervals and residuals, remove outliers and assess fits with validationdata
Automatically generate code for fitting and plottingsurfaces, or export fits to workspace for further analysis

Related

Non-symbolic derivative at all sample points including boundary points

Suppose I have a vector t = [0 0.1 0.9 1 1.4], and a vector x = [1 3 5 2 3]. How can I compute the derivative of x with respect to time that has the same length as the original vectors?
I should not use any symbolic operations. The command diff(x)./diff(t) does not produce a vector of the same length. Should I first interpolate the x(t) function and then take its derivative?
Different approaches exist to calculate the derivative at the same points as your initial data:
Finite differences: Use a central difference scheme at your inner points and a forward/backward scheme at your first/last point
or
Curve fitting: Fit a curve through your points, calculate the derivative of this fitted function and sample them at the same points as the original data. Typical fitting functions are polynomials or spline functions.
Note that the curve fitting approach gives better results, but needs more tuning options and is slower (~100x).
Demonstration
As an example, I will calculate the derivative of a sine function:
t = 0:0.1:1;
y = sin(t);
Its exact derivative is well known:
dy_dt_exact = cos(t);
The derivative can approximately been calculated as:
Finite differences:
dy_dt_approx = zeros(size(y));
dy_dt_approx(1) = (y(2) - y(1))/(t(2) - t(1)); % forward difference
dy_dt_approx(end) = (y(end) - y(end-1))/(t(end) - t(end-1)); % backward difference
dy_dt_approx(2:end-1) = (y(3:end) - y(1:end-2))./(t(3:end) - t(1:end-2)); % central difference
or
Polynomial fitting:
p = polyfit(t,y,5); % fit fifth order polynomial
dp = polyder(p); % calculate derivative of polynomial
The results can be visualised as follows:
figure('Name', 'Derivative')
hold on
plot(t, dy_dt_exact, 'DisplayName', 'eyact');
plot(t, dy_dt_approx, 'DisplayName', 'finite difference');
plot(t, polyval(dp, t), 'DisplayName', 'polynomial');
legend show
figure('Name', 'Error')
hold on
plot(t, abs(dy_dt_approx - dy_dt_exact)/max(dy_dt_exact), 'DisplayName', 'finite difference');
plot(t, abs(polyval(dp, t) - dy_dt_exact)/max(dy_dt_exact), 'DisplayName', 'polynomial');
legend show
The first graph shows the derivatives itself and the second graph plots the relative errors made by both methods.
Discussion
One clearly sees that the curve fitting method gives better results than the finite differences, but it is ~100x slower. The curve fitting methods has a relative error of order 10^-5. Note that the finite differences approach becomes better when your data is sampled more densely or you use a higher order scheme. The disadvantage of the curve fitting approach is that one has to choose a good polynomial order. Spline functions may be better suited in general.
A 10x faster sampled dataset, i.e. t = 0:0.01:1;, results in the following graphs:

MATLAB Fit a line to a histogram

I am just wondering, how would I go about fitting a line to histogram, using the z-counts as weights? An example of this is shown below (although this post just discusses overlaying multiple plots), taken from Scatter plot with density in Matlab).
My initial thought is to make an array consisting of each pixel from the density plot, repeated n times to make a scatter plot (n == the number of counts), then do a linear polyfit. This seems awfully redundant though.
The other approach is to do a weighted least squares solution. You need the (x,y) location of each pixel and the number of counts n within each pixel. Then, I think that you'd do the weighted least-squares this way:
%gather your known data...have x,y, and n all in the same order as each other
A = [x(:) ones(length(x),1)]; %here are the x values from your histogram
b = y(:); %here are the y-values from your histogram
C = diag(n(:)); %counts from each pixel in your 2D histogram
%Define polynomial coefficients as p = [slope; y_offset];
%usual least-squares solution...written here for reference
% b = A*p; %remember, p = [slope; y_offset];
% p = inv(A'*A)*(A'*b); %remember, p = [slope; y_offset];
%We want to apply a weighting matrix, so incorporate the weighting matrix
% A' * b = A' * C * A * p;
p = inv(A' * C * A)*(A' * b); %remember, p = [slope; y_offset];
The biggest uncertainty for me with this solution is whether the C matrix should be made up of n or n.^2, I can never remember. Hopefully, someone can correct me in the comments, if needed.
If you have the original data, which is a collection of (x,y) points, you simply do a polyfit on the original data:
p = polyfit(x(:),y(:),1); %linear fit
That will give you a best fit (in the least-squares sense) to the original data, which is what you want.
If you do not have the original data, and you only have the 2D histogram, the approach that you defined (which basically recreates a facsimile of the original data) will give a similar answer as if you did the polyfit on the original data.

Using nlinfit to fit Gaussian to x,y paired data

I am trying to use Matlab's nlinfit function to estimate the best fitting Gaussian for x,y paired data. In this case, x is a range of 2D orientations and y is the probability of a "yes" response.
I have copied #norm_funct from relevant posts and I'd like to return a smoothed, normal distribution that best approximates the observed data in y, and returns the magnitude, mean and SD of the best fitting pdf. At the moment, the fitted function appears to be incorrectly scaled and less than smooth - any help much appreciated!
x = -30:5:30;
y = [0,0.20,0.05,0.15,0.65,0.85,0.88,0.80,0.55,0.20,0.05,0,0;];
% plot raw data
figure(1)
plot(x, y, ':rs');
axis([-35 35 0 1]);
% initial paramter guesses (based on plot)
initGuess(1) = max(y); % amplitude
initGuess(2) = 0; % mean centred on 0 degrees
initGuess(3) = 10; % SD in degrees
% equation for Gaussian distribution
norm_func = #(p,x) p(1) .* exp(-((x - p(2))/p(3)).^2);
% use nlinfit to fit Gaussian using Least Squares
[bestfit,resid]=nlinfit(y, x, norm_func, initGuess);
% plot function
xFine = linspace(-30,30,100);
figure(2)
plot(x, y, 'ro', x, norm_func(xFine, y), '-b');
Many thanks
If your data actually represent probability estimates which you expect come from normally distributed data, then fitting a curve is not the right way to estimate the parameters of that normal distribution. There are different methods of different sophistication; one of the simplest is the method of moments, which means you choose the parameters such that the moments of the theoretical distribution match those of your sample. In the case of the normal distribution, these moments are simply mean and variance (or standard deviation). Here's the code:
% normalize y to be a probability (sum = 1)
p = y / sum(y);
% compute weighted mean and standard deviation
m = sum(x .* p);
s = sqrt(sum((x - m) .^ 2 .* p));
% compute theoretical probabilities
xs = -30:0.5:30;
pth = normpdf(xs, m, s);
% plot data and theoretical distribution
plot(x, p, 'o', xs, pth * 5)
The result shows a decent fit:
You'll notice the factor 5 in the last line. This is due to the fact that you don't have probability (density) estimates for the full range of values, but from points at distances of 5. In my treatment I assumed that they correspond to something like an integral over the probability density, e.g. over an interval [x - 2.5, x + 2.5], which can be roughly approximated by multiplying the density in the middle by the width of the interval. I don't know if this interpretation is correct for your data.
Your data follow a Gaussian curve and you describe them as probabilities. Are these numbers (y) your raw data – or did you generate them from e.g. a histogram over a larger data set? If the latter, the estimate of the distribution parameters could be improved by using the original full data.

3D curvefitting

I have discrete regular grid of a,b points and their corresponding c values and I interpolate it further to get a smooth curve. Now from interpolation data, I further want to create a polynomial equation for curve fitting. How to fit 3D plot in polynomial?
I try to do this in MATLAB. I used Surface fitting toolbox in MATLAB (r2010a) to curve fit 3-dimensional data. But, how does one find a formula that fits a set of data to the best advantage in MATLAB/MAPLE or any other software. Any advice? Also most useful would be some real code examples to look at, PDF files, on the web etc.
This is just a small portion of my data.
a = [ 0.001 .. 0.011];
b = [1, .. 10];
c = [ -.304860225, .. .379710865];
Thanks in advance.
To fit a curve onto a set of points, we can use ordinary least-squares regression. There is a solution page by MathWorks describing the process.
As an example, let's start with some random data:
% some 3d points
data = mvnrnd([0 0 0], [1 -0.5 0.8; -0.5 1.1 0; 0.8 0 1], 50);
As #BasSwinckels showed, by constructing the desired design matrix, you can use mldivide or pinv to solve the overdetermined system expressed as Ax=b:
% best-fit plane
C = [data(:,1) data(:,2) ones(size(data,1),1)] \ data(:,3); % coefficients
% evaluate it on a regular grid covering the domain of the data
[xx,yy] = meshgrid(-3:.5:3, -3:.5:3);
zz = C(1)*xx + C(2)*yy + C(3);
% or expressed using matrix/vector product
%zz = reshape([xx(:) yy(:) ones(numel(xx),1)] * C, size(xx));
Next we visualize the result:
% plot points and surface
figure('Renderer','opengl')
line(data(:,1), data(:,2), data(:,3), 'LineStyle','none', ...
'Marker','.', 'MarkerSize',25, 'Color','r')
surface(xx, yy, zz, ...
'FaceColor','interp', 'EdgeColor','b', 'FaceAlpha',0.2)
grid on; axis tight equal;
view(9,9);
xlabel x; ylabel y; zlabel z;
colormap(cool(64))
As was mentioned, we can get higher-order polynomial fitting by adding more terms to the independent variables matrix (the A in Ax=b).
Say we want to fit a quadratic model with constant, linear, interaction, and squared terms (1, x, y, xy, x^2, y^2). We can do this manually:
% best-fit quadratic curve
C = [ones(50,1) data(:,1:2) prod(data(:,1:2),2) data(:,1:2).^2] \ data(:,3);
zz = [ones(numel(xx),1) xx(:) yy(:) xx(:).*yy(:) xx(:).^2 yy(:).^2] * C;
zz = reshape(zz, size(xx));
There is also a helper function x2fx in the Statistics Toolbox that helps in building the design matrix for a couple of model orders:
C = x2fx(data(:,1:2), 'quadratic') \ data(:,3);
zz = x2fx([xx(:) yy(:)], 'quadratic') * C;
zz = reshape(zz, size(xx));
Finally there is an excellent function polyfitn on the File Exchange by John D'Errico that allows you to specify all kinds of polynomial orders and terms involved:
model = polyfitn(data(:,1:2), data(:,3), 2);
zz = polyvaln(model, [xx(:) yy(:)]);
zz = reshape(zz, size(xx));
There might be some better functions on the file-exchange, but one way to do it by hand is this:
x = a(:); %make column vectors
y = b(:);
z = c(:);
%first order fit
M = [ones(size(x)), x, y];
k1 = M\z;
%least square solution of z = M * k1, so z = k1(1) + k1(2) * x + k1(3) * y
Similarly, you can do a second order fit:
%second order fit
M = [ones(size(x)), x, y, x.^2, x.*y, y.^2];
k2 = M\z;
which seems to have numerical problems for the limited dataset you gave. Type help mldivide for more details.
To make an interpolation over some regular grid, you can do like so:
ngrid = 20;
[A,B] = meshgrid(linspace(min(a), max(a), ngrid), ...
linspace(min(b), max(b), ngrid));
M = [ones(numel(A),1), A(:), B(:), A(:).^2, A(:).*B(:), B(:).^2];
C2_fit = reshape(M * k2, size(A)); % = k2(1) + k2(2)*A + k2(3)*B + k2(4)*A.^2 + ...
%plot to compare fit with original data
surfl(A,B,C2_fit);shading flat;colormap gray
hold on
plot3(a,b,c, '.r')
A 3rd-order fit can be done using the formula given by TryHard below, but the formulas quickly become tedious when the order increases. Better write a function that can construct M given x, y and order if you have to do that more than once.
This sounds like more of a philosophical question than specific implementation, specifically to bit - "how does one find a formula that fits a set of data to the best advantage?" In my experience that is a choice you have to make depending on what you're trying to achieve.
What defines "best" for you? For a data fitting problem you can keep adding more and more polynomial coefficients and making a better R^2 value... but will eventually "over fit" the data. A downside of high order polynomials is behavior outside the bounds of the sample data which you've used to fit your response surface - it can quickly go off in some wild direction which may not be appropriate for whatever it is you're trying to model.
Do you have insight into the physical behavior of the system / data you're fitting? That can be used as a basis for what set of equations to use to create a math model. My recommendation would be to use the most economical (simple) model you can get away with.

plot Quadric Surfaces in General Form in matlab

I have Quadric Surface equation
I know A,B,C...
How can I plot my equation in matlab?
Your best bet is to produce a 3D contour plot of your function with a single contour at the function value 0. To do this with reasonable accuracy, compute your function F at a number of points x, y, z as follows:
gv = linspace(-30,30,50); % adjust for appropriate domain
[xx yy zz]=meshgrid(gv, gv, gv);
F = A*xx.*xx + B*yy.*yy + C*zz.*zz+ ... etc
figure
isosurface(xx, yy, zz, F, 0)
The reason to do it this way is that your function is typically multi-valued- that is, for a given value of X and Y there may be two possible answers for Z. By doing things this way you effectively bypass that problem - instructing matlab to put a surface anywhere that the function is zero.
Note that I gave an arbitrary vector gv for the grid - that is, the points on which the function is evaluated. To get an accurate and visually pleasing result you probably need around 50 points in each dimension within the range over which a solution is possible (this may be different in the three dimensions);
For example, with
F = xx.^2 + 2*yy.^2 + 0.5*zz.^2 + .4*xx.*yy + .5*xx.*zz + .6*yy.*zz + 7*xx + 8*yy + 9*zz - 100;
You get the following figure: