How to scale data before fitting in matlab 2016b - matlab

I use the fit() function of matlab 2016b to fit a complicate exponential function to data x,y.
The x values are very small (1e-3) and the y values are in the order of 1.
The fitting works well, if I multiply x by a constant factor 1000 to 100000.
This effect is described in the mathworks online help in section "Using Normalize or Center and Scale".
The Normalize: 'on' setting fails here (1) even with very narrow upper and lower limits.
Is the Normalize option transparent for the user, or will I have to change my boundaries after enabling it?
For now I help my self by multiplying x and correct the coefficients in the fit results afterwards.
ScalingFactor = 1000;
[xData, yData] = prepareCurveData( myx * ScalingFactor, myy );
ft = fittype(..)
[fitresult, gof] = fit( xData, yData, ft, opts );
%% now correct the results
This does not look very elegant to me for a quite expensive software.
4 years ago there was a similar question on sx without a handy solution.
Is there a better solution today?
1) Perhaps the error message helps other users to find this question/answer some day, so I paste it here:
Error using fit>iFit (line 340)
Complex value computed by model function, fitting cannot continue.
Try using or tightening upper and lower bounds on coefficients.

You might consider placing the scaling factor inside the function itself as a fitted parameter. For example, instead of:
scaledX = x * 1000
y = f(scaledX)
you would instead directly fit to:
y = f(scalingFactor * x)
so that the fitting (hopefully) finds the optimum scaling factor for you.

Related

Higher order polynomial fitting is not so handy surprisingly

I have a simple question but was not able to fix it by myself. I want to use the MATLAB curvefitting toolbox and fit higher order polynomials. It works if I want to fit polynomials of order 1 to 9. But, to my surprise it does not work for polynomials with degree higher than 9. To make it simple, can you just see the following simple code which does not work for me, unfortunately.
l=1:0.01:10;y=l.^10;
[xData, yData] = prepareCurveData(l,y);
ft = fittype( 'poly10' );
[Fit, gof] = fit( xData, yData, ft, 'Normalize', 'on' );
Thanks in advance,
Babak
It might be surprising, but it is documented: List of Library Models for Curve and Surface Fitting. You can always use polyfit, but as per the warning it issues, once you start getting polynomials of that degree, the fit is likely to be problematic anyway.
This answer is some supplement to the Phil Goddard's answer.
There is no poly10 in the function fit. But there are at least two alternative ways to fit any degree of the polynomial: something like polyX, where x could be 1,2,...,M, (if it is necessary).
clc; clear;
%%data
l=1:0.01:10;y=l.^10;
[xData, yData] = prepareCurveData(l,y);
%%High degree polynomial fitting
%set the degree of the polunomial
Degree=10;
%Fit with customize option
%generate the cell array from 'x^Degree' to 'x^0'
syms x
Str=char(power(x,Degree:-1:0));
%set the fitting type & options, then call fit
HighPoly = fittype(strsplit(Str(10:end-3),','));
options = fitoptions('Normalize', 'off','Method','LinearLeastSquares','Robust','off');
[curve,gof] = fit(xData,yData,HighPoly,options)
%Polyfit with the degree of Degree
p = polyfit(xData,yData,Degree)
But both fit and polyfit show some warnings, in my humble opinion, it is due to the Runge's phenomenon, which is a problem of oscillation at the edges of an interval that occurs when using polynomial interpolation with polynomials of high degree over a set of equispaced interpolation points.
Discard the data in this situation or some similar ones, where the true function is polynomial with high degree, says something in the Pn[R], high degree polynomial is not recommended in the fitting of the complex function.
Edit: generalized the code.

How do I fit distributions to data sets in matlab?

I'm trying to find a fit to my data using matlab but I'm having a lot of trouble, here's what ive done so far:
A = load('homicide_crime.txt') % A is a two column array the first column is the year and the second column is crime in that year
norm_crime = (A(:,2)-mean(A(:,2)))/std(A(:,2));
[f,x]=hist(norm_crime,20);
plot(x,f/trapz(x,f))
y=normpdf(x,0,1);
hold on
plot(x,y)
This is the resulting plot
.
So i tried afterwards using the distribution fitter which gave me this.
Neither of these things look right since the peak aren't aligned and the fit is too small.
Here is the data set for anyone intrested
https://pastebin.com/CyddrN1R.
Any help is much appreciated.
Actually, I think you are confusing data transformation with distribution fitting.
DATA TRANSFORMATION
In this approach, data is manipulated through a non-linear transformation in order to achieve a perfect fit. This means that it forces your data to follow the chosen distribution rule. To accomplish this with a normal distribution, all you have to do is applying the following code:
A = load('homicide_crime.txt');
years = A(:,1);
crimes = A(:,2);
figure(),histfit(crimes);
rank = tiedrank(crimes);
p = rank ./ (numel(rank) + 1);
crimes_normal = norminv(p,0,1);
figure(),histfit(crimes_normal);
Using the following manipulation:
crimes_normal = (crimes - mean(crimes)) ./ std(crimes);
that can also be written as:
crimes_normal = zscore(crimes);
you modify your observations so that they have mu=0 and sigma=1, but this is far from making them perfectly fit a normal distribution.
DISTRIBUTION FITTING
In this approach, the parameters of the chosen distribution are calculated over the given dataset, and then random observations are drawn. On one side you have your empirical observations, and on the other side you have your fitted data. A goodness-of-fit test can finally tell you how well empirical observations fit the given distribution comparing them to theoretical observations.
Since your are working with a normal distribution, you know that it is fully described by two parameters: mu and sigma. Hence:
A = load('homicide_crime.txt');
years = A(:,1);
crimes_emp = A(:,2);
[mu,sigma] = normfit(crimes_emp);
% you can also use
% mu = mean(crimes);
% sigma = std(crimes);
% to achieve the same result
[f,x] = hist(crimes_emp);
crimes_the = normpdf(x,mu,sigma) .* max(f);
figure();
bar(x, (f ./ sum(f)));
hold on;
plot(x,crimes_the,'-r','LineWidth',2);
hold off;
And this returns something very close to the problem you originally noticed. As you can clearly see, without even running a Kolmogorov-Smirnov or an Anderson-Darling... your data doesn't fit a normal distribution well.
You can try a non-parametric density estimation method. I used kernel density estimation (KDE) with the default normal distribution as the kernel, to obtain the result as shown below. The Matlab command for the same is ksdensity() and documentation can be found here.
A = load('homicide_crime.txt') % load data
years = A(:,1);
values = A(:,2);
[f0,x0] = hist(values,100); % plot the actual histogram
[f1,x1,b1] = ksdensity(values); % KDE with automatically assigned bandwidth
[f2,x2,b2] = ksdensity(values,'Bandwidth',b1*0.6); % 60% of initial bandwidth (b1)
[f3,x3,b3] = ksdensity(values,'Bandwidth',b2*0.6); % 60% of previous bandwidth (b2) = 36% of initial bandwidth (b1)
[f4,x4,b4] = ksdensity(values,'Bandwidth',b3*0.6); % 60% of previous bandwidth (b3) = 21.6% of initial bandwidth (b1)
figure; hold on;
bar(years, f0/(sum(f0)*10) ); % scaled for visualization
plot(years, f1, 'y')
plot(years, f2, 'c')
plot(years, f3, 'g')
plot(years, f4, 'r','linewidth',3) % Final fit
In the code above, I first plot the histogram and then calculate the kde without any user specified bandwidth. This leads to an oversmooth fitting (yellow curve). With a few trials by reducing the bandwidth as only 60% of the previous iteration, I finally was able to get the closest fit (red curve). You can play around the bandwidth to get a still better desirable fit.

spline interpolation and its (exact) derivatives

Suppose I have the following data and commands:
clc;clear;
t = [0:0.1:1];
t_new = [0:0.01:1];
y = [1,2,1,3,2,2,4,5,6,1,0];
p = interp1(t,y,t_new,'spline');
plot(t,y,'o',t_new,p)
You can see they work quite fine, in the sense interpolating function matches the data points at the nodes fine. But my problem is, I need to compute the exact derivative of y (i.e., p function) w.r.t. time and plot it against the t vector. How can it be done? I shall not use diff commands, because I need to make sure the derivative function has the same length as t vector. Thanks a lot.
Method A: Using the derivative
This method calculates the actual derivative of the polynomial. If you have the curve fitting toolbox you can use:
% calculate the polynominal
pp = interp1(t,y,'spline','pp')
% take the first order derivative of it
pp_der=fnder(pp,1);
% evaluate the derivative at points t (or any other points you wish)
slopes=ppval(pp_der,t);
If you don't have the curve fitting toolbox you can replace the fnderline with:
% piece-wise polynomial
[breaks,coefs,l,k,d] = unmkpp(pp);
% get its derivative
pp_der = mkpp(breaks,repmat(k-1:-1:1,d*l,1).*coefs(:,1:k-1),d);
Source: This mathworks question. Thanks to m7913d for linking it.
Appendix:
Note that
p = interp1(t,y,t_new,'spline');
is a shortcut for
% get the polynomial
pp = interp1(t,y,'spline','pp');
% get the height of the polynomial at query points t_new
p=ppval(pp,t_new);
To get the derivative we obviously need the polynomial and can't just work with the new interpolated points. To avoid interpolating the points twice which can take quite long for a lot of data, you should replace the shortcut with the longer version. So a fully working example that includes your code example would be:
t = [0:0.1:1];
t_new = [0:0.01:1];
y = [1,2,1,3,2,2,4,5,6,1,0];
% fit a polynomial
pp = interp1(t,y,'spline','pp');
% get the height of the polynomial at query points t_new
p=ppval(pp,t_new);
% plot the new interpolated curve
plot(t,y,'o',t_new,p)
% piece-wise polynomial
[breaks,coefs,l,k,d] = unmkpp(pp);
% get its derivative
pp_der = mkpp(breaks,repmat(k-1:-1:1,d*l,1).*coefs(:,1:k-1),d);
% evaluate the derivative at points t (or any other points you wish)
slopes=ppval(pp_der,t);
Method B: Using finite differences
A derivative of a continuous function is at its base just the difference of f(x) to f(x+infinitesimal difference) divided by said infinitesimal difference.
In matlab, eps is the smallest difference possible with a double precision. Therefore after each t_new we add a second point which is eps larger and interpolate y for the new points. Then the difference between each point and it's +eps pair divided by eps gives the derivative.
The problem is that if we work with such small differences the precision of the output derivatives is severely limited, meaning it can only have integer values. Therefore we add values slightly larger than eps to allow for higher precisions.
% how many floating points the derivatives can have
precision = 10;
% add after each t_new a second point with +eps difference
t_eps=[t_new; t_new+eps*precision];
t_eps=t_eps(:).';
% interpolate with those points and get the differences between them
differences = diff(interp1(t,y,t_eps,'spline'));
% delete all differences wich are not between t_new and t_new + eps
differences(2:2:end)=[];
% get the derivatives of each point
slopes = differences./(eps*precision);
You can of course replace t_new with t (or any other time you want to get the differential of) if you want to get the derivatives at the old points.
This method is slightly inferior to method a) in your case, as it is slower and a bit less precise. But maybe it's useful to somebody else who is in a different situation.

how to graph error in parameters from polynomial fit - MATLAB

I have a list of data that I am trying to fit to a polynomial and I am trying to plot the 95% confidence bands for the parameters as well (in Matlab).
If my data are x and y
f=fit(x,y,'poly2')
plot(f,x,y)
ci=confint(f,0.95);
a_ci=ci(1,:);
b_ci=ci(2,:);
I do not know how to proceed after that to get the minimum and maximum band around my data. Does anyone know how to do that?
I can see that you have the curve fitting toolbox installed, which is good, because you need it for the following code to work.
Basic fit of example data
Let's define some example data and a possible fit function. (I could also have used poly2 here, but I wanted to keep it a bit more general.)
xdata = (0:0.1:1)'; % column vector!
noise = 0.1*randn(size(xdata));
ydata = xdata.^2 + noise;
f = fittype('a*x.^2 + b');
fit1 = fit(xdata, ydata, f, 'StartPoint', [1,1])
plot(fit1, xdata, ydata)
Side note: plot() is not our usual plot function, but a method of the cfit-object fit1.
Confidence intervals of the fitted parameters
Our fit uses the data to determine the coefficients a,b of the underlying model f(x)=ax2+b. You already did this, but for completeness here is how you can read out the uncertainty of the coefficients for any confidence interval. The coefficients are alphabetically ordered, which is why I can use ci(1,:) for a, and so on.
names = coeffnames(fit1) % check the coefficient order!
ci = confint(fit1, 0.95); % 2 sigma interval
a_ci = ci(1,:)
b_ci = ci(2,:)
By default, Matlab uses 2σ (0.95) confidence intervals. Some people (physicists) prefer to quote the 1σ (0.68) intervals.
Confidence and Prediction Bands
It's a good habit to plot confidence bands or prediction bands around the data – especially when the coefficients are correlated! But you should take a moment to think about which one of the two you want to plot:
Prediction band: If I take a new measurement value, where would I expect it to lie? In Matlab terms, this is called the “observation band”.
Confidence band: Where do I expect the true value to lie? In Matlab terms, this is called the “functional band”.
As with the coefficient’s confidence intervals, Matlab uses 2σ bands by default, and the physicists among us switch this to 1σ intervals. By its nature, the prediction band is bigger, because it is the combination of the error of the model (the confidence band!) and the error of the measurement.
There is a another destinction to make, one that I don’t fully understand. Both Matlab and Wikipedia make that distinction.
Pointwise: How big is the prediction/confidence band for a single measurement/true value? In virtually all cases I can think of, this is what you would want to ask as a physicist.
Simultaneous: How big do you have to make the prediction/confidence band if you want a set of all new measurements/all prediction points to lie within the band with a given confidence?
In my personal opinion, the “simultaneous band” is not a band! For a measurement with n points, it should be n individual error bars!
The prediction/confidence distinction and the pointwise/simultaneous distinction give you a total of four options for “the” band around the plot. Matlab makes the 2σ pointwise prediction band easily accessible, but what you seem to be interested in is the 2σ pointwise confidence band. It is a bit more cumbersome to plot, because you have to specify dummy data over which to evaluate the prediction band:
x_dummy = linspace(min(xdata), max(xdata), 100);
figure(1); clf(1);
hold all
plot(xdata,ydata,'.')
plot(fit1) % by default, evaluates the fit over the currnet XLim
% use "functional" (confidence!) band; use "simultaneous"=off
conf1 = predint(fit1,x_dummy,0.95,'functional','off');
plot(x_dummy, conf1, 'r--')
hold off
Note that the confidence band at x=0 equals the confidence interval of the fit-coefficient b!
Extrapolation
If you want to extrapolate to x-values that are not covered by the range of your data, you can evaluate the fit and the prediction/confidence band for a bigger range:
x_range = [0, 2];
x_dummy = linspace(x_range(1), x_range(2), 100);
figure(1); clf(1);
hold all
plot(xdata,ydata,'.')
xlim(x_range)
plot(fit1)
conf1 = predint(fit1,x_dummy,0.68,'functional','off');
plot(x_dummy, conf1, 'r--')
hold off

How to create 3D joint density plot MATLAB?

I 'm having a problem with creating a joint density function from data. What I have is queue sizes from a stock as two vectors saved as:
X = [askQueueSize bidQueueSize];
I then use the hist3-function to create a 3D histogram. This is what I get:
http://dl.dropbox.com/u/709705/hist-plot.png
What I want is to have the Z-axis normalized so that it goes from [0 1].
How do I do that? Or do someone have a great joint density matlab function on stock?
This is similar (How to draw probability density function in MatLab?) but in 2D.
What I want is 3D with x:ask queue, y:bid queue, z:probability.
Would greatly appreciate if someone could help me with this, because I've hit a wall over here.
I couldn't see a simple way of doing this. You can get the histogram counts back from hist3 using
[N C] = hist3(X);
and the idea would be to normalise them with:
N = N / sum(N(:));
but I can't find a nice way to plot them back to a histogram afterwards (You can use bar3(N), but I think the axes labels will need to be set manually).
The solution I ended up with involves modifying the code of hist3. If you have access to this (edit hist3) then this may work for you, but I'm not really sure what the legal situation is (you need a licence for the statistics toolbox, if you copy hist3 and modify it yourself, this is probably not legal).
Anyway, I found the place where the data is being prepared for a surf plot. There are 3 matrices corresponding to x, y, and z. Just before the contents of the z matrix were calculated (line 256), I inserted:
n = n / sum(n(:));
which normalises the count matrix.
Finally once the histogram is plotted, you can set the axis limits with:
xlim([0, 1]);
if necessary.
With help from a guy at mathworks forum, this is the great solution I ended up with:
(data_x and data_y are values, which you want to calculate at hist3)
x = min_x:step:max_x; % axis x, which you want to see
y = min_y:step:max_y; % axis y, which you want to see
[X,Y] = meshgrid(x,y); *%important for "surf" - makes defined grid*
pdf = hist3([data_x , data_y],{x y}); %standard hist3 (calculated for yours axis)
pdf_normalize = (pdf'./length(data_x)); %normalization means devide it by length of
%data_x (or data_y)
figure()
surf(X,Y,pdf_normalize) % plot distribution
This gave me the joint density plot in 3D. Which can be checked by calculating the integral over the surface with:
integralOverDensityPlot = sum(trapz(pdf_normalize));
When the variable step goes to zero the variable integralOverDensityPlot goes to 1.0
Hope this help someone!
There is a fast way how to do this with hist3 function:
[bins centers] = hist3(X); % X should be matrix with two columns
c_1 = centers{1};
c_2 = centers{2};
pdf = bins / (sum(sum(bins))*(c_1(2)-c_1(1)) * (c_2(2)-c_2(1)));
If you "integrate" this you will get 1.
sum(sum(pdf * (c_1(2)-c_1(1)) * (c_2(2)-c_2(1))))