Fit modified ERF to data in MATLAB - matlab

I'm using Matlab for data fitting for the first time and I cannot get it to fit my data properly. I have a few hundreds measured values which I normalise from 0-1 (see the linked image).
I then want to fit the data with a modified ERF namely: 0.5 + {0.5*[erf(x/(2*(t*d)^(1/2)))]}. I want to extrapolate the value for t hence I even tried assigning a value to d (which is a know constant anyway) and substitute the initial 0.5 with a constant of unknown value to be found: a + {0.5*[erf(x/(2*(t*6E-20)^(1/2)))]}. I also tried using ERFC instead of ERF.
However, I always get a very steep fitted curve which does not match my data.
I know that, given a fixed d, I should get a value for t of around 3-7 from Matlab as I can fit the data well in Excel (in a qualitative way, ie: by eye) with the given function and a value for t of 3-7.
I should probably mention that the fit in excel is done by finding the inflection point and using a slightly different equation to model the data above and below the inflection point. I also tried this method in Matlab but still could not make it work.
For some reason the fit from Matlab always returns the same value for t as the inserted start point and I always get the same steep curve no matter what method I use. My limits for t are +/-inf.
What am I doing wrong?
Thanks!
Giuseppe
Real data and fitted curve image

Related

Matlab - compute percentage residual of the fit against data

I use cftool to fit my data using a custom equation. I could see the fit and also the corresponding residual data of the fit. Since the residuals are absolute values, I am also interested in the percentage deviation with respect to data.
Of course, I could implement the equation and the compare it with data through a separate script. But, is there a more easier to check the percentage residuals?
I searched a bit and a way to obtain residuals is mentioned here.
Basically, export the fit to workspace using: Fit->Save to Workspace. Then, the percentage error can be computed as:
residual = z - fittedmodel(x,y);
percentageResidual = residual./z*100;

Implementation of Radon transform in Matlab, output size

Due to the nature of my problem, I want to evaluate the numerical implementations of the Radon transform in Matlab (i.e. different interpolation methods give different numerical values).
while trying to code my own Radon, and compare it to Matlab's output, I found out that my radon projection sizes are different than Matlab's.
So a bit of intuition of how I compute the amount if radon samples needed. Let's do the 2D case.
The idea is that the maximum size would be when the diagonal (in a rectangular shape at least) part is proyected in the radon transform, so diago=sqrt(size(I,1),size(I,2)). As we dont wan nothing out, n_r=ceil(diago). n_r should be the amount of discrete samples of the radon transform should be to ensure no data is left out.
I noticed that Matlab's radon output is always even, which makes sense as you would want a "ray" through the rotation center always. And I noticed that there are 2 zeros in the endpoints of the array in all cases.
So in that case, n_r=ceil(diago)+mod(ceil(diago)+1,2)+2;
However, it seems that I get small discrepancies with Matlab.
A MWE:
% Try: 255,256
pixels=256;
I=phantom('Modified Shepp-Logan',pixels);
rd=radon(I,pi/4);
size(rd,1)
s=size(I);
diagsize=sqrt(sum(s.^2));
n_r=ceil(diagsize)+mod(ceil(diagsize)+1,2)+2
rd=
367
n_r =
365
As Matlab's Radon transform is a function I can not look into, I wonder why could it be this discrepancy.
I took another look at the problem and I believe this is actually the right answer. From the "hidden documentation" of radon.m (type in edit radon.m and scroll to the bottom)
Grandfathered syntax
R = RADON(I,THETA,N) returns a Radon transform with the
projection computed at N points. R has N rows. If you do not
specify N, the number of points the projection is computed at
is:
2*ceil(norm(size(I)-floor((size(I)-1)/2)-1))+3
This number is sufficient to compute the projection at unit
intervals, even along the diagonal.
I did not try to rederive this formula, but I think this is what you're looking for.
This is a fairly specialized question, so I'll offer up an idea without being completely sure it is the answer to your specific question (normally I would pass and let someone else answer, but I'm not sure how many readers of stackoverflow have studied radon). I think what you might be overlooking is the floor function in the documentation for the radon function call. From the doc:
The radial coordinates returned in xp are the values along the x'-axis, which is
oriented at theta degrees counterclockwise from the x-axis. The origin of both
axes is the center pixel of the image, which is defined as
floor((size(I)+1)/2)
For example, in a 20-by-30 image, the center pixel is (10,15).
This gives different behavior for odd- or even-sized problems that you pass in. Hence, in your example ("Try: 255, 256"), you would need a different case for odd versus even, and this might involve (in effect) padding with a row and column of zeros.

Differentiating a Centred and Scaled Polyfit Fit

I have some data which I wish to model in order to be able to get relatively accurate values in the same range as the data.
To do this I used polyfit to fit a 6th order polynomial and due to my x-axis values it suggested I centred and scaled it to get a more accurate fit which I did.
However, now I want to find the derivative of this function in order to model the velocity of my model.
But I am not sure how the polyder function interacts with the scaled and fitted polyfit which I have produced. (I don't want to use the unscaled model as this is not very accurate).
Here is some code which reproduces my problem. I attempted to rescale the x values before putting them into the fit for the derivative but this still did no fix the problem.
x = 0:100;
y = 2*x.^2 + x + 1;
Fit = polyfit(x,y,2);
[ScaledFit,s,mu] = polyfit(x,y,2);
Deriv = polyder(Fit);
ScaledDeriv = polyder(ScaledFit);
plot(x,polyval(Deriv,x),'b.');
hold on
plot(x,polyval(ScaledDeriv,(x-mu(1))/mu(2)),'r.');
Here I have chosen a simple polynomial so that I could fit it accurate and produce the actual derivative.
Any help would be greatly appreciated thanks.
I am using Matlab R2014a BTW.
Edit.
Just been playing about with it and by dividing the resulting points for the differential by the standard deviation mu(2) it gave a very close result within the range -3e-13 to about 5e-13.
polyval(ScaledDeriv,(x-mu(1))/mu(2))/mu(2);
Not sure quite why this is the case, is there another more elegant way to solve this?
Edit2. Sorry for another edit but again was mucking around and found that for a large sample x = 1:1000; the deviation became much bigger up to 10. I am not sure if this is due to a bad polyfit even though it is centred and scaled or due to the funny way the derivative is plotted.
Thanks for your time
A simple application of the chain rule gives
Since by definition
it follows that
Which is exactly what you have verified numerically.
The lack of accuracy for large samples is due to the global, rather then local, Lagrange polynomial interpolation which you have done. I would suggest that you try to fit your data with splines, and obtain the derivative with fnder(). Another option is to apply the polyfit() function locally, i.e. to a moving small set of points, and then apply polyder() to all the fitted polynomials.

Matlab gmdistribution.fit only finding one peak

I have an array named Area, which contains a set of values.
The histogram of the array looks like this
The bin width is 60 in this case. I'd like to fit two gaussians to the two peaks here (even if it won't be a great fit).
So I used:
options = statset('Display','final');
obj = gmdistribution.fit(area,2,'Options',options);
gausspdf = pdf(obj, xaxis);
A = sum(gausspdf);
gausspdf = gausspdf/A;
But when I try to plot the two fitted Gaussians, the resulting curve looks like this:
I'm quite confused, as there should be two peaks appearing in the plot?
The gmdistribution.fit method fits data according to maximum-likelihood criterion; that is, it tries to find parameters which maximize the likelihood given the data. It will not necessarily fit what you see or expect visually. Still, there is the possibility that the algorithm converged to a "bad" local minimum. You can try and set the initial conditions according to what you want to get, practically 'helping' the algorithm to converge to the desired result. You do this using the Start option to the fit method, which enables you to give it either an initial guess, in which case you should try and estimate the parameters from the histogram, or an initial component index for each data sample. See the documentation for more details.
I think that your peaks are too close and the function can't distinguish them. so maybe you should change the options for gmdistribution or apply a non-linear function to your data first to get more separate peaks in histogram.

Detect incorrect points in a homogeneous surface

In my project i have hige surfaces of 20.000 points computed by a algorithm. This algorithm, sometimes, has an error, computing 1 or more points in an small area incorrectly.
This error can not be solved in the algorithm, but needs to be detected afterwards.
The error can be seen in the next figure:
As you can see, there is a point wrongly computed that not only breaks the full homogeneous surface, but also destroys the aestetics of the plot (wich is also important in the project.)
Sometimes it can be more than a point, in general no more than 5 or 6. The error is allways the Z axis, so no need to check X and Y
I have been squeezing my mind to find a bit "generic" algorithm to detect this poitns.
I thougth that maybe taking patches of surface and meaning the Z, then detecting the points out of the variance... but I dont think it will work allways.
Any ideas?
NOTE: I dont want someone to write code for me, just an idea.
PD: relevant code for the avobe image:
[x,y] = meshgrid([-2:.07:2]);
Z = x.*exp(-x.^2-y.^2);
subplot(1,2,1)
surf(x,y,Z,gradient(Z))
subplot(1,2,2)
Z(35,35)=Z(35,35)+0.3;
surf(x,y,Z,gradient(Z))
The standard trick is to use a Laplacian, looking for the largest outliers. (This is not unlike what Mohsen posed for an answer, but is actually a bit easier.) You could even probably do it with conv2, so it would be pretty efficient.
I could offer a few ways to implement the idea. A simple one is to use my gridfit tool, found on the File Exchange. (Gridfit essentially uses a Laplacian for its smoothing operation.) Fit the surface with all points included, then look for the single point that was perturbed the most by the fit. Exclude it, then rerun the fit, again looking for the largest outlier. (With gridfit, you can use weights to give points a zero weight, a simple way to exclude a point or list of points.) When the largest perturbation that was needed is small enough, you can decide to stop the process. A nice thing is gridfit will also impute new values for the outliers, filling in all of the holes.
A second approach is to use the Laplacian directly, in more of a filtering approach. Here, you simply compute a value at each point that is the average of each neighbor to the left, right, above, and below. The single value that is most largely in disagreement with its computed average is replaced with a new value. Or, you can use a weighted average of the new value with the old one there. Again, iterate until the process does not generate anything larger than some tolerance. (This is the basis of an old outlier detection and correction scheme that I recall from the Fortran IMSL libraries, but probably dates back to roughly 30 years ago.)
Since your functions seems to vary smoothly these abrupt changes can be detected by looking into the derivatives. You can
Take the derivative in one direction
Calculate mean and standard deviation of derivative
Find the points by looking for points that are further from mean by certain multiple of standard deviation.
Here is the code
U=diff(Z);
V=(U-mean(U(:)))/std(U(:));
surf(x(2:end,:),y(2:end,:),V)
V=[zeros(1,size(V,2)); V];
V(abs(V)<10)=0;
V=sign(V);
W=cumsum(V);
[I,J]=find(W);
outliers = [I, J];
For your example you get this plot for V with a peak at around 21.7 while second peak is at around 1.9528, so maybe a threshold of 10 is ok.
and running the code returns
outliers =
35 35
The need for cumsum is for the cases that you have a patch of points next to each other that are incorrect.