How to evaluate curve fitting in MATLAB

I'm using MATLAB to analyse some data, and for that I need curve fitting. I wrote this code based on the documentation:
% I is a 14-point vector whose values change in a loop
y =0:13;
[p,S] = polyfit(I,y,1);
[fx, delta] = polyval(p,I,S);
plot(y,I,'+',fx,I,'-');
Here is what I get:
My question is: how can I evaluate this fit (how good is it), and how can I get the slope of the line?
UPDATE
After Rafaeli's answer, I had some trouble understanding the results. Since fx is the fitted curve for y as a function of I, I get for fx:
-1.0454 3.0800 4.3897 6.5324 4.0947 3.8975 4.3476 9.0088 5.8307 6.7166 9.8243 11.4009 11.9223
whereas the values of I are:
0.0021 0.0018 0.0017 0.0016 0.0018 0.0018 0.0017 0.0014 0.0016 0.0016 0.0014 0.0012 0.0012 0.0013
and the plot shows exactly the values of I:
So the result I hope to get should be close to those values. I tried switching the arguments:
[p,S] = polyfit(y,I,1);
but it wasn't any better: fx = 0.0020. So my question is, how can I do that?
2nd UPDATE
Got it, here is the code:
y = 0:13
p = polyfit(y,I,1)
fx = polyval(p,y);
plot(y,I,'+',y,fx,'o')
here is the result :
thanks for any help !

The line is defined by y = ax + b, where a = p(1) and b = p(2), so the slope is p(1).
A simple way to judge how good the fit is is to take the root mean square of the error: rms(fx - I). The smaller the value, the better the fit.
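The two quantities above can be computed without any toolbox, as in the minimal sketch below. The vector I here is made-up data standing in for the poster's measurements, and the R^2 value is an extra, commonly used goodness-of-fit measure that the answer itself does not mention.

```matlab
% Minimal sketch: slope and goodness of fit for a first-order polyfit.
% The data in I is invented for illustration; replace it with real measurements.
y = 0:13;
I = 0.0021 - 0.00007*y + 0.00005*sin(3*y);

p  = polyfit(y, I, 1);        % p(1) = slope, p(2) = intercept
fx = polyval(p, y);           % fitted values at the same abscissae

slope = p(1);
rmse  = sqrt(mean((fx - I).^2));      % same quantity as rms(fx - I)

% Coefficient of determination (assumption: added here as a scale-free measure)
ssRes = sum((I - fx).^2);
ssTot = sum((I - mean(I)).^2);
r2    = 1 - ssRes/ssTot;

fprintf('slope = %g, RMSE = %g, R^2 = %g\n', slope, rmse, r2);
```

The RMSE is in the units of I, so it is only meaningful relative to the spread of the data; R^2 close to 1 indicates that the line explains most of the variation.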


Strange result of spap2

I encounter strange results from spap2 on some data:
The actual data is the blue curve, red circles are the knots I am using and yellow curve is the display of the cubic spline curve.
The code is quite simple, and I cannot figure out what the problem is:
spgood = spap2(knots_zY, 4, ec, Y);
plot(ec, Y);
hold on;
scatter(knots_zY, interp1(ec, Y, knots_zY)); % data values at the knots (ec==knots_zY fails for vectors of different length)
fnplt(spgood)
ec is the vector -4.12:0.02:-0.54.
Y is the following vector:
4.1291 4.0732 4.0173 4.2624 4.3826 4.3267 4.2708 4.4367 4.3808 4.1031 4.1721 3.8152 4.1572
4.1013 4.0454 3.5916 3.8367 3.7808 3.8218 3.6690 3.9141 3.7333 3.8023 3.3204 3.5656 3.4305
3.5787 3.3978 3.3419 3.2860 3.4062 3.4753 3.5706 3.2385 3.1826 3.4947 3.5315 3.1746 3.2089
3.2276 3.1940 2.9162 3.0364 3.0263 2.8155 2.7596 2.9555 2.8996 2.9081 2.7322 2.8524 2.6397
2.7662 2.5279 2.5417 2.2005 2.3409 2.5108 2.5202 2.3359 2.3660 2.3100 2.1682 2.1123 2.2140
2.1288 2.1116 1.9856 2.0089 1.8845 1.9148 1.9308 1.7273 1.7642 1.7326 1.6606 1.7378 1.6570
1.5815 1.5701 1.4630 1.5503 1.5181 1.4385 1.3083 1.3168 1.2991 1.2523 1.1390 0.9988 1.0373
0.9913 1.0113 0.9754 0.8912 0.8790 0.7491 0.7557 0.7544 0.7119 0.7031 0.6843 0.6418 0.5938
0.5193 0.5334 0.4312 0.4839 0.4437 0.3992 0.3689 0.3287 0.3348 0.3076 0.2274 0.2174 0.1970
0.2188 0.1760 0.1384 0.1773 0.1342 0.1388 0.1097 0.0830 0.0782 0.0725 0.0863 0.0581 0.0466
0.0398 0.0431 0.0187 0.0187 0.0176 0.0167 0.0231 0.0033 -0.0117 -0.0016 0.0084 -0.0055 -0.0120
-0.0080 -0.0064 -0.0075 -0.0134 -0.0075 0.0012 -0.0077 -0.0024 0.0006 0.0010 0.0043 0.0016 0.0018
0.0042 0.0030 0.0029 0.0029 0.0021 0.0013 -0.0002 -0.0020 -0.0030 -0.0032 -0.0002 -0.0013 0.0035
0.0028 -0.0000 -0.0057 -0.0032 0.0020 0.0597 0.1835 0.5083 1.0275 1.6448 3.0549
The knots are defined with the following 12 values:
-4.1200 -3.9400 -3.5400 -3.3000 -3.1400 -2.6800 -2.3600 -2.0600 -1.5000 -1.1600 -0.7000 -0.5400
I don't expect a nice fit, but I expected the spline fit to at least stick to the knots. Instead, the result is completely wrong. I am stuck with this, unable to see what the problem is with this data sample.
Note: the knots are computed in a separate algorithm and should be used for the interpolator, getting a good fit is not the question here. The question is why the spline fit does not pass through the knots.
I have made several errors.
First, it's a mistake to assume that the resulting spline will pass through the knots, as it is an approximation (see this answer). The approximation smooths the whole original data set, so there is no reason for it to stick to the knots.
Second, I forgot to extend the end knots to impose boundary conditions. The default boundary condition is for all derivatives (including the 0th-order one) to be zero, which results in this shape. The solution is then to use augknt to get an actual cubic spline with two continuous derivatives:
spgood = spap2(augknt(knots_zY,4), 4, ec, Y);
The resulting fit is:
which is way better, given the choice of the knot sequence.
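What augknt does to the knot sequence can be seen on a small example. The sketch below uses made-up data (a sine sampled on [0, 1]) and made-up breaks, not the vectors from the question, and it assumes the Curve Fitting Toolbox is available for augknt, spap2 and fnval.

```matlab
% Minimal sketch: least-squares cubic spline with augmented end knots.
% x, y and breaks are illustrative values only.
x = linspace(0, 1, 50);
y = sin(2*pi*x);
breaks = [0 0.25 0.5 0.75 1];

k = 4;                          % order 4 = cubic
knots = augknt(breaks, k);      % first and last break repeated to multiplicity k
disp(knots);

sp    = spap2(knots, k, x, y);  % least-squares spline approximation
resid = y - fnval(sp, x);       % residuals at the data sites
fprintf('max |residual| = %g\n', max(abs(resid)));
```

With the end knots repeated k times, the basic interval of the spline covers the whole data range, so the fit is no longer forced to zero at the ends.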

Selecting the right essential matrix

I want to code the SfM pipeline myself in MATLAB because I need some outputs that the OpenCV functions don't provide. However, I'm using OpenCV for comparison.
The OpenCV function [E,mask] = cv.findEssentialMat(points1, points2, 'CameraMatrix',K, 'Method','Ransac'); provides the essential matrix solution using Nister's five-point algorithm and RANSAC.
The inlier indices are found using: InliersIndices=find(mask>0);
I used this MATLAB implementation of Nister's algorithm:
Fivepoint_algoithm_code
The call to the function is as follows:
[E_all, R_all, t_all, Eo_all] = five_point_algorithm( pts1, pts2, K, K);
The algorithm outputs up to 10 solutions of essential matrices. However, I encountered the following issues:
1. The implementation above is only for perfect correspondences (without RANSAC). I pass it 5 correspondences selected with InliersIndices, but the essential matrices it outputs (up to 10) are all different from the one returned by OpenCV.
2. All the returned essential matrices should be solutions, so why don't I obtain the same 3D points when I triangulate with each one using the function below?
3. How do I choose the right essential matrix solution?
I triangulate using the function of matlab toolbox
Projection matrices:
P1=K*[eye(3) [0;0;0]];
P2=K*[R_all{i} t_all{i}];
[pts3D,rep_error] = triangulate(pts1', pts2', P1',P2');
Edit
The returned E from [E,mask] = cv.findEssentialMat(points1, points2, 'CameraMatrix',K, 'Method','Ransac');
E =
0.0052 -0.7068 0.0104
0.7063 0.0050 -0.0305
-0.0113 0.0168 0.0002
For the 5-point MATLAB implementation, 5 random indices are taken from the inliers, so:
pts1 =
736.7744 740.2372 179.2428 610.5297 706.8776
112.2673 109.9687 45.7010 91.4371 87.8194
pts2 =
722.3037 725.3770 150.3997 595.3550 692.5383
111.7898 108.6624 43.6847 90.6638 86.8139
K =
723.3631 7.9120 601.7643
-3.8553 719.6517 182.0588
0.0075 0.0044 1.0000
and 4 solutions are returned:
E1 =
-0.2205 0.9436 -0.1835
0.8612 0.2447 -0.1531
0.4442 -0.0600 -0.0378
E2 =
-0.2153 0.9573 0.1626
0.8948 0.2456 -0.3474
0.1003 0.1348 -0.0306
E3 =
0.0010 -0.9802 -0.0957
0.9768 0.0026 -0.1912
0.0960 0.1736 -0.0019
E4 =
-0.0005 -0.9788 -0.1427
0.9756 0.0021 -0.1658
0.1436 0.1470 -0.0030
Edit2:
pts1 and pts2, triangulated using the essential matrix E with the R and t returned by [R, t] = cv.recoverPose(E, p1, p2,'CameraMatrix',K);
X1 =
-0.0940 0.0478 -0.4984
-0.0963 0.0497 -0.4987
0.3033 0.1009 -0.5202
-0.0065 0.0636 -0.5053
-0.0737 0.0653 -0.5011
with
R =
-0.9977 -0.0063 0.0670
0.0084 -0.9995 0.0305
0.0667 0.0310 0.9973
and
t =
0.0239
0.0158
0.9996
When triangulated with the Matlab code, the chosen solution is E_all{2}
R_all{2}=
-0.8559 -0.2677 0.4425
-0.1505 0.9475 0.2821
-0.4948 0.1748 -0.8512
and
t_all{2}=
-0.1040
-0.1355
0.9853
X2 =
0.1087 -0.0552 0.5762
0.1129 -0.0578 0.5836
0.4782 0.1582 -0.8198
0.0028 -0.0264 0.2099
0.0716 -0.0633 0.4862
When doing
X1./X2
ans =
-0.8644 -0.8667 -0.8650
-0.8524 -0.8603 -0.8546
0.6343 0.6376 0.6346
-2.3703 -2.4065 -2.4073
-1.0288 -1.0320 -1.0305
There is an almost constant scale factor between triangulated 3D points.
However, rotation matrices are different and there is no scale factor between translations.
t./t_all{2}=
-0.2295
-0.1167
1.0145
which makes the plotted trajectory wrong
Answering your numbered questions:
1. Beware that Nister's 5-point algorithm has many implementations, but most of them don't work well. Personal experience and unpublished work by colleagues show that OpenCV does not have a good implementation. The open implementations in Bundler and other working SfM pipelines work better in practice (but there is a lot of room for improvement).
2. The 10 solutions are simply zeros of a certain polynomial equation. As far as the polynomial equation can describe the problem, these 10 solutions all zero the equation. The equation does not require that these 10 solutions are real, or that the 3D points corresponding to the 5 point correspondences are the same for each solution. It only requires that there are some 3D points (for each solution) that project to the 5 points, without even considering whether the 3D points are in front of the respective cameras. Moreover, there may well be two sets of 3D points and cameras that happen to generate the same images of 5 points, so you have to weed them out with some other procedure (below).
3. The right solution among the 10 complex solutions is usually chosen by combining several techniques:
   - Discard solutions that would lead to purely complex points or to 3D points with negative depth (currently Bundler does not do this last check).
   - Discard solutions that are not physical for some other reason (you may have to do some of that yourself for your application).
   - The more usual procedure: for each remaining solution, check which one is more consistent with additional correspondences. In a real system you don't know which additional correspondences are right and which are pure trash, so run RANSAC for each of the solutions and keep the one with the most inliers. This is computationally heavy, so it should be used as a last resort.
You can see how Bundler does this at file 5point.c line 668:
generate_Ematrix_hypotheses(5, r_pts_inner, l_pts_inner, &num_hyp, E);
for (i = 0; i < num_hyp; i++) {
int best_inlier;
double score = 0.0;
double E2[9], tmp[9], F[9];
memcpy(E2, E + 9 * i, 9 * sizeof(double));
E2[0] = -E2[0];
E2[1] = -E2[1];
E2[3] = -E2[3];
E2[4] = -E2[4];
E2[8] = -E2[8];
matrix_transpose_product(3, 3, 3, 3, K2_inv, E2, tmp);
matrix_product(3, 3, 3, 3, tmp, K1_inv, F);
inliers = evaluate_Ematrix(n, r_pts, l_pts, // r_pts_norm, l_pts_norm,
thresh_norm, F, // E + 9 * i,
&best_inlier, &score);
if (inliers > max_inliers ||
(inliers == max_inliers && score < min_score)) {
best = 1;
max_inliers = inliers;
min_score = score;
memcpy(E_best, E + 9 * i, sizeof(double) * 9);
r_best = r_pts_norm[best_inlier];
l_best = l_pts_norm[best_inlier];
}
inliers_hyp[i] = inliers;
}
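The negative-depth check mentioned in the answer can be sketched in MATLAB as follows. This is a minimal illustration on synthetic, noise-free data in normalized image coordinates (K already removed). R_true, t_true, X and the two candidate poses are made-up values, not the poster's data, and the triangulation is a plain linear (DLT) solve rather than the toolbox triangulate function.

```matlab
% Minimal sketch: keep the candidate pose that puts the most points in front
% of both cameras. All numeric values below are invented for illustration.
a = 0.1;
R_true = [cos(a) 0 sin(a); 0 1 0; -sin(a) 0 cos(a)];
t_true = [1; 0; 0];
X  = [-1 0 1 0.5 -0.5; 0.5 -0.5 0 1 -1; 5 6 7 8 9];   % 3xN points, z > 0
X2 = R_true*X + repmat(t_true, 1, size(X, 2));         % points in camera-2 frame
x1 = bsxfun(@rdivide, X(1:2,:),  X(3,:));              % normalized image points
x2 = bsxfun(@rdivide, X2(1:2,:), X2(3,:));

P1 = [eye(3) zeros(3,1)];
cands = {R_true, t_true; R_true, -t_true};             % two hypothetical candidates
nFront = zeros(size(cands, 1), 1);
for c = 1:size(cands, 1)
    P2 = [cands{c,1} cands{c,2}];
    for j = 1:size(x1, 2)
        % Linear triangulation: each view contributes two rows of A*Xh = 0
        A = [x1(1,j)*P1(3,:) - P1(1,:);
             x1(2,j)*P1(3,:) - P1(2,:);
             x2(1,j)*P2(3,:) - P2(1,:);
             x2(2,j)*P2(3,:) - P2(2,:)];
        [~, ~, V] = svd(A);
        Xh = V(:,4) / V(4,4);                          % homogeneous 3D point
        z1 = Xh(3);                                    % depth in camera 1
        z2 = P2(3,:) * Xh;                             % depth in camera 2
        nFront(c) = nFront(c) + (z1 > 0 && z2 > 0);
    end
end
[~, best] = max(nFront);
fprintf('points in front of both cameras per candidate: %s\n', mat2str(nFront'));
```

With noisy real data the count is rarely all-or-nothing, so the candidate with the largest count is kept and the remaining ambiguity is resolved with additional correspondences, as described in the answer.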

Dimensions Reduction in Matlab using PCA

I have a matrix with 35 columns and I'm trying to reduce the dimension using PCA. I run PCA on my data:
[coeff,score,latent,tsquared,explained,mu] = pca(data);
explained =
99.9955
0.0022
0.0007
0.0003
0.0002
0.0001
0.0001
0.0001
Then, looking at the vector explained, I noticed that the first element is 99.9955. Based on this, I decided to keep only the first component, so I did the following:
k=1;
X = bsxfun(@minus, data, mean(data)) * coeff(:, 1:k);
and then I used X for SVM training:
svmStruct = fitcsvm(X,Y,'Standardize',true, 'Prior','uniform','KernelFunction','linear','KernelScale','auto','Verbose',0,'IterationLimit', 1000000);
However, when I tried to predict and calculate the misclassification rate:
[label,score,cost] = predict(svmStruct, X);
the result was disappointing. When I select only one component (k=1), all classifications are wrong. However, as I increase the number of included components, k, the result improves, as you can see from the diagram below. This doesn't make sense according to explained, which indicates that I should be fine with only the first eigenvector.
Did I make a mistake?
This diagram shows the classification error as a function of the number of included eigenvectors:
This graph was generated after normalizing the data before PCA, as suggested by @zelanix:
This is also a plotted graph:
and these are the explained values obtained after normalizing before PCA:
>> [coeff,score,latent,tsquared,explained,mu] = pca(data_normalised);
Warning: Columns of X are linearly dependent to within machine precision.
Using only the first 27 components to compute TSQUARED.
> In pca>localTSquared (line 501)
In pca (line 347)
>> explained
explained =
32.9344
15.6790
5.3093
4.7919
4.0905
3.8655
3.0015
2.7216
2.6300
2.5098
2.4275
2.3078
2.2077
2.1726
2.0892
2.0425
2.0273
1.9135
1.8809
1.7055
0.8856
0.3390
0.2204
0.1061
0.0989
0.0334
0.0085
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
Parag S. Chandakkar is absolutely right that there is no reason to expect that PCA will automatically improve your classification result. It is an unsupervised method so is not intended to improve separability, only to find the components with the largest variance.
But there are some other problems with your code. In particular, this line confuses me:
X = bsxfun(@minus, data, mean(data)) * coeff(:, 1:k);
You need to normalise your data before performing PCA, and each feature needs to be normalised separately. I use the following:
data_normalised = data;
for f = 1:size(data, 2)
data_normalised(:, f) = data_normalised(:, f) - nanmean(data_normalised(:, f));
data_normalised(:, f) = data_normalised(:, f) / nanstd(data_normalised(:, f));
end
pca_coeff = pca(data_normalised);
data_pca = data_normalised * pca_coeff;
You can then extract the first principal component as data_pca(:, 1).
Also, always plot your PCA results to get an idea of what is actually going on:
figure
scatter(data_pca(Y == 1, 1), data_pca(Y == 1, 2))
hold on;
scatter(data_pca(Y == 2, 1), data_pca(Y == 2, 2))
PCA gives the direction of maximum variance in the data; it does not necessarily lead to better classification. If you want to reduce dimensionality while trying to maximize accuracy, you should use LDA.
The following picture illustrates exactly what I want to convey.
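The point that high explained variance does not imply class separability can be reproduced with a small synthetic example. The sketch below uses invented two-feature data (not the poster's 35-column matrix) and computes the principal components with svd so that it does not depend on the Statistics Toolbox pca function.

```matlab
% Minimal sketch: the first component explains almost all variance but carries
% no class information. All data here is synthetic and for illustration only.
rng(0);
n = 200;
Y = [ones(n,1); 2*ones(n,1)];
% Feature 1: huge variance, unrelated to the class.
% Feature 2: small variance, separates the two classes.
data = [1000*randn(2*n,1), [-1 + 0.1*randn(n,1); 1 + 0.1*randn(n,1)]];

centered  = bsxfun(@minus, data, mean(data));
[~, S, V] = svd(centered, 'econ');               % columns of V = principal directions
explained = 100 * diag(S).^2 / sum(diag(S).^2);  % percent variance per component
score     = centered * V;                        % data projected on the components

% Distance between class means along each component, in units of its std
gap = abs(mean(score(Y==1,:)) - mean(score(Y==2,:))) ./ std(score);

fprintf('explained (%%): %s\n', mat2str(explained', 6));
fprintf('class gap / std: %s\n', mat2str(gap, 3));
```

Here the first component dominates explained, yet the class means are far apart only along the second one, which is the situation a supervised method such as LDA is designed to handle.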

How to use Matlab for non linear least squares Michaelis–Menten parameters estimation

I have a set of measurements and started with a linear approximation (as in this plot). This MATLAB code gives a linear least squares estimate of the parameters V_{max} and K_{m}:
data=[2.0000 0.0615
2.0000 0.0527
0.6670 0.0334
0.6670 0.0334
0.4000 0.0138
0.4000 0.0258
0.2860 0.0129
0.2860 0.0183
0.2220 0.0083
0.2200 0.0169
0.2000 0.0129
0.2000 0.0087 ];
x = 1./data(:,1);
y = 1./data(:,2);
J = [x,ones(length(x),1)];
k = J\y;
vmax = 1/k(2);
km = k(1)*vmax;
lse = (vmax.*data(:,1))./(km+data(:,1));
plot(data(:,1),data(:,2),'o','color','red','linewidth',1)
line(data(:,1),lse,'linewidth',2)
This yields a fit that looks alright. Next, I wanted to do the same thing with non-linear least squares, but the fit always looks wrong. Here is the code for that attempt:
options = optimset('MaxIter',10000,'MaxFunEvals',50000,'FunValCheck',...
'on','Algorithm',{'levenberg-marquardt',.00001});
p=lsqnonlin(@myfun,[0.1424,2.5444]);
lse = (p(1).*data(:,1))./(p(2)+data(:,1));
plot(data(:,1),data(:,2),'o','color','red','linewidth',1)
line(data(:,1),lse,'linewidth',2)
which requires this function in an M-File:
function F = myfun(x)
F = data(:,2)-(x(1).*data(:,1))./x(2)+data(:,1);
If you run the code you will see my problem. But hopefully, unlike me, you see what I'm doing wrong.
I think that you forgot some parentheses (some others are superfluous) in your nonlinear function. Using an anonymous function:
myfun = @(x)data(:,2)-x(1).*data(:,1)./(x(2)+data(:,1)); % Parentheses were missing
options = optimset('MaxIter',10000,'MaxFunEvals',50000,'FunValCheck','on',...
'Algorithm',{'levenberg-marquardt',.00001});
p = lsqnonlin(myfun,[0.1424,2.5444],[],[],options);
lse = p(1).*data(:,1)./(p(2)+data(:,1));
plot(data(:,1),data(:,2),'o','color','red','linewidth',1)
line(data(:,1),lse,'linewidth',2)
You also weren't actually applying any of your options.
You might look into using lsqcurvefit instead as it was designed for data fitting problems:
myfun = @(x,dat)x(1).*dat./(x(2)+dat);
options = optimset('MaxIter',10000,'MaxFunEvals',50000,'FunValCheck','on',...
'Algorithm',{'levenberg-marquardt',.00001});
p = lsqcurvefit(myfun,[0.1424,2.5444],data(:,1),data(:,2),[],[],options);
lse = myfun(p,data(:,1));
plot(data(:,1),data(:,2),'o','color','red','linewidth',1)
line(data(:,1),lse,'linewidth',2)
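If the Optimization Toolbox is not available, the same Michaelis-Menten model can be fitted by minimizing the sum of squared residuals with fminsearch, which ships with base MATLAB. This is a swapped-in technique (Nelder-Mead simplex search), not the Levenberg-Marquardt method used above; the names model and sse are introduced here for illustration, and the starting point is the one used in the question.

```matlab
% Minimal sketch: nonlinear least squares for v = Vmax*s/(Km + s) via fminsearch.
data = [2.0000 0.0615
        2.0000 0.0527
        0.6670 0.0334
        0.6670 0.0334
        0.4000 0.0138
        0.4000 0.0258
        0.2860 0.0129
        0.2860 0.0183
        0.2220 0.0083
        0.2200 0.0169
        0.2000 0.0129
        0.2000 0.0087];
s = data(:,1);
v = data(:,2);

model = @(p, s) p(1).*s ./ (p(2) + s);     % p(1) = Vmax, p(2) = Km
sse   = @(p) sum((v - model(p, s)).^2);    % objective: sum of squared residuals

p0 = [0.1424, 2.5444];                     % starting point from the question
p  = fminsearch(sse, p0);

fprintf('Vmax = %g, Km = %g, SSE = %g (start: %g)\n', p(1), p(2), sse(p), sse(p0));
```

Note that the whole denominator (p(2) + s) is parenthesized, which is exactly what was missing in the original myfun.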

Unexpected behaviour of function findpeaks in MATLAB's Signal Processing Toolbox

Edit: Actually this is not unexpected behaviour, but I still need a solution.
findpeaks compares each element of data to its neighboring values.
I have data containing peaks, which I detect with the function findpeaks from the Signal Processing Toolbox. Sometimes the function does not seem to detect a peak properly when the same value occurs twice in a row. This occurs very rarely in my data, but here is a sample to illustrate my problem:
>> values
values =
-0.0324
-0.0371
-0.0393
-0.0387
-0.0331
-0.0280
-0.0216
-0.0134
-0.0011
0.0098
0.0217
0.0352
0.0467
0.0548
0.0639
0.0740
0.0813
0.0858 <-- here should be another peak
0.0858 <--
0.0812
0.0719
0.0600
0.0473
0.0353
0.0239
0.0151
0.0083
0.0034
-0.0001
-0.0025
-0.0043
-0.0057
-0.0048
-0.0038
-0.0026
0.0007
0.0043
0.0062
0.0083
0.0106
0.0111
0.0116
0.0102
0.0089
0.0057
0.0025
-0.0025
-0.0056
Now the findpeaks function only finds one peak:
>> [pks loc] = findpeaks(values)
pks =
0.0116
loc =
42
If I plot the data, it becomes obvious that findpeaks misses one of the peaks at the location 18/19 because they both have the value 0.08579.
What is the best way to find those missing peaks?
If you have the image processing toolbox, you can use IMREGIONALMAX to find the peaks, after which you can use regionprops to find the center of the regions (if that's what you need), i.e.
bw = imregionalmax(signal);
peakLocations = find(bw); %# returns n peaks for an n-tuple of max-values
stats = regionprops(bw,'Centroid');
peakLocations = cat(1,stats.Centroid); %# returns the center of the n-tuple of max-values
This is an old topic, but maybe some are still looking for an easier solution to this (like I did today):
You could also just subtract some very small fixed value from all values on a plateau except the first one. This makes the first value on each plateau the highest on that plateau, so it is included as a peak.
Just make something like this part of your code:
peaks = yourdata;
verysmallvalue = .001;
plateauvalue = peaks(1);
for i = 2:size(peaks,1)
if peaks(i) == plateauvalue
peaks(i) = peaks(i) - verysmallvalue;
else
plateauvalue = peaks(i);
end
end
[PKS,LOCS] = findpeaks(peaks);
plot(yourdata);
hold on;
plot(LOCS, yourdata(LOCS), 'Color', 'Red', 'LineStyle', 'none', 'Marker', 'o');
Hope this helps!
Use the second derivative test instead?
I ended up writing my own simpler version of findpeaks, which seems to work for my purpose.
function [pks,locs] = my_findpeaks(X)
M = numel(X);
pks = [];
locs = [];
if (M < 4)
datamsgid = generatemsgid('emptyDataSet');
error(datamsgid,'Data set must contain at least 4 samples.');
else
for idx=1:M-3
if X(idx)< X(idx+1) && X(idx+1)>=X(idx+2) && X(idx+2)> X(idx+3)
pks = [pks X(idx+1)]; % the peak (or start of a two-sample plateau) is at idx+1, not idx
locs = [locs idx+1];
end
end
end
end
Edit: To clarify, the problem arose when I had a peak exactly between two sample points and those two sample points coincidentally had the same value. It only happened a couple of times in more than 10,000 cases.
The behavior that you describe is a known bug in versions of MATLAB prior to R2010b. The minimum example is
findpeaks([0 1 1 0])
which returns [], while
findpeaks([0 1 0])
returns the (position of the) peak.
The bug has been fixed in R2010b and later, see the official Bug Report. With that fix, findpeaks returns the rising edge of "peaks with repeated values" (which I would call plateaus).
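For releases without the fix, or without the Signal Processing Toolbox, the same rising-edge convention can be sketched with diff alone. The input vector below is a made-up example, and this is only an illustration of the idea, not a replacement for findpeaks (it has no options such as minimum height or distance).

```matlab
% Minimal sketch: local maxima including plateaus, reported at the rising edge.
% values is an invented example with one plateau and one ordinary peak.
values = [0 1 2 2 1 0 1 0];

d  = diff(values(:)');                   % first differences as a row vector
nz = find(d ~= 0);                       % positions where the signal changes
sg = sign(d(nz));                        % +1 for a rise, -1 for a fall (flats skipped)
k  = find(sg(1:end-1) > 0 & sg(2:end) < 0);   % a rise followed by a fall

locs = nz(k) + 1;                        % first sample of each peak or plateau
pks  = values(locs);

fprintf('locs = %s, pks = %s\n', mat2str(locs), mat2str(pks));
```

Because flat segments are dropped before the rise/fall test, a plateau of any length is treated like a single peak located at its first sample.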