I want to implement the SfM pipeline myself in Matlab because I need some outputs that the OpenCV functions don't provide. However, I'm using OpenCV for comparison.
The OpenCV function [E,mask] = cv.findEssentialMat(points1, points2, 'CameraMatrix',K, 'Method','Ransac'); provides the essential matrix solution using Nister's five-point algorithm and RANSAC.
The inlier indices are found using: InliersIndices=find(mask>0);
I used this Matlab implementation of Nister's algorithm:
Fivepoint_algoithm_code
The call to the function is as follows:
[E_all, R_all, t_all, Eo_all] = five_point_algorithm( pts1, pts2, K, K);
The algorithm outputs up to 10 solutions of essential matrices. However, I encountered the following issues:
The implementation stated above only works for perfect correspondences (without RANSAC), so I provide it with 5 correspondences picked using InliersIndices. The returned essential matrices (up to 10) are all different from the one returned by OpenCV.
All the returned essential matrices should be solutions, so why don't I obtain the same 3D points when I triangulate with each one using the function below?
How do I choose the right essential matrix solution?
I triangulate using the function of the Matlab toolbox
Projection matrices:
P1=K*[eye(3) [0;0;0]];
P2=K*[R_all{i} t_all{i}];
[pts3D,rep_error] = triangulate(pts1', pts2', P1',P2');
Edit
The returned E from [E,mask] = cv.findEssentialMat(points1, points2, 'CameraMatrix',K, 'Method','Ransac');
E =
0.0052 -0.7068 0.0104
0.7063 0.0050 -0.0305
-0.0113 0.0168 0.0002
For the 5-point Matlab implementation, 5 random indices are taken from the inliers, so:
pts1 =
736.7744 740.2372 179.2428 610.5297 706.8776
112.2673 109.9687 45.7010 91.4371 87.8194
pts2 =
722.3037 725.3770 150.3997 595.3550 692.5383
111.7898 108.6624 43.6847 90.6638 86.8139
K =
723.3631 7.9120 601.7643
-3.8553 719.6517 182.0588
0.0075 0.0044 1.0000
and 4 solutions are returned:
E1 =
-0.2205 0.9436 -0.1835
0.8612 0.2447 -0.1531
0.4442 -0.0600 -0.0378
E2 =
-0.2153 0.9573 0.1626
0.8948 0.2456 -0.3474
0.1003 0.1348 -0.0306
E3 =
0.0010 -0.9802 -0.0957
0.9768 0.0026 -0.1912
0.0960 0.1736 -0.0019
E4 =
-0.0005 -0.9788 -0.1427
0.9756 0.0021 -0.1658
0.1436 0.1470 -0.0030
Edit2:
pts1 and pts2 when triangulated using the essential matrix E, with R and t returned by [R, t] = cv.recoverPose(E, p1, p2,'CameraMatrix',K);
X1 =
-0.0940 0.0478 -0.4984
-0.0963 0.0497 -0.4987
0.3033 0.1009 -0.5202
-0.0065 0.0636 -0.5053
-0.0737 0.0653 -0.5011
with
R =
-0.9977 -0.0063 0.0670
0.0084 -0.9995 0.0305
0.0667 0.0310 0.9973
and
t =
0.0239
0.0158
0.9996
When triangulated with the Matlab code, the chosen solution is E_all{2}
R_all{2}=
-0.8559 -0.2677 0.4425
-0.1505 0.9475 0.2821
-0.4948 0.1748 -0.8512
and
t_all{2}=
-0.1040
-0.1355
0.9853
X2 =
0.1087 -0.0552 0.5762
0.1129 -0.0578 0.5836
0.4782 0.1582 -0.8198
0.0028 -0.0264 0.2099
0.0716 -0.0633 0.4862
When doing
X1./X2
ans =
-0.8644 -0.8667 -0.8650
-0.8524 -0.8603 -0.8546
0.6343 0.6376 0.6346
-2.3703 -2.4065 -2.4073
-1.0288 -1.0320 -1.0305
There is an almost constant scale factor between triangulated 3D points.
However, rotation matrices are different and there is no scale factor between translations.
t./t_all{2}=
-0.2295
-0.1167
1.0145
which makes the plotted trajectory wrong
Answering your numbered questions:
Beware that Nister's 5-point algorithm has many implementations, but most of them don't work well. Personal experience and unpublished work by colleagues show that OpenCV does not have a good implementation. The open implementation in Bundler and other working SfM pipelines works better in practice (but there is a lot of room for improvement).
The 10 solutions are simply zeros of a certain polynomial equation. As far as the polynomial equation can describe the problem, these 10 solutions all zero the equation. The equation does not require that these 10 solutions are real, or that the 3D points corresponding to the 5 point correspondences are the same for each solution. It only requires that there are some 3D points (for each solution) that project to the 5 points, without even considering whether the 3D points are in front of the respective cameras. Moreover, there may well be two sets of 3D points and cameras that happen to generate the same images of 5 points, so you would have to weed them out with some other procedure (below).
The right solution among the 10 (possibly complex) solutions is usually chosen by combining several techniques:
Discard solutions that would lead to complex-valued points or 3D points with negative depth (currently Bundler does not do this last check)
Discard solutions that are not physical for some other reason (you may have to do some of that yourself for your application)
The more usual procedure: For each remaining solution, check which one is more consistent with additional correspondences. In a real system you don't know which additional correspondences are right and which are pure trash. So run RANSAC for each of the solutions and keep the one with the most inliers. This is computationally heavy so should be used as a last resort.
You can see how Bundler does this at file 5point.c line 668:
generate_Ematrix_hypotheses(5, r_pts_inner, l_pts_inner, &num_hyp, E);
for (i = 0; i < num_hyp; i++) {
int best_inlier;
double score = 0.0;
double E2[9], tmp[9], F[9];
memcpy(E2, E + 9 * i, 9 * sizeof(double));
E2[0] = -E2[0];
E2[1] = -E2[1];
E2[3] = -E2[3];
E2[4] = -E2[4];
E2[8] = -E2[8];
matrix_transpose_product(3, 3, 3, 3, K2_inv, E2, tmp);
matrix_product(3, 3, 3, 3, tmp, K1_inv, F);
inliers = evaluate_Ematrix(n, r_pts, l_pts, // r_pts_norm, l_pts_norm,
thresh_norm, F, // E + 9 * i,
&best_inlier, &score);
if (inliers > max_inliers ||
(inliers == max_inliers && score < min_score)) {
best = 1;
max_inliers = inliers;
min_score = score;
memcpy(E_best, E + 9 * i, sizeof(double) * 9);
r_best = r_pts_norm[best_inlier];
l_best = l_pts_norm[best_inlier];
}
inliers_hyp[i] = inliers;
}
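A minimal Matlab sketch of the "keep the candidate with the most inliers" idea, in case it helps with the comparison. It is not Bundler's code: the synthetic points, the Sampson distance as error measure, and the threshold are all my assumptions, and the candidates cell merely stands in for the E_all returned by the 5-point solver.

```matlab
% Sketch: pick the essential-matrix candidate with the most Sampson inliers.
% All numbers below are made up for illustration.
rng(0);                                    % reproducible synthetic data
N  = 50;
R  = expm([0 -0.10 0.05; 0.10 0 -0.02; -0.05 0.02 0]);   % assumed rotation
t  = [1; 0; 0.1];                          % assumed translation
tx = [0 -t(3) t(2); t(3) 0 -t(1); -t(2) t(1) 0];         % cross-product matrix
E_true = tx * R;                           % ground-truth essential matrix

X  = [randn(2, N); 5 + rand(1, N)];        % 3D points in front of camera 1
x1 = bsxfun(@rdivide, X, X(3, :));         % normalized image points, view 1
X2 = bsxfun(@plus, R * X, t);              % same points in camera 2 frame
x2 = bsxfun(@rdivide, X2, X2(3, :));       % normalized image points, view 2

% Sampson distance of every correspondence for a given E
top2    = @(M) M(1:2, :);
sampson = @(E) sum(x2 .* (E * x1), 1).^2 ./ ...
          (sum(top2(E * x1).^2, 1) + sum(top2(E' * x2).^2, 1));

candidates = {randn(3), E_true, randn(3)}; % stand-ins for the solver output
thresh     = 1e-8;                         % assumed inlier threshold
nInliers   = cellfun(@(E) sum(sampson(E) < thresh), candidates);
[~, best]  = max(nInliers);                % index of the winning candidate
```

With real data the points would be K\[u; v; 1] for the pixel coordinates, and the threshold would have to be tuned to the noise level.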
Summary: This question deals with the improvement of an algorithm for the computation of linear regression.
I have a 3D (dlMAT) array representing monochrome photographs of the same scene taken at different exposure times (the vector IT) . Mathematically, every vector along the 3rd dimension of dlMAT represents a separate linear regression problem that needs to be solved. The equation whose coefficients need to be estimated is of the form:
DL = R*IT^P, where DL and IT are obtained experimentally and R and P must be estimated.
The above equation can be transformed into a simple linear model after applying a logarithm:
log(DL) = log(R) + P*log(IT) => y = a + b*x
Presented below is the most "naive" way to solve this system of equations, which essentially involves iterating over all "3rd dimension vectors" and fitting a polynomial of order 1 to (IT,DL(ind1,ind2,:)):
%// Define some nominal values:
R = 0.3;
IT = 600:600:3000;
P = 0.97;
%// Impose some believable spatial variations:
pMAT = 0.01*randn(3)+P;
rMAT = 0.1*randn(3)+R;
%// Generate "fake" observation data:
dlMAT = bsxfun(@times,rMAT,bsxfun(@power,permute(IT,[3,1,2]),pMAT));
%// Regression:
sol = cell(size(rMAT)); %// preallocation
for ind1 = 1:size(dlMAT,1)
for ind2 = 1:size(dlMAT,2)
sol{ind1,ind2} = polyfit(log(IT(:)),log(squeeze(dlMAT(ind1,ind2,:))),1);
end
end
fittedP = cellfun(@(x)x(1),sol); %// Estimate of pMAT
fittedR = cellfun(@(x)exp(x(2)),sol); %// Estimate of rMAT
The above approach seems like a good candidate for vectorization, since it does not utilize MATLAB's main strength that is MATrix operations. For this reason, it does not scale very well and takes much longer to execute than I think it should.
There exist alternative ways to perform this computation based on matrix division, as demonstrated here and here, which involve something like this:
sol = [ones(size(x)),log(x)]\log(y);
That is, appending a vector of 1s to the observations, followed by mldivide to solve the equation system.
The main challenge I'm facing is how to adapt my data to the algorithm (or vice versa).
Question #1: How can the matrix-division-based solution be extended to solve the problem presented above (and potentially replace the loops I am using)?
Question #2 (bonus): What is the principle behind this matrix-division-based solution?
The secret ingredient behind the solution that includes matrix division is the Vandermonde matrix. The question discusses a linear problem (linear regression), and those can always be formulated as a matrix problem, which \ (mldivide) can solve in a mean-square error sense. Such an algorithm, solving a similar problem, is demonstrated and explained in this answer.
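A small sketch of that principle for a single regression, assuming made-up values R = 2.5 and P = 0.9 and noise-free data. It only shows that the degree-1 Vandermonde matrix plus \ gives the same coefficients as polyfit, in reversed order:

```matlab
% Sketch: Vandermonde matrix + mldivide versus polyfit for log(y) = log(R) + P*log(x).
x = (1:5).';
y = 2.5 * x.^0.9;                    % synthetic data, R = 2.5 and P = 0.9 (assumed)

V  = [ones(size(x)), log(x)];        % Vandermonde matrix of degree 1 in log(x)
ab = V \ log(y);                     % least-squares solution: ab(1) = log(R), ab(2) = P
p  = polyfit(log(x), log(y), 1);     % polyfit order: p(1) = P, p(2) = log(R)

R_est = exp(ab(1));
P_est = ab(2);
```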
Below is benchmarking code that compares the original solution with two alternatives suggested in chat:
function regressionBenchmark(numEl)
clc
if nargin<1, numEl=10; end
%// Define some nominal values:
R = 5;
IT = 600:600:3000;
P = 0.97;
%// Impose some believable spatial variations:
pMAT = 0.01*randn(numEl)+P;
rMAT = 0.1*randn(numEl)+R;
%// Generate "fake" measurement data using the relation "DL = R*IT.^P"
dlMAT = bsxfun(@times,rMAT,bsxfun(@power,permute(IT,[3,1,2]),pMAT));
%% // Method1: loops + polyfit
disp('-------------------------------Method 1: loops + polyfit')
tic; [fR,fP] = method1(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
%% // Method2: loops + Vandermonde
disp('-------------------------------Method 2: loops + Vandermonde')
tic; [fR,fP] = method2(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
%% // Method3: vectorized Vandermonde
disp('-------------------------------Method 3: vectorized Vandermonde')
tic; [fR,fP] = method3(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
function [fittedR,fittedP] = method1(IT,dlMAT)
sol = cell(size(dlMAT,1),size(dlMAT,2));
for ind1 = 1:size(dlMAT,1)
for ind2 = 1:size(dlMAT,2)
sol{ind1,ind2} = polyfit(log(IT(:)),log(squeeze(dlMAT(ind1,ind2,:))),1);
end
end
fittedR = cellfun(@(x)exp(x(2)),sol);
fittedP = cellfun(@(x)x(1),sol);
function [fittedR,fittedP] = method2(IT,dlMAT)
sol = cell(size(dlMAT,1),size(dlMAT,2));
for ind1 = 1:size(dlMAT,1)
for ind2 = 1:size(dlMAT,2)
sol{ind1,ind2} = flipud([ones(numel(IT),1) log(IT(:))]\log(squeeze(dlMAT(ind1,ind2,:)))).'; %'
end
end
fittedR = cellfun(@(x)exp(x(2)),sol);
fittedP = cellfun(@(x)x(1),sol);
function [fittedR,fittedP] = method3(IT,dlMAT)
N = 1; %// Degree of polynomial
VM = bsxfun(@power, log(IT(:)), 0:N); %// Vandermonde matrix
result = fliplr((VM\log(reshape(dlMAT,[],size(dlMAT,3)).')).');
%// Compressed version:
%// result = fliplr(([ones(numel(IT),1) log(IT(:))]\log(reshape(dlMAT,[],size(dlMAT,3)).')).');
fittedR = exp(real(reshape(result(:,2),size(dlMAT,1),size(dlMAT,2))));
fittedP = real(reshape(result(:,1),size(dlMAT,1),size(dlMAT,2)));
The reason why method 2 can be vectorized into method 3 is essentially that matrix multiplication can be separated by the columns of the second matrix. If A*B produces matrix X, then by definition A*B(:,n) gives X(:,n) for any n. Moving A to the right-hand side with mldivide, this means that the divisions A\X(:,n) can be done in one go for all n with A\X. The same holds for an overdetermined system (linear regression problem), in which there is no exact solution in general, and mldivide finds the matrix that minimizes the mean-square error. In this case too, the operations A\X(:,n) (method 2) can be done in one go for all n with A\X (method 3).
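The column argument can be checked numerically with a tiny overdetermined system. The sizes and random right-hand sides below are arbitrary choices for illustration:

```matlab
% Sketch: A\X solves every column of X at once, matching a per-column loop.
rng(1);
A = [ones(5, 1), (1:5).'];           % 5 observations, 2 unknowns
X = randn(5, 4);                     % four right-hand sides, one per column

allAtOnce = A \ X;                   % "method 3" style: one call

oneByOne = zeros(2, 4);              % "method 2" style: one call per column
for n = 1:4
    oneByOne(:, n) = A \ X(:, n);
end

maxDiff = max(abs(allAtOnce(:) - oneByOne(:)));   % expected to be at round-off level
```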
The implications of improving the algorithm when increasing the size of dlMAT can be seen below:
For the case of 500*500 (or 2.5E5) elements, the speedup from Method 1 to Method 3 is about x3500!
It is also interesting to observe the output of profile (here, for the case of 500*500):
[Profiler screenshots for Method 1, Method 2 and Method 3]
From the above it is seen that rearranging the elements via squeeze and flipud takes up about half (!) of the runtime of Method 2. It is also seen that some time is lost on the conversion of the solution from cells to matrices.
Since the 3rd solution avoids all of these pitfalls, as well as the loops altogether (which mostly means re-evaluation of the script on every iteration) - it unsurprisingly results in a considerable speedup.
Notes:
There was very little difference between the "compressed" and the "explicit" versions of Method 3 in favor of the "explicit" version. For this reason it was not included in the comparison.
A solution was attempted where the inputs to Method 3 were gpuArray-ed. This did not improve performance (and even degraded it somewhat), possibly due to a wrong implementation, or the overhead associated with copying matrices back and forth between RAM and VRAM.
I have a matrix with 35 columns and I'm trying to reduce the dimension using PCA. I run PCA on my data:
[coeff,score,latent,tsquared,explained,mu] = pca(data);
explained =
99.9955
0.0022
0.0007
0.0003
0.0002
0.0001
0.0001
0.0001
Then, by looking at the vector explained, I noticed that the value of the first element is over 99. Based on this, I decided to take only the first component. So I did the following:
k=1;
X = bsxfun(@minus, data, mean(data)) * coeff(:, 1:k);
and now, I used X for SVM training:
svmStruct = fitcsvm(X,Y,'Standardize',true, 'Prior','uniform','KernelFunction','linear','KernelScale','auto','Verbose',0,'IterationLimit', 1000000);
However, when I tried to predict and calculate the misclassification rate:
[label,score,cost] = predict(svmStruct, X);
the result was disappointing. I noticed that when I select only one component (k=1), all classifications were wrong. However, as I increase the number of included components, k, the result improves, as you can see from the diagram below. But this doesn't make sense according to explained, which indicates that I should be fine with only the first eigenvector.
Did I make a mistake?
This diagram shows the classification error as a function of the number of included eigenvectors:
This graph was generated after normalizing the data before doing PCA, as suggested by @zelanix:
This graph was also plotted:
These are the explained values obtained after normalizing before PCA:
>> [coeff,score,latent,tsquared,explained,mu] = pca(data_normalised);
Warning: Columns of X are linearly dependent to within machine precision.
Using only the first 27 components to compute TSQUARED.
> In pca>localTSquared (line 501)
In pca (line 347)
>> explained
explained =
32.9344
15.6790
5.3093
4.7919
4.0905
3.8655
3.0015
2.7216
2.6300
2.5098
2.4275
2.3078
2.2077
2.1726
2.0892
2.0425
2.0273
1.9135
1.8809
1.7055
0.8856
0.3390
0.2204
0.1061
0.0989
0.0334
0.0085
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
Parag S. Chandakkar is absolutely right that there is no reason to expect that PCA will automatically improve your classification result. It is an unsupervised method so is not intended to improve separability, only to find the components with the largest variance.
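A small synthetic sketch of this point. The data is made up, and PCA is done through svd rather than pca so that no toolbox is assumed. One feature has a huge variance but carries no class information, the other has a small variance but separates the classes:

```matlab
% Sketch: the component with the most variance need not separate the classes.
rng(2);
n = 200;
Y = [ones(n, 1); 2 * ones(n, 1)];                          % class labels
noise  = 100 * randn(2 * n, 1);                            % large variance, no class info
signal = [zeros(n, 1); ones(n, 1)] + 0.05 * randn(2 * n, 1);  % small variance, class-dependent
data   = [noise, signal];

Xc = bsxfun(@minus, data, mean(data, 1));                  % center each feature
[~, S, V] = svd(Xc, 'econ');                               % columns of V are the principal axes
explained = 100 * diag(S).^2 / sum(diag(S).^2);            % percent variance per component
score = Xc * V;                                            % data in PCA coordinates

% distance between class means along each component, in units of its std
gap = abs(mean(score(Y == 1, :), 1) - mean(score(Y == 2, :), 1)) ./ std(score, 0, 1);
```

Here explained(1) is above 99 percent, yet the classes are separated almost entirely along the second component.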
But there are some other problems with your code. In particular, this line confuses me:
X = bsxfun(@minus, data, mean(data)) * coeff(:, 1:k);
You need to normalise your data before performing PCA, and each feature needs to be normalised separately. I use the following:
data_normalised = data;
for f = 1:size(data, 2)
data_normalised(:, f) = data_normalised(:, f) - nanmean(data_normalised(:, f));
data_normalised(:, f) = data_normalised(:, f) / nanstd(data_normalised(:, f));
end
pca_coeff = pca(data_normalised);
data_pca = data_normalised * pca_coeff;
You can then extract the first principal component as data_pca(:, 1).
Also, always plot your PCA results to get an idea of what is actually going on:
figure
scatter(data_pca(Y == 1, 1), data_pca(Y == 1, 2))
hold on;
scatter(data_pca(Y == 2, 1), data_pca(Y == 2, 2))
PCA gives the direction of maximum variance in the data; it does not necessarily lead to better classification. If you want to reduce your data while trying to maximize your accuracy, you should do LDA.
The following picture illustrates exactly what I want to convey.
I have all the data and an ODE system of three equations which has 9 unknown coefficients (a1, a2,..., a9).
dS/dt = a1*S+a2*D+a3*F
dD/dt = a4*S+a5*D+a6*F
dF/dt = a7*S+a8*D+a9*F
t = [1 2 3 4 5]
S = [17710 18445 20298 22369 24221]
D = [1357.33 1431.92 1448.94 1388.33 1468.95]
F = [104188 104792 112097 123492 140051]
How to find these coefficients (a1,..., a9) of an ODE using Matlab?
I can't spend too much time on this, but basically you need to use math to reduce the equation to something more meaningful:
your equation is of the form
dx/dt = A*x
ergo the solution is
x(t) = exp(A*(t-t0)) * x(t0)
Thus
exp(A*(t-t0)) = x(t) * Pseudo(x(t0))
Pseudo is the Moore-Penrose Pseudo-Inverse.
EDIT: Had a second look at my solution, and I didn't calculate the pseudo-inverse properly.
Basically, Pseudo(x(t0)) = x(t0)'*inv(x(t0)*x(t0)'), as x(t0) * Pseudo(x(t0)) equals the identity matrix
Now what you need to do is assume each time step (1 to 2, 2 to 3, 3 to 4) is an experiment (therefore t-t0=1), so the solution would be to:
1- Build your pseudo inverse:
xt = [S;D;F];
xt0 = xt(:,1:4);
xInv = xt0'*inv(xt0*xt0');
2- Get exponential result
xt1 = xt(:,2:5);
expA = xt1 * xInv;
3- Get the logarithm of the matrix:
A = logm(expA);
And since t-t0= 1, A is our solution.
And a simple proof to check
[t, y] = ode45(@(t,x) A*x,[1 5], xt(1:3,1));
plot (t,y,1:5, xt,'x')
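The same steps can be sanity-checked on synthetic data where A is known. The matrix A_true and the start vector below are made up, and the sketch uses X1/X0 (mrdivide) in place of the explicit pseudo-inverse, which should give the same least-squares result when X0 has full row rank:

```matlab
% Sketch: recover a known A from samples taken one time unit apart.
A_true = [-0.5  1.0  0.0;
          -1.0 -0.5  0.3;
           0.2  0.0 -1.0];            % assumed system matrix
x0 = [1; 2; 3];                       % assumed initial state

xt = zeros(3, 5);                     % exact samples at t = 1..5
for k = 1:5
    xt(:, k) = expm(A_true * (k - 1)) * x0;
end

X0 = xt(:, 1:4);                      % states at the start of each step
X1 = xt(:, 2:5);                      % states one time unit later
expA  = X1 / X0;                      % least-squares solution of expA*X0 = X1
A_est = real(logm(expA));             % principal matrix logarithm
err   = norm(A_est - A_true);
```

Note that logm returns the principal logarithm, so this only recovers A when the imaginary parts of its eigenvalues lie within (-pi, pi) for the chosen sampling interval. With noisy measured data the result is only a least-squares estimate.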
You have a linear, coupled system of ordinary differential equations,
y' = Ay with y = [S(t); D(t); F(t)]
and you're trying to solve the inverse problem,
A = unknown
Interesting!
First line of attack
For given A, it is possible to solve such systems analytically (read the wiki for example).
The general solution for 3x3 design matrices A takes the form
[S(t) D(t) F(t)].' = c1*V1*exp(r1*t) + c2*V2*exp(r2*t) + c3*V3*exp(r3*t)
with V and r the eigenvectors and eigenvalues of A, respectively, and c scalars that are usually determined by the problem's initial values.
Therefore, there would seem to be two steps to solve this problem:
Find vectors c*V and scalars r that best-fit your data
reconstruct A from the eigenvalues and eigenvectors.
However, going down this road is treacherous. You'd have to solve the non-linear least-squares problem for the sum-of-exponentials equation you have (using lsqcurvefit, for example). That would give you vectors c*V and scalars r. You'd then have to unravel the constants c somehow, and reconstruct the matrix A with V and r.
So, you'd have to solve for c (3 values), V (9 values), and r (3 values) to build the 3x3 matrix A (9 values) -- that seems too complicated to me.
Simpler method
There is a simpler way; use brute-force:
function test
% find
[A, fval] = fminsearch(@objFcn, 10*randn(3))
end
function objVal = objFcn(A)
% time span to be integrated over
tspan = [1 2 3 4 5];
% your desired data
S = [17710 18445 20298 22369 24221 ];
D = [1357.33 1431.92 1448.94 1388.33 1468.95 ];
F = [104188 104792 112097 123492 140051 ];
y_desired = [S; D; F].';
% solve the ODE
y0 = y_desired(1,:);
[~,y_real] = ode45(@(~,y) A*y, tspan, y0);
% objective function value: sum of squared relative errors
objVal = sum((1 - y_real(:)./y_desired(:)).^2);
end
So far so good.
However, I tried both the complicated way and the brute-force approach above, but I found it very difficult to get the squared error anywhere near satisfyingly small.
The best solution I could find, after numerous attempts:
A =
1.216731997197118e+000 2.298119167536851e-001 -2.050312097914556e-001
-1.357306715497143e-001 -1.395572220988427e-001 2.607184719979916e-002
5.837808840775175e+000 -2.885686207763313e+001 -6.048741083713445e-001
fval =
3.868360951628554e-004
Which isn't bad at all :) But I would've liked a solution that was less difficult to find...
The problem says:
Three tensile tests were carried out on an aluminum bar. In each test the strain was measured at the same values of stress. The results were
where the units of strain are mm/m. Use linear regression to estimate the modulus of elasticity of the bar (modulus of elasticity = stress/strain).
I used this program for this problem:
function coeff = polynFit(xData,yData,m)
% Returns the coefficients of the polynomial
% a(1)*x^(m-1) + a(2)*x^(m-2) + ... + a(m)
% that fits the data points in the least squares sense.
% USAGE: coeff = polynFit(xData,yData,m)
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.
A = zeros(m); b = zeros(m,1); s = zeros(2*m-1,1);
for i = 1:length(xData)
temp = yData(i);
for j = 1:m
b(j) = b(j) + temp;
temp = temp*xData(i);
end
temp = 1;
for j = 1:2*m-1
s(j) = s(j) + temp;
temp = temp*xData(i);
end
end
for i = 1:m
for j = 1:m
A(i,j) = s(i+j-1);
end
end
% Rearrange coefficients so that coefficient
% of x^(m-1) is first
coeff = flipdim(gaussPiv(A,b),1);
The problem is solved without a program as follows
MY ATTEMPT
T=[34.5,69,103.5,138];
D1=[.46,.95,1.48,1.93];
D2=[.34,1.02,1.51,2.09];
D3=[.73,1.1,1.62,2.12];
Mod1=T./D1;
Mod2=T./D2;
Mod3=T./D3;
xData=T;
yData1=Mod1;
yData2=Mod2;
yData3=Mod3;
coeff1 = polynFit(xData,yData1,2);
coeff2 = polynFit(xData,yData2,2);
coeff3 = polynFit(xData,yData3,2);
x1=(0:.5:190);
y1=coeff1(2)+coeff1(1)*x1;
subplot(1,3,1);
plot(x1,y1,xData,yData1,'o');
y2=coeff2(2)+coeff2(1)*x1;
subplot(1,3,2);
plot(x1,y2,xData,yData2,'o');
y3=coeff3(2)+coeff3(1)*x1;
subplot(1,3,3);
plot(x1,y3,xData,yData3,'o');
What do I have to do to get this result?
As general advice:
avoid for loops wherever possible.
avoid using i and j as variable names, as they are Matlab built-in names for the imaginary unit (I really hope that disappears in a future release...)
Due to Matlab being an interpreted language, for-loops can be very slow compared to their compiled alternatives. Matlab is named MATrix LABoratory, meaning it is highly optimized for matrix/array operations. Usually, when there is an operation that cannot be done without a loop, Matlab has a built-in function for it that runs way, way faster than a for-loop in Matlab ever will. For example: computing the mean of elements in an array: mean(x). The sum of all elements in an array: sum(x). The standard deviation of elements in an array: std(x). etc. Matlab's power comes from these built-in functions.
So, your problem. You have a linear regression problem. The easiest way in Matlab to solve this problem is this:
%# your data
stress = [ %# in Pa
34.5 69 103.5 138] * 1e6;
strain = [ %# in m/m
0.46 0.95 1.48 1.93
0.34 1.02 1.51 2.09
0.73 1.10 1.62 2.12]' * 1e-3;
%# make linear array for the data
yy = strain(:);
xx = repmat(stress(:), size(strain,2),1);
%# re-formulate the problem into linear system Ax = b
A = [xx ones(size(xx))];
b = yy;
%# solve the linear system
x = A\b;
%# modulus of elasticity is coefficient
%# (NOTE: y-offset is relatively small and can be ignored)
E = 1/x(1)
What you did in the function polynFit is done by A\b, but the \-operator is capable of doing it way faster, more robustly and more flexibly than what you tried to do yourself. I'm not saying you shouldn't try to make these things yourself (please keep on doing that, you learn a lot from it!), I'm saying that for the "real" results, always use the \-operator (and check your own results against it as well).
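To make that comparison concrete, here is a minimal sketch using the first test's numbers from the question. It solves the same straight-line fit once with \ and once through the normal equations (which, as far as I can tell, is essentially the system polynFit assembles by hand); the variable names are mine:

```matlab
% Sketch: backslash versus explicitly formed normal equations for a line fit.
x = [34.5 69 103.5 138].';            % stress values from the question
y = [0.46 0.95 1.48 1.93].';          % strain, first test from the question

A = [x, ones(size(x))];               % columns: slope term, offset term

viaBackslash = A \ y;                 % least squares, no A'*A is formed
viaNormalEq  = (A.' * A) \ (A.' * y); % normal equations

difference = norm(viaBackslash - viaNormalEq);
```

For a small, well-scaled problem like this the two agree to round-off. Forming A'*A squares the condition number, which is one reason to prefer \ applied directly to A.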
The backslash operator (type help \ on the command prompt) is extremely useful in many situations, and I advise you learn it and learn it well.
I leave you with this: here's how I would write your polynFit function:
function coeff = polynFit(X,Y,m)
if numel(X) ~= numel(Y)
error('polynFit:size_mismatch',...
'number of elements in matrices X and Y must be equal.');
end
%# bad condition number, rank errors, etc. taken care of by \
coeff = bsxfun(@power, X(:), m:-1:0) \ Y(:);
end
I leave it up to you to figure out how this works.
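As a hint, a small sketch of what the bsxfun call builds. The sample points and the quadratic below are made up for illustration:

```matlab
% Sketch: bsxfun(@power, X(:), m:-1:0) builds a Vandermonde matrix,
% one row per data point and one column per power m, m-1, ..., 0.
X = [1; 2; 3];
m = 2;
V = bsxfun(@power, X(:), m:-1:0);     % rows are [x^2, x^1, x^0]

Y = 2 * X.^2 - 3 * X + 1;             % samples of a known quadratic
coeff = V \ Y(:);                     % recovers the coefficients, highest power first
```

Each row of V evaluates the monomials at one data point, so V*coeff evaluates the polynomial at all points, and \ finds the coeff that best reproduces Y.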
Edit: Actually this is not unexpected behaviour, but I still need a solution.
findpeaks compares each element of data to its neighboring values.
I have data which contains peaks which I detect with the function findpeaks from the Signal Processing Toolbox. Sometimes the function seems not to detect the peaks properly when I have the same value twice next to each other. This occurs very rarely in my data, but here is a sample to illustrate my problem:
>> values
values =
-0.0324
-0.0371
-0.0393
-0.0387
-0.0331
-0.0280
-0.0216
-0.0134
-0.0011
0.0098
0.0217
0.0352
0.0467
0.0548
0.0639
0.0740
0.0813
0.0858 <-- here should be another peak
0.0858 <--
0.0812
0.0719
0.0600
0.0473
0.0353
0.0239
0.0151
0.0083
0.0034
-0.0001
-0.0025
-0.0043
-0.0057
-0.0048
-0.0038
-0.0026
0.0007
0.0043
0.0062
0.0083
0.0106
0.0111
0.0116
0.0102
0.0089
0.0057
0.0025
-0.0025
-0.0056
Now the findpeaks function only finds one peak:
>> [pks loc] = findpeaks(values)
pks =
0.0116
loc =
42
If I plot the data, it becomes obvious that findpeaks misses one of the peaks at the location 18/19 because they both have the value 0.08579.
What is the best way to find those missing peaks?
If you have the image processing toolbox, you can use IMREGIONALMAX to find the peaks, after which you can use regionprops to find the center of the regions (if that's what you need), i.e.
bw = imregionalmax(signal);
peakLocations = find(bw); %# returns n peaks for an n-tuple of max-values
stats = regionprops(bw,'Centroid');
peakLocations = cat(1,stats.Centroid); %# returns the center of the n-tuple of max-values
This is an old topic, but maybe some are still looking for an easier solution to this (like I did today):
You could also just subtract some very small fixed value from all values on a plateau, except from the first value. This causes each first value on a plateau to always be the highest on the respective plateaus, causing them to be included as peaks.
Just make something like this part of your code:
peaks = yourdata;
verysmallvalue = .001;
plateauvalue = peaks(1);
for i = 2:size(peaks,1)
if peaks(i) == plateauvalue
peaks(i) = peaks(i) - verysmallvalue;
else
plateauvalue = peaks(i);
end
end
[PKS,LOCS] = findpeaks(peaks);
plot(yourdata);
hold on;
plot(LOCS, yourdata(LOCS), 'Color', 'Red', 'Line', 'None', 'Marker', 'o');
Hope this helps!
Use the second derivative test instead?
I ended up writing my own simpler version of findpeaks, which seems to work for my purpose.
function [pks,locs] = my_findpeaks(X)
M = numel(X);
pks = [];
locs = [];
if (M < 4)
datamsgid = generatemsgid('emptyDataSet');
error(datamsgid,'Data set must contain at least 4 samples.');
else
for idx=1:M-3
if X(idx)< X(idx+1) && X(idx+1)>=X(idx+2) && X(idx+2)> X(idx+3)
pks = [pks X(idx+1)]; % the peak value sits at idx+1, not idx
locs = [locs idx+1];
end
end
end
end
Edit: To clarify, the problem arose when I had a peak which was exactly between two sample points and those two sample points coincidentally had the same value. It only happened a couple of times in more than 10,000 cases.
The behavior that you describe is a known bug in versions of MATLAB prior to R2010b. The minimum example is
findpeaks([0 1 1 0])
which returns [], while
findpeaks([0 1 0])
returns the (position of the) peak.
The bug has been fixed in R2010b and later, see the official Bug Report. With that fix, findpeaks returns the rising edge of "peaks with repeated values" (which I would call plateaus).
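For older releases, the "rising edge of a plateau" behaviour can be approximated without any toolbox. This is only a sketch under my own assumptions (row-vector input, no handling of the end points, no minimum height or distance options), not how findpeaks is implemented:

```matlab
% Sketch: report the first sample of every peak or flat-topped peak.
x = [0 1 1 0 2 0];                    % made-up signal with one plateau and one sharp peak

keep = [true, diff(x) ~= 0];          % first sample of every run of equal values
idx  = find(keep);                    % positions of those samples in x
y    = x(keep);                       % signal with repeated values collapsed

% strict local maxima of the collapsed signal (end points excluded)
isPk = [false, y(2:end-1) > y(1:end-2) & y(2:end-1) > y(3:end), false];

locs = idx(isPk);                     % rising edge of each peak or plateau
pks  = x(locs);
```

For this input the plateau at samples 2 and 3 is reported once, at location 2.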