A is a matrix containing some matching points for a stereo vision system and the cameras' matrices. Regarding the problem, I know that I need to minimize a cost function relating the distance between a projected and a detected point.
Investigating some functions inside MATLAB I found this code that I suppose does this minimization because of the output I receive.
I'd like to understand, if possible, what mathematical magic is happening here:
[~,~,V] = svd(A);
X = V(:,end);
X = X/X(end);
Thanks in advance for any help
[~,~,V] = svd(A);
performs a singular value decomposition of the matrix A which produces three matrices as output. The first two of these are ignored (by assigning them to ~, according to MATLAB convention) and the third is assigned to the variable V.
X = V(:,end);
assigns the rightmost column of matrix V to the variable X - the : means 'all', in this case 'all the rows'
X = X/X(end);
divides each element of X by the last element of X - or in other words, scales the vector X so that its last element is equal to 1.
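To see why this solves your minimization: the last column of V is the right singular vector associated with the smallest singular value, and it minimizes norm(A*X) over all unit-norm X. This is the standard homogeneous least-squares (DLT) solution used in triangulation, and dividing by the last element converts the homogeneous vector back to (x, y, z, 1) coordinates. A minimal sketch with a random stand-in matrix, just to illustrate the idea:
A = rand(4,4);                 % stand-in for the real design matrix built from the matches
[~,S,V] = svd(A);
X = V(:,end);                  % unit-norm minimizer of norm(A*X)
X = X/X(end);                  % rescale so the homogeneous coordinate is 1
norm(A*V(:,end))               % residual of the chosen vector ...
min(svd(A))                    % ... equals the smallest singular value, up to rounding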
I have multiple linear equations in the form of Zi=ai*Xi+bi*Yi for i = 1..30.
How can I calculate every pair of regression coefficient values, or those 30 values of a and b for each (Z,X,Y) combination using MATLAB?
I've tried the following code:
A=Z; B=[Xs Ys];
C = B \ A;
A holds my Z points while B is a matrix of my X and Y points. However, I seem to only get one pair of regression coefficients for all of the points.
Thanks in advance!
What you have set up there is unfortunately not the right way to solve it, if I understand your problem formulation correctly. That formulation assumes you are trying to fit all of the points onto a single line; each row of B would then serve as one point on the one line whose regression you are finding. If you want to solve for multiple lines simultaneously, you are going to need to change your formulation.
That is actually very simple. I'm going to assume that you have 30 (x,y) points where each point denotes one equation of a line. You have these stored in Xs and Ys respectively, and the output of each of these equations is stored in Z. I'm also going to assume these are column vectors, and therefore you have a system set up such that:
Z_1 = a_1*X_1 + b_1*Y_1
Z_2 = a_2*X_2 + b_2*Y_2
...
Z_30 = a_30*X_30 + b_30*Y_30
a_i and b_i are the coefficients for each line. You know the (x,y) for each line, and your goal is to solve for each corresponding a and b. As such, you need to reformulate your system so that you're solving for a and b.
Rewriting that problem in matrix form can be done like so: stack the outputs into a column vector Z = [Z_1; Z_2; ...; Z_N], stack the unknowns into a vector c = [a_1; b_1; a_2; b_2; ...; a_N; b_N], and build a matrix M whose i-th row contains X_i in column 2*i-1 and Y_i in column 2*i, with zeros everywhere else. The whole set of equations then reads Z = M*c.
The vector c of a_1, b_1, a_2, b_2, ... is what you are ultimately solving for. We now have a matrix equation Z = M*c where M and Z are known, and c is what we need to solve for by doing c = M\Z. As such, all you need to do is rearrange your x and y values into the block matrix M described above. First we find the correct linear indices so that we can place our x and y values into this matrix, then we solve the system by least squares with the mldivide operator (\). M is an N x 2N matrix, where N is the total number of equations or constraints that we have (so in your case, 30):
N = numel(Xs);
M = zeros(N, 2*N);
xind = sub2ind(size(M), 1:N, 1:2:2*N); % linear indices of the X entries (odd columns)
yind = sub2ind(size(M), 1:N, 2:2:2*N); % linear indices of the Y entries (even columns)
M(xind) = Xs;
M(yind) = Ys;
sub2ind allows you to place multiple values into a matrix with a single line of code. Specifically, sub2ind determines the linear indices that correspond to a set of row and column coordinates of a matrix. If you don't already know, you can access (and set) values in a matrix using a single number instead of a row/column pair. sub2ind therefore lets you set multiple values in a matrix at once: you specify the linear indices to write to together with a vector of corresponding values.
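As a tiny illustration of that idea (toy 3 x 3 matrix, purely for demonstration):
A = zeros(3,3);
idx = sub2ind(size(A), [1 2 3], [3 2 1]);  % linear indices of (1,3), (2,2) and (3,1)
A(idx) = [10 20 30];                       % sets A(1,3)=10, A(2,2)=20, A(3,1)=30 in one assignment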
In our case, we need two sets of linear indices - one for the x values and one for the y values. Note that the x values start at the first column and occupy every other column; the y values behave the same way but start at the second column. Once we have those indices, we place the x and y values into this matrix and then simply solve for the coefficients:
coeff = M \ Z;
coeff will now be a 2N x 1 vector, so if you want, you can reshape this into a matrix:
coeff = reshape(coeff, 2, []);
Now, coeff will be shaped such that each column will give you the pair of a,b for each equation that you had. As such, the first column denotes a_1, b_1, the second column denotes a_2, b_2 and so on. The first row of coeff is all of the a coefficients for each constraint while the second row is all of the b coefficients for each constraint.
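If you want a quick sanity check of the whole procedure, here is a small self-contained sketch with random stand-in data (the variable names mirror the ones above, the data itself is made up):
N = 30;
Xs = rand(N,1); Ys = rand(N,1); Z = rand(N,1);     % made-up data, just for the check
M = zeros(N, 2*N);
M(sub2ind(size(M), 1:N, 1:2:2*N)) = Xs;
M(sub2ind(size(M), 1:N, 2:2:2*N)) = Ys;
coeff = reshape(M \ Z, 2, []);
max(abs(coeff(1,:).'.*Xs + coeff(2,:).'.*Ys - Z))  % ~0: each pair (a_i, b_i) satisfies its equation
Note that with only one (x,y,z) triple per equation, each pair (a_i, b_i) has to satisfy just one constraint, so mldivide returns one particular solution out of infinitely many that fit exactly.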
I was looking over some MATLAB code related to multivariate Gaussian distributions and came across this line:
params.means(k, :) = mean(X(Y == y, :));
Looking at the MATLAB documentation http://www.mathworks.com/help/matlab/ref/mean.html, my assumption is that it calculates the mean of the matrix X along the first dimension (down each column). What I don't understand is the indexing expression in the parentheses that follows X. Is this a conditional probability (where Y = y)? Can someone point me to some documentation where this is explained?
Unpacked, this single line might look like:
row_indices = find(Y==y);
new_X = X(row_indices,:);
params.means(k,:) = mean(new_X);
So, as you can see, the Y==y is simply being used to find a subset of X over which the mean is taken.
Given that you said that this was for computing multivariate Gaussian distributions, I bet that X and Y are paired sets of data. I bet that the code is looping (using the variable k) over different values y. So, it finds all of the Y equal to y and then calculates the mean of the X values that correspond to those Y values.
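A tiny self-contained example of this pattern (made-up data and labels, just to show the indexing):
X = [1 2; 3 4; 5 6; 7 8];      % four 2-D observations
Y = [1; 2; 1; 2];              % class label of each observation
mean(X(Y == 1, :))             % mean of rows 1 and 3 -> [3 4]
mean(X(Y == 2, :))             % mean of rows 2 and 4 -> [5 6]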
Please see the following issue:
P=rand(4,4);
for i=1:size(P,2)
for j=1:size(P,2)
[r,p]=corr(P(:,i),P(:,j))
end
end
Clearly, the loop computes each correlation twice (i.e., both corr(P(:,1),P(:,4)) and corr(P(:,4),P(:,1))). Does anyone have a suggestion on how to avoid this? Perhaps not using a loop?
Thanks!
I have four suggestions for you, depending on what exactly you are doing to compute your matrices. I'm assuming the example you gave is a simplified version of what needs to be done.
First Method - Adjusting the inner loop index
One thing you can do is change your j loop index so that it only goes from 1 up to i. This way, you get a lower triangular matrix and just concentrate on the values within the lower triangular half of your matrix. The upper half would essentially be all set to zero. In other words:
for i = 1 : size(P,2)
for j = 1 : i
%// Your code here
end
end
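For instance, if you also want to keep the coefficients and p-values, you could store them in two lower triangular matrices (R and Pv here are just illustrative names):
n = size(P,2);
R = zeros(n); Pv = zeros(n);
for i = 1 : n
    for j = 1 : i
        [R(i,j), Pv(i,j)] = corr(P(:,i), P(:,j));  % each pair (i,j) is computed only once
    end
end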
Second Method - Leave it unchanged, but then use unique
You can go ahead and use the same two full for loops as before, but you can then filter out the duplicates by using unique. In other words, you can do this:
[Y,indices] = unique(P);
Y will give you a list of unique values within the matrix P and indices will give you the locations of where these occurred within P. Note that these are column-major linear indices, so if you want the row and column locations where these values occur, you can do:
[rows,cols] = ind2sub(size(P), indices);
Third Method - Use pdist and squareform
Since you're looking for a solution that requires no loops, take a look at the pdist function. Given an M x N matrix, pdist finds the distance between each pair of rows of the matrix. squareform then rearranges these distances into a symmetric matrix like the one you have seen above. In other words, do this:
dists = pdist(P.', 'correlation');
distMatrix = squareform(dists);
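One caveat: the 'correlation' option measures one minus the linear correlation coefficient, so distMatrix holds 1 - rho rather than rho itself. If you want the coefficients, you can convert back, e.g.:
rho = 1 - squareform(pdist(P.', 'correlation'));   % off-diagonal entries are the correlation coefficients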
Fourth Method - Use the corr method straight out of the box
You can just use corr in the following way:
[rho, pvals] = corr(P);
corr in this case will produce an m x m matrix that contains the correlation coefficient between each pair of columns of the n x m matrix stored in P.
Hopefully one of these will work!
Does this work?
for i=1:size(P,2)
for j=1:i
Since you are just correlating each column with the others, why not just use (straight from the documentation):
[Rho,Pval] = corr(P);
I don't have the Statistics Toolbox, but according to http://www.mathworks.com/help/stats/corr.html,
corr(X) returns a p-by-p matrix containing the pairwise linear correlation coefficient between each pair of columns in the n-by-p matrix X.
I am trying to understand principal component analysis in MATLAB. There seem to be at least 3 different functions that do it.
I have some questions re the code below:
Am I creating approximate x values using only one eigenvector (the one corresponding to the largest eigenvalue) correctly? I think so??
Why are PC and V, which are both meant to be the loadings for (x'*x), presented differently? The column order is reversed because eig does not order the eigenvalues with the largest value first, but why are they the negative of each other?
Why are the eig results not ordered, with the eigenvector corresponding to the largest eigenvalue in the first column?
Using the code below I get back to the input matrix x when using svd and eig, but the results from princomp seem to be totally different. What do I have to do to make princomp match the other two functions?
Code:
x=[1 2;3 4;5 6;7 8 ]
econFlag=0;
[U,sigma,V] = svd(x,econFlag);%[U,sigma,coeff] = svd(z,econFlag);
U1=U(:,1);
V1=V(:,1);
sigma_partial=sigma(1,1);
score1=U*sigma;
test1=score1*V';
score_partial=U1*sigma_partial;
test1_partial=score_partial*V1';
[PC, D] = eig(x'*x)
score2=x*PC;
test2=score2*PC';
PC1=PC(:,2);
score2_partial=x*PC1;
test2_partial=score2_partial*PC1';
[o1 o2 o3]=princomp(x);
Yes. According to the documentation of svd, the diagonal elements of the output S are in decreasing order. There is no such guarantee for the output D of eig, though.
Eigenvectors and singular vectors have no defined sign. If a is an eigenvector, so is -a.
I've often wondered the same. Laziness on the part of TMW? Optimization, because sorting would be an additional step and not everybody needs 'em sorted?
princomp centers the input data before computing the principal components. This makes sense as normally the PCA is computed with respect to the covariance matrix, and the eigenvectors of x' * x are only identical to those of the covariance matrix if x is mean-free.
I would compute the PCA by transforming to the basis of the eigenvectors of the covariance matrix (computed from centered data), but apply this transform to the original (uncentered) data. This makes it possible to capture a maximum of variance with as few principal components as possible, while still recovering the original data from all of them:
[V, D] = eig(cov(x));
score = x * V;
test = score * V';
test is identical to x, up to numerical error.
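A one-line check, if you want to see it:
max(abs(test(:) - x(:)))   % on the order of machine precision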
In order to easily pick the components with the most variance, let's fix that lack of sorting ourselves:
[V, D] = eig(cov(x));
[D, ind] = sort(diag(D), 'descend');
V = V(:, ind);
score = x * V;
test = score * V';
Reconstruct the signal using the strongest principal component only:
test_partial = score(:, 1) * V(:, 1)';
In response to Amro's comments: It is of course also possible to first remove the means from the input data, and transform these "centered" data. In that case, for perfect reconstruction of the original data it would be necessary to add the means again. The way to compute the PCA given above is the one described by Neil H. Timm, Applied Multivariate Analysis, Springer 2002, page 446:
Given an observation vector Y with mean mu and covariance matrix Sigma of full rank p, the goal of PCA is to create a new set of variables called principal components (PCs) or principal variates. The principal components are linear combinations of the variables of the vector Y that are uncorrelated such that the variance of the jth component is maximal.
Timm later defines "standardized components" as those which have been computed from centered data and are then divided by the square root of the eigenvalues (i.e. variances), i.e. "standardized principal components" have mean 0 and variance 1.
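For completeness, here is a sketch of that centered variant (remove the column means first, then add them back after reconstruction; repmat is used instead of implicit expansion for compatibility with older MATLAB versions):
mu = mean(x);
xc = x - repmat(mu, size(x,1), 1);              % centered data
[V, D] = eig(cov(x));
[~, ind] = sort(diag(D), 'descend');
V = V(:, ind);
score = xc * V;                                 % scores of the centered data (matches princomp up to column signs)
test = score * V' + repmat(mu, size(x,1), 1);   % add the means back; identical to x up to numerical error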
I have two matrices X and Y. Both represent a number of positions in 3D-space. X is a 50*3 matrix, Y is a 60*3 matrix.
My question: why does applying the mean-function over the output of pdist2() in combination with 'Mahalanobis' not give the result obtained with mahal()?
More details on what I'm trying to do below, as well as the code I used to test this.
Let's suppose the 60 observations in matrix Y are obtained after an experimental manipulation of some kind. I'm trying to assess whether this manipulation had a significant effect on the positions observed in Y. Therefore, I used pdist2(X,X,'Mahalanobis') to compare X to X to obtain a baseline, and later, X to Y (with X the reference matrix: pdist2(X,Y,'Mahalanobis')), and I plotted both distributions to have a look at the overlap.
Subsequently, I calculated the mean Mahalanobis distance for both distributions and the 95% CI, and did a t-test and a Kolmogorov-Smirnov test to assess whether the difference between the distributions was significant. This seemed very intuitive to me; however, when testing with mahal(), I get different values, although the reference matrix is the same. I don't get what the difference between these two ways of calculating the Mahalanobis distance is exactly.
A reply to @3lectrologos that was too long to post as a comment:
You mean this: d(I) = (Y(I,:)-mu) * inv(SIGMA) * (Y(I,:)-mu)'? That is just the formula for calculating the Mahalanobis distance, so it should be the same for the pdist2() and mahal() functions. I think mu and SIGMA are based on the reference distribution as a whole in both pdist2() and mahal(). In mahal you are comparing each point of your sample set to the reference distribution as a whole, while in pdist2 you are making pairwise comparisons with the individual points of the reference distribution. Actually, with my purpose in mind, I think I should go for mahal() instead of pdist2(). I can interpret a pairwise distance based on a reference distribution, but I don't think it's what I need here.
% test pdist2 vs. mahal in matlab
% the purpose of this script is to see whether the average over the rows of E equals the values in d...
% data
X = []; % 50*3 matrix, data omitted
Y = []; % 60*3 matrix, data omitted
% calculations
S = nancov(X);
% mahal()
d = mahal(Y,X); % gives a 60*1 vector with a value for each observation (row) in Y (the second input is always the reference matrix)
% pairwise mahalanobis distance with pdist2()
E = pdist2(X,Y,'mahalanobis',S); % outputs a 50*60 matrix whose ij-th element is the pairwise distance between X(i,:) and Y(j,:) based on the covariance matrix of X: nancov(X)
%{
so this is harder to interpret than mahal(), as elements of Y are not just compared to the "mahalanobis-centroid" based on X,
% but to each individual element of X
% so the purpose of this script is to see whether the average over the rows of E equals the values in d...
%}
F = mean(E); % now I averaged over the rows, which means, over all values of X, the reference matrix
mean(d)
mean(E(:)) % not equal to mean(d)
d-F' % not zero
% plot output
figure(1)
plot(d,'bo'), hold on
plot(mean(E),'ro')
legend('mahal()','averaged over all x values pdist2()')
ylabel('Mahalanobis distance')
figure(2)
plot(d,'bo'), hold on
plot(E','ro')
plot(d,'bo','MarkerFaceColor','b')
xlabel('values in matrix Y (Yi) ... or ... pairwise comparison Yi. (Yi vs. all Xi values)')
ylabel('Mahalanobis distance')
legend('mahal()','pdist2()')
One immediate difference between the two is that mahal subtracts the sample mean of X from each point in Y before computing distances.
Try something like E = pdist2(X,Y-mean(X),'mahalanobis',S); to see if that gives you the same results as mahal.
Note that
mahal(X,Y)
is equivalent to
pdist2(X,mean(Y),'mahalanobis',cov(Y)).^2
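If you want to check that equivalence numerically (random stand-in matrices of the sizes mentioned in the question):
X = randn(50,3); Y = randn(60,3);                  % made-up data
d1 = mahal(X,Y);                                   % squared Mahalanobis distance of each row of X to the sample Y
d2 = pdist2(X, mean(Y), 'mahalanobis', cov(Y)).^2;
max(abs(d1 - d2))                                  % ~0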
Well, I guess there are two different ways to calculate mahalanobis distance between two clusters of data like you explain above:
1) you compare each data point from your sample set to mu and sigma matrices calculated from your reference distribution (although labeling one cluster sample set and the other reference distribution may be arbitrary), thereby calculating the distance from each point to this so called mahalanobis-centroid of the reference distribution.
2) you compare each data point from matrix Y to each data point of matrix X, with X the reference distribution (mu and sigma are calculated from X only)
The values of the distances will be different, but I guess the ordinal order of dissimilarity between clusters is preserved when using either method 1 or 2? I actually wonder, when comparing 10 different clusters to a reference matrix X, or to each other, whether the order of the dissimilarities would differ between method 1 and method 2. Also, I can't imagine a situation where one method would be wrong and the other not, although method 1 seems more intuitive in some situations, like mine.