Finding eigenvector corresponding to smallest eigenvalue - matlab

I want to find the corresponding eigenvector of the eigenvalue of minimum magnitude of a matrix U. What is the easiest way to do this?
Currently I am using the algorithm
[evecs, D] = eigs(U);
evals = diag(D);
smallesteig = inf;
for k = 1:length(evals)
    if (abs(evals(k)) < smallesteig)   % compare magnitudes, not signed values
        smallesteig = abs(evals(k));
        vec = evecs(:, k);
    end
end
Is there a more efficient way of doing this?

There is a very simple shorthand for this: [V,D] = eigs(U,1,'SM').
If you look at the eigs documentation, it states:
EIGS(A,K,SIGMA) and EIGS(A,B,K,SIGMA) return K eigenvalues. If SIGMA is:
'LM' or 'SM' - Largest or Smallest Magnitude
For real symmetric problems, SIGMA may also be:
'LA' or 'SA' - Largest or Smallest Algebraic
'BE' - Both Ends, one more from high end if K is odd
For nonsymmetric or complex problems, SIGMA may also be:
'LR' or 'SR' - Largest or Smallest Real part
'LI' or 'SI' - Largest or Smallest Imaginary part
So, [V,D] = eigs(U,1,'SM') returns the eigenvector and value for the 1st eigenvalue of U when sorted by Smallest Magnitude.
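For instance, a quick sanity check (a hedged sketch; gallery('moler',5) is just an arbitrary symmetric test matrix, and newer MATLAB releases also accept the named option 'smallestabs' in place of 'SM'):
U = gallery('moler', 5);            % an arbitrary small symmetric test matrix
[vec, lambda] = eigs(U, 1, 'SM');   % eigenpair with the smallest-magnitude eigenvalue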

Related

MATLAB: The determinant of a covariance matrix is either 0 or inf

I have a 1500x1500 covariance matrix of which I am trying to calculate the determinant for the EM-ML method. The covariance matrix is obtained by finding the SIGMA matrix and then passing it into the nearestSPD library (Link) to make the matrix positive definite. In this case the matrix is always singular. Another method I tried was manually generating a positive definite matrix using the A'*A technique (A was taken as a 1600x1500 matrix). This always gives me the determinant as infinite. Any idea on how I can get a positive definite matrix with a finite determinant?
Do you actually need the determinant, or the log of the determinant?
For example, if you are computing a log likelihood of Gaussians, then what enters into the log likelihood is the log of the determinant. In high dimensions the determinant may not fit in a double, but its log most likely will.
If you perform a Cholesky factorisation of the covariance C, with (lower triangular) factor L, say, so that
C = L*L'
then
det(C) = det(L) * det(L') = det(L)^2
But the determinant of a lower triangular matrix is the product of its diagonal elements, so, taking logs above we get:
log(det(C)) = 2 * sum_i log(L(i,i))
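In MATLAB this is just a couple of lines (a minimal sketch, assuming C is the symmetric positive definite covariance):
L = chol(C, 'lower');              % Cholesky factor, C = L*L'
logdetC = 2 * sum(log(diag(L)));   % log(det(C)) without overflow or underflow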
(In response to a comment)
Even if you need to calculate a Gaussian pdf, it is better to calculate the log of that and exponentiate only when you need to. For example, a d-dimensional Gaussian with covariance C (which has Cholesky factor L) and mean 0 (purely to save typing) is:
p(x) = exp(-0.5*x'*inv(C)*x) / sqrt((2*pi)^d * det(C))
so
log p(x) = -0.5*x'*inv(C)*x - 0.5*d*log(2*pi) - 0.5*log(det(C))
which can also be written
log p(x) = -0.5*y'*y - 0.5*d*log(2*pi) - log(det(L))
where
y = inv(L)*x (in MATLAB, better computed as y = L\x than by forming the inverse)
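Putting this together in MATLAB (a hedged sketch, assuming x is a column vector, the mean is 0, and C is the covariance):
d = length(x);
L = chol(C, 'lower');   % C = L*L'
y = L \ x;              % y = inv(L)*x via a triangular solve
logp = -0.5*(y'*y) - 0.5*d*log(2*pi) - sum(log(diag(L)));
p = exp(logp);          % exponentiate only if you really need the raw density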

MATLAB eig returns inverted signs sometimes

I'm trying to write a program that gets a matrix A of any size, and SVD decomposes it:
A = U * S * V'
Where A is the matrix the user enters, U is an orthogonal matrix composed of the eigenvectors of A * A', S is a diagonal matrix of the singular values, and V is an orthogonal matrix of the eigenvectors of A' * A.
Problem is: the MATLAB function eig sometimes returns the wrong eigenvectors.
This is my code:
function [U,S,V] = badsvd(A)
W = A*A';
[U,S] = eig(W);
maxval = 0;                          % running maximum (avoid shadowing the built-in max)
for i = 1:size(W,1)                  %% selection sort: eigenvalues in descending order
    for j = i:size(W,1)
        if (S(j,j) > maxval)
            maxval = S(j,j);
            temp_index = j;
        end
    end
    maxval = 0;
    temp = S(temp_index,temp_index); % swap eigenvalues i and temp_index
    S(temp_index,temp_index) = S(i,i);
    S(i,i) = temp;
    temp = U(:,temp_index);          % swap the corresponding eigenvectors
    U(:,temp_index) = U(:,i);
    U(:,i) = temp;
end
W = A'*A;
[V,s] = eig(W);
maxval = 0;
for i = 1:size(W,1)                  %% same sort for the right-singular vectors
    for j = i:size(W,1)
        if (s(j,j) > maxval)
            maxval = s(j,j);
            temp_index = j;
        end
    end
    maxval = 0;
    temp = s(temp_index,temp_index);
    s(temp_index,temp_index) = s(i,i);
    s(i,i) = temp;
    temp = V(:,temp_index);
    V(:,temp_index) = V(:,i);
    V(:,i) = temp;
end
S = sqrt(S);                         % singular values are the square roots of the eigenvalues of A*A'
end
My code returns the correct s matrix, and also "nearly" correct U and V matrices, but some of the columns are multiplied by -1. Obviously, if t is an eigenvector then -t is also an eigenvector, but with the signs inverted (for some of the columns, not all) I don't get A = U * S * V'.
Is there any way to fix this?
Example: for the matrix A=[1,2;3,4] my function returns:
U=[0.4046,-0.9145;0.9145,0.4046]
and the built-in MATLAB svd function returns:
u=[-0.4046,-0.9145;-0.9145,0.4046]
Note that eigenvectors are not unique. Multiplying by any nonzero constant, including -1 (which simply flips the sign), gives another valid eigenvector. This is clear from the definition of an eigenvector:
A·v = λ·v
MATLAB chooses to normalize the eigenvectors to have a norm of 1.0; the sign is arbitrary. From the documentation:
For eig(A), the eigenvectors are scaled so that the norm of each is 1.0.
For eig(A,B), eig(A,'nobalance'), and eig(A,B,flag), the eigenvectors are not normalized
Now as you know, SVD and eigendecomposition are related. Below is some code to test this fact. Note that svd and eig return results in different order (one sorted high to low, the other in reverse):
% some random matrix
A = rand(5);
% singular value decomposition
[U,S,V] = svd(A);
% eigenvectors of A'*A are the same as the right-singular vectors
[V2,D2] = eig(A'*A);
[D2,ord] = sort(diag(D2), 'descend');
S2 = diag(sqrt(D2));
V2 = V2(:,ord);
% eigenvectors of A*A' are the same as the left-singular vectors
[U2,D2] = eig(A*A');
[D2,ord] = sort(diag(D2), 'descend');
S3 = diag(sqrt(D2));
U2 = U2(:,ord);
% check results
A
U*S*V'
U2*S2*V2'
I get very similar results (ignoring minor floating-point errors):
>> norm(A - U*S*V')
ans =
7.5771e-16
>> norm(A - U2*S2*V2')
ans =
3.2841e-14
EDIT:
To get consistent results, one usually adopts a convention, e.g. requiring that the first element of each eigenvector have a certain sign. That way, if you get an eigenvector that does not follow this rule, you multiply it by -1 to flip the sign...
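A minimal sketch of such a convention, continuing the example above (flipping columns makes each factor individually deterministic; to actually reconstruct A, the signs of U and V must be chosen consistently, which you can get by deriving one factor from the other, assuming A is square with nonzero singular values):
% flip each eigenvector so its first element is non-negative
flip = sign(V2(1,:)); flip(flip == 0) = 1;
V2 = bsxfun(@times, V2, flip);
% recover the matching left factor from A = U*S*V'  =>  U = A*V/S
U2 = A*V2/S2;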

Comparing two sets of vectors

I've got matrices A and B
size(A) = [n x]; size(B) = [n y];
Now I need to compare the Euclidean distance of each column vector of A with each column vector of B. I'm using the dist function right now
Q = dist([A B]); Q = Q(1:x, x+1:end);
But it also does a lot of needless work (like calculating the distances within A and within B).
What is the best way to calculate this?
You are looking for pdist2.
% Compute the ordinary Euclidean distance
D = pdist2(A.',B.','euclidean'); % euclidean distance
You should take the transpose of the matrices since pdist2 assumes the observations are in rows, not in columns.
An alternative solution to pdist2, if you don't have the Statistics Toolbox, is to compute this manually. For example, one way to do it is:
[X, Y] = meshgrid(1:size(A, 2), 1:size(B, 2)); %// or meshgrid(1:x, 1:y)
Q = sqrt(sum((A(:, X(:)) - B(:, Y(:))) .^ 2, 1));
The indices of the columns from A and B for each value in vector Q can be obtained by computing:
[X(:), Y(:)]
where each row contains a pair of indices: the first is the column index in matrix A, and the second is the column index in matrix B.
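If you prefer the result as an x-by-y matrix (matching the pdist2 call above), the vector can be reshaped; a small sketch, assuming x and y as above:
Qmat = reshape(Q, y, x).'; %// Qmat(j,k) is the distance between A(:,j) and B(:,k)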
Another solution if you don't have pdist2, and which may also be faster for very large matrices, is to vectorize the following mathematical fact:
||x-y||^2 = ||x||^2 + ||y||^2 - 2*dot(x,y)
where ||a|| is the L2-norm (euclidean norm) of a.
Comments:
C = -2*A'*B (this is an x-by-y matrix) is the vectorization of the dot products.
||x-y||^2 is the square of the euclidean distance which you are looking for.
Is that enough or do you need the explicit code?
The reason this may be faster asymptotically is that you avoid evaluating the metric separately for all x*y pairs; instead the bottleneck becomes a matrix multiplication, which is highly optimized in MATLAB. You are taking advantage of the fact that this is the Euclidean distance and not just some arbitrary metric.
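For completeness, a sketch of the explicit vectorized code (assuming A is n-by-x and B is n-by-y, as in the question):
sqA = sum(A.^2, 1);                          % 1-by-x squared column norms
sqB = sum(B.^2, 1);                          % 1-by-y squared column norms
D2 = bsxfun(@plus, sqA.', sqB) - 2*(A.'*B);  % x-by-y matrix of ||a-b||^2
D2(D2 < 0) = 0;                              % clamp tiny negatives from round-off
D = sqrt(D2);                                % Euclidean distances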

Calculate distance from point p to high dimensional Gaussian (M, V)

I have a high dimensional Gaussian with mean M and covariance matrix V. I would like to calculate the distance from point p to M, taking V into consideration (I guess it's the distance in standard deviations of p from M?).
Phrased differently: I take the ellipsoid one sigma away from M, and would like to check whether p is inside it.
If V is a valid covariance matrix of a Gaussian, then it is symmetric positive definite and therefore defines a valid scalar product. (By the way, so does inv(V).)
Therefore, assuming that M and p are column vectors, you could define distances as:
d1 = sqrt((M-p)'*V*(M-p));
d2 = sqrt((M-p)'*inv(V)*(M-p));
In MATLAB one would rather rewrite d2 as follows (possibly with some unnecessary parentheses), since the backslash avoids explicitly forming the inverse:
d2 = sqrt((M-p)'*(V\(M-p)));
The nice thing is that when V is the unit matrix, then d1 == d2, and both correspond to the classical Euclidean distance. Whether you have to use d1 or d2 is left as an exercise (sorry, part of my job is teaching). Write down the multi-dimensional Gaussian formula and compare it to the 1D case: the 1D case is just a particular case of the multidimensional one (or perform some numerical experiments).
NB: in very high dimensional spaces, or when testing very many points, you might find a cleverer / faster way using the eigenvectors and eigenvalues of V (i.e. the principal axes of the ellipsoid and their corresponding variances).
Hope this helps.
A.
Consider computing the probability of the point given the normal distribution:
M = [1 -1]; %# mean vector
V = [.9 .4; .4 .3]; %# covariance matrix
p = [0.5 -1.5]; %# 2d-point
prob = mvnpdf(p,M,V); %# probability P(p|mu,cov)
The function MVNPDF is provided by the Statistics Toolbox.
Maybe I'm totally off, but isn't this the same as just asking for each dimension: Am I inside the sigma?
PSEUDOCODE:
foreach(dimension d)
(M(d) - sigma(d) < p(d) < M(d) + sigma(d)) ?
Because you want to know whether p is inside every dimension of your Gaussian. So actually, this is just a geometric problem, and your Gaussian doesn't really have anything to do with it (except for M and sigma, which are just distances).
In MATLAB you could try something like:
all(M - sigma < p & p < M + sigma)   % note: a chained a < b < c would not do what you want in MATLAB
For a plain distance between the two points, I don't remember MATLAB's function for the Euclidean distance off-hand; maybe dist works:
dist(M, p)
because M is just a point in space and p as well, just two vectors. (norm(M - p) would also do it.)
And now the final one. You want to know the distance in units of sigma:
% create a distance vector and divide it by sigma (element-wise)
(M - p) ./ sigma
I think that will do the trick.
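A minimal sketch of this per-dimension check (assuming sigma means the per-dimension standard deviations, i.e. sqrt(diag(V)), and that M and p are column vectors; note this axis-aligned test ignores the off-diagonal correlations in V, which the Mahalanobis-style answer above does take into account):
sigma = sqrt(diag(V));            % per-dimension standard deviations
inside = all(abs(p - M) < sigma); % is p within one sigma along every axis?
zsc = (p - M) ./ sigma;           % signed distance in units of sigma, per axis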

Eigenvalues in MATLAB

In MATLAB, when I run the command [V,D] = eig(a) for a symmetric matrix, the largest eigenvalue (and its associated vector) is located in last column. However, when I run it with a non-symmetric matrix, the largest eigenvalue is in the first column.
I am trying to calculate eigenvector centrality, which requires that I compute the eigenvector associated with the largest eigenvalue. So the fact that the largest eigenvalue can appear in two different places makes it difficult for me to find the solution.
What I usually do is:
[V D] = eig(a);
[D order] = sort(diag(D),'descend'); %# sort eigenvalues in descending order
V = V(:,order);
You just have to find the index of the largest eigenvalue in D, which can easily be done using the function DIAG to extract the main diagonal and the function MAX to get the maximum eigenvalue and the index where it occurs:
[V,D] = eig(a);
[maxValue,index] = max(diag(D)); %# The maximum eigenvalue and its index
maxVector = V(:,index); %# The associated eigenvector in V
NOTE: As woodchips points out, you can have complex eigenvalues for non-symmetric matrices. When operating on a complex input X, the MAX function compares the magnitudes abs(X); among elements of equal magnitude, the phase angle angle(X) is used to break ties.
Note that non-symmetric matrices tend to have complex eigenvalues.
eig(rand(7))
ans =
3.2957
-0.22966 + 0.58374i
-0.22966 - 0.58374i
-0.38576
0.49064
0.17144 + 0.27968i
0.17144 - 0.27968i
Also note that eig does not explicitly return sorted eigenvalues (although the underlying algorithm tends to produce them in a nearly sorted order, based on the magnitude of the eigenvalue), but even if you do sort, you need to understand how sort works on complex vectors.
sort(rand(5,1) + i*rand(5,1))
ans =
0.42343 + 0.51539i
0.0098208 + 0.76145i
0.20348 + 0.88695i
0.43595 + 0.83893i
0.8225 + 0.91264i
Sort, when applied to complex inputs, works on the magnitude of the complex number.
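Putting this together, a hedged sketch that sorts the eigenpairs by magnitude explicitly, so it behaves the same for real and complex spectra:
[V,D] = eig(a);
[~,order] = sort(abs(diag(D)), 'descend'); %# sort by |lambda|
V = V(:,order);                            %# reorder eigenvectors to match
lambda = diag(D); lambda = lambda(order);  %# eigenvalues, largest magnitude first
maxVector = V(:,1);                        %# eigenvector of the dominant eigenvalue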
If you only care about the eigenvector associated with the largest eigenvalue, isn't it better to use eigs?
[V, D] = eigs( a, 1, 'lm' ); %// get first eigenvector with largest eigenvalue magnitude.