Principal Component Analysis in MATLAB

I'm implementing PCA using eigenvalue decomposition for sparse data. I know MATLAB has PCA built in, but it helps me understand the technicalities when I write the code myself.
I've been following the guidance from here, but I'm getting different results compared to the built-in function princomp.
Could anybody look at it and point me in the right direction?
Here's the code:
function [mu, Ev, Val] = pca(data)
% mu  - mean image
% Ev  - matrix whose columns are the eigenvectors corresponding to the
%       eigenvalues Val
% Val - eigenvalues
if nargin ~= 1
    error('usage: [mu,E,Values] = pca_q1(data)');
end
mu = mean(data)';
nimages = size(data,2);
for i = 1:nimages
    data(:,i) = data(:,i)-mu(i);
end
L = data'*data;
[Ev, Vals] = eig(L);
[Ev,Vals] = sort(Ev,Vals);
% computing eigenvectors of the real covariance matrix
Ev = data * Ev;
Val = diag(Vals);
Vals = Vals / (nimages - 1);
% normalize Ev to unit length
proper = 0;
for i = 1:nimages
    Ev(:,i) = Ev(:,1)/norm(Ev(:,i));
    if Vals(i) < 0.00001
        Ev(:,i) = zeros(size(Ev,1),1);
    else
        proper = proper+1;
    end
end
Ev = Ev(:,1:nimages);

Here's how I would do it:
function [V, newX, D] = myPCA(X)
    X = bsxfun(@minus, X, mean(X,1));      % zero-center
    C = (X'*X)./(size(X,1)-1);             % cov(X)
    [V, D] = eig(C);
    [D, order] = sort(diag(D), 'descend'); % sort eigenvalues high to low
    V = V(:,order);                        % reorder eigenvectors to match
    newX = X*V;
end
and an example to compare against the PRINCOMP function from the Statistics Toolbox:
load fisheriris
[V newX D] = myPCA(meas);
[PC newData Var] = princomp(meas);
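Note that eigenvectors are only determined up to sign, so individual columns of V and PC may differ by a factor of -1. A quick sanity check that allows for this (a sketch, assuming distinct eigenvalues so the column order is unambiguous):
% Compare the two results up to the inherent sign ambiguity of eigenvectors:
assert(norm(abs(V) - abs(PC)) < 1e-10);  % same principal directions
assert(norm(D - Var) < 1e-10);           % same variances (eigenvalues)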
You might also be interested in this related post about performing PCA by SVD.
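For reference, here is a minimal sketch of that SVD route (the function name pcaBySVD is made up for illustration). It avoids forming the covariance matrix explicitly; the singular values relate to the eigenvalues of cov(X) by D = diag(S).^2/(n-1):
function [V, newX, D] = pcaBySVD(X)
    X = bsxfun(@minus, X, mean(X,1));  % zero-center, as in myPCA
    [U, S, V] = svd(X, 'econ');        % thin SVD: X = U*S*V'
    newX = U*S;                        % projected data, equals X*V
    D = diag(S).^2 ./ (size(X,1)-1);   % variances = eigenvalues of cov(X)
end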

Related

Arnoldi algorithm in Matlab

function [V,H] = Arnoldi(A,v,m)
    [n,~] = size(A);
    V = zeros(n,m+1);
    H = zeros(n,n);
    V(:,1) = v/norm(v);
    for k = 2:m
        V(:,k) = A*V(:,k-1);
        for j = 1:(k-1)
            H(j,k-1) = V(:,j).'*V(:,k);
            V(:,k) = V(:,k) - H(j,k-1)*V(:,j);
        end
        H(k,k-1) = norm(V(:,k));
        V(:,k) = V(:,k)/H(k,k-1);
    end
end
This is my implementation of the Arnoldi algorithm. We have already consulted Wikipedia but did not find an answer there.
A is a square matrix, v is our starting vector, and m is the number of iterations / the dimension of the Krylov subspace. It does not give the expected Hessenberg matrix H, and we cannot figure out where we go wrong. Can anybody help us?
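For comparison, here is a minimal sketch of the textbook Arnoldi iteration. Note the differences from the code above: the Hessenberg matrix is (m+1)-by-m rather than n-by-n, the loop runs over all m columns so the last column of H is not left empty, and a breakdown check avoids dividing by zero:
function [V,H] = arnoldi_ref(A,v,m)
    % Builds an orthonormal basis V of the Krylov subspace K_m(A,v)
    % and an upper Hessenberg H satisfying A*V(:,1:m) = V*H.
    n = size(A,1);
    V = zeros(n,m+1);
    H = zeros(m+1,m);
    V(:,1) = v/norm(v);
    for k = 1:m
        w = A*V(:,k);                  % next Krylov direction
        for j = 1:k                    % Gram-Schmidt against previous basis
            H(j,k) = V(:,j)'*w;
            w = w - H(j,k)*V(:,j);
        end
        H(k+1,k) = norm(w);
        if H(k+1,k) < eps, break; end  % breakdown: Krylov subspace is invariant
        V(:,k+1) = w/H(k+1,k);
    end
end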

Divide and Conquer SVD in MATLAB

I'm trying to implement the Divide and Conquer SVD of an upper bidiagonal matrix B, but my code is not working. The error is:
"Unable to perform assignment because the size of the left side is 3-by-3 and the size of the right side is 2-by-2.
V_bar(1:k,1:k) = V1;"
Can somebody help me fix it? Thanks.
function [U,S,V] = DivideConquer_SVD(B)
    [m,n] = size(B);
    k = floor(m/2);
    if k == 0
        U = 1;
        V = 1;
        S = B;
        return;
    else
        % Divide the input matrix
        alpha = B(k,k);
        beta = B(k,k+1);
        e1 = zeros(m,1);
        e2 = zeros(m,1);
        e1(k) = 1;
        e2(k+1) = 1;
        B1 = B(1:k-1,1:k);
        B2 = B(k+1:m,k+1:m);
        % recursive computations
        [U1,S1,V1] = DivideConquer_SVD(B1);
        [U2,S2,V2] = DivideConquer_SVD(B2);
        U_bar = zeros(m);
        U_bar(1:k-1,1:k-1) = U1;
        U_bar(k,k) = 1;
        U_bar((k+1):m,(k+1):m) = U2;
        D = zeros(m);
        D(1:k-1,1:k) = S1;
        D((k+1):m,(k+1):m) = S2;
        V_bar = zeros(m);
        V_bar(1:k,1:k) = V1;
        V_bar((k+1):m,(k+1):m) = V2;
        u = alpha*e1'*V_bar + beta*e2'*V_bar;
        u = u';
        D_tilde = D*D + u*u';
        % compute eigenvalues and eigenvectors of D^2+uu'
        [L1,Q1] = eig(D_tilde);
        eigs = diag(L1);
        S = zeros(m,n)
        S(1:(m+1):end) = eigs
        U_tilde = Q1;
        V_tilde = Q1;
        % Compute eigenvectors of the original input matrix T
        U = U_bar*U_tilde;
        V = V_bar*V_tilde;
        return;
    end
With my limited mathematical knowledge you'll need to help me a bit more, as I cannot judge whether the approach is mathematically correct (with no theory given ;) ). Anyway, I couldn't even reproduce the error, e.g. with this matrix, which The MathWorks uses to illustrate their LU matrix factorization:
A = [10 -7 0
     -3  2 6
      5 -1 5];
So I tried to structure your code a bit and added some hints. Extend this to make your code clearer for those people (like me) who are not too familiar with matrix decompositions.
function [U,S,V] = DivideConquer_SVD(B)
    % m x n matrix
    [m,n] = size(B);
    k = floor(m/2);
    if k == 0
        disp('if') % for debugging
        U = 1;
        V = 1;
        S = B;
        % return; % not necessary, as you don't do anything afterwards anyway
    else
        disp('else') % for debugging
        % Divide the input matrix
        alpha = B(k,k);  % element on diagonal
        beta = B(k,k+1); % element on off-diagonal
        e1 = zeros(m,1);
        e2 = zeros(m,1);
        e1(k) = 1;
        e2(k+1) = 1;
        % divide matrix
        B1 = B(1:k-1,1:k);   % upper left quadrant
        B2 = B(k+1:m,k+1:m); % lower right quadrant
        % recursive function calls
        [U1,S1,V1] = DivideConquer_SVD(B1);
        [U2,S2,V2] = DivideConquer_SVD(B2);
        U_bar = zeros(m);
        U_bar(1:k-1,1:k-1) = U1;
        U_bar(k,k) = 1;
        U_bar((k+1):m,(k+1):m) = U2;
        D = zeros(m);
        D(1:k-1,1:k) = S1;
        D((k+1):m,(k+1):m) = S2;
        V_bar = zeros(m);
        V_bar(1:k,1:k) = V1;
        V_bar((k+1):m,(k+1):m) = V2;
        u = (alpha*e1.'*V_bar + beta*e2.'*V_bar).'; % (little show-off tip: '
        % is the complex conjugate transpose operator; .' is the "normal"
        % transpose operator. It's good practice to distinguish between them,
        % but there is no difference for real matrices anyway)
        D_tilde = D*D + u*u.';
        % compute eigenvalues and eigenvectors of D^2+uu'
        [L1,Q1] = eig(D_tilde);
        eigs = diag(L1);
        S = zeros(m,n);
        S(1:(m+1):end) = eigs;
        U_tilde = Q1;
        V_tilde = Q1;
        % Compute eigenvectors of the original input matrix T
        U = U_bar*U_tilde;
        V = V_bar*V_tilde;
        % return; % not necessary, as you don't do anything afterwards anyway
    end % if
end % function

Is there a correlation ratio in MATLAB?

Is there any function in MATLAB that calculates the correlation ratio?
Here is an implementation I attempted, but the results are not right.
function cr = correlation_ratio(X, Y, L)
    ni = zeros(1, L);
    sigmai = ni;
    for i = 0:(L-1)
        Yn = Y(X == i);
        ni(1, i+1) = numel(Yn);
        m = (1/ni(1, i+1))*sum(Yn);
        sigmai(1, i+1) = (1/ni(1, i+1))*sum((Yn - m).^2);
    end
    n = sum(ni);
    prod = ni.*sigmai;
    cr = (1 - (1/n)*sum(prod))^0.5;
This is the equation on the Wikipedia page:

$$\eta = \sqrt{\frac{\sum_x n_x\,(\bar{y}_x - \bar{y})^2}{\sum_{x,i} (y_{x,i} - \bar{y})^2}}$$

where:
η is the correlation ratio,
y_{x,i} are the sample values (x is the class label, i the sample index),
\bar{y}_x is the mean of the sample values for class x,
\bar{y} is the mean over all samples across all classes, and
n_x is the number of samples in class x.
This is how I interpreted it into code:
function eta = correlation_ratio(X, Y)
    X = X(:); % make sure we've got column vectors, simplifies things below a bit
    Y = Y(:);
    L = max(X);
    mYx = zeros(1, L+1); % we'll write the mean per class here
    nx = zeros(1, L+1);  % we'll write the number of samples per class here
    for i = unique(X).'
        Yn = Y(X == i);
        if numel(Yn) > 1
            mYx(i+1) = mean(Yn);
            nx(i+1) = numel(Yn);
        end
    end
    mY = mean(Y); % mean across all samples
    eta = sqrt(sum(nx .* (mYx - mY).^2) / sum((Y - mY).^2));
The loop could be replaced with accumarray.
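A sketch of that replacement, assuming as above that the class labels in X are nonnegative integers:
X = X(:); Y = Y(:);
nx  = accumarray(X+1, 1);               % number of samples per class
mYx = accumarray(X+1, Y) ./ max(nx,1);  % per-class means (guard against empty classes)
mY  = mean(Y);                          % mean across all samples
eta = sqrt(sum(nx .* (mYx - mY).^2) / sum((Y - mY).^2));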

Vectorize a regression map calculation

I compute the regression map of a time series A(t) on a field B(x,y,t) in the following way:
A = 1:10;             %time
B = rand(100,100,10); %x,y,time
rc = nan(size(B,1),size(B,2));
for ii = 1:size(B,1)
    for jj = 1:size(B,2)
        tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
        rc(ii,jj) = tmp(1,2); %covariance of A and B
    end
end
rc = rc/var(A); %regression coefficient
Is there a way to vectorize/speed up this code? Or maybe some built-in function that I'm not aware of that achieves the same result?
In order to vectorize this algorithm, you have to "get your hands dirty" and compute the covariance yourself. If you take a look inside cov, you'll see that it has many lines of input checking and very few lines of actual computation. To summarize the critical steps:
y = varargin{1};
x = x(:);
y = y(:);
x = [x y];
[m,~] = size(x);
denom = m - 1;
xc = x - sum(x,1)./m; % Remove mean
c = (xc' * xc) ./ denom;
To simplify the above somewhat:
x = [x(:) y(:)];
m = size(x,1);
xc = x - sum(x,1)./m;
c = (xc' * xc) ./ (m - 1);
Now this is something that is fairly straightforward to vectorize...
function q51466884
A = 1:10;             %time
B = rand(200,200,10); %x,y,time
%% Test Equivalence:
assert( norm(sol1()-sol2()) < 1E-10);
%% Benchmark:
disp([timeit(@sol1), timeit(@sol2)]);
%%
function rc = sol1()
    rc = nan(size(B,1),size(B,2));
    for ii = 1:size(B,1)
        for jj = 1:size(B,2)
            tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
            rc(ii,jj) = tmp(1,2); %covariance A and B
        end
    end
    rc = rc/var(A); %regression coefficient
end
function rC = sol2()
    m = numel(A);
    rB = reshape(B,[],10).'; % reshape
    % Center:
    cA = A(:) - sum(A)./m;
    cB = rB - sum(rB,1)./m;
    % Multiply:
    rC = reshape( (cA.' * cB) ./ (m-1), size(B(:,:,1)) ) ./ var(A);
end
end
I get these timings: [0.5381 0.0025] which means we saved two orders of magnitude in the runtime :)
Note that a big part of optimizing the algorithm is assuming you don't have any "strangeness" in your data, like NaN values etc. Take a look inside cov.m to see all the checks that we skipped.

How to obtain the eigenfaces using eigenvalues and eigenvectors?

I want to find the eigenfaces from the eigenvalues. Here is the code for reference.
clc;
clear all;
close all;
% I) READ IMAGES
for i = 1:9
    img{i} = imread(['C:\Users\shree\Desktop\archana\target\' num2str(i) '.jpg']);
end
% II) CONVERTING TO GRAYSCALE
gray_img = cellfun(@rgb2gray, img, 'UniformOutput', false);
%imshow(gray_img{2});
% III) RESIZING GRAY IMAGES
res_img = cellfun(@(x)(imresize(x, [50, 50])), gray_img, 'UniformOutput', false);
%imshow(res_img{2});
% DISPLAYING ALL IMAGES
D = [res_img{1} res_img{2} res_img{3}
     res_img{4} res_img{5} res_img{6}
     res_img{7} res_img{8} res_img{9}];
figure, imshow(D);
% MEAN IMAGE
mean_img = (res_img{1}+res_img{2}+res_img{3}+res_img{4}+res_img{5}+res_img{6}+res_img{7}+res_img{8}+res_img{9})/9;
figure, imshow(mean_img);
% IV) SINGLE-VECTOR CONVERSION
vect_img = cellfun(@(x)(x(:)), res_img, 'UniformOutput', false);
% MEAN OF SINGLE VECTORS
mean_vect = (vect_img{1}+vect_img{2}+vect_img{3}+vect_img{4}+vect_img{5}+vect_img{6}+vect_img{7}+vect_img{8}+vect_img{9})/9;
% DEVIATION MATRIX
dev_mat = cellfun(@(x)(x - mean_vect), vect_img, 'UniformOutput', false);
%imshow(dev_mat{1})
U = [dev_mat{1} dev_mat{2} dev_mat{3} dev_mat{4} dev_mat{5} dev_mat{6} dev_mat{7} dev_mat{8} dev_mat{9}];
figure, imshow(U);
% COVARIANCE MATRIX
C = (double(U')*double(U))/9;
% VARIANCE
v = var(C);
% EIGENVALUES
lambda = eig(C);
[V,D] = eig(C); % eigenvalues (D) & eigenvectors (V), => A*V = V*D
size(lambda);
% EXTRACT DIAGONAL OF MATRIX
%V = diag(V);
% SORT EIGENVALUES IN DECREASING ORDER
sort(lambda, 'descend');
I have reached the point of arranging the eigenvalues in non-increasing order. Please help me with how to proceed in order to get the eigenfaces. Regards.
Instead of loading each file one by one, try this:
ImageDatabasePath = 'C:\Users\shree\Desktop\final data';
ImageFiles = dir(ImageDatabasePath);
Image_Number = 0; % count the image files in the folder
for i = 1:size(ImageFiles,1)
    if not(strcmp(ImageFiles(i).name,'.') | strcmp(ImageFiles(i).name,'..') ...
            | strcmp(ImageFiles(i).name,'Thumbs.db'))
        Image_Number = Image_Number + 1;
    end
end
Now, to make the images into 1D image vectors:
T = [];
for i = 1 : Image_Number
    str = int2str(i);
    str = strcat('\',str,'.jpg');
    str = strcat(ImageDatabasePath,str);
    imt = imread(str);
    [irow, icol] = size(imt);
    temp = reshape(imt,irow*icol,1);
    T = [T temp];
end
Calculate the mean image:
m = mean(T,2);
Train_Number = size(T,2);
Calculate the deviation of each image from the mean image:
A = [];
for i = 1 : Image_Number
    temp = double(T(:,i)) - m;
    A = [A temp];
end
Create the surrogate covariance matrix (a small Train_Number-by-Train_Number matrix whose eigenvectors are related to those of the full covariance matrix A*A'):
L = A'*A;
Calculate the eigenvalues and eigenvectors: V holds the eigenvectors, and D is a diagonal matrix with the eigenvalues.
[V, D] = eig(L);
L_eig_vec = [];
for i = 1 : size(V,2)
    if D(i,i) > 1 % keep only eigenvectors whose eigenvalues exceed the threshold
        L_eig_vec = [L_eig_vec V(:,i)];
    end
end
The eigenvectors of the covariance matrix C (the so-called "eigenfaces") can then be recovered from the eigenvectors of L:
Eigenfaces = A * L_eig_vec;
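To actually view them, here is a short sketch under the question's setup (images resized to 50-by-50, so each column of Eigenfaces reshapes back to an image; with 9 training images there are at most 9 eigenfaces, so a 3-by-3 grid suffices):
for i = 1:size(Eigenfaces,2)
    ef = reshape(Eigenfaces(:,i), 50, 50); % back to image dimensions
    subplot(3,3,i), imshow(ef, []);        % [] rescales intensities for display
end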
Use double(NEW) * double(NEW').
Besides, do not use mean and cov as variable names; they are built-in functions. I guess you want C = cov(double(NEW) * double(NEW')); in the covariance calculation.