PCA for feature extraction in MATLAB

I have a data matrix A of size N-by-M.
I want to use PCA for dimensionality reduction, reducing the number of dimensions to k.
I understand that after feature extraction I should get an N-by-k matrix.
I have tried pcares as follows:
[residuals,reconstructed] = pcares(A,k)
But this does not help me.
I am also trying to use the dr toolbox,
but it returns a k-by-k matrix. How do I proceed from there?
Any help would be appreciated.
Thank You

pcares gives you the residuals, i.e. the difference between the input and its reconstruction from the first k principal components (its second output is that reconstruction). Both are N-by-M, so neither is the reduced representation you want. Use the pca command instead. Its first output, coeff, is an M-by-M matrix (when N > M) whose columns are the principal components. You can use the first k of them to construct the features:
X = bsxfun(@minus, A, mean(A)) * coeff(:, 1:k);
where coeff is what is returned from the pca command. The bsxfun call subtracts the column means (it centers the data, as pca did when computing coeff). X is N-by-k, and it equals the first k columns of pca's second output, score.
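A minimal sketch of the same projection using only base MATLAB, assuming A has more rows than columns: center the columns, take the economy-size SVD, and keep the first k right singular vectors. Up to the sign of each column, V(:,1:k) should match coeff(:,1:k) from pca, and Z should match score(:,1:k). The sizes N, M and k are made-up example values.

```matlab
% Hypothetical example sizes: N observations (rows), M features (columns), k kept dimensions
N = 200; M = 10; k = 3;
A = rand(N, M);

% Center each column, as pca does before computing the coefficients
Ac = bsxfun(@minus, A, mean(A));

% Economy-size SVD: the columns of V are the principal directions (like coeff)
[~, ~, V] = svd(Ac, 'econ');

% Project onto the first k directions to get the N-by-k feature matrix
Z = Ac * V(:, 1:k);
disp(size(Z))   % N-by-k
```

The columns of Z are uncorrelated and ordered by decreasing variance, which is what makes truncating to the first k columns a sensible reduction.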

Related

Dimensionality reduction in HOG feature vector

I computed the HOG feature vector of an input image in MATLAB using the following code:
I = imread('input.jpg');
I = rgb2gray(I);
[features, visualization] = extractHOGFeatures(I,'CellSize',[16 16]);
features comes out as a 1x1944 vector, and I need to reduce its dimensionality (say, to 1x100). What method should I use?
I thought of Principal Component Analysis and ran the following in MATLAB.
prinvec = pca(features);
prinvec comes out to be an empty matrix (1944x0). Am I doing it wrong? If not PCA, what other methods can I use to reduce the dimension?
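You can't do PCA on this, because you have a single observation with far more features (1944) than observations (1). pca centers each column, which turns a single row into all zeros, so there is no variance left to decompose; in general, n observations give at most n - 1 components. Get more observations (some 10,000, presumably) and you can do PCA.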
See "PCA in matlab selecting top n components" for a more detailed, mathematical explanation of why this is the case.
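A small check of the rank argument, using random stand-ins for HOG descriptors (the sizes are illustrative only): centering a single row leaves nothing but zeros, and centering n rows leaves a matrix of rank at most n - 1.

```matlab
% One 1944-dimensional feature vector (random stand-in for a HOG descriptor)
f = rand(1, 1944);
fc = bsxfun(@minus, f, mean(f, 1));   % column-wise centering, as pca does
disp(rank(fc))                        % 0: no direction has any variance

% 50 observations of the same length: at most 49 components are available
F = rand(50, 1944);
Fc = bsxfun(@minus, F, mean(F, 1));
disp(rank(Fc))                        % at most 49
```

So to keep 100 components you need at least 101 images: stack one descriptor per row, run pca on that matrix, and project each new descriptor (after subtracting the training mean) onto the first 100 columns of coeff.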

Vectorization in PCA

I am doing Principal Component Analysis and would like to know whether I can express
the sum over i = 1, ..., m of X(i)*X(i)^T in terms of the data matrix, i.e. as a direct multiplication of two matrices.
Can this be done, or do I need to use a for loop?
Currently I have tried:
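S = zeros(n,n);                 % n = number of features (columns of X)
for i = 1:m                     % m = number of samples (rows of X)
    S = S + X(i,:)' * X(i,:);   % outer product of sample i with itself (n-by-n)
end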
My goal is to find the principal eigenvalues of the resulting matrix.
Thanks in advance
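Say the data matrix X has shape Dim-by-Num (one sample per column); then you can compute the sum of the outer products of all samples with:
S = X*X'
(If, as in your loop, each row of X is a sample, use S = X'*X instead.)
For implementing PCA, also don't forget to divide the matrix by the number of samples:
Sigma = (1/Num) * X*X';
If your data has zero mean, this is also the covariance matrix (the version normalized by Num; MATLAB's cov normalizes by Num - 1).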
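A quick check that the loop and the single matrix product agree, assuming (as in the question's loop) that X stores one sample per row, so the product is X'*X. The sizes m and n are made-up example values; eig is then applied to the centered, scaled matrix to get the principal eigenvalues.

```matlab
% m samples (rows) with n features (columns); example sizes only
m = 500; n = 6;
X = rand(m, n);

% Loop version: accumulate the outer product of each sample with itself
S_loop = zeros(n, n);
for i = 1:m
    S_loop = S_loop + X(i,:)' * X(i,:);
end

% Vectorized version: one matrix product computes the same sum
S_vec = X' * X;

% For PCA, center first and divide by the number of samples
Xc = bsxfun(@minus, X, mean(X));
Sigma = (Xc' * Xc) / m;
Sigma = (Sigma + Sigma') / 2;                 % guard against round-off asymmetry
[V, D] = eig(Sigma);
[lambda, order] = sort(diag(D), 'descend');   % principal eigenvalues first
V = V(:, order);                              % matching eigenvectors
```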

How to multiply an n-by-m matrix with an n-by-m-by-p array in MATLAB

In my current analysis, I am trying to multiply a matrix flm of size n-by-m by the inverse of an n-by-m-by-p array, and then multiply that result by the inverse of flm.
I tried the following code:
flm = repmat(Data.fm.flm(chan,:),[1 1 morder]); % chan is a 1-by-3 vector
A = (flm(:,:,:)/A_inv(:,:,:))/flm(:,:,:);
However, because of the mismatched dimensions, I get the following error message:
Error using ==> mrdivide
Inputs must be 2-D, or at least one input must be scalar.
To compute elementwise RDIVIDE, use RDIVIDE (./) instead.
I have no idea how to proceed without using a for loop. Does anyone have a suggestion?
I think you are looking for a way to conveniently multiply matrices when one has more dimensions than the other. In that case you can use bsxfun to automatically 'expand' the smaller matrix along the missing dimension. Note that this is element-wise multiplication, not a matrix product per slice.
x = rand(3,4);
y = rand(3,4,5);
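bsxfun(@times,x,y)   % 3x4x5 result: x is expanded along the third dimension
% In R2016b and later, x .* y gives the same result through implicit expansion.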
It is quite simple, and very efficient.
Make sure to check out doc bsxfun for more examples.
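A sketch that checks the slice-by-slice behavior of bsxfun and, for comparison, shows a plain loop for slice-wise matrix right division, which is what the question's / operator needs. B is a hypothetical stack of square, well-conditioned matrices, not the question's A_inv.

```matlab
x = rand(3,4);
y = rand(3,4,5);

% Element-wise product, x expanded along the third dimension
z = bsxfun(@times, x, y);          % size 3x4x5

% Slice-wise matrix right division x / B(:,:,k): one 4x4 solve per slice
B = rand(4,4,5) + 4*repmat(eye(4), [1 1 5]);   % hypothetical diagonally dominant slices
R = zeros(3,4,5);
for k = 1:size(B,3)
    R(:,:,k) = x / B(:,:,k);       % same as x * inv(B(:,:,k)), but more accurate
end
```

The loop is short and correct; bsxfun only helps when the per-slice operation is element-wise.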

Multivariate Empirical CDF

How can I compute a multivariate empirical CDF? Is there anything in MATLAB, or perhaps an approach, that gives output similar to ecdf but takes a matrix instead of a vector as input?
Appreciate any input.
Basically, I would like something like this:
http://reference.wolfram.com/mathematica/ref/EmpiricalDistribution.html
So, to provide an official answer (based on our comment conversation):
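Use hist3 to get the empirical pdf (hist3 returns bin counts, so divide by the number of samples), and then take a 2-D cumulative sum to create a 2-D cdf. There is no single built-in for that, but applying cumsum along both dimensions does it: cumsum(cumsum(P, 1), 2). Each entry in the resulting cdf matrix is the sum of all pdf entries whose row and column indices are less than or equal to its own.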
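A minimal sketch of the binned approach for two-column data on a fixed grid. To avoid depending on hist3 (Statistics Toolbox), it counts points per cell with accumarray; the grid size nb, the sample size and the corner (ci, cj) are illustrative choices. A direct count at one grid corner is included as a check.

```matlab
% Illustrative data: n points in 2-D, each coordinate in (0,1)
n = 1000;
P = rand(n, 2);

% nb-by-nb grid on [0,1)^2; cell (a,b) holds points with
% (a-1)/nb <= x < a/nb and (b-1)/nb <= y < b/nb
nb = 10;
ix = min(floor(P(:,1) * nb) + 1, nb);
iy = min(floor(P(:,2) * nb) + 1, nb);
counts = accumarray([ix iy], 1, [nb nb]);   % 2-D histogram of counts

% 2-D cumulative sum (down the rows, then across the columns), normalized
F = cumsum(cumsum(counts, 1), 2) / n;

% F(a,b) is the fraction of points with x < a/nb and y < b/nb
ci = 4; cj = 7;
direct = mean(P(:,1) * nb < ci & P(:,2) * nb < cj);
fprintf('binned: %.3f  direct: %.3f\n', F(ci,cj), direct);
```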
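If a univariate ECDF is enough and you only need to accept a matrix as input, you can pass the vectorized matrix to ecdf. Keep in mind that this pools all entries into a single sample (it is not a multivariate CDF), and that ecdf returns f at the sorted points x rather than one value per element, so reshaping f does not give per-element values. Ranking the entries does:
y = rand(100);                          % replace this with your actual data
[f, x] = ecdf(y(:));                    % ECDF of all entries pooled, evaluated at the sorted points x
[~, order] = sort(y(:));                % one ECDF value per entry (assumes no tied values): rank / count
F = zeros(size(y)); F(order) = (1:numel(y)) / numel(y);   % F has the same shape as y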

Matlab - how to compute PCA on a huge data set [duplicate]

Possible Duplicate:
MATLAB is running out of memory but it should not be
I want to perform PCA on a huge data set of points. To be more specific, size(dataPoints) = [329150 132], where 329150 is the number of data points and 132 is the number of features.
I want to extract the eigenvectors and their corresponding eigenvalues so that I can perform PCA reconstruction.
However, when I use the princomp function (i.e. [eigenVectors projectedData eigenValues] = princomp(dataPoints);), I get the following error:
>> [eigenVectors projectedData eigenValues] = princomp(pointsData);
Error using svd
Out of memory. Type HELP MEMORY for your options.
Error in princomp (line 86)
[U,sigma,coeff] = svd(x0,econFlag); % put in 1/sqrt(n-1) later
With a smaller data set, however, I have no problem.
How can I perform PCA on my whole data set in MATLAB? Has anyone encountered this problem?
Edit:
I have modified the princomp function to use svds instead of svd, but I get pretty much the same error. Here it is:
Error using horzcat
Out of memory. Type HELP MEMORY for your options.
Error in svds (line 65)
B = [sparse(m,m) A; A' sparse(n,n)];
Error in princomp (line 86)
[U,sigma,coeff] = svds(x0,econFlag); % put in 1/sqrt(n-1) later
Solution based on eigendecomposition
You can first compute the eigendecomposition of X'X (only 132-by-132), as @david said. Specifically, see the script below:
sz = [329150 132];
X = rand(sz);
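% For PCA proper, center the columns first: X = bsxfun(@minus, X, mean(X));
[V, D] = eig(X.' * X);   % 132-by-132 eigenproblem instead of an SVD of the full matrix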
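V holds the right singular vectors of X, which are the principal vectors if you put your data vectors in rows. The eigenvalues in D are the squared singular values of X; for centered data they are proportional to the variance along each direction (divide by the number of samples minus one to get the variances). The singular values, which are thus proportional to the standard deviations, are the square roots of the eigenvalues: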
S = sqrt(D);
Then the left singular vectors U follow from X = U*S*V' by multiplying both sides on the right by V*S^(-1). Note that U holds the principal vectors if your data vectors are in columns.
U = X*V*S^(-1);
Let us reconstruct the original data matrix and see the L2 reconstruction error:
X2 = U*S*V';
L2ReconstructionError = norm(X(:)-X2(:))
It is almost zero:
L2ReconstructionError =
6.5143e-012
If your data vectors are in columns and you want to convert your data into eigenspace coefficients, you should do U.'*X.
This code snippet takes around 3 seconds on my moderate 64-bit desktop.
Solution based on Randomized PCA
Alternatively, you can use a faster approximate method based on randomized PCA. Please see my answer on Cross Validated. You can directly compute fsvd and get U and V instead of using eig.
You may employ randomized PCA if the data is too big, but I think the previous approach is sufficient for the size you gave.
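For reference, a minimal sketch of the basic randomized range-finder idea. This is not the fsvd function mentioned above, just a generic illustration with made-up sizes: multiply X by a random Gaussian matrix, orthonormalize the result to capture the dominant column space, and take a small SVD inside that subspace. The target rank k and the oversampling p are assumed tuning values; for PCA you would center the columns of X first.

```matlab
% Tall test matrix with (approximately) rank k plus a little noise; not centered here
m = 5000; n = 132; k = 10; p = 10;           % p = oversampling (assumed value)
X = randn(m, k) * randn(k, n) + 1e-6 * randn(m, n);

% Randomized range finder: capture the dominant column space of X
Omega = randn(n, k + p);                      % random Gaussian test matrix
[Q, ~] = qr(X * Omega, 0);                    % m-by-(k+p) orthonormal basis

% Small SVD of the projected matrix, then lift back to the original space
B = Q' * X;                                   % (k+p)-by-n
[Ub, S, V] = svd(B, 'econ');
U = Q * Ub;

% Keep the top k components and measure the approximation error
U = U(:, 1:k); S = S(1:k, 1:k); V = V(:, 1:k);
relErr = norm(X - U*S*V', 'fro') / norm(X, 'fro');
```

The only large operations are the products with X; the SVD itself is on a (k+p)-by-n matrix.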
My guess is that you have a huge data set and don't need all of the svd coefficients. In this case, use svds instead of svd:
Taken directly from Matlab help:
s = svds(A,k) computes the k largest singular values and associated singular vectors of matrix A.
From your question, I understand that you don't call svd directly. But you might as well take a look at princomp (It is editable!) and alter the line that calls it.
You probably needed to compute an n-by-n matrix somewhere in the computation, that is to say:
329150 * 329150 * 8 bytes ≈ 866 GB
of space, which explains why you're getting a memory error. There seems to be an efficient way to calculate PCA using princomp(X, 'econ'), which I suggest you try.
More on this on Stack Overflow and MathWorks.
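The memory arithmetic can be made concrete: a full svd of an m-by-n matrix returns an m-by-m U, whereas the economy-size call returns only an m-by-n U. A sketch that uses the question's dimensions for the byte counts and a smaller, made-up matrix for the actual call:

```matlab
% Byte counts for the question's sizes (8 bytes per double)
m = 329150; n = 132;
fullU_GB = m * m * 8 / 1e9;     % about 866.7 GB for a full m-by-m U
econU_GB = m * n * 8 / 1e9;     % about 0.35 GB for an economy m-by-n U

% Economy-size SVD on a smaller tall matrix
Xs = rand(20000, n);
Xs = bsxfun(@minus, Xs, mean(Xs));   % center the columns for PCA
[U, S, V] = svd(Xs, 'econ');         % U is 20000-by-132, not 20000-by-20000
```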
Compute X'X (132-by-132) manually and run svd on it, or find a NIPALS script.
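A minimal sketch of the basic NIPALS iteration (not any particular published script; the tolerance, iteration cap and data are assumptions). It extracts one component at a time from centered data by alternating loading and score updates, then deflates the residual before the next component. The first loading is compared with the top eigenvector of X'X, the other route suggested above.

```matlab
% Illustrative data: rows are observations, columns are features
X = rand(2000, 8) * diag([5 4 3 2 1 1 1 1]);
X = bsxfun(@minus, X, mean(X));     % NIPALS assumes centered columns
k = 3; tol = 1e-12; maxIter = 5000; % assumed settings

P = zeros(size(X,2), k);            % loadings (principal directions)
T = zeros(size(X,1), k);            % scores (projected data)
E = X;                              % residual, deflated after each component
for c = 1:k
    [~, j] = max(sum(E.^2));        % start from the column with the largest variance
    t = E(:, j);
    for it = 1:maxIter
        pc = E' * t / (t' * t);     % loading update
        pc = pc / norm(pc);
        tNew = E * pc;              % score update
        if norm(tNew - t) < tol * norm(tNew), t = tNew; break; end
        t = tNew;
    end
    P(:, c) = pc; T(:, c) = t;
    E = E - t * pc';                % remove this component before the next one
end

% Compare with the eigendecomposition of X'X (8-by-8, cheap to form)
G = X' * X; G = (G + G') / 2;       % symmetrize against round-off
[V, D] = eig(G);
[~, order] = sort(diag(D), 'descend');
v1 = V(:, order(1));
```

NIPALS never forms an n-by-n matrix and only needs products with X and X', which is why it is a common fallback for tall data.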