PCA (Principle Component Analysis) on multiple datasets - matlab

I have a set of climate data (temperature, pressure and moisture for example), X, Y, Z which are matricies with dimensions (n x p) where n is the number of observations and p is the number of spatial points.
Previously, to investigate modes of variability in dataset X, I simply performed a empirical orthogonal function (EOF) analysis OR Principle component Analysis (PCA) on X. This involved decomposing (via SVD), the matrix X.
To investigate the coupling of the modes of variability of X and Y, I used maximum covariance analysis (MCA) which involved decomposing a covariance matrix proportional to XY^{T}. (T is transpose)
However, if I wish to looked at all three datasets, how do I go about doing this? One idea I had was to form a fourth matrix, L, which will be the 'feature' concatenation of the three datasets:
L = [X, Y, Z]
so that my matrix L will have dimensions (n x 3p).
I would then use standard PCA/EOF analysis and use SVD to decompose this matrix L and then I would obtain modes of variabiilty with size (3p x 1) and thus subsequently the mode associated with X is the first p values, the mode associated with Y is the second set of p values and the mode associated with Z is the last p values.
Is this correct? Or can anyone suggest a better way of looking at the coupling of all three (or more) datasets?
Thank you so much!

I'd recommend to treat spatial points as extra dimension, i.e. f x n x p, where 'f' is your number of features. At this point you should use multilinear extension of PCA that can work on tensor data.

Related

Computing the SVD of a rectangular matrix

I have a matrix like M = K x N ,where k is 49152 and is the dimension of the problem and N is 52 and is the number of observations.
I have tried to use [U,S,V]=SVD(M) but doing this I get less memory space.
I found another code which uses [U,S,V]=SVD(COV(M)) and it works well. My questions are what is the meaning of using the COV(M) command inside the SVD and what is the meaning of the resultant [U,S,V]?
Finding the SVD of the covariance matrix is a method to perform Principal Components Analysis or PCA for short. I won't get into the mathematical details here, but PCA performs what is known as dimensionality reduction. If you like a more formal treatise on the subject, you can read up on my post about it here: What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?. However, simply put dimensionality reduction projects your data stored in the matrix M onto a lower dimensional surface with the least amount of projection error. In this matrix, we are assuming that each column is a feature or a dimension and each row is a data point. I suspect the reason why you are getting more memory occupied by applying the SVD on the actual data matrix M itself rather than the covariance matrix is because you have a significant amount of data points with a small amount of features. The covariance matrix finds the covariance between pairs of features. If M is a m x n matrix where m is the total number of data points and n is the total number of features, doing cov(M) would actually give you a n x n matrix, so you are applying SVD on a small amount of memory in comparison to M.
As for the meaning of U, S and V, for dimensionality reduction specifically, the columns of V are what are known as the principal components. The ordering of V is in such a way where the first column is the first axis of your data that describes the greatest amount of variability possible. As you start going to the second columns up to the nth column, you start to introduce more axes in your data and the variability starts to decrease. Eventually when you hit the nth column, you are essentially describing your data in its entirety without reducing any dimensions. The diagonal values of S denote what is called the variance explained which respect the same ordering as V. As you progress through the singular values, they tell you how much of the variability in your data is described by each corresponding principal component.
To perform the dimensionality reduction, you can either take U and multiply by S or take your data that is mean subtracted and multiply by V. In other words, supposing X is the matrix M where each column has its mean computed and the is subtracted from each column of M, the following relationship holds:
US = XV
To actually perform the final dimensionality reduction, you take either US or XV and retain the first k columns where k is the total amount of dimensions you want to retain. The value of k depends on your application, but many people choose k to be the total number of principal components that explains a certain percentage of your variability in your data.
For more information about the link between SVD and PCA, please see this post on Cross Validated: https://stats.stackexchange.com/q/134282/86678
Instead of [U, S, V] = svd(M), which tries to build a matrix U that is 49152 by 49152 (= 18 GB 😱!), do svd(M, 'econ'). That returns the “economy-class” SVD, where U will be 52 by 52, S is 52 by 52, and V is also 52 by 52.
cov(M) will remove each dimension’s mean and evaluate the inner product, giving you a 52 by 52 covariance matrix. You can implement your own version of cov, called mycov, as
function [C] = mycov(M)
M = bsxfun(#minus, M, mean(M, 1)); % subtract each dimension’s mean over all observations
C = M' * M / size(M, 1);
(You can verify this works by looking at mycov(randn(49152, 52)), which should be close to eye(52), since each element of that array is IID-Gaussian.)
There’s a lot of magical linear algebraic properties and relationships between the SVD and EVD (i.e., singular value vs eigenvalue decompositions): because the covariance matrix cov(M) is a Hermitian matrix, it’s left- and right-singular vectors are the same, and in fact also cov(M)’s eigenvectors. Furthermore, cov(M)’s singular values are also its eigenvalues: so svd(cov(M)) is just an expensive way to get eig(cov(M)) 😂, up to ±1 and reordering.
As #rayryeng explains at length, usually people look at svd(M, 'econ') because they want eig(cov(M)) without needing to evaluate cov(M), because you never want to compute cov(M): it’s numerically unstable. I recently wrote an answer that showed, in Python, how to compute eig(cov(M)) using svd(M2, 'econ'), where M2 is the 0-mean version of M, used in the practical application of color-to-grayscale mapping, which might help you get more context.

Matlab Multiply A Matrix By Individual Sections of Another Matrix And Get the Diagonal Elements

The title of this post may be a bit confusing. Please allow me to provide a bit of context and then elaborate on what I'm asking. For your reference, the question I'm asking is toward the end and is denoted by bold letters. I provide some code, outlining where I'm currently at in solving the problem, immediately beforehand.
Essentially what I'm trying to do is Kernel Regression, which is usually done using a single test point x and a set of training instances . A reference to this can be found on wikipedia here. The kernel I'm using is the RBF kernel, a Wikipedia reference for which can be found here.
Anyway, I have some code written in Matlab so that this can be done quickly for a single instance of x, which is 1 x p in size. What I'd like to do is make it so I can estimate for numerous points very quickly, say m x p.
For the sake of avoiding notational mixups, I'll let the training instances be denoted Train and the instances I want estimates for as Test: and . It also needs to be mentioned that I want to estimate a vector of numbers for each of the m points. For a single point this vector would be 1 x v in size. Now I need it to be m x v. Therefore, Train will also have a vector of these know values associated with it called TS: . Lastly, we need a vector of sigmas that is 1 x v in size. This is denoted as Sig.
Here's the code I have so far:
%First, we have to get the matrices to equivalent size so we can subtract Train from Test
tm0 = kron(ones(size(Train,1),1),Test) - kron(ones(size(Test,1),1),Train);
%Secondly, we apply the Euclidean norm sq by row and then multiply each of these results by each element (j) in Sig times 1/2j^2
tm3 = exp(-kron(sum((tm0).^2,2),1/2./(Sig.^2)));
Now, at this point tm3 is an (m*n) x v matrix. This is where my question is: I now need to multiply TS' (TS transpose) times each of the n x v-sized segments in tm3 (there are m of these segments), get the diagonal elements of each of these resulting segments (after multiplication one of the m segments will be v x v, so each chunk of diagonal elements will be 1 x v meaning the resulting matrix is m x v) and sum these diagonal elements together to produce an m x 1 sized matrix. Lastly, I will need to divide each entry i in this m x 1 matrix by each of the v elements in the ith row of the diagonal-holding m x v-sized matrix, producing an m x v-sized result matrix.
I hope all of that makes sense. I'm sure there's some kind of trick that can be employed, but I'm just not coming up with it. Any help is greatly appreciated.
Edit 1: I was asked to provide more of an example to help demonstrate what it is that I would like done. The following represent that two matrices I'm talking about, TS and tm3:
As you can see, TS'(TS transpose) is v x n and tm3 is mn x v. In tm3 there are blocks that are of size n x v -- there are m blocks of this size. Notice that the size of TS' is of size v x n. This means that I can multiply TS' by a single block of tm3, which again is of size n x v. This would result in a matrix that is v x v in size. I would like to do this operation -- individually multiplying TS' by each of the n x v-sized blocks of tm3, which would produce m v x v matrices.
From here, though, I would like to obtain the diagonal elements from each of these v x v matrices. So, for a single v x v matrix, denoted using a:
Ultimately, I would to do this for each of the m v x v matrices giving me something that looks like the following, where s is the mth v x v matrix:
If I denote this last matrix as Q, which is m x v in size, it is trivial to sum the elements across the rows to produce the m x 1 vector I was looking for. I will refer to this vector as C. However, I would then like to divide each of these m scalar values by the corresponding row of matrix Q, to produce another m x v matrix:
This is the final matrix I'm looking for. Hopefully this helps make it clear what I'm looking for. Thanks for taking the time to read this!
Thought: I'm pretty sure I could accomplish this by converting tm3 to a cell matrix by doing tc1 = mat2cell(tm3,repmat(length(Train),1,m),length(Sig)), and then put replicate TS m times in another cell matrix tc2 = mat2cell(TS',length(indirectSigma),repmat(length(Train),1,m))'. Finally, I could do operations like tc3 = cellfun(#(a,b) a*b, tc2,tc1,'UniformOutput',false), which would give me m cells filled with the v x v matrices I was looking for. I could proceed from there. However, I'm not sure how fast these cell operations are. Can anybody comment? I'm afraid they might be slow, so I would prefer operations be performed on normal matrices, which I know to be fast. Thanks!

Cannonical Correlation Analysis

I have just started working using CCA in Matlab. I have two vectors X and Y of dimension 60x1920 and 60x1536 with the number of samples being 60 and variables in the different set of vectors being 1920 and 1536 respectively. I want to know do CCA for reducing them to the subspace and then do feature matching.
I am using this commands.
%% DO CCA
[A,B,r,U,V] = canoncorr(X,Y);
The output I get is this :
Name Size Bytes Class Attributes
A 1920x58 890880 double
B 1536x58 712704 double
U 60x58 27840 double
V 60x58 27840 double
r 1x58 464 double
Can anyone please tell me what these variables mean. I have gone over the documentation several times and still is unclear about them. As I understand CCA finds two linear projection matrices Wx and Wy such that the projection of X and Y on Wx and Wy are maximally correlated.
1) Could anyone please tell me which of the following matrices are these?
2) Also how can I find the projected vectors in the learned subspace of CCA?
Any help will be appreciated. Thanks in advance.
As I understand it, with X and Y being your original data matrices, A and B are the sets of coefficients that perform a change of basis to maximally correlate your original data. Your data is represented in the new bases as the matrices U and V.
So to answer your questions:
The projection matrices you are looking for would be A and B since they transform X and Y into the new space.
The resulting projections of X and Y into the new space would be U and V, respectively. (The r vector represents the entries of the correlation matrix between U and V, which is a diagonal matrix.)
The The MATLAB documentation says this transformation can be done with the following formulae, where N is the number of observations:
U = (X-repmat(mean(X),N,1))*A
V = (Y-repmat(mean(Y),N,1))*B
This page lays out the process nicely so you can see what each coefficient means in the transformation process.

Principal Components calculated using different functions in Matlab

I am trying to understand principal component analysis in Matlab,
There seems to be at least 3 different functions that do it.
I have some questions re the code below:
Am I creating approximate x values using only one eigenvector (the one corresponding to the largest eigenvalue) correctly? I think so??
Why are PC and V which are both meant to be the loadings for (x'x) presented differently? The column order is reversed because eig does not order the eigenvalues with the largest value first but why are they the negative of each other?
Why are the eig values not in ordered with the eigenvector corresponding to the largest eigenvalue in the first column?
Using the code below I get back to the input matrix x when using svd and eig, but the results from princomp seem to be totally different? What so I have to do to make princomp match the other two functions?
Code:
x=[1 2;3 4;5 6;7 8 ]
econFlag=0;
[U,sigma,V] = svd(x,econFlag);%[U,sigma,coeff] = svd(z,econFlag);
U1=U(:,1);
V1=V(:,1);
sigma_partial=sigma(1,1);
score1=U*sigma;
test1=score1*V';
score_partial=U1*sigma_partial;
test1_partial=score_partial*V1';
[PC, D] = eig(x'*x)
score2=x*PC;
test2=score2*PC';
PC1=PC(:,2);
score2_partial=x*PC1;
test2_partial=score2_partial*PC1';
[o1 o2 o3]=princomp(x);
Yes. According to the documentation of svd, diagonal elements of the output S are in decreasing order. There is no such guarantee for the the output D of eig though.
Eigenvectors and singular vectors have no defined sign. If a is an eigenvector, so is -a.
I've often wondered the same. Laziness on the part of TMW? Optimization, because sorting would be an additional step and not everybody needs 'em sorted?
princomp centers the input data before computing the principal components. This makes sense as normally the PCA is computed with respect to the covariance matrix, and the eigenvectors of x' * x are only identical to those of the covariance matrix if x is mean-free.
I would compute the PCA by transforming to the basis of the eigenvectors of the covariance matrix (centered data), but apply this transform to the original (uncentered) data. This allows to capture a maximum of variance with as few principal components as possible, but still to recover the orginal data from all of them:
[V, D] = eig(cov(x));
score = x * V;
test = score * V';
test is identical to x, up to numerical error.
In order to easily pick the components with the most variance, let's fix that lack of sorting ourselves:
[V, D] = eig(cov(x));
[D, ind] = sort(diag(D), 'descend');
V = V(:, ind);
score = x * V;
test = score * V';
Reconstruct the signal using the strongest principal component only:
test_partial = score(:, 1) * V(:, 1)';
In response to Amro's comments: It is of course also possible to first remove the means from the input data, and transform these "centered" data. In that case, for perfect reconstruction of the original data it would be necessary to add the means again. The way to compute the PCA given above is the one described by Neil H. Timm, Applied Multivariate Analysis, Springer 2002, page 446:
Given an observation vector Y with mean mu and covariance matrix Sigma of full rank p, the goal of PCA is to create a new set of variables called principal components (PCs) or principal variates. The principal components are linear combinations of the variables of the vector Y that are uncorrelated such that the variance of the jth component is maximal.
Timm later defines "standardized components" as those which have been computed from centered data and are then divided by the square root of the eigenvalues (i.e. variances), i.e. "standardized principal components" have mean 0 and variance 1.

reording kernel matrices

I have an mxm kernel matrix, K, which is for the sake of simplicity, a linear kernel computed as pdist2(X,X), where X is mxn and the m dimension relates to feature vectors with n dimensions.
since n is large, I save computation time by precalculating K for all X.
Later on, I need to swap two of the features in X, say X_1 and X_5.
Can I somehow rearrange K, without having to recompute the entire matrix?
If pv is your permutation vector and J0=pdist2(X,X), then
Y=X(pv,:); J1=pdist2(Y,Y);
should get you the same answer as
J1=J0(pv,pv);
If you are permuting the columns (I couldn't quite tell from your question), then it seems like J1 and J0 should be equal...