The original data is Y, of size L*n (n is the number of features; L is the number of observations). B is the covariance matrix of Y. Suppose A holds the eigenvectors of B; I write A = (e1, e2, ..., en), where each ei is an eigenvector. The matrix Aq consists of the first q eigenvectors, and ai denotes a row vector of Aq: Aq = (e1, e2, ..., eq) = (a1, a2, ..., an)'. I want to apply the k-means algorithm to Aq to cluster the row vectors ai into k (or more) clusters (note: I do not want to apply k-means to the eigenvectors ei). For each cluster, only the vector closest to the cluster center is retained, and the feature corresponding to this vector is finally selected as an informative feature.
My question is:
1) What is the difference between applying the k-means algorithm to Aq to cluster the row vectors ai into k clusters and applying it to Aq to cluster the eigenvectors ei into k clusters?
2) The closest_vectors I get come from this command: closest_vectors = Aq(min_idxs, :); the size of closest_vectors is k*q (double). How do I get the final informative features? They have to be obtained from the original data Y.
Thanks!
I found two functions for PCA and PFA:
function [e, m, lambda, sqsigma] = cvPca(X, M)
[D, N] = size(X);
if ~exist('M', 'var') || isempty(M) || M == 0
    M = D;
end
M = min(M, min(D, N-1));
%% mean subtraction
m = mean(X, 2); % mean of every row (feature)
X = X - repmat(m, 1, N);
%% singular value decomposition. X = U*S*V.' or X.' = V*S*U.'
[U, S, V] = svd(X, 'econ');
e = U(:, 1:M);
if nargout > 2
    s = diag(S);
    s = s(1:min(D, N-1));
    lambda = s.^2 / N; % biased (1/N) estimator of variance
end
% sqsigma: used to model the distribution of errors by a univariate Gaussian
if nargout > 3
    d = cvPcaDist(X, e, m); % use of a validation set would be better
    N = size(d, 2);
    sqsigma = sum(d) / N; % or (N-1) for an unbiased estimate
end
end
%/////////////////////////////////////////////////////////////////////////////
function [IDX, Me] = cvPfa(X, p, q)
[D, N] = size(X);
if ~exist('p', 'var') || isempty(p) || p == 0
    p = D;
end
p = min(p, min(D, N-1));
if ~exist('q', 'var') || isempty(q)
    q = p - 1;
end
%% PCA step
[U, Me, Lambda] = cvPca(X, q);
%% cluster the row vectors of U (D rows, each of length q), not the columns
[Cl, Mu] = kmeans(U, p, 'emptyaction', 'singleton', 'distance', 'sqEuclidean');
%% for each cluster, keep the row (feature axis) nearest to the cluster mean
IDX = false(D, 1);
for i = 1:p
    Cli = find(Cl == i);
    d = cvEucdist(Mu(i,:).', U(Cli,:).'); % cvEucdist (from the same toolbox, not shown) computes distances between columns
    [mini, argmin] = min(d);
    IDX(Cli(argmin)) = true;
end
end
Summarizing Olologin's comments: it doesn't make sense to cluster the eigenvectors of the covariance matrix (i.e. the columns of the U matrix of the SVD). The eigenvectors are all orthogonal, so if you tried to cluster them you would only get one member per cluster, and each cluster's centroid would be defined by the eigenvector itself.
Now, what you're really after is selecting the features in your data matrix that best describe your data in terms of discriminatory analysis.
The functions that you have provided both compute the SVD, pluck out the k principal components of your data, and determine which features to select as the most prominent. By default, the number of features to select is equal to k, but you can override this if you want. Let's just stick with the default.
The cvPfa function performs this feature selection for you, but be warned that the data matrix in the function is organized so that each row is a feature and each column is a sample. The output is a logical vector that tells you which features are the strongest to select from your data.
Simply put, you just do this:
k = 10; %// Example
IDX = cvPfa(Y.', k);
Ynew = Y(:,IDX);
This code will choose the 10 most prominent features in your data matrix, i.e. the 10 features that are the most representative or discriminative, and pluck them out into Ynew. You can then use the output for whatever application you're targeting.
1) I don't think that clustering the eigenvectors of the covariance matrix (the columns of the PCA result) makes any sense. All eigenvectors are pairwise orthogonal and equally far from one another in terms of Euclidean distance: pick any two eigenvectors and compute the distance between them, and it will be sqrt(2) for any pair (assuming unit-norm eigenvectors). Clustering the rows of the PCA result, however, can provide something useful.
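A quick way to see this (a minimal sketch using a random symmetric matrix; the matrix B here is just an illustrative stand-in for a covariance matrix):
rng(0);
B = randn(5); B = B + B.'; % random symmetric matrix
[A, ~] = eig(B);           % columns of A are orthonormal eigenvectors
norm(A(:,1) - A(:,2))      % returns sqrt(2) (~1.4142); the same for any pair of columns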
I need to calculate the cumulative variance of a vector. I have tried to build a script, but it takes too much time to calculate the cumulative variance of my vectors of size 1*100000. Do you know if there is a faster way to find this cumulative variance?
This is the code I am using
%% Creation of the random vectors and calculation of the variances
d = 100000; % dimension of the vectors
nv = 6;     % number of vectors
for j = 1:nv
    VItimeseries(:,j) = rand(d,1); % final matrix with vectors
end
%% Script to calculate the cumulative variance in the columns of my matrix
VectorVarianza = 0;
VectorFinalVar = 0;
VectorFinalTotalVAriances = zeros(d,nv);
for k = 1:nv % number of columns
    for j = 1:numel(VItimeseries(:,k)) % number of rows
        Vector = VItimeseries(:,k);
        VectorVarianza(1:j) = Vector(1:j);         % vector to calculate the variance independently
        VectorFinalVar(j,k) = var(VectorVarianza); % calculation of variances
    end
    VectorFinalTotalVAriances(:,k) = VectorFinalVar(:,k); % construction of the final vector with the cumulative variances
end
Looping over the n elements of x, and within the loop computing the variance of all elements up to i using var(x(1:i)), amounts to an O(n^2) algorithm. This is inherently expensive.
Sample variance (what var computes) is defined as sum((x-mean(x)).^2) / (n-1), with n = length(x). This can be rewritten as (sum(x.^2) - sum(x).^2 / n) / (n-1). This formula allows us to accumulate sum(x) and sum(x.^2) within a single loop, then compute the variance later. It also allows us to compute the cumulative variance in O(n).
For a vector x, we'd have the following loop:
x = randn(100,1); % some data
v = zeros(size(x)); % cumulative variance
s = x(1); % running sum of x
s2 = x(1).^2; % running sum of square of x
for ii = 2:numel(x) % loop starts at 2, for ii=1 we cannot compute variance
s = s + x(ii);
s2 = s2 + x(ii).^2;
v(ii) = (s2 - s.^2 / ii) / (ii-1);
end
We can avoid the explicit loop by using cumsum:
s = cumsum(x);
s2 = cumsum(x.^2);
n = (1:numel(x)).';
v = (s2 - s.^2 ./ n) ./ (n-1); % v(1) will be NaN, rather than 0 as in the first version
v(1) = 0; % so we set it to 0 explicitly here
The code in the OP computes the cumulative variance for each column of a matrix. The code above can be trivially adapted to do the same:
s = cumsum(VItimeseries,1); % cumulative sum explicitly along columns
s2 = cumsum(VItimeseries.^2,1);
n = (1:size(VItimeseries,1)).'; % use number of rows, rather than `numel`.
v = (s2 - s.^2 ./ n) ./ (n-1);
v(1,:) = 0; % fill first row with zeros, not just first element
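As a quick sanity check (reusing VItimeseries and v from above), the last row of v should match MATLAB's var applied to each full column:
max(abs(v(end,:) - var(VItimeseries))) % ~0 up to round-off error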
I am trying to figure out the best way to perform a kind of convolution.
I have a 3D matrix I = [N x M x P] and a filter array S = [1 x 1 x K x P]. For each p-th frame (third dimension) of my 3D matrix, I want to return the valid convolution between I(:, :, p-K/2:p+K/2) and S(1, 1, :, p). Do you see a way to do this?
In fact, in terms of computation the number of operations is very close to a standard convolution; the difference is that I need to change the second matrix for each frame...
This is the method I currently use:
% I = 3D matrix [N x M x P]
% S = Filter [1 x 1 x K x P] (K is an odd number)
% OUT = Result
[N, M, P] = size(I); % Data size
K = size(S, 3); % Filter length
win = (K-1)/2 ; % Window
OUT = zeros(size(I)); % Pre-allocation
for p = win+1:P-win
OUT(:, :, p) = convn(I(:, :, p-win:p+win), S(1, 1, :, p), 'valid'); % Perform convolution
end
In the end we have the same number of operations as the standard convolution; the only difference is that the filter is different for each frame...
Any idea ?
Thanks ;)
So you want to convolve a NxMxK sub-image with a 1x1xKx1 kernel, and then only take the valid part, which is an NxM image.
Let's look at this operation for a single (x,y) location. This 1D convolution, of which you only keep 1 value, is equivalent to the dot product of the sub-image and your kernel:
OUT(x,y,p) = squeeze(I(x,y,p-win:p+win))' * squeeze(S(1,1,:,p))
This you can vectorize across all (x,y) by reshaping the sub-image of I to a (N*M)xK matrix (the K is horizontal, S is a column vector).
Repeating this across all p is easiest to implement with a loop, as you do now. The alternative is to create a larger S where each column is shifted by one, so you can do a single dot product between the two matrices. But that S is also expensive to create, and presumably requires a loop too. I don't think that avoiding loops is that pressing any more in MATLAB (it has gotten a lot faster over the years), and the product itself is probably the most expensive part of the algorithm anyway.
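For reference, a minimal sketch of that per-frame dot-product formulation (reusing I, S, N, M, P, K and win from the question; the kernel is flipped so the result matches convn with the 'valid' option):
OUT = zeros(N, M, P);
for p = win+1:P-win
    block = reshape(I(:, :, p-win:p+win), N*M, K); % each row is the K-long sub-vector at one (x,y)
    kern  = flipud(squeeze(S(1, 1, :, p)));        % flip to match convn's definition of convolution
    OUT(:, :, p) = reshape(block * kern, N, M);    % one dot product per (x,y), all at once
end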
In the Matlab SVM tutorial, it says
You can set your own kernel function, for example, kernel, by setting 'KernelFunction','kernel'. kernel must have the following form:
function G = kernel(U,V)
where:
U is an m-by-p matrix.
V is an n-by-p matrix.
G is an m-by-n Gram matrix of the rows of U and V.
When I followed the custom SVM kernel example, I set a breakpoint in the mysigmoid.m function. However, I found that U and V were in fact 1-by-p vectors and G was a scalar.
Why doesn't MATLAB process the kernel with matrices?
My custom kernel function is
function G = mysigmoid(U,V)
% Sigmoid kernel function with slope gamma and intercept c
gamma = 0.5;
c = -1;
G = tanh(gamma*U*V' + c);
end
My Matlab script is
%% Train SVM Classifiers Using a Custom Kernel
rng(1); % For reproducibility
n = 100; % Number of points per quadrant
r1 = sqrt(rand(2*n,1)); % Random radius
t1 = [pi/2*rand(n,1); (pi/2*rand(n,1)+pi)]; % Random angles for Q1 and Q3
X1 = [r1.*cos(t1), r1.*sin(t1)]; % Polar-to-Cartesian conversion
r2 = sqrt(rand(2*n,1));
t2 = [pi/2*rand(n,1)+pi/2; (pi/2*rand(n,1)-pi/2)]; % Random angles for Q2 and Q4
X2 = [r2.*cos(t2), r2.*sin(t2)];
X = [X1; X2]; % Predictors
Y = ones(4*n,1);
Y(2*n + 1:end) = -1; % Labels
% Plot the data
figure(1);
gscatter(X(:,1),X(:,2),Y);
title('Scatter Diagram of Simulated Data');
SVMModel1 = fitcsvm(X,Y,'KernelFunction','mysigmoid','Standardize',true);
% Compute the scores over a grid
d = 0.02; % Step size of the grid
[x1Grid,x2Grid] = meshgrid(min(X(:,1)):d:max(X(:,1)),...
min(X(:,2)):d:max(X(:,2)));
xGrid = [x1Grid(:),x2Grid(:)]; % The grid
[~,scores1] = predict(SVMModel1,xGrid); % The scores
figure(2);
h(1:2) = gscatter(X(:,1),X(:,2),Y);
hold on;
h(3) = plot(X(SVMModel1.IsSupportVector,1),X(SVMModel1.IsSupportVector,2),...
'ko','MarkerSize',10);
% Support vectors
contour(x1Grid,x2Grid,reshape(scores1(:,2),size(x1Grid)),[0,0],'k');
% Decision boundary
title('Scatter Diagram with the Decision Boundary');
legend({'-1','1','Support Vectors'},'Location','Best');
hold off;
CVSVMModel1 = crossval(SVMModel1);
misclass1 = kfoldLoss(CVSVMModel1);
disp(misclass1);
Kernels add dimensions to a feature. If you have, for example, one feature per sample, x = {a}, the kernel expands it into something like x = {a_1, ..., a_q}. As you are doing this for all of your data at once, you are going to have an M x P matrix (M is the number of examples in your training set and P is the number of features). The second matrix it asks for is N x P, where N is the number of examples in the training/test set.
That said, your output should be M x N. Since it is instead 1 x 1, it means that you have U = 1 x M and V = N x 1 with N = M. To get an output of M x N, logic follows that you should simply transpose your inputs.
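To check the expected shapes directly, you can call the kernel yourself on two small matrices (a minimal sketch reusing mysigmoid from the question; the sizes are just illustrative):
U = randn(5, 2);     % m = 5 observations, p = 2 features
V = randn(3, 2);     % n = 3 observations, p = 2 features
G = mysigmoid(U, V); % Gram matrix of the rows of U and V
size(G)              % [5 3], i.e. m-by-n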
I am using Singular Value Decomposition (SVD) applied to Singular Spectrum Analysis (SSA) of a timeseries.
% original time series
x1= rand(1,10000);
N = length(x1);
% windows for trajectory matrix
L = 600;
K=N-L+1;
% trajectory matrix/matrix of lagged vectors
X = buffer(x1, L, L-1, 'nodelay');
% Covariance matrix
A = X * X' / K;
% SVD
[U, S_temp, ~] = svd(A);
% The eigenvalues of A are the squared singular values of X (scaled by 1/K)
S = sqrt(S_temp);
d = diag(S);
% Principal components
V = X' * U;
for i = 1 : L
V(:, i) = V(:, i) / d(i);
end
I wanted to know if there is a way to make the singular components (i.e. the columns of V) always positive.
X is always > 0 in my case (and so is the covariance matrix A).
You may be looking for an algorithm such as non-negative matrix factorization.
This is available in the Statistics Toolbox as the nnmf command, and there is a freely available third-party toolbox as well.
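A minimal sketch of what that could look like on the trajectory matrix X from the question (the number of components r is just illustrative):
r = 10;              % illustrative number of non-negative components
[W, H] = nnmf(X, r); % X is approximated by W*H with W >= 0 and H >= 0, so the components never change sign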
I used the following code to compute PCA:
function [signals, PC, V] = pca2(data)
[M, N] = size(data);
% subtract off the mean for each dimension
mn = mean(data, 2);
data = data - repmat(mn, 1, N);
% construct the matrix Y
Y = data' / sqrt(N-1);
% SVD does it all
[u, S, PC] = svd(Y);
% calculate the variances
S = diag(S);
V = S .* S;
% project the original data
signals = PC' * data;
I want to keep the principal components with the maximum variance, say the first 10 principal components that contribute the most variance. How do I go about this?
function [signals, V] = pca2(data)
[M, N] = size(data);
data = reshape(data, M*N, 1);
% subtract off the mean for each dimension
mn = mean(data, 2);
data = bsxfun(@minus, data, mean(data, 1));
% construct the matrix Y
Y = data'*data / (M*N-1);
[V, D] = eigs(Y, 10); % reduce to 10 dimensions
% project the original data
signals = data * V;
I guess svds can do the job for you.
In the doc, it says:
s = svds(A,k) computes the k largest singular values and associated
singular vectors of matrix A.
For a covariance-type matrix these are essentially the k largest eigenvalues and eigenvectors, and they come sorted in descending order.
So for 10 principal components, just use [eigvec, eigval] = svds(Y, 10);
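Alternatively, with the pca2 function from the question you can simply truncate its outputs (a minimal sketch; data is assumed to have variables in rows and observations in columns, as in that function):
[signals, PC, V] = pca2(data); % full PCA as in the question
k = 10;                        % number of components to keep
PCk      = PC(:, 1:k);         % first k principal components (columns of PC)
Vk       = V(1:k);             % their variances, already in descending order
signalsk = signals(1:k, :);    % data projected onto the first k components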