Dimensionality reduction in Matlab - matlab

I want to reduce the dimension of data to ndim dimensions in MATLAB. I am using pcares to reduce dimension but the result (i.e. residuals,reconstructed) has the same dimensions as the data and not ndim. How can I project the residuals to ndim dimensions only.
[residuals,reconstructed] = pcares(X,ndim)
Sample code
MU = [0 0];
SIGMA = [4/3 2/3; 2/3 4/3];
X = mvnrnd(MU,SIGMA,1000);
[residuals,reconstructed] = pcares(X,1)
Now I expect the residuals to have 1 dimensions i.e. the data X projected to prime component as I specified it as pcares(X,1). But here both residuals and reconstructed have the same of 2.

pcares is doing its job. If you read the documentation, you call the function this way:
[RESIDUALS,RECONSTRUCTED] = pcares(X,NDIM);
RESIDUALS returns the residuals for each data point by retaining the first NDIM dimensions of your data and RECONSTRUCTED is the reconstructed data using the first NDIM principal components.
If you want the actual projection vectors, you need to use pca instead. You'd call it this way:
[coeff,score] = pca(x);
In fact, this is what pcares does under the hood but it also reconstructs the data for you using the above outputs. coeff returns the principal coefficients for your data while score returns the actual projection vectors themselves. score is such that each column is a single projection vector. It should be noted that these are ordered with respect to dominance as you'd expect with PCA... and so the first column is the most dominant direction, second column second dominant direction, etc.
Once you call the above, you simply index into coeff and score to retain whatever components you want. In your case, you just want the first component, and so do this:
c = coeff(1);
s = score(:,1);
If you want to reconstruct the data given your projection vectors, referring to the second last line of code, it's simply:
[coeff,score] = pca(x);
n = size(X,1);
ndim = 1; %// For your case
reconstructed = repmat(mean(X,1),n,1) + score(:,1:ndim)*coeff(:,1:ndim)';
The above is basically what pcares does under the hood.

Related

How do I write correlation coefficient manually in matlab?

The following is a function that takes two equal sized vectors X and Y, and is supposed to return a vector containing single correlation coefficients for image correspondence. The function is supposed to work similarly to the built in corr(X,Y) function in matlab if given two equal sized vectors. Right now my code is producing a vector containing multiple two-number vectors instead of a vector containing single numbers. How do I fix this?
function result = myCorr(X, Y)
meanX = mean(X);
meanY = mean(Y);
stdX = std(X);
stdY = std(Y);
for i = 1:1:length(X),
X(i) = (X(i) - meanX)/stdX;
Y(i) = (Y(i) - meanY)/stdY;
mult = X(i) * Y(i);
end
result = sum(mult)/(length(X)-1);
end
Edit: To clarify I want myCorr(X,Y) above to produce the same output at matlab's corr(X,Y) when given equal sized vectors of image intensity values.
Edit 2: Now the format of the output vector is correct, however the values are off by a lot.
I recommend you use r=corrcoef(X,Y) it will give you a normalized r value you are looking for in a 2x2 matrix and you can just return the r(2,1) entry as your answer. Doing this is equivalent to
r=(X-mean(X))*(Y-mean(Y))'/(sqrt(sum((X-mean(X)).^2))*sqrt(sum((Y-mean(Y)).^2)))
However, if you really want to do what you mentioned in the question you can also do
r=(X)*(Y)'/(sqrt(sum((X-mean(X)).^2))*sqrt(sum((Y-mean(Y)).^2)))

Understanding PCA in MATLAB

What are the difference between the following two functions?
prepTransform.m
function [mu trmx] = prepTransform(tvec, comp_count)
% Computes transformation matrix to PCA space
% tvec - training set (one row represents one sample)
% comp_count - count of principal components in the final space
% mu - mean value of the training set
% trmx - transformation matrix to comp_count-dimensional PCA space
% this is memory-hungry version
% commented out is the version proper for Win32 environment
tic;
mu = mean(tvec);
cmx = cov(tvec);
%cmx = zeros(size(tvec,2));
%f1 = zeros(size(tvec,1), 1);
%f2 = zeros(size(tvec,1), 1);
%for i=1:size(tvec,2)
% f1(:,1) = tvec(:,i) - repmat(mu(i), size(tvec,1), 1);
% cmx(i, i) = f1' * f1;
% for j=i+1:size(tvec,2)
% f2(:,1) = tvec(:,j) - repmat(mu(j), size(tvec,1), 1);
% cmx(i, j) = f1' * f2;
% cmx(j, i) = cmx(i, j);
% end
%end
%cmx = cmx / (size(tvec,1)-1);
toc
[evec eval] = eig(cmx);
eval = sum(eval);
[eval evid] = sort(eval, 'descend');
evec = evec(:, evid(1:size(eval,2)));
% save 'nist_mu.mat' mu
% save 'nist_cov.mat' evec
trmx = evec(:, 1:comp_count);
pcaTransform.m
function [pcaSet] = pcaTransform(tvec, mu, trmx)
% tvec - matrix containing vectors to be transformed
% mu - mean value of the training set
% trmx - pca transformation matrix
% pcaSet - output set transforrmed to PCA space
pcaSet = tvec - repmat(mu, size(tvec,1), 1);
%pcaSet = zeros(size(tvec));
%for i=1:size(tvec,1)
% pcaSet(i,:) = tvec(i,:) - mu;
%end
pcaSet = pcaSet * trmx;
Which one is actually doing PCA?
If one is doing PCA, what is the other one doing?
The first function prepTransform is actually doing the PCA on your training data where you are determining the new axes to represent your data onto a lower dimensional space. What it does is that it finds the eigenvectors of the covariance matrix of your data and then orders the eigenvectors such that the eigenvector with the largest eigenvalue appears in the first column of the eigenvector matrix evec and the eigenvector with the smallest eigenvalue appears in the last column. What's important with this function is that you can define how many dimensions you want to reduce the data down to by keeping the first N columns of evec which will allow you to reduce your data down to N dimensions. The discarding of the other columns and keeping only the first N is what is set as trmx in the code. The variable N is defined by the prep_count variable in prepTransform function.
The second function pcaTransform finally transforms data that is defined within the same domain as your training data but not necessarily the training data itself (it could be if you wish) onto the lower dimensional space that is defined by the eigenvectors of the covariance matrix. To finally perform the reduction of dimensions, or dimensionality reduction as it is popularly known, you simply take your training data where each feature is subtracted from its mean and you multiply your training data by the matrix trmx. Note that prepTransform outputting the mean of each feature in the vector mu is important in order to mean subtract your data when you finally call pcaTransform.
How to use these functions
To use these functions effectively, first determine the trmx matrix, which contain the principal components of your data by first defining how many dimensions you want to reduce your data down to as well as the mean of each feature stored in mu:
N = 2; % Reduce down to two dimensions for example
[mu, trmx] = prepTransform(tvec, N);
Next you can finally perform dimensionality reduction on your data that is defined within the same domain as tvec (or even tvec if you wish, but it doesn't have to be) by:
pcaSet = pcaTransform(tvec, mu, trmx);
In terms of vocabulary, pcaSet contain what are known as the principal scores of your data, which is the term used for the transformation of your data to the lower dimensional space.
If I can recommend something...
Finding PCA through the eigenvector approach is known to be unstable. I highly recommend you use the Singular Value Decomposition via svd on the covariance matrix where the V matrix of the result already gives you the eigenvectors sorted which correspond to your principal components:
mu = mean(tvec, 1);
[~,~,V] = svd(cov(tvec));
Then perform the transformation by taking the mean subtracted data per feature and multiplying by the V matrix, once you subset and grab the first N columns of V:
N = 2;
X = bsxfun(#minus, tvec, mu);
pcaSet = X*V(:, 1:N);
X is the mean subtracted data which performs the same thing as doing pcaSet = tvec - repmat(mu, size(tvec,1), 1);, but you are not explicitly replicating the mean vector over each training example but letting bsxfun do that for you internally. However, taking advantage of MATLAB R2016b, this repeating can be done without the explicit call to bsxfun:
X = tvec - mu;
Further Reading
If you fully want to understand the code that was written and the theory behind what it's doing, I recommend the following two Stack Overflow posts that I have written that talk about the topic:
What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?
How to use eigenvectors obtained through PCA to reproject my data?
The first post brings the code you presented into light which performs PCA using the eigenvector approach. The second post touches base on how you'd do it using the SVD towards the end of the answer. This answer I've written here is a mix between the two posts above.

How to use the reduced data - the output of principal component analysis

I am finding it hard to link the theory with the implementation. I would appreciate help in knowing where my understanding is wrong.
Notations - matrix in bold capital and vectors in bold font small letter
is a dataset on observations, each of variables. So, given these observed -dimensional data vectors, the -dimensional principal axes are , for in where is the target dimension.
The principal components of the observed data matrix would be where matrix , matrix , and matrix .
Columns of form an orthogonal basis for the features and the output is the principal component projection that minimizes the squared reconstruction error:
and the optimal reconstruction of is given by .
The data model is
X(i,j) = A(i,:)*S(:,j) + noise
where PCA should be done on X to get the output S. S must be equal to Y.
Problem 1: The reduced data Y is not equal to S that is used in the model. Where is my understanding wrong?
Problem 2: How to reconstruct such that the error is minimum?
Please help. Thank you.
clear all
clc
n1 = 5; %d dimension
n2 = 500; % number of examples
ncomp = 2; % target reduced dimension
%Generating data according to the model
% X(i,j) = A(i,:)*S(:,j) + noise
Ar = orth(randn(n1,ncomp))*diag(ncomp:-1:1);
T = 1:n2;
%generating synthetic data from a dynamical model
S = [ exp(-T/150).*cos( 2*pi*T/50 )
exp(-T/150).*sin( 2*pi*T/50 ) ];
% Normalizing to zero mean and unit variance
S = ( S - repmat( mean(S,2), 1, n2 ) );
S = S ./ repmat( sqrt( mean( Sr.^2, 2 ) ), 1, n2 );
Xr = Ar * S;
Xrnoise = Xr + 0.2 * randn(n1,n2);
h1 = tsplot(S);
X = Xrnoise;
XX = X';
[pc, ~] = eigs(cov(XX), ncomp);
Y = XX*pc;
UPDATE [10 Aug]
Based on the Answer, here is the full code that
clear all
clc
n1 = 5; %d dimension
n2 = 500; % number of examples
ncomp = 2; % target reduced dimension
%Generating data according to the model
% X(i,j) = A(i,:)*S(:,j) + noise
Ar = orth(randn(n1,ncomp))*diag(ncomp:-1:1);
T = 1:n2;
%generating synthetic data from a dynamical model
S = [ exp(-T/150).*cos( 2*pi*T/50 )
exp(-T/150).*sin( 2*pi*T/50 ) ];
% Normalizing to zero mean and unit variance
S = ( S - repmat( mean(S,2), 1, n2 ) );
S = S ./ repmat( sqrt( mean( S.^2, 2 ) ), 1, n2 );
Xr = Ar * S;
Xrnoise = Xr + 0.2 * randn(n1,n2);
X = Xrnoise;
XX = X';
[pc, ~] = eigs(cov(XX), ncomp);
Y = XX*pc; %Y are the principal components of X'
%what you call pc is misleading, these are not the principal components
%These Y columns are orthogonal, and should span the same space
%as S approximatively indeed (not exactly, since you introduced noise).
%If you want to reconstruct
%the original data can be retrieved by projecting
%the principal components back on the original space like this:
Xrnoise_reconstructed = Y*pc';
%Then, you still need to project it through
%to the S space, if you want to reconstruct S
S_reconstruct = Ar'*Xrnoise_reconstructed';
plot(1:length(S_reconstruct),S_reconstruct,'r')
hold on
plot(1:length(S),S)
The plot is which is very different from the one that is shown in the Answer. Only one component of S exactly matches with that of S_reconstructed. Shouldn't the entire original 2 dimensional space of the source input S be reconstructed?
Even if I cut off the noise, then also onely one component of S is exactly reconstructed.
I see nobody answered your question, so here goes:
What you computed in Y are the principal components of X' (what you call pc is misleading, these are not the principal components). These Y columns are orthogonal, and should span the same space as S approximatively indeed (not exactly, since you introduced noise).
If you want to reconstruct Xrnoise, you must look at the theory (e.g. here) and apply it correctly: the original data can be retrieved by projecting the principal components back on the original space like this:
Xrnoise_reconstructed = Y*pc'
Then, you still need to transform it through pinv(Ar)*Xrnoise_reconstructed, if you want to reconstruct S.
Matches nicely for me:
answer to UPDATE [10 Aug]: (EDITED 12 Aug)
Your Ar matrix does not define an orthonormal basis, and as such, the transpose Ar' is not the reverse transformation. The earlier answer I provided was thus wrong. The answer has been corrected above.
Your understanding is quite right. One of the reasons for somebody to use PCA would be to reduce the dimensionality of the data. The first principal component has the largest sample variance amongst of all the normalized linear combinations of the columns of X. The second principal component has maximum variance subject to being orthogonal to the next one, etc.
One might then do a PCA on a dataset, and decide to cut off the last principal component or several of the last principal components of the data. This is done to reduce the effect of the curse of dimensionality. The curse of dimensionality is a term used to point out the fact that any group of vectors is sparse in a relatively high dimensional space. Conversely, this means that you would need an absurd amount of data to form any model on a fairly high dimension dataset, such as an word histogram of a text document with possibly tens of thousands of dimensions.
In effect a dimensionality reduction by PCA removes components that are strongly correlated. For example let's take a look at a picture:
As you can see, most of the values are almost the same, strongly correlated. You could meld some of these correlated pixels by removing the last principal components. This would reduce the dimensionality of the image, pack it, by removing some of the information in the image.
There is no magic way to determine the best amount of principal components or the best reconstruction that I'm aware of.
Forgive me if i am not mathematically rigorous.
If we look at the equation: X = A*S we can say that we are taking some two dimensional data and we map it to a 2 dimensional subspace in 5 dimensional space. Were A is some base for that 2 dimensional subspace.
When we solve the PCA problem for X and look at PC (principal compononet) we see that the two big eignvectors (which coresponds to the two largest eignvalues) span the same subspace that A did. (multpily A'*PC and see that for the first three small eignvectors we get 0 which means that the vectors are orthogonal to A and only for the two largest we get values that are different than 0).
So what i think that the reason why we get a different base for this two dimensional space is because X=A*S can be product of some A1 and S1 and also for some other A2 and S2 and we will still get X=A1*S1=A2*S2. What PCA gives us is a particular base that maximize the variance in each dimension.
So how to solve the problem you have? I can see that you chose as the testing data some exponential times sin and cos so i think you are dealing with a specific kind of data. I am not an expert in signal processing but look at MUSIC algorithm.
You could use the pca function from Statistics toolbox.
coeff = pca(X)
From documentation, each column of coeff contains coefficients for one principal component. So you can reconstruct the observed data X by multiplying with coeff, e.g. X*coeff

Convolution of multiple 1D signals in a 2D matrix with multiple 1D kernels in a 2D matrix

I have a randomly defined H matrix of size 600 x 10. Each element in this matrix H can be represented as H(k,t). I obtained a speech spectrogram S which is 600 x 597. I obtained it using Mel features, so it should be 40 x 611 but then I used a frame stacking concept in which I stacked 15 frames together. Therefore it gave me (40x15) x (611-15+1) which is 600 x 597.
Now I want to obtain an output matrix Y which is given by the equation based on convolution Y(k,t) = ∑ H(k,τ)S(k,t-τ). The sum goes from τ=0 to τ=Lh-1. Lh in this case would be 597.
I don't know how to obtain Y. Also, my doubt is the indexing into both H and S when computing the convolution. Specifically, for Y(1,1), we have:
Y(1,1) = H(1,0)S(1,1) + H(1,1)S(1,0) + H(1,2)S(1,-1) + H(1,3)S(1,-2) + ...
Now, there is no such thing as negative indices in MATLAB - for example, S(1,-1) S(1,-2) and so on. So, what type of convolution should I use to obtain Y? I tried using conv2 or fftfilt but I think that will not give me Y because Y must also be the size of S.
That's very easy. That's a convolution on a 2D signal only being applied to 1 dimension. If we assume that the variable k is used to access the rows and t is used to access the columns, you can consider each row of H and S as separate signals where each row of S is a 1D signal and each row of H is a convolution kernel.
There are two ways you can approach this problem.
Time domain
If you want to stick with time domain, the easiest thing would be to loop over each row of the output, find the convolution of each pair of rows of S and H and store the output in the corresponding output row. From what I can tell, there is no utility that can convolve in one dimension only given an N-D signal.... unless you go into frequency domain stuff, but let's leave that for later.
Something like:
Y = zeros(size(S));
for idx = 1 : size(Y,1)
Y(idx,:) = conv(S(idx,:), H(idx,:), 'same');
end
For each row of the output, we perform a row-wise convolution with a row of S and a row of H. I use the 'same' flag because the output should be the same size as a row of S... which is the bigger row.
Frequency domain
You can also perform the same computation in frequency domain. If you know anything about the properties of convolution and the Fourier Transform, you know that convolution in time domain is multiplication in the frequency domain. You take the Fourier Transform of both signals, multiply them element-wise, then take the Inverse Fourier Transform back.
However, you need to keep the following intricacies in mind:
Performing a full convolution means that the final length of the output signal is length(A)+length(B)-1, assuming A and B are 1D signals. Therefore, you need to make sure that both A and B are zero-padded so that they both match the same size. The reason why you make sure that the signals are the same size is to allow for the multiplication operation to work.
Once you multiply the signals in the frequency domain then take the inverse, you will see that each row of Y is the full length of the convolution. To ensure that you get an output that is the same size as the input, you need to trim off some points at the beginning and at the end. Specifically, since each kernel / column length of H is 10, you would have to remove the first 5 and last 5 points of each signal in the output to match what you get in the for loop code.
Usually after the inverse Fourier Transform, there are some residual complex coefficients due to the nature of the FFT algorithm. It's good practice to use real to remove the complex valued parts of the results.
Putting all of this theory together, this is what the code would look like:
%// Define zero-padded H and S matrices
%// Rows are the same, but columns must be padded to match point #1
H2 = zeros(size(H,1), size(H,2)+size(S,2)-1);
S2 = zeros(size(S,1), size(H,2)+size(S,2)-1);
%// Place H and S at the beginning and leave the rest of the columns zero
H2(:,1:size(H,2)) = H;
S2(:,1:size(S,2)) = S;
%// Perform Fourier Transform on each row separately of padded matrices
Hfft = fft(H2, [], 2);
Sfft = fft(S2, [], 2);
%// Perform convolution
Yfft = Hfft .* Sfft;
%// Take inverse Fourier Transform and convert to real
Y2 = real(ifft(Yfft, [], 2));
%// Trim off unnecessary values
Y2 = Y2(:,size(H,2)/2 + 1 : end - size(H,2)/2 + 1);
Y2 should be the convolved result and should match Y in the previous for loop code.
Comparison between them both
If you actually want to compare them, we can. What we'll need to do first is define H and S. To reconstruct what I did, I generated random values with a known seed:
rng(123);
H = rand(600,10);
S = rand(600,597);
Once we run the above code for both the time domain version and frequency domain version, let's see how they match up in the command prompt. Let's show the first 5 rows and 5 columns:
>> format long g;
>> Y(1:5,1:5)
ans =
1.63740867892464 1.94924208172753 2.38365646354643 2.05455605619097 2.21772526557861
2.04478411247085 2.15915645246324 2.13672842742653 2.07661341840867 2.61567534623066
0.987777477630861 1.3969752201781 2.46239452105228 3.07699790208937 3.04588738611503
1.36555260994797 1.48506871890027 1.69896157726456 1.82433906982894 1.62526864072424
1.52085236885395 2.53506897420001 2.36780282057747 2.22335617436888 3.04025523335182
>> Y2(1:5,1:5)
ans =
1.63740867892464 1.94924208172753 2.38365646354643 2.05455605619097 2.21772526557861
2.04478411247085 2.15915645246324 2.13672842742653 2.07661341840867 2.61567534623066
0.987777477630861 1.3969752201781 2.46239452105228 3.07699790208937 3.04588738611503
1.36555260994797 1.48506871890027 1.69896157726456 1.82433906982894 1.62526864072424
1.52085236885395 2.53506897420001 2.36780282057747 2.22335617436888 3.04025523335182
Looks good to me! As another measure, let's figure out what the largest difference is between one value in Y and a corresponding value in Y2:
>> max(abs(Y(:) - Y2(:)))
ans =
5.32907051820075e-15
That's saying that the max error seen between both outputs is in the order of 10-15. I'd say that's pretty good.

Numerical derivative of a vector

I have a problem with numerical derivative of a vector that is x: Nx1 with respect to another vector t (time) that is the same size of x.
I do the following (x is chosen to be sine function as an example):
t=t0:ts:tf;
x=sin(t);
xd=diff(x)/ts;
but the answer xd is (N-1)x1 and I figured out that it does not compute derivative corresponding to the first element of x.
is there any other way to compute this derivative?
You are looking for the numerical gradient I assume.
t0 = 0;
ts = pi/10;
tf = 2*pi;
t = t0:ts:tf;
x = sin(t);
dx = gradient(x)/ts
The purpose of this function is a different one (vector fields), but it offers what diff doesn't: input and output vector of equal length.
gradient calculates the central difference between data points. For an
array, matrix, or vector with N values in each row, the ith value is
defined by
The gradient at the end points, where i=1 and i=N, is calculated with
a single-sided difference between the endpoint value and the next
adjacent value within the row. If two or more outputs are specified,
gradient also calculates central differences along other dimensions.
Unlike the diff function, gradient returns an array with the same
number of elements as the input.
I know I'm a little late to the game here, but you can also get an approximation of the numerical derivative by taking the derivatives of the polynomial (cubic) splines that runs through your data:
function dy = splineDerivative(x,y)
% the spline has continuous first and second derivatives
pp = spline(x,y); % could also use pp = pchip(x,y);
[breaks,coefs,K,r,d] = unmkpp(pp);
% pre-allocate the coefficient vector
dCoeff = zeroes(K,r-1);
% Columns are ordered from highest to lowest power. Both spline and pchip
% return 4xn matrices, ordered from 3rd to zeroth power. (Thanks to the
% anonymous person who suggested this edit).
dCoeff(:, 1) = 3 * coefs(:, 1); % d(ax^3)/dx = 3ax^2;
dCoeff(:, 2) = 2 * coefs(:, 2); % d(ax^2)/dx = 2ax;
dCoeff(:, 3) = 1 * coefs(:, 3); % d(ax^1)/dx = a;
dpp = mkpp(breaks,dCoeff,d);
dy = ppval(dpp,x);
The spline polynomial is always guaranteed to have continuous first and second derivatives at each point. I haven not tested and compared this against using pchip instead of spline, but that might be another option as it too has continuous first derivatives (but not second derivatives) at every point.
The advantage of this is that there is no requirement that the step size be even.
There are some options to work-around your issue.
First: you can make your domain larger. Instead of N, use N+1 gridpoints.
Second: depending on the end-point of interest, you can use
Forward difference: F(x + dx) - F(x)
Backward difference: F(x) - F(x - dx)