Dimension reduction using PCA in Matlab

I am trying to classify vehicles in Matlab. I need to reduce the dimensionality of the features to eliminate redundancy, and I am using pca for this. Unfortunately, the pca function is not returning the expected results. The output seems truncated and I don't understand why.
A summary of what I did is as follows:
Components_matrix = [Areas_vector MajorAxisLengths_vector MinorAxisLengths_vector Perimeters_vector...
EquivDiameters_vector Extents_vector Orientations_vector Soliditys_vector]
The output is:
Components_matrix =
1.0e+03 *
1.4000 0.1042 0.0220 0.3352 0.0422 0.0003 0.0222 0.0006
2.7690 0.0998 0.0437 0.3973 0.0594 0.0005 0.0234 0.0007
1.7560 0.0853 0.0317 0.2610 0.0473 0.0005 0.0236 0.0008
1.0870 0.0920 0.0258 0.3939 0.0372 0.0003 0.0157 0.0005
0.7270 0.0583 0.0233 0.2451 0.0304 0.0004 0.0093 0.0006
1.2380 0.0624 0.0317 0.2436 0.0397 0.0004 0.0106 0.0007
Then I used the pca function as follows:
[COEFF, SCORE, LATENT] = pca(Components_matrix)
The displayed results are:
COEFF =
0.9984 -0.0533 -0.0057 -0.0177 0.0045
0.0162 0.1810 0.8788 0.0695 -0.3537
0.0099 -0.0218 -0.2809 0.8034 -0.2036
0.0514 0.9817 -0.1739 -0.0016 0.0468
0.0138 -0.0018 0.0616 0.4276 -0.3585
0.0001 -0.0008 -0.0025 0.0215 0.0210
0.0069 0.0158 0.3388 0.4070 0.8380
0.0001 -0.0011 0.0022 0.0198 0.0016
SCORE =
1.0e+03 *
-0.0946 0.0312 0.0184 -0.0014 -0.0009
1.2758 0.0179 -0.0086 -0.0008 0.0001
0.2569 -0.0642 0.0107 0.0016 0.0012
-0.4043 0.1031 -0.0043 0.0015 0.0003
-0.7721 -0.0299 -0.0079 -0.0017 0.0012
-0.2617 -0.0580 -0.0083 0.0008 -0.0020
LATENT =
1.0e+05 *
5.0614
0.0406
0.0014
0.0000
0.0000
I expected, for instance, COEFF and LATENT to be 8x8 and 8x1 matrices respectively, but that is not what I get. Why is this so, and how can the situation be rectified? Kindly help.

Your usage of pca() and Matlab's output are correct. The issue is that you have more variables than samples: you only have 6 vehicles but 8 variables. pca centers the data first, which removes one degree of freedom, so with N samples and N or more variables there are at most N-1 principal components with nonzero variance; any further components would not be unique. COEFF holds the eigenvectors of the covariance matrix of the input, one per column. SCORE(:,1) is the first principal component, SCORE(:,2) is the second, and so on, of which there are only N-1=5 in total. LATENT holds the eigenvalues of the covariance matrix, i.e. the amount of variance explained by each successive principal component, of which there are, again, only N-1=5.
There is a more detailed discussion of this here.
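The N-1 limit can be checked on a small synthetic example. The following is only a sketch: it assumes the Statistics and Machine Learning Toolbox is available for pca, and it uses random data rather than the vehicle features from the question. It also shows the 'Economy' option of pca, which, when set to false, asks for all components even though the extra ones carry no variance.

```matlab
% Sketch: PCA with fewer observations than variables (illustrative data).
% Assumes the Statistics and Machine Learning Toolbox (pca).
rng(0);                              % reproducible random numbers
X = randn(6, 8);                     % 6 observations (rows), 8 variables (columns)

% Default call: 'Economy' is true, so only N-1 components are returned.
[coeff, score, latent] = pca(X);
disp(size(coeff));                   % 8 rows, N-1 = 5 columns
disp(size(latent));                  % N-1 = 5 eigenvalues

% The reason: centering the rows leaves a matrix of rank at most N-1.
Xc = bsxfun(@minus, X, mean(X, 1));  % subtract the column means
r  = rank(Xc);

% Request all components; the extra ones correspond to zero variance.
[coeffFull, scoreFull, latentFull] = pca(X, 'Economy', false);
disp(size(coeffFull));
disp(size(latentFull));
```

The extra components returned with 'Economy' set to false add no information, so for dimension reduction the default output is usually what is wanted.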

Related

Changing correlation matrix into covariance matrix Matlab

I'm trying to change a correlation matrix into a covariance matrix.
After importing some data, I computed the covariance matrix (sigma_a):
sigma_a = (sigma_d + (mu_d'+1)*(mu_d+1)).^N - (mu_d'+1).^N *(mu_d+1).^N;
Which returns...
0.1211 0.0231 0.0422 0.0278 0.0411 0.0354 0.0289 0.0366 0.0343 0.0165
0.0231 0.0788 0.0283 0.0242 0.0199 0.0248 0.0219 0.0199 0.0253 0.0140
0.0422 0.0283 0.1282 0.0339 0.0432 0.0366 0.0321 0.0399 0.0420 0.0216
0.0278 0.0242 0.0339 0.0554 0.0261 0.0294 0.0312 0.0269 0.0297 0.0164
0.0411 0.0199 0.0432 0.0261 0.0849 0.0289 0.0271 0.0371 0.0317 0.0173
0.0354 0.0248 0.0366 0.0294 0.0289 0.0728 0.0293 0.0400 0.0339 0.0149
0.0289 0.0219 0.0321 0.0312 0.0271 0.0293 0.0454 0.0276 0.0309 0.0135
0.0366 0.0199 0.0399 0.0269 0.0371 0.0400 0.0276 0.0726 0.0356 0.0162
0.0343 0.0253 0.0420 0.0297 0.0317 0.0339 0.0309 0.0356 0.0715 0.0198
0.0165 0.0140 0.0216 0.0164 0.0173 0.0149 0.0135 0.0162 0.0198 0.0927
Then I found the correlation matrix (rho)
rho = inv(sqrt(diag(diag(sigma_a))))*sigma_a*inv(sqrt(diag(diag(sigma_a))));
Which returns...
1.0000 0.2365 0.3388 0.3396 0.4050 0.3772 0.3897 0.3899 0.3686 0.1556
0.2365 1.0000 0.2812 0.3656 0.2437 0.3274 0.3658 0.2631 0.3377 0.1638
0.3388 0.2812 1.0000 0.4027 0.4141 0.3792 0.4199 0.4133 0.4382 0.1985
0.3396 0.3656 0.4027 1.0000 0.3809 0.4638 0.6221 0.4246 0.4728 0.2295
0.4050 0.2437 0.4141 0.3809 1.0000 0.3681 0.4366 0.4732 0.4068 0.1948
0.3772 0.3274 0.3792 0.4638 0.3681 1.0000 0.5093 0.5499 0.4707 0.1813
0.3897 0.3658 0.4199 0.6221 0.4366 0.5093 1.0000 0.4797 0.5428 0.2079
0.3899 0.2631 0.4133 0.4246 0.4732 0.5499 0.4797 1.0000 0.4936 0.1971
0.3686 0.3377 0.4382 0.4728 0.4068 0.4707 0.5428 0.4936 1.0000 0.2435
0.1556 0.1638 0.1985 0.2295 0.1948 0.1813 0.2079 0.1971 0.2435 1.0000
I know there is the function corrcov() in matlab that finds the correlation matrix... So I tried,
corrcov(sigma_a)
I compared the results and both corrcov(sigma_a) and rho produced the same correlation matrix.
However then I wanted to change all of the pairwise correlations by exactly +0.1. Which I did, with
rho_u = (rho + .1) - .1*eye(10);
And I got the following correlation matrix...
1.0000 0.3365 0.4388 0.4396 0.5050 0.4772 0.4897 0.4899 0.4686 0.2556
0.3365 1.0000 0.3812 0.4656 0.3437 0.4274 0.4658 0.3631 0.4377 0.2638
0.4388 0.3812 1.0000 0.5027 0.5141 0.4792 0.5199 0.5133 0.5382 0.2985
0.4396 0.4656 0.5027 1.0000 0.4809 0.5638 0.7221 0.5246 0.5728 0.3295
0.5050 0.3437 0.5141 0.4809 1.0000 0.4681 0.5366 0.5732 0.5068 0.2948
0.4772 0.4274 0.4792 0.5638 0.4681 1.0000 0.6093 0.6499 0.5707 0.2813
0.4897 0.4658 0.5199 0.7221 0.5366 0.6093 1.0000 0.5797 0.6428 0.3079
0.4899 0.3631 0.5133 0.5246 0.5732 0.6499 0.5797 1.0000 0.5936 0.2971
0.4686 0.4377 0.5382 0.5728 0.5068 0.5707 0.6428 0.5936 1.0000 0.3435
0.2556 0.2638 0.2985 0.3295 0.2948 0.2813 0.3079 0.2971 0.3435 1.0000
However, when I try to turn the adjusted correlation matrix into a covariance matrix, cov() does not produce the right matrix. I tried...
b = cov(rho_u);
Why is that? Is there another way to do that? Or is there a way to adjust what I did with
rho = inv(sqrt(diag(diag(sigma_a))))*sigma_a*inv(sqrt(diag(diag(sigma_a))));
so that it does the opposite (rho gave the correlation matrix) and returns the covariance matrix instead?
Based on my understanding of the answer below, the covariance matrix for rho_u would then be obtained by doing...
sigma = sqrt(var(rho_u));
D = diag(sigma);
sigma_u = D*rho_u*D
Is this what was meant? I was a little confused about which variables I should take the variance of. I thought that meant rho_u?
The MATLAB function cov is not designed to transform a correlation matrix into a covariance matrix. As its documentation says:
cov(X), if X is a vector, returns the variance. For matrices, where
each row is an observation, and each column a variable, cov(X) is the
covariance matrix.
So simply feeding the correlation matrix to cov() won't work. What you need in order to calculate the covariance matrix is the variances of your original variables (which you can calculate from your data, but didn't post here), not the variance of the entries of rho_u.
So in your example, using the 10x10 correlation matrix rho you posted and some random numbers for the standard deviations:
sigma = rand(size(rho(:,1)));
D = diag(sigma); % Put the sigmas on the diagonal of a 10x10 matrix
(you have to insert the values calculated from your input data, of course). You can then calculate the covariance matrix by
S = D*rho*D
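The full round trip can be sketched on a small example. This is an illustration under assumptions: the 3x3 matrix S0 is made up, and the standard deviations are taken from the diagonal of the original covariance matrix, which is one way to obtain "the variances of your variables" when that matrix is at hand.

```matlab
% Sketch: covariance -> correlation -> adjusted correlation -> covariance.
% S0 is a hypothetical covariance matrix, not the data from the question.
S0 = [4.0  1.2  0.5;
      1.2  9.0 -0.6;
      0.5 -0.6  1.0];

s   = sqrt(diag(S0));               % standard deviations of the variables
rho = S0 ./ (s * s');               % correlation matrix (unit diagonal)

% Raise every pairwise correlation by 0.1, leaving the diagonal at 1.
rho_u = (rho + 0.1) - 0.1 * eye(size(rho, 1));

% Scale back with the ORIGINAL standard deviations, not var(rho_u).
D   = diag(s);
S_u = D * rho_u * D;                % adjusted covariance matrix
```

The diagonal of S_u equals that of S0, and only the off-diagonal entries change. Shifting correlations by hand does not guarantee that the result is still positive semidefinite, so checking min(eig(S_u)) afterwards is a reasonable precaution.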

compute inverse fft manually

I am trying to compute the inverse FFT of a fft-output manually. I am using the following script, which first uses fft to compute the FFT of a data set. I then try to find the inverse FFT manually, but it doesn't resemble the result I get from ifft.
Can you spot my error? I am merely using the standard inverse formula of the FFT presented here, https://en.wikipedia.org/wiki/Fast_Fourier_transform#Definition_and_speed
data = [
-0.0005
-0.0004
-0.0003
-0.0002
-0.0001
-0.0000
0.0001
0.0001
0.0001
0.0002
0.0002
0.0002
0.0002
0.0002
0.0002
0.0002
0.0002
0.0002
0.0003
0.0004
0.0005
0.0006
0.0007
0.0009
0.0010
0.0011
0.0011
0.0012
0.0011
0.0011
0.0011
0.0010
0.0011];
delta = 0.0125;
fs = 1/delta;
x = (0:1:length(data)-1)/fs;
X=fft(data);
%find fft
N=length(data);
ws = 2*pi/N;
wnorm = -pi:ws:pi;
wnorm = wnorm(1:length(x));
w = wnorm*fs;
figure(2)
plot(w/(2*pi),abs(fftshift(X)))
%find inverse fft manually
for m=1:length(X)
for k=1:length(data)
X_real(m) = X(k)*exp(i*k*ws*(m-1));
end
end
figure(3)
plot(1:length(data), abs(X_real), 1:length(data), ifft(X))
Please change your for loop as shown below:
for m=1:length(X)
for k=1:length(data)
temp(k) = X(k)*exp(i*(m-1)*ws*(k-1));
end
X_real(m)=(1/N)*sum(temp);
end
figure(3)
plot(1:length(data), real(X_real))
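The equation that ifft implements is given in the MATLAB documentation, here.
You missed two things: the summation over k and the 1/N normalization. Note also that the frequency index in the exponent must be zero-based, i.e. (k-1) rather than k.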
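The same inverse transform can also be written without loops. The sketch below uses a short made-up signal rather than the data from the question, builds the matrix of complex exponentials explicitly, and compares the result with ifft.

```matlab
% Sketch: inverse DFT as a matrix product, checked against ifft.
% x is an arbitrary illustrative signal.
x = [1; 2; 0; -1; 3];
X = fft(x);                          % forward transform
N = numel(X);

n = (0:N-1)';                        % zero-based time indices (column)
k = 0:N-1;                           % zero-based frequency indices (row)
W = exp(2i * pi * n * k / N);        % N-by-N matrix, W(n+1,k+1) = e^{2*pi*i*n*k/N}

x_manual = (W * X) / N;              % sum over k, then 1/N normalization
err = max(abs(x_manual - ifft(X)));  % should be at rounding-error level
```

For long signals the N-by-N matrix becomes expensive, which is exactly the cost that the FFT algorithm avoids; the explicit form is mainly useful for checking one's understanding of the formula.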

inverse fft with matlab not working

I am trying to do an inverse FFT in Matlab, but I can't seem to get the inverse working correctly. Here is my code:
data = [-0.0005
-0.0004
-0.0003
-0.0002
-0.0001
-0.0000
0.0001
0.0001
0.0001
0.0002
0.0002
0.0002
0.0002
0.0002
0.0002
0.0002
0.0002
0.0002
0.0003
0.0004
0.0005
0.0006
0.0007
0.0009
0.0010
0.0011
0.0011
0.0012
0.0011
0.0011
0.0011
0.0010 ];
%plot data
figure(1)
plot(data)
%FFT
N = 100;
X = fft(data, N);
F = [-N/2:N/2-1]/N;
F = F/0.0125;
X = fftshift(X);
figure(2)
plot(F, abs( X ) )
%inverse FFT
y = ifft(X);
figure(3)
plot(F,y)
Figure 1 and 3 should be identical, but are not in any way. I made sure not to take ifft of the absolute value of fft, so it's not clear to me what is wrong.
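Since you shifted the spectrum using fftshift, you have to undo the shift before taking the inverse Fourier transform. ifftshift is the exact inverse of fftshift; for even N (as here, N = 100) applying fftshift a second time has the same effect.
y = ifft(ifftshift(X));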
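A small round-trip check makes the point concrete. The sketch below uses a made-up signal and an odd transform length chosen on purpose, because that is the case in which fftshift and ifftshift differ. Note also that fft(data, N) zero-pads to N samples, so only the first numel(data) samples of the result correspond to the original signal, and they belong on a time (sample) axis rather than on the frequency axis F.

```matlab
% Sketch: shifted spectrum -> unshift -> ifft recovers the signal.
% data is an illustrative signal; N is odd on purpose.
data = [1 2 3 4 5 4 3]';
N    = 9;                            % zero-padded transform length

X = fftshift(fft(data, N));          % centered spectrum (for plotting)

y       = ifft(ifftshift(X));        % correct: undo the shift first
y_wrong = ifft(fftshift(X));         % only equivalent when N is even

err_ok    = max(abs(y(1:numel(data)) - data));  % rounding-error level
err_wrong = max(abs(y_wrong - y));              % clearly nonzero for odd N
```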

MvNormal Error with Symmetric & Positive Semi-Definite Matrix

The summary of my problem is that I am trying to replicate the Matlab function:
mvnrnd(mu', sigma, 200)
into Julia using:
rand( MvNormal(mu, sigma), 200)'
and the result is a 200 x 7 matrix, essentially generating 200 random return time series data.
Matlab works, Julia doesn't.
My input matrices are:
mu = [0.15; 0.03; 0.06; 0.04; 0.1; 0.02; 0.12]
sigma = [0.0035 -0.0038 0.0020 0.0017 -0.0006 -0.0028 0.0009;
-0.0038 0.0046 -0.0011 0.0001 0.0003 0.0054 -0.0024;
0.0020 -0.0011 0.0041 0.0068 -0.0004 0.0047 -0.0036;
0.0017 0.0001 0.0068 0.0125 0.0002 0.0109 -0.0078;
-0.0006 0.0003 -0.0004 0.0002 0.0025 -0.0004 -0.0007;
-0.0028 0.0054 0.0047 0.0109 -0.0004 0.0159 -0.0093;
0.0009 -0.0024 -0.0036 -0.0078 -0.0007 -0.0093 0.0061]
Using Distributions.jl, running the line:
MvNormal(sigma)
Produces the error:
ERROR: LoadError: Base.LinAlg.PosDefException(4)
The matrix sigma is symmetrical but only positive semi-definite:
issym(sigma) #symmetrical
> true
isposdef(sigma) #positive definite
> false
using LinearOperators
check_positive_definite(sigma) #check for positive (semi-)definite
> true
Matlab produces the same results for these tests; however, Matlab is able to generate the 200x7 random return sample matrix.
Could someone advise as to what I could do to get it working in Julia? Or where the issue lies?
Thanks.
The issue is that the covariance matrix is indefinite. See
julia> eigvals(sigma)
7-element Array{Float64,1}:
-3.52259e-5
-2.42008e-5
2.35508e-7
7.08269e-5
0.00290538
0.0118957
0.0343873
so it is not a valid covariance matrix. This might be due to rounding, so if you have access to the unrounded data you can try that instead. I just tried and I also got an error in Matlab. However, in contrast to Julia, Matlab does allow the matrix to be positive semidefinite.
A way to make this work is to add a multiple of the identity matrix to the original matrix, large enough to cancel the most negative eigenvalue, and then pass the result to MvNormal, i.e.
julia> MvNormal(randn(7), sigma - minimum(eigvals(Symmetric(sigma)))*I)
Distributions.MvNormal{PDMats.PDMat{Float64,Array{Float64,2}},Array{Float64,1}}(
dim: 7
μ: [0.889004,-0.768551,1.78569,0.130445,0.589029,0.529418,-0.258474]
Σ: 7x7 Array{Float64,2}:
0.00353523 -0.0038 0.002 0.0017 -0.0006 -0.0028 0.0009
-0.0038 0.00463523 -0.0011 0.0001 0.0003 0.0054 -0.0024
0.002 -0.0011 0.00413523 0.0068 -0.0004 0.0047 -0.0036
0.0017 0.0001 0.0068 0.0125352 0.0002 0.0109 -0.0078
-0.0006 0.0003 -0.0004 0.0002 0.00253523 -0.0004 -0.0007
-0.0028 0.0054 0.0047 0.0109 -0.0004 0.0159352 -0.0093
0.0009 -0.0024 -0.0036 -0.0078 -0.0007 -0.0093 0.00613523
)
The "covariance" matrix is of course not the same anymore, but it is very close.
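The same eigenvalue check and diagonal shift can be sketched on the Matlab side. Everything below is illustrative: A is a small made-up symmetric matrix with one negative eigenvalue, not the sigma from the question, and the sampling step uses an eigendecomposition (a swapped-in technique, not what mvnrnd or MvNormal do internally) so that it also works when the shifted matrix is only positive semidefinite.

```matlab
% Sketch: detect an indefinite "covariance" matrix, shift it, and sample.
% A and mu are hypothetical inputs chosen for illustration.
A  = [ 1.0  0.9 -0.9;
       0.9  1.0  0.9;
      -0.9  0.9  1.0];
mu = [0.1; 0.2; 0.3];

lambda_min = min(eig(A));                     % negative => indefinite
A_fixed    = A - lambda_min * eye(size(A, 1)); % shift spectrum up to >= 0

% Factor A_fixed = L*L' via its eigendecomposition (valid for PSD input).
[V, D] = eig(A_fixed);
L = V * sqrt(max(D, 0));                      % clip tiny negative round-off

rng(0);                                       % reproducible draws
samples = bsxfun(@plus, randn(200, 3) * L', mu');  % 200 rows, one draw each
```

As with the Julia version, the shifted matrix is no longer the original one, so this is a workaround rather than a fix; going back to unrounded data is preferable when it is available.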

polyfit/polyval with log scale through scatter points in matlab

I have a scatter plot with both x and y axes in log scale in Matlab. How do I add a line of best fit on the log scale?
Thanks!
x = [0.0090 0.0000 0.0001 0.0000 0.0001 0.0000 0.0097 0.0016 0.0006 0.0000 0.0016 0.0013 0.0023];
y = [0.0085 0.0001 0.0013 0.0006 0.0005 0.0006 0.0018 0.0076 0.0015 0.0001 0.0039 0.0015 0.0024];
scatter(x,y)
set(gca,'YScale','log');
set(gca,'XScale','log');
hold on
p = polyfit(log(x),log(y),1);
f = polyval(p,x);
plot(x,f,'Color',[0.7500 0.7500 0.7500],'linewidth',2)
If you want the least-squares line for the original data, fit x and y directly rather than their logs. The log scale then only affects how the result is displayed.
Before calling polyval, sort x. The order does not matter with normal axes, but an unsorted sequence can make the line look strange on log axes.
Here is the plot:
The code:
x = [0.0090 0.0000 0.0001 0.0000 0.0001 0.0000 0.0097 0.0016 0.0006 0.0000 0.0016 0.0013 0.0023];
y = [0.0085 0.0001 0.0013 0.0006 0.0005 0.0006 0.0018 0.0076 0.0015 0.0001 0.0039 0.0015 0.0024];
scatter(x,y);
set(gca,'YScale','log');
set(gca,'XScale','log');
hold on;
x_sort = sort(x);
p = polyfit(x,y,1);
f = polyval(p,x_sort);
plot(x_sort,f,'Color',[0.7500 0.7500 0.7500],'linewidth',2);
Is this what you wanted?
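If a straight line on the log-log axes is what is wanted instead, one alternative is to fit in log space and transform the fitted values back with exp before plotting. The sketch below uses made-up data that follow an exact power law, and it assumes non-positive values are dropped first, since log(0) is -Inf and would break the fit (the data in the question contain entries that display as 0.0000).

```matlab
% Sketch: power-law fit y = a*x^b, which is a straight line on log-log axes.
% x and y are illustrative data, not the values from the question.
x = [1e-4 1e-3 1e-2 1e-1];
y = 2 * x.^0.5;

keep = x > 0 & y > 0;                % log is undefined for non-positive values
x = x(keep);
y = y(keep);

p  = polyfit(log(x), log(y), 1);     % p(1) = exponent b, p(2) = log(a)
xs = sort(x);                        % sorted abscissa for a clean line
f  = exp(polyval(p, log(xs)));       % evaluate in log space, map back

loglog(x, y, 'o', xs, f, '-');       % points and fitted line on log-log axes
```

This minimizes the error in log(y) rather than in y, so it weights relative deviations equally across decades; which of the two fits is appropriate depends on the data.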