Doing PCA and whitening with MATLAB

My task is to perform PCA and a whitening transform on a given two-dimensional data set of 5000 points.
My understanding of PCA is that you analyze the main axes of the data via the eigenvectors of the covariance matrix, and then rotate the main axis onto the x axis.
So here's what I did:
[BtEvector, BtEvalue] = eig(MYCov); % eigenvalues and eigenvectors using the built-in function
I first calculated the eigenvalues and eigenvectors. The result was
BtEvalue=[4.027487815706757,0;0,8.903923357227459]
and
BtEvector=[0.033937679569230,-0.999423951036524;-0.999423951036524,-0.033937679569230]
So I figured that the main axis has eigenvalue 8.903923357227459 and eigenvector [-0.999423951036524, -0.033937679569230], which is the corresponding second column.
Then, because it's two-dimensional data, I let cos(theta) = -0.9994... and sin(theta) = -0.033937.... Because I thought the main axis of the data (the eigenvector [-0.999423951036524, -0.033937679569230]) had to become the x axis, I built the rotation matrix R = [cos(-theta), -sin(-theta); sin(-theta), cos(-theta)]. With the original data set A (2×5000), I computed A*R to get the rotated data.
Also, for the whitening case, using Cholesky whitening, I took the whitening transformation matrix to be inv(covariance matrix).
Is there something wrong with my algorithm? Could someone verify whether there is an error or a misunderstanding? Thanks a lot in advance.

Since your data is two-dimensional, the covariance matrix that you calculated is not accurate. If you only calculate the covariance with respect to one axis (say x), you're assuming that the covariance along the y axis is the identity. This is obviously not true. Although you've attempted to address this, there's a sound procedure you can use (explained below).
Unfortunately, this is a common mistake. Have a look at this paper, where it is explained exactly how the covariance should be calculated.
In summary, you can calculate the covariance along each axis (Sx and Sy). Then approximate the 2D covariance of the vectorized matrix as kron(Sx,Sy). This will be a better approximation of the 2D covariance.
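For completeness, here is a minimal sketch of a conventional eigendecomposition-based PCA whitening, assuming A is the zero-mean 2×5000 data matrix from the question (variable names are illustrative):
% Minimal sketch of eigendecomposition-based PCA whitening.
% Assumes A is the zero-mean 2-by-5000 data matrix (illustrative names).
C = (A * A') / (size(A, 2) - 1);            % 2-by-2 sample covariance
[V, D] = eig(C);                            % columns of V are the principal axes
Arot = V' * A;                              % rotate the principal axes onto x/y
Awhite = diag(1 ./ sqrt(diag(D))) * Arot;   % rescale each axis to unit variance
% cov(Awhite') should now be close to the 2-by-2 identity matrix.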

Vectorization in PCA

I am doing Principal Component Analysis, and I want to know whether the summation sum_{i=1}^{m} x_i * x_i^T can be represented in terms of the data matrix, i.e. as a direct multiplication of two matrices.
Can this be done, or do I need to use a for loop?
Currently I have tried:
S = zeros(n, n);   % avoid shadowing the built-in function sum
for i = 1:m
    S = S + X(i,:)' * X(i,:);   % outer product of the i-th row with itself
end
My goal is to find the principal eigenvalues of the resulting matrix.
Thanks in advance.
Say the shape of the data matrix X is (Dim, Num); then you can just compute the sum of all sample correlations with:
S = X*X'
For implementing PCA, also don't forget to divide the matrix by the number of samples:
Sigma = (1/N) * X * X';
If your data has zero mean, this is also the covariance matrix.
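As a quick sanity check (with hypothetical dimensions), the loop and the vectorized form agree up to floating-point noise:
% Hypothetical example: compare the per-sample loop with the vectorized form.
X = randn(3, 500);                 % Dim = 3, Num = 500
S_loop = zeros(3);
for i = 1:500
    S_loop = S_loop + X(:,i) * X(:,i)';   % outer product of the i-th sample
end
S_vec = X * X';
max(abs(S_loop(:) - S_vec(:)))     % should be on the order of eps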

Why is this polynomial equation badly conditioned?

I have a 1×1024 data vector, and I'd like to fit a polynomial to it.
X = (0:1023)';
Y = acquired data, a 1024-element vector
Then I try this in MATLAB:
polyfit(X, Y, 5)
But MATLAB produces an abnormal result with a warning:
Warning: Polynomial is badly conditioned. Add points with distinct X values, reduce the degree of the ...
I don't understand what I am doing wrong.
Update
I got a bunch of numbers like this.
Y=
-0.0000000150
...
0.00001
...
0
...
0.17
X= 0~255
polyfit(X,Y,4)
I got a polynomial, but it does not match the original curve.
Are there any options to make polyfit's curve match the original curve?
The problem can be attributed to the type of coefficient matrix that polyfit builds from the x vector: a Vandermonde matrix.
When
the elements of the x vector vary too much in magnitude, and
the degree of the fitting polynomial is too high,
you get an ill-conditioned matrix, and the associated linear system cannot be solved reliably.
Try to centre and scale your x vector first, before applying polyfit, as advised at the bottom of the polyfit help page:
Since the columns in the Vandermonde matrix are powers of the vector x, the condition number of V is often large for high-order fits, resulting in a singular coefficient matrix. In those cases *centering and scaling* can improve the numerical properties of the system to produce a more reliable fit.
(my emphasis)
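In practice, polyfit's three-output form returns the centring and scaling parameters for you; a minimal sketch (x and y are the vectors from the question):
% Fit in a centred and scaled variable; polyfit returns mu = [mean(x); std(x)].
[p, S, mu] = polyfit(x, y, 5);
% Evaluate with the same centring and scaling applied.
yFit = polyval(p, x, [], mu);
plot(x, y, '.', x, yFit, '-');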
The warning appears because the data you are supplying to polyfit is not suitable for your desired polynomial degree. Specifically, there is not enough variability in your data to achieve a good fit at that degree, so MATLAB warns you that the fit is unreliable.
The solution is either to get more points, so that you can achieve the fit at the polynomial degree you want, or to decrease the degree of the polynomial.
Try values that are less than 5: 4, 3, or perhaps 2:
coeff = polyfit(x, y, 4);
%// or
%coeff = polyfit(x, y, 3);
%coeff = polyfit(x, y, 2);
Try each degree until you don't get the warning anymore. However, without the actual data, I can only speculate about what's wrong; this is my best guess.
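If you want to automate that search, one illustrative approach is to step the degree down until the warning no longer fires:
% Illustrative sketch: decrease the degree until polyfit stops warning.
for deg = 5:-1:1
    lastwarn('');                  % clear the last warning message
    coeff = polyfit(x, y, deg);
    if isempty(lastwarn)           % no warning was raised: keep this degree
        break;
    end
end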

Matlab calculations of major/minor axis lengths (regionprops)

I have been doing a fair bit of searching around for an answer to my issue here; hopefully someone can tell me where I'm going wrong.
I'm trying to replicate MATLAB's regionprops function to calculate major/minor axis lengths, and I think I understand most of what it's doing, but I'm a little lost.
My understanding was that the axis lengths are equal to the eigenvalues of the covariance matrix (of the region's pixel coordinates), but the numbers I am getting are far from what MATLAB returns.
So far I am:
1) extracting the coordinates as an n-by-2 matrix,
2) subtracting the x/y means from the matrix,
3) calculating the covariance matrix as (matrix' * matrix)/num_pixels,
4) calculating the eigenvalues of the covariance matrix.
But the values I get are way off from what MATLAB returns. Am I doing completely the wrong thing here, or am I just making a mistake in the working?
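For what it's worth, regionprops derives the axis lengths from the normalized second central moments, adds 1/12 to the variances to account for the extent of each unit pixel, and returns four times the square roots of the eigenvalues rather than the eigenvalues themselves. A sketch of that computation, assuming coords is the n-by-2 coordinate matrix described above:
xy = coords - mean(coords, 1);      % subtract the x/y means
n = size(xy, 1);
uxx = sum(xy(:,1).^2) / n + 1/12;   % +1/12 is the variance of a unit pixel
uyy = sum(xy(:,2).^2) / n + 1/12;
uxy = sum(xy(:,1) .* xy(:,2)) / n;
common = sqrt((uxx - uyy)^2 + 4*uxy^2);
majorAxisLength = 2*sqrt(2) * sqrt(uxx + uyy + common);
minorAxisLength = 2*sqrt(2) * sqrt(uxx + uyy - common);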

How can I plot only the real eigenvalues of a symmetric matrix in MATLAB?

I have created a matrix of potentials for a particle in a square well. When I take the eigenvectors of the matrix, I get mirror images for the first few (about 10) vectors. For example, the first eigenvector is a positive hump, but there is also a negative mirror hump underneath it. I looked at the output of the first few vectors, and it appeared that the sign of the numbers was merely flipping back and forth between positive and negative. For later vectors this is not the case, so I cannot just plot every other point of the vectors. I am using the following command to get the eigenvectors:
[V,D] = eig(A);
I do not see imaginary numbers in my output. However, it has been suggested to me that MATLAB may be trying to plot both the real and imaginary components of the eigenvectors. I found the following command on this site and thought it would fix my problem, assuming the issue is in fact with how the real and imaginary components are being plotted:
A1 = real(V*real(D)/V);
then I plot:
[V,D] = eig(A1);
Nothing has changed, and I am confused as to whether I am correctly plotting the real eigenvalues or whether something else is causing these mirror images. Help!
Real symmetric matrices always have only real eigenvalues and orthogonal eigenspaces, i.e., one can always construct an orthonormal basis of eigenvectors.
If your physical system has a spatial symmetry, for instance if you can mirror it about some symmetry axis such that the physics of both systems is the same, then this symmetry is also reflected in the eigenspaces: they will always have even dimension, and you can construct either odd and even symmetric eigenvectors or pairs of eigenvectors that are mirror images of each other.
To say more, one would need more details about your problem.
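Note also that eigenvectors are only determined up to sign (v and -v are equally valid), so the flipping between positive and negative humps may simply be that sign ambiguity. A sketch of one common convention for fixing the signs, assuming A is your real symmetric matrix:
[V, D] = eig(A);
% Sign convention: make the largest-magnitude entry of each eigenvector
% positive, so plots don't flip between mirror images.
[~, idx] = max(abs(V), [], 1);
s = sign(V(sub2ind(size(V), idx, 1:size(V, 2))));
V = V .* s;          % flips the columns whose pivot entry was negative
plot(V(:, 1:4));     % plot the first few eigenvectors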

How to use matlab contourf to draw a two-dimensional decision boundary

I finished an SVM training and got data X and Y, where X is the feature matrix with only 2 dimensions and Y holds the classification labels. Because the data is only two-dimensional, I would like to draw a decision boundary to show the surface of the support vectors.
I use contourf in MATLAB to do the trick, but I find it really hard to understand how to use the function.
I wrote:
#1 try:
contourf(X);
#2 try:
contourf([X(:,1) X(:,2) Y]);
#3 try:
Z(:,:,1)=X(Y==1,:);
Z(:,:,2)=X(Y==2,:);
contourf(Z);
None of these work correctly. I checked the MATLAB help files, and most of them define Z as a function, so I really do not know how to form the correct Z matrix.
If you're using the svmtrain and svmclassify commands from Bioinformatics Toolbox, you can just use the additional input argument (...'showplot', true), and it will display a scatter plot with a decision boundary and the support vectors highlighted.
If you're using your own SVM, or a third-party tool such as libSVM, what you probably need to do is to:
Create a grid of points in your 2D input feature space using the meshgrid command
Classify those points using your trained SVM
Plot the grid of points and the classifications using contourf.
For example, in kind-of-MATLAB-but-pseudocode, assuming your input features are called X1 and X2:
numPtsInGrid = 100;
x1Range = linspace(x1lower, x1upper, numPtsInGrid);
x2Range = linspace(x2lower, x2upper, numPtsInGrid);
[X1, X2] = meshgrid(x1Range, x2Range);
Z = classifyWithMySVMSomehow([X1(:), X2(:)]);
contourf(X1, X2, reshape(Z, size(X1)));
Hope that helps.
I know it's been a while, but I will give it a try in case someone else comes across this issue.
Assume we have a 2D training set with which to train an SVM model; in other words, the feature space is a 2D space. We know that a kernel SVM model leads to a score (or decision) function of the form:
f(x) = sum_{i=1}^{N} a_i * y_i * k(x, x_i) + b
where N is the number of support vectors, x_i is the i-th support vector, a_i is the estimated Lagrange multiplier, and y_i is the associated class label. The values (scores) of the decision function in a way depict the distance of the observation x from the decision boundary.
Now assume that for every point (X,Y) in the 2D feature space we can find the corresponding score of the decision function. We can plot the results in 3D Euclidean space, where X corresponds to values of the first feature f1, Y to values of the second feature f2, and Z to the value of the decision function at every point (X,Y). The intersection of this 3D surface with the Z=0 plane gives us the decision boundary in the two-dimensional feature space. In other words, the decision boundary is formed by the (X,Y) points whose scores are equal to 0. Seems logical, right?
Now in MATLAB you can easily do that, by first creating a grid in X,Y space:
d = 0.02;
[x1Grid,x2Grid] = meshgrid(minimum_X:d:maximum_X,minimum_Y:d:maximum_Y);
d is selected according to the desired resolution of the grid.
Then, for a trained model SVMModel, find the scores at every grid point:
xGrid = [x1Grid(:),x2Grid(:)];
[~,scores] = predict(SVMModel,xGrid);
Finally, plot the decision boundary:
figure;
contour(x1Grid,x2Grid,reshape(scores(:,2),size(x1Grid)),[0 0],'k');
contour gives us a 2D graph where information about the third dimension is depicted as solid lines in the 2D plane. These lines imply iso-response values, in other words, (X,Y) points with the same Z value. In our case, contour gives us the decision boundary.
Hope I helped make all that clearer. You can find very useful information and examples at the following links:
MATLAB's example
Representation of decision function in 3D space