Projecting new points onto the first two PCA components in MATLAB

I am trying to use PCA to visualize my implementation of k-means algorithm. I am following the tutorial on Principal Component Coefficients, Scores, and Variances in this link.
I am using the following command: [coeff,score,~]=pca(X'); where X is my data.
My data is a 30-by-455 matrix, that is, 30 features with 455 samples. I have successfully used the score output to create a 2D plot for visualization purposes. Now I wish to project the 30-dimensional centers onto that plane. I have tried coeff*centers(:,1), but I am not sure whether this is the correct usage.
How do I project a new 30-dimensional point onto the 2D plot of the first vs the second PCA components?

I assume that by centers(:, 1) you denote a new observation. To express this observation in the principal components you should write
[coeff, score, ~, ~, ~, mu] = pca(X'); % also return the estimated mean "mu"
tmp = centers(:, 1) - mu';             % remove the mean, since pca() centers the data by default
proj = coeff' * tmp;                   % the new observation expressed in the principal components
Note that you have to subtract the mean, since pca() centers the data by default. Also note the transpose ' on coeff. Strictly it should be inv(coeff), but since coeff is an orthogonal matrix we can use the transpose instead.
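To get the 2D coordinates used in the score plot specifically, keep only the first two columns of coeff. A minimal sketch (assuming, as in the question, that centers holds the 30-dimensional k-means centroids as its columns):
[coeff, score, ~, ~, ~, mu] = pca(X');        % X is 30x455, so X' is 455x30
c2d = coeff(:, 1:2)' * (centers(:, 1) - mu'); % 2x1 coordinates in the PC1-PC2 plane
figure; hold on
scatter(score(:, 1), score(:, 2), 10, '.')    % the 455 samples in 2D
plot(c2d(1), c2d(2), 'rx', 'MarkerSize', 12)  % the projected center
hold off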

Related

plotting histograms to visualize differences between two image features

I have extracted the features from a biometric image and then applied histogram remapping to the extracted features. I found that this step increased the recognition accuracy: it reduced the distances between samples from the same person and increased the distances between samples from different persons. However, when I used the MATLAB histogram function to plot the distribution of features after mapping, I found that all the histograms are the same for all images and all persons. Is there any MATLAB plot function I can use to show the small differences between features from the same person, and the large differences between features from different persons, after the mapping step, compared to the differences between features before mapping?
The attached file presents examples. In this file, please note the following:
Images 21 and 22 are for the same person, while image 63 is for a different person.
The knn distance between features after mapping for images 21 & 22 is 394.3704,
compared to 992.2379 between 21 & 63, and 993.2462 between 22 & 63. Despite this difference in distances, the three histograms are the same.
MATLAB code:
% draw the histogram of the filtered image
histogram(filtered_image22)
% measure the knn distance between two images
[a, b] = size(filtered_image21);
filtered_image21_vector = reshape(filtered_image21, [1 a*b]);
[x, z] = size(filtered_image63);
filtered_image63_vector = reshape(filtered_image63, [1 x*z]);
[idx, D] = knnsearch(filtered_image21_vector, filtered_image63_vector)
% knnsearch(X,Y) searches for the nearest neighbor (i.e., the closest
% point, row, or observation) in X to each point (i.e., row or observation)
% in the query data Y, using an exhaustive search or a Kd-tree.
% knnsearch returns Idx, a column vector of the indices of the nearest
% neighbors, and D, which contains the distance from each observation
% in Y to its closest observation in X.

Generate random samples from arbitrary discrete probability density function in Matlab

I've got an arbitrary probability density function discretized as a matrix in Matlab, that means that for every pair x,y the probability is stored in the matrix:
A(x,y) = probability
This is a 100x100 matrix, and I would like to be able to generate random samples of two dimensions (x,y) out of this matrix and also, if possible, to be able to calculate the mean and other moments of the PDF. I want to do this because after resampling, I want to fit the samples to an approximated Gaussian Mixture Model.
I've been looking everywhere but I haven't found anything as specific as this. I hope you may be able to help me.
Thank you.
If you really have a discrete probability density function defined by A (as opposed to a continuous probability density function that is merely described by A), you can "cheat" by turning your 2D problem into a 1D problem.
%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2)); %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)]; %all y values
%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);
%calculate your fake 1D CDF; assumes sum(A(:))==1 and that every entry of
%A is positive (interp1 below needs strictly increasing sample points)
CDF = cumsum(A); %remember, the first term out of cumsum is not zero
%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];
%generate random values
N_vals = 1000; %give me 1000 values
rand_vals = rand(N_vals,1); %spans zero to one
%look into CDF to see which index the rand val corresponds to
out_val = interp1(CDF,[0:1/(length(CDF)-1):1],rand_vals); %spans zero to one
ind = ceil(out_val*length(A));
%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];
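As a quick sanity check (my own sketch, not part of the original answer), the empirical frequency of each linear index should approach the corresponding entry of A as N_vals grows:
%tabulate how often each cell of A was drawn (ind and A as defined above)
freq = accumarray(ind, 1, [length(A) 1]) / N_vals;
max(abs(freq - A)) %should shrink as N_vals increases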
I hope that this helps!
Chip
I don't believe MATLAB has built-in functionality for generating multivariate random variables with an arbitrary distribution. As a matter of fact, the same is true for univariate random numbers. But while the latter can easily be generated via the inverse of the cumulative distribution function, that inverse-CDF trick does not carry over directly to multivariate distributions, so generating such numbers is much messier (the main problem being that 2 or more variables can be correlated). So this part of your question is far beyond the scope of this site.
Since half an answer is better than no answer, here's how you can compute the mean and higher moments numerically using matlab:
%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);
%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1
%plot it if you like
%figure;
%surf(x,y,A)
%actual half-answer starts here
%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2
%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));
So, the point is that you can perform a double integral on a rectangular mesh using two consecutive calls to trapz. This allows you to compute the integral of any quantity that has the same shape as your mesh, but a drawback is that vector components have to be computed independently. If you only wish to compute things that can be parametrized with x and y (which are naturally the same size as your mesh), then you can get along without any additional thinking.
You could also define a function for the integration:
function res = trapz2(xv, yv, A, arg)
    if ~isscalar(arg) && any(size(arg) ~= size(A))
        error('Size of A and arg must be the same!')
    end
    res = trapz(xv, trapz(yv, A.*arg));
end
This way you can compute stuff like
weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);
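The same helper extends to higher moments. For instance (my own continuation of the example, reusing mean_x and mean_y from above), the central second moments and the covariance are:
var_x = trapz2(xv, yv, A, (x - mean_x).^2);
var_y = trapz2(xv, yv, A, (y - mean_y).^2);
cov_xy = trapz2(xv, yv, A, (x - mean_x).*(y - mean_y));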
NOTE: the reason I used a 101x100 mesh in the example is that the double call to trapz should be performed in the proper order. If you interchange xv and yv in the calls, you get the wrong answer due to inconsistency with the definition of A, but this will not be evident if A is square. I suggest avoiding symmetric quantities during the development stage.
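To see this concretely (a quick check of my own, not from the original answer): with the non-square A above, swapping the vectors fails immediately, while on a square mesh it would silently return a wrong value:
trapz(xv, trapz(yv, A))   % correct: the inner integral runs over the 100 rows (yv)
% trapz(yv, trapz(xv, A)) % errors: xv has 101 points but A has only 100 rows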

How to do clustering when the input is a 3D matrix, MATLAB

I have a 3D matrix in which most of the values are zeros, but there are some nonzero values.
When I plot this 3D matrix in MATLAB I get a plot like the one below.
Here you can see that two groups of points are near each other (that's why the color became dark) and two individual groups of points are far away...
My objective is to cluster the two nearer groups of points into one cluster (cluster1), and the other two will be called cluster2 and cluster3...
I tried k-means clustering and BIC clustering... but as k-means clustering is basically built for 2D data input, I faced a hurdle there... I then reshaped the 3D matrix into a 2D matrix, but I still get another error: Subscripted assignment dimension mismatch.
So could you please come up with some fruitful idea to do this?
Based on your comment that you used vol3d, I assume that your data has to be interpreted this way. If your data matrix is called M, try
[A,B,C] = ind2sub(size(M),find(M));
points = [A,B,C];
idx = kmeans(points,3);
Here, I assumed that M(i,j,k) = 1 means that you have measured a point with properties i,j and k, which in your case would be velocity, angle and range.
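One way to eyeball the result (my own sketch, reusing points and idx from above) is to color each measured point by its assigned cluster:
scatter3(points(:,1), points(:,2), points(:,3), 10, idx, 'filled');
xlabel('velocity'); ylabel('angle'); zlabel('range');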

Matlab PCA order of principal components

So I read the documentation on pca, and it states that the columns are organized in descending order of their variance. However, whenever I take the PCA of an example and then take the variance of the PCA matrix, I get no specific order. A simple example:
pc = pca(x)
Which returns
pc =
0.0036 -0.0004
0.0474 -0.0155
0.3149 0.3803
0.3969 -0.1930
0.3794 0.3280
0.5816 -0.2482
0.3188 0.1690
-0.1343 0.7835
0.3719 0.0785
0.0310 -0.0110
This means column one should be PC1 and column two PC2, so that var(PC1) > var(PC2); but when I compute the variance this is clearly not the case.
var(pc)
ans =
0.0518 0.0932
Can anyone shed light into why the variance of PC1 is not the largest?
The docs state that calling
COEFF = pca(x)
will return a p-by-p matrix, so your result is rather surprising (EDIT: this is because your x data set has so few rows compared to columns, i.e. similar to having 10 unknowns and only 3 equations). Either way, when they talk about variance they don't mean the variance of the coefficients of each component, but rather the variance of the x data columns after being projected onto each principal component. The docs state that the output score holds these projections, so to see the descending variance you should be doing:
[COEFF, score, latent] = pca(x)
var(score)
and you will see that var(score) equals the third output latent and is indeed in descending order.
Your misunderstanding is that you are trying to calculate the variance of the coefficients of the principal component vectors. These are just unit vectors describing the directions onto which your data is projected such that the resulting projected data has maximum variance. The vectors ARE arranged so that your original data, projected onto each of them in turn, is in descending order of variance; but it is the variance of the projected data (score), NOT of the coefficients of the principal component vectors (COEFF, or pc in your code), that is ordered.
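A small sketch with made-up data (any x will do) illustrating both points:
x = randn(3, 10);      % few rows, like the question's data
size(pca(x))           % 10x2, not 10x10: at most n-1 = 2 components can be estimated
[COEFF, score, latent] = pca(randn(100, 4));
var(score)             % descending, and (up to rounding) equal to latent'
sqrt(sum(COEFF.^2))    % all ones: the coefficient columns are unit vectors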

How to use matlab contourf to draw two-dimensional decision boundary

I finished training an SVM and got data X, Y. X is the feature matrix with only 2 dimensions, and Y holds the classification labels. Because the data is only two-dimensional, I would like to draw a decision boundary to show the surface of support vectors.
I use contourf in MATLAB to do the trick, but really find it hard to understand how to use the function.
I wrote like:
#1 try:
contourf(X);
#2 try:
contourf([X(:,1) X(:,2) Y]);
#3 try:
Z(:,:,1)=X(Y==1,:);
Z(:,:,2)=X(Y==2,:);
contourf(Z);
None of these work correctly. And I checked the MATLAB help files; most of them build Z from a function, so I really do not know how to form the correct Z matrix.
If you're using the svmtrain and svmclassify commands from Bioinformatics Toolbox, you can just use the additional input argument (...'showplot', true), and it will display a scatter plot with a decision boundary and the support vectors highlighted.
If you're using your own SVM, or a third-party tool such as libSVM, what you probably need to do is to:
Create a grid of points in your 2D input feature space using the meshgrid command
Classify those points using your trained SVM
Plot the grid of points and the classifications using contourf.
For example, in kind-of-MATLAB-but-pseudocode, assuming your input features are called X1 and X2:
numPtsInGrid = 100;
x1Range = linspace(x1lower, x1upper, numPtsInGrid);
x2Range = linspace(x2lower, x2upper, numPtsInGrid);
[X1, X2] = meshgrid(x1Range, x2Range);
Z = classifyWithMySVMSomehow([X1(:), X2(:)]);
contourf(X1, X2, reshape(Z, size(X1))) % contourf needs Z as a matrix matching the grid
Hope that helps.
I know it's been a while, but I will give it a try in case someone else comes across this issue.
Assume we have a 2D training set so as to train an SVM model, in other words the feature space is a 2D space. We know that a kernel SVM model leads to a score (or decision) function of the form:
f(x) = \sum_{i=1}^{N} a_i y_i k(x, x_i) + b
where N is the number of support vectors, x_i is the i-th support vector, a_i is the estimated Lagrange multiplier, and y_i is the associated class label. The values (scores) of the decision function in a way depict the distance of the observation x from the decision boundary.
Now assume that for every point (X,Y) in the 2D feature space we can find the corresponding score of the decision function. We can plot the results in 3D Euclidean space, where X corresponds to values of the first feature f1, Y to values of the second feature f2, and Z to the value the decision function returns at every point (X,Y). The intersection of this 3D surface with the Z=0 plane gives us the decision boundary in the two-dimensional feature space. In other words, the decision boundary is formed by the (X,Y) points whose scores equal 0. Seems logical, right?
Now in MATLAB you can easily do that, by first creating a grid in X,Y space:
d = 0.02;
[x1Grid,x2Grid] = meshgrid(minimum_X:d:maximum_X,minimum_Y:d:maximum_Y);
d is selected according to the desired resolution of the grid.
Then for a trained model SVMModel find the scores of every grid's point:
xGrid = [x1Grid(:),x2Grid(:)];
[~,scores] = predict(SVMModel,xGrid);
Finally plot the decision boundary
figure;
contour(x1Grid,x2Grid,reshape(scores(:,2),size(x1Grid)),[0 0],'k');
Contour gives us a 2D graph where information about the 3rd dimension is depicted as solid lines in the 2D plane. These lines imply iso-response values, in other words (X,Y) points with the same Z value. In our case, the contour at level zero gives us the decision boundary.
Hope I helped make all that clearer. You can find very useful information and examples in the following links:
MATLAB's example
Representation of decision function in 3D space