PCA of Ovarian Cancer Data via SVD - matlab

I want to analyze the Ovarian Cancer Data provided by MATLAB with the PCA. Specifically, I want to visualize the two largest Principal Components, and draw the two corresponding left singular vectors. As I understand, those vectors should be able to serve as a new coordinate-system, aligned towards the largest variance in the data. What I ultimately want to examine is if the cancer patients are distinguishable from the non-cancer patients.
What is still wrong in my script are the left singular vectors. They are not at a 90 degree angle to each other, and if I scale them by the respective singular values, they explode in length. What am I doing wrong?
%% PCA - Ovarian Cancer Data
close all;
clear all;
% obs is an NxM matrix, where ...
% N = patients (216)
% M = features - genes in this case (4000)
load ovariancancer.mat;
% Turn obs matrix, such that the rows represent the features
X = obs.';
[U, S, V] = svd(X, 'econ');
% Crop U, S and V, to visualize two largest principal components
U_crop = U(:, 1:2);
S_crop = S(1:2, 1:2);
V_crop = V(:, 1:2);
X_crop = U_crop * S_crop * V_crop.';
% Average over cancer patients
xC = mean(X_crop, 2);
% Visualize two largest principal components as a data cloud
figure;
hold on;
for i = 1 : size(X, 2)
if strcmp(grp{i}, 'Cancer') % strcmp is safe for labels of different lengths
plot(X_crop(1, i), X_crop(2, i), 'rx', 'LineWidth', 2);
else
plot(X_crop(1, i), X_crop(2, i), 'bo', 'LineWidth', 2);
end
end
%scatter(X_crop(1, :), X_crop(2, :), 'k.', 'LineWidth', 2)
set(gca,'DataAspectRatio',[1 1 1])
xlabel('PC1')
ylabel('PC2')
grid on;
Xstd = U_crop; % * S_crop?
quiver([xC(1) xC(1)], [xC(2) xC(2)], Xstd(1, :), Xstd(2, :), 'green', 'LineWidth', 3);

So there were multiple mistakes in my script. In case anyone is interested, I am posting the corrected code (I am plotting three PCs now). This post was very helpful.
% obs is an NxM matrix, where ...
% N = patients (216)
% M = features - genes in this case (4000)
load ovariancancer.mat;
% Let the data matrix X be of n×p size, where n is the number of samples and p is the number of variables
X = obs;
% Center the data: subtract the column (per-gene) means so that every column has zero mean
X = bsxfun(@minus, X, mean(X, 1));
[U, S, V] = svd(X, 'econ');
% Principal component scores: U*S equals X*V
PC = U * S;
% Visualize three largest principal components as a data cloud
% The j-th principal component is given by j-th column of XV. The coordinates of the i-th data point in the new PC space are given by the i-th row of XV
figure;
for i = 1 : size(PC, 1) % loop over patients (rows of PC)
if strcmp(grp{i}, 'Cancer')
plot3(PC(i, 1), PC(i, 2), PC(i, 3), 'rx', 'LineWidth', 2);
else
plot3(PC(i, 1), PC(i, 2), PC(i, 3), 'bo', 'LineWidth', 2);
end
hold on;
end
set(gca,'DataAspectRatio',[1 1 1])
xlabel('PC1')
ylabel('PC2')
zlabel('PC3')
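As a quick sanity check of the relations used in the corrected code, here is a minimal sketch on synthetic data (not the ovarian cancer set; the matrix sizes and the rng seed are arbitrary assumptions). It checks that the right singular vectors are orthonormal and that U*S equals the centered data projected onto V:

```matlab
rng(0);                                    % reproducible toy data (assumption)
X  = randn(50, 8) * randn(8, 8);           % 50 samples, 8 correlated variables
Xc = bsxfun(@minus, X, mean(X, 1));        % subtract the column means
[U, S, V] = svd(Xc, 'econ');
scores    = U * S;                         % PC scores, one row per sample
orthoErr  = norm(V.' * V - eye(size(V, 2)));  % ~0: the directions are orthonormal
projErr   = norm(scores - Xc * V);            % ~0: U*S equals Xc*V
explained = diag(S).^2 / sum(diag(S).^2);     % fraction of variance per PC
```

If the principal directions are to be drawn as arrows, they are the columns of V (in feature space), which is probably why plotting rows of U as axes did not give perpendicular vectors.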

Related

Volumetric 3D data plotting from 2D map in MATLAB?

I have a heat map
and want to convert this 2D matrix to a 3D volume/shape/surface data points for further processing. Not simply display it in 3D using surf.
What would be a good way to do this?
With a lot of help from this community I could come closer:
I shrunk the size to 45x45 px for simplicity.
I = (imread("TESTGREYPLASTIC.bmp"))./2+125;
Iinv = 255-(imread("TESTGREYPLASTIC.bmp"))./2-80;%
for i = 1:45
for j = 1:45
A(i, j, I(i,j) ) = 1;
A(i, j, Iinv(i,j) ) = 1;
end
end
volshow(A)
It's not ideal, but the matrix is what I wanted now. Maybe the loop can be improved to run faster when dealing with 1200x1200 points.
How do I create a real closed surface now?
Following your conversation with @BoilermakerRV, I guess you are looking for one of the following two results:
A list of 3D points, where x and y are the indices of the pixels in the image, and z is the value of the corresponding pixel. The result will be an m*n by 3 matrix.
An m by n by 256 volume of zeros and ones, where for the (i, j)-th pixel in the image, all voxels of the (i, j)-th pile of the volume are 0, except the one at I(i, j).
Take a look at the following example that generates both results:
close all; clc; clear variables;
I = rgb2gray(imread('data2.png'));
imshow(I), title('Data as image')
% generating mesh grid
[m, n] = size(I);
[X, Y] = meshgrid(1:n, 1:m);
% converting image to list of 3-d points
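P = [Y(:), X(:), double(I(:))]; % cast to double, otherwise the whole matrix becomes uint8 and coordinates above 255 saturate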
figure
scatter3(P(:, 1), P(:, 2), P(:, 3), 3, P(:, 3), '.')
colormap jet
title('Same data as a list of points in R^3')
% converting image to 256 layers of voxels (layer k+1 holds intensity k, so intensity 0 is a valid index)
ind = sub2ind([m n 256], Y(:), X(:), double(I(:)) + 1);
V = zeros(m, n, 256);
V(ind) = 1.0;
figure
h = slice(V, [250], [250], [71]) ;
[h.EdgeColor] = deal('none');
colormap winter
camlight
title('And finally, as a matrix of 0/1 voxels')
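To convince yourself that the voxel construction does what is described, here is a small check on a synthetic 4x4 image (the image itself is a made-up assumption, and intensities are shifted by one so that intensity 0 maps to layer 1). Every pile should contain exactly one set voxel, and the layer index should give back the image:

```matlab
I = uint8(magic(4) * 15);          % synthetic 4x4 "image" (assumption), values 15..240
[m, n] = size(I);
[X, Y] = meshgrid(1:n, 1:m);
layer = double(I(:)) + 1;          % shift so that intensity 0 would map to layer 1
V = zeros(m, n, 256);
V(sub2ind([m n 256], Y(:), X(:), layer)) = 1;
perPile = sum(V, 3);               % number of set voxels in each (i, j) pile
[~, zIdx] = max(V, [], 3);         % layer index of the set voxel per pixel
```

Here zIdx - 1 reproduces the original intensities, so no information is lost going from the image to the volume.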
The contour plot that is shown can't be generated with "2D" data. It requires three inputs as follows:
[XGrid,YGrid] = meshgrid(-4:.1:4,-4:.1:4);
C = peaks(XGrid,YGrid);
contourf(XGrid,YGrid,C,'LevelStep',0.1,'LineStyle','none')
colormap('gray')
axis equal
Where XGrid, YGrid and C are all NxN matrices defining the X values, Y values and Z values for every point, respectively.
If you want this to be "3D", simply use surf:
surf(XGrid,YGrid,C)

Matlab animation of several points simultaneously

I am trying to simulate the trajectories of a few particles in 2D on Matlab. I have the x- and y- coordinates of these particles as a function of time, which I store as matrix x and y. The column in both x and y corresponds to the time, while the row corresponds to the particle number: 1, 2, etc.
I know how to do the animation for one particle with pause, but I am not sure how to customize the code for multiple particles' trajectories. Basically, my idea is that on the initial plot, I have 3 markers which correspond to the initial position of the particles, say particle A, B and C. Then, I would like to follow the movement of these 3 markers, and here is where I encountered the problem: I don't know how to sort the subsequent points according to the particle identity. For example, I want to specify the first point I plot in the second time point as particle A, second point as particle B and third point in particle C.
I have done something similar to this, but in my simulation, the number of particles may be 100, which makes it impractical to create x1, x2, ..., x100, y1, y2, ..., y100 for the animation to work:
y = rand(3, 20); % Generate random sample data.
x = rand(size(y, 1), size(y, 2));
% Now we have x and y sample data and we can begin.
% Extract into separate arrays
x1 = sort(x(1,:));
x2 = sort(x(2,:));
x3 = sort(x(3,:));
y1 = y(1,:);
y2 = y(2,:);
y3 = y(3,:);
for k = 1 : length(x1)
plot(x1(1:k), y1(1:k), 'r*-', 'LineWidth', 2);
xlim([min(x(:)), max(x(:))]);
ylim([min(y(:)), max(y(:))]);
grid on;
hold on;
plot(x2(1:k), y2(1:k), 'g*-', 'LineWidth', 2);
plot(x3(1:k), y3(1:k), 'b*-', 'LineWidth', 2);
hold off;
fprintf('Plotted points 1 through %d\n', k);
pause(0.8);
end
Any ideas or suggestions will be greatly appreciated!
In order to plot all graphs at once, we can pass 2D arrays to plot.
Below is an example.
y = rand(3, 20); % Generate random sample data.
x = rand(size(y, 1), size(y, 2));
% Now we have x and y sample data and we can begin.
% Transpose so that each column holds one particle's trajectory
x = sort(x');
y = y';
N = size(x, 2); % number of particles
for k = 1 : size(x, 1) % loop over time steps
if k == 1
zeroPad=zeros(1,N);
x0=[zeroPad;x(1,1:N)];
y0=[zeroPad;y(1,1:N)];
plot(x0(1:2,1:N), y0(1:2,1:N), '*', 'LineWidth', 2);
else
plot(x(1:k,1:N), y(1:k,1:N), '*-', 'LineWidth', 2);
end
xlim([min(x(:)), max(x(:))]);
ylim([min(y(:)), max(y(:))]);
grid on;
fprintf('Plotted points 1 through %d\n', k);
pause(0.8);
end
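One trick was added.
At the first iteration, I prepended a row of zeros to x and y, so that plot receives matrices and draws one series per column (one per particle) instead of a single line through all points. This comes at the cost of an extra marker at the origin in the first frame.
Some unnecessary code was removed.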
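An alternative that avoids both the zero padding and replotting everything in each frame is to keep one line handle per particle and only update its data. This is just a sketch under assumptions: the particle count, step count, and random-walk data are made up, and drawnow is used in place of pause:

```matlab
nParticles = 5; nSteps = 20;                 % hypothetical sizes
x = cumsum(randn(nParticles, nSteps), 2);    % rows = particles, columns = time
y = cumsum(randn(nParticles, nSteps), 2);
figure; hold on; grid on;
xlim([min(x(:)), max(x(:))]);
ylim([min(y(:)), max(y(:))]);
% Duplicate the first time step so plot gets 2-by-nParticles matrices
% and creates one line object per column, i.e. one per particle.
h = plot(x(:, [1 1]).', y(:, [1 1]).', '*-', 'LineWidth', 2);
for k = 2 : nSteps
    for p = 1 : nParticles
        % h(p) always belongs to particle p, so identities never mix
        set(h(p), 'XData', x(p, 1:k), 'YData', y(p, 1:k));
    end
    drawnow;                                 % use pause(0.8) for a slower animation
end
```

Because h(p) is tied to row p of x and y, this scales to 100 particles without creating x1, ..., x100 by hand.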

Why does me changing the time have an opposite effect on my output graph in MATLAB?

Here is my code:
clear all;
%% Load the earthquake data file
load ECE350_Earthquake_Demo.mat
tearth = 0:dt:(length(d)-1)*dt;
t1 = tearth';
%% Play the sound of the earthquake
sound(d, fs)
figure,subplot(3, 1, 1); % 3 subplots in a 3x1 matrix
plot(t1,d) %% plots f(t)
title('First Subplot f(t)')
subplot(3, 1, 2);
plot(t1*2, d) %% plots f(2t)
title('Second Subplot f(2t)')
subplot(3, 1, 3);
plot(t1*(1/2), d) %% plots f(t/2)
title('Third Subplot f(t/2)')
xlim([0 20]);
orient landscape
delete 'Demo_plot1.pdf'
print -dpdf 'Demo_plot1'
This code loads in an earthquake data file and plots the output onto a graph.
I am to plot three different subplots vertically, and plot f(t), f(2t), and f(t/2) respectively.
f(2t) should compress the graph, and f(t/2) should expand the graph, naturally.
My code does the opposite: the plot labeled f(2t) is expanded, and the plot labeled f(t/2) is compressed (t1*2 and t1/2 is how I am implementing this).
The output format is fine, and everything works. These two graphs are just switched.
Why is this?
Here is a clean way to see that f(2t) really does compress functions in MATLAB, just like you think it should:
t = 0:.1:2*pi;
figure
hold on
plot(t, sin(t))
plot(t, sin(2*t))
plot(t, sin(t/2))
legend({'sin(t)', 'sin(2t)', 'sin(t/2)'})
In my example, this works nicely because sin is continuous: it can take any value of t as input. In your case, things are a bit more complicated because d is discrete: there is d(1), d(2), ..., but there is no d(.5).
If you "just want to see the right plots", let's think of d as being samples of a continuous function f, such that d(n) = f(dt * (n-1)) = f(t1(n)) for integers n >= 1. Then for the plots, pick your ts so that you never need values of f between the ones you have:
t2 = t1 * 2;
plot(t2, d)
plots f(t/2) because when we plot the ith point, the t value is t2(i) = t1(i) * 2 and the plotted value is d(i) = f(t1(i)) = f(t2(i) / 2).
t3 = t1 / 2;
plot(t3, d)
plots f(t * 2) because when we plot the ith point, the t value is t3(i) = t1(i) / 2 and the plotted value is d(i) = f(t1(i)) = f(2 * t3(i)).
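If you would rather keep the original time axis and evaluate f at scaled arguments, one option is to interpolate between the samples. This is only a sketch: the sine signal stands in for the earthquake data d, dt is a made-up sample spacing, and linear interpolation via interp1 is my assumption about what "f between samples" should mean:

```matlab
dt = 0.01;                                    % hypothetical sample spacing
t  = (0:dt:2).';
d  = sin(2*pi*t);                             % stand-in for the recorded samples
f  = @(tq) interp1(t, d, tq, 'linear', NaN);  % NaN outside the recorded range
fCompressed = f(2*t);                         % f(2t): only defined while 2t <= 2
fStretched  = f(t/2);                         % f(t/2): defined on the whole axis
```

Plotting fCompressed against t shows the signal finishing in half the time, which matches the usual intuition for f(2t).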

MATLAB - Smooth heat map from (x, y, z) points within a triangle?

I have many 3D scatter points (x, y, z) that are guaranteed to be within a triangle. I now wish to visualize z as one smooth 2D heat map, where positions are given by (x, y).
I can easily do it with meshgrid and mesh, if (x, y) together form a rectangle. Because I don't want anything falling outside of my triangle, I can't use griddata either.
Then how?
MWE
P = [0 1/sqrt(3); 0.5 -0.5/sqrt(3); -0.5 -0.5/sqrt(3)];
% Vertices
scatter(P(:, 1), P(:, 2), 100, 'ro');
hold on;
% Edges
for idx = 1:size(P, 1)-1
plot([P(idx, 1) P(idx+1, 1)], [P(idx, 2) P(idx+1, 2)], 'r');
end
plot([P(end, 1) P(1, 1)], [P(end, 2) P(1, 2)], 'r');
% Sample points within the triangle
N = 1000; % Number of points
t = sqrt(rand(N, 1));
s = rand(N, 1);
sample_pts = (1-t)*P(1, :)+bsxfun(@times, ((1-s)*P(2, :)+s*P(3, :)), t);
% Colors for demo
C = ones(size(sample_pts, 1), 1).*sample_pts(:, 1);
% Scatter sample points
scatter(sample_pts(:, 1), sample_pts(:, 2), [], C, 'filled');
colorbar;
produces
PS
As suggested by Nitish, increasing number of points will do the trick. But is there a more computationally cheap way of doing so?
Triangulate your 2D data points using delaunayTriangulation, evaluate your function with the points of the triangulation and then plot the resulting surface using trisurf:
After %Colors for demo, add this:
P = [P; sample_pts]; %// Add the edgepoints to the sample points, so we get a triangle.
f = @(X,Y) X; %// Defines the function to evaluate
%// Compute the triangulation
dt = delaunayTriangulation(P(:,1),P(:,2));
%// Plot a trisurf
P = dt.Points;
trisurf(dt.ConnectivityList, ...
P(:,1), P(:,2), f(P(:,1),P(:,2)), ...
'EdgeColor', 'none', ...
'FaceColor', 'interp', ...
'FaceLighting', 'phong');
%// A finer colormap gives more beautiful results:
colormap(jet(2^14)); %// Or use 'parula' instead of 'jet'
view(2);
The trick to make this graphic beautiful is to use 'FaceLighting','phong' instead of 'gouraud' and use a denser colormap than is usually used.
The following uses only N = 100 sample points, but a fine colormap (using the now default parula colormap):
In comparison the default output for:
trisurf(dt.ConnectivityList, ...
P(:,1), P(:,2), f(P(:,1),P(:,2)), ...
'EdgeColor', 'none', ...
'FaceColor', 'interp');
looks really ugly: (I'd say mainly because of the odd interpolation, but the jet colormap also has its downsides)
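To see why the triangle's vertices are appended to the sample points before triangulating, here is a small sketch (the seed and N are arbitrary assumptions). With the vertices included, the convex hull of the triangulation is the full triangle; with random samples alone it is slightly smaller, so the surface would not reach the edges:

```matlab
rng(2);                                          % reproducible samples (assumption)
P = [0 1/sqrt(3); 0.5 -0.5/sqrt(3); -0.5 -0.5/sqrt(3)];
N = 200;
t = sqrt(rand(N, 1));
s = rand(N, 1);
pts = (1-t)*P(1, :) + bsxfun(@times, (1-s)*P(2, :) + s*P(3, :), t);
inside = inpolygon(pts(:, 1), pts(:, 2), P(:, 1), P(:, 2));

dtAll = delaunayTriangulation([P; pts]);         % vertices + samples
kAll  = convexHull(dtAll);
areaAll = polyarea(dtAll.Points(kAll, 1), dtAll.Points(kAll, 2));

dtPts = delaunayTriangulation(pts);              % samples only
kPts  = convexHull(dtPts);
areaPts = polyarea(dtPts.Points(kPts, 1), dtPts.Points(kPts, 2));
```

The triangle has side length 1, so areaAll should come out as sqrt(3)/4, while areaPts is a bit smaller.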
Why not just increase N to make the grid "more smooth"? It will obviously be more computationally expensive, but it is probably better than extrapolation. Since this is a simulation where s and t are your inputs, you can alternatively create fine grids for them (depending on how they interact).
P = [0 1/sqrt(3); 0.5 -0.5/sqrt(3); -0.5 -0.5/sqrt(3)];
% Vertices
scatter(P(:, 1), P(:, 2), 100, 'ro');
hold on;
% Edges
for idx = 1:size(P, 1)-1
plot([P(idx, 1) P(idx+1, 1)], [P(idx, 2) P(idx+1, 2)], 'r');
end
plot([P(end, 1) P(1, 1)], [P(end, 2) P(1, 2)], 'r');
% Sample points within the triangle
N = 100000; % Number of points
t = sqrt(rand(N, 1));
s = rand(N, 1);
sample_pts = (1-t)*P(1, :)+bsxfun(@times, ((1-s)*P(2, :)+s*P(3, :)), t);
% Colors for demo
C = ones(size(sample_pts, 1), 1).*sample_pts(:, 1);
% Scatter sample points
scatter(sample_pts(:, 1), sample_pts(:, 2), [], C, 'filled');
colorbar;

PCA generated initial matrix in gaussian and ellipse?

I have to do PCA in Matlab for object recognition.
For now, I generate matrix randomly
[a,InputMatrix] = sort(rand(100,20)); %Rows=100 Columns=20
Average=mean(InputMatrix);
CovarianceMatrix= cov(InputMatrix);
%% Compute the eigenvalues and eigenvectors
[EigVector,EigValue] = eigs(CovarianceMatrix);
NewMatrix=(EigVector)*(EigValue)*(EigVector)';
e1=EigVector(:,1); % Get the all the row at the first column
e2=EigVector(:,2); % Get the all the row at the second column
%% Plotting The Matrix with Eigen Value and Eigen Vector
%creating all combinations of x and y coordinates
[x,y]=meshgrid(1:size(InputMatrix,2),1:size(InputMatrix,1)); % 2= Columns 1= Rows
x=x(:);
y=y(:);
%plotting values of A such that X-Y axis represent the column and row coordinates of A
%respectively. Z-axis represents the value at that coordinate.
scatter3(x,y,InputMatrix(:),30,'rx');
%plotting the mean at the center of the coordinate system
hold on;
scatter3(mean([1:size(InputMatrix,2)]),mean([1:size(InputMatrix,1)]), ...
mean2(InputMatrix),60,'go','filled');
plot(e1,'k--');
plot(e2,'k--');
But if I perform PCA on that random matrix (InputMatrix), the eigenvectors e1 and e2 that I get look wrong when I plot them in the same figure as InputMatrix.
Someone told me that the input data should be normally (Gaussian) distributed and form an ellipse shape when plotted.
I think I have to apply rotation, scaling and other operations to achieve that,
but I don't understand how.
Could someone please help me generate a random matrix that is normally distributed and has an ellipse shape?
Please help me T_T
This can be achieved by multiplying a matrix of hidden components by noise vectors, i.e., by using the underlying ICA model. To get higher dimensionality, just change cnum.
close all; clear all;
cnum = 2;
nnum = 500;
C = rand(cnum, cnum); % hidden components
N1 = sort(rand(cnum, nnum)); % sorted uniform noise
D1 = C * N1; % data
N2 = rand(cnum, nnum); % uniform noise
D2 = C * N2; % data
N3 = randn(cnum, nnum); % Gaussian noise
D3 = C * N3; % data
[V1, R1] = eig(cov(D1')); % PCA of the sorted-uniform data
[V2, R2] = eig(cov(D2')); % PCA of the uniform data
[V3, R3] = eig(cov(D3')); % PCA of the Gaussian data
subplot(1, 3, 1);
axis equal
hold on
plot(D1(1,:), D1(2,:), '.');
line([C(1, 1) 0 C(1, 2)], [C(2, 1) 0 C(2, 2)], 'Color', [1 .0 .0])
line([V1(1, 1) 0 V1(1, 2)], [V1(2, 1) 0 V1(2, 2)], 'Color', [.0 .0 .0])
title('Sorted uniform')
subplot(1, 3, 2);
axis equal
hold on
plot(D2(1,:), D2(2,:), '.');
line([C(1, 1) 0 C(1, 2)], [C(2, 1) 0 C(2, 2)], 'Color', [1 .0 .0])
line([V2(1, 1) 0 V2(1, 2)], [V2(2, 1) 0 V2(2, 2)], 'Color', [.0 .0 .0])
title('Uniform')
subplot(1, 3, 3);
axis equal
hold on
plot(D3(1,:), D3(2,:), '.');
line([C(1, 1) 0 C(1, 2)], [C(2, 1) 0 C(2, 2)], 'Color', [1 .0 .0])
line([V3(1, 1) 0 V3(1, 2)], [V3(2, 1) 0 V3(2, 2)], 'Color', [.0 .0 .0])
title('Gaussian')
print('-dpng', 'pca.png')
Red lines represent hidden components and black lines represent PCA components.
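For the Gaussian case, the ellipse is fully determined by the mixing matrix: data D = C * N with standard normal N has covariance C * C'. A minimal sketch to check this numerically (the particular C, the sample count, and the seed are my assumptions, chosen only for illustration):

```matlab
rng(1);                              % reproducible draw (assumption)
C = [2 0.5; 0.5 1];                  % hypothetical mixing matrix
D = C * randn(2, 5000);              % 2 x 5000 Gaussian cloud, elliptical when plotted
Sigma = C * C.';                     % theoretical covariance of D
[V, R]   = eig(cov(D.'));            % PCA axes estimated from the samples
[Vt, Rt] = eig(Sigma);               % axes of the theoretical ellipse
relErr = norm(cov(D.') - Sigma) / norm(Sigma);   % small for large sample counts
alignment = abs(V(:, 2).' * Vt(:, 2));           % close to 1: same major axis
```

So to get an ellipse with a chosen orientation and elongation, pick C (for example a rotation times a diagonal scaling) and multiply it by randn noise.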