I have to do PCA in Matlab for object recognition.
For now, I generate the matrix randomly:
[a,InputMatrix] = sort(rand(100,20)); %Rows=100 Columns=20 (note: the second output of sort is the index matrix, not the sorted values)
Average=mean(InputMatrix);
CovarianceMatrix= cov(InputMatrix);
%% Compute the eigenvalues and eigenvectors
[EigVector,EigValue] = eigs(CovarianceMatrix);
NewMatrix=(EigVector)*(EigValue)*(EigVector)';
e1=EigVector(:,1); % all rows of the first column (first eigenvector)
e2=EigVector(:,2); % all rows of the second column (second eigenvector)
%% Plotting The Matrix with Eigen Value and Eigen Vector
%creating all combinations of x and y coordinates
[x,y]=meshgrid(1:size(InputMatrix,2),1:size(InputMatrix,1)); % 2= Columns 1= Rows
x=x(:);
y=y(:);
% Plot the values of InputMatrix so that the x- and y-axes represent its column and row indices,
% respectively. The z-axis represents the value at that coordinate.
scatter3(x,y,InputMatrix(:),30,'rx');
%plotting the mean at the center of the coordinate system
hold on;
scatter3(mean(1:size(InputMatrix,2)),mean(1:size(InputMatrix,1)), ...
    mean2(InputMatrix),60,'go','filled');
plot(e1,'k--');
plot(e2,'k--');
But if I perform PCA on that random matrix (InputMatrix), the eigenvectors e1 and e2 that I get look wrong when I plot them together with InputMatrix in the same figure.
Someone told me that the input data should satisfy a condition: it should be normally (Gaussian) distributed and form an ellipse when I plot it.
I think I have to do rotation, scaling and other things to achieve that,
but I don't understand how.
Could someone please help me generate a random matrix that is normally distributed and ellipse-shaped?
Please help me!
This can be achieved by multiplying a matrix of hidden components by noise vectors, i.e., by using the underlying ICA model. To get higher dimensionality, just change cnum.
close all; clear all;
cnum = 2;
nnum = 500;
C = rand(cnum, cnum); % hidden components
N1 = sort(rand(cnum, nnum)); % sorted uniform noise
D1 = C * N1; % data
N2 = rand(cnum, nnum); % uniform noise
D2 = C * N2; % data
N3 = randn(cnum, nnum); % Gaussian noise
D3 = C * N3; % data
[V1, R1] = eig(cov(D1'));
[V2, R2] = eig(cov(D2'));
[V3, R3] = eig(cov(D3'));
subplot(1, 3, 1);
axis equal
hold on
plot(D1(1,:), D1(2,:), '.');
line([C(1, 1) 0 C(1, 2)], [C(2, 1) 0 C(2, 2)], 'Color', [1 .0 .0])
line([V1(1, 1) 0 V1(1, 2)], [V1(2, 1) 0 V1(2, 2)], 'Color', [.0 .0 .0])
title('Sorted uniform')
subplot(1, 3, 2);
axis equal
hold on
plot(D2(1,:), D2(2,:), '.');
line([C(1, 1) 0 C(1, 2)], [C(2, 1) 0 C(2, 2)], 'Color', [1 .0 .0])
line([V2(1, 1) 0 V2(1, 2)], [V2(2, 1) 0 V2(2, 2)], 'Color', [.0 .0 .0])
title('Uniform')
subplot(1, 3, 3);
axis equal
hold on
plot(D3(1,:), D3(2,:), '.');
line([C(1, 1) 0 C(1, 2)], [C(2, 1) 0 C(2, 2)], 'Color', [1 .0 .0])
line([V3(1, 1) 0 V3(1, 2)], [V3(2, 1) 0 V3(2, 2)], 'Color', [.0 .0 .0])
title('Gaussian')
print('-dpng', 'pca.png')
Red lines represent hidden components and black lines represent PCA components.
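The rotation-and-scaling idea from the question can also be written out directly. This is a minimal sketch, not the answer's ICA-style mixing: draw standard normal samples, stretch them with a diagonal scaling matrix, rotate them, and check that the leading eigenvector of the sample covariance lines up with the rotated major axis. For Gaussian samples, cov(D') approaches R*diag(sigmas.^2)*R', so its eigenvectors should approach the columns of R. The sample size, standard deviations and angle are arbitrary illustrative values.

```matlab
% Sketch: elliptical 2-D Gaussian cloud from scaled and rotated standard normal samples.
rng(0);                                    % make the random draw reproducible
n      = 1000;                             % number of samples (illustrative choice)
sigmas = [3; 1];                           % std. deviations along the ellipse axes (assumed)
theta  = pi/6;                             % rotation angle in radians (assumed)
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];   % 2-D rotation matrix
Z = randn(2, n);                           % isotropic Gaussian noise, one sample per column
D = R * diag(sigmas) * Z;                  % scale each axis, then rotate: an elliptical cloud
[V, L] = eig(cov(D'));                     % cov expects observations in rows
[~, order] = sort(diag(L), 'descend');     % eig returns eigenvalues in ascending order
V = V(:, order);
% |cosine| between the leading eigenvector and the rotated major axis (the sign is arbitrary)
alignment = abs(V(:, 1)' * R(:, 1));
figure; hold on; axis equal
plot(D(1, :), D(2, :), '.');
plot([0 3*V(1, 1)], [0 3*V(2, 1)], 'r-', 'LineWidth', 2);   % major axis, scaled for visibility
plot([0 V(1, 2)], [0 V(2, 2)], 'k-', 'LineWidth', 2);       % minor axis
```

An alignment close to 1 means PCA recovered the direction of the ellipse.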
I want to analyze the Ovarian Cancer Data provided by MATLAB with PCA. Specifically, I want to visualize the two largest principal components and draw the two corresponding left singular vectors. As I understand it, those vectors should be able to serve as a new coordinate system aligned with the directions of largest variance in the data. Ultimately, I want to examine whether the cancer patients are distinguishable from the non-cancer patients.
Something that is still wrong in my script is the left singular vectors: they are not at a 90-degree angle to each other, and if I scale them by the respective eigenvalues, they explode in length. What am I doing wrong?
%% PCA - Ovarian Cancer Data
close all;
clear all;
% obs is an NxM matrix, where ...
% N = patients (216)
% M = features - genes in this case (4000)
load ovariancancer.mat;
% Transpose obs so that the rows represent the features
X = obs.';
[U, S, V] = svd(X, 'econ');
% Crop U, S and V, to visualize two largest principal components
U_crop = U(:, 1:2);
S_crop = S(1:2, 1:2);
V_crop = V(:, 1:2);
X_crop = U_crop * S_crop * V_crop.';
% Average over all patients (columns)
xC = mean(X_crop, 2);
% Visualize two largest principal components as a data cloud
figure;
hold on;
for i = 1 : size(X, 2)
if strcmp(grp{i}, 'Cancer')
plot(X_crop(1, i), X_crop(2, i), 'rx', 'LineWidth', 2);
else
plot(X_crop(1, i), X_crop(2, i), 'bo', 'LineWidth', 2);
end
end
%scatter(X_crop(1, :), X_crop(2, :), 'k.', 'LineWidth', 2)
set(gca,'DataAspectRatio',[1 1 1])
xlabel('PC1')
ylabel('PC2')
grid on;
Xstd = U_crop; % * S_crop?
quiver([xC(1) xC(1)], [xC(2) xC(2)], Xstd(1, :), Xstd(2, :), 'green', 'LineWidth', 3);
So there were multiple mistakes in my script. In case anyone is interested, I am posting the corrected code (I am plotting three PCs now). This post was very helpful.
% obs is an NxM matrix, where ...
% N = patients (216)
% M = features - genes in this case (4000)
load ovariancancer.mat;
% Let the data matrix X be of n×p size, where n is the number of samples and p is the number of variables
X = obs;
% Center the data: subtract each column's mean so that every column has zero mean
X = bsxfun(@minus, X, mean(X, 1));
[U, S, V] = svd(X, 'econ');
PC = U * S;
% Visualize three largest principal components as a data cloud
% The j-th principal component is given by the j-th column of X*V (= U*S). The coordinates of the i-th data point in the new PC space are given by the i-th row of X*V
figure;
for i = 1 : size(PC, 1) % one row of PC per patient
if strcmp(grp{i}, 'Cancer')
plot3(PC(i, 1), PC(i, 2), PC(i, 3), 'rx', 'LineWidth', 2);
else
plot3(PC(i, 1), PC(i, 2), PC(i, 3), 'bo', 'LineWidth', 2);
end
hold on;
end
set(gca,'DataAspectRatio',[1 1 1])
xlabel('PC1')
ylabel('PC2')
zlabel('PC3')
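The corrected script relies on three facts: the columns must be centered, the PC scores are U*S (equivalently X*V), and the principal directions are the columns of V, which are orthonormal. A minimal sketch that checks these facts on synthetic data (random numbers stand in for the ovarian data set, and the sizes are arbitrary):

```matlab
% Sketch on synthetic data (random numbers stand in for the ovarian data set).
rng(1);
n = 50;                                   % samples (rows), illustrative
p = 8;                                    % variables (columns), illustrative
X  = randn(n, p) * randn(p, p);           % correlated synthetic data, rows = samples
Xc = bsxfun(@minus, X, mean(X, 1));       % subtract each column's mean
[U, S, V] = svd(Xc, 'econ');
scores  = U * S;                          % PC scores: row i = sample i in PC coordinates
scores2 = Xc * V;                         % the same scores via the principal directions
pcVar   = diag(S).^2 / (n - 1);           % variance captured by each PC
covEig  = sort(eig(cov(X)), 'descend');   % eigenvalues of the sample covariance
orthErr = norm(V' * V - eye(p));          % principal directions are orthonormal
```

So the variance along each PC is the squared singular value divided by n - 1, not the singular value itself.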
I have to plot a histogram of each column of matrixE1. How can I do this? This is what I have written so far:
% Create a random 5 x 3 matrix of integers between 0 and 10
matrixA = randi([0 10], 5, 3)
% Create identity matrix 3 x 3
matrixB = eye(3,3)
% Create new submatrix of A with the last 3 rows
matrixC = matrixA(end-2 : end, :)
% Element-wise multiplication of C and B
matrixD = times(matrixC, matrixB)
% Concatenate Matrix A and D
matrixE1 = [matrixA ; matrixD]
% Plot histogram of columns.
matrixColumn1 = matrixE1(1 : end , end-2: end-2);
matrixFColumn2 = matrixE1(1 : end, end -1 : end-1);
matrixFColumn3 = matrixE1(1 : end, end : end);
You can access each of the columns in matrixE1 like this:
firstCol = matrixE1(:,1);
secondCol = matrixE1(:,2);
thirdCol = matrixE1(:,3);
...and then simply use the command hist() to plot the histograms. You would plot the histogram of the first column in matrixE1 as:
hist(firstCol);
And if I understand your second question correctly:
"What would I do? hist(??). How can I get one histogram of all the columns of matrixE1? Should I do hist(matrixE1)?"
You can simply use the command hold on after plotting the histogram of one column, and then plot another histogram on the same axes. For example, to plot the histograms of the first and second columns of matrixE1 in the same plot, you would type:
hist(firstCol);
hold on;
hist(secondCol);
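Overlaid bars can be hard to tell apart. If a separate panel per column is easier to read, here is a minimal sketch using subplot and histogram (the function MATLAB recommends over hist in newer releases); the random matrix only stands in for matrixE1:

```matlab
% Stand-in for matrixE1 from the question (assumed: any numeric matrix, one variable per column)
matrixE1 = randi([0 10], 8, 3);
numCols = size(matrixE1, 2);
figure;
for k = 1:numCols
    subplot(1, numCols, k);               % one panel per column
    histogram(matrixE1(:, k));            % histogram of column k only
    title(sprintf('Column %d', k));
end
```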
Alternatively, pass a matrix to hist directly; it draws one set of bars per column:
>> v1=randn(1000,1); % zero mean, unit stdev
>> v2=randn(1000,1)+1; % mean at one, unit stdev
>> V=[v1 v2]; % 1000 x 2 matrix
>> hist(V,100); % 100 bins
>> legend('v1', 'v2');
There is another, simpler but computationally more expensive way:
plotmatrix(A)
For an m-by-n matrix A, this produces an n-by-n grid of scatter plots of all pairwise combinations of the columns of A (avoid this for matrices with more columns than you could fit on your screen).
As a bonus, the main diagonal of the plotmatrix shows a histogram of each column.
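A small sketch that also captures the output handles of plotmatrix; the 500-by-3 data matrix is made up for illustration, so the grid is 3-by-3 and there are three diagonal histograms:

```matlab
% Illustrative data: 500 samples of 3 correlated variables (assumed)
rng(2);
A = randn(500, 3) * [1 0.5 0; 0 1 0.3; 0 0 1];
figure;
[S, AX, BigAx, H, HAx] = plotmatrix(A);   % AX: 3-by-3 axes grid, H: diagonal histograms
```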
I am adding this answer because the other answers (1, 2) use the outdated function hist.
MATLAB recommends avoiding the use of hist and now favors histogram (source). The changeover is straightforward.
% MATLAB R2019a
% Sample Data
NumPoints = 2000;
a1 = 10*rand(NumPoints,1);
a2 = wblrnd(3,7,NumPoints,1);
a3 = 7 + 0.75*randn(NumPoints,1);
A = [a1 a2 a3]; % Data Matrix
% Implement Sturges Rule for n<=500, Scott's Rule for n>500
nbinsh = @(n) ceil((1 + 3.3*log10(n))*(n<=500) + ((5/3)*(n^(1/3)))*(n>500));
NumBins = nbinsh(NumPoints);
numCols = size(A,2);
% Plot
figure, hold on
for k = 1:numCols
histogram(A(:,k),'NumBins',NumBins,'DisplayName',['Col ' num2str(k)]);
end
legend('show')
You can switch from frequency (counts) to probability or a probability density function, depending on the needs of the application, with the Normalization property (see documentation here).
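As a quick illustration of the Normalization property, here is a sketch on made-up data similar to column a3 above: with 'pdf' the bar areas sum to 1, and with 'probability' the bar heights sum to 1.

```matlab
% Illustrative data similar to column a3 above (assumed)
rng(3);
x = 7 + 0.75*randn(2000, 1);
figure;
subplot(1, 2, 1);
hPdf = histogram(x, 'Normalization', 'pdf');           % height = count / (N * bin width)
pdfArea = sum(hPdf.Values .* diff(hPdf.BinEdges));     % total bar area
subplot(1, 2, 2);
hProb = histogram(x, 'Normalization', 'probability');  % height = count / N
probTotal = sum(hProb.Values);                         % total bar height
```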
I want to plot vertical lines whose color depends on the value of an integer vector. The values are integers that range from 0 to 4.
Currently, I am using a loop to go through the tables to plot the lines. This works, but for LARGE amounts of data it takes time, and I'm wondering if it can be vectorized.
Attached is a stripped-down version of the script: it simply loops through a data vector (sample) and plots a vertical line whose color depends on the value of each integer.
I have also attached the simple variable I created, called sample, below so you can paste it into your workspace.
for i=1:size(sample,1)
if sample(i)==1
line( [i i] ,[0 10], 'Marker','.','LineStyle','-','Color','r');
elseif sample(i)==2
line( [i i] ,[0 10], 'Marker','.','LineStyle','-','Color','b');
elseif sample(i)==3
line( [i i] ,[0 10], 'Marker','.','LineStyle','-','Color',[1 .5 0]);
elseif sample(i)==4
line( [i i] ,[0 10], 'Marker','.','LineStyle','-','Color','g');
end
end
Variable:
sample=[[3;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;4;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;3;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;4;0;0;0;0]];
But is it possible to 'vectorize' plotting in this way without having to do it iteratively in a loop as I have done?
Take advantage of the fact that when plotting a line, MATLAB leaves a gap wherever a point's coordinate is NaN.
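To see the NaN behavior on its own before applying it to sample, here is a hypothetical two-segment example: a single line object whose data contain a NaN pair is drawn as two separate vertical segments, so one object can hold many lines.

```matlab
% Two vertical segments stored in a single line object: the NaN pair breaks the line.
x = [1 1 NaN 3 3];
y = [0 10 NaN 0 10];
figure;
h = line(x, y, 'Color', 'r');
xlim([0 4]);
```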
% Your vector
sample=[3;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;4;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;3;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;4;0;0;0;0];
% Your colors
colors = [
1 0 0
0 0 1
1 .5 0
0 1 0];
for idx = 1:4
% Find the index of each of your integers
X = find(sample == (idx));
% Force X to be a row vector
X = X(:)';
% Stack two X's on top of one another with a third row filled
% with NaNs. Fill in your Y values in the same way while
% you're at it.
Y = [zeros(size(X)); 10 + zeros(size(X)); nan(size(X))];
X = [X; X; nan(size(X))]; %#ok<AGROW>
% Matlab is column major. By using the colon here, you
% produce a vector that is [X1 X1 nan X2 X2 nan ... etc.]
X = X(:);
Y = Y(:);
% Draw the line
line(X, Y, 'Marker', '.', 'LineStyle', '-', 'Color', colors(idx, :))
end
There's still a loop, but now you're just looping over the possible values instead of over each value in the vector. I think you will find that this scales much better.
Changing the input to:
sample = zeros(1, 1e6);
for idx = 1:4
sample(randi(1e6, 1, 1000)) = idx;
end
and benchmarking with timeit gives a time of 0.0065706 seconds on my machine, while the OP's code benchmarks at 1.4861 seconds.
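Another loop-free variant, sketched here as an alternative rather than a recommendation: pass 2-by-k matrices to line, which draws one line object per column, and then assign all colors in a single set call. Unlike the NaN approach, it still creates one graphics object per vertical line, so for very large inputs it is likely to scale worse. The short sample vector below is a stand-in for the question's data.

```matlab
% Short stand-in for the question's vector (assumed values)
sample = [3; 0; 0; 4; 0; 1; 0; 2; 0; 0; 3];
colors = [1 0 0; 0 0 1; 1 .5 0; 0 1 0];        % row k = color for value k (as in the question)
pos = find(sample ~= 0)';                       % positions of the nonzero entries (row vector)
X = [pos; pos];                                 % 2-by-k: column j holds the x data of line j
Y = repmat([0; 10], 1, numel(pos));             % every line runs from y = 0 to y = 10
figure;
h = line(X, Y, 'Marker', '.', 'LineStyle', '-');          % one line object per column
set(h, {'Color'}, num2cell(colors(sample(pos), :), 2));   % give each line its own color
```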
I'd change to something like:
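colors = [1 0 0; 0 0 1; 1 0.5 0; 0 1 0];  % red, blue, orange, green: row k is the color for value k
nz = find(sample ~= 0);                   % positions of the nonzero entries
for ii = 1:numel(nz)
    % draw at the original position nz(ii), colored by the value stored there
    line([nz(ii) nz(ii)], [0 10], 'Marker','.','LineStyle','-', ...
        'Color', colors(sample(nz(ii)), :));
end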
I have many 3D scatter points (x, y, z) that are guaranteed to be within a triangle. I now wish to visualize z as one smooth 2D heat map, where positions are given by (x, y).
I can easily do it with meshgrid and mesh if (x, y) together form a rectangle. Because I don't want anything falling outside of my triangle, I can't use griddata either.
Then how?
MWE
P = [0 1/sqrt(3); 0.5 -0.5/sqrt(3); -0.5 -0.5/sqrt(3)];
% Vertices
scatter(P(:, 1), P(:, 2), 100, 'ro');
hold on;
% Edges
for idx = 1:size(P, 1)-1
plot([P(idx, 1) P(idx+1, 1)], [P(idx, 2) P(idx+1, 2)], 'r');
end
plot([P(end, 1) P(1, 1)], [P(end, 2) P(1, 2)], 'r');
% Sample points within the triangle
N = 1000; % Number of points
t = sqrt(rand(N, 1));
s = rand(N, 1);
sample_pts = (1-t)*P(1, :)+bsxfun(@times, ((1-s)*P(2, :)+s*P(3, :)), t);
% Colors for demo
C = ones(size(sample_pts, 1), 1).*sample_pts(:, 1);
% Scatter sample points
scatter(sample_pts(:, 1), sample_pts(:, 2), [], C, 'filled');
colorbar;
This produces a scatter plot of the colored sample points inside the triangle (figure not reproduced here).
PS
As suggested by Nitish, increasing the number of points does the trick. But is there a computationally cheaper way of doing this?
Triangulate your 2D data points using delaunayTriangulation, evaluate your function at the points of the triangulation, and then plot the resulting surface using trisurf.
After the % Colors for demo line, add this:
P = [P; sample_pts]; %// Add the edgepoints to the sample points, so we get a triangle.
f = @(X,Y) X; %// Defines the function to evaluate
%// Compute the triangulation
dt = delaunayTriangulation(P(:,1),P(:,2));
%// Plot a trisurf
P = dt.Points;
trisurf(dt.ConnectivityList, ...
P(:,1), P(:,2), f(P(:,1),P(:,2)), ...
'EdgeColor', 'none', ...
'FaceColor', 'interp', ...
'FaceLighting', 'phong');
%// A finer colormap gives more beautiful results:
colormap(jet(2^14)); %// Or use 'parula' instead of 'jet'
view(2);
The trick to make this graphic beautiful is to use 'FaceLighting','phong' instead of 'gouraud' and use a denser colormap than is usually used.
The following figure (not reproduced here) uses only N = 100 sample points, but a fine colormap (the now-default parula colormap).
In comparison, the default output for:
trisurf(dt.ConnectivityList, ...
P(:,1), P(:,2), f(P(:,1),P(:,2)), ...
'EdgeColor', 'none', ...
'FaceColor', 'interp');
looks really ugly (figure not reproduced here); I'd say mainly because of the odd interpolation, though the jet colormap also has its downsides.
Why not just increase N to make the result smoother? It will obviously be more computationally expensive, but it is probably better than extrapolation. Since this is a simulation where s and t are your inputs, you could alternatively create fine grids for them (depending on how they interact).
P = [0 1/sqrt(3); 0.5 -0.5/sqrt(3); -0.5 -0.5/sqrt(3)];
% Vertices
scatter(P(:, 1), P(:, 2), 100, 'ro');
hold on;
% Edges
for idx = 1:size(P, 1)-1
plot([P(idx, 1) P(idx+1, 1)], [P(idx, 2) P(idx+1, 2)], 'r');
end
plot([P(end, 1) P(1, 1)], [P(end, 2) P(1, 2)], 'r');
% Sample points within the triangle
N = 100000; % Number of points
t = sqrt(rand(N, 1));
s = rand(N, 1);
sample_pts = (1-t)*P(1, :)+bsxfun(@times, ((1-s)*P(2, :)+s*P(3, :)), t);
% Colors for demo
C = ones(size(sample_pts, 1), 1).*sample_pts(:, 1);
% Scatter sample points
scatter(sample_pts(:, 1), sample_pts(:, 2), [], C, 'filled');
colorbar;
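The question rules out griddata because values outside the triangle are unwanted. A hedged alternative, different from both answers: interpolate onto a rectangular grid with griddata and then mask the grid with inpolygon, so that cells outside the triangle become NaN and are drawn transparent. Linear griddata already returns NaN outside the convex hull of the data; the mask just makes the triangle boundary explicit. The 300-by-300 grid resolution is an arbitrary choice.

```matlab
% Triangle and sample points as in the MWE
P = [0 1/sqrt(3); 0.5 -0.5/sqrt(3); -0.5 -0.5/sqrt(3)];
rng(4);
N = 1000;
t = sqrt(rand(N, 1));
s = rand(N, 1);
pts = (1-t)*P(1, :) + bsxfun(@times, (1-s)*P(2, :) + s*P(3, :), t);
pts = [P; pts];                               % include the vertices so the hull is the whole triangle
z = pts(:, 1);                                % demo values, as in the MWE
% Regular grid over the triangle's bounding box
[xq, yq] = meshgrid(linspace(-0.5, 0.5, 300), linspace(-0.5/sqrt(3), 1/sqrt(3), 300));
zq = griddata(pts(:, 1), pts(:, 2), z, xq, yq, 'linear');   % NaN outside the convex hull
inside = inpolygon(xq, yq, P(:, 1), P(:, 2));
zq(~inside) = NaN;                            % blank out everything outside the triangle
figure;
imagesc(xq(1, :), yq(:, 1), zq, 'AlphaData', double(~isnan(zq)));   % NaN cells transparent
axis xy; axis equal; axis tight; colorbar;
```

Because the demo function is linear in x, linear interpolation reproduces it exactly inside the triangle, which makes it easy to sanity-check the result.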
I have a set of correspondences between points in two different images, and I want to plot them as lines in an image obtained concatenating the two original images, in order to show those correspondences.
I have done the following:
function plotInliers(im1, im2, locs1, locs2, corr, inliers)
l1 = locs1(:, 1:2);
l1 = l1(corr(:, 1), :);
l2 = locs2(:, 1:2);
l2 = l2(corr(:, 2), :);
l2 = l2 + repmat([0 size(im1, 2)], size(l2, 1), 1);
im = horzcat(im1, im2);
figure
imshow(im)
hold on
% plot the correspondences: green inliers, red outliers
for ii = 1:size(corr, 1)
hold on
% Check if it is an inlier
if any(ii==inliers), color = 'g'; else color = 'r'; end
plot([l1(ii, 1) l1(11, 2)], [l2(ii, 1) l2(ii, 2)], ...
'Color', color, 'LineWidth', 1)
end
hold off
end
im1/im2 are the two images, locs1/locs2 are the significant points (keypoints) in each image, and corr is the array of correspondences between their indices.
However, the result is completely wrong, in the sense that the indices seem to be wrong. Both images have size [388 517 3].
I also tried to plot a single line on the image
line([1 1], [300 800])
but, again, the result is wrong: the line does not start on the first pixel of the first image and does not end up in the second one. Instead, the line starts at (more or less) pixel [1 300] and goes straight down.
Thanks for the help
You have mixed up your coordinates. The syntax is line([x1 x2 x3 ...], [y1 y2 y3 ...]), so when you write line([1 1], [300 800]) you are drawing a line from (1,300) to (1,800) (just as you later say).
What you seem to want to plot in this case is line([1 300], [1 800]).
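Here is the same x-first, y-second rule applied to the correspondence plot, as a minimal sketch with hypothetical images and points. Each segment takes its x coordinates from the two points' x values (the second shifted right by the width of im1) and its y coordinates from their y values. Whether locs1/locs2 store (x, y) or (row, column) depends on the feature detector, so the column order here is an assumption; swap the columns if yours are (row, column).

```matlab
% Hypothetical stand-ins: two 388-by-517 RGB images and three matched points.
% Assumption: column 1 of each point list is x (image column), column 2 is y (image row).
im1 = zeros(388, 517, 3, 'uint8');
im2 = zeros(388, 517, 3, 'uint8');
pts1 = [50 60; 200 300; 400 100];             % (x, y) in im1
pts2 = [70 80; 210 290; 380 120];             % (x, y) in im2
isInlier = logical([1 0 1]);
offset = size(im1, 2);                        % im2 starts right after im1's last column
figure;
image([im1 im2]); axis image; hold on         % imshow([im1 im2]) would work the same way
hL = gobjects(size(pts1, 1), 1);
for k = 1:size(pts1, 1)
    if isInlier(k), c = 'g'; else, c = 'r'; end
    % line([x1 x2], [y1 y2]): all x coordinates first, then all y coordinates
    hL(k) = line([pts1(k, 1), pts2(k, 1) + offset], [pts1(k, 2), pts2(k, 2)], ...
                 'Color', c, 'LineWidth', 1);
end
hold off
```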