Configuring biplot in Matlab to distinguish in scatter - matlab

My original data is a 195x22 record set containing vocal measurements of people with and without Parkinson's disease. A 195x1 vector holds the status of each person, either 1 or 0.
Now, I have performed a PCA and made a biplot, which turns out well. The problem is that I can't tell which dots in the scatter plot come from a sick person and which from a healthy one (I can't link them to status). I would like the scatter plot to show a red dot if healthy (status=0) and a green dot if sick (status=1).
How would I do that? My biplot code is:
biplot(coeff(:,1:2), ...
'Scores', score(:,1:2), ...
'VarLabels', Labels, ...
'markersize', 15 ...
);
title('Bi-Plot: Standardized Data');
xlabel('PCA1');
ylabel('PCA2');
UPDATE (Solution):
Solution is inspired by @Magla and code can be seen here: http://pastebin.com/KHUj3DnA
With this beautiful graph:

The principal component scores (red points) in a biplot are not the ones returned by the pca function. As the help states,
biplot scales the scores so that they fit on the plot: It divides each score by the maximum absolute value of all scores, and multiplies
by the maximum coefficient length of coefs. Then biplot changes the
sign of score coordinates according to the sign convention for the
coefs.
You therefore can't easily use the (X,Y) information to find out which point belongs to which category.
Here is a workaround using the ObsLabels option of biplot. ObsLabels assigns some user-defined data to each observation: for each point, we will assign the index corresponding to a status variable (a simple incrementing value). With this, you can easily modify the red points of a biplot - here marker set to square and red/green color.
The following figure
is produced by this code
%some data
load carsmall
x = [Acceleration Displacement Horsepower MPG Weight]; x = x(all(~isnan(x),2),:);
[coefs,score] = pca(zscore(x));
%the status vector (here zero or one)
class_pt = round(rand(size(score,1),1));
vbls = {'Accel','Disp','HP','MPG','Wgt'};
figure('Color', 'w');
hbi = biplot(coefs(:,1:2),'scores',score(:,1:2),'varlabels',vbls,...
'ObsLabels',num2str((1:size(score,1))'));
for ii = 1:length(hbi)
userdata = get(hbi(ii), 'UserData');
if ~isempty(userdata)
if class_pt(userdata) == 0
set(hbi(ii), 'Color', 'g', 'Marker', 's');
elseif class_pt(userdata) == 1
set(hbi(ii), 'Color', 'r', 'Marker', 's');
end
end
end
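If you would rather not go through the biplot handles, the scaling described in the quoted help text can be reproduced by hand and the scaled scores plotted per group. This is only a sketch: coefs, score and status below are placeholder data, and the sign step assumes the convention is "make the largest-magnitude coefficient in each column positive".

```matlab
% Sketch: reproduce biplot's score scaling by hand, then color by status.
% coefs, score and status are placeholder data, not the Parkinson's records.
rng(1);
coefs  = orth(randn(5, 2));      % 5 variables, 2 components
score  = randn(40, 2);           % 40 observations
status = rand(40, 1) > 0.5;      % true = sick, false = healthy

% Assumed sign convention: largest-magnitude coefficient per column is positive.
[p, d]      = size(coefs);
[~, maxind] = max(abs(coefs), [], 1);
colsign     = sign(coefs(maxind + (0:p:(d-1)*p)));

% Divide by the largest absolute score, multiply by the longest coefficient vector.
maxCoefLen = sqrt(max(sum(coefs.^2, 2)));
scaled     = (score ./ max(abs(score(:)))) .* maxCoefLen;
scaled     = scaled .* repmat(colsign, size(score, 1), 1);

figure; hold on
plot(scaled(~status, 1), scaled(~status, 2), 'r.', 'MarkerSize', 15);  % healthy
plot(scaled( status, 1), scaled( status, 2), 'g.', 'MarkerSize', 15);  % sick
legend('healthy (status = 0)', 'sick (status = 1)');
xlabel('PCA1'); ylabel('PCA2');
hold off
```

The variable vectors are not drawn here; they could be added with line() from the origin to each row of coefs .* repmat(colsign, p, 1).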

Related

Scatter with line segments

I really like scatter()'s ability to automatically color points based on some vector of values, I just want to add colored lines between the points.
The plot in question has time on x-axis, monte-carlo number on y-axis, and then some measured value as the color vector (e.g. number of cars seen in a video frame).
Basically, each point is an update in the system. So calling scatter(time,monte_carlo_number,[],color_vec) plots the points at which there is an update in the system, with color representing some value. This is great, but I would like to add line segments that connect these points, each segment matching the color specified by color_vec.
Basic working example
% Create example data
data = table();
data.time = randsample(1:100, 1000, true)';
data.mc = randsample(1:50, 1000, true)'; % actual monte-carlo run number labels are sorted
data.color_value = randsample(1:10, 1000, true)';
% Create the scatter plot
scatter(data.time, data.mc, [] , data.color_value, 'filled')
colorbar('Ticks', unique(data.color_value))
% Always label your axes
xlabel('Time (s)')
ylabel('Monte-Carlo Run Number')
Below is a screen-shot of what this code might produce. If color_value is the number of cars seen in a video frame, we can see each time this value is updated via the points. However, it is easier for humans to read this plot if there were lines connecting each point to the next with the correct color. This demonstrates to the viewer that this value continues on in time until the next update.
Something like this? I changed the number of samples to 100, and it is already quite a mess, so I don't think this is going to help the viewer understand what's plotted.
% Create example data
data = table();
np = 100;
data.time = randsample(1:100, np, true)';
data.mc = randsample(1:50, np, true)'; % actual monte-carlo run number labels are sorted
data.color_value = randsample(1:10, np, true)';
vals = unique(data.color_value).';
cmap = parula(numel(vals));
colors = [];
for k = 1:numel(vals)
ind = find(data.color_value == vals(k));
data_sel{k} = sortrows(data(ind,:));
colors(k,:) = cmap(k,:);
end
figure(1); clf;
% Create the scatter plot
scatter(data.time, data.mc, [] , data.color_value, 'filled')
hold on
for k = 1:numel(vals)
plot(data_sel{k}.time, data_sel{k}.mc, 'Color',colors(k,:))
end
colorbar('Ticks', unique(data.color_value))
% Always label your axes
xlabel('Time (s)')
ylabel('Monte-Carlo Run Number')
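Another reading of the question is that each value holds within its own run until the next update. Under that assumption, a sketch could sort the updates of each run by time and draw a horizontal segment from every update to the next one, colored by the earlier update's value. The data below are made up, and the small number of runs is only there to keep the picture readable.

```matlab
% Sketch: connect consecutive updates within each run, colored by the
% value of the earlier update. All data here are hypothetical.
rng(0);
np   = 60;
time = randi(100, np, 1);
mc   = randi(5, np, 1);          % run number
cval = randi(10, np, 1);         % value shown as color, 1..10

cmap = parula(10);               % one color per possible value

figure; hold on
runs = unique(mc);
nSeg = 0;
for r = runs.'
    idx        = find(mc == r);
    [t, order] = sort(time(idx));
    c          = cval(idx(order));
    for k = 1:numel(t) - 1
        % The value of update k holds until update k+1 of the same run.
        line([t(k) t(k+1)], [r r], 'Color', cmap(c(k), :), 'LineWidth', 1.5);
        nSeg = nSeg + 1;
    end
end
scatter(time, mc, [], cval, 'filled');
colormap(cmap);
caxis([0.5 10.5]);               % value v maps to row v of cmap
colorbar
xlabel('Time (s)')
ylabel('Monte-Carlo Run Number')
hold off
```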

Plot a cell into a time-changing curve

I have a cell array like this: Data={[2,3],[5,6],[1,4],[6,7]...}
The numbers in each pair of square brackets are the x and y coordinates of a point. A new coordinate is added to the cell array in every loop of my algorithm.
I want to plot these points as a time-changing curve, which will show me the trajectory of the point.
As a MATLAB beginner, I have no idea how to do this. Thanks for your help.
Here is some sample code to get you started. It uses some basic Matlab functionality that you will hopefully find useful as you continue using it. I added some data points to your cell array for illustrative purposes.
The syntax to access elements of the cell array might seem weird but is important. See the MATLAB documentation on cell array indexing for details.
In order to give nice colors to the points, I generated an array based on the jet colormap built into Matlab. Basically, issuing the command
Colors = jet(N)
creates an N x 3 matrix in which every row is a 3-element color ranging from blue to red. That way you can see which points were detected before others (i.e. blue before red). Of course you can change that to anything you want (see the colormap documentation if you're interested).
So here is the code. If something is unclear please ask for clarifications.
clear
clc
%// Get data
Data = {[2,3],[5,6],[1,4],[6,7],[8,1],[5,2],[7,7]};
%// Set up a matrix to color the points. Here I used a jet colormap
%// available from MATLAB but that could be anything.
Colors = jet(numel(Data));
figure;
%// Use "hold all" to prevent the content of the figure from being overwritten
%// at every iteration.
hold all
for k = 1:numel(Data)
%// Note the syntax used to access the content of the cell array.
scatter(Data{k}(1),Data{k}(2),60,Colors(k,:),'filled');
%// Trace a line to link consecutive points
if k > 1
line([Data{k-1}(1) Data{k}(1)],[Data{k-1}(2) Data{k}(2)],'LineStyle','--','Color','k');
end
end
%// Set up axis limits
axis([0 10 0 11])
%// Add labels to axis and add a title.
xlabel('X coordinates','FontSize',16)
ylabel('Y coordinates','FontSize',16)
title('This is a very boring title','FontSize',18)
Which outputs the following:
This would be easier to achieve if all of your data was stored in a n by 2 (or 2 by n) matrix. In this case, each row would be a new entry. For example:
Data=[2,3;
5,6;
1,4;
6,7];
plot(Data(:, 1), Data(:, 2))
Would plot your points. Fortunately, Matlab is able to handle matrices which grow on every iteration, though it is not recommended.
If you really wanted to work with cells, there are a couple of ways you could do it. Firstly, you could assign the elements to a matrix and repeat the above method:
NumPoints = numel(Data);
DataMat = zeros(NumPoints, 2);
for I = 1:NumPoints % Data is a cell here
DataMat(I, :) = cell2mat(Data(I));
end
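As a side note, when every cell holds a 1-by-2 row vector the loop can be replaced by a single concatenation. A minimal sketch, assuming the same Data layout as in the question:

```matlab
% Sketch: stack all 1-by-2 cells into an n-by-2 matrix in one call.
Data    = {[2,3],[5,6],[1,4],[6,7]};
DataMat = vertcat(Data{:});      % one row per point
plot(DataMat(:, 1), DataMat(:, 2), '-o')
```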
You could alternatively plot the elements straight from the cell, though this would limit your plot options.
NumPoints = numel(Data);
hold on
for I = 1:NumPoints
point = cell2mat(Data(I));
plot(point(1), point(2), 'o') % a single point needs a marker to be visible
end
hold off
With regards to your time changing curve, if you find that Matlab starts to slow down after it stores lots of points, it is possible to limit your viewing window in time with clever indexing. For example:
index = 1;
SamplingRate = 10; % How many times per second are we taking a sample (Hertz)?
WindowTime = 10; % How far into the past do we want to store points (seconds)?
NumPoints = SamplingRate * WindowTime;
Data = zeros(NumPoints, 2);
while running
% Your code goes here
Data(index, :) = NewData;
index = index + 1;
index = mod(index-1, NumPoints)+1;
plot(Data(:, 1), Data(:, 2))
drawnow
end
Will store your data in a Matrix of fixed size, meaning Matlab won't slow down.
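On releases that ship animatedline (R2014b or later, as far as I know), the growing trajectory can also be handed to the graphics object itself, which keeps only the most recent points when MaximumNumPoints is set. The following is a sketch rather than a drop-in replacement; the limit of 100 points and the axis limits are arbitrary choices for illustration.

```matlab
% Sketch: let an animated line keep track of the trajectory.
Data = {[2,3],[5,6],[1,4],[6,7],[8,1],[5,2],[7,7]};

figure;
h = animatedline('Marker', 'o', 'MaximumNumPoints', 100);
axis([0 10 0 11])
for k = 1:numel(Data)
    % In a real algorithm this would be the newly computed point.
    addpoints(h, Data{k}(1), Data{k}(2));
    drawnow
end

% The stored points can be read back if needed.
[xs, ys] = getpoints(h);
```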

Matlab Boxplots

I'd like to create a quasi boxplot graph as shown on pages 15/16 of the attached report.
comisef.eu/files/wps031.pdf
Ideally I only want to show the median, the maximum and minimum values as in the report.
I would also like to have similar spacing to that shown in the report.
Currently I have two matrices with all the necessary values stored in them but have no idea how to do this in matlab.
The boxplot function gives too much data (outliers etc) which makes the resulting graph look confused especially when I try to plot 200 on one page as in the original report.
Is there another function that can do the same thing as in the report in matlab?
Baz
OK here is some test data each row represents 10 sets of estimations of a data set, and each column represents the test number for a given observation.
As boxplot works on the columns of the input matrix you will need to transpose the matrix.
Is it possible to turn outliers and the inter-quartile ranges off? Ideally I just want to see the maximum, minimum and median values?
You can repeat the data below to get up to 200. Or I can send more data if necessary.
0.00160329732202511 0.000859407819412016 0.000859407819411159 0.0659939338995606 0.000859407819416322 0.000859407819416519 2.56395024851142e-15 2.05410662537078e-14 0.000859407819416209
1.67023155116586e-06 8.88178419700125e-16 1.67023155115637e-06 0.000730536218639616 1.67023155105582e-06 3.28746017489609e-15 4.41416632660789e-15 1.67023155094400e-06 1.67023155097567e-06
1.42410590843629e-06 1.42410590840224e-06 1.76149166727218e-15 5.97790925044131e-15 1.42410590843863e-06 2.87802701599909e-15 9.31529385335274e-16 9.17306727455842e-16 0.000820358763518906
8.26849110292527e-16 3.23505095414772e-15 4.38139485761850e-07 4.38139485938112e-07 4.38139485981887e-07 0.000884647755317917 3.72611754134110e-15 4.38139485974329e-07 4.38139485923219e-07
0.000160661751819407 0.000870787937135265 0.000870787937136209 1.16934122581182e-15 9.02860049358913e-16 1.18053134896556e-15 1.40433338743068e-15 0.000870787937135929 1.13510916297112e-15
1.16934122581182e-15 3.80292342262841e-05 3.80292342263200e-05 0.00284904319356532 1.74649997619656e-15 3.80292342264024e-05 0.00284904319356537 1.01267920724547e-15 0.00284904319356540
0.100091800399985 0.100091773169254 0.100091803903140 0.000770464183529358 0.100091812455930 3.49996706323281e-05 3.49996706323553e-05 1.05090687851466e-15 0.100091846333800
0.00100555294602561 0.00100555294601056 0.105365907420183 0.000121078082591672 9.02860049358913e-16 0.000121078082591805 4.49679158258033e-15 7.77684615168284e-16 0.000121078082591693
0.122539456858702 0.000363547764643498 0.000363547764643509 0.122516928568610 0.0101487499394213 0.122408366511784 0.000363547764643519 1.13510916297112e-15 0.122521393586646
0.000460749357561036 0.000460749357560646 3.27600489447913e-13 1.18053134896556e-15 0.000460749357561239 1.54689304063675e-15 0.000460749357560827 0.000460749357561205 1.16934122581182e-15
Instead of using boxplot, I suggest just drawing lines from the min to the max and making a mark at the median. Boxplot draws boxes from the 25th to the 75th percentile, which doesn't sound like what you want. Something like this:
% fake data
nPoints = 100;
data = 10*rand(10, nPoints);
% find statistics
minData = min(data, [], 1);
maxData = max(data, [], 1);
medData = median(data);
% x coordinates of each line. Change this to change the spacing.
x = 1:nPoints;
figure
hold on
%plot lines
line([x; x], [minData; maxData])
% plot cross at median
plot(x, medData, '+')
EDIT: To have horizontal lines and a second axis you can do something like this:
figure
h1 = subplot(1,2,1);
h2 = subplot(1,2,2);
% left subplot
axes(h1)
hold on
%plot lines
line([minData; maxData], [x; x])
% plot cross at median
plot(medData, x, '+')
% link the axes so they will have the same limits
linkaxes([h1,h2],'y')
% turn off ticks on y axis.
set(h2, 'YTick', [])
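The same min/median/max picture can probably also be drawn with a single errorbar call, using the distances from the median down to the minimum and up to the maximum as the lower and upper bar lengths. This is a sketch on fake data with the same layout as above (one column per test):

```matlab
% Sketch: min/median/max per column drawn with errorbar.
rng(0);
nPoints = 20;
data    = 10*rand(10, nPoints);          % fake data, one column per test

minData = min(data, [], 1);
maxData = max(data, [], 1);
medData = median(data, 1);

x = 1:nPoints;                            % change this to change the spacing
figure
errorbar(x, medData, medData - minData, maxData - medData, '+');
xlim([0 nPoints + 1])
xlabel('Test number')
ylabel('Value')
```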
I think it's a question of playing with the settings. You can try:
boxplot(X, 'plotstyle', 'compact', 'colors', 'k', 'medianstyle', 'line', 'outliersize', 0);
Explanation:
'plotstyle', 'compact': makes the boxes filled and the lines undashed
'colors', 'k': color is black
'medianstyle', 'line': the median is marked by a line
'outliersize', 0: if outlier size is zero, you don't see them
Other options you can try:
'orientation', 'horizontal': this flips the orientation (the default is 'vertical'); whether you want it depends on your data
'whisker', 10 (or higher): this sets the maximum whisker length as a multiple of the interquartile range (if you crank it up, the whiskers will eventually reach the max and min values). I wasn't sure if this is what you wanted. By default the multiplier is 1.5, so the whiskers stop at the most extreme points within 1.5 times the interquartile range of the 25th and 75th percentiles.
The spacing is going to depend on how much data you have. If you edit with some data, I can try it out for you.

Representing three variables in a three dimension plot

I have a problem dealing with a three-dimensional plot for three variables.
I have three matrices: Temperature, Humidity and Power. During one year, at every hour, each one of the above were measured. So, we have for each matrix 365*24 = 8760 points. Then, one average point is taken every day. So,
Tavg = 365 X 1
Havg = 365 X 1
Pavg = 365 X 1
From an electrical point of view, the power depends on the temperature and humidity. I want to discover this relation using a three-dimensional plot.
I tried using mesh, meshz, surf, plot3, and many other commands in MATLAB but unfortunately I couldn't get what I want. For example, let us take first 10 days. Here, every day is represented by average temperature, average humidity and average power.
Tavg = [18.6275
17.7386
15.4330
15.4404
16.4487
17.4735
19.4582
20.6670
19.8246
16.4810];
Havg = [75.7105
65.0892
40.7025
45.5119
47.9225
62.8814
48.1127
62.1248
73.0119
60.4168];
Pavg = [13.0921
13.7083
13.4703
13.7500
13.7023
10.6311
13.5000
12.6250
13.7083
12.9286];
How do I represent these matrices in a three-dimensional plot?
The challenge is that the 3-D surface plotting functions (mesh, surf, etc.) are looking for a 2-D matrix of z values. So to use them you need to construct such a matrix from the data.
Currently the data is a sea of points in 3-D space, so you have to map these points to a surface. A simple approach is to divide the X-Y (temperature-humidity) plane into bins and then take the average of all of the Z (power) data in each bin. Here is some sample code for this that uses accumarray() to compute the averages for each bin:
% Specify bin sizes
Tbin = 3;
Hbin = 20;
% Create binned average array
% First create a two column array of bin indexes to use as subscripts
subs = [floor(Havg/Hbin)+1, floor(Tavg/Tbin)+1]; % floor, so bin k spans [(k-1)*bin, k*bin) and its center matches Tval/Hval below
% Now create the Z (power) estimate as the average value in each bin
Pest = accumarray(subs,Pavg,[],@mean);
% And the corresponding X (temp) & Y (humidity) vectors
Tval = Tbin/2:Tbin:size(Pest,2)*Tbin;
Hval = Hbin/2:Hbin:size(Pest,1)*Hbin;
% And create the plot
figure(1)
surf(Tval, Hval, Pest)
xlabel('Temperature')
ylabel('Humidity')
zlabel('Power')
title('Simple binned average')
xlim([14 24])
ylim([40 80])
The graph is a bit coarse (can't post image yet, since I am new) because we only have a few data points. We can enhance the visualization by removing any empty bins by setting their value to NaN. Also, the binning approach hides any variation in the Z (power) data, so we can overlay the original point cloud using plot3 without drawing connecting lines. (Again no image b/c I am new)
Additional code for the final plot:
%% Expanded Plot
% Remove zeros (useful with enough valid data)
%Pest(Pest == 0) = NaN;
% First the original points
figure(2)
plot3(Tavg, Havg, Pavg, '.')
hold on
% And now our estimate
% The use of 'FaceColor' 'Interp' uses colors that "bleed" down the face
% rather than only coloring the faces away from the origin
surfc(Tval, Hval, Pest, 'FaceColor', 'Interp')
% Make this plot semi-transparent to see the original dots and back side
alpha(0.5)
xlabel('Temperature')
ylabel('Humidity')
zlabel('Power')
grid on
title('Nicer binned average')
xlim([14 24])
ylim([40 80])
I think you're asking for a surface fit for your data. The Curve Fitting Toolbox handles this nicely:
% Fit model to data.
ft = fittype( 'poly11' );
fitresult = fit( [Tavg, Havg], Pavg, ft);
% Plot fit with data.
plot( fitresult, [Tavg, Havg], Pavg );
legend( 'fit 1', 'Pavg vs. Tavg, Havg', 'Location', 'NorthEast' );
xlabel( 'Tavg' );
ylabel( 'Havg' );
zlabel( 'Pavg' );
grid on
If you don't have the Curve Fitting Toolbox, you can use the backslash operator:
% Find the coefficients.
const = ones(size(Tavg));
coeff = [Tavg Havg const] \ Pavg;
% Plot the original data points
clf
plot3(Tavg,Havg,Pavg,'r.','MarkerSize',20);
hold on
% Plot the surface.
[xx, yy] = meshgrid( ...
linspace(min(Tavg),max(Tavg)) , ...
linspace(min(Havg),max(Havg)) );
zz = coeff(1) * xx + coeff(2) * yy + coeff(3);
surf(xx,yy,zz)
title(sprintf('z=(%f)*x+(%f)*y+(%f)',coeff))
grid on
axis tight
Both of these fit a linear polynomial surface, i.e. a plane, but you'll probably want to use something more complicated. Both of these techniques can be adapted to this situation. There's more information on this subject at mathworks.com: How can I determine the equation of the best-fit line, plane, or N-D surface using MATLAB?.
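As an illustration of adapting the backslash approach to something more complicated than a plane, the design matrix can simply get more columns. The sketch below fits a full quadratic surface to the ten example days from the question; with so few points it will overfit, so treat it as a demonstration of the mechanics only. The helper A and the 30-point grid are my own choices.

```matlab
% Sketch: quadratic surface z = b1 + b2*x + b3*y + b4*x^2 + b5*x*y + b6*y^2
% fitted by least squares with the backslash operator.
Tavg = [18.6275 17.7386 15.4330 15.4404 16.4487 17.4735 19.4582 20.6670 19.8246 16.4810]';
Havg = [75.7105 65.0892 40.7025 45.5119 47.9225 62.8814 48.1127 62.1248 73.0119 60.4168]';
Pavg = [13.0921 13.7083 13.4703 13.7500 13.7023 10.6311 13.5000 12.6250 13.7083 12.9286]';

% Design matrix builder for column vectors x and y.
A = @(x, y) [ones(size(x)), x, y, x.^2, x.*y, y.^2];
b = A(Tavg, Havg) \ Pavg;

% Evaluate the fitted surface on a grid spanning the data.
[xx, yy] = meshgrid(linspace(min(Tavg), max(Tavg), 30), ...
                    linspace(min(Havg), max(Havg), 30));
zz = reshape(A(xx(:), yy(:)) * b, size(xx));

figure
plot3(Tavg, Havg, Pavg, 'r.', 'MarkerSize', 20);
hold on
surf(xx, yy, zz)
alpha(0.5)
xlabel('Tavg'); ylabel('Havg'); zlabel('Pavg');
grid on
hold off
```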
You might want to look at Delaunay triangulation:
tri = delaunay(Tavg, Havg);
trisurf(tri, Tavg, Havg, Pavg);
Using your example data, this code generates an interesting 'surface'. But I believe this is another way of doing what you want.
You might also try the GridFit tool by John D'Errico from MATLAB Central. This tool produces a surface similar to interpolating between the data points (as is done by MATLAB's griddata) but with cleaner results because it smooths the resulting surface. Conceptually multiple datapoints for nearby or overlapping X,Y coordinates are averaged to produce a smooth result rather than noisy "ripples." The tool also allows for some extrapolation beyond the data points. Here is a code example (assuming the GridFit Tool has already been installed):
%Establish points for surface
num_points = 20;
Tval = linspace(min(Tavg),max(Tavg),num_points);
Hval = linspace(min(Havg),max(Havg),num_points);
%Do the fancy fitting with smoothing
Pest = gridfit(Tavg, Havg, Pavg, Tval, Hval);
%Plot results
figure(5)
surfc(Tval,Hval,Pest, 'FaceColor', 'Interp')
To produce an even nicer plot, you can add labels, some transparency and overlay the original points:
alpha(0.5)
hold on
plot3(Tavg,Havg,Pavg,'.')
xlabel('Temperature')
ylabel('Humidity')
zlabel('Power')
grid on
title('GridFit')
PS: @upperBound: Thanks for the Delaunay triangulation tip. That seems like the way to go if you want to go through each of the points. I am a newbie so can't comment yet.
Below is your solution:
Save/write the Myplot3D function
function [x,y,V]=Myplot3D(X,Y,Z)
x=linspace(min(X),max(X),100); % span the full data range, even if X is unsorted
y=linspace(min(Y),max(Y),100);
[Xt,Yt]=meshgrid(x,y);
V=griddata(X,Y,Z,Xt,Yt);
Call the following from your command line (or script)
[Tavg_new,Pavg_new,V]=Myplot3D(Tavg,Pavg,Havg);
surf(Tavg_new,Pavg_new,V)
colormap jet;
xlabel('Temperature')
ylabel('Power/Pressure')
zlabel('Humidity')

Plot data with MATLAB biplot with more than 1 color

I have 3 groups of data that had PCA performed on them as one group. I want to highlight each variable group with a different color. Prior to this I overlaid 3 biplots. This gives different colors but creates a distortion in the data as each biplot function skews the data. This caused the groups to all be skewed by different amounts, making the plot not a correct representation.
How do I take a PCA scores matrix (30x3) and split it so the first 10x3 is one color, the next 10x3 is another and the third 10x3 is another, without the data being skewed?
"Skewing" is happening because biplot is renormalizing the scores so the farthest score is distance 1 . axis equal isn't going to fix this. You should use scatter3 instead of biplot
scores = rand(30,3); % stand-in for the 30x3 PCA scores matrix
group = scores(1:10,:);
scatter3(group(:,1), group(:,2), group(:,3), '.b')
hold all
group = scores(11:20,:);
scatter3(group(:,1), group(:,2), group(:,3), '.r')
group = scores(21:30,:);
scatter3(group(:,1), group(:,2), group(:,3), '.g')
hold off
title('Data')
xlabel('X')
ylabel('Y')
zlabel('Z')
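The three blocks above can also be written as one loop over a group index, which scales more easily to other group sizes. This is a sketch with a random stand-in for the scores matrix; groupId and colors are names I made up for the example.

```matlab
% Sketch: color rows of a scores matrix by group index.
rng(0);
scores  = rand(30, 3);                        % stand-in for PCA scores
groupId = ceil((1:size(scores, 1))' / 10);    % rows 1-10 -> 1, 11-20 -> 2, 21-30 -> 3
colors  = [0 0 1; 1 0 0; 0 1 0];              % blue, red, green

figure; hold on
for g = 1:3
    rows = groupId == g;
    scatter3(scores(rows, 1), scores(rows, 2), scores(rows, 3), ...
             36, colors(g, :), 'filled');
end
hold off
view(3)                                       % hold on keeps the 2-D view otherwise
grid on
xlabel('X'); ylabel('Y'); zlabel('Z');
```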
Or modify your code's scatter3 lines so that the markers are different colors. The fifth argument sets the color (an RGB triplet or a color name) and the value after 'marker' sets the symbol only, e.g. [1 0 0] with '.' gives red dots. See LineSpec for the marker and color options.
scatter3(plotdataholder(1:14,1),plotdataholder(1:14,2),plotdataholder(1:14,3),35,[1 0 0],'marker', '.');
hold on;
scatter3(plotdataholder(15:28,1),plotdataholder(15:28,2),plotdataholder(15:28,3),35,[0 0 1],'marker', '.') ;
scatter3(plotdataholder(29:42,1),plotdataholder(29:42,2),plotdataholder(29:42,3),35,[0 1 0],'marker', '.');
This is the method I used to plot biplot data with different colors. The lines of code prior to plot are taken from the biplot.m file. The way biplot manipulates data is kept intact and stops skewing of data when using overlaid biplots.
This code is not the most efficient; one can see parts that could be cut. I wanted to keep the code intact so one can see how biplot works in its entirety.
%%%%%%%%%%%%%%%%%%%%%
xxx = coeff(:,1:3);
yyy= score(:,1:3);
%Taken from biplot.m. This alters the data the same way biplot does, so that the data fit on grid axes no larger than 1.
[n,d2] = size(yyy);
[p,d] = size(xxx); %7 by 3
[dum,maxind] = max(abs(xxx),[],1);
colsign = sign(xxx(maxind + (0:p:(d-1)*p)));
xxx = xxx .* repmat(colsign, p, 1);
yyy= (yyy ./ max(abs(yyy(:)))) .* repmat(colsign, n, 1); % n rows (42 in my data)
nans = NaN(n,1);
ptx = [yyy(:,1) nans]';
pty = [yyy(:,2) nans]';
ptz = [yyy(:,3) nans]';
%I grouped the pt matrices for my benefit
plotdataholder(:,1) = ptx(1,:);
plotdataholder(:,2) = pty(1,:);
plotdataholder(:,3) = ptz(1,:);
%my original score matrix is 42x3 - wanted each 14x3 to be a different color
scatter3(plotdataholder(1:14,1),plotdataholder(1:14,2),plotdataholder(1:14,3),35,[1 0 0],'marker', '.');
hold on;
scatter3(plotdataholder(15:28,1),plotdataholder(15:28,2),plotdataholder(15:28,3),35,[0 0 1],'marker', '.') ;
scatter3(plotdataholder(29:42,1),plotdataholder(29:42,2),plotdataholder(29:42,3),35,[0 1 0],'marker', '.');
xlabel('Principal Component 1');
ylabel('Principal Component 2');
zlabel('Principal Component 3');
I am not sure if it will help, but try axis equal after you have overlaid the plots.