I have a matrix, X, that I want to plot using the results of the kmeans function. What I would like:
If a row has a value of 1 in column 4, it should be plotted as a square.
If a row has a value of 2 in column 4, it should be plotted as a +.
BUT if a row has a value of 0 in column 5, it must be blue, and if a row has a value of 1 in column 5, it must be yellow.
(You don't need to use these exact colors and shapes; I just want to distinguish these cases.) I tried this and it did not work:
plot(X(idx==2,1),X(idx==2,2),X(:,4)==1,'k.');
Thanks!!
Based on the example on the kmeans documentation page, I propose combining logical conditions like this:
X = [randn(100,2)+ones(100,2);...
randn(100,2)-ones(100,2)];
opts = statset('Display','final');
% This gives a random distribution of 0s and 1s in column 5:
X(:,5) = round(rand(size(X,1),1));
[idx,ctrs] = kmeans(X,2,...
'Distance','city',...
'Replicates',5,...
'Options',opts);
hold on
plot(X(idx==1,1),X(idx==1,2),'rs','MarkerSize',12)
plot(X(idx==2,1),X(idx==2,2),'r+','MarkerSize',12)
% after plotting the results of kmeans,
% plot new symbols with a different logic on top:
plot(X(idx==1 & X(:,5)==0,1),X(idx==1 & X(:,5)==0,2),'bs','MarkerSize',12)
plot(X(idx==1 & X(:,5)==1,1),X(idx==1 & X(:,5)==1,2),'gs','MarkerSize',12)
plot(X(idx==2 & X(:,5)==0,1),X(idx==2 & X(:,5)==0,2),'b+','MarkerSize',12)
plot(X(idx==2 & X(:,5)==1,1),X(idx==2 & X(:,5)==1,2),'g+','MarkerSize',12)
The above code is a minimal working example, given that the statistics toolbox is available.
The key feature is the combined logical mask used for the plotting. For example:
X(idx==1 & X(:,5)==0,1)
The condition idx==1 selects the rows assigned to cluster 1, and X(:,5)==0 selects the rows with a 0 in column 5. Both masks have one entry per row of X, so combining them with & keeps only the rows that satisfy both conditions. Based on the logic in the question, this should be a blue square: bs.
You have four cases, hence there are four additional plot lines.
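If the shape should come from column 4 and the color from column 5, as the question describes, the four cases can also be generated in a loop. The following is only a sketch on made-up data: the toy matrix, the code-to-marker mapping, and the variable names are assumptions for illustration, not part of the original answer.

```matlab
% Hypothetical toy data: columns 1-2 are coordinates,
% column 4 is the shape code (1 or 2), column 5 is the color code (0 or 1).
rng(0);                                  % reproducible example
X = zeros(20,5);
X(:,1:2) = randn(20,2);
X(:,4)   = randi(2,20,1);
X(:,5)   = randi([0 1],20,1);

shapeCodes = [1 2];   shapes = {'s','+'};   % 1 -> square, 2 -> plus
colorCodes = [0 1];   colors = {'b','y'};   % 0 -> blue,   1 -> yellow

hold on
nPlotted = 0;
for iS = 1:numel(shapeCodes)
    for iC = 1:numel(colorCodes)
        % One full-length logical mask per shape/color combination
        mask = X(:,4)==shapeCodes(iS) & X(:,5)==colorCodes(iC);
        plot(X(mask,1), X(mask,2), [colors{iC} shapes{iS}], 'MarkerSize',12)
        nPlotted = nPlotted + nnz(mask);
    end
end
hold off
```

Because every row matches exactly one shape/color combination, each point is drawn once, and adding a further shape or color only means extending the two lookup lists.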
Related
I made a Classification Tree, code:
mytree=ClassificationTree.fit(MyData,MyLables);
mytree.view('mode','graph');
My data has two classes, and I want to get the prediction result as a matrix that shows which class each data row belongs to. For example:
data row predicted class
1 2
2 1
. .
. .
. .
How can I make this matrix?
---------------------Edited----------------------
I found that I can predict my data with this function:
label = predict(Mdl,MyData([1:50],:));
But which rows do these labels belong to?
The first column, i.e. 'data row', is simply a vector running from 1 to the number of rows of the data, X (MyData in the question), which is also the number of values in the label vector, Y. The second column, i.e. 'predicted class', is that label vector Y; in the question this corresponds to MyLables, or to the output of predict if you want the predicted labels. Hence:
ReqResult = [(1:numel(Y)).' Y];
% Assuming Y is a column vector (n-by-1).
% If Y is a row vector, transpose Y as well.
Warning:
If you're using R2014a or later, you should use fitctree instead of ClassificationTree.fit because, as mentioned in the documentation:
ClassificationTree.fit will be removed in a future release. Use fitctree instead.
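Regarding the edited part of the question: predict returns one label per input row, in the same order as the rows it was given, so the labels belong to whichever row indices were passed in. A minimal sketch, using stand-in values instead of a trained model (the variables rows and the label values are hypothetical):

```matlab
% Stand-in values so the sketch runs without a trained model.
% In practice: label = predict(Mdl, MyData(rows,:));
rows  = (1:5).';            % rows of MyData passed to predict (hypothetical subset)
label = [2; 1; 1; 2; 1];    % hypothetical predicted classes, one per entry of 'rows'

% The k-th label belongs to the k-th row that was passed in,
% so pairing 'rows' with 'label' gives the requested matrix.
ReqResult = [rows label];
disp(ReqResult)
```

This assumes numeric class labels. If the labels are text, a table or a cell array would be needed instead of a numeric matrix.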
I have a dataset of points represented by a 2D vector (X).
Each point belongs to a class (Y), represented by an integer value (from 1 to 4).
I want to plot each point with a different symbol depending on its class.
Toy example:
X = randi(100,10,2); % 10 points ranging 1:100 in 2D space
Y = randi(4,10,1); % class of the points (1 to 4)
I create a vector of symbols for each class:
S = {'bx' 'rx' 'b.' 'r.'};
Then I try:
plot(X(:,1), X(:,2), S(Y))
Error using plot
Invalid first data argument
How can I assign to each point of X a different symbol based on the value of Y?
Of course, I can loop over the classes and plot them one by one. But is there a method to plot each class directly with a different symbol?
No need for a loop, use gscatter:
X = randi(100,10,2); % 10 points ranging 1:100 in 2D space
Y = randi(4,10,1); % class of the points (1 to 4)
color = 'brbr';
symbol = 'xx..';
gscatter(X(:,1),X(:,2),Y,color,symbol)
and you will get a scatter plot with each class drawn in its own color and symbol.
If X has many rows, but there are only a few S types, then I suggest you check out the second approach first. It's optimized for speed instead of readability. It's about twice as fast if the vector has 10 elements, and more than 200 times as fast if the vector has 1000 elements.
First approach (easy to read):
Regardless of approach, I think you need a loop for this:
hold on
arrayfun(@(n) plot(X(n,1), X(n,2), S{Y(n)}), 1:size(X,1))
Or, to write the loop in the "conventional way":
hold on
for n = 1:size(X,1)
plot(X(n,1), X(n,2), S{Y(n)})
end
Second approach (gives same plot as above):
If your dataset is large, you can sort Y with [Y_sorted, Y_index] = sort(Y), then use Y_index to reorder the rows of X, like this: X_sorted = X(Y_index, :);. After this, you split X_sorted into 4 groups, one for each of the individual Y-values, using histc and mat2cell. Then you loop over the four groups and plot each one individually.
This way you only need to loop through four values, regardless of the number of elements in your data. This should be a lot faster if the number of elements is high.
[Y_sorted, Y_index] = sort(Y);
X_sorted = X(Y_index, :);
X_cell = mat2cell(X_sorted, histc(Y,1:numel(S)));
hold on
for ii = 1:numel(X_cell)
plot(X_cell{ii}(:,1),X_cell{ii}(:,2),S{ii})
end
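For comparison, the same "one plot call per class" idea can be written with a logical mask per class instead of sort, histc and mat2cell. This is only a sketch of an alternative, not the code that was benchmarked; the toy data matches the question.

```matlab
rng(1);                                  % reproducible toy data
X = randi(100,10,2);                     % 10 points in 2D
Y = randi(4,10,1);                       % class of each point (1 to 4)
S = {'bx' 'rx' 'b.' 'r.'};               % one line spec per class

hold on
nPlotted = 0;
for k = 1:numel(S)
    inClass = (Y == k);                  % logical mask for class k
    plot(X(inClass,1), X(inClass,2), S{k})
    nPlotted = nPlotted + nnz(inClass);
end
hold off
```

The loop still runs only once per class, regardless of how many points there are.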
Benchmarking:
I did a very simple benchmark of the two approaches using timeit. The results (in seconds, as returned by timeit) show that the second approach is a lot faster:
For 10 elements:
First approach: 0.0086
Second approach: 0.0037
For 1000 elements:
First approach: 0.8409
Second approach: 0.0039
How can I get the numeric values in one column (let's say column 10) when the numeric values in another column (let's say column 9) are equal to a specific number, and plot them in a graph?
e.g., When values of column 9 == 4, get the corresponding value of column 10 and plot. I am using row index number as a marker for time.
I am plotting all of column 10 to get a waveform then I want to use the data of column 9 to add markers to my waveform that are representative of a command occurring at a certain point in time.
Here is my code:
E = csvread('Experiment_at_10_45_1.csv');
[signal_rows, signal_columns] = size(E);
t=(1:signal_rows)/128; %128 samples per second
%% SNR plot for down frequency
plot(t,E(:,13),'k')
I hope my explanation is clear, as I have attempted to use a minimum working example of my code for the first time.
You'll want to use logical indexing to do this. You want to first create an array of 0 (false) and 1 (true) values where column 9 is equal to the value you want.
bool = E(:,9) == 4;
Then you'll want to use this 0 and 1 array as the row index. This will grab only the rows where column 9 was equal to 4. This is referred to as logical indexing.
E(bool, 10)
Then you can plot this
plot(t(bool), E(bool, 10))
As pointed out, though, the values may not be exactly equal to 4 due to floating-point representation. To get around this, check whether they are "close enough" using a very small epsilon.
bool = abs(E(:,9) - 4) < 1e-12;
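To put the pieces together for the waveform-plus-markers use case in the question, here is a small sketch on synthetic data. The matrix contents, the sample positions of the command, and the marker style are all made up for illustration; the real E would come from csvread.

```matlab
% Synthetic stand-in for the CSV data
nSamples = 512;
E = zeros(nSamples,10);
E(:,10) = sin(2*pi*(1:nSamples).'/128);  % hypothetical waveform in column 10
E([100 300 450],9) = 4;                  % hypothetical command code 4 at three samples

t = (1:nSamples)/128;                    % 128 samples per second, as in the question

bool = abs(E(:,9) - 4) < 1e-12;          % rows where column 9 is (numerically) 4

plot(t, E(:,10), 'k')                    % full waveform
hold on
plot(t(bool), E(bool,10), 'ro', 'MarkerSize', 8)   % markers only, no connecting line
hold off
```

Using a marker-only line spec such as 'ro' keeps the selected samples from being joined by a line across the waveform.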
I'm just beginning to teach myself MATLAB, and I'm making a 501x6 array. The columns will contain probabilities for rolling a 101-sided die, and as such, the columns contain 101, 201, 301 entries, not 501. Is there a way to 'stretch the column' so that I add 0s above and below the useful data? So far I've only thought of building a column like a=[zeros(200,1);die;zeros(200,1)] so that the data only shows up in rows 201-301, and similarly b=[zeros(150,1);die2;zeros(150,1)], if I wanted 200 or 150 zeros to precede and follow the data, respectively, so that it fits in the array.
Thanks for any suggestions.
You can do several things:
Start with an all-zero matrix, and only modify the elements you need to be non-zero:
A = zeros(501,6);
A(someValue:someOtherValue, 5) = value;
% OR: assign the range to a vector:
A(someValue:someOtherValue, 5) = 1:20; % if someValue:someOtherValue is the same length as 1:20
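Applied to the 501-row case from the question, the start row of each block can be computed so that the data ends up centered. A minimal sketch, where die1 and die2 are hypothetical placeholder vectors of 101 and 201 entries:

```matlab
nRows = 501;
A = zeros(nRows,6);

die1 = ones(101,1)/101;                  % hypothetical data with 101 entries
die2 = ones(201,1)/201;                  % hypothetical data with 201 entries

% Center each data vector in its column; the surrounding zeros are already there
first1 = (nRows - numel(die1))/2 + 1;    % 200 zeros above, data starts at row 201
A(first1:first1+numel(die1)-1, 1) = die1;

first2 = (nRows - numel(die2))/2 + 1;    % 150 zeros above, data starts at row 151
A(first2:first2+numel(die2)-1, 2) = die2;
```

This assumes the difference between 501 and the data length is even, which holds for 101, 201 and 301 entries.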
I have a binary matrix of size m-by-n. Given below is a sample binary matrix (the real matrix is much larger):
1010001
1011011
1111000
0100100
Given p = m*n, I have 2^p possible matrix configurations. I would like to get some patterns which satisfy certain rules. For example:
I want not less than k cells in the jth column as zero
I want the sum of cell values of the ith row greater than a given number Ai
I want at least g cells in a column continuously as one
etc....
How can I get such patterns satisfying these constraints strictly without sequentially checking all the 2^p combinations?
In my case, p can be a number like 2400, giving approximately 2.96476e+722 possible combinations.
Instead of iterating over all 2^p combinations, one way you could generate such binary matrices is by performing repeated row- and column-wise operations based on the given constraints you have. As an example, I'll post some code that will generate a matrix based on the three constraints you have listed above:
A minimum number of zeroes per column
A minimum sum for each row
A minimum sequential length of ones per column
Initializations:
First start by initializing a few parameters:
nRows = 10; % Row size of matrix
nColumns = 10; % Column size of matrix
minZeroes = 5; % Constraint 1 (for columns)
minRowSum = 5; % Constraint 2 (for rows)
minLengthOnes = 3; % Constraint 3 (for columns)
Helper functions:
Next, create a couple of functions for generating column vectors that match constraints 1 and 3 from above:
function vector = make_column
vector = [false(minZeroes,1); true(nRows-minZeroes,1)]; % Create vector
[vector,maxLength] = randomize_column(vector); % Randomize order
while maxLength < minLengthOnes, % Loop while constraint 3 is not met
[vector,maxLength] = randomize_column(vector); % Randomize order
end
end
function [vector,maxLength] = randomize_column(vector)
vector = vector(randperm(nRows)); % Randomize order
edges = diff([false; vector; false]); % Find rising and falling edges
maxLength = max(find(edges == -1)-find(edges == 1)); % Find longest
% sequence of ones
end
The function make_column will first create a logical column vector with the minimum number of 0 elements and the remaining elements set to 1 (using the functions TRUE and FALSE). This vector will undergo random reordering of its elements until it contains a sequence of ones greater than or equal to the desired minimum length of ones. This is done using the randomize_column function. The vector is randomly reordered using the RANDPERM function to generate a random index order. The edges where the sequence switches between 0 and 1 are detected using the DIFF function. The indices of the edges are then used to find the length of the longest sequence of ones (using FIND and MAX).
Generate matrix columns:
With the above two functions we can now generate an initial binary matrix that will at least satisfy constraints 1 and 3:
binMat = false(nRows,nColumns); % Initialize matrix
for iColumn = 1:nColumns,
binMat(:,iColumn) = make_column; % Create each column
end
Satisfy the row sum constraint:
Of course, now we have to ensure that constraint 2 is satisfied. We can sum across each row using the SUM function:
rowSum = sum(binMat,2);
If any elements of rowSum are less than the minimum row sum we want, we will have to adjust some column values to compensate. There are a number of different ways you could go about modifying column values. I'll give one example here:
while any(rowSum < minRowSum), % Loop while constraint 2 is not met
[minValue,rowIndex] = min(rowSum); % Find row with lowest sum
zeroIndex = find(~binMat(rowIndex,:)); % Find zeroes in that row
randIndex = round(1+rand.*(numel(zeroIndex)-1));
columnIndex = zeroIndex(randIndex); % Choose a zero at random
column = binMat(:,columnIndex);
while ~column(rowIndex), % Loop until zero changes to one
column = make_column; % Make new column vector
end
binMat(:,columnIndex) = column; % Update binary matrix
rowSum = sum(binMat,2); % Update row sum vector
end
This code will loop until all the row sums are greater than or equal to the minimum sum we want. First, the index of the row with the smallest sum (rowIndex) is found using MIN. Next, the indices of the zeroes in that row are found and one of them is randomly chosen as the index of a column to modify (columnIndex). Using make_column, a new column vector is continuously generated until the 0 in the given row becomes a 1. That column in the binary matrix is then updated and the new row sum is computed.
Summary:
For a relatively small 10-by-10 binary matrix, and the given constraints, the above code usually completes in no more than a few seconds. With more constraints, things will of course get more complicated. Depending on how you choose your constraints, there may be no possible solution (for example, setting minRowSum to 6 will cause the above code to never converge to a solution).
Hopefully this will give you a starting point to begin generating the sorts of matrices you want using vectorized operations.
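As a sanity check on whatever matrix comes out, the three constraints can be verified directly. The following sketch reuses the parameter names from above on a small hand-made matrix; the matrix and the threshold values are chosen purely for illustration.

```matlab
% Small hand-made example matrix (5 rows, 4 columns)
binMat = logical([1 0 0 1;
                  1 1 0 1;
                  1 1 1 1;
                  0 1 1 1;
                  0 0 1 0]);
minZeroes = 1;  minRowSum = 1;  minLengthOnes = 3;

% Constraint 1: at least minZeroes zeros in every column
okZeroes = all(sum(~binMat,1) >= minZeroes);

% Constraint 2: every row sum is at least minRowSum
okRowSum = all(sum(binMat,2) >= minRowSum);

% Constraint 3: every column contains a run of ones of length >= minLengthOnes
okRuns = true;
for iColumn = 1:size(binMat,2)
    edges = diff([false; binMat(:,iColumn); false]);     % +1 rising, -1 falling
    runLengths = find(edges == -1) - find(edges == 1);   % length of each run of ones
    okRuns = okRuns && any(runLengths >= minLengthOnes);
end

isValid = okZeroes && okRowSum && okRuns;
```

The run-length detection is the same diff-based trick used in randomize_column, applied to each column in turn.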
If you have enough constraints, exploring all possible matrices could be attempted:
// Explore all possibilities starting at POSITION (0..P-1)
explore(int position)
{
// Check if one or more constraints can't be verified anymore with
// all values currently set.
invalid = ...;
if (invalid) return;
// Do we have a solution?
if (position >= p)
{
// print the matrix
return;
}
// Set one more value and continue exploring
for (int value=0;value<2;value++)
{ matrix[position] = value; explore(position+1); }
}
If the number of constraints is low, this approach will take too much time.
In this case, for the kind of constraints you gave as examples, simulated annealing may be a good solution.
You must design an energy function that is high when all constraints are met. The procedure would be something like this:
1. Generate a random matrix
2. Compute energy E0
3. Change one cell
4. Compute energy E1
5. If E1>E0, or E0-E1 is smaller than f(temperature), keep the change; otherwise reverse the move
6. Update temperature, and go to step 2 unless a stop criterion is reached
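A rough MATLAB sketch of this loop, under several assumptions of mine: only two of the example constraints are used (zeros per column, row sums), the energy is minus the total constraint shortfall (so 0 means everything is satisfied), f(temperature) is a Metropolis-style random threshold, and the cooling rate and iteration cap are arbitrary.

```matlab
rng(0);                                   % reproducible run
nRows = 6;  nColumns = 6;
minZeroes = 2;                            % zeros required per column (assumed value)
minRowSum = 3;                            % ones required per row (assumed value)

% Energy: minus the total constraint shortfall, so 0 means "all constraints met"
energy = @(M) -( sum(max(0, minZeroes - sum(~M,1))) ...
               + sum(max(0, minRowSum - sum(M,2))) );

M  = rand(nRows,nColumns) > 0.5;          % step 1: random matrix
E0 = energy(M);                           % step 2: current energy
temperature = 1;
for iter = 1:5000
    if E0 == 0, break; end                % stop criterion: feasible matrix found
    k = randi(numel(M));
    M(k) = ~M(k);                         % step 3: change one cell
    E1 = energy(M);                       % step 4: new energy
    % step 5: keep improvements; accept a worse move only if the loss is
    % smaller than a temperature-dependent random threshold
    if E1 > E0 || (E0 - E1) < -temperature*log(rand)
        E0 = E1;
    else
        M(k) = ~M(k);                     % reverse the move
    end
    temperature = 0.999*temperature;      % step 6: cool down
end
```

If the loop ends with E0 equal to 0, M satisfies both constraints; otherwise the run did not find a feasible matrix within the iteration cap and the parameters would need tuning.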
If all the constraints relate to columns (as is the case for two of the three examples in the question), then you can find all possible valid columns and check that each column in the matrix is in this set. (i.e. when you consider each column independently, you reduce the number of possibilities a lot.)
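For a small number of rows, the set of valid columns can be enumerated explicitly. The sketch below assumes two column constraints (a minimum number of zeros and a minimum run of ones) with made-up thresholds; for large m the 2^m candidates would of course be too many to list.

```matlab
nRows = 4;                                 % small m so that 2^m stays manageable
minZeroes = 1;  minLengthOnes = 2;         % assumed column constraints

% All 2^nRows candidate columns, one per column of 'candidates'
candidates = (dec2bin(0:2^nRows-1, nRows) == '1').';

isValidColumn = false(1, size(candidates,2));
for c = 1:size(candidates,2)
    column = candidates(:,c);
    edges = diff([false; column; false]);               % +1 rising, -1 falling
    runLengths = find(edges == -1) - find(edges == 1);  % length of each run of ones
    isValidColumn(c) = nnz(~column) >= minZeroes && any(runLengths >= minLengthOnes);
end
validColumns = candidates(:, isValidColumn);
```

With these thresholds, 7 of the 16 candidate columns remain, and any matrix built only from columns of validColumns satisfies the column constraints by construction.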
I might be way off here, but I remember doing something similar once with some genetic algorithm.
Check out pseudo boolean constraints (also called 0-1 integer programming).
This is virtually impossible if your constraint set is complex enough. You might try to use a stochastic optimizer, like simulated annealing, particle swarm optimization, or a genetic algorithm to find a feasible solution.
However, if you can generate one (non-random) solution to such a problem, then often you can generate others by random permutations made to the existing solution.