K-means sort labels - matlab

Assume I have matrix A and I perform K-means clustering on them in MATLAB. I get the following
A=
1 20 5
1 30 10
2 60 20
5 100 45
kmeans(A,4) results in the following labels:
2
4
3
1
Now I permute rows of A and I get matrix B:
B =
2 60 20
1 30 10
5 100 45
1 20 5
and after applying the kmeans the labels are B1 = [3 1 2 4], which seems to be random assignment. For example second row of matrix A is in cluster 4 but second row of matrix B which is the same thing as second row of A is in cluster 1.
How can I get the labels in the kmeans such that rows that have highest value always get the same label, for example 3, and row that have lowest value always get 1?
For example the last row of A gets label 3, thus the third row of B also get label 3.

Each label is tied to the mean of a cluster. To sort the labels, you sort the means in e.g. order of appearance along a given axis (x-axis in this example). Here's an implementation in Python:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
np.random.seed(1)
def rearrange_labels(X, cluster_labels, sort_on_column=0):
labels, ctrs = [], []
for i in range(len(set(cluster_labels))):
Xi = X[cluster_labels == i]
ctr = np.mean(Xi, axis=0)
labels.append(i)
ctrs.append(ctr)
ctrs = np.row_stack(ctrs)
labels = np.array(labels).reshape(-1, 1)
# sort on x column
new_order = ctrs[:, sort_on_column].argsort()
labels_new = labels[new_order]
ctrs_new = ctrs[new_order]
np.put(cluster_labels, labels, labels_new)
return cluster_labels, ctrs_new
X, _ = make_blobs(n_samples=500, centers=10, n_features=2)
clf = KMeans(n_clusters=10)
cluster_labels = clf.fit_predict(X)
cluster_labels, ctrs = rearrange_labels(X=X, cluster_labels=cluster_labels)
fig, ax = plt.subplots()
for i, m in enumerate(ctrs):
ax.annotate(
xy=m[[0, 1]],
s=i,
bbox=dict(boxstyle="square", fc="w", ec="grey", alpha=0.9),
)
ax.scatter(X[:, 0], X[:, 1], c=cluster_labels)
plt.show()

The cluster numbers assigned by k-means do not have an order - don't treat them as such. They are numbers just for convenience, they might as well be A B C D.
If you want to impose an order on them, you can relabel them as you want. You can sort the centers by X coordinate, and relabel them. It's not the job of k-means to do so, you need to do this yourself.

Related

MATLAB: Applying vectors of row and column indices without looping

I have a situation analogous to the following
z = magic(3) % Data matrix
y = [1 2 2]' % Column indices
So,
z =
8 1 6
3 5 7
4 9 2
y represents the column index I want for each row. It's saying I should take row 1 column 1, row 2 column 2, and row 3 column 2. The correct output is therefore 8 5 9.
I worked out I can get the correct output with the following
x = 1:3;
for i = 1:3
result(i) = z(x(i),y(i));
end
However, is it possible to do this without looping?
Two other possible ways I can suggest is to use sub2ind to find the linear indices that you can use to sample the matrix directly:
z = magic(3);
y = [1 2 2];
ind = sub2ind(size(z), 1:size(z,1), y);
result = z(ind);
We get:
>> result
result =
8 5 9
Another way is to use sparse to create a sparse matrix which you can turn into a logical matrix and then sample from the matrix with this logical matrix.
s = sparse(1:size(z,1), y, 1, size(z,1), size(z,2)) == 1; % Turn into logical
result = z(s);
We also get:
>> result
result =
8
5
9
Be advised that this only works provided that each row index linearly increases from 1 up to the end of the rows. This conveniently allows you to read the elements in the right order taking advantage of the column-major readout that MATLAB is based on. Also note that the output is also a column vector as opposed to a row vector.
The link posted by Adriaan is a great read for the next steps in accessing elements in a vectorized way: Linear indexing, logical indexing, and all that.
there are many ways to do this, one interesting way is to directly work out the indexes you want:
v = 0:size(y,2)-1; %generates a number from 0 to the size of your y vector -1
ind = y+v*size(z,2); %generates the indices you are looking for in each row
zinv = z';
zinv(ind)
>> ans =
8 5 9

Shifting repeating rows to a new column in a matrix

I am working with a n x 1 matrix, A, that has repeating values inside it:
A = [0;1;2;3;4; 0;1;2;3;4; 0;1;2;3;4; 0;1;2;3;4]
which correspond to an n x 1 matrix of B values:
B = [2;4;6;8;10; 3;5;7;9;11; 4;6;8;10;12; 5;7;9;11;13]
I am attempting to produce a generalised code to place each repetition into a separate column and store it into Aa and Bb, e.g.:
Aa = [0 0 0 0 Bb = [2 3 4 5
1 1 1 1 4 5 6 7
2 2 2 2 6 7 8 9
3 3 3 3 8 9 10 11
4 4 4 4] 10 11 12 13]
Essentially, each repetition from A and B needs to be copied into the next column and then deleted from the first column
So far I have managed to identify how many repetitions there are and copy the entire column over to the next column and then the next for the amount of repetitions there are but my method doesn't shift the matrix rows to columns as such.
clc;clf;close all
A = [0;1;2;3;4;0;1;2;3;4;0;1;2;3;4;0;1;2;3;4];
B = [2;4;6;8;10;3;5;7;9;11;4;6;8;10;12;5;7;9;11;13];
desiredCol = 1; %next column to go to
destinationCol = 0; %column to start on
n = length(A);
for i = 2:1:n-1
if A == 0;
A = [ A(:, 1:destinationCol)...
A(:, desiredCol+1:destinationCol)...
A(:, desiredCol)...
A(:, destinationCol+1:end) ];
end
end
A = [...] retrieved from Move a set of N-rows to another column in MATLAB
Any hints would be much appreciated. If you need further explanation, let me know!
Thanks!
Given our discussion in the comments, all you need is to use reshape which converts a matrix of known dimensions into an output matrix with specified dimensions provided that the number of elements match. You wish to transform a vector which has a set amount of repeating patterns into a matrix where each column has one of these repeating instances. reshape creates a matrix in column-major order where values are sampled column-wise and the matrix is populated this way. This is perfect for your situation.
Assuming that you already know how many "repeats" you're expecting, we call this An, you simply need to reshape your vector so that it has T = n / An rows where n is the length of the vector. Something like this will work.
n = numel(A); T = n / An;
Aa = reshape(A, T, []);
Bb = reshape(B, T, []);
The third parameter has empty braces and this tells MATLAB to infer how many columns there will be given that there are T rows. Technically, this would simply be An columns but it's nice to show you how flexible MATLAB can be.
If you say you already know the repeated subvector, and the number of times it repeats then it is relatively straight forward:
First make your new A matrix with the repmat function.
Then remap your B vector to the same size as you new A matrix
% Given that you already have the repeated subvector Asub, and the number
% of times it repeats; An:
Asub = [0;1;2;3;4];
An = 4;
lengthAsub = length(Asub);
Anew = repmat(Asub, [1,An]);
% If you can assume that the number of elements in B is equal to the number
% of elements in A:
numberColumns = size(Anew, 2);
newB = zeros(size(Anew));
for i = 1:numberColumns
indexStart = (i-1) * lengthAsub + 1;
indexEnd = indexStart + An;
newB(:,i) = B(indexStart:indexEnd);
end
If you don't know what is in your original A vector, but you do know it is repetitive, if you assume that the pattern has no repeats you can use the find function to find when the first element is repeated:
lengthAsub = find(A(2:end) == A(1), 1);
Asub = A(1:lengthAsub);
An = length(A) / lengthAsub
Hopefully this fits in with your data: the only reason it would not is if your subvector within A is a pattern which does not have unique numbers, such as:
A = [0;1;2;3;2;1;0; 0;1;2;3;2;1;0; 0;1;2;3;2;1;0; 0;1;2;3;2;1;0;]
It is worth noting that from the above intuitively you would have lengthAsub = find(A(2:end) == A(1), 1) - 1;, But this is not necessary because you are already effectively taking the one off by only looking in the matrix A(2:end).

Leave one out crossvalind in Matlab

I have extracted HOG features for male and female pictures, now, I'm trying to use the Leave-one-out-method to classify my data.
Due the standard way to write it in Matlab is:
[Train, Test] = crossvalind('LeaveMOut', N, M);
What I should write instead of N and M?
Also, should I write above code statement inside or outside a loop?
this is my code, where I have training folder for Male (80 images) and female (80 images), and another one for testing (10 random images).
for i = 1:10
[Train, Test] = crossvalind('LeaveMOut', N, 1);
SVMStruct = svmtrain(Training_Set (Train), train_label (Train));
Gender = svmclassify(SVMStruct, Test_Set_MF (Test));
end
Notes:
Training_Set: an array consist of HOG features of training folder images.
Test_Set_MF: an array consist of HOG features of test folder images.
N: total number of images in training folder.
SVM should detect which images are male and which are female.
I will focus on how to use crossvalind for the leave-one-out-method.
I assume you want to select random sets inside a loop. N is the length of your data vector. M is the number of randomly selected observations in Test. Respectively M is the number of observations left out in Train. This means you have to set N to the length of your training-set. With M you can specify how many values you want in your Test-output, respectively you want to left out in your Train-output.
Here is an example, selecting M=2 observations out of the dataset.
dataset = [1 2 3 4 5 6 7 8 9 10];
N = length(dataset);
M = 2;
for i = 1:5
[Train, Test] = crossvalind('LeaveMOut', N, M);
% do whatever you want with Train and Test
dataset(Test) % display the test-entries
end
This outputs: (this is generated randomly, so you won't have the same result)
ans =
1 9
ans =
6 8
ans =
7 10
ans =
4 5
ans =
4 7
As you have it in your code according to this post, you need to adjust it for a matrix of features:
Training_Set = rand(10,3); % 10 samples with 3 features each
N = size(Training_Set,1);
M = 2;
for i = 1:5
[Train, Test] = crossvalind('LeaveMOut', N, 2);
Training_Set(Train,:) % displays the data to train
end

Every possible difference among multiple vectors

I have 3 vectors, v1, v2, v3. What I want to get is the difference between every possible pair of them, that is, v1-v2, v1-v3, v2-v3. How can I do this without looping in matlab?
Thank you.
Just use nchoosek to generate the combinations first and then use them to index into your array of row-vectors:
Test case:
numVectors = 3;
dim = 5;
Vs = rand(numVectors, dim);
Actual computation:
combs = nchoosek(1:size(Vs,1), 2);
differences = Vs(combs(:,1),:) - Vs(combs(:,2),:);
The above creates 3 random row vectors of dimension 5. So in your case, you may want to replace the creation of the random matrix with Vs = [v1; v2; v3]; if your vectors are row vectors; or transpose the vectors using Vs = [v1, v2, v3].'; if your data are column vectors.
Using bsxfun:
clear
clc
%// Sample vectors.
v1 = [1 2];
v2 = [10 20];
v3 = [0 0];
Out = bsxfun(#minus,[v1 v2 v3], [v1 v2 v3].')
Out =
0 1 9 19 -1 -1
-1 0 8 18 -2 -2
-9 -8 0 10 -10 -10
-19 -18 -10 0 -20 -20
1 2 10 20 0 0
1 2 10 20 0 0
Reasoning: Each difference is computed starting from the 1st element of the 1st vector until the 2nd element of the last vector.
The 1st column contains all the differences for the 1st element of the 1st vector, i.e. (1 -1), (1-2), (1-10), (1 - 20), (1 - 0), (1 - 0).
Then 2nd column, same thing but this time with the 2: (2 - 1), (2 - 2), (2 - 10), and so on.
Sorry if my explanations are unclear haha I don't know the right terms in english. Please ask for more details.
Code
%// Concatenate all vectors to form a 2D array
V = cat(2,v1(:),v2(:),v3(:),v4(:),v5(:))
N = size(V,2) %// number of vectors
%// Find all IDs of all combinations as x,y
[y,x] = find(bsxfun(#gt,[1:N]',[1:N])) %//'
%// OR [y,x] = find(tril(true(size(V,2)),-1))
%// Use matrix indxeing to collect vector data for all combinations with those
%// x-y IDs from V. Then, perform subtractions across them for final output
diff_array = V(:,x) - V(:,y)
Few points about the code
bsxfun with find gets us the IDs for forming pairwise combinations.
We use those IDs to index into the 2D concatenated array and perform subtractions between them to get the final output.
Bonus Stuff
If you look closely into the part where it finds the IDs of all combinations, that is basically nchoosek(1:..,2).
So, basically one can have alternatives to nchoosek(1:N,2) as:
[Y,X] = find(bsxfun(#gt,[1:N]',[1:N]))
[y,x] = find(tril(true(N),-1))
with [X Y] forming those pairwise combinations and might be interesting to benchmark them!

Finding maxima in 2D matrix along certain dimension with indices

I have a <206x193> matrix A. It contains the values of a parameter at 206 different locations at 193 time steps. I am interested in the maximum value at each location over all times as well as the corresponding indices. I have another matrix B with the same dimensions of A and I'm interested in values for each location at the time that A's value at that location was maximal.
I've tried [max_val pos] = max(A,[],2), which gives the right maximum values, but A(pos) does not equal max_val.
How exactly does this function work?
I tried a smaller example as well. Still I don't understand the meaning of the indices....
>> H
H(:,:,1) =
1 2
3 4
H(:,:,2) =
5 6
7 8
>> [val pos] = max(H,[],2)
val(:,:,1) =
2
4
val(:,:,2) =
6
8
pos(:,:,1) =
2
2
pos(:,:,2) =
2
2
The indices in idx represent the index of the max value in the corresponding row. You can use sub2ind to create a linear index if you want to test if A(pos)=max_val
A=rand(206, 193);
[max_val, idx]=max(A, [], 2);
A_max=A(sub2ind(size(A), (1:size(A,1))', idx));
Similarly, you can access the values of B with:
B_Amax=B(sub2ind(size(A), (1:size(A,1))', idx));
From your example:
H(:,:,2) =
5 6
7 8
[val pos] = max(H,[],2)
val(:,:,2) =
6
8
pos(:,:,2) =
2
2
The reason why pos(:,:,2) is [2; 2] is because the maximum is at position 2 for both rows.
max is a primarily intended for use with vectors. In normal mode, even the multi-dimensional arrays are treated as a series of vectors along which the max function is applied.
So, to get the values in B at each location at the time where A is maximum, you should
// find the maximum values and positions in A
[c,i] = max(A, [], 2);
// iterate along the first dimension, to retrieve the corresponding values in B
C = [];
for k=1:size(A,1)
C(k) = B(k,i(k));
end
You can refer to #Jigg's answer for a more concise way of creating matrix C