K-means Coding Implementation - MATLAB

I am looking for an implementation of k-means that will output which cluster each row of data belongs to.
I have found other links like Matlab: K-means clustering, but they do not help.
So I am looking for something like this. If my data is as follows:
1, 2, 4, 5, 6, 7, 8, 9
1, 4, 7, 8, 9, 4, 5, 6
I would like to know that Row 1 Belongs to Cluster A and Row 2 Belongs to Cluster B and so on.
Does anyone know if Matlab can show me that, if so how? If not does anyone have a link to some code that would be able to do that?

Yes, the kmeans command from the Statistics Toolbox will do this. Here's an example using the Fisher iris dataset that is supplied with the toolbox. meas is a 150x4 dataset of four anatomical variables (sepal length, sepal width, petal length, petal width) measured on 150 irises. The output variable, which I've here called clusterIndex, tells you which cluster each row of the dataset falls into, and can be used, for example, as a variable to color points in a plot.
>> load fisheriris
>> k = 3;
>> clusterIndex = kmeans(meas,k);
>> scatter(meas(:,1),meas(:,2),[],clusterIndex,'filled')
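Outside MATLAB, the same row-to-cluster assignment can be sketched in plain NumPy. This is a minimal Lloyd's-algorithm k-means (the function name, seed, and iteration count are my own choices, not from any library), run on the two example rows from the question:

```python
import numpy as np

def kmeans_assign(data, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm k-means; returns a cluster index per row."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct rows of the data.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every row to every centroid, then nearest-centroid labels.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned rows.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return labels

data = np.array([[1, 2, 4, 5, 6, 7, 8, 9],
                 [1, 4, 7, 8, 9, 4, 5, 6]], dtype=float)
labels = kmeans_assign(data, k=2)
print(labels)  # one cluster index per row; with only two rows, each gets its own cluster
```

With real data you would normally reach for kmeans in MATLAB or an off-the-shelf library rather than hand-rolling the loop; the sketch is only meant to show that the returned vector has exactly one cluster index per input row.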

Related

Coloring the clusters with the same colors as defined for the ground truth, for visualization

Example (platform: MATLAB):
Ground_Truth_Indices = [ 1, 1, 1, 2, 2, 2, 3, 3, 3];
For each unique index in the GT, I have defined a color array.
Color_Array = [ 0, 255, 0; 255, 0, 0; 0, 0, 255]; % assuming (in this example) there are at most 3 clusters
Next, I use a clustering algorithm (DBSCAN in my case) and it gives the following indices:
Clustered_Indices = [2, 2, 2, 3, 3, 3, 1, 1, 1];
Now, I need to visualize the results alongside the ground truth.
But the indices obtained after clustering differ from the ground-truth indices, so with the color array defined above, the ground truth and the obtained clusters would not be colored with the same pattern during visualization. Is there any solution that would make both colorings consistent?
Figure with ground truth and obtained clusters
The same is illustrated in the figure linked above (not a MATLAB plot; created purely for illustration), where Cluster 1 should have the same color in the ground truth as in the obtained cluster results. But that is not the case here, because of the index numbers associated with the color array defined.
Note: The indices obtained after clustering cannot be predefined; they depend on the clustering algorithm and the clustering input.
You can use the Kuhn-Munkres maximum matching (Hungarian Algorithm) to find the best 1:1 alignment of the cluster labels.
As the generated clustering may have a different number of clusters, you'll need a robust implementation that can find alignments in non-square matrices.
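As a sketch of that alignment (assuming SciPy is available; scipy.optimize.linear_sum_assignment accepts rectangular cost matrices, which covers differing cluster counts), using the example indices from the question:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

gt = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])    # Ground_Truth_Indices
pred = np.array([2, 2, 2, 3, 3, 3, 1, 1, 1])  # Clustered_Indices

gt_ids, pred_ids = np.unique(gt), np.unique(pred)

# Contingency matrix: overlap[i, j] = how often predicted label j
# coincides with ground-truth label i.
overlap = np.array([[np.sum((gt == g) & (pred == p)) for p in pred_ids]
                    for g in gt_ids])

# The Hungarian algorithm maximises total overlap (by minimising -overlap).
row_ind, col_ind = linear_sum_assignment(-overlap)
mapping = {pred_ids[c]: gt_ids[r] for r, c in zip(row_ind, col_ind)}

aligned = np.array([mapping[p] for p in pred])
print(aligned)  # [1 1 1 2 2 2 3 3 3] -- colour-consistent with the ground truth
```

With the labels aligned this way, the same Color_Array row is used for matching clusters in both plots.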
But you may be more interested in visualizing the differences between the clusterings. I've seen this in the following paper, but I am not sure whether it is usable beyond toy data sets:
Evaluation of Clusterings - Metrics and Visual Support
Elke Achtert, Sascha Goldhofer, +2 authors, Arthur Zimek
IEEE 28th International Conference on Data Engineering (ICDE), 2012
DOI: 10.1109/ICDE.2012.128
(Sorry for the incomplete reference; blame Semantic Scholar, but that was the easiest way to link a figure from the paper, and I can't take a better screenshot on this device.)
This seems to visualize the differences between a k-means and an EM clustering, where grey points are those on which the two clusterings agree. The approach seems to work on pairs of points, just as the evaluation measures do.
Inspired by the answer to this post:
How can I match up cluster labels to my 'ground truth' labels in Matlab, I have the following solution code for my question:
N = length(Ground_Truth_Indices);
cluster_names = unique(Clustered_Indices);
accuracy = 0;
maxInd = 1;
perm = perms(unique(Ground_Truth_Indices));   % all possible relabellings of the clusters
[perm_nrows, perm_ncols] = size(perm);
true_labels = Ground_Truth_Indices;
for i = 1:perm_nrows
    flipped_labels = zeros(1, N);
    for cl = 1:perm_ncols                     % was perm_ncol (undefined variable)
        flipped_labels(Clustered_Indices == cluster_names(cl)) = perm(i, cl);
    end
    % Both vectors are 1xN rows; the original transposed one operand,
    % which produces an NxN comparison instead of elementwise equality.
    testAcc = sum(flipped_labels == Ground_Truth_Indices) / N;
    if testAcc > accuracy
        accuracy = testAcc;
        maxInd = i;
        true_labels = flipped_labels;
    end
end
where 'true_labels' contains the re-arranged labels for the variable 'Clustered_Indices', in accordance with the variable 'Ground_Truth_Indices'.
This code, as explained in the original post, uses permutation-based matching. (It works well for the example I gave in this post; I also tested it with other variations.) But when the number of clusters becomes large, this code does not work well, since perms generates all k! possible relabellings. What do you think about this code? Is there a better way to write it, or to optimize it?
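For comparison, the permutation matching above can be sketched compactly in Python (same example data as in the question); it still enumerates every relabelling, which is exactly why it stops being practical as the number of clusters grows:

```python
import numpy as np
from itertools import permutations

gt = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])    # Ground_Truth_Indices
pred = np.array([2, 2, 2, 3, 3, 3, 1, 1, 1])  # Clustered_Indices

names = np.unique(pred)
best_acc, best_labels = 0.0, pred
# Try every relabelling of the predicted clusters -- O(k!) in the cluster count.
for perm in permutations(np.unique(gt)):
    relabelled = np.zeros_like(pred)
    for cl, new in zip(names, perm):
        relabelled[pred == cl] = new
    acc = np.mean(relabelled == gt)
    if acc > best_acc:
        best_acc, best_labels = acc, relabelled

print(best_acc, best_labels)  # 1.0 [1 1 1 2 2 2 3 3 3]
```

The Hungarian-algorithm suggestion in the earlier answer avoids the factorial blow-up: building a contingency matrix and solving the assignment is polynomial in the number of clusters.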

How to get a 5-Dimensional output after torch.nn.Conv2d layer in PyTorch?

I am working on a project based on the OpenPose research paper that I read two weeks ago. In it, the model is supposed to give a 5-dimensional output. For example, torch.nn.Conv2d gives a 4-D output of the following shape: (Batch_size, n_channels, input_width, input_height). What I need is an output of the following shape: (Batch_size, n_channels, input_width, input_height, 2). Here 2 is a fixed number, not subject to any changes.
The 2 is there because each entry is a 2-dimensional vector; for each channel at every pixel position there are 2 values, hence the added dimension.
What will be the best way to do this?
I thought about having 2 separate branches for each of the vector values, but the network is very deep and I would like to be as computationally efficient as possible.
So you are effectively looking to compute feature maps which are interpreted as 2-dimensional vectors. Unless there is something mathematically fancy happening there, you are probably fine with just having twice as many output channels, (batch_size, n_channels * 2, width, height), and then reshaping it as
output5d = output4d.reshape(
    output4d.shape[0],
    output4d.shape[1] // 2,  # integer division; a plain / would produce a float shape
    2,
    output4d.shape[2],
    output4d.shape[3]
)
which gives you a shape of (batch_size, n_channels, 2, width, height). If you really want to have the 2 as the last dimension, you can use transpose:
output5d = output5d.transpose(2, 4)
Note, though, that transpose(2, 4) also swaps the two spatial dimensions; output5d.permute(0, 1, 3, 4, 2) moves the 2 to the end while keeping width and height in order. If there is no strong argument in favor of this layout, I would suggest you do not transpose, as it always costs a bit of performance.
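The shape arithmetic can be checked with a NumPy stand-in (sizes here are arbitrary, picked only for the demonstration; in PyTorch the final step would be .permute(0, 1, 3, 4, 2)):

```python
import numpy as np

batch, n_channels, width, height = 2, 3, 4, 5
# Stand-in for the conv output with twice the channels.
output4d = np.zeros((batch, n_channels * 2, width, height))

# (batch, 2*C, W, H) -> (batch, C, 2, W, H); note the integer division.
output5d = output4d.reshape(batch, output4d.shape[1] // 2, 2, width, height)

# Move the 2-vector to the end without swapping the spatial axes:
# (batch, C, 2, W, H) -> (batch, C, W, H, 2).
output5d = output5d.transpose(0, 1, 3, 4, 2)
print(output5d.shape)  # (2, 3, 4, 5, 2)
```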

dual function not working when using several measures in a plot

I am making a barplot with two measures both using the dual() function like so:
IF(AMOUNT>4,AMOUNT,dual(num('5','<#'),4))
I want the number formatting to be by measure expression so that all numbers below 5 are displayed as "<5" instead of their actual value.
For example, if:
AMOUNT = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
AMOUNT displayed in plot = [10, 9, 8, 7, 6, 5, <5, <5, <5, <5]
When I use only one of the measures the plot works fine and everything is displayed correctly.
However, if I use both measures, the chart displays "4" (actual value) instead of "<5" (the measure expression).
Does anyone know how to make the plot display the measure-expression when using several measures?
I am not sure exactly how you want the chart to look (I understand that AMOUNT is a dimension, but what is the measure - bar height?). Maybe you could share an example chart (it can even be drawn in Paint) and a screenshot of the data model. Generally you should modify only the dimension and set it just as:
=If(AMOUNT<5,'<5',AMOUNT)
Leave the expression as it is, so for example:
Count(Field)
Sum(Field)
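Stripped of Qlik syntax, the intended display mapping of the If expression above is just this (a plain-Python illustration, not runnable in Qlik):

```python
amount = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# Display '<5' for any value below 5, the value itself otherwise.
displayed = ['<5' if a < 5 else str(a) for a in amount]
print(displayed)  # ['10', '9', '8', '7', '6', '5', '<5', '<5', '<5', '<5']
```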

Using a neural network to provide recommendations

So I started playing with FANN (http://leenissen.dk/) in order to create a simple recommendation engine.
For example,
User X has relations to records with ids [1, 2, 3]
Other users have relations to the following ids:
User A: [1, 2, 3, 4]
User B: [1, 2, 3, 4]
It would be natural, then, that there's some chance user X would be interested in record with id 4 as well and that it should be the desired output of the recommendation engine.
It feels like this would be something a neural network could accomplish. However, from trying out FANN and googling around, it seems there needs to be some mathematical relation between the data and the results. Here, with ids, there is none; the ids could just as well be any symbols.
Question: Is it possible to solve this kind of problem with a neural network and where should I begin to search for a solution?
What you are looking for is some kind of recurrent neural network; a network that stores 'context' in some way or another. Examples of such networks would be LSTM and GRU. So basically, you have to input your data sequentially. Based on the context and the current input, the network will predict which label is most likely.
it seems there needs to be some mathematical relation between the data and the results. Here, with ids, there is none; the ids could just as well be any symbols.
There definitely is a relation between the data and the results, and it can be expressed through weights and biases.
So how would it work? First you one-hot encode your inputs and outputs. Basically, you want to predict which label is most likely after the set of labels a user has already interacted with.
If you have 5 labels: A, B, C, D, E, that means you will have 5 inputs and 5 outputs: [0, 0, 0, 0, 0].
If your label is A, the array will be [1, 0, 0, 0, 0], if it's D, it will be [0, 0, 0, 1, 0].
So the key to LSTMs and GRUs is that the data should be sequential. Basically, you input all the labels watched, one by one. So if a user has watched A, B and C:
activate: [1,0,0,0,0]
activate: [0,1,0,0,0]
// the output of this activation will be the next predicted label
activate: [0,0,1,0,0]
// output: [0.1, 0.3, 0.2, 0.7, 0.5], so the next label is D
And you should always train the network so that the output for input t is input t+1.
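The one-hot encoding and the "output at step t is the input at step t+1" training pairs can be sketched like this (plain NumPy, labels A-E as in the example; the network itself is omitted):

```python
import numpy as np

labels = ['A', 'B', 'C', 'D', 'E']
index = {lab: i for i, lab in enumerate(labels)}

def one_hot(lab):
    """Return the one-hot vector for a label, e.g. 'A' -> [1, 0, 0, 0, 0]."""
    v = np.zeros(len(labels))
    v[index[lab]] = 1.0
    return v

# A user's watch history: each input at step t is paired with the label at t+1.
history = ['A', 'B', 'C', 'D']
pairs = [(one_hot(a), one_hot(b)) for a, b in zip(history, history[1:])]
for x, y in pairs:
    print(x, '->', y)
```

Each (input, target) pair would then be fed to the recurrent network in sequence order.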

Calculating the Local Ternary Pattern of a depth image

I found the details and an implementation of the Local Ternary Pattern (LTP) in Calculating the Local Ternary Pattern of an image?. I want to ask for more details: what is the best way to choose the threshold t? I am also confused about the role of reorder_vector = [8 7 4 1 2 3 6 9];
Unfortunately there isn't a good way to figure out what the threshold should be for LTPs. It's mostly trial and error or experimentation. However, I can suggest making the threshold adaptive. You can use Otsu's algorithm to dynamically determine the best threshold for your image. This assumes that the distribution of intensities in the image is bimodal; in other words, that there is a clear separation between objects and background. MATLAB has an implementation of this in the graythresh function. However, it generates a threshold between 0 and 1, so you will need to multiply the result by 255, assuming that the type of your image is uint8.
Therefore, do:
t = 255*graythresh(im);
im is the image for which you want to compute the LTPs. Now, I can certainly provide insight into what reorder_vector is doing. Look at the following figure on how to calculate LTPs:
(source: hindawi.com)
When we generate the ternary code matrix (the matrix in the middle), we need to generate an 8-element sequence that doesn't include the middle of the neighbourhood. We start from the east-most element (row 2, column 3), then traverse the elements in counter-clockwise order. The reorder_vector variable allows you to select the specific elements that respect that order. If you recall, MATLAB accesses matrices using column-major linear indices. Specifically, given a 3 x 3 matrix, we can access an element using a number from 1 to 9, and the memory is laid out like so:
1 4 7
2 5 8
3 6 9
Therefore, the first element of reorder_vector is index 8, which is the east-most element. Next is index 7, which is the top-right element, then index 4, which is the north-facing element, then 1, 2, 3, 6 and finally 9.
If you follow these numbers, you will determine how I got the reorder_vector:
reorder_vector = [8 7 4 1 2 3 6 9];
By using this variable for accessing each 3 x 3 local neighbourhood, we would thus generate the correct 8 element sequence that respects the ordering of the ternary code so that we can proceed with the next stage of the algorithm.
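The effect of reorder_vector can be verified in Python: flatten a 3 x 3 neighbourhood in column-major (Fortran) order, as MATLAB stores it, and apply the zero-based equivalent of those indices (the sample values below are arbitrary):

```python
import numpy as np

# An arbitrary 3 x 3 neighbourhood; the centre (50) must not appear in the sequence.
nbhd = np.array([[10, 20, 30],
                 [40, 50, 60],
                 [70, 80, 90]])

# MATLAB's reorder_vector = [8 7 4 1 2 3 6 9] uses 1-based column-major indices;
# subtract 1 for the zero-based equivalent.
reorder = np.array([8, 7, 4, 1, 2, 3, 6, 9]) - 1
seq = nbhd.flatten(order='F')[reorder]
print(seq)  # [60 30 20 10 40 70 80 90]: east, NE, N, NW, W, SW, S, SE
```

The printed sequence starts at the east neighbour and proceeds counter-clockwise, exactly the traversal described above, and it skips the centre value.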