How can I cluster 3-dimensional data? - cluster-analysis

For a project in my data mining class, I am to perform fuzzy c-means clustering on a data set in which each data point has 3 axes (I googled it, and that's apparently the correct plural of 'axis'). I'm not sure how to do this, especially given the clustering algorithm I'm using. Here's an example of the data set I'm using:
-         x  y  z
apple     2  5  5
banana    3  2  5
carrot    1  4  4
durian    6  7  1
eggplant  0  3  6
Any help or resources would be greatly appreciated!
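
One way to approach this, as a minimal sketch rather than a definitive answer: fuzzy c-means only needs a distance between points, so three coordinates are handled exactly like two. Put each item in a row and each axis in a column. The number of clusters (c = 2), the fuzzifier (m = 2), the iteration cap and the tolerance are all assumptions chosen for illustration. If the Fuzzy Logic Toolbox is available, its fcm function should do the same job.

```matlab
% Rows are apple, banana, carrot, durian, eggplant; columns are x, y, z.
data = [2 5 5; 3 2 5; 1 4 4; 6 7 1; 0 3 6];
c = 2;                      % number of clusters (assumed)
m = 2;                      % fuzzifier (assumed, a common choice)
n = size(data, 1);

rng(0);                                 % fixed seed so runs are repeatable
U = rand(c, n);                         % random initial memberships
U = U ./ repmat(sum(U, 1), c, 1);       % each column must sum to 1

for iter = 1:100
    Um = U .^ m;
    % Centers are membership-weighted means of the points.
    centers = (Um * data) ./ repmat(sum(Um, 2), 1, size(data, 2));
    % Euclidean distance from every center to every point (c-by-n).
    D = zeros(c, n);
    for k = 1:c
        diffs = data - repmat(centers(k, :), n, 1);
        D(k, :) = sqrt(sum(diffs .^ 2, 2))';
    end
    D = max(D, eps);                    % guard against division by zero
    W = D .^ (-2 / (m - 1));
    Unew = W ./ repmat(sum(W, 1), c, 1);
    converged = max(abs(Unew(:) - U(:))) < 1e-6;
    U = Unew;
    if converged
        break;
    end
end

[~, label] = max(U, [], 1);             % hard label = cluster with highest membership
disp(centers); disp(U); disp(label);
```

Each column of U holds one item's degree of membership in every cluster; label is only a convenience for reading off the strongest one.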

Related

How to draw a Histogram in Matlab

I have a set of about 35,000 data points. They are signal strengths received at a single location over different time intervals. I want to plot a histogram of these data. The X-axis should show the signal strength and the Y-axis the probability. The histogram will consist of bars, each giving a signal strength and its probability.
For example, suppose I have the following data:
a= [ 1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 3 3 4 4 4 5 6 6 6 6 6 6 6 6 6 6 6]
How can I plot the graph with the data on the X-axis and the probability on the Y-axis? Any help will be appreciated. Thanks!
This should work just fine if you don't want to use some predefined functions:
una=unique(a);
normhist=hist(a,una)/numel(a);
figure, stairs(una,normhist)
una holds only the unique values of a. normhist lies between 0 and 1 and gives the probability of occurrence of each signal value, because the counts are divided by the number of elements in the data.
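
An alternative sketch that counts each distinct value directly, so the result does not depend on how hist places its bins. The variable names vals, counts and prob are illustrative, and bar is used here instead of stairs so that each signal strength gets its own bar.

```matlab
a = [1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 3 3 4 4 4 5 6 6 6 6 6 6 6 6 6 6 6];

[vals, ~, idx] = unique(a);         % idx maps every sample to its unique value
counts = accumarray(idx(:), 1)';    % number of occurrences of each unique value
prob = counts / numel(a);           % relative frequencies, which sum to 1

figure; bar(vals, prob);
xlabel('Signal strength'); ylabel('Probability');
```

Because the counts come from unique and accumarray, this also works when the signal strengths are not evenly spaced.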

Accessing indexes as first columns of matrix in Matlab

I have output from a computational chemistry program (Gaussian09) that contains sets of force constant (FC) data. The data are arranged with indices in the first 2-4 columns (quadratic, cubic and quartic FCs are calculated). As an example, the cubic FCs look something like this; MATLAB has read them in successfully, so I have the correct matrix:
cube=[
1 1 1 5 5 5
1 1 2 6 6 6
.
.
4 1 1 8 8 8
4 2 1 9 9 9
4 3 1 7 7 7 ]
I need a way to access the last 3 columns when feeding in the indices of the first 3 columns. Something along the lines of
>>index=find(cube(:,1)==4 & cube(:,2)==3 & cube(:,3)==1);
This would give me the row number of the data with index [ 4 3 1 ] and allow me to read out the values [7 7 7], which I need within loops to calculate anharmonic frequencies.
Is there a way to do this without a bunch of loops?
Thanks in advance,
Ben
You have already found one way to solve this, by using & in your expression (allowing you to make non-scalar comparisons).
Another way is to use ismember:
index = find(ismember(cube(:,1:3),[4 3 1],'rows'));
Note that in many cases, you may not even need the call to find: the logical vector returned by the comparisons or by ismember can be used directly to index into another array.
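
A short sketch of that logical-indexing idea. The matrix below contains only the rows actually shown in the question (the elided rows are left out), and the names mask and vals are illustrative.

```matlab
cube = [1 1 1 5 5 5
        1 1 2 6 6 6
        4 1 1 8 8 8
        4 2 1 9 9 9
        4 3 1 7 7 7];

% One logical value per row: true where the first three columns equal [4 3 1].
mask = ismember(cube(:, 1:3), [4 3 1], 'rows');

% Use the mask directly as a row index; no find and no loop needed.
vals = cube(mask, 4:6);
disp(vals)
```

If the index triple does not occur, vals is simply empty, which is easy to test with isempty inside a loop.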

Strange behaviour of MATLAB combnk function

I am trying to generate all combinations of 2 elements in a given range of numbers. I am using the 'combnk' function as follows.
combnk(1:4,2)
ans =
3 4
2 4
2 3
1 4
1 3
1 2
combnk(1:6,2)
ans =
1 2
1 3
1 4
1 5
1 6
2 3
2 4
2 5
2 6
3 4
3 5
3 6
4 5
4 6
5 6
The order of combinations returned appears to change. I need to know the order in advance for my program to work properly.
Is there any solution to make sure I get the combinations in a consistent order?
Also, why is MATLAB showing this strange behavior?
The only solution I can think of so far is to check the first entry of the result matrix and, if necessary, flip the matrix upside down using the 'flipud' function.
Update: With a little experimenting I noticed that the reverse order occurs only when the set of numbers has fewer than 6 elements. This is why combnk(1:6,2) produces the 'correct' order, whereas combnk(1:5,2) produces the results backwards. This is still a big problem.
You could try nchoosek instead of combnk. I don't have the MATLAB Statistics Toolbox (only Octave), so I don't know whether nchoosek has any significant disadvantages.
This will solve the ordering issue:
a=combnk(1:4,2);
[~,idx]=sortrows(a);
aNew=a(idx,:);
I don't know why MATLAB is showing this behavior.
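
Putting the two suggestions together, here is a sketch that does not rely on whatever order the generator happens to return: sort the rows once and the result is always in ascending order. nchoosek is used because it ships with base MATLAB; the same sortrows call can wrap combnk instead. The name pairs is illustrative.

```matlab
% sortrows orders by the first column, then breaks ties with the second,
% so the output order is the same no matter how the rows arrive.
pairs = sortrows(nchoosek(1:4, 2));
disp(pairs)
```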

data rearrangement in matlab

I have the following code
[X1,Y1]=meshgrid(1:5,1:5);
z=X1.^2+Y1.^2
[X2,Y2]=meshgrid([1 2 3 3.5 4 5],[1 2 3 3.5 4 5]);
z2=interp2(X1,Y1,z,X2,Y2)
mesh(X2,Y2,z2)
Is there any way to structure the data such that the command mesh(z2) would produce the same result?
I am not exactly sure what you are asking, but you could certainly cut out a few lines in your code to do the same thing:
[X2,Y2]=meshgrid([1 2 3 3.5 4 5],[1 2 3 3.5 4 5]);
z2=X2.^2+Y2.^2;
mesh(X2,Y2,z2);
Or, you could just do:
[X1,Y1]=meshgrid(1:5,1:5);
z=X1.^2+Y1.^2;
mesh(z);
For this particular example, I don't think you will see much of a difference between the two methods.
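
A small numerical check, offered as a sketch, suggests the two versions are close but not identical: interp2 interpolates linearly by default, so at the added grid value 3.5 it does not reproduce X.^2+Y.^2 exactly. The names v, zInterp, zDirect and maxDiff are illustrative.

```matlab
[X1, Y1] = meshgrid(1:5, 1:5);
z = X1.^2 + Y1.^2;

v = [1 2 3 3.5 4 5];
[X2, Y2] = meshgrid(v, v);

zInterp = interp2(X1, Y1, z, X2, Y2);   % linear interpolation (default method)
zDirect = X2.^2 + Y2.^2;                % the function evaluated on the finer grid

% Largest pointwise gap between the two surfaces.
maxDiff = max(abs(zInterp(:) - zDirect(:)));
fprintf('largest difference: %g\n', maxDiff);
```

The largest gap is at (3.5, 3.5), where bilinear interpolation gives 25 and the formula gives 24.5. Also note that mesh called with a single matrix plots against row and column indices, so the unevenly spaced value 3.5 would be drawn as if the grid were evenly spaced.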

Permutation vectors from the CLUSTERGRAM object (MATLAB)

I'm using the CLUSTERGRAM object from the Bioinformatics Toolbox (ver 3.7).
MATLAB version R2011a.
I'd like to get the row and column permutation vectors for a clustergram, as I can with the dendrogram function:
x = magic(10);
>> [~,~,permrows] = dendrogram(linkage(x,'average','euc'))
permrows =
9 10 6 7 8 1 2 4 5 3
>> [~,~,permcols] = dendrogram(linkage(x','average','euc'))
permcols =
6 7 8 9 2 1 3 4 5 10
I found that the clustering from clustergram and dendrogram is not the same, most probably due to the optimal leaf ordering calculation (which I don't want to disable).
For example, for clustergram from:
clustergram(x)
('average' and 'euclidean' are the default methods for clustergram)
the vectors (as on the figure attached) should be:
permrows = [1 2 4 5 3 10 9 6 7 8];
permcols = [1 2 8 9 6 7 10 5 4 3];
So, how can I get those vectors programmatically? Is anybody familiar with this object?
Can anyone suggest a good alternative? I know I can create a similar figure by combining the imagesc and dendrogram functions, but the leaf ordering is much better (optimal) in clustergram than in dendrogram.
From looking at the documentation, I guess that get(gco,'ColumnLabels') and get(gco,'RowLabels'), where gco is the clustergram object, should give you the reordered labels. Note that the corresponding set methods take the labels in their original order and reorder them internally.
Consequently, if you have used custom labels (set(gco,'RowLabels',originalLabels)),
[~,permrows] = ismember(get(gco,'RowLabels'),originalLabels)
should return the row permutation.
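
The ismember step can be tried without the toolbox. In the sketch below, reorderedLabels is a stand-in for whatever get(...,'RowLabels') returns on a real clustergram object; the label names and the example order are made up for illustration, and the labels are assumed to be unique.

```matlab
originalLabels = {'r1', 'r2', 'r3', 'r4', 'r5'};

% Stand-in for the reordered labels a clustergram object would report.
reorderedLabels = originalLabels([3 1 5 2 4]);

% For each reordered label, find its position in the original list.
[~, permrows] = ismember(reorderedLabels, originalLabels);
disp(permrows)
```

permrows recovers the order [3 1 5 2 4] used to build the stand-in, so originalLabels(permrows) reproduces reorderedLabels.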