fuzzy c means in matlab - cluster-analysis

I am clustering some data in matlab using the builtin fuzzy c means algorithm which returns C the cluster centers, U fuzzy partition matrix. So I know what the cluster centers are from C but how can I figure out which cluster center each data point belongs to? using the fuzzy partition matrix or some other way?

I know it is a very old question but someone else might find helpful if I give the answer.
The following example is from the Matlab help. There are 2 clusters in the example.
index1 is the indices of the data point that belong to the cluster 1, and index2 is similar. So, using this info what you need is easily obtained.
data = rand(100, 2);
[center,U,obj_fcn] = fcm(data, 2);
plot(data(:,1), data(:,2),'o');
maxU = max(U);
index1 = find(U(1,:) == maxU);
index2 = find(U(2, :) == maxU);
line(data(index1,1),data(index1, 2),'linestyle','none',...
'marker','*','color','g');
line(data(index2,1),data(index2, 2),'linestyle','none',...
'marker', '*','color','r');

Related

K-mean Clustering in IDL

I am an IDL beginner and I was wondering if I could get some help on clustering in IDL. I found a good example on Harris Geospatial that explains the method, however, I am confused on how to run the clustering on my own data (ASCII) to perform the K-mean analysis. How can I use my data instead of the 'random' function that generates random numbers
Below is the code I found on Harris:
n = 50
c1 = RANDOMN(seed, 3, n)
c1[0:1,*] -= 3
c2 = RANDOMN(seed, 3, n)
c2[0,*] += 3
c2[1,*] -= 3
c3 = RANDOMN(seed, 3, n)
c3[1:2,*] += 3
array = [[c1], [c2], [c3]]
; Compute cluster weights, using three clusters:
weights = CLUST_WTS(array, N_CLUSTERS = 3)
; Compute the classification of each sample:
result = CLUSTER(array, weights, N_CLUSTERS = 3)
Thank you.
you'll need to get your data into IDL. If it's a comma-separated (or other "delimiter") file, then you can just use READ_CSV. Or you could try using READ_ASCII but then you need to know the specific format. Either way, you just need to use one of the read routines.
https://www.harrisgeospatial.com/docs/READ_CSV.html

Using matlab FCM to cluster my own data

I am trying to use fcm (fuzzy C-means clustering) matlab tool, but I don't know how to put my own data. I am trying to cluster nodes based on distance from the center. So my data are x and y coordinates. I am basically trying to compare it with k-means this is how I did the k-means:
X=[x_users,y_users];
nc=20;
idx = kmeans(X,nc);
I need to know how to do the same thing with fcm, am sorry if my question is too naive.
Thanks,
fcm(X,nc);
will do it. For example:
data = rand(100,2);
nc = 2;
[center,U,obj_fcn] = fcm(data,nc);
plot(data(:,1),data(:,2),'o');
maxU = max(U);
index1 = find(U(1,:)== maxU);
index2 = find(U(2,:)== maxU);
line(data(index1,1),data(index1,2),'linestyle','none',...
'marker','*','color','g');
line(data(index2,1),data(index2,2),'linestyle','none',...
'marker', '*','color','r');
Using kmeans the answer would be like the following plot:
idx = kmeans(data,nc);
data1 = data(idx==1,:);
data2 = data(idx==2,:);
figure;plot(data1(:,1),data1(:,2),'x');
hold on;plot(data2(:,1),data2(:,2),'or');

Optimizing code, removing "for loop"

I'm trying to remove outliers from a tick data series, following Brownlees & Gallo 2006 (if you may be interested).
The code works fine but given that I'm working on really long vectors (the biggest has 20m observations and after 20h it was not done computing) I was wondering how to speed it up.
What I did until now is:
I changed the time and date format to numeric double and I saw that it saves quite some time in processing and A LOT OF MEMORY.
I allocated memory for the vectors:
[n] = size(price);
x = price;
score = nan(n,'double'); %using tic and toc I saw that nan requires less time than zeros
trimmed_mean = nan(n,'double');
sd = nan(n,'double');
out_mat = nan(n,'double');
Here is the loop I'd love to remove. I read that vectorizing would speed up a lot, especially using long vectors.
for i = k+1:n
trimmed_mean(i) = trimmean(x(i-k:i-1 & i+1:i+k),10,'round'); %trimmed mean computed on the 'k' closest observations to 'i' (i is excluded)
score(i) = x(i) - trimmed_mean(i);
sd(i) = std(x(i-k:i-1 & i+1:i+k)); %same as the mean
tmp = abs(score(i)) > (alpha .* sd(i) + gamma);
out_mat(i) = tmp*1;
end
Here is what I was trying to do
trimmed_mean=trimmean(regroup_matrix,10,'round',2);
score=bsxfun(#minus,x,trimmed_mean);
sd=std(regroup_matrix,2);
temp = abs(score) > (alpha .* sd + gamma);
out_mat = temp*1;
But given that I'm totally new to Matlab, I don't know how to properly construct the matrix of neighbouring observations. I just think it should be shaped like: regroup_matrix= nan (n,2*k).
EDIT: To be specific, what I am trying to do (and I am not able to) is:
Given a column vector "x" (n,1) for each observation "i" in "x" I want to take the "k" neighbouring observations to "i" (from i-k to i-1 and from i+1 to i+k) and put these observations as rows of a matrix (n, 2*k).
EDIT 2: I made a few changes to the code and I think I am getting closer to the solution. I posted another question specific to what I think is the problem now:
Matlab: Filling up matrix rows using moving intervals from a column vector without a for loop
What I am trying to do now is:
[n] = size(price,1);
x = price;
[j1]=find(x);
matrix_left=zeros(n, k,'double');
matrix_right=zeros(n, k,'double');
toc
matrix_left(j1(k+1:end),:)=x(j1-k:j1-1);
matrix_right(j1(1:end-k),:)=x(j1+1:j1+k);
matrix_group=[matrix_left matrix_right];
trimmed_mean=trimmean(matrix_group,10,'round',2);
score=bsxfun(#minus,x,trimmed_mean);
sd=std(matrix_group,2);
temp = abs(score) > (alpha .* sd + gamma);
outmat = temp*1;
I have problems with the matrix_left and matrix_right creation.
j1, that I am using for indexing is a column vector with the indices of price's observations. The output is simply
j1=[1:1:n]
price is a column vector of double with size(n,1)
For your reshape, you can do the following:
idxArray = bsxfun(#plus,(k:n)',[-k:-1,1:k]);
reshapedArray = x(idxArray);
Thanks to Jonas that showed me the way to go I came up with this:
idxArray_left=bsxfun(#plus,(k+1:n)',[-k:-1]); %matrix with index of left neighbours observations
idxArray_fill_left=bsxfun(#plus,(1:k)',[1:k]); %for observations from 1:k I take the right neighbouring observations, this way when computing mean and standard deviations there will be no problems.
matrix_left=[idxArray_fill_left; idxArray_left]; %Just join the two matrices and I have the complete matrix of left neighbours
idxArray_right=bsxfun(#plus,(1:n-k)',[1:k]); %same thing as left but opposite.
idxArray_fill_right=bsxfun(#plus,(n-k+1:n)',[-k:-1]);
matrix_right=[idxArray_right; idxArray_fill_right];
idx_matrix=[matrix_left matrix_right]; %complete index matrix, joining left and right indices
neigh_matrix=x(idx_matrix); %exactly as proposed by Jonas, I fill up a matrix of observations from 'x', following idx_matrix indexing
trimmed_mean=trimmean(neigh_matrix,10,'round',2);
score=bsxfun(#minus,x,trimmed_mean);
sd=std(neigh_matrix,2);
temp = abs(score) > (alpha .* sd + gamma);
outmat = temp*1;
Again, thanks a lot to Jonas. You really made my day!
Thanks also to everyone that had a look to the question and tried to help!

MatLab - algorithm for finding inverse of matrix

I am trying to write an algorithm in MatLab which takes as its input a lower triangular matrix. The output should be the inverse of this matrix (which also should be in lower triangular form). I have almost managed to solve this, but one part of my algorithm still leaves me scratching my head. So far I have:
function AI = inverse(A)
n = length(A);
I = eye(n);
AI = zeros(n);
for k = 1:n
AI(k,k) = (I(k,k) - A(k,1:(k-1))*AI(1:(k-1),k))/A(k,k);
for i = k+1:n
AI(i,k) = (I(i,k) - (??????????????))/A(i,i);
end
end
I have marked with question marks the part I am unsure of. I have tried to find a pattern for this part of the code by writing out the procedure on paper, but I just can't seem to find a proper way to solve this part.
If anyone can help me out, I would be very grateful!
Here is my code to get the inverse of a lower triangular matrix by using row transformation:
function AI = inverse(A)
len = length(A);
I = eye(len);
M = [A I];
for row = 1:len
M(row,:) = M(row,:)/M(row,row);
for idx = 1:row-1
M(row,:) = M(row,:) - M(idx,:)*M(row,idx);
end
end
AI = M(:,len+1:end);
end
You can see how it's done on Octave's source. This seems to be implemented in different places depending on the class of the matrix. For Float type Diagonal Matrix it's on liboctave/array/fDiagMatrix.cc, for Complex Diagonal matrix it's on liboctave/array/CDiagMatrix.cc, etc...
One of the advantages of free (as in freedom) software is that you are free to study how things are implemented ;)
Thanks for all the input! I was actually able to find a very nice and easy way to solve this problem today, given that the input is a lower triangular matrix:
function AI = inverse(A)
n = length(A);
I = eye(n);
AI = zeros(n);
for k = 1:n
for i = 1:n
AI(k,i) = (I(k,i) - A(k,1:(k-1))*AI(1:(k-1),i))/A(k,k);
end
end

Adjacency matrix from edge list (preferrably in Matlab)

I have a list of triads (vertex1, vertex2, weight) representing the edges of a weighted directed graph. Since prototype implementation is going on in Matlab, these are imported as a Nx3 matrix, where N is the number of edges. So the naive implementation of this is
id1 = L(:,1);
id2 = L(:,2);
weight = L(:,3);
m = max(max(id1, id2)) % to find the necessary size
V = zeros(m,m)
for i=1:m
V(id1(i),id2(i)) = weight(i)
end
The trouble with tribbles is that "id1" and "id2" are nonconsecutive; they're codes. This gives me three problems. (1) Huge matrices with way too many "phantom", spurious vertices, which distorts the results of algorithms to be used with that matrix and (2) I need to recover the codes in the results of said algorithms (suffice to say this would be trivial if id codes where consecutive 1:m).
Answers in Matlab are preferrable, but I think I can hack back from answers in other languages (as long as they're not pre-packaged solutions of the kind "R has a library that does this").
I'm new to StackOverflow, and I hope to be contributing meaningfully to the community soon. For the time being, thanks in advance!
Edit: This would be a solution, if we didn't have vertices at the origin of multiple vertices. (This implies a 1:1 match between the list of edge origins and the list of identities)
for i=1:n
for j=1:n
if id1(i) >0 & i2(j) > 0
V(i,j) = weight(i);
end
end
end
You can use the function sparse:
sparse(id1,id2,weight,m,m)
If your problem is that the node ID numbers are nonconsecutive, why not re-map them onto consecutive integers? All you need to do is create a dictionary of all unique node ID's and their correspondence to new IDs.
This is really no different to the case where you're asked to work with named nodes (Australia, Britain, Canada, Denmark...) - you would map these onto consecutive integers first.
You can use GRP2IDX function to convert your id codes to consecutive numbers, and ids can be either numerical or not, does not matter. Just keep the mapping information.
[idx1, gname1, gmap1] = grp2idx(id1);
[idx2, gname2, gmap2] = grp2idx(id2);
You can recover the original ids with gmap1(idx1).
If your id1 and id2 are from the same set you can apply grp2idx to their union:
[idx, gname,gmap] = grp2idx([id1; id2]);
idx1 = idx(1:numel(id1));
idx2 = idx(numel(id1)+1:end);
For the reordering see a recent question - how to assign a set of coordinates in Matlab?
You can use ACCUMARRAY or SUB2IND to solve this problem.
V = accumarray([idx1 idx2], weight);
or
V = zeros(max(idx1),max(idx2)); %# or V = zeros(max(idx));
V(sub2ind(size(V),idx1,idx2)) = weight;
Confirm if you have non-unique combinations of id1 and id2. You will have to take care of that.
Here is another solution:
First put together all your vertex ids since there might a sink vertex in your graph:
v_id_from = edge_list(:,1);
v_id_to = edge_list(:,2);
v_id_all = [v_id_from; v_id_to];
Then find the unique vertex ids:
v_id_unique = unique(v_id_all);
Now you can use the ismember function to get the mapping between your vertex ids and their consecutive index mappings:
[~,from] = ismember(v_id_from, v_id_unique);
[~,to] = ismember(v_id_to, v_id_unique);
Now you can use sub2ind to populate your adjacency matrix:
adjacency_matrix = zeros(length(from), length(to));
linear_ind = sub2ind(size(adjacency_matrix), from, to);
adjacency_matrix(linear_ind) = edge_list(:,3);
You can always go back from the mapped consecutive id to the original vertex id:
original_vertex_id = v_id_unique(mapped_consecutive_id);
Hope this helps.
Your first solution is close to what you want. However it is probably best to iterate over your edge list instead of the adjacency matrix.
edge_indexes = edge_list(:, 1:2);
n_edges = max(edge_indexes(:));
adj_matrix = zeros(n_edges);
for local_edge = edge_list' %transpose in order to iterate by edge
adj_matrix(local_edge(1), local_edge(2)) = local_edge(3);
end