How to find clusters in binary 3D image? - matlab

I have a binary 3D image, i.e. it contains only 0 and 1. Now I want to find all clusters of 1s (i.e. clusters of voxels containing only value 1). Finally for each cluster I should know the coordinates of the voxels belonging to that cluster.
How can this be done? Of course I can iterate over all voxels, but the difficulty is detecting the clusters and extracting all voxels inside each cluster.
I want to do this in Matlab.

Use regionprops with the 'PixelIdxList' property. By default it uses 26-connected regions for a 3D image (8-connected in 2D). It returns linear indices, which you can use directly to index into the image.
Example:
A = false(4,4,3);
A(1,1,1) = true;
A(3,3,3) = true;
rp = regionprops(A,'PixelIdxList');
EDU>> A(rp(1).PixelIdxList)
ans =
1
EDU>> A(rp(2).PixelIdxList)
ans =
1
You can also use 'PixelList' to get the 3D coordinates. Note that each row is [x y z], i.e. [column row plane], not [row column plane]:
EDU>> rp = regionprops(A,'PixelList');
EDU>> rp
rp =
2x1 struct array with fields:
PixelList
EDU>> rp(1)
ans =
PixelList: [1 1 1]
EDU>> rp(2)
ans =
PixelList: [3 3 3]
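If you want to choose the connectivity explicitly and get [row column plane] subscripts instead of PixelList's [x y z] order, bwconncomp plus ind2sub is one option. A minimal sketch, assuming the Image Processing Toolbox (which regionprops already needs); the toy volume is made up for illustration, and 26 could be replaced by 6 or 18:

```matlab
% Toy volume with two separate clusters (made-up example data).
A = false(4,4,3);
A(1,1,1) = true;
A(1,2,1) = true;           % touches (1,1,1), so it joins that cluster
A(3,3,3) = true;

cc = bwconncomp(A, 26);    % 26-connectivity: faces, edges and corners
clusters = cell(cc.NumObjects, 1);
for k = 1:cc.NumObjects
    [r, c, p] = ind2sub(size(A), cc.PixelIdxList{k});
    clusters{k} = [r c p]; % one row per voxel: [row column plane]
end
```

Each cell of clusters then holds the coordinates of all voxels of one cluster, which is exactly the per-cluster list asked for.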

This is called connected components analysis.
A simple approach is seed filling: scan the whole domain systematically; when you meet a '1', visit all its '1' neighbors recursively and set them to '0' (so that no voxel is visited twice). Each top-level visit enumerates all voxels of one cluster. When a cluster has been cleared, continue scanning for other '1's.
Beware that deep recursion is stack-intensive (MATLAB also limits the recursion depth), so it can be better to implement an explicit stack for this purpose.
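The seed-filling idea with an explicit stack could look roughly like this. floodFillClusters is a hypothetical helper name, not a MATLAB function; it uses 26-connectivity and implicit expansion, so it assumes R2016b or newer. In practice bwconncomp does the same job much faster; this sketch only illustrates the algorithm:

```matlab
function clusters = floodFillClusters(A)
% Hypothetical helper (not a toolbox function): 26-connected clusters of a
% 3-D logical array by seed filling with an explicit stack, no recursion.
A = logical(A);
sz = size(A);
sz(end+1:3) = 1;                      % treat a 2-D image as one plane
[dr, dc, dp] = ndgrid(-1:1, -1:1, -1:1);
offs = [dr(:) dc(:) dp(:)];
offs(all(offs == 0, 2), :) = [];      % the 26 neighbour offsets
stack = zeros(nnz(A), 1);             % each voxel is pushed at most once
clusters = {};
for seed = find(A)'                   % scan the whole volume
    if ~A(seed), continue; end        % already cleared by an earlier fill
    A(seed) = false;                  % clear on push: never visited twice
    stack(1) = seed; top = 1;
    members = zeros(0, 1);
    while top > 0
        v = stack(top); top = top - 1;            % pop
        members(end+1, 1) = v; %#ok<AGROW>
        [r, c, p] = ind2sub(sz, v);
        nb = [r c p] + offs;                      % 26 x 3 subscripts
        nb = nb(all(nb >= 1, 2) & all(nb <= sz, 2), :);
        idx = sub2ind(sz, nb(:,1), nb(:,2), nb(:,3));
        idx = idx(A(idx));                        % neighbours still set to 1
        A(idx) = false;
        stack(top+1:top+numel(idx)) = idx;        % push
        top = top + numel(idx);
    end
    [r, c, p] = ind2sub(sz, members);
    clusters{end+1, 1} = [r c p]; %#ok<AGROW>  % [row column plane] per voxel
end
end
```

Clearing a voxel when it is pushed (rather than when it is popped) keeps the stack bounded by the number of 1-voxels.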

It depends on the rules you wish to employ, and on how your 3D data is represented:
is it a point cloud, a 2D bitmap using colours to represent depth, a
3D array, or something else?
You can try clustering by planes, or as little 3D clouds inside the 3D space.
For the first, slice the 3D space into planes and run a 2D clustering algorithm on each of them.
You will then have clusters for each depth plane, if any exist.
For the second, modify the 2D clustering algorithm to use cubes of space instead of squares of a plane as the neighbourhood frame.
You can even use the 2D algorithm on the sliced planes, then check the neighbouring planes to see whether a cluster extends further in 3D space.
But this would be inefficient. I am not a MATLAB expert, so I can't help you much with the implementation,
but maybe there is already a toolbox that does exactly what you want.
And, of course, how you do it depends heavily on how your image is represented in memory.
You may have to change formats in order to extract clusters easily and efficiently.
Give Google something to do.
Edit:
Just got an idea.
Use a suitable format and simply sort the data.
Incorporate the coordinates into the input data; after sorting, you should get a list of all adjacent points.
Sorting is often faster than connecting.
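A rough sketch of the slice-by-slice idea from this answer, assuming the Image Processing Toolbox for bwlabel: label each z-plane in 2D, then merge labels that sit directly on top of each other in consecutive planes with a small union-find. sliceLabel3d and findRoot are hypothetical names. With 4-connectivity inside the planes this reproduces 3D 6-connectivity; other in-plane choices give other connectivities:

```matlab
function L = sliceLabel3d(A)
% Hypothetical helper: 2-D labeling of every plane, then merging of labels
% that overlap between neighbouring planes (3-D 6-connectivity overall).
A = logical(A);
[nr, nc, nz] = size(A);
L = zeros(nr, nc, nz);
next = 0;
for z = 1:nz                                 % 2-D clustering per plane
    [Lz, n] = bwlabel(A(:,:,z), 4);
    Lz(Lz > 0) = Lz(Lz > 0) + next;          % globally unique labels
    L(:,:,z) = Lz;
    next = next + n;
end
parent = 1:next;                             % union-find over plane labels
for z = 1:nz-1                               % check the next plane
    a = L(:,:,z);  b = L(:,:,z+1);
    both = a > 0 & b > 0;                    % voxel directly above a voxel
    pairs = unique([a(both) b(both)], 'rows');
    for k = 1:size(pairs, 1)
        ra = findRoot(parent, pairs(k,1));
        rb = findRoot(parent, pairs(k,2));
        parent(max(ra, rb)) = min(ra, rb);   % merge the two clusters
    end
end
rootOf = arrayfun(@(k) findRoot(parent, k), 1:next);
[~, ~, compact] = unique(rootOf);            % renumber clusters as 1..K
L(L > 0) = compact(L(L > 0));
end

function r = findRoot(parent, r)
while parent(r) ~= r
    r = parent(r);
end
end
```

The extra bookkeeping across planes is why a direct 3D labeling (bwconncomp or regionprops) is usually simpler.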

Related

Detection of groups

I have used bwconncomp and regionprops to detect the number and location of connected domains in my picture. Now I want to figure out how much space (the size of a convex domain) the objects fill.
If all of the objects are evenly distributed or are in one big group, this is not a problem, as I can just use convhull to calculate the area.
The problem is that if I have several groups, I want several convex areas, one for each group, and the number of groups is not known beforehand.
See for instance:
(source: jasonyianakis.co.nz)
Note that it is just the grouping I am interested in; detecting the individual elements already works.
You can use the 'ConvexArea' property in regionprops.
For example:
img = imread('http://jasonyianakis.co.nz/wp-ontent/uploads/2012/08/different-people-groups.jpg');
img = im2double(img); %// convert to double
bw = max(bsxfun(@rdivide,img,sum(img,3)),[], 3 )>.4; %// get a binary mask
The resulting binary mask:
Label each component, and get the 'ConvexArea':
lb = bwlabel(bw);
st = regionprops(lb,'ConvexArea');
cxArea = [st.ConvexArea];
Discard too small regions
cxArea( cxArea < 100 ) = [];
Now you have the convex area of the components:
cxArea =
474813 2054497 451879
Try the kmeans function in MATLAB. This function divides data into a given number of clusters. Gaussian mixture models are also frequently used for this purpose (in my experience they often give the best results). There is a lot of information about them on the web, and several functions are implemented in MATLAB.
PS: Determining the number of clusters is a very difficult task. clusterdata and similar functions can do it for you, but that does not mean the result will be the best one. It is usual to try several different numbers of groups and see what fits best. The Gaussians can help here through their weights: if one component has a very low weight, it is likely not significant and can be deleted.
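To get one convex area per group when the number of groups is unknown, one option is to cluster the object centroids with kmeans, letting evalclusters try several k, and then run convhull on each group. A minimal sketch, assuming the Statistics and Machine Learning Toolbox; the centroid data, the range 2:6 and the Calinski-Harabasz criterion are assumptions for illustration:

```matlab
% Hypothetical input: centroids of the detected objects, one [x y] row each,
% e.g. P = cat(1, st.Centroid) after st = regionprops(bw, 'Centroid').
rng(0);
P = [randn(30,2)*5 + 50; randn(30,2)*5 + [150 40]; randn(30,2)*5 + [90 140]];

% Try k = 2..6 groups and keep the best by the Calinski-Harabasz criterion
% ('silhouette' or 'gap' are alternatives).
ev = evalclusters(P, 'kmeans', 'CalinskiHarabasz', 'KList', 2:6);
k = ev.OptimalK;
grp = kmeans(P, k, 'Replicates', 5);

hullArea = zeros(k, 1);
for g = 1:k
    Q = P(grp == g, :);
    if size(Q, 1) >= 3                  % a hull needs at least 3 points
        [~, hullArea(g)] = convhull(Q(:,1), Q(:,2));  % 2nd output = area
    end
end
```

Whether any criterion recovers the "right" number of groups depends on how well separated they are, which is the difficulty the PS above points out.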

Coordinate normalization for NN input in matlab

I am trying to implement a classification NN in Matlab.
My inputs are clusters of coordinates from an image (they correspond to Delaunay triangulation vertices).
There are 3 clusters (results of the OPTICS algorithm) in this format:
(Not all clusters are of the same size.) Elements represent coordinates in Euclidean 2D space, so (110,12) is a point in my image and the matrix depicted represents one cluster of points.
Clustering was done on image edges, so the coordinates refer to logical values (always 1s in this case) in the image matrix. (After edge detection there are 3 "dense" areas in an image, and these collections of pixels are used for classification.) There are 6 target classes.
So, my question is: how can I format them into single column vector inputs to use in a neural network?
(There is a relevant answer here, but I would like some elaboration if possible. I am probably too tired right now from 12 hours of trying stuff and don't get it 100% :D :( )
Remember, there are 3 different coordinate matrices for each picture, so my initial thought was to create an NN with 3 inputs (of different lengths). But how do I serialize this?
Here's a cluster with its tags on in case it helps:
For you to train the classifier, you need a matrix X where each row corresponds to an image. If you want to use a coordinate representation, all images will have to be the same size, say M by N. Each image's row will then have M times N elements (features), and the feature values will be the cluster assignments of the pixels. The class vector y will be whatever labels you have, that is, one of the six classes you mentioned in the comments above. Keep in mind that with a coordinate representation X can get very high-dimensional, and unless you have a large number of images, chances are your classifier will perform very poorly. If you have few images, consider using the fractions of pixels belonging to each cluster, as I suggested in one of the comments: this gives a shorter feature description that is invariant to rotation and translation, and may yield better classification.
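A minimal sketch of both feature representations for a single image. The cell array clusters, the [row col] order of the coordinates and the tiny 4x5 image size are made-up assumptions. Stack one such row per image to build X; note that MATLAB's Neural Network Toolbox functions such as train expect one sample per column, so you would pass X' there:

```matlab
% Hypothetical input: the 3 clusters of one image as [row col] lists,
% and the common image size M x N.
M = 4; N = 5;
clusters = {[1 1; 1 2], [3 3; 3 4; 4 4], [2 5]};

% (a) Coordinate representation: an M x N map holding the cluster index of
%     each pixel (0 = no cluster), flattened into one 1 x (M*N) feature row.
map = zeros(M, N);
for k = 1:numel(clusters)
    idx = sub2ind([M N], clusters{k}(:,1), clusters{k}(:,2));
    map(idx) = k;
end
xCoord = map(:)';                 % one row of X per image

% (b) Compact alternative: fraction of clustered pixels in each cluster.
counts = cellfun(@(c) size(c, 1), clusters);
xFrac = counts / sum(counts);     % 1 x 3; pixel counts do not change when the
                                  % clusters are shifted or rotated
```

Representation (a) has M*N features per image, while (b) always has one feature per cluster, which is why (b) is the safer choice with few training images.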

K-means Clustering, major understanding issue

Suppose that we have 64-dimensional data to cluster, say the dataset matrix is dt=64x150.
Using the kmeans function from the VLFeat library, I cluster my dataset into 20 centers:
[centers, assignments] = vl_kmeans(dt, 20);
centers is a 64x20 matrix.
assignments is a 1x150 matrix with values inside it.
According to manual: The vector assignments contains the (hard) assignments of the input data to the clusters.
I still cannot understand what the numbers in assignments mean. I don't get it at all. Would anyone mind helping me a bit here? An example or something would be great. What do these values represent anyway?
In k-means the problem you are trying to solve is clustering your 150 points into 20 clusters. Each point is 64-dimensional and is thus represented by a vector of size 64. So in your case dt is the set of points, and each column is a 64-dim vector.
After running the algorithm you get centers and assignments. centers holds the 20 positions of the cluster centers in the 64-dim space, in case you want to visualize them, measure distances between points and clusters, etc. assignments, on the other hand, contains the actual assignment of each 64-dim point in dt. So if assignments(7) is 15, the 7th vector in dt belongs to the 15th cluster.
For example, here you can see the clustering of lots of 2D points, say 1000, into 3 clusters. In this case dt would be 2x1000, centers would be 2x3, and assignments would be 1x1000 and hold numbers ranging from 1 to 3 (or 0 to 2 if you're using OpenCV).
EDIT:
The code to produce this image is located here: http://pypr.sourceforge.net/kmeans.html#k-means-example along with a tutorial on kmeans for pyPR.
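To see what the assignments mean in practice, here is a small sketch that uses MATLAB's own kmeans (Statistics and Machine Learning Toolbox) as a stand-in for vl_kmeans, so it runs without VLFeat. The data are made up: three 2D blobs stored one point per column, like dt. The last lines check that every point is assigned to its nearest center:

```matlab
% Made-up data laid out like dt in the question: one point per column.
rng(0);
dt = [randn(2,50), randn(2,50) + [8; 0], randn(2,50) + [0; 8]];  % 2 x 150

% MATLAB's kmeans wants one point per row, so transpose; vl_kmeans would
% take dt as it is.
[assignments, C] = kmeans(dt', 3, 'Replicates', 5);
centers = C';                                % 2 x 3, one center per column

% assignments(j) is the cluster of column j of dt ...
members2 = find(assignments == 2);           % indices of points in cluster 2
% ... and that cluster's center is the nearest center to point j.
d = sum(bsxfun(@minus, permute(dt, [2 3 1]), permute(centers, [3 2 1])).^2, 3);
[~, nearest] = min(d, [], 2);                % index of the closest center
```

So dt(:, members2) are the points of cluster 2, and centers(:, 2) is their center.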
In OpenCV it is the index of the cluster that each input point belongs to.

Principal component analysis with Voxels using Matlab

I have vectors, each with a certain position (a voxel) in an image. I would like to perform a PCA to find clusters of voxels that correlate with each other.
For example, I have three voxels in 1D:
syn_data_1 = [1;0;0;1;1];
syn_data_2 = [1;0;0;1;1];
syn_data_3 = [0;0;1;0;1];
syn_data = [syn_data_1, syn_data_2, syn_data_3]
%syn_data(:,1) is the Voxel in position 1 in 1D etc
Now positions 1 and 2 are strongly correlated, while 3 isn't. I can use corr to see which voxels correlate, but doing this for all voxels of a big 3D data set seems impractical.
Is there a way to perform a PCA on such data, so that I can see clusters of similar voxels?
PS: Please don't be confused by the word voxel; at the end of the day I am just talking about pixels that have several properties represented by a vector.
I am of course happy to provide further information if this can help to understand my question.
Well, princomp is the MATLAB PCA function (newer releases replace it with pca). Using it is a little tricky. I've answered a similar question here:
computing PCA matrix for set of sift descriptors
Does that help?
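A minimal sketch on the question's toy data, assuming the Statistics and Machine Learning Toolbox for pca (corrcoef is base MATLAB). corrcoef gives all pairwise voxel correlations in one call, and pca with voxels as rows gives a low-dimensional score per voxel that you can plot or cluster. For a large volume the full correlation matrix grows with the square of the number of voxels, which is where working with the PCA scores may help:

```matlab
syn_data = [[1;0;0;1;1], [1;0;0;1;1], [0;0;1;0;1]];   % data from the question

% Correlation between every pair of voxels (columns) in one call:
R = corrcoef(syn_data);          % 3 x 3, R(i,j) = correlation of voxels i, j

% PCA with voxels as observations (rows) and their properties as variables:
[coeff, score] = pca(syn_data');
% Voxels with similar rows in score (e.g. the first one or two columns) are
% similar, so score can be plotted or passed to kmeans to look for groups.
```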

How to generate this shape in Matlab?

In matlab, how to generate two clusters of random points like the following graph. Can you show me the scripts/code?
If you want to generate such data points, you will need to have their probability distribution to be able to generate the points.
For your points, I do not have the real distributions, so I can only give an approximation. From your figure I see that both lie approximately on a circle, with a random radius and a limited span for the angle. I assume those angles and radii are uniformly distributed over certain ranges, which seems like a pretty good starting point.
Therefore it also makes sense to generate the random data in polar coordinates (i.e. angle and radius) instead of Cartesian ones (i.e. horizontal and vertical), and transform them to allow plotting.
C1 = [0 0]; % center of the circle
C2 = [-5 7.5];
R1 = [8 10]; % range of radii
R2 = [8 10];
A1 = [1 3]*pi/2; % [rad] range of allowed angles
A2 = [-1 1]*pi/2;
nPoints = 500;
urand = @(nPoints,limits)(limits(1) + rand(nPoints,1)*diff(limits)); % uniform samples in [limits(1), limits(2)]
randomCircle = @(n,r,a) pol2cart(urand(n,a),urand(n,r)); % returns [x,y] for random angles a and radii r
[P1x,P1y] = randomCircle(nPoints,R1,A1);
P1x = P1x + C1(1);
P1y = P1y + C1(2);
[P2x,P2y] = randomCircle(nPoints,R2,A2);
P2x = P2x + C2(1);
P2y = P2y + C2(2);
figure
plot(P1x,P1y,'or'); hold on;
plot(P2x,P2y,'sb');
axis square
This yields:
This method works relatively well when you deal with distributions that you can transform easily and when you can easily describe the possible locations of the points. If you cannot, there are other methods, such as inverse transform sampling, which offer algorithms to generate the data instead of manual variable transformations as I did here.
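As an example of inverse transform sampling, suppose (an assumption, not something the question states) that the points should be spread uniformly over the area of each ring segment rather than uniformly in radius. The radius then has density proportional to r, and inverting its CDF gives the formula below:

```matlab
% Uniform-by-area ring segment via inverse transform sampling of the radius.
% For R(1) <= r <= R(2): f(r) = 2r / (R(2)^2 - R(1)^2), so
% F(r) = (r^2 - R(1)^2) / (R(2)^2 - R(1)^2); solving u = F(r) for r gives r.
rng(0);
R = [8 10];                        % range of radii, like R1 in the code above
A = [1 3]*pi/2;                    % range of angles, like A1 above
n = 500;
u = rand(n, 1);
r = sqrt(R(1)^2 + u * (R(2)^2 - R(1)^2));   % r = F^{-1}(u)
theta = A(1) + rand(n, 1) * diff(A);        % angle stays uniform
[x, y] = pol2cart(theta, r);
```

Compared with a uniform radius, this puts slightly more points near the outer edge, where the ring has more area.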
K-means is not going to give you what you want.
For K-means, vectors are classified based on their nearest cluster center. I can only think of two ways you could get the non-convex assignment shown in the picture:
Your input data is actually higher-dimensional, and your sample image is just a 2-d projection.
You're using a distance metric with different scaling across the dimensions.
To achieve your aim:
Use a non-linear clustering algorithm.
Apply a non-linear transform to your input data. (Probably not feasible).
You can find a list of non-linear clustering algorithms here. Specifically, look at this reference on the MST clustering page. Your exact shape appears on the fourth page of the PDF, together with a comparison of what happens with K-Means.
For existing MATLAB code, you could try this Kernel K-Means implementation. Also, check out the Clustering Toolbox.
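MST clustering is equivalent to single-linkage hierarchical clustering, so one quick way to try it in MATLAB is linkage plus cluster (Statistics and Machine Learning Toolbox). The two arcs below are made-up stand-in data modelled on the figure; whether single linkage works on real data depends on the gap between the groups being larger than the gaps inside them:

```matlab
% Two arcs like the ones in the question (made-up stand-in data).
rng(1);
n = 500;
t1 = pi/2 + rand(n,1)*pi;   r1 = 8 + 2*rand(n,1);
t2 = -pi/2 + rand(n,1)*pi;  r2 = 8 + 2*rand(n,1);
X = [r1.*cos(t1), r1.*sin(t1); (-5 + r2.*cos(t2)), (7.5 + r2.*sin(t2))];

% Single linkage: cutting it into k groups is equivalent to removing the
% k-1 longest edges of the minimum spanning tree.
Z = linkage(X, 'single');
T = cluster(Z, 'maxclust', 2);

% For comparison, 2-means always splits the data with a straight boundary.
Tk = kmeans(X, 2, 'Replicates', 5);
```

Plotting X coloured by T and by Tk shows the difference described above.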
I'll assume that you really want to do the clustering on existing data, as opposed to generating the data itself. Since you have a plot of some data, it seems logical that you already know how to generate it! If I am wrong in this assumption, then you should word your questions more carefully in the future.
The human brain is quite good at seeing patterns like this, whereas writing code to do the same on a computer often takes serious effort.
As has been said already, traditional clustering tools such as k-means will fail. Luckily, the Image Processing Toolbox already has good tools for this purpose. I might suggest converting the plot into an image, using filled-in dots to plot the points. Make sure the dots are large enough that they touch each other within a cluster, with some overlap. Then use dilation/erosion tools if necessary to make sure that any small cracks are filled in, but don't go so far as to cause the clusters to merge. Finally, use region segmentation tools to pick out the clusters. Once done, transform back from pixel units in the image into your spatial units, and you have accomplished your task.
For the image processing approach to work, you will need sufficient separation between the clusters compared to the coarseness within a cluster. But that seems obvious for any method to succeed.
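A minimal sketch of that pipeline, assuming the Image Processing Toolbox (imdilate, strel, bwlabel). Here dilating single pixels plays the role of plotting filled dots. The pixel size h and the dot radius are assumptions you would tune: the dots must grow enough to bridge gaps inside a cluster but not the gap between clusters. The data are made-up arcs like the ones in the figure:

```matlab
% Two arcs like the ones in the question (made-up stand-in data).
rng(1);
n = 500;
t1 = pi/2 + rand(n,1)*pi;   r1 = 8 + 2*rand(n,1);
t2 = -pi/2 + rand(n,1)*pi;  r2 = 8 + 2*rand(n,1);
P = [r1.*cos(t1), r1.*sin(t1); (-5 + r2.*cos(t2)), (7.5 + r2.*sin(t2))];

h = 0.1;        % assumed pixel size in data units
rad = 7;        % assumed dot radius in pixels (0.7 data units)
pad = rad + 1;  % margin so the dilated dots stay inside the image
col = round((P(:,1) - min(P(:,1))) / h) + 1 + pad;   % data -> pixel units
row = round((P(:,2) - min(P(:,2))) / h) + 1 + pad;
img = false(max(row) + pad, max(col) + pad);
img(sub2ind(size(img), row, col)) = true;   % one pixel per point
img = imdilate(img, strel('disk', rad, 0)); % grow dots until neighbours touch
lbl = bwlabel(img);                         % connected regions = clusters
clusterId = lbl(sub2ind(size(lbl), row, col));  % back to the original points
```

Reading the label image at each point's own pixel is the "transform back" step: clusterId(j) is the cluster of point P(j,:).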