I have vectors, each associated with a certain position (voxel) in an image. I would like to perform a PCA to cluster together all voxels that correlate with each other.
For example, I have three voxels in 1D:
syn_data_1 = [1;0;0;1;1];
syn_data_2 = [1;0;0;1;1];
syn_data_3 = [0;0;1;0;1];
syn_data = [syn_data_1, syn_data_2, syn_data_3]
% syn_data(:,1) is the voxel at position 1 in 1D, etc.
Now positions 1 and 2 are strongly correlated, while position 3 is not. It is possible to use corr to see which voxels correlate, but that is not feasible for all voxels of a big 3D data set.
Is there a way to perform a PCA on such data, so that I can see a clustering of voxels that are similar?
PS: Please don't be confused by the word voxel; at the end of the day I am just talking about pixels that have several properties represented by a vector.
I am of course happy to provide further information if that helps to clarify my question.
Well, princomp is the MATLAB PCA function. Using it is a little tricky. I've answered a similar question here:
computing PCA matrix for set of sift descriptors
Does that help?
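For what it's worth, here is a minimal sketch along those lines (my addition, not from the linked answer). It assumes each column of syn_data is one voxel, as in the question; princomp has since been superseded by pca in newer releases, and the cluster count k is an assumption:
% Treat each column of syn_data as one voxel (variable), each row as an observation.
[coeff, score] = pca(syn_data);      % use princomp(syn_data) on older releases
% coeff(v,:) holds voxel v's loadings on the principal components; voxels that
% behave alike get similar loadings, so cluster the loadings themselves.
k = 2;                               % assumed number of clusters
labels = kmeans(coeff(:,1:2), k);    % voxels 1 and 2 should end up with the same label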
I have a binary 3D image, i.e. it contains only 0 and 1. Now I want to find all clusters of 1s (i.e. clusters of voxels containing only value 1). Finally for each cluster I should know the coordinates of the voxels belonging to that cluster.
How can this be done? Of course I can iterate over all voxels, but the difficulty is to detect the clusters and extract all voxels inside each cluster.
I want to do this in MATLAB.
Use regionprops with the 'PixelIdxList' attribute. By default it uses maximal connectivity (8-connected regions in 2D, 26-connected for a 3D volume like this one). It also returns the linear indices, which are convenient for computation.
Example:
A = false(4,4,3);
A(1,1,1) = true;
A(3,3,3) = true;
rp = regionprops(A,'PixelIdxList');
EDU>> A(rp(1).PixelIdxList)
ans =
1
EDU>> A(rp(2).PixelIdxList)
ans =
1
You can also use 'PixelList' to get the 3D coords:
EDU>> rp = regionprops(A,'PixelList');
EDU>> rp
rp =
2x1 struct array with fields:
PixelList
EDU>> rp(1)
ans =
PixelList: [1 1 1]
EDU>> rp(2)
ans =
PixelList: [3 3 3]
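If you want row/column/slice coordinates rather than the [x y z] order of 'PixelList', a small sketch (reusing the volume A above) would be:
rp = regionprops(A, 'PixelIdxList');
coords = cell(1, numel(rp));
for k = 1:numel(rp)
    % convert linear indices to subscripts; note this gives [row col slice],
    % whereas 'PixelList' returns [x y z] = [col row slice]
    [r, c, s] = ind2sub(size(A), rp(k).PixelIdxList);
    coords{k} = [r c s];
end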
This is called connected components analysis.
A simple approach is seed filling: scan the whole domain systematically; when you meet a '1', visit all of its '1' neighbours recursively and set them to '0' (to avoid visiting them more than once). The top-level visit enumerates all voxels in a cluster. When a cluster has been cleared, continue the scan for further '1's.
Beware that the recursive version is stack-intensive; it is usually better to implement an explicit stack for this purpose.
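A rough sketch of that seed fill with an explicit stack and 26-connectivity (A is the logical 3D volume; the variable names are my own):
B = A;                                % work on a copy, the fill clears it
sz = size(B);
clusters = {};                        % linear indices of each cluster
for seed = find(B)'                   % scan the whole domain for 1s
    if ~B(seed), continue; end        % already cleared by an earlier fill
    stack = seed;                     % explicit stack instead of recursion
    members = [];
    while ~isempty(stack)
        idx = stack(end); stack(end) = [];
        if ~B(idx), continue; end
        B(idx) = false;               % mark as visited
        members(end+1) = idx;         %#ok<AGROW>
        [i, j, k] = ind2sub(sz, idx);
        for di = -1:1                 % push all in-bounds neighbours still set to 1
            for dj = -1:1
                for dk = -1:1
                    ii = i+di; jj = j+dj; kk = k+dk;
                    if ii >= 1 && ii <= sz(1) && jj >= 1 && jj <= sz(2) && ...
                            kk >= 1 && kk <= sz(3) && B(ii,jj,kk)
                        stack(end+1) = sub2ind(sz, ii, jj, kk); %#ok<AGROW>
                    end
                end
            end
        end
    end
    clusters{end+1} = members;        %#ok<AGROW>
end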
It depends on the rules you wish to employ, and on how your 3D data is represented: is it a point cloud, a 2D bitmap using colours to represent depth, a 3D array, or something else?
You can try clustering the points per plane, or as little 3D clouds inside the 3D space.
If the former, slice the 3D space into planes and use a 2D clustering algorithm on each of them.
You will then have clusters for each depth plane, if any exist.
If the latter, modify the 2D clustering algorithm to use cubes of space instead of squares of a plane as its frame.
You can even use the 2D algorithm on the sliced planes and then check the surrounding planes to see whether a cluster continues further in 3D space.
But this would be inefficient. I am not a MATLAB expert, so I can't help you much with the implementation,
but maybe there is some toolbox already for doing exactly what you want.
And, of course, how you do it depends a lot on how your image is represented in memory.
Maybe you will have to change formats in order to easily and efficiently extract clusters.
Give Google something to do.
Edit:
Just got an idea.
Use a proper format and just sort the data.
You should get a list of all adjacent points.
Incorporate the coordinate information into the input data. Sorting is often faster than connecting.
I am trying to implement a classification NN in MATLAB.
My inputs are clusters of coordinates from an image (corresponding to Delaunay triangulation vertices).
There are 3 clusters (results of the OPTICS algorithm) in this format:
(Not all clusters are of the same size.) Elements represent coordinates in Euclidean 2D space, so (110,12) is a point in my image and the matrix depicted represents one cluster of points.
Clustering was done on image edges, so the coordinates refer to logical values (always 1s in this case) in the image matrix. (After edge detection there are 3 "dense" areas in the image, and these collections of pixels are used for classification.) There are 6 target classes.
So, my question is how can I format them into single column vector inputs to use in a neural network?
(There is a relevant answer here, but I would like some elaboration if possible. I am probably too tired right now from 12 hours of trying stuff and don't get it 100% :D :( )
Remember, there are 3 different coordinate matrices for each picture, so my initial thought was to create an NN with 3 inputs (of different lengths). But how do I serialize this?
Here's a cluster with its tags on in case it helps:
For you to train the classifier, you need a matrix X where each row corresponds to an image. If you want to use a coordinate representation, this means all images will have to be of the same size, say, M by N. The row for an image will then have M times N elements (features), and the corresponding feature values will be the cluster assignments. The class vector y will be whatever labels you have, i.e. one of the six different classes you mentioned in the comments above.
You should keep in mind that with a coordinate representation X can get very high-dimensional, and unless you have a large number of images, chances are your classifier will perform very poorly. If you have few images, consider using the fractions of pixels belonging to each cluster that I suggested in one of the comments: this gives you a shorter feature description that is invariant to rotation and translation, and may yield better classification.
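For instance, here is a minimal sketch of one such row of X, assuming each image is M-by-N and its clusters come as coordinate matrices C1, C2, C3 with one (x, y) pair per row (all names and sizes here are assumptions, not from the original post):
M = 200; N = 200;                    % assumed common image size
labelImg = zeros(M, N);              % 0 = background pixel
clusters = {C1, C2, C3};
for c = 1:numel(clusters)
    xy = clusters{c};
    idx = sub2ind([M N], xy(:,2), xy(:,1));   % assuming columns are (x, y)
    labelImg(idx) = c;               % feature value = cluster assignment
end
xRow = labelImg(:)';                 % one row of X, with M*N features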
I have written a simple SOM algorithm in MATLAB. My big challenge is: how can I visualize/plot the data in the form of a U-matrix, sample hits, and component/input planes? These three plots exist in the SOM Toolbox in MATLAB, but I cannot call them to visualize the data from my own code, because they need a 'net' as input and my code does not create any 'net'.
Is there any guidance?
You can create your own functions, as they are not too complicated. I will assume a SOM of 20x20x4 (400 nodes, 4 features) for the explanation.
The Hit-Map is no more than giving each sample to the already learned SOM and incrementing +1 to the node that was chosen as the Best Matching Unit (BMU). Then you plot this map. So if node(1,1) fires 10 times, and node(1,2) fires 100 times, then you will have an image where node(1,2) has a higher intensity than node(1,1).
The U-matrix is a map representing the distance between each node's weight vector and those of its closest neighbours (summed or averaged). So here you calculate the Euclidean distance between the weight vector of node X and every neighbour. For example, if node(1,1,:)=[1,1,2,3], node(1,2,:)=[2,2,1,1], and node(2,1,:)=[1,1,1,1], then the U-matrix value for node(1,1) could be the summed distance U(1,1)=norm(squeeze(node(1,1,:)-node(1,2,:)))+norm(squeeze(node(1,1,:)-node(2,1,:)))=4.8818 (divide by the number of neighbours if you want the average).
The component/input planes are the simplest and do not require any processing: you basically pick each feature of the SOM map and plot it. So in our example of a 20x20x4 SOM, you would have 4 features and therefore 4 component planes, which you can plot with, e.g., imagesc(node(:,:,1)) for feature 1.
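A rough sketch of all three plots for such a hand-rolled 20x20x4 SOM, assuming the weights are stored in node (rows x cols x features) and the training data in X (samples x features); both names are assumptions:
[rows, cols, nFeat] = size(node);
W = reshape(node, [], nFeat);                 % one weight vector per row
% Hit map: count how often each node is the best matching unit (BMU)
hits = zeros(rows, cols);
for s = 1:size(X,1)
    d = sum(bsxfun(@minus, W, X(s,:)).^2, 2); % squared distance to every node
    [~, bmu] = min(d);
    hits(bmu) = hits(bmu) + 1;                % linear index matches hits(r,c)
end
figure; imagesc(hits); title('Sample hits');
% U-matrix: summed distance of each node to its 4-connected neighbours
U = zeros(rows, cols);
for r = 1:rows
    for c = 1:cols
        w = squeeze(node(r,c,:));
        for nb = [r-1 c; r+1 c; r c-1; r c+1]'
            if all(nb >= 1) && nb(1) <= rows && nb(2) <= cols
                U(r,c) = U(r,c) + norm(w - squeeze(node(nb(1),nb(2),:)));
            end
        end
    end
end
figure; imagesc(U); title('U-matrix');
% Component planes: one image per feature
figure;
for f = 1:nFeat
    subplot(2, 2, f);                         % assumes nFeat == 4
    imagesc(node(:,:,f)); title(sprintf('Feature %d', f));
end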
As I've explained in a previous question: I have a dataset consisting of a large, semi-random collection of points in three-dimensional Euclidean space. In this collection of points, I am trying to find the point that is closest to the area with the highest density of points.
As High Performance Mark answered:
the most straightforward thing to do would be to divide your subset of
Euclidean space into lots of little unit volumes (voxels) and count
how many points there are in each one. The voxel with the most points
is where the density of points is at its highest. Perhaps initially
dividing your space into 2 x 2 x 2 voxels, then choosing the voxel
with most points and sub-dividing that in turn until your criteria are
satisfied.
Mark suggested I use triplequad for this, but that is not a function I am familiar with or understand very well. Does anyone have any pointers on how I could go about using this function in MATLAB for what I am trying to do?
For example, say I have a random normally distributed matrix A = randn([300,300,300]); how could I use triplequad to find the point I am looking for? Because, as I currently understand it, I also have to provide triplequad with a function fun when using it. Which function should that be for this problem?
Here's an answer which doesn't use triplequad.
For the purposes of exposition I define an array of data like this:
A = rand([30,3])*10;
which gives me 30 points uniformly distributed in the box (0:10,0:10,0:10). Note that in this explanation a point in 3D space is represented by each row in A. Now define a 3D array for the counts of points in each voxel:
counts = zeros(10,10,10);
Here I've chosen to have a 10x10x10 array of voxels, but this is just for convenience, it would be only a little more difficult to have chosen some other number of voxels in each dimension, and there don't have to be the same number of voxels along each axis. Then the code
for ix = 1:size(A,1)
    counts(ceil(A(ix,1)), ceil(A(ix,2)), ceil(A(ix,3))) = ...
        counts(ceil(A(ix,1)), ceil(A(ix,2)), ceil(A(ix,3))) + 1;
end
will count up the number of points in each of the voxels in counts.
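As a possible continuation (my addition, not part of the original answer), you can then pick the fullest voxel and the data point nearest to its centre:
[~, maxIdx] = max(counts(:));                    % voxel with the most points
[vx, vy, vz] = ind2sub(size(counts), maxIdx);
centre = [vx vy vz] - 0.5;                       % centre of that unit voxel
d = sqrt(sum(bsxfun(@minus, A, centre).^2, 2));  % distance of every point to it
[~, nearest] = min(d);
closestPoint = A(nearest, :)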
EDIT
Unfortunately I have to do some work this afternoon and won't be able to get back to wrestling with the triplequad solution until later. Hope this is OK in the meantime.
In MATLAB, how can I generate two clusters of random points like in the following graph? Can you show me the script/code?
If you want to generate such data points, you need their probability distribution.
In your case I do not have the real distributions, so I can only give an approximation. From your figure I see that both clusters lie approximately on a circle, with a random radius and a limited span for the angle. I assume those angles and radii are uniformly distributed over certain ranges, which seems like a pretty good starting point.
Therefore it also makes sense to generate the random data in polar coordinates (i.e. angle and radius) instead of Cartesian ones (i.e. horizontal and vertical), and then transform them for plotting.
C1 = [0 0]; % center of the circle
C2 = [-5 7.5];
R1 = [8 10]; % range of radii
R2 = [8 10];
A1 = [1 3]*pi/2; % [rad] range of allowed angles
A2 = [-1 1]*pi/2;
nPoints = 500;
urand = @(nPoints,limits)(limits(1) + rand(nPoints,1)*diff(limits));
randomCircle = @(n,r,a)(pol2cart(urand(n,a),urand(n,r)));
[P1x,P1y] = randomCircle(nPoints,R1,A1);
P1x = P1x + C1(1);
P1y = P1y + C1(2);
[P2x,P2y] = randomCircle(nPoints,R2,A2);
P2x = P2x + C2(1);
P2y = P2y + C2(2);
figure
plot(P1x,P1y,'or'); hold on;
plot(P2x,P2y,'sb'); hold on;
axis square
This yields:
This method works relatively well when you deal with distributions that you can transform easily and when you can easily describe the possible locations of the points. If you cannot, there are other methods, such as inverse transform sampling, which offer algorithms to generate the data instead of the manual variable transformations I did here.
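As a tiny illustration of inverse transform sampling (my example, not the answer's): draw exponential samples with rate lambda from uniform draws by inverting the CDF F(x) = 1 - exp(-lambda*x):
lambda = 2;                     % assumed rate parameter
u = rand(1000, 1);              % uniform samples on (0,1)
x = -log(1 - u) / lambda;       % x = F^{-1}(u) gives exponentially distributed samples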
K-means is not going to give you what you want.
For K-means, vectors are classified based on their nearest cluster center. I can only think of two ways you could get the non-convex assignment shown in the picture:
Your input data is actually higher-dimensional, and your sample image is just a 2-d projection.
You're using a distance metric with different scaling across the dimensions.
To achieve your aim:
Use a non-linear clustering algorithm.
Apply a non-linear transform to your input data. (Probably not feasible).
You can find a list of non-linear clustering algorithms here. Specifically, look at this reference on the MST clustering page. Your exact shape appears on the fourth page of the PDF, together with a comparison of what happens with K-means.
For existing MATLAB code, you could try this Kernel K-Means implementation. Also, check out the Clustering Toolbox.
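If your MATLAB is recent enough (R2019a or later), another concrete non-linear option not mentioned above is DBSCAN from the Statistics and Machine Learning Toolbox; the epsilon and minpts values below are assumptions you would tune to your data:
% X is assumed to be an n-by-2 matrix of the 2D points shown in the picture
epsilon = 0.5;                          % neighbourhood radius (tune for your scale)
minpts  = 5;                            % minimum neighbours to form a core point
labels  = dbscan(X, epsilon, minpts);   % -1 marks noise points
gscatter(X(:,1), X(:,2), labels);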
I assume that you really want to do the clustering operation on existing data, as opposed to generating the data itself. Since you have a plot of some data, it seems logical that you already know how to do the latter! If I am wrong in this assumption, then you should word your questions more carefully in the future.
The human brain is quite good at seeing patterns in things like this; writing code for a computer to do the same often takes some serious effort.
As has been said already, traditional clustering tools such as k-means will fail. Luckily, the Image Processing Toolbox has good tools for these purposes already written. I might suggest converting the plot into an image, using filled-in dots to plot the points. Make sure the dots are large enough that they touch each other within a cluster, with some overlap. Then use dilation/erosion tools if necessary to make sure that any small cracks are filled in, but don't go so far as to cause the clusters to merge. Finally, use region segmentation tools to pick out the clusters. Once done, transform back from pixel units in the image into your spatial units, and you have accomplished your task.
For the image processing approach to work, you will need sufficient separation between the clusters compared to the coarseness within a cluster. But that seems obvious for any method to succeed.
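A rough sketch of that image-based approach, assuming the points are in an n-by-2 matrix P and that the raster resolution and dot radius below are guesses you would adapt to your data:
res = 200;                                   % raster resolution (assumed)
xr = [min(P(:,1)) max(P(:,1))];
yr = [min(P(:,2)) max(P(:,2))];
col = 1 + round((P(:,1) - xr(1)) / diff(xr) * (res-1));  % map x to pixel columns
row = 1 + round((P(:,2) - yr(1)) / diff(yr) * (res-1));  % map y to pixel rows
img = false(res, res);
img(sub2ind([res res], row, col)) = true;
img = imdilate(img, strel('disk', 5));       % fatten the dots so cluster members touch
img = imclose(img, strel('disk', 3));        % fill small cracks without merging blobs
L = bwlabel(img);                            % region segmentation: one label per blob
clusterOfPoint = L(sub2ind([res res], row, col));  % cluster id of each original point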