Finding Co-varying Regions in a 3D Matrix - cluster-analysis

I wonder if anyone has a suggestion on how to solve the following problem.
I have a matrix of size n*p*t. I would like to find areas in the plane n*p, that co-vary (in the t dimension). In other words, sub-regions on the plane (n*p) whose values vary together along the t dimension.
An example:
t=1
[ 0 0 0 ]
[ 0 1 1 ]
[ 0 1 1 ]
--->
t=2
[ 3 2 7 ]
[ 0 2 2 ]
[ 4 2 2 ]
So the bottom-right corner (the 2x2 block that changes from 1 to 2) would be assigned to one group, since its values co-vary.
Do you have any ideas how to tackle such a problem?
Thanks!

Related

transfer the lower triangular part of a matrix into a vector in matlab

Let a be a matrix. The following code will transfer the lower triangular part of it into a vector, provided there are no zero elements in the lower triangular part.
a(find(tril(a,-1)))
So, what should I do if there are zero elements in the lower triangular part of a? Thanks very much for your time and attention.
Use a mask -
%// Mask of lower triangular elements
mask = tril(true(size(a)),-1)
%// Use mask to select lower triangular elements from input array
out = a(mask)
Alternatively, you can create the mask with bsxfun -
mask = bsxfun(@gt,[1:size(a,1)]',1:size(a,2))
Sample run -
>> a
a =
1 3 0 2 1
0 1 1 3 1
0 2 2 1 2
3 0 1 3 2
3 3 3 0 3
>> out
out =
0
0
3
3
2
0
3
1
3
0
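To see why the mask matters, the following sketch reuses the sample matrix above and compares the find-based indexing from the question with the logical mask. The variable names out_find, out_mask and mask2 are illustrative only.

```matlab
% Sample matrix from the run above
a = [1 3 0 2 1
     0 1 1 3 1
     0 2 2 1 2
     3 0 1 3 2
     3 3 3 0 3];

% find() returns only the positions of non-zero entries,
% so zeros below the diagonal are silently dropped
out_find = a(find(tril(a,-1)));

% A logical mask selects positions, whatever value is stored there
mask = tril(true(size(a)),-1);
out_mask = a(mask);

% Equivalent mask built with bsxfun: row index > column index
mask2 = bsxfun(@gt,(1:size(a,1))',1:size(a,2));

disp(numel(out_find));      % 6 elements: the four zeros are missing
disp(numel(out_mask));      % 10 elements: the full strictly lower triangle
disp(isequal(mask,mask2));  % both masks select the same positions
```

Both masks select entries strictly below the diagonal; pass 0 instead of -1 to tril if the diagonal should be included.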

How to create Adjacency Matrix for dataset in Matlab?

I am a new user of MATLAB and need help creating an adjacency matrix from a data set.
The data set has the following pattern:
A=[
0 1
0 2
0 5
1 2
1 3
1 4
2 3
2 5
3 1
3 4
3 5
4 0
4 2
5 2
5 4
];
The adjacency matrix for the above will be
M=
0 1 1 0 0 1
0 0 1 1 1 0
0 0 0 1 0 1
0 1 0 0 1 1
1 0 1 0 0 0
0 0 1 0 1 0
I need code to perform the above task in MATLAB.
You could use sparse. Please take a look at that function, give your problem a try, and then check your result against the following:
full(sparse(A(:,1)+1, A(:,2)+1, 1))
Welcome to SO! Please read the following guide on how to ask good questions.
To elaborate on @Ankur's comment, please also take a look at this Open letter to students with homework problems: "...If your question is just a copy paste of homework problem, expect it to be downvoted, closed, and deleted - potentially in quite short order."
What you need to do is pretty straight-forward:
First you preallocate your M matrix, using either M=zeros(6); or M(6,6)=0; (this option assumes M does not exist).
Next thing you should note is that MATLAB uses "1-based indexing", which means that you can't use the indices in A as-is and you first need to increment them by 1.
After incrementing the indices, we see that "A+1" contains the coordinates of M that should have a 1 in them (I noticed that the adjacency matrix is asymmetric in your case). From here it's a matter of accessing the correct cells, and this can be done using sub2ind(...).
Finally, the code to generate M is:
M=zeros(6);
M(sub2ind(size(M), A(:,1)+1, A(:,2)+1))=1;
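As a quick check, the sketch below builds M from the edge list in the question with both sub2ind and sparse and confirms that the two results agree. The size n = max(A(:)) + 1 is an assumption that the node labels are the consecutive integers 0..n-1; M1 and M2 are illustrative names.

```matlab
% Edge list from the question (0-based node labels)
A = [0 1; 0 2; 0 5; 1 2; 1 3; 1 4; 2 3; 2 5; ...
     3 1; 3 4; 3 5; 4 0; 4 2; 5 2; 5 4];

% Assumed: nodes are labelled 0..n-1, so the matrix is n-by-n
n = max(A(:)) + 1;

% Approach 1: preallocate, then set the listed cells via linear indices
M1 = zeros(n);
M1(sub2ind(size(M1), A(:,1)+1, A(:,2)+1)) = 1;

% Approach 2: let sparse place the ones, then convert to a full matrix
M2 = full(sparse(A(:,1)+1, A(:,2)+1, 1, n, n));

disp(M1);
disp(isequal(M1, M2));
```

One difference to keep in mind: sparse adds up repeated (row, column) pairs, so a duplicated edge would give a 2 in M2 but still a 1 in M1.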
I don't understand your matrix A, but supposing that A is of the same dimensions as the sought adjacency matrix M, and that you simply want to keep all the zero entries as "0", but want to make the positive entries equal to "1", then just do:
M = (A>0)
As pointed out in a comment by @Dev-iL above, this is a "problem of how to modify values of a matrix in known positions. It doesn't really involve adjacency..."

How to label data index with count using 3D histogram in Matlab

I have a set of data points (around 20000) with their x,y values, and I want to remove the points that are not very close to other points. I am trying to approach this by 'digitizing' the data, and I think the closest way to implement it in Matlab is a 3D histogram, so that I can remove the points in the low-count bins. I used hist3(), but the problem is that I couldn't get the index of the points labeled with counts (like the output 'ind' from histc()). The only way I can think of is a nested for loop, which is the last thing I want to try. Is there any way to label the points with their bin index, or any other approach to do this?
Thanks
I feel like I need to add some clarification.
I have the histogram graph from the data generated by @rayryeng.
There are some bins that have N=0 or N=1, so I want to remove the data in these bins.
For histc() there is a form of output [bincounts,ind]= histc( ) where ind returns the bin numbers the data falls into. So I can find the bins whose count is less than or equal to 1, then find the data in those particular bins. Is there anything similar I can do for 2D input?
Thanks Again
hist3 should be able to accomplish this for you. I'm not quite sure where the problem is. You can call hist3 like so:
[N,C] = hist3(X);
This will automatically partition your dataset into a 10 x 10 grid of equally spaced containers. You can override this behaviour by doing:
[N,C] = hist3(X, NBINS);
NBINS is a 2 element array where the first element tells you how many bins you want vertically and the second element is how many bins you would like horizontally.
N will tell you how many elements fall within each location of the grid and C will give you a 1 x 2 cell array where the first element of the cell array gives you the X co-ordinates of each centre of the bin while the second element of the cell array gives you the Y co-ordinates of each centre of the bin.
To be explicit, if we have a 10 x 10 grid, C will contain a two-element cell array where each element is 10 elements long. For each X co-ordinate of a centre found in C{1}, there are 10 corresponding Y co-ordinates of bin centres in C{2}. This means that the first 10 bin centres are located at (C{1}(1), C{2}(1)), (C{1}(1), C{2}(2)), (C{1}(1), C{2}(3)), ..., (C{1}(1), C{2}(10)), then the next 10 bin centres are located at (C{1}(2), C{2}(1)), (C{1}(2), C{2}(2)), (C{1}(2), C{2}(3)), ..., (C{1}(2), C{2}(10)).
As a quick example, let's do this on a grid between [0,1] on the x-axis and [0,1] on the y-axis. I'm going to generate 100 2D points. Let's also decompose the image into 10 bins horizontally and 10 bins vertically (as per the default of hist3).
rng(100); %// Set seed for reproducibility
A = rand(100,2);
[N,C] = hist3(A);
disp(N);
celldisp(C);
We thus get:
N =
1 2 0 1 2 0 1 0 1 1
0 1 1 1 1 1 0 0 2 5
0 4 1 1 1 1 1 4 0 1
2 0 3 2 2 1 1 0 2 1
0 0 0 0 1 1 1 0 0 1
1 1 1 2 1 1 0 2 0 1
1 0 2 1 2 0 3 1 1 1
0 1 0 0 0 1 1 0 0 1
1 0 1 2 3 3 0 0 0 2
0 2 1 1 0 1 0 3 0 1
C{1} =
Columns 1 through 7
0.0541 0.1528 0.2516 0.3503 0.4491 0.5478 0.6466
Columns 8 through 10
0.7453 0.8440 0.9428
C{2} =
Columns 1 through 7
0.0513 0.1510 0.2508 0.3505 0.4503 0.5500 0.6498
Columns 8 through 10
0.7495 0.8493 0.9491
This tells us that the first bin, located at the top left corner of N, has only 1 value logged into it. The next bin after that has 2 values logged in it, and so on. We also have the bin centres for each of the bins shown in C. Remember, we have 10 x 10 possible bin centres. If we want to display our data with the bin locations, this is what we can do:
[X,Y] = meshgrid(C{1},C{2});
plot(A(:,1), A(:,2), 'b*', X(:), Y(:), 'r*');
grid;
We thus get:
The red stars denote the bin centres while the blue stars denote our data points within the grid. Because our origin is on the bottom left corner of our plot, but the origin of the N matrix is at the top left corner (i.e. the first bin that is decomposed is at the top left while in our data it's at the bottom left corner), we need to rotate N by 90 degrees counter-clockwise so that the origins of each of the matrices agree with each other, and also agree with the plot. As such:
Nrot = rot90(N);
disp(Nrot);
Nrot =
1 5 1 1 1 1 1 1 2 1
1 2 0 2 0 0 1 0 0 0
0 0 4 0 0 2 1 0 0 3
1 0 1 1 1 0 3 1 0 0
0 1 1 1 1 1 0 1 3 1
2 1 1 2 1 1 2 0 3 0
1 1 1 2 0 2 1 0 2 1
0 1 1 3 0 1 2 0 1 1
2 1 4 0 0 1 0 1 0 2
1 0 0 2 0 1 1 0 1 0
As you can see from the picture, this agrees with what we see within the (rotated) N matrix as well as the bin centres C. Using N (or Nrot if you get the convention correct), you can now figure out which points to eliminate from your array of points. For any bin with low membership in N, you would find the points closest to the bin centre associated with that grid location and remove them.
As an example, suppose that the bin in the first row, second column (of the rotated result) is the one you want to filter out. This corresponds to (C{1}(2), C{2}(10)). We also know that we need to filter out 5 points as they belong to this bin centre. Therefore:
numPointsToRemove = N(2,10); %// or Nrot(1,2);
%// Computes Euclidean distance between this bin centre and every point
dists = sqrt(sum(bsxfun(@minus, A, [C{1}(2) C{2}(10)]).^2, 2));
%// Find the numPointsToRemove closest points to the bin centre and remove
%// those rows (the row index goes first, the colon selects both columns)
[~,ind] = sort(dists);
A(ind(1:numPointsToRemove),:) = [];
We sort our distances in ascending order, then determine the numPointsToRemove closest points to this bin centre. We thus remove them from our data matrix.
If you want to remove those bins that have either a 0 or a 1 for the count, we can find those locations, then run a for loop and filter accordingly. However, any bins that have 0 means that we don't even need to run through and filter anything, because no points were mapped to there! You really need to filter out those values that have just 1 in the bins. In other words:
[rows, cols] = find(N == 1);
for index = 1 : numel(rows)
    row = rows(index);
    col = cols(index);
    %// Computes Euclidean distance between this bin centre and every point
    dists = sqrt(sum(bsxfun(@minus, A, [C{1}(row) C{2}(col)]).^2, 2));
    %// Finds the closest point to the bin centre and removes it
    [~,ind] = min(dists);
    A(ind,:) = [];
end
As you can see, this is similar to the procedure above. As we wish to filter out those bins that have only 1 point assigned to them, we just need to find the minimum distance. Remember, we don't need to process any bins that have a count of 0, so we can skip those.
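The question also asks for a per-point bin label, like the second output of histc. One possible sketch, which is not part of the answer above, applies histc to each coordinate separately and counts the (x-bin, y-bin) pairs with accumarray. The bin edges are assumed to span the data range in nb equal steps, and the names nb, xe, ye, bx, by, cnt and Akeep are illustrative.

```matlab
A = rand(100,2);                 % 100 random 2D points, as in the example above
nb = 10;                         % assumed number of bins per axis

% Assumed equally spaced bin edges covering the range of each coordinate
xe = linspace(min(A(:,1)), max(A(:,1)), nb+1);
ye = linspace(min(A(:,2)), max(A(:,2)), nb+1);

% Second output of histc: the bin number each value falls into
[~, bx] = histc(A(:,1), xe);
[~, by] = histc(A(:,2), ye);

% histc puts values equal to the last edge into an extra bin nb+1;
% fold that bin back into bin nb
bx = min(bx, nb);
by = min(by, nb);

% 2D counts: N(i,j) is the number of points with x-bin i and y-bin j
N = accumarray([bx by], 1, [nb nb]);

% Look up, for every point, the count of the bin it belongs to
cnt = N(sub2ind(size(N), bx, by));

% Keep only the points whose bin holds more than one point
Akeep = A(cnt > 1, :);
```

Because every point carries its own bin label, no distance search is needed: the points in bins with a count of 1 are dropped in a single indexing step, and changing the threshold only means changing the comparison on cnt.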

Count identical elements near each other in matrix

Consider a matrix like
A = 0 1 0 1
1 1 0 0
0 0 0 0
1 1 1 1
I would like to calculate the average size of each cluster of 1's. I define a cluster as occurring when two or more 1's are near each other, i.e. next to or above/below. Eg, in this matrix there is a cluster of size 3 in the top left hand corner and a cluster of size 4 in the bottom row.
I need a way to extract this information in a non-visual way because I need to do this many times for different A.
You may want to use bwlabel which isolates the connected components (clusters of 1) in your binary matrix.
A = [0 1 0 1
1 1 0 0
0 0 0 0
1 1 1 1 ];
[L,n] = bwlabel(A,8) % for an 8-pixel stencil
% (i.e. hor/vert/diag first neighbors)
or
[L,n] = bwlabel(A,4) % for a 4-pixel stencil
% (just horizontal & vertical neighbors)
L =
0 1 0 3
1 1 0 0
0 0 0 0
2 2 2 2
Doing so, you obtain a matrix L which labels the n different connected components.
Then you may want to extract some statistics; for instance you may want to histogram the size of the clusters.
cluster_size = hist(L(:),0:n);
cluster_size = cluster_size(2:end); % histogram of component vs. size
% (without zeros)
hist(cluster_size) % histogram of sizes
which tells you that you have one cluster of 1 element, one cluster of 3 and one cluster of 4.
Finally, if you are looking for the average size of the clusters, you can do
mean(cluster_size)
2.6667
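Note that the question counts only groups of two or more 1's as clusters, whereas the mean above includes the isolated 1 in the top right corner. A small sketch that excludes such singletons follows; it starts from the label matrix L shown above so that it runs without bwlabel, and the names big and avg_size are illustrative.

```matlab
% Label matrix as returned by bwlabel for the example above
L = [0 1 0 3
     1 1 0 0
     0 0 0 0
     2 2 2 2];
n = max(L(:));                                   % number of components

% Count how many pixels carry each label (background 0 is skipped)
cluster_size = accumarray(L(L > 0), 1, [n 1])';  % [3 4 1]

% Keep only components with at least two pixels, as the question defines clusters
big = cluster_size(cluster_size >= 2);
avg_size = mean(big);                            % 3.5
disp(avg_size);
```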

How to display separate disconnected trees in MATLAB when doing hierarchical clustering and producing dendrograms?

I am working with MATLAB, and I have an adjacency matrix:
mat =
0 1 0 0 0 0
1 0 0 0 1 0
0 0 0 1 0 0
0 0 1 0 0 1
0 1 0 0 0 0
0 0 0 1 0 0
which is not fully connected. Nodes {1,2,5} are connected, and {3,4,6} are connected (the edges are directed).
I would like to see the separate clusters in a dendrogram on a single plot. Since there is no path from one cluster to the next, I would like to see separate trees with separate roots for each cluster. I am using the commands:
mat=zeros(6,6)
mat(1,2)=1;mat(2,1)=1;mat(5,2)=1;mat(2,5)=1;
mat(6,4)=1;mat(4,6)=1;mat(3,4)=1;mat(4,3)=1;
Y=pdist(mat)
squareform(Y)
Z=linkage(Y)
figure()
dendrogram(Z)
These commands follow the advice in Hierarchical Clustering, and the resulting dendrogram is attached. Apart from the labels not making sense, the whole tree is connected, and I cannot figure out how to have several disconnected trees that reflect the disconnected nature of the data. I would like to avoid multiple plots, as I wish to work with larger datasets that may have many disjoint clusters.
I see this was asked a while ago, but in case you're still interested, here's something to try:
First extract the values above the diagonal from the adjacency matrix, row by row, like so:
>> matY = [];
>> for n = 1:5
    for m = n+1:6
        matY = [matY mat(n,m)];
    end
end
>> matY
matY =
Columns 1 through 13
1 0 0 0 0 0 0 1 0 1 0 0 0
Columns 14 through 15
1 0
Now you have something that looks like the Y vector pdist would have produced, with the pairs in the order (1,2), (1,3), ..., (1,6), (2,3), ..., (5,6). But the values here are the opposite of what you probably want; the unconnected vertices have a "distance" of zero and the connected ones are one apart. Let's fix that:
>> matY(matY == 0) = 10
matY =
Columns 1 through 13
1 10 10 10 10 10 10 1 10 1 10 10 10
Columns 14 through 15
1 10
Better. Now we can compute a regular cluster tree, which will represent connected vertices as "close together" and non-connected ones as "far apart". With the default single linkage, the first four merges happen at distance 1 and build the clusters {1,2,5} and {3,4,6}; the final merge joins those two clusters at distance 10:
>> Z = linkage(matY);
>> dendrogram(Z)
The resulting diagram:
Hope this is a decent approximation of what you're looking for.
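pdist orders its output by the pairs (1,2), (1,3), ..., (1,6), (2,3), and so on. For a symmetric matrix, that is the same order as reading the strictly lower triangle column by column, so a loop-free sketch can build the distance vector with a tril mask. The distance value 10 for unconnected nodes is arbitrary, the names D and Y are illustrative, and the linkage and dendrogram calls are left as comments because they need the Statistics Toolbox.

```matlab
% Adjacency matrix from the question (symmetric)
mat = zeros(6);
mat(1,2)=1; mat(2,1)=1; mat(2,5)=1; mat(5,2)=1;
mat(3,4)=1; mat(4,3)=1; mat(4,6)=1; mat(6,4)=1;

% Connected pairs get distance 1, unconnected pairs get 10 (arbitrary choice)
D = 10 - 9*mat;

% Strictly lower triangle, read column by column: (2,1),(3,1),...,(6,1),(3,2),...
% For a symmetric D this matches the pair order pdist uses
Y = D(tril(true(size(D)), -1))';

disp(Y);

% With the Statistics Toolbox available, the tree can then be drawn with:
% Z = linkage(Y);
% dendrogram(Z)
```

The entries equal to 1 sit at positions 1, 8, 10 and 14, which correspond to the pairs (1,2), (2,5), (3,4) and (4,6), the four edges of the graph.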