grouping using dendrogram matlab - matlab

I have a matrix A composed of 4 vectors (columns) of 12 elements each:
A = [ 0 0 0 0;
0.0100 0.0100 0.0100 0;
0.3000 0.2700 0.2400 0.2400;
0.0400 0 0.0200 0.0200;
0.1900 0.0400 0.0800 0.0800;
0.1600 0.6500 0.2100 0.3800;
0.0600 0.0100 0.0300 0.0200;
0.1500 0.0100 0.0600 0.1700;
0 0 0 0.0800;
0.0300 0 0.0200 0.0100;
0.0700 0 0.1200 0.0100;
0 0 0.2300 0]
I also have a similarity matrix that states how similar each vector is to the others:
SIM =[1.00 0.6400 0.7700 0.8300;
0.6400 1.0000 0.6900 0.9100;
0.7700 0.6900 1.0000 0.7500;
0.8300 0.9100 0.7500 1.0000]
Reading the rows of this matrix:
vector 1 is 64% similar to vector 2
vector 1 is 77% similar to vector 3
...
I would like to create a dendrogram graph that shows me how many different groups there are in A considering a threshold of 0.95 for similarity (i.e. if 2 groups have a similarity >0.7 connect them)
I didn't really understand how to use this function with my data...

I'm not sure I understood your question correctly, but from what I understood, I would do the following:
DSIM = squareform(1-SIM); % convert to a dissimilarity vector
which gives:
% DSIM  = 0.3600   0.2300   0.1700   0.3100   0.0900   0.2500
% pairs = 1 vs 2,  1 vs 3,  1 vs 4,  2 vs 3,  2 vs 4,  3 vs 4
Next, compute the linkage:
Z = linkage(DSIM,'average'); % linkage methods other than 'average' are available
You can plot the dendrogram with:
dendrogram(Z)
However, you want to split the groups according to a threshold, so:
c = 0.1;
This is the dissimilarity at which to cut; here it means that two groups will be connected if they have a similarity higher than 0.9.
T = cluster(Z,'cutoff',c,'criterion','distance')
The result of T in that case is:
T =
1
2
3
2
This means that at this level your vectors 1, 2, 3, 4 (call them A B C D) are organized in 3 groups:
A
B,D
C
Similarly, with c = 0.3 (i.e. a similarity of 0.7):
T = 1 1 1 1
So there is just one group here.
To show that on the dendrogram, you can calculate the number of groups:
num_grp = numel(unique(T));
Then:
dendrogram(Z,num_grp,'labels',{'A','B','C','D'})
In that case the dendrogram won't display every vector as its own leaf, because the maximum number of leaf nodes is set to the number of groups.
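Putting the steps above together, a minimal end-to-end sketch might look like the following. It assumes the Statistics and Machine Learning Toolbox is available. The names simThreshold and labels are only illustrative, and the 'ColorThreshold' option is used here as one possible way to make the cut visible on the full tree instead of collapsing it.

```matlab
% Similarity matrix from the question (symmetric, ones on the diagonal).
SIM = [1.00 0.64 0.77 0.83;
       0.64 1.00 0.69 0.91;
       0.77 0.69 1.00 0.75;
       0.83 0.91 0.75 1.00];

simThreshold = 0.9;            % illustrative similarity threshold (assumption)
c = 1 - simThreshold;          % equivalent dissimilarity cutoff

DSIM = squareform(1 - SIM);    % condensed dissimilarity vector
Z = linkage(DSIM, 'average');  % hierarchical tree (average linkage)

% Cluster labels obtained by cutting the tree at height c.
T = cluster(Z, 'cutoff', c, 'criterion', 'distance');
num_grp = numel(unique(T));

% Draw the full tree; branches that merge below the cutoff share a colour.
labels = {'A', 'B', 'C', 'D'};
dendrogram(Z, 'Labels', labels, 'ColorThreshold', c);
```

Changing simThreshold to 0.7 should merge everything into a single group, matching the c = 0.3 case described above.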

Related

Normalize each slice of a 3D matrix

How do I normalize each slice of a 3D matrix? I tried like this:
a=rand(1,100,3481);
a= (a - min(a)) ./ (max(a)-min(a)); %
Each slice of the matrix should then range from 0 to 1, but that is not the case: I don't find a 1 in some of the slices. As far as I could tell, min(a) and max(a) return the respective values along the 3rd dimension, so the code above should work. Is there something I missed for 3D matrices? Thanks in advance!
We need to find the minimum and maximum values of each 2D slice. We can then apply the operations in a vectorized manner with bsxfun, using permute (or reshape) to align the singleton dimensions so that bsxfun can broadcast.
Hence, the implementation would be -
mins = min(reshape(a,[],size(a,3)));
maxs = max(reshape(a,[],size(a,3)));
a_offsetted = bsxfun(@minus, a, permute(mins,[1,3,2]));
a_normalized = bsxfun(@rdivide, a_offsetted, permute(maxs-mins,[1,3,2]))
Sample input, output -
>> a
a(:,:,1) =
2 8 2 2
8 3 8 2
a(:,:,2) =
8 1 1 5
4 9 8 6
a(:,:,3) =
7 9 3 5
6 2 6 5
a(:,:,4) =
9 3 4 9
7 1 9 9
>> a_normalized
a_normalized(:,:,1) =
0 1.0000 0 0
1.0000 0.1667 1.0000 0
a_normalized(:,:,2) =
0.8750 0 0 0.5000
0.3750 1.0000 0.8750 0.6250
a_normalized(:,:,3) =
0.7143 1.0000 0.1429 0.4286
0.5714 0 0.5714 0.4286
a_normalized(:,:,4) =
1.0000 0.2500 0.3750 1.0000
0.7500 0 1.0000 1.0000
My option avoids reshaping, as it can be a bit difficult to understand. I use min and max along the dimension you want to normalize over, with repmat to replicate the result:
a=rand(1,100,3481);
a_min2 = min(a,[],2);
a_max2 = max(a,[],2);
a_norm2 = (a - repmat(a_min2,[1 size(a,2) 1]) ) ./ repmat( (a_max2-a_min2),[1 size(a,2) 1]);
or, to normalize along the 3rd dimension:
a_min3 = min(a,[],3);
a_max3 = max(a,[],3);
a_norm3 = (a - repmat(a_min3,[1 1 size(a,3)]) ) ./ repmat( (a_max3-a_min3),[1 1 size(a,3)]);
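For comparison, here is a minimal sketch of per-slice normalization that relies on implicit expansion instead of bsxfun or repmat. It assumes MATLAB R2016b or later; the names slice_min, slice_max and a_norm are illustrative, and the small random array is only a stand-in for real data.

```matlab
a = rand(4, 5, 3);                       % small example array (stand-in data)

% Per-slice extrema: reduce over dims 1 and 2, leaving a 1x1xK array.
slice_min = min(min(a, [], 1), [], 2);
slice_max = max(max(a, [], 1), [], 2);

% Implicit expansion (R2016b+) broadcasts the 1x1xK arrays over each slice.
a_norm = (a - slice_min) ./ (slice_max - slice_min);
```

Note that a slice whose values are all equal would give 0/0 here, so such slices would need separate handling.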

Voronoi diagram: what is output of [v,C] = voronoin(X) in MATLAB?

I have an image and I want to generate its Voronoi diagram
V = {V_1,V_2,...,V_N} in MATLAB, taking the corner points of the image as seed points, where V_i is a set of pixels. I used
[v,C] = voronoin(corners);
I know that C holds the Voronoi cells, but do these cells contain the locations of pixels?
As mentioned in the documentation:
V is a numv-by-n array of the numv Voronoi vertices in n-dimensional space, each row corresponds to a Voronoi vertex. C is a vector cell array where each element contains the indices into V of the vertices of the corresponding Voronoi cell.
This means that C is built from the vertices in V: each cell lists the indices (row numbers in V) of its vertices, not pixel locations.
For example:
X = [0.5 0; 0 0.5; -0.5 -0.5; -0.2 -0.1; -0.1 0.1; 0.1 -0.1; 0.1 0.1]
X =
0.5000 0
0 0.5000
-0.5000 -0.5000
-0.2000 -0.1000
-0.1000 0.1000
0.1000 -0.1000
0.1000 0.1000
Compute the Voronoi cells:
[V,C] = voronoin(X)
V =
Inf Inf
0.7000 -1.6500
-0.0500 -0.0500
-0.0500 -0.5250
-1.4500 0.6500
-1.7500 0.7500
0 0.2875
0.3833 0.3833
0.2875 0
0 0
C = 7×1 cell array
[1×4 double]
[1×5 double]
[1×4 double]
[1×4 double]
[1×4 double]
[1×5 double]
[1×4 double]
Use a for loop to display the contents of the cell array C.
for i = 1:length(C)
disp(C{i});
end
9 2 1 8
8 1 6 5 7
6 1 2 4
6 4 3 5
10 3 5 7
10 3 4 2 9
10 7 8 9
One of the cells is 9 2 1 8. This means the vertices of this cell are V(9,:), V(2,:), V(1,:) and V(8,:), in that order. Since V(1,:) is the point at infinity, this particular cell is unbounded.
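As a rough sketch of how the indices in C can be used, the snippet below looks up the vertex coordinates of one cell, flags the bounded cells, and then labels a grid of points with their nearest seed. The last part is only one possible way to get "the set of pixels" per cell, since C itself does not store pixels; the grid (px, py) and the names cell_vertices, is_bounded and owner are assumptions made for illustration.

```matlab
X = [0.5 0; 0 0.5; -0.5 -0.5; -0.2 -0.1; -0.1 0.1; 0.1 -0.1; 0.1 0.1];
[V, C] = voronoin(X);

% Vertex coordinates of the k-th cell (one row per vertex).
k = 4;
cell_vertices = V(C{k}, :);

% A cell is bounded only if none of its vertices is the point at infinity.
is_bounded = cellfun(@(idx) all(all(isfinite(V(idx, :)))), C);

% Pixel membership is not stored in C. One way to obtain it is to label
% every pixel centre with the index of its nearest seed point.
[px, py] = meshgrid(linspace(-0.5, 0.5, 11));   % hypothetical pixel grid
d2 = bsxfun(@minus, px(:), X(:, 1).').^2 + ...
     bsxfun(@minus, py(:), X(:, 2).').^2;       % squared distances, pixels x seeds
[~, owner] = min(d2, [], 2);
owner = reshape(owner, size(px));               % owner(i,j) = index of nearest seed
```

With this labelling, the pixels of cell i are those where owner == i.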

Normalization of inputs of a feedforward Neural network

Let's say I have an m x n matrix of different features of a time series signal (column 1 represents the linear regression of the last n samples, column 2 represents the average of the last n samples, column 3 represents the local max values of a different but correlated time series signal, etc). How should I normalize these inputs? The inputs fall into different categories, so they have different ranges: one ranges from 0 to 1, another from -5 to 50, and so on.
Should I normalize the WHOLE matrix? Or should I normalize each set of inputs one by one individually?
Note: I usually use mapminmax function from MATLAB for the normalization.
You should normalise each vector/column of your matrix individually; they represent different data types and shouldn't be mixed together.
You could, for example, transpose your matrix so that your 3 different data types are in the rows instead of the columns, and still use mapminmax (which normalizes each row):
A = [0 0.1 -5; 0.2 0.3 50; 0.8 0.8 10; 0.7 0.9 20];
A =
0 0.1000 -5.0000
0.2000 0.3000 50.0000
0.8000 0.8000 10.0000
0.7000 0.9000 20.0000
B = mapminmax(A')
B =
-1.0000 -0.5000 1.0000 0.7500
-1.0000 -0.5000 0.7500 1.0000
-1.0000 1.0000 -0.4545 -0.0909
You should normalize each feature independently.
Quoting the question: "column 1 represents linear regression of the last n samples, column 2 represents the average of the last n samples, column 3 represents the local max values of a different time series but correlated signal, etc"
I can't say for sure about your particular problem, but generally, you should normalize each feature independently: normalize column 1, then column 2, and so on.
Quoting the question: "Should I normalize the WHOLE matrix? Or should I normalize each set of inputs one by one individually?"
I'm not sure what you mean here. What is an input? If you mean an instance (a row of your matrix), then no, you should not normalize rows individually; normalize columns.
I don't know how you would do this in MATLAB, but I took your question as a theoretical one rather than an implementation one.
If you want a range of [0,1] for every column, normalized within each column, you can use mapminmax like so (assuming A is the 2D input array) -
out = mapminmax(A.',0,1).'
You can also use bsxfun for the same output, like so -
Aoffsetted = bsxfun(@minus,A,min(A,[],1))
out = bsxfun(@rdivide,Aoffsetted,max(Aoffsetted,[],1))
Sample run -
>> A
A =
3 7 4 2 7
1 3 4 5 7
1 9 7 5 3
8 1 8 6 7
>> mapminmax(A.',0,1).'
ans =
0.28571 0.75 0 0 1
0 0.25 0 0.75 1
0 1 0.75 0.75 0
1 0 1 1 1
>> Aoffsetted = bsxfun(@minus,A,min(A,[],1));
>> bsxfun(@rdivide,Aoffsetted,max(Aoffsetted,[],1))
ans =
0.28571 0.75 0 0 1
0 0.25 0 0.75 1
0 1 0.75 0.75 0
1 0 1 1 1
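Since the normalized features are meant to feed a network, it may also help to keep the scaling settings so that new samples are scaled exactly like the training data. The sketch below uses the second output of mapminmax and its 'apply' mode for this; it assumes the toolbox that provides mapminmax is installed, and A_new is made-up data used only for illustration.

```matlab
% Rows are samples, columns are features (same example matrix as above).
A = [0 0.1 -5; 0.2 0.3 50; 0.8 0.8 10; 0.7 0.9 20];

% mapminmax scales each ROW, so transpose to scale each feature (column).
[B, ps] = mapminmax(A.', 0, 1);
A_scaled = B.';

% Reuse the stored settings ps for new samples (A_new is hypothetical data).
A_new = [0.4 0.5 22.5];
A_new_scaled = mapminmax('apply', A_new.', ps).';
```

The stored settings ps can also be passed to mapminmax('reverse', ...) to map scaled values back to the original ranges.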

Generating a grid in matlab with a general number of dimensions

Problem
I have a vector w containing n elements. I do not know n in advance.
I want to generate an n-dimensional grid g whose values range from grid_min to grid_max and obtain the "dimension-wise" product of w and g.
How can I do this for an arbitrary n?
Examples
For simplicity, let's say that grid_min = 0 and grid_max = 5.
Case: n=1
>> w = [0.75];
>> g = 0:5
g =
0 1 2 3 4 5
>> w * g
ans =
0 0.7500 1.5000 2.2500 3.0000 3.7500
Case: n=2
>> w = [0.1, 0.2];
>> [g1, g2] = meshgrid(0:5, 0:5)
g1 =
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
g2 =
0 0 0 0 0 0
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
5 5 5 5 5 5
>> w(1) * g1 + w(2) * g2
ans =
0 0.1000 0.2000 0.3000 0.4000 0.5000
0.2000 0.3000 0.4000 0.5000 0.6000 0.7000
0.4000 0.5000 0.6000 0.7000 0.8000 0.9000
0.6000 0.7000 0.8000 0.9000 1.0000 1.1000
0.8000 0.9000 1.0000 1.1000 1.2000 1.3000
1.0000 1.1000 1.2000 1.3000 1.4000 1.5000
Now suppose a user passes in the vector w and we do not know how many elements (n) it contains. How can I create the grid and obtain the product?
%// Data:
grid_min = 0;
grid_max = 5;
w = [.1 .2 .3];
%// Let's go:
n = numel(w);
gg = cell(1,n);
[gg{:}] = ndgrid(grid_min:grid_max);
gg = cat(n+1, gg{:});
result = sum(bsxfun(@times, gg, shiftdim(w(:), -n)), n+1);
How this works:
The grid (variable gg) is generated with ndgrid, capturing its outputs as a comma-separated list of n elements obtained from a cell array. The resulting n-dimensional arrays (gg{1}, gg{2}, etc.) are concatenated along the (n+1)-th dimension (using cat), which turns gg into an (n+1)-dimensional array. The vector w is reshaped to lie along the (n+1)-th dimension (shiftdim), multiplied by gg using bsxfun, and the results are summed along the (n+1)-th dimension.
Edit:
Following @Divakar's insightful comment, the last line can be replaced by
sz_gg = size(gg);
result = zeros(sz_gg(1:end-1));
result(:) = reshape(gg,[],numel(w))*w(:);
which results in a significant speedup, because Matlab is even better at matrix multiplication than at bsxfun.
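As a quick sanity check, the general approach can be compared against the hand-written n=2 case from the question. This is only a sketch and assumes n >= 2 (so that the reshape target has at least two dimensions); the names expected and max_diff are illustrative. One point worth keeping in mind is that ndgrid and meshgrid order the first two dimensions differently, so the two results are transposes of each other.

```matlab
grid_min = 0;
grid_max = 5;
w = [0.1 0.2];

% General n-dimensional computation (matrix-multiplication variant).
n = numel(w);
gg = cell(1, n);
[gg{:}] = ndgrid(grid_min:grid_max);
gg = cat(n + 1, gg{:});
sz_gg = size(gg);
result = reshape(reshape(gg, [], n) * w(:), sz_gg(1:n));

% Hand-written 2-D version from the question.
[g1, g2] = meshgrid(grid_min:grid_max, grid_min:grid_max);
expected = w(1) * g1 + w(2) * g2;

% ndgrid varies along dimension 1 first, meshgrid along dimension 2 first,
% so the general result is compared with the transpose of the 2-D one.
max_diff = max(max(abs(result - expected.')));
```

If a meshgrid-style layout is needed for n=2, transposing result is enough.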

How to implement correlation kernel for image

I read a paper that defines a correlation kernel as:
W(x - y) = \alpha / (1 + d(|y - x|))
where \alpha = (\int (1 + d(|y - x|))^{-1} dy)^{-1} and d(|y - x|) is the spatial Euclidean distance from the central pixel.
Given an image I, could you help me implement the convolution of that kernel with the image in MATLAB code? Thank you so much.
OK! Sorry for the delay. Referencing the paper, the convolution kernel can be written as:
W(x - y) = \alpha / (1 + d(|y - x|))
d(|y-x|) is the Euclidean distance between the centre pixel y and a location x in the kernel. \alpha ensures that the entire area under the kernel is 1. However, you did not specify how big this kernel is, so we will take the rows and columns of the kernel to be M and N respectively. Let's also assume that the size of the kernel in each dimension is odd: the kernel then has a well-defined centre pixel and is symmetric about it, which makes the implementation easier. Here are the steps that I would perform:
1. Define a grid of X and Y co-ordinates, and ensure that the centre pixel is at 0.
2. Compute each term in the convolution kernel without the \alpha term.
3. Sum up all of the terms in the kernel, then divide every value in the kernel by this sum so that the entire area of the kernel is 1.
Let's do this step by step:
Step #1
We can do this by using meshgrid. meshgrid (in this case) creates a 2D grid of (X,Y) co-ordinates. X holds the horizontal co-ordinate of each location in the grid, while Y holds the vertical co-ordinate. By calling meshgrid(1:m, 1:n), I am creating an n x m grid for both X and Y, where each row of X progresses from 1 to m, while each column of Y progresses from 1 to n. Therefore, these will both be n x m matrices. Calling the above with m = 4 and n = 4 computes:
m = 4;
n = 4;
[X,Y] = meshgrid(1:m, 1:n)
X =
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
Y =
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
As such, we simply have to modify this so that the centre is at (0,0) and the sizes of X and Y are odd. Let's say that M = 5 and N = 5. We can then define our X and Y co-ordinates like so:
M = 5;
N = 5;
[X,Y] = meshgrid(-floor(N/2):floor(N/2), -floor(M/2):floor(M/2))
X =
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
Y =
-2 -2 -2 -2 -2
-1 -1 -1 -1 -1
0 0 0 0 0
1 1 1 1 1
2 2 2 2 2
As you can see here, the centre pixel has co-ordinates (0,0) in X and Y. Every other location has its (X,Y) co-ordinates defined with respect to the centre.
Step #2
We simply have to compute the Euclidean distance from the centre pixel to every point in the kernel. This can be done by:
dis = sqrt(X.^2 + Y.^2)
dis =
2.8284 2.2361 2.0000 2.2361 2.8284
2.2361 1.4142 1.0000 1.4142 2.2361
2.0000 1.0000 0 1.0000 2.0000
2.2361 1.4142 1.0000 1.4142 2.2361
2.8284 2.2361 2.0000 2.2361 2.8284
Doing some quick calculation checks, you can see that this agrees with our understanding of Euclidean distance. Moving to the left by 1 from the centre is a distance of 1. Moving to the left by 1 then up by 1 gives us a Euclidean distance of \sqrt(1^2 + 1^2) = \sqrt(2) = 1.4142. Doing similar checks with each element will demonstrate that this is indeed a Euclidean distance field from the centre pixel. After we do this, let's compute the kernel terms without the \alpha term.
kern = 1 ./ (1 + dis)
kern =
0.2612 0.3090 0.3333 0.3090 0.2612
0.3090 0.4142 0.5000 0.4142 0.3090
0.3333 0.5000 1.0000 0.5000 0.3333
0.3090 0.4142 0.5000 0.4142 0.3090
0.2612 0.3090 0.3333 0.3090 0.2612
Step #3
The last step we need is to normalize the mask so that the total sum of the kernel is 1. This can simply be done by:
kernFinal = kern / sum(kern(:))
kernFinal =
0.0275 0.0325 0.0351 0.0325 0.0275
0.0325 0.0436 0.0526 0.0436 0.0325
0.0351 0.0526 0.1052 0.0526 0.0351
0.0325 0.0436 0.0526 0.0436 0.0325
0.0275 0.0325 0.0351 0.0325 0.0275
This should finally give you the correlation kernel that you are seeking. You can now use this in convolution (e.g. using imfilter or conv2).
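The three steps can be collected into one short script, followed by an application to an image with conv2. This is a minimal sketch: the random matrix I is only a stand-in for a grayscale image of class double, and out is an illustrative name. Because this kernel is symmetric about its centre, convolution and correlation give the same result here.

```matlab
M = 5;                                 % kernel rows (odd)
N = 5;                                 % kernel columns (odd)

% Step 1: co-ordinate grid centred at (0,0).
[X, Y] = meshgrid(-floor(N/2):floor(N/2), -floor(M/2):floor(M/2));

% Step 2: kernel terms without the \alpha factor.
dis = sqrt(X.^2 + Y.^2);
kern = 1 ./ (1 + dis);

% Step 3: normalize so that the kernel sums to 1.
kernFinal = kern / sum(kern(:));

% Apply the kernel; 'same' keeps the output the same size as the input.
I = rand(32, 48);                      % stand-in for a grayscale image (assumption)
out = conv2(I, kernFinal, 'same');
```

Note that conv2 with 'same' treats pixels outside the image as zero, so values near the border are attenuated; imfilter offers other boundary options if that matters.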
Hopefully I have answered your question adequately. Good luck!