Image labeling and finding centroids in matlab - matlab

My problem is that I have a radar image in png format. (Sorry, but i had to remove the image as my colleague says it a copyright infringement of the German Weather Service)
I want to read the image in MATLAB. Then read all the clouds, and label each cloud with a unique index. This means that each pixel belonging to a certain cloud is labeled with the same index i. Calculate the center of area(coa) of each cloud and then I should be able to measure distances between clouds from one coa to another.
Some similar work I know was done in IDL. I tried using that but it would be much easier for me if I'm able to do all this in MATLAB and concentrate more on the result, rather then spend time learning IDL.
So, before jumping in, I want to know if all this is possible in MATLAB. If yes, can you guide me a little on how I can extract the cloud and label them?

First do some basic image analysis such as thresholding or median filtering and so forth, to reduce noise if relevant.
Then you can use bwlabel to label each cloud with a unique index. The use reigonprops to find the centroids.
Here's a very basic code sample:
d=imread('u09q8.png');
bw = im2bw(d,0.1); % thereshold at 50%
bw = bwareaopen(bw, 10); % Remove objects smaller than 10 pixels from binary image
bw=bwlabel(bw); % label each cloud
stats=regionprops(bw,'Centroid'); % find centroid coordinates of all labeled clouds

Yes it is possible.
Regarding the cloud detection, it is a step by step process.
It will be based on the algorithm you are going to use. You can start here.

Yes, of course. This can be done using k-means clustering.
You can read about imread and kmeans. The example given in the official documentation of kmeans shows exactly what you need.
For example, if you want to cluster your image into 5 clouds:
%// Read the image
I = imread('clouds.png');
[y, x] = find(I); %// Obtain all coordinates
y = size(I, 1) - y + 1; %// Adjust y-coordinates
K = 5;
[idx, c] = kmeans([x, y], K); %// Classify clouds into K clusters
Now idx stores the corresponding cluster indices and c stores the coordinates of the centroids.
To draw the results, you can do something like this:
%// Plot results
figure, hold on
scatter(x, y, 5, idx) %// Plot the clusters
plot(c(:, 1), c(:, 2), 'r+') %// Plot centroids
text(c(:, 1) + 10, c(:, 2), num2str((1:K)'), 'color', 'r') %// Plot cluster IDs
Note that this method requires predetermining the number of clusters K in advance. Alternatively, you can use this tool to attempt to automatically detect the number of clusters.
EDIT: Due to the copyright claim I removed the resulting image.

Related

Find volume of 3d peaks in matlab

right now I have a 3d scatter plot with peaks that I need to find the volumes for. My data is from an image, so the x- and y- values indicate the pixel positions on the xy-plane, and the z value is the pixel value for each pixel.
Here's my scatter plot:
scatter3(x,y,z,20,z,'filled')
I am trying to find the "volume" of the peaks of the data, like drawn below:
I've tried findpeaks() but it gives me many local maxima without the the two prominent peaks that I'm looking for. In addition, I'm really stuck on how to establish the "base" of my peaks, because my data is from a scatter plot. I've also tried the convex hull and a linear surface fit, and get this:
But I'm still stuck on how to use any of these commands to establish an automated peak "base" and volume. Please let me know if you have any ideas or code segments to help me out, because I am stumped and I can't find anything on Stack Overflow. Sorry in advance if this is really unclear! Thank you so much!
Here is a suggestion for dealing with this problem:
Define a threshold for z height, or define in any other way which points from the scatter are relevant (the black plane in the leftmost figure below).
Within the resulted points, find clusters on the X-Y plane, to define the different regions to calculate. You will have to define manually how many clusters you want.
for each cluster, perform a Delaunay triangulation to estimate its volume.
Here is an example code for all that:
[x,y,z] = peaks(30); % some data
subplot 131
scatter3(x(:),y(:),z(:),[],z(:),'filled')
title('The original data')
th = 2.5; % set a threshold for z values
hold on
surf([-3 -3 3 3],[-4 4 -4 4],ones(4)*th,'FaceColor','k',...
'FaceAlpha',0.5)
hold off
ind = z>th; % get an index of all values of interest
X = x(ind);
Y = y(ind);
Z = z(ind);
clustNum = 3; % the number of clusters should be define manually
T = clusterdata([X Y],clustNum);
subplot 132
gscatter(X,Y,T)
title('A look from above')
subplot 133
hold on
c = ['rgb'];
for k = 1:max(T)
valid = T==k;
% claculate a triangulation of the data:
DT = delaunayTriangulation([X(valid) Y(valid) Z(valid)]);
[K,v] = convexHull(DT); % get the convex hull indices
% plot the volume:
ts = trisurf(K,DT.Points(:,1),DT.Points(:,2),DT.Points(:,3),...
'FaceColor',c(k));
text(mean(X(valid)),mean(Y(valid)),max(Z(valid))*1.3,...
num2str(v),'FontSize',12)
end
hold off
view([-45 40])
title('The volumes')
Note: this code uses different functions from several toolboxes. In any case that something does not work, first make sure that you have the relevant toolbox, there are alternatives to most of them.
Having already a mesh, maybe you could use the process described in https://se.mathworks.com/matlabcentral/answers/277512-how-to-find-peaks-in-3d-mesh .
If not, making a linear regression on (x,z) or (y,z) plane could make a base for you to find the peaks.
Out of experience in data with plenty of noise, selecting the peaks manually is often faster if you have small set of data to make the analysis. Just plot every peak with its number from findpeaks() and select the ones that are relevant to you. An interpolation to a smoother data can help to solve the problem in the long term (but creates a problem by itself).
Other option will be searching for peaks in the (x,z) and (y,z) planes, then having the amplitude of each peak in an (x) [or (y)] interval and from there make a integration for every area.

grayscale image processing using k-mean

I am trying to convert the rgb image into a grayscale and then cluster it using kmean function of matlab .
here is my code
he = imread('tumor2.jpg');
%convert into a grayscale image
ab=rgb2gray(he);
nrows = size(ab,1);
ncols = size(ab,2);
%convert the image into a column vector
ab = reshape(ab,nrows*ncols,1);
%nColors=no of clusters
nColors = 3;
%cluster_idx is a n x 1 vector where cluster_idx(i) is the index of cluster assigned to ith pixel
[cluster_idx, cluster_center ,cluster_sum] = kmeans(ab,nColors,'distance','sqEuclidean','Replicates',1,'EmptyAction','drop' );
figure;
%converting vector into a matrix of dimensions equal to that of original
%image dimensions (nrows x ncols)
pixel_labels = reshape(cluster_idx,nrows,ncols);
pixel_labels
imshow(pixel_labels,[]), title('image labeled by cluster index');
problems
1) output image is always a plain white image.
i tried the solution given in the link below but output of the image is a plain gray image in this case.
find the solution tried here
2) when i execute my code second time ,execution does not proceed beyond k-mean function (it is likes an infinite loop there). hence no output in this case.
Actually, it looks like when you are colour segmenting kmeans is known to fall in local minima. This means that often, it wont find the amount of clusters you want as the minimization is not the best (that's why lots of people use other type of segmentation, such as level sets or simple region growing).
An option is to increase the amount of Replicates (amount of times kmeans will try to find the answer). At the moment you are setting it to 1, but you could try 3 or 4, and it may reach the solution that way.
In this question the accepted answer recommends to use a kmeans version of the algorithm specifically created for image segmentation. I havent tried myself but I think its worth a shot.
Link to FEX

Clustering an image using Gaussian mixture models

I want to use GMM(Gaussian mixture models for clustering a binary image and also want to plot the cluster centroids on the binary image itself.
I am using this as my reference:
http://in.mathworks.com/help/stats/gaussian-mixture-models.html
This is my initial code
I=im2double(imread('sil10001.pbm'));
K = I(:);
mu=mean(K);
sigma=std(K);
P=normpdf(K, mu, sigma);
Z = norminv(P,mu,sigma);
X = mvnrnd(mu,sigma,1110);
X=reshape(X,111,10);
scatter(X(:,1),X(:,2),10,'ko');
options = statset('Display','final');
gm = fitgmdist(X,2,'Options',options);
idx = cluster(gm,X);
cluster1 = (idx == 1);
cluster2 = (idx == 2);
scatter(X(cluster1,1),X(cluster1,2),10,'r+');
hold on
scatter(X(cluster2,1),X(cluster2,2),10,'bo');
hold off
legend('Cluster 1','Cluster 2','Location','NW')
P = posterior(gm,X);
scatter(X(cluster1,1),X(cluster1,2),10,P(cluster1,1),'+')
hold on
scatter(X(cluster2,1),X(cluster2,2),10,P(cluster2,1),'o')
hold off
legend('Cluster 1','Cluster 2','Location','NW')
clrmap = jet(80); colormap(clrmap(9:72,:))
ylabel(colorbar,'Component 1 Posterior Probability')
But the problem is that I am unable to plot the cluster centroids received from GMM in the primary binary image.How do i do this?
**Now suppose i have 10 such images in a sequence And i want to store the information of their mean position in two cell array then how do i do that.This is my code foe my new question **
images=load('gait2go.mat');%load the matrix file
for i=1:10
I{i}=images.result{i};
I{i}=im2double(I{i});
%determine 'white' pixels, size of image can be [M N], [M N 3] or [M N 4]
Idims=size(I{i});
whites=true(Idims(1),Idims(2));
df=I{i};
%we add up the various color channels
for colori=1:size(df,3)
whites=whites & df(:,:,colori)>0.5;
end
%choose indices of 'white' pixels as coordinates of data
[datax datay]=find(whites);
%cluster data into 10 clumps
K = 10; % number of mixtures/clusters
cInd = kmeans([datax datay], K, 'EmptyAction','singleton',...
'maxiter',1000,'start','cluster');
%get clusterwise means
meanx=zeros(K,1);
meany=zeros(K,1);
for i=1:K
meanx(i)=mean(datax(cInd==i));
meany(i)=mean(datay(cInd==i));
end
xc{i}=meanx(i);%cell array contaning the position of the mean for the 10
images
xb{i}=meany(i);
figure;
gscatter(datay,-datax,cInd); %funky coordinates for plotting according to
image
axis equal;
hold on;
scatter(meany,-meanx,20,'+'); %same funky coordinates
end
I am able to get 10 images segmented but no the values of themean stored in the cell arrays xc and xb.They r only storing [] in place of the values of means
I decided to post an answer to your question (where your question was determined by a maximum-likelihood guess:P), but I wrote an extensive introduction. Please read carefully, as I think you have difficulties understanding the methods you want to use, and you have difficulties understanding why others can't help you with your usual approach of asking questions. There are several problems with your question, both code-related and conceptual. Let's start with the latter.
The problem with the problem
You say that you want to cluster your image with Gaussian mixture modelling. While I'm generally not familiar with clustering, after a look through your reference and the wonderful SO answer you cited elsewhere (and a quick 101 from #rayryeng) I think you are on the wrong track altogether.
Gaussian mixture modelling, as its name suggests, models your data set with a mixture of Gaussian (i.e. normal) distributions. The reason for the popularity of this method is that when you do measurements of all sorts of quantities, in many cases you will find that your data is mostly distributed like a normal distribution (which is actually the reason why it's called normal). The reason behind this is the central limit theorem, which implies that the sum of reasonably independent random variables tends to be normal in many cases.
Now, clustering, on the other hand, simply means separating your data set into disjoint smaller bunches based on some criteria. The main criterion is usually (some kind of) distance, so you want to find "close lumps of data" in your larger data set. You usually need to cluster your data before performing a GMM, because it's already hard enough to find the Gaussians underlying your data without having to guess the clusters too. I'm not familiar enough with the procedures involved to tell how well GMM algorithms can work if you just let them work on your raw data (but I expect that many implementations start with a clustering step anyway).
To get closer to your question: I guess you want to do some kind of image recognition. Looking at the picture, you want to get more strongly correlated lumps. This is clustering. If you look at a picture of a zoo, you'll see, say, an elephant and a snake. Both have their distinct shapes, and they are well separated from one another. If you cluster your image (and the snake is not riding the elephant, neither did it eat it), you'll find two lumps: one lump elephant-shaped, and one lump snake-shaped. Now, it wouldn't make sense to use GMM on these data sets: elephants, and especially snakes, are not shaped like multivariate Gaussian distributions. But you don't need this in the first place, if you just want to know where the distinct animals are located in your picture.
Still staying with the example, you should make sure that you cluster your data into an appropriate number of subsets. If you try to cluster your zoo picture into 3 clusters, you might get a second, spurious snake: the nose of the elephant. With an increasing number of clusters your partitioning might make less and less sense.
Your approach
Your code doesn't give you anything reasonable, and there's a very good reason for that: it doesn't make sense from the start. Look at the beginning:
I=im2double(imread('sil10001.pbm'));
K = I(:);
mu=mean(K);
sigma=std(K);
X = mvnrnd(mu,sigma,1110);
X=reshape(X,111,10);
You read your binary image, convert it to double, then stretch it out into a vector and compute the mean and deviation of that vector. You basically smear your intire image into 2 values: an average intensity and a deviation. And THEN you generate 111*10 standard normal points with these parameters, and try to do GMM on the first two sets of 111. Which are both independently normal with the same parameter. So you probably get two overlapping Gaussians around the same mean with the same deviation.
I think the examples you found online confused you. When you do GMM, you already have your data, so no pseudo-normal numbers should be involved. But when people post examples, they also try to provide reproducible inputs (well, some of them do, nudge nudge wink wink). A simple method for this is to generate a union of simple Gaussians, which can then be fed into GMM.
So, my point is, that you don't have to generate random numbers, but have to use the image data itself as input to your procedure. And you probably just want to cluster your image, instead of actually using GMM to draw potatoes over your cluster, since you want to cluster body parts in an image about a human. Most body parts are not shaped like multivariate Gaussians (with a few distinct exceptions for men and women).
What I think you should do
If you really want to cluster your image, like in the figure you added to your question, then you should use a method like k-means. But then again, you already have a program that does that, don't you? So I don't really think I can answer the question saying "How can I cluster my image with GMM?". Instead, here's an answer to "How can I cluster my image?" with k-means, but at least there will be a piece of code here.
%set infile to what your image file will be
infile='sil10001.pbm';
%read file
I=im2double(imread(infile));
%determine 'white' pixels, size of image can be [M N], [M N 3] or [M N 4]
Idims=size(I);
whites=true(Idims(1),Idims(2));
%we add up the various color channels
for colori=1:Idims(3)
whites=whites & I(:,:,colori)>0.5;
end
%choose indices of 'white' pixels as coordinates of data
[datax datay]=find(whites);
%cluster data into 10 clumps
K = 10; % number of mixtures/clusters
cInd = kmeans([datax datay], K, 'EmptyAction','singleton',...
'maxiter',1000,'start','cluster');
%get clusterwise means
meanx=zeros(K,1);
meany=zeros(K,1);
for i=1:K
meanx(i)=mean(datax(cInd==i));
meany(i)=mean(datay(cInd==i));
end
figure;
gscatter(datay,-datax,cInd); %funky coordinates for plotting according to image
axis equal;
hold on;
scatter(meany,-meanx,20,'ko'); %same funky coordinates
Here's what this does. It first reads your image as double like yours did. Then it tries to determine "white" pixels by checking that each color channel (of which can be either 1, 3 or 4) is brighter than 0.5. Then your input data points to the clustering will be the x and y "coordinates" (i.e. indices) of your white pixels.
Next it does the clustering via kmeans. This part of the code is loosely based on the already cited answer of Amro. I had to set a large maximal number of iterations, as the problem is ill-posed in the sense that there aren't 10 clear clusters in the picture. Then we compute the mean for each cluster, and plot the clusters with gscatter, and the means with scatter. Note that in order to have the picture facing in the right directions in a scatter plot you have to shift around the input coordinates. Alternatively you could define datax and datay correspondingly at the beginning.
And here's my output, run with the already processed figure you provided in your question:
I do believe you must had made a naive mistake in the plot and that's why you see just a straight line: You are plotting only the x values.
In my opinion, the second argument in the scatter command should be X(cluster1,2) or X(cluster2,2) depending on which scatter command is being used in the code.
The code can be made more simple:
%read file
I=im2double(imread('sil10340.pbm'));
%choose indices of 'white' pixels as coordinates of data
[datax datay]=find(I);
%cluster data into 10 clumps
K = 10; % number of mixtures/clusters
[cInd, c] = kmeans([datax datay], K, 'EmptyAction','singleton',...
'maxiter',1000,'start','cluster');
figure;
gscatter(datay,-datax,cInd); %funky coordinates for plotting according to
image
axis equal;
hold on;
scatter(c(:,2),-c(:,1),20,'ko'); %same funky coordinates
I don't think there is nay need for the looping as the c itself return a 10x2 double array which contains the position of the means

k-means clustering using function 'kmeans' in MATLAB

I have this matrix:
x = [2+2*i 2-2*i -2+2*i -2-2*i];
I want to simulate transmitting it and adding noise to it. I represented the components of the complex number as below:
A = randn(150, 2) + 2*ones(150, 2); C = randn(150, 2) - 2*ones(150, 2);
At the receiver, I received the below vector, where the components are ordered based on what I sent originally, i.e., the components of x).
X = [A A A C C A C C];
Now I want to apply the kmeans(X) to have four clusters, so kmeans(X, 4). I am experiencing the following problems:
I am not sure if I can represent the complex numbers as shown in X above.
I can't plot the result of the kmeans to show the clusters.
I could not understand the clusters centroid results.
How can I find the best error rate, if this example was to represent a communication system and at the receiver, k-means clustering was used in order to decide what the transmitted signal was?
If you don't "understand" the cluster centroid results, then you don't understand how k-means works. I'll present a small summary here.
How k-means works is that for some data that you have, you want to group them into k groups. You initially choose k random points in your data, and these will have labels from 1,2,...,k. These are what we call the centroids. Then, you determine how close the rest of the data are to each of these points. You then group those points so that whichever points are closest to any of these k points, you assign those points to belong to that particular group (1,2,...,k). After, for all of the points for each group, you update the centroids, which actually is defined as the representative point for each group. For each group, you compute the average of all of the points in each of the k groups. These become the new centroids for the next iteration. In the next iteration, you determine how close each point in your data is to each of the centroids. You keep iterating and repeating this behaviour until the centroids don't move anymore, or they move very little.
Now, let's answer your questions one-by-one.
1. Complex number representation
k-means in MATLAB doesn't define how complex data is handled. A common way for people to deal with complex numbered data is to split up the real and imaginary parts into separate dimensions as you have done. This is a perfectly valid way to use k-means for complex valued data.
See this post on the MathWorks MATLAB forum for more details: https://www.mathworks.com/matlabcentral/newsreader/view_thread/78306
2. Plot the results
You aren't constructing your matrix X properly. Note that A and C are both 150 x 2 matrices. You need to structure X such that each row is a point, and each column is a variable. Therefore, you need to concatenate your A and C row-wise. Therefore:
X = [A; A; A; C; C; A; C; C];
Note that you have duplicate points. This is actually no different than doing X = [A; C]; as far as kmeans is concerned. Perhaps you should generate X, then add the noise in rather than taking A and C, adding noise, then constructing your signal.
Now, if you want to plot the results as well as the centroids, what you need to do is use the two output version of kmeans like so:
[idx, centroids] = kmeans(X, 4);
idx will contain the cluster number that each point in X belongs to, and centroids will be a 4 x 2 matrix where each row tells you the mean of each cluster found in the data. If you want to plot the data, as well as the clusters, you simply need to do following. I'm going to loop over each cluster membership and plot the results on a figure. I'm also going to colour in where the mean of each cluster is located:
x = X(:,1);
y = X(:,2);
figure;
hold on;
colors = 'rgbk';
for num = 1 : 4
plot(x(idx == num), y(idx == num), [colors(num) '.']);
end
plot(centroids(:,1), centroids(:,2), 'c.', 'MarkerSize', 14);
grid;
The above code goes through each cluster, plots them in a different colour, then plots the centroids in cyan with a slightly larger thickness so you can see what the graph looks like.
This is what I get:
3. Understanding centroid results
This is probably because you didn't construct X properly. This is what I get for my centroids:
centroids =
-1.9176 -2.0759
1.5980 2.8071
2.7486 1.6147
0.8202 0.8025
This is pretty self-explanatory and I talked about how this is structured earlier.
4. Best representation of the signal
What you can do is repeat the clustering a number of times, then the algorithm will decide what the best clustering was out of these times. You would simply use the Replicates flag and denote how many times you want this run. Obviously, the more times you run this, the better your results may be. Therefore, do something like:
[idx, centroids] = kmeans(X, 4, 'Replicates', 5);
This will run kmeans 5 times and give you the best centroids of these 5 times.
Now, if you want to determine what the best sequence that was transmitted, you'd have to split up your X into 150 rows each (as your random sequence was 150 elements), then run a separate kmeans on each subset. You can try to find the best representation of each part of the sequence by using the Replicates flag each time.... so you can do something like:
for num = 1 : 8
%// Look at 150 points at a time
[idx, centroids] = kmeans(X((num-1)*150 + 1 : num*150, :), 4, 'Replicates', 5);
%// Do your analysis
%//...
%//...
end
idx and centroids would be the results for each portion of your transmitted signal. You probably want to look at centroids at each iteration to determine what symbol was transmitted at a particular time.
If you want to plot the decision regions, then you're probably looking for a Voronoi diagram. All you do is given a set of points that are defined within the domain of your problem, you just have to determine which cluster each point belongs to. Given that our data spans between -5 <= (x,y) <= 5, let's go through each point in the grid and determine which cluster each point belongs to. We'd then colour the appropriate point according to which cluster it belongs to.
Something like:
colors = 'rgbk';
[X,Y] = meshgrid(-5:0.05:5, -5:0.05:5);
X = X(:);
Y = Y(:);
figure;
hold on;
for idx = 1 : numel(X)
[~,ind] = min(sum(bsxfun(#minus, [X(idx) Y(idx)], centroids).^2, 2));
plot(X(idx), Y(idx), [colors(ind), '.']);
end
plot(centroids(:,1), centroids(:,2), 'c.', 'MarkerSize', 14);
The above code will plot the decision regions / Voronoi diagram of the particular configuration, as well as where the cluster centres are located. Note that the code is rather unoptimized and it'll take a while for the graph to generate, but I wanted to write something quick to illustrate my point.
Here's what the decision regions look like:
Hope this helps! Good luck!

Improving Texture Segmentation Results on Matlab

Picture after segmentation with Euclidean distance (just absolute , not absolute squared)
Original texture picture
I'm getting the result above (picture 1) when I perform clustering using Kmeans algorithm and Laws Texture Energy filters (with cluster centroids / groups =6)
What are the possible ways of improving the result ? As can be seen from the result, there is no clear demarcation of the textures.
Could dilation /erosion somehow be implemented for the same ? If yes, please guide.
Analysing the texture using k-means cause you to disregard spatial relations between neighboring pixels: If i and j are next to each other, then it is highly likely that they share the same texutre.
One way of introducing such spatial information is using pair-wise energy that can be optimized using graph cuts or belief-propagation (among other things).
Suppose you have n pixels in the image and L centroids in your k-means, then
D is an L-by-n matrix with D(i,l) is the distance of pixel i to center l.
If you choose to use graph cuts, you can download my wrapper (don't forget to compile it) and then, in Matlab:
>> sz = size( img ); % n should be numel(img)
>> [ii jj] = sparse_adj_matrix( sz, 1, 1 ); % define 4-connect neighbor grid
>> grid = sparse( ii, jj, 1, n, n );
>> gch = GraphCut('open', D, ones( L ) - eye(L), grid );
>> [gch ll] = GraphCut('expand', gch );
>> gch = GraphCut('close', gch );
>> ll = reshape( double(ll)+1, sz );
>> figure; imagesc(ll);colormap (rand(L,3) ); title('resulting clusters'); axis image;
You can find sparse_adj_matrix here.
For a recent implementation of many optimization algorithms, take a look at opengm package.
With respect morphological filtering i suggest this reference: Texture Segmentation Using Area Morphology Local Granulometries. The paper basically describes a morphological area opening filter which removes grayscale components which are smaller than a given area parameter threshold. In binary images the local granulometric size distributions can be generated by placing a window at each image pixel position and, after each opening operation, counting the number of remaining pixels within. This results in a local size distribution, that can be normalised to give the local pdf . Differentiating the pattern spectra gives the density that yields the local pattern spectrum at the pixel, providing a probability density which contains textural information local to each pixel position.
Here is an example to use the granulometries of an image. They are basically non linear scale spaces which work on the area of the grayscale components. The basic intuition is each texture can be characterized based on their spectrum of areas of their grayscale components. A simple binary area opening filter is available in Matlab.