How to generate this shape in Matlab? - matlab

In MATLAB, how can I generate two clusters of random points like those in the following graph? Can you show me the code?

If you want to generate such data points, you will need to have their probability distribution to be able to generate the points.
For your figure, I do not have the real distributions, so I can only give an approximation. From the figure I see that both clusters lie approximately on a circle, with a random radius and a limited span for the angle. I assume those angles and radii are uniformly distributed over certain ranges, which seems like a reasonable starting point.
Therefore it also makes sense to generate the random data in polar coordinates (i.e. angle and radius) instead of the cartesian ones (i.e. horizontal and vertical), and transform them to allow plotting.
C1 = [0 0]; % center of the circle
C2 = [-5 7.5];
R1 = [8 10]; % range of radii
R2 = [8 10];
A1 = [1 3]*pi/2; % [rad] range of allowed angles
A2 = [-1 1]*pi/2;
nPoints = 500;
urand = @(nPoints,limits)(limits(1) + rand(nPoints,1)*diff(limits)); % uniform samples within limits
randomCircle = @(n,r,a)(pol2cart(urand(n,a),urand(n,r))); % random angle and radius, converted to x/y
[P1x,P1y] = randomCircle(nPoints,R1,A1);
P1x = P1x + C1(1);
P1y = P1y + C1(2);
[P2x,P2y] = randomCircle(nPoints,R2,A2);
P2x = P2x + C2(1);
P2y = P2y + C2(2);
figure
plot(P1x,P1y,'or'); hold on;
plot(P2x,P2y,'sb'); hold on;
axis square
This yields:
This method works relatively well when you deal with distributions that you can transform easily and when you can easily describe the possible locations of the points. If you cannot, there are other methods, such as inverse transform sampling, which offer algorithms to generate the data instead of the manual variable transformations I did here.
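As a small illustration of inverse transform sampling in this setting, the sketch below draws radii so that points are uniform over the area of the ring rather than uniform in radius. The ring limits reuse R1 from the code above. Area-uniform sampling is an assumption made for this example, not something read off the original figure.

```matlab
% Inverse transform sampling of a radius that is uniform over the ring's AREA.
% CDF: F(r) = (r^2 - r1^2) / (r2^2 - r1^2), inverted analytically below.
r1 = 8;  r2 = 10;            % ring limits (same as R1 above)
nPoints = 500;
u = rand(nPoints, 1);        % uniform samples on (0,1)
r = sqrt(r1^2 + u * (r2^2 - r1^2));   % inverse CDF applied to u

% Combine with uniformly distributed angles and convert to Cartesian coordinates.
a = pi/2 + rand(nPoints, 1) * pi;     % angles in [pi/2, 3*pi/2], as in A1
[px, py] = pol2cart(a, r);
```

For a thin ring like this one the difference from uniform radii is small, but the same recipe applies to any distribution whose CDF can be inverted.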

K-means is not going to give you what you want.
For K-means, vectors are classified based on their nearest cluster center. I can only think of two ways you could get the non-convex assignment shown in the picture:
Your input data is actually higher-dimensional, and your sample image is just a 2-d projection.
You're using a distance metric with different scaling across the dimensions.
To achieve your aim:
Use a non-linear clustering algorithm.
Apply a non-linear transform to your input data. (Probably not feasible).
You can find a list of non-linear clustering algorithms here. Specifically, look at this reference on the MST clustering page. Your exact shape appears on the fourth page of the PDF together with a comparison of what happens with K-Means.
For existing MATLAB code, you could try this Kernel K-Means implementation. Also, check out the Clustering Toolbox.

I am assuming that you really want to do the clustering operation on existing data, as opposed to generating the data itself. Since you have a plot of some data, it seems logical that you already know how to generate it! If I am wrong in this assumption, then you should word your questions more carefully in the future.
The human brain is quite good at seeing patterns in things like this, whereas writing code to do the same on a computer often takes serious effort.
As has been said already, traditional clustering tools such as k-means will fail. Luckily, the image processing toolbox has good tools for these purposes already written. I might suggest converting the plot into an image, using filled in dots to plot the points. Make sure the dots are large enough that they touch each other within a cluster, with some overlap. Then use dilation/erosion tools if necessary to make sure that any small cracks are filled in, but don't go so far as to cause the clusters to merge. Finally, use region segmentation tools to pick out the clusters. Once done, transform back from pixel units in the image into your spatial units, and you have accomplished your task.
For the image processing approach to work, you will need sufficient separation between the clusters compared to the coarseness within a cluster. But that seems obvious for any method to succeed.
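The steps described above can be sketched roughly as follows. The input points, the pixel size and the structuring-element radii are all assumptions chosen for illustration. The functions used (imdilate, imclose, strel, bwlabel) come from the Image Processing Toolbox.

```matlab
% Hypothetical input: two well-separated point clouds (stand-in for real data).
rng(0);
P = [rand(300,2); rand(300,2) + 3];      % N-by-2 array of [x y] coordinates

% Rasterise the points onto a pixel grid.
res  = 0.05;                              % pixel size in data units (assumed)
cols = round((P(:,1) - min(P(:,1))) / res) + 1;
rows = round((P(:,2) - min(P(:,2))) / res) + 1;
BW   = false(max(rows), max(cols));
BW(sub2ind(size(BW), rows, cols)) = true;

% Grow each dot so neighbours within a cluster touch, then fill small cracks.
BW = imdilate(BW, strel('disk', 3));
BW = imclose(BW, strel('disk', 2));

% Label connected regions and read the label back at each point's pixel.
L = bwlabel(BW);
clusterIdx = L(sub2ind(size(L), rows, cols));
```

The dilation radius is the tuning knob here: too small and one cluster breaks into pieces, too large and neighbouring clusters merge.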

Related

Matlab watershed algorithm - control separation width

I am interested in separating features on an image using the watershed algorithm. Using the matlab tutorial, I tried to write a small proof of principle algorithm that I can further use in my image analysis.
Im = imread('../../Pictures/testrec.png');
bw = rgb2gray(Im);
figure
imshow(bw,'InitialMagnification','fit'), title('bw')
%Compute the distance transform of the complement of the binary image.
D = bwdist(~bw);
figure
imshow(D,[],'InitialMagnification','fit')
title('Distance transform of ~bw')
%Complement the distance transform, and force pixels that don't belong to the objects to be at Inf .
D = -D;
D(~bw) = Inf;
%Compute the watershed transform and display the resulting label matrix as an RGB image.
L = watershed(D);
L(~bw) = 0;
rgb = label2rgb(L,'jet',[.5 .5 .5]);
figure
imshow(rgb,'InitialMagnification','fit')
title('Watershed transform of D')
It appears that the feature separation is somewhat random, as can be seen from the elongated feature in the middle. However, there do not seem to be any parameters for the watershed algorithm that could be used to optimize its performance. Can you suggest how such a parameter could be introduced, or a better algorithm to process the data?
Bonus question: I am interested in first separating my image using bwconncomp, then selectively applying the watershed algorithm to only some of the regions. Assume I know which of the cc.PixelIdxList regions I want to apply the algorithm to. How do I get a new PixelIdxList with separated components?
Watershed transformation cannot separate convex shapes.
There is no way to change that. A convex shape always results in one object.
Blobs very close to convex will always result in poor watershed results.
The only reason why you have that "somewhat random" result instead of a single basin is that a few pixels are a bit off the perimeter.
Results of watershed are improved by pre- and post-processing. But that would be very specific to a certain problem.
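One commonly used pre-processing step, shown here only as a sketch, is to suppress shallow minima in the distance transform with imhmin before calling watershed. The depth threshold h then acts as a tunable parameter. The test image (two overlapping discs) and the value h = 2 are assumptions made for this example.

```matlab
% Synthetic binary image: two overlapping discs (stand-in for a real image).
[xx, yy] = meshgrid(1:200, 1:200);
bw = hypot(xx - 80, yy - 100) < 40 | hypot(xx - 130, yy - 100) < 40;

% Negated distance transform: object centres become catchment basins.
D = -bwdist(~bw);

% Suppress minima shallower than h, so small boundary irregularities
% no longer create their own basins.
h = 2;                       % depth threshold (assumed; tune per problem)
D = imhmin(D, h);

% Watershed on the smoothed surface, restricted to the foreground.
L = watershed(D);
L(~bw) = 0;
```

Larger values of h merge more basins, so sweeping h gives some control over how readily touching objects are split.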

How to compute distance and estimate quality of heterogeneous grids in Matlab?

I want to evaluate the grid quality where all coordinates differ in the real case.
The signal is an ECG signal, where the average lifetime is 75 years.
My task is to evaluate its age at the moment of measurement, which is an inverse problem.
I think a 2D approximation of the 3D case is hard (done here by Abo-Zahhad) with 3 leads (two on the chest and one on the left leg, MIT-BIH arrhythmia database):
where f is a piecewise continuous function in R^2, \epsilon is the error matrix and A is a 2D matrix.
Now, I evaluate the average grid distance in x-axis (time) and average grid distance in y-axis (energy).
I think this can be done by Matlab's Image Analysis toolbox.
However, I am not sure how complete the toolbox's approaches are.
I think a transform approach must be used in the setting of uneven and noncontinuous grids. One approach is "Exact linear time Euclidean distance transforms of grid line sampled shapes" by Joakim Lindblad et al.
The method presents a distance transform (DT) which assigns to each image point its smallest distance to a selected subset of image points.
This kind of approach is often a basis of algorithms for many methods in image analysis.
I tested unsuccessfully the case with bwdist (Distance transform of binary image) with chessboard (returns empty square matrix), cityblock, euclidean and quasi-euclidean where the last three options return full matrix.
Another pseudocode
% https://stackoverflow.com/a/29956008/54964
%// retrieve picture
imgRGB = imread('dummy.png');
%// detect lines
imgHSV = rgb2hsv(imgRGB);
BW = (imgHSV(:,:,3) < 1);
BW = imclose(imclose(BW, strel('line',40,0)), strel('line',10,90));
%// clear those masked pixels by setting them to background white color
imgRGB2 = imgRGB;
imgRGB2(repmat(BW,[1 1 3])) = 255;
%// show extracted signal
imshow(imgRGB2)
where I think the approach will not work here because the grids are not necessarily continuous and not necessarily ideal.
pdist based on the Lumbreras' answer
In the real examples, all coordinates differ, so the pdist options hamming and jaccard are always 1 with real data.
The options euclidean, cityblock, minkowski, chebychev, mahalanobis, cosine, correlation, and spearman offer some descriptions of the data.
However, these options make little sense to me for such full matrices.
I want to estimate how long the signal can live.
Sources
J. Müller, and S. Siltanen. Linear and nonlinear inverse problems with practical applications.
EIT with the D-bar method: discontinuous heart-and-lungs phantom. http://wiki.helsinki.fi/display/mathstatHenkilokunta/EIT+with+the+D-bar+method%3A+discontinuous+heart-and-lungs+phantom Visited 29-Feb 2016.
There is a function in Matlab called pdist which computes the pairwise distance between all rows of a matrix and lets you choose the type of distance you want to use (Euclidean, cityblock, correlation). Are you after something like this? Not sure I understood your question!
cheers!
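A minimal sketch of the pdist call mentioned above, using made-up grid coordinates. pdist and squareform are part of the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical grid-point coordinates, one [x y] pair per row.
rng(0);
X = rand(5, 2);

% Condensed vector of pairwise distances (one entry per unordered pair).
dEuclid = pdist(X, 'euclidean');
dCity   = pdist(X, 'cityblock');

% Expand to a full symmetric distance matrix for easier inspection.
Dfull = squareform(dEuclid);
meanSpacing = mean(dEuclid);   % one simple summary of grid spacing
```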
Simply, do not do it in post-processing. Those artifacts can come from the body, from raster images, from the viewer and/or ... Do quality assurance in the signal generation/processing step.
It is much easier to evaluate the original signal than its views.

Using matlab to obtain the vector fields and the angles made by the vector field on a closed curve?

Here is the given system I want to plot and obtain the vector field and the angles they make with the x axis. I want to find the index of a closed curve.
I know how to do this theoretically by choosing convenient points and see how the vector looks like at that point. Also I can always use
to compute the angles. However I am having trouble trying to code it. Please don't mark me down if the question is unclear. I am asking it the way I understand it. I am new to matlab. Can someone point me in the right direction please?
This is a pretty hard challenge for someone new to matlab, I would recommend taking on some smaller challenges first to get you used to matlab's conventions.
That said, Matlab is all about numerical solutions so, unless you want to go down the symbolic maths route (and in that case I would probably opt for Mathematica instead), your first task is to decide on the limits and granularity of your simulated space, then define them so you can apply your system of equations to it.
There are lots of ways of doing this - some more efficient - but for ease of understanding I propose this:
Define the axes individually first
xpts = -10:0.1:10;
ypts = -10:0.1:10;
tpts = 0:0.01:10;
The a:b:c syntax gives you the lower limit (a), the upper limit (c) and the spacing (b), so you'll get 201 points for the x. You could use the linspace notation if that suits you better, look it up by typing doc linspace into the matlab console.
Now you can create a grid of your coordinate points. You actually end up with three 3d matrices, one holding the x-coords of your space and the others holding the y and t. They look redundant, but it's worth it because you can use matrix operations on them.
[XX, YY, TT] = meshgrid(xpts, ypts, tpts);
From here on you can perform whatever operations you like on those matrices. So to compute x^2.y you could do
x2y = XX.^2 .* YY;
remembering that you'll get a 3d matrix out of it and all the slices in the third dimension (corresponding to t) will be the same.
Some notes
Matlab has a good builtin help system. You can type 'help functionname' to get a quick reminder in the console or 'doc functionname' to open the help browser for details and examples. They really are very good, they'll help enormously.
I used XX and YY because that's just my preference, but I avoid single-letter variable names as a general rule. You don't have to.
Matrix multiplication is the default so if you try to do XX*YY you won't get the answer you expect! To do element-wise multiplication use the .* operator instead. This will do a11 = b11*c11, a12 = b12*c12, ...
To raise each element of the matrix to a given power use .^ rather than ^ for similar reasons. Likewise for division (./ rather than /).
You have to make sure your matrices are the correct size for your operations. To do elementwise operations on matrices they have to be the same size. To do matrix operations they have to follow the matrix rules on sizing, as will the output. You will find the size() function handy for debugging.
Plotting vector fields can be done with quiver. To plot the components separately you have more options: surf, contour and others. Look up the help docs and they will link to similar types. The plot family are mainly about lines so they aren't much help for fields without creative use of the markers, colours and alpha.
To plot the curve, or any other contour, you don't have to test the values of a matrix - it won't work well anyway because of the granularity - you can use the contour plot with specific contour values.
Solving systems of dynamic equations is completely possible, but you will be doing a numeric simulation and your results will again be subject to the granularity of your grid. If you have closed form solutions, like your phi expression, they may be easier to work with conceptually but harder to get working in matlab.
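To make the notes on element-wise versus matrix operators concrete, here is a tiny example with two made-up 2-by-2 matrices:

```matlab
A = [1 2; 3 4];
B = [5 6; 7 8];

matrixProduct   = A * B;    % matrix product:       [19 22; 43 50]
elementProduct  = A .* B;   % element-wise product: [5 12; 21 32]
elementPower    = A .^ 2;   % element-wise square:  [1 4; 9 16]
elementQuotient = B ./ A;   % element-wise divide:  [5 3; 7/3 2]

sz = size(elementProduct);  % size() is handy when debugging dimension errors
```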
This kind of problem is tractable in matlab but it involves some non-basic uses which are pretty hard to follow until you've got your head round Matlab's syntax. I would advise to start with a 2d grid instead
[XX, YY] = meshgrid(xpts, ypts);
and compute some functions of that like x^2.y or x^2 - y^2. Get used to plotting them using quiver or plotting the coordinates separately in intensity maps or surfaces.
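A sketch of the 2D starting point, using the two example functions above as the components of a hypothetical vector field (this is not the system from the question). It plots the field with quiver, then follows the field angle around the unit circle and counts the net number of turns, which is the index of that curve.

```matlab
% Hypothetical field: F(x,y) = (x^2*y, x^2 - y^2), chosen only for illustration.
xpts = -2:0.25:2;
ypts = -2:0.25:2;
[XX, YY] = meshgrid(xpts, ypts);
U = XX.^2 .* YY;            % x-component of the field
V = XX.^2 - YY.^2;          % y-component of the field

figure;
quiver(XX, YY, U, V);
axis equal;

% Field angle along a closed curve (unit circle, traversed once counter-clockwise).
t  = linspace(0, 2*pi, 1000);
xc = cos(t);
yc = sin(t);
Uc = xc.^2 .* yc;
Vc = xc.^2 - yc.^2;
theta = unwrap(atan2(Vc, Uc));                  % continuous angle with the x axis
curveIndex = (theta(end) - theta(1)) / (2*pi);  % net number of turns of the field
```

atan2 is used instead of atan so that the angle lands in the correct quadrant, and unwrap removes the artificial 2*pi jumps before the total change is measured. The curve must not pass through a zero of the field, and it needs enough sample points that consecutive angles differ by less than pi.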

Clustering an image using Gaussian mixture models

I want to use GMM (Gaussian mixture models) for clustering a binary image, and I also want to plot the cluster centroids on the binary image itself.
I am using this as my reference:
http://in.mathworks.com/help/stats/gaussian-mixture-models.html
This is my initial code
I=im2double(imread('sil10001.pbm'));
K = I(:);
mu=mean(K);
sigma=std(K);
P=normpdf(K, mu, sigma);
Z = norminv(P,mu,sigma);
X = mvnrnd(mu,sigma,1110);
X=reshape(X,111,10);
scatter(X(:,1),X(:,2),10,'ko');
options = statset('Display','final');
gm = fitgmdist(X,2,'Options',options);
idx = cluster(gm,X);
cluster1 = (idx == 1);
cluster2 = (idx == 2);
scatter(X(cluster1,1),X(cluster1,2),10,'r+');
hold on
scatter(X(cluster2,1),X(cluster2,2),10,'bo');
hold off
legend('Cluster 1','Cluster 2','Location','NW')
P = posterior(gm,X);
scatter(X(cluster1,1),X(cluster1,2),10,P(cluster1,1),'+')
hold on
scatter(X(cluster2,1),X(cluster2,2),10,P(cluster2,1),'o')
hold off
legend('Cluster 1','Cluster 2','Location','NW')
clrmap = jet(80); colormap(clrmap(9:72,:))
ylabel(colorbar,'Component 1 Posterior Probability')
But the problem is that I am unable to plot the cluster centroids obtained from the GMM on the original binary image. How do I do this?
Now suppose I have 10 such images in a sequence and I want to store the information about their mean positions in two cell arrays. How do I do that? This is my code for the new question:
images=load('gait2go.mat');%load the matrix file
for i=1:10
I{i}=images.result{i};
I{i}=im2double(I{i});
%determine 'white' pixels, size of image can be [M N], [M N 3] or [M N 4]
Idims=size(I{i});
whites=true(Idims(1),Idims(2));
df=I{i};
%we add up the various color channels
for colori=1:size(df,3)
whites=whites & df(:,:,colori)>0.5;
end
%choose indices of 'white' pixels as coordinates of data
[datax datay]=find(whites);
%cluster data into 10 clumps
K = 10; % number of mixtures/clusters
cInd = kmeans([datax datay], K, 'EmptyAction','singleton',...
'maxiter',1000,'start','cluster');
%get clusterwise means
meanx=zeros(K,1);
meany=zeros(K,1);
for i=1:K
meanx(i)=mean(datax(cInd==i));
meany(i)=mean(datay(cInd==i));
end
xc{i}=meanx(i);%cell array containing the position of the mean for the 10 images
xb{i}=meany(i);
figure;
gscatter(datay,-datax,cInd); %funky coordinates for plotting according to image
axis equal;
hold on;
scatter(meany,-meanx,20,'+'); %same funky coordinates
end
I am able to get the 10 images segmented, but not the values of the means stored in the cell arrays xc and xb. They only store [] in place of the mean values.
I decided to post an answer to your question (where your question was determined by a maximum-likelihood guess:P), but I wrote an extensive introduction. Please read carefully, as I think you have difficulties understanding the methods you want to use, and you have difficulties understanding why others can't help you with your usual approach of asking questions. There are several problems with your question, both code-related and conceptual. Let's start with the latter.
The problem with the problem
You say that you want to cluster your image with Gaussian mixture modelling. While I'm generally not familiar with clustering, after a look through your reference and the wonderful SO answer you cited elsewhere (and a quick 101 from @rayryeng) I think you are on the wrong track altogether.
Gaussian mixture modelling, as its name suggests, models your data set with a mixture of Gaussian (i.e. normal) distributions. The reason for the popularity of this method is that when you do measurements of all sorts of quantities, in many cases you will find that your data is mostly distributed like a normal distribution (which is actually the reason why it's called normal). The reason behind this is the central limit theorem, which implies that the sum of reasonably independent random variables tends to be normal in many cases.
Now, clustering, on the other hand, simply means separating your data set into disjoint smaller bunches based on some criteria. The main criterion is usually (some kind of) distance, so you want to find "close lumps of data" in your larger data set. You usually need to cluster your data before performing a GMM, because it's already hard enough to find the Gaussians underlying your data without having to guess the clusters too. I'm not familiar enough with the procedures involved to tell how well GMM algorithms can work if you just let them work on your raw data (but I expect that many implementations start with a clustering step anyway).
To get closer to your question: I guess you want to do some kind of image recognition. Looking at the picture, you want to get more strongly correlated lumps. This is clustering. If you look at a picture of a zoo, you'll see, say, an elephant and a snake. Both have their distinct shapes, and they are well separated from one another. If you cluster your image (and the snake is not riding the elephant, neither did it eat it), you'll find two lumps: one lump elephant-shaped, and one lump snake-shaped. Now, it wouldn't make sense to use GMM on these data sets: elephants, and especially snakes, are not shaped like multivariate Gaussian distributions. But you don't need this in the first place, if you just want to know where the distinct animals are located in your picture.
Still staying with the example, you should make sure that you cluster your data into an appropriate number of subsets. If you try to cluster your zoo picture into 3 clusters, you might get a second, spurious snake: the nose of the elephant. With an increasing number of clusters your partitioning might make less and less sense.
Your approach
Your code doesn't give you anything reasonable, and there's a very good reason for that: it doesn't make sense from the start. Look at the beginning:
I=im2double(imread('sil10001.pbm'));
K = I(:);
mu=mean(K);
sigma=std(K);
X = mvnrnd(mu,sigma,1110);
X=reshape(X,111,10);
You read your binary image, convert it to double, then stretch it out into a vector and compute the mean and deviation of that vector. You basically smear your entire image into 2 values: an average intensity and a deviation. And THEN you generate 111*10 normally distributed points with these parameters, and try to do GMM on the first two sets of 111, which are both independently normal with the same parameters. So you probably get two overlapping Gaussians around the same mean with the same deviation.
I think the examples you found online confused you. When you do GMM, you already have your data, so no pseudo-normal numbers should be involved. But when people post examples, they also try to provide reproducible inputs (well, some of them do, nudge nudge wink wink). A simple method for this is to generate a union of simple Gaussians, which can then be fed into GMM.
So, my point is, that you don't have to generate random numbers, but have to use the image data itself as input to your procedure. And you probably just want to cluster your image, instead of actually using GMM to draw potatoes over your cluster, since you want to cluster body parts in an image about a human. Most body parts are not shaped like multivariate Gaussians (with a few distinct exceptions for men and women).
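If a GMM fit is nevertheless wanted, the point about using the image data itself can be sketched like this: take the coordinates of the white pixels as the data, fit the mixture to them, and draw the component means on top of the image. The synthetic two-blob image and the choice of 2 components are assumptions for the example. fitgmdist requires the Statistics and Machine Learning Toolbox, and imshow requires the Image Processing Toolbox.

```matlab
rng(1);   % fitgmdist uses a random initialisation

% Synthetic binary image with two blobs (stand-in for the real silhouette).
[xx, yy] = meshgrid(1:120, 1:100);
BW = hypot(xx - 35, yy - 50) < 15 | hypot(xx - 85, yy - 50) < 20;

% The data are the pixel coordinates of the white pixels, not random numbers.
[rows, cols] = find(BW);
gm = fitgmdist([cols rows], 2);     % columns: x = column index, y = row index

% Overlay the component means on the image (imshow uses x = column, y = row).
figure;
imshow(BW);
hold on;
plot(gm.mu(:,1), gm.mu(:,2), 'r+', 'MarkerSize', 12, 'LineWidth', 2);
hold off;
```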
What I think you should do
If you really want to cluster your image, like in the figure you added to your question, then you should use a method like k-means. But then again, you already have a program that does that, don't you? So I don't really think I can answer the question saying "How can I cluster my image with GMM?". Instead, here's an answer to "How can I cluster my image?" with k-means, but at least there will be a piece of code here.
%set infile to what your image file will be
infile='sil10001.pbm';
%read file
I=im2double(imread(infile));
%determine 'white' pixels, size of image can be [M N], [M N 3] or [M N 4]
Idims=size(I);
whites=true(Idims(1),Idims(2));
%we add up the various color channels
for colori=1:size(I,3) %size(I,3) is 1 for an [M N] image, where Idims(3) would not exist
whites=whites & I(:,:,colori)>0.5;
end
%choose indices of 'white' pixels as coordinates of data
[datax datay]=find(whites);
%cluster data into 10 clumps
K = 10; % number of mixtures/clusters
cInd = kmeans([datax datay], K, 'EmptyAction','singleton',...
'maxiter',1000,'start','cluster');
%get clusterwise means
meanx=zeros(K,1);
meany=zeros(K,1);
for i=1:K
meanx(i)=mean(datax(cInd==i));
meany(i)=mean(datay(cInd==i));
end
figure;
gscatter(datay,-datax,cInd); %funky coordinates for plotting according to image
axis equal;
hold on;
scatter(meany,-meanx,20,'ko'); %same funky coordinates
Here's what this does. It first reads your image as double, like your code did. Then it tries to determine "white" pixels by checking that each color channel (of which there can be 1, 3 or 4) is brighter than 0.5. Then your input data points to the clustering will be the x and y "coordinates" (i.e. indices) of your white pixels.
Next it does the clustering via kmeans. This part of the code is loosely based on the already cited answer of Amro. I had to set a large maximal number of iterations, as the problem is ill-posed in the sense that there aren't 10 clear clusters in the picture. Then we compute the mean for each cluster, and plot the clusters with gscatter, and the means with scatter. Note that in order to have the picture facing in the right directions in a scatter plot you have to shift around the input coordinates. Alternatively you could define datax and datay correspondingly at the beginning.
And here's my output, run with the already processed figure you provided in your question:
I do believe you must have made a simple mistake in the plot, and that's why you see just a straight line: you are plotting only the x values.
In my opinion, the second argument in the scatter command should be X(cluster1,2) or X(cluster2,2), depending on which scatter command is being used in the code.
The code can be made simpler:
%read file
I=im2double(imread('sil10340.pbm'));
%choose indices of 'white' pixels as coordinates of data
[datax datay]=find(I);
%cluster data into 10 clumps
K = 10; % number of mixtures/clusters
[cInd, c] = kmeans([datax datay], K, 'EmptyAction','singleton',...
'maxiter',1000,'start','cluster');
figure;
gscatter(datay,-datax,cInd); %funky coordinates for plotting according to image
axis equal;
hold on;
scatter(c(:,2),-c(:,1),20,'ko'); %same funky coordinates
I don't think there is any need for the loop, as c itself is returned as a 10x2 double array which contains the positions of the means.
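For the follow-up about storing the means of several images in cell arrays: in the posted loop the inner for i=1:K reuses the outer loop variable i, which is the likely reason that most cells stay empty. The sketch below uses a separate index for the images and takes the centroids straight from the second output of kmeans. The random test images, the number of images and K = 2 are assumptions for the example.

```matlab
rng(0);
nImages = 3;                 % assumed number of images
K = 2;                       % assumed number of clusters

% Hypothetical stand-in for the loaded image sequence.
imgs = cell(1, nImages);
for imgIdx = 1:nImages
    imgs{imgIdx} = rand(40) > 0.7;
end

xc = cell(1, nImages);       % row coordinates of the K means, per image
xb = cell(1, nImages);       % column coordinates of the K means, per image
for imgIdx = 1:nImages
    [datax, datay] = find(imgs{imgIdx});
    [cInd, c] = kmeans([datax datay], K);   % c is K-by-2: one centroid per row
    xc{imgIdx} = c(:, 1);
    xb{imgIdx} = c(:, 2);
end
```

Each cell then holds a K-by-1 vector, so xc{3}(2) is the row coordinate of the second cluster mean in the third image.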

Convert grayscale image to point cloud (similar to dither)

I'm currently trying to implement a method to generate TSP art, and for that I need a list of points (x,y), the local density of which is proportional to the gray scale pixel value of a given image.
My first thought was: well that works pretty much like Inverse Transform Sampling for statistics (you want to draw a sample that matches a given probability density function but you can only create a sample that is uniformly distributed).
I implemented this and it works fairly well, as evident by executing this code:
%% Load image, adjust it for our needs
im=imread('http://goo.gl/DDwV3t'); %load random headshot from google
im=imadjust(im,stretchlim(im,[.01,.65]),[]);
im=im2double(rgb2gray(im));
im=im(10:end-5,50:end-5);
figure;imshow(im);title('original');
im=1-im; %we want black dots on white background
im=flipud(im); %and we want it the right way up
%% process per row
imrow = cumsum(im,2);
imrow=imrow*size(imrow,1)./repmat(max(imrow,[],2),1,size(imrow,2));
y=1:size(imrow,2);
ximrow_i = zeros(size(imrow));
for i = 1:size(imrow,1)
mask =logical([diff(imrow(i,:))>=0.01,0]); %needed for interp
ximrow_i(i,:) = interp1(imrow(i,mask),y(mask),y);
end
y=1:size(ximrow_i,1);
y=repmat(y',1,size(ximrow_i,2));
y1=y(1:5:end,1:5:end); %downscale a bit
ximcol_i1=ximrow_i(1:5:end,1:5:end); %downscale a bit
figure('Color','w');plot(ximcol_i1(:),y1(:),'k.');title('Inverse Transform Sampling on rows');
axis equal;axis off;
%% process per column
imcol=cumsum(im,1);
imcol=imcol*size(imcol,2)./repmat(max(imcol,[],1),size(imcol,1),1);
y=1:size(imcol,1);
yimcol_i=zeros(size(imcol));
for i = 1:size(imcol,2)
mask =logical([diff(imcol(:,i))>=0.01;0]);
yimcol_i(:,i) = interp1(imcol(mask,i),y(mask),y);
end
y=1:size(imcol,2);
y=repmat(y,size(imcol,1),1);
y1=y(1:5:end,1:5:end);
yimcol_i1=yimcol_i(1:5:end,1:5:end);
figure('Color','w');plot(y1(:),yimcol_i1(:),'k.');title('Inverse Transform Sampling on cols');
axis equal;axis off;
It has the shortcoming that I can only use this per row or per column, but not both. The Inverse Transform Sampling method does not work for multivariate PDFs in general, and I'm fairly sure I won't be able to get it to work in this case.
Is there a simple method to achieve my goal that I haven't seen yet?
I am aware that an algorithm called Voronoi Stippler has been used to create the desired result and I will investigate that, but for the moment I liked the simplicity of Inverse Transform Sampling and would like to know if I can extend that method to match my needs.
It turns out this is fairly simple and can be done by Rejection Sampling.
For the special case where the instrumental distribution is U(0,1) it works like this (if I understood it correctly):
im=imread('http://goo.gl/DDwV3t'); %load random headshot from google
im=imadjust(im,stretchlim(im,[.01,.65]),[]);
im=im2double(rgb2gray(im));
im=im(10:end-5,50:end-5);
im=1-flipud(im);
d = im > .9*rand(size(im));
d=d&(rand(size(d))>.95); %randomly sieve out some more points
[i,j]=ind2sub(size(d),find(d));
figure('Color','w');plot(j,i,'k.');title('Rejection Sampling');
axis equal;axis off;
The sampling is done in one line:
d = im > .9*rand(size(im));
Since I ended up with too many points, I randomly subsampled the result, reducing the number of points by approximately a factor of 20.
This is pretty much the result I originally desired.
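A possible variation, sketched under my own assumptions: instead of sieving in a second step, the acceptance probability can be scaled so that the expected number of points is set directly. The synthetic density and the target count nTarget are made up for illustration. With a real image, dens would be the darkness map im from the code above.

```matlab
rng(0);

% Synthetic density map with values in [0,1] (stand-in for the darkness image).
[xx, yy] = meshgrid(linspace(-1, 1, 300));
dens = exp(-(xx.^2 + yy.^2) / 0.2);

% Scale the per-pixel acceptance probability so the expected count is nTarget.
nTarget = 2000;                                    % desired number of points (assumed)
pAccept = min(1, dens * nTarget / sum(dens(:)));   % clipped to a valid probability

% One uniform draw per pixel; keep the pixel if the draw falls below pAccept.
keep = rand(size(dens)) < pAccept;
[rowIdx, colIdx] = find(keep);

figure('Color', 'w');
plot(colIdx, rowIdx, 'k.');
axis equal; axis off;
```

The point count is only met on average. If very dark pixels hit the clip at 1, the result undershoots the target and local density stops being proportional to darkness there.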