How to process individual connected components from a 3D volume in MATLAB?

I have a 3D dataset containing multiple connected components. Using MATLAB, I would like to compute a certain metric for each of these components (the metric is not included in the regionprops function). My question is: what is the best way to do it?
The metric I would like to compute is the surface area. I know how to do this for one connected component, but I'm looking for an efficient way to do it for all components that meet a certain volume criteria.
What I have so far:
cc = bwconncomp(data, 26);        % find connected components (26-connectivity)
L = labelmatrix(cc);              % label matrix: each voxel gets its component index
stats = regionprops(cc, 'Area');  % per-component voxel counts
for i = 1:cc.NumObjects
    if stats(i).Area > threshold
        a = (L == i);
        surfaceArea(i,1) = compute_surface_area(a);
    end
end
I'm sure there is a better way to do this!
Thanks in advance, N

You may want to use arrayfun to compute the surface area for each connected component whose area is above the threshold.
idx = find([stats.Area]>threshold);
arrayfun(@(ii) compute_surface_area(L == ii), idx, 'UniformOutput', false)
Here, the for loop is written in a single line of code.
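If you want to keep the results paired with their component labels, a minimal sketch along these lines should work (the assumption that compute_surface_area returns a scalar is mine, not the question's):
idx = find([stats.Area] > threshold);   % labels of components above the volume criterion
sa = arrayfun(@(ii) compute_surface_area(L == ii), idx, 'UniformOutput', false);
surfaceArea = [sa{:}];                  % concatenate the per-component results
% idx(j) is the component label whose surface area is surfaceArea(j)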

Related

Encode each training image as a histogram of the number of times each vocabulary element shows up for Bag of Visual Words

I want to implement bag of visual words in MATLAB. I used SURF features to extract features from the images and k-means to cluster those features into k clusters. I now have k centroids, and I want to know how many times each cluster is used by assigning each image feature to its closest neighbor. Finally, I'd like to create a histogram of this for each image.
I tried to use knnsearch function but it doesn't work in this case.
Here is my MATLAB code:
clc;
clear;
close all;
folder = 'CarData/TrainImages/cars';
filePattern = fullfile(folder, '*.pgm');
f=dir(filePattern);
files={f.name};
for k=1:numel(files)
fullFileName = fullfile(folder, files{k});
H = fspecial('log');
image=imfilter(imread(fullFileName),H);
temp = detectSURFFeatures(image);
[im_features, temp] = extractFeatures(image, temp);
features{k}= im_features;
end
features = vertcat(features{:});
image_feats = [];
[assignments,centers] = kmeans(double(features),500);
vocab = centers';
I now have all the image features in the features array and the cluster centres in the centers array.
You're almost there. You don't even need to use knnsearch at all. The assignments variable tells you which input feature mapped to which cluster. assignments is a N x 1 vector, where N is the total number of examples you have, i.e. the total number of rows in the input matrix features. Each value assignments(i) tells you which cluster example i (row i of features) maps to, and the corresponding cluster centroid is centers(assignments(i), :).
Given how you've called kmeans, each element of assignments is a value from 1 to 500, with 500 being the total number of clusters desired.
Let's do the simple case where we only have one image in your codebook. If this is the case, all you have to do is create a histogram of the assignments variable. The output histogram h will be a 500 x 1 vector with each element h(i) being the number of times an example used centroid i as its representation in your codebook.
Just use the histcounts function and specify the bin edges so that they coincide with the cluster IDs. Bins are exclusive on the right edge, so add one extra edge at the end to account for the final cluster.
Something like this will work:
h = histcounts(assignments, 1 : 501);
If you want something simpler and you don't want to worry about specifying the end bin, you can use accumarray to achieve the same result:
h = accumarray(assignments, 1);
With accumarray we assign key-value pairs, where the key is the centroid that the example mapped to and the value is simply 1 for every key. accumarray bins all values in assignments that share the same key and applies a function to them; the default behaviour is to sum all values, which effectively computes the histogram.
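As a quick sanity check on toy data (the values here are made up), both approaches agree. Note that accumarray's output only has max(assignments) entries, so if the highest-numbered clusters are never used you would need to pad it to length 500:
assignments = [1; 3; 2; 3; 3; 1];   % toy assignments across 3 clusters
h1 = histcounts(assignments, 1:4)   % -> [2 1 3]
h2 = accumarray(assignments, 1).'   % -> [2 1 3]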
However, you want to do this for multiple images, not just a single image.
For Bag of Visual Words problems, we will certainly have more than one training image in our database, so you want a histogram of the features for each image. We can still use the concept above, but I suggest maintaining a separate variable that records how many features were detected per image; you can then slice into the assignments variable to extract the centroid IDs belonging to each image and build a histogram of those individually. We can collect the results in a 2D matrix where each row is the histogram of one image.
Remember that kmeans assigns each row (example) to a cluster independently of the other examples in your data. So you can run kmeans once on the entire training set, then be smart about how you index into assignments to pull out the assigned clusters for each input image.
Therefore, modify your code so that it looks something like this:
clc;
clear;
close all;
folder = 'CarData/TrainImages/cars';
filePattern = fullfile(folder, '*.pgm');
f=dir(filePattern);
files={f.name};
num_features = zeros(numel(files), 1); % New - for keeping track of # of features per image
for k=1:numel(files)
fullFileName = fullfile(folder, files{k});
H = fspecial('log');
image=imfilter(imread(fullFileName),H);
temp = detectSURFFeatures(image);
[im_features, temp] = extractFeatures(image, temp);
num_features(k) = size(im_features, 1); % New - # of features per image
features{k}= im_features;
end
features = vertcat(features{:});
num_clusters = 500; % Added to make the code adaptive
[assignments,centers] = kmeans(double(features), num_clusters);
counter = 1; % Keeps track of where we need to slice in assignments
% Go through each image and find their histograms
features_hist = zeros(numel(files), num_clusters); % Records the per image histograms
for k = 1 : numel(files)
a = assignments(counter : counter + num_features(k) - 1); % Get the assignments
h = histcounts(a, 1 : num_clusters + 1);
% Or:
% h = accumarray(a, 1).'; % Transpose to make it a row
% Place in final output
features_hist(k, :) = h;
% Increment counter
counter = counter + num_features(k);
end
features_hist will now be a N x 500 matrix where each row is the histogram of one image. The final job is to train a supervised machine learning algorithm (SVM, neural network, etc.) where the inputs are the per-image histograms and the expected labels are the classes you have assigned to each image. The result is a learned model, so that when you have a new image, you calculate its SURF features, represent them as a histogram of features as we did above, then feed the histogram into the classification model to get the expected class or label the image represents.
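As a rough sketch of that final step, assuming you have a labels vector with one class label per image and the Statistics and Machine Learning Toolbox (the choice of fitcecoc here is my assumption, not part of the original code):
mdl = fitcecoc(features_hist, labels);   % multiclass SVM (one-vs-one by default)
% For a new image: compute SURF features, build its 1 x 500 histogram h_new
% exactly as above, then:
% predictedLabel = predict(mdl, h_new);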
P.S. Deep learning / CNNs do a much better job at this, but require much more time to train. If you are after raw accuracy, don't use Bag of Visual Words; it is, however, very quick to implement and is known to perform moderately well, though that of course depends on the kinds of images you want to classify.

K-Means centroids getting marginalized to having no data points [Matlab]

So I have a sort of strange problem. I have a dataset with 240 points and I'm trying to use k-means to cluster it into 100 clusters. I'm using Matlab but I don't have access to the statistics toolbox, so I had to write my own k-means function. It's pretty simple, so that shouldn't be too hard, right? Well, it seems something is wrong with my code:
function result=Kmeans(X,c)
[N,n]=size(X);
index=randperm(N);
ctrs = X(index(1:c),:);
old_label = zeros(1,N);
label = ones(1,N);
iter = 0;
while ~isequal(old_label, label)
old_label = label;
label = assign_labels(X, ctrs);
for i = 1:c
ctrs(i,:) = mean(X(label == i,:));
if sum(isnan(ctrs(i,:))) ~= 0
ctrs(i,:) = zeros(1,n);
end
end
iter = iter + 1;
end
result = ctrs;
function label = assign_labels(X, ctrs)
[N,~]=size(X);
[c,~]=size(ctrs);
dist = zeros(N,c);
for i = 1:c
dist(:,i) = sum((X - repmat(ctrs(i,:),[N,1])).^2,2);
end
[~,label] = min(dist,[],2);
It seems what happens is that when I go to recompute the centroids, some centroids have no datapoints assigned to them, so I'm not really sure what to do with that. After doing some research on this, I found that this can happen if you supply arbitrary initial centroids, but in this case the initial centroids are taken from the datapoints themselves, so this doesn't really make sense. I've tried re-assigning these centroids to random datapoints, but that causes the code to not converge (or at least after letting it run all night, the code never converged). Basically they get re-assigned, but that causes other centroids to get marginalized, and repeat. I'm not really sure what's wrong with my code, but I ran this same dataset through R's k-means function for k=100 for 1000 iterations and it managed to converge. Does anyone know what I'm messing up here? Thank you.
Let's step through your code one piece at a time and discuss what you're doing with respect to what I know about the k-means algorithm.
function result=Kmeans(X,c)
[N,n]=size(X);
index=randperm(N);
ctrs = X(index(1:c),:);
old_label = zeros(1,N);
label = ones(1,N);
This looks like a function that takes in a data matrix of size N x n, where N is the number of points in your dataset and n is the dimension of a point. This function also takes in c: the desired number of output clusters. index provides a random permutation of the integers 1 to N, and we select the first c points of this permutation to initialize your cluster centres.
iter = 0;
while ~isequal(old_label, label)
old_label = label;
label = assign_labels(X, ctrs);
for i = 1:c
ctrs(i,:) = mean(X(label == i,:));
if sum(isnan(ctrs(i,:))) ~= 0
ctrs(i,:) = zeros(1,n);
end
end
iter = iter + 1;
end
result = ctrs;
For k-means, we basically keep iterating until the cluster membership of each point from the previous iteration matches with the current iteration, which is what you have going with your while loop. Now, label determines the cluster membership of each point in your dataset. Now, for each cluster that exists, you determine what the mean data point is, then assign this mean data point as the new cluster centre for each cluster. For some reason, should you experience any NaN for any dimension of your cluster centre, you set your new cluster centre to all zeroes instead. This looks very abnormal to me, and I'll provide a suggestion later. Edit: Now I understand why you did this. This is because should you have any clusters that are empty, you would simply make this cluster centre all zeroes as you wouldn't be able to find the mean of empty clusters. This can be solved with my suggestion for duplicate initial clusters towards the end of this post.
function label = assign_labels(X, ctrs)
[N,~]=size(X);
[c,~]=size(ctrs);
dist = zeros(N,c);
for i = 1:c
dist(:,i) = sum((X - repmat(ctrs(i,:),[N,1])).^2,2);
end
[~,label] = min(dist,[],2);
This function takes in a dataset X and the current cluster centres for this iteration, and it should return a label list of where each point belongs to each cluster. This also looks correct because for each column of dist, you are calculating the distance between each point to each cluster, where those distances are in the ith column for the ith cluster. One optimization trick that I would use is to avoid using repmat here and use bsxfun which handles the replication internally. Therefore, do this instead:
function label = assign_labels(X, ctrs)
[N,~]=size(X);
[c,~]=size(ctrs);
dist = zeros(N,c);
for i = 1:c
dist(:,i) = sum(bsxfun(@minus, X, ctrs(i,:)).^2, 2);
end
[~,label] = min(dist,[],2);
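As an aside, if you are on R2016b or newer, implicit expansion does the same replication without bsxfun:
dist(:,i) = sum((X - ctrs(i,:)).^2, 2);   % implicit expansion replaces bsxfun (R2016b+)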
Now, this all looks correct. I also ran some tests myself and it all seems to work out, provided that the initial cluster centres are unique. One small problem with k-means is that we implicitly assume that all cluster centres are unique. Should they not be unique, then you'll run into a problem where two clusters (or more) have the exact same initial cluster centres.... so which cluster should the data point be assigned to? When you're doing the min in your assign_labels function, should you have two identical cluster centres, the cluster label that the point gets assigned to will be the minimum of these two numbers. This is why you will have a cluster with no points in it, as all of the points that should have been assigned to this cluster get assigned to the other.
As such, you may have two (or more) initial cluster centres that are the same after randomization. Even though the permuted indices are unique, the data points they select may not be. One thing you can do is loop over the permutation until you get a set of initial clusters without repeats. As such, try doing this at the beginning of your code instead.
[N,n]=size(X);
index=randperm(N);
ctrs = X(index(1:c),:);
while size(unique(ctrs, 'rows'), 1) ~= c
index=randperm(N);
ctrs = X(index(1:c),:);
end
old_label = zeros(1,N);
label = ones(1,N);
iter = 0;
%// While loop appears here
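One caveat with this retry loop, under the assumption that your data may contain duplicate points: if X has fewer than c unique rows, no permutation can ever produce c unique centres and the loop will never terminate. A guard up front makes that failure explicit:
% Guard against an impossible request before entering the retry loop
assert(size(unique(X, 'rows'), 1) >= c, ...
    'Fewer than %d unique data points; cannot pick %d distinct initial centres', c, c);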
This will ensure that you have a unique set of initial clusters before you continue on in your code. Now, going back to your NaN stuff inside the for loop. I honestly don't see how any dimension could result in NaN after you compute the mean if your data doesn't have any NaN to begin with. I would suggest you get rid of this in your code as (to me) it doesn't look very useful. Edit: You can now remove the NaN check as the initial cluster centres should now be unique.
This should hopefully fix your problems you're experiencing. Good luck!
"Losing" a cluster is not half as special as one may think, due to the nature of k-means.
Consider duplicates. Let's assume that all of your first k points are identical: what would happen in your code? There is a reason you need to handle this case carefully. The simplest solution would be to leave the centroid as it was before, and live with degenerate clusters.
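A minimal sketch of that "leave it as it was" fix, written against the centroid-update loop from the question (the isempty check replaces the NaN check):
for i = 1:c
    members = X(label == i, :);
    if ~isempty(members)
        ctrs(i,:) = mean(members, 1);   % regular update
    end   % else: keep the previous centre and live with a degenerate cluster
end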
Given that you only have 240 points, but want to use k=100, don't expect too good results. Most objects will be on their own... choosing a much too large k is probably a reason why you do see this degeneration effect a lot. Let's assume out of these 240, fewer than 100 are unique... Then you cannot have 100 non-empty clusters... Plus, I would consider this kind of result "overfitting", anyway.
If you don't have the toolboxes you need in Matlab, maybe you should move on to free software. Octave, R, Weka, ELKI, ... there is plenty of software, some of which is much more powerful when it comes to clustering than pure Matlab (in particular, if you don't have the toolboxes).
Also, benchmark. You will be surprised by the performance differences.

Interactive curve fitting with MATLAB using custom GUI?

I find examples the best way to demonstrate my question. I generate some data, add some random noise, and fit it to get back my chosen "generator" value...
x = linspace(0.01,1,50);
value = 3.82;
y = exp(-value.*x);
y = awgn(y,30);
options = optimset('MaxFunEvals',1000,'MaxIter',1000,'TolFun',1e-10,'Display','off');
model = #(p,x) exp(-p(1).*x);
startingVals = [5];
lb = [1];
ub = [10];
[fittedValue] = lsqcurvefit(model,startingVals,x,y,lb,ub,options)
fittedGraph = exp(-fittedValue.*x);
plot(x,y,'o');
hold on
plot(x,fittedGraph,'r-');
In this new example, I have generated the same data but this time added much more noise to the first 15 points. Because it is random sometimes it works out okay, but after a few runs I get a good example that illustrates my problem. Same code, except for these lines added under value = 3.82
y = exp(-value.*x);
y(1:15) = awgn(y(1:15),5);
y(15:end) = awgn(y(15:end),30);
As you can see, it has clearly not given a good fit to where the data seems reliable, because it is fitting from points 1-50. What I want to do is say, okay MATLAB, I can see we have some noisy data but it seems decent over a range, only fit your exponential from points 15 to the end. I could go back to my code and update it to do this, but I will be batch fitting graphs like this where each one will have different ranges of 'good' data.
So what I am after is a GUI callback mechanism that allows me to click on two circles in the data and have them change colour or something, indicating that lsqcurvefit will only fit over that range. Internally all it has to change is the data passed to the lsqcurvefit call, e.g.
x(16:end),y(16:end)
But the range should update depending on the starting and ending circles I have clicked.
I hope my question is clear. Thanks.
You could use ginput to select the two points for your min and max in the plot.
[cx, cy] = ginput(2);
% this returns the x and y coordinates of two points clicked one after the other;
% note: don't call the outputs x and y, or you will overwrite your data vectors
% the min point is assumed to be clicked first
pmin = [cx(1) cy(1)];   % renamed from min/max, which would shadow MATLAB built-ins
pmax = [cx(2) cy(2)];
then you could fit your curve using the coordinates in pmin and pmax, I guess.
You could also switch between a rightclick for the min and a leftclick for the max etc.
Hope this helps you.
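To connect this back to the fit itself, here is a minimal sketch reusing the question's variables; it restricts lsqcurvefit to the clicked x-range and deliberately ignores the clicked y-values, which is an assumption on my part:
[cx, ~] = ginput(2);                 % click the left bound, then the right bound
sel = x >= min(cx) & x <= max(cx);   % data points inside the clicked x-range
fittedValue = lsqcurvefit(model, startingVals, x(sel), y(sel), lb, ub, options);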

Meshgrid / Surface clipping

I am working with some antarctic DEM data in Matlab. So, far I have been able to generate a nice looking mesh, with the following basic code:
load('Data.xyz'); % creates a variable named Data from the ASCII file
X = Data(:,1);
Y = Data(:,2);
Z = Data(:,3);
xr = unique(X);
yr = unique(Y);
gz = griddata(X,Y,Z,xr,yr'); % preallocating gz first is unnecessary; griddata returns the full grid
figure
mesh(xr,yr,gz);
hold on
contour3(xr,yr,gz,'k-');
hold off
Now I have a few questions, which I have not been able to answer despite being at it for the past couple of days, looking through all the forums, and googling day and night. I hope you experts might be able to suggest something. My questions are:
The above code takes a lot of time. Agreed, the DEM for Antarctica is large, and a slow response time does not necessarily mean the code is incorrect. However, I am totally unable to run this code on my laptop (2.5 GHz / 4 GB): it is just too slow. I am wondering if there are other ways to generate the mesh that are faster and more efficient.
The second issue is that "Data.xyz" contains DEM data for all of Antarctica. After generating a mesh, I want to clip it based on location. Say, for example, I want to extract the mesh data for the area bounded by x1,y1, x2,y2, x3,y3, and x4,y4. How do I go about doing that? I could not find a suitable function, tool, or user script anywhere that would allow me to do so. Is it possible to cut a mesh in MATLAB?
I am running Matlab 2012a, and I do not have access to mapping toolbox. Any suggestions???
1. I'd just like to clarify what it is you want your code to do. For the data X,Y,Z, I would assume these are points (X,Y) (not sampled on a grid), each associated with some elevation Z.
When you call
gz = griddata(X,Y,Z,xr,yr');
You are saying that every possible pair (xr(i),yr(j)) is a location on your grid where you want to sample the surface to create your mesh. I could be wrong, but I don't think this is what you want.
I think you would rather sample at equally spaced points instead, using something like...
xr = min(X):spacing:max(X);
yr = min(Y):spacing:max(Y);
gz = griddata(X,Y,Z,xr,yr'); % ' is for the transpose
mesh(xr,yr,gz);
where spacing is a reasonable step size for the scale of your data. The way your code is now, you could be taking far more samples than you want, which could be why it takes so long.
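For example, one rough way to pick spacing (the target of about 1000 grid points per axis is purely an assumption; tune it to your data and machine):
spacing = (max(X) - min(X)) / 1000;   % aim for ~1000 grid columns across the x-extent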
For 2., I think you could use a for loop to add just the points inside the region of interest to three new lists of X,Y,Z values. If your region is rectangular, bounded by x_left, x_right and y_left, y_right ...
Xnew = []; Ynew = []; Znew = [];
for i = 1:length(X)
if ( x_left<X(i) )&&( X(i)<x_right )&&( y_left<Y(i) )&&( Y(i)<y_right )
Xnew = [Xnew, X(i)];
Ynew = [Ynew, Y(i)];
Znew = [Znew, Z(i)];
end
end
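Alternatively, since growing arrays inside a loop is slow in MATLAB, the same clipping can be done in one vectorized step with logical indexing:
in = (x_left < X) & (X < x_right) & (y_left < Y) & (Y < y_right);
Xnew = X(in);
Ynew = Y(in);
Znew = Z(in);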

Get specific label (incl. data) of k-means

First of all, I'd like to describe my issue with kmeans in Matlab.
I select a point via mouse and use it for cluster initialization. This works fine.
After the segmentation of the data, I reshape the data back into proper style, because I need a matrix.
Now I want to select only the cluster of which the user selected the data via mouse.
Therefore I use the index of the mouse coordinates to look up the label I want to segment. This is necessary because there is other, extra data that is not connected to or near the relevant data but still gets the same label.
I want to select only connected components with 8-connectivity.
So here is my code snippet so far:
flatimg = double(reshape(croppedimg,size(croppedimg,1)*size(croppedimg,2),size(croppedimg,3)));
% kmeans
[idx, clusters] = kmeans(flatimg,2,'start',[seedpoint1(3);seedpoint2(3)]);
% form it back to a matrix
k=reshape(idx,size(croppedimg,1),size(croppedimg,2));
%convert point, which is part of the label I want to linear index
selectedobjectpoint = sub2ind(size(croppedimg),seedpoint1(2),seedpoint1(1));
hgplabel = k(selectedobjectpoint);
idx_object = find(k, hgplabel);
% also tried: idx_object = find(k == hgplabel);
I added a screenshot, which shows the direct output of kmeans:
So my aim here is to get only the "white" OR the "black" ones.
Help or advice appreciated. If you've got any questions regarding the snippet or the goal, feel free to ask.
Thank you in advance!
I think the FIND command is throwing you off. You want something like:
logicalImage = k == hgplabel;
bwImg = bwlabel(logicalImage);
imagesc(bwImg)
FIND will output the indices where k == hgplabel. You want the matrix of zeros and ones where k takes that value (I think).
If you just want the connected components of that, the output of bwlabel will contain a unique integer for each connected component, so imagesc(bwImg == 1) will show just component 1. And you can specify the connectivity as a second argument to bwlabel (4 or 8; the default is 8).
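Putting it together, a small sketch that keeps only the component under the clicked seed point; the (row, col) = (y, x) index order is my assumption about how seedpoint1 is stored:
logicalImage = (k == hgplabel);                      % mask of the selected k-means label
bwImg = bwlabel(logicalImage, 8);                    % label components, 8-connectivity
componentId = bwImg(seedpoint1(2), seedpoint1(1));   % component at the seed point
objectMask = (bwImg == componentId);                 % keep only that component
imagesc(objectMask);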