I am working from a skeleton of an image. To fill in some incomplete branches, I extracted the end points of the skeleton and implemented Dijkstra's algorithm to find the minimal paths between those points (I get a 500x500 matrix FR).
I applied cost and orientation constraints to limit undesirable connections. My problem is that I still have unwanted links, because some starting points belong to several paths with different costs. I therefore extracted these starting points to keep only those whose cost is lowest (a 75x3 matrix P, where columns 1 and 2 are the X and Y coordinates of the starting points and column 3 is the cost).
I am now trying to write code that keeps only those paths whose starting point and cost match the matrix P, but I cannot see how to do it.
In other words, I would like each starting point to have the "right" to take only its least expensive path.
NumberPath = 0;
S = size(EP);              % EP = 90x2 matrix of x,y coordinates of the skeleton's end points
NumP = S(1);
for i = 1:NumP-1
    for j = i+1:NumP
        FP = EP(i,:);      % first (starting) point
        LP = EP(j,:);      % last (end) point
        [cost,path] = dijkstra(PC,Weight); % PC = matrix of all possible paths
        dimPath = size(path);
        dx = LP(1)-FP(1);
        dy = LP(2)-FP(2);
        if dx>0 && dx/dy>0 % orientation constraint
            NumberPath = NumberPath+1;
            if cost < 55   % cost constraint
                for p = 1:dimPath(2)
                    FR(PC(path(1,p),2),PC(path(1,p),1)) = NumberPath;
                    % FR = 500x500 matrix with all paths meeting the conditions
                end
                Cost(i,j) = cost;     % matrix of all costs < 55
                P(NumberPath,:) = FP; % starting point of each path
                L(NumberPath,:) = LP; % end point of each path
            end
        end
    end
end
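For the filtering step described above (keeping only the cheapest path per starting point), here is a minimal sketch; it assumes P is the Nx3 [x y cost] matrix described in the question, and P_best is a name introduced here just for illustration:
% Sketch only: keep, for each starting point, the row(s) with minimal cost.
[~, ~, grp] = unique(P(:,1:2), 'rows');       % group rows by starting point
minCost = accumarray(grp, P(:,3), [], @min);  % cheapest cost per group
keep = P(:,3) == minCost(grp);                % rows achieving that minimum
P_best = P(keep, :);                          % cheapest path per starting point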
I have a scatter plot of several thousand points, which lie above a lower boundary defined by several connected line segments (both flat and sloped). My goal is to find the shortest distance from every point to the lower boundary segment above which that point lies, and to sum these distances over all points for later post-processing.
I found a point-to-line distance function online (https://www.mathworks.com/matlabcentral/fileexchange/64396-point-to-line-distance) which I have tested in isolation and would now like to integrate into my script. The function takes an array of points (class double), such as [0.1,0.7;0,0.5;...], and two [x,y] points which lie on the line to which the shortest distance is to be calculated.
So far I have written a while loop which iterates over all rows in a dataset already saved to the workspace (skipping zeros, which I want to ignore). Inside it, an if statement checks whether a given point is in the x range of the given lower boundary segment (I always want the shortest distance to the segment above which the point lies); I then try to append the point's (x,y) coordinates to a variable that will become one of the function inputs. The two points defining each lower boundary segment are hard-coded and do not change.
Here is a snippet of my code:
short_deviation = 0;
idx = 1;
while idx <= numel(my_data(:,5)) && my_data(idx,5) ~= 0
    ...
    if my_data(idx,5) < my_data(2,9) && my_data(idx,5) > my_data(1,9) % check that pt is in x range of lower segment
        pt(:,idx) = my_data(idx,3:4); % CURRENT ERROR - try to append given pt to list for function input
        v1 = my_data(1,9:10); % two hard-coded x,y pts on the lower boundary
        v2 = my_data(2,9:10);
        distance_2D(idx) = point_to_line_dist(pt, v1, v2); % calling the function
    end
    ...
    idx = idx + 1;
end
When I run the current code I get the following error message:
Unable to perform assignment because the size of the left side is 1-by-1 and the size of the right side is 1-by-2.
Error in My_script (line xxx)
pt(:,idx) = my_data(idx,3:4);
Now that I write out this code, I think another potential problem is how I call point_to_line_dist inside the if statement; I'm also not sure the syntax for calling the function is correct (little experience here), but I haven't gotten to that point because of the error above.
The error indicates that pt already exists with the wrong size. For your code to run, it must have two rows, otherwise the data won't fit. You could preallocate it with
pt = zeros(2, numel(my_data(:,5)))
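Putting that into your loop, a minimal sketch (point_to_line_dist and the my_data column layout are taken from the question and not verified here):
% Sketch: preallocate pt as suggested above, then pass one point per call.
n = numel(my_data(:,5));
pt = zeros(2, n);
distance_2D = zeros(1, n);
idx = 1;
while idx <= n && my_data(idx,5) ~= 0
    if my_data(idx,5) < my_data(2,9) && my_data(idx,5) > my_data(1,9)
        pt(:,idx) = my_data(idx,3:4);  % the 1-by-2 row now fills the 2-by-1 column
        v1 = my_data(1,9:10);
        v2 = my_data(2,9:10);
        distance_2D(idx) = point_to_line_dist(pt(:,idx).', v1, v2);
    end
    idx = idx + 1;
end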
We have measured data for which we determined the distribution type it follows (Gamma) and its parameters (A, B).
We then generated n = 10000 samples from the same distribution, with the same parameters and in the same range (between 18.5 and 59), using a for loop:
for i = 1:1:10000
    tot = makedist('Gamma','A',11.8919,'B',2.9927);
    tot = truncate(tot,18.5,59);
    W(i,:) = random(tot,1,1);
end
Then we tried to fit the generated data using:
h1 = histfit(W);
After this we tried to plot the Gamma curve, to compare the two curves in the same figure, using:
hold on
h2 = histfit(W,[],'Gamma');
h2(1).Visible = 'off';
The problem is that the two curves are shifted, as in the figures (Figure 1 shows the data generated by the code above; Figure 2 shows the result without truncating the generated data).
Does anyone know why?
Thanks in advance
By default, histfit fits a normal probability density function (PDF) to the histogram. I'm not sure what you were actually trying to do, but what you did is:
% fit a normal PDF
h1=histfit(W); % this is equal to h1 = histfit(W,[],'normal');
% fit a gamma PDF
h2=histfit(W,[],'Gamma');
Obviously that will result in different fits, because a normal PDF is not a gamma PDF. What you see is simply that the gamma PDF fits the curve better, because you sampled the data from that distribution.
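Note also that histfit refits a plain (untruncated) gamma to the truncated samples, so the refitted curve need not line up with the original parameters. If you want to compare the histogram against the exact truncated PDF instead, a minimal sketch (reusing W and tot from your code):
histogram(W, 'Normalization', 'pdf')
hold on
x = linspace(18.5, 59, 200);
plot(x, pdf(tot, x), 'LineWidth', 1.5)   % exact truncated-gamma PDF
hold off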
If you want to check whether the data follows a certain distribution you can also use a KS-test. In your case
% check if the data follows the distribution specified in tot
[h, p] = kstest(W,'CDF',tot)
If the data follows the gamma distribution, then h = 0 and p > 0.05; otherwise h = 1 and p < 0.05.
Now some general comments on your code:
Please look up preallocation of memory; it will speed up loops greatly. E.g.
W = zeros(10000,1);
for i = 1:1:10000
    tot = makedist('Gamma','A',11.8919,'B',2.9927);
    tot = truncate(tot,18.5,59);
    W(i,:) = random(tot,1,1);
end
Also,
tot = makedist('Gamma','A',11.8919,'B',2.9927);
tot = truncate(tot,18.5,59);
does not depend on the loop index and can therefore be moved in front of the loop to speed things up further. It is also good practice to avoid using i as a loop variable.
But you can actually skip the whole loop, because random() can return multiple samples at once:
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W =random(tot,10000,1);
I want to implement bag of visual words in MATLAB. I used SURF features to extract features from the images and k-means to cluster those features into k clusters. I now have k centroids, and I want to know how many times each cluster is used by assigning each image feature to its closest neighbour. Finally, I'd like to create such a histogram for each image.
I tried to use the knnsearch function, but it doesn't work in this case.
Here is my MATLAB code:
clc;
clear;
close all;

folder = 'CarData/TrainImages/cars';
filePattern = fullfile(folder, '*.pgm');
f = dir(filePattern);
files = {f.name};
for k = 1:numel(files)
    fullFileName = fullfile(folder, files{k});
    H = fspecial('log');
    image = imfilter(imread(fullFileName), H);
    temp = detectSURFFeatures(image);
    [im_features, temp] = extractFeatures(image, temp);
    features{k} = im_features;
end
features = vertcat(features{:});
image_feats = [];
[assignments,centers] = kmeans(double(features),500);
vocab = centers';
I have all the image features in the features array and the cluster centers in the centers array.
You're almost there. You don't even need to use knnsearch at all. The assignments variable tells you which input feature mapped to which cluster: it is an N x 1 vector, where N is the total number of examples you have, i.e. the total number of rows in the input matrix features. Each value assignments(i) tells you which cluster example i (row i of features) maps to, and the corresponding cluster centroid is given by centers(assignments(i), :).
Given how you've called kmeans, assignments will be an N x 1 vector where each element is between 1 and 500, with 500 being the total number of clusters desired.
Let's start with the simple case where you only have one image in your codebook. In that case, all you have to do is create a histogram of the assignments variable. The output histogram h will be a 500 x 1 vector, with each element h(i) being the number of times an example used centroid i as its representation in your codebook.
Just use the histcounts function and make sure that you specify the bin edges so that they coincide with the cluster IDs. You must account for the last bin: the bin ranges are exclusive on the right edge, so add one additional edge at the end.
Something like this will work:
h = histcounts(assignments, 1 : 501);
If you want something simpler and you don't want to worry about specifying the end bin, you can use accumarray to achieve the same result:
h = accumarray(assignments, 1);
With accumarray, we assign key-value pairs where the key is the centroid that the example mapped to and the value is simply 1 for all keys. accumarray bins all values in assignments that share the same key and does something with those values; the default behaviour is to sum them, which effectively computes the histogram.
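As a tiny illustration of accumarray acting as a histogram:
a = [1; 3; 3; 2; 3];
h = accumarray(a, 1)   % h = [1; 1; 3] -- one 1, one 2, three 3s
One caveat: the output of accumarray only has max(a) elements, so if the highest-numbered clusters are never used you may need to pad the result up to the number of clusters.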
However, you want to do this for multiple images, not just a single one.
For Bag of Visual Words problems, we will certainly have more than one training image in our database, so you want the histogram of features for each image separately. We can still use the above concept, but I suggest maintaining a separate variable that tells you how many features were detected per image; you can then index into the assignments variable to extract the centroid IDs assigned to each image and build a histogram of those individually, collecting the results in a 2-D matrix where each row is the histogram of one image. Remember that in kmeans, each row's assignment is computed independently of the other examples in your data, so you can run kmeans on the entire training dataset and then be smart about how you slice the assignments variable to recover the assigned clusters for each input image.
Therefore, modify your code so that it looks something like this:
clc;
clear;
close all;

folder = 'CarData/TrainImages/cars';
filePattern = fullfile(folder, '*.pgm');
f = dir(filePattern);
files = {f.name};
num_features = zeros(numel(files), 1); % New - keeps track of # of features per image
for k = 1:numel(files)
    fullFileName = fullfile(folder, files{k});
    H = fspecial('log');
    image = imfilter(imread(fullFileName), H);
    temp = detectSURFFeatures(image);
    [im_features, temp] = extractFeatures(image, temp);
    num_features(k) = size(im_features, 1); % New - # of features per image
    features{k} = im_features;
end
features = vertcat(features{:});

num_clusters = 500; % Added to make the code adaptive
[assignments,centers] = kmeans(double(features), num_clusters);

counter = 1; % Keeps track of where we need to slice in assignments

% Go through each image and find their histograms
features_hist = zeros(numel(files), num_clusters); % Records the per-image histograms
for k = 1 : numel(files)
    a = assignments(counter : counter + num_features(k) - 1); % Get the assignments
    h = histcounts(a, 1 : num_clusters + 1);
    % Or:
    % h = accumarray(a, 1).'; % Transpose to make it a row

    % Place in final output
    features_hist(k, :) = h;

    % Increment counter
    counter = counter + num_features(k);
end
features_hist will now be an N x 500 matrix where each row is the histogram of one image. The final job is to use a supervised machine learning algorithm (SVM, neural networks, etc.), with the label you have assigned to each image as the expected output and the histogram of each image as the input features. The result is a learned model: when you have a new image, you calculate its SURF features, represent them as a histogram of features as we did above, then feed the histogram into the model, which gives you the expected class or label the image represents.
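As an illustration, a minimal classification sketch using fitcecoc (multi-class SVM); labels here is a hypothetical N x 1 vector of per-image class labels, and h_new a hypothetical 1 x 500 histogram computed for a new image:
% labels: hypothetical per-image class labels (one per row of features_hist)
model = fitcecoc(features_hist, labels);   % multi-class SVM on the histograms
predicted = predict(model, h_new);         % classify a new image's histogram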
P.S. Deep learning / CNNs do a much better job at this, but require much more time to train. If you're looking for raw performance, don't use Bag of Visual Words; it is, however, very quick to implement and known to perform moderately well, though that of course depends on the kinds of images you want to classify.
I'm implementing the k-means algorithm in MATLAB without using the built-in kmeans function. The stopping criterion is that the centroids do not change between iterations, but I cannot implement it in MATLAB. Can anybody help?
Thanks
Setting "no change" as a stopping criterion is a bad idea. There are a few main reasons you shouldn't use a zero-change condition:
Even for a well-behaved function, the difference between zero change and a very small change (say 1e-5) could be 1000+ iterations, so you waste time trying to make the centroids exactly the same, especially because computers keep far more digits than we are interested in. If you only need 1-digit accuracy, why wait for the computer to find an answer within 1e-31?
Computers have floating-point errors everywhere. Try some easily reversible matrix operations, like a = rand(3,3); b = a*a*inv(a); a-b. Theoretically this should be 0, but you will see it isn't. These errors alone could prevent your program from ever stopping.
Dithering. Say we have a 1-D k-means problem with 3 numbers and we want to split them into 2 groups. One iteration the grouping can be {a,b} vs {c}; the next iteration {a} vs {b,c}; the next {a,b} vs {c}; and so on. This is of course a simplified example, but a few data points can dither between clusters, and you end up with a never-ending algorithm: since those few points keep being reassigned, the change is never 0.
The solution is to use a delta threshold: subtract the current values from the previous ones, and if the difference is less than a threshold, you are done. This on its own is powerful, but as with any loop you need a backup escape plan, and that is a max_iterations variable. Look at MATLAB's documentation for kmeans; even they have a MaxIter option (default 100), so even if your k-means doesn't converge, at least it won't run endlessly. Something like this might work:
% problem specific
max_iter = 100;
% choose a small number appropriate to your problem
thresh = 1e-3;
% ensures it runs the first time
delta_mu = thresh + 1;
num_iter = 0;
% curr_mu must be initialized before the loop (e.g. randomly chosen centroids)
% do your kmeans in the loop
while (delta_mu > thresh && num_iter < max_iter)
    % save these right away
    old_mu = curr_mu;
    % calculate new means and variances; this is the standard kmeans iteration,
    % then store the values in a variable called curr_mu
    curr_mu = newly_calculated_values;
    % use the two-norm to find the delta as a single number, no matter what
    % the original dimensionality of mu was. If old_mu - curr_mu is
    % 0 the norm is still 0, so it behaves well as a distance measure.
    delta_mu = norm(old_mu - curr_mu, 2);
    num_iter = num_iter + 1;
end
Edit:
In case you don't know, the 2-norm is essentially the Euclidean distance.
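For example:
norm([3 4] - [0 0], 2)   % returns 5, the Euclidean distance from (3,4) to the origin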
I have a 3-D matrix of scattered points (an Nx4 matrix: x, y, z, data). My aim is to link the closest points together and register each chain in a Kx4 array (x, y, z, data), K being the chain length. The total number of chains depends on the points...
A particularity is that these lines only go upwards (z+); I don't want to link points on the same z, or go down.
I have tried different strategies so far, one being with another array shape (Mx4xNz - basically the values were stacked per z level instead of all being in one 2-D matrix):
[edited after some progress, using delaunayTriangulation/nearestNeighbor]
pick a point at level Zn
go to level Zn+1 and look for the closest point within a range of x,y coordinates, using delaunayTriangulation and nearestNeighbor
register the point into a vector
(I suspect there are other possibilities using nearestNeighbor with the Nx4 matrix, but I can't think how to 'direct' the search upwards and chain the successive points...)
I find myself with the following problem: the search for the nearest point upwards seems to work well, but in one direction only! (Two figures, not reproduced here, showed the cases "Linking doesn't work" and "Linking works".)
During the loop I get the warning:
Warning: Duplicate data points have been detected and removed.
The Triangulation indices are defined with respect to the unique set of points in
delaunayTriangulation property X.
Lign = zeros(max_iter, 4, s);
for i = 1:s
    pp_id = i;
    for n = 1:max_iter-1
        Wn  = W(:,:,n);   % W is the 3-D data matrix, Mx4xNz
        Wnn = W(:,:,n+1);
        Point_n = Wn(pp_id,:);
        xn = Point_n(1);
        yn = Point_n(2);
        zn = Point_n(3);
        vn = Point_n(4);
        if xn==0 || yn==0 || zn==0 || vn==0
            break
        end

        % Look for the nearest neighbour at the next level
        DT = delaunayTriangulation(Wnn(:,1:2));
        [pp_id, d] = nearestNeighbor(DT, [Point_n(1), Point_n(2)]);

        % limit range
        if d > 10
            break
        end

        % extraction of values at the new pp_id
        Point_n = Wnn(pp_id,:);

        % register point in line
        Lign(n,:,i) = Point_n;
    end
end
Does anyone have an idea as to why this happens?