Clustering with ambiguous input values - matlab

Imagine the following scenario:
I have two 4x100 matrices ang1_stab and ang2_stab. These contain four angles along the columns, like this:
ang1_stab:
195.7987 16.2722 14.4171 198.5878 199.2693...
80.2062 86.7363 89.2861 89.5454 89.3411...
68.7998 -83.8318 -80.3138 69.0636 -96.4913...
-5.3262 -23.3030 -20.6823 -18.9915 -16.7224...
ang2_stab:
95.3450 183.8212 171.0686 151.8887 177.9041...
21.4780 27.2389 23.4016 27.6631 17.2893...
-13.2767 -103.5548 -115.0615 39.6790 -112.3568...
-5.3262 -23.3030 -20.6823 -18.9915 -16.7224...
The fourth angle is always the same for both matrices, so it can be neglected.
The problem: some of the columns of ang1_stab and ang2_stab are swapped, so I need to find the columns that would fit better into the other matrix and then swap the respective columns.
The complication: The calculation of the given angles is ambiguous and multiples of 90° might have been added/subtracted, e.g. the angle 16° should be considered closer to 195° than to 95°.
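As a quick numeric check of this rule (a throwaway sketch, not part of my actual code), wrapping the difference onto the nearest multiple of 90° confirms that 16° should count as much closer to 195° than to 95°:
wrap90 = @(d) abs(d - round(d/90)*90);  % residual after removing multiples of 90 deg
wrap90(195 - 16)  % = 1  -> 16 deg is effectively right next to 195 deg
wrap90( 95 - 16)  % = 11 -> and noticeably farther from 95 deg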
What I have tried so far:
fp1 = []; % define clusters
fp2 = [];
for j = 1:size(ang1_stab,2) % loop over all columns
    tmp1 = ang1_stab(:,j); % temporary columns
    tmp2 = ang2_stab(:,j);
    if j == 1 % assign first cluster center
        fp1 = [fp1, tmp1];
        fp2 = [fp2, tmp2];
    else
        mns1 = median(fp1(1:3,:),2); % calculate cluster centers
        mns2 = median(fp2(1:3,:),2);
        % calculate distances to cluster centers
        dif11 = sum(abs((mns1-tmp1(1:3))-round((mns1-tmp1(1:3))/90)*90));
        dif12 = sum(abs((mns1-tmp2(1:3))-round((mns1-tmp2(1:3))/90)*90));
        dif21 = sum(abs((mns2-tmp1(1:3))-round((mns2-tmp1(1:3))/90)*90));
        dif22 = sum(abs((mns2-tmp2(1:3))-round((mns2-tmp2(1:3))/90)*90));
        if min([dif11,dif21]) < min([dif12,dif22]) % assign to cluster
            if dif11 < dif21
                fp1 = [fp1,tmp1];
                fp2 = [fp2,tmp2];
            else
                fp1 = [fp1,tmp2];
                fp2 = [fp2,tmp1];
            end
        else
            if dif12 < dif22
                fp1 = [fp1,tmp2];
                fp2 = [fp2,tmp1];
            else
                fp1 = [fp1,tmp1];
                fp2 = [fp2,tmp2];
            end
        end
    end
end
However:
This approach seems overly complicated, and I was wondering if I can somehow replace it with an appropriate algorithm, e.g. kmeans. However, I don't know how to account for the ambiguity in the angles in that case.
The code is working, but the clustering currently still puts some points in the wrong cluster, and I just cannot find out why.
I would appreciate it if someone could tell me how to adapt this to work with built-in routines like kmeans or so.
Edit:
A small toy example (written transposed relative to the 4x100 matrices above, so here each row holds the angles of one measurement):
This could be the output that I am getting:
ang1_stab = [30 10 80 100; 28 15 90 95; 152 93 180 102];
ang2_stab = [150 90 3 100; 145 92 5 95; 32 10 82 102];
What I would like to achieve:
fp1 = [30 10 80 100; 28 15 90 95; 32 10 82 102];
fp2 = [150 90 3 100; 145 92 5 95; 152 93 180 102];
Note that the last columns (the last rows of these transposed toy matrices) have been swapped.
Also note that the third angle in that swapped column of fp2 is 180, which is roughly 180° above the mean of the corresponding angles in the other columns (3 and 5). I still need to be able to identify that it belongs to the right cluster.
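For reference, the per-column decision in the loop above can be written more compactly by wrapping the angle differences first. A minimal sketch under the same 4x100 layout (it sums the two assignment costs instead of taking minima, so treat it as an illustration of the idea rather than a drop-in replacement):
wrap90 = @(d) abs(d - round(d/90)*90); % distance after removing multiples of 90 deg
fp1 = ang1_stab;
fp2 = ang2_stab;
for j = 2:size(fp1,2)
    m1 = median(fp1(1:3,1:j-1),2); % running cluster centers (first three angles)
    m2 = median(fp2(1:3,1:j-1),2);
    keepCost = sum(wrap90(m1 - fp1(1:3,j))) + sum(wrap90(m2 - fp2(1:3,j)));
    swapCost = sum(wrap90(m1 - fp2(1:3,j))) + sum(wrap90(m2 - fp1(1:3,j)));
    if swapCost < keepCost % swap the pair when the crossed assignment fits better
        tmp = fp1(:,j);
        fp1(:,j) = fp2(:,j);
        fp2(:,j) = tmp;
    end
end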

Related

coordinates of perpendicular line segment in 2d Cartesian space

I'm having a surprisingly difficult time figuring out something which appears so simple. I have two known coordinates on a graph, (X1,Y1) and (X2,Y2). What I'm trying to identify are the coordinates for (X3,Y3).
I thought of using sin and cos, but once I get here my brain stops working. I know that
sin θ = y/R
cos θ = x/R
so I thought of simply plugging in the length of the line (in this case it was 2) and using the angles which are known. Seems very simple, but for the life of me, my brain won't wrap around this.
The reason I need this is because I'm trying to print a line onto an image using poly2mask in matlab. The code has to work in the 2D space as I will be building movies using the line.
X1 = [134 134 135 147 153 153 167]
Y1 = [183 180 178 173 164 152 143]
X2 = [133 133 133 135 138 143 147]
Y2 = [203 200 197 189 185 173 163]
YZdist = 2;
for aa = 1:length(X2)
    XYdis(aa) = sqrt((X2(aa)-X1(aa))^2 + (Y2(aa)-Y1(aa))^2);
    X3(aa) = X1(aa) * tan(XYdis(aa)/YZdist);
    Y3(aa) = Y1(aa) * tan(XYdis(aa)/YZdist);
end
polmask = poly2mask([Xdata X3],[Ydata Y3],50,50);
One approach would be to first construct a vector l connecting the points (x1,y1) and (x2,y2), rotate this vector 90 degrees clockwise and add it to the point (x2,y2).
Thus l=(x2-x1, y2-y1), its rotated version is l'=(y2-y1,x1-x2) and therefore the point of interest P=(x2, y2) + f*(y2-y1,x1-x2), where f is the desired scaling factor. If the lengths are supposed to be the same, then f=1 and thus P=(x2 + y2-y1, y2 + x1-x2).
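For completeness, here is a small MATLAB sketch of that rotation approach applied to the sample points from the question (the 2-pixel offset and the clockwise rotation direction are assumptions; flip the signs of the rotated vector to go the other way):
X1 = [134 134 135 147 153 153 167];
Y1 = [183 180 178 173 164 152 143];
X2 = [133 133 133 135 138 143 147];
Y2 = [203 200 197 189 185 173 163];
offset = 2;                          % desired length of the perpendicular segment
L = sqrt((X2-X1).^2 + (Y2-Y1).^2);   % length of each segment
f = offset ./ L;                     % scaling factor per segment
X3 = X2 + f.*(Y2-Y1);                % P = (x2,y2) + f*(y2-y1, x1-x2)
Y3 = Y2 + f.*(X1-X2);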

MATLAB - Working with accumarray with multiple categories?

I am asking a follow-up to my question here, in which there was a perfect solution that did exactly what I wanted. But I'm wondering how to apply this method, or do something similar, if instead of yes/no as the possible responses I had more than 2 responses, e.g. yes/no/maybe. Or how it would generalize to 3+ responses.
This is the answer, reformatted as my question:
Assuming my data looks like this:
responses = categorical(randi(3,1250,1),[1 2 3],{'no','yes','maybe'});
race = categorical(randi(5,1250,1),1:5,{'Asian','Black','BHispanic','White','WHispanic'});
I would like to go through and do the same thing with my yes/no data, but do this with 3 possibilities, or more. And this will not end up working anymore:
% convert everything to numeric:
yn = double(responses);
rac = double(race);
% calculate all frequencies:
data = accumarray(rac,yn-1);
data(:,2) = accumarray(rac,1)-data;
% get the categories names:
races = categories(race);
answers = categories(responses);
% plotting:
bar(data,0.4,'stacked');
ax = gca;
ax.XTickLabel = races; % set the x-axis ticks to the race names
legend(answers) % add a legend for the colors
colormap(lines(3)) % use nicer colors (close to your example)
ylabel('YES/NO/MAYBE')% set the y-axis label
% some other minor fixes:
box off
ax.YGrid = 'on';
I'm not sure if there is even a way to use the accumarray method to do this, as it doesn't make sense from my understanding to use this with 3 possible responses. I'd like to generalize it to n possible responses too.
UPDATE: I'm currently investigating the crosstab feature which I didn't find at all until now! I think this may be the feature I'm looking for.
Here is a generalized version:
% the data (with even more categories):
yesno = categorical(randi(4,1250,1),1:4,{'no','yes','maybe','don''t know'});
race = categorical(randi(5,1250,1),1:5,{'Asian','Black','BHispanic','White','WHispanic'});
% convert everything to numeric:
yn = double(yesno);
rac = double(race);
% calculate all frequencies:
data = accumarray([rac yn],1);
% get the categories names:
races = categories(race);
answers = categories(yesno);
% plotting:
bar(data,0.4,'stacked');
ax = gca;
ax.XTickLabel = races; % set the x-axis ticks to the race names
legend(answers) % add a legend for the colors
colormap(lines(numel(answers))) % use prettier colors
ylabel('YES/NO')% set the y-axis label
% some other minor fixes:
box off
ax.YGrid = 'on';
The result:
And in a table:
T = array2table(data.','VariableNames',races,'RowNames',answers)
the output:
T =

                  Asian    Black    BHispanic    White    WHispanic
                  _____    _____    _________    _____    _________

    no             58       72        69          66        62
    yes            58       53        72          54        58
    maybe          63       62        67          62        61
    don't know     58       57        66          58        74
As you already mentioned, you can use crosstab for the same task. crosstab(rac,yn) will give you the same result as accumarray([rac yn],1). I think accumarray is faster, though I didn't check it.
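A quick way to convince yourself of that equivalence (a sketch only; crosstab needs the Statistics Toolbox, and no timing comparison is done here):
counts_acc = accumarray([rac yn],1);
counts_ct = crosstab(rac,yn);
isequal(counts_acc, counts_ct)  % returns logical 1 when the two tables match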

How can I segment curved text lines? [duplicate]

Note: I am placing this question in both the MATLAB and Python tags as I am the most proficient in these languages. However, I welcome solutions in any language.
Question Preamble
I have taken an image with a fisheye lens. This image consists of a pattern with a bunch of square objects. What I want to do with this image is detect the centroid of each of these squares, then use these points to perform an undistortion of the image - specifically, I am seeking the right distortion model parameters. It should be noted that not all of the squares need to be detected. As long as a good majority of them are, then that's totally fine.... but that isn't the point of this post. The parameter estimation algorithm I have already written, but the problem is that it requires points that appear collinear in the image.
The base question I want to ask is given these points, what is the best way to group them together so that each group consists of a horizontal line or vertical line?
Background to my problem
This isn't really important with regards to the question I'm asking, but if you'd like to know where I got my data from and to further understand the question I'm asking, please read. If you're not interested, then you can skip right to the Problem setup section below.
An example of an image I am dealing with is shown below:
It is a 960 x 960 image. The image was originally higher resolution, but I subsample the image to facilitate faster processing time. As you can see, there are a bunch of square patterns that are dispersed in the image. Also, the centroids I have calculated are with respect to the above subsampled image.
The pipeline I have set up to retrieve the centroids is the following:
1. Perform a Canny edge detection.
2. Focus on a region of interest that minimizes false positives. This region of interest is basically the squares without any of the black tape that covers one of their sides.
3. Find all distinct closed contours.
4. For each distinct closed contour...
   a. Perform a Harris corner detection.
   b. Determine if the result has 4 corner points.
   c. If it does, then this contour belongs to a square; find the centroid of this shape.
   d. If it doesn't, then skip this shape.
5. Place all detected centroids from Step #4 into a matrix for further examination.
Here's an example result with the above image. Each detected square has the four points colour coded according to the location of where it is with respect to the square itself. For each centroid that I have detected, I write an ID right where that centroid is in the image itself.
With the above image, there are 37 detected squares.
Problem Setup
Suppose I have some image pixel points stored in a N x 3 matrix. The first two columns are the x (horizontal) and y (vertical) coordinates where in image coordinate space, the y coordinate is inverted, which means that positive y moves downwards. The third column is an ID associated with the point.
Here is some code written in MATLAB that takes these points, plots them onto a 2D grid and labels each point with the third column of the matrix. If you read the above background, these are the points that were detected by my algorithm outlined above.
data = [ 475. , 605.75, 1.;
571. , 586.5 , 2.;
233. , 558.5 , 3.;
669.5 , 562.75, 4.;
291.25, 546.25, 5.;
759. , 536.25, 6.;
362.5 , 531.5 , 7.;
448. , 513.5 , 8.;
834.5 , 510. , 9.;
897.25, 486. , 10.;
545.5 , 491.25, 11.;
214.5 , 481.25, 12.;
271.25, 463. , 13.;
646.5 , 466.75, 14.;
739. , 442.75, 15.;
340.5 , 441.5 , 16.;
817.75, 421.5 , 17.;
423.75, 417.75, 18.;
202.5 , 406. , 19.;
519.25, 392.25, 20.;
257.5 , 382. , 21.;
619.25, 368.5 , 22.;
148. , 359.75, 23.;
324.5 , 356. , 24.;
713. , 347.75, 25.;
195. , 335. , 26.;
793.5 , 332.5 , 27.;
403.75, 328. , 28.;
249.25, 308. , 29.;
495.5 , 300.75, 30.;
314. , 279. , 31.;
764.25, 249.5 , 32.;
389.5 , 249.5 , 33.;
475. , 221.5 , 34.;
565.75, 199. , 35.;
802.75, 173.75, 36.;
733. , 176.25, 37.];
figure; hold on;
axis ij;
scatter(data(:,1), data(:,2),40, 'r.');
text(data(:,1)+10, data(:,2)+10, num2str(data(:,3)));
Similarly in Python, using numpy and matplotlib, we have:
import numpy as np
import matplotlib.pyplot as plt
data = np.array([[ 475. , 605.75, 1. ],
[ 571. , 586.5 , 2. ],
[ 233. , 558.5 , 3. ],
[ 669.5 , 562.75, 4. ],
[ 291.25, 546.25, 5. ],
[ 759. , 536.25, 6. ],
[ 362.5 , 531.5 , 7. ],
[ 448. , 513.5 , 8. ],
[ 834.5 , 510. , 9. ],
[ 897.25, 486. , 10. ],
[ 545.5 , 491.25, 11. ],
[ 214.5 , 481.25, 12. ],
[ 271.25, 463. , 13. ],
[ 646.5 , 466.75, 14. ],
[ 739. , 442.75, 15. ],
[ 340.5 , 441.5 , 16. ],
[ 817.75, 421.5 , 17. ],
[ 423.75, 417.75, 18. ],
[ 202.5 , 406. , 19. ],
[ 519.25, 392.25, 20. ],
[ 257.5 , 382. , 21. ],
[ 619.25, 368.5 , 22. ],
[ 148. , 359.75, 23. ],
[ 324.5 , 356. , 24. ],
[ 713. , 347.75, 25. ],
[ 195. , 335. , 26. ],
[ 793.5 , 332.5 , 27. ],
[ 403.75, 328. , 28. ],
[ 249.25, 308. , 29. ],
[ 495.5 , 300.75, 30. ],
[ 314. , 279. , 31. ],
[ 764.25, 249.5 , 32. ],
[ 389.5 , 249.5 , 33. ],
[ 475. , 221.5 , 34. ],
[ 565.75, 199. , 35. ],
[ 802.75, 173.75, 36. ],
[ 733. , 176.25, 37. ]])
plt.figure()
plt.gca().invert_yaxis()
plt.plot(data[:,0], data[:,1], 'r.', markersize=14)
for idx in np.arange(data.shape[0]):
    plt.text(data[idx,0]+10, data[idx,1]+10, str(int(data[idx,2])), size='large')
plt.show()
We get:
Back to the question
As you can see, these points are more or less in a grid pattern and you can see that we can form lines between the points. Specifically, you can see that there are lines that can be formed horizontally and vertically.
For example, if you reference the image in the background section of my problem, we can see that there are 5 groups of points that can be grouped in a horizontal manner. For example, points 23, 26, 29, 31, 33, 34, 35, 37 and 36 form one group. Points 19, 21, 24, 28, 30 and 32 form another group and so on and so forth. Similarly in a vertical sense, we can see that points 26, 19, 12 and 3 form one group, points 29, 21, 13 and 5 form another group and so on.
Question to ask
My question is this: What is a method that can successfully group points in horizontal groupings and vertical groupings separately, given that the points could be in any orientation?
Conditions
There must be at least three points per line. If there is anything less than that, then this does not qualify as a segment. Therefore, the points 36 and 10 don't qualify as a vertical line, and similarly the isolated point 23 shouldn't qualify as a vertical line, but it is part of the first horizontal grouping.
The above calibration pattern can be in any orientation. However, for what I'm dealing with, the worst kind of orientation you can get is what you see above in the background section.
Expected Output
The output would be a pair of lists where the first list has elements where each element gives you a sequence of point IDs that form a horizontal line. Similarly, the second list has elements where each element gives you a sequence of point IDs that form a vertical line.
Therefore, the expected output for the horizontal sequences would look something like this:
MATLAB
horiz_list = {[23, 26, 29, 31, 33, 34, 35, 37, 36], [19, 21, 24, 28, 30, 32], ...};
vert_list = {[26, 19, 12, 3], [29, 21, 13, 5], ....};
Python
horiz_list = [[23, 26, 29, 31, 33, 34, 35, 37, 36], [19, 21, 24, 28, 30, 32], ....]
vert_list = [[26, 19, 12, 3], [29, 21, 13, 5], ...]
What I have tried
Algorithmically, what I have tried is to undo the rotation experienced by these points. I've performed Principal Components Analysis and I tried projecting the points with respect to the computed orthogonal basis vectors so that the points would more or less lie on a straight rectangular grid.
Once I have that, it's just a simple matter of doing some scanline processing where you could group points based on a differential change on either the horizontal or vertical coordinates. You'd sort the coordinates by either the x or y values, then examine these sorted coordinates and look for a large change. Once you encounter this change, then you can group points in between the changes together to form your lines. Doing this with respect to each dimension would give you either the horizontal or vertical groupings.
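To make that scanline idea concrete, here is a rough sketch (it assumes the de-rotated coordinates F produced by the MATLAB PCA code shown next, and a hand-picked gap threshold, so treat it as an illustration only):
gapThresh = 40;                        % assumed gap separating two horizontal lines
[ys, order] = sort(F(2,:));            % sort by the "vertical" rotated coordinate
breaks = find(diff(ys) > gapThresh);   % large jumps mark the start of a new line
edges = [0 breaks numel(ys)];
horiz_groups = cell(1, numel(edges)-1);
for k = 1:numel(edges)-1
    horiz_groups{k} = data(order(edges(k)+1:edges(k+1)), 3).'; % point IDs per line
end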
With regards to PCA, here's what I did in MATLAB and Python:
MATLAB
%# Step #1 - Get just the data - no IDs
data_raw = data(:,1:2);
%# Decentralize mean
data_nomean = bsxfun(@minus, data_raw, mean(data_raw,1));
%# Step #2 - Determine covariance matrix
%# This already decentralizes the mean
cov_data = cov(data_raw);
%# Step #3 - Determine right singular vectors
[~,~,V] = svd(cov_data);
%# Step #4 - Transform data with respect to basis
F = V.'*data_nomean.';
%# Visualize both the original data points and transformed data
figure;
plot(F(1,:), F(2,:), 'b.', 'MarkerSize', 14);
axis ij;
hold on;
plot(data(:,1), data(:,2), 'r.', 'MarkerSize', 14);
Python
import numpy as np
import numpy.linalg as la
# Step #1 and Step #2 - Decentralize mean
centroids_raw = data[:,:2]
mean_data = np.mean(centroids_raw, axis=0)
# Transpose for covariance calculation
data_nomean = (centroids_raw - mean_data).T
# Step #3 - Determine covariance matrix
# Doesn't matter if you do this on the decentralized result
# or the normal result - cov subtracts the mean off anyway
cov_data = np.cov(data_nomean)
# Step #4 - Determine right singular vectors via SVD
# Note - This is already V^T, so there's no need to transpose
_,_,V = la.svd(cov_data)
# Step #5 - Transform data with respect to basis
data_transform = np.dot(V, data_nomean).T
plt.figure()
plt.gca().invert_yaxis()
plt.plot(data[:,0], data[:,1], 'b.', markersize=14)
plt.plot(data_transform[:,0], data_transform[:,1], 'r.', markersize=14)
plt.show()
The above code not only reprojects the data, but it also plots both the original points and the projected points together in a single figure. However, when I tried reprojecting my data, this is the plot I get:
The points in red are the original image coordinates while the points in blue are reprojected onto the basis vectors to try and remove the rotation. It still doesn't quite do the job. There is still some orientation with respect to the points so if I tried to do my scanline algorithm, points from the lines below for horizontal tracing or to the side for vertical tracing would be inadvertently grouped and this isn't correct.
Perhaps I'm overthinking the problem, but any insights you have regarding this would be greatly appreciated. If the answer is indeed superb, I would be inclined to award a high bounty as I've been stuck on this problem for quite some time.
I hope this question wasn't long winded. If you don't have an idea of how to solve this, then I thank you for your time in reading my question regardless.
Looking forward to any insights that you may have. Thanks very much!
Note 1: It has a number of settings which, for other images, may need to be altered to get the result you want (see % Settings - play around with these values).
Note 2: It doesn't find all of the lines you want, but it's a starting point....
To call this function, invoke this in the command prompt:
>> [h, v] = testLines;
We get:
>> celldisp(h)
h{1} =
1 2 4 6 9 10
h{2} =
3 5 7 8 11 14 15 17
h{3} =
1 2 4 6 9 10
h{4} =
3 5 7 8 11 14 15 17
h{5} =
1 2 4 6 9 10
h{6} =
3 5 7 8 11 14 15 17
h{7} =
3 5 7 8 11 14 15 17
h{8} =
1 2 4 6 9 10
h{9} =
1 2 4 6 9 10
h{10} =
12 13 16 18 20 22 25 27
h{11} =
13 16 18 20 22 25 27
h{12} =
3 5 7 8 11 14 15 17
h{13} =
3 5 7 8 11 14 15
h{14} =
12 13 16 18 20 22 25 27
h{15} =
3 5 7 8 11 14 15 17
h{16} =
12 13 16 18 20 22 25 27
h{17} =
19 21 24 28 30
h{18} =
21 24 28 30
h{19} =
12 13 16 18 20 22 25 27
h{20} =
19 21 24 28 30
h{21} =
12 13 16 18 20 22 24 25
h{22} =
12 13 16 18 20 22 24 25 27
h{23} =
23 26 29 31 33 34 35
h{24} =
23 26 29 31 33 34 35 37
h{25} =
23 26 29 31 33 34 35 36 37
h{26} =
33 34 35 37 36
h{27} =
31 33 34 35 37
>> celldisp(v)
v{1} =
33 28 18 8 1
v{2} =
34 30 20 11 2
v{3} =
26 19 12 3
v{4} =
35 22 14 4
v{5} =
29 21 13 5
v{6} =
25 15 6
v{7} =
31 24 16 7
v{8} =
37 32 27 17 9
A figure is also generated that draws the lines through each proper set of points:
function [horiz_list, vert_list] = testLines
global counter;
global colours;
close all;
data = [ 475. , 605.75, 1.;
571. , 586.5 , 2.;
233. , 558.5 , 3.;
669.5 , 562.75, 4.;
291.25, 546.25, 5.;
759. , 536.25, 6.;
362.5 , 531.5 , 7.;
448. , 513.5 , 8.;
834.5 , 510. , 9.;
897.25, 486. , 10.;
545.5 , 491.25, 11.;
214.5 , 481.25, 12.;
271.25, 463. , 13.;
646.5 , 466.75, 14.;
739. , 442.75, 15.;
340.5 , 441.5 , 16.;
817.75, 421.5 , 17.;
423.75, 417.75, 18.;
202.5 , 406. , 19.;
519.25, 392.25, 20.;
257.5 , 382. , 21.;
619.25, 368.5 , 22.;
148. , 359.75, 23.;
324.5 , 356. , 24.;
713. , 347.75, 25.;
195. , 335. , 26.;
793.5 , 332.5 , 27.;
403.75, 328. , 28.;
249.25, 308. , 29.;
495.5 , 300.75, 30.;
314. , 279. , 31.;
764.25, 249.5 , 32.;
389.5 , 249.5 , 33.;
475. , 221.5 , 34.;
565.75, 199. , 35.;
802.75, 173.75, 36.;
733. , 176.25, 37.];
figure; hold on;
axis ij;
% Change due to Benoit_11
scatter(data(:,1), data(:,2),40, 'r.');
text(data(:,1)+10, data(:,2)+10, num2str(data(:,3)));
% Process your data as above then run the function below(note it has sub functions)
counter = 0;
colours = 'bgrcmy';
[horiz_list, vert_list] = findClosestPoints ( data(:,1), data(:,2) );
function [horiz_list, vert_list] = findClosestPoints ( x, y )
% number of points
nX = length(x);
% set up place holder flags
modelledH = false(nX,1);
modelledV = false(nX,1);
horiz_list = {};
vert_list = {};
% loop for all points
for p=1:nX
% have we already modelled a horizontal line through these?
% second last param - true - horizontal, false - vertical
if modelledH(p)==false
[modelledH, index] = ModelPoints ( p, x, y, modelledH, true, true );
horiz_list = [horiz_list index];
else
[~, index] = ModelPoints ( p, x, y, modelledH, true, false );
horiz_list = [horiz_list index];
end
% make a temp copy of the x and y and remove any of the points modelled
% from the horizontal -> this is to avoid them being found in the
% second call.
tempX = x;
tempY = y;
tempX(index) = NaN;
tempY(index) = NaN;
tempX(p) = x(p);
tempY(p) = y(p);
% Have we found a vertical line?
if modelledV(p)==false
[modelledV, index] = ModelPoints ( p, tempX, tempY, modelledV, false, true );
vert_list = [vert_list index];
end
end
end
function [modelled, index] = ModelPoints ( p, x, y, modelled, method, fullRun )
% p - row in your original data matrix
% x - data(:,1)
% y - data(:,2)
% modelled - array of flags to whether rows have been modelled
% method - horizontal or vertical (used to calc gradients)
% fullRun - full calc or just to get indexes
% this could be made better by storing the indexes of each horizontal in the method above
% Settings - play around with these values
gradDelta = 0.2; % find points where gradient is less than this value
gradLimit = 0.45; % if the mean gradient of the line is above this, ignore it
numberOfPointsToCheck = 7; % number of points to check when looking along the line
% to find other points (this reduces the chance of it
% finding other points far away)
% I optimised this for your example to be 7
% Try varying it and you will see how it affects the result.
% Find the index of points which are inline.
[index, grad] = CalcIndex ( x, y, p, gradDelta, method );
% check gradient of line
if abs(mean(grad))>gradLimit
index = [];
return
end
% add point of interest to index
index = [p index];
% loop through all points found above to find any other points which are in
% line with these points (this allows for slight curvature)
combineIndex = [];
for ii=2:length(index)
% Find index of the points found above (find points on curve)
[index2] = CalcIndex ( x, y, index(ii), gradDelta, method, numberOfPointsToCheck, grad(ii-1) );
% Check that the points on this line are on the original (i.e. inline -> not at a large angle)
if any(ismember(index,index2))
% store points found
combineIndex = unique([index2 combineIndex]);
end
end
% copy to index
index = combineIndex;
if fullRun
% do some plotting
% TODO: here you would need to calculate your arrays to output.
xx = x(index);
[sX,sOrder] = sort(xx);
% Check it's found at least 3 points
if length ( index(sOrder) ) > 2
% flag the modelled on the points found
modelled(index(sOrder)) = true;
% plot the data
plot ( x(index(sOrder)), y(index(sOrder)), colours(mod(counter,numel(colours)) + 1));
counter = counter + 1;
end
index = index(sOrder);
end
end
function [index, gradCheck] = CalcIndex ( x, y, p, gradLimit, method, nPoints2Consider, refGrad )
% x - data(:,1)
% y - data(:,2)
% p - point of interest
% method - true: gradient is dy/dx, false: dx/dy
% nPoints2Consider - only look at N points (options)
% refgrad - rather than looking for gradient of closest point -> use this
% - reference gradient to find similar points (finds points on curve)
nX = length(x);
% calculate gradient
for g=1:nX
if method
grad(g) = (x(g)-x(p))\(y(g)-y(p));
else
grad(g) = (y(g)-y(p))\(x(g)-x(p));
end
end
% find distance to all other points
delta = sqrt ( (x-x(p)).^2 + (y-y(p)).^2 );
% set the point itself to NaN
delta(delta==min(delta)) = NaN;
% find the closest points
[m,order] = sort(delta);
if nargin == 7
% for finding along curve
% set any far away points to be NaN
grad(order(nPoints2Consider+1:end)) = NaN;
% find the closest points to the reference gradient within the allowable limit
index = find(abs(grad-refGrad)<gradLimit==1);
% store output
gradCheck = grad(index);
else
% find the points which are closest to the gradient of the closest point
index = find(abs(grad-grad(order(1)))<gradLimit==1);
% store gradients to output
gradCheck = grad(index);
end
end
end
While I cannot suggest a better approach to group any given list of centroid points than the one you already tried, I hope the following idea might help you out:
Since you are very specific about the content of your image (containing a field of squares) I was wondering if you in fact need to group the centroid points from the data given in your problem setup, or if you can use the data described in Background to the problem as well. Since you already determined the corners of each detected square as well as their position in that given square it seems to me like it would be very accurate to determine a neighbour of a given square by comparing the corner-coordinates.
So for finding any candidate for a right neighbour of any square, I would suggest you compare the upper right and lower right corner of that square with the upper left and lower left corner of any other square (or any square within a certain distance). Allowing for only small vertical differences and slightly greater horizontal differences, you can "match" two squares if both their corresponding corner points are close enough together.
By using an upper limit on the allowed vertical/horizontal difference between corners, you might even be able to just assign the best matching square within these boundaries as the neighbour.
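A rough sketch of that corner-matching idea (entirely hypothetical: it assumes each detected square is stored as a struct with fields UL, UR, LL, LR holding [x y] corner coordinates, and the tolerances are made-up values):
maxDx = 60; maxDy = 15;                 % assumed horizontal/vertical tolerances
nSq = numel(squares);
rightNeighbour = zeros(nSq,1);          % 0 means no right neighbour found
for i = 1:nSq
    best = inf;
    for j = 1:nSq
        if i == j, continue; end
        dTop = squares(j).UL - squares(i).UR;  % upper-left vs upper-right corner
        dBot = squares(j).LL - squares(i).LR;  % lower-left vs lower-right corner
        ok = all(abs([dTop(2) dBot(2)]) < maxDy) && ...
             all([dTop(1) dBot(1)] > 0 & [dTop(1) dBot(1)] < maxDx);
        score = norm(dTop) + norm(dBot);
        if ok && score < best
            best = score;
            rightNeighbour(i) = j;
        end
    end
end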
A problem might be that you don't detect all the squares, so there is a rather large space between square 30 and 32. Since you said you need 'at least' 3 squares per row, it might be viable for you to simply ignore square 32 in that horizontal line. If that is not an option for you, you could try to match as many squares as possible and afterwards assign the "missing" squares to a point in your grid by using the previously calculated data:
In the example about square 32 you would've detected that it has upper and lower neighbours 27 and 37. Also you should've been able to determine that square 27 lies within row 1 and 37 lies within row 3, so you can assign square 32 to the "best matching" row in between, which is obviously 2 in this case.
This general approach is basically the approach you have tried already, but should hopefully be a lot more accurate since you are now comparing orientation and distance of two lines instead of simply comparing the location of two points in a grid.
Also, as a side note on your previous attempts - can you use the black corner lines to correct the initial rotation of your image a bit? This might make further distortion algorithms (like the ones that you discussed with knedlsepp in the comments) a lot more accurate. (EDIT: I just read the comments of Parag - comparing the points by the angle of the lines is of course basically the same as rotating the image beforehand.)
I'm using a cropped version of the posted image as the input. Here I'm only addressing the case where the orientation of the grid can be thought of as near horizontal/vertical. This may not fully address your scope, but I think it may give you some pointers.
Binarize the image so that the distorted squares are filled. Here I use a simple Otsu thresholding. Then take the distance transform of this binary image.
In the distance transformed image we see the gaps between the squares as peaks.
To get horizontally oriented lines, take the local maxima of each of the columns of the distance image and then find connected components.
To get vertically oriented lines, take the local maxima of each of the rows of the distance image and then find connected components.
Images below show the horizontal and vertical lines thus found with corner points as circles.
For reasonably long connected components, you can fit a curve (a line or a polynomial) and then classify the corner points, say based on the distance to the curve, on which side of the curve the point is, etc.
I did this in Matlab. I didn't try the curve fitting and classification parts.
clear all;
close all;
im = imread('0RqUd-1.jpg');
gr = rgb2gray(im);
% any preprocessing to get a binary image that fills the distorted squares
bw = ~im2bw(gr, graythresh(gr));
di = bwdist(bw); % distance transform
di2 = imdilate(di, ones(3)); % propagate max
corners = corner(gr); % simple corners
% find regional max for each column of dist image
regmxh = zeros(size(di2));
for c = 1:size(di2, 2)
    regmxh(:, c) = imregionalmax(di2(:, c));
end
% label connected components
ccomph = bwlabel(regmxh, 8);
% find regional max for each row of dist image
regmxv = zeros(size(di2));
for r = 1:size(di2, 1)
    regmxv(r, :) = imregionalmax(di2(r, :));
end
% label connected components
ccompv = bwlabel(regmxv, 8);
figure, imshow(gr, [])
hold on
plot(corners(:, 1), corners(:, 2), 'ro')
figure, imshow(di, [])
figure, imshow(label2rgb(ccomph), [])
hold on
plot(corners(:, 1), corners(:, 2), 'ro')
figure, imshow(label2rgb(ccompv), [])
hold on
plot(corners(:, 1), corners(:, 2), 'ro')
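As a sketch of the curve fitting and classification step I skipped (the quadratic degree, the 50-pixel length cutoff, and the vertical-distance criterion are arbitrary choices, not tested):
nComp = max(ccomph(:));                       % number of horizontal components
assignedLine = zeros(size(corners,1),1);      % which curve each corner belongs to
bestDist = inf(size(corners,1),1);
for k = 1:nComp
    [r, c] = find(ccomph == k);
    if numel(c) < 50, continue; end           % skip short components
    p = polyfit(c, r, 2);                     % fit row as a quadratic in column
    d = abs(polyval(p, corners(:,1)) - corners(:,2)); % vertical distance to curve
    closer = d < bestDist;
    assignedLine(closer) = k;
    bestDist(closer) = d(closer);
end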
To get these lines for arbitrary oriented grids, you can think of the distance image as a graph and find optimal paths. See here for a nice graph based approach.

Matlab lsqnonlin() exitflag=4

I am optimizing some test data using lsqnonlin (i.e. data simulated from known parameter values).
maturity=[1 3 6 9 12 15 18 21 24 30 36 48 60 72 84 96 108 120]'; %maturities
options=optimset('Algorithm',{'levenberg-marquardt',.01},'Display','iter','TolFun',10^(-20),'TolX',10^-3,'MaxFunEvals',10000,'MaxIter',10000); %LM
vp0=[0.99 0.94 0.84 0.0802 -0.0144 -0.0042 0.001693 0.004094 0.003256 log(0.000960765^2) 0.077]'; %LM
[vpML,resnorm,residual,exitflag,output,lambda,jacobian]=lsqnonlin(@(vp) DNS_LL_LM(vp,y,maturity),vp0,[],[],options); %LM
I want the convergence to occur when the norm of the parameter vector changes by 10^-6.
As 'TolX' refers to the raw change in the parameter vector, I use 10^-3 as the tolerance on X, which when squared would give the desired norm of 10^-6.
However I find that when I run the code the exitflag keeps coming up as exitflag=4: "Magnitude of search direction was smaller than the specified tolerance."
But there is nowhere to set the tolerance for the search direction?
In the options you can only set: "TolX" and "TolFun"?
http://www.mathworks.co.uk/help/optim/ug/lsqnonlin.html#f265106
So how can I force the optimization to keep running till my desired convergence criterion?
Kind Regards
Baz
OK, I went into the code and there seems to be some disconnect between the exitflags as described here and what the code actually does:
http://www.mathworks.co.uk/help/optim/ug/lsqnonlin.html#f265106
For example, exitflag 2, which in the link above is supposed to relate to the change in x being less than the tolerance, is in fact used here to indicate that the Jacobian is undefined:
if undefJac
EXITFLAG = 2;
msgFlag = 26;
msgData = {'levenbergMarquardt',msgFlag,verbosity > 0,detailedExitMsg,caller, ...
[], [], []};
done = true;
The description of exitflag 4 on the mathworks page is a little vague but you can see what it is doing below:
if norm(step) < tolX*(sqrtEps + norm(XOUT))
EXITFLAG = 4;
msgData = {'levenbergMarquardt',EXITFLAG,verbosity > 0,detailedExitMsg,caller, ...
norm(step)/(sqrtEps+norm(XOUT)),optionFeedback.TolX,tolX};
done = true;
It seems that it is testing whether the norm of the step size is less than the tolerance on X times (roughly) the norm of X. This is along the lines of what I want, and can easily be changed to give me exactly what I want.
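Based on the test quoted above, a rough way to approximate an absolute criterion of 1e-6 on the step norm is to rescale TolX by a typical parameter norm (norm(vp0) is only a stand-in for norm(x) near the solution, so this is an approximation, not an exact criterion):
typicalNormX = norm(vp0);                               % rough scale of the parameter vector
options = optimset(options,'TolX',1e-6/(sqrt(eps) + typicalNormX));
[vpML,resnorm,residual,exitflag] = lsqnonlin(@(vp) DNS_LL_LM(vp,y,maturity),vp0,[],[],options);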

Prevent Matlab from rounding output?

I'm running a simple script to estimate roots of a function. Everything works great: each iteration of the algorithm prints the current x and f(x), but when the script finishes and sets the final estimate of x as the output of the function, the value is returned and appears rounded to four decimal places...
while k < maxit
    k = k + 1;
    dx = b - a;
    xm = a + 0.5*dx; % Minimize roundoff in computing the midpoint
    fm = feval(fun, xm, diameter, roughness, reynolds);
    fprintf('%4d %12.20e %12.4e\n',k,xm,fm);
    if (abs(fm)/fref < feps) | (abs(dx)/xref < xeps) % True when root is found
        r = xm;
        return;
    end
here is the tail bit of the output:
k xm fm
45 6.77444446476613980000e-003 1.3891e-012
46 6.77444446478035060000e-003 -1.3380e-011
47 6.77444446477324520000e-003 -5.9952e-012
48 6.77444446476969250000e-003 -2.3022e-012
49 6.77444446476791610000e-003 -4.5830e-013
ans =
0.0068
I don't know why it's rounding the output... how do I prevent that?
Try typing 'format longE' at the command line before running the script.
I had that problem too. Check out this page. It allows you to control the style of your outputs better.
http://www.mathworks.co.uk/help/techdoc/ref/format.html
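For example, the difference is purely one of display, not of the stored value (reusing the final iterate from the question):
x = 6.77444446476791610000e-03;
format short   % default display: x shows as 0.0068
x
format longE   % x now shows as 6.774444464767916e-03
x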