euclidean distance for each point in a point cloud - matlab

I want to compute the minimum eucidean distance between each point in one point cloud and all the other points in the second point cloud.
My point clouds are named pc1 and pc2.
Np is a matrix of normal vectors for each point.
sofar I use the following code:
dist = zeros(size(pc2,1),1);
sign = zeros(size(pc2,1),1);
for i = 1:size(pc2,1)
d = (pc1(:,1)-pc2(i,1)).^2 + (pc1(:,2)-pc2(i,2)).^2 + (pc1(:,3) - pc2(i,3)).^2;
[d, ind] = min(d);
dist(i,1) = sqrt(d);
sign(i,1) = Np(ind,:)*(pc2(i,:)-pc1(ind,:))';
end
the last bit with "sign" is from this paper. I added it because I want to know if my point lies inside or outside the other point cloud. (I received the point clouds from STL files and they represent surfaces)
Since I am working with rather large point clouds around 200.000 to 3.000.000 points the computation takes a while and I was wondering if anyone could suggest optimizations for my code.
Maybe it could be vectorized and I'm not seeing it.
All your suggestions are welcome. Thank you in advance for your time and help.
edit: just to clearify. Each row in my point cloud matrix is a point. the first column is the x-, the second the y-, the third the z-value.

You can certainly do this in vectorized form, use pdist2 and min as shown below.
dmat = pdist2(pc1, pc2);
[dist, ind] = min(dmat);
I believe the following will work for sign, but you should verify the results. May have to tweak slightly based on the matrix shapes.
sign = sum(Np(ind,:).*(pc2-pc1(ind,:)), 2);

Another alternative is to use KdTree's. This is also the approach of PCL.
With that approach, you basically create a KdTree from the reference point cloud, and then do a nearest neighbor search for each point from the query point cloud.
This approach will be magnitudes faster than the brute-force approach.

Related

Understanding 3D distance outputs in matlab

Being neither great at math nor coding, I am trying to understand the output I am getting when I try to calculate the linear distance between pairs of 3D points. Essentially, I have the 3D points of a bird that is moving in a confined area towards a stationary reward. I would like to calculate the distance of the animal to the reward at each point. However, when looking online for the best way to do this, I tried several options and get different results that I'm not sure how to interpret.
Example data:
reward = [[0.381605200000000,6.00214980000000,0.596942400000000]];
animal_path = = [2.08638710671220,-1.06496059617432,0.774253689976102;2.06262715454806,-1.01019576900787,0.773933446776898;2.03912411242035,-0.954888684677576,0.773408777383975;2.01583648760496,-0.898935333316342,0.772602855030873];
distance1 = sqrt(sum(([animal_path]-[reward]).^2));
distance2 = norm(animal_path - reward);
distance3 = pdist2(animal_path, reward);
Distance 1 gives 3.33919107083497 13.9693378592353 0.353216791787775
Distance 2 gives 14.3672145652704
Distance 3 gives 7.27198528565078
7.21319284516199
7.15394253573951
7.09412041863743
Why do these all yield different values (and different numbers of values)? Distance 3 seems to make the most sense for my purposes, even though the values are too large for the dimensions of the animal enclosure, which should be something like 3 or 4 meters.
Can someone please explain this in simple terms and/or point me to something less technical and jargon-y than the Matlab pages?
There are many things mathematicians call distance. What you normally associate with distance is the eucledian distance. This is what you want in this situation. The length of the line between two points. Now to your problem. The Euclidean distance distance is also called norm (or 2-norm).
For two points you can use the norm function, which means with distance2 you are already close to a solution. The problem is only, you input all your points at once. This does not calculate the distance for each point, instead it calculates the norm of the matrix. Something of no interest for you. This means you have to call norm once for each row point on the path:
k=nan(size(animal_path,1),1)
for p=1:size(animal_path,1),
k(p)=norm(animal_path(p,:) - reward);
end
Alternatively you can follow the idea you had in distance1. The only mistake you made there, you calculated the sum for each column, where the sum of each row was needed. Simple fix, you can control this using the second input argument of sum:
distance1 = sqrt(sum((animal_path-reward).^2,2))

3 points distance calculation strategy

Problem: I have (lat-long) co-ordinates of a lot of points a, b, c, d . . . in the database.
Now, when i choose point a, i need to calculate the distances of point a from each of the other points and get the closest one for eg. This math requires cos and tan calculations of the points. So this seems to be quite expensive on the db side.
So i thought of a strategy to simplify this. Below is the explanation of the strategy.
I have 3 known points (x, y, z) the distance between one point to the other is known. For this example lets assume to be 10. i.e. distance from x to y = 10; y to z = 10; z to x = 10. (this forms an equilateral triangle. but real scenario might not be the case)
Now lets say we have two points a and b. we calculate the distances of point a to x, y and z and store respectively and so for point b. (say application logic)
so we have:
for point a: Ax, Ay and Az
for point b: Bx, By and Bz
As for the strategy, the question is how can we calculate the distance between point a to point b.
As for the problem itself, if i apply the above strategy to my it, question is am I simplifying or complicating the situation?
Thanks in advance for your answer.
You can calculate the distance between two points using the Pythagorean theorem assuming they are given in Cartesian coordinates, no expensive trigonometry involved.
But this will probably still be to expensive if you have thousands or millions of points. Depending on the database you use it may offer spatial data types and can handle your query efficiently. See for example spatial data in SQL Server.
If your database does not support spatial data the problem becomes quite complex. You need a spatial index with efficient support for nearest neighbor queries. You can start at the Wikipedia article on spatial databases to learn how this problem is usually solved.
If your set of points is stable another option would be to just precompute and store the nearest neighbor for every point but this will get tricky when points are added, deleted or changed.

Find minimum distance between a point and a curve in MATLAB

I would like to use a MATLAB function to find the minimum length between a point and a curve? The curve is described by a complicated function that is not quite smooth. So I hope to use an existing tool of matlab to compute this. Do you happen to know one?
When someone says "its complicated" the answer is always complicated too, since I never know exactly what you have. So I'll describe some basic ideas.
If the curve is a known nonlinear function, then use the symbolic toolbox to start with. For example, consider the function y=x^3-3*x+5, and the point (x0,y0) =(4,3) in the x,y plane.
Write down the square of the distance. Euclidean distance is easy to write.
(x - x0)^2 + (y - y0)^2 = (x - 4)^2 + (x^3 - 3*x + 5 - 3)^2
So, in MATLAB, I'll do this partly with the symbolic toolbox. The minimal distance must lie at a root of the first derivative.
sym x
distpoly = (x - 4)^2 + (x^3 - 3*x + 5 - 3)^2;
r = roots(diff(distpoly))
r =
-1.9126
-1.2035
1.4629
0.82664 + 0.55369i
0.82664 - 0.55369i
I'm not interested in the complex roots.
r(imag(r) ~= 0) = []
r =
-1.9126
-1.2035
1.4629
Which one is a minimzer of the distance squared?
subs(P,r(1))
ans =
35.5086
subs(P,r(2))
ans =
42.0327
subs(P,r(3))
ans =
6.9875
That is the square of the distance, here minimized by the last root in the list. Given that minimal location for x, of course we can find y by substitution into the expression for y(x)=x^3-3*x+5.
subs('x^3-3*x+5',r(3))
ans =
3.7419
So it is fairly easy if the curve can be written in a simple functional form as above. For a curve that is known only from a set of points in the plane, you can use my distance2curve utility. It can find the point on a space curve spline interpolant in n-dimensions that is closest to a given point.
For other curves, say an ellipse, the solution is perhaps most easily solved by converting to polar coordinates, where the ellipse is easily written in parametric form as a function of polar angle. Once that is done, write the distance as I did before, and then solve for a root of the derivative.
A difficult case to solve is where the function is described as not quite smooth. Is this noise or is it a non-differentiable curve? For example, a cubic spline is "not quite smooth" at some level. A piecewise linear function is even less smooth at the breaks. If you actually just have a set of data points that have a bit of noise in them, you must decide whether to smooth out the noise or not. Do you wish to essentially find the closest point on a smoothed approximation, or are you looking for the closest point on an interpolated curve?
For a list of data points, if your goal is to not do any smoothing, then a good choice is again my distance2curve utility, using linear interpolation. If you wanted to do the computation yourself, if you have enough data points then you could find a good approximation by simply choosing the closest data point itself, but that may be a poor approximation if your data is not very closely spaced.
If your problem does not lie in one of these classes, you can still often solve it using a variety of methods, but I'd need to know more specifics about the problem to be of more help.
There's two ways you could go about this.
The easy way that will work if your curve is reasonably smooth and you don't need too high precision is to evaluate your curve at a dense number of points and simply find the minimum distance:
t = (0:0.1:100)';
minDistance = sqrt( min( sum( bxsfun(#minus, [x(t),y(t)], yourPoint).^2,2)));
The harder way is to minimize a function of t (or x) that describes the distance
distance = #(t)sum( (yourPoint - [x(t),y(t)]).^2 );
%# you can use the minimum distance from above as a decent starting guess
tAtMin = fminsearch(distance,minDistance);
minDistanceFitte = distance(tAtMin);

Using triplequad to calculate density (in Matlab)

As i've explained in a previous question: I have a dataset consisting of a large semi-random collection of points in three dimensional euclidian space. In this collection of points, i am trying to find the point that is closest to the area with the highest density of points.
As high performance mark answered;
the most straightforward thing to do would be to divide your subset of
Euclidean space into lots of little unit volumes (voxels) and count
how many points there are in each one. The voxel with the most points
is where the density of points is at its highest. Perhaps initially
dividing your space into 2 x 2 x 2 voxels, then choosing the voxel
with most points and sub-dividing that in turn until your criteria are
satisfied.
Mark suggested i use triplequad for this, but this is not a function i am familiar with, or understand very well. Does anyone have any pointers on how i could go about using this function in Matlab for what i am trying to do?
For example, say i have a random normally distributed matrix A = randn([300,300,300]), how could i use triplequad to find the point i am looking for? Because as i understand currently, i also have to provide triplequad with a function fun when using it. Which function should that be for this problem?
Here's an answer which doesn't use triplequad.
For the purposes of exposition I define an array of data like this:
A = rand([30,3])*10;
which gives me 30 points uniformly distributed in the box (0:10,0:10,0:10). Note that in this explanation a point in 3D space is represented by each row in A. Now define a 3D array for the counts of points in each voxel:
counts = zeros(10,10,10)
Here I've chosen to have a 10x10x10 array of voxels, but this is just for convenience, it would be only a little more difficult to have chosen some other number of voxels in each dimension, and there don't have to be the same number of voxels along each axis. Then the code
for ix = 1:size(A,1)
counts(ceil(A(ix,1)),ceil(A(ix,2)),ceil(A(ix,3))) = counts(ceil(A(ix,1)),ceil(A(ix,2)),ceil(A(ix,3)))+1
end
will count up the number of points in each of the voxels in counts.
EDIT
Unfortunately I have to do some work this afternoon and won't be able to get back to wrestling with the triplequad solution until later. Hope this is OK in the meantime.

How to generate this shape in Matlab?

In matlab, how to generate two clusters of random points like the following graph. Can you show me the scripts/code?
If you want to generate such data points, you will need to have their probability distribution to be able to generate the points.
For your point, I do not have the real distributions, so I can only give an approximation. From your figure I see that both lay approximately on a circle, with a random radius and a limited span for the angle. I assume those angles and radii are uniformly distributed over certain ranges, which seems like a pretty good starting point.
Therefore it also makes sense to generate the random data in polar coordinates (i.e. angle and radius) instead of the cartesian ones (i.e. horizontal and vertical), and transform them to allow plotting.
C1 = [0 0]; % center of the circle
C2 = [-5 7.5];
R1 = [8 10]; % range of radii
R2 = [8 10];
A1 = [1 3]*pi/2; % [rad] range of allowed angles
A2 = [-1 1]*pi/2;
nPoints = 500;
urand = #(nPoints,limits)(limits(1) + rand(nPoints,1)*diff(limits));
randomCircle = #(n,r,a)(pol2cart(urand(n,a),urand(n,r)));
[P1x,P1y] = randomCircle(nPoints,R1,A1);
P1x = P1x + C1(1);
P1y = P1y + C1(2);
[P2x,P2y] = randomCircle(nPoints,R2,A2);
P2x = P2x + C2(1);
P2y = P2y + C2(2);
figure
plot(P1x,P1y,'or'); hold on;
plot(P2x,P2y,'sb'); hold on;
axis square
This yields:
This method works relatively well when you deal with distributions that you can transform easily and when you can easily describe the possible locations of the points. If you cannot, there are other methods such as the inverse transforming sampling method which offer algorithms to generate the data instead of manual variable transformations as I did here.
K-means is not going to give you what you want.
For K-means, vectors are classified based on their nearest cluster center. I can only think of two ways you could get the non-convex assignment shown in the picture:
Your input data is actually higher-dimensional, and your sample image is just a 2-d projection.
You're using a distance metric with different scaling across the dimensions.
To achieve your aim:
Use a non-linear clustering algorithm.
Apply a non-linear transform to your input data. (Probably not feasible).
You can find a list on non-linear clustering algorithms here. Specifically, look at this reference on the MST clustering page. Your exact shape appears on the fourth page of the PDF together with a comparison of what happens with K-Means.
For existing MATLAB code, you could try this Kernel K-Means implementation. Also, check out the Clustering Toolbox.
Assuming that you really want to do the clustering operation on existing data, as opposed to generating the data itself. Since you have a plot of some data, it seems logical that you already know how to do that! If I am wrong in this assumption, then you should word your questions more carefully in the future.
The human brain is quite good at seeing patterns in things like this, that writing a code for on a computer will often take some serious effort.
As has been said already, traditional clustering tools such as k-means will fail. Luckily, the image processing toolbox has good tools for these purposes already written. I might suggest converting the plot into an image, using filled in dots to plot the points. Make sure the dots are large enough that they touch each other within a cluster, with some overlap. Then use dilation/erosion tools if necessary to make sure that any small cracks are filled in, but don't go so far as to cause the clusters to merge. Finally, use region segmentation tools to pick out the clusters. Once done, transform back from pixel units in the image into your spatial units, and you have accomplished your task.
For the image processing approach to work, you will need sufficient separation between the clusters compared to the coarseness within a cluster. But that seems obvious for any method to succeed.