3 points distance calculation strategy

Problem: I have (lat, long) coordinates of a lot of points a, b, c, d, ... in a database.
Now, when I choose point a, I need to calculate the distance from point a to each of the other points and, for example, get the closest one. This math requires cos and tan calculations on the coordinates, so it seems to be quite expensive on the DB side.
So I thought of a strategy to simplify this. Below is an explanation of the strategy.
I have 3 known points (x, y, z), and the distances between them are known. For this example let's assume each is 10, i.e. distance from x to y = 10; y to z = 10; z to x = 10. (This forms an equilateral triangle, but the real scenario might not.)
Now let's say we have two points a and b. We calculate the distances from point a to x, y and z and store them, and likewise for point b (in application logic, say).
So we have:
for point a: Ax, Ay and Az
for point b: Bx, By and Bz
As for the strategy, the question is: how can we calculate the distance between point a and point b?
As for the problem itself, if I apply the above strategy to it, the question is: am I simplifying or complicating the situation?
Thanks in advance for your answer.

You can calculate the distance between two points using the Pythagorean theorem, assuming they are given in Cartesian coordinates; no expensive trigonometry is involved.
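In MATLAB-style notation, for instance (a minimal sketch; x1, y1 and x2, y2 stand for the Cartesian coordinates of the two points):
d = sqrt((x1 - x2)^2 + (y1 - y2)^2);   % Pythagorean theorem, no trigonometry needed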
But this will probably still be too expensive if you have thousands or millions of points. Depending on the database you use, it may offer spatial data types and be able to handle your query efficiently. See, for example, spatial data in SQL Server.
If your database does not support spatial data, the problem becomes quite complex. You need a spatial index with efficient support for nearest-neighbor queries. You can start at the Wikipedia article on spatial databases to learn how this problem is usually solved.
If your set of points is stable, another option would be to just precompute and store the nearest neighbor for every point, but this gets tricky when points are added, deleted or changed.
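If you go that route, a minimal sketch of the precomputation in MATLAB (assuming P is an n-by-2 matrix of Cartesian coordinates, as above; pdist2 is from the Statistics Toolbox):
D = pdist2(P, P);                 % all pairwise distances
D(1:size(P,1)+1:end) = Inf;       % mask the zero self-distances on the diagonal
[~, nearest] = min(D, [], 2);     % nearest(i) = index of the point closest to point i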

Related

Euclidean distance for each point in a point cloud

I want to compute the minimum Euclidean distance between each point in one point cloud and all the points in the second point cloud.
My point clouds are named pc1 and pc2.
Np is a matrix of normal vectors for each point.
So far I use the following code:
dist = zeros(size(pc2,1),1);   % minimum distance for each point in pc2
sign = zeros(size(pc2,1),1);   % signed side indicator (note: shadows the built-in sign)
for i = 1:size(pc2,1)
    % squared distances from all points of pc1 to the i-th point of pc2
    d = (pc1(:,1)-pc2(i,1)).^2 + (pc1(:,2)-pc2(i,2)).^2 + (pc1(:,3)-pc2(i,3)).^2;
    [d, ind] = min(d);
    dist(i,1) = sqrt(d);
    % projection onto the normal of the nearest pc1 point
    sign(i,1) = Np(ind,:)*(pc2(i,:)-pc1(ind,:))';
end
The last bit with "sign" is from this paper. I added it because I want to know whether my point lies inside or outside the other point cloud. (I obtained the point clouds from STL files; they represent surfaces.)
Since I am working with rather large point clouds, around 200,000 to 3,000,000 points, the computation takes a while, and I was wondering if anyone could suggest optimizations for my code.
Maybe it could be vectorized and I'm not seeing it.
All your suggestions are welcome. Thank you in advance for your time and help.
Edit: just to clarify, each row in my point cloud matrix is a point; the first column is the x-, the second the y-, and the third the z-value.
You can certainly do this in vectorized form; use pdist2 and min as shown below.
dmat = pdist2(pc1, pc2);   % all pairwise distances (rows: pc1, columns: pc2)
[dist, ind] = min(dmat);   % for each pc2 point: nearest pc1 distance and its index
I believe the following will work for sign, but you should verify the results; you may have to tweak it slightly based on the matrix shapes.
sign = sum(Np(ind,:).*(pc2-pc1(ind,:)), 2);   % row-wise dot product with the nearest-point normals
Another alternative is to use k-d trees. This is also the approach taken in PCL.
With that approach, you basically build a k-d tree from the reference point cloud and then do a nearest-neighbor search for each point of the query point cloud.
This approach will be orders of magnitude faster than the brute-force approach.
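In MATLAB, a minimal sketch of that approach (KDTreeSearcher and knnsearch are from the Statistics and Machine Learning Toolbox):
tree = KDTreeSearcher(pc1);           % build the k-d tree over the reference cloud once
[ind, dist] = knnsearch(tree, pc2);   % nearest pc1 neighbor (index and distance) for each pc2 point
sign = sum(Np(ind,:).*(pc2 - pc1(ind,:)), 2);   % same inside/outside test as above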

Finding length between a lot of elements

I have an image of a cytoskeleton. There are a lot of small objects inside, and I want to calculate the distances between all of them along every axis and get a matrix with all this data. I am trying to do this in MATLAB.
My final aim is to figure out whether there is any axis with a constant distance between the objects.
I've tried bwdist and connected components without any luck.
Do you have any other ideas?
So, the end goal is that you want to globally stretch this image in a certain direction (linearly) so that the distances between nearest pairs end up as close together as possible, hopefully the same? Or may you do more complex stretching? (Note that with an arbitrarily complex one you can always make it work.)
If it is a linear global one, the distance in x' and y' is going to be a simple multiple of the old distance in x and y, applied to every pair of points. So the final Euclidean distance will end up being sqrt((SX*X)^2 + (SY*Y)^2), with SX being the stretch in x and SY the stretch in y; X and Y are the distances in x and y between a pair of points.
If you are interested in just the "the same" part, the solution is not so difficult:
Find all objects of interest and put their X and Y coordinates in an N*2 matrix.
Calculate the distances between all pairs of objects in X and in Y. You will end up with two N*N matrices (symmetric, real, with 0 on the diagonal: distance matrices).
Find the minimum distance (say this is between A and B).
You probably already have this. Now:
Take C. Make N-1 transformations, each of which makes the stretched length of C->nearestCandidate equal that of A->B. It is a simple system of equations; you have X1^2*SX^2 + Y1^2*SY^2 = X2^2*SX^2 + Y2^2*SY^2 (see the sketch below for solving one such pair).
So, first require A->B = C->A, then A->B = C->B, then A->B = C->D, etc. Make sure the transformation is normalized => SX^2 + SY^2 = 1. If no such transformation can be found, the only valid transformation is SX = SY = 0, which means you have no solution here. Obviously, SX and SY need to be real.
Note that this solution is unique except in the case where X1 = X2 and Y1 = Y2. In that case, grab some point other than C to find the transformation.
For each transformation, check the remaining points and find all of their nearest neighbours. If the distance is always the same as between those two (to a given tolerance), great, you have found your transformation. If not, this transformation does not work and you should continue with the next one.
If you want a transformation that minimizes the variation between distances (but doesn't require them to be nearly equal), I would use some optimization method and search for a minimum; I don't know how to find an exact solution otherwise. I would also pick this in case you don't have a linear or global stretch.
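For one candidate pair, that system can be solved in closed form; a minimal MATLAB sketch (X1, Y1 are the x/y distances of the candidate pair, X2, Y2 those of the reference pair A->B):
P = X1^2 - X2^2;
Q = Y1^2 - Y2^2;
u = Q/(Q - P);                    % u = SX^2, from P*u + Q*(1 - u) = 0; P == Q is the degenerate case above
v = 1 - u;                        % v = SY^2, enforcing SX^2 + SY^2 = 1
if u >= 0 && v >= 0
    SX = sqrt(u); SY = sqrt(v);   % real, normalized stretch factors
else
    SX = NaN; SY = NaN;           % no real normalized solution for this pair
end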
If I understand your question correctly, the first step is to obtain the center-of-mass points of all the objects in the image as (x,y) coordinates. Then you can easily compute all of the distances between all points. I suggest taking a look at a histogram of those distances, which may provide some information as to the nature of the distance distribution (for example, whether it is uniformly random, or whether any patterns appear).
Obtaining the center-of-mass points is not an easy task; consider transforming the image into a binary one, or some sort of background subtraction with blob detection and/or an edge detector.
For building a histogram you can use the histogram function.
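A minimal sketch of that pipeline in MATLAB (assuming a grayscale image img; imbinarize and regionprops are from the Image Processing Toolbox, pdist from the Statistics Toolbox):
bw = imbinarize(img);                 % rough binarization of the objects
stats = regionprops(bw, 'Centroid');  % center of mass of each connected object
centers = vertcat(stats.Centroid);    % N-by-2 matrix of (x,y) coordinates
d = pdist(centers);                   % all pairwise Euclidean distances
histogram(d)                          % peaks hint at regularly repeating spacings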

DBSCAN with R*-Tree - how it works

Can someone explain to me how the DBSCAN algorithm works with an R*-Tree? I understand how DBSCAN works and, it seems, how the R*-Tree works, but I can't connect them together.
Initially, I have data: feature vectors with 8 features, and I don't understand how I have to process them to construct the R*-Tree. I would be grateful if someone listed the main steps I have to go through.
I apologize if my question is obvious, but it is causing me difficulties.
Thanks in advance!
An R*-Tree indexes arbitrary geometric objects by their bounding boxes. In your case, as you have only points, the minimum and maximum corners of the bounding box are the same. Every R*-Tree has a function like rtree.add_element(object, boundingbox); object would be the index of the data point and boundingbox would be as mentioned above.
The connecting point is the regionQuery part of DBSCAN. regionQuery(p) for a data point p returns all the data points q for which euclideanDistance(p,q) ≤ ε (the value of the parameter ε is provided by the user).
Naïvely, you could compute the distance of all your data points to p, which takes O(n) time for one data point; hence querying it for all n data points takes O(n²) time. Alternatively, you could precompute a matrix holding the Euclidean distances between all your data points. That takes O(n²) space, whereas the regionQuery of one point then becomes a simple lookup in that matrix.
An R*-Tree enables you to look up data points within coordinate ranges in O(log n) time. However, an R*-Tree only allows queries of the form
"All points where: Coordinate 1 in [ 0.3 ; 0.5 ] AND Coordinate 2 in [ 0.8 ; 1.0 ]"
and not
"All points q where: euclideanDistance(p,q) ≤ ε"
Therefore, you query the R*-Tree for the points where each coordinate lies within the respective coordinate of p ± ε, and then calculate the Euclidean distance of all matching points to your query point p. The difference, however, is that these are far fewer points to check than if you calculated the Euclidean distance of p to all of your points. So the time complexity of one regionQuery is now O(log n * m), where m is the number of points returned by your R*-Tree.
If you choose ε small, you will get few matching points from your R*-Tree and m will be small, so your time complexity approaches O(log n) for one regionQuery and therefore O(n * log n) for one regionQuery on each of your data points. At the other extreme, if you choose ε so large that it encompasses most of your data points, m approaches n, and the time needed for one regionQuery on each data point approaches O(n * log n * n) = O(n² * log n) again, so you gain nothing compared to the naïve approach.
It is therefore of crucial importance that you choose ε small enough that every point has only a few other points within a Euclidean distance of ε.
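To illustrate what regionQuery then does, a minimal MATLAB sketch (not a real R*-Tree: the bounding-box query is simulated with a linear scan here, whereas a spatial index would answer it in O(log n)); X is an n-by-d matrix of points, p a 1-by-d query point and epsilon the radius ε:
function idx = regionQuery(X, p, epsilon)
    inBox = all(abs(X - p) <= epsilon, 2);    % box filter: every coordinate within p +/- epsilon
    cand = find(inBox);                       % candidates the index would return
    d = sqrt(sum((X(cand,:) - p).^2, 2));     % exact Euclidean check on the candidates only
    idx = cand(d <= epsilon);                 % the points within distance epsilon of p
end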
The R*-tree is a spatial index.
It can find the neighbors faster.

K-means clustering: find k farthest points in Java

I'm trying to implement k-means clustering.
I have a set of points with coordinates (x,y), and I am using Euclidean distance as the distance measure. I've computed the distances between all points in a matrix:
dist[i][j] - distance between points i and j
Say I choose a[1][3], i.e. the point farthest from point 1 is point 3.
Then, when I search for the point farthest from 3, I may get some a[3][j], but a[1][j] may be minimal
(point j is far from point 3 but near point 1).
So how do I choose the k farthest points using the distance matrix?
Note that the k farthest points do not necessarily yield the best result: they clearly aren't the best cluster-center estimates.
Plus, since the k-means heuristic may get stuck in a local minimum, you will want a randomized algorithm that allows you to restart the process multiple times and get potentially different results.
You may want to look at k-means++, which is a known good heuristic for k-means initialization.
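The question mentions Java, but the k-means++ seeding step is short enough to sketch in MATLAB for brevity (the logic ports directly to Java; kmeansppSeed is a hypothetical helper name, and MATLAB's own kmeans already offers this via the 'plus' start option):
function C = kmeansppSeed(X, k)
    % X: n-by-d data matrix; returns k initial cluster centers
    n = size(X,1);
    C = X(randi(n),:);                     % first center: uniform at random
    for j = 2:k
        d2 = min(pdist2(X, C).^2, [], 2);  % squared distance of every point to its nearest chosen center
        p = d2/sum(d2);                    % sampling weights proportional to squared distance
        C(j,:) = X(find(rand <= cumsum(p), 1), :);  % draw the next center
    end
end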

How do I create orthogonal basis based on two almost perpendicular vectors?

I am trying to create an orthogonal coordinate system based on two "almost" perpendicular vectors, which are derived from medical images. I have two vectors, for example:
Z=[-1.02,1.53,-1.63];
Y=[2.39,-1.39,-2.8];
that are almost perpendicular, since their inner product is equal to 5e-4.
Then I take their cross product to create my third basis vector:
X=cross(Y,Z);
Even this third vector is not completely orthogonal to Z and Y, as the inner products are on the order of 10^-15 and 10^-16, but I guess that is almost zero. In order to use this set of vectors as an orthogonal basis for a local coordinate system, I assume they should be almost exactly perpendicular. I first thought I could do this by rounding my vectors to fewer decimal figures, but that did not help. I guess I need to find a way to alter my initial vectors a little to make them more perpendicular, but I don't know how to do that.
I would appreciate any suggestions.
Gram-Schmidt is right as pointed out above.
Basically, you want to subtract from Y the component of Y that is in the direction of Z. (Note: you can alternatively operate on Z instead of Y.)
The component of Y in the Z direction is given by:
dot(Y,Z)*Z/(norm(Z)^2)
(projection of Y onto Z)
Note that if Y is orthogonal to Z, then this is 0.
So:
Y = Y - dot(Y,Z)*Z/(norm(Z)^2)
and Z stays unchanged.
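A minimal numeric sketch with the vectors from the question, combining this step with the cross product and a final normalization:
Z = [-1.02, 1.53, -1.63];
Y = [2.39, -1.39, -2.8];
Y = Y - dot(Y,Z)*Z/(norm(Z)^2);                 % Gram-Schmidt: remove the Z-component of Y
X = cross(Y,Z);                                 % third axis, orthogonal to the other two
X = X/norm(X); Y = Y/norm(Y); Z = Z/norm(Z);    % scale to an orthonormal basis
dot(Y,Z)                                        % now zero to machine precision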
Let V = Y + aZ.
Requiring Z dot V = 0 gives Z dot Y + a*(Z dot Z) = 0, so a = -(Z dot Y)/(Z dot Z), and you get V.
Now use V and Z as your basis.
You may need to normalize the vectors and use the double type to get the desired precision.