Efficient way to select all points inside radius - mongodb

I'm working on a MEAN stack application, and I have a MongoDB collection that contains worldwide locations. The schema looks like this:
[{
address : "example address",
longitude : 79.8816,
latitude : 6.773
},
{...}]
What is an efficient way to select all points inside a circle (a predefined center point and a predefined radius)?
This can also be done with an SQL query, but I want an efficient way that does not iterate over the records one by one and check whether each one is inside the radius.

The distance d between two points with coordinates {lat1,lon1} and {lat2,lon2}, with all angles in radians, is given by the haversine formula:
d = 2*asin(sqrt((sin((lat1-lat2)/2))^2 + cos(lat1)*cos(lat2)*(sin((lon1-lon2)/2))^2))
The formula above gives d in radians (the central angle between the two points). To get the answer in km, multiply by the Earth's radius:
distance_km ≈ radius_km * distance_radians ≈ 6371 * d
Here, the magic number 6371 is approximately the mean radius of the Earth in km. This is the minimum computation you will have to do in your case: compute this distance and select all the records whose distance is less than your radius.
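As a minimal sketch of the formula above, here it is in Python. It assumes each record is a dict with `latitude` and `longitude` keys in degrees; the function names are hypothetical. Note that it still checks every record, which is exactly the linear scan the question hopes to avoid, so it only illustrates the arithmetic.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean radius of the Earth, as in the answer


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    # The formula expects radians, so convert the inputs first.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon1 - lon2)
    h = (math.sin((phi1 - phi2) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(h) slightly above 1.
    d = 2 * math.asin(min(1.0, math.sqrt(h)))  # central angle in radians
    return EARTH_RADIUS_KM * d


def points_within(records, center_lat, center_lon, radius_km):
    """Linear scan: keep the records within radius_km of the center."""
    return [
        r for r in records
        if haversine_km(r["latitude"], r["longitude"], center_lat, center_lon) <= radius_km
    ]


records = [{"address": "example address", "longitude": 79.8816, "latitude": 6.773}]
nearby = points_within(records, 6.773, 79.8816, 1.0)
```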
Better approach
You can use Elasticsearch, which supports geolocation queries, for better performance.

Related

PostGIS ST_Distance_Sphere giving a result about 1.7 times too high

I am new to PostGIS and spatial data, and I am struggling with a fairly simple query.
I have two test records in places, where the column addressLocation is a POINT with the following values:
(51.122711,17.031819)
(51.122805,17.035522)
I am running this query:
SELECT *
FROM places
WHERE ST_Distance_Sphere("addressLocation"::geometry, ST_MakePoint(51.122711, 17.033686)) <= 255;
The point 51.122711, 17.033686 lies roughly midway between the two points, and the distances measured on Google Maps are about 125 and 128 meters.
The problem is that (51.122805,17.035522) only shows up in the results once the limit is 205, and the other point only once it is 210.
I have looked through the PostGIS docs and cannot find any explanation for this inaccuracy.
Coordinates in PostGIS must be expressed as longitude/latitude, while in Google Maps they are expressed as latitude/longitude.
Your query is computing distances in Yemen:
Select ST_DistanceSphere(st_geomFromText('POINT(51.122711 17.031819)'),
st_geomFromText('POINT(51.122711 17.033686)'));
st_distancesphere
-------------------
207.60121386
With the coordinates swapped, the points are in Poland and the distance is:
Select ST_DistanceSphere(st_geomFromText('POINT(17.031819 51.122711)'),
st_geomFromText('POINT(17.033686 51.122711)'));
st_distancesphere
-------------------
130.30184168
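To make the axis-order mistake concrete, the sketch below computes the same great-circle distance on a sphere in Python, once with the question's (lat, lon) values passed as (x, y) and once swapped. The sphere radius of 6,370,986 m is an assumption (the value older PostGIS documentation gives for ST_DistanceSphere; newer versions may derive it from the SRID), and `sphere_distance_m` is a hypothetical helper, not a PostGIS function.

```python
import math

# Assumed sphere radius; PostGIS's exact value depends on version and SRID.
SPHERE_RADIUS_M = 6370986.0


def sphere_distance_m(p1, p2):
    """Haversine distance between two (x=longitude, y=latitude) points in degrees."""
    lon1, lat1 = map(math.radians, p1)
    lon2, lat2 = map(math.radians, p2)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * SPHERE_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


# Question's order: 51.12 ends up as x (longitude), which places the points in Yemen.
wrong = sphere_distance_m((51.122711, 17.031819), (51.122711, 17.033686))
# Swapped to (longitude, latitude): the points are in Poland.
right = sphere_distance_m((17.031819, 51.122711), (17.033686, 51.122711))
print(f"{wrong:.2f} m vs {right:.2f} m")
```

The latitude difference of 0.001867 degrees is the same in both calls; only when it is treated as a longitude difference at about 51° N does the cos(latitude) factor shrink it to roughly 130 m.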

PostgreSQL scoring distance with trigrams

I would like to understand why the distance between two words is so large when just two letters are swapped.
Example:
SELECT name, searchable, searchable <-> 'fluerie' AS dist FROM shd_appellations WHERE region_id=3 ORDER BY dist LIMIT 1;
Result:
fleurie => 0,6666666
So the distance between fluerie and fleurie is 0.666! I know the word is split into trigrams ("flu", "lue", ..., "rie") before comparison, but I need to set a limit in my code, and I set it to a maximum distance of 0.5 to get a result, so in this case the match is not returned.
I could set a maximum distance of 0.75, but then in other cases I would match terms unrelated to the search.
One idea: would it make sense to increase the maximum distance for short words, i.e. make it depend on the length of the search term?
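The 0.666 can be reproduced by hand. The Python sketch below is a simplified re-implementation of pg_trgm-style trigram matching, not pg_trgm itself: it assumes ASCII input, lowercases it, splits it into alphanumeric words, pads each word with two leading spaces and one trailing space, and defines the distance as one minus (shared trigrams / all distinct trigrams), which is how pg_trgm documents `<->` relative to `similarity()`.

```python
import re


def trigrams(text: str) -> set:
    """Simplified pg_trgm-style trigram set for ASCII text."""
    grams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after each word
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the number of distinct trigrams in both."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def trigram_distance(a: str, b: str) -> float:
    """Analogue of the <-> operator: one minus the similarity."""
    return 1.0 - similarity(a, b)


shared = sorted(trigrams("fluerie") & trigrams("fleurie"))
dist = trigram_distance("fluerie", "fleurie")
```

Each 7-letter word yields 8 trigrams, and the swap of "ue"/"eu" breaks every trigram that touches those two letters, so only 4 of the 12 distinct trigrams are shared and the distance is 1 - 4/12. In a longer word the same swap typically leaves a larger share of trigrams intact, so the distance for one transposition shrinks as the word gets longer.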

Equation for ordering coordinates by distance without acos

I need an equation that gives the distance ordering between two coordinates (the unit of the result doesn't matter; I only want to obtain a sorted list).
The equation may include the sine and cosine of individual coordinates, but not acos, and not the sine or cosine of a difference coord1 - coord2.
For example, it may include cos(coord1.latitude) + sin(coord2.longitude), but not sin(coord1.latitude - coord2.latitude).
Someone told me that such an equation exists, but I cannot find it on the internet.
EDIT: I found a solution that accounts for the Earth's curvature and works well. The expression below (angles in radians) decreases as the distance between the two coordinates increases, so sorting it in descending order sorts the targets from nearest to farthest. Computing the exact distance is not a problem once the data is in the correct order:
(sin(currentPosLatitude) * sin(targetLatitude)) + (cos(currentPosLatitude) * cos(targetLatitude)) * (sin(currentPosLongitude) * sin(targetLongitude) + cos(currentPosLongitude) * cos(targetLongitude))
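Why this works: sin(lon1)·sin(lon2) + cos(lon1)·cos(lon2) equals cos(lon1 - lon2), so the expression is the spherical law of cosines, i.e. the cosine of the central angle between the two points, and cosine decreases monotonically on [0, π]. The Python sketch below checks that sorting by the expression (descending) matches sorting by haversine distance (ascending); the current position and targets are made-up sample values.

```python
import math


def order_value(cur_lat, cur_lon, lat, lon):
    """The expression from the edit, with degrees converted to radians.
    It equals the cosine of the central angle, so larger means nearer."""
    a, b = math.radians(cur_lat), math.radians(lat)
    c, d = math.radians(cur_lon), math.radians(lon)
    return (math.sin(a) * math.sin(b)
            + math.cos(a) * math.cos(b) * (math.sin(c) * math.sin(d) + math.cos(c) * math.cos(d)))


def haversine_rad(lat1, lon1, lat2, lon2):
    """Central angle via the haversine formula, used only as a cross-check."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p1 - p2) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon1 - lon2) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h)))


current = (51.1, 17.03)  # hypothetical current position (lat, lon)
targets = [(40.0, -74.0), (52.0, 17.0), (48.0, 2.0), (51.0, 17.0)]

by_expression = sorted(targets, key=lambda t: order_value(*current, *t), reverse=True)
by_distance = sorted(targets, key=lambda t: haversine_rad(*current, *t))
```

One caveat: for points very close together the value crowds toward 1.0, so double precision limits how finely it can separate nearby targets; for ranking points that are centimeters apart, the haversine form is more robust.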

Most efficient way to find points within a certain radius from a given point

I've read several questions and answers here on SO about this topic, but I can't work out which is the usual way (if there is one) to find all the points within a "circle" of a certain radius centered on a given point.
In particular I found two ways that seem the most convincing:
select id, point
from my_table
where st_Distance(point, st_PointFromText('POINT(-116.768347 33.911404)', 4326)) < 10000;
and:
select id, point
from my_table
where st_Within(point, st_Buffer(st_PointFromText('POINT(-116.768347 33.911404)', 4326), 10000));
Which is the most efficient way to query my database? Is there some other option to consider?
Creating a buffer to find the points is a definite no-no because of (1) the overhead of creating the geometry that represents the buffer, and (2) the point-in-polygon calculation, which is much less efficient than a simple distance calculation.
You are obviously working with (longitude, latitude) data, so you should convert it to an appropriate Cartesian coordinate system that has the same unit of measure as your distance of 10,000. If that distance is in meters, you could also cast the point from the table to geography and calculate directly on the (long, lat) coordinates. Since you only want to identify the points that are within the specified distance, you could use the ST_DWithin() function with calculation on the sphere (use_spheroid = false) for added speed (don't do this at very high latitudes or over very long distances):
SELECT id, point
FROM my_table
WHERE ST_DWithin(point::geography,
ST_GeogFromText('POINT(-116.768347 33.911404)'),
10000, false);
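The first suggestion above (convert the (longitude, latitude) data to a Cartesian system in meters) can be sketched in Python with a crude equirectangular projection around the search center. This is an assumed stand-in for a proper projected coordinate system (for example a UTM zone), only reasonable for short distances away from the poles; `to_local_xy` and `within_m` are hypothetical names.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def to_local_xy(lon, lat, center_lon, center_lat):
    """Equirectangular projection around the center: x east, y north, in meters."""
    x = math.radians(lon - center_lon) * math.cos(math.radians(center_lat)) * EARTH_RADIUS_M
    y = math.radians(lat - center_lat) * EARTH_RADIUS_M
    return x, y


def within_m(points, center_lon, center_lat, distance_m):
    """Keep the (lon, lat) points within distance_m of the center.

    Squared distances are compared, so no square root is needed.
    """
    limit = distance_m * distance_m
    result = []
    for lon, lat in points:
        x, y = to_local_xy(lon, lat, center_lon, center_lat)
        if x * x + y * y <= limit:
            result.append((lon, lat))
    return result


center = (-116.768347, 33.911404)  # (lon, lat) from the question
candidates = [(-116.768347, 33.961404),  # 0.05 degrees north, about 5.6 km
              (-116.668347, 33.911404),  # 0.1 degrees east, about 9.2 km
              (-116.768347, 34.011404)]  # 0.1 degrees north, about 11.1 km
inside = within_m(candidates, *center, 10000)
```

The 0.1 degree offsets show why the unit matters: the same angular step is about 11.1 km north-south but only about 9.2 km east-west at this latitude, so a threshold applied directly to degrees cannot represent a circle of 10,000 m.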
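I have used the following query:
SELECT *, ACOS(SIN(RADIANS(latitude)) * SIN(RADIANS(Lat)) + COS(RADIANS(latitude)) * COS(RADIANS(Lat)) * COS(RADIANS(longitude) - RADIANS(Long))) * 6380 AS distance
FROM Table_tab
WHERE ACOS(SIN(RADIANS(latitude)) * SIN(RADIANS(Lat)) + COS(RADIANS(latitude)) * COS(RADIANS(Lat)) * COS(RADIANS(longitude) - RADIANS(Long))) * 6380 < 10;
In the query above, latitude and longitude are columns in the database, and Lat and Long are the coordinates of the point we want to search from. All of them are in degrees, so they are converted with RADIANS() before the trigonometric functions are applied; 6380 is roughly the Earth's radius in km.
How it works: it calculates the distance (in km) from the search point to every point in the database and checks whether it is less than 10 km, so it returns all the coordinates within 10 km.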
I do not know how PostGIS does this best, but in general:
Depending on your data, it might be best to first search in a square bounding box (which contains the search circle) to eliminate a lot of candidates. This should be extremely fast, since you can use simple range operators on lon/lat, which ideally are indexed properly for this.
In a second step, filter the remaining candidates using the radius.
Also, if your maximum number of points is relatively low and you know you have a lot of candidates, you may simply make a first 'optimistic' attempt with a box inside your circle; if you find enough points, you are done!
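The two-step approach can be sketched in Python as below. It assumes a sphere of radius 6,371 km, ignores circles that reach a pole or cross the antimeridian, and uses hypothetical helper names; the list comprehension for step 1 stands in for an indexed range query on lat/lon.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (approximation)


def bounding_box(lat, lon, radius_m):
    """Lat/lon box containing the spherical circle of radius_m around (lat, lon).

    Assumes the circle does not reach a pole or cross the antimeridian.
    """
    delta = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(delta)
    # The circle's widest longitude extent is asin(sin(delta) / cos(lat)), which is
    # never less than delta / cos(lat); the smaller value could cut off points.
    dlon = math.degrees(math.asin(math.sin(delta) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def haversine_m(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def points_in_radius(points, lat, lon, radius_m):
    """Step 1: cheap range test on the box. Step 2: exact test on the survivors."""
    lat_min, lat_max, lon_min, lon_max = bounding_box(lat, lon, radius_m)
    candidates = [(plat, plon) for plat, plon in points
                  if lat_min <= plat <= lat_max and lon_min <= plon <= lon_max]
    return [(plat, plon) for plat, plon in candidates
            if haversine_m(lat, lon, plat, plon) <= radius_m]


points = [(51.122711, 17.031819), (51.122805, 17.035522), (52.2297, 21.0122)]
found = points_in_radius(points, 51.122711, 17.033686, 255)
```

For the 'optimistic' variant, a box inscribed in the circle (for small radii, a half-width of about radius/√2 in each direction) contains only points that are certainly inside the circle, so points found there need no distance check at all.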

Which GEO implementation to use for millions of points

I am trying to figure out which geo implementation to use to find the nearest points (by long/lat) to a certain point. I will have millions if not billions of different latitude/longitude points that need to be compared. I have looked at many different implementations to do the job: PostGIS (it looks very popular and performs well), Neo4j (graph databases are a new concept to me and I am unsure how they perform), AWS DynamoDB geohash (scales very well, but the only library is written in Java, and I am hoping to write a library in Node.js), etc., but I can't figure out which would perform best. I am purely interested in performance as opposed to the number of features. All I need is to be able to compare one point to all points and find the closest (read operation), and to change a point in the database quickly (write operation). Could anyone suggest a good implementation based on these requirements?
PostGIS has several functions for geohashing (for example ST_GeoHash). If you make your hash strings long enough, the search becomes quicker (fewer collisions per box and its 8 neighbours), but generating the geohash becomes slower when inserting new points.
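PostGIS computes geohashes in the database with ST_GeoHash; the Python sketch below re-implements the standard geohash encoding only to illustrate two properties the approach relies on. A shorter hash is a prefix of a longer one for the same point, so an index on the first 8 characters finds the enclosing box; and each character removed makes the cell much larger. `geohash_encode` and `cell_size_deg` are hypothetical helper names.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, length: int) -> str:
    """Standard geohash: bits alternate longitude/latitude, starting with
    longitude, and every 5 bits become one base-32 character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value, even = [], 0, 0, True
    while len(chars) < length:
        if even:  # longitude bit: keep the half that contains lon
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = (value << 1) | 1, mid
            else:
                value, lon_hi = value << 1, mid
        else:  # latitude bit: keep the half that contains lat
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = (value << 1) | 1, mid
            else:
                value, lat_hi = value << 1, mid
        even = not even
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def cell_size_deg(length: int) -> tuple:
    """(width in degrees of longitude, height in degrees of latitude) of a cell."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


h9 = geohash_encode(42.6, -5.6, 9)
h8 = geohash_encode(42.6, -5.6, 8)
```

With these definitions a 9-character cell is about 4.3e-5 degrees on each side (roughly 5 m × 5 m near the Equator), while an 8-character cell is 8 times as wide and 4 times as tall.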
The question is also how accurate you want to be. With increasing latitude, a plain lat/long "distance" deteriorates, because a degree of longitude shrinks from about 110 km at the Equator to 0 at the poles, while a degree of latitude is always about 110 km. At the mid-latitude of 45 degrees a degree of longitude is nearly 79 km, so east-west distances are overstated by a factor of about 1.4 (110/79), or about 2 for squared distances ((110/79)^2). Spherical distance, which gives you the true distance between lat/long pairs, is very expensive to calculate (lots of trigonometry going on), and then your geohashing won't work (unless you convert all points to planar coordinates).
A solution that might work is the following:
CREATE INDEX hash8 ON tablename(substring(hash_column FROM 1 FOR 8)). This gives you an index on a box considerably larger than your resolution (an 8-character geohash box is 8 times as wide and 4 times as tall as a 9-character one), which helps in finding points and reduces the need to search neighbouring hash boxes.
On INSERT of a point, compute its geohash of length 9 (a cell of roughly 5 m × 5 m) into hash_column, using PostGIS (ST_GeoHash). You could use a BEFORE INSERT trigger here.
In a function:
Given a point, find the nearest point by looking at all points whose geohash, shortened to 8 characters, equals the given point's 8-character geohash (hence the index above).
Compute the distance to each of the points found, keeping the closest one. But since you are only looking for the nearest point (at least initially), do not compute the distance using spherical coordinates; use the optimizations below instead, which should make the search much faster.
Check whether the given point is closer to the edge of the box determined by the 8-character geohash than to the closest point found. If so, repeat the procedure with the 7-character geohash on all points in its 8 neighbours. This can be highly optimized by computing distances to individual box sides and corners and evaluating only the relevant neighbouring hash boxes; I leave this to you to tinker with.
In any case, this will not be particularly speedy. If you are indeed heading towards billions of points, you might want to think about clustering, which has a rather "natural" solution for geohashing (break up your table on substring(hash_column FROM 1 FOR 2), for instance, giving you up to 32 × 32 = 1024 partitions). Just make sure that you account for cross-boundary searches.
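The nearest-point procedure in the steps above can be sketched in Python as follows. This is a simplified, in-memory illustration under several assumptions: a linear scan over a dict stands in for the indexed lookup, an equirectangular distance replaces the normalized coordinates, and when the best candidate is farther away than the box edge the sketch widens to the enclosing box with one character less, instead of visiting the 8 neighbouring boxes. `geohash_box`, `local_dist_m` and `nearest` are hypothetical names.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
EARTH_RADIUS_M = 6371000.0


def geohash_box(lat, lon, length):
    """Return (hash, (lat_lo, lat_hi, lon_lo, lon_hi)) of the cell containing the point."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, value = [], 0
    for bit in range(5 * length):
        if bit % 2 == 0:  # even bits refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = value * 2 + 1, mid
            else:
                value, lon_hi = value * 2, mid
        else:  # odd bits refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = value * 2 + 1, mid
            else:
                value, lat_hi = value * 2, mid
        if bit % 5 == 4:  # every 5 bits form one character
            chars.append(BASE32[value])
            value = 0
    return "".join(chars), (lat_lo, lat_hi, lon_lo, lon_hi)


def local_dist_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; adequate for nearby points."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


def nearest(points, lat, lon, start_len=8):
    """points: dict id -> (lat, lon, geohash). Search the query's box, then widen."""
    for length in range(start_len, -1, -1):
        prefix, (la0, la1, lo0, lo1) = geohash_box(lat, lon, length)
        best_id, best_d = None, math.inf
        for pid, (plat, plon, phash) in points.items():
            if phash.startswith(prefix):  # stands in for the index lookup
                d = local_dist_m(lat, lon, plat, plon)
                if d < best_d:
                    best_id, best_d = pid, d
        # No point outside the box can be closer than the nearest box edge.
        edge = min(local_dist_m(lat, lon, la0, lon), local_dist_m(lat, lon, la1, lon),
                   local_dist_m(lat, lon, lat, lo0), local_dist_m(lat, lon, lat, lo1))
        if best_id is not None and (best_d <= edge or length == 0):
            return best_id
    return None  # no points at all


raw = {"A": (51.122711, 17.031819), "B": (51.122805, 17.035522), "W": (52.2297, 21.0122)}
points = {pid: (la, lo, geohash_box(la, lo, 9)[0]) for pid, (la, lo) in raw.items()}
```

A query such as `nearest(points, 51.1227, 17.0355)` starts in the 8-character box and only widens while the best candidate could still be beaten by a point outside the current box.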
Two optimizations can be made fairly quickly:
First, "normalize" your spherical coordinates (meaning: compensate for the reduced length of a degree of longitude with increasing latitude) so that you can search for nearest points using a "pseudo-cartesian" approach. This only works if points are close together, but since you are working with lots of points this should not be a problem. More specifically, this should work for all points in geohash boxes of length 6 or more.
Assuming the WGS84 ellipsoid (which is used in all GPS devices), the Earth's semi-major axis (a) is 6,378,137 m, with a squared eccentricity (e2) of 0.00669438. A second of longitude has a length in meters of
longSec := Pi * a * cos(lat) / sqrt(1 - e2 * sqr(sin(lat))) / 180 / 3600
or
longSec := 30.92208078 * cos(lat) / sqrt(1 - 0.00669438 * sqr(sin(lat)))
For a second of latitude:
latSec := 30.870265 - 0.155506 * cos(2 * lat) + 0.0003264 * cos(4 * lat)
To make your local coordinate system "square", multiply your longitude values (differences) by the correction factor longSec/latSec.
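The formulas above translate directly to Python; the sketch below evaluates them (with `lat` in degrees, converted to radians) so you can check the lengths of a degree at a few latitudes and the resulting correction factor. The function names are hypothetical.

```python
import math

A = 6378137.0      # WGS84 semi-major axis in meters
E2 = 0.00669438    # WGS84 first eccentricity squared


def long_sec_m(lat_deg):
    """Length in meters of one second of longitude at the given latitude."""
    phi = math.radians(lat_deg)
    return math.pi * A * math.cos(phi) / math.sqrt(1 - E2 * math.sin(phi) ** 2) / 180 / 3600


def lat_sec_m(lat_deg):
    """Length in meters of one second of latitude (series approximation)."""
    phi = math.radians(lat_deg)
    return 30.870265 - 0.155506 * math.cos(2 * phi) + 0.0003264 * math.cos(4 * phi)


def corr_factor(lat_deg):
    """Multiply longitude differences by this to 'square' the local grid."""
    return long_sec_m(lat_deg) / lat_sec_m(lat_deg)


for lat in (0, 45, 60):
    # meters per degree of longitude, meters per degree of latitude, factor
    print(lat, round(3600 * long_sec_m(lat)), round(3600 * lat_sec_m(lat)), round(corr_factor(lat), 4))
```

At the Equator this gives about 111,319 m per degree of longitude and 110,574 m per degree of latitude; at 45° a degree of longitude is about 78.8 km, matching the figure quoted earlier in the answer.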
Second, since you are looking for the nearest point, do not search on the distance itself, because of the computationally expensive square root. Instead, search on the term under the square root (the squared distance, if you will): the square root is monotonic, so the smallest squared distance belongs to the nearest point.
In PL/pgSQL (assuming a table all_points with an id column and a geometry column pt):
CREATE FUNCTION nearest_point(pt geometry, ptHash8 char(8)) RETURNS integer AS $$
DECLARE
    latDeg double precision;   -- meters per degree of latitude at the given point
    longDeg double precision;  -- meters per degree of longitude at the given point
    ptLat double precision;
    ptLong double precision;
    currPt record;
    minDist double precision;
    dist double precision;
    diffLat double precision;
    diffLong double precision;
    minId integer;             -- stays NULL if no point is found
BEGIN
    minDist := 100000000.; -- a large value: 10 km, squared, in square meters
    ptLat := ST_Y(pt);
    ptLong := ST_X(pt);
    latDeg := 3600 * (30.870265 - 0.155506 * cos(2 * radians(ptLat)) + 0.0003264 * cos(4 * radians(ptLat)));
    longDeg := 3600 * 30.92208078 * cos(radians(ptLat)) / sqrt(1 - 0.00669438 * power(sin(radians(ptLat)), 2));
    -- same expression as the hash8 index, so the index can be used
    FOR currPt IN SELECT * FROM all_points WHERE substring(hash_column FROM 1 FOR 8) = ptHash8
    LOOP
        diffLat := (ST_Y(currPt.pt) - ptLat) * latDeg;     -- meters north-south
        diffLong := (ST_X(currPt.pt) - ptLong) * longDeg;  -- meters east-west: "square" things out
        dist := diffLat * diffLat + diffLong * diffLong;   -- squared distance, no sqrt needed
        IF dist < minDist THEN                             -- this does not happen so often
            minDist := dist;
            minId := currPt.id;
        END IF;
    END LOOP;
    RETURN minId;  -- NULL when no point lies within 10 km in this hash box
END; $$ LANGUAGE PLPGSQL STRICT;
Needless to say, this would be a lot faster in a C language function. Also, do not forget to do boundary checking to see if neighbouring geohash boxes need to be searched.
Incidentally, "spatial purists" would not index on the 8-character geohash and search from there; instead they would start from the 9-character hash and work outwards. However, a "miss" in your initial hash box (because there are no other points or you are close to a hash box side) is expensive, because you have to start computing distances to neighbouring hash boxes and pull in more data. In practice you should work from a hash box that is about twice the size of the typical nearest-point distance; what that distance is depends on your point set.