MongoDB GEO2D index - kilometers?

I have a geospatial index in my database. MongoDB's documentation says that the $near operator takes an argument in meters to find matches. For example:
db.data.find({'coordinates': SON([('$near', [20.450550042732765, 80.52327036857605]), ('$maxDistance', 100)])}).count()
should return matches within 100 meters of that location. But too many are returned, about 3,000. When I rewrite the query as
db.data.find({'coordinates': SON([('$near', [20.450550042732765, 80.52327036857605]), ('$maxDistance', 1)])}).count()
I receive 364 results. I do not think my database has 364 matches within a 1 meter radius of the point I query. Finally, if I query
db.data.find({'coordinates': SON([('$near', [20.450550042732765, 80.52327036857605]), ('$maxDistance', .1)])}).count()
I receive 330 matches. There are definitely not 330 results within 10 centimeters of that point.
I think MongoDB interprets the distance as kilometers, not meters. Can anyone confirm or explain what is going on?

If you are using the legacy 2d index, and it looks like that's what you're doing, then $maxDistance is in radians, as per the documentation you linked to.
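To make the unit issue concrete, here is a minimal sketch of the conversion, assuming the radians interpretation described above and a mean Earth radius of about 6371 km (the helper names are made up for illustration):

```python
# Assumption: spherical Earth with a mean radius of ~6371 km.
EARTH_RADIUS_M = 6371000.0


def meters_to_radians(meters):
    """Convert a distance along the Earth's surface to an angle in radians."""
    return meters / EARTH_RADIUS_M


def radians_to_meters(radians):
    """Convert an angle in radians back to a surface distance in meters."""
    return radians * EARTH_RADIUS_M


# A 100 m search radius expressed in radians is a very small number.
print(meters_to_radians(100))
# Conversely, a value of 100 read as radians is far larger than the planet.
print(radians_to_meters(100))
```

Under that reading, passing 100, 1 or even .1 as the distance describes a huge area, which would explain the counts in the question.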

Related

The best way to store given lat, long and address in mongo db such that finding address for a given lat, long is fast

I have been scratching my head for the past two days trying to get this right.
I have a live data set of around 20M lat, long and address records. I want to store them in MongoDB so that a query to find the address for a given lat, long is fast.
Some of the solutions that I found for MongoDB are:
index loc with 2dsphere.
But how well does a find query perform for an exact match on lat, long?
If I don't use 2dsphere, what is the best way to store lat, long so that storage and index size stay at a minimum while performance stays good?
MongoDB uses a GeoHash and a B-tree internally for its 2dsphere indices, which provides very fast area lookups using $near and $geoNear; you can use $minDistance and $maxDistance of 0 for exact matches but you may want to use a $maxDistance of 1 if you're worried about issues relating to floating point precision. In many cases it can be important to limit(1) your results if they are densely distributed in some places, although for street addresses that should not be an issue. On my dev machine I can query a collection with 40 million polygons, totalling nearly 50GB of data, in 300-400 ms.
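As a rough sketch of the near-exact lookup described above, the filter could be built like this in Python. It assumes the field is called loc (as in the question), holds GeoJSON points under a 2dsphere index, and that $maxDistance is therefore in meters; the function name is hypothetical:

```python
def exact_point_query(lon, lat, max_distance_m=1):
    """Build a $near filter around a GeoJSON point.

    GeoJSON coordinates are ordered [longitude, latitude].
    max_distance_m=1 leaves a little room for floating point differences.
    """
    return {
        "loc": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_distance_m,
            }
        }
    }


query = exact_point_query(77.5946, 12.9716)
print(query)
```

With PyMongo this filter would be passed to something like collection.find(query).limit(1), where collection is whatever collection handle you already have.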

How do I query for documents outside of given radius in MongoDB?

How do I do a geospatial query for items outside a given radius? I don't want results sorted by distance, I just want all results outside of a given distance. Is the best/only option to use $minDistance and $near? I believe this only returns results ranked by distance. Or can I create a query using $geoWithin (or some other operator) that specifies results outside a given distance from a point?

MongoDB/PyMongo geospatial query: distance of documents from a point

I've recently upgraded my MongoDB from version 2.2.1 to version 2.4.6, and pymongo to 2.6.2.
One of the reasons for the upgrade is the capability of the new version of MongoDB to calculate and return the distance of the documents (which include proper coordinates) from the center of a geospatial query as explained here.
So far I execute the following query:
db.collection.find({"loc": {"$within": {"$center": [[LON, LAT], RADIUS]}}})
where LON, LAT and RADIUS are proper numbers. I then programmatically calculate the distance from the center for each document returned.
Now I'm trying to have MongoDB do the distance calculations for me, since that should be more efficient than my own code.
What I'm trying now is the following:
db.collection.find({"loc": {"$geoWithin": {"$centerSphere": [[LON, LAT], RADIUS]}}})
where RADIUS is now calculated properly (radius in km / 6371), but I get the same results as the older query.
How should I change the new query so that every returned document includes the extra field "dis"?
The geospatial index is 2D, which should work according to docs, but I can change it to 2dsphere if necessary. Does anyone have a good suggestion?
Try using the $geoNear stage in the aggregation framework. The $geoNear documentation is here:
http://docs.mongodb.org/manual/reference/aggregation/geoNear/
Your query will end up looking like:
db.collection.aggregate([{$geoNear:{near:[LON,LAT],distanceField:"distance",maxDistance:RADIUS,spherical:true}}])
and the resulting documents will have a field named "distance" with the calculated value. Hope that helps.
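For PyMongo, the same pipeline can be written as plain Python dicts. This is only a sketch under a few assumptions: coordinates are legacy [lon, lat] pairs, so with spherical set to true the distances are handled in radians, the radius is converted the same way as in the question (km / 6371), and distanceMultiplier is used to turn the reported distance back into kilometers. The function name is made up:

```python
EARTH_RADIUS_KM = 6371.0  # same approximation as in the question


def geo_near_pipeline(lon, lat, radius_km):
    """Build a one-stage aggregation pipeline for $geoNear."""
    return [
        {
            "$geoNear": {
                "near": [lon, lat],
                "distanceField": "distance",
                # radians, because legacy coordinate pairs are used
                "maxDistance": radius_km / EARTH_RADIUS_KM,
                # scales the reported distance from radians to kilometers
                "distanceMultiplier": EARTH_RADIUS_KM,
                "spherical": True,
            }
        }
    ]


pipeline = geo_near_pipeline(12.5, 41.9, 5)
print(pipeline)
```

You would then call db.collection.aggregate(pipeline); $geoNear has to be the first stage of the pipeline.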

Is Mongodb geohaystack the same with standard mongodb spatial index?

It seems that mongodb has 2 types of geospatial index.
http://www.mongodb.org/display/DOCS/Geospatial+Indexing
The standard one. With a note:
You may only have 1 geospatial index per collection, for now. While
MongoDB may allow to create multiple indexes, this behavior is
unsupported. Because MongoDB can only use one index to support a
single query, in most cases, having multiple geo indexes will produce
undesirable behavior.
And then there is this so called geohaystack thingy.
http://www.mongodb.org/display/DOCS/Geospatial+Haystack+Indexing
They both claim to use the same algorithm. They both turn earth into several grids. And then search based on that.
So what's the difference?
MongoDB doesn't seem to use an R-tree or anything like that, right?
NB: An answer to the question "How does MongoDB implement it's spatial indexes?" says that the 2d index uses geohash too.
The implementation is similar, but the use case difference is described on the Geospatial Haystack Indexing page.
The haystack indices are "bucket-based" (aka "quadrant") searches tuned for small-region longitude/latitude searches:
In addition to ordinary 2d geospatial indices, mongodb supports the use
of bucket-based geospatial indexes. Called "Haystack indexing", these
indices can accelerate small-region type longitude / latitude queries
when additional criteria is also required.
For example, "find all restaurants within 25 miles with name 'foo'".
Haystack indices allow you to tune your bucket size to the distribution
of your data, so that in general you search only very small regions of
2d space for a particular kind of document. They are not suited for
finding the closest documents to a particular location, when the
closest documents are far away compared to bucket size.
The bucketSize parameter is required, and determines the granularity of the haystack index.
So, for example:
db.places.ensureIndex({ pos : "geoHaystack", type : 1 }, { bucketSize : 1 })
This example bucketSize of 1 creates an index where keys within 1 unit of longitude or latitude are stored in the same bucket. An additional category can also be included in the index, which means that information will be looked up at the same time as finding the location details.
The B-tree representation would be similar to:
{ loc: "x,y", category: z }
If your use case typically searches for "nearby" locations (i.e. "restaurants within 25 miles") a haystack index can be more efficient. The matches for the additional indexed field (eg. category) can be found and counted within each bucket.
If, instead, you are searching for "nearest restaurant" and would like to return results regardless of distance, a normal 2d index will be more efficient.
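To illustrate the bucket idea, here is a purely conceptual Python sketch. It is not MongoDB's actual implementation, just an assumed model in which each coordinate is floored to a multiple of bucketSize and documents are grouped by that grid cell plus the extra field:

```python
import math
from collections import defaultdict


def bucket_key(lon, lat, bucket_size):
    """Grid cell containing the point (illustrative model only)."""
    return (math.floor(lon / bucket_size), math.floor(lat / bucket_size))


def build_buckets(docs, bucket_size):
    """Group documents by (grid cell, type), like a toy haystack index."""
    buckets = defaultdict(list)
    for doc in docs:
        lon, lat = doc["pos"]
        buckets[(bucket_key(lon, lat, bucket_size), doc["type"])].append(doc)
    return buckets


places = [
    {"pos": [10.2, 20.7], "type": "restaurant"},
    {"pos": [10.9, 20.1], "type": "restaurant"},
    {"pos": [10.5, 20.5], "type": "cafe"},
    {"pos": [11.1, 20.5], "type": "restaurant"},
]
buckets = build_buckets(places, bucket_size=1)

# A "nearby restaurants" search only has to look at a few small buckets.
for key, docs in sorted(buckets.items()):
    print(key, len(docs))
```

In this model a small-region search with a filter on type touches only the buckets around the query point, which is why it does poorly when the nearest match is many buckets away.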
There are currently (as of MongoDB 2.2.0) a few limitations on haystack indexes:
only one additional field can be included in the haystack index
the additional index field has to be a single value, not an array
null long/lat values are not supported
Note: the distance covered by one degree of longitude varies greatly with latitude (the distance covered by one degree of latitude varies much less). See: What is the distance between a degree of latitude and longitude?.
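A quick way to see how much a "unit" of bucketSize covers on the ground is the following sketch. It assumes a perfectly spherical Earth with a radius of 6371 km, on which a degree of latitude is constant and a degree of longitude shrinks with the cosine of the latitude:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def km_per_degree_latitude():
    """Length of one degree of latitude on the spherical model."""
    return math.pi * EARTH_RADIUS_KM / 180.0


def km_per_degree_longitude(latitude_deg):
    """Length of one degree of longitude at the given latitude."""
    return km_per_degree_latitude() * math.cos(math.radians(latitude_deg))


for lat in (0, 30, 60, 80):
    print(lat, round(km_per_degree_longitude(lat), 1))
```

So a bucketSize of 1 spans a much narrower east-west strip at high latitudes than at the equator.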

MongoDB spatial query for Polygons

I will build a GIS system based on polygons, not just points. I wanted to use MongoDB or PostGIS.
How do I do the following in MongoDB?
Query A - get the center of a polygon
Query B - distance between two polygons
Query C - list of polygons that are part of a third that I specify
Query D - near-distance of the polygon
Is SRID supported?
MongoDB's geospatial indexing currently only indexes points. Although it does support proximity and bounds queries, documents are matched by a single point. You may be able to take advantage of multi-location documents and index multiple points along a polygon, which might support some of your queries with reduced precision; however, that would certainly not be ideal.
PostGIS seems more appropriate for your requirements.