I've recently upgraded my MongoDB from version 2.2.1 to version 2.4.6, and pymongo to 2.6.2.
One of the reasons for the upgrade is the capability of the new version of MongoDB to calculate and return the distance of the documents (which include proper coordinates) from the center of a geospatial query as explained here.
So far I execute the following query:
db.collection.find({"loc": {"$within": {"$center": [[LON, LAT], RADIUS]}}})
where LON, LAT and RADIUS are plain numbers. I then calculate the distance from the center programmatically for each document returned.
Now I'd like MongoDB to do the distance calculations for me, since that should be more efficient than my own code.
What I'm trying now is the following:
db.collection.find({"loc": {"$geoWithin": {"$centerSphere": [[LON, LAT], RADIUS]}}})
where RADIUS is now calculated properly (radius in km / 6371), but I get the same results as the older query.
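For reference, the km-to-radians conversion described above can be sketched in Python. This only builds the filter document and does not talk to a server; `km_to_radians` and `center_sphere_query` are hypothetical helper names, the coordinates are made-up sample values, and 6371 km is the Earth radius used in the question.

```python
EARTH_RADIUS_KM = 6371.0  # Earth radius in km, as used in the question


def km_to_radians(distance_km):
    """Convert a surface distance in kilometres to radians on a sphere."""
    return distance_km / EARTH_RADIUS_KM


def center_sphere_query(lon, lat, radius_km, field="loc"):
    """Build a $geoWithin/$centerSphere filter document (hypothetical helper)."""
    return {
        field: {
            "$geoWithin": {
                "$centerSphere": [[lon, lat], km_to_radians(radius_km)]
            }
        }
    }


# Sample values, chosen only for illustration.
query = center_sphere_query(-73.97, 40.77, 10)
```

The resulting dict can be passed to `collection.find()`. Note that this filter only selects documents; it does not attach a distance to them.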
How should I change the new query so that each returned document includes the extra "dis" field?
The geospatial index is 2d, which should work according to the docs, but I can change it to 2dsphere if necessary. Does anyone have a good suggestion?
Try using the $geoNear stage in the aggregation framework. The $geoNear documentation is here:
http://docs.mongodb.org/manual/reference/aggregation/geoNear/
Your query will end up looking like:
db.collection.aggregate([{$geoNear:{near:[LON,LAT],distanceField:"distance",maxDistance:RADIUS,spherical:true}}])
and the resulting documents will have a field named "distance" with the calculated value. Hope that helps.
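Since the question uses pymongo, here is a rough Python sketch of the same pipeline. It only builds the pipeline list; `geo_near_pipeline` and `radians_to_km` are hypothetical helper names and the coordinates are sample values. It assumes legacy coordinate pairs with `spherical` set to true, in which case distances are expressed in radians.

```python
EARTH_RADIUS_KM = 6371.0  # Earth radius in km, as used in the question


def geo_near_pipeline(lon, lat, radius_km, distance_field="distance"):
    """Build an aggregation pipeline with a single $geoNear stage."""
    return [
        {
            "$geoNear": {
                "near": [lon, lat],
                "distanceField": distance_field,
                # Radians, assuming legacy coordinate pairs and spherical=True.
                "maxDistance": radius_km / EARTH_RADIUS_KM,
                "spherical": True,
            }
        }
    ]


def radians_to_km(distance_radians):
    """Convert a distance in radians back to kilometres."""
    return distance_radians * EARTH_RADIUS_KM


# Sample values, chosen only for illustration.
pipeline = geo_near_pipeline(-73.97, 40.77, 10)
```

The pipeline would be passed to `collection.aggregate(pipeline)`, and under the assumption above the value stored in the "distance" field can be converted with `radians_to_km`.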
I'm on MongoDB Compass Version 1.5.1 for mac.
When I look at distribution of values, Compass returns plots like the following:
As you can see, min and max values are shown, but the min values are wrong. I know the minimum values of those two keys are 1 and 1, not 9 and 13.
Does anyone know how to fix that problem?
Got it. The standard report is based on a sample of at most 1000 documents.
From the doc:
Sampling in MongoDB Compass is the practice of selecting a subset of
data from the desired collection and analyzing the documents within
the sample set.
Sampling is commonly used in statistical analysis because analyzing a
subset of data gives similar results to analyzing all of the data. In
addition, sampling allows results to be generated quickly rather than
performing a potentially long and computationally expensive collection
scan.
MongoDB Compass employs two distinct sampling mechanisms.
Collections in MongoDB 3.2 are sampled via the $sample operator in the
aggregation framework of the core server. This provides efficient
random sampling without replacement over the entire collection, or
over the subset of documents specified by a query.
Collections in MongoDB 3.0 and 2.6 are sampled via a backwards
compatible algorithm executed entirely within Compass. It comprises
three phases:
Query for a stream of _id values, limit 10000 descending by _id
Read the stream of _ids and save sampleSize randomly chosen values. We
employ reservoir sampling to perform this efficiently.
Then query the selected random documents by _id
The choice of sampling method is transparent in usage to the end-user.
sampleSize is currently set to 1000 documents.
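The reservoir sampling step mentioned in the quoted documentation can be illustrated with a short Python sketch. This is the textbook algorithm, not Compass's actual code; `reservoir_sample` is a hypothetical name and the stream of integers stands in for the stream of _id values.

```python
import random


def reservoir_sample(stream, sample_size, rng=random):
    """Pick sample_size items uniformly at random from a stream in one pass."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < sample_size:
            # Fill the reservoir with the first sample_size items.
            reservoir.append(item)
        else:
            # Replace an existing entry with probability sample_size / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < sample_size:
                reservoir[j] = item
    return reservoir


# Stand-in for a stream of up to 10000 _id values.
ids = range(10000)
sample = reservoir_sample(ids, 1000, random.Random(42))
```

Because only 1000 of the documents end up in the sample, the true minimum of a field can easily be missing from it, which matches the behaviour described in the question.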
I have a geospatial index in my database. MongoDB's documentation says that the $near operator takes an argument in meters to find matches. For example:
db.data.find({'coordinates': SON([('$near', [20.450550042732765, 80.52327036857605]), ('$maxDistance', 100)])}).count()
should return matches within 100 meters of that location. But too many are returned, about 3,000. When I rewrite the query as
db.data.find({'coordinates': SON([('$near', [20.450550042732765, 80.52327036857605]), ('$maxDistance', 1)])}).count()
I receive 364 results. I do not think my database has 364 matches within a 1 meter radius of the point I query. Finally, if I query
db.data.find({'coordinates': SON([('$near', [20.450550042732765, 80.52327036857605]), ('$maxDistance', .1)])}).count()
I receive 330 matches. There are definitely not 330 results within 10 centimeters of that point.
I think mongodb interprets distance as kilometers, not meters. Can anyone confirm or explain what is going on?
If you are using the legacy 2d index, and it looks like that's what you're doing, then $maxDistance is in radians, as per the documentation you linked to.
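To see what that means for the numbers in the question, here is a small Python sketch. It assumes, as stated above, that the value is read as radians, and it uses a spherical Earth with a 6371 km radius; `meters_to_radians` and `radians_to_km` are hypothetical helper names.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed spherical Earth radius in metres


def meters_to_radians(distance_m):
    """Convert a surface distance in metres to radians."""
    return distance_m / EARTH_RADIUS_M


def radians_to_km(distance_radians):
    """Convert a distance in radians to kilometres."""
    return distance_radians * EARTH_RADIUS_M / 1000.0


# Under the radians reading, the three values from the question cover:
km_for_1 = radians_to_km(1)      # about 6371 km
km_for_01 = radians_to_km(0.1)   # about 637 km
covers_everything = 100 > math.pi  # pi radians already reaches the far side

# The value that would correspond to 100 metres:
hundred_m = meters_to_radians(100)
```

Under that assumption, a 100 meter limit would be written as roughly 0.0000157 rather than 100.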
How do I do a geospatial query for items outside a given radius? I don't want results sorted by distance, I just want all results outside of a given distance. Is the best/only option to use $minDistance and $near? I believe this only returns results ranked by distance. Or can I create a query using $geoWithin (or some other operator) that specifies results outside a given distance from a point?
It seems that mongodb has 2 types of geospatial index.
http://www.mongodb.org/display/DOCS/Geospatial+Indexing
The standard one. With a note:
You may only have 1 geospatial index per collection, for now. While
MongoDB may allow to create multiple indexes, this behavior is
unsupported. Because MongoDB can only use one index to support a
single query, in most cases, having multiple geo indexes will produce
undesirable behavior.
And then there is this so called geohaystack thingy.
http://www.mongodb.org/display/DOCS/Geospatial+Haystack+Indexing
They both claim to use the same algorithm. They both divide the earth into a grid of cells and then search based on that.
So what's the difference?
MongoDB doesn't seem to use R-trees and the like, right?
NB: An answer to the question "How does MongoDB implement it's spatial indexes?" says that the 2d index uses geohash too.
The implementation is similar, but the use case difference is described on the Geospatial Haystack Indexing page.
The haystack indices are "bucket-based" (aka "quadrant") searches tuned for small-region longitude/latitude searches:
In addition to ordinary 2d geospatial indices, mongodb supports the use
of bucket-based geospatial indexes. Called "Haystack indexing", these
indices can accelerate small-region type longitude / latitude queries
when additional criteria is also required.
For example, "find all restaurants within 25 miles with name 'foo'".
Haystack indices allow you to tune your bucket size to the distribution
of your data, so that in general you search only very small regions of
2d space for a particular kind of document. They are not suited for
finding the closest documents to a particular location, when the
closest documents are far away compared to bucket size.
The bucketSize parameter is required, and determines the granularity of the haystack index.
So, for example:
db.places.ensureIndex({ pos : "geoHaystack", type : 1 }, { bucketSize : 1 })
This example bucketSize of 1 creates an index where keys within 1 unit of longitude or latitude are stored in the same bucket. An additional category can also be included in the index, which means that information will be looked up at the same time as finding the location details.
The B-tree representation would be similar to:
{ loc: "x,y", category: z }
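As a purely conceptual illustration of bucketing (not MongoDB's actual implementation), the following Python sketch groups documents by grid cell and by the additional field. The names `bucket_key` and `build_buckets`, the sample documents, and the floor-based cell assignment are all assumptions made for the example.

```python
import math
from collections import defaultdict


def bucket_key(lon, lat, bucket_size):
    """Map a coordinate pair to the grid cell that contains it."""
    return (math.floor(lon / bucket_size), math.floor(lat / bucket_size))


def build_buckets(docs, bucket_size):
    """Group documents by (grid cell, additional field value)."""
    buckets = defaultdict(list)
    for doc in docs:
        lon, lat = doc["pos"]
        buckets[(bucket_key(lon, lat, bucket_size), doc["type"])].append(doc)
    return buckets


# Made-up sample documents.
docs = [
    {"pos": [10.2, 20.7], "type": "restaurant"},
    {"pos": [10.9, 20.1], "type": "restaurant"},
    {"pos": [11.1, 20.1], "type": "restaurant"},
]
buckets = build_buckets(docs, bucket_size=1)
```

With a bucket size of 1, the first two documents share a cell and the third lands in the neighbouring one, so a small-region search for one type only has to look at a few buckets.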
If your use case typically searches for "nearby" locations (i.e. "restaurants within 25 miles") a haystack index can be more efficient. The matches for the additional indexed field (e.g. category) can be found and counted within each bucket.
If, instead, you are searching for "nearest restaurant" and would like to return results regardless of distance, a normal 2d index will be more efficient.
There are currently (as of MongoDB 2.2.0) a few limitations on haystack indexes:
only one additional field can be included in the haystack index
the additional index field has to be a single value, not an array
null long/lat values are not supported
Note: the distance covered by one degree of longitude varies greatly with latitude (the distance covered by one degree of latitude, much less so). See: What is the distance between a degree of latitude and longitude?.
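As a rough worked example on a spherical Earth model (an assumption; the real Earth is slightly flattened), a degree of latitude is roughly constant while a degree of longitude shrinks with the cosine of the latitude. The names below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed spherical Earth radius in km

# Length of one degree of latitude on the sphere (roughly constant).
KM_PER_DEGREE_LAT = math.pi * EARTH_RADIUS_KM / 180.0


def km_per_degree_longitude(latitude_deg):
    """Length of one degree of longitude at the given latitude."""
    return KM_PER_DEGREE_LAT * math.cos(math.radians(latitude_deg))


at_equator = km_per_degree_longitude(0)   # same as a degree of latitude
at_60_north = km_per_degree_longitude(60)  # half of that
```

This matters for bucketSize, because a bucket that is one degree wide covers about half the east-west distance at 60 degrees latitude that it covers at the equator.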
I will build a GIS system based on polygons, not just points. I wanted to use MongoDB or PostGIS.
How do I do this in MongoDB?
Query A - get the center of a polygon
Query B - distance between two polygons
Query C - list of polygons that are part of a third that I specify
Query D - near-distance of the polygon
Is SRID supported?
MongoDB's geospatial indexing currently only indexes points. Although it does support proximity and bounds queries, documents are matched by a single point. You may be able to take advantage of multi-location documents and index multiple points along a polygon, which might support some of your queries with reduced precision; however, that would certainly not be ideal.
PostGIS seems more appropriate for your requirements.