I'm using MongoDB 2.6.3 to query a large collection of geospatial data. Specifically, I want to find all pings within a few kilometers of a central location and then group them by user identifier to count how many pings each user has.
Naturally, I'm using the aggregation framework for this, specifically the $geoNear pipeline stage. However, even though aggregation returns a cursor as of 2.6.0, $geoNear still seems to cap the size of the result set, as if aggregation still returned a single document. Specifically, aggregation with $geoNear returns only about 65,000 records, while an equivalent (cursored) query returns 200,000+.
Does anyone have any insight into how I can perform large-scale aggregation with $geoNear?
Edit:
Sample document:
{
"initial_epoch_time" : 1370062800,
"location" : [
-72.3458073902,
41.8241332683
],
"_id" : ObjectId("540a34050dc2520000912286"),
"__v" : 0
}
The following cursored query returns a count of ~200,000 documents, which I suspect is the correct number:
var cursor = db.pings.find({
  location: {
    $near: {
      $geometry: { type: 'Point', coordinates: [-71.10560939999999, 42.3465666] },
      $maxDistance: 10*1000
    }
  }
});
var ctr = 0;
while (cursor.hasNext()) {
  ctr++;
  var ping = cursor.next();
}
print(ctr);
while the following aggregation-based query:
var cursor = db.pings.aggregate([
  { $geoNear: {
      near: { type: "Point", coordinates: [-71.10560939999999, 42.3465666] },
      limit: 100000000, spherical: true,
      maxDistance: 10*1000, distanceField: "distance"
  } }
]);
var ctr = 0;
while (cursor.hasNext()) {
  ctr++;
  var ping = cursor.next();
}
print(ctr);
returns ~65,000 documents, regardless of maxDistance.
The geoNear command, which backs the $geoNear stage, returns its results in a single output document, so it is subject to the 16MB document size limit. I've found that you won't get any error; the output is silently cut off at the document size limit. You can test this by reducing the size of the documents in your collection: the smaller the documents, the more results you get back.
See line 235 of:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/commands/geo_near_cmd.cpp
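As a rough sanity check, assuming the truncation comes purely from the 16 MiB (16 × 1024 × 1024 byte) BSON limit and ignoring per-result overhead such as array index keys and the added distance field, the number of results that fit is about the limit divided by the average result size. The function names below are illustrative, not part of any MongoDB API.

```javascript
// Maximum BSON document size enforced by MongoDB: 16 MiB.
const BSON_MAX_BYTES = 16 * 1024 * 1024;

// Rough upper bound on how many results of a given average size fit
// into one 16 MiB output document (ignores array and field overhead).
function maxResultsInOneDocument(avgResultBytes) {
  return Math.floor(BSON_MAX_BYTES / avgResultBytes);
}

// Inverse: the average result size implied by an observed truncation point.
function impliedAvgResultBytes(observedCount) {
  return BSON_MAX_BYTES / observedCount;
}

console.log(maxResultsInOneDocument(256)); // 65536
console.log(impliedAvgResultBytes(65000).toFixed(1)); // 258.1
```

By this estimate, the ~65,000 results in the question correspond to roughly 258 bytes per result, and halving the result size should roughly double the count, which is consistent with the test suggested in the answer.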
I have a route that does a $near search on my MongoDB database. It returns documents that have a geo tag within 100 miles, matching against each document's "bandLocation" field. I have a second indexed field, "bandTour", which holds several other geo locations in the same format. I want the request to also include these locations, but I haven't managed to get it working. How do I add the second query?
Here is my route. If "bandLocation" is the only condition, it works. How would I add "bandTour"?
router.get('/allbands/:lng/:lat', (req, res) => {
  quoteGenerator.find(
    {
      "bandLocation.geometry": {
        $near: {
          $geometry: {
            type: "Point",
            coordinates: [parseFloat(req.params.lng), parseFloat(req.params.lat)]
          },
          $maxDistance: 160934,
        }
      },
      "bandTour.geometry": {
        $near: {
          $geometry: {
            type: "Point",
            coordinates: [parseFloat(req.params.lng), parseFloat(req.params.lat)]
          },
          $maxDistance: 160934,
        }
      }
    })
    .then(
      function(bands) {
        res.send(bands)
      }
    )
});
MongoDB supports only one $near-style geospatial expression per query. Aggregation does support proximity queries through $geoNear, but only as the first stage of the pipeline, so aggregation is not a solution in this case either.
You would need to implement this as two queries and then combine the results. Since your sample query requires both locations to match, you could run a find with $near on bandLocation, projecting just the _id field. Then use the returned _id values to build a second query that tests _id: {$in: [array of ids]} along with the $near on bandTour.
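A minimal sketch of the filter documents for that two-step approach, written as plain JavaScript so the structure is easy to check. The helper names are hypothetical; only the query operators ($near, $geometry, $maxDistance, $in) come from MongoDB. A mergeById helper is also included for the case where you actually want bands near either field: run one $near query per field and combine the results without duplicates.

```javascript
// Shared GeoJSON $near clause; maxDistance is in meters
// (160934 m is about 100 miles, as in the question).
function nearClause(lng, lat, maxDistance) {
  return {
    $near: {
      $geometry: { type: "Point", coordinates: [lng, lat] },
      $maxDistance: maxDistance,
    },
  };
}

// Step 1 filter: $near on bandLocation only.
function firstQuery(lng, lat, maxDistance) {
  return { "bandLocation.geometry": nearClause(lng, lat, maxDistance) };
}

// Projection for step 1: only the _id field is needed.
const ID_ONLY = { _id: 1 };

// Step 2 filter: restrict to the _ids from step 1, plus $near on bandTour.
function secondQuery(ids, lng, lat, maxDistance) {
  return {
    _id: { $in: ids },
    "bandTour.geometry": nearClause(lng, lat, maxDistance),
  };
}

// "Either field" variant: merge several result lists, keeping the first
// document seen for each _id (compared as strings, so ObjectIds work too).
function mergeById(...lists) {
  const seen = new Map();
  for (const list of lists) {
    for (const doc of list) {
      const key = String(doc._id);
      if (!seen.has(key)) seen.set(key, doc);
    }
  }
  return [...seen.values()];
}

// Hypothetical usage inside the route (Mongoose, quoteGenerator model from the question):
//   const lng = parseFloat(req.params.lng), lat = parseFloat(req.params.lat);
//   const ids = (await quoteGenerator.find(firstQuery(lng, lat, 160934), ID_ONLY)).map((d) => d._id);
//   const bands = await quoteGenerator.find(secondQuery(ids, lng, lat, 160934));
console.log(JSON.stringify(secondQuery(["id1", "id2"], -73.97, 40.78, 160934)));
```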
I am using $geoNear as the first stage of an aggregation pipeline. I need to filter the results on the "tag" field. That works, but I see two ways to do it, and they give different results.
Sample MongoDB Document
{
"position": [
40.80143,
-73.96095
],
"tag": "pizza"
}
I have added a 2dsphere index on the "position" key:
db.restaurants.createIndex( { 'position' : "2dsphere" } )
Query 1
Uses the $match pipeline stage to filter the results on the "tag" key
db.restaurants.aggregate(
[
{
"$geoNear":{
"near": { type: "Point", coordinates: [ 55.8284,-4.207] },
"limit":100,
"maxDistance":10*1000,
"distanceField": "dist.calculated",
"includeLocs": "dist.location",
"distanceMultiplier":1/1000,
"spherical": true
}
},{
"$match":{"tag":"pizza"}
},
{
"$group":{"_id":null,"totalDocs":{"$sum":1}}
}
]
);
Query 2
Uses the query option of the $geoNear stage to filter the results on the "tag" key
db.restaurants.aggregate(
[
{
"$geoNear":{
"query" : {"tag":"pizza"},
"near": { type: "Point", coordinates: [ 55.8284,-4.207] },
"limit":100,
"maxDistance":10*1000,
"distanceField": "dist.calculated",
"includeLocs": "dist.location",
"distanceMultiplier":1/1000,
"spherical": true
}
},
{
"$group":{"_id":null,"totalDocs":{"$sum":1}}
}
]
);
The $group stage is only there to count the documents returned by each query.
The totalDocs values returned by the two queries are different.
Can someone explain the difference between the two queries?
A few assumptions:
1. Assume there are 300 records that match based on the location.
2. Assume the first (nearest) 100 results do not have tag pizza, and the remaining 200 documents (101 to 300) do.
Query 1:
There are two pipeline stages, $geoNear and $match. The output of $geoNear is the input to $match.
$geoNear finds at most 100 results (the limit we specified) based on location alone, sorted from nearest to farthest. Note that these 100 results are chosen purely by location, so none of them has tag "pizza".
These 100 results are passed to the next stage, $match, where the filtering happens. Since none of the first 100 results has tag pizza, the output is empty.
Query 2:
There is only one pipeline stage, $geoNear, and it includes a query field.
$geoNear finds at most 100 results (the limit we specified) that satisfy the query tag=pizza, sorted from nearest to farthest.
Because the query is applied inside $geoNear, before the limit, results 101 to 200 are returned. In other words: find the documents nearest to [x, y] that have tag=pizza.
P.S.: The $group stage is only there to get the count, so I haven't covered it in this explanation.
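The difference comes down to whether the limit is applied before or after the tag filter. The snippet below replays the answer's assumptions on plain arrays; the "burger" tag and the helper names are made up for illustration, and the array stands in for documents already sorted nearest-first.

```javascript
// 300 documents already in nearest-first order: the nearest 100 are not
// tagged "pizza", the next 200 are.
const docs = Array.from({ length: 300 }, (_, i) => ({
  rank: i + 1, // 1 = nearest
  tag: i < 100 ? "burger" : "pizza",
}));
const LIMIT = 100;

// Query 1: $geoNear applies its limit first, then $match filters.
function geoNearThenMatch(sortedDocs, limit, pred) {
  return sortedDocs.slice(0, limit).filter(pred);
}

// Query 2: $geoNear applies its query filter first, then the limit.
function geoNearWithQuery(sortedDocs, limit, pred) {
  return sortedDocs.filter(pred).slice(0, limit);
}

const isPizza = (d) => d.tag === "pizza";
const q1 = geoNearThenMatch(docs, LIMIT, isPizza);
const q2 = geoNearWithQuery(docs, LIMIT, isPizza);
console.log(q1.length); // 0
console.log(q2.length, q2[0].rank, q2[q2.length - 1].rank); // 100 101 200
```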
// If you have to apply multiple criteria to find locations, this query might be helpful
const userLocations = await userModel.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [data.lon1, data.lat1] }, // set the university point
      spherical: true,
      distanceField: "calcDistance",
      // maxDistance: 2400, // meters (2.4 km)
      "distanceMultiplier": 0.001,
    }
  },
  { $unwind: "$location" },
  { $match: {
      "location": {
        $geoWithin: {
          $centerSphere: [
            [73.780553, 18.503327], 20 / 6378.1 // check whether the user point is within 20 km (radius in radians)
          ]
        }
      }
  }},
]);
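The $centerSphere radius in that $match is in radians, which is why the answer divides 20 km by 6378.1 (the Earth radius in km that the query uses). A minimal client-side sketch of the same test, assuming a perfect sphere of that radius and a haversine great-circle distance (MongoDB's own spherical math may differ slightly right at the boundary); the function names are hypothetical:

```javascript
// Radius used in the query: distances in km divided by a 6378.1 km Earth radius.
const EARTH_RADIUS_KM = 6378.1;

function kmToRadians(km) {
  return km / EARTH_RADIUS_KM;
}

// Haversine central angle (radians) between two [lng, lat] points in degrees.
function angularDistance([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Approximates { $geoWithin: { $centerSphere: [center, radiusRadians] } } for one point.
function withinCenterSphere(point, center, radiusRadians) {
  return angularDistance(point, center) <= radiusRadians;
}

const center = [73.780553, 18.503327]; // [lng, lat], as in the query
const radius = kmToRadians(20);
console.log(withinCenterSphere([73.780553, 18.603327], center, radius)); // ~11 km north: true
console.log(withinCenterSphere([73.780553, 18.803327], center, radius)); // ~33 km north: false
```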
If I have documents like this:
{firstname:"Jordan", lastname:"Snyder", age:6, homelocation:[<longitude, latitude>]}
In the mongo shell, how do I get all the distinct firstnames across matching documents of people who live near a specific point (say, within 1 mile)? I see MongoDB has db.collection.distinct(field, query), but all the samples I see for finding anything "near" or "geoWithin" (using the homelocation field in my case) use db.collection.find. I don't want all the documents, just the distinct list of first names.
The query parameter of distinct uses the same format as the query selector parameter of find. So assuming a 2dsphere index on homelocation you can do something like:
db.test.distinct('firstname', {
homelocation: {
$near: {
$geometry: { type: "Point", coordinates: [ -73.9667, 40.78 ] },
$maxDistance: 1600 // In meters
}
}
})
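distinct with a $near filter returns the unique values of one field among the matching documents. To make those semantics concrete, here is a client-side sketch that computes the same thing over an in-memory array, assuming a spherical Earth of radius 6378.1 km and haversine distances. The function name and the sample people are hypothetical, and the result is sorted only so the output is deterministic (distinct itself makes no ordering guarantee).

```javascript
const EARTH_RADIUS_M = 6378.1 * 1000; // assumed spherical Earth radius in meters

// Haversine distance in meters between two [lng, lat] points given in degrees.
function distanceMeters([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(h)));
}

// In-memory analogue of distinct(field, { homelocation: { $near: ..., $maxDistance } }).
function distinctNearby(people, field, center, maxMeters) {
  const values = new Set();
  for (const person of people) {
    if (distanceMeters(person.homelocation, center) <= maxMeters) {
      values.add(person[field]);
    }
  }
  return [...values].sort(); // sorted only for a deterministic result
}

// Hypothetical sample documents; homelocation is [longitude, latitude].
const people = [
  { firstname: "Jordan", homelocation: [-73.9667, 40.78] },
  { firstname: "Jordan", homelocation: [-73.9667, 40.785] }, // ~557 m away
  { firstname: "Alex", homelocation: [-73.9667, 40.7801] }, // ~11 m away
  { firstname: "Sam", homelocation: [-73.9, 40.9] }, // well over a mile away
];
console.log(distinctNearby(people, "firstname", [-73.9667, 40.78], 1600)); // [ 'Alex', 'Jordan' ]
```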
How do I sort by proximity and date in MongoDB?
I tried this, but it only sorts by date:
coll.find({'date':{$gte:date},'location':{$nearSphere:[lat,lng]}}).sort({'date':1}).execFind(function (err, docs) {})
I appreciate the help.
There's no direct way to use $near or $nearSphere and sort by another field, because both of these operators already sort the results of a find(). When you then sort by 'date', you re-sort those results and lose the distance order. What you can do, however, is fetch the $nearSphere results incrementally and sort each batch by date. For example:
function sortByDate(a, b) { return a.date - b.date; }
// how many results to grab at a time
var itersize = 10;
// this will hold your final, two-way sorted results
var sorted_results = [];
for (var i = 0; ; i += itersize) {
  var results = db.coll.find({
    "date": {$gte: date},
    // longitude, then latitude
    "location": {$nearSphere: [lng, lat]}
  }).skip(i).limit(itersize).toArray();
  // do date sorting app-side for each group of nearSphere-sorted results
  sorted_results = sorted_results.concat(results.sort(sortByDate));
  // a short (or empty) batch means there are no more results
  if (results.length < itersize) break;
}
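The loop above yields results ordered by distance at the granularity of each batch and by date within each batch. Stripped of the database calls, the reordering it performs looks like this; chunkedSortByDate and the sample data are hypothetical, and the input is assumed to be already nearest-first, as $nearSphere returns it:

```javascript
// Re-sort each consecutive batch of `chunkSize` nearest-first results by date,
// leaving the batches themselves in distance order.
function chunkedSortByDate(docs, chunkSize) {
  const out = [];
  for (let i = 0; i < docs.length; i += chunkSize) {
    const chunk = docs.slice(i, i + chunkSize); // copy so the input is not mutated
    chunk.sort((a, b) => a.date - b.date);
    out.push(...chunk);
  }
  return out;
}

// Five results already in nearest-first order (dist), with unordered dates.
const nearestFirst = [
  { dist: 1, date: 5 },
  { dist: 2, date: 3 },
  { dist: 3, date: 4 },
  { dist: 4, date: 1 },
  { dist: 5, date: 2 },
];
console.log(chunkedSortByDate(nearestFirst, 2).map((d) => d.date)); // [ 3, 5, 1, 4, 2 ]
```

Larger batches give date more influence and distance less; a batch size of 1 keeps pure distance order.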
You should also be aware of the order in which you specify geospatial coordinates in MongoDB queries. MongoDB follows the GeoJSON spec, which lists coordinates in x, y (, z) order, i.e., longitude first, then latitude.
It's recommended to use $geoNear in an aggregation pipeline:
https://docs.mongodb.com/manual/reference/operator/aggregation/geoNear/
You can then sort on both distance and date in the same pipeline:
coll.aggregate([
{
$geoNear: {
near: { type: "Point", coordinates: [ lng, lat ] },
key: "location",
spherical: true,
distanceField: "dist.calculated",
query: { "date": {"$gte": date} }
}
},
{$sort: {"dist.calculated":1, "date": 1}}
])
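Note that a compound $sort like the one above orders primarily by distance; date only breaks ties between documents at exactly the same computed distance. The comparator below mirrors that ordering client-side and, as a hypothetical variant not taken from the answer, shows how coarsening distance into fixed-size buckets would let date matter within each band. The field names match the pipeline; the sample documents and the bucket size are made up.

```javascript
// Client-side equivalent of { $sort: { "dist.calculated": 1, "date": 1 } }:
// compare by distance first; date is only consulted when distances are equal.
function byDistanceThenDate(a, b) {
  return a.dist.calculated - b.dist.calculated || a.date - b.date;
}

// Hypothetical variant: group distances into buckets of `bucketSize`
// (same units as dist.calculated), then sort by date inside each bucket.
function byDistanceBucketThenDate(bucketSize) {
  return (a, b) =>
    Math.floor(a.dist.calculated / bucketSize) -
      Math.floor(b.dist.calculated / bucketSize) ||
    a.date - b.date;
}

const docs = [
  { id: "a", dist: { calculated: 120 }, date: 3 },
  { id: "b", dist: { calculated: 80 }, date: 9 },
  { id: "c", dist: { calculated: 120 }, date: 1 },
  { id: "d", dist: { calculated: 450 }, date: 2 },
];
console.log([...docs].sort(byDistanceThenDate).map((d) => d.id).join("")); // bcad
console.log([...docs].sort(byDistanceBucketThenDate(500)).map((d) => d.id).join("")); // cdab
```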