Mongo $geoNear not searching in all records

I'm wondering if anyone can help me solve a problem with this query.
I'm trying to query all my items with a $geoNear stage, but even with a very large maxDistance it doesn't seem to search all records.
The logs show the error "Too many geoNear results for query", which apparently means the query hit the 16MB limit. Yet the output is only 20 records, and the reported total is 1401 where I would expect 17507.
The average record is 12345 bytes; at 1401 records that is about 17.3MB, which is where the query stops because it hits the 16MB limit.
How can I run this query so that it returns the first 20 results taken from the entire pool of items?
This is the query I'm running:
db.getCollection('items').aggregate([
    {
        "$geoNear": {
            "near": {
                "type": "Point",
                "coordinates": [10, 30]
            },
            "minDistance": 0,
            "maxDistance": 100000,
            "spherical": true,
            "distanceField": "location",
            "limit": 100000
        }
    },
    {
        "$sort": { "createdAt": -1 }
    },
    {
        "$facet": {
            "results": [
                { "$skip": 0 },
                { "$limit": 20 }
            ],
            "total": [
                { "$count": "total" }
            ]
        }
    }
])
This is the output of the query (with the error written to the log):
{
    "results" : [
        // 20 items
    ],
    "total" : [
        {
            "total" : 1401
        }
    ]
}

I changed my query to use separate find() and count() calls. The $facet was severely slowing down the query, and since it really isn't a complex query, there was no reason to use an aggregate.
I initially used the aggregate because it made sense to do one db call instead of several, and with $facet you get built-in paging with a total count. But the single aggregate call took 600ms, whereas a find() and a count() call now take 20ms.
The 16MB limit is also no longer a problem.
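For reference, a minimal sketch of what that replacement could look like. The original post doesn't show the rewritten code, so the geo field name ("geo") is an assumption; a $geoWithin filter is used for both calls because $near/$nearSphere queries are not supported by count(), and 100 / 6378.1 converts the 100km radius to radians:
// Assumes a 2dsphere (or 2d) index on "geo"; field and values are illustrative.
var filter = {
    geo: {
        $geoWithin: {
            $centerSphere: [ [10, 30], 100 / 6378.1 ]
        }
    }
};
db.getCollection('items').find(filter).sort({ createdAt: -1 }).skip(0).limit(20); // one page
db.getCollection('items').count(filter); // total over the entire pool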

Related

Mongo Query Time Duration

I have a mongo query like this:
aggregate([
    {
        "$geoNear": {
            "near": { "type": "Point", "coordinates": [ 35.709770973477255, 51.404043066431775 ] },
            "distanceField": "dist.calculated",
            "maxDistance": 5000,
            "spherical": True,
            "query": { "active_delivery_categories": { "$in": ["biker"] }, "ban_status": False },
            "num": 100
        }
    }
])
When I add a new field to the query, like this:
"query": { "availability_status": "idle", "active_delivery_categories": { "$in": ["biker"] }, "ban_status": False }
the response time doubles.
It happens only with this field; for example, with this query it does not happen:
"query": { "city": "London", "active_delivery_categories": { "$in": ["biker"] }, "ban_status": False }
Do you have any idea why?
One possibility that I can think of is as follows: your original query is able to make use of an appropriate index, and is therefore fast. With the added condition, it is not able to use the index, and is therefore slower.
Look at the .explain() output to see if this is the case.
But then, I also notice that you have said it happens only when adding this particular field. Are you sure of that?
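A sketch of how that check might look in the mongo shell (the collection name "couriers" is an assumption, and "executionStats" verbosity for aggregate explain requires a reasonably recent MongoDB):
// Run once with and once without "availability_status"; compare the chosen
// index and the number of keys/documents examined in each plan.
db.couriers.explain("executionStats").aggregate([
    {
        "$geoNear": {
            "near": { "type": "Point", "coordinates": [ 35.709770973477255, 51.404043066431775 ] },
            "distanceField": "dist.calculated",
            "maxDistance": 5000,
            "spherical": true,
            "query": { "availability_status": "idle", "active_delivery_categories": { "$in": ["biker"] }, "ban_status": false },
            "num": 100
        }
    }
])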

MongoDB $geoNear aggregation pipeline (using query option and using $match pipeline operation) giving different no of results

I am using $geoNear as the first step in the aggregation pipeline. I need to filter the results based on the "tag" field, and it works fine, but I see there are 2 ways of doing it that give different results.
Sample MongoDB Document
{
    "position": [
        40.80143,
        -73.96095
    ],
    "tag": "pizza"
}
I have added a 2dsphere index to the "position" key:
db.restaurants.createIndex( { 'position' : "2dsphere" } )
Query 1
Uses a $match aggregation pipeline stage to filter the results based on the "tag" key:
db.restaurants.aggregate([
    {
        "$geoNear": {
            "near": { type: "Point", coordinates: [ 55.8284, -4.207 ] },
            "limit": 100,
            "maxDistance": 10 * 1000,
            "distanceField": "dist.calculated",
            "includeLocs": "dist.location",
            "distanceMultiplier": 1 / 1000,
            "spherical": true
        }
    },
    {
        "$match": { "tag": "pizza" }
    },
    {
        "$group": { "_id": null, "totalDocs": { "$sum": 1 } }
    }
]);
Query 2
Uses the query option inside the $geoNear stage to filter the results based on the "tag" key:
db.restaurants.aggregate([
    {
        "$geoNear": {
            "query": { "tag": "pizza" },
            "near": { type: "Point", coordinates: [ 55.8284, -4.207 ] },
            "limit": 100,
            "maxDistance": 10 * 1000,
            "distanceField": "dist.calculated",
            "includeLocs": "dist.location",
            "distanceMultiplier": 1 / 1000,
            "spherical": true
        }
    },
    {
        "$group": { "_id": null, "totalDocs": { "$sum": 1 } }
    }
]);
The $group stage is there just to get the count of documents returned by each query.
The totalDocs returned by the two queries differ.
Can someone explain the difference between the two queries?
A few assumptions:
1. Assume there are 300 records that match based on location.
2. Assume the first 100 results do not have tag "pizza", while the remaining 200 documents (101 to 300) do.
Query 1:
There are 2 pipeline stages, $geoNear and $match.
The output of the $geoNear stage is the input to the $match stage.
$geoNear finds at most 100 results (the limit we specified) based on location, sorted from nearest to farthest. (Note that these 100 results are returned purely based on location, so under our assumptions none of them has tag "pizza".)
These 100 results are sent to the next stage, $match, where the filtering happens. But since the first 100 results do not have tag "pizza", the output is empty.
Query 2:
There is only 1 pipeline stage, $geoNear, and a query field is included in it.
$geoNear finds at most 100 results (the limit we specified) based on location, sorted from nearest to farthest, while also applying the query tag = "pizza".
Here, documents 101 to 200 are returned as output, because the filter is applied inside the $geoNear stage itself. In simple terms: find the 100 nearest documents to [x, y] that have tag = "pizza".
P.S.: The $group pipeline stage is added just to get the count, hence it is not covered in the explanation.
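As an aside, a sketch of how one might count every matching document within the radius rather than only the nearest 100. The raised limit value is an assumption (on MongoDB 4.2+ the limit/num options were removed from $geoNear and it returns all matches by default), and the $count stage requires MongoDB 3.4+:
db.restaurants.aggregate([
    {
        "$geoNear": {
            "near": { type: "Point", coordinates: [ 55.8284, -4.207 ] },
            "query": { "tag": "pizza" },   // filter applied inside $geoNear
            "limit": 100000,               // well above the expected match count
            "maxDistance": 10 * 1000,
            "distanceField": "dist.calculated",
            "spherical": true
        }
    },
    { "$count": "totalDocs" }              // 200 under the assumptions above
]);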
// If you have to apply multiple criteria to find locations, then this query might be helpful
const userLocations = await userModel.aggregate([
    {
        $geoNear: {
            near: { type: "Point", coordinates: [ data.lon1, data.lat1 ] }, // set the university point
            spherical: true,
            distanceField: "calcDistance",
            // maxDistance: 2400, // 25km
            distanceMultiplier: 0.001
        }
    },
    { $unwind: "$location" },
    {
        $match: {
            "location": {
                $geoWithin: {
                    $centerSphere: [
                        [ 73.780553, 18.503327 ], 20 / 6378.1 // check that the user point lies within this circle
                    ]
                }
            }
        }
    }
]);

MongoDB geoNear query on legacy data returns inaccurate distance

I am experimenting with the MongoDB geoNear command. I have a collection with one document inside:
{
    "_id": {
        "$oid": "583d169df18ef10012ae8345"
    },
    "location": {
        "loc": [
            103.7652117,
            1.3150887
        ],
        "name": "",
        "_id": {
            "$oid": "583d169df18ef10012ae8346"
        }
    }
}
The location.loc field is 2d indexed. I then ran a geoNear command:
{
    "geoNear": "users",
    "near": [103.761614, 1.3172431],
    "num": 10
}
The distance returned by MongoDB is 0.004193433515629152, which, read as radians, corresponds to more than 20km. However, these 2 coordinates are just 0.5km apart. Is there anything I have done wrong? I know it must be something very silly, but I just couldn't figure it out.
If you want to run $geoWithin, $centerSphere or $geoNear queries, store the location in a structure like this:
"location" : {
    "lng" : 77.15319738236303,
    "lat" : 28.434568229025803
},
and then run your query. It will give you an accurate result.
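For illustration, a sketch of a $geoWithin query against that kind of structure, reusing the coordinates and the "users" collection from the question. The 0.5km radius is an assumption; dividing by the Earth's radius of 6378.1km converts it to radians, and legacy pairs list longitude first:
db.users.find({
    "location": {
        $geoWithin: {
            // [ longitude, latitude ] centre; radius in radians (0.5km / 6378.1km)
            $centerSphere: [ [ 103.761614, 1.3172431 ], 0.5 / 6378.1 ]
        }
    }
})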

MongoDB - Querying between a time range of hours

I have a MongoDB datastore set up with location data stored like this:
{
    "_id" : ObjectId("51d3e161ce87bb000792dc8d"),
    "datetime_recorded" : ISODate("2013-07-03T05:35:13Z"),
    "loc" : {
        "coordinates" : [
            0.297716,
            18.050614
        ],
        "type" : "Point"
    },
    "vid" : "11111-22222-33333-44444"
}
I'd like to be able to perform a query similar to the date range example, but on a time range instead, i.e. retrieve all points recorded between 12PM and 4PM (equivalently, between 1200 and 1600 in 24-hour time).
e.g.
With points:
"datetime_recorded" : ISODate("2013-05-01T12:35:13Z"),
"datetime_recorded" : ISODate("2013-06-20T05:35:13Z"),
"datetime_recorded" : ISODate("2013-01-17T07:35:13Z"),
"datetime_recorded" : ISODate("2013-04-03T15:35:13Z"),
a query
db.points.find({ 'datetime_recorded': {
    $gte: Date(1200 hours),
    $lt: Date(1600 hours)
}});
would yield only the first and last point.
Is this possible? Or would I have to do it for every day?
Well, the best way to solve this is to store the minutes separately as well. But you can get around this with the aggregation framework, although that is not going to be very fast:
db.so.aggregate( [
    { $project: {
        loc: 1,
        vid: 1,
        datetime_recorded: 1,
        minutes: { $add: [
            { $multiply: [ { $hour: '$datetime_recorded' }, 60 ] },
            { $minute: '$datetime_recorded' }
        ] }
    } },
    { $match: { 'minutes' : { $gte : 12 * 60, $lt : 16 * 60 } } }
] );
In the first step, $project, we calculate the minutes of the day as hour * 60 + minute, which we then match against in the second step, $match. For example, the 12:35 sample point becomes 12 * 60 + 35 = 755 minutes, which falls inside the 720 to 960 window.
Adding an answer since I disagree with the other answers: even though there are great things you can do with the aggregation framework, this really is not an optimal way to perform this type of query.
If your identified application usage pattern is that you rely on querying for "hours" or other times of the day without wanting to look at the "date" part, then you are far better off storing that as a numeric value in the document. Something like "milliseconds from start of day" would be granular enough for as many purposes as a BSON Date, but of course gives better performance without the need to compute for every document.
Set Up
This does require some set-up in that you need to add the new fields to your existing documents and make sure you add these on all new documents within your code. A simple conversion process might be:
MongoDB 4.2 and upwards
This can actually be done in a single request due to aggregation operations being allowed in "update" statements now.
db.collection.updateMany(
    {},
    [{ "$set": {
        "timeOfDay": {
            "$mod": [
                { "$toLong": "$datetime_recorded" },
                1000 * 60 * 60 * 24
            ]
        }
    }}]
)
Older MongoDB
var batch = [];
db.collection.find({ "timeOfDay": { "$exists": false } }).forEach(doc => {
    batch.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": {
                    "timeOfDay": doc.datetime_recorded.valueOf() % (60 * 60 * 24 * 1000)
                }
            }
        }
    });

    // write once only per reasonable batch size
    if ( batch.length >= 1000 ) {
        db.collection.bulkWrite(batch);
        batch = [];
    }
})

if ( batch.length > 0 ) {
    db.collection.bulkWrite(batch);
    batch = [];
}
If you can afford to write to a new collection, then looping and rewriting would not be required:
db.collection.aggregate([
    { "$addFields": {
        "timeOfDay": {
            "$mod": [
                { "$subtract": [ "$datetime_recorded", new Date(0) ] },
                1000 * 60 * 60 * 24
            ]
        }
    }},
    { "$out": "newcollection" }
])
Or with MongoDB 4.0 and upwards:
db.collection.aggregate([
    { "$addFields": {
        "timeOfDay": {
            "$mod": [
                { "$toLong": "$datetime_recorded" },
                1000 * 60 * 60 * 24
            ]
        }
    }},
    { "$out": "newcollection" }
])
All using the same basic conversion of:
1000 milliseconds in a second
60 seconds in a minute
60 minutes in an hour
24 hours a day
Taking the numeric milliseconds since epoch (which is the value internally stored in a BSON Date) modulo the number of milliseconds in a day extracts the current milliseconds within the day.
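A worked example with the question's sample timestamp, to make the arithmetic concrete:
// 2013-07-03T05:35:13Z is 1372829713000 milliseconds since epoch
var ms = ISODate("2013-07-03T05:35:13Z").valueOf();
ms % (1000 * 60 * 60 * 24)   // 20113000 ms, i.e. 05:35:13 into the day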
Query
Querying is then really simple, and as per the question example:
db.collection.find({
    "timeOfDay": {
        "$gte": 12 * 60 * 60 * 1000,
        "$lt": 16 * 60 * 60 * 1000
    }
})
Of course this uses the same conversion from hours into milliseconds to match the stored format, but just as before you can use whatever scale you actually need.
Most importantly, as real document properties which don't rely on computation at run-time, you can place an index on this:
db.collection.createIndex({ "timeOfDay": 1 })
So not only is this negating run-time overhead for calculating, but also with an index you can avoid collection scans as outlined on the linked page on indexing for MongoDB.
For optimal performance you never want to calculate such things at run time: at any real-world scale it takes an order of magnitude longer to process every document in the collection just to work out which ones you want than it does to reference an index and fetch only those documents.
The aggregation framework may just be able to help you rewrite the documents here, but it really should not be used as a production system method of returning such data. Store the times separately.