MongoDB Different response on different environment[Aggregate] [duplicate] - mongodb

I am using $geoNear as the first stage of an aggregation pipeline. I need to filter the results on the "tag" field. That works, but there are 2 ways to do it and they give different results.
Sample MongoDB Document
{
"position": [
40.80143,
-73.96095
],
"tag": "pizza"
}
I have added a 2dsphere index on the "position" key
db.restaurants.createIndex( { 'position' : "2dsphere" } )
Query 1
Uses the $match aggregation pipeline stage to filter the results on the "tag" key
db.restaurants.aggregate(
[
{
"$geoNear":{
"near": { type: "Point", coordinates: [ 55.8284,-4.207] },
"limit":100,
"maxDistance":10*1000,
"distanceField": "dist.calculated",
"includeLocs": "dist.location",
"distanceMultiplier":1/1000,
"spherical": true
}
},{
"$match":{"tag":"pizza"}
},
{
"$group":{"_id":null,"totalDocs":{"$sum":1}}
}
]
);
Query 2
Uses the query option inside the $geoNear stage to filter the results on the "tag" key
db.restaurants.aggregate(
[
{
"$geoNear":{
"query" : {"tag":"pizza"},
"near": { type: "Point", coordinates: [ 55.8284,-4.207] },
"limit":100,
"maxDistance":10*1000,
"distanceField": "dist.calculated",
"includeLocs": "dist.location",
"distanceMultiplier":1/1000,
"spherical": true
}
},
{
"$group":{"_id":null,"totalDocs":{"$sum":1}}
}
]
);
The $group stage is there only to count the documents each query returns.
The totalDocs values returned by the two queries are different.
Can someone explain the difference between the two queries?

A few assumptions:
1. Assume 300 documents match on location.
2. Assume the nearest 100 documents do not have the tag "pizza". The remaining 200 documents (101 to 300) do.
Query 1:
There are 2 pipeline stages, $geoNear and $match.
The output of the $geoNear stage is the input to the $match stage.
$geoNear returns at most 100 results (the limit we specified), sorted
from nearest to farthest. These 100 results are chosen purely by
location, so under the assumptions above none of them has the tag
"pizza".
These 100 results are passed to the $match stage, where the filtering
happens. Since none of them has the tag "pizza", the output is empty.
Query 2:
There is only 1 pipeline stage, $geoNear.
The $geoNear stage includes a query field.
$geoNear returns at most 100 results (the limit we specified) that
satisfy both the location condition and the query tag=pizza, sorted
from nearest to farthest.
Here documents 101 to 200 are returned, because the filter is applied
inside $geoNear before the limit. Put simply: find the documents
near location [x,y] that have tag=pizza.
P.S.: The $group stage is only there to get the count, so I have not covered it in the explanation.
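The walkthrough above can be reproduced without a database. The following is a minimal in-memory sketch, not MongoDB itself: geoNearSketch is a hypothetical stand-in for the $geoNear stage, the "burger" tag is invented for the non-matching documents, and the data follows the two assumptions stated in this answer.

```javascript
// Hypothetical data set: 300 documents already ordered from nearest to farthest.
// Documents 1-100 carry an invented "burger" tag; documents 101-300 are tagged "pizza".
const docs = Array.from({ length: 300 }, (_, i) => ({
  rank: i + 1,
  tag: i < 100 ? "burger" : "pizza",
}));

// Stand-in for $geoNear: apply the optional query filter first,
// then keep only the nearest `limit` documents.
function geoNearSketch(sortedDocs, { limit, query }) {
  const candidates = query
    ? sortedDocs.filter((d) => d.tag === query.tag)
    : sortedDocs;
  return candidates.slice(0, limit);
}

// Query 1: limit inside $geoNear, then a separate $match stage.
const query1 = geoNearSketch(docs, { limit: 100 }).filter((d) => d.tag === "pizza");

// Query 2: the filter is part of $geoNear itself.
const query2 = geoNearSketch(docs, { limit: 100, query: { tag: "pizza" } });

console.log(query1.length); // 0
console.log(query2.length, query2[0].rank, query2[query2.length - 1].rank); // 100 101 200
```

The only difference between the two calls is whether the filter runs before or after the limit, which is the whole explanation for the differing counts.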

// If you have to apply multiple criteria to find locations, this query might be helpful
const userLocations = await userModel.aggregate([
{
$geoNear: {
near: { type: "Point", coordinates: [data.lon1, data.lat1] }, // the university's coordinates
spherical: true,
distanceField: "calcDistance",
// maxDistance: 2400, // in meters when "near" is a GeoJSON point
"distanceMultiplier": 0.001, // meters to kilometers
}
},
{ $unwind: "$location" },
{ $match: {
"location": {
$geoWithin: {
$centerSphere: [
[73.780553, 18.503327], 20 / 6378.1 // 20 km radius in radians; checks that the user's point lies inside
]
}
}
}},
])

Related

Mongodb geospacial aggregate returns different results using $match vs query

db.units.aggregate([
{
"$geoNear": {
"near": {
"type": "Point",
"coordinates": [ -3.70256, 40.4165 ]
},
"distanceField": "dist.calculated",
"spherical": true,
"maxDistance": 50000
}
},
{
$match: {
"some.field.a": true,
"otherField": null
}
}
]).explain("executionStats");
Gives me:
nReturned: 671,
executionTimeMillis: 8,
totalKeysExamined: 770,
totalDocsExamined: 671,
However:
db.units.aggregate([
{
"$geoNear": {
"near": {
"type": "Point",
"coordinates": [ -3.70256, 40.4165 ]
},
"distanceField": "dist.calculated",
"spherical": true,
"maxDistance": 50000,
"query": {
"some.field.a": true,
"otherField": null
}
}
}
]).explain("executionStats");
Gives me:
nReturned: 67,
executionTimeMillis: 6,
totalKeysExamined: 770,
totalDocsExamined: 1342,
The first question that comes to mind is: why is the number of returned documents different?
The second is: why is totalDocsExamined higher when using the query option of $geoNear?
Update
When the query field of $geoNear is used, there is a COLLSCAN to find all documents matching the query filter, unless you create a compound index with all the fields:
db.units.createIndex({coordinates:'2dsphere', 'some.field.': 1, otherField:1 })
So it seems that with query the default behavior is a COLLSCAN, unless you have a compound index on the geospatial field plus the fields used in query.
The reason is that the query parameter of $geoNear determines the number of documents examined. The documentation describes it as follows:
"Limits the results to the documents that match the query. The query syntax is the usual MongoDB read operation query syntax."
In your first case the filter is a separate pipeline stage: $geoNear executes first, and the $match stage then filters its output. Hence the numbers differ.

Mongo $geoNear not searching in all records

I'm wondering if anyone can help me solve a problem with this query.
I'm trying to query all my items with a $geoNear operator but with a very large maxDistance it doesn't seem to search in all records.
The logs show this error "Too many geoNear results for query" which apparently means that the query hit the 16MB limit, but the output is only 20 records and claims the total is 1401 where I would expect 17507 as total.
The average record is 12345 bytes. At 1401 records it stops because it hit 16MB limit.
How can I run this query so that it returns the first 20 results taken from the entire pool of items?
This is the query I'm running:
db.getCollection('items').aggregate([
{
"$geoNear": {
"near": {
"type": "Point",
"coordinates": [
10,
30
]
},
"minDistance": 0,
"maxDistance": 100000,
"spherical": true,
"distanceField": "location",
"limit": 100000
}
},
{
"$sort": {
"createdAt": -1
}
},
{
"$facet": {
"results": [
{
"$skip": 0
},
{
"$limit": 20
}
],
"total": [
{
"$count": "total"
}
]
}
}
])
This is the output of the query (and the error is added to the log):
{
"results" : [
// 20 items
],
"total" : [
{
"total" : 1401
}
]
}
I changed my query to use separate find() and count() calls. The $facet stage was severely slowing down the query, and since it really isn't a complex query, there was no reason to use an aggregate.
I initially used the aggregate because a single database call seemed better than several, and $facet gives you built-in paging with a total count. But the one aggregate call took 600ms, whereas the find() and count() calls now take 20ms.
The 16MB limit is no longer a problem either.
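The post does not show the replacement calls, so the following is only a sketch of what a find-plus-count version could look like with the Node.js driver. The helper names, the geo field name "position", and the choice of $geoWithin with $centerSphere are all assumptions. $geoWithin is used here because countDocuments() does not accept $near or $nearSphere, and because the results are sorted by createdAt rather than by distance.

```javascript
const EARTH_RADIUS_METERS = 6378100; // equatorial radius, used to convert meters to radians

// Hypothetical helper: build a "within maxDistanceMeters of [lng, lat]" filter.
// $centerSphere takes its radius in radians.
function buildWithinFilter(geoField, lng, lat, maxDistanceMeters) {
  return {
    [geoField]: {
      $geoWithin: {
        $centerSphere: [[lng, lat], maxDistanceMeters / EARTH_RADIUS_METERS],
      },
    },
  };
}

// Hypothetical helper: one page of newest-first results plus the total count.
// `collection` is assumed to be a Node.js driver Collection.
async function pageNewestWithin(collection, filter, skip, limit) {
  const [results, total] = await Promise.all([
    collection.find(filter).sort({ createdAt: -1 }).skip(skip).limit(limit).toArray(),
    collection.countDocuments(filter),
  ]);
  return { results, total };
}

// "position" is an assumed field name; the original post does not name the indexed field.
const filter = buildWithinFilter("position", 10, 30, 100000);
console.log(JSON.stringify(filter));
```

Calling pageNewestWithin(db.collection("items"), filter, 0, 20) would then return the first page and the total, assuming db is a connected Db instance.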

MongoDB $geoNear operation "near" field usage

I have a collection "machine" with the document
{
"_id" : "ac9d1db9-6a0d-47c6-97d3-a613c8dd0031",
"pin" : {
"machine":"test1",
"position" : [
-1.5716,
54.7732
]
}
}
Note: -1.5716 is lng and 54.7732 is lat
I have created a 2dsphere index on the document
db.machine.createIndex( { 'pin.position' : "2dsphere" } )
I tried 2 different versions of the query (the only difference is the near field in the $geoNear pipeline stage)
Query 1:
db.machine.aggregate(
[
{
"$geoNear":{
"near": [-0.2129092,51.5031594],
"limit":100,
"maxDistance":500*1000,
"distanceField": "dist.calculated",
"includeLocs": "dist.location",
"distanceMultiplier":1/1000,
"spherical": true
}
}
]
)
Note: -0.2129092 is lng and 51.5031594 is lat
Query 2
db.machine.aggregate(
[
{
"$geoNear":{
"near": { type: "Point", coordinates: [-0.2129092,51.5031594] },
"limit":100,
"maxDistance":500*1000,
"distanceField": "dist.calculated",
"includeLocs": "dist.location",
"distanceMultiplier":1/1000,
"spherical": true
}
}
]
)
Note: -0.2129092 is lng and 51.5031594 is lat
Query 1 returns the document and reports that it is 5.88161133560063e-05 km from the search coordinates.
Query 2 returns the document and reports that it is 375.135052595944 km from the search coordinates.
I cross-checked the distance between these lng/lat pairs at http://andrew.hedges.name/experiments/haversine/ and got around 374.835 km.
Query 2 seems to give the correct result, but I am not sure what the difference between Query 1 and Query 2 is, or whether I am using either incorrectly.
Please advise.
Query 1 passes near as a legacy coordinate pair, so with spherical: true the distance is calculated in radians. Query 2 passes near as a GeoJSON point, so the distance is calculated in meters. The two queries therefore report distances in different units.
See https://jira.mongodb.org/browse/SERVER-16652?jql=text%20~%20%22geoNear%22
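The two reported numbers can be reconciled with a little arithmetic. This is a sketch under one assumption: the Query 1 value is a distance in radians (legacy pair with spherical: true) that was then multiplied by the distanceMultiplier of 1/1000. The centralAngle helper is a hypothetical haversine implementation, included only as a cross-check.

```javascript
// Equatorial radius in km, the value commonly used for MongoDB radian conversions.
const EARTH_RADIUS_KM = 6378.1;

const toRadians = (deg) => (deg * Math.PI) / 180;

// Central angle in radians between two [lng, lat] points (haversine formula).
function centralAngle([lng1, lat1], [lng2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLng = toRadians(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * Math.asin(Math.sqrt(a));
}

// Undo the 1/1000 multiplier to recover radians, then convert radians to km.
const query1Radians = 5.88161133560063e-05 * 1000;
const query1Km = query1Radians * EARTH_RADIUS_KM;

// Independent cross-check from the coordinates in the question.
const haversineKm =
  centralAngle([-0.2129092, 51.5031594], [-1.5716, 54.7732]) * EARTH_RADIUS_KM;

console.log(query1Km.toFixed(3)); // 375.135
console.log(haversineKm.toFixed(1));
```

Under that assumption Query 1 and Query 2 agree once the units are aligned. Note that with a legacy pair, maxDistance is interpreted in radians as well, so 500*1000 no longer means 500 km.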

Mongo ordering a $near with another secondary sort

I have a list of shops, each with a useCount and a geolocation.
How would I search and order by useCount, while also having a property on each returned object that indicates how close it is to me?
schema:
{
name: String,
useCount: { type: Number, index: true },
location: { 'type': {type: String, enum: "Point", default: "Point"}, coordinates: { type: [Number], default: [0,0]} }
}
Example results:
shop1 usecount-12 closest-3 geo-1333.222,222.222
shop2 usecount-3 closest-1 geo-1333.222,222.222
shop3 usecount-1 closest-2 geo-1333.222,222.222
Presuming your data is actually properly arranged for MongoDB and looks something like this:
{
"shop": 1,
"usecount": 12,
"closest": 3,
"geo": {
"type": "Point",
"coordinates": [1333.222,222.222]
}
}
And your coordinates are in fact in "longitude/latitude" order, as is required by GeoJSON and MongoDB, and you have a "2dsphere" geospatial index, then your best option for a "composite sort" is the $geoNear aggregation pipeline stage, along with an aggregation $sort:
Model.aggregate(
[
{ "$geoNear": {
"near": {
"type": "Point",
"coordinates": [1333.222,222.222]
},
"distanceField": "dist",
"spherical": true
}},
{ "$sort": { "dist": 1, "usecount": -1 } }
],
function(err,results) {
}
)
Here $geoNear projects the distance into the nominated field "dist", and you then use that in the $sort along with the other field "usecount". As shown, results are ordered by "dist" first, and within each equal "dist" by "usecount" in descending order, so the "largest" value comes first.
The aggregation framework, through .aggregate(), does more than just "aggregate" documents. It is your "main tool" for projecting new values into a document, which is useful for things like sorting results by values that are "calculated" by one means or another.
Unlike $near (or $nearSphere), the distance is returned as a true field in the document rather than just a "default" sort order. This allows that key to be used in the sorted results, along with any other field value present or projected into the document at the $sort stage.
Also note that your data here does not appear to be valid spherical coordinates, which is going to cause problems with GeoJSON storage and with a "2dsphere" index. If these are not real global coordinates but coordinates on a "plane", then just use a plain legacy array for "geo", as in [1333.222,222.222], and a "2d" index only. The argument to "near" within $geoNear is then simply an array as well, and the "spherical" option is not required.
But that may just be a typo in your question.
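To see how the key order in $sort changes the outcome, here is a small in-memory sketch. It does not call MongoDB: bySpec is a hypothetical comparator builder that mimics a $sort specification, and the "dist" values are invented to match the "closest" ranks in the question.

```javascript
// Hypothetical shops as they might look after $geoNear has added "dist" (in meters).
const shops = [
  { shop: 1, usecount: 12, dist: 300 },
  { shop: 2, usecount: 3, dist: 100 },
  { shop: 3, usecount: 1, dist: 200 },
];

// Build a comparator from a $sort-like spec: keys are compared in the order given,
// 1 means ascending and -1 means descending.
function bySpec(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] !== b[field]) {
        return a[field] < b[field] ? -direction : direction;
      }
    }
    return 0;
  };
}

const nearestFirst = [...shops].sort(bySpec({ dist: 1, usecount: -1 })).map((s) => s.shop);
const mostUsedFirst = [...shops].sort(bySpec({ usecount: -1, dist: 1 })).map((s) => s.shop);

console.log(nearestFirst); // [ 2, 3, 1 ]
console.log(mostUsedFirst); // [ 1, 2, 3 ]
```

If the goal is to order by use count first while still keeping the distance on each document, putting "usecount" before "dist" in the $sort stage should give the second ordering.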