MongoDB: Trying to get an explanation of an aggregation

I am trying to get information about an aggregation I run on MongoDB 3.2.5, using {explain: true} to see whether my indexes are used:
db.mycollection.aggregate([{
"$geoNear": {
"near": {
type: "Point",
coordinates: [2.48043, 49.14128]
},
"spherical": true,
"distanceField": "distance",
"maxDistance": 500
}
}, {
"$match": {
"date": {
$gt: new ISODate("2016-01-01T01:01:01Z")
}
}
}, {
"$sort": {
"score": -1,
"distance": 1
}
}], {
explain: true
});
As a result, I only get the pipeline stages back:
{
"waitedMS": NumberLong(0),
"stages": [{
"$geoNear": {
"near": {
"type": "Point",
"coordinates": [
2.48043,
49.14128
]
},
"distanceField": "distance",
"limit": NumberLong(100),
"maxDistance": 500,
"query": {
},
"spherical": true,
"distanceMultiplier": 1
}
}, {
"$match": {
"date": {
"$gt": ISODate("2016-01-01T01:01:01Z")
}
}
}, {
"$sort": {
"sortKey": {
"score": -1,
"distance": 1
}
}
}],
"ok": 1
}
I don't get any information about documents scanned, indexes used, and so on.
Can someone help me, please?

As per the manual:
The operation returns a cursor with the document that contains
detailed information regarding the processing of the aggregation
pipeline. For example, the document may show, among other details,
which index, if any, the operation used.
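So the documentation is somewhat ambiguous. $geoNear has to stay the first stage of the pipeline, so it cannot literally be swapped with $match, but if you move the date condition into the query option of $geoNear, explain will probably show something more.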
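A minimal sketch of that idea, assuming the date filter is moved into the query option of $geoNear (the option already appears, empty, in the explain output above). The helper name buildNearPipeline is hypothetical, and new Date(...) stands in for ISODate(...) so the snippet also runs outside the mongo shell. Whether explain then reports index details on 3.2.5 is not confirmed here.

```javascript
// Hypothetical helper: builds the pipeline with the date filter inside $geoNear.
function buildNearPipeline(since) {
  return [
    {
      $geoNear: {
        near: { type: "Point", coordinates: [2.48043, 49.14128] },
        spherical: true,
        distanceField: "distance",
        maxDistance: 500,
        // Filter applied by $geoNear itself instead of by a later $match stage
        query: { date: { $gt: since } }
      }
    },
    { $sort: { score: -1, distance: 1 } }
  ];
}

const nearPipeline = buildNearPipeline(new Date("2016-01-01T01:01:01Z"));

// In the mongo shell (not run here):
// db.mycollection.aggregate(nearPipeline, { explain: true });
```

The pipeline is built as a plain array so it can be inspected before it is sent to the server.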

Related

How can I delete duplicates documents across multiple fields in Mongodb or Pymongo

I have billions of documents, each including a geometry field, in a collection, like this:
Doc1:
{
"_id": {
"$oid": "61ea9daff9a37e64d24099c2"
},
"mobile_ad_id": "6122d81b-750b-4cf4-9dc0-d779294f514a",
"Date": "2021-11-19",
"Time": "19:50:55",
"geometry": {
"type": "Point",
"coordinates": [72.910606, 19.09972]
},
"ipv_4": "103.251.50.0",
"publisher": "1077c92082522992f0adcd46b31a51eb"
}
Doc2:
{
"_id": {
"$oid": "61ea9daff9a37e64d24099c3"
},
"mobile_ad_id": "6122d81b-750b-4cf4-9dc0-d779294f514a",
"Date": "2021-11-19",
"Time": "19:50:55",
"geometry": {
"type": "Point",
"coordinates": [72.910606, 19.09972]
},
"ipv_4": "103.251.51.0",
"publisher": "1077c92082522992f0adcd46b31a53eb"
}
I need to find and delete the duplicate documents based on "mobile_ad_id", "Date", "Time", and "geometry".
So instead of two documents I will have only one.
I need to run this for billions of entries in the collection, so an optimized solution would be ideal.
Find the duplicate documents with $group.
Use $slice on id_List to keep only the ids that you actually want to remove.
Use $limit to process part of the data per aggregation.
Remove that part of the data using the ids from the previous step, and check that the documents were removed correctly.
You can raise the $limit once you know how long these operations take and the time spent is acceptable to you.
db.collection.aggregate([
{
$group: {
_id: {
mobile_ad_id: "$mobile_ad_id",
Date: "$Date",
Time: "$Time",
geometry: "$geometry"
},
id_List: { $push: "$_id" },
count: { $sum: 1 }
}
},
{
$match: { count: { $gt: 1 } }
},
{
$set: {
id_List: { $slice: [ "$id_List", { $subtract: [ { $size: "$id_List" }, 1 ] } ] }
}
},
{
$limit: 1000
}
])
db.collection.remove( { _id: { $in: id_List } } )
I think you are working with IoT devices. Maybe you don't need to remove the duplicates at all. Feel free to share the query that is bothering you, if its performance is bad because of the duplicate documents.
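The remove call above needs a flat array of ids, while the aggregation returns one id_List per duplicate group. A small sketch of the glue step, assuming the groups have been read into an array; idsToDelete and sampleGroups are hypothetical names, and the sample ids are shortened stand-ins rather than real ObjectIds.

```javascript
// Collects every _id that the pipeline marked for deletion.
// `groups` is assumed to be the array returned by the aggregation above,
// where each id_List already excludes the one document to keep.
function idsToDelete(groups) {
  const ids = [];
  for (const group of groups) {
    for (const id of group.id_List) {
      ids.push(id);
    }
  }
  return ids;
}

// Stand-in data shaped like the aggregation output.
const sampleGroups = [
  { _id: { mobile_ad_id: "a" }, id_List: ["id1", "id2"], count: 3 },
  { _id: { mobile_ad_id: "b" }, id_List: ["id3"], count: 2 }
];
const toDelete = idsToDelete(sampleGroups);

// In the mongo shell (not run here), one batch could then look like this,
// where `pipeline` is the aggregation shown above:
// var groups = db.collection.aggregate(pipeline, { allowDiskUse: true }).toArray();
// db.collection.deleteMany({ _id: { $in: idsToDelete(groups) } });
```

On a collection of this size the $group stage will likely need allowDiskUse, and each batch should be checked before the next one is run.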

MongoDB Conditional Projection

I would like to conditionally leave out fields in a response.
I have an aggregation query which uses $geoNear to find the nearest POI, and I would like to retrieve all the information only if the distance between the query point and the POI is less than 500.
Let's assume I want to leave out "someField" if the distance is less than or equal to 500.
Here is what I have come up with:
db.pois.aggregate([
{
"$geoNear": {
"near": {
type: "Point",
coordinates: [49.607857, 6.129143]
},
"maxDistance": 0.5 * 1000,
"spherical": true,
"distanceField": "distance"
}
}, {
$project: {
_id:0,
"someField": {
$cond: [{$lte: ["$distance", 500]}, 1, 0 ]
}
}
}
]).pretty()
But instead of leaving the field out of the response, this query somehow replaces the value of "distance" with 0 or 1.
I would appreciate any help.
Starting in MongoDB 3.6, you can use the $$REMOVE variable in aggregation expressions to conditionally suppress a field.
Query:
db.pois
.aggregate([
{
$geoNear: {
near: {
type: "Point",
coordinates: [49.607857, 6.129143]
},
maxDistance: 0.5 * 1000,
spherical: true,
distanceField: "distance"
}
},
{
$project: {
_id: 0,
someField: {
$cond: {
if: { $lte: ["$distance", 500] },
then: "$$REMOVE",
else: "$someField"
}
}
}
}
])
.pretty();
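The same $project stage can be generated for any field and threshold. The sketch below is only an illustration: projectUnlessNear is a hypothetical helper, and in the else branch it passes the field's own value through, which is an assumption about what the question wants to keep.

```javascript
// Hypothetical helper: returns a $project stage that omits `field`
// when "$distance" is at or below `threshold`, and keeps the field's
// own value otherwise.
function projectUnlessNear(field, threshold) {
  return {
    $project: {
      _id: 0,
      [field]: {
        $cond: {
          if: { $lte: ["$distance", threshold] },
          then: "$$REMOVE", // system variable, MongoDB 3.6+
          else: "$" + field
        }
      }
    }
  };
}

const removeStage = projectUnlessNear("someField", 500);

// In the mongo shell (not run here), the stage would follow $geoNear:
// db.pois.aggregate([ geoNearStage, removeStage ])
```

Here geoNearStage stands for the $geoNear stage from the query above.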

How to use $geoNear with $lookup in mongodb aggregation

In my MongoDB aggregation query, I am using $lookup to join my offers collection with the outlets collection. The outlets collection has a field named location, and I want the query to sort the results from closest to farthest from that location. How can I use $geoNear with $lookup? Any help would be appreciated. Below is my query:
db.offers.aggregate([
{
$geoNear: {
near: {
type: "Point",
coordinates: [
22,
77
]
},
distanceField: "distance",
maxDistance: 5000,
spherical: true
}
},
{
$match: {
$and: [
{
'totalDiscount': {
$gt: 40
}
},
{
'totalDiscount': {
$lt: 60
}
}
]
}
},
{
$unwind: "$storeUuid"
},
{
$lookup: {
from: "outlets",
localField: "storeUuid",
foreignField: "uuid",
as: "store"
}
},
{
$project: {
_id: 0,
location1: {
$arrayElemAt: [
"$store.location",
0
]
}
}
},
{
$addFields: {
'location.latitude': {
$ifNull: [
{
$arrayElemAt: [
"$location1.coordinates",
1
]
},
0
]
},
'location.longitude': {
$ifNull: [
{
$arrayElemAt: [
"$location1.coordinates",
0
]
},
0
]
}
}
},
{
$sort: {
location: 1
}
}
])
Offer data model
{
"offerId": "6e9d595a-16ad-4c6c-93d9-a7edc2bbb56f",
"brandUuid": [
"5b198438-8b4c-46f0-8cc2-6a938cb41d8e"
],
"storeUuid": [
"33ca653e-2af0-4728-b4a0-1178565c2b40",
"1b383916-8856-4f5a-8761-4bd4585e1d71"
],
"totalDiscount": 50
}
Outlet data model
{
"uuid": "20389cc1-2791-4d7b-a603-75b7abd6d48a",
"location": {
"type": "Point",
"coordinates": [
77.6504768,
12.9176082
]
}
}
EDIT: Based on Waqas Noor's answer
Actual Result
{
"offers": [
{
"uuid": "33ca653e-2af0-4728-b4a0-1178565c2b40",
"distance": 2780.7979952350124,
"offerId": "6e9d595a-16ad-4c6c-93d9-a7edc2bbb56f"
},
{
"uuid": "b4768792-a927-4d65-91a3-8ad67ad217b2",
"distance": 3930.1660094190306,
"offerId": "4f71fe98-cb43-4134-b360-b32017981de1"
},
{
"uuid": "1dbac2d2-b326-4d6d-8d74-9df99f35f542",
"distance": 3973.3702922423313,
"offerId": "070b916c-dd4d-42b4-b886-74318f576ffb"
},
{
"uuid": "20389cc1-2791-4d7b-a603-75b7abd6d48a",
"distance": 4107.770111767324,
"offerId": "0f037c18-a58f-4b03-b0f4-db8e2d971b74"
},
{
"uuid": "20389cc1-2791-4d7b-a603-75b7abd6d48a",
"distance": 4107.770111767324,
"offerId": "070b916c-dd4d-42b4-b886-74318f576ffb"
},
{
"uuid": "2f968cfa-1bf1-4344-bc73-998f4974f58a",
"distance": 4165.187832520325,
"offerId": "4f71fe98-cb43-4134-b360-b32017981de1"
},
{
"uuid": "3cc1461f-f29b-4744-a540-69d24ebb98a8",
"distance": 4262.636071210964,
"offerId": "0f037c18-a58f-4b03-b0f4-db8e2d971b74"
},
{
"uuid": "3cc1461f-f29b-4744-a540-69d24ebb98a8",
"distance": 4262.636071210964,
"offerId": "070b916c-dd4d-42b4-b886-74318f576ffb"
},
{
"uuid": "1b383916-8856-4f5a-8761-4bd4585e1d71",
"distance": 4361.786323018647,
"offerId": "6e9d595a-16ad-4c6c-93d9-a7edc2bbb56f"
},
{
"uuid": "7af0e1f8-d4d6-4700-adea-1df07a029f56",
"distance": 4564.666204168865,
"offerId": "8bbb5e27-89ff-417f-8312-f70e3911cb4c"
}
]
}
Expected Result
{
"offers": [
{
"uuid": "33ca653e-2af0-4728-b4a0-1178565c2b40",
"distance": 2780.7979952350124,
"offerId": "6e9d595a-16ad-4c6c-93d9-a7edc2bbb56f"
},
{
"uuid": "b4768792-a927-4d65-91a3-8ad67ad217b2",
"distance": 3930.1660094190306,
"offerId": "4f71fe98-cb43-4134-b360-b32017981de1"
},
{
"uuid": "1dbac2d2-b326-4d6d-8d74-9df99f35f542",
"distance": 3973.3702922423313,
"offerId": "070b916c-dd4d-42b4-b886-74318f576ffb"
},
{
"uuid": "20389cc1-2791-4d7b-a603-75b7abd6d48a",
"distance": 4107.770111767324,
"offerId": "0f037c18-a58f-4b03-b0f4-db8e2d971b74"
},
{
"uuid": "2f968cfa-1bf1-4344-bc73-998f4974f58a",
"distance": 4165.187832520325,
"offerId": "4f71fe98-cb43-4134-b360-b32017981de1"
},
{
"uuid": "3cc1461f-f29b-4744-a540-69d24ebb98a8",
"distance": 4262.636071210964,
"offerId": "0f037c18-a58f-4b03-b0f4-db8e2d971b74"
},
{
"uuid": "1b383916-8856-4f5a-8761-4bd4585e1d71",
"distance": 4361.786323018647,
"offerId": "6e9d595a-16ad-4c6c-93d9-a7edc2bbb56f"
},
{
"uuid": "7af0e1f8-d4d6-4700-adea-1df07a029f56",
"distance": 4564.666204168865,
"offerId": "8bbb5e27-89ff-417f-8312-f70e3911cb4c"
}
]
}
1) You need to have a 2dsphere index on the location field of the outlet collection.
You can create one using:
db.outlet.createIndex( {location : "2dsphere" } )
2) You have to run the aggregation on the outlet collection, since it contains the location field and $geoNear can only be used as the first stage of a pipeline.
Your query will look like this:
db.outlet.aggregate([
{
$geoNear: {
near: { type: "Point", coordinates: [ 77.6504768,
12.9176088] },
distanceField: "distance",
includeLocs: "location",
spherical: true
}
}])
3) Then you can combine the offers with your outlets using the $lookup operator.
Your complete query will look something like this:
db.outlet.aggregate([
{
$geoNear: {
near: {
type: "Point", coordinates: [77.6504768,
12.9176088]
},
distanceField: "distance",
includeLocs: "location",
spherical: true
}
},
{ $project: { uuid: 1, distance: 1 } },
{
$lookup: {
from: "offers",
localField: "uuid",
foreignField: "storeUuid",
as: "offers"
}
},
{ $unwind: '$offers' },
{
$match: {
'offers.totalDiscount': {
$gt: 40,
$lt: 60
}
}
},
{ $sort: { distance: 1 } }
])

Is there a way to skip/bypass a Mongodb aggregation stage

I am having a problem with a MongoDB query. My query to the MongoDB database is:
Places.aggregate([
// Stage-A
{ "$geoNear": {
"near": {
"type": "Point",
"coordinates": coord
},
"spherical": true,
"maxDistance" : maxDistance,
"query": {tags:{$all:tags}},
"limit": resultLimit,
"distanceField": "distance"
}
},
// Stage-B: check whether Stage-A has found any documents
// Stage-C
// Stage-D
])
I want to check whether Stage-A has found any results. If not, I want to move to Stage-C with a different query. If Stage-A finds results, I want to move directly to the final Stage-D. My hypothetical aggregate query looks like this:
Places.aggregate([
// Stage-A
{
"$geoNear": {
"near": {
"type": "Point",
"coordinates": coord
},
"spherical": true,
"maxDistance" : maxDistance,
"query": {tags:{$all:tags}},
"limit": resultLimit,
"distanceField": "distance"
}
},
// Stage-B
{
// In this stage, I want to check whether Stage-A has found any results.
// If it has, I want to move to the last stage, Stage-D; otherwise to Stage-C and then finally Stage-D.
},
// Stage-C
{
"$geoNear": {
"near": {
"type": "Point",
"coordinates": coord
},
"spherical": true,
"maxDistance" : maxDistance,
"query": {tags:{$in:tags}}, // different query
"limit": resultLimit,
"distanceField": "distance"
}
},
// Stage-D
{
$project:{
}
}
])
I was wondering whether this logical flow is possible in a MongoDB aggregation, or in some other way. If so, what would Stage-B be? If not, what is the best way to solve this problem? I want to do all of this in a single request, so that I can save some CPU cycles.

Running mongodb $geoWithin without a [long,lat] array

I have a MongoDB $geoWithin query as follows:
db.test.find(
{
'loc': {
$geoWithin: {
$geometry: {
type : "Polygon" ,
coordinates: [[list of co-ordinates]]
}
}
}
}
);
So here the query runs on the loc field, which is an array of [lng, lat] values, but unfortunately in my data the lat and lng values are in two different fields, like:
{
lat:12,
long:122
}
In this case how can I run the above query?
The best thing you can do really is to transform your documents to store the data better. By preference you should probably go for the GeoJSON format. But more on that later.
Fortunately, since $geoWithin does not actually "require" an index (though it really is still the better option to have one), you can do the transformation "on the fly" with the aggregation framework instead:
Transform "On the Fly"
Assuming you have at least MongoDB 2.6, there is $map:
db.collection.aggregate([
// Transform to array
{ "$project": {
"location": {
"$map": {
"input": ["lng","lat"],
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "lng" ] },
"$long",
"$lat"
]
}
}
}
}},
// Then match
{ "$match": {
"location": {
"$geoWithin": {
"$geometry": {
"type": "Polygon" ,
"coordinates": [[list of co-ordinates]]
}
}
}
}}
])
MongoDB 3.2 has a much simpler syntax:
db.collection.aggregate([
// Transform to array - pretty simple huh!
{ "$project": {
"location": ["$long","$lat"]
}},
// Then match
{ "$match": {
"location": {
"$geoWithin": {
"$geometry": {
"type": "Polygon" ,
"coordinates": [[list of co-ordinates]]
}
}
}
}}
])
Or if you still have MongoDB 2.4 - Upgrade! Okay, use this then:
db.collection.aggregate([
// Add an array field
{ "$project": {
"long": 1,
"lat": 1,
"location": { "$const": [ "A", "B" ] }
}},
// Unwind it
{ "$unwind": "$location" },
// Group back and map it!
{ "$group": {
"_id": "$_id",
"location": {
"$push": {
"$cond": [
{ "$eq": [ "$location", "A" ] },
"$long",
"$lat"
]
}
}
}},
// Then match
{ "$match": {
"location": {
"$geoWithin": {
"$geometry": {
"type": "Polygon" ,
"coordinates": [[list of co-ordinates]]
}
}
}
}}
])
Transform "Permanently"
But really the best case is to change the document structure permanently. The modern way to do this is in bulk with something like:
var ops = [];
db.collection.find({}).forEach(function(doc) {
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$set": {
"location": {
"type": "Point",
"coordinates": [doc.long,doc.lat]
}
},
"$unset": { "long": "", "lat": "" }
}
}
});
// Send once in 1000 only
if ( ops.length % 1000 == 0 ) {
db.collection.bulkWrite(ops);
ops = [];
}
})
// Clear remaining queue
if ( ops.length > 0 )
db.collection.bulkWrite(ops);
But generally speaking, you loop over the source documents and update each one to create the new "location" field. Then of course "index" it:
db.collection.createIndex({ "location": "2dsphere" })
Now that the documents have this structure and actually have an index, you can use regular $geoWithin queries directly, which will also run faster thanks to the index.
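The per-document conversion inside the bulk loop can be pulled out into small functions, which makes it easy to check on a sample document before touching the collection. This is only a sketch: toGeoJsonPoint and toUpdateOp are hypothetical names, and the field names lat and long are taken from the question.

```javascript
// Builds a GeoJSON Point from separate fields.
// GeoJSON coordinate order is [longitude, latitude].
function toGeoJsonPoint(doc) {
  return { type: "Point", coordinates: [doc.long, doc.lat] };
}

// Builds one bulkWrite operation that adds "location"
// and drops the old "long" and "lat" fields.
function toUpdateOp(doc) {
  return {
    updateOne: {
      filter: { _id: doc._id },
      update: {
        $set: { location: toGeoJsonPoint(doc) },
        $unset: { long: "", lat: "" }
      }
    }
  };
}

// Sample document shaped like the one in the question (the _id is made up).
const sampleOp = toUpdateOp({ _id: 1, lat: 12, long: 122 });

// In the mongo shell (not run here), inside the forEach loop above:
// ops.push(toUpdateOp(doc));
```

Keeping the conversion separate from the batching logic means the coordinate order can be verified once, in one place.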