We have a query that gets the min and max latitude/longitude, and we use an aggregation query for it. The collection has 2 million documents.
We get the error below when running the aggregation query. How can we fix this? Will performance degrade if we use allowDiskUse: true? Or can we add an index that fixes this issue?
2021-04-02T23:57:16.682+0000 I COMMAND [conn2829719] command loc-service.locations command: aggregate { aggregate: "locations", pipeline: [ { $match: { customerId: "8047380094" } }, { $unwind: "$outdoorLocationInfo.location.coordinates" }, { $group: { _id: "$_id", longitude: { $first: "$outdoorLocationInfo.location.coordinates" }, latitude: { $last: "$outdoorLocationInfo.location.coordinates" } } }, { $group: { _id: null, minLongitude: { $min: "$longitude" }, maxLongitude: { $max: "$longitude" }, minLatitude: { $min: "$latitude" }, maxLatitude: { $max: "$latitude" } } } ], cursor: {}, allowDiskUse: false, $db: "loc-service", $clusterTime: { clusterTime: Timestamp(1617407827, 2), signature: { hash: BinData(0, F980F28628AF21C214BD2D3F4B7C48F56ACB47BD), keyId: 6914764447386959875 } }, lsid: { id: UUID("a6e20fee-7714-4460-bdc8-2019425c7ff0") } } planSummary: IXSCAN { customerId: 1, deviceId: 1 } numYields:7900 ok:0 errMsg:"Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in." errName:Location16945 errCode:16945 reslen:313 locks:{ Global: { acquireCount: { r: 8061 } }, Database: { acquireCount: { r: 8060 } }, Collection: { acquireCount: { r: 8060 } } } storage:{} protocol:op_msg 5448ms
The query
db.locations.aggregate([
{
$match: {
customerId: "8047380094"
}
},
{
$unwind: "$outdoorLocationInfo.location.coordinates"
},
{
$group: {
_id: "$_id",
longitude: {
$first: "$outdoorLocationInfo.location.coordinates"
},
latitude: {
$last: "$outdoorLocationInfo.location.coordinates"
}
}
},
{
$group: {
_id: null,
minLongitude: {
$min: "$longitude"
},
maxLongitude: {
$max: "$longitude"
},
minLatitude: {
$min: "$latitude"
},
maxLatitude: {
$max: "$latitude"
}
}
}
])
Indexes we have on this collection:
db.locations.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "loc-service.locations"
},
{
"v" : 2,
"key" : {
"customerId" : 1,
"deviceId" : 1
},
"name" : "customerId_1_deviceId_1",
"ns" : "loc-service.locations",
"sparse" : true,
"background" : true
},
{
"v" : 2,
"key" : {
"customerId" : 1,
"geoHash" : 1
},
"name" : "customerId_1_geoHash_1",
"ns" : "loc-service.locations",
"sparse" : true,
"background" : true
},
{
"v" : 2,
"key" : {
"customerId" : 1,
"outdoorLocationInfo.location" : "2dsphere"
},
"name" : "customerId_1_outdoorLocationInfo.location_2dsphere",
"ns" : "loc-service.locations",
"sparse" : true,
"background" : true,
"2dsphereIndexVersion" : 3
},
{
"v" : 2,
"key" : {
"customerId" : 1,
"outdoorLocationInfo.location.coordinates" : 1
},
"name" : "customerId_1_outdoorLocationInfo.location.coordinates_1",
"ns" : "loc-service.locations",
"sparse" : true,
"background" : true
}
]
Sample Data:
db.locations.findOne()
{
"_id" : ObjectId("60551b70a48edf83848607d2"),
"outdoorLocationInfo" : {
"location" : {
"type" : "Point",
"coordinates" : [
-95.330024,
36.262476
]
}
},
"customerId" : "2868306879",
"deviceId" : "6eN7sMEOP1e",
"geoHash" : "9yknq9qu1rqp"
}
Thanks
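I think you can simplify your query with $arrayElemAt. GeoJSON stores a point as [longitude, latitude], so reading index 0 and index 1 directly removes the need for the $unwind and for the per-document $group. A single $group with _id: null only has to track four running values, so it should stay well within the $group memory limit without needing allowDiskUse: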
db.locations.aggregate([
{
$match: {
customerId: "8047380094"
}
},
{
$group: {
_id: null,
"maxLatitude": {
"$max": {
"$arrayElemAt": [
"$outdoorLocationInfo.location.coordinates",
1
]
}
},
"maxLongitude": {
"$max": {
"$arrayElemAt": [
"$outdoorLocationInfo.location.coordinates",
0
]
}
},
"minLatitude": {
"$min": {
"$arrayElemAt": [
"$outdoorLocationInfo.location.coordinates",
1
]
}
},
"minLongitude": {
"$min": {
"$arrayElemAt": [
"$outdoorLocationInfo.location.coordinates",
0
]
}
},
}
}
])
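For illustration only, here is a plain JavaScript sketch (runnable in Node, no database) of what that single $group computes: longitude comes from index 0 and latitude from index 1 of each coordinates array, and only four running values are kept, so memory does not grow with the number of documents. The function name boundingBox, the in-memory docs array and the second and third sample points are hypothetical; the sketch also assumes every matching document has a coordinates array.

```javascript
// Hypothetical helper mirroring the $match + $group/$arrayElemAt pipeline.
// GeoJSON stores coordinates as [longitude, latitude].
function boundingBox(docs, customerId) {
  let minLongitude = Infinity, maxLongitude = -Infinity;
  let minLatitude = Infinity, maxLatitude = -Infinity;
  for (const doc of docs) {
    // equivalent of the $match stage
    if (doc.customerId !== customerId) continue;
    const coords = doc.outdoorLocationInfo.location.coordinates;
    const longitude = coords[0]; // $arrayElemAt: [..., 0]
    const latitude = coords[1];  // $arrayElemAt: [..., 1]
    minLongitude = Math.min(minLongitude, longitude);
    maxLongitude = Math.max(maxLongitude, longitude);
    minLatitude = Math.min(minLatitude, latitude);
    maxLatitude = Math.max(maxLatitude, latitude);
  }
  return { minLongitude, maxLongitude, minLatitude, maxLatitude };
}

// First point is from the question's sample document; the others are made up.
const docs = [
  { customerId: "2868306879", outdoorLocationInfo: { location: { type: "Point", coordinates: [-95.330024, 36.262476] } } },
  { customerId: "2868306879", outdoorLocationInfo: { location: { type: "Point", coordinates: [-97.5, 35.1] } } },
  { customerId: "other", outdoorLocationInfo: { location: { type: "Point", coordinates: [10, 50] } } },
];

console.log(boundingBox(docs, "2868306879"));
```

Because the state is four numbers regardless of input size, this is the same reason the rewritten pipeline avoids the memory error that the per-_id $group hit.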
How can I sort if my time (time_required) is saved in this format?
quiz_customer_record
{
"_id" : ObjectId("5f16eb4a5007bd5395c76ed9"),
"quiz_id" : "5f05bbd10cf3166085be68fc",
"user_id" : "5f06e0ddf718c04de30ea47f",
"name" : "ABC",
"time_required" : "0:6 Mins",
"questions_attempted" : 0,
"total_quiz_questions" : 1,
"attempt_date" : "2020-07-21T13:19:08.025Z"
},
{
"_id" : ObjectId("5f16eb5f5007bd5395c76edb"),
"quiz_id" : "5f05bbd10cf3166085be68fc",
"user_id" : "5f06e0ddf718c04de30ea47f",
"name" : "ABC",
"time_required" : "0:8 Mins",
"questions_attempted" : 0,
"total_quiz_questions" : 1,
"attempt_date" : "2020-07-21T13:19:29.377Z"
}
I want to sort by time_required, but it is stored as a string in Mins:Seconds format. Yes, it's pretty messed up, but is there a solution? I want to do this in a Mongo query because there are many records and I need to use limit (for pagination).
Expected result (sorted descending by time_required):
{
"_id" : ObjectId("5f16eb5f5007bd5395c76edb"),
"quiz_id" : "5f05bbd10cf3166085be68fc",
"user_id" : "5f06e0ddf718c04de30ea47f",
"name" : "ABC",
"time_required" : "0:8 Mins",
"questions_attempted" : 0,
"total_quiz_questions" : 1,
"attempt_date" : "2020-07-21T13:19:29.377Z"
},
{
"_id" : ObjectId("5f16eb4a5007bd5395c76ed9"),
"quiz_id" : "5f05bbd10cf3166085be68fc",
"user_id" : "5f06e0ddf718c04de30ea47f",
"name" : "ABC",
"time_required" : "0:6 Mins",
"questions_attempted" : 0,
"total_quiz_questions" : 1,
"attempt_date" : "2020-07-21T13:19:08.025Z"
}
The query I'm using is:
db.quiz_customer_record.aggregate([{ $match: { quiz_id:quiz_id}},
{
$sort: { attempt_date: -1 }
},
{
$group: {
_id: "$user_id",
result1: { $first: "$attempt_date" },
quiz_id: { $first: "$quiz_id" },
time_required: { $first: "$time_required" },
o_id: { $first: "$_id" }
}
},
{
$project: {
_id: "$o_id",
user_id: "$_id",
quiz_id:"$quiz_id",
time_required:"$time_required",
result1: 1
}
}
]).sort({time_required:-1})
Answer for MongoDB versions earlier than 4.2
$set was added in version 4.2; for earlier versions, $addFields can be used instead. Note that $toInt itself requires at least 4.0.
db.collection.aggregate([
{
"$addFields": {
"time_required_split": {
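// keep the "M:S" part before the space ("10:45 Mins" -> "10:45"); a fixed 3-character $substr breaks on two-digit values
$arrayElemAt: [
{ $split: ["$time_required", " "] },
0
]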
}
}
},
{
"$addFields": {
"time_required_split": {
$split: [
"$time_required_split",
":"
]
}
}
},
{
"$addFields": {
"time_seconds": {
$sum: [
{
"$multiply": [
{
$toInt: {
$arrayElemAt: [
"$time_required_split",
0
]
}
},
60
]
},
{
$toInt: {
$arrayElemAt: [
"$time_required_split",
1
]
}
}
]
}
}
},
{
"$sort": {
time_seconds: -1
}
},
{
"$project": {
"time_required_split": 0,
"time_seconds": 0
}
}
])
For MongoDB 4.2 and later, try this query:
db.collection.aggregate([
{
"$set": {
"time_required_split": {
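// keep the "M:S" part before the space ("10:45 Mins" -> "10:45"); a fixed 3-character $substr breaks on two-digit values
$arrayElemAt: [
{ $split: ["$time_required", " "] },
0
]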
}
}
},
{
"$set": {
"time_required_split": {
$split: [
"$time_required_split",
":"
]
}
}
},
{
"$set": {
"time_seconds": {
$sum: [
{
"$multiply": [
{
$toInt: {
$arrayElemAt: [
"$time_required_split",
0
]
}
},
60
]
},
{
$toInt: {
$arrayElemAt: [
"$time_required_split",
1
]
}
}
]
}
}
},
{
"$sort": {
time_seconds: -1
}
},
{
"$project": {
"time_required_split": 0,
"time_seconds": 0
}
}
])
Let me know if you don't understand any stage.
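As a sanity check, the string-to-seconds conversion these pipelines perform can be sketched in plain JavaScript. timeToSeconds is a hypothetical helper that assumes time_required always looks like "M:S Mins" (minutes, a colon, seconds, a space, then the unit); it splits on the space instead of taking a fixed number of characters, so two-digit values such as "10:45 Mins" also parse. In MongoDB itself, pagination would be $skip and $limit stages placed after $sort; the slice at the end only imitates that.

```javascript
// Hypothetical helper: "M:S Mins" -> total seconds.
function timeToSeconds(timeRequired) {
  const clock = timeRequired.split(" ")[0];    // "10:45 Mins" -> "10:45"
  const [minutes, seconds] = clock.split(":"); // ["10", "45"]
  return parseInt(minutes, 10) * 60 + parseInt(seconds, 10);
}

// The two sample records from the question (only the fields needed here).
const records = [
  { _id: "5f16eb4a5007bd5395c76ed9", time_required: "0:6 Mins" },
  { _id: "5f16eb5f5007bd5395c76edb", time_required: "0:8 Mins" },
];

// Descending by duration, like { $sort: { time_seconds: -1 } }
const sorted = [...records].sort(
  (a, b) => timeToSeconds(b.time_required) - timeToSeconds(a.time_required)
);

// Pagination, like { $skip: page * pageSize }, { $limit: pageSize }
const pageSize = 1;
const page = 0;
const firstPage = sorted.slice(page * pageSize, (page + 1) * pageSize);
console.log(firstPage);
```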
Please check this query:
db.billsummaryofthedays.aggregate([
{
'$match': {
'userId': ObjectId('5e43de778b57693cd46859eb'),
'adminId': ObjectId('5e43e5cdc11f750864f46820'),
'date': ISODate("2020-02-11T16:30:00Z"),
}
},
{
$lookup:
{
from: "paymentreceivables",
let: { userId: '$userId', adminId: '$adminId' },
pipeline: [
{
$match:
{
paymentReceivedOnDate:ISODate("2020-02-11T16:30:00Z"),
$expr:
{
$and:
[
{ $eq: ["$userId", "$$userId"] },
{ $eq: ["$adminId", "$$adminId"] }
]
}
}
},
{ $project: { amount: 1, _id: 0 } }
],
as: "totalPayment"
}
}, {'$unwind':'$totalPayment'},
{ $group:
{ _id:
{ date: '$date',
userId: '$userId',
adminId: '$adminId' },
totalBill:
{
$sum: '$billOfTheDay'
},
totalPayment:
{
$sum: '$totalPayment.amount'
}
}
}
])
this is the result i am getting in the shell
{
"_id" : {
"date" : ISODate("2020-02-11T18:30:00Z"),
"userId" : ObjectId("5e43de778b57693cd46859eb"),
"adminId" : ObjectId("5e43e5cdc11f750864f46820")
},
"totalBill" : 1595.6799999999998,
"totalPayment" : 100
}
Now, this is not what I expected. I assume that {'$unwind':'$totalPayment'} takes out all the values from the array, and because of that every document gets counted 2 times. When I remove {'$unwind':'$totalPayment'}, the totalBill sum is correct but totalPayment is 0.
I have tried several other ways but have not been able to achieve the desired result.
Below are my collections:
// collection:billsummaryofthedays//
{
"_id" : ObjectId("5e54f784f4032c1694535c0e"),
"userId" : ObjectId("5e43de778b57693cd46859eb"),
"adminId" : ObjectId("5e43e5cdc11f750864f46820"),
"date" : ISODate("2020-02-11T16:30:00Z"),
"UID":"acex01",
"billOfTheDay" : 468
}
{
"_id" : ObjectId("5e54f784f4032c1694535c0f"),
"UID":"bdex02",
"userId" : ObjectId("5e43de778b57693cd46859eb"),
"adminId" : ObjectId("5e43e5cdc11f750864f46820"),
"date" : ISODate("2020-02-11T16:30:00Z"),
"billOfTheDay" : 329.84
}
// collection:paymentreceivables//
{
"_id" : ObjectId("5e43e73169fe1e3fc07eb7c5"),
"paymentReceivedOnDate" : ISODate("2020-02-11T16:30:00Z"),
"adminId" : ObjectId("5e43e5cdc11f750864f46820"),
"userId" : ObjectId("5e43de778b57693cd46859eb"),
"amount" : 20,
}
{
"_id" : ObjectId("5e43e73b69fe1e3fc07eb7c6"),
"paymentReceivedOnDate" : ISODate("2020-02-11T16:30:00Z"),
"adminId" : ObjectId("5e43e5cdc11f750864f46820"),
"userId" : ObjectId("5e43de778b57693cd46859eb"),
"amount" : 30,
}
The desired result is totalBill: 797.84 (i.e. 468 + 329.84) and totalPayment: 50 (i.e. 30 + 20), but I am getting double the expected values, and when I do manage to calculate one value correctly, the other one comes out as 0. How can I tackle this?
Since you have multiple documents with the same userId, adminId and date in the billsummaryofthedays collection, you can $group them first and then do the $lookup. That way the join between the two collections is one-to-many rather than many-to-many as it is currently written (2 bills × 2 payments become 4 documents after $unwind, which is why both totals double). Try the query below for the desired output and better performance:
db.billsummaryofthedays.aggregate([
{
"$match": {
"userId": ObjectId("5e43de778b57693cd46859eb"),
"adminId": ObjectId("5e43e5cdc11f750864f46820"),
"date": ISODate("2020-02-11T16:30:00Z"),
}
},
{
$group: {
_id: {
date: "$date",
userId: "$userId",
adminId: "$adminId"
},
totalBill: {
$sum: "$billOfTheDay"
}
}
},
{
$lookup: {
from: "paymentreceivables",
let: {
userId: "$_id.userId",
adminId: "$_id.adminId"
},
pipeline: [
{
$match: {
paymentReceivedOnDate: ISODate("2020-02-11T16:30:00Z"),
$expr: {
$and: [
{
$eq: [
"$userId",
"$$userId"
]
},
{
$eq: [
"$adminId",
"$$adminId"
]
}
]
}
}
},
{
$project: {
amount: 1,
_id: 0
}
}
],
as: "totalPayment"
}
},
{
$addFields: {
totalPayment: {
$reduce: {
input: "$totalPayment",
initialValue: 0,
in: {
$add: [
"$$value",
"$$this.amount"
]
}
}
}
}
}
])
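To see why the original pipeline doubled both totals while grouping first does not, here is a plain JavaScript sketch over the sample documents from the question. The arrays stand in for the two collections (no database involved), and the ObjectIds are replaced by the hypothetical short strings "u1" and "a1". Joining each bill to every matching payment and then unwinding yields 2 × 2 = 4 rows; summing the bills first and then adding up the looked-up payment array, as the $reduce stage does, counts each value once.

```javascript
const bills = [
  { userId: "u1", adminId: "a1", billOfTheDay: 468 },
  { userId: "u1", adminId: "a1", billOfTheDay: 329.84 },
];
const payments = [
  { userId: "u1", adminId: "a1", amount: 20 },
  { userId: "u1", adminId: "a1", amount: 30 },
];
// Same join condition as the $expr/$and in the $lookup pipeline.
const sameKey = (x, y) => x.userId === y.userId && x.adminId === y.adminId;

// Original approach: $lookup per bill, then $unwind, then $group.
const unwound = bills.flatMap(bill =>
  payments.filter(p => sameKey(p, bill)).map(p => ({ ...bill, totalPayment: p }))
);
const naive = {
  rows: unwound.length,
  totalBill: unwound.reduce((sum, row) => sum + row.billOfTheDay, 0),
  totalPayment: unwound.reduce((sum, row) => sum + row.totalPayment.amount, 0),
};

// Answer's approach: $group the bills first, then $lookup once and $reduce the array.
const grouped = {
  userId: "u1",
  adminId: "a1",
  totalBill: bills.reduce((sum, b) => sum + b.billOfTheDay, 0),
};
const looked = payments.filter(p => sameKey(p, grouped));
grouped.totalPayment = looked.reduce((value, current) => value + current.amount, 0);

console.log(naive, grouped);
```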
I have a question about a MongoDB aggregation query that is very similar to "$unwind 2 fields separately in mongodb query".
This is the document:
{
"_id" : "1",
"details" : {
"phonenumber" : [
"1",
"2"
],
"name" : [
"a",
"b"
]
}
}
I am trying to write a query that returns the following result:
{ "_id" : "1", "phonenumber" : "1", "name" : null },
{ "_id" : "1", "phonenumber" : "2", "name" : null },
{ "_id" : "1", "phonenumber" : null, "name" : "a" },
{ "_id" : "1", "phonenumber" : null, "name" : "b" }
Could someone please help me with that?
The closest solution I could come up with is the following query:
db.document.aggregate( [ { $unwind: { path: "$details.name"} }, { $unwind: { path: "$details.phonenumber" } }, { $project: { _id: 1, name: "$details.name", phonenumber: "$details.phonenumber" } } ] )
The output of that query is:
{ "_id" : "1", "phonenumber" : "1", "name" : "a" },
{ "_id" : "1", "phonenumber" : "1", "name" : "b" },
{ "_id" : "1", "phonenumber" : "2", "name" : "a" },
{ "_id" : "1", "phonenumber" : "2", "name" : "b" }
With MongoDB 3.4 or later, one possible solution is:
db.document.aggregate({
'$facet': {
'phonenumber': [{
'$unwind': '$details.phonenumber'
}, {
'$project': {
phonenumber: '$details.phonenumber',
name: null
}
}],
'name': [{
'$unwind': '$details.name'
}, {
'$project': {
name: '$details.name',
phonenumber: null
}
}]
}
}, {
'$project': {
'combined': {
'$setUnion': ['$phonenumber', '$name']
}
}
}, {
'$unwind': '$combined'
}, {
'$replaceRoot': {
'newRoot': '$combined'
}
})
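$facet lets you run multiple aggregation pipelines within a single stage; it is available from version 3.4. Note that $setUnion treats the arrays as sets, so identical entries are collapsed and the order of the output is not guaranteed.
An alternative solution for earlier versions of MongoDB: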
db.document.aggregate([{
$unwind: {
path: "$details.name"
}
}, {
$group: {
_id: "$_id",
nameArr: {
$push: {
name: "$details.name",
phonenumber: {
$ifNull: ["$description", null]
}
}
},
"details": {
$first: "$details"
}
}
}, {
$unwind: "$details.phonenumber"
}, {
$group: {
_id: "$_id",
phoneArr: {
$push: {
phonenumber: "$details.phonenumber",
name: {
$ifNull: ["$description", null]
}
}
},
"nameArr": {
$first: "$nameArr"
}
}
}, {
$project: {
_id: 1,
value: {
$setUnion: ["$nameArr", "$phoneArr"]
}
}
}, {
$unwind: "$value"
}, {
$project: {
name: "$value.name",
phonenumber: "$value.phonenumber"
}
}])
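The idea behind both answers, unwinding each array on its own instead of as a cross product and then combining the two result lists, can be sketched in plain JavaScript (no database; splitUnwind is a hypothetical name). Unlike $setUnion, Array.prototype.concat keeps duplicates and preserves order.

```javascript
// Hypothetical helper: unwind two arrays separately, then concatenate the rows.
function splitUnwind(doc) {
  // one row per phone number, with name set to null
  const phoneRows = doc.details.phonenumber.map(phonenumber => ({ _id: doc._id, phonenumber, name: null }));
  // one row per name, with phonenumber set to null
  const nameRows = doc.details.name.map(name => ({ _id: doc._id, phonenumber: null, name }));
  return phoneRows.concat(nameRows);
}

// The document from the question.
const doc = { _id: "1", details: { phonenumber: ["1", "2"], name: ["a", "b"] } };
console.log(splitUnwind(doc));
```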
I have a highly nested set of MongoDB objects, and I want to sort the subdocuments according to the sum of their votes. For example:
{
"_id":"17846384es",
"company_name":"company1",
"products":[
{
"product_id":"123785",
"product_name":"product1",
"user_votes":[
{
"user_id":1,
"vote":1
},
{
"user_id":2,
"vote":2
}
]
},
{
"product_id":"98765",
"product_name":"product2",
"user_votes":[
{
"user_id":5,
"vote":3
},
{
"user_id":3,
"vote":3
}
]
}
]
}
I want to sort the products in descending order by the sum of their votes.
The expected output is:
{
"_id":"17846384es",
"company_name":"company1",
"products":[
{
"product_id":"98765",
"product_name":"product2",
"user_votes":[
{
"user_id":5,
"vote":3
},
{
"user_id":3,
"vote":3
}
],
"votes":6
},
{
"product_id":"123785",
"product_name":"product1",
"user_votes":[
{
"user_id":1,
"vote":1
},
{
"user_id":2,
"vote":2
}
],
"votes":3
}
]
}
Any ideas?
The following pipeline
db.products.aggregate([
{ $unwind: "$products" },
{
$project: {
company_name: 1,
products: 1,
totalVotes: {
$sum: "$products.user_votes.vote"
}
}
},
{ $sort: { totalVotes: -1 } },
{
$group: {
_id: "$_id",
company_name: { $first: "$company_name" },
products: { $push: "$products" }
}
}
])
would output
{
"_id" : "17846384es",
"company_name" : "company1",
"products" : [
{
"product_id" : "98765",
"product_name" : "product2",
"user_votes" : [
{
"user_id" : 5,
"vote" : 3
},
{
"user_id" : 3,
"vote" : 3
}
]
},
{
"product_id" : "123785",
"product_name" : "product1",
"user_votes" : [
{
"user_id" : 1,
"vote" : 1
},
{
"user_id" : 2,
"vote" : 2
}
]
}
]
}
If you want to keep the sum of the votes at the product level, as shown in your expected output, modify the $project stage as follows. Because totalVotes is no longer projected, the $sort stage must then sort on products.votes instead:
...
{
$project: {
company_name: 1,
products: {
product_id: 1,
product_name: 1,
user_votes: 1,
votes: { $sum: "$products.user_votes.vote" }
}
}
},
{ $sort: { "products.votes": -1 } },
...
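The same reshaping can be sketched in memory with plain JavaScript, using the sample document from the question (with _id quoted as a string): compute each product's vote total, sort descending, and keep the product list on the company document. sortProductsByVotes is a hypothetical name; the map/sort chain stands in for the $unwind, $project ($sum), $sort and $group/$push stages.

```javascript
// Hypothetical helper mirroring $unwind -> $project ($sum) -> $sort -> $group/$push.
function sortProductsByVotes(company) {
  const products = company.products
    // attach each product's total, like votes: { $sum: "$products.user_votes.vote" }
    .map(p => ({ ...p, votes: p.user_votes.reduce((sum, uv) => sum + uv.vote, 0) }))
    // highest total first, like { $sort: { "products.votes": -1 } }
    .sort((a, b) => b.votes - a.votes);
  return { ...company, products };
}

const company = {
  _id: "17846384es",
  company_name: "company1",
  products: [
    { product_id: "123785", product_name: "product1",
      user_votes: [{ user_id: 1, vote: 1 }, { user_id: 2, vote: 2 }] },
    { product_id: "98765", product_name: "product2",
      user_votes: [{ user_id: 5, vote: 3 }, { user_id: 3, vote: 3 }] },
  ],
};

const result = sortProductsByVotes(company);
console.log(result.products.map(p => [p.product_name, p.votes]));
```

The map creates a new array before sorting, so the input document is left unchanged.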
I have an aggregation that groups on a date and creates a sum.
db.InboundWorkItems.aggregate({
$match: {
notificationDate: {
$gte: ISODate("2013-07-18T04:00:00Z")
},
dropType: 'drop'
}
}, {
$group: {
_id: {
notificationDate: "$notificationDate"
},
nd: {
$first: "$notificationDate"
},
count: {
$sum: 1
}
}
}, {
$sort: {
nd: 1
}
})
The output is
"result" : [
{
"_id" : {
"notificationDate" : ISODate("2013-07-18T04:00:00Z")
},
"nd" : ISODate("2013-07-18T04:00:00Z"),
"count" : 484
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-19T04:00:00Z")
},
"nd" : ISODate("2013-07-19T04:00:00Z"),
"count" : 490
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-20T04:00:00Z")
},
"nd" : ISODate("2013-07-20T04:00:00Z"),
"count" : 174
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-21T04:00:00Z")
},
"nd" : ISODate("2013-07-21T04:00:00Z"),
"count" : 6
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-22T04:00:00Z")
},
"nd" : ISODate("2013-07-22T04:00:00Z"),
"count" : 339
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-23T04:00:00Z")
},
"nd" : ISODate("2013-07-23T04:00:00Z"),
"count" : 394
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-24T04:00:00Z")
},
"nd" : ISODate("2013-07-24T04:00:00Z"),
"count" : 17
}
],
"ok" : 1
So far so good. What I need to do now is keep this, but also add a distinct to the criteria (for argument's sake, I want to use AccountId). That would yield me the count of the grouped dates only using distinct AccountId. Is distinct even possible within the aggregation framework?
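You can use two $group stages in the pipeline: the first groups by the (AccountId, notificationDate) pair so that each account appears at most once per date, and the second does the usual per-date count. Something like this:
db.InboundWorkItems.aggregate(
{$match: {notificationDate: {$gte: ISODate("2013-07-18T04:00:00Z")}, dropType: 'drop' }},
// one document per distinct (AccountId, notificationDate) pair
{$group: {_id: {accountId: "$AccountId", notificationDate: "$notificationDate"}}},
// count the distinct accounts for each date
{$group: {_id: "$_id.notificationDate", nd: {$first: "$_id.notificationDate"}, count: {$sum: 1} }},
{$sort: {nd: 1}} )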
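The two-stage grouping can be sketched in plain JavaScript: first collapse the rows to distinct (date, AccountId) pairs, then count the pairs per date. distinctAccountsPerDate is a hypothetical name, the sample rows are made up for illustration, and dates are plain ISO strings rather than ISODate values.

```javascript
// Hypothetical helper: count distinct AccountId values per notificationDate.
function distinctAccountsPerDate(items) {
  // Stage 1, like { $group: { _id: { accountId: "$AccountId", notificationDate: "$notificationDate" } } }
  const pairs = new Set(items.map(i => JSON.stringify([i.notificationDate, i.AccountId])));
  // Stage 2, like { $group: { _id: "$_id.notificationDate", count: { $sum: 1 } } }
  const counts = new Map();
  for (const key of pairs) {
    const [date] = JSON.parse(key);
    counts.set(date, (counts.get(date) || 0) + 1);
  }
  // Like { $sort: { nd: 1 } }; ISO date strings sort chronologically as text
  return [...counts.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([nd, count]) => ({ nd, count }));
}

const items = [
  { notificationDate: "2013-07-18", AccountId: "A" },
  { notificationDate: "2013-07-18", AccountId: "A" },
  { notificationDate: "2013-07-18", AccountId: "B" },
  { notificationDate: "2013-07-19", AccountId: "A" },
];
console.log(distinctAccountsPerDate(items));
```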
Alternatively, if each account should be counted only once, on its latest notificationDate:
db.InboundWorkItems.aggregate({
$match: {
notificationDate: {
$gte: ISODate("2013-07-18T04:00:00Z")
},
dropType: 'drop'
}
}, {
$group: {
_id: "$AccountId",
notificationDate: {
$max: "$notificationDate"
},
dropType: {
$max: "$dropType"
}
}
}, {
$group: {
_id: {
notificationDate: "$notificationDate"
},
nd: {
$first: "$notificationDate"
},
count: {
$sum: 1
}
}
}, {
$sort: {
nd: 1
}
})
I think you might actually be looking for a single $group (the wording of the question is a bit ambiguous), like so:
db.InboundWorkItems.aggregate({
$match: {
notificationDate: {
$gte: ISODate("2013-07-18T04:00:00Z")
},
dropType: 'drop'
}
}, {
$group: {
_id: {
notificationDate: "$notificationDate", accountId: '$AccountId'
},
nd: {
$first: "$notificationDate"
},
count: {
$sum: 1
}
}
}, {
$sort: {
nd: 1
}
})
I added the compound _id in the $group because of this sentence:
That would yield me the count of the grouped dates only using distinct AccountId.
That makes me think you want the grouped date count broken down by account ID.