Mongo aggregation pipeline : How can I access one group stage variable in subsequent stage? - mongodb

I have two collections here :
> db.Unit.find({}).limit(1).pretty()
{
"_id" : ObjectId("58b6878de1648d05903e1beb"),
"number" : "07in15in4",
"owner" : {
"name" : "vacant"
},
"floor" : 15,
"tower" : 4,
"VisitorLogs" : [
ObjectId("58b6878ee1648d05903e1d22"),
ObjectId("58b6878ee1648d05903e236e"),
ObjectId("58b6878ee1648d05903e23c9"),
ObjectId("58b6878ee1648d05903e2454"),
ObjectId("58b6878ee1648d05903e2915"),
ObjectId("58b6878ee1648d05903e2aae"),
ObjectId("58b6878ee1648d05903e2b93")
]
}
and
> db.VisitorLog.find({}).limit(1).pretty()
{ "_id" : ObjectId("58b6878fe1648d05903e2efa"), "purpose" : "WorkHere" }
VisitorLogs in the Unit collection refers to the second collection, i.e. VisitorLog.
I need to find the the units which have the maximum number of visits for each purpose.
I tried this in mongo shell:
units=db.Unit
units.aggregate([
{$match:{'VisitorLogs':{$gt:[]}}},
{$unwind:'$VisitorLogs'},
{$lookup:{
from:"VisitorLog",
localField:"VisitorLogs",
foreignField:"_id",
as:"log"}
},
{$group:{_id:{number:"$number", purpose:"$log.purpose"}, count:{$sum:1}}},
{$sort:{count:-1}},
{$group:{_id:"$_id.purpose", unit:{$first:"$_id.number"}}},
{$limit:10}
])
I get the following result :
{ "_id" : [ "Business" ], "unit" : "05in12in2" }
{ "_id" : [ "WorkHere" ], "unit" : "09in04in2" }
{ "_id" : [ "Casual" ], "unit" : "10in05in2" }
{ "_id" : [ "JobInterview" ], "unit" : "05in14in2" }
This means that, for example, unit number 05in12in2 had the maximum visits which had the purpose "Business"
I now want to get the number of "business" visits that 05in12in2 had.
I think its in "count" variable of the first group stage.
How do I access that ? I tried {$group:{_id:"$_id.purpose", unit:{$first:"$_id.number"}, visits:"$count"}}, in the second last stage, i.e just before the limit stage but I get error :
> units.aggregate([
... {$match:{'VisitorLogs':{$gt:[]}}},
... {$unwind:'$VisitorLogs'},
... {$lookup:{from:"VisitorLog", localField:"VisitorLogs", foreignField:"_id", as:"log"}},
... {$group:{_id:{number:"$number", purpose:"$log.purpose"}, count:{$sum:1}}},
... {$sort:{count:-1}},
... {$group:{_id:"$_id.purpose", unit:{$first:"$_id.number"}, visits:"$count"}},
... {$limit:10}
... ])
assert: command failed: {
"ok" : 0,
"errmsg" : "the group aggregate field 'visits' must be defined as an expression inside an object",
"code" : 15951
} : aggregate failed
_getErrorWithCode#src/mongo/shell/utils.js:25:13
doassert#src/mongo/shell/assert.js:13:14
assert.commandWorked#src/mongo/shell/assert.js:287:5
DBCollection.prototype.aggregate#src/mongo/shell/collection.js:1312:5
#(shell):1:1
Can someone please help me ?

You have this information, but you are cutting it off by the $group stage. Just change the stage to the following (notice the last argument):
{ $group : {
_id : "$_id.purpose",
unit : { $first : "$_id.number" },
visits : { $first : "$count" }
}}

The following outputs what I needed :
units.aggregate([
{$match:{'VisitorLogs':{$gt:[]}}},
{$unwind:'$VisitorLogs'},
{$lookup:{from:"VisitorLog", localField:"VisitorLogs", foreignField:"_id", as:"log"}},
{$group:{_id:{number:"$number", purpose:"$log.purpose"}, count:{$sum:1}}},
{$sort:{count:-1}},
{$group:{_id:"$_id.purpose", number:{$first:"$_id.number"}, visits:{$first:"$count"}}},
{$limit:10}
])

Related

Individual search result in multiple values in arrays

I have following model:
{
"_id" : ObjectId("5d61aaf8108e185191552bbb"),
"serials" : [
"e127av48-0697-4977-b096-5ce79c89a414",
"d163f80a-55ff-40fe-90b4-331ece5bebd5",
"4740021f-e9b5-4ca5-bf0e-8554c123bb94",
"320ffd42-f101-4b1d-8ff4-80bc693a29e6",
"fef5e68b-aed0-4a96-9488-7941c41d1c1f",
"2c0752ba-bf7a-4a3b-bd9f-14db4b2f8bae",
"6c5ff44d-5979-4bff-af12-9e6d282c3789",
"9c91bf91-72d7-4b71-827b-924947d6e93d",
"fb34b28e-afb1-4b6a-a3c1-5a1fe44246ee",
"91ab22ef-702f-4cbd-8919-a67a2b9a684c",
"ee1a7cb2-e088-47e6-a824-c8697df7d94c",
"0dc4c687-4db2-481e-a1a6-491320dede11",
"34612148-3e01-44ee-b262-de2035e63691",
"5ba85baf-e48a-40af-8578-55ff1a873c76",
"19fe3672-b6cb-4bb6-8d21-93412b938584",
"1d0d6f6d-1b49-461b-8661-ecbf43a6595e",
"d9a5455c-65ee-45e1-ae49-33cc15dec841",
"4a690a00-a76c-4d3e-aee3-78b2bb731b0c",
"ae331830-40b4-457c-8cc4-5d548f769c3e",
"fe3e460b-c89d-4ace-8a36-5ba2b53bf4d0",
"2cc6a2a0-e029-475f-a7fc-a46a79afb605",
"a7d07767-eada-4ce3-b083-9b048e9ae9f4"
],
"name" : "ApiCard",
"producer" : "Farmina",
"form" : "syrop",
"__v" : 0
}
I would like to retrive documents (multiple) from collection based on this serial numbers ("serials" field). For example i am finding:
[
"e127av48-0697-4977-b096-5ce79c89a414",
"d163f80a-55ff-40fe-90b4-331ece5bebd5",
"4740021f-e9b5-4ca5-bf0e-8554c123bb94",
"key that doesn't exist",
]
We have to assume that one of the serial number doesn't exist, so would like to get information for individual serial, expected output:
[
{
"serial":"e127av48-0697-4977-b096-5ce79c89a414",
"doc":{
....whole document where above serial is in array field "serials"
}
},
{
"serial":"e127av48-0697-4977-b096-5ce79c89a414",
"doc":{
....whole document where above serial is in array field "serials"
}
},
{
"serial":"e127av48-0697-4977-b096-5ce79c89a414",
"doc":{
....whole document where above serial is in array field "serials"
}
},
{
"serial":"key that doesn't exist",
"doc": null
}
]
I was trying the simplest solution - mongodb find by multiple array items, but unfortunately it'doesn't return info for individual serial number. I'am not sure it's possible to prepare this kind of query. I think some complex aggregation could perform it, but i don't even know this kind of pipelines.
Of course, i can get simple solution by using multiple aggregate or even find, but it could impact on performance, when application will be looking for 10000 records per request.
The following query can do the trick:
db.collection.aggregate([
{
$limit:1
},
{
$project:{
"_id":0,
"serialsToSearch":[
"e127av48-0697-4977-b096-5ce79c89a414",
"d163f80a-55ff-40fe-90b4-331ece5bebd5",
"4740021f-e9b5-4ca5-bf0e-8554c123bb94",
"key that doesn't exist",
]
}
},
{
$unwind:"$serialsToSearch"
},
{
$lookup:{
"from":"collection",
"let":{
"serial":"$serialsToSearch"
},
"pipeline":[
{
$match:{
$expr:{
$in:["$$serial","$serials"]
}
}
},
{
$project:{
"serials":0
}
}
],
"as":"searialsLookup"
}
},
{
$unwind:{
"path":"$searialsLookup",
"preserveNullAndEmptyArrays":true
}
},
{
$project:{
"serial":"$serialsToSearch",
"doc":{
$ifNull:["$searialsLookup",null]
}
}
}
]).pretty()
Data Set:
{
"_id" : ObjectId("5d61aaf8108e185191552bbb"),
"serials" : [
"e127av48-0697-4977-b096-5ce79c89a414",
"d163f80a-55ff-40fe-90b4-331ece5bebd5",
"4740021f-e9b5-4ca5-bf0e-8554c123bb94",
"320ffd42-f101-4b1d-8ff4-80bc693a29e6",
"fef5e68b-aed0-4a96-9488-7941c41d1c1f",
"2c0752ba-bf7a-4a3b-bd9f-14db4b2f8bae",
"6c5ff44d-5979-4bff-af12-9e6d282c3789",
"9c91bf91-72d7-4b71-827b-924947d6e93d",
"fb34b28e-afb1-4b6a-a3c1-5a1fe44246ee",
"91ab22ef-702f-4cbd-8919-a67a2b9a684c",
"ee1a7cb2-e088-47e6-a824-c8697df7d94c",
"0dc4c687-4db2-481e-a1a6-491320dede11",
"34612148-3e01-44ee-b262-de2035e63691",
"5ba85baf-e48a-40af-8578-55ff1a873c76",
"19fe3672-b6cb-4bb6-8d21-93412b938584",
"1d0d6f6d-1b49-461b-8661-ecbf43a6595e",
"d9a5455c-65ee-45e1-ae49-33cc15dec841",
"4a690a00-a76c-4d3e-aee3-78b2bb731b0c",
"ae331830-40b4-457c-8cc4-5d548f769c3e",
"fe3e460b-c89d-4ace-8a36-5ba2b53bf4d0",
"2cc6a2a0-e029-475f-a7fc-a46a79afb605",
"a7d07767-eada-4ce3-b083-9b048e9ae9f4"
],
"name" : "ApiCard",
"producer" : "Farmina",
"form" : "syrop",
"__v" : 0
}
Output:
{
"serial" : "e127av48-0697-4977-b096-5ce79c89a414",
"doc" : {
"_id" : ObjectId("5d61aaf8108e185191552bbb"),
"name" : "ApiCard",
"producer" : "Farmina",
"form" : "syrop",
"__v" : 0
}
}
{
"serial" : "d163f80a-55ff-40fe-90b4-331ece5bebd5",
"doc" : {
"_id" : ObjectId("5d61aaf8108e185191552bbb"),
"name" : "ApiCard",
"producer" : "Farmina",
"form" : "syrop",
"__v" : 0
}
}
{
"serial" : "4740021f-e9b5-4ca5-bf0e-8554c123bb94",
"doc" : {
"_id" : ObjectId("5d61aaf8108e185191552bbb"),
"name" : "ApiCard",
"producer" : "Farmina",
"form" : "syrop",
"__v" : 0
}
}
{ "serial" : "key that doesn't exist", "doc" : null }
Note: The query won't give expected output if the collection would be empty.
Aggregation stages details:
STAGE I: Limiting the records to 1, as initially, our motive is to inject the input array in aggregation. The injection would be done in no time.
STAGE II: Projecting the input array as serialsToSearch
STAGE III: Now we have the input array as a field, we can unwind it
STAGE IV: Lookup in the same collection with each field of the input array and check if the searched serial is present in serials array
STAGE V: unwinding the lookup output
STAGE VI: Projecting fields as per the response required.

MongoDB update latest subdocument

here is my mongo document..
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8d"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e"
}
]
}
i want to update and upsert invoice_id into last element of sub-array..
i have tried..
sort: {$natural: -1},
subscription.$.invoice
what i want it to be is....
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8d"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8f"
}
]
}
While there are ways to get the last array element, like Saravana shows in her answer, I don't recommend doing it that way because it introduces race conditions. For example, if two subs are added simultaneously, you can't depend on which one is 'last' in the array.
If an invoice_id has to be tied to a specific sub_id, then it's far better to query and find that specific element in the array, then add the invoice_id to it.
In the comments, the OP indicated that the current order of operations is 1) add sub_id, 2) insert the invoice record into the INVOICE collection and get the invoice_id, 3) add the invoice_id into the new subscription.
However, if you already have the sub_id, then it's better to re-order your operations this way: 1) insert the invoice record and get the invoice_id 2) add both sub_id and invoice_id with a single operation.
Doing this improves performance (eliminates the second update operation), but more importantly, eliminates race conditions because you're adding both sub_id and invoice_id at the same time.
we can get the document and update last element by index
> var doc = db.sub.findOne({"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8")})
> if ( doc.subscriptions.length - 1 >= 0 )
doc.subscriptions[doc.subscriptions.length-1].invoice_id="5a56fd399dd78e33948c9b8f"
> db.sub.update({_id:doc._id},doc)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
or write an aggregation pipeline to form the document and use it for update
db.sub.aggregate(
[
{$match : { "_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8") }},
{$addFields : { last : { $subtract : [{$size : "$subscriptions"},1]}}},
{$unwind : { path :"$subscriptions" , includeArrayIndex : "idx"}},
{$project : { "subscriptions.sub_id" : 1,
"subscriptions.invoice_id" : {
$cond : {
if: { $eq: [ "$idx", "$last" ] },
then: "5a56fd399dd78e33948c9b8f",
else: "$$REMOVE"
}
}
}
},
{$group : {_id : "$_id", subscriptions : {$push : "$subscriptions"}}}
]
).pretty()
result doc
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8f"
}
]
}

I need to count how many children orgs are assigned to a parent org in MongoDB

I'm new to the MongoDB world. I'm trying to figure out how to count the number of children organizations assigned to a parent organization. I have documents that have this general structure:
{
"_id" : "001",
"parentOrganization" : {
"organizationId" : "pOrg1"
},
"childOrganization" : {
"organizationId" : "cOrg1"
}
},
{
"_id" : "002",
"parentOrganization" : {
"organizationId" : "pOrg1"
},
"childOrganization" : {
"organizationId" : "cOrg2"
}
},
{
"_id" : "003",
"parentOrganization" : {
"organizationId" : "pOrg2"
},
"childOrganization" : {
"organizationId" : "cOrg3"
}
}
Each document has a parentOrganization with an associated childOrganization. There may be multiple documents with the same parentOrganization, but different childOrganizations. There may also be multiple documents with the same parent/child relationship. Additionally, there may even be a case where a child org may associate with multiple parent orgs.
I'm trying to group by parentOrganization and then count the number of unique childOrganization's associated with each parentOrganization, as well as display the unique id's.
I have tried using an aggregation framework with $match and $group, but I'm still not getting into the child organization parts to count them. Here is what I'm currently attempting:
var s1 = {$match: {"parentOrganization.organizationId": {$exists: true}}};
var s2 = {$group: {_id: "$parentOrganization.organizationId", count: {$sum: "$childOrganization.organizationId"}}};
db.collection.aggregate(s1, s2);
My results are returning the parentOrganization, but my $sum is not returning the number of associated childOrganizations:
/* 1 */
{
"_id" : "pOrg1",
"count" : 0
}
/* 2 */
{
"_id" : "pOrg2",
"count" : 0
}
I get the feeling it is a bit more complicated than my limited knowledge has access to at this time. What details am I missing in this query?
Your $sum is referencing the childOrganization.organizationId value, which is a string. When $sum references a string, it will return the value 0.
I was a unsure of exactly what you were asking for, but I believe that these aggregations can help you on your way.
This will return a count of documents groups by the parentOrganization.organizationId
db.collection.aggregate({$group: {"_id":"$parentOrganization.organizationId", "count": {"$sum": 1}}})
Output:
{ "_id" : "pOrg2", "count" : 1 }
{ "_id" : "pOrg1", "count" : 2 }
This will return a count of unique parent/child organizations:
db.collection.aggregate(
{$group: {"_id": {"parentOrganization": "$parentOrganization.organizationId", "childOrganization": "$childOrganization.organizationId"}, "count":{$sum:1}}})
Output:
{ "_id" : { "parentOrganization" : "pOrg2", "childOrganization" : "cOrg3" }, "count" : 1 }
{ "_id" : { "parentOrganization" : "pOrg1", "childOrganization" : "cOrg2" }, "count" : 1 }
{ "_id" : { "parentOrganization" : "pOrg1", "childOrganization" : "cOrg1" }, "count" : 1 }
This will return a count of unique child organizations and get the set of unique child organizations as well using $addToSet. One caveat of using $addToSet is that the MongoDB 16MB limit on document size still holds. This means that if your collection is large enough such that the size of the set will make one document greater than 16MB, the command will fail. The first $group will create a set of child organizations grouped by parent organization. The $project is used simply to add the total size of the set to the result.
db.collection.aggregate([
{$group: {"_id" : "$parentOrganization.organizationId", "childOrgs" : { "$addToSet" : "$childOrganization.organizationId"}}},
{$project: {"_id" : "$_id", "uniqueChildOrgsCount": {"$size" : "$childOrgs"}, "uniqueChildOrgs": "$childOrgs"}}])
Output:
{ "_id" : "pOrg2", "uniqueChildOrgsCount" : 1, "uniqueChildOrgs" : [ "cOrg3" ]}
{ "_id" : "pOrg1", "uniqueChildOrgsCount" : 2, "uniqueChildOrgs" : [ "cOrg2", "cOrg1" ]}
During these aggregations, I left out the $match statement you included for simplicity, but you could add that back as well.

Positional operator and field limitation

In a find query projection, fields I specify after the positional operator are ignored and the whole document is always returned.
'myArray.$.myField' : 1 behave exactly like 'myArray.$' : 1
the positional operator selects the right document. But this document is quite big. I would like to project only 1 field from it.
Exemple:
db.getCollection('match').find({"participantsData.id" : 0001}, { 'participantsData.$.id': 1, })
here the response I have
{
"_id" : "myid",
"matchCreation" : 1463916465614,
"participantsData" : [
{
"id" : 0001,
"plenty" : "of",
"other" : "fields",
"and" : "subdocuments..."
}
]
}
This is what I want
{
"_id" : "myid",
"matchCreation" : 1463916465614,
"participantsData" : [
{
"id" : 0001
}
]
}
Is it possible with mongo?
Yes it can be done in mongo
Please try the below query
db.getCollection('match').find(
{"participantsData.id" : 0001},
{"participantsData.id": 1, "matchCreation": 1 })
This will give you the below result
{
"_id" : "myid",
"matchCreation" : 1463916465614,
"participantsData" : [
{
"id" : 1
}
]
}

$add operation returns null value

I have collection with document structure :
{
'year' : 2014,
'month' : 1
}
I am executing the following operation :
db.collname.aggregate(
[
{
$project : {
'year100' : {$multiply : ["$year" , 100]},
'result' : { '$add' : ['$year100', '$month'] }
}
}
]
);
I get the following result :
{
"result" : [
{
"_id" : ObjectId("5563596c515a88832210f0e4"),
"year100" : 201400.0000000000000000,
"result" : null
},
}
Why is add operation returuning null value as against to actual value ? Please help.
MongoDb not allow to used same fields in project to arithmetic operation instead of one $project used two different projects like this :
db.collname.aggregate({ $project : { 'year100' : {$multiply : ["$year" , 100]} ,"month":"$month"} },{"$project":{"year100":1,"result":{"$add":["$year100","$month"]}}})