Multiple Nested Group Within Array - mongodb

I have a set of documents in MongoDB, as shown below:
/* 1 */
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"Name" : "Kevin",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2014-08-31"
},
{
"event_type" : "Anniversary",
"event_date" : "2014-08-31"
}
]
}
/* 2 */
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"Name" : "Peter",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2014-08-31"
},
{
"event_type" : "Anniversary",
"event_date" : "2015-03-24"
}
]
}
/* 3 */
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"Name" : "Pole",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2015-03-24"
},
{
"event_type" : "Work Anniversary",
"event_date" : "2015-03-24"
}
]
}
Now I want a result grouped first by event_date and then by event_type. Each event_type entry should contain the names of all related users, along with a count of the records in its array.
Expected Output
/* 1 */
{
"event_date" : "2014-08-31",
"data" : [
{
"event_type" : "Birthday",
"details" : [
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"name" : "Kevin"
},
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"name" : "Peter"
}
],
"count" : 2
},
{
"event_type" : "Anniversary",
"details" : [
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"name" : "Kevin"
}
],
"count" : 1
}
]
}
/* 2 */
{
"event_date" : "2015-03-24",
"data" : [
{
"event_type" : "Anniversary",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"name" : "Peter"
}
],
"count" : 1
},
{
"event_type" : "Birthday",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"name" : "Pole"
}
],
"count" : 1
},
{
"event_type" : "Work Anniversary",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"name" : "Pole"
}
],
"count" : 1
}
]
}

Using the aggregation framework, you would need to run a pipeline that has the following stages so that you get the desired result:
db.collection.aggregate([
{ "$unwind": "$pb_event" },
{
"$group": {
"_id": {
"event_date": "$pb_event.event_date",
"event_type": "$pb_event.event_type"
},
"details": {
"$push": {
"_id": "$_id",
"name": "$Name"
}
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": "$_id.event_date",
"data": {
"$push": {
"event_type": "$_id.event_type",
"details": "$details",
"count": "$count"
}
}
}
},
{
"$project": {
"_id": 0,
"event_date": "$_id",
"data": 1
}
}
])
In the above pipeline, the first stage uses the $unwind operator
{ "$unwind": "$pb_event" }
which comes in quite handy when the data is stored as an array. When $unwind is applied to an array field, it outputs a new document for each element of that array, with the array replaced by that single element. It basically flattens the data.
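As a rough illustration of that flattening, here is a minimal plain JavaScript sketch that runs without MongoDB. The helper name `unwind` is hypothetical and only mimics the default behaviour on an in-memory array; it is not how the server implements the stage.

```javascript
// Hypothetical helper: mimics what $unwind does to one array field.
function unwind(docs, field) {
  // Emit one copy of the document per array element, with the array
  // replaced by that single element. Documents whose array is missing
  // or empty produce no output.
  return docs.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );
}

const people = [
  {
    Name: "Kevin",
    pb_event: [
      { event_type: "Birthday", event_date: "2014-08-31" },
      { event_type: "Anniversary", event_date: "2014-08-31" },
    ],
  },
];

const flat = unwind(people, "pb_event");
console.log(flat.length); // 2
console.log(flat[1].pb_event.event_type); // "Anniversary"
```

Each flattened document still carries the parent's other fields (here Name), which is what lets the later $group stage collect them.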
This is a necessary operation for the next pipeline stage, the $group step where you group the flattened documents by the deconstructed pb_event array fields event_date and event_type:
{
"$group": {
"_id": {
"event_date": "$pb_event.event_date",
"event_type": "$pb_event.event_type"
},
"details": {
"$push": {
"_id": "$_id",
"name": "$Name"
}
},
"count": { "$sum": 1 }
}
},
The $group pipeline operator is similar to SQL's GROUP BY clause. In SQL, GROUP BY is normally paired with aggregate functions such as COUNT or SUM. In the same way, a MongoDB $group stage computes its per-group values with aggregation functions (called accumulator operators). You can read more about accumulator operators in the MongoDB aggregation documentation.
In this $group operation, the count aggregate, i.e. the total number of documents in each group, is calculated with the $sum accumulator operator. Within the same stage, you can collect a list of the name and _id subdocuments with the $push operator, which returns an array of expression values for each group.
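To make the roles of the compound _id, $sum and $push more concrete, the following is a small plain JavaScript sketch of the same grouping on already-unwound documents. The helper `groupByDateAndType` and the numeric _id values are assumptions made for illustration only.

```javascript
// Hypothetical helper: groups already-unwound documents by
// (event_date, event_type), like the first $group stage.
function groupByDateAndType(unwound) {
  const groups = new Map();
  for (const doc of unwound) {
    const { event_date, event_type } = doc.pb_event;
    const key = JSON.stringify([event_date, event_type]);
    if (!groups.has(key)) {
      groups.set(key, { _id: { event_date, event_type }, details: [], count: 0 });
    }
    const group = groups.get(key);
    group.details.push({ _id: doc._id, name: doc.Name }); // plays the role of $push
    group.count += 1; // plays the role of { $sum: 1 }
  }
  return [...groups.values()];
}

const unwound = [
  { _id: 1, Name: "Kevin", pb_event: { event_type: "Birthday", event_date: "2014-08-31" } },
  { _id: 1, Name: "Kevin", pb_event: { event_type: "Anniversary", event_date: "2014-08-31" } },
  { _id: 2, Name: "Peter", pb_event: { event_type: "Birthday", event_date: "2014-08-31" } },
  { _id: 2, Name: "Peter", pb_event: { event_type: "Anniversary", event_date: "2015-03-24" } },
];

const grouped = groupByDateAndType(unwound);
console.log(grouped.length); // 3
console.log(grouped[0].count); // 2
```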
The second $group stage
{
"$group": {
"_id": "$_id.event_date",
"data": {
"$push": {
"event_type": "$_id.event_type",
"details": "$details",
"count": "$count"
}
}
}
}
further aggregates the results from the previous stage by grouping on event_date. This forms the basis of the desired output by creating a new data list using $push. The final $project pipeline stage
{
"$project": {
"_id": 0,
"event_date": "$_id",
"data": 1
}
}
reshapes the documents by renaming the _id field to event_date and retaining the data field.
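The last two stages can be pictured with another short plain JavaScript sketch. It assumes input shaped like the output of the first $group stage; the helper name `groupByDate` is hypothetical, and the sample rows are trimmed to names only.

```javascript
// Hypothetical helper: regroups per-(date, type) results by date only and
// renames _id to event_date, like the second $group plus the $project.
function groupByDate(typeGroups) {
  const byDate = new Map();
  for (const g of typeGroups) {
    const date = g._id.event_date;
    if (!byDate.has(date)) byDate.set(date, { event_date: date, data: [] });
    byDate.get(date).data.push({
      event_type: g._id.event_type,
      details: g.details,
      count: g.count,
    });
  }
  return [...byDate.values()];
}

const typeGroups = [
  { _id: { event_date: "2014-08-31", event_type: "Birthday" },
    details: [{ name: "Kevin" }, { name: "Peter" }], count: 2 },
  { _id: { event_date: "2014-08-31", event_type: "Anniversary" },
    details: [{ name: "Kevin" }], count: 1 },
  { _id: { event_date: "2015-03-24", event_type: "Anniversary" },
    details: [{ name: "Peter" }], count: 1 },
];

const byDate = groupByDate(typeGroups);
console.log(byDate.length); // 2
console.log(byDate[0].data.length); // 2
```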

Related

mongodb count number of documents for every category

My collection looks like this:
{
"_id":ObjectId("5744b6cd9c408cea15964d18"),
"uuid":"bbde4bba-062b-4024-9bb0-8b12656afa7e",
"version":1,
"categories":["sport"]
},
{
"_id":ObjectId("5745d2bab047379469e10e27"),
"uuid":"bbde4bba-062b-4024-9bb0-8b12656afa7e",
"version":2,
"categories":["sport", "shopping"]
},
{
"_id":ObjectId("5744b6359c408cea15964d15"),
"uuid":"561c3705-ba6d-432b-98fb-254483fcbefa",
"version":1,
"categories":["politics"]
}
I want to count the number of documents for every category. To do this, I unwind the categories array:
db.collection.aggregate(
{$unwind: '$categories'},
{$group: {_id: '$categories', count: {$sum: 1}} }
)
Result:
{ "_id" : "sport", "count" : 2 }
{ "_id" : "shopping", "count" : 1 }
{ "_id" : "politics", "count" : 1 }
Now I want to count the number of documents for every category, but where document version is the latest version.
This is where I am stuck.
It's ugly but I think this gives you what you're after:
db.collection.aggregate(
{ $unwind : "$categories" },
{ $group :
{ "_id" : { "uuid" : "$uuid" },
"doc" : { $push : { "version" : "$version", "category" : "$categories" } },
"maxVersion" : { $max : "$version" }
}
},
{ $unwind : "$doc" },
{ $project : { "_id" : 0, "uuid" : "$_id.uuid", "category" : "$doc.category", "isCurrentVersion" : { $eq : [ "$doc.version", "$maxVersion" ] } } },
{ $match : { "isCurrentVersion" : true }},
{ $group : { "_id" : "$category", "count" : { $sum : 1 } } }
)
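The idea behind the pipeline above (find each uuid's highest version, keep only the entries at that version, then count per category) can be sketched in plain JavaScript. This is only an in-memory illustration with shortened, made-up uuid values, not a replacement for the aggregation.

```javascript
// Sample data shaped like the question's documents (uuids shortened).
const versionedDocs = [
  { uuid: "bbde4bba", version: 1, categories: ["sport"] },
  { uuid: "bbde4bba", version: 2, categories: ["sport", "shopping"] },
  { uuid: "561c3705", version: 1, categories: ["politics"] },
];

// Step 1: highest version per uuid (the $group with $max).
const maxVersion = new Map();
for (const { uuid, version } of versionedDocs) {
  const current = maxVersion.has(uuid) ? maxVersion.get(uuid) : -Infinity;
  maxVersion.set(uuid, Math.max(version, current));
}

// Step 2: keep only documents at their uuid's highest version (the $match),
// then count them per category (the final $group with $sum).
const latestCounts = {};
for (const doc of versionedDocs) {
  if (doc.version !== maxVersion.get(doc.uuid)) continue;
  for (const category of doc.categories) {
    latestCounts[category] = (latestCounts[category] || 0) + 1;
  }
}

console.log(latestCounts); // { sport: 1, shopping: 1, politics: 1 }
```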
You can do this by first grouping the denormalized documents (from the $unwind step) by two keys, i.e. the categories and version fields. This is necessary for the following pipeline step, which uses the $sort operator to order the grouped documents and their accumulated counts by the version (descending) and categories (ascending) keys.
Another grouping is then required to get the top document in each categories group after ordering, using the $first operator. The following shows this:
db.collection.aggregate(
{ "$unwind": "$categories" },
{
"$group": {
"_id": {
'categories': '$categories',
'version': '$version'
},
"count": { "$sum": 1 }
}
},
{ "$sort": { "_id.version": -1, "_id.categories": 1 } },
{
"$group": {
"_id": "$_id.categories",
"count": { "$first": "$count" },
"version": { "$first": "$_id.version" }
}
}
)
Sample Output
{ "_id" : "shopping", "count" : 1, "version" : 2 }
{ "_id" : "sport", "count" : 1, "version" : 2 }
{ "_id" : "politics", "count" : 1, "version" : 1 }
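The sort-then-$first pattern can also be traced by hand with a plain JavaScript sketch. The uuid values are shortened and the variable names are made up for illustration; the logic follows the stages above on in-memory data.

```javascript
const sampleDocs = [
  { uuid: "bbde4bba", version: 1, categories: ["sport"] },
  { uuid: "bbde4bba", version: 2, categories: ["sport", "shopping"] },
  { uuid: "561c3705", version: 1, categories: ["politics"] },
];

// Count documents per (category, version) pair: $unwind + first $group.
const pairCounts = new Map();
for (const doc of sampleDocs) {
  for (const category of doc.categories) {
    const key = JSON.stringify([category, doc.version]);
    const entry = pairCounts.get(key) || { category, version: doc.version, count: 0 };
    entry.count += 1;
    pairCounts.set(key, entry);
  }
}

// Highest version first ($sort), so the first entry seen per category
// is the one $first would pick in the second $group.
const sorted = [...pairCounts.values()].sort((a, b) => b.version - a.version);
const latest = new Map();
for (const entry of sorted) {
  if (!latest.has(entry.category)) latest.set(entry.category, entry);
}

console.log(latest.get("sport")); // { category: 'sport', version: 2, count: 1 }
```

Note that this keeps the highest version seen for each category, which matches the sample output above.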

Return array of elements from multiple arrays

I have a collection of companies that looks like this. I also want to merge the deals from other documents.
I need this:
{
"_id" : ObjectId("561637942d25a7644cae993e"),
"locations" : [
{
"deals" : [
{
"name" : "1",
"_id" : ObjectId("561637942d25a7644cae9940")
},
{
"name" : "2",
"_id" : ObjectId("562f868ce73962c626a16b15")
}
]
}
],
"deals" : [
{
"name" : "3",
"_id" : ObjectId("562f86ebe73962c626a16b17")
}
]
}
{
"_id" : ObjectId("561637942d25a7644cae993e"),
"locations" : [
{
"deals" : [
{
"name" : "4",
"_id" : ObjectId("561637942d25a7644cae9940")
}
]
}
],
"deals" : []
}
To be like this:
{
"deals": [{
"name" : "1",
"_id" : ObjectId("561637942d25a7644cae9940")
},{
"name" : "2",
"_id" : ObjectId("562f868ce73962c626a16b15")
},{
"name" : "3",
"_id" : ObjectId("562f86ebe73962c626a16b17")
},{
"name" : "4",
"_id" : ObjectId("561637942d25a7644cae9949")
}]
}
But I have only failed to do this. It seems that if I want all the deals grouped together into one array, I should not use $unwind, since that creates more documents and I only need to group once.
This is my attempt, which does not work at all.
{
"$project": {
"_id": 1,
"locations": 1,
"deals": 1
}
}, {
"$unwind": "$locations"
}, {
"$unwind": "$locations.deals"
}, {
"$unwind": "$deals"
}, {
"$group": {
"_id": null,
"deals": {
"$addToSet": "$locations.deals",
"$addToSet": "$deals"
}
}
}
You should first filter your documents with the $match operator to reduce the number of documents the pipeline has to process. Then $unwind the "locations" array and use the $project operator to reshape your documents. The $cond operator returns a single-element placeholder array [false] if the deals field is an empty array, and the deals value otherwise (note that $unwind emits no output document for an empty array). The $setUnion operator returns an array of the elements that appear in either the locations.deals array or the deals array. The $setDifference operator then filters the false element out of the merged array. After that, another $unwind stage deconstructs the deals array, and from there you can easily $group your documents.
db.collection.aggregate([
{ "$match": { "locations.0": { "$exists": true } } },
{ "$unwind": "$locations" },
{ "$project": {
"deals": {
"$setDifference": [
{ "$setUnion": [
{ "$cond": [
{ "$eq" : [ { "$size": "$deals" }, 0 ] },
[false],
"$deals"
]},
"$locations.deals"
]},
[false]
]
}
}},
{ "$unwind": "$deals" },
{ "$group": {
"_id": null,
"deals": { "$addToSet": "$deals" }
}}
])
Which returns:
{
"_id" : null,
"deals" : [
{
"name" : "1",
"_id" : ObjectId("561637942d25a7644cae9940")
},
{
"name" : "2",
"_id" : ObjectId("562f868ce73962c626a16b15")
},
{
"name" : "3",
"_id" : ObjectId("562f86ebe73962c626a16b17")
},
{
"name" : "4",
"_id" : ObjectId("561637942d25a7644cae9940")
}
]
}
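For readers who want to check the merging logic outside the database, here is a minimal plain JavaScript sketch of the same outcome. The helper `mergeDeals` is hypothetical, and the sample deals carry only a name field to keep the example short.

```javascript
// Hypothetical helper: collects every deal from the top-level "deals" array
// and from each location's "deals" array, skipping exact duplicates in the
// spirit of $addToSet.
function mergeDeals(companies) {
  const seen = new Set();
  const merged = [];
  for (const company of companies) {
    const nested = (company.locations || []).flatMap((loc) => loc.deals || []);
    for (const deal of [...nested, ...(company.deals || [])]) {
      const key = JSON.stringify(deal); // compare deals by their whole value
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(deal);
      }
    }
  }
  return merged;
}

const companies = [
  { locations: [{ deals: [{ name: "1" }, { name: "2" }] }], deals: [{ name: "3" }] },
  { locations: [{ deals: [{ name: "4" }] }], deals: [] },
];

const merged = mergeDeals(companies);
console.log(merged.map((d) => d.name)); // [ '1', '2', '3', '4' ]
```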

Mongodb output field with multiple $cond

Here's an example of documents I use :
{
"_id" : ObjectId("554a1f5fe36a768b362ea5c0"),
"store_state" : 1,
"services" : [
{
"id" : "XXX",
"state" : 1,
"active": true
},
{
"id" : "YYY",
"state" : 1,
"active": true
},
...
]
}
I want to output a new field with "Y" if the id is "XXX" and active is true, and "N" in all other cases. The service element with "XXX" as id is not present in every document (output "N" in this case).
Here's my query for the moment :
db.stores.aggregate({
$match : {"store_state":1}
},
{ $project : {
"XXX_active": {
$cond: [ {
$and:[
{$eq:["services.$id","XXX"]},
{$eq:["services.$active",true]}
]},"Y","N"
] }
}
}).pretty()
But it always outputs "N" for the "XXX_active" field.
The expected output I need is :
{
"_id" : ObjectId("554a1f5de36a768b362e7e6f"),
"XXX_active" : "Y"
},
{
"_id" : ObjectId("554a1f5ee36a768b362e9d25"),
"XXX_active" : "N"
},
{
"_id" : ObjectId("554a1f5de36a768b362e73a5"),
"XXX_active" : "Y"
}
Another example of a possible result:
{
"_id" : ObjectId("554a1f5de36a768b362e7e6f"),
"XXX_active" : "Y",
"YYY_active" : "N"
},
{
"_id" : ObjectId("554a1f5ee36a768b362e9d25"),
"XXX_active" : "N",
"YYY_active" : "N"
},
{
"_id" : ObjectId("554a1f5de36a768b362e73a5"),
"XXX_active" : "Y",
"YYY_active" : "Y"
}
Only one XXX_active per object and no duplicate objects, but I need all objects to have an XXX_active even if the services element with id "XXX" is not present. Could someone help please?
First $unwind the services array and then use $cond as below:
db.stores.aggregate({
"$match": {
"store_state": 1
}
}, {
"$unwind": "$services"
}, {
"$project": {
"XXX_active": {
"$cond": [{
"$and": [{
"$eq": ["$services.id", "XXX"]
}, {
"$eq": ["$services.active", true]
}]
}, "Y", "N"]
}
}
},{"$group":{"_id":"$_id","XXX_active":{"$first":"$XXX_active"}}}) //group by id
The following aggregation pipeline will give the desired result. You would need to apply the $unwind operator on the services array field as your initial aggregation pipeline step. This will deconstruct the services array field from the input documents to output a document for each element. Each output document replaces the array with an element value.
db.stores.aggregate([
{
"$match" : {"store_state": 1}
},
{
"$unwind": "$services"
},
{
"$project": {
"store_state" : 1,
"services": 1,
"XXX_active": {
"$cond": [
{
"$and": [
{"$eq":["$services.id", "XXX"]},
{"$eq":["$services.active",true]}
]
},"Y","N"
]
}
}
},
{
"$match": {
"services.id": "XXX"
}
},
{
"$group": {
"_id": {
"_id": "$_id",
"store_state": "$store_state",
"XXX_active": "$XXX_active"
},
"services": {
"$push": "$services"
}
}
},
{
"$project": {
"_id": "$_id._id",
"store_state" : "$_id.store_state",
"services": 1,
"XXX_active": "$_id.XXX_active"
}
}
])
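As a reference point for checking any of these pipelines, the requirement stated in the question ("Y" only when an active service with the given id exists, "N" otherwise, one result per store) can be written as a small plain JavaScript sketch. The helper `serviceFlag` and the short _id values are made up for illustration.

```javascript
// Hypothetical helper: "Y" when the store has a service with the given id
// that is active, "N" otherwise (including when the service is absent).
function serviceFlag(store, serviceId) {
  const found = (store.services || []).some(
    (s) => s.id === serviceId && s.active === true
  );
  return found ? "Y" : "N";
}

const stores = [
  { _id: "a", store_state: 1, services: [{ id: "XXX", state: 1, active: true }] },
  { _id: "b", store_state: 1, services: [{ id: "YYY", state: 1, active: true }] },
  { _id: "c", store_state: 1, services: [{ id: "XXX", state: 1, active: false }] },
  { _id: "d", store_state: 0, services: [{ id: "XXX", state: 1, active: true }] },
];

const flags = stores
  .filter((s) => s.store_state === 1) // same filter as the $match stage
  .map((s) => ({ _id: s._id, XXX_active: serviceFlag(s, "XXX") }));

console.log(flags.map((f) => f.XXX_active)); // [ 'Y', 'N', 'N' ]
```

Store "b" has no "XXX" service at all and still appears with "N", which is the case the question singles out.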

How to query a mongo collection to return the full document with virtual fields containing calculated values from the sub-document?

I'm trying to query a collection for a specific document that contains a sub-document. The sub-document contains values for which I'd like to obtain the highest and lowest scores from that sub-document and return that result as virtual fields to the original document.
I have the following dataset:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
},
{
"name" : "vallum",
"score" : 100
},
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU"
}
In mongo 2.4, how can I query mongo once to return the following result:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
},
{
"name" : "vallum",
"score" : 100
},
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
"worst_test": {
"name" : "ipsum",
"score" : 65
},
"best_test": {
"name" : "vallum",
"score" : 100
}
}
Where "best_test" and "worst_test" are virtual fields representing the tests with the highest and lowest scores, respectively.
I've tried with many different ways and the closest I've gotten is with this query:
db.students.aggregate([
{ $match: {
'_id': 'd0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e'
}},
{ $unwind: '$tests' },
{ $sort: {'tests.score': 1} },
{ $group: {
_id: '$_id',
student_tests: {$push: "$$ROOT"},
worst_test: {$first: '$tests'},
best_test: { $last: '$tests' }
}}
]);
Which yields this result:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"student_tests" : [
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "vallum",
"score" : 100
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
],
"worst_test": {
"name" : "ipsum",
"score" : 65
},
"best_test": {
"name" : "vallum",
"score" : 100
}
}
If you are using $$ROOT then in fact you are using MongoDB 2.6 as this is an aggregation variable only introduced in that version.
But while handy for various things, all it does is represent the entire document at the present stage of the pipeline where used. To do what you want and return the original document unmodified but with additional fields, you could use it in $project stage before the $unwind to assign to the _id field, but really you don't have exactly the same document as you would still need to $project at the end in order to get the correct document shape out of those elements.
Your best bet is just projecting the fields, but keeping an unaltered copy of the array before any $sort is applied:
db.students.aggregate([
{ "$match": {
"_id": "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e"
}},
{ "$project": {
"name": 1,
"tests": 1,
"created_at": 1,
"class": 1,
"user_id": 1,
"testCopy": "$tests"
}},
{ "$unwind": "$testCopy" },
{ "$sort": { "testCopy.score": 1 } },
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"tests": { "$first": "$tests" },
"created_at": { "$first": "$created_at" },
"class": { "$first": "$class" },
"user_id": { "$first": "$user_id" },
"worst_test": { "$first": "$testCopy" },
"best_test": { "$last": "$testCopy" }
}}
]);
Or using $$ROOT as mentioned before, alternately just placing the fields under the _id individually in the $project:
db.students.aggregate([
{ "$match": {
"_id": "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e"
}},
{ "$project": {
"_id": "$$ROOT",
"tests": 1
}},
{ "$unwind": "$tests" },
{ "$sort": { "tests.score": 1 } },
{ "$group": {
"_id": "$_id",
"aworst_test": { "$first": "$tests" },
"abest_test": { "$last": "$tests" }
}},
{ "$project": {
"_id": "$_id._id",
"name": "$_id.name",
"tests": "$_id.tests",
"created_at": "$_id.created_at",
"class": "$_id.class",
"user_id": "$_id.user_id",
"worst_test": "$aworst_test",
"best_test": "$abest_test"
}}
]);
But as you see, you are still doing the $project work somewhere in order to get the structure you want, as well as the "renamed fields" to maintain the field order you want as the $project will otherwise "optimize" and "keep" any fields that have not been renamed and "append" new fields after the existing ones.
There really is no simple way to "get all fields" in the same way as you originally found them. Operations like $project and $group are an "all or nothing" affair, where they only explicitly produce what you tell them to.
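To make the target shape concrete, here is a minimal plain JavaScript sketch of what the pipelines compute for one student. It is an in-memory illustration only, not an aggregation; the helper `withBestAndWorst` is hypothetical and the sample document is trimmed to a few fields.

```javascript
// Hypothetical helper: returns a copy of the student with worst_test and
// best_test appended, leaving the original fields and the tests order intact.
function withBestAndWorst(student) {
  // Sort a copy so the stored tests array is not reordered.
  const sorted = [...student.tests].sort((a, b) => a.score - b.score);
  return {
    ...student,
    worst_test: sorted[0],
    best_test: sorted[sorted.length - 1],
  };
}

const student = {
  name: "Addison Hunt",
  tests: [
    { name: "lorem", score: 79 },
    { name: "vallum", score: 100 },
    { name: "ipsum", score: 65 },
  ],
  class: "dolor sit amit",
};

const result = withBestAndWorst(student);
console.log(result.worst_test.name); // "ipsum"
console.log(result.best_test.name); // "vallum"
```

Sorting a copy plays the same role as the testCopy field in the first pipeline: the original tests array keeps its stored order.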

Reduce the response of mongoDB query

I have documents like this:
{
"_id" : ObjectId("53340d07d6429d27e1284c77"),
"worktypes" : [
{
"name" : "Pompas",
"works" : [
{
"name" : "work 1",
"code" : "0001"
}
]
},
{
"name" : "Pompas ",
"works" : [
{
"name" : "work 2",
"code" : "0002"
}
]
}
]
}
I wrote a query to get ONLY the works of one worktype for this document. This is the query:
db.categories.find({$and: [
{ "_id": ObjectId('53340d07d6429d27e1284c77')},
{"worktypes.name": "Pompas"}
]},{"worktypes.works.$":1})
But I got:
{
"_id" : ObjectId("53340d07d6429d27e1284c77"),
"worktypes" : [
{
"name" : "Pompas",
"works" : [
{
"name" : "work 1",
"code" : "0001"
}
]
}
]
}
But I only need:
"works" : [
{
"name" : "work 1",
"code" : "0001"
}
]
How can I reduce this?
I think Neil Lunn's answer is mostly correct, but in my opinion it needs a few tweaks to get the expected result:
Match against "worktypes.name" rather than "worktypes.works.name"
In the $group phase, use $first instead of $push to get the first element alone
Add a $project phase to just get the "works"
db.categories.aggregate([
{ "$unwind": "$worktypes" },
{ "$unwind": "$worktypes.works" },
{ "$match": {
"worktypes.name": "Pompas"
}},
{ "$group": {
"_id": "$_id",
"works": { "$first": "$worktypes.works" }
}},
{ "$project": {"_id":0, "works":1} }
])
Output:
{
"result" : [
{
"works" : {
"name" : "work 1",
"code" : "0001"
}
}
],
"ok" : 1
}
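Note that the output above holds the first work as a single object rather than an array. For comparison, the array shape the question asks for can be sketched in plain JavaScript. The helper `worksFor` and the second worktype name "Other" are made up for illustration.

```javascript
// Hypothetical helper: returns only the "works" array of the named worktype,
// or null when no worktype has that name.
function worksFor(category, worktypeName) {
  const match = category.worktypes.find((wt) => wt.name === worktypeName);
  return match ? { works: match.works } : null;
}

const category = {
  worktypes: [
    { name: "Pompas", works: [{ name: "work 1", code: "0001" }] },
    { name: "Other", works: [{ name: "work 2", code: "0002" }] },
  ],
};

const reduced = worksFor(category, "Pompas");
console.log(JSON.stringify(reduced)); // {"works":[{"name":"work 1","code":"0001"}]}
```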
You need to use the $unwind operator when working with arrays:
db.categories.aggregate([
// Unwind the first array
{ "$unwind": "$worktypes" },
// Then unwind the embedded array
{ "$unwind": "$worktypes.works" },
// Match the item you want
{ "$match": {
"worktypes.works.name": "work 1"
}},
// Group to reform the array structure
{ "$group": {
"_id": "$_id",
"worktypes": { "$push": "$worktypes" }
}}
])
To get the result back as an array, use $group after the $unwind.
You can selectively remove fields from the projection like so:
db.categories.find({$and: [
{ "_id": ObjectId('53340d07d6429d27e1284c77')},
{"worktypes.name": "Pompas"}
]},{"worktypes.works.$":1, _id:0 })
This will prevent the _id field from being projected.