Aggregation framework flatten subdocument data with parent document - mongodb

I am building a dashboard that rotates between different webpages. I want to pull all slides that are part of the "Test" deck and order them appropriately. After the query, my result would ideally look like this:
[
{ "url" : "http://10.0.1.187", "position": 1, "duartion": 10 },
{ "url" : "http://10.0.1.189", "position": 2, "duartion": 3 }
]
I currently have a dataset that looks like the following
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"url" : "http://10.0.1.189",
"decks" : [
{
"title" : "Test",
"position" : 2,
"duration" : 3
}
]
}
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"title" : "Test",
"position" : 1,
"duration" : 2
},
{
"title" : "Other Deck",
"position" : 1,
"duration" : 10
}
],
"url" : "http://10.0.1.187"
}
My attempted query looks like:
db.slides.aggregate([
{
"$match": {
"decks.title": "Test"
}
},
{
"$sort": {
"decks.position": 1
}
},
{
"$project": {
"_id": 0,
"position": "$decks.position",
"duration": "$decks.duration",
"url": 1
}
}
]);
But it does not yield my desired results. How can I query my dataset and get my expected results in an optimal way?

Well, to truly "flatten" the document as your title suggests, $unwind is always going to be employed, as there really is no other way to do that. There are, however, some different approaches if you can live with the array being filtered down to the matching element.
Basically speaking, if you really only have one thing to match in the array then your fastest approach is to simply use .find() matching the required element and projecting:
db.slides.find(
{ "decks.title": "Test" },
{ "decks.$": 1 }
).sort({ "decks.position": 1 }).pretty()
That is still an array, but as long as you have only one element that matches, this does work. The items are also sorted as expected, though of course the "title" field is not dropped from the matched documents, as that is beyond the possibilities of simple projection.
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"title" : "Test",
"position" : 1,
"duration" : 2
}
]
}
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"decks" : [
{
"title" : "Test",
"position" : 2,
"duration" : 3
}
]
}
Another approach, as long as you have MongoDB 2.6 or greater available, is using the $map operator and some others in order to both "filter" and re-shape the array "in-place" without actually applying $unwind:
db.slides.aggregate([
{ "$project": {
"url": 1,
"decks": {
"$setDifference": [
{
"$map": {
"input": "$decks",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.title", "Test" ] },
{
"position": "$$el.position",
"duration": "$$el.duration"
},
false
]
}
}
},
[false]
]
}
}},
{ "$sort": { "decks.position": 1 }}
])
The advantage there is that you can make the changes without "unwinding", which can reduce processing time with large arrays, since you are not creating new documents for every array member and then running a separate $match stage to "filter" them or another $project stage to reshape them.
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"position" : 1,
"duration" : 2
}
],
"url" : "http://10.0.1.187"
}
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"url" : "http://10.0.1.189",
"decks" : [
{
"position" : 2,
"duration" : 3
}
]
}
You can either live with the "filtered" array, or truly "flatten" this by adding an additional $unwind stage; no further $match is needed after it, since the result already contains only the matched items, as sketched below.
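A rough, untested sketch of that fully "flattened" form, combining the same $map/$setDifference projection with an initial $match (as in your own query), an $unwind, a sort, and a final reshape into the output you described:
db.slides.aggregate([
  { "$match": { "decks.title": "Test" } },
  { "$project": {
    "url": 1,
    "decks": {
      "$setDifference": [
        { "$map": {
          "input": "$decks",
          "as": "el",
          "in": { "$cond": [
            { "$eq": [ "$$el.title", "Test" ] },
            { "position": "$$el.position", "duration": "$$el.duration" },
            false
          ]}
        }},
        [false]
      ]
    }
  }},
  { "$unwind": "$decks" },
  { "$sort": { "decks.position": 1 } },
  { "$project": {
    "_id": 0,
    "url": 1,
    "position": "$decks.position",
    "duration": "$decks.duration"
  }}
])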
But generally speaking, if you can live with it, just use .find() as it will be the fastest way. Otherwise what you are doing is fine for small data, or there is the other option above to consider.

Well as soon as I posted I realized I should be using an $unwind. Is this query the optimal way to do it, or can it be done differently?
db.slides.aggregate([
{
"$unwind": "$decks"
},
{
"$match": {
"decks.title": "Test"
}
},
{
"$sort": {
"decks.position": 1
}
},
{
"$project": {
"_id": 0,
"position": "$decks.position",
"duration": "$decks.duration",
"url": 1
}
}
]);

Related

mongodb check if all subdocuments in array have the same value in one field

I have a collection of documents, each has a field which is an array of subdocuments, and all subdocuments have a common field 'status'. I want to find all documents that have the same status for all subdocuments.
collection:
{
"name" : "John",
"wives" : [
{
"name" : "Mary",
"status" : "dead"
},
{
"name" : "Anne",
"status" : "alive"
}
]
},
{
"name" : "Bill",
"wives" : [
{
"name" : "Mary",
"status" : "dead"
},
{
"name" : "Anne",
"status" : "dead"
}
]
},
{
"name" : "Mohammed",
"wives" : [
{
"name" : "Jane",
"status" : "dead"
},
{
"name" : "Sarah",
"status" : "dying"
}
]
}
I want to check if all wives are dead and find only Bill.
You can use the following aggregation query to get the records of persons whose wives are all dead:
db.collection.aggregate([
  { $project: { name: 1, wives: 1, size: { $size: '$wives' } } },
  { $unwind: '$wives' },
  { $match: { 'wives.status': 'dead' } },
  { $group: {
    _id: '$_id',
    name: { $first: '$name' },
    wives: { $push: '$wives' },
    size: { $first: '$size' },
    count: { $sum: 1 }
  }},
  { $project: { _id: 1, wives: 1, name: 1, cmp_value: { $cmp: ['$size', '$count'] } } },
  { $match: { cmp_value: 0 } }
])
Output:
{ "_id" : ObjectId("56d401de8b953f35aa92bfb8"), "name" : "Bill", "wives" : [ { "name" : "Mary", "status" : "dead" }, { "name" : "Anne", "status" : "dead" } ], "cmp_value" : 0 }
If you instead need to find records of users whose wives all share the same status (whatever it is), simply removing the $match stage is not enough on its own; one way is to group on the distinct status values per document instead, along the lines of the sketch below.
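This is an illustrative sketch, not taken from the query above: collect the distinct statuses per document with $addToSet and keep only the documents that have exactly one:
db.collection.aggregate([
  { $unwind: '$wives' },
  { $group: {
    _id: '$_id',
    name: { $first: '$name' },
    wives: { $push: '$wives' },
    statuses: { $addToSet: '$wives.status' }
  }},
  { $match: { statuses: { $size: 1 } } },
  { $project: { _id: 1, name: 1, wives: 1 } }
])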
The most efficient way to handle this is always going to be to "match" on the status of "dead" as the opening query, otherwise you are processing items that cannot possibly match, and the logic then quite simply follows with $map and $allElementsTrue:
db.collection.aggregate([
{ "$match": { "wives.status": "dead" } },
{ "$redact": {
"$cond": {
"if": {
"$allElementsTrue": {
"$map": {
"input": "$wives",
"as": "wife",
"in": { "$eq": [ "$$wife.status", "dead" ] }
}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
Or the same thing with $where:
db.collection.find({
  "wives.status": "dead",
  "$where": function() {
    return this.wives.length == this.wives.filter(function(el) {
      return el.status == "dead";
    }).length;
  }
})
Both essentially test the "status" value of all elements to make sure they match in the fastest possible way. But the aggregation pipeline with just $match and $redact should be faster. And "fewer" pipeline stages (each essentially a pass through the data) means faster as well.
Of course, keeping a property on the document is always fastest, but it would involve logic to set it only where "all elements" share the same status. Which would typically mean inspecting the document by loading it from the server prior to each update; a rough sketch of that idea follows below.
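Purely for illustration, such an update path might look like this; the allWivesDead field name and the someId placeholder are assumptions, not part of the answer:
var doc = db.collection.findOne({ "_id": someId });
// re-check every array element after a change and store the result on the document
var allDead = doc.wives.every(function(el) {
  return el.status == "dead";
});
db.collection.update(
  { "_id": someId },
  { "$set": { "allWivesDead": allDead } }
);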

Group and count using aggregation framework

I'm trying to group and count the following structure:
[{
"_id" : ObjectId("5479c4793815a1f417f537a0"),
"status" : "canceled",
"date" : ISODate("2014-11-29T00:00:00.000Z"),
"offset" : 30,
"devices" : [
{
"name" : "Mouse",
"cost" : 150,
},
{
"name" : "Keyboard",
"cost" : 200,
}
],
},
{
"_id" : ObjectId("5479c4793815a1f417d557a0"),
"status" : "done",
"date" : ISODate("2014-10-20T00:00:00.000Z"),
"offset" : 30,
"devices" : [
{
"name" : "LCD",
"cost" : 150,
},
{
"name" : "Keyboard",
"cost" : 200,
}
],
},
{
"_id" : ObjectId("5479c4793815a1f417f117a0"),
"status" : "done",
"date" : ISODate("2014-12-29T00:00:00.000Z"),
"offset" : 30,
"devices" : [
{
"name" : "Headphones",
"cost" : 150,
},
{
"name" : "LCD",
"cost" : 200,
}
],
}]
I need to group and count to get something like this:
"result" : [
{
"_id" : {
"status" : "canceled"
},
"count" : 1
},
{
"_id" : {
"status" : "done"
},
"count" : 2
},
totaldevicecost: 730,
],
"ok" : 1
}
My problem is calculating the cost sum in the "devices" subarray. How do I do that?
It seems like you got a start on this but got lost on some of the other concepts. There are some basic truths when working with arrays in documents, but let's start where you left off:
db.sample.aggregate([
{ "$group": {
"_id": "$status",
"count": { "$sum": 1 }
}}
])
So that is just going to use the $group pipeline stage to gather up your documents by the different values of the "status" field, and also produce another field, "count", which of course "counts" the occurrences of the grouping key by passing a value of 1 to the $sum operator for each document found. This puts you at a point much like you describe:
{ "_id" : "done", "count" : 2 }
{ "_id" : "canceled", "count" : 1 }
That's the first stage of this and easy enough to understand, but now you need to know how to get values out of an array. You might then be tempted once you understand the "dot notation" concept properly to do something like this:
db.sample.aggregate([
{ "$group": {
"_id": "$status",
"count": { "$sum": 1 },
"total": { "$sum": "$devices.cost" }
}}
])
But what you will find is that the "total" will in fact be 0 for each of those results:
{ "_id" : "done", "count" : 2, "total" : 0 }
{ "_id" : "canceled", "count" : 1, "total" : 0 }
Why? Well, MongoDB aggregation operations like this do not actually traverse array elements when grouping. In order to do that, the aggregation framework has a concept called $unwind. The name is relatively self-explanatory. An embedded array in MongoDB is much like having a "one-to-many" association between linked data sources. So what $unwind produces is exactly that sort of "join" result, where the resulting "documents" are based on the content of the array, with the parent information duplicated for each array member.
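To make that concrete, running { "$unwind": "$devices" } over the first sample document would emit two documents of roughly this shape (the date and offset fields are omitted here for brevity):
{
  "_id" : ObjectId("5479c4793815a1f417f537a0"),
  "status" : "canceled",
  "devices" : { "name" : "Mouse", "cost" : 150 }
}
{
  "_id" : ObjectId("5479c4793815a1f417f537a0"),
  "status" : "canceled",
  "devices" : { "name" : "Keyboard", "cost" : 200 }
}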
So in order to act on array elements you need to use $unwind first. This should logically lead you to code like this:
db.sample.aggregate([
{ "$unwind": "$devices" },
{ "$group": {
"_id": "$status",
"count": { "$sum": 1 },
"total": { "$sum": "$devices.cost" }
}}
])
And then the result:
{ "_id" : "done", "count" : 4, "total" : 700 }
{ "_id" : "canceled", "count" : 2, "total" : 350 }
But that isn't quite right, is it? Remember what you just learned about $unwind and how it does a de-normalized join with the parent information? So now that is duplicated for every document, since both had two array members. So while the "total" field is correct, the "count" is twice as much as it should be in each case.
A bit more care needs to be taken, so instead of doing this in a single $group stage, it is done in two:
db.sample.aggregate([
{ "$unwind": "$devices" },
{ "$group": {
"_id": "$_id",
"status": { "$first": "$status" },
"total": { "$sum": "$devices.cost" }
}},
{ "$group": {
"_id": "$status",
"count": { "$sum": 1 },
"total": { "$sum": "$total" }
}}
])
Which now gets the result with correct totals in it:
{ "_id" : "canceled", "count" : 1, "total" : 350 }
{ "_id" : "done", "count" : 2, "total" : 700 }
Now the numbers are right, but it is still not exactly what you are asking for. I would think you should stop there as the sort of result you are expecting is really not suited to just a single result from aggregation alone. You are looking for the total to be "inside" the result. It really doesn't belong there, but on small data it is okay:
db.sample.aggregate([
{ "$unwind": "$devices" },
{ "$group": {
"_id": "$_id",
"status": { "$first": "$status" },
"total": { "$sum": "$devices.cost" }
}},
{ "$group": {
"_id": "$status",
"count": { "$sum": 1 },
"total": { "$sum": "$total" }
}},
{ "$group": {
"_id": null,
"data": { "$push": { "count": "$count", "total": "$total" } },
"totalCost": { "$sum": "$total" }
}}
])
And a final result form:
{
"_id" : null,
"data" : [
{
"count" : 1,
"total" : 350
},
{
"count" : 2,
"total" : 700
}
],
"totalCost" : 1050
}
But, "Do Not Do That". MongoDB has a document limit on response of 16MB, which is a limitation of the BSON spec. On small results you can do this kind of convenience wrapping, but in the larger scheme of things you want the results in the earlier form and either a separate query or live with iterating the whole results in order to get the total from all documents.
You do appear to be using a MongoDB version less than 2.6, or to be copying output from a RoboMongo shell that does not support the latest version's features. From MongoDB 2.6, though, the result of aggregation can be a "cursor" rather than a single BSON array, so the overall response can be much larger than 16MB, but only when you are not compacting everything into a single result document as was shown in the last example.
This would be especially true in cases where you are "paging" the results, with hundreds to thousands of result lines, but you just want a "total" to return in an API response while only returning a "page" of 25 results at a time.
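A minimal sketch of that "separate query" idea, using the same sample collection as above, would be to ask for the grand total on its own:
db.sample.aggregate([
  { "$unwind": "$devices" },
  { "$group": {
    "_id": null,
    "totalCost": { "$sum": "$devices.cost" }
  }}
])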
Anyhow, that should give you a reasonable guide on how to get the type of results you are expecting from your common document form. Remember $unwind in order to process arrays, and generally $group multiple times in order to get totals at different grouping levels from your document and collection groupings.

How to get mongodb deeply embedded document id

I have the following mongo document, which is part of a bigger collection called attributes that also has Colour and Size:
> db.attributes.find({'name': {'en-UK': 'Fabric'}}).pretty()
{
"_id" : ObjectId("543261cda14c971132fa2b91"),
"values" : [
{
"source" : [
{
"_id" : ObjectId("543261cda14c971132fa2b79"),
"name" : {
"en-UK" : "Combed Cotton"
}
},
],
"name" : [
{
"_id" : ObjectId("543261cda14c971132fa2b85"),
"name" : {
"en-UK" : "Brushed 3-ply"
}
},
{
"_id" : ObjectId("543261cda14c971132fa2b8f"),
"name" : {
"en-UK" : "Plain Weave"
}
},
{
"_id" : ObjectId("543261cda14c971132fa2b90"),
"name" : {
"en-UK" : "1x1 Rib"
}
}
]
}
],
"name" : {
"en-UK" : "Fabric"
}
}
I am trying to return the _id for a subdocument and have the following:
db.attributes.aggregate([
{ '$match': {'name.en-UK': 'Fabric'} },
{ '$unwind' : '$values' },
{ '$project': { 'name' : '$values.name'} },
{ '$match': { '$and': [{"name.name.en-UK" : "1x1 Rib"} ] }}
])
What is the correct way to do this?
Also, the values field of Fabric is an array with two items, source and name, but if I populate it like this:
> db.attributes.find({'name': {'en-UK': 'Fabric'}}).pretty()
{
"_id" : ObjectId("543261cda14c971132fa2b91"),
"values" : {
"source" : [{ ... }]
"name": [{ ... }]
}
}
I get the following error
"errmsg" : "exception: $unwind: value at end of field path must be an array"
But if I wrap it inside square brackets, this then works, so that:
> db.attributes.find({'name': {'en-UK': 'Fabric'}}).pretty()
{
"_id" : ObjectId("543261cda14c971132fa2b91"),
"values" : [{
"source" : [{ ... }],
"name": [{ ... }]
}]
}
What am I missing? values is an array of two objects, source and name, each containing a list of subdocuments.
Any advice much appreciated
What you seem to be "missing" here is that "some" of your documents either do not contain a "values" property at all, or at the very least it is "not an array". This is the basic context of the error you have been given.
Fortunately there are a couple of ways to get around this: namely, either "testing" for the presence of an array when submitting your original query, or actually "substituting" some kind of array for the missing element when processing the pipeline.
Here are both approaches in what is effectively a redundant form, since the first $match condition really sorts this out:
db.attributes.aggregate([
{ "$match": {
"name.en-UK": "Fabric",
"values.0": { "$exists": true }
}},
{ "$project": {
"name": 1,
"values": { "$ifNull": [ "$values", [] ] }
}},
{ "$unwind": "$values" },
{ "$unwind": "$values.name" },
{ "$match": { "values.name.name.en-UK" : "1x1 Rib" }}
])
So as I said, this is really redundant in that the initial $match asks whether an "initial array element" actually exists, which effectively means that there is an array there.
The second $project stage then uses the $ifNull operator to "fill in" a value (basically an empty array) where the tested element does not exist. We tested for that anyway before, but this demonstrates both approaches; either one can also stand on its own, as sketched below.
But the basic idea is either "avoiding" or "filling in" where your document does not have the expected data that you want to process, which is the cause of your error.
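For instance, the "avoiding" approach on its own, relying only on the $exists test and dropping the $ifNull projection, would be roughly:
db.attributes.aggregate([
  { "$match": {
    "name.en-UK": "Fabric",
    "values.0": { "$exists": true }
  }},
  { "$unwind": "$values" },
  { "$unwind": "$values.name" },
  { "$match": { "values.name.name.en-UK" : "1x1 Rib" } }
])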

add where condition in aggregate and group function in mongodb

I have a mongo model, let's say MYLIST, containing data like:
{
"_id" : ObjectId("542139f31284ad1461dbc15f"),
"Category" : "CENTER",
"Name" : "STAND",
"Url" : "center/stand",
"Img" : [ {
"url" : "www.google.com/images",
"main" : "1",
"home" : "1",
"id" : "34faf230-43cf-11e4-8743-311ea2261289"
},
{
"url" : "www.google.com/images1",
"main" : "1",
"home" : "0",
"id" : "34faf230-43cf-11e4-8743-311e66441289"
} ]
}
I execute the following query against the MYLIST collection:
db.MYLIST.aggregate([
{ "$group": {
"_id": "$Category",
"Name": { "$addToSet": {
"name": "$Name",
"url": "$Url",
"img": "$Img"
}}
}},
{ "$sort": { "_id" : 1 } }
]);
And I got the following result:
[
{ _id: 'CENTER',
Name:
[ { "name" : "Stand",
"url" : "center/stand",
"img": { "url" : "www.google.com/images" , "main" : "1", "home" : "1", "id" : "350356a0-43cf-11e4-8743-311ea2261289" }
}]
},
{ _id: 'CENTER',
Name:
[ { "name" : "Stand",
"url" : "center/stand",
"img": { "url" : "www.google.com/images1" , "main" : "1", "home" : "0", "id" : "34faf230-43cf-11e4-8743-311ea2261289" }
}]
}
]
As you can see, my Img key is itself an array of objects, hence I am getting multiple entries for the same category, one for each entry in the Img array.
What I actually need is to get only those images that have some value for the home key.
Expected result:
[
{ _id: 'CENTER',
Name:
[ { "name" : "Stand",
"url" : "center/stand",
"img": { "url" : "www.google.com/images" , "main" : "1", "home" : "1", "id" : "350356a0-43cf-11e4-8743-311ea2261289" }
}]
},
]
Hence I would like to add a where condition for img.home > 0 to the above-mentioned query. Could anybody help me resolve this issue, as I'm relatively new to MongoDB?
Still really not sure if this is what you want or even why you would be using $addToSet on this grouping. But if all you want to do is "filter" the content of the array returned in your result, then what you want to do is $match the array elements to your condition after processing an $unwind pipeline in order to "de-normalize" the content:
db.MYLIST.aggregate([
// If you only want those matching array members it makes sense to match the
// documents that contain them first
{ "$match": { "Img.home": 1 } },
// Unwind to de-normalize or "un-join" the documents
{ "$unwind": "$Img" },
// Match again to "filter" out those elements that do not match
{ "$match": { "Img.home": 1 } },
// Then do your grouping
{ "$group": {
"_id": "$Category",
"Name": {
"$addToSet": {
"name": "$Name",
"url": "$Url",
"img": "$Img"
}
}
}},
// Finally sort
{ "$sort": { "_id" : 1 } }
]);
So the $match pipeline stage is the equivalent of a general query or "where clause" in SQL terms, and can be used at any stage. It is usually best to have it as the first stage when there is some type of filtering to be gained from it, since it reduces the overall load by cutting down the documents to be processed, even if not "all" of the unwanted content is removed at that point, as is the case when working with an array.
The $unwind stage allows the array elements to be processed just like another document. And of course you can then use another $match pipeline stage after it in order to filter those documents against your query condition.

Grouping records in nested documents

I have a document like this:
{
"_id" : ObjectId("533e6ab0ef2188940b00002c"),
"uin" : "1396599472869",
"vm" : {
"0" : {
"draw" : "01s",
"count" : "2",
"type" : "",
"data" : {
"title" : "K1"
},
"child" : [
"1407484608965"
]
},
"1407484608965" : {
"data" : {
"title" : "K2",
"draw" : "1407473540857",
"count" : "1",
"type" : "Block"
},
"child" : [
"1407484647012"
]
},
"1407484647012" : {
"data" : {
"title" : "K3",
"draw" : "03.8878.98",
"count" : "1",
"type" : "SB"
},
"child" : [
"1407484762473"
]
},
"1407484762473" : {
"data" : {
"type" : "SB",
"title" : "D1",
"draw" : "7984",
"count" : "1"
},
"child" : []
}
}
}
How do I group all records with the condition (type = "Block")?
I've tried:
db.ITR.aggregate([
  { $match: { "uin": "1396599472869" } },
  { $project: { "vm": 1 } },
  { $group: { _id: null, r1: { $push: "$vm" } } },
  { $unwind: "$r1" },
  { $group: { _id: null, r2: { $push: "$r1" } } },
  { $unwind: "$r2" }
])
But the result is still in the form of an object and not an array. I did not get anywhere with mapReduce either.
Your problem here is basically with the way you currently have your document structured. The usage of "keys" under "vm" here that actually identify data points does not play well with the standard query forms and the aggregation framework in general.
It also is generally not a very good pattern, as in order to access any part under "vm" you need to specify the "exact path" to the data. So looking for type "Block" requires this:
db.collection.find({
"$or": [
{ "vm.0.type": "Block" },
{ "vm.1407484608965.type": "Block" }
{ ... }
]
})
And so on. You cannot "wildcard" field names like this so the exact path is required.
A better approach to modelling is to use an array instead, and move that inner key inside the documents:
{
"_id" : ObjectId("533e6ab0ef2188940b00002c"),
"uin" : "1396599472869",
"vm" : [
{
"key": 0,
"draw" : "01s",
"count" : "2",
"type" : "",
"data" : {
"title" : "K1"
},
"child" : [
"1407484608965"
]
},
{
"key": "1407484608965",
"title" : "K2",
"draw" : "1407473540857",
"count" : "1",
"type" : "Block",
"child" : [
"1407484647012"
]
},
{
"key": "1407484647012",
"title" : "K3",
"draw" : "03.8878.98",
"count" : "1",
"type" : "SB",
"child" : [
"1407484762473"
]
}
]
}
This allows you to query for documents that contain the matching property by a common path, which greatly simplifies things:
db.collection.find({ "vm.type": "Block" })
Or if you want to "filter" the array contents so that only those "sub-documents" that match are returned you can do this:
db.collection.aggregate([
{ "$match": { "vm.type": "Block" } },
{ "$unwind": "$vm" },
{ "$match": { "vm.type": "Block" } },
{ "$group": {
"_id": "$_id",
"uin": { "$first": "$uin" },
"vm": { "$push": "$vm" }
}}
])
Or even possibly this with MongoDB 2.6 or greater:
db.collection.aggregate([
{ "$match": { "vm.type": "Block" } },
{ "$project": {
"uin": 1,
"vm": {
"$setDifference": [
{ "$map": {
"input": "$vm",
"as": "el",
"in": {"$cond": [
{ "$eq": [ "$$el.type", "Block" ] },
"$$el",
false
]}
}},
[false]
]
}
}}
])
Or any other operation, all of which are simpler to express now that the data is structured that way. But as your data presently stands, your only option to "traverse keys" is to use JavaScript evaluation, which is much slower than being able to query in a proper way:
db.collection.find(function() {
  var doc = this;  // "this" is not carried into the inner callback, so capture the document
  return Object.keys(doc.vm).some(function(x) {
    return doc.vm[x].type == "Block";
  });
})
Or with similar object processing using mapReduce (a rough sketch follows below), but essentially there is no other way to access fields whose paths vary all the time.
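A rough sketch of what that mapReduce form might look like, following the same .type convention as the examples above; this is an illustration and an assumption, not code taken from the question:
db.collection.mapReduce(
  function () {
    var doc = this;
    // emit any matching sub-documents under "vm", keyed by the parent _id
    Object.keys(doc.vm).forEach(function (k) {
      if (doc.vm[k].type == "Block") {
        emit(doc._id, { matched: [ doc.vm[k] ] });
      }
    });
  },
  function (key, values) {
    // concatenate the arrays emitted for the same document
    return {
      matched: values.reduce(function (acc, val) {
        return acc.concat(val.matched);
      }, [])
    };
  },
  { out: { inline: 1 } }
)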
Perhaps this was a design entered into in order to avoid having "nested arrays", which is where the "child" element would otherwise be placed, and nested arrays do of course pose a problem with updates. But really, if any element should not be an array, it is probably that "inner" element such as "child", which could have some kind of structure that does not use an array.
So the key is to look at restructuring, as this will likely suit the patterns that you want without causing performance problems that JavaScript traversal will introduce.