Merge inner array by using common data - mongodb

db.test.aggregate([
{ "$unwind": "$Data" },
{ "$sort" : { "Data.cost" : -1 } },
{ "$group":{
"_id":"$Data.name",
"Data":{ "$push": "$Data" }
}}
])
I ran the above query. It gives me the following result:
{
"result":[{
"_id" : "abc",
"Data" : [
{
"uid" : "1...A",
"name" : "abc",
"city" : "Paris",
"description" : {
"things" : [
{
"fruit" : {
"fruit_name" : "apple",
"fruit_rate" : "4 USD"
},
"flower" : {
"flower_name" : "rose",
"flower_rate" : "2 USD"
}
}
]
},
"cost" : "6 USD"
},
{
"uid" : "1...B",
"name" : "abc",
"city" : "Paris",
"description" : {
"things" : [
{
"fruit" : {
"fruit_name" : "cherry",
"fruit_rate" : "3 USD"
},
"flower" : {
"flower_name" : "orchid",
"flower_rate" : "2 USD"
}
}
]
},
"cost" : "5 USD"
}
]
}]
}
But I don't want a result like this. I want to merge the "description" of both entries if "name" is the same, as shown below:
{
"result":[{
"_id" : "abc",
"Data" : [
{
"uid" : "1...A",
"name" : "abc",
"city" : "Paris",
"description" : {
"things" : [
{
"fruit" : {
"fruit_name" : "apple",
"fruit_rate" : "4 USD"
},
"flower" : {
"flower_name" : "rose",
"flower_rate" : "2 USD"
}
}
]
},
"description" : {
"things" : [
{
"fruit" : {
"fruit_name" : "cherry",
"fruit_rate" : "3 USD"
},
"flower" : {
"flower_name" : "orchid",
"flower_rate" : "2 USD"
}
}
]
},
"cost" : "6 USD"
}
]
}]
}
Is it possible to get a result like this? What changes do I have to make to my query?
Thank you.

The way you have structured your desired result is simply not possible. The reason is that you are breaking the principle behind a hash table or dictionary/associative array (whatever term suits you better): you cannot have more than one key with the same name.
If you want multiple values under the same name, they must be contained within an array, which is very similar to the structure you already have in your source and in your result. And that result doesn't really do anything other than sort the array elements and then group them back into an array.
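A small plain JavaScript sketch (Node.js or the mongo shell) shows the problem: when a JSON text repeats a key, `JSON.parse` keeps only the last value, so one of the two "description" objects is silently lost. An array under a single key is the way to keep both. The sample values are trimmed-down stand-ins for the question's data.

```javascript
// JSON text with the same key twice, like the desired result in the question
const text = '{ "name": "abc", "description": { "fruit": "apple" }, "description": { "fruit": "cherry" } }';

const doc = JSON.parse(text);

// Only one "description" survives: the last one wins
console.log(Object.keys(doc));      // [ 'name', 'description' ]
console.log(doc.description.fruit); // cherry

// Keeping both values requires an array under a single key
const merged = {
  name: "abc",
  description: [ { fruit: "apple" }, { fruit: "cherry" } ]
};
console.log(merged.description.length); // 2
```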
So, giving you the benefit of the doubt that you simply copied and pasted to represent your desired result, and that you actually want some form of merging of the inner elements, you could do something like this:
db.test.aggregate([
{ "$unwind": "$Data" },
{ "$unwind": "$Data.description.things" },
{ "$group": {
"_id": "$Data.name",
"city": { "$first": "$Data.city" },
"things": { "$addToSet": "$Data.description.things" }
}}
])
Which produces a result:
{
"_id" : "abc",
"city" : "Paris",
"things" : [
{
"fruit" : {
"fruit_name" : "cherry",
"fruit_rate" : "3 USD"
},
"flower" : {
"flower_name" : "orchid",
"flower_rate" : "2 USD"
}
},
{
"fruit" : {
"fruit_name" : "apple",
"fruit_rate" : "4 USD"
},
"flower" : {
"flower_name" : "rose",
"flower_rate" : "2 USD"
}
}
]
}
So the inner "things" are now "pushed" together into a single array, grouped on a common element, with some additional fields added.
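As a rough mental model of those stages, here is a plain JavaScript sketch that performs the same unwind-and-group steps in memory on a document assumed to be shaped like the question's (the server does the real work). Note that `$addToSet` makes no ordering guarantee, whereas this sketch keeps first-seen order, and comparing elements via `JSON.stringify` is a simplification that assumes a consistent field order.

```javascript
// Sample input, assumed to match the structure in the question
const docs = [{
  Data: [
    { uid: "1...A", name: "abc", city: "Paris", cost: "6 USD",
      description: { things: [ { fruit: { fruit_name: "apple", fruit_rate: "4 USD" },
                                 flower: { flower_name: "rose", flower_rate: "2 USD" } } ] } },
    { uid: "1...B", name: "abc", city: "Paris", cost: "5 USD",
      description: { things: [ { fruit: { fruit_name: "cherry", fruit_rate: "3 USD" },
                                 flower: { flower_name: "orchid", flower_rate: "2 USD" } } ] } }
  ]
}];

// $unwind "$Data", then $unwind "$Data.description.things"
const unwound = docs
  .flatMap(d => d.Data)
  .flatMap(data => data.description.things.map(thing => ({ data, thing })));

// $group on name: $first for city, $addToSet for things
const groups = new Map();
for (const { data, thing } of unwound) {
  if (!groups.has(data.name)) {
    groups.set(data.name, { _id: data.name, city: data.city, things: [], seen: new Set() });
  }
  const g = groups.get(data.name);
  const key = JSON.stringify(thing); // simplified stand-in for $addToSet equality
  if (!g.seen.has(key)) {
    g.seen.add(key);
    g.things.push(thing);
  }
}

// Drop the helper "seen" set from the output
const result = [...groups.values()].map(({ seen, ...rest }) => rest);
console.log(JSON.stringify(result, null, 2));
```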
If you actually want something with even more "merging" and even possibly avoiding removal of duplicate "set" items, then you could further re-shape with a statement like this:
db.test.aggregate([
{ "$unwind": "$Data" },
{ "$unwind": "$Data.description.things" },
{ "$project": {
"name": "$Data.name",
"city": "$Data.city",
"things": "$Data.description.things",
"type": { "$literal": [ "flower", "fruit" ] }
}},
{ "$unwind": "$type" },
{ "$group": {
"_id": "$name",
"city": { "$first": "$city" },
"things": { "$push": { "$cond": [
{ "$eq": [ "$type", "flower" ] },
{
"type": "$type",
"name": "$things.flower.flower_name",
"rate": "$things.flower.flower_rate"
},
{
"type": "$type",
"name": "$things.fruit.fruit_name",
"rate": "$things.fruit.fruit_rate"
}
]}}
}}
])
Which gives a result:
{
"_id" : "abc",
"city" : "Paris",
"things" : [
{
"type" : "flower",
"name" : "rose",
"rate" : "2 USD"
},
{
"type" : "fruit",
"name" : "apple",
"rate" : "4 USD"
},
{
"type" : "flower",
"name" : "orchid",
"rate" : "2 USD"
},
{
"type" : "fruit",
"name" : "cherry",
"rate" : "3 USD"
}
]
}
This might even indicate how your original data would be better structured in the first place. You would certainly need to reshape it like this if you wanted to do something like "find the total value of 'cherries', or 'flowers', or 'fruit'", or whatever the type.
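For example, once the items are in the flat `{ type, name, rate }` shape of the result above, a per-type total is just a grouping on "type". This plain JavaScript sketch does it in memory over that output; it assumes every rate is a "<number> USD" string and relies on `parseFloat` reading the leading number.

```javascript
// The reshaped "things" array from the result above
const things = [
  { type: "flower", name: "rose",   rate: "2 USD" },
  { type: "fruit",  name: "apple",  rate: "4 USD" },
  { type: "flower", name: "orchid", rate: "2 USD" },
  { type: "fruit",  name: "cherry", rate: "3 USD" }
];

// Total per type; assumes every rate is "<number> <currency>"
const totals = {};
for (const t of things) {
  totals[t.type] = (totals[t.type] || 0) + parseFloat(t.rate);
}
console.log(totals); // { flower: 4, fruit: 7 }
```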
So the result as you structured it is not possible: you are breaking the rules mentioned above. In the forms I have presented, though, there are a few ways to do it.
P.S.: I am deliberately staying away from your $sort. Although it "sort of" worked for you in your initial example, do not expect it to work in wider cases, as your value is a string and not a number. In short, this means that "10 USD" is actually less than "4 USD", because strings are compared lexically, character by character: the first characters compared are "1" and "4", and "4" is greater.
So change these by splitting up your fields and using a numerical type, as in:
{
"type" : "fruit",
"name" : "cherry",
"rate" : 3,
"currency": "USD"
}
And you even get to filter on "currency" if that is required.
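A quick way to see both the problem and the fix is a plain JavaScript sketch. For plain ASCII values like these, JavaScript's default string ordering agrees with MongoDB's default binary string comparison (no collation). `splitPrice` is a hypothetical helper for a one-off conversion and assumes every value looks like "<number> <currency>"; writing the converted values back to the collection is not shown.

```javascript
// Costs stored as strings, as in the question
const costs = ["4 USD", "10 USD", "6 USD"];

// Default string ordering compares character by character, so "10 USD" comes first
console.log([...costs].sort()); // [ '10 USD', '4 USD', '6 USD' ]

// Hypothetical helper: split "3 USD" into a numeric rate and a currency code
function splitPrice(text) {
  const [amount, currency] = text.trim().split(/\s+/);
  const rate = Number(amount);
  if (Number.isNaN(rate) || !currency) {
    throw new Error(`Unexpected price format: ${text}`);
  }
  return { rate, currency };
}

// Once split, sorting on the numeric field gives the intended order
const prices = costs.map(splitPrice).sort((a, b) => a.rate - b.rate);
console.log(prices.map(p => p.rate)); // [ 4, 6, 10 ]
```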
P.P.S.: The $literal operator is available in MongoDB 2.6 and later. In prior versions, where that operator is not available, you can instead code it like this:
"type": { "$cond": [ 1, [ "flower", "fruit" ], 0 ] }
This obscurely does the same thing, because the value $cond returns for the true case (or even the false case) is declared "literally", so whatever you put there is produced as-is. In this case, it is a way of adding an array to the projection, which is needed in order to produce one entry per "type".
You might find references on the net that use $const for this purpose, but I don't particularly trust that as, while it does exist, it was not intended for this purpose and is hence not officially documented.

Related

Mongodb how to reduce the array within the matching key and calculate avg

{
"_id" : {
"state" : "NY",
"st" : "value"
},
"List" : [
{
"id" : "21",
"score" : 18.75,
"name" : "PU"
},
{
"id" : "21",
"score" : 25.0,
"name" : "PU"
},
{
"id" : "23",
"score" : 25.0,
"name" : "CL"
},
{
"id" : "23",
"score" : 56.25,
"name" : "CL"
}
]
}
Desired result:
Group the array entries by id and calculate the average score of each group.
{
"_id" : {
"state" : "New York",
"st" : "value"
},
"List" : [
{
"id" : "21",
"score" : 21.875,
"name" : "PU"
},
{
"id" : "23",
"score" : 40.625,
"name" : "CL"
}
]
}
Thank you in advance.
Query (returns the expected result):
1. $unwind List
2. $group on the document's _id fields plus the element's id and name, and compute the average score
3. $project to reshape each result to match the document you want
4. $group again to restore the document structure (reversing the unwind)
If two entries with the same id have different names (if that can happen), the query keeps them as separate members of the array.
(Alternatively, it could merge them into one member and pack the names into an array, but that would produce a different schema from the one you expect to see.)
db.collection.aggregate([
{
"$unwind": {
"path": "$List"
}
},
{
"$group": {
"_id": {
"state": "$_id.state",
"st": "$_id.st",
"id": "$List.id",
"name": "$List.name"
},
"avg": {
"$avg": "$List.score"
}
}
},
{
"$project": {
"_id": {
"state": "$_id.state",
"st": "$_id.st"
},
"List": {
"id": "$_id.id",
"score": "$avg",
"name": "$_id.name"
}
}
},
{
"$group": {
"_id": "$_id",
"List": {
"$push": "$List"
}
}
}
])
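The same steps can be sketched in plain JavaScript over the question's sample document, which may help show what each stage contributes. This is an in-memory illustration only; like the query, it groups on both id and name, so two entries sharing an id but not a name would stay separate.

```javascript
// The sample document from the question
const doc = {
  _id: { state: "NY", st: "value" },
  List: [
    { id: "21", score: 18.75, name: "PU" },
    { id: "21", score: 25.0,  name: "PU" },
    { id: "23", score: 25.0,  name: "CL" },
    { id: "23", score: 56.25, name: "CL" }
  ]
};

// Like $unwind + the first $group: accumulate sum and count per (id, name)
const acc = new Map();
for (const el of doc.List) {
  const key = JSON.stringify([el.id, el.name]);
  const entry = acc.get(key) || { id: el.id, name: el.name, sum: 0, count: 0 };
  entry.sum += el.score;
  entry.count += 1;
  acc.set(key, entry);
}

// Like $project + the second $group: rebuild the document with averaged scores
const result = {
  _id: doc._id,
  List: [...acc.values()].map(e => ({ id: e.id, score: e.sum / e.count, name: e.name }))
};
console.log(JSON.stringify(result)); // scores 21.875 for "21" and 40.625 for "23"
```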

Aggregate group multiple fields

Given the following dataset:
{ "_id" : 1, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 25, "Q3" : 0, "Q4" : 0 }
{ "_id" : 2, "city" : "Reno", "cat": "roads", "Q1" : 30, "Q2" : 0, "Q3" : 0, "Q4" : 60 }
{ "_id" : 3, "city" : "Yuma", "cat": "parks", "Q1" : 0, "Q2" : 0, "Q3" : 45, "Q4" : 0 }
{ "_id" : 4, "city" : "Reno", "cat": "parks", "Q1" : 35, "Q2" : 0, "Q3" : 0, "Q4" : 0 }
{ "_id" : 5, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 15, "Q3" : 0, "Q4" : 20 }
I'm trying to achieve the following result. It would be great to just return the totals greater than zero, and also compress each city, cat and Qx total to a single record.
{
"city" : "Yuma",
"cat" : "roads",
"Q2total" : 40
},
{
"city" : "Reno",
"cat" : "roads",
"Q1total" : 30
},
{
"city" : "Reno",
"cat" : "roads",
"Q4total" : 60
},
{
"city" : "Yuma",
"cat" : "parks",
"Q3total" : 45
},
{
"city" : "Reno",
"cat" : "parks",
"Q1total" : 35
},
{
"city" : "Yuma",
"cat" : "roads",
"Q4total" : 20
}
Possible?
We could ask: to what end? Your documents already have a nice, consistent object structure, which is recommended. Having objects with varying keys is not a great idea. Data is "data" and should not really be used as the names of keys.
With that in mind, the aggregation framework actually follows this principle and does not allow for the generation of arbitrary key names from data contained in the document. But you could get a similar result with the output as data points:
db.junk.aggregate([
// Aggregate first to reduce the pipeline documents somewhat
{ "$group": {
"_id": {
"city": "$city",
"cat": "$cat"
},
"Q1": { "$sum": "$Q1" },
"Q2": { "$sum": "$Q2" },
"Q3": { "$sum": "$Q3" },
"Q4": { "$sum": "$Q4" }
}},
// Convert the "quarter" elements to array entries with the same keys
{ "$project": {
"totals": {
"$map": {
"input": { "$literal": [ "Q1", "Q2", "Q3", "Q4" ] },
"as": "el",
"in": { "$cond": [
{ "$eq": [ "$$el", "Q1" ] },
{ "quarter": "$$el", "total": "$Q1" },
{ "$cond": [
{ "$eq": [ "$$el", "Q2" ] },
{ "quarter": "$$el", "total": "$Q2" },
{ "$cond": [
{ "$eq": [ "$$el", "Q3" ] },
{ "quarter": "$$el", "total": "$Q3" },
{ "quarter": "$$el", "total": "$Q4" }
]}
]}
]}
}
}
}},
// Unwind the array produced
{ "$unwind": "$totals" },
// Filter any "0" results
{ "$match": { "totals.total": { "$ne": 0 } } },
// Maybe project a prettier "flatter" output
{ "$project": {
"_id": 0,
"city": "$_id.city",
"cat": "$_id.cat",
"quarter": "$totals.quarter",
"total": "$totals.total"
}}
])
Which gives you results like this:
{ "city" : "Reno", "cat" : "parks", "quarter" : "Q1", "total" : 35 }
{ "city" : "Yuma", "cat" : "parks", "quarter" : "Q3", "total" : 45 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q1", "total" : 30 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q4", "total" : 60 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q2", "total" : 40 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q4", "total" : 20 }
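If it helps to see what each stage does, here is a plain JavaScript sketch that applies the same steps in memory to the sample data. It illustrates the logic only (the server does this work itself) and assumes city and cat values never contain the "|" character used to build the grouping key.

```javascript
const docs = [
  { _id: 1, city: "Yuma", cat: "roads", Q1: 0,  Q2: 25, Q3: 0,  Q4: 0 },
  { _id: 2, city: "Reno", cat: "roads", Q1: 30, Q2: 0,  Q3: 0,  Q4: 60 },
  { _id: 3, city: "Yuma", cat: "parks", Q1: 0,  Q2: 0,  Q3: 45, Q4: 0 },
  { _id: 4, city: "Reno", cat: "parks", Q1: 35, Q2: 0,  Q3: 0,  Q4: 0 },
  { _id: 5, city: "Yuma", cat: "roads", Q1: 0,  Q2: 15, Q3: 0,  Q4: 20 }
];
const quarters = ["Q1", "Q2", "Q3", "Q4"];

// $group by city + cat, summing each quarter
const grouped = new Map();
for (const d of docs) {
  const key = `${d.city}|${d.cat}`;
  const g = grouped.get(key) || { city: d.city, cat: d.cat, Q1: 0, Q2: 0, Q3: 0, Q4: 0 };
  for (const q of quarters) g[q] += d[q];
  grouped.set(key, g);
}

// $map + $unwind: one row per quarter; $match: drop zero totals; $project: flat output
const rows = [...grouped.values()]
  .flatMap(g => quarters.map(q => ({ city: g.city, cat: g.cat, quarter: q, total: g[q] })))
  .filter(r => r.total !== 0);

console.log(rows);
```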
You could alternatively use mapReduce, which allows "some" flexibility with key names. The catch, though, is that your aggregation is still by "quarter", so you need that as part of the primary key, which cannot be changed once emitted.
Additionally, you cannot "filter" out any aggregated results of "0" without a second pass after outputting to a collection, so it's not really of much use for what you want to do, unless you can live with a second mapReduce operation or a "transform" query on the output collection.
It is worth noting that if you look at what the second pipeline stage does here with $project and $map, you will see that the document structure is essentially being altered into something like what you could have used for your documents originally:
{
"city" : "Reno",
"cat" : "parks",
"totals" : [
{ "quarter" : "Q1", "total" : 35 },
{ "quarter" : "Q2", "total" : 0 },
{ "quarter" : "Q3", "total" : 0 },
{ "quarter" : "Q4", "total" : 0 }
]
},
{
"city" : "Yuma",
"cat" : "parks",
"totals" : [
{ "quarter" : "Q1", "total" : 0 },
{ "quarter" : "Q2", "total" : 0 },
{ "quarter" : "Q3", "total" : 45 },
{ "quarter" : "Q4", "total" : 0 }
]
}
Then the aggregation operation that produces the same results as shown above becomes simple:
db.collection.aggregate([
{ "$unwind": "$totals" },
{ "$group": {
"_id": {
"city": "$city",
"cat": "$cat",
"quarter": "$totals.quarter"
},
"ttotal": { "$sum": "$totals.total" }
}},
{ "$match": { "ttotal": { "$ne": 0 } } },
{ "$project": {
"_id": 0,
"city": "$_id.city",
"cat": "$_id.cat",
"quarter": "$_id.quarter",
"total": "$ttotal"
}}
])
So it might make more sense to consider structuring your documents in that way to begin with and avoid any overhead required by the document transformation.
I think you'll find that consistent key names make a far better object model to program against, where you read the data point from the value and not from the key name. If you really need to, it's a simple matter of reading the data from each already aggregated result and transforming its keys in post-processing.
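That post-processing step can be quite small. Here is a plain JavaScript sketch over the six result rows shown earlier, using a computed property name so that keys such as "Q2total" are only built at the very end, on the client.

```javascript
// Aggregated rows as returned by the pipeline (order not significant)
const rows = [
  { city: "Reno", cat: "parks", quarter: "Q1", total: 35 },
  { city: "Yuma", cat: "parks", quarter: "Q3", total: 45 },
  { city: "Reno", cat: "roads", quarter: "Q1", total: 30 },
  { city: "Reno", cat: "roads", quarter: "Q4", total: 60 },
  { city: "Yuma", cat: "roads", quarter: "Q2", total: 40 },
  { city: "Yuma", cat: "roads", quarter: "Q4", total: 20 }
];

// Client-side only: turn the "quarter" data point into a key name like "Q2total"
const shaped = rows.map(({ city, cat, quarter, total }) => ({
  city,
  cat,
  [`${quarter}total`]: total
}));
console.log(shaped);
```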

Grouping records in nested documents

I have a document like this:
{
"_id" : ObjectId("533e6ab0ef2188940b00002c"),
"uin" : "1396599472869",
"vm" : {
"0" : {
"draw" : "01s",
"count" : "2",
"type" : "",
"data" : {
"title" : "K1"
},
"child" : [
"1407484608965"
]
},
"1407484608965" : {
"data" : {
"title" : "K2",
"draw" : "1407473540857",
"count" : "1",
"type" : "Block"
},
"child" : [
"1407484647012"
]
},
"1407484647012" : {
"data" : {
"title" : "K3",
"draw" : "03.8878.98",
"count" : "1",
"type" : "SB"
},
"child" : [
"1407484762473"
]
},
"1407484762473" : {
"data" : {
"type" : "SB",
"title" : "D1",
"draw" : "7984",
"count" : "1"
},
"child" : []
}
}
}
How can I group all records matching the condition (type="Block")?
I've tried:
db.ITR.aggregate({$match:{"uin":"1396599472869"}},{$project:{"vm":1}},{$group:{_id:null,r1:{$push:"$vm"}}},{$unwind:"$r1"},{$group:{_id:null,r2:{$push:"$r1"}}},{$unwind:"$r2"})
But the result is still in the form of an object and not an array. I could not get it to work with MapReduce either.
Your problem here is basically with the way you currently have your document structured. The usage of "keys" under "vm" here that actually identify data points does not play well with the standard query forms and the aggregation framework in general.
It also is generally not a very good pattern, as in order to access any part under "vm" you need to specify the "exact path" to the data. So looking for type "Block" requires this:
db.collection.find({
"$or": [
{ "vm.0.type": "Block" },
{ "vm.1407484608965.data.type": "Block" },
// ... one clause for every other key under "vm"
]
})
And so on. You cannot "wildcard" field names like this so the exact path is required.
A better approach to modelling is to use an array instead, and move that inner key inside the documents:
{
"_id" : ObjectId("533e6ab0ef2188940b00002c"),
"uin" : "1396599472869",
"vm" : [
{
"key": 0,
"draw" : "01s",
"count" : "2",
"type" : "",
"data" : {
"title" : "K1"
},
"child" : [
"1407484608965"
]
},
{
"key": "1407484608965",
"title" : "K2",
"draw" : "1407473540857",
"count" : "1",
"type" : "Block",
"child" : [
"1407484647012"
]
},
{
"key": "1407484647012",
"title" : "K3",
"draw" : "03.8878.98",
"count" : "1",
"type" : "SB",
"child" : [
"1407484762473"
]
}
]
}
This allows you to query for documents that contain the matching property by a common path, which greatly simplifies things:
db.collection.find({ "vm.type": "Block" })
Or if you want to "filter" the array contents so that only those "sub-documents" that match are returned you can do this:
db.collection.aggregate([
{ "$match": { "vm.type": "Block" } },
{ "$unwind": "$vm" },
{ "$match": { "vm.type": "Block" } },
{ "$group": {
"_id": "$_id",
"uin": { "$first": "$uin" },
"vm": { "$push": "$vm" }
}}
])
Or even possibly this with MongoDB 2.6 or greater:
db.collection.aggregate([
{ "$match": { "vm.type": "Block" } },
{ "$project": {
"uin": 1,
"vm": {
"$setDifference": [
{ "$map": {
"input": "$vm",
"as": "el",
"in": {"$cond": [
{ "$eq": [ "$$el.type", "Block" ] },
"$$el",
false
]}
}},
[false]
]
}
}}
])
Any other operation is likewise simpler to traverse now that the data is structured this way. But as your data presently stands, your only option to "traverse keys" is JavaScript evaluation, which is much slower than being able to query in a proper way:
db.collection.find({ "$where": function() {
var vm = this.vm;
return Object.keys(vm).some(function(x) {
// "type" is a direct field under key "0" but sits under "data" for the other keys
var el = vm[x];
return el.type == "Block" || (el.data && el.data.type == "Block");
});
}})
Or you could do similar object processing in mapReduce, but essentially there is no other way to access fields whose paths vary all the time.
Perhaps this design was chosen to avoid "nested arrays", which is what you get once the "child" array sits inside an array element. Of course, nested arrays pose a problem with updates. But really, if any element should not be an array it is probably the "inner" element such as "child", which could have some kind of structure that does not use an array.
So the key is to look at restructuring, as this will likely suit the patterns that you want without causing performance problems that JavaScript traversal will introduce.
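If you do restructure, the conversion itself is straightforward. Here is a plain JavaScript sketch of the reshaping step only (reading and writing the documents back is left out): each key under "vm" moves inside its element as "key", and the fields of "data" are hoisted to the top level. Hoisting "data" for every entry (including key "0") and keeping the keys as strings are assumptions of this sketch, and it assumes no field name appears both inside "data" and beside it.

```javascript
// The question's "vm" object, where keys are data rather than field names
const vm = {
  "0": { draw: "01s", count: "2", type: "", data: { title: "K1" }, child: ["1407484608965"] },
  "1407484608965": { data: { title: "K2", draw: "1407473540857", count: "1", type: "Block" }, child: ["1407484647012"] },
  "1407484647012": { data: { title: "K3", draw: "03.8878.98", count: "1", type: "SB" }, child: ["1407484762473"] },
  "1407484762473": { data: { type: "SB", title: "D1", draw: "7984", count: "1" }, child: [] }
};

// Move each key inside its element and hoist the "data" fields to the top level
const vmArray = Object.entries(vm).map(([key, { data = {}, ...rest }]) => ({
  key,
  ...rest,
  ...data
}));

// With a common path, "type" can now be checked the same way for every element
const blocks = vmArray.filter(el => el.type === "Block");
console.log(blocks.map(el => el.key)); // [ '1407484608965' ]
```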

mongodb multiple aggregations in single operation

I have an item collection with following documents.
{ "item" : "i1", "category" : "c1", "brand" : "b1" }
{ "item" : "i2", "category" : "c2", "brand" : "b1" }
{ "item" : "i3", "category" : "c1", "brand" : "b2" }
{ "item" : "i4", "category" : "c2", "brand" : "b1" }
{ "item" : "i5", "category" : "c1", "brand" : "b2" }
I want separate aggregation results: count by category and count by brand. Please note, it is not a count by (category, brand).
I am able to do this with map-reduce using the following code.
map = function(){
emit({type:"category",category:this.category},1);
emit({type:"brand",brand:this.brand},1);
}
reduce = function(key, values){
return Array.sum(values)
}
db.item.mapReduce(map,reduce,{out:{inline:1}})
And the result is
{
"results" : [
{
"_id" : {
"type" : "brand",
"brand" : "b1"
},
"value" : 3
},
{
"_id" : {
"type" : "brand",
"brand" : "b2"
},
"value" : 2
},
{
"_id" : {
"type" : "category",
"category" : "c1"
},
"value" : 3
},
{
"_id" : {
"type" : "category",
"category" : "c2"
},
"value" : 2
}
],
"timeMillis" : 21,
"counts" : {
"input" : 5,
"emit" : 10,
"reduce" : 4,
"output" : 4
},
"ok" : 1
}
I can get the same results by running two separate aggregation commands, as below.
db.item.aggregate({$group:{_id:"$category",count:{$sum:1}}})
db.item.aggregate({$group:{_id:"$brand",count:{$sum:1}}})
Is there any way I can do the same with the aggregation framework in a single aggregation command?
I have simplified my case here, but in reality I need this grouping on fields in an array of subdocuments. Assume the above is the structure after I do an unwind.
It is a real-time query (someone is waiting for the response), though on a smaller dataset, so execution time is important.
I am using MongoDB 2.4.
Starting in Mongo 3.4, the $facet aggregation stage greatly simplifies this type of use case by processing multiple aggregation pipelines within a single stage on the same set of input documents:
// { "item" : "i1", "category" : "c1", "brand" : "b1" }
// { "item" : "i2", "category" : "c2", "brand" : "b1" }
// { "item" : "i3", "category" : "c1", "brand" : "b2" }
// { "item" : "i4", "category" : "c2", "brand" : "b1" }
// { "item" : "i5", "category" : "c1", "brand" : "b2" }
db.collection.aggregate(
{ $facet: {
categories: [{ $group: { _id: "$category", count: { "$sum": 1 } } }],
brands: [{ $group: { _id: "$brand", count: { "$sum": 1 } } }]
}}
)
// {
// "categories" : [
// { "_id" : "c1", "count" : 3 },
// { "_id" : "c2", "count" : 2 }
// ],
// "brands" : [
// { "_id" : "b1", "count" : 3 },
// { "_id" : "b2", "count" : 2 }
// ]
// }
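To make the $facet semantics concrete: each named sub-pipeline receives the same set of input documents and produces its own array. The following plain JavaScript sketch mimics that in memory for the sample items; `countBy` is a hypothetical helper standing in for a single $group with a count.

```javascript
const items = [
  { item: "i1", category: "c1", brand: "b1" },
  { item: "i2", category: "c2", brand: "b1" },
  { item: "i3", category: "c1", brand: "b2" },
  { item: "i4", category: "c2", brand: "b1" },
  { item: "i5", category: "c1", brand: "b2" }
];

// Hypothetical helper: count documents per value of one field (like one $group)
function countBy(docs, field) {
  const counts = new Map();
  for (const d of docs) counts.set(d[field], (counts.get(d[field]) || 0) + 1);
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

// Like $facet: each sub-pipeline runs independently over the same input
const facets = {
  categories: countBy(items, "category"),
  brands: countBy(items, "brand")
};
console.log(JSON.stringify(facets));
```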
On MongoDB 2.4, which has no $facet, I would say that over a large data set your current mapReduce approach is the best one, because the aggregation technique for this would not work well with large data. But over a reasonably small data set it might be just what you need:
db.item.aggregate([
{ "$group": {
"_id": null,
"categories": { "$push": "$category" },
"brands": { "$push": "$brand" }
}},
{ "$project": {
"_id": {
"categories": "$categories",
"brands": "$brands"
},
"categories": 1
}},
{ "$unwind": "$categories" },
{ "$group": {
"_id": {
"brands": "$_id.brands",
"category": "$categories"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.brands",
"categories": { "$push": {
"category": "$_id.category",
"count": "$count"
}},
}},
{ "$project": {
"_id": "$categories",
"brands": "$_id"
}},
{ "$unwind": "$brands" },
{ "$group": {
"_id": {
"categories": "$_id",
"brand": "$brands"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"categories": { "$first": "$_id.categories" },
"brands": { "$push": {
"brand": "$_id.brand",
"count": "$count"
}}
}}
])
It is not quite the same as the mapReduce output, and you could add some more stages to change the output format, but this should be usable:
{
"_id" : null,
"categories" : [
{
"category" : "c2",
"count" : 2
},
{
"category" : "c1",
"count" : 3
}
],
"brands" : [
{
"brand" : "b2",
"count" : 2
},
{
"brand" : "b1",
"count" : 3
}
]
}
As you can see, this involves a fair bit of shuffling between arrays in order to group each set of either "category" or "brand" within the same pipeline process. Again I will say, this will not do well for large data, but for something like "items in an order" it would probably do nicely.
Of course, as you say, you have simplified somewhat, so the first grouping key of null is either going to be something else, or the documents will be narrowed down to that null case by an earlier $match stage, which is probably what you want to do.

In MongoDB how can I sort documents based on a property in an embedded object?

In my products collection, I can find all products that have been released in the region 'GB':
> db.products.find({'release.region':'GB'}).pretty();
{
"_id" : "foo",
"release" : [
{
"region" : "GB",
"date" : ISODate("2012-03-01T00:00:00Z")
},
{
"region" : "US",
"date" : ISODate("2012-09-01T00:00:00Z")
}
]
}
{
"_id" : "bar",
"release" : [
{
"region" : "FR",
"date" : ISODate("2010-07-01T00:00:00Z")
},
{
"region" : "GB",
"date" : ISODate("2012-05-01T00:00:00Z")
}
]
}
{
"_id" : "baz",
"release" : [
{
"region" : "GB",
"date" : ISODate("2011-05-01T00:00:00Z")
},
{
"region" : "NZ",
"date" : ISODate("2012-02-01T00:00:00Z")
}
]
}
How can I sort the results in ascending date order, using the GB release date? (e.g. the order should be baz, foo, bar)
Note, I cannot do the sorting on the client side.
Alternatively, how can I better organise the data to make this possible?
Edit: I changed the FR release date for 'bar' to illustrate that vivek's solution is not correct.
Because you don't need the release elements besides the ones from the "GB" region, you can do it with aggregate like this:
db.products.aggregate(
// Filter the docs to just those containing the 'GB' region
{ $match: {'release.region': 'GB'}},
// Duplicate the docs, one per release element
{ $unwind: '$release'},
// Filter the resulting docs to just include the ones from the 'GB' region
{ $match: {'release.region': 'GB'}},
// Sort by release date
{ $sort: {'release.date': 1}})
output:
{
"result": [
{
"_id": "baz",
"release": {
"region": "GB",
"date": ISODate("2011-05-01T00:00:00Z")
}
},
{
"_id": "foo",
"release": {
"region": "GB",
"date": ISODate("2012-03-01T00:00:00Z")
}
},
{
"_id": "bar",
"release": {
"region": "GB",
"date": ISODate("2012-05-01T00:00:00Z")
}
}
],
"ok": 1
}
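The effect of those stages can be mimicked in plain JavaScript, which also shows why the non-GB dates (such as bar's earlier FR release) no longer influence the order: they are dropped before sorting. This is an in-memory sketch of the logic using the question's three documents, not a replacement for the server-side pipeline.

```javascript
const products = [
  { _id: "foo", release: [ { region: "GB", date: new Date("2012-03-01T00:00:00Z") },
                           { region: "US", date: new Date("2012-09-01T00:00:00Z") } ] },
  { _id: "bar", release: [ { region: "FR", date: new Date("2010-07-01T00:00:00Z") },
                           { region: "GB", date: new Date("2012-05-01T00:00:00Z") } ] },
  { _id: "baz", release: [ { region: "GB", date: new Date("2011-05-01T00:00:00Z") },
                           { region: "NZ", date: new Date("2012-02-01T00:00:00Z") } ] }
];

// $match + $unwind + $match: one entry per GB release
const gbReleases = products.flatMap(p =>
  p.release.filter(r => r.region === "GB").map(r => ({ _id: p._id, release: r }))
);

// $sort by release.date ascending (Date subtraction yields milliseconds)
gbReleases.sort((a, b) => a.release.date - b.release.date);
console.log(gbReleases.map(p => p._id)); // [ 'baz', 'foo', 'bar' ]
```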