Aggregate group multiple fields - mongodb

Given the following dataset:
{ "_id" : 1, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 25, "Q3" : 0, "Q4" : 0 }
{ "_id" : 2, "city" : "Reno", "cat": "roads", "Q1" : 30, "Q2" : 0, "Q3" : 0, "Q4" : 60 }
{ "_id" : 3, "city" : "Yuma", "cat": "parks", "Q1" : 0, "Q2" : 0, "Q3" : 45, "Q4" : 0 }
{ "_id" : 4, "city" : "Reno", "cat": "parks", "Q1" : 35, "Q2" : 0, "Q3" : 0, "Q4" : 0 }
{ "_id" : 5, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 15, "Q3" : 0, "Q4" : 20 }
I'm trying to achieve the following result. It would be great to just return the totals greater than zero, and also compress each city, cat and Qx total to a single record.
{
"city" : "Yuma",
"cat" : "roads",
"Q2total" : 40
},
{
"city" : "Reno",
"cat" : "roads",
"Q1total" : 30
},
{
"city" : "Reno",
"cat" : "roads",
"Q4total" : 60
},
{
"city" : "Yuma",
"cat" : "parks",
"Q3total" : 45
},
{
"city" : "Reno",
"cat" : "parks",
"Q1total" : 35
},
{
"city" : "Yuma",
"cat" : "roads",
"Q4total" : 20
}
Possible?

The first question is: to what end? Your documents already have a consistent object structure, which is the recommended approach. Objects with varying keys are not a great idea: data is "data" and should not really be used as key names.
With that in mind, the aggregation framework follows the same principle: at the time of writing it does not allow arbitrary key names to be generated from data contained in the document (later releases added $arrayToObject for that purpose). But you could get a similar result with the output as data points:
db.junk.aggregate([
// Aggregate first to reduce the pipeline documents somewhat
{ "$group": {
"_id": {
"city": "$city",
"cat": "$cat"
},
"Q1": { "$sum": "$Q1" },
"Q2": { "$sum": "$Q2" },
"Q3": { "$sum": "$Q3" },
"Q4": { "$sum": "$Q4" }
}},
// Convert the "quarter" elements to array entries with the same keys
{ "$project": {
"totals": {
"$map": {
"input": { "$literal": [ "Q1", "Q2", "Q3", "Q4" ] },
"as": "el",
"in": { "$cond": [
{ "$eq": [ "$$el", "Q1" ] },
{ "quarter": "$$el", "total": "$Q1" },
{ "$cond": [
{ "$eq": [ "$$el", "Q2" ] },
{ "quarter": "$$el", "total": "$Q2" },
{ "$cond": [
{ "$eq": [ "$$el", "Q3" ] },
{ "quarter": "$$el", "total": "$Q3" },
{ "quarter": "$$el", "total": "$Q4" }
]}
]}
]}
}
}
}},
// Unwind the array produced
{ "$unwind": "$totals" },
// Filter any "0" results
{ "$match": { "totals.total": { "$ne": 0 } } },
// Maybe project a prettier "flatter" output
{ "$project": {
"_id": 0,
"city": "$_id.city",
"cat": "$_id.cat",
"quarter": "$totals.quarter",
"total": "$totals.total"
}}
])
Which gives you results like this:
{ "city" : "Reno", "cat" : "parks", "quarter" : "Q1", "total" : 35 }
{ "city" : "Yuma", "cat" : "parks", "quarter" : "Q3", "total" : 45 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q1", "total" : 30 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q4", "total" : 60 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q2", "total" : 40 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q4", "total" : 20 }
You could alternatively use mapReduce, which allows "some" flexibility with key names. The catch, though, is that your aggregation is still by "quarter", so you need that as part of the primary key, which cannot be changed once emitted.
Additionally, you cannot "filter" any aggregated results of "0" without a second pass after outputting to a collection, so it is not really of much use for what you want to do, unless you can live with a second mapReduce operation or a "transform" query on the output collection.
Worth noting: if you look at what the "second" pipeline stage does here with $project and $map, you will see that the document structure is essentially being altered into something you could have used to structure your documents originally, like this:
{
"city" : "Reno",
"cat" : "parks",
"totals" : [
{ "quarter" : "Q1", "total" : 35 },
{ "quarter" : "Q2", "total" : 0 },
{ "quarter" : "Q3", "total" : 0 },
{ "quarter" : "Q4", "total" : 0 }
]
},
{
"city" : "Yuma",
"cat" : "parks",
"totals" : [
{ "quarter" : "Q1", "total" : 0 },
{ "quarter" : "Q2", "total" : 0 },
{ "quarter" : "Q3", "total" : 45 },
{ "quarter" : "Q4", "total" : 0 }
]
}
With documents in that shape, the aggregation that produces the same results as shown above becomes simple:
db.collection.aggregate([
{ "$unwind": "$totals" },
{ "$group": {
"_id": {
"city": "$city",
"cat": "$cat",
"quarter": "$totals.quarter"
},
"ttotal": { "$sum": "$totals.total" }
}},
{ "$match": { "ttotal": { "$ne": 0 } } },
{ "$project": {
"_id": 0,
"city": "$_id.city",
"cat": "$_id.cat",
"quarter": "$_id.quarter",
"total": "$ttotal"
}}
])
So it might make more sense to consider structuring your documents in that way to begin with and avoid any overhead required by the document transformation.
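As a rough illustration of that restructuring, the sketch below reshapes one flat document into the nested "totals" form in plain JavaScript. The helper name toTotalsShape and the fixed quarter list are assumptions made for this example, and the original _id is deliberately dropped.

```javascript
// Hypothetical helper (not from the original answer): reshape a flat
// { Q1..Q4 } document into the { totals: [ { quarter, total } ] } form.
const QUARTERS = ["Q1", "Q2", "Q3", "Q4"];

function toTotalsShape(doc) {
  return {
    city: doc.city,
    cat: doc.cat,
    // Assumption: a missing quarter field is treated as 0.
    totals: QUARTERS.map((q) => ({ quarter: q, total: doc[q] ?? 0 })),
  };
}

const reshaped = toTotalsShape({
  _id: 4, city: "Reno", cat: "parks", Q1: 35, Q2: 0, Q3: 0, Q4: 0,
});
console.log(JSON.stringify(reshaped));
```

A one-off migration could apply a function like this to every document before writing it back, but how you do that depends on your driver and collection size.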
I think you'll find that consistent key names make a far better object model to program against, where you read the data point from the key's value and not from the key's name. If you really need the other form, it is a simple matter of reading the data from each already aggregated result and transforming its keys in post-processing.
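A minimal sketch of that post-processing step, in plain JavaScript on the client side, might look like the following. The function name toKeyedTotal is hypothetical; the input rows have the shape of the aggregation results shown earlier.

```javascript
// Hypothetical post-processing step: turn { quarter, total } data points
// into the "Q2total"-style keys the question asked for.
function toKeyedTotal(row) {
  const { quarter, total, ...rest } = row;
  // Build the key name from the data, e.g. "Q2" becomes "Q2total".
  return { ...rest, [`${quarter}total`]: total };
}

const rows = [
  { city: "Yuma", cat: "roads", quarter: "Q2", total: 40 },
  { city: "Reno", cat: "roads", quarter: "Q1", total: 30 },
];
const keyed = rows.map(toKeyedTotal);
console.log(keyed);
```

The aggregation keeps its consistent structure, and only the final presentation layer deals with data-dependent key names.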

Related

match element in the array with aggregation

I have a MongoDB collection with the following structure:
{
"_id" : ObjectId("63e37afe7a3453d5014c011b"),
"schemaVersion" : NumberInt(1),
"Id" : ObjectId("63e37afe7a3453d5014c0112"),
"Id1" : ObjectId("63e37afe7a3453d5014c0113"),
"Id2" : ObjectId("63e37afe7a3453d5014c0114"),
"collectionName" : "Country",
"List" : [
{
"countryId" : NumberInt(1),
"name" : "Afghanistan"
},
{
"countryId" : NumberInt(1),
"name" : "India"
},
{
"countryId" : NumberInt(1),
"name" : "USA"
}
]
}
I need to match on Id, Id1, Id2, collectionName, and the name inside List in order to get the countryId. For example, if I match the values below:
"Id" : ObjectId("63e37afe7a3453d5014c0112"),
"Id1" : ObjectId("63e37afe7a3453d5014c0113"),
"Id2" : ObjectId("63e37afe7a3453d5014c0114"),
"collectionName" : "Country",
"name" : "Afghanistan"
I need this result:
{
"countryId" : 1,
"name" : "Afghanistan",
}
I tried the following:
db.country_admin.aggregate([
{ $match: { collectionName: "Country" } },
{ $unwind : '$List' },
{ $project : { _id : 0, 'List.name' : 1, 'List.countryId' : 1 } }
]).pretty()
and I get the following output:
[
{
"List" : {
"countryId" : 1.0,
"name" : "Afghanistan"
}
},
{
"List" : {
"countryId" : 2.0,
"name" : "india"
}
},
{
"List" : {
"countryId" : 3.0,
"name" : "USA"
}
}]
You can use $filter to avoid $unwind, as in this example:
First, $match on your desired condition(s).
Then $filter the array and take the first element (because "List.name": "Afghanistan" is part of the $match stage, there will be at least one result).
Finally, output only the values you want using $project.
db.collection.aggregate([
{
"$match": {
"Id": ObjectId("63e37afe7a3453d5014c0112"),
"Id1": ObjectId("63e37afe7a3453d5014c0113"),
"Id2": ObjectId("63e37afe7a3453d5014c0114"),
"collectionName": "Country",
"List.name": "Afghanistan",
}
},
{
"$project": {
"country": {
"$arrayElemAt": [
{
"$filter": {
"input": "$List",
"cond": {
"$eq": [
"$$this.name",
"Afghanistan"
]
}
}
},
0
]
}
}
},
{
"$project": {
"_id": 0,
"countryId": "$country.countryId",
"name": "$country.name"
}
}
])
By the way, using $unwind instead of $filter is also possible.
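To make the $filter plus $arrayElemAt step easier to picture, here is a plain-JavaScript analogue that runs on a single already-matched document. It is only a sketch: firstByName is a hypothetical helper, and the countryId values are taken from the output shown in the question.

```javascript
// Plain-JavaScript analogue of $filter + $arrayElemAt on one matched document.
const doc = {
  collectionName: "Country",
  List: [
    { countryId: 1, name: "Afghanistan" },
    { countryId: 2, name: "India" },
    { countryId: 3, name: "USA" },
  ],
};

function firstByName(list, name) {
  // Like $filter: keep only the elements whose name matches.
  const matches = list.filter((el) => el.name === name);
  // Like $arrayElemAt with index 0; undefined when nothing matched.
  return matches[0];
}

const country = firstByName(doc.List, "Afghanistan");
console.log(country);
```

In the pipeline itself the "nothing matched" case does not arise, because the $match stage already requires an element with that name.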

Mongodb how to reduce the array within the matching key and calculate avg

{
"_id" : {
"state" : "NY",
"st" : "value"
},
"List" : [
{
"id" : "21",
"score" : 18.75,
"name" : "PU"
},
{
"id" : "21",
"score" : 25.0,
"name" : "PU"
},
{
"id" : "23",
"score" : 25.0,
"name" : "CL"
},
{
"id" : "23",
"score" : 56.25,
"name" : "CL"
}
]
}
Desired result:
Match the key with id within the array and calculate avg of score.
{
"_id" : {
"state" : "New York",
"st" : "value"
},
"List" : [
{
"id" : "21",
"score" : 21.875,
"name" : "PU"
},
{
"id" : "23",
"score" : 40.625,
"name" : "CL"
}
]
}
Thank you in advance.
Query (returns the expected result):
unwind List
group by the document key together with the id and name, and compute the average
reshape each group so that it resembles the document you want
group again to restore the original document structure (reversing the unwind)
If two entries with the same id have different names (if that can happen), the query makes them separate members of the array.
(Alternatively, it could merge them into one member and pack the names into an array, but that would produce a different schema from the one you expect.)
db.collection.aggregate([
{
"$unwind": {
"path": "$List"
}
},
{
"$group": {
"_id": {
"state": "$_id.state",
"st": "$_id.st",
"id": "$List.id",
"name": "$List.name"
},
"avg": {
"$avg": "$List.score"
}
}
},
{
"$project": {
"_id": {
"state": "$_id.state",
"st": "$_id.st"
},
"List": {
"name": "$_id.name",
"id": "$_id.id",
"score": "$avg"
}
}
},
{
"$group": {
"_id": "$_id",
"List": {
"$push": "$List"
}
}
}
])
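The unwind, group, and regroup steps can also be pictured as ordinary array processing. The following plain-JavaScript sketch averages the scores per (id, name) pair for one document's List; averageScoresById is a hypothetical helper, and the averaged value is stored under score as in the desired result.

```javascript
// Sketch: average scores per (id, name) pair within one document's List.
function averageScoresById(list) {
  const groups = new Map();
  for (const { id, name, score } of list) {
    // Group on both id and name, mirroring the $group key in the pipeline.
    const key = JSON.stringify([id, name]);
    const g = groups.get(key) ?? { id, name, sum: 0, count: 0 };
    g.sum += score;
    g.count += 1;
    groups.set(key, g);
  }
  return [...groups.values()].map(({ id, name, sum, count }) => ({
    id,
    score: sum / count,
    name,
  }));
}

const averaged = averageScoresById([
  { id: "21", score: 18.75, name: "PU" },
  { id: "21", score: 25.0, name: "PU" },
  { id: "23", score: 25.0, name: "CL" },
  { id: "23", score: 56.25, name: "CL" },
]);
console.log(averaged);
```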

How do I create nested aggregations with count on MongoDB?

I am learning MongoDB to see whether it fits our needs.
We currently rely heavily on aggregations, so I am testing the flexibility of the Aggregation Framework.
I started with this hierarchy:
db.companytest3.insert({"name":"A", age:7})
db.companytest3.insert({"name":"B", age:17, owner:"A"})
db.companytest3.insert({"name":"C", age:12, owner:"A"})
db.companytest3.insert({"name":"D", age:7, owner:"B"})
db.companytest3.insert({"name":"E", age:13, owner:"B"})
db.companytest3.insert({"name":"F", age:23, owner:"C"})
So I have:
db.companytest3.find()
{ "_id" : ObjectId("5457c2c0fa82c305e0b80006"), "name" : "A", "age" : 7 }
{ "_id" : ObjectId("5457c2cafa82c305e0b80007"), "name" : "A", "age" : 7 }
{ "_id" : ObjectId("5457c2d0fa82c305e0b80008"), "name" : "B", "age" : 17, "owner" : "A" }
{ "_id" : ObjectId("5457c2d6fa82c305e0b80009"), "name" : "C", "age" : 12, "owner" : "A" }
{ "_id" : ObjectId("5457c2ddfa82c305e0b8000a"), "name" : "D", "age" : 7, "owner" : "B" }
{ "_id" : ObjectId("5457c2e4fa82c305e0b8000b"), "name" : "E", "age" : 13, "owner" : "B" }
{ "_id" : ObjectId("5457c2eafa82c305e0b8000c"), "name" : "F", "age" : 23, "owner" : "C" }
My goal is to aggregate the children using their ages, so I have something like this:
{
"_id" : null,
"children" : [
{
"range" : "lower than 10",
total: 1,
names: ["A"]
},
{
"range" : "higher than 10",
total: 0,
names: []
}
],
"total" : 1
}
{
"_id" : "A",
"children" : [
{
"range" : "lower than 10",
total: 0,
names: []
},
{
"range" : "higher than 10",
total: 2,
names: ["C","B"]
}
],
"total" : 2
}
{
"_id" : "B",
"children" : [
{
"range" : "lower than 10",
total: 1,
names: ["D"]
},
{
"range" : "higher than 10",
total: 1,
names: ["E"]
}
],
"total" : 2
}
{
"_id" : "C",
"children" : [
{
"range" : "lower than 10",
total: 0,
names: []
},
{
"range" : "higher than 10",
total: 1,
names: ["F"]
}
],
"total" : 1
}
I feel I am getting near, I've got this query:
db.companytest3.aggregate(
{ $project: {
"_id": 0,
"range": {
$concat: [{
$cond: [ { $lte: ["$age", 10] }, "até 10", "" ]
}, {
$cond: [ { $gte: ["$age", 11] }, "mais de 10", "" ]
}]
},
"owner": "$owner",
"name" : "$name"
}
},
{
$group: {
_id: { owner: "$owner", range: "$range" },
children: { $addToSet: { name: "$name", range: "$range"} } ,
total: { $sum: 1}
}
},
{
$group: {
_id: { owner:"$_id.owner" },
children: { $addToSet: "$children" }
}
}
)
which gives me the following output:
{ "_id" : { "owner" : null }, "children" : [ [ { "name" : "A", "range" : "até 10" } ] ] }
{ "_id" : { "owner" : "A" }, "children" : [ [ { "name" : "C", "range" : "mais de 10" }, { "name" : "B", "range" : "mais de 10" } ] ] }
{ "_id" : { "owner" : "B" }, "children" : [ [ { "name" : "D", "range" : "até 10" } ], [ { "name" : "E", "range" : "mais de 10" } ] ] }
{ "_id" : { "owner" : "C" }, "children" : [ [ { "name" : "F", "range" : "mais de 10" } ] ] }
Now I am having trouble grouping the items by owner while keeping the summed total. I am stuck and do not know how to proceed. I have tried many different alternatives using variations of $group, but I do not think they are worth posting here.
How can I change my current query so that it groups the children by range and adds the count?
thanks! :D
This should be possible in earlier versions, but given how you want to shape the result, the simplest way I can see uses some operators introduced in MongoDB 2.6:
db.companytest3.aggregate([
{ "$group": {
"_id": "$owner",
"lowerThanTenNames": {
"$addToSet": {
"$cond": [
{ "$lte": [ "$age", 10 ] },
"$name",
false
]
}
},
"lowerThanTenTotal": {
"$sum": {
"$cond": [
{ "$lte": [ "$age", 10 ] },
1,
0
]
}
},
"moreThanTenNames": {
"$addToSet": {
"$cond": [
{ "$gte": [ "$age", 11 ] },
"$name",
false
]
}
},
"moreThanTenTotal": {
"$sum": {
"$cond": [
{ "$gte": [ "$age", 11 ] },
1,
0
]
}
}
}},
{ "$project": {
"children": {
"$map": {
"input": { "$literal": ["L", "M"] },
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "L" ] },
{
"range": { "$literal": "lower than 10" },
"total": "$lowerThanTenTotal",
"names": {
"$setDifference": [
"$lowerThanTenNames",
[false]
]
}
},
{
"range": { "$literal": "higher than 10" },
"total": "$moreThanTenTotal",
"names": {
"$setDifference": [
"$moreThanTenNames",
[false]
]
}
}
]
}
}
},
"total": { "$add": [ "$lowerThanTenTotal", "$moreThanTenTotal" ]},
}},
{ "$sort": { "_id": 1 } }
])
Basically you want to separate these out into two sets of results for each grouping, being one for each age range. Due to the use of conditional operators, the "names" sets then need to be filtered for any false values where the conditions did not match.
The other thing that needs to be done is to coerce these results from separate fields into an array. The $map operator makes this simple by just providing a two element template with effectively "A/B" choices to do the re-mapping.
Since we had discrete fields here before they were re-mapped onto an array, you can just supply each "total" field as an argument to $add in order to get the combined total.
Produces exactly this:
{
"_id" : null,
"children" : [
{
"range" : "lower than 10",
"total" : 1,
"names" : ["A"]
},
{
"range" : "higher than 10",
"total" : 0,
"names" : [ ]
}
],
"total" : 1
}
{
"_id" : "A",
"children" : [
{
"range" : "lower than 10",
"total" : 0,
"names" : [ ]
},
{
"range" : "higher than 10",
"total" : 2,
"names" : ["C","B"]
}
],
"total" : 2
}
{
"_id" : "B",
"children" : [
{
"range" : "lower than 10",
"total" : 1,
"names" : ["D"]
},
{
"range" : "higher than 10",
"total" : 1,
"names" : ["E"]
}
],
"total" : 2
}
{
"_id" : "C",
"children" : [
{
"range" : "lower than 10",
"total" : 0,
"names" : [ ]
},
{
"range" : "higher than 10",
"total" : 1,
"names" : ["F"]
}
],
"total" : 1
}
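The placeholder-and-filter idea used in this answer (emit the name or false, then strip false again) can be sketched in plain JavaScript as follows. This is an illustration only: childrenByRange is a hypothetical helper, and the input is the six documents from the insert statements in the question.

```javascript
// Input: the six documents from the insert statements in the question.
const companies = [
  { name: "A", age: 7 },
  { name: "B", age: 17, owner: "A" },
  { name: "C", age: 12, owner: "A" },
  { name: "D", age: 7, owner: "B" },
  { name: "E", age: 13, owner: "B" },
  { name: "F", age: 23, owner: "C" },
];

function childrenByRange(docs) {
  const groups = new Map();
  for (const { name, age, owner = null } of docs) {
    const g = groups.get(owner) ??
      { lower: new Set(), higher: new Set(), lowerTotal: 0, higherTotal: 0 };
    // Like $cond inside $addToSet: add the name or a `false` placeholder.
    g.lower.add(age <= 10 ? name : false);
    g.higher.add(age >= 11 ? name : false);
    // Like $cond inside $sum: count 1 or 0.
    g.lowerTotal += age <= 10 ? 1 : 0;
    g.higherTotal += age >= 11 ? 1 : 0;
    groups.set(owner, g);
  }
  // Like $setDifference with [false]: drop the placeholder again.
  const strip = (set) => [...set].filter((v) => v !== false);
  return [...groups].map(([owner, g]) => ({
    _id: owner,
    children: [
      { range: "lower than 10", total: g.lowerTotal, names: strip(g.lower) },
      { range: "higher than 10", total: g.higherTotal, names: strip(g.higher) },
    ],
    total: g.lowerTotal + g.higherTotal,
  }));
}

const byOwner = childrenByRange(companies);
console.log(JSON.stringify(byOwner, null, 2));
```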

Aggregating and comparing scores of teams using mongodb

I'm just starting out with MongoDB and have been reading through the documentation on aggregation, but I'm still struggling to map my knowledge of SQL statements onto the methods MongoDB uses.
I have this data:
{
"_id" : ObjectId("53ac7bce4eaf6de4d5601c19"),
"uid" : ObjectId("53ac7bb84eaf6de4d5601c15"),
"mid" : ObjectId("53ab27504eaf6de4d5601be4"),
"score" : 1
},{
"_id" : ObjectId("53ac7bce4eaf6de4d5601c1a"),
"uid" : ObjectId("53ac7bb84eaf6de4d5601c16"),
"mid" : ObjectId("53ab27504eaf6de4d5601be4"),
"score" : 5
}
...
And I'm trying to get to this result:
{
"uid" : ObjectId("53ac7bb84eaf6de4d5601c15"),
"uid_2" : ObjectId("53ac7bb84eaf6de4d5601c16"),
"mid" : ObjectId("53ab27504eaf6de4d5601be4"),
"score" : 1,
"score_2" : 5,
"difference" : 4
}
...
Here I am comparing every uid against every other uid for a single mid and calculating the difference in their scores (the difference must be positive, never negative).
Most of the examples I'm running into don't quite fit my requirements, and I'm hoping some MongoDB guru can help me out. Thanks!
As stated, I think your data modelling is a little off here as you need something to "pair" the "matches" as it were. I have a "simplified" case here:
{
"_id" : ObjectId("53ae9da2e24682cac4215e0c"),
"match" : ObjectId("53ae9d78e24682cac4215e0b"),
"score" : 1
}
{
"_id" : ObjectId("53ae9da5e24682cac4215e0d"),
"match" : ObjectId("53ae9d78e24682cac4215e0b"),
"score" : 5
}
{
"_id" : ObjectId("53aea6cde24682cac4215e15"),
"match" : ObjectId("53aea6c1e24682cac4215e14"),
"score" : 2
}
{
"_id" : ObjectId("53aea6e4e24682cac4215e16"),
"match" : ObjectId("53aea6c1e24682cac4215e14"),
"score" : 1
}
{
"_id" : ObjectId("53aea6eae24682cac4215e18"),
"match" : ObjectId("53aea6e6e24682cac4215e17"),
"score" : 2
}
{
"_id" : ObjectId("53aea6ece24682cac4215e19"),
"match" : ObjectId("53aea6e6e24682cac4215e17"),
"score" : 2
}
What that basically represents is the scores for "six" teams in "three" distinct matches.
Given that, my take on getting to results would be this:
db.matches.aggregate([
// Group on matches and find the "min" and "max" score
{ "$group": {
"_id": "$match",
"teams": {
"$push": {
"_id": "$_id",
"score": "$score"
}
},
"minScore": { "$min": "$score" },
"maxScore": { "$max": "$score" }
}},
// Unwind the "teams" array created
{ "$unwind": "$teams" },
// Compare scores for "win", "loss" or "draw"
{ "$group": {
"_id": "$_id",
"win": {
"$min": { "$cond": [
{ "$and": [
{ "$eq": [ "$teams.score", "$maxScore" ] },
{ "$gt": [ "$teams.score", "$minScore" ] }
]},
"$teams",
false
]}
},
"loss": {
"$min": { "$cond": [
{ "$and": [
{ "$eq": [ "$teams.score", "$minScore" ] },
{ "$lt": [ "$teams.score", "$maxScore" ] }
]},
"$teams",
false
]}
},
"draw": {
"$push": { "$cond": [
{ "$eq": [ "$minScore", "$maxScore" ] },
"$teams",
false
]}
},
"difference": {
"$max": { "$subtract": [ "$maxScore", "$minScore" ] }
}
}},
// Just fix up those "draw" results with a [false,false] array
{ "$project": {
"win": 1,
"loss": 1,
"draw": { "$cond": [
{ "$gt": [
{ "$size": { "$setDifference": [ "$draw", [false] ] } },
0
]},
"$draw",
false
]},
"difference": 1
}}
])
And this gives you a quite nice result:
{
"_id" : ObjectId("53ae9d78e24682cac4215e0b"),
"win" : {
"_id" : ObjectId("53ae9da5e24682cac4215e0d"),
"score" : 5
},
"loss" : {
"_id" : ObjectId("53ae9da2e24682cac4215e0c"),
"score" : 1
},
"draw" : false,
"difference" : 4
}
{
"_id" : ObjectId("53aea6c1e24682cac4215e14"),
"win" : {
"_id" : ObjectId("53aea6cde24682cac4215e15"),
"score" : 2
},
"loss" : {
"_id" : ObjectId("53aea6e4e24682cac4215e16"),
"score" : 1
},
"draw" : false,
"difference" : 1
}
{
"_id" : ObjectId("53aea6e6e24682cac4215e17"),
"win" : false,
"loss" : false,
"draw" : [
{
"_id" : ObjectId("53aea6eae24682cac4215e18"),
"score" : 2
},
{
"_id" : ObjectId("53aea6ece24682cac4215e19"),
"score" : 2
}
],
"difference" : 0
}
That is essentially the result per "match": it determines the "difference" between winner and loser while identifying which team "won" or "lost". The final stage uses some operators only introduced in MongoDB 2.6, but that stage is not strictly necessary if you do not have that version available. Alternatively, you could still do the same thing by using $unwind and some other processing.
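For comparison, the same win/loss/draw decision can be written as plain JavaScript over an in-memory array. This is a sketch under assumptions: summarizeMatches is a hypothetical helper, the short string ids stand in for the ObjectId values, and each match is assumed to have exactly two teams.

```javascript
// Sketch: per-match win/loss/draw summary, mirroring the pipeline's logic.
function summarizeMatches(docs) {
  const byMatch = new Map();
  for (const { _id, match, score } of docs) {
    if (!byMatch.has(match)) byMatch.set(match, []);
    byMatch.get(match).push({ _id, score });
  }
  return [...byMatch].map(([match, teams]) => {
    const scores = teams.map((t) => t.score);
    const min = Math.min(...scores);
    const max = Math.max(...scores);
    const isDraw = min === max;
    return {
      _id: match,
      // A winner or loser only exists when the scores differ.
      win: isDraw ? false : teams.find((t) => t.score === max),
      loss: isDraw ? false : teams.find((t) => t.score === min),
      draw: isDraw ? teams : false,
      difference: max - min,
    };
  });
}

// Placeholder ids; the real documents use ObjectId values.
const results = summarizeMatches([
  { _id: "t1", match: "m1", score: 1 },
  { _id: "t2", match: "m1", score: 5 },
  { _id: "t3", match: "m2", score: 2 },
  { _id: "t4", match: "m2", score: 2 },
]);
console.log(JSON.stringify(results, null, 2));
```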

mongodb multiple aggregations in single operation

I have an item collection with following documents.
{ "item" : "i1", "category" : "c1", "brand" : "b1" }
{ "item" : "i2", "category" : "c2", "brand" : "b1" }
{ "item" : "i3", "category" : "c1", "brand" : "b2" }
{ "item" : "i4", "category" : "c2", "brand" : "b1" }
{ "item" : "i5", "category" : "c1", "brand" : "b2" }
I want separate aggregation results: a count by category and a count by brand. Please note, it is not a count by (category, brand).
I am able to do this with map-reduce using the following code.
map = function(){
emit({type:"category",category:this.category},1);
emit({type:"brand",brand:this.brand},1);
}
reduce = function(key, values){
return Array.sum(values)
}
db.item.mapReduce(map,reduce,{out:{inline:1}})
And the result is
{
"results" : [
{
"_id" : {
"type" : "brand",
"brand" : "b1"
},
"value" : 3
},
{
"_id" : {
"type" : "brand",
"brand" : "b2"
},
"value" : 2
},
{
"_id" : {
"type" : "category",
"category" : "c1"
},
"value" : 3
},
{
"_id" : {
"type" : "category",
"category" : "c2"
},
"value" : 2
}
],
"timeMillis" : 21,
"counts" : {
"input" : 5,
"emit" : 10,
"reduce" : 4,
"output" : 4
},
"ok" : 1,
}
I can get the same results by running two different aggregation commands, as below.
db.item.aggregate({$group:{_id:"$category",count:{$sum:1}}})
db.item.aggregate({$group:{_id:"$brand",count:{$sum:1}}})
Is there any way I can do the same with the aggregation framework in a single aggregation command?
I have simplified my case here; in reality I need this grouping on fields in an array of subdocuments. Assume the above is the structure after I do an unwind.
It is a real-time query (someone is waiting for the response), though on a smaller dataset, so execution time is important.
I am using MongoDB 2.4.
Starting in MongoDB 3.4, the $facet aggregation stage greatly simplifies this type of use case by processing multiple aggregation pipelines within a single stage on the same set of input documents:
// { "item" : "i1", "category" : "c1", "brand" : "b1" }
// { "item" : "i2", "category" : "c2", "brand" : "b1" }
// { "item" : "i3", "category" : "c1", "brand" : "b2" }
// { "item" : "i4", "category" : "c2", "brand" : "b1" }
// { "item" : "i5", "category" : "c1", "brand" : "b2" }
db.collection.aggregate([
{ $facet: {
categories: [{ $group: { _id: "$category", count: { "$sum": 1 } } }],
brands: [{ $group: { _id: "$brand", count: { "$sum": 1 } } }]
}}
])
// {
// "categories" : [
// { "_id" : "c1", "count" : 3 },
// { "_id" : "c2", "count" : 2 }
// ],
// "brands" : [
// { "_id" : "b1", "count" : 3 },
// { "_id" : "b2", "count" : 2 }
// ]
// }
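To make the shape of that result concrete, the following plain-JavaScript sketch computes the same two independent counts over an in-memory copy of the sample documents. It does not use any MongoDB API; countBy is a hypothetical helper introduced only for this illustration.

```javascript
// Sketch: two independent counts over the same input, like two $facet branches.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) ?? 0) + 1);
  }
  // Same { _id, count } shape that each $group sub-pipeline emits.
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const items = [
  { item: "i1", category: "c1", brand: "b1" },
  { item: "i2", category: "c2", brand: "b1" },
  { item: "i3", category: "c1", brand: "b2" },
  { item: "i4", category: "c2", brand: "b1" },
  { item: "i5", category: "c1", brand: "b2" },
];

const facets = {
  categories: countBy(items, "category"),
  brands: countBy(items, "brand"),
};
console.log(JSON.stringify(facets));
```

Each branch sees the full input and knows nothing about the other, which is why the counts are per field rather than per (category, brand) pair.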
Over a large data set I would say that your current mapReduce approach is the best one, because the aggregation technique for this does not work well with large data. Over a reasonably small data set, though, it might be just what you need:
db.item.aggregate([
{ "$group": {
"_id": null,
"categories": { "$push": "$category" },
"brands": { "$push": "$brand" }
}},
{ "$project": {
"_id": {
"categories": "$categories",
"brands": "$brands"
},
"categories": 1
}},
{ "$unwind": "$categories" },
{ "$group": {
"_id": {
"brands": "$_id.brands",
"category": "$categories"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.brands",
"categories": { "$push": {
"category": "$_id.category",
"count": "$count"
}},
}},
{ "$project": {
"_id": "$categories",
"brands": "$_id"
}},
{ "$unwind": "$brands" },
{ "$group": {
"_id": {
"categories": "$_id",
"brand": "$brands"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"categories": { "$first": "$_id.categories" },
"brands": { "$push": {
"brand": "$_id.brand",
"count": "$count"
}}
}}
])
This is not quite the same as the mapReduce output, and you could add more stages to change the output format, but it should be usable:
{
"_id" : null,
"categories" : [
{
"category" : "c2",
"count" : 2
},
{
"category" : "c1",
"count" : 3
}
],
"brands" : [
{
"brand" : "b2",
"count" : 2
},
{
"brand" : "b1",
"count" : 3
}
]
}
As you can see, this involves a fair bit of shuffling between arrays in order to group each set of either "category" or "brand" within the same pipeline process. Again, this will not do well for large data, but for something like "items in an order" it would probably do nicely.
Of course, as you say, you have simplified somewhat, so the first grouping key of null will either be something else, or the input will be narrowed down to that null case by an earlier $match stage, which is probably what you want to do.