Combining aggregate operations in a single result - mongodb

I have two aggregate operations that I'd like to combine. The first operation returns, for example:
{ "_id" : "Colors", "count" : 12 }
{ "_id" : "Animals", "count" : 6 }
and the second operation returns, for example:
{ "_id" : "Red", "count" : 10 }
{ "_id" : "Blue", "count" : 9 }
{ "_id" : "Green", "count" : 9 }
{ "_id" : "White", "count" : 7 }
{ "_id" : "Yellow", "count" : 7 }
{ "_id" : "Orange", "count" : 7 }
{ "_id" : "Black", "count" : 5 }
{ "_id" : "Goose", "count" : 4 }
{ "_id" : "Chicken", "count" : 3 }
{ "_id" : "Grey", "count" : 3 }
{ "_id" : "Cat", "count" : 3 }
{ "_id" : "Rabbit", "count" : 3 }
{ "_id" : "Duck", "count" : 3 }
{ "_id" : "Turkey", "count" : 2 }
{ "_id" : "Elephant", "count" : 2 }
{ "_id" : "Shark", "count" : 2 }
{ "_id" : "Fish", "count" : 2 }
{ "_id" : "Tiger", "count" : 2 }
{ "_id" : "Purple", "count" : 1 }
{ "_id" : "Pink", "count" : 1 }
How do I combine the 2 operations to achieve the following?
{ "_id" : "Colors", "count" : 12, "items" :
[
{ "_id" : "Red", "count" : 10 },
{ "_id" : "Blue", "count" : 9 },
{ "_id" : "Green", "count" : 9 },
{ "_id" : "White", "count" : 7 },
{ "_id" : "Yellow", "count" : 7 },
{ "_id" : "Orange", "count" : 7 },
{ "_id" : "Black", "count" : 5 },
{ "_id" : "Grey", "count" : 3 },
{ "_id" : "Purple", "count" : 1 },
{ "_id" : "Pink", "count" : 1 }
]
},
{ "_id" : "Animals", "count" : 6, "items" :
[
{ "_id" : "Goose", "count" : 4 },
{ "_id" : "Chicken", "count" : 3 },
{ "_id" : "Cat", "count" : 3 },
{ "_id" : "Rabbit", "count" : 3 },
{ "_id" : "Duck", "count" : 3 },
{ "_id" : "Turkey", "count" : 2 },
{ "_id" : "Elephant", "count" : 2 },
{ "_id" : "Shark", "count" : 2 },
{ "_id" : "Fish", "count" : 2 },
{ "_id" : "Tiger", "count" : 2 }
]
}
Schema
var ListSchema = new Schema({
created: {
type: Date,
default: Date.now
},
title: {
type: String,
default: '',
trim: true,
required: 'Title cannot be blank'
},
items: {
type: Array,
default: [String],
trim: true
},
creator: {
type: Schema.ObjectId,
ref: 'User'
}
});
Operation 1
db.lists.aggregate(
[
{ $group: { _id: "$title", count: { $sum: 1 } } },
{ $sort: { count: -1 } }
]
)
Operation 2
db.lists.aggregate(
[
{ $unwind: "$items" },
{ $group: { _id: "$items", count: { $sum: 1 } } },
{ $sort: { count: -1 } }
]
)

This really depends on the kind of results you are after in a response. The things you are asking about seem to indicate that you are looking for "facet counts" in a result, but I'll touch on that a bit later.
For a basic result, there is nothing wrong with this as an approach:
Thing.aggregate(
[
{ "$group": {
"_id": {
"type": "$type", "name": "$name"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.type",
"count": { "$sum": "$count" },
"names": {
"$push": { "name": "$_id.name", "count": "$count" }
}
}}
],
function(err,results) {
console.log(JSON.stringify(results, undefined, 2));
callback(err);
}
)
Which should give you a result like this:
[
{
"_id": "colours",
"count": 50102,
"names": [
{ "name": "Green", "count": 9906 },
{ "name": "Yellow", "count": 10093 },
{ "name": "Red", "count": 10083 },
{ "name": "Orange", "count": 9997 },
{ "name": "Blue", "count": 10023 }
]
},
{
"_id": "animals",
"count": 49898,
"names": [
{ "name": "Tiger", "count": 9710 },
{ "name": "Lion", "count": 10058 },
{ "name": "Elephant", "count": 10069 },
{ "name": "Monkey", "count": 9963 },
{ "name": "Bear", "count": 10098 }
]
}
]
The basic approach here is simply to $group in two stages. The first stage aggregates on the combination of keys down to the lowest (most granular) grouping level. The second $group then adds up the totals at the highest (least granular) grouping level, while pushing the lower-level results into an array of items.
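To make the two stages concrete, here is a plain JavaScript emulation of what they compute over an in-memory array. It is only a sketch: `twoStageGroup` and the `sample` data are made up for illustration, and no database is involved.

```javascript
// Plain-JavaScript emulation of the two $group stages (no database involved).
// `docs` stands in for the collection; each document has `type` and `name`.
function twoStageGroup(docs) {
  // Stage 1: count per (type, name) pair, the most granular level.
  const pairCounts = new Map();
  for (const { type, name } of docs) {
    const key = JSON.stringify([type, name]);
    pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
  }

  // Stage 2: roll the pair counts up to one entry per type,
  // summing the counts and collecting the per-name rows in an array.
  const byType = new Map();
  for (const [key, count] of pairCounts) {
    const [type, name] = JSON.parse(key);
    if (!byType.has(type)) {
      byType.set(type, { _id: type, count: 0, names: [] });
    }
    const group = byType.get(type);
    group.count += count;
    group.names.push({ name, count });
  }
  return [...byType.values()];
}

// Hypothetical sample data.
const sample = [
  { type: "colours", name: "Red" },
  { type: "colours", name: "Red" },
  { type: "colours", name: "Blue" },
  { type: "animals", name: "Lion" }
];
console.log(JSON.stringify(twoStageGroup(sample), null, 2));
```

The emulation keeps insertion order, whereas MongoDB makes no promise about the order of $group output, so add a $sort stage if the order matters.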
But this is not "separated" as it would be in "facet counts", so to do this becomes a little more complex, as well as a little more insane. But first the example:
Thing.aggregate(
[
{ "$group": {
"_id": {
"type": "$type",
"name": "$name"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.type",
"count": { "$sum": "$count" },
"names": {
"$push": { "name": "$_id.name", "count": "$count" }
}
}},
{ "$group": {
"_id": null,
"types": {
"$push": {
"type": "$_id", "count": "$count"
}
},
"names": { "$push": "$names" }
}},
{ "$unwind": "$names" },
{ "$unwind": "$names" },
{ "$group": {
"_id": "$types",
"names": { "$push": "$names" }
}},
{ "$project": {
"_id": 0,
"facets": {
"types": "$_id",
"names": "$names",
},
"data": { "$literal": [] }
}}
],
function(err,results) {
console.log(JSON.stringify(results[0], undefined, 2));
callback(err);
}
);
Which will produce output like this:
{
"facets": {
"types": [
{ "type": "colours", "count": 50102 },
{ "type": "animals", "count": 49898 }
],
"names": [
{ "name": "Green", "count": 9906 },
{ "name": "Yellow", "count": 10093 },
{ "name": "Red", "count": 10083 },
{ "name": "Orange", "count": 9997 },
{ "name": "Blue", "count": 10023 },
{ "name": "Tiger", "count": 9710 },
{ "name": "Lion", "count": 10058 },
{ "name": "Elephant", "count": 10069 },
{ "name": "Monkey", "count": 9963 },
{ "name": "Bear", "count": 10098 }
]
},
"data": []
}
What should be apparent though is while "possible", the kind of "juggling" going on here in the pipeline to produce this output format is not really efficient. Compared to the first example, there is a lot of overhead in here just to simply split out the results into their own array responses and independently of the grouping keys. This notably becomes more complex with the more "facets" to generate.
Also, as hinted at in the output, what people generally ask of "facet counts" is that the result "data" is also included in the response (likely paged) in addition to the aggregated facets. So the further complications should be apparent right here:
{ "$group": {
"_id": null,
(...)
The requirement of this type of operation is basically to "stuff" every piece of data into a single object. In most cases, and certainly where you want the actual data in results (100,000 documents in this sample), this approach becomes completely impractical and will almost certainly exceed the BSON document size limit of 16MB.
In such a case, where you want to produce results and the "facets" of that data in a response, then the best approach here is to run each aggregation and the output page as separate query operations and "stream" the output JSON ( or other format ) back to the receiving client.
As a self contained example:
var async = require('async'),
mongoose = require('mongoose'),
Schema = mongoose.Schema;
mongoose.connect('mongodb://localhost/things');
var data = {
"colours": [
"Red","Blue","Green","Yellow","Orange"
],
"animals": [
"Lion","Tiger","Bear","Elephant","Monkey"
]
},
dataKeys = Object.keys(data);
var thingSchema = new Schema({
"name": String,
"type": String
});
var Thing = mongoose.model( 'Thing', thingSchema );
var writer = process.stdout;
mongoose.connection.on("open",function(err) {
if (err) throw err;
async.series(
[
function(callback) {
process.stderr.write("removing\n");
Thing.remove({},callback);
},
function(callback) {
process.stderr.write("inserting\n");
var bulk = Thing.collection.initializeUnorderedBulkOp(),
count = 0;
async.whilst(
function() { return count < 100000; },
function(callback) {
var keyLen = dataKeys.length,
keyIndex = Math.floor(Math.random()*keyLen),
type = dataKeys[keyIndex],
types = data[type],
typeLen = types.length,
nameIndex = Math.floor(Math.random()*typeLen),
name = types[nameIndex];
var obj = { "type": type, "name": name };
bulk.insert(obj);
count++;
if ( count % 1000 == 0 ) {
process.stderr.write('insert count: ' + count + "\n");
bulk.execute(function(err,resp) {
bulk = Thing.collection.initializeUnorderedBulkOp();
callback(err);
});
} else {
callback();
}
},
callback
);
},
function(callback) {
writer.write("{ \n \"page\": 1,\n \"pageSize\": 25,\n")
writer.write(" \"facets\": {\n"); // open object response
var stream = Thing.collection.aggregate(
[
{ "$group": {
"_id": "$name",
"count": { "$sum": 1 }
}}
],
{
"cursor": {
"batchSize": 1000
}
}
);
var counter = 0;
stream.on("data",function(data) {
stream.pause();
if ( counter == 0 ) {
writer.write(" \"names\": [\n");
} else {
writer.write(",\n");
}
data = { "name": data._id, "count": data.count };
writer.write(" " + JSON.stringify(data));
counter++;
stream.resume();
});
stream.on("end",function() {
writer.write("\n ],\n");
var stream = Thing.collection.aggregate(
[
{ "$group": {
"_id": "$type",
"count": { "$sum": 1 }
}}
],
{
"cursor": {
"batchSize": 1000
}
}
);
var counter = 0;
stream.on("data",function(data) {
stream.pause();
if ( counter == 0 ) {
writer.write(" \"types\": [\n");
} else {
writer.write(",\n");
}
data = { "name": data._id, "count": data.count };
writer.write(" " + JSON.stringify(data));
counter++;
stream.resume();
});
stream.on("end",function() {
writer.write("\n ]\n },\n");
var stream = Thing.find({}).limit(25).stream();
var counter = 0;
stream.on("data",function(data) {
stream.pause();
if ( counter == 0 ) {
writer.write(" \"data\": [\n");
} else {
writer.write(",\n");
}
writer.write(" " + JSON.stringify(data));
counter++;
stream.resume();
});
stream.on("end",function() {
writer.write("\n ]\n}\n");
callback();
});
});
});
}
],
function(err) {
if (err) throw err;
process.exit();
}
);
});
With the output like:
{
"page": 1,
"pageSize": 25,
"facets": {
"names": [
{"name":"Red","count":10007},
{"name":"Tiger","count":10012},
{"name":"Yellow","count":10119},
{"name":"Monkey","count":9970},
{"name":"Elephant","count":10046},
{"name":"Bear","count":10082},
{"name":"Orange","count":9982},
{"name":"Green","count":10005},
{"name":"Blue","count":9884},
{"name":"Lion","count":9893}
],
"types": [
{"name":"colours","count":49997},
{"name":"animals","count":50003}
]
},
"data": [
{"_id":"55bf141f3edc150b6abdcc02","type":"animals","name":"Lion"},
{"_id":"55bf141f3edc150b6abdc81b","type":"colours","name":"Blue"},
{"_id":"55bf141f3edc150b6abdc81c","type":"colours","name":"Orange"},
{"_id":"55bf141f3edc150b6abdc81d","type":"animals","name":"Bear"},
{"_id":"55bf141f3edc150b6abdc81e","type":"animals","name":"Elephant"},
{"_id":"55bf141f3edc150b6abdc81f","type":"colours","name":"Orange"},
{"_id":"55bf141f3edc150b6abdc820","type":"colours","name":"Green"},
{"_id":"55bf141f3edc150b6abdc821","type":"animals","name":"Lion"},
{"_id":"55bf141f3edc150b6abdc822","type":"animals","name":"Monkey"},
{"_id":"55bf141f3edc150b6abdc823","type":"colours","name":"Yellow"},
{"_id":"55bf141f3edc150b6abdc824","type":"colours","name":"Yellow"},
{"_id":"55bf141f3edc150b6abdc825","type":"colours","name":"Orange"},
{"_id":"55bf141f3edc150b6abdc826","type":"animals","name":"Monkey"},
{"_id":"55bf141f3edc150b6abdc827","type":"colours","name":"Blue"},
{"_id":"55bf141f3edc150b6abdc828","type":"animals","name":"Tiger"},
{"_id":"55bf141f3edc150b6abdc829","type":"colours","name":"Red"},
{"_id":"55bf141f3edc150b6abdc82a","type":"animals","name":"Monkey"},
{"_id":"55bf141f3edc150b6abdc82b","type":"animals","name":"Elephant"},
{"_id":"55bf141f3edc150b6abdc82c","type":"animals","name":"Tiger"},
{"_id":"55bf141f3edc150b6abdc82d","type":"animals","name":"Bear"},
{"_id":"55bf141f3edc150b6abdc82e","type":"colours","name":"Yellow"},
{"_id":"55bf141f3edc150b6abdc82f","type":"animals","name":"Lion"},
{"_id":"55bf141f3edc150b6abdc830","type":"animals","name":"Elephant"},
{"_id":"55bf141f3edc150b6abdc831","type":"colours","name":"Orange"},
{"_id":"55bf141f3edc150b6abdc832","type":"animals","name":"Elephant"}
]
}
There are some considerations in here, notably that mongoose .aggregate() does not really directly support the standard node stream interface. There is an .each() method available from .cursor() on an aggregate method, but the "stream" implied from the core API method gives a lot more control here, so using the .collection accessor to get the underlying driver object is preferable. Hopefully a future mongoose release will consider this.
So if your end goal is such a "facet count" alongside the results as demonstrated here, then it makes the most sense to run each aggregation and the results query separately and "stream" the output in the way demonstrated. Without that, the aggregation becomes both overcomplicated and very likely to exceed the BSON limit.
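The three stream handlers in the listing repeat the same "first row or comma" bookkeeping. As a rough sketch, that logic could be pulled into one helper. `writeJsonArray` and `fakeWriter` are hypothetical names, and the helper takes a synchronous iterable to keep the example self-contained. With a real driver cursor you would feed it rows from the "data" events instead.

```javascript
// Hypothetical helper: writes `"key": [ ... ]` one row at a time, so only
// a single row is ever serialised in memory.
function writeJsonArray(writer, key, rows, isLast) {
  writer.write(JSON.stringify(key) + ": [");
  let counter = 0;
  for (const row of rows) {
    // Separate rows with commas; no comma before the first row.
    writer.write((counter === 0 ? "\n" : ",\n") + "  " + JSON.stringify(row));
    counter++;
  }
  // Close the array, also when it turned out to be empty.
  writer.write(counter === 0 ? "]" : "\n]");
  writer.write(isLast ? "\n" : ",\n");
}

// A stand-in for process.stdout that collects the chunks in memory.
const chunks = [];
const fakeWriter = { write: (s) => chunks.push(s) };

fakeWriter.write("{\n");
writeJsonArray(fakeWriter, "names", [{ name: "Red", count: 2 }], false);
writeJsonArray(fakeWriter, "types", [], false);
writeJsonArray(fakeWriter, "data", [{ name: "Red" }, { name: "Red" }], true);
fakeWriter.write("}\n");

// The concatenated chunks form one valid JSON document.
const response = JSON.parse(chunks.join(""));
console.log(response.names[0].count);
```

Unlike the listing, the helper still emits a valid empty array when a query returns no rows, because the opening bracket is written before the first row arrives.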

Related

mongodb: match, group by multiple fields, project and count

So I'm learning mongodb and I got a collection of writers to train.
Here I'm trying to count works by sorting them by country and gender of the author. This is what I accomplished so far:
db.writers.aggregate([
{ "$match": { "gender": {"$ne": male}}},
{ "$group": {
"_id": {
"country_id": "$country_id",
"type": "$type"
},
}},
{ "$group": {
"_id": "$_id.country_id",
"literary_work": {
"$push": {
"type": "$_id.type",
"count": { "$sum": "$type" }
}
},
"total": { "$sum": "$type" }
}},
{ "$sort": { "country_id": 1 } },
{ "$project": {
"literary_work": { "$slice": [ "$literary_work", 3 ] },
"total": { "$sum": "$type" }
}}
])
Sadly, the output that I get is not the one I'm expecting:
"_id" : GREAT BRITAIN,
"literary_work" : [
{
"type" : "POEM",
"count" : 0
},
{
"type" : "NOVEL",
"count" : 0
},
{
"type" : "SHORT STORY",
"count" : 0
}
],
"total" : 0
Could anyone tell me where to insert the count stage, or what my mistake is?
upd:
Data sample:
{
"_id" : ObjectId("5f115c5d5f62f9f482cd7a49"),
"author" : George Sand,
"gender" : female,
"country_id" : FRANCE,
"title": "Consuelo",
"type" : "NOVEL",
}
Expected result (NB! this is a result for both genders):
{
"_id" : FRANCE,
"count" : 59.0,
"literary_work" : [
{
"type" : "POEM",
"count" : 14.0
},
{
"type" : "NOVEL",
"count" : 34.0
},
{
"type" : "SHORT STORY",
"count" : 11.0
}
]
}
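Your implementation is on the right track, but a few things are missing:
the first $group has no count. $sum: "$type" tries to add up a string field, and $sum ignores non-numeric values, so every total comes out as 0
the second $group can then sum the counts from the first $group to get the total per country, and push each per-type count into literary_work
the $project stage is not needed
the $sort should be on _id, because country_id no longer exists as a field after grouping
The corrected query: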
db.writers.aggregate([
{
$match: {
gender: { $ne: "male" }
}
},
{
$group: {
_id: {
country_id: "$country_id",
type: "$type"
},
// missed this
count: { $sum: 1 }
}
},
{
$group: {
_id: "$_id.country_id",
// this count will be on the base of first group count
count: { $sum: "$count" },
literary_work: {
$push: {
type: "$_id.type",
// add count in inner count
count: "$count"
}
}
}
},
// corrected from country_id to _id
{
$sort: { "_id": 1 }
}
])
Working Playground: https://mongoplayground.net/p/JWP7qdDY6cc
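As an aside on why the original query printed 0 everywhere: $sum only adds numeric values and skips everything else, so summing the string field "$type" always gives 0. The snippet below is a small plain JavaScript emulation of that rule. `mongoSum` is a made-up helper for illustration, not a MongoDB or driver function.

```javascript
// Emulates the documented behaviour of $sum: non-numeric values are ignored,
// and the result is 0 when nothing numeric is left.
function mongoSum(values) {
  return values
    .filter((v) => typeof v === "number")
    .reduce((total, v) => total + v, 0);
}

// What { $sum: "$type" } effectively adds up: strings only.
console.log(mongoSum(["NOVEL", "POEM", "NOVEL"])); // prints 0

// What { $sum: 1 } adds up for the same three documents.
console.log(mongoSum([1, 1, 1])); // prints 3
```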

Combine results based on condition during group by

Mongo query generated out of java code:
{
"pipeline": [{
"$match": {
"Id": "09cd9a5a-85c5-4948-808b-20a52d92381a"
}
},
{
"$group": {
"_id": "$result",
"id": {
"$first": "$result"
},
"labelKey": {
"$first": {
"$ifNull": ["$result",
"$result"]
}
},
"value": {
"$sum": 1
}
}
}]
}
Field 'result' can have values like Approved, Rejected, null and "" (empty string). What I am trying to achieve is combining the count of both null and empty together.
So that the empty string Id will have the count of both null and "", which is equal to 4
I'm sure there's a more "proper" way, but this is what I could quickly come up with:
[
{
"$group" : {
"_id" : "$result",
"id" : {
"$first" : "$result"
},
"labelKey" : {
"$first" : {
"$ifNull" : [
"$result",
"$result"
]
}
},
"value" : {
"$sum" : 1.0
}
}
},
{
"$group" : {
"_id" : {
"$cond" : [
{ "$or" : [
{ "$eq" : ["$_id", "Approved"] },
{ "$eq" : ["$_id", "Rejected"] }
]},
"$_id",
""
]
},
"temp" : {
"$push" : {
"_id" : "$_id",
"labelKey" : "$labelKey"
}
},
"count" : {
"$sum" : "$value"
}
}
},
{
"$unwind" : "$temp"
},
{
"$project" : {
"_id" : "$temp._id",
"labelKey": "$temp.labelKey",
"count" : "$count"
}
}
]
Because the second group runs on at most 4 documents, I don't feel too bad about doing this.
I have used $facet.
The $facet stage lets you run several independent sub-pipelines within a single stage, all over the same input documents. This means that several aggregations can share the same preliminary stages, and the combined output can be processed by the stages that follow.
var queries = [{
"$match": {
"Id": "09cd9a5a-85c5-4948-808b-20a52d92381a"
}
},{
$facet: {
"empty": [
{
$match : {
result : { $in : ['',null]}
}
},{
"$group" : {
"_id" : null,
value : { $sum : 1}
}
}
],
"non_empty": [
{
$match : {
result : { $nin : ['',null]}
}
},{
"$group" : {
"_id" : '$result',
value : { $sum : 1}
}
}
]
}
},
{
$project: {
results: {
$concatArrays: [ "$empty", "$non_empty" ]
}
}
}];
Output :
{
"results": [{
"_id": null,
"value": 52 // count of both '' and null.
}, {
"_id": "Approved",
"value": 83
}, {
"_id": "Rejected",
"value": 3661
}]
}
Changing the $group stage as shown below solved the problem:
{
"$group": {
"_id": {
"$ifNull": ["$result", ""]
},
"id": {
"$first": "$result"
},
"labelKey": {
"$first": {
"$ifNull": ["$result",
"$result"]
}
},
"value": {
"$sum": 1
}
}
}
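The $ifNull change works because $ifNull substitutes the replacement for both null and missing values, so those documents land in the same group as the empty string. A plain JavaScript emulation of that grouping, with made-up helper names and sample documents, might look like this:

```javascript
// Emulates { $ifNull: ["$result", ""] }: null and missing both become "".
function ifNull(value, replacement) {
  return value === null || value === undefined ? replacement : value;
}

// Emulates $group with _id: { $ifNull: ["$result", ""] } and value: { $sum: 1 }.
function countByResult(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const key = ifNull(doc.result, "");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([_id, value]) => ({ _id, value }));
}

// Hypothetical sample documents.
const docs = [
  { result: "Approved" },
  { result: null },
  { result: "" },
  { result: "" },
  {},                      // field missing entirely
  { result: "Rejected" }
];
console.log(countByResult(docs));
```

Here the null document, the two empty strings and the document without the field all end up under the "" key.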

mongo aggregation framework group by quarter/half year/year

I have a database with this schema structure :
{
"name" : "Carl",
"city" : "paris",
"time" : "1-2018",
"notes" : [
"A",
"A",
"B",
"C",
"D"
]
}
And this query using the aggregation framework :
db.getCollection('collection').aggregate(
[{
"$match": {
"$and": [{
"$or": [ {
"time": "1-2018"
}, {
"time": "2-2018"
} ]
}, {
"name": "Carl"
}, {
"city": "paris"
}]
}
}, {
"$unwind": "$notes"
}, {
"$group": {
"_id": {
"notes": "$notes",
"time": "$time"
},
"count": {
"$sum": 1
}
}
}
, {
"$group": {
"_id": "$_id.time",
"count": {
"$sum": 1
}
}
}, {
"$project": {
"_id": 0,
"time": "$_id",
"count": 1
}
}])
It is working correctly and I'm getting these results:
{
"count" : 4.0,
"time" : "2-2018"
}
{
"count" : 4.0,
"time" : "1-2018"
}
My issue is that I'd like to keep the same match stage and group by quarter.
Here is the result I'd like to have:
{
"count" : 8.0,
"time" : "1-2018" // here quarter 1
}
Thanks
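One possible approach, sketched under the assumption that "time" is always a "M-YYYY" string: derive a quarter key from the month and group on that instead of on the month. The helper names below are hypothetical. The client-side variant simply merges the per-month rows the existing pipeline already returns. The server-side stage is an untested sketch meant to replace the second $group, and it relies on $toInt, which needs MongoDB 4.0 or later.

```javascript
// Assumes `time` is always "M-YYYY" with M between 1 and 12.
function quarterKey(time) {
  const [month, year] = time.split("-");
  return Math.ceil(parseInt(month, 10) / 3) + "-" + year;
}

// Client-side variant: merge the per-month rows the existing pipeline returns.
function rollUpByQuarter(rows) {
  const totals = new Map();
  for (const { time, count } of rows) {
    const key = quarterKey(time);
    totals.set(key, (totals.get(key) || 0) + count);
  }
  return [...totals].map(([time, count]) => ({ time, count }));
}

// Server-side variant (untested sketch): a replacement for the second $group.
// $split and $arrayElemAt pull the month and year out of "$_id.time".
const timeParts = { $split: ["$_id.time", "-"] };
const groupByQuarter = {
  $group: {
    _id: {
      quarter: {
        $ceil: { $divide: [{ $toInt: { $arrayElemAt: [timeParts, 0] } }, 3] }
      },
      year: { $arrayElemAt: [timeParts, 1] }
    },
    count: { $sum: 1 }
  }
};

console.log(rollUpByQuarter([
  { count: 4, time: "2-2018" },
  { count: 4, time: "1-2018" }
]));
```

With the server-side stage, _id becomes an object with quarter and year fields, so the final $project would need to build "time" from those two fields rather than from "$_id" directly.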

Mongodb use multiple group operator in single aggregation

I am using mongodb aggregation for getting counts of different fields. Here are some documents from the mobile collection:-
{
"title": "Moto G",
"manufacturer": "Motorola",
"releasing": ISODate("2011-03-00T10:26:48.424Z"),
"rating": "high"
}
{
"title": "Asus Zenfone 2",
"manufacturer": "Asus",
"releasing": ISODate("2014-10-00T10:26:48.424Z"),
"rating": "high"
}
{
"title": "Moto Z",
"manufacturer": "Motorola",
"releasing": ISODate("2016-10-12T10:26:48.424Z"),
"rating": "none"
}
{
"title": "Asus Zenfone 3",
"manufacturer": "Asus",
"releasing": ISODate("2016-08-00T10:26:48.424Z"),
"rating": "medium"
}
I can find manufacturer and rating counts but this fails:
db.mobile.aggregate([
{
$group: { _id: "$manufacturer", count: { $sum: 1 } }
}, {
$group: { _id: "$rating", count: { $sum: 1 } }
}
])
Output:-
{
"_id" : null,
"count" : 2.0
}
Expected Output something like:-
{
"_id":"Motorola",
"count" : 2.0
}
{
"_id":"Asus",
"count" : 2.0
}
{
"_id":"high",
"count" : 2.0
}
{
"_id":"none",
"count" : 1.0
}
{
"_id":"medium",
"count" : 1.0
}
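Your pipeline returns null because the first $group leaves only the _id and count fields in each document, so "$rating" no longer exists by the time the second $group runs. I believe you are after an aggregation operation that groups the documents by the manufacturer and rating keys, and then does a further group on the manufacturer while aggregating the ratings per manufacturer, something like the following pipeline: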
db.mobile.aggregate([
{
"$group": {
"_id": {
"manufacturer": "$manufacturer",
"rating": "$rating"
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": "$_id.manufacturer",
"total": { "$sum": "$count" },
"counts": {
"$push": {
"rating": "$_id.rating",
"count": "$count"
}
}
}
}
])
Sample Output
/* 1 */
{
"_id" : "Motorola",
"total" : 2,
"counts" : [
{
"rating" : "high",
"count" : 1
},
{
"rating" : "none",
"count" : 1
}
]
}
/* 2 */
{
"_id" : "Asus",
"total" : 2,
"counts" : [
{
"rating" : "high",
"count" : 1
},
{
"rating" : "medium",
"count" : 1
}
]
}
or if you are after a more "flat" or "denormalised" result, run this aggregate operation:
db.mobile.aggregate([
{
"$group": {
"_id": "$manufacturer",
"total": { "$sum": 1 },
"high_ratings": {
"$sum": {
"$cond": [ { "$eq": [ "$rating", "high" ] }, 1, 0 ]
}
},
"medium_ratings": {
"$sum": {
"$cond": [ { "$eq": [ "$rating", "medium" ] }, 1, 0 ]
}
},
"low_ratings": {
"$sum": {
"$cond": [ { "$eq": [ "$rating", "low" ] }, 1, 0 ]
}
},
"none_ratings": {
"$sum": {
"$cond": [ { "$eq": [ "$rating", "none" ] }, 1, 0 ]
}
}
}
}
])
Sample Output
/* 1 */
{
"_id" : "Motorola",
"total" : 2,
"high_ratings" : 1,
"medium_ratings" : 0,
"low_ratings" : 0,
"none_ratings" : 1
}
/* 2 */
{
"_id" : "Asus",
"total" : 2,
"high_ratings" : 1,
"medium_ratings" : 1,
"low_ratings" : 0,
"none_ratings" : 0
}
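The four `*_ratings` accumulators in the "flat" pipeline differ only in the value they test, so they can be generated instead of written out by hand. This is a sketch: `conditionalCounts` is a hypothetical helper, and the list of rating values is assumed to be known up front.

```javascript
// Builds one { $sum: { $cond: [...] } } accumulator per value, for example
// high_ratings, medium_ratings and so on. The `suffix` naming is an
// assumption chosen to match the field names used in the flat pipeline.
function conditionalCounts(field, values, suffix) {
  const accumulators = {};
  for (const value of values) {
    accumulators[value + suffix] = {
      $sum: { $cond: [{ $eq: ["$" + field, value] }, 1, 0] }
    };
  }
  return accumulators;
}

const pipeline = [
  {
    $group: {
      _id: "$manufacturer",
      total: { $sum: 1 },
      ...conditionalCounts("rating", ["high", "medium", "low", "none"], "_ratings")
    }
  }
];
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting `pipeline` array has the same shape as the hand-written one and can be passed to `db.mobile.aggregate()` unchanged.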

Group Multiple Values in Aggregation

I want to group all the fields of a collection, with a total for each unique value. Let's assume there is a collection like this:
id country state operator
121 IN HR AIRTEL
212 IN MH AIRTEL
213 US LA AT&T
214 UK JK VODAFONE
Output should be like this:
{
"country": { "IN": 2, "US":1, "UK":1 },
"state": { "HR":1, "MH":1, "LA":1, "JK": 1 },
"operator": { "AIRTEL":2, "AT&T": 1, "VODAFONE": 1 }
}
I am trying to use the mongo aggregation framework, but can't really think how to do this.
I found something similar to your output using aggregation. Check the code below:
db.collectionName.aggregate({
"$group": {
"_id": null,
"countryOfIN": {
"$sum": {
"$cond": [{
$eq: ["$country", "IN"]
}, 1, 0]
}
},
"countryOfUK": {
"$sum": {
"$cond": [{
$eq: ["$country", "UK"]
}, 1, 0]
}
},
"countryOfUS": {
"$sum": {
"$cond": [{
$eq: ["$country", "US"]
}, 1, 0]
}
},
"stateOfHR": {
"$sum": {
"$cond": [{
$eq: ["$state", "HR"]
}, 1, 0]
}
},
"stateOfMH": {
"$sum": {
"$cond": [{
$eq: ["$state", "MH"]
}, 1, 0]
}
},
"stateOfLA": {
"$sum": {
"$cond": [{
$eq: ["$state", "LA"]
}, 1, 0]
}
},
"stateOfJK": {
"$sum": {
"$cond": [{
$eq: ["$state", "JK"]
}, 1, 0]
}
},
"operatorOfAIRTEL": {
"$sum": {
"$cond": [{
$eq: ["$operator", "AIRTEL"]
}, 1, 0]
}
},
"operatorOfAT&T": {
"$sum": {
"$cond": [{
$eq: ["$operator", "AT&T"]
}, 1, 0]
}
},
"operatorOfVODAFONE": {
"$sum": {
"$cond": [{
$eq: ["$operator", "VODAFONE"]
}, 1, 0]
}
}
}
}, {
"$group": {
"_id": null,
"country": {
"$push": {
"IN": "$countryOfIN",
"UK": "$countryOfUK",
"US": "$countryOfUS"
}
},
"STATE": {
"$push": {
"HR": "$stateOfHR",
"MH": "$stateOfMH",
"LA": "$stateOfLA",
"JK": "$stateOfJK"
}
},
"operator": {
"$push": {
"AIRTEL": "$operatorOfAIRTEL",
"AT&T": "$operatorOfAT&T",
"VODAFONE": "$operatorOfVODAFONE"
}
}
}
}, {
"$project": {
"_id": 0,
"country": 1,
"STATE": 1,
"operator": 1
}
})
This uses $cond to count the matching documents for each value, then pushes those counts in a second $group to combine them.
An output format like the one you are looking for is not really suited to the aggregation framework, since you are transforming part of your data into "key" names. The aggregation framework does not do this, but rather sticks to database "best practice" and does not transform "data" into "key" names in any way.
You can perform a mapReduce operation instead, which allows more flexibility with the manipulation, but performance is not as good due to the need to use JavaScript code to perform the manipulation:
db.collection.mapReduce(
function () {
var obj = {},
doc = this;
delete doc._id;
Object.keys(doc).forEach(function(key) {
obj[key] = {};
obj[key][doc[key]] = 1;
});
emit( null, obj );
},
function (key,values) {
var result = {};
values.forEach(function(value) {
Object.keys(value).forEach(function(outerKey) {
Object.keys(value[outerKey]).forEach(function(innerKey) {
if ( !result.hasOwnProperty(outerKey) ) {
result[outerKey] = {};
}
if ( result[outerKey].hasOwnProperty(innerKey) ) {
result[outerKey][innerKey] += value[outerKey][innerKey];
} else {
result[outerKey][innerKey] = value[outerKey][innerKey];
}
});
});
});
return result;
},
{ "out": { "inline": 1 } }
)
And in the structure that applies to all mapReduce results:
{
"results" : [
{
"_id" : null,
"value" : {
"country" : {
"IN" : 2,
"US" : 1,
"UK" : 1
},
"state" : {
"HR" : 1,
"MH" : 1,
"LA" : 1,
"JK" : 1
},
"operator" : {
"AIRTEL" : 2,
"AT&T" : 1,
"VODAFONE" : 1
}
}
}
]
}
For the aggregation framework itself, it is better suited to producing aggregation results that are more consistently structured:
db.mapex.aggregate([
{ "$project": {
"country": 1,
"state": 1,
"operator": 1,
"type": { "$literal": ["country","state","operator"] }
}},
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"type": "$type",
"key": { "$cond": {
"if": { "$eq": [ "$type", "country" ] },
"then": "$country",
"else": { "$cond": {
"if": { "$eq": [ "$type", "state" ] },
"then": "$state",
"else": "$operator"
}}
}}
},
"count": { "$sum": 1 }
}}
])
Which would output:
{ "_id" : { "type" : "state", "key" : "JK" }, "count" : 1 }
{ "_id" : { "type" : "country", "key" : "UK" }, "count" : 1 }
{ "_id" : { "type" : "country", "key" : "US" }, "count" : 1 }
{ "_id" : { "type" : "operator", "key" : "AT&T" }, "count" : 1 }
{ "_id" : { "type" : "state", "key" : "LA" }, "count" : 1 }
{ "_id" : { "type" : "operator", "key" : "AIRTEL" }, "count" : 2 }
{ "_id" : { "type" : "state", "key" : "MH" }, "count" : 1 }
{ "_id" : { "type" : "state", "key" : "HR" }, "count" : 1 }
{ "_id" : { "type" : "operator", "key" : "VODAFONE" }, "count" : 1 }
{ "_id" : { "type" : "country", "key" : "IN" }, "count" : 2 }
But this is fairly easy to transform in client code while iterating the results:
var result = {};
db.mapex.aggregate([
{ "$project": {
"country": 1,
"state": 1,
"operator": 1,
"type": { "$literal": ["country","state","operator"] }
}},
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"type": "$type",
"key": { "$cond": {
"if": { "$eq": [ "$type", "country" ] },
"then": "$country",
"else": { "$cond": {
"if": { "$eq": [ "$type", "state" ] },
"then": "$state",
"else": "$operator"
}}
}}
},
"count": { "$sum": 1 }
}}
]).forEach(function(doc) {
if ( !result.hasOwnProperty(doc._id.type) )
result[doc._id.type] = {};
result[doc._id.type][doc._id.key] = doc.count;
})
Which gives the final structure in "result":
{
"state" : {
"JK" : 1,
"LA" : 1,
"MH" : 1,
"HR" : 1
},
"country" : {
"UK" : 1,
"US" : 1,
"IN" : 2
},
"operator" : {
"AT&T" : 1,
"AIRTEL" : 2,
"VODAFONE" : 1
}
}
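Outside the shell, for example with the Node.js driver where the rows can be collected with `toArray()`, the same reshaping can be written as a small pure function. This is a sketch only: `pivotCounts` is a hypothetical name and `rows` holds a few sample rows in the shape shown in the aggregation output.

```javascript
// Turns rows of the form { _id: { type, key }, count } into a nested object
// of the form { type: { key: count } }.
function pivotCounts(rows) {
  const result = {};
  for (const { _id, count } of rows) {
    if (!Object.prototype.hasOwnProperty.call(result, _id.type)) {
      result[_id.type] = {};
    }
    result[_id.type][_id.key] = count;
  }
  return result;
}

// A few rows in the shape the aggregation returns.
const rows = [
  { _id: { type: "state", key: "JK" }, count: 1 },
  { _id: { type: "country", key: "UK" }, count: 1 },
  { _id: { type: "operator", key: "AIRTEL" }, count: 2 },
  { _id: { type: "country", key: "IN" }, count: 2 }
];
console.log(JSON.stringify(pivotCounts(rows), null, 2));
```

Keeping the transformation separate from the query makes it easy to check against sample rows without a database connection.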