MongoDB - grouping several fields using the aggregation framework

I have some documents
{name: 'apple', type: 'fruit', color: 'red'}
{name: 'banana', type: 'fruit', color: 'yellow'}
{name: 'orange', type: 'fruit', color: 'orange'}
{name: 'eggplant', type: 'vege', color: 'purple'}
{name: 'brocoli', type: 'vege', color: 'green'}
{name: 'rose', type: 'flower', color: 'red'}
{name: 'cauli', type: 'vege', color: 'white'}
{name: 'potato', type: 'vege', color: 'brown'}
{name: 'onion', type: 'vege', color: 'white'}
{name: 'strawberry', type: 'fruit', color: 'red'}
{name: 'cashew', type: 'nut', color: ''}
{name: 'almond', type: 'nut', color: ''}
{name: 'lemon', type: 'vege', color: 'yellow'}
{name: 'tomato', type: 'vege', color: 'red'}
{name: 'tomato', type: 'fruit', color: 'red'}
{name: 'fig', type: 'fruit', color: 'pink'}
{name: 'nectarin', type: 'fruit', color: 'pink'}
I want to group them by first letter, like below:
{
_id:'a',
name:['apple','almond'],
type:[],
color:[]
}
{
_id:'b',
name:['banana','brocoli'],
type:[],
color:['brown']
}
...
{
_id:'f',
name:['fig'],
type:['fruit','flower'],
color:['']
}
...
{
_id:'n',
name:['nectarin'],
type:['nut'],
color:['']
}
...
{
_id:'p',
name:['potato'],
type:[''],
color:['pink','purple']
}
...
The result can be saved into another collection, so I can issue a query against the newly created collection, e.g. find({_id:'a'}), to return the name, type and color values beginning with the letter 'a'.
I have thought about using $group:
$group: {
    _id: { $substr: ['$name', 0, 1] },
    name: { $addToSet: '$name' }
}
Then another command:
$group: {
    _id: { $substr: ['$type', 0, 1] },
    type: { $addToSet: '$type' }
}
And:
$group: {
    _id: { $substr: ['$color', 0, 1] },
    color: { $addToSet: '$color' }
}
But I am stuck on how to unify all three and save the result into a new collection. Or is the aggregation framework not suitable for this kind of data summary?
In a real-world example, e.g. an e-commerce site, the front page displays something like: "currently we have 135636 products under 231 categories from 111 brands". Surely these numbers should be cached somewhere (in memory or in another collection), because running $group each time is resource-intensive? What would be the optimal schema/design for these situations?
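As one possible illustration of the "cache it in another collection" idea just mentioned (purely a sketch: the products collection and the category/brand field names are hypothetical), a periodic job could materialise the headline numbers with a pipeline ending in $out, and the front page would then read a single pre-computed document:
// hypothetical collection and field names; run periodically, e.g. from a scheduled job
db.products.aggregate([
    { "$group": {
        "_id": null,
        "products": { "$sum": 1 },
        "categories": { "$addToSet": "$category" },
        "brands": { "$addToSet": "$brand" }
    }},
    { "$project": {
        "_id": 0,
        "products": 1,
        "categories": { "$size": "$categories" },
        "brands": { "$size": "$brands" }
    }},
    { "$out": "front_page_stats" }
])

// the front page then reads the cached document instead of re-running $group
db.front_page_stats.findOne()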
Sorry, my questions are a bit 'confusing'.

Since there are several fields to collect here, the key thing is to "merge" them all into a single array for the simplest processing.
The $map operator of the aggregation framework works well here, transforming the elements so that you also get the "first letter" from each word within the data:
db.alpha.aggregate([
{ "$project": {
"list": {
"$map": {
"input": [ "A", "B", "C" ],
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
{
"type": { "$literal": "name" },
"value": "$name",
"alpha": { "$substr": [ "$name",0,1 ] }
},
{ "$cond": [
{ "$eq": [ "$$el", "B" ] },
{
"type": { "$literal": "type" },
"value": "$type",
"alpha": { "$substr": [ "$type",0,1 ] }
},
{
"type": { "$literal": "color" },
"value": "$color",
"alpha": { "$substr": [ "$color",0,1 ] }
}
]}
]
}
}
}
}},
{ "$unwind": "$list" },
{ "$match": { "list.alpha": { "$ne": "" } } },
{ "$group": {
"_id": "$list.alpha",
"list": {
"$addToSet": "$list"
}
}},
{ "$project": {
"name": {
"$setDifference": [
{ "$map": {
"input": "$list",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "name" ] },
"$$el.value",
false
]
}
}},
[false]
]
},
"type": {
"$setDifference": [
{ "$map": {
"input": "$list",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "type" ] },
"$$el.value",
false
]
}
}},
[false]
]
},
"color": {
"$setDifference": [
{ "$map": {
"input": "$list",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "color" ] },
"$$el.value",
false
]
}
}},
[false]
]
}
}},
{ "$sort": { "_id": 1 } }
])
If you look at the data in "stages", the transformation makes a lot more sense.
The first stage "maps" all of the fields into a single array per document, so all documents now look like this:
{
"_id" : ObjectId("55df0652c9064ef625d7f36e"),
"list" : [
{
"type" : "name",
"value" : "nectarin",
"alpha" : "n"
},
{
"type" : "type",
"value" : "fruit",
"alpha" : "f"
},
{
"type" : "color",
"value" : "pink",
"alpha" : "p"
}
]
}
The $unwind is of little consequence: it does the standard thing and creates a new document from each array member. It is the $group that does most of the work here, with this result per "alpha" in the grouping:
{
"_id" : "o",
"list" : [
{
"type" : "name",
"value" : "orange",
"alpha" : "o"
},
{
"type" : "color",
"value" : "orange",
"alpha" : "o"
},
{
"type" : "name",
"value" : "onion",
"alpha" : "o"
}
]
}
That is already a nice grouping, and arguably a decent output format in itself. But in order to get to the end result, the $map operator is employed again alongside $setDifference, which removes the false values left wherever an entry's "type" does not match the required output field.
The full result is:
{ "_id" : "a", "name" : [ "almond", "apple" ], "type" : [ ], "color" : [ ] }
{ "_id" : "b", "name" : [ "brocoli", "banana" ], "type" : [ ], "color" : [ "brown" ] }
{ "_id" : "c", "name" : [ "cashew", "cauli" ], "type" : [ ], "color" : [ ] }
{ "_id" : "e", "name" : [ "eggplant" ], "type" : [ ], "color" : [ ] }
{ "_id" : "f", "name" : [ "fig" ], "type" : [ "flower", "fruit" ], "color" : [ ] }
{ "_id" : "g", "name" : [ ], "type" : [ ], "color" : [ "green" ] }
{ "_id" : "l", "name" : [ "lemon" ], "type" : [ ], "color" : [ ] }
{ "_id" : "n", "name" : [ "nectarin" ], "type" : [ "nut" ], "color" : [ ] }
{ "_id" : "o", "name" : [ "onion", "orange" ], "type" : [ ], "color" : [ "orange" ] }
{ "_id" : "p", "name" : [ "potato" ], "type" : [ ], "color" : [ "pink", "purple" ] }
{ "_id" : "r", "name" : [ "rose" ], "type" : [ ], "color" : [ "red" ] }
{ "_id" : "s", "name" : [ "strawberry" ], "type" : [ ], "color" : [ ] }
{ "_id" : "t", "name" : [ "tomato" ], "type" : [ ], "color" : [ ] }
{ "_id" : "v", "name" : [ ], "type" : [ "vege" ], "color" : [ ] }
{ "_id" : "w", "name" : [ ], "type" : [ ], "color" : [ "white" ] }
{ "_id" : "y", "name" : [ ], "type" : [ ], "color" : [ "yellow" ] }
Everything is grouped alphabetically, with its own array for each field.
Upcoming releases of MongoDB will have a $filter operator that makes the $map and $setDifference combination a bit nicer. $filter does not produce "sets", but that does not matter much to this process as long as $addToSet is employed where it is.
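As a rough sketch of that future form, the final $project could then filter each field directly instead of mapping non-matches to false and removing them with $setDifference; only the "name" output is shown here, with "type" and "color" following the same pattern:
{ "$project": {
    "name": {
        "$map": {
            "input": {
                "$filter": {
                    "input": "$list",
                    "as": "el",
                    "cond": { "$eq": [ "$$el.type", "name" ] }
                }
            },
            "as": "el",
            "in": "$$el.value"
        }
    }
}}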
One note of caution: considering the amount of data you want to process here, the resulting "arrays" for each letter might possibly exceed the BSON document size limit, depending on how many distinct "words" there actually are.
In that case the advice would be to follow the process right up to and including the $match, but then only $group afterwards, like this:
{ "$group": {
"_id": {
"alpha": "$list.alpha",
"type": "$list.type",
"value": "$list.value",
}
}},
{ "$sort": { "_id": 1 } }
The output is longer of course, but it will not exceed the BSON limit for documents at any stage.
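Since the question also asks for the result to be saved into another collection, either form of the pipeline can simply be finished with an $out stage; a minimal sketch, where the target collection name "alpha_summary" is only an example:
{ "$sort": { "_id": 1 } },
{ "$out": "alpha_summary" }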

This needs a fairly involved aggregation query. First project the first letter of each name with $substr, then use $group to collect all of the name, type and color values into arrays, and use $map to check whether each given value starts with the grouped letter or not.
$setDifference is used to remove the duplicate empty entries, and finally $out writes the documents into a new collection.
Check this aggregation query:
db.collection.aggregate({
"$project": {
"firstName": {
"$substr": ["$name", 0, 1]
},
"name": 1,
"type": 1,
"color": 1
}
}, {
"$group": {
"_id": null,
"allName": {
"$push": "$name"
},
"allType": {
"$push": "$type"
},
"allColor": {
"$push": "$color"
},
"allfirstName": {
"$push": "$firstName"
}
}
}, {
"$unwind": "$allfirstName"
}, {
"$group": {
"_id": "$allfirstName",
"allType": {
"$first": "$allType"
},
"allName": {
"$first": "$allName"
},
"allColor": {
"$first": "$allColor"
}
}
}, {
"$project": {
"type": {
"$setDifference": [{
"$map": {
"input": "$allType",
"as": "type",
"in": {
"$cond": {
"if": {
"$eq": [{
"$substr": ["$$type", 0, 1]
}, "$_id"]
},
"then": "$$type",
"else": ""
}
}
}
},
[""]
]
},
"color": {
"$setDifference": [{
"$map": {
"input": "$allColor",
"as": "color",
"in": {
"$cond": {
"if": {
"$eq": [{
"$substr": ["$$color", 0, 1]
}, "$_id"]
},
"then": "$$color",
"else": ""
}
}
}
},
[""]
]
},
"name": {
"$setDifference": [{
"$map": {
"input": "$allName",
"as": "name",
"in": {
"$cond": {
"if": {
"$eq": [{
"$substr": ["$$name", 0, 1]
}, "$_id"]
},
"then": "$$name",
"else": ""
}
}
}
},
[""]
]
}
}
}, {
"$sort": {
"_id": 1
}
}, {
"$out": "newCollection"
})
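With the $out stage in place, the lookups described in the question become plain _id queries against the new collection, for example:
db.newCollection.find({ "_id": "a" })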

Related

MongoDB iteration on aggregate

I have a collection :
{
"value" : "20",
"type" : "square",
"name" : "form1"
},
{
"value" : "24",
"type" : "circle",
"name" : "form2"
},
{
"value" : "12",
"type" : "square",
"name" : "form3"
}
This aggregation :
let searchTerm = "form2"
db.myCollec.aggregate([
{ "$facet": {
"data": [
{ "$match": { "name": searchTerm }},
{ "$project": { "name": 1, "type": 1, "_id": 0 }}
]
}},
{ "$project": {
"name": {
"$ifNull": [{ "$arrayElemAt": ["$data.name", 0] }, searchTerm ]
},
"type": {
"$ifNull": [{ "$arrayElemAt": ["$data.type", 0] }, null]
}
}}
])
gives this result:
{ "name" : "form2", "type" : "circle" }
and if I'm looking for a non-existing "form4":
{ "name" : "form4", "type" : null }
Now I want to do this for a lot of values, so I put them in an array and loop over it. To account for the asynchronous nature of JavaScript, I tried this code:
tab = [ "form2", "form4" ]
for( var i =0; i<(tab.length);i++) { (function (i) {
searchTerm = tab[i]
db.myCollec.aggregate([
{ "$facet": {
"data": [
{ "$match": { "name": searchTerm }},
{ "$project": { "name": 1, "type": 1, "_id": 0 }}
]
}},
{ "$project": {
"name": {
"$ifNull": [{ "$arrayElemAt": ["$data.name", 0] }, searchTerm ]
},
"type": {
"$ifNull": [{ "$arrayElemAt": ["$data.type", 0] }, null]
}
}}
])
}) (i) }
There is no result...
If I add a print(searchTerm), the values are printed correctly, but there is no result from the aggregation.
Thanks for your help.
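One thing worth noting, assuming this is being run in the mongo shell: a cursor is only printed automatically for an expression typed at the top level, so inside a loop or function the results have to be printed explicitly. A minimal sketch of that idea, not a verified fix for the code above:
var tab = [ "form2", "form4" ];
tab.forEach(function(searchTerm) {
    db.myCollec.aggregate([
        { "$facet": {
            "data": [
                { "$match": { "name": searchTerm }},
                { "$project": { "name": 1, "type": 1, "_id": 0 }}
            ]
        }},
        { "$project": {
            "name": { "$ifNull": [{ "$arrayElemAt": ["$data.name", 0] }, searchTerm ] },
            "type": { "$ifNull": [{ "$arrayElemAt": ["$data.type", 0] }, null] }
        }}
    ]).forEach(printjson);   // print each result explicitly
});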

Compare Size of Arrays Inside an Array of Objects

I want to find all documents where sCompetitions.length is greater than competitions.length.
Here are some sample documents:
{
"_id" : ObjectId("59b28f432b4353d3f311dd1b"),
"name" : "Ford Focus RS 2008",
"requirements" : [
{
"rankType" : "D1",
"competitions" : [
ObjectId("59b151fd2b4353d3f3116827"),
ObjectId("59b151fd2b4353d3f3116829")
],
"sCompetitions" : [
"Rallye Monte-Carlo",
"Rally Sweden"
]
},
{
"rankType" : "A3",
"competitions" : [
ObjectId("59b151fd2b4353d3f3116f6b")
],
"sCompetitions" : [
"Rally Italia Sardegna",
"Neste Rally Finland"
]
}
]
},
{
"_id" : ObjectId("0000b28f432b4353f311dd1b"),
"name" : "Ford Focus RS 2012",
"requirements" : [
{
"rankType" : "D1",
"competitions" : [
ObjectId("59b151fd2b4353d3f3116827"),
ObjectId("59b151fd2b4353d3f3116829")
],
"sCompetitions" : [
"Rallye Monte-Carlo",
"Rally Sweden"
]
},
{
"rankType" : "A3",
"competitions" : [
ObjectId("59b151fd2b4353d3f3116f6b"),
ObjectId("59b151fd2b4353d3f3116f6b")
],
"sCompetitions" : [
"Rally Italia Sardegna",
"Neste Rally Finland"
]
}
]
}
So, looking at the samples, it would only return ObjectId("59b28f432b4353d3f311dd1b").
My problem is that requirements is itself an array, so I would need to somehow iterate over it.
No need to "iterate". All you really need is an $anyElementTrue check after returning results from $map. And you can do this all inside a $redact action:
Model.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$anyElementTrue": {
"$map": {
"input": "$requirements",
"as": "r",
"in": {
"$gt": [
{ "$size": "$$r.sCompetitions" },
{ "$size": "$$r.competitions" }
]
}
}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
So it's a simple comparison by $size for each array element, and then if "any" of those elements is true, the document is "kept" or otherwise "pruned" from the results.
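As a side note, on MongoDB 3.6 and later the same expression can be supplied to a regular find() through $expr rather than $redact; a minimal sketch of that variant, where db.collection stands in for the actual collection or model:
db.collection.find({
    "$expr": {
        "$anyElementTrue": {
            "$map": {
                "input": "$requirements",
                "as": "r",
                "in": {
                    "$gt": [
                        { "$size": "$$r.sCompetitions" },
                        { "$size": "$$r.competitions" }
                    ]
                }
            }
        }
    }
})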

Aggregate and Reduce Nested Documents and Arrays

EDIT:
Our use case:
We get continuous reports from servers about visitors. We pre-aggregate the data on the servers for a few seconds, but after that we insert these "reports" into MongoDB.
In our dashboard we would like to query the different browsers, OSes, geolocation (country etc.) based on time ranges.
So like: Within the last 7 days, there were 1000 visitors using Chrome, 500 from Germany, 200 from England and so on.
I'm pretty stuck with a MongoDB query we need for our dashboard.
We have the following report entries:
{
"_id" : ObjectId("59b9d08e402025326e1a0f30"),
"channel_perm_id" : "c361049fb4144b0e81b71c0b6cfdc296",
"source_id" : "insomnia",
"start_timestamp" : ISODate("2017-09-14T00:42:54.510Z"),
"end_timestamp" : ISODate("2017-09-14T00:42:54.510Z"),
"timestamp" : ISODate("2017-09-14T00:42:54.510Z"),
"resource_uri" : "b755d62a-8c0a-4e8a-945f-41782c13535b",
"sources_info" : {
"browsers" : [
{
"name" : "Chrome",
"count" : NumberLong(2)
}
],
"operating_systems" : [
{
"name" : "Mac OS X",
"count" : NumberLong(2)
}
],
"continent_ids" : [
{
"name" : "EU",
"count" : NumberLong(1)
}
],
"country_ids" : [
{
"name" : "DE",
"count" : NumberLong(1)
}
],
"city_ids" : [
{
"name" : "Solingen",
"count" : NumberLong(1)
}
]
},
"unique_sources" : NumberLong(1),
"requests" : NumberLong(1),
"cache_hits" : NumberLong(0),
"cache_misses" : NumberLong(1),
"cache_hit_size" : NumberLong(0),
"cache_refill_size" : NumberLong("170000000000")
}
Now, we need to aggregate these reports based on timestamp.
So far, so easy:
db.channel_report.aggregate([{
$group: {
_id: {
$dateToString: {
format: "%Y",
date: "$timestamp"
}
},
sources_info: {
$push: "$sources_info"
}
},
}])
But now it gets difficult for me. As you might have already noticed, the sources_info object is the problem.
Instead of just "pushing" all the sources info into an array per group, we need to actually accumulate it.
So, if we have something like this:
{
sources_info: [
{
browsers: [
{
name: "Chrome,
count: 1
}
]
},
{
browsers: [
{
name: "Chrome,
count: 1
}
]
}
]
}
The array should be reduced to this:
{
sources_info:
{
browsers: [
{
name: "Chrome,
count: 2
}
]
}
}
We migrated from MySQL to MongoDB for analytics, but I have no clue how to model this behaviour in Mongo. Judging by the docs, I almost think it is not possible, at least not with the current data structure.
Is there a nice solution for this? Or maybe even a different kind of data structure?
Cheers,
Chris from StriveCDN
The basic problem you have is that you are using "named keys" where you probably really should instead be using values on a consistent attribute path. This means that instead of keys like "browsers", each entry should probably simply carry "type": "browser" and so on.
The reasoning for this should become apparent from the general approaches to aggregating the data, and it also really helps with querying in general. Either way, the approaches basically involve coercing your initial data format into this kind of structure in order to aggregate it first.
With most recent releases ( MongoDB 3.4.4 and greater ), we can work with your named keys via $objectToArray and manipulate as follows:
db.channel_report.aggregate([
{ "$project": {
"timestamp": 1,
"sources": {
"$reduce": {
"input": {
"$map": {
"input": { "$objectToArray": "$sources_info" },
"as": "s",
"in": {
"$map": {
"input": "$$s.v",
"as": "v",
"in": {
"type": "$$s.k",
"name": "$$v.name",
"count": "$$v.count"
}
}
}
}
},
"initialValue": [],
"in": { "$concatArrays": ["$$value", "$$this"] }
}
}
}},
{ "$unwind": "$sources" },
{ "$group": {
"_id": {
"year": { "$year": "$timestamp" },
"type": "$sources.type",
"name": "$sources.name"
},
"count": { "$sum": "$sources.count" }
}},
{ "$group": {
"_id": { "year": "$_id.year", "type": "$_id.type" },
"v": { "$push": { "name": "$_id.name", "count": "$count" } }
}},
{ "$group": {
"_id": "$_id.year",
"sources_info": {
"$push": { "k": "$_id.type", "v": "$v" }
}
}},
{ "$addFields": {
"sources_info": { "$arrayToObject": "$sources_info" }
}}
])
Taking that back a notch to MongoDB 3.4 (which should be the default on most hosted services by now), you could alternatively declare each key name manually:
db.channel_report.aggregate([
{ "$project": {
"timestamp": 1,
"sources": {
"$concatArrays": [
{ "$map": {
"input": "$sources_info.browsers",
"in": {
"type": "browsers",
"name": "$$this.name",
"count": "$$this.count"
}
}},
{ "$map": {
"input": "$sources_info.operating_systems",
"in": {
"type": "operating_systems",
"name": "$$this.name",
"count": "$$this.count"
}
}},
{ "$map": {
"input": "$sources_info.continent_ids",
"in": {
"type": "continent_ids",
"name": "$$this.name",
"count": "$$this.count"
}
}},
{ "$map": {
"input": "$sources_info.country_ids",
"in": {
"type": "country_ids",
"name": "$$this.name",
"count": "$$this.count"
}
}},
{ "$map": {
"input": "$sources_info.city_ids",
"in": {
"type": "city_ids",
"name": "$$this.name",
"count": "$$this.count"
}
}}
]
}
}},
{ "$unwind": "$sources" },
{ "$group": {
"_id": {
"year": { "$year": "$timestamp" },
"type": "$sources.type",
"name": "$sources.name"
},
"count": { "$sum": "$sources.count" }
}},
{ "$group": {
"_id": { "year": "$_id.year", "type": "$_id.type" },
"v": { "$push": { "name": "$_id.name", "count": "$count" } }
}},
{ "$group": {
"_id": "$_id.year",
"sources": {
"$push": { "k": "$_id.type", "v": "$v" }
}
}},
{ "$project": {
"sources_info": {
"browsers": {
"$arrayElemAt": [
"$sources.v",
{ "$indexOfArray": [ "$sources.k", "browsers" ] }
]
},
"operating_systems": {
"$arrayElemAt": [
"$sources.v",
{ "$indexOfArray": [ "$sources.k", "operating_systems" ] }
]
},
"continent_ids": {
"$arrayElemAt": [
"$sources.v",
{ "$indexOfArray": [ "$sources.k", "continent_ids" ] }
]
},
"country_ids": {
"$arrayElemAt": [
"$sources.v",
{ "$indexOfArray": [ "$sources.k", "country_ids" ] }
]
},
"city_ids": {
"$arrayElemAt": [
"$sources.v",
{ "$indexOfArray": [ "$sources.k", "city_ids" ] }
]
}
}
}}
])
We can even wind that back to MongoDB 3.2 by using $map and $filter in place of $indexOfArray, but the general approach is the main thing to explain.
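For reference, a sketch of what that MongoDB 3.2 form of the final lookup might look like for the "browsers" key alone, with the other keys following the same shape:
"browsers": {
    "$arrayElemAt": [
        { "$map": {
            "input": {
                "$filter": {
                    "input": "$sources",
                    "as": "s",
                    "cond": { "$eq": [ "$$s.k", "browsers" ] }
                }
            },
            "as": "s",
            "in": "$$s.v"
        }},
        0
    ]
}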
Concatenate arrays
The main thing that needs to happen is to take the data from the many different arrays with named keys and make a "single array" with a "type" property representing each key name. This is arguably how the data should be stored in the first place, and the first aggregation stage of either approach comes out like this:
/* 1 */
{
"_id" : ObjectId("59b9d08e402025326e1a0f30"),
"timestamp" : ISODate("2017-09-14T00:42:54.510Z"),
"sources" : [
{
"type" : "browsers",
"name" : "Chrome",
"count" : NumberLong(2)
},
{
"type" : "operating_systems",
"name" : "Mac OS X",
"count" : NumberLong(2)
},
{
"type" : "continent_ids",
"name" : "EU",
"count" : NumberLong(1)
},
{
"type" : "country_ids",
"name" : "DE",
"count" : NumberLong(1)
},
{
"type" : "city_ids",
"name" : "Solingen",
"count" : NumberLong(1)
}
]
}
Unwind and Group
Part of the data you want to accumulate on actually includes those "type" and "name" properties from "within" the array. Whenever you need to accumulate across documents from "within an array", the process you use is $unwind in order to be able to access those values as part of the grouping key.
What this means is that after using $unwind on the combined array, you then want to $group on both of those keys and the reduced "timestamp" detail in order to $sum the "count" values.
Since you then have "sub-levels" of detail (i.e. each browser name within browsers), you use additional $group pipeline stages, gradually decreasing the granularity of the grouping keys and using $push to accumulate the details into arrays.
In either case, omitting the very last output stage, the accumulated structure comes out as:
/* 1 */
{
"_id" : 2017,
"sources_info" : [
{
"k" : "continent_ids",
"v" : [
{
"name" : "EU",
"count" : NumberLong(1)
}
]
},
{
"k" : "city_ids",
"v" : [
{
"name" : "Solingen",
"count" : NumberLong(1)
}
]
},
{
"k" : "country_ids",
"v" : [
{
"name" : "DE",
"count" : NumberLong(1)
}
]
},
{
"k" : "browsers",
"v" : [
{
"name" : "Chrome",
"count" : NumberLong(2)
}
]
},
{
"k" : "operating_systems",
"v" : [
{
"name" : "Mac OS X",
"count" : NumberLong(2)
}
]
}
]
}
This really is the final state of the data, though not represented in the same form as it was originally found. It is arguably complete at this point as any further processing is merely cosmetic to output as named keys again.
Output to named keys
As shown, the varied approaches either look up the array entries by the matching key name, or use $arrayToObject to transform the array content back into an object with named keys.
An alternative is to simply do that very last manipulation in code, as shown by this .map() example manipulating the cursor result in the shell:
db.channel_report.aggregate([
{ "$project": {
"timestamp": 1,
"sources": {
"$reduce": {
"input": {
"$map": {
"input": { "$objectToArray": "$sources_info" },
"as": "s",
"in": {
"$map": {
"input": "$$s.v",
"as": "v",
"in": {
"type": "$$s.k",
"name": "$$v.name",
"count": "$$v.count"
}
}
}
}
},
"initialValue": [],
"in": { "$concatArrays": ["$$value", "$$this"] }
}
}
}},
{ "$unwind": "$sources" },
{ "$group": {
"_id": {
"year": { "$year": "$timestamp" },
"type": "$sources.type",
"name": "$sources.name"
},
"count": { "$sum": "$sources.count" }
}},
{ "$group": {
"_id": { "year": "$_id.year", "type": "$_id.type" },
"v": { "$push": { "name": "$_id.name", "count": "$count" } }
}},
{ "$group": {
"_id": "$_id.year",
"sources_info": {
"$push": { "k": "$_id.type", "v": "$v" }
}
}},
/*
{ "$addFields": {
"sources_info": { "$arrayToObject": "$sources_info" }
}}
*/
]).map( d => Object.assign(d,{
"sources_info": d.sources_info.reduce((acc,curr) =>
Object.assign(acc,{ [curr.k]: curr.v }),{})
}))
Which of course applies to either aggregation pipeline approach.
Even $concatArrays can be replaced with $setUnion, as long as all the entries have a uniquely identifying combination of "name" and "type" (as they appear to have). Combined with modifying the final output by processing the cursor instead, that means the technique can be applied even as far back as MongoDB 2.6.
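That substitution is a one-line change wherever $concatArrays appears; shown here for the accumulator of the $reduce in the first pipeline stage:
"initialValue": [],
"in": { "$setUnion": [ "$$value", "$$this" ] }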
Final Output
And the final output (actually aggregated, of course, but the question only samples one document) accumulates all of the sub-keys and reconstructs from the last sample output shown as:
{
"_id" : 2017,
"sources_info" : {
"continent_ids" : [
{
"name" : "EU",
"count" : NumberLong(1)
}
],
"city_ids" : [
{
"name" : "Solingen",
"count" : NumberLong(1)
}
],
"country_ids" : [
{
"name" : "DE",
"count" : NumberLong(1)
}
],
"browsers" : [
{
"name" : "Chrome",
"count" : NumberLong(2)
}
],
"operating_systems" : [
{
"name" : "Mac OS X",
"count" : NumberLong(2)
}
]
}
}
Every array entry under each key of sources_info is reduced down to its cumulative count across every other entry sharing the same "name".

MongoDB aggregate multiple group by top fields and array fields

My collection looks like this:
{
"_id" : ObjectId("591c5971240033283736860a"),
"status" : "Done",
"createdDate" : ISODate("2017-05-17T14:09:20.653Z")
"communications" : [
{
"communicationUUID" : "df07948e-4a14-468e-beb1-db55ff72b215",
"communicationType" : "CALL",
"recipientId" : 12345,
"createdDate" : ISODate("2017-05-18T14:09:20.653Z")
"callResponse" : {
"Status" : "completed",
"id" : "dsd45554545ds92a9bd2c12e0e6436d",
}
}
]}
{
"_id" : ObjectId("45sdsd59124003345121450a"),
"status" : "ToDo",
"createdDate" : ISODate("2017-05-17T14:09:20.653Z")
"communications" : [
{
"communicationUUID" : "45sds55-4a14-468e-beb1-db55ff72b215",
"communicationType" : "CALL",
"recipientId" : 1234,
"createdDate" : ISODate("2017-05-18T14:09:20.653Z")
"callResponse" : {
"Status" : "completed",
"id" : "84fe862f1924455dsds5556436d",
}
}
]}
Currently I am writing two aggregate queries to achieve my requirement; both are below.
db.collection.aggregate(
{ $project: {
dayMonthYear: { $dateToString: { format: "%d/%m/%Y", date: "$createdDate" } },
status: 1,
}},
{ $group: {
_id: "$dayMonthYear",
Pending: { $sum: { $cond : [{ $eq : ["$status", "ToDo"]}, 1, 0]} },
InProgress: { $sum: { $cond : [{ $eq : ["$status", "InProgress"]}, 1, 0]} },
Done: { $sum: { $cond : [{ $eq : ["$status", "Done"]}, 1, 0]} },
Total: { $sum: 1 }
}})
My output will be,
{"_id" : "17/05/2017", "Pending" : 1.0, "InProgress" : 0.0, "Done" : 1.0, "Total" : 2.0 }
Using the above query I can get the counts, but I need to find the counts based on the communication Status too, so I am writing one more query for that:
db.collection.aggregate(
{"$unwind":"$communications"},
{ $project: {
dayMonthYear: { $dateToString: { format: "%d/%m/%Y", date: "$createdDate" } },
communications: 1
}},
{ "$group": {
_id: "$dayMonthYear",
"total_call": { $sum: { $cond : [{ $or : [ { $eq: [ "$communications.callResponse.Status", "failed"] },
{ $eq: [ "$communications.callResponse.Status", "busy"] },
{ $eq: [ "$communications.callResponse.Status", "completed"] },
{ $eq: [ "$communications.callResponse.Status", "no-answer"] }
]}, 1, 0 ] }},
"engaged": { $addToSet: { $cond : [{ $eq : ["$communications.callResponse.Status", "completed"]},
"$communications.recipientId", "null" ]} },
"not_engaged": { $addToSet: { $cond: [{ $or : [ { $eq: [ "$communications.callResponse.Status", "failed"] },
{ $eq: [ "$communications.callResponse.Status", "busy"] },
{ $eq: [ "$communications.callResponse.Status", "no-answer"] } ]},
"$communications.recipientId", "null" ] }}
}},
{ "$project": {
"_id": 1,
"total_call": 1,
"engaged": { "$setDifference": [ "$ngaged", ["null"] ] },
"not_engaged": { "$setDifference": [ "$not_engaged", ["null"] ] },
}},
{ "$project": {
"total_call": 1,
"engaged": { "$size": "$engaged" },
"not_engaged": { "$size": { "$setDifference": [ "$not_engaged", "$engaged" ] }},
}})
My output will be,
{"_id" : "18/05/2017", "total_call" : 2.0, "engaged" : 2, "not_engaged" : 0}
Using the above query I can get the counts, but I want to achieve this in a single query.
I am looking for output like
{"_id":"17/05/2017", "Pending" : 1.0, "InProgress" : 0.0, "Done" : 1.0, "total_call" : 0, "engaged" : 0, "not_engaged" : 0}
{"_id":"18/05/2017", "Pending" : 0.0, "InProgress" : 0.0, "Done" : 0.0, "total_call" : 2, "engaged" : 2, "not_engaged" : 0}
Can anyone suggest a good way to get the above result?
You can use $concatArrays to merge the status & createdDate documents, followed by $group to count the occurrences.
db.collection.aggregate([
{
"$project": {
"statusandcreateddate": {
"$concatArrays": [
[
{
"status": "$status",
"createdDate": "$createdDate"
}
],
{
"$map": {
"input": "$communications",
"as": "l",
"in": {
"status": "$$l.callResponse.Status",
"createdDate": "$$l.createdDate"
}
}
}
]
}
}
},
{
"$unwind": "$statusandcreateddate"
},
{
"$group": {
"_id": {
"$dateToString": {
"format": "%d/%m/%Y",
"date": "$statusandcreateddate.createdDate"
}
},
"total_call": {
"$sum": {
"$cond": [
{
"$or": [
{
"$eq": [
"$statusandcreateddate.status",
"failed"
]
},
{
"$eq": [
"$statusandcreateddate.status",
"busy"
]
},
{
"$eq": [
"$statusandcreateddate.status",
"completed"
]
},
{
"$eq": [
"$statusandcreateddate.status",
"no-answer"
]
}
]
},
1,
0
]
}
},
"engaged": {
"$sum": {
"$cond": [
{
"$eq": [
"$statusandcreateddate.status",
"completed"
]
},
1,
0
]
}
},
"not_engaged": {
"$sum": {
"$cond": [
{
"$or": [
{
"$eq": [
"$statusandcreateddate.status",
"failed"
]
},
{
"$eq": [
"$statusandcreateddate.status",
"busy"
]
},
{
"$eq": [
"$statusandcreateddate.status",
"no-answer"
]
}
]
},
1,
0
]
}
},
"Pending": {
"$sum": {
"$cond": [
{
"$eq": [
"$statusandcreateddate.status",
"ToDo"
]
},
1,
0
]
}
},
"InProgress": {
"$sum": {
"$cond": [
{
"$eq": [
"$statusandcreateddate.status",
"InProgress"
]
},
1,
0
]
}
},
"Done": {
"$sum": {
"$cond": [
{
"$eq": [
"$statusandcreateddate.status",
"Done"
]
},
1,
0
]
}
}
}
}
])

MongoDB: aggregating fields from arrays of subdocuments

I have a MongoDB collection called Events, containing baseball games. Here is an example of one document in the collection:
{
"name" : "Game# 814",
"dateStart" : ISODate("2012-09-28T14:47:53.695Z"),
"_id" : ObjectId("53a1b24de3f25f4443d9747e"),
"stats" : [
{
"team" : ObjectId("53a11a43a8de6dd8375c940b"),
"teamName" : "Reds",
"_id" : ObjectId("53a1b24de3f25f4443d97480"),
"score" : 17
},
{
"team" : ObjectId("53a11a43a8de6dd8375c938d"),
"teamName" : "Yankees",
"_id" : ObjectId("53a1b24de3f25f4443d9747f"),
"score" : 12
}
],
"__v" : 0
}
I need help writing the query that returns standings for all teams. The result set should look like:
{
"team" : ObjectId("53a11a43a8de6dd8375c938d"),
"teamName" : "Yankees",
"wins" : <<number of Yankees wins>>
"losses" : <<number of Yankees losses>>
"draws" : <<number of Yankees draws>>
}
{
"team" : ObjectId("53a11a43a8de6dd8375c940b"),
"teamName" : "Reds",
"wins" : <<number of Reds wins>>
"losses" : <<number of Reds losses>>
"draws" : <<number of Reds draws>>
}
...
Here's the query I've started with...
db.events.aggregate(
{"$unwind": "$stats" },
{ $group : {
_id : "$stats.team",
gamesPlayed : { $sum : 1},
totalScore : { $sum : "$stats.score" }
}}
);
... which returns results:
{
"result" : [
{
"_id" : ObjectId("53a11a43a8de6dd8375c93cb"),
"gamesPlayed" : 125, // not a requirement... just trying to get $sum working
"totalScore" : 1213 // ...same here
},
{
"_id" : ObjectId("53a11a44a8de6dd8375c955f"),
"gamesPlayed" : 128,
"totalScore" : 1276
},
{
"_id" : ObjectId("53a11a44a8de6dd8375c9661"),
"gamesPlayed" : 152,
"totalScore" : 1509
},
....
It would seem advisable to keep your "wins", "losses" and "draws" within your documents as you create or update them. But it is possible to do with aggregate, if a little long-winded:
db.events.aggregate([
// Unwind the "stats" array
{ "$unwind": "$stats" },
// Combine the document with new fields
{ "$group": {
"_id": "$_id",
"firstTeam": { "$first": "$stats.team" },
"firstTeamName": { "$first": "$stats.teamName" },
"firstScore": { "$first": "$stats.score" },
"lastTeam": { "$last": "$stats.team" },
"lastTeamName": { "$last": "$stats.teamName" },
"lastScore": { "$last": "$stats.score" },
"minScore": { "$min": "$stats.score" },
"maxScore": { "$max": "$stats.score" }
}},
// Calculate by comparing scores
{ "$project": {
"firstTeam": 1,
"firstTeamName": 1,
"firstScore": 1,
"lastTeam": 1,
"lastTeamName": 1,
"lastScore": 1,
"firstWins": {
"$cond": [
{ "$gt": [ "$firstScore", "$lastScore" ] },
1,
0
]
},
"firstLosses": {
"$cond": [
{ "$lt": [ "$firstScore", "$lastScore" ] },
1,
0
]
},
"firstDraws": {
"$cond": [
{ "$eq": [ "$firstScore", "$lastScore" ] },
1,
0
]
},
"lastWins": {
"$cond": [
{ "$gt": [ "$lastScore", "$firstScore" ] },
1,
0
]
},
"lastLosses": {
"$cond": [
{ "$lt": [ "$lastScore", "$firstScore" ] },
1,
0
]
},
"lastDraws": {
"$cond": [
{ "$eq": [ "$lastScore", "$firstScore" ] },
1,
0
]
},
"type": { "$literal": [ true, false ] }
}},
// Unwind the "type"
{ "$unwind": "$type" },
// Group teams conditionally on "type"
{ "$group": {
"_id": {
"team": {
"$cond": [
"$type",
"$firstTeam",
"$lastTeam"
]
},
"teamName": {
"$cond": [
"$type",
"$firstTeamName",
"$lastTeamName"
]
}
},
"owins": {
"$sum": {
"$cond": [
"$type",
"$firstWins",
"$lastWins"
]
}
},
"olosses": {
"$sum": {
"$cond": [
"$type",
"$firstLosses",
"$lastLosses"
]
}
},
"odraws": {
"$sum": {
"$cond": [
"$type",
"$firstDraws",
"$lastDraws"
]
}
}
}},
// Project your final form
{ "$project": {
"_id": 0,
"team": "$_id.team",
"teamName": "$_id.teamName",
"wins": "$owins",
"losses": "$olosses",
"draws": "$odraws"
}}
])
The first part is to "re-shape" the document by unwinding the array and then grouping with $first and $last to define fields for your two teams.
Then you $project through those documents and calculate the "wins", "losses" and "draws" for each team in the pairing. Additionally, adding an array field holding the two values true/false is convenient here. If you are on a pre-2.6 version of MongoDB, $literal can be replaced with $const, which is not documented but does the same thing.
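For clarity, that pre-2.6 substitution is just the same line of the $project stage written with $const instead of $literal:
// pre-2.6 form of the same expression
"type": { "$const": [ true, false ] }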
Once you $unwind that "type" array, the documents can be split apart in the $group stage by evaluating whether to choose the "first" or "last" team field values via the use of $cond. This is a ternary operator that evaluates a true/false condition and returns the appropriate value according to that condition.
With a final $project your documents are formed exactly how you want.