I have a collection of documents that look like this
{
_id: 1,
weight: 2,
height: 3,
fruit: "Orange",
bald: "Yes"
},
{
_id: 2,
weight: 4,
height: 5,
fruit: "Apple",
bald: "No"
}
I need to get a result that aggregates the entire collection into this.
{
avgWeight: 3,
avgHeight: 4,
orangeCount: 1,
appleCount: 1,
baldCount: 1
}
I think I could map/reduce this, or I could query the averages and counts separately. The only values fruit could ever have are Apple and Orange. What other ways would you go about doing this? I've been away from MongoDB for a while now and maybe there are new amazing ways to do this I'm not aware of?
Aggregation Framework
The aggregation framework will serve you far better than mapReduce here, and the basic method is compatible with every release back to MongoDB 2.2, when the aggregation framework was introduced.
If you have MongoDB 3.6 you can do:
db.fruit.aggregate([
{ "$group": {
"_id": "$fruit",
"avgWeight": { "$avg": "$weight" },
"avgHeight": { "$avg": "$height" },
"baldCount": {
"$sum": { "$cond": [{ "$eq": ["$bald", "Yes"] }, 1, 0] }
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"k": {
"$concat": [
{ "$toLower": "$_id" },
"Count"
]
},
"v": "$count"
}
},
"avgWeight": { "$avg": "$avgWeight" },
"avgHeight": { "$avg": "$avgHeight" },
"baldCount": { "$sum": "$baldCount" }
}},
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{ "$arrayToObject": "$data" },
{
"avgWeight": "$avgWeight",
"avgHeight": "$avgHeight",
"baldCount": "$baldCount"
}
]
}
}}
])
As a slight alternative, you can apply the $mergeObjects within the $group stage instead:
db.fruit.aggregate([
{ "$group": {
"_id": "$fruit",
"avgWeight": { "$avg": "$weight" },
"avgHeight": { "$avg": "$height" },
"baldCount": {
"$sum": { "$cond": [{ "$eq": ["$bald", "Yes"] }, 1, 0] }
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"data": {
"$mergeObjects": {
"$arrayToObject": [[{
"k": {
"$concat": [
{ "$toLower": "$_id" },
"Count"
]
},
"v": "$count"
}]]
}
},
"avgWeight": { "$avg": "$avgWeight" },
"avgHeight": { "$avg": "$avgHeight" },
"baldCount": { "$sum": "$baldCount" }
}},
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$data",
{
"avgWeight": "$avgWeight",
"avgHeight": "$avgHeight",
"baldCount": "$baldCount"
}
]
}
}}
])
But there are reasons why I personally don't think that is the better approach, and that mostly leads to the next concept.
So even if you don't have a "latest" MongoDB release, you can simply reshape the output in client code, since the final pipeline stage is the only part that actually uses the MongoDB 3.6 features:
db.fruit.aggregate([
{ "$group": {
"_id": "$fruit",
"avgWeight": { "$avg": "$weight" },
"avgHeight": { "$avg": "$height" },
"baldCount": {
"$sum": { "$cond": [{ "$eq": ["$bald", "Yes"] }, 1, 0] }
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"k": {
"$concat": [
{ "$toLower": "$_id" },
"Count"
]
},
"v": "$count"
}
},
"avgWeight": { "$avg": "$avgWeight" },
"avgHeight": { "$avg": "$avgHeight" },
"baldCount": { "$sum": "$baldCount" }
}},
/*
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{ "$arrayToObject": "$data" },
{
"avgWeight": "$avgWeight",
"avgHeight": "$avgHeight",
"baldCount": "$baldCount"
}
]
}
}}
*/
]).map( d =>
Object.assign(
d.data.reduce((acc,curr) => Object.assign(acc,{ [curr.k]: curr.v }), {}),
{ avgWeight: d.avgWeight, avgHeight: d.avgHeight, baldCount: d.baldCount }
)
)
And of course you can even just "hardcode" the keys:
db.fruit.aggregate([
{ "$group": {
"_id": null,
"appleCount": {
"$sum": {
"$cond": [{ "$eq": ["$fruit", "Apple"] }, 1, 0]
}
},
"orangeCount": {
"$sum": {
"$cond": [{ "$eq": ["$fruit", "Orange"] }, 1, 0]
}
},
"avgWeight": { "$avg": "$weight" },
"avgHeight": { "$avg": "$height" },
"baldCount": {
"$sum": {
"$cond": [{ "$eq": ["$bald", "Yes"] }, 1, 0]
}
}
}}
])
But this is not recommended, as your data might just change some day; if there is a value to "group on" then it's better to actually use it than to coerce it with conditions.
In any form you return the same result:
{
"appleCount" : 1,
"orangeCount" : 1,
"avgWeight" : 3,
"avgHeight" : 4,
"baldCount" : 1
}
We do this with "two" $group stages: the first accumulates "per fruit", and the second compacts all fruit into an array using $push with "k" and "v" entries to keep each "key" and its "count". We do a little transformation on the "key" here using $toLower and $concat to join the strings. This is optional at this stage, but easier in general.
The "alternate" for 3.6 is simply applying $mergeObjects within this earlier stage instead of $push since we already accumulated these keys. It's just really moving the $arrayToObject to a different stage in the pipeline. It's not really necessary and does not really have any specific advantage. If anything it just removes the flexible option as demonstrated with the "client transform" discussed later.
The "average" accumulations are done via $avg and the "bald" is counted using $cond to test the strings and feed a number to $sum. As the array is "rolled up" we can do all those accumulations again to total for everything.
As mentioned, the only part that actually relies on "new features" is all within the $replaceRoot stage which re-writes the "root" document. That's why this is optional as you can simply do these transformations after the same "already aggregated" data is returned from the database.
All we really do here is take that array with the "k" and "v" entries and turn it into an "object" with named keys via $arrayToObject and apply $mergeObjects on that result with the other keys we already produced at the "root". This transforms that array to be part of the main document returned in result.
The exact same transformation is applied using the JavaScript Array.reduce() and Object.assign() methods in the mongo shell compatible code. It's a very simple thing to apply, and Cursor.map() is generally available in most driver implementations, so you can do these transforms before you start consuming the cursor results.
With ES6 compatible JavaScript environments ( not the shell ), we can shorten that syntax a little more:
.map(({ data, _id, ...d }) => ({ ...data.reduce((o,{ k, v }) => ({ ...o, [k]: v }), {}), ...d }))
So it truly is a "one line" function, and that's a general reason why transformations like these are often better in the client code than the server anyway.
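As a rough sketch with the Node.js driver ( the collection name and the `pipeline` variable holding the stages above are assumptions here ), the same client-side reshape might look like:
// Sketch only: reshape the aggregate output client-side.
// `pipeline` is assumed to hold the two $group stages shown earlier,
// and this runs inside an async function.
const docs = await db.collection('fruit').aggregate(pipeline).toArray();
const results = docs.map(({ data, _id, ...d }) => ({
  // fold the { k, v } pairs back into named keys; the null _id is discarded
  ...data.reduce((o, { k, v }) => ({ ...o, [k]: v }), {}),
  ...d
}));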
As a note on the usage of $cond, using it for "hardcoded" evaluation is not really a good idea for several reasons, so it really does not make much sense to "force" that evaluation. Even with the data you present, "bald" would be better expressed as a Boolean value than a "string". If you change "Yes/No" to be true/false then even that "one" valid usage becomes:
"baldCount": { "$sum": { "$cond": ["$bald", 1, 0 ] } }
Which removes the need to "test" a condition on a string match since it's already true/false. MongoDB 4.0 adds another enhancement using $toInt to "coerce" the Boolean to an integer:
"baldCount": { "$sum": { "$toInt": "$bald" } }
That removes the need for $cond altogether, as would simply recording 1 or 0. But that change might cause a loss of clarity in the data, so it is still probably reasonable to keep that sort of "coercion" there, though it is not really optimal anywhere else.
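If you did want to migrate the existing "Yes/No" strings to Booleans, a one-off update could look something like this sketch, assuming MongoDB 4.2 or later for pipeline-style updates:
db.fruit.updateMany(
  {},
  // convert the "Yes"/"No" strings to true/false in place
  [{ "$set": { "bald": { "$eq": [ "$bald", "Yes" ] } } }]
)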
Even with the "dynamic" form using "two" $group stages for accumulation, the main work is still done in the first stage. It simply leaves the remaining accumulation on n result documents for the number of possible unique values of the grouping key. In this case "two", so even though it's an additional instruction there is no real overhead for the gain of having flexible code.
MapReduce
If you really have your heart set on at least "trying" a mapReduce, then it's really a single pass with a finalize function just to compute the averages:
db.fruit.mapReduce(
function() {
emit(null,{
"key": { [`${this.fruit.toLowerCase()}Count`]: 1 },
"totalWeight": this.weight,
"totalHeight": this.height,
"totalCount": 1,
"baldCount": (this.bald === "Yes") ? 1 : 0
});
},
function(key,values) {
var output = {
key: { },
totalWeight: 0,
totalHeight: 0,
totalCount: 0,
baldCount: 0
};
for ( let value of values ) {
for ( let key in value.key ) {
if ( !output.key.hasOwnProperty(key) )
output.key[key] = 0;
output.key[key] += value.key[key];
}
Object.keys(value).filter(k => k != 'key').forEach(k =>
output[k] += value[k]
)
}
return output;
},
{
"out": { "inline": 1 },
"finalize": function(key,value) {
return Object.assign(
value.key,
{
avgWeight: value.totalWeight / value.totalCount,
avgHeight: value.totalHeight / value.totalCount,
baldCount: value.baldCount
}
)
}
}
)
Since we already ran through the process for the aggregate() method, the general points should be pretty familiar, as we are basically doing much the same thing here.
The main differences are that for an "average" you actually need the full totals and counts, and of course you get a bit more control over accumulating via an "Object" with JavaScript code.
The results are basically the same, just with the standard mapReduce "bent" on how it presents them:
{
"_id" : null,
"value" : {
"orangeCount" : 1,
"appleCount" : 1,
"avgWeight" : 3,
"avgHeight" : 4,
"baldCount" : 1
}
}
Summary
The general catch is of course that mapReduce, using interpreted JavaScript in order to execute, has a much higher cost and slower execution than the native coded operations of the aggregation framework. There once may have been a case for using mapReduce for this kind of output on "larger" result sets, but since MongoDB 2.6 introduced "cursor" output for the aggregation framework, the scales have been firmly tipped in favor of the newer option.
The fact is that most "legacy" reasons for employing mapReduce are basically superseded by its younger sibling, as the aggregation framework gains new operations which remove the need for the JavaScript execution environment. It would be a fair comment to say that support for JavaScript execution is generally "dwindling", and once-legacy options which used this from the beginning are being gradually removed from the product.
Another approach is to project each condition as a flag first and then run a single $group over the whole collection:
db.demo.aggregate(
// Pipeline
[
// Stage 1
{
$project: {
weight: 1,
height: 1,
Orange: {
$cond: {
if: {
$eq: ["$fruit", 'Orange']
},
then: 1,
else: 0
}
},
Apple: {
$cond: {
if: {
$eq: ["$fruit", 'Apple']
},
then: 1,
else: 0
}
},
bald: {
$cond: {
if: {
$eq: ["$bald", 'Yes']
},
then: 1,
else: 0
}
},
}
},
// Stage 2
{
$group: {
_id: null,
avgWeight: {
$avg: '$weight'
},
avgHeight: {
$avg: '$height'
},
orangeCount: {
$sum: '$Orange'
},
appleCount: {
$sum: '$Apple'
},
baldCount: {
$sum: '$bald'
}
}
},
]
);
Related
Here's a very simple data example.
[
{
"aaa": true,
"bbb": 111,
},
{
"aaa": false,
"bbb": 111,
}
]
Then, what query should be executed so that I can get the result like this?
[
{
"_id": "0",
"bbb_sum": 222,
"aaa_and": false,
"aaa_or": true
}
]
Actually, I've tried with a query like this
db.collection.aggregate([
{
"$group": {
"_id": "0",
"bbb_sum": {
"$sum": "$bbb"
},
"aaa_and": {
"$and": ["$aaa", true]
},
"aaa_or": {
"$or": ["$aaa", false]
}
}
}
])
But the Mongo Playground complains with query failed: (Location40237) The $and accumulator is a unary operator, which is quite confusing.
You can also find this simple test case here https://mongoplayground.net/p/8dqtXJ93vIx
Also, I've searched for similar questions on both Google and Stackoverflow, but I can't find one.
Thanks in advance!
Not like "$sum","$and" and "$or" are not aggregation operators that can be used in "$group". You can temporary save all the "aaa" field into an array and then use "$project" operator to process the data.
db.collection.aggregate([
{
"$group": {
"_id": "0",
"sum": {
"$sum": "$bbb"
},
"aaa_all": {
"$push": "$aaa"
}
}
},
{
"$project": {
"sum": 1,
"aaa_and": {
"$allElementsTrue": "$aaa_all"
},
"aaa_or": {
"$anyElementTrue": "$aaa_all"
}
}
}
])
Here is the case: https://mongoplayground.net/p/Y-Fs_Ch9lwk
The $min and $max operators actually work with booleans too, false being considered smaller than true.
Thinking of them as 0 and 1 might make this easier to understand:
$min: [a,…,n] will return 1/true only if all elements are 1/true => this is an AND
$max: [a,…,n] will return 0/false only if all elements are 0/false => this is an OR
(the operators return booleans when given booleans; the analogy with numbers is only for the sake of comprehension)
So your request can simply become:
db.collection.aggregate([
{
"$group": {
"_id": "0",
"bbb_sum": {
"$sum": "$bbb"
},
"aaa_and": {
"$min": "$aaa"
},
"aaa_or": {
"$max": "$aaa"
}
}
}
])
You can apply some logic as below:
db.collection.aggregate([
{
"$group": {//Group by desired id
"_id": null,
"sum": {//Sum the value
"$sum": "$bbb"
},
"aaa_and": {
"$sum": {
"$cond": {
"if": {
"$eq": [
"$aaa",
true
]
},
"then": 1, //If true returns 1
"else": 0 // else 0
}
}
},
"total": { //helper to do the logic
$sum: 1
}
}
},
{
$project: {
aaa_and: {
"$eq": [//If total matches with number of true, all are true
"$total",
"$aaa_and"
]
},
aaa_or: {
"$ne": [//if value greater than 0, then there is at least one true
"$aaa_and",
"0"
]
},
sum: 1
}
}
])
playground
I have a data in profile collection
[
{
name: "Harish",
gender: "Male",
caste: "Vokkaliga",
education: "B.E"
},
{
name: "Reshma",
gender: "Female",
caste: "Vokkaliga",
education: "B.E"
},
{
name: "Rangnath",
gender: "Male",
caste: "Lingayath",
education: "M.C.A"
},
{
name: "Lakshman",
gender: "Male",
caste: "Lingayath",
education: "B.Com"
},
{
name: "Reshma",
gender: "Female",
caste: "Lingayath",
education: "B.E"
}
]
Here I need to calculate the total number for each gender, each caste, and each education.
Expected o/p
{
gender: [{
name: "Male",
total: "3"
},
{
name: "Female",
total: "2"
}],
caste: [{
name: "Vokkaliga",
total: "2"
},
{
name: "Lingayath",
total: "3"
}],
education: [{
name: "B.E",
total: "3"
},
{
name: "M.C.A",
total: "1"
},
{
name: "B.Com",
total: "1"
}]
}
Using MongoDB aggregation, how can I get the expected result?
There are different approaches depending on the version available, but they all essentially break down to transforming your document fields into separate documents in an "array", then "unwinding" that array with $unwind and doing successive $group stages in order to accumulate the output totals and arrays.
MongoDB 3.4.4 and above
The latest releases have special operators like $arrayToObject and $objectToArray, which can make the transfer from the source document to the initial "array" more dynamic than in earlier releases:
db.profile.aggregate([
{ "$project": {
"_id": 0,
"data": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$in": [ "$$this.k", ["gender","caste","education"] ] }
}
}
}},
{ "$unwind": "$data" },
{ "$group": {
"_id": "$data",
"total": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.k",
"v": {
"$push": { "name": "$_id.v", "total": "$total" }
}
}},
{ "$group": {
"_id": null,
"data": { "$push": { "k": "$_id", "v": "$v" } }
}},
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": "$data"
}
}}
])
So using $objectToArray you turn the initial document into an array of its keys and values, as "k" and "v" properties in the resulting array of objects. We apply $filter here in order to select by "key": here using $in with a list of keys we want, but this could instead use a list of keys to "exclude" where that list was shorter. It's just using logical operators to evaluate the condition.
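For instance, the first document in the sample yields this "data" array from the $project stage:
{
    "data" : [
        { "k" : "gender", "v" : "Male" },
        { "k" : "caste", "v" : "Vokkaliga" },
        { "k" : "education", "v" : "B.E" }
    ]
}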
The end stage here uses $replaceRoot and since all our manipulation and "grouping" in between still keeps that "k" and "v" form, we then use $arrayToObject here to promote our "array of objects" in result to the "keys" of the top level document in output.
MongoDB 3.6 $mergeObjects
As an extra wrinkle here, MongoDB 3.6 includes $mergeObjects, which can be used as an "accumulator" in a $group pipeline stage as well, thus replacing the $push and making the final $replaceRoot simply shift the "data" key to the "root" of the returned document instead:
db.profile.aggregate([
{ "$project": {
"_id": 0,
"data": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$in": [ "$$this.k", ["gender","caste","education"] ] }
}
}
}},
{ "$unwind": "$data" },
{ "$group": { "_id": "$data", "total": { "$sum": 1 } }},
{ "$group": {
"_id": "$_id.k",
"v": {
"$push": { "name": "$_id.v", "total": "$total" }
}
}},
{ "$group": {
"_id": null,
"data": {
"$mergeObjects": {
"$arrayToObject": [
[{ "k": "$_id", "v": "$v" }]
]
}
}
}},
{ "$replaceRoot": { "newRoot": "$data" } }
])
This is not really that different from the overall demonstration, but it simply shows how $mergeObjects can be used as an accumulator, which may be useful in cases where the grouping key was something different and we did not want that final "merge" into the root space of the object.
Note that the $arrayToObject is still needed to transform the "value" back into the name of the "key", but we just do it during the accumulation rather than after the grouping, since the new accumulation allows the "merge" of keys.
MongoDB 3.2
Taking it back a version or even if you have a MongoDB 3.4.x that is less than the 3.4.4 release, we can still use much of this but instead we deal with the creation of the array in a more static fashion, as well as handling the final "transform" on output differently due to the aggregation operators we don't have:
db.profile.aggregate([
{ "$project": {
"data": [
{ "k": "gender", "v": "$gender" },
{ "k": "caste", "v": "$caste" },
{ "k": "education", "v": "$education" }
]
}},
{ "$unwind": "$data" },
{ "$group": {
"_id": "$data",
"total": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.k",
"v": {
"$push": { "name": "$_id.v", "total": "$total" }
}
}},
{ "$group": {
"_id": null,
"data": { "$push": { "k": "$_id", "v": "$v" } }
}},
/*
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": "$data"
}
}}
*/
]).map( d =>
d.data.map( e => ({ [e.k]: e.v }) )
.reduce((acc,curr) => Object.assign(acc,curr),{})
)
This is exactly the same thing, except instead of having a dynamic transform of the document into the array, we actually "explicitly" assign each array member with the same "k" and "v" notation. Really just keeping those key names for convention at this point since none of the aggregation operators here depend on that at all.
Also instead of using $replaceRoot, we just do exactly the same thing as what the previous pipeline stage implementation was doing there but in client code instead. All MongoDB drivers have some implementation of cursor.map() to enable "cursor transforms". Here with the shell we use the basic JavaScript functions of Array.map() and Array.reduce() to take that output and again promote the array content to being the keys of the top level document returned.
MongoDB 2.6
And falling back to MongoDB 2.6 to cover the versions in between, the only thing that changes here is the usage of $map and a $literal for input with the array declaration:
db.profile.aggregate([
{ "$project": {
"data": {
"$map": {
"input": { "$literal": ["gender","caste", "education"] },
"as": "k",
"in": {
"k": "$$k",
"v": {
"$cond": {
"if": { "$eq": [ "$$k", "gender" ] },
"then": "$gender",
"else": {
"$cond": {
"if": { "$eq": [ "$$k", "caste" ] },
"then": "$caste",
"else": "$education"
}
}
}
}
}
}
}
}},
{ "$unwind": "$data" },
{ "$group": {
"_id": "$data",
"total": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.k",
"v": {
"$push": { "name": "$_id.v", "total": "$total" }
}
}},
{ "$group": {
"_id": null,
"data": { "$push": { "k": "$_id", "v": "$v" } }
}},
/*
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": "$data"
}
}}
*/
])
.map( d =>
d.data.map( e => ({ [e.k]: e.v }) )
.reduce((acc,curr) => Object.assign(acc,curr),{})
)
Since the basic idea here is to "iterate" a provided array of the field names, the actual assignment of values comes by "nesting" the $cond statements. For three possible outcomes this means only a single nesting in order to "branch" for each outcome.
Modern MongoDB from 3.4 has $switch which makes this branching simpler, yet this demonstrates the logic was always possible and the $cond operator has been around since the aggregation framework was introduced in MongoDB 2.2.
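As a sketch, the same "v" expression written with $switch for MongoDB 3.4 and upwards would be:
"v": {
    "$switch": {
        "branches": [
            { "case": { "$eq": [ "$$k", "gender" ] }, "then": "$gender" },
            { "case": { "$eq": [ "$$k", "caste" ] }, "then": "$caste" }
        ],
        "default": "$education"
    }
}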
Again, the same transformation on the cursor result applies as there is nothing new there and most programming languages have the ability to do this for years, if not from inception.
Of course the basic process can even be done way back to MongoDB 2.2, just applying the array creation and $unwind in a different way. But no-one should be running any MongoDB under 2.8 at this point in time, and official support even for 3.0 is fast running out.
Output
For visualization, the output of all demonstrated pipelines here has the following form before the last "transform" is done:
{
"_id" : null,
"data" : [
{
"k" : "gender",
"v" : [
{
"name" : "Male",
"total" : 3.0
},
{
"name" : "Female",
"total" : 2.0
}
]
},
{
"k" : "education",
"v" : [
{
"name" : "M.C.A",
"total" : 1.0
},
{
"name" : "B.E",
"total" : 3.0
},
{
"name" : "B.Com",
"total" : 1.0
}
]
},
{
"k" : "caste",
"v" : [
{
"name" : "Lingayath",
"total" : 3.0
},
{
"name" : "Vokkaliga",
"total" : 2.0
}
]
}
]
}
And then either by the $replaceRoot or the cursor transform as demonstrated the result becomes:
{
"gender" : [
{
"name" : "Male",
"total" : 3.0
},
{
"name" : "Female",
"total" : 2.0
}
],
"education" : [
{
"name" : "M.C.A",
"total" : 1.0
},
{
"name" : "B.E",
"total" : 3.0
},
{
"name" : "B.Com",
"total" : 1.0
}
],
"caste" : [
{
"name" : "Lingayath",
"total" : 3.0
},
{
"name" : "Vokkaliga",
"total" : 2.0
}
]
}
So whilst we can put some new and fancy operators into the aggregation pipeline where we have them available, the most common use case is these "end of pipeline transforms", in which case we may as well simply do the same transformation on each document in the cursor results instead.
I need to sum the values for 2018-06-01 through 2018-06-30 for each document in the collection. Each key in "days" is a different date and value. What should the mongo aggregate command look like? Result should look something like:
{
    "_id": "Product_123",
    "June_Sum": value
}
That's really not a great structure for the sort of operation you now want to do. The whole point of keeping data in such a format is that you "increment" it as you go.
For example:
var now = Date.now(),
today = new Date(now - ( now % ( 1000 * 60 * 60 * 24 ))).toISOString().substr(0,10);
var product = "Product_123";
db.counters.updateOne(
{
"month": today.substr(0,7),
"product": product
},
{
"$inc": {
[`dates.${today}`]: 1,
"totals": 1
}
},
{ "upsert": true }
)
In that way, subsequent updates with $inc increment both the "key" used for the "date" and the "totals" property of the matched document. So after a few iterations you would end up with something like:
{
"_id" : ObjectId("5af395c53945a933add62173"),
"product": "Product_123",
"month": "2018-05",
"dates" : {
"2018-05-10" : 2,
"2018-05-09" : 1
},
"totals" : 3
}
If you're not actually doing that then you "should" be since it's the intended usage pattern for such a structure.
Without keeping a "totals" or like type of entry within the document(s) storing these keys the only methods left for "aggregation" in processing are to effectively coerce the the "keys" into an "array" form.
MongoDB 3.6 with $objectToArray
db.collection.aggregate([
// Only consider documents with entries within the range
{ "$match": {
"$expr": {
"$anyElementTrue": {
"$map": {
"input": { "$objectToArray": "$days" },
"in": {
"$and": [
{ "$gte": [ "$$this.k", "2018-06-01" ] },
{ "$lt": [ "$$this.k", "2018-07-01" ] }
]
}
}
}
}
}},
// Aggregate for the month
{ "$group": {
"_id": "$product", // <-- or whatever your key for the value is
"total": {
"$sum": {
"$sum": {
"$map": {
"input": { "$objectToArray": "$days" },
"in": {
"$cond": {
"if": {
"$and": [
{ "$gte": [ "$$this.k", "2018-06-01" ] },
{ "$lt": [ "$$this.k", "2018-07-01" ] }
]
},
"then": "$$this.v",
"else": 0
}
}
}
}
}
}
}}
])
Other versions with mapReduce
db.collection.mapReduce(
// Taking the same presumption on your un-named key for "product"
function() {
Object.keys(this.days)
.filter( k => k >= "2018-06-01" && k < "2018-07-01")
.forEach(k => emit(this.product, this.days[k]));
},
function(key,values) {
return Array.sum(values);
},
{
"out": { "inline": 1 },
"query": {
"$where": function() {
return Object.keys(this.days).some(k => k >= "2018-06-01" && k < "2018-07-01")
}
}
}
)
Both are pretty horrible since you need to calculate whether the "keys" fall within the required range even to select the documents and even then still filter through the keys in those documents again in order to decide whether to accumulate for it or not.
Also noting here that if your "Product_123" is also the "name of a key" in the document and NOT a "value", then you're performing even more "gymnastics" to simply convert that "key" into a "value" form, which is how databases do things and the whole point of the unnecessary coercion going on here.
Better Option
So as opposed to the handling as originally shown where you "should" be accumulating "as you go" with every write to the document(s) at hand, the better option than needing "processing" in order to coerce into an array format is to simply put the data into an array in the first place:
{
"_id" : ObjectId("5af395c53945a933add62173"),
"product": "Product_123",
"month": "2018-05",
"dates" : [
{ "day": "2018-05-09", "value": 1 },
{ "day": "2018-05-10", "value": 2 }
],
"totals" : 3
}
These are infinitely better for purposes of query and further analysis:
db.counters.aggregate([
{ "$match": {
// "month": "2018-05" // <-- or really just that, since it's there
"dates": {
"day": {
"$elemMatch": {
"$gte": "2018-05-01", "$lt": "2018-06-01"
}
}
}
}},
{ "$group": {
"_id": null,
"total": {
"$sum": {
"$sum": {
"$filter": {
"input": "$dates",
"cond": {
"$and": [
{ "$gte": [ "$$this.day", "2018-05-01" ] },
{ "$lt": [ "$$this.day", "2018-06-01" ] }
]
}
}
}
}
}
}}
])
Which is of course really efficient; it deliberately avoids using the "total" field that is already there, for demonstration only. But of course you keep the "running accumulation" on writes by doing:
db.counters.updateOne(
{ "product": product, "month": today.substr(0,7)}, "dates.day": today },
{ "$inc": { "dates.$.value": 1, "total": 1 } }
)
Which is really simple. Adding upserts adds a "little" more complexity:
// A "batch" of operations with bulkWrite
db.counter.bulkWrite([
// Incrementing the matched element
{ "udpdateOne": {
"filter": {
"product": product,
"month": today.substr(0,7)},
"dates.day": today
},
"update": {
"$inc": { "dates.$.value": 1, "total": 1 }
}
}},
// Pushing a new "un-matched" element
{ "updateOne": {
"filter": {
"product": product,
"month": today.substr(0,7)},
"dates.day": { "$ne": today }
},
"update": {
"$push": { "dates": { "day": today, "value": 1 } },
"$inc": { "total": 1 }
}
}},
// "Upserting" a new document were not matched
{ "updateOne": {
"filter": {
"product": product,
"month": today.substr(0,7)},
},
"update": {
"$setOnInsert": {
"dates": [{ "day": today, "value": 1 }],
"total": 1
}
},
"upsert": true
}}
])
But generally you're getting the "best of both worlds" by having something simple to accumulate "as you go" as well as something that's easy and efficient to query and do other analysis on later.
The overall moral of the story is to "choose the right structure" for what you actually want to do. Don't put things into "keys" which are clearly intended to be used as "values", since it's an anti-pattern which just adds complexity and inefficiency to the rest of your purposes, even if it seemed right for a "single" purpose when you originally stored it that way.
NOTE: I'm also not advocating storing "strings" for "dates" in any way here. As noted, the better approach is to use "values" where you really mean "values" you intend to use. When storing date data as a "value" it is always far more efficient and practical to store a BSON Date, and NOT a "string".
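As a sketch, the same counter document using BSON Dates throughout would look like:
{
    "product" : "Product_123",
    "month" : ISODate("2018-05-01T00:00:00Z"),
    "dates" : [
        { "day" : ISODate("2018-05-09T00:00:00Z"), "value" : 1 },
        { "day" : ISODate("2018-05-10T00:00:00Z"), "value" : 2 }
    ],
    "totals" : 3
}
Range conditions then compare real Date values rather than strings, which are smaller in storage and correctly typed for any later date arithmetic.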
I have an aggregation that looks like this:
userSchema.statics.getCounts = function (req, type) {
return this.aggregate([
{ $match: { organization: req.user.organization._id } },
{
$lookup: {
from: 'tickets', localField: `${type}Tickets`, foreignField: '_id', as: `${type}_tickets`,
},
},
{ $unwind: `$${type}_tickets` },
{ $match: { [`${type}_tickets.createdAt`]: { $gte: new Date(moment().subtract(4, 'd').startOf('day').utc()), $lt: new Date(moment().endOf('day').utc()) } } },
{
$group: {
_id: {
groupDate: {
$dateFromParts: {
year: { $year: `$${type}_tickets.createdAt` },
month: { $month: `$${type}_tickets.createdAt` },
day: { $dayOfMonth: `$${type}_tickets.createdAt` },
},
},
userId: `$${type}_tickets.assignee_id`,
},
ticketCount: {
$sum: 1,
},
},
},
{
$sort: { '_id.groupDate': -1 },
},
{ $group: { _id: '$_id.userId', data: { $push: { groupDate: '$_id.groupDate', ticketCount: '$ticketCount' } } } },
]);
};
Which outputs data like this:
[
{
_id: "5aeb6b71709f43359e0888bb",
data: [
{ "groupDate": "2018-05-07T00:00:00.000Z", ticketCount: 4 }
]
}
]
Ideally though, I would have data like this:
[
{
_id: "5aeb6b71709f43359e0888bb",
data: [
{ "groupDate": "2018-05-07T00:00:00.000Z", assignedCount: 4, resolvedCount: 8 }
]
}
]
The difference being that the object for the user would output both the total number of assigned tickets and the total number of resolved tickets for each date.
My userSchema is like this:
const userSchema = new Schema({
firstName: String,
lastName: String,
assignedTickets: [
{
type: mongoose.Schema.ObjectId,
ref: 'Ticket',
index: true,
},
],
resolvedTickets: [
{
type: mongoose.Schema.ObjectId,
ref: 'Ticket',
index: true,
},
],
}, {
timestamps: true,
});
An example user doc is like this:
{
"_id": "5aeb6b71709f43359e0888bb",
"assignedTickets": ["5aeb6ba7709f43359e0888bd", "5aeb6bf3709f43359e0888c2", "5aec7e0adcdd76b57af9e889"],
"resolvedTickets": ["5aeb6bc2709f43359e0888be", "5aeb6bc2709f43359e0888bf"],
"firstName": "Name",
"lastName": "Surname",
}
An example ticket doc is like this:
{
"_id": "5aeb6ba7709f43359e0888bd",
"ticket_id": 120292,
"type": "assigned",
"status": "Pending",
"assignee_email": "email#gmail.com",
"assignee_id": "5aeb6b71709f43359e0888bb",
"createdAt": "2018-05-02T20:05:59.147Z",
"updatedAt": "2018-05-03T20:05:59.147Z",
}
I've tried adding multiple lookups and group stages, but I keep getting an empty array. If I only do one lookup and one group, I get the correct counts for the searched on field, but I'd like to have both fields in one query. Is it possible to have the query group on two lookups?
In short, you seem to be coming to terms with setting up your models in mongoose and have gone overboard with references. In reality you really should not keep those arrays within the "User" documents. This is actually an "anti-pattern", something mongoose adopted early on as a convention for keeping "references" for population, back when it did not understand how to follow references kept on the "child" rather than on the "parent".
You actually have that data in each "Ticket", and the natural form of $lookup is to use that "foreignField" in reference to the detail from the local collection. In this case the "assignee_id" on the tickets suffices for matching back to the "_id" of the "User". Though you don't state it, your "status" should indicate whether a ticket is "assigned", as when in "Pending" state, or "resolved" when it is not.
For the sake of simplicity we are going to consider the state "resolved" if it is anything other than "Pending" in value, but extending on the logic from the example for actual needs is not the problem here.
Basically then we resolve to a single $lookup operation by actually using the natural "foreign key" as opposed to keeping separate arrays.
MongoDB 3.6 and greater
Ideally you would use features from MongoDB 3.6 with sub-pipeline processing here:
// Better date calculations
const oneDay = (1000 * 60 * 60 * 24);
var now = Date.now(),
end = new Date((now - (now % oneDay)) + oneDay),
start = new Date(end.valueOf() - (4 * oneDay));
User.aggregate([
{ "$match": { "organization": req.user.organization._id } },
{ "$lookup": {
"from": Ticket.collection.name,
"let": { "id": "$_id" },
"pipeline": [
{ "$match": {
"createdAt": { "$gte": start, "$lt": end },
"$expr": {
"$eq": [ "$$id", "$assignee_id" ]
}
}},
{ "$group": {
"_id": {
"status": "$status",
"date": {
"$dateFromParts": {
"year": { "$year": "$createdAt" },
"month": { "$month": "$createdAt" },
"day": { "$dayOfMonth": "$createdAt" }
}
}
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.date",
"data": {
"$push": {
"k": {
"$cond": [
{ "$eq": ["$_id.status", "Pending"] },
"assignedCount",
"resolvedCount"
]
},
"v": "$count"
}
}
}},
{ "$sort": { "_id": -1 } },
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{ "groupDate": "$_id", "assignedCount": 0, "resolvedCount": 0 },
{ "$arrayToObject": "$data" }
]
}
}}
],
"as": "data"
}},
{ "$project": { "data": 1 } }
])
From MongoDB 3.0 and upwards
Or where you lack those features we use a different pipeline process and a little data transformation after the results are returned from the server:
User.aggregate([
{ "$match": { "organization": req.user.organization._id } },
{ "$lookup": {
"from": Ticket.collection.name,
"localField": "_id",
"foreignField": "assignee_id",
"as": "data"
}},
{ "$unwind": "$data" },
{ "$match": {
"data.createdAt": { "$gte": start, "$lt": end }
}},
{ "$group": {
"_id": {
"userId": "$_id",
"date": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$data.createdAt", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$data.createdAt", new Date(0) ] },
oneDay
]}
]},
new Date(0)
]
},
"status": "$data.status"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": {
"userId": "$_id.userId",
"date": "$_id.date"
},
"data": {
"$push": {
"k": {
"$cond": [
{ "$eq": [ "$_id.status", "Pending" ] },
"assignedCount",
"resolvedCount"
]
},
"v": "$count"
}
}
}},
{ "$sort": { "_id.userId": 1, "_id.date": -1 } },
{ "$group": {
"_id": "$_id.userId",
"data": {
"$push": {
"groupDate": "$_id.date",
"data": "$data"
}
}
}}
])
.then( results =>
results.map( ({ data, ...d }) =>
({
...d,
data: data.map(di =>
({
groupDate: di.groupDate,
assignedCount: 0,
resolvedCount: 0,
...di.data.reduce((acc,curr) => ({ ...acc, [curr.k]: curr.v }),{})
})
)
})
)
)
Which just really goes to show that even with the fancy features in modern releases, you really don't need them, because there have pretty much always been ways to work around this. Even the JavaScript parts just had slightly longer-winded versions before the current "object spread" syntax was available.
So that is really the direction you need to go in. What you certainly don't want is using "multiple" $lookup stages or even applying $filter conditions on what could potentially be large arrays. Also both forms here do their best to "filter down" the number of items "joined" from the foreign collection so as not to cause a breach of the BSON limit.
Particularly the "pre 3.6" version actually has a trick where $lookup + $unwind + $match occur in succession which you can see in the explain output. All stages actually combine into "one" stage there which solely returns only the items which match the conditions in the $match from the foreign collection. Keeping things "unwound" until we reduce further avoids BSON limit problems, as does the new form with MongoDB 3.6 where the "sub-pipeline" does all the document reduction and grouping before any results are returned.
Your one document sample would return like this:
{
"_id" : ObjectId("5aeb6b71709f43359e0888bb"),
"data" : [
{
"groupDate" : ISODate("2018-05-02T00:00:00Z"),
"assignedCount" : 1,
"resolvedCount" : 0
}
]
}
That is, once I expand the date selection to include that date; the date selection itself can of course also be improved and corrected from your original form.
So it seems to make sense that your relationships are actually defined that way; it's just that you recorded them "twice". You don't need to, and even if that's not the definition, you should record the reference on the "child" rather than keep an array in the parent. We can juggle and merge the parent arrays, but that's counterproductive to actually establishing the data relations correctly and using them correctly as well.
How about something like this?
db.users.aggregate([
{
$lookup:{ // lookup assigned tickets
from:'tickets',
localField:'assignedTickets',
foreignField:'_id',
as:'assigned',
}
},
{
$lookup:{ // lookup resolved tickets
from:'tickets',
localField:'resolvedTickets',
foreignField:'_id',
as:'resolved',
}
},
{
$project:{
"tickets":{ // merge all tickets into one single array
$concatArrays:[
"$assigned",
"$resolved"
]
}
}
},
{
$unwind:'$tickets' // flatten the 'tickets' array into separate documents
},
{
$group:{ // group by 'createdAt' and 'assignee_id'
_id:{
groupDate:{
$dateFromParts:{
year:{ $year:'$tickets.createdAt' },
month:{ $month:'$tickets.createdAt' },
day:{ $dayOfMonth:'$tickets.createdAt' },
},
},
userId:'$tickets.assignee_id',
},
assignedCount:{ // get the count of assigned tickets
$sum:{
$cond:[
{ // by checking the 'type' field for a value of 'assigned'
$eq:[
'$tickets.type',
'assigned'
]
},
1, // if matching count 1
0 // else 0
]
}
},
resolvedCount:{
$sum:{
$cond:[
{ // by checking the 'type' field for a value of 'resolved'
$eq:[
'$tickets.type',
'resolved'
]
},
1, // if matching count 1
0 // else 0
]
}
},
},
},
{
$sort:{ // sort by 'groupDate' descending
'_id.groupDate':-1
},
},
{
$group:{
_id:'$_id.userId', // group again but only by userId
data:{
$push:{ // create an array
groupDate:'$_id.groupDate',
assignedCount:{
$sum:'$assignedCount'
},
resolvedCount:{
$sum:'$resolvedCount'
}
}
}
}
}
])
I'm fairly new to MongoDB and I'm trying to aggregate some stats on a "Matches" collection that looks like this:
{
team1: {
players: ["player1", "player2"],
score: 10
},
team2: {
players: ["player3", "player4"],
score: 5
}
},
{
team1: {
players: ["player1", "player3"],
score: 15
},
team2: {
players: ["player2", "player4"],
score: 21
}
},
{
team1: {
players: ["player4", "player1"],
score: 21
},
team2: {
players: ["player3", "player2"],
score: 9
}
},
{
team1: {
players: ["player1"],
score: 5
},
team2: {
players: ["player3"],
score: 10
}
}
I'm looking to get games won, games lost, and the win/loss ratio for each player. I'm new to aggregate functions and having trouble getting something going. Could someone point me in the right direction?
Dealing with multiple arrays in a structure is not really a simple task for aggregation, particularly when your results really want to consider the combination of both arrays.
Fortunately there are a few operations and/or techniques that can help here, along with the fact that each game comprises a "set" of unique players per team/match and results.
The most streamlined approach would be using the features of MongoDB 2.6 and upwards to effectively "combine" the arrays into a single array for processing:
db.league.aggregate([
{ "$project": {
"players": {
"$concatArrays": [
{ "$map": {
"input": "$team1.players",
"as": "el",
"in": {
"player": "$$el",
"win": {
"$cond": {
"if": { "$gt": [ "$team1.score", "$team2.score" ] },
"then": 1,
"else": 0
}
},
"loss": {
"$cond": {
"if": { "$lt": [ "$team1.score", "$team2.score" ] },
"then": 1,
"else": 0
}
}
}
}},
{ "$map": {
"input": "$team2.players",
"as": "el",
"in": {
"player": "$$el",
"win": {
"$cond": {
"if": { "$gt": [ "$team2.score", "$team1.score" ] },
"then": 1,
"else": 0
}
},
"loss": {
"$cond": {
"if": { "$lt": [ "$team2.score", "$team1.score" ] },
"then": 1,
"else": 0
}
}
}
}}
]
}
}},
{ "$unwind": "$players" },
{ "$group": {
"_id": "$players.player",
"win": { "$sum": "$players.win" },
"loss": { "$sum": "$players.loss" }
}},
{ "$project": {
"win": 1,
"loss": 1,
"ratio": { "$divide": [ "$win", "$loss" ] }
}},
{ "$sort": { "_id": 1 } }
])
This listing is using $concatArrays from MongoDB 3.2, but that actual operator can just as easily be replaced by $setUnion, considering that the list of players per game is "unique" and therefore a "set". Either operator basically joins one array with another based on the output of the inner operations.
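To illustrate that "set" behavior, here is a minimal sketch joining the two player arrays directly with $setUnion:
db.league.aggregate([
    { "$project": {
        // union of both rosters; order is not guaranteed with set operators
        "allPlayers": { "$setUnion": [ "$team1.players", "$team2.players" ] }
    }}
])
Since each player appears only once per game, treating the lists as "sets" loses nothing here.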
For those inner operations we are using $map, which processes each array ( "team1/team2" ) in-line and just does a calculation for each player on whether the game result was a "win/loss". This makes things easier for the following stages.
Though the 3.2 and 2.6 releases of MongoDB both introduced operators that make working with arrays easier, the general principle remains that if you want to "aggregate" on data within an array, then you process with $unwind first. This exposes each "player" entry within each game from the previous mapping.
Now it's just a matter of using $group to bring together the results for each player, with $sum for each total field. In order to get a "ratio" over the summed results, process with a $project to introduce the $divide between the result values, then optionally $sort the resulting key for each player.
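One caveat: $divide will throw an error for a player with zero losses. A guard like the one used in the last answer below can be substituted into that final $project:
"ratio": {
    "$cond": [
        { "$eq": [ "$loss", 0 ] },
        "$win",
        { "$divide": [ "$win", "$loss" ] }
    ]
}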
Older Solution
Prior to MongoDB 2.6, your only real tool for dealing with arrays was first to $unwind. So the same principles come into play here:
"map" each array with "win/loss".
Combine the content per game into one "distinct list"
Sum content based on common "player" field
The only real difference in approach is that the "distinct list" per game here comes "after" pulling apart the mapped arrays, instead returning one document per "game/player" combination:
db.league.aggregate([
{ "$unwind": "$team1.players" },
{ "$group": {
"_id": "$_id",
"team1": {
"$push": {
"player": "$team1.players",
"win": {
"$cond": [
{ "$gt": [ "$team1.score", "$team2.score" ] },
1,
0
]
},
"loss": {
"$cond": [
{ "$lt": [ "$team1.score", "$team2.score" ] },
1,
0
]
}
}
},
"team1Score": { "$first": "$team1.score" },
"team2": { "$first": "$team2" }
}},
{ "$unwind": "$team2.players" },
{ "$group": {
"_id": "$_id",
"team1": { "$first": "$team1" },
"team2": {
"$push": {
"player": "$team2.players",
"win": {
"$cond": [
{ "$gt": [ "$team2.score", "$team1Score" ] },
1,
0
]
},
"loss": {
"$cond": [
{ "$lt": [ "$team2.score", "$team1Score" ] },
1,
0
]
}
}
},
"type": { "$first": { "$const": ["A","B" ] } }
}},
{ "$unwind": "$team1" },
{ "$unwind": "$team2" },
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"_id": "$_id",
"player": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.player",
"$team2.player"
]
},
"win": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.win",
"$team2.win"
]
},
"loss": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.loss",
"$team2.loss"
]
}
}
}},
{ "$group": {
"_id": "$_id.player",
"win": { "$sum": "$_id.win" },
"loss": { "$sum": "$_id.loss" }
}},
{ "$project": {
"win": 1,
"loss": 1,
"ratio": { "$divide": [ "$win", "$loss" ] }
}},
{ "$sort": { "_id": 1 } }
])
So this is the interesting part here:
{ "$group": {
"_id": {
"_id": "$_id",
"player": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.player",
"$team2.player"
]
},
"win": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.win",
"$team2.win"
]
},
"loss": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.loss",
"$team2.loss"
]
}
}
}},
That basically gets rid of any duplication per game that would have resulted from each $unwind on different arrays. Being that when you $unwind one array, you get a copy of the whole document for each array member. If you then $unwind another contained array, then the content you just "unwound" is also "copied" again for each of those array members.
Fortunately this is fine since any player is only listed once per game, so every player only has one set of results per game. An alternate way to write that stage, would be to process into another array using $addToSet:
{ "$group": {
"_id": "$_id",
"players": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1",
"$team2"
]
}
}
}},
{ "$unwind": "$players" }
But since that produces another "array", it's a bit more desirable to just keep the results as separate documents, rather than process with $unwind again.
So again this is really "joining results into a single distinct list", where in this case since we lack the operators to "join" both "team1" and "team2" together, the arrays are pulled apart and then conditionally "combined" depending on the current "A" or "B" value that is being processed.
The end "joining" looks at many "copies" of data, but there is still essentially only "one distinct player record per game" for each player involved, and since we worked out the numbers before the "duplication" occurred, then it's just really a matter of picking one of them from each game first.
Same end results, by then summing up for each player and calculating from totals.
Conclusion
So you might generally conclude here, that in either case most of the work involved is aimed at getting those two arrays of data into a single array, or indeed into singular documents per player per game in order to come to the simple aggregation for totals.
You might well consider then "that" is probably a better structure for the data than the present format, given your need to aggregate totals from those sources.
N.B.: The $const operator is undocumented, but has been in place since MongoDB 2.2 with the introduction of the aggregation framework. It serves exactly the same function as $literal ( introduced in MongoDB 2.6 ), and in fact is "exactly" the same thing in the codebase, with the newer definition simply pointing to the older one.
It's used in the listing here as the intended MongoDB targets ( pre 2.6 ) would not have $literal, and the other listing is suitable and better for MongoDB 2.6 and upwards. With $setUnion applied of course.
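So on MongoDB 2.6 and upwards that stage would simply read:
"type": { "$first": { "$literal": [ "A", "B" ] } }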
Well, honestly I'd rather not do this kind of manipulation in MongoDB as it's not very efficient. However, for the sake of argument you can try:
NOTE: the following query targets MongoDB version 3.2
db.matches.aggregate([
{$project:{_id:1, teams:["$team1","$team2"],
tscore:{$max:["$team1.score","$team2.score"]}}},
{$unwind:"$teams"},
{$unwind:"$teams.players"},
{$project:{player:"$teams.players",
won:{$cond:[{$eq:["$teams.score","$tscore"]},1,0]},
lost:{$cond:[{$lt:["$teams.score","$tscore"]},1,0]}}},
{$group:{_id:"$player", won:{$sum:"$won"}, lost:{$sum:"$lost"}}},
{$project:{_id:0, player:"$_id", won:1, lost:1,
ratio:{$cond:[{$eq:[0, "$lost"]},"$won",
{$divide:["$won","$lost"]}]}}}
])
It will output the following from your sample dataset. NOTE: my mathematics could be wrong in the calculation of the ratio; however, that is not what we are looking at here. I'm simply using won/lost.
{
"won" : NumberInt(2),
"lost" : NumberInt(1),
"player" : "player4",
"ratio" : 2.0
}
{
"won" : NumberInt(1),
"lost" : NumberInt(3),
"player" : "player3",
"ratio" : 0.3333333333333333
}
{
"won" : NumberInt(2),
"lost" : NumberInt(1),
"player" : "player2",
"ratio" : 2.0
}
{
"won" : NumberInt(2),
"lost" : NumberInt(2),
"player" : "player1",
"ratio" : 1.0
}