I am struggling with an aggregation in mongodb. I have the following type of documents:
{
"_id": "xxxx",
"workHome": true,
"commute": true,
"tel": false,
"weekend": true,
"age":39
},
{
"_id": "yyyy",
"workHome": false,
"commute": true,
"tel": false,
"weekend": true,
"age":32
},
{
"_id": "zzzz",
"workHome": false,
"commute": false,
"tel": false,
"weekend": false,
"age":27
}
Out of this I want to generate an aggregation by the total number of fields that are "true" in the document. There are a total of 4 boolean fields in the document so I want the query to group them together to generate the following output (as examples from e.g. a collection with 100 documents in total):
0:20
1:30
2:10
3:20
4:20
This means: there are 20 documents out of 100 with 'all false', 30 documents with '1x true', 10 documents with '2x true', and so on up to 'all 4 are true'.
Is there any way to do this with an $aggregate statement? Right now I am trying to $group by the $sum of 'true' values but don't find a way to get the conditional query to work.
So assuming that the data is consistent with all the same fields as "workHome", "commute", "tel" and "weekend", then you would proceed with a "logical" evaluation such as this:
db.collection.aggregate([
{ "$project": {
"mapped": { "$map": {
"input": ["A","B","C","D"],
"as": "el",
"in": { "$cond": [
{ "$eq": [ "$$el", "A" ] },
"$workHome",
{ "$cond": [
{ "$eq": [ "$$el", "B" ] },
"$commute",
{ "$cond": [
{ "$eq": [ "$$el", "C" ] },
"$tel",
"$weekend"
]}
]}
]}
}}
}},
{ "$unwind": "$mapped" },
{ "$group": {
"_id": "$_id",
"size": { "$sum": { "$cond": [ "$mapped", 1, 0 ] } }
}},
{ "$group": {
"_id": "$size",
"count": { "$sum": 1 }
}},
{ "$sort": { "_id": 1 } }
])
From your simple sample this gives:
{ "_id" : 0, "count" : 1 }
{ "_id" : 2, "count" : 1 }
{ "_id" : 3, "count" : 1 }
To break this down, first the $map operator here transposes the values of the fields to an array of the same length as the fields themselves. This is done by comparing each element of the "input" to an expected value via $cond and either returning the true condition where there is a match, or moving on to the next condition embedded in the false part of this "ternary" operator. This is done until all logical matches are met and results in an array of values from the fields like so, for the first document:
[true,true,false,true]
The next step is to $unwind the array elements for further comparison. This "de-normalizes" into separate documents for each array element, and is usually required in aggregation pipelines when processing arrays.
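As a sketch of what that produces for the first sample document, the stage emits four separate documents, one per array element:

{ "_id" : "xxxx", "mapped" : true }
{ "_id" : "xxxx", "mapped" : true }
{ "_id" : "xxxx", "mapped" : false }
{ "_id" : "xxxx", "mapped" : true }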
Once that is done a $group pipeline stage is invoked, in order to assess the "total" of those elements with a true value. The same $cond ternary is used to transform the logical true/false conditions into numeric values here, which are fed to the $sum accumulator for addition.
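From the three sample documents, the output of this stage would be (in no particular order):

{ "_id" : "xxxx", "size" : 3 }
{ "_id" : "yyyy", "size" : 2 }
{ "_id" : "zzzz", "size" : 0 }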
Since the "grouping key" provided in _id in the $group is the original document _id value, the current totals are per document for those fields that are true. In order to get totals on the "counts" over the whole collection ( or selection ) then the futher $group stage is invoked with the grouping key being the returned "size" of the matched true results from each document.
The $sum accumulator used there simply adds 1 for each match on the grouping key, thus "counting" the number of occurrences of each match count.
Finally, $sort by the number-of-matches "key" to produce some order in the results.
For the record, this is so much nicer with the upcoming release of MongoDB ( 3.2, as of writing ) which includes the $filter operator:
db.collection.aggregate([
{ "$group": {
"_id": {
"$size": {
"$filter": {
"input": { "$map": {
"input": ["A","B","C","D"],
"as": "el",
"in": { "$cond": [
{ "$eq": [ "$$el", "A" ] },
"$workHome",
{ "$cond": [
{ "$eq": [ "$$el", "B" ] },
"$commute",
{ "$cond": [
{ "$eq": [ "$$el", "C" ] },
"$tel",
"$weekend"
]}
]}
]}
}},
"as": "el",
"cond": {
"$eq": [ "$$el", true ]
}
}
}
},
"count": { "$sum": 1 }
}},
{ "$sort": { "_id": 1 } }
])
So now just "two" pipeline stages doing the same thing as the original statement that will work from MongoDB 2.6 and above.
Therefore if your own application is in "development" itself, or you are otherwise curious, then take a look at the Development Branch releases where this functionality is available now.
I have a collection set with documents like :
{
"_id": ObjectId("57065ee93f0762541749574e"),
"name": "myName",
"results" : [
{
"_id" : ObjectId("570e3e43628ba58c1735009b"),
"color" : "GREEN",
"week" : 17,
"year" : 2016
},
{
"_id" : ObjectId("570e3e43628ba58c1735009d"),
"color" : "RED",
"week" : 19,
"year" : 2016
}
]
}
I am trying to build a query which allows me to return all documents of my collection, but only select the field 'results' with its subdocuments if week > X and year > Y.
I can select the documents where week > X and year > Y with the aggregate function and a $match but I miss documents with no match.
So far, here is my function :
query = ModelUser.aggregate(
{$unwind:{path:'$results', preserveNullAndEmptyArrays:true}},
{$match:{
$or: [
{$and:[
{'results.week':{$gte:parseInt(week)}},
{'results.year':{$eq:parseInt(year)}}
]},
{'results.year':{$gt:parseInt(year)}},
{'results.week':{$exists: false}}
]
}},
{$group:{
_id: {
_id:'$_id',
name: '$name'
},
results: {$push:{
_id:'$results._id',
color: '$results.color',
numSemaine: '$results.numSemaine',
year: '$results.year'
}}
}},
{$project: {
_id: '$_id._id',
name: '$_id.name',
results: '$results'
}}
);
The only thing I miss is: I have to get all 'name' values even if there are no results to display.
Any idea how to do this without 2 queries ?
It looks like you actually have MongoDB 3.2, so use $filter on the array. This will just return an "empty" array [] where the conditions supplied did not match anything:
db.collection.aggregate([
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$filter": {
"input": "$results",
"as": "result",
"cond": {
"$and": [
{ "$eq": [ "$$result.year", year ] },
{ "$or": [
{ "$gt": [ "$$result.week", week ] },
{ "$not": { "$ifNull": [ "$$result.week", false ] } }
]}
]
}
}
}
}}
])
The $ifNull test, used in place of $exists as a logical form, can actually "compact" the condition, since it returns an alternate value where the property is not present:
db.collection.aggregate([
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$filter": {
"input": "$results",
"as": "result",
"cond": {
"$and": [
{ "$eq": [ "$$result.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$$result.week", week+1 ] },
week
]}
]
}
}
}
}}
])
In MongoDB 2.6 releases, you can probably get away with using $redact and $$DESCEND, but of course need to fake the match in the top level document. This has similar usage of the $ifNull operator:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [{ "$ifNull": [ "$year", year ] }, year ] },
{ "$gt": [
{ "$ifNull": [ "$week", week+1 ] }
week
]}
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
If you actually have MongoDB 2.4, then you are probably better off filtering the array content in client code instead. Every language has methods for filtering array content, but as a JavaScript example reproducible in the shell:
db.collection.find().forEach(function(doc) {
doc.results = doc.results.filter(function(result) {
return (
result.year == year &&
( result.hasOwnProperty('week') ? result.week > week : true )
)
});
printjson(doc);
})
The reason is that prior to MongoDB 2.6 you need to use $unwind and $group, and various stages in-between. This is a "very costly" operation on the server, considering that all you want to do is remove items from the arrays of documents and not actually "aggregate" from items within the array.
MongoDB releases have gone to great lengths to provide array processing that does not use $unwind, since its usage for that purpose alone is not a performant option. It should only ever be used in the case where you are removing a "significant" amount of data from arrays as a result.
The whole point is that otherwise the "cost" of the aggregation operation is likely greater than the "cost" of transferring the data over the network to be filtered on the client instead. Use with caution:
db.collection.aggregate([
// Create an array if one does not exist or is already empty
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$cond": [
{ "$ifNull": [ "$results.0", false ] },
"$results",
[false]
]
}
}},
// Unwind the array
{ "$unwind": "$results" },
// Conditionally $push based on match expression and conditionally count
{ "$group": {
"_id": "_id",
"name": { "$first": "$name" },
"user": { "$first": "$user" },
"results": {
"$push": {
"$cond": [
{ "$or": [
{ "$not": "$results" },
{ "$and": [
{ "$eq": [ "$results.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$results.week", week+1 ] },
week
]}
]}
] },
"$results",
false
]
}
},
"count": {
"$sum": {
"$cond": [
{ "$and": [
{ "$eq": [ "$results.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$results.week", week+1 ] },
week
]}
] },
1,
0
]
}
}
}},
// $unwind again
{ "$unwind": "$results" }
// Filter out false items unless count is 0
{ "$match": {
"$or": [
"$results",
{ "count": 0 }
]
}},
// Group again
{ "$group": {
"_id": "_id",
"name": { "$first": "$name" },
"user": { "$first": "$user" },
"results": { "$push": "$results" }
}},
// Now swap [false] for []
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$cond": [
{ "$ne": [ "$results", [false] ] },
"$results",
[]
]
}
}}
])
Now that is a lot of operations and shuffling just to "filter" content from an array compared to all of the other approaches which are really quite simple. And aside from the complexity, it really does "cost" a lot more to execute on the server.
So if your server version actually supports the newer operators that can do this optimally, then it's okay to do so. But if you are stuck with that last process, then you probably should not be doing it and instead do your array filtering in the client.
I have the following aggregate query which gives me counts (countA) for a given date range period. In this case 01/01/2016-03/31/2016. Is it possible to add a second date rage period for example 04/01/2016-07/31/2016 and count these as countB?
db.getCollection('customers').aggregate(
{$match: {"status": "Closed"}},
{$unwind: "$lines"},
{$match: {"lines.status": "Closed"}},
{$match: {"lines.deliveryMethod": "Tech Delivers"}},
{$match: {"date": {$gte: new Date('01/01/2016'), $lte: new Date('03/31/2016')}}},
{$group:{_id:"$lines.productLine",countA: {$sum: 1}}}
)
Thanks in advance
Sure, and you can also simplify your pipeline stages quite a lot, mostly since successive $match stages are really a single stage, and you should always place match criteria at the beginning of any aggregation pipeline. Even if it doesn't actually "filter" the array content, it at least selects only the documents containing entries that will actually match. This speeds things up immensely, especially with large data sets.
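To see the "successive $match" point concretely, the two stages from your pipeline that follow the $unwind behave exactly the same as one combined stage:

{ "$match": { "lines.status": "Closed" } },
{ "$match": { "lines.deliveryMethod": "Tech Delivers" } }

// is equivalent to:

{ "$match": {
  "lines.status": "Closed",
  "lines.deliveryMethod": "Tech Delivers"
}}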
For the two date ranges, well this is just an $or query argument. Also it would be applied "before" the array filtering is done, since after all it is a document level match to begin with. So again, in the very first pipeline $match:
db.getCollection('customers').aggregate([
// Filter all document conditions first. Reduces things to process.
{ "$match": {
"status": "Closed",
"lines": { "$elemMatch": {
"status": "Closed",
"deliveryMethod": "Tech Delivers"
}},
"$or": [
{ "date": {
"$gte": new Date("2016-01-01"),
"$lt": new Date("2016-04-01")
}},
{ "date": {
"$gte": new Date("2016-04-01"),
"$lt": new Date("2016-08-01")
}}
]
}},
// Unwind the array
{ "$unwind": "$lines" },
// Filter just the matching elements
// Successive $match is really just one pipeline stage
{ "$match": {
"lines.status": "Closed",
"lines.deliveryMethod": "Tech Delivers"
}},
// Then group on the productline values within the array
{ "$group":{
"_id": "$lines.productLine",
"countA": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-01-01") ] },
{ "$lt": [ "$date", new Date("2016-04-01") ] }
]},
1,
0
]
}
},
"countB": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-04-01") ] },
{ "$lt": [ "$date", new Date("2016-08-01") ] }
]},
1,
0
]
}
}
}}
])
The $or basically "joins" two result sets as it looks for "either" range criteria to apply. As this is given in addition to the other arguments, the logic is an "AND" condition as with the others on the criteria met with either $or argument. Note the $gte and $lt combination is also another form of expressing "AND" conditions on the same key.
The $elemMatch is applied since "both" criteria are required on the array element. If you just directly applied them with "dot notation", then all that really asks is that "at least one array element" matches each condition, rather than the array element matching "both" conditions.
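As a quick illustration with a hypothetical document ( the "Courier" value is made up purely for the example ):

{ "lines": [
  { "status": "Closed", "deliveryMethod": "Courier" },
  { "status": "Open", "deliveryMethod": "Tech Delivers" }
] }

// Matches, since each condition is met by *some* element, not necessarily the same one:
db.getCollection('customers').find({
  "lines.status": "Closed",
  "lines.deliveryMethod": "Tech Delivers"
})

// Does not match, since no single element satisfies both conditions:
db.getCollection('customers').find({
  "lines": { "$elemMatch": {
    "status": "Closed",
    "deliveryMethod": "Tech Delivers"
  }}
})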
The later filtering after $unwind can use the "dot notation" since the array elements are now "de-normalised" into separate documents. So there is only one element per document to now match the conditions.
When you apply the $group, instead of just using { "$sum": 1 } you rather "conditionally assess" whether to count it or not by using $cond. Since both date ranges are within the results, you just need to determine whether the current document being "rolled up" belongs to one date range or the other. As a "ternary" (if/then/else) operator, this is what $cond provides.
It looks at the values within "date" in the document and if it matches the condition set ( first argument - if ) then it returns 1 ( second argument - then ), else it returns 0, effectively not adding to the current count.
Since these are "logical" conditions then the "AND" is expressed with a logical $and operator, which itself returns true or false, requiring both contained conditions to be true.
Also note the correction in the Date object constructors, since if you do not instantiate with the string in that representation then the resulting Date is in "localtime" as opposed to the "UTC" format in which MongoDB is storing the dates. Only use a "local" constructor if you really mean that, and often people really don't.
The other note is the $lt date change, which should always be "one day" greater than the last date you are looking for. Remember these are "beginning of day" dates, and therefore you usually want all possible times within the date, and not just up to the beginning. So it's "less than the next day" as the correct condition.
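As a rough shell illustration of both points ( the exact offset depends on the server's local timezone, here assumed to be ahead of UTC ):

new Date("2016-04-01")   // ISO-8601 form, interpreted as UTC
// ISODate("2016-04-01T00:00:00Z")

new Date("04/01/2016")   // "locale" form, interpreted as local time
// e.g. ISODate("2016-03-31T14:00:00Z") on a UTC+10 host

// The first range ( "01/01/2016" through "03/31/2016" inclusive ) therefore becomes:
{ "date": { "$gte": new Date("2016-01-01"), "$lt": new Date("2016-04-01") } }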
For the record, with MongoDB versions from 2.6, it's likely better to "pre-filter" the array content "before" you $unwind. This removes the overhead of producing new documents in the "de-normalizing" that occurs that would not match the conditions you want to apply to array elements.
For MongoDB 3.2 and greater, use $filter:
db.getCollection('customers').aggregate([
// Filter all document conditions first. Reduces things to process.
{ "$match": {
"status": "Closed",
"lines": { "$elemMatch": {
"status": "Closed",
"deliveryMethod": "Tech Delivers"
}},
"$or": [
{ "date": {
"$gte": new Date("2016-01-01"),
"$lt": new Date("2016-04-01")
}},
{ "date": {
"$gte": new Date("2016-04-01"),
"$lt": new Date("2016-08-01")
}}
]
}},
// Pre-filter the array content to matching elements
{ "$project": {
"lines": {
"$filter": {
"input": "$lines",
"as": "line",
"cond": {
"$and": [
{ "$eq": [ "$$line.status", "Closed" ] },
{ "$eq": [ "$$line.deliveryMethod", "Tech Delivers" ] }
]
}
}
}
}},
// Unwind the array
{ "$unwind": "$lines" },
// Then group on the productline values within the array
{ "$group":{
"_id": "$lines.productLine",
"countA": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date": new Date("2016-01-01") ] },
{ "$lt": [ "$date", new Date("2016-04-01") ] }
]},
1,
0
]
}
},
"countB": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-04-01") ] },
{ "$lt": [ "$date", new Date("2016-08-01") ] }
]},
1,
0
]
}
}
}}
])
Or for at least MongoDB 2.6, then apply $redact instead:
db.getCollection('customers').aggregate([
// Filter all document conditions first. Reduces things to process.
{ "$match": {
"status": "Closed",
"lines": { "$elemMatch": {
"status": "Closed",
"deliveryMethod": "Tech Delivers"
}},
"$or": [
{ "date": {
"$gte": new Date("2016-01-01"),
"$lt": new Date("2016-04-01")
}},
{ "date": {
"$gte": new Date("2016-04-01"),
"$lt": new Date("2016-08-01")
}}
]
}},
// Pre-filter the array content to matching elements
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [ "$status", "Closed" ] },
{ "$eq": [
{ "$ifNull": ["$deliveryMethod", "Tech Delivers" ] },
"Tech Delivers"
]}
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
// Unwind the array
{ "$unwind": "$lines" },
// Then group on the productline values within the array
{ "$group":{
"_id": "$lines.productLine",
"countA": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date": new Date("2016-01-01") ] },
{ "$lt": [ "$date", new Date("2016-04-01") ] }
]},
1,
0
]
}
},
"countB": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-04-01") ] },
{ "$lt": [ "$date", new Date("2016-08-01") ] }
]},
1,
0
]
}
}
}}
])
Noting that funny little $ifNull in there which is necessary due to the recursive nature of $$DESCEND, since all levels of the document are inspected, including the "top level" document and then "descending" into subsequent arrays and members or even nested objects. The "status" field is present and has a value of "Closed" due to earlier query selection criteria for the top level field, but of course there is no "top level" element called "deliveryMethod", since it is only within the array elements.
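As a minimal sketch of that $ifNull behaviour on its own ( using a hypothetical "demo" collection, not your data ):

db.demo.insert([
  { "deliveryMethod": "Courier" },
  { }   // no "deliveryMethod" at all, just like the top level of your documents
])

db.demo.aggregate([
  { "$project": {
    "check": { "$ifNull": [ "$deliveryMethod", "Tech Delivers" ] }
  }}
])
// The first document returns "Courier", the second returns the substituted "Tech Delivers",
// which is what allows the $eq comparison to pass at the top level so $$DESCEND can continue.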
That basically is the "care" then needs to be take when using $redact like this, and if the structure if the document does not allow such conditions, then it's not really an option, so revert to processing $unwind then $match instead.
But where possible, use those methods in preference to the $unwind then $match processing, as the newer techniques will save considerable time and use fewer resources.
I'm fairly new to MongoDB and I'm trying to aggregate some stats on a "Matches" collection that looks like this:
{
team1: {
players: ["player1", "player2"],
score: 10
},
team2: {
players: ["player3", "player4"],
score: 5
}
},
{
team1: {
players: ["player1", "player3"],
score: 15
},
team2: {
players: ["player2", "player4"],
score: 21
}
},
{
team1: {
players: ["player4", "player1"],
score: 21
},
team2: {
players: ["player3", "player2"],
score: 9
}
},
{
team1: {
players: ["player1"],
score: 5
},
team2: {
players: ["player3"],
score: 10
}
}
I'm looking to get games won, loss and win/loss ratio by each player. I'm new to aggregate functions and having trouble getting something going. Could someone point me the right direction?
Dealing with multiple arrays in a structure is not really a simple task for aggregation, particularly when your results really want to consider the combination of both arrays.
Fortunately there are a few operations and/or techniques that can help here, along with the fact that each game comprises a "set" of unique players per team/match and results.
The most streamlined approach would be using the features of MongoDB 2.6 and upwards to effectively "combine" the arrays into a single array for processing:
db.league.aggregate([
{ "$project": {
"players": {
"$concatArrays": [
{ "$map": {
"input": "$team1.players",
"as": "el",
"in": {
"player": "$$el",
"win": {
"$cond": {
"if": { "$gt": [ "$team1.score", "$team2.score" ] },
"then": 1,
"else": 0
}
},
"loss": {
"$cond": {
"if": { "$lt": [ "$team1.score", "$team2.score" ] },
"then": 1,
"else": 0
}
}
}
}},
{ "$map": {
"input": "$team2.players",
"as": "el",
"in": {
"player": "$$el",
"win": {
"$cond": {
"if": { "$gt": [ "$team2.score", "$team1.score" ] },
"then": 1,
"else": 0
}
},
"loss": {
"$cond": {
"if": { "$lt": [ "$team2.score", "$team1.score" ] },
"then": 1,
"else": 0
}
}
}
}}
]
}
}},
{ "$unwind": "$players" },
{ "$group": {
"_id": "$players.player",
"win": { "$sum": "$players.win" },
"loss": { "$sum": "$players.loss" }
}},
{ "$project": {
"win": 1,
"loss": 1,
"ratio": { "$divide": [ "$win", "$loss" ] }
}},
{ "$sort": { "_id": 1 } }
])
This listing is using $concatArrays from MongoDB 3.2, but that actual operator can just as easily be replaced by $setUnion, considering that the list of players per game is "unique" and therefore a "set". Either operator is basically joining one array with another based on the output of the inner operations.
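As a minimal sketch of that substitution, with the "in" expression reduced to just carrying the player name so the example stays short:

db.league.aggregate([
  { "$project": {
    "players": {
      "$setUnion": [
        { "$map": {
          "input": "$team1.players",
          "as": "el",
          "in": { "player": "$$el" }
        }},
        { "$map": {
          "input": "$team2.players",
          "as": "el",
          "in": { "player": "$$el" }
        }}
      ]
    }
  }}
])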
For those inner operations we are using $map, which processes each array ( "team1/team2" ) in-line and just does a calculation for each player on whether the game result was a "win/loss". This makes things easier for the following stages.
Though the 3.2 and 2.6 releases of MongoDB both introduced operators that make working with arrays easier, the general principle remains that if you want to "aggregate" on data within an array, then you process with $unwind first. This exposes each "player" entry within each game from the previous mapping.
Now it's just a matter of using $group to bring together the results for each player, with $sum for each total field. In order to get a "ratio" over the summed results, process with a $project to introduce the $divide between the result values, then optionally $sort the resulting key for each player.
Older Solution
Prior to MongoDB 2.6, your only real tool for dealing with arrays was first to $unwind. So the same principles come into play here:
"map" each array with "win/loss".
Combine the content per game into one "distinct list"
Sum content based on common "player" field
The only real difference in approach is that the "distinct list" per game is going to come "after" pulling apart the mapped arrays, instead returning one document per "game/player" combination:
db.league.aggregate([
{ "$unwind": "$team1.players" },
{ "$group": {
"_id": "$_id",
"team1": {
"$push": {
"player": "$team1.players",
"win": {
"$cond": [
{ "$gt": [ "$team1.score", "$team2.score" ] },
1,
0
]
},
"loss": {
"$cond": [
{ "$lt": [ "$team1.score", "$team2.score" ] },
1,
0
]
}
}
},
"team1Score": { "$first": "$team1.score" },
"team2": { "$first": "$team2" }
}},
{ "$unwind": "$team2.players" },
{ "$group": {
"_id": "$_id",
"team1": { "$first": "$team1" },
"team2": {
"$push": {
"player": "$team2.players",
"win": {
"$cond": [
{ "$gt": [ "$team2.score", "$team1Score" ] },
1,
0
]
},
"loss": {
"$cond": [
{ "$lt": [ "$team2.score", "$team1Score" ] },
1,
0
]
}
}
},
"type": { "$first": { "$const": ["A","B" ] } }
}},
{ "$unwind": "$team1" },
{ "$unwind": "$team2" },
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"_id": "$_id",
"player": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.player",
"$team2.player"
]
},
"win": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.win",
"$team2.win"
]
},
"loss": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.loss",
"$team2.loss"
]
}
}
}},
{ "$group": {
"_id": "$_id.player",
"win": { "$sum": "$_id.win" },
"loss": { "$sum": "$_id.loss" }
}},
{ "$project": {
"win": 1,
"loss": 1,
"ratio": { "$divide": [ "$win", "$loss" ] }
}},
{ "$sort": { "_id": 1 } }
])
So this is the interesting part here:
{ "$group": {
"_id": {
"_id": "$_id",
"player": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.player",
"$team2.player"
]
},
"win": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.win",
"$team2.win"
]
},
"loss": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.loss",
"$team2.loss"
]
}
}
}},
That basically gets rid of any duplication per game that would have resulted from each $unwind on the different arrays, given that when you $unwind one array, you get a copy of the whole document for each array member. If you then $unwind another contained array, the content you just "unwound" is also "copied" again for each of those array members.
Fortunately this is fine since any player is only listed once per game, so every player only has one set of results per game. An alternate way to write that stage, would be to process into another array using $addToSet:
{ "$group": {
"_id": "$_id",
"players": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1",
"$team2"
]
}
}
}},
{ "$unwind": "$players" }
But since that produces another "array", it's a bit more desirable to just keep the results as separate documents, rather than process with $unwind again.
So again this is really "joining results into a single distinct list", where in this case since we lack the operators to "join" both "team1" and "team2" together, the arrays are pulled apart and then conditionally "combined" depending on the current "A" or "B" value that is being processed.
The end "joining" looks at many "copies" of data, but there is still essentially only "one distinct player record per game" for each player involved, and since we worked out the numbers before the "duplication" occurred, then it's just really a matter of picking one of them from each game first.
The same end result comes by then summing up for each player and calculating from the totals.
Conclusion
So you might generally conclude here, that in either case most of the work involved is aimed at getting those two arrays of data into a single array, or indeed into singular documents per player per game in order to come to the simple aggregation for totals.
You might well consider then "that" is probably a better structure for the data than the present format, given your need to aggregate totals from those sources.
N.B: The $const operator is undocumented but has been in place since MongoDB 2.2 with the introduction of the aggregation framework. It serves exactly the same function as $literal ( introduced in MongoDB 2.6 ), and in fact is "exactly" the same thing in the codebase, with the newer definition simply pointing to the older one.
It's used in the listing here as the intended MongoDB targets ( pre 2.6 ) would not have $literal, and the other listing is suitable and better for MongoDB 2.6 and upwards. With $setUnion applied of course.
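If you want to verify that equivalence for yourself, a quick check on a MongoDB 2.6 or later shell ( purely illustrative ) would be:

db.league.aggregate([
  { "$limit": 1 },
  { "$project": {
    "_id": 0,
    "viaConst": { "$const": [ "A", "B" ] },
    "viaLiteral": { "$literal": [ "A", "B" ] }
  }}
])
// Both fields come back as [ "A", "B" ]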
Well, honestly I'd prefer not to do this kind of manipulation in MongoDB as it's not very efficient. However, for the sake of argument you can try:
NOTE: the following query targets MongoDB version 3.2
db.matches.aggregate([
{$project:{_id:1, teams:["$team1","$team2"],
tscore:{$max:["$team1.score","$team2.score"]}}},
{$unwind:"$teams"},
{$unwind:"$teams.players"},
{$project:{player:"$teams.players",
won:{$cond:[{$eq:["$teams.score","$tscore"]},1,0]},
lost:{$cond:[{$lt:["$teams.score","$tscore"]},1,0]}}},
{$group:{_id:"$player", won:{$sum:"$won"}, lost:{$sum:"$lost"}}},
{$project:{_id:0, player:"$_id", won:1, lost:1,
ratio:{$cond:[{$eq:[0, "$lost"]},"$won",
{$divide:["$won","$lost"]}]}}}
])
and it will output the following from your sample dataset. NOTE: my mathematics could be wrong in the calculation of the ratio; however, that is not what we are looking at here. I'm simply using won/lost.
{
"won" : NumberInt(2),
"lost" : NumberInt(1),
"player" : "player4",
"ratio" : 2.0
}
{
"won" : NumberInt(1),
"lost" : NumberInt(3),
"player" : "player3",
"ratio" : 0.3333333333333333
}
{
"won" : NumberInt(2),
"lost" : NumberInt(1),
"player" : "player2",
"ratio" : 2.0
}
{
"won" : NumberInt(2),
"lost" : NumberInt(2),
"player" : "player1",
"ratio" : 1.0
}
Given the following document containing 3 nested documents...
{ "_id": ObjectId("56116d8e4a0000c9006b57ac"), "name": "Stock 1", "items" [
{ "price": 1.50, "description": "Item 1", "count": 10 }
{ "price": 1.70, "description": "Item 2", "count": 13 }
{ "price": 1.10, "description": "Item 3", "count": 20 }
]
}
... I need to select the sub-document with the lowest price closer to a given amount (here below I assume 1.05):
db.stocks.aggregate([
{$unwind: "$items"},
{$sort: {"items.price":1}},
{$match: {"items.price": {$gte: 1.05}}},
{$group: {
_id:0,
item: {$first:"$items"}
}},
{$project: {
_id: "$item._id",
price: "$item.price",
description: "$item.description"
}}
]);
This works as expected and here is the result:
"result" : [
{
"price" : 1.10,
"description" : "Item 3",
"count" : 20
}
],
"ok" : 1
Alongside returning the item with the lowest price closer to a given amount, I need to decrement count by 1. For instance, here below is the result I'm looking for:
"result" : [
{
"price" : 1.10,
"description" : "Item 3",
"count" : 19
}
],
"ok" : 1
It depends on whether you actually want to "update" the result or simply "return" the result with a decremented value. In the former case you will of course need to go back to the document and "decrement" the value for the returned result.
I also want to note that what you "think" is efficient here actually is not. Doing the "filter" of elements "post sort" or even "post unwind" really makes no difference at all to how the $first accumulator works in terms of performance.
The better approach is to basically "pre filter" the values from the array where possible. This reduces the document size in the aggregation pipeline, and the number of array elements to be processed by $unwind:
db.stocks.aggregate([
{ "$match": {
"items.price": { "$gte": 1.05 }
}},
{ "$project": {
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "item",
"in": {
"$cond": [
{ "$gte": [ "$$item.price", 1.05 ] }
],
"$$item",
false
}
}},
[false]
]
}
}},
{ "$unwind": "$items"},
{ "$sort": { "items.price":1 } },
{ "$group": {
"_id": 0,
"item": { "$first": "$items" }
}},
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description"
}}
]);
Of course that does require a MongoDB version 2.6 or greater server to have the available operators, and going by your output you may have an earlier version. If that is the case then at least lose the $match as it does not do anything of value and would be detrimental to performance.
Where a $match is useful is in the document selection before you do anything, as what you always want to avoid is processing documents that cannot possibly meet the conditions you want, from within the array or anywhere else. So you should always $match or use a similar query stage first.
At any rate, if all you wanted was a "projected result" then just use $subtract in the output:
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description",
"count": { "$subtract": [ "$item.count", 1 ] }
}}
If you wanted however to "update" the result, then you would be iterating the array ( it's still an array even with one result ) to update the matched item and "decrement" the count via $inc:
var result = db.stocks.aggregate([
{ "$match": {
"items.price": { "$gte": 1.05 }
}},
{ "$project": {
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "item",
"in": {
"$cond": [
{ "$gte": [ "$$item.price", 1.05 ] }
],
"$$item",
false
}
}},
[false]
]
}
}},
{ "$unwind": "$items"},
{ "$sort": { "items.price":1 } },
{ "$group": {
"_id": 0,
"item": { "$first": "$items" }
}},
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description"
}}
]);
result.forEach(function(item) {
db.stocks.update({ "item._id": item._id},{ "$inc": { "item.$.count": -1 }})
})
And on a MongoDB 2.4 shell, your same aggregate query applies ( but please make the changes ); however, the result contains another field called result inside it holding the array, so add that level:
result.result.forEach(function(item) {
db.stocks.update({ "item._id": item._id},{ "$inc": { "item.$.count": -1 }})
})
So either just $project for display only, or use the returned result to effect an .update() on the data as required.
I have a collection with documents which look like this:
{
"campaignType" : 1,
"allowAccessControl" : true,
"userId" : "108028399"
}
I'd like to query this collection using aggregation framework and have a result which looks like this:
{
"campaignType" : ["APPLICATION"],
"allowAccessControl" : "true",
"userId" : "108028399",
}
You will notice that:
campaignType field becomes an array
the numeric value was mapped to a string
Can that be done using aggregation framework?
I tried looking at $addToSet and $push but had no luck.
Please help.
Thanks
In either case here, it is the $cond operator from the aggregation framework that is your friend. It is a "ternary" operator, which means it evaluates a condition for true|false and then returns the result based on that evaluation.
So for modern versions from MongoDB 2.6 and upwards you can $project with usage of the $map operator to construct the array:
db.campaign.aggregate([
{ "$project": {
"campaignType": {
"$map": {
"input": { "$literal": [1] },
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$campaignType", 1 ] },
"APPLICATION",
false
]
}
}
},
"allowAcessControl" : 1,
"userId": 1
}}
])
Or generally in most versions you can simply use the $push operator in a $group pipeline stage:
db.campaign.aggregate([
{ "$group": {
"_id": "$_id",
"campaignType": {
"$push": {
"$cond": [
{ "$eq": [ "$campaignType", 1 ] },
"APPLICATION",
false
]
}
},
"allowAccessControl": { "$first": "$allowAccessControl" },
"userId": { "first": "$userId" }
}}
])
But the general concept is that you use "nested" expressions with the $cond operator in order to "test" and return some value that matches your "mapping" condition, and do that within another operator that allows you to produce an array.
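As a sketch of that nesting, supposing ( hypothetically ) there were also a campaignType of 2 that should map to "EMAIL":

db.campaign.aggregate([
  { "$project": {
    "campaignType": {
      "$map": {
        "input": { "$literal": [1] },
        "as": "el",
        "in": {
          "$cond": [
            { "$eq": [ "$campaignType", 1 ] },
            "APPLICATION",
            { "$cond": [
              { "$eq": [ "$campaignType", 2 ] },
              "EMAIL",
              false
            ]}
          ]
        }
      }
    },
    "allowAccessControl": 1,
    "userId": 1
  }}
])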