Aggregate results from two arrays - mongodb

I'm fairly new to MongoDB and I'm trying to aggregate some stats on a "Matches" collection that looks like this:
{
team1: {
players: ["player1", "player2"],
score: 10
},
team2: {
players: ["player3", "player4"],
score: 5
}
},
{
team1: {
players: ["player1", "player3"],
score: 15
},
team2: {
players: ["player2", "player4"],
score: 21
}
},
{
team1: {
players: ["player4", "player1"],
score: 21
},
team2: {
players: ["player3", "player2"],
score: 9
}
},
{
team1: {
players: ["player1"],
score: 5
},
team2: {
players: ["player3"],
score: 10
}
}
I'm looking to get games won, lost and the win/loss ratio for each player. I'm new to aggregate functions and having trouble getting something going. Could someone point me in the right direction?

Dealing with multiple arrays in a structure is not really a simple task for aggregation, particularly when your results need to consider the combination of both arrays.
Fortunately there are a few operations and/or techniques that can help here, along with the fact that each game comprises a "set" of unique players per team/match and results.
The most streamlined approach would be using the features of MongoDB 2.6 and upwards to effectively "combine" the arrays into a single array for processing:
db.league.aggregate([
{ "$project": {
"players": {
"$concatArrays": [
{ "$map": {
"input": "$team1.players",
"as": "el",
"in": {
"player": "$$el",
"win": {
"$cond": {
"if": { "$gt": [ "$team1.score", "$team2.score" ] },
"then": 1,
"else": 0
}
},
"loss": {
"$cond": {
"if": { "$lt": [ "$team1.score", "$team2.score" ] },
"then": 1,
"else": 0
}
}
}
}},
{ "$map": {
"input": "$team2.players",
"as": "el",
"in": {
"player": "$$el",
"win": {
"$cond": {
"if": { "$gt": [ "$team2.score", "$team1.score" ] },
"then": 1,
"else": 0
}
},
"loss": {
"$cond": {
"if": { "$lt": [ "$team2.score", "$team1.score" ] },
"then": 1,
"else": 0
}
}
}
}}
]
}
}},
{ "$unwind": "$players" },
{ "$group": {
"_id": "$players.player",
"win": { "$sum": "$players.win" },
"loss": { "$sum": "$players.loss" }
}},
{ "$project": {
"win": 1,
"loss": 1,
"ratio": { "$divide": [ "$win", "$loss" ] }
}},
{ "$sort": { "_id": 1 } }
])
This listing uses $concatArrays from MongoDB 3.2, but that actual operator can just as easily be replaced by $setUnion, considering that the list of players per game is "unique" and therefore a "set". Either operator basically joins one array with another based on the output of the inner operations.
For those inner operations we are using $map, which processes each array ( "team1/team2" ) in-line and just does a calculation for each player on whether the game result was a "win/loss". This makes things easier for the following stages.
Though the 3.2 and 2.6 releases of MongoDB both introduced operators for making work with arrays easier, the general principle remains: if you want to "aggregate" on data within an array, then you process it with $unwind first. This exposes each "player" entry within each game from the previous mapping.
Now it's just a matter of using $group to bring together the results for each player, with $sum for each total field. In order to get a "ratio" over the summed results, process with a $project to introduce the $divide between the result values, then optionally $sort the resulting key for each player.
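The intended tally can be sanity-checked with a plain JavaScript sketch over the sample data; the `matches` and `tally` names here are just for illustration, and the loop mirrors what the $map/$concatArrays and $group stages compute:

```javascript
// Stand-in for the sample "matches" collection
const matches = [
  { team1: { players: ["player1", "player2"], score: 10 },
    team2: { players: ["player3", "player4"], score: 5 } },
  { team1: { players: ["player1", "player3"], score: 15 },
    team2: { players: ["player2", "player4"], score: 21 } },
  { team1: { players: ["player4", "player1"], score: 21 },
    team2: { players: ["player3", "player2"], score: 9 } },
  { team1: { players: ["player1"], score: 5 },
    team2: { players: ["player3"], score: 10 } }
];

// Mirror of the $map stage: one { win, loss } tally entry per player
const tally = {};
for (const game of matches) {
  for (const [us, them] of [[game.team1, game.team2], [game.team2, game.team1]]) {
    for (const player of us.players) {
      tally[player] = tally[player] || { win: 0, loss: 0 };
      if (us.score > them.score) tally[player].win += 1;
      if (us.score < them.score) tally[player].loss += 1;
    }
  }
}

// Ratio as in the final $project stage ($divide of win by loss)
const ratios = Object.fromEntries(
  Object.entries(tally).map(([p, t]) => [p, t.win / t.loss])
);
```

Running this over the four sample games gives the same totals the pipeline produces.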
Older Solution
Prior to MongoDB 2.6, your only real tool for dealing with arrays was first to $unwind. So the same principles come into play here:
"map" each array with "win/loss".
Combine the content per game into one "distinct list"
Sum content based on common "player" field
The only real difference in approach is that the "distinct list" per game will come "after" pulling apart the mapped arrays, instead returning one document per "game/player" combination:
db.league.aggregate([
{ "$unwind": "$team1.players" },
{ "$group": {
"_id": "$_id",
"team1": {
"$push": {
"player": "$team1.players",
"win": {
"$cond": [
{ "$gt": [ "$team1.score", "$team2.score" ] },
1,
0
]
},
"loss": {
"$cond": [
{ "$lt": [ "$team1.score", "$team2.score" ] },
1,
0
]
}
}
},
"team1Score": { "$first": "$team1.score" },
"team2": { "$first": "$team2" }
}},
{ "$unwind": "$team2.players" },
{ "$group": {
"_id": "$_id",
"team1": { "$first": "$team1" },
"team2": {
"$push": {
"player": "$team2.players",
"win": {
"$cond": [
{ "$gt": [ "$team2.score", "$team1Score" ] },
1,
0
]
},
"loss": {
"$cond": [
{ "$lt": [ "$team2.score", "$team1Score" ] },
1,
0
]
}
}
},
"type": { "$first": { "$const": ["A","B" ] } }
}},
{ "$unwind": "$team1" },
{ "$unwind": "$team2" },
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"_id": "$_id",
"player": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.player",
"$team2.player"
]
},
"win": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.win",
"$team2.win"
]
},
"loss": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.loss",
"$team2.loss"
]
}
}
}},
{ "$group": {
"_id": "$_id.player",
"win": { "$sum": "$_id.win" },
"loss": { "$sum": "$_id.loss" }
}},
{ "$project": {
"win": 1,
"loss": 1,
"ratio": { "$divide": [ "$win", "$loss" ] }
}},
{ "$sort": { "_id": 1 } }
])
So this is the interesting part here:
{ "$group": {
"_id": {
"_id": "$_id",
"player": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.player",
"$team2.player"
]
},
"win": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.win",
"$team2.win"
]
},
"loss": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1.loss",
"$team2.loss"
]
}
}
}},
That basically gets rid of any duplication per game that would have resulted from each $unwind on the different arrays. When you $unwind one array, you get a copy of the whole document for each array member; if you then $unwind another contained array, the content you just "unwound" is "copied" again for each of those array members.
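That duplication is easy to reproduce client-side; a rough JavaScript sketch (with hypothetical one-letter player names) of what two successive $unwind stages do to a document with two arrays:

```javascript
// A single game document with both (already mapped) team arrays
const doc = { _id: 1, team1: ["a", "b"], team2: ["c", "d"] };

// First $unwind: one copy of the document per team1 member
const afterFirst = doc.team1.map(t1 => ({ ...doc, team1: t1 }));

// Second $unwind: each of those copies again, per team2 member
const afterSecond = afterFirst.flatMap(d =>
  d.team2.map(t2 => ({ ...d, team2: t2 }))
);
// 2 x 2 = 4 documents: every team1/team2 combination is produced
```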
Fortunately this is fine since any player is only listed once per game, so every player only has one set of results per game. An alternate way to write that stage, would be to process into another array using $addToSet:
{ "$group": {
"_id": "$_id",
"players": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$team1",
"$team2"
]
}
}
}},
{ "$unwind": "$players" }
But since that produces another "array", it's a bit more desirable to just keep the results as separate documents, rather than process with $unwind again.
So again this is really "joining results into a single distinct list", where in this case, since we lack the operators to "join" both "team1" and "team2" together, the arrays are pulled apart and then conditionally "combined" depending on the current "A" or "B" value being processed.
The end "joining" looks at many "copies" of data, but there is still essentially only "one distinct player record per game" for each player involved, and since we worked out the numbers before the "duplication" occurred, it's really just a matter of picking one of them from each game first.
The end results are the same, summing up for each player and calculating the ratio from the totals.
Conclusion
So you might generally conclude here, that in either case most of the work involved is aimed at getting those two arrays of data into a single array, or indeed into singular documents per player per game in order to come to the simple aggregation for totals.
You might well consider then "that" is probably a better structure for the data than the present format, given your need to aggregate totals from those sources.
N.B: The $const operator is undocumented but has been in place since MongoDB 2.2 with the introduction of the aggregation framework. It serves exactly the same function as $literal ( introduced in MongoDB 2.6 ), and in fact is "exactly" the same thing in the codebase, with the newer definition simply pointing to the older one.
It's used in the listing here as the intended MongoDB targets ( pre 2.6 ) would not have $literal, and the other listing is suitable and better for MongoDB 2.6 and upwards. With $setUnion applied of course.

Well, honestly I'd rather not do this kind of manipulation in MongoDB as it's not very efficient. However, for the sake of argument you can try:
NOTE: following query targets MongoDB version 3.2
db.matches.aggregate([
{$project:{_id:1, teams:["$team1","$team2"],
tscore:{$max:["$team1.score","$team2.score"]}}},
{$unwind:"$teams"},
{$unwind:"$teams.players"},
{$project:{player:"$teams.players",
won:{$cond:[{$eq:["$teams.score","$tscore"]},1,0]},
lost:{$cond:[{$lt:["$teams.score","$tscore"]},1,0]}}},
{$group:{_id:"$player", won:{$sum:"$won"}, lost:{$sum:"$lost"}}},
{$project:{_id:0, player:"$_id", won:1, lost:1,
ratio:{$cond:[{$eq:[0, "$lost"]},"$won",
{$divide:["$won","$lost"]}]}}}
])
and it will output the following from your sample dataset. NOTE: my mathematics could be wrong in the calculation of the ratio; however, that is not what we are looking at here. I'm simply using won/lost:
{
"won" : NumberInt(2),
"lost" : NumberInt(1),
"player" : "player4",
"ratio" : 2.0
}
{
"won" : NumberInt(1),
"lost" : NumberInt(3),
"player" : "player3",
"ratio" : 0.3333333333333333
}
{
"won" : NumberInt(2),
"lost" : NumberInt(1),
"player" : "player2",
"ratio" : 2.0
}
{
"won" : NumberInt(2),
"lost" : NumberInt(2),
"player" : "player1",
"ratio" : 1.0
}

Related

Accumulate Documents with Dynamic Keys

I have a collection of documents that look like this
{
_id: 1,
weight: 2,
height: 3,
fruit: "Orange",
bald: "Yes"
},
{
_id: 2,
weight: 4,
height: 5,
fruit: "Apple",
bald: "No"
}
I need to get a result that aggregates the entire collection into this.
{
avgWeight: 3,
avgHeight: 4,
orangeCount: 1,
appleCount: 1,
baldCount: 1
}
I think I could map/reduce this, or I could query the averages and counts separately. The only values fruit could ever have are Apple and Orange. What other ways would you go about doing this? I've been away from MongoDB for a while now and maybe there are new amazing ways to do this I'm not aware of?
Aggregation Framework
The aggregation framework will do far better for you than what mapReduce can do, and the basic method is compatible with every release back to 2.2 when the aggregation framework was released.
If you have MongoDB 3.6 you can do
db.fruit.aggregate([
{ "$group": {
"_id": "$fruit",
"avgWeight": { "$avg": "$weight" },
"avgHeight": { "$avg": "$height" },
"baldCount": {
"$sum": { "$cond": [{ "$eq": ["$bald", "Yes"] }, 1, 0] }
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"k": {
"$concat": [
{ "$toLower": "$_id" },
"Count"
]
},
"v": "$count"
}
},
"avgWeight": { "$avg": "$avgWeight" },
"avgHeight": { "$avg": "$avgHeight" },
"baldCount": { "$sum": "$baldCount" }
}},
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{ "$arrayToObject": "$data" },
{
"avgWeight": "$avgWeight",
"avgHeight": "$avgHeight",
"baldCount": "$baldCount"
}
]
}
}}
])
As a slight alternate, you can apply the $mergeObjects in the $group here instead:
db.fruit.aggregate([
{ "$group": {
"_id": "$fruit",
"avgWeight": { "$avg": "$weight" },
"avgHeight": { "$avg": "$height" },
"baldCount": {
"$sum": { "$cond": [{ "$eq": ["$bald", "Yes"] }, 1, 0] }
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"data": {
"$mergeObjects": {
"$arrayToObject": [[{
"k": {
"$concat": [
{ "$toLower": "$_id" },
"Count"
]
},
"v": "$count"
}]]
}
},
"avgWeight": { "$avg": "$avgWeight" },
"avgHeight": { "$avg": "$avgHeight" },
"baldCount": { "$sum": "$baldCount" }
}},
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$data",
{
"avgWeight": "$avgWeight",
"avgHeight": "$avgHeight",
"baldCount": "$baldCount"
}
]
}
}}
])
But there are reasons why I personally don't think that is the better approach, and that mostly leads to the next concept.
So even if you don't have a "latest" MongoDB release, you can simply reshape the output in client code, since that is all the last pipeline stage (the only one using MongoDB 3.6 features) is doing:
db.fruit.aggregate([
{ "$group": {
"_id": "$fruit",
"avgWeight": { "$avg": "$weight" },
"avgHeight": { "$avg": "$height" },
"baldCount": {
"$sum": { "$cond": [{ "$eq": ["$bald", "Yes"] }, 1, 0] }
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"k": {
"$concat": [
{ "$toLower": "$_id" },
"Count"
]
},
"v": "$count"
}
},
"avgWeight": { "$avg": "$avgWeight" },
"avgHeight": { "$avg": "$avgHeight" },
"baldCount": { "$sum": "$baldCount" }
}},
/*
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{ "$arrayToObject": "$data" },
{
"avgWeight": "$avgWeight",
"avgHeight": "$avgHeight",
"baldCount": "$baldCount"
}
]
}
}}
*/
]).map( d =>
Object.assign(
d.data.reduce((acc,curr) => Object.assign(acc,{ [curr.k]: curr.v }), {}),
{ avgWeight: d.avgWeight, avgHeight: d.avgHeight, baldCount: d.baldCount }
)
)
And of course you can even just "hardcode" the keys:
db.fruit.aggregate([
{ "$group": {
"_id": null,
"appleCount": {
"$sum": {
"$cond": [{ "$eq": ["$fruit", "Apple"] }, 1, 0]
}
},
"orangeCount": {
"$sum": {
"$cond": [{ "$eq": ["$fruit", "Orange"] }, 1, 0]
}
},
"avgWeight": { "$avg": "$weight" },
"avgHeight": { "$avg": "$height" },
"baldCount": {
"$sum": {
"$cond": [{ "$eq": ["$bald", "Yes"] }, 1, 0]
}
}
}}
])
But it is not recommended, as your data might just change some day; if there is a value to "group on" then it's better to actually use it than coercing with conditions.
In any form you return the same result:
{
"appleCount" : 1,
"orangeCount" : 1,
"avgWeight" : 3,
"avgHeight" : 4,
"baldCount" : 1
}
We do this with "two" $group stages: the first accumulates "per fruit", and the second compacts all fruits to an array using $push with "k" and "v" values to keep the "key" and the "count". We do a little transformation on the "key" here using $toLower and $concat to join the strings. This is optional at this stage but easier in general.
The "alternate" for 3.6 is simply applying $mergeObjects within this earlier stage instead of $push since we already accumulated these keys. It's just really moving the $arrayToObject to a different stage in the pipeline. It's not really necessary and does not really have any specific advantage. If anything it just removes the flexible option as demonstrated with the "client transform" discussed later.
The "average" accumulations are done via $avg and the "bald" is counted using $cond to test the strings and feed a number to $sum. As the array is "rolled up" we can do all those accumulations again to total for everything.
As mentioned, the only part that actually relies on "new features" is all within the $replaceRoot stage which re-writes the "root" document. That's why this is optional as you can simply do these transformations after the same "already aggregated" data is returned from the database.
All we really do here is take that array with the "k" and "v" entries and turn it into an "object" with named keys via $arrayToObject and apply $mergeObjects on that result with the other keys we already produced at the "root". This transforms that array to be part of the main document returned in result.
The exact same transformation is applied using the JavaScript Array.reduce() and Object.assign() methods in the mongo shell compatible code. It's a very simple thing to apply and the Cursor.map() is generally a feature of most language implementations, so you can do these transforms before you start using the cursor results.
With ES6 compatible JavaScript environments ( not the shell ), we can shorten that syntax a little more:
.map(({ data, ...d }) => ({ ...data.reduce((o,{ k, v }) => ({ ...o, [k]: v }), {}), ...d }))
So it truly is a "one line" function, and that's a general reason why transformations like these are often better in the client code than the server anyway.
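As a standalone illustration of that transform, here is a runnable sketch where the document `d` is a hypothetical stand-in for the single aggregated result the pipeline returns:

```javascript
// Hypothetical aggregated document, shaped like the two-$group output
const d = {
  data: [ { k: "orangeCount", v: 1 }, { k: "appleCount", v: 1 } ],
  avgWeight: 3, avgHeight: 4, baldCount: 1
};

// Same transform as the Cursor.map(): fold the "k"/"v" pairs into named keys
const result = Object.assign(
  d.data.reduce((acc, { k, v }) => Object.assign(acc, { [k]: v }), {}),
  { avgWeight: d.avgWeight, avgHeight: d.avgHeight, baldCount: d.baldCount }
);
```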
As a note on the usage of $cond, using it for "hardcoded" evaluation is not really a good idea for several reasons, so it does not make much sense to "force" that evaluation. Even with the data you present, "bald" would be better expressed as a Boolean value than a "string". If you change "Yes/No" to be true/false then even that "one" valid usage becomes:
"baldCount": { "$sum": { "$cond": ["$bald", 1, 0 ] } }
Which removes the need to "test" a condition on a string match since it's already true/false. MongoDB 4.0 adds another enhancement using $toInt to "coerce" the Boolean to an integer:
"baldCount": { "$sum": { "$toInt": "$bald" } }
That removes the need for $cond altogether, as would simply recording 1 or 0 but that change might cause a loss of clarity in the data, so it is still probably reasonable to have that sort of "coercion" there, but not really optimal anywhere else.
Even with the "dynamic" form using "two" $group stages for accumulation, the main work is still done in the first stage. It simply leaves the remaining accumulation on n result documents for the number of possible unique values of the grouping key. In this case "two", so even though it's an additional instruction there is no real overhead for the gain of having flexible code.
MapReduce
If you really have your heart set on at least "trying" a mapReduce, then it's really a single pass with a finalize function just to work out the averages:
db.fruit.mapReduce(
function() {
emit(null,{
"key": { [`${this.fruit.toLowerCase()}Count`]: 1 },
"totalWeight": this.weight,
"totalHeight": this.height,
"totalCount": 1,
"baldCount": (this.bald === "Yes") ? 1 : 0
});
},
function(key,values) {
var output = {
key: { },
totalWeight: 0,
totalHeight: 0,
totalCount: 0,
baldCount: 0
};
for ( let value of values ) {
for ( let key in value.key ) {
if ( !output.key.hasOwnProperty(key) )
output.key[key] = 0;
output.key[key] += value.key[key];
}
Object.keys(value).filter(k => k != 'key').forEach(k =>
output[k] += value[k]
)
}
return output;
},
{
"out": { "inline": 1 },
"finalize": function(key,value) {
return Object.assign(
value.key,
{
avgWeight: value.totalWeight / value.totalCount,
avgHeight: value.totalHeight / value.totalCount,
baldCount: value.baldCount
}
)
}
}
)
Since we already ran through the process for the aggregate() method then the general points should be pretty familiar since we are basically doing much the same thing here.
The main differences are for an "average" you actually need the full totals and counts and of course you get a bit more control over accumulating via an "Object" with JavaScript code.
The results are basically the same, just with the standard mapReduce "bent" on how it presents them:
{
"_id" : null,
"value" : {
"orangeCount" : 1,
"appleCount" : 1,
"avgWeight" : 3,
"avgHeight" : 4,
"baldCount" : 1
}
}
Summary
The general catch is of course that mapReduce, using interpreted JavaScript in order to execute, has a much higher cost and slower execution than the native coded operations of the aggregation framework. There once may have been a case for using mapReduce for this kind of output on "larger" result sets, but since MongoDB 2.6 introduced "cursor" output for the aggregation framework, the scales have been firmly tipped in favor of the newer option.
The fact is that most "legacy" reasons for employing mapReduce are basically superseded by its younger sibling, as the aggregation framework gains new operations which remove the need for the JavaScript execution environment. It would be a fair comment to say that support for JavaScript execution is generally "dwindling", and legacy options which used it from the beginning are gradually being removed from the product.
db.demo.aggregate(
// Pipeline
[
// Stage 1
{
$project: {
weight: 1,
height: 1,
Orange: {
$cond: {
if: {
$eq: ["$fruit", 'Orange']
},
then: {
$sum: 1
},
else: 0
}
},
Apple: {
$cond: {
if: {
$eq: ["$fruit", 'Apple']
},
then: {
$sum: 1
},
else: 0
}
},
bald: {
$cond: {
if: {
$eq: ["$bald", 'Yes']
},
then: {
$sum: 1
},
else: 0
}
},
}
},
// Stage 2
{
$group: {
_id: null,
avgWeight: {
$avg: '$weight'
},
avgHeight: {
$avg: '$height'
},
orangeCount: {
$sum: '$Orange'
},
appleCount: {
$sum: '$Apple'
},
baldCount: {
$sum: '$bald'
}
}
},
]
);

Return Sub-document only when matched but keep empty arrays

I have a collection set with documents like :
{
"_id": ObjectId("57065ee93f0762541749574e"),
"name": "myName",
"results" : [
{
"_id" : ObjectId("570e3e43628ba58c1735009b"),
"color" : "GREEN",
"week" : 17,
"year" : 2016
},
{
"_id" : ObjectId("570e3e43628ba58c1735009d"),
"color" : "RED",
"week" : 19,
"year" : 2016
}
]
}
I am trying to build a query which allows me to return all documents of my collection, but only select the field 'results' with subdocuments where week > X and year > Y.
I can select the documents where week > X and year > Y with the aggregate function and a $match but I miss documents with no match.
So far, here is my function :
query = ModelUser.aggregate(
{$unwind:{path:'$results', preserveNullAndEmptyArrays:true}},
{$match:{
$or: [
{$and:[
{'results.week':{$gte:parseInt(week)}},
{'results.year':{$eq:parseInt(year)}}
]},
{'results.year':{$gt:parseInt(year)}},
{'results.week':{$exists: false}}
]
}},
{$group:{
_id: {
_id:'$_id',
name: '$name'
},
results: {$push:{
_id:'$results._id',
color: '$results.color',
week: '$results.week',
year: '$results.year'
}}
}},
{$project: {
_id: '$_id._id',
name: '$_id.name',
results: '$results'
}}
);
The only thing I miss is : I have to get all 'name' even if there is no result to display.
Any idea how to do this without 2 queries ?
It looks like you actually have MongoDB 3.2, so use $filter on the array. This will just return an "empty" array [] where the conditions supplied did not match anything:
db.collection.aggregate([
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$filter": {
"input": "$results",
"as": "result",
"cond": {
"$and": [
{ "$eq": [ "$$result.year", year ] },
{ "$or": [
{ "$gt": [ "$$result.week", week ] },
{ "$not": { "$ifNull": [ "$$result.week", false ] } }
]}
]
}
}
}
}}
])
Here the $ifNull test, standing in for $exists as a logical form, can actually "compact" the condition, since it returns an alternate value where the property is not present:
db.collection.aggregate([
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$filter": {
"input": "$results",
"as": "result",
"cond": {
"$and": [
{ "$eq": [ "$$result.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$$result.week", week+1 ] },
week
]}
]
}
}
}
}}
])
In MongoDB 2.6 releases, you can probably get away with using $redact and $$DESCEND, but of course need to fake the match in the top level document. This has similar usage of the $ifNull operator:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [{ "$ifNull": [ "$year", year ] }, year ] },
{ "$gt": [
{ "$ifNull": [ "$week", week+1 ] },
week
]}
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
If you actually have MongoDB 2.4, then you are probably better off filtering the array content in client code instead. Every language has methods for filtering array content, but as a JavaScript example reproducible in the shell:
db.collection.find().forEach(function(doc) {
doc.results = doc.results.filter(function(result) {
return (
result.year == year &&
( result.hasOwnProperty('week') ? result.week > week : true )
)
});
printjson(doc);
})
The reason being is that prior to MongoDB 2.6 you need to use $unwind and $group, and various stages in-between. This is a "very costly" operation on the server, considering that all you want to do is remove items from the arrays of documents and not actually "aggregate" from items within the array.
MongoDB releases have gone to great lengths to provide array processing that does not use $unwind, since its usage for that purpose alone is not a performant option. It should only ever be used in the case where you are removing a "significant" amount of data from arrays as a result.
The whole point is that otherwise the "cost" of the aggregation operation is likely greater than the "cost" of transferring the data over the network to be filtered on the client instead. Use with caution:
db.collection.aggregate([
// Create an array if one does not exist or is already empty
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$cond": [
{ "$ifNull": [ "$results.0", false ] },
"$results",
[false]
]
}
}},
// Unwind the array
{ "$unwind": "$results" },
// Conditionally $push based on match expression and conditionally count
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"user": { "$first": "$user" },
"results": {
"$push": {
"$cond": [
{ "$or": [
{ "$not": "$results" },
{ "$and": [
{ "$eq": [ "$results.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$results.week", week+1 ] },
week
]}
]}
] },
"$results",
false
]
}
},
"count": {
"$sum": {
"$cond": [
{ "$and": [
{ "$eq": [ "$results.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$results.week", week+1 ] },
week
]}
] },
1,
0
]
}
}
}},
// $unwind again
{ "$unwind": "$results" },
// Filter out false items unless count is 0
{ "$match": {
"$or": [
{ "results": { "$ne": false } },
{ "count": 0 }
]
}},
// Group again
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"user": { "$first": "$user" },
"results": { "$push": "$results" }
}},
// Now swap [false] for []
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$cond": [
{ "$ne": [ "$results", [false] ] },
"$results",
[]
]
}
}}
])
Now that is a lot of operations and shuffling just to "filter" content from an array compared to all of the other approaches which are really quite simple. And aside from the complexity, it really does "cost" a lot more to execute on the server.
So if your server version actually supports the newer operators that can do this optimally, then it's okay to do so. But if you are stuck with that last process, then you probably should not be doing it and instead do your array filtering in the client.
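For completeness, the client-side filtering suggested here is only a few lines; a sketch assuming `docs` holds the fetched documents and `week`/`year` hold the cutoff values (all names and sample values here are illustrative):

```javascript
// Cutoff values and fetched documents, assumed for illustration
const week = 18, year = 2016;
const docs = [
  { name: "myName",
    results: [
      { color: "GREEN", week: 17, year: 2016 },
      { color: "RED",   week: 19, year: 2016 }
    ] },
  { name: "noResults", results: [] }
];

// Same predicate as the $filter "cond": matching year, and week > cutoff
// (or week missing); documents with no match keep an empty [] array
const filtered = docs.map(doc => ({
  ...doc,
  results: doc.results.filter(r =>
    r.year === year && (r.hasOwnProperty("week") ? r.week > week : true)
  )
}));
```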

Count how many documents contain a field

I have these three MongoDB documents:
{
"_id" : ObjectId("571094afc2bcfe430ddd0815"),
"name" : "Barry",
"surname" : "Allen",
"address" : [
{
"street" : "Red",
"number" : NumberInt(66),
"city" : "Central City"
},
{
"street" : "Yellow",
"number" : NumberInt(7),
"city" : "Gotham City"
}
]
}
{
"_id" : ObjectId("57109504c2bcfe430ddd0816"),
"name" : "Oliver",
"surname" : "Queen",
"address" : {
"street" : "Green",
"number" : NumberInt(66),
"city" : "Star City"
}
}
{
"_id" : ObjectId("5710953ac2bcfe430ddd0817"),
"name" : "Tudof",
"surname" : "Unknown",
"address" : "homeless"
}
The address field is an Array of Objects in the first document, an Object in the second and a String in the third.
My target is to find how many documents of my collection contain the field address.street. In this case the right count is 1, but with my query I get two:
db.coll.find({"address.street":{"$exists":1}}).count()
I also tried map/reduce. It works but it is slower; so if it is possible, I would avoid it.
The distinction here is that the .count() operation is actually "correct" in returning the "document" count where the field is present. So the general considerations break down to:
If you just want to exclude the documents with the array field
Then the most effective way of excluding those documents, where "street" is a property of "address" as an "array", is to use dot-notation to test that the 0 index does not exist:
db.coll.find({
"address.street": { "$exists": true },
"address.0": { "$exists": false }
}).count()
As a natively coded operator test in both cases $exists does the correct job and efficiently.
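The same exclusion can be mirrored client-side over the three sample documents, with Array.isArray playing the role of the "address.0" existence test; a quick sketch:

```javascript
// The three sample documents, trimmed to the relevant fields
const people = [
  { name: "Barry",  address: [ { street: "Red" }, { street: "Yellow" } ] },
  { name: "Oliver", address: { street: "Green" } },
  { name: "Tudof",  address: "homeless" }
];

// Count documents where address is a plain object carrying a street field,
// excluding the array case just as the "address.0" test does
const count = people.filter(d =>
  !Array.isArray(d.address) &&
  typeof d.address === "object" &&
  d.address.street !== undefined
).length;
```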
If you intended to count field occurrences
If what you are actually asking is the "field count", where some "documents" contain array entries where that "field" may be present several times.
For that you need the aggregation framework or mapReduce like you mention. MapReduce uses JavaScript based processing and is therefore going to be considerably slower than the .count() operation. The aggregation framework also needs to calculate and "will" be slower than .count(), but not by as much as mapReduce.
In MongoDB 3.2 you get some help here from the expanded ability of $sum to work on an array of values as well as being a grouping accumulator. The other helper here is $isArray, which allows a different processing method via $map when the data is in fact "an array":
db.coll.aggregate([
{ "$group": {
"_id": null,
"count": {
"$sum": {
"$sum": {
"$cond": {
"if": { "$isArray": "$address" },
"then": {
"$map": {
"input": "$address",
"as": "el",
"in": {
"$cond": {
"if": { "$ifNull": [ "$$el.street", false ] },
"then": 1,
"else": 0
}
}
}
},
"else": {
"$cond": {
"if": { "$ifNull": [ "$address.street", false ] },
"then": 1,
"else": 0
}
}
}
}
}
}
}}
])
Earlier versions hinge on a bit more conditional processing in order to treat the array and non-array data differently, and generally require $unwind to process array entries.
Either transposing the array via $map with MongoDB 2.6:
db.coll.aggregate([
{ "$project": {
"address": {
"$cond": {
"if": { "$ifNull": [ "$address.0", false ] },
"then": "$address",
"else": {
"$map": {
"input": ["A"],
"as": "el",
"in": "$address"
}
}
}
}
}},
{ "$unwind": "$address" },
{ "$group": {
"_id": null,
"count": {
"$sum": {
"$cond": {
"if": { "$ifNull": [ "$address.street", false ] },
"then": 1,
"else": 0
}
}
}
}}
])
Or providing conditional selection with MongoDB 2.2 or 2.4:
db.coll.aggregate([
{ "$group": {
"_id": "$_id",
"address": {
"$first": {
"$cond": [
{ "$ifNull": [ "$address.0", false ] },
"$address",
{ "$const": [null] }
]
}
},
"other": {
"$push": {
"$cond": [
{ "$ifNull": [ "$address.0", false ] },
null,
"$address"
]
}
},
"has": {
"$first": {
"$cond": [
{ "$ifNull": [ "$address.0", false ] },
1,
0
]
}
}
}},
{ "$unwind": "$address" },
{ "$unwind": "$other" },
{ "$group": {
"_id": null,
"count": {
"$sum": {
"$cond": [
{ "$eq": [ "$has", 1 ] },
{ "$cond": [
{ "$ifNull": [ "$address.street", false ] },
1,
0
]},
{ "$cond": [
{ "$ifNull": [ "$other.street", false ] },
1,
0
]}
]
}
}
}}
])
So the latter form "should" perform a bit better than mapReduce, but probably not by much.
In all cases the logic falls to using $ifNull as the "logical" form of $exists for the aggregation framework. Paired with $cond, a "truthful" result is obtained when the property actually exists, and a false value is returned when it is not. This determines whether 1 or 0 is returned respectively to the overall accumulation via $sum.
Ideally you have the modern version that can do this in a single $group pipeline stage, but otherwise you need the longer path.
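As a cross-check of the "field count" logic, here is a plain JavaScript sketch over the sample documents, counting every occurrence of "street" whether "address" is an array or a single object (the same branching the $isArray / $map pipeline performs):

```javascript
// The three sample documents, trimmed to the address field
const people = [
  { address: [ { street: "Red" }, { street: "Yellow" } ] },
  { address: { street: "Green" } },
  { address: "homeless" }
];

// Sum every occurrence of the "street" field: each matching array
// element counts once, a single object with the field counts once
const streetCount = people.reduce((total, d) => {
  if (Array.isArray(d.address))
    return total + d.address.filter(a => a.street !== undefined).length;
  if (d.address !== null && typeof d.address === "object" &&
      d.address.street !== undefined)
    return total + 1;
  return total;
}, 0);
```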
Can you try this:
db.getCollection('collection_name').find({
"address.street":{"$exists":1},
"$where": "Array.isArray(this.address) == false && typeof this.address === 'object'"
});
In the $where clause, we exclude documents where address is an array, and include those where its type is object.

MongoDB: How to Get the Lowest Value Closer to a given Number and Decrement by 1 Another Field

Given the following document containing 3 nested documents...
{ "_id": ObjectId("56116d8e4a0000c9006b57ac"), "name": "Stock 1", "items": [
{ "price": 1.50, "description": "Item 1", "count": 10 },
{ "price": 1.70, "description": "Item 2", "count": 13 },
{ "price": 1.10, "description": "Item 3", "count": 20 }
]
}
... I need to select the sub-document with the lowest price closer to a given amount (here below I assume 1.05):
db.stocks.aggregate([
{$unwind: "$items"},
{$sort: {"items.price":1}},
{$match: {"items.price": {$gte: 1.05}}},
{$group: {
_id:0,
item: {$first:"$items"}
}},
{$project: {
_id: "$item._id",
price: "$item.price",
description: "$item.description"
}}
]);
This works as expected and here is the result:
"result" : [
{
"price" : 1.10,
"description" : "Item 3",
"count" : 20
}
],
"ok" : 1
Alongside returning the item with the lowest price closer to a given amount, I need to decrement count by 1. For instance, here below is the result I'm looking for:
"result" : [
{
"price" : 1.10,
"description" : "Item 3",
"count" : 19
}
],
"ok" : 1
It depends on whether you actually want to "update" the result or simply "return" the result with a decremented value. In the former case you will of course need to go back to the document and "decrement" the value for the returned result.
Also want to note that what you "think" is efficient here is actually not. Doing the "filter" of elements "post sort" or even "post unwind" really makes no difference at all to how the $first accumulator works in terms of performance.
The better approach is to basically "pre filter" the values from the array where possible. This reduces the document size in the aggregation pipeline, and the number of array elements to be processed by $unwind:
db.stocks.aggregate([
{ "$match": {
"items.price": { "$gte": 1.05 }
}},
{ "$project": {
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "item",
"in": {
"$cond": [
{ "$gte": [ "$$item.price", 1.05 ] },
"$$item",
false
]
}
}},
[false]
]
}
}},
{ "$unwind": "$items"},
{ "$sort": { "items.price":1 } },
{ "$group": {
"_id": 0,
"item": { "$first": "$items" }
}},
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description"
}}
]);
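To see what that pipeline computes, the same "pre filter, sort, take first" logic can be sketched in plain JavaScript against the sample document:

```javascript
// Mirror of the pipeline: keep items priced >= 1.05 (the $map/$setDifference
// pre-filter), sort ascending ($sort), and take the first element ($first).
const stock = {
  name: "Stock 1",
  items: [
    { price: 1.50, description: "Item 1", count: 10 },
    { price: 1.70, description: "Item 2", count: 13 },
    { price: 1.10, description: "Item 3", count: 20 },
  ],
};
const target = 1.05;
const cheapest = stock.items
  .filter((item) => item.price >= target)
  .sort((a, b) => a.price - b.price)[0];
console.log(cheapest.description); // Item 3
```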
Of course that does require a MongoDB version 2.6 or greater server to have the available operators, and going by your output you may have an earlier version. If that is the case then at least lose the $match placed after the $sort, as it does nothing of value there and is detrimental to performance.
Where a $match is useful, is in the document selection before you do anything, as what you always want to avoid is processing documents that do not even possibly meet the conditions you want from within the array or anywhere else. So you should always $match or use a similar query stage first.
At any rate, if all you wanted was a "projected result" then just use $subtract in the output:
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description",
"count": { "$subtract": [ "$item.count", 1 ] }
}}
If you wanted however to "update" the result, then you would be iterating the array ( it's still an array even with one result ) to update the matched item and "decrement" the count via $inc:
var result = db.stocks.aggregate([
{ "$match": {
"items.price": { "$gte": 1.05 }
}},
{ "$project": {
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "item",
"in": {
"$cond": [
{ "$gte": [ "$$item.price", 1.05 ] },
"$$item",
false
]
}
}},
[false]
]
}
}},
{ "$unwind": "$items"},
{ "$sort": { "items.price":1 } },
{ "$group": {
"_id": 0,
"item": { "$first": "$items" }
}},
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description"
}}
]);
result.forEach(function(item) {
db.stocks.update({ "items._id": item._id },{ "$inc": { "items.$.count": -1 }})
})
And on a MongoDB 2.4 shell the same aggregate query applies (with the changes noted above), except that the response wraps the array in a field called result, so add that level:
result.result.forEach(function(item) {
db.stocks.update({ "items._id": item._id },{ "$inc": { "items.$.count": -1 }})
})
So either just $project for display only, or use the returned result to effect an .update() on the data as required.

How to find document and single subdocument matching given criterias in MongoDB collection

I have collection of products. Each product contains array of items.
> db.products.find().pretty()
{
"_id" : ObjectId("54023e8bcef998273f36041d"),
"shop" : "shop1",
"name" : "product1",
"items" : [
{
"date" : "01.02.2100",
"purchasePrice" : 1,
"sellingPrice" : 10,
"count" : 15
},
{
"date" : "31.08.2014",
"purchasePrice" : 10,
"sellingPrice" : 1,
"count" : 5
}
]
}
So, can you please give me an advice, how I can query MongoDB to retrieve all products with only single item which date is equals to the date I pass to query as parameter.
The result for "31.08.2014" must be:
{
"_id" : ObjectId("54023e8bcef998273f36041d"),
"shop" : "shop1",
"name" : "product1",
"items" : [
{
"date" : "31.08.2014",
"purchasePrice" : 10,
"sellingPrice" : 1,
"count" : 5
}
]
}
What you are looking for is the positional $ operator and "projection". For a single field you need to match the required array element using "dot notation", for more than one field use $elemMatch:
db.products.find(
{ "items.date": "31.08.2014" },
{ "shop": 1, "name":1, "items.$": 1 }
)
Or the $elemMatch for more than one matching field:
db.products.find(
{ "items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ "shop": 1, "name":1, "items.$": 1 }
)
These work for a single array element only though and only one will be returned. If you want more than one array element to be returned from your conditions then you need more advanced handling with the aggregation framework.
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$unwind": "$items" },
{ "$match": { "items.date": "31.08.2014" } },
{ "$group": {
"_id": "$_id",
"shop": { "$first": "$shop" },
"name": { "$first": "$name" },
"items": { "$push": "$items" }
}}
])
Or possibly in shorter/faster form since MongoDB 2.6, provided your array of items contains unique entries:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$project": {
"shop": 1,
"name": 1,
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.date", "31.08.2014" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
Or possibly with $redact, but a little contrived:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$redact": {
"$cond": [
{ "$eq": [ { "$ifNull": [ "$date", "31.08.2014" ] }, "31.08.2014" ] },
"$$DESCEND",
"$$PRUNE"
]
}}
])
More modern, you would use $filter:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$addFields": {
"items": {
"$filter": {
"input": "$items",
"cond": { "$eq": [ "$$this.date", "31.08.2014" ] }
}
}
}}
])
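$filter is the direct analogue of Array.prototype.filter, which makes its effect easy to verify in plain JavaScript (sample data abbreviated from the question):

```javascript
// The $filter stage keeps only the array elements whose date matches,
// exactly as Array.prototype.filter would.
const product = {
  shop: "shop1",
  name: "product1",
  items: [
    { date: "01.02.2100", purchasePrice: 1, sellingPrice: 10, count: 15 },
    { date: "31.08.2014", purchasePrice: 10, sellingPrice: 1, count: 5 },
  ],
};
product.items = product.items.filter((el) => el.date === "31.08.2014");
console.log(product.items.length, product.items[0].count); // 1 5
```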
And with multiple conditions, use $elemMatch in the query and $and within the $filter:
db.products.aggregate([
{ "$match": {
"items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}
}},
{ "$addFields": {
"items": {
"$filter": {
"input": "$items",
"cond": {
"$and": [
{ "$eq": [ "$$this.date", "31.08.2014" ] },
{ "$eq": [ "$$this.purchasePrice", 1 ] }
]
}
}
}
}}
])
So it just depends on whether you always expect a single element to match or multiple elements, and then which approach is better. But where possible the .find() method will generally be faster since it lacks the overhead of the other operations, though the last two forms do not lag far behind at all.
As a side note, your "dates" are represented as strings which is not a very good idea going forward. Consider changing these to proper Date object types, which will greatly help you in the future.
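As a sketch of that migration, a small (hypothetical) helper could turn the "dd.mm.yyyy" strings shown above into real Date objects before storage:

```javascript
// Hypothetical helper: convert a "dd.mm.yyyy" string into a Date object.
// JavaScript months are zero-based, hence the m - 1.
function parseDotDate(s) {
  const [d, m, y] = s.split(".").map(Number);
  return new Date(Date.UTC(y, m - 1, d));
}
console.log(parseDotDate("31.08.2014").toISOString()); // 2014-08-31T00:00:00.000Z
```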
Based on Neil Lunn's code I work with this solution; it automatically includes all first-level keys (but you could also exclude keys if you want):
db.products.find(
{ "items.date": "31.08.2014" },
{ items: { $elemMatch: { date: "31.08.2014" } } }
)
With multiple requirements:
db.products.find(
{ "items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ items: { $elemMatch: { "date": "31.08.2014", "purchasePrice": 1 } } },
)
Mongo supports dot notation for sub-queries.
See: http://docs.mongodb.org/manual/reference/glossary/#term-dot-notation
Depending on your driver, you want something like:
db.products.find({"items.date":"31.08.2014"});
Note that the attribute name must be quoted for dot notation, even if your driver does not usually require quotes.