Let's say you have a collection with thousands of football players like these two:
[
{
"_id" : ObjectId("5e19d76fa45abb5d4d50c1d3"),
"name" : "Leonel Messi",
"country" : "Argentina",
"awards" : [
{
"award" : "Ballon d'Or",
"year" : 1972
},
{
"award" : "Golden Boot",
"year" : 1971
},
{
"award" : "FIFA World Player of the Year",
"year" : 1988
}
]
},
{
"_id" : ObjectId("53w9d76fa45abb5d4d30c112"),
"name" : "Lars Sørensen",
"country" : "Denmark",
"awards" : [
{
"award" : "Ballon d'Or",
"year" : 1971
}
]
}
]
"awards" can contain any number of objects.
I would like to return all the players, with a boolean property indicating whether or not they have won the "Golden Boot" award. So something like this:
[
{
"name" : "Leonel Messi",
"won_golden_boot" : true,
},
{
"name" : "Lars Sørensen",
"won_golden_boot" : false,
}
]
But I'm struggling to figure out how to use the aggregation stages to do this. Do I use $map? $in? And if so, how would they fit in here:
{ $sort: { name: 1 } },
// What goes here??
{
$project: {
name: "$name",
won_golden_boot: "$won?",
}
},
You can use the aggregation query below.
It uses $project as you have done, but for won_golden_boot it applies a $cond that checks, with the $in operator, whether an award called Golden Boot exists in the awards array:
db.collection.aggregate([
{
"$project": {
"name": "$name",
"won_golden_boot": {
"$cond": {
"if": {
"$in": [
"Golden Boot",
"$awards.award"
]
},
"then": true,
"else": false
}
}
}
}
])
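Since $in already evaluates to a boolean, the $cond wrapper is optional. Below is a minimal sketch of a shorter pipeline, together with a plain JavaScript function that mirrors what it computes so the logic can be checked without a server. The $ifNull fallback to an empty array is an assumption added here to guard against documents that have no awards field; pipeline, players and wonGoldenBoot are illustrative names, not part of the answer above.

```javascript
// Shorter pipeline: $in already returns true/false, so $cond is not required.
// The $ifNull guard is an assumption for documents that lack "awards".
const pipeline = [
  { $sort: { name: 1 } },
  {
    $project: {
      _id: 0,
      name: 1,
      won_golden_boot: {
        $in: ["Golden Boot", { $ifNull: ["$awards.award", []] }]
      }
    }
  }
];

// In-memory mirror of the pipeline, for illustration only.
function wonGoldenBoot(players) {
  return players
    .slice() // do not mutate the caller's array
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((p) => ({
      name: p.name,
      won_golden_boot: (p.awards || []).some((a) => a.award === "Golden Boot")
    }));
}

const players = [
  {
    name: "Leonel Messi",
    country: "Argentina",
    awards: [
      { award: "Ballon d'Or", year: 1972 },
      { award: "Golden Boot", year: 1971 },
      { award: "FIFA World Player of the Year", year: 1988 }
    ]
  },
  {
    name: "Lars Sørensen",
    country: "Denmark",
    awards: [{ award: "Ballon d'Or", year: 1971 }]
  }
];

console.log(wonGoldenBoot(players));
```

With a running server, the pipeline array would be passed to db.collection.aggregate(pipeline). Note that sorting by name puts "Lars Sørensen" before "Leonel Messi".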
Edit:
I know this question is specifically about aggregation, but in case it is useful to somebody, it is also possible to do this using find.
This is the bios collection that I found on the official MongoDB documentation:
{
"_id" : 1,
"name" : {
"first" : "John",
"last" : "Backus"
},
"awards" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society",
"monetaryRewards": false
}
]
},
{
"_id" : 2,
"name" : {
"first" : "John",
"last" : "McCarthy"
},
"awards" : [
{
"award" : "Turing Award",
"year" : 1971,
"by" : "ACM",
"monetaryRewards": true
},
{
"award" : "Kyoto Prize",
"year" : 1988,
"by" : "Inamori Foundation",
"monetaryRewards": true
}
]
},
{
"_id" : 3,
"name" : {
"first" : "Grace",
"last" : "Hopper"
},
"awards" : [
{
"award" : "Computer Sciences Man of the Year",
"year" : 1969,
"by" : "Data Processing Management Association",
"monetaryRewards": true
},
{
"award" : "Distinguished Fellow",
"year" : 1973,
"by" : " British Computer Society",
"monetaryRewards": true
},
{
"award" : "W. W. McDowell Award",
"year" : 1976,
"by" : "IEEE Computer Society",
"monetaryRewards": false
}
]
}
I am trying to write a query that will allow me to retrieve all documents for which all monetaryRewards are true. Thus, considering the previous documents:
_id = 1 -> there is only one monetaryRewards and it is false. The query should not select it;
_id = 2 -> There are two awards and the monetaryRewards fields are always true. The query should select it;
_id = 3 -> There are three awards where monetaryRewards is twice true and once false. The query should not select it;
I wrote the following query:
db.bios.find( { awards: {$not: {$elemMatch:{"monetaryRewards":false}}}} )
The query works correctly. Later I realised that some awards in my bios collection might not contain the monetaryRewards field at all; for example, there could be another document:
{
"_id" : 4,
"name" : {
"first" : "Kristen",
"last" : "Nygaard"
},
"awards" : [
{
"award" : "Rosing Prize",
"year" : 1999,
"by" : "Norwegian Data Association",
"monetaryRewards":true
},
{
"award" : "Turing Award",
"year" : 2001,
"by" : "ACM"
}
]
}
In this situation my query gives the wrong result, because it treats a missing monetaryRewards as true, while in my case it should count as false.
How can I fix my query?
Here you can find the mongoplayground where you can see that the document with _id=4 is incorrectly selected.
You can $map awards.monetaryRewards to an auxiliary array of booleans, using $ifNull to cater for the missing-field case. Then use $allElementsTrue to perform the filtering.
db.collection.aggregate([
{
"$addFields": {
"filterArr": {
"$map": {
"input": "$awards",
"as": "a",
"in": {
$ifNull: [
"$$a.monetaryRewards",
false
]
}
}
}
}
},
{
"$match": {
$expr: {
"$allElementsTrue": "$filterArr"
}
}
},
{
$project: {
filterArr: false
}
}
])
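For intuition, here is a minimal plain JavaScript sketch of the same test, runnable without a server. It assumes every document has an awards array and that monetaryRewards, when present, is a boolean. The filter object at the top is a possible find()-only variant that is not taken from the answer above: it relies on $ne: true matching both false and a missing field, and it is shown as an assumption, not something tested here. The names filter, allRewardsMonetary and bios are illustrative.

```javascript
// Possible find()-only filter (assumption): reject any document that has
// an award whose monetaryRewards is anything other than true, which
// includes awards where the field is absent.
const filter = {
  awards: { $not: { $elemMatch: { monetaryRewards: { $ne: true } } } }
};

// In-memory mirror of $map + $ifNull + $allElementsTrue.
// A missing field is treated as false, exactly like the $ifNull fallback.
function allRewardsMonetary(doc) {
  return doc.awards.every((a) => a.monetaryRewards === true);
}

// Trimmed-down copies of the four documents from the question.
const bios = [
  { _id: 1, awards: [{ award: "W.W. McDowell Award", monetaryRewards: false }] },
  {
    _id: 2,
    awards: [
      { award: "Turing Award", monetaryRewards: true },
      { award: "Kyoto Prize", monetaryRewards: true }
    ]
  },
  {
    _id: 3,
    awards: [
      { award: "Computer Sciences Man of the Year", monetaryRewards: true },
      { award: "Distinguished Fellow", monetaryRewards: true },
      { award: "W. W. McDowell Award", monetaryRewards: false }
    ]
  },
  {
    _id: 4,
    awards: [
      { award: "Rosing Prize", monetaryRewards: true },
      { award: "Turing Award" } // no monetaryRewards field
    ]
  }
];

const selected = bios.filter(allRewardsMonetary).map((d) => d._id);
console.log(selected); // [ 2 ]
```

As with $allElementsTrue, every() returns true for an empty array, so a document with an empty awards array would be selected by both.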
I have a collection of documents, each has a field which is an array of subdocuments, and all subdocuments have a common field 'status'. I want to find all documents that have the same status for all subdocuments.
collection:
{
"name" : "John",
"wives" : [
{
"name" : "Mary",
"status" : "dead"
},
{
"name" : "Anne",
"status" : "alive"
}
]
},
{
"name" : "Bill",
"wives" : [
{
"name" : "Mary",
"status" : "dead"
},
{
"name" : "Anne",
"status" : "dead"
}
]
},
{
"name" : "Mohammed",
"wives" : [
{
"name" : "Jane",
"status" : "dead"
},
{
"name" : "Sarah",
"status" : "dying"
}
]
}
I want to check if all wives are dead and find only Bill.
You can use the following aggregation query to get the records of people whose wives are all dead:
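db.collection.aggregate([
// Record the original number of wives before unwinding
{$project: {name:1, wives:1, size:{$size:'$wives'}}},
{$unwind:'$wives'},
// Keep only the wives whose status is "dead"
{$match:{'wives.status':'dead'}},
// Regroup per person and count the remaining (dead) wives
{$group:{_id:'$_id',name:{$first:'$name'}, wives:{$push: '$wives'},size:{$first:'$size'},count:{$sum:1}}},
// cmp_value is 0 only when every wife was counted as dead
{$project:{_id:1, wives:1, name:1, cmp_value:{$cmp:['$size','$count']}}},
{$match:{cmp_value:0}}
])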
Output:
{ "_id" : ObjectId("56d401de8b953f35aa92bfb8"), "name" : "Bill", "wives" : [ { "name" : "Mary", "status" : "dead" }, { "name" : "Anne", "status" : "dead" } ], "cmp_value" : 0 }
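If you instead need the people whose wives all share the same status, whatever that status is, removing the $match stage alone is not enough, because size and count would then be equal for every document. One option is to drop the $match and group on both '$_id' and '$wives.status', so that count is computed per status.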
The most efficient way to handle this is to match on the status of "dead" in the opening $match stage, since otherwise you are processing documents that cannot possibly match. The remaining logic follows quite simply with $map and $allElementsTrue:
db.collection.aggregate([
{ "$match": { "wives.status": "dead" } },
{ "$redact": {
"$cond": {
"if": {
"$allElementsTrue": {
"$map": {
"input": "$wives",
"as": "wife",
"in": { "$eq": [ "$$wife.status", "dead" ] }
}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
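The $map plus $allElementsTrue combination is the pipeline counterpart of Array.prototype.every. The sketch below shows that equivalence in plain JavaScript, and also lists a variant of the pipeline that uses $match with $expr in place of $redact. That variant is an assumption for servers that support $expr, not part of the answer above; pipeline, people and allWivesHaveStatus are illustrative names.

```javascript
// Variant using $match + $expr instead of $redact (assumption: the server
// supports $expr). The first $match narrows the candidates cheaply.
const pipeline = [
  { $match: { "wives.status": "dead" } },
  {
    $match: {
      $expr: {
        $allElementsTrue: [
          {
            $map: {
              input: "$wives",
              as: "wife",
              in: { $eq: ["$$wife.status", "dead"] }
            }
          }
        ]
      }
    }
  }
];

// In-memory mirror: at least one wife (the initial $match) and every
// wife has the requested status ($map + $allElementsTrue).
function allWivesHaveStatus(person, status) {
  return (
    person.wives.length > 0 &&
    person.wives.every((w) => w.status === status)
  );
}

const people = [
  { name: "John", wives: [{ name: "Mary", status: "dead" }, { name: "Anne", status: "alive" }] },
  { name: "Bill", wives: [{ name: "Mary", status: "dead" }, { name: "Anne", status: "dead" }] },
  { name: "Mohammed", wives: [{ name: "Jane", status: "dead" }, { name: "Sarah", status: "dying" }] }
];

const names = people
  .filter((p) => allWivesHaveStatus(p, "dead"))
  .map((p) => p.name);
console.log(names); // [ 'Bill' ]
```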
Or the same thing with $where:
db.collection.find({
"wives.status": "dead",
"$where": function() {
return this.wives.length
== this.wives.filter(function(el) {
return el.status == "dead";
}).length;
}
})
Both test the "status" value of every element to make sure they all match. The aggregation pipeline with just $match and $redact should be faster than $where, which has to evaluate JavaScript for each candidate document. Fewer pipeline stages (each essentially a pass through the data) generally means faster as well.
Of course keeping a property on the document is always fastest, but it would involve logic to set that only where "all elements" are the same property. Which of course would typically mean inspecting the document by loading it from the server prior to each update.
I am trying to formulate a query over the sample bios collection http://docs.mongodb.org/manual/reference/bios-example-collection/:
Retrieve all the persons who received two awards in the same year.
The expected answers are "Ole-Johan Dahl" and "Kristen Nygaard". For instance, the document for Ole-Johan Dahl is:
{
"_id" : 5,
"name" : {
"first" : "Ole-Johan",
"last" : "Dahl"
},
"birth" : ISODate("1931-10-12T04:00:00Z"),
"death" : ISODate("2002-06-29T04:00:00Z"),
"contribs" : [
"OOP",
"Simula"
],
"awards" : [
{
"award" : "Rosing Prize",
"year" : 1999,
"by" : "Norwegian Data Association"
},
{
"award" : "Turing Award",
"year" : 2001,
"by" : "ACM"
},
{
"award" : "IEEE John von Neumann Medal",
"year" : 2001,
"by" : "IEEE"
}
]
}
So far, the best query that I could come up with is the following query using aggregation framework:
db.bios.aggregate([
{$project : { "first_name": "$name.first", "last_name": "$name.last" , "award1" :"$awards", "award2" :"$awards" } },
{$unwind : "$award1"},
{$unwind : "$award2"},
{$project : { "first_name": 1, "last_name": 1, "award1" : 1, "award2" : 1,
"super" : { $and : [ {$eq : ["$award1.year", "$award2.year"]},
{$lt: ["$award1.award", "$award2.award"]}
]
}}
},
{$match : {"super": true}}
])
However, I am not happy with this solution because:
the query projects awards twice and unwinds both copies in the following steps, which generates quadratically many intermediate documents;
the query computes an auxiliary field "super" which is only used for filtering afterwards.
Is there a better way to formulate this query?
Try the following aggregation pipeline:
db.bios.aggregate([
{
"$unwind": "$awards"
},
{
"$group": {
"_id": {
"year": "$awards.year",
"firstName": "$name.first",
"lastName": "$name.last"
},
"count": { "$sum": 1 },
"award_recipients": { "$push": "$name" }
}
},
{
"$match": { "count": 2 }
},
{
"$project": {
"_id": 0,
"year": "$_id.year",
"award_recipients": 1,
"count": 1
}
}
])
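The grouping idea can be checked in memory with a single pass over each person's awards, which is linear in the number of awards, unlike the pairwise comparison in the question. This is a minimal plain JavaScript sketch; yearsWithMultipleAwards and its minCount parameter are hypothetical names introduced here. Note that the pipeline above matches a count of exactly 2, whereas this sketch uses "at least minCount", which is an assumption about what is wanted.

```javascript
// Count awards per year for one document and return the years that
// reach the threshold (default: at least two awards in that year).
function yearsWithMultipleAwards(doc, minCount = 2) {
  const counts = new Map();
  for (const { year } of doc.awards) {
    counts.set(year, (counts.get(year) || 0) + 1);
  }
  return [...counts]
    .filter(([, n]) => n >= minCount)
    .map(([year]) => year);
}

// Trimmed-down copy of the Ole-Johan Dahl document from the question.
const dahl = {
  _id: 5,
  name: { first: "Ole-Johan", last: "Dahl" },
  awards: [
    { award: "Rosing Prize", year: 1999, by: "Norwegian Data Association" },
    { award: "Turing Award", year: 2001, by: "ACM" },
    { award: "IEEE John von Neumann Medal", year: 2001, by: "IEEE" }
  ]
};

console.log(yearsWithMultipleAwards(dahl)); // [ 2001 ]
```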
I'm just starting out with mongodb and have been reading through the documentation on aggregation but still struggling to relate equivalent knowledge of sql statements to the methods used in mongo.
I have this data:
{
"_id" : ObjectId("53ac7bce4eaf6de4d5601c19"),
"uid" : ObjectId("53ac7bb84eaf6de4d5601c15"),
"mid" : ObjectId("53ab27504eaf6de4d5601be4"),
"score" : 1
},{
"_id" : ObjectId("53ac7bce4eaf6de4d5601c1a"),
"uid" : ObjectId("53ac7bb84eaf6de4d5601c16"),
"mid" : ObjectId("53ab27504eaf6de4d5601be4"),
"score" : 5
}
...
And I'm trying to get to this result:
{
"uid" : ObjectId("53ac7bb84eaf6de4d5601c15"),
"uid_2" : ObjectId("53ac7bb84eaf6de4d5601c16"),
"mid" : ObjectId("53ab27504eaf6de4d5601be4"),
"score" : 1,
"score_2" : 5,
"difference" : 4
}
...
Where I am comparing every uid against every other uid around a single mid and calculating the difference in their scores (can't be a negative difference, only positive).
Most of the examples I'm running into don't quite fit my requirements, and I'm hoping some mongo guru can help me out. Thanks!
As stated, I think your data modelling is a little off here as you need something to "pair" the "matches" as it were. I have a "simplified" case here:
{
"_id" : ObjectId("53ae9da2e24682cac4215e0c"),
"match" : ObjectId("53ae9d78e24682cac4215e0b"),
"score" : 1
}
{
"_id" : ObjectId("53ae9da5e24682cac4215e0d"),
"match" : ObjectId("53ae9d78e24682cac4215e0b"),
"score" : 5
}
{
"_id" : ObjectId("53aea6cde24682cac4215e15"),
"match" : ObjectId("53aea6c1e24682cac4215e14"),
"score" : 2
}
{
"_id" : ObjectId("53aea6e4e24682cac4215e16"),
"match" : ObjectId("53aea6c1e24682cac4215e14"),
"score" : 1
}
{
"_id" : ObjectId("53aea6eae24682cac4215e18"),
"match" : ObjectId("53aea6e6e24682cac4215e17"),
"score" : 2
}
{
"_id" : ObjectId("53aea6ece24682cac4215e19"),
"match" : ObjectId("53aea6e6e24682cac4215e17"),
"score" : 2
}
What that basically represents is the scores for "six" teams in "three" distinct matches.
Given that, my take on getting to results would be this:
db.matches.aggregate([
// Group on matches and find the "min" and "max" score
{ "$group": {
"_id": "$match",
"teams": {
"$push": {
"_id": "$_id",
"score": "$score"
}
},
"minScore": { "$min": "$score" },
"maxScore": { "$max": "$score" }
}},
// Unwind the "teams" array created
{ "$unwind": "$teams" },
// Compare scores for "win", "loss" or "draw"
{ "$group": {
"_id": "$_id",
"win": {
"$min": { "$cond": [
{ "$and": [
{ "$eq": [ "$teams.score", "$maxScore" ] },
{ "$gt": [ "$teams.score", "$minScore" ] }
]},
"$teams",
false
]}
},
"loss": {
"$min": { "$cond": [
{ "$and": [
{ "$eq": [ "$teams.score", "$minScore" ] },
{ "$lt": [ "$teams.score", "$maxScore" ] }
]},
"$teams",
false
]}
},
"draw": {
"$push": { "$cond": [
{ "$eq": [ "$minScore", "$maxScore" ] },
"$teams",
false
]}
},
"difference": {
"$max": { "$subtract": [ "$maxScore", "$minScore" ] }
}
}},
// Just fix up those "draw" results with a [false,false] array
{ "$project": {
"win": 1,
"loss": 1,
"draw": { "$cond": [
{ "$gt": [
{ "$size": { "$setDifference": [ "$draw", [false] ] } },
0
]},
"$draw",
false
]},
"difference": 1
}}
])
And this gives you a quite nice result:
{
"_id" : ObjectId("53ae9d78e24682cac4215e0b"),
"win" : {
"_id" : ObjectId("53ae9da5e24682cac4215e0d"),
"score" : 5
},
"loss" : {
"_id" : ObjectId("53ae9da2e24682cac4215e0c"),
"score" : 1
},
"draw" : false,
"difference" : 4
}
{
"_id" : ObjectId("53aea6c1e24682cac4215e14"),
"win" : {
"_id" : ObjectId("53aea6cde24682cac4215e15"),
"score" : 2
},
"loss" : {
"_id" : ObjectId("53aea6e4e24682cac4215e16"),
"score" : 1
},
"draw" : false,
"difference" : 1
}
{
"_id" : ObjectId("53aea6e6e24682cac4215e17"),
"win" : false,
"loss" : false,
"draw" : [
{
"_id" : ObjectId("53aea6eae24682cac4215e18"),
"score" : 2
},
{
"_id" : ObjectId("53aea6ece24682cac4215e19"),
"score" : 2
}
],
"difference" : 0
}
That is essentially the result per "match": it determines the "difference" between winner and loser while identifying which team "won" or "lost". The final stage uses some operators ($size and $setDifference) only introduced in MongoDB 2.6, but that stage is not strictly necessary if you do not have that version available. You could also still do the same thing by using $unwind and some other processing.
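To make the shape of that computation easier to follow, here is a minimal in-memory sketch in plain JavaScript that groups scores by match and derives win, loss, draw and difference. The short string ids ("m1", "t1", and so on) are stand-ins for the ObjectId values above, and summarizeMatches is a hypothetical helper name; it illustrates the logic and does not replace the pipeline.

```javascript
// Group score documents by match, then compare min and max per match.
function summarizeMatches(scores) {
  const byMatch = new Map();
  for (const { _id, match, score } of scores) {
    if (!byMatch.has(match)) byMatch.set(match, []);
    byMatch.get(match).push({ _id, score });
  }
  return [...byMatch].map(([match, teams]) => {
    const values = teams.map((t) => t.score);
    const max = Math.max(...values);
    const min = Math.min(...values);
    const draw = min === max; // equal scores mean nobody won or lost
    return {
      _id: match,
      win: draw ? false : teams.find((t) => t.score === max),
      loss: draw ? false : teams.find((t) => t.score === min),
      draw: draw ? teams : false,
      difference: max - min
    };
  });
}

// Stand-in data: six teams in three matches, as in the simplified case.
const scores = [
  { _id: "t1", match: "m1", score: 1 },
  { _id: "t2", match: "m1", score: 5 },
  { _id: "t3", match: "m2", score: 2 },
  { _id: "t4", match: "m2", score: 1 },
  { _id: "t5", match: "m3", score: 2 },
  { _id: "t6", match: "m3", score: 2 }
];

console.log(JSON.stringify(summarizeMatches(scores), null, 2));
```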
db.test.aggregate([
{ "$unwind": "$Data" },
{ "$sort" : { "Data.cost" : -1 } },
{ "$group":{
"_id":"$Data.name",
"Data":{ "$push": "$Data" }
}}
])
I ran the above query. It gives me the following result:
{
"result":[{
"_id" : "abc"
"Data" : [
{
"uid" : "1...A",
"name" : "abc",
"city" : "Paris",
"description" : {
"things" : [
{
"fruit" : {
"fruit_name" : "apple",
"fruit_rate" : "4 USD"
},
"flower" : {
"flower_name" : "rose",
"flower_rate" : "2 USD"
}
}
]
},
"cost" : "6 USD"
},
{
"uid" : "1...B",
"name" : "abc",
"city" : "Paris",
"description" : {
"things" : [
{
"fruit" : {
"fruit_name" : "cherry",
"fruit_rate" : "3 USD"
},
"flower" : {
"flower_name" : "orchid",
"flower_rate" : "2 USD"
}
}
]
},
"cost" : "5 USD"
}
]
}]
}
But I don't want a result like this. I want to merge the "description" arrays of both data entries when "name" is the same, as shown below:
{
"result":[{
"_id" : "abc"
"Data" : [
{
"uid" : "1...A",
"name" : "abc",
"city" : "Paris",
"description" : {
"things" : [
{
"fruit" : {
"fruit_name" : "apple",
"fruit_rate" : "4 USD"
},
"flower" : {
"flower_name" : "rose",
"flower_rate" : "2 USD"
}
}
]
},
"description" : {
"things" : [
{
"fruit" : {
"fruit_name" : "cherry",
"fruit_rate" : "3 USD"
},
"flower" : {
"flower_name" : "orchid",
"flower_rate" : "2 USD"
}
}
]
},
"cost" : "6 USD"
}
]
}]
}
Is it possible to get a result like this? What changes do I have to make to my query?
Thank you.
The way you have structured your desired result is simply not possible. The reason for this is you are basically breaking the principle behind a Hash Table or dictionary/associative array ( whatever term suits you better ) in that you cannot have more than one key value with the same name.
If you want multiple keys of the same name, then those must be contained within an array, which is very much similar to the sort of structure you have and also within your result. And that result doesn't really do anything other than sort the array elements and then group them back into an array.
So, giving you the benefit of the doubt that you simply copied and pasted to represent your desired result, and that you actually want some form of merging of the inner elements, you can always do something like this:
db.test.aggregate([
{ "$unwind": "$Data" },
{ "$unwind": "$Data.description.things" },
{ "$group": {
"_id": "$Data.name",
"city": { "$first": "$Data.city" },
"things": { "$addToSet": "$Data.description.things" }
}}
])
Which produces a result:
{
"_id" : "abc",
"city" : "Paris",
"things" : [
{
"fruit" : {
"fruit_name" : "cherry",
"fruit_rate" : "3 USD"
},
"flower" : {
"flower_name" : "orchid",
"flower_rate" : "2 USD"
}
},
{
"fruit" : {
"fruit_name" : "apple",
"fruit_rate" : "4 USD"
},
"flower" : {
"flower_name" : "rose",
"flower_rate" : "2 USD"
}
}
]
}
So that has the inner "things" now "pushed" together into a singular array while grouping on a common element and adding some additional fields.
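The same merge can be sketched in plain JavaScript to make the grouping explicit. This is a minimal in-memory illustration, assuming each document has a Data array shaped like the one in the question; mergeThings is a hypothetical helper. Duplicates are detected by comparing JSON strings, which only approximates $addToSet (it depends on key order), and the sketch keeps insertion order whereas $addToSet does not guarantee any order.

```javascript
// Group Data entries by name and collect their "things" without duplicates.
function mergeThings(docs) {
  const groups = new Map();
  for (const doc of docs) {
    for (const entry of doc.Data) {
      if (!groups.has(entry.name)) {
        // The first city seen for a name wins, like $first
        groups.set(entry.name, { _id: entry.name, city: entry.city, things: [] });
      }
      const group = groups.get(entry.name);
      for (const thing of entry.description.things) {
        const key = JSON.stringify(thing);
        if (!group.things.some((t) => JSON.stringify(t) === key)) {
          group.things.push(thing);
        }
      }
    }
  }
  return [...groups.values()];
}

const apple = {
  fruit: { fruit_name: "apple", fruit_rate: "4 USD" },
  flower: { flower_name: "rose", flower_rate: "2 USD" }
};
const cherry = {
  fruit: { fruit_name: "cherry", fruit_rate: "3 USD" },
  flower: { flower_name: "orchid", flower_rate: "2 USD" }
};

const docs = [
  {
    Data: [
      { uid: "1...A", name: "abc", city: "Paris", description: { things: [apple] }, cost: "6 USD" },
      { uid: "1...B", name: "abc", city: "Paris", description: { things: [cherry] }, cost: "5 USD" }
    ]
  }
];

console.log(JSON.stringify(mergeThings(docs), null, 2));
```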
If you actually want something with even more "merging" and even possibly avoiding removal of duplicate "set" items, then you could further re-shape with a statement like this:
db.test.aggregate([
{ "$unwind": "$Data" },
{ "$unwind": "$Data.description.things" },
{ "$project": {
"name": "$Data.name",
"city": "$Data.city",
"things": "$Data.description.things",
"type": { "$literal": [ "flower", "fruit" ] }
}},
{ "$unwind": "$type" },
{ "$group": {
"_id": "$name",
"city": { "$first": "$city" },
"things": { "$push": { "$cond": [
{ "$eq": [ "$type", "flower" ] },
{
"type": "$type",
"name": "$things.flower.flower_name",
"rate": "$things.flower.flower_rate"
},
{
"type": "$type",
"name": "$things.fruit.fruit_name",
"rate": "$things.fruit.fruit_rate"
},
]}}
}}
])
Which gives a result:
{
"_id" : "abc",
"city" : "Paris",
"things" : [
{
"type" : "flower",
"name" : "rose",
"rate" : "2 USD"
},
{
"type" : "fruit",
"name" : "apple",
"rate" : "4 USD"
},
{
"type" : "flower",
"name" : "orchid",
"rate" : "2 USD"
},
{
"type" : "fruit",
"name" : "cherry",
"rate" : "3 USD"
}
]
}
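The reshaping into "type", "name" and "rate" can also be illustrated in memory. The sketch below is a minimal plain JavaScript counterpart, assuming each element of things has exactly a flower and a fruit sub-document named as in the question; flattenThings is a hypothetical helper name.

```javascript
// Turn each { fruit, flower } pair into two flat { type, name, rate } items,
// mirroring the unwind over the [ "flower", "fruit" ] literal array.
function flattenThings(things) {
  return things.flatMap((t) =>
    ["flower", "fruit"].map((type) => ({
      type,
      name: t[type][`${type}_name`],
      rate: t[type][`${type}_rate`]
    }))
  );
}

const things = [
  {
    fruit: { fruit_name: "apple", fruit_rate: "4 USD" },
    flower: { flower_name: "rose", flower_rate: "2 USD" }
  },
  {
    fruit: { fruit_name: "cherry", fruit_rate: "3 USD" },
    flower: { flower_name: "orchid", flower_rate: "2 USD" }
  }
];

console.log(flattenThings(things));
```

A flat shape like this makes questions such as "all items of type fruit" a simple filter on the type field.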
Which possibly even indicates how your original data would be better structured in the first place. Certainly you would need to re-shape like this if you wanted to do something like "Find the total value of 'cherries', or 'flowers' or 'fruit'" or whatever the type.
So the result as you structured it is not possible, because you're breaking the rules as mentioned. In the forms I have presented, there are a few ways to do that.
P.S: I am deliberately staying away from your $sort representation. Although it "sort of" worked for you in your initial example, do not expect it to work in wider examples, as your value is a string and not a number. In short, this means that "10 USD" is actually less than "4 USD", because strings are compared lexically, character by character, and "1" sorts before "4".
So change these by splitting up your fields and using a numerical type, as in:
{
"type" : "fruit",
"name" : "cherry",
"rate" : 3,
"currency": "USD"
}
And you even get to filter on "currency" if that is required.
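The string comparison pitfall is easy to reproduce in plain JavaScript, which compares strings character by character in a similar way. The sketch below also shows one way to split a value such as "3 USD" into a numeric rate and a currency; splitRate is a hypothetical helper, and it assumes the input always has the form "<number> <currency>".

```javascript
// Strings compare character by character, so "1" < "4" decides the result.
console.log("10 USD" < "4 USD"); // true

// Split "<number> <currency>" into a numeric rate and a currency code.
function splitRate(value) {
  const [amount, currency] = value.trim().split(/\s+/);
  return { rate: Number(amount), currency };
}

// With numeric rates, a descending sort behaves as expected.
const sorted = ["10 USD", "4 USD", "6 USD"]
  .map(splitRate)
  .sort((a, b) => b.rate - a.rate);

console.log(sorted.map((r) => r.rate)); // [ 10, 6, 4 ]
```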
P.P.S: the $literal operator is a construct available for MongoDB 2.6 and upwards. In prior versions where that operator is not available, you can instead code that like this:
"type": { "$cond": [ 1, [ "flower", "fruit" ], 0 ] }
This obscurely does the same thing, because the returned true value from $cond (or even the false value) is "literally" declared, so what you put there will actually be produced. In this case, it is a way of adding an "array" to the projection, which is wanted in order to match the "types".
You might find references on the net that use $const for this purpose, but I don't particularly trust that as, while it does exist, it was not intended for this purpose and is hence not officially documented.