I am trying to formulate a query over the sample bios collection http://docs.mongodb.org/manual/reference/bios-example-collection/:
Retrieve all the persons who received two awards in the same year.
The expected answers are "Ole-Johan Dahl" and "Kristen Nygaard"; for instance, the document for Ole-Johan Dahl is:
{
"_id" : 5,
"name" : {
"first" : "Ole-Johan",
"last" : "Dahl"
},
"birth" : ISODate("1931-10-12T04:00:00Z"),
"death" : ISODate("2002-06-29T04:00:00Z"),
"contribs" : [
"OOP",
"Simula"
],
"awards" : [
{
"award" : "Rosing Prize",
"year" : 1999,
"by" : "Norwegian Data Association"
},
{
"award" : "Turing Award",
"year" : 2001,
"by" : "ACM"
},
{
"award" : "IEEE John von Neumann Medal",
"year" : 2001,
"by" : "IEEE"
}
]
}
So far, the best query that I could come up with is the following query using aggregation framework:
db.bios.aggregate([
{$project : { "first_name": "$name.first", "last_name": "$name.last" , "award1" :"$awards", "award2" :"$awards" } },
{$unwind : "$award1"},
{$unwind : "$award2"},
{$project : { "first_name": 1, "last_name": 1, "award1" : 1, "award2" : 1,
"super" : { $and : [ {$eq : ["$award1.year", "$award2.year"]},
{$lt: ["$award1.award", "$award2.award"]}
]
}}
},
{$match : {"super": true}}
])
However, I am not happy with this solution because:
- the query projects awards twice and unwinds both copies in the following steps, which generates a quadratic number of intermediate documents;
- the query computes an auxiliary field "super" that is only used for filtering afterwards.
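To make the quadratic growth concrete, here is a small plain-JavaScript model (not MongoDB code) of what the two $unwind stages and the "super" filter do to the Ole-Johan Dahl document. The function names are illustrative only; the sketch assumes MongoDB's default binary string comparison for $lt, which matches JavaScript's `<` on these ASCII award names.

```javascript
// Plain-JavaScript model of the question's pipeline; not MongoDB code.
const dahl = {
  _id: 5,
  name: { first: "Ole-Johan", last: "Dahl" },
  awards: [
    { award: "Rosing Prize", year: 1999, by: "Norwegian Data Association" },
    { award: "Turing Award", year: 2001, by: "ACM" },
    { award: "IEEE John von Neumann Medal", year: 2001, by: "IEEE" },
  ],
};

// Models {$unwind: "$award1"} followed by {$unwind: "$award2"} on two copies
// of the same array: one row per ordered pair of awards, so n * n rows.
function crossUnwind(doc) {
  const rows = [];
  for (const award1 of doc.awards) {
    for (const award2 of doc.awards) {
      rows.push({ name: doc.name, award1, award2 });
    }
  }
  return rows;
}

// Models the "super" flag plus {$match: {super: true}}: same year, and
// award1.award < award2.award so each unordered pair survives only once.
function sameYearPairs(rows) {
  return rows.filter(
    (r) => r.award1.year === r.award2.year && r.award1.award < r.award2.award
  );
}

const rows = crossUnwind(dahl);
console.log(rows.length); // 9 intermediate rows for 3 awards
console.log(sameYearPairs(rows).map((r) => [r.award1.award, r.award2.award]));
```

With 3 awards there are already 9 intermediate rows, of which only one pair survives the filter.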
Is there a better way to formulate this query?
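Try the following aggregation pipeline, which unwinds the awards array only once and then groups by person and year, counting the awards in each group: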
db.bios.aggregate([
{
"$unwind": "$awards"
},
{
"$group": {
"_id": {
"year": "$awards.year",
"firstName": "$name.first",
"lastName": "$name.last"
},
"count": { "$sum": 1 },
"award_recepients": { "$push": "$name" }
}
},
{
"$match": { "count": 2 }
},
{
"$project": {
"_id": 0,
"year": "$_id.year",
"award_recepients": 1,
"count": 1
}
}
])
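As a rough check of why this avoids the quadratic blow-up, here is a plain-JavaScript model (not MongoDB code) of the pipeline applied to the Ole-Johan Dahl document plus one hypothetical second person. The function name and the second document are illustrative only. One $unwind produces one row per award, and the $group then counts awards per (year, first name, last name).

```javascript
// Plain-JavaScript model of the answer's pipeline; not MongoDB code.
const bios = [
  {
    _id: 5,
    name: { first: "Ole-Johan", last: "Dahl" },
    awards: [
      { award: "Rosing Prize", year: 1999 },
      { award: "Turing Award", year: 2001 },
      { award: "IEEE John von Neumann Medal", year: 2001 },
    ],
  },
  {
    _id: 100, // hypothetical document, added only for contrast
    name: { first: "Jane", last: "Example" },
    awards: [
      { award: "Prize A", year: 1990 },
      { award: "Prize B", year: 1995 },
    ],
  },
];

function personsWithTwoAwardsInOneYear(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // {$unwind: "$awards"}: one row per award, so the work grows linearly.
    // Documents without awards produce no rows.
    for (const a of doc.awards || []) {
      // $group _id: { year, firstName, lastName }
      const key = JSON.stringify([a.year, doc.name.first, doc.name.last]);
      if (!groups.has(key)) {
        groups.set(key, { year: a.year, name: doc.name, count: 0 });
      }
      groups.get(key).count += 1; // "count": { "$sum": 1 }
    }
  }
  // {$match: {count: 2}}
  return [...groups.values()].filter((g) => g.count === 2);
}

console.log(personsWithTwoAwardsInOneYear(bios));
```

Grouping on the person's name assumes names are unique; keying on the document's _id together with the year would be a safer variant of the same idea.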
I am learning MongoDB and I am stuck on a problem.
Consider these documents:
{
"_id" : ObjectId("63aad45c008cdce77c2c3f9e"),
"title" : "The Express",
"year" : 2008,
"cast" : "Dennis Quaid",
"genres" : "Sports"
},
{
"_id" : ObjectId("63aad45c008cdce77c2c3fa0"),
"title" : "The Express",
"year" : 2008,
"cast" : "Rob Brown",
"genres" : "Sports"
},
{
"_id" : ObjectId("63aad45c008cdce77c2c3fa2"),
"title" : "The Express",
"year" : 2008,
"cast" : "Omar Benson Miller",
"genres" : "Sports"
},
{
"_id" : ObjectId("63aad45c008cdce77c2c416e"),
"title" : "Semi-Pro",
"year" : 2008,
"cast" : "Will Ferrell",
"genres" : "Sports"
},
{
"_id" : ObjectId("63aad45c008cdce77c2c4170"),
"title" : "Semi-Pro",
"year" : 2008,
"cast" : "Woody Harrelson",
"genres" : "Sports"
},
{
"_id" : ObjectId("63aad45c008cdce77c2c4172"),
"title" : "Semi-Pro",
"year" : 2008,
"cast" : "André Benjamin",
"genres" : "Sports"
}
I am trying to group by "year" and "genres" and count the distinct "title" values, without repetition.
The code that I try is this:
var query1 = {$group: {"_id": { "year": "$year", "genre": "$genres"}, "count": {$sum:1}}}
var stages = [query1]
db.genres.aggregate(stages)
But this counts every document in the group, so the "count" I get is six, when there are only two different titles.
I do not know how to count each title only once.
The expected output is as follows:
{
"_id":{
"year": 2008,
"genre": "Sports"
},
"count": 2
}
However, with the code that I tried, the output is this:
{
"_id":{
"year": 2008,
"genre": "Sports"
},
"count": 6
}
This is wrong, because I only have two different titles in the documents.
How can I solve this? How can I get titles without repetition and with this output?
Thanks so much! Feel free to ask for any other details; I am really stuck and want to learn how to do this.
It sounds to me like you will need to de-duplicate by title before performing this final count. Assuming that different movies never have the same title, something like this would perform that de-duplication:
db.collection.aggregate([
{
$group: {
_id: "$title",
year: {
$first: "$year"
},
// keep the name "genres" (with "$") so the next stage's "$genres" resolves
genres: {
$first: "$genres"
},
}
},
{
$group: {
"_id": {
"year": "$year",
"genre": "$genres",
},
"count": {
$sum: 1
}
}
}
])
A playground demonstration shows that the output is as expected:
[
{
"_id": {
"genre": "Sports",
"year": 2008
},
"count": 2
}
]
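To see why the extra stage fixes the count, here is a plain-JavaScript model (not MongoDB code) of the two $group stages run on the six sample documents. The function names are illustrative; like the answer, the sketch assumes different movies never share a title.

```javascript
// Plain-JavaScript model of the two $group stages; not MongoDB code.
const movies = [
  { title: "The Express", year: 2008, cast: "Dennis Quaid", genres: "Sports" },
  { title: "The Express", year: 2008, cast: "Rob Brown", genres: "Sports" },
  { title: "The Express", year: 2008, cast: "Omar Benson Miller", genres: "Sports" },
  { title: "Semi-Pro", year: 2008, cast: "Will Ferrell", genres: "Sports" },
  { title: "Semi-Pro", year: 2008, cast: "Woody Harrelson", genres: "Sports" },
  { title: "Semi-Pro", year: 2008, cast: "André Benjamin", genres: "Sports" },
];

// Stage 1: {$group: {_id: "$title", year: {$first: "$year"}, genres: {$first: "$genres"}}}
// One document per title, however many cast entries it has.
function groupByTitle(docs) {
  const byTitle = new Map();
  for (const d of docs) {
    if (!byTitle.has(d.title)) {
      byTitle.set(d.title, { _id: d.title, year: d.year, genres: d.genres });
    }
  }
  return [...byTitle.values()];
}

// Stage 2: {$group: {_id: {year: "$year", genre: "$genres"}, count: {$sum: 1}}}
function countByYearAndGenre(docs) {
  const counts = new Map();
  for (const d of docs) {
    const key = JSON.stringify({ year: d.year, genre: d.genres });
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: JSON.parse(key), count }));
}

console.log(countByYearAndGenre(movies));               // count 6, the question's result
console.log(countByYearAndGenre(groupByTitle(movies))); // count 2, one per distinct title
```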
Alternatively, you could build an array of distinct movie titles in your current grouping and then calculate its size afterwards. Again under the same assumption about movie titles, something like this:
db.collection.aggregate([
{
$group: {
"_id": {
"year": "$year",
"genre": "$genres",
},
"count": {
"$addToSet": "$title"
}
}
},
{
"$addFields": {
"count": {
$size: "$count"
}
}
}
])
A playground demonstration gives the same output as the previous example.
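The same idea in plain JavaScript (not MongoDB code): a Set per (year, genre) group plays the role of $addToSet, and its size plays the role of $size. The function name is illustrative. One design consideration, hedged: each group's array of titles has to fit inside a single result document (subject to MongoDB's 16 MB document limit), so for groups with very many distinct titles the two-stage version may be the safer choice.

```javascript
// Plain-JavaScript model of $addToSet followed by $size; not MongoDB code.
const movies = [
  { title: "The Express", year: 2008, genres: "Sports" },
  { title: "The Express", year: 2008, genres: "Sports" },
  { title: "The Express", year: 2008, genres: "Sports" },
  { title: "Semi-Pro", year: 2008, genres: "Sports" },
  { title: "Semi-Pro", year: 2008, genres: "Sports" },
  { title: "Semi-Pro", year: 2008, genres: "Sports" },
];

function distinctTitleCounts(docs) {
  const groups = new Map();
  for (const d of docs) {
    // $group _id: { year: "$year", genre: "$genres" }
    const key = JSON.stringify({ year: d.year, genre: d.genres });
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key).add(d.title); // $addToSet skips titles already present
  }
  // {$addFields: {count: {$size: "$count"}}}
  return [...groups].map(([key, titles]) => ({ _id: JSON.parse(key), count: titles.size }));
}

console.log(distinctTitleCounts(movies));
```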
Let's say you have a collection with thousands of football players, like these two:
[
{
"_id" : ObjectId("5e19d76fa45abb5d4d50c1d3"),
"name" : "Leonel Messi",
"country" : "Argentina",
"awards" : [
{
"award" : "Ballon d'Or",
"year" : 1972
},
{
"award" : "Golden Boot",
"year" : 1971
},
{
"award" : "FIFA World Player of the Year",
"year" : 1988
}
]
},
{
"_id" : ObjectId("53w9d76fa45abb5d4d30c112"),
"name" : "Lars Sørensen",
"country" : "Denmark",
"awards" : [
{
"award" : "Ballon d'Or",
"year" : 1971
},
]
}
]
"awards" can contain any number of objects.
I would like to return all the players, with a boolean property on whether they have won the "Golden Boot" award or not. So something like this:
[
{
"name" : "Leonel Messi",
"won_golden_boot" : true,
},
{
"name" : "Lars Sørensen",
"won_golden_boot" : false,
}
]
But I am struggling to figure out how to use the aggregation stages to do this. Do I use $map? $in? And if so, how would they fit in here:
{ $sort: { name: 1 } },
// What goes here??
{
$project: {
name: "$name",
won_golden_boot: "$won?",
}
},
You can use this aggregation query.
It uses $project as you have done, but for won_golden_boot it uses $cond to check, with the $in operator, whether an award called "Golden Boot" exists.
db.collection.aggregate([
{
"$project": {
"name": "$name",
"won_golden_boot": {
"$cond": {
"if": {
"$in": [
"Golden Boot",
"$awards.award"
]
},
"then": true,
"else": false
}
}
}
}
])
Edit:
I know this question is specifically about aggregation, but in case it is useful to somebody, it is also possible to do this using find() (from MongoDB 4.4, find() projections accept aggregation expressions such as $in).
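A plain-JavaScript model of what the $project computes for the two sample players (not MongoDB code; the function name is illustrative). It highlights two details. First, $in already evaluates to true or false, so `"won_golden_boot": { "$in": ["Golden Boot", "$awards.award"] }` should work without the $cond wrapper. Second, the aggregation $in operator raises an error when its second argument is not an array, so if some players might lack an "awards" field, wrapping the path as `{ "$ifNull": ["$awards.award", []] }` is one way to guard against that; the `|| []` in the sketch plays that role and is not part of the answer's pipeline.

```javascript
// Plain-JavaScript model of the $project stage; not MongoDB code.
const players = [
  {
    name: "Leonel Messi",
    country: "Argentina",
    awards: [
      { award: "Ballon d'Or", year: 1972 },
      { award: "Golden Boot", year: 1971 },
      { award: "FIFA World Player of the Year", year: 1988 },
    ],
  },
  {
    name: "Lars Sørensen",
    country: "Denmark",
    awards: [{ award: "Ballon d'Or", year: 1971 }],
  },
];

function projectGoldenBoot(docs) {
  return docs.map((p) => {
    // "$awards.award" resolves to the array of this player's award names.
    // The "|| []" guard is this sketch's addition (cf. $ifNull).
    const awardNames = (p.awards || []).map((a) => a.award);
    // $in already yields a boolean, so no $cond is needed here.
    return { name: p.name, won_golden_boot: awardNames.includes("Golden Boot") };
  });
}

console.log(projectGoldenBoot(players));
```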
I have JSON data like this, and I want to apply an aggregation to it that groups by the "from" date of each entry in "data":
{
"series": [
{
"id": "1",
"element": "111",
"data": [
{
"timeFrame": {
"from": "2016-01-01T00:00:00Z",
"to": "2016-01-31T23:59:59Z"
},
"value": 1
},
{
"timeFrame": {
"from": "2016-02-01T00:00:00Z",
"to": "2016-02-29T23:59:59Z"
},
"value": 2
}
]
}
]
}
I have achieved this with the following aggregation:
db.getCollection('col1').aggregate([
{$unwind: "$data"},
{$group :{
element: {$first:"$relatedElement"},
_id : {
day : {$dayOfMonth: "$values.timeFrame.from"},
month:{$month: "$values.timeFrame.from"},
year:{$year: "$values.timeFrame.from"}
},
fromDate : { $first : "$values.timeFrame.from" },
total : {$sum : "$values.value"},
count : {$sum : 1},
}
},
{
$project: {
_id : 0,
element:1,
fromDate : '$fromDate',
avgValue : { $divide: [ "$total", "$count" ] }
}
}])
Output:
{
"id" : "1",
"element" : "3",
"fromDate" : ISODate("2017-05-01T00:00:00.000Z"),
"avgValue" : 0.0378787878787879
}
{
"id" : "1",
"element" : "3",
"fromDate" : ISODate("2017-04-30T22:00:00.000Z"),
"avgValue" : 0.416666666666667
}
But I am getting two documents, and I want to merge them into a single document like this:
{
"id" : "1",
"element" : "3",
"average" : [
{
"fromDate" : ISODate("2017-05-01T00:00:00.000Z"),
"avgValue" : 0.0378787878787879
},
{
"fromDate" : ISODate("2017-04-30T22:00:00.000Z"),
"avgValue" : 0.416666666666667
}
]
}
Can anyone help me with this?
Add the following $group at the end of your aggregation pipeline to merge the current output documents into a single document:
{$group:{
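_id:"$_id", // "$_id" is missing after the earlier {_id: 0}, so every document groups under null and all are merged into one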
element: {$first: "$element"},
average:{$push:{
"fromDate": "$fromDate",
"avgValue": "$avgValue"
}}
}}
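A plain-JavaScript model (not MongoDB code) of what this extra $group does with the two output documents shown in the question. The helper name and its `keyOf` parameter are illustrative: `keyOf` stands in for the $group `_id` expression. Because the preceding $project removes `_id`, `"$_id"` is missing and groups as null, so everything merges into one document; grouping on `"$element"` instead would give one document per element, which may be closer to the intent if several elements are present.

```javascript
// Plain-JavaScript model of the extra $group stage; not MongoDB code.
// Input modelled on the question's current output documents.
const stageOutput = [
  { element: "3", fromDate: new Date("2017-05-01T00:00:00.000Z"), avgValue: 0.0378787878787879 },
  { element: "3", fromDate: new Date("2017-04-30T22:00:00.000Z"), avgValue: 0.416666666666667 },
];

// keyOf plays the role of the $group _id expression.
function groupAndPush(docs, keyOf) {
  const groups = new Map();
  for (const d of docs) {
    const key = keyOf(d);
    if (!groups.has(key)) {
      // element: {$first: "$element"}
      groups.set(key, { _id: key, element: d.element, average: [] });
    }
    // average: {$push: {fromDate: "$fromDate", avgValue: "$avgValue"}}
    groups.get(key).average.push({ fromDate: d.fromDate, avgValue: d.avgValue });
  }
  return [...groups.values()];
}

// "$_id" is missing after {_id: 0}, so it groups as null.
console.log(groupAndPush(stageOutput, (d) => (d._id === undefined ? null : d._id)));
// Grouping on "$element" instead.
console.log(groupAndPush(stageOutput, (d) => d.element));
```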
I have this simple Mongodb document:
{
"_id" : ObjectId("55663d9361cfa81a5c48d54f"),
"name" : "Oliver",
"surname" : "Queen",
"age" : 25,
"friends" : [
{
"name" : "Jhon",
"surname" : "Diggle",
"age" : 30
},
{
"name" : "Barry",
"surname" : "Allen",
"age" : 24
}
]
}
Is it possible, using a denormalized model like the one above, to find all of Oliver's friends who are 24 years old?
I think it's really simple with a normalized model; two queries are enough.
For example, the following query:
db.collection.find({name:"Oliver", "friends.age":24}, {_id:0, friends:1})
returns the whole array of Oliver's friends, not just the ones aged 24. Is it possible to select only the matching embedded documents?
Using aggregation
db.collection.aggregate(
[
{ $match: { "name": "Oliver" }},
{ $unwind: "$friends" },
{ $match: { "friends.age": 24 }},
{ $group: { "_id": "$_id", friends: { "$push": "$friends" }}},
{ $project: { "_id": 0, "friends": 1 }}
]
)
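Here is a plain-JavaScript model (not MongoDB code) of what the pipeline returns for the sample document; the function name and the string stand-in for the ObjectId are illustrative. Note that ages must be stored as numbers for `{ "friends.age": 24 }` to match, since MongoDB does not treat the string "24" as equal to the number 24; the strict `===` in the sketch mirrors that. On MongoDB 3.2 and later, the $filter operator inside a single $project should give the same result without the $unwind/$group round trip.

```javascript
// Plain-JavaScript model of the answer's pipeline; not MongoDB code.
const people = [
  {
    _id: "oliver", // hypothetical stand-in for the ObjectId
    name: "Oliver",
    surname: "Queen",
    age: 25,
    friends: [
      { name: "Jhon", surname: "Diggle", age: 30 },
      { name: "Barry", surname: "Allen", age: 24 },
    ],
  },
];

function friendsWithAge(docs, ownerName, age) {
  return (
    docs
      .filter((d) => d.name === ownerName) // { $match: { "name": "Oliver" } }
      // $unwind + { $match: { "friends.age": age } } + $group/$push;
      // $project then keeps only "friends"
      .map((d) => ({ friends: d.friends.filter((f) => f.age === age) }))
      // $group creates no group for a document whose friends all failed the $match
      .filter((d) => d.friends.length > 0)
  );
}

console.log(friendsWithAge(people, "Oliver", 24));
```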
I'm trying to query a collection for a specific document that contains an array of sub-documents. I'd like to obtain the tests with the highest and lowest scores from those sub-documents and return them as virtual fields on the original document.
I have the following dataset:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
},
{
"name" : "vallum",
"score" : 100
},
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU"
}
In mongo 2.4, how can I query mongo once to return the following result:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
},
{
"name" : "vallum",
"score" : 100
},
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
"worst_test": {
"name" : "ipsum",
"score" : 65
},
"best_test": {
"name" : "vallum",
"score" : 100
}
}
Where "best_test" and "worst_test" are virtual fields representing the tests with the highest and lowest scores, respectively.
I've tried many different ways, and the closest I've gotten is this query:
db.students.aggregate([
{ $match: {
'_id': 'd0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e'
}},
{ $unwind: '$tests' },
{ $sort: {'tests.score': 1} },
{ $group: {
_id: '$_id',
student_tests: {$push: "$$ROOT"},
worst_test: {$first: '$tests'},
best_test: { $last: '$tests' }
}}
]);
Which yields this result:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"student_tests" : [
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "vallum",
"score" : 100
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
],
"worst_test": {
"name" : "ipsum",
"score" : 65
},
"best_test": {
"name" : "vallum",
"score" : 100
}
}
If you are using $$ROOT, then you are in fact using MongoDB 2.6 or later, as this aggregation variable was only introduced in that version.
But while $$ROOT is handy for various things, all it does is represent the entire document at the current stage of the pipeline. To return the original document unmodified but with additional fields, you could use it in a $project stage before the $unwind and assign it to the _id field. That still does not give you exactly the same document, though, because you would need another $project at the end to get the correct document shape out of those elements.
Your best bet is just to project the fields, keeping an unaltered copy of the array before any $sort is applied:
db.students.aggregate([
{ "$match": {
"_id": "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e"
}},
{ "$project": {
"name": 1,
"tests": 1,
"created_at": 1,
"class": 1,
"user_id": 1,
"testCopy": "$tests"
}},
{ "$unwind": "$testCopy" },
{ "$sort": { "testCopy.score": 1 } },
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"tests": { "$first": "$tests" },
"created_at": { "$first": "$created_at" },
"class": { "$first": "$class" },
"user_id": { "$first": "$user_id" },
"worst_test": { "$first": "$testCopy" },
"best_test": { "$last": "$testCopy" }
}}
]);
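A plain-JavaScript model (not MongoDB code) of what the pipeline above computes for the sample student; the function name is illustrative. Sorting a copy mirrors "testCopy": the original "tests" array keeps its order, while the first and last elements of the sorted copy become worst_test and best_test. As with $unwind (without the preserveNullAndEmptyArrays option), a student with an empty "tests" array produces no result.

```javascript
// Plain-JavaScript model of the first answer pipeline; not MongoDB code.
const student = {
  _id: "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
  name: "Addison Hunt",
  tests: [
    { name: "lorem", score: 79 },
    { name: "vallum", score: 100 },
    { name: "ipsum", score: 65 },
  ],
  created_at: 1401488865684,
  class: "dolor sit amit",
  user_id: "005G5635231325O4VIAU",
};

function withWorstAndBest(doc) {
  // Sort a copy ("testCopy") so the original "tests" array keeps its order.
  const testCopy = [...doc.tests].sort((a, b) => a.score - b.score);
  if (testCopy.length === 0) return null; // $unwind drops empty arrays
  return {
    ...doc,
    worst_test: testCopy[0], // { "$first": "$testCopy" }
    best_test: testCopy[testCopy.length - 1], // { "$last": "$testCopy" }
  };
}

console.log(withWorstAndBest(student));
```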
Or use $$ROOT as mentioned before, alternatively placing the fields under _id individually in the $project:
db.students.aggregate([
{ "$match": {
"_id": "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e"
}},
{ "$project": {
"_id": "$$ROOT",
"tests": 1
}},
{ "$unwind": "$tests" },
{ "$sort": { "tests.score": 1 } },
{ "$group": {
"_id": "$_id",
"aworst_test": { "$first": "$tests" },
"abest_test": { "$last": "$tests" }
}},
{ "$project": {
"_id": "$_id._id",
"name": "$_id.name",
"tests": "$_id.tests",
"created_at": "$_id.created_at",
"class": "$_id.class",
"user_id": "$_id.user_id",
"worst_test": "$aworst_test",
"best_test": "$abest_test"
}}
]);
But as you can see, you still have to do the $project work somewhere to get the structure you want. The fields are also renamed (aworst_test, abest_test) to keep the field order you want, because $project otherwise "optimizes" by keeping any fields that have not been renamed in place and appending new fields after the existing ones.
There really is no simple way to "get all fields" in the same way as you originally found them. Operations like $project and $group are an "all or nothing" affair, where they only explicitly produce what you tell them to.