MongoDB Question Aggregation Without Repetition

I am learning MongoDB and I am stuck on a problem.
Consider these documents:
{
"_id" : ObjectId("63aad45c008cdce77c2c3f9e"),
"title" : "The Express",
"year" : 2008,
"cast" : "Dennis Quaid",
"genres" : "Sports"
},
{
"_id" : ObjectId("63aad45c008cdce77c2c3fa0"),
"title" : "The Express",
"year" : 2008,
"cast" : "Rob Brown",
"genres" : "Sports"
},
{
"_id" : ObjectId("63aad45c008cdce77c2c3fa2"),
"title" : "The Express",
"year" : 2008,
"cast" : "Omar Benson Miller",
"genres" : "Sports"
},
{
"_id" : ObjectId("63aad45c008cdce77c2c416e"),
"title" : "Semi-Pro",
"year" : 2008,
"cast" : "Will Ferrell",
"genres" : "Sports"
},
{
"_id" : ObjectId("63aad45c008cdce77c2c4170"),
"title" : "Semi-Pro",
"year" : 2008,
"cast" : "Woody Harrelson",
"genres" : "Sports"
},
{
"_id" : ObjectId("63aad45c008cdce77c2c4172"),
"title" : "Semi-Pro",
"year" : 2008,
"cast" : "André Benjamin",
"genres" : "Sports"
}
I am trying to group by "year" and "genres", and count all "title" without repetition.
The code that I tried is this:
var query1 = {$group: {"_id": { "year": "$year", "genre": "$genres"}, "count": {$sum:1}}}
var stages = [query1]
db.genres.aggregate(stages)
But this groups all the documents, and the value of "count" that I get is six when I only have two different titles.
I do not know how to count each title only once.
The expected output is as follows:
{
"_id":{
"year": 2008,
"genre": "Sports"
},
"count": 2
}
However, with the code that I tried, the output is this:
{
"_id":{
"year": 2008,
"genre": "Sports"
},
"count": 6
}
This is wrong, because I only have two different titles in the documents.
How can I solve this? How can I get titles without repetition and with this output?
Thanks so much! Please ask if you need more details. I am really stuck and I want to learn how to do this.

I am trying to group by "year" and "genres", and count all "title" without repetition.
...
But this groups all the documents, and the value of "count" that I get is six when I only have two different titles.
It sounds to me like you will need to de-duplicate by title before performing this final count. Assuming that different movies never have the same title, something like this would perform that de-duplication:
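db.collection.aggregate([
  {
    // Collapse the per-cast rows into one document per title.
    $group: {
      _id: "$title",
      year: { $first: "$year" },
      genre: { $first: "$genres" }
    }
  },
  {
    // Count the de-duplicated titles per year and genre.
    $group: {
      _id: { year: "$year", genre: "$genre" },
      count: { $sum: 1 }
    }
  }
])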
A playground run of this pipeline shows the expected output:
[
{
"_id": {
"genre": "Sports",
"year": 2008
},
"count": 2
}
]
Alternatively you could generate an array with distinct values for the movie titles in your current grouping and then calculate its size afterwards. Again with the same assumption about movie titles from above, something like this:
db.collection.aggregate([
{
$group: {
"_id": {
"year": "$year",
"genre": "$genres",
},
"count": {
"$addToSet": "$title"
}
}
},
{
"$addFields": {
"count": {
$size: "$count"
}
}
}
])
A playground run gives the same output as the previous example.
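If the same title can appear under more than one year or genre, the uniqueness assumption can be dropped by putting the title into the first grouping key. The following is a minimal sketch using the field names from the question; `countDistinctTitles` and the shortened `sample` array are hypothetical and exist only to check the numbers in plain JavaScript without a server:

```javascript
// Pipeline sketch: one document per distinct (year, genre, title),
// then one count per (year, genre).
const pipeline = [
  { $group: { _id: { year: "$year", genre: "$genres", title: "$title" } } },
  { $group: { _id: { year: "$_id.year", genre: "$_id.genre" }, count: { $sum: 1 } } }
];

// Hypothetical in-memory equivalent, for checking the logic only.
function countDistinctTitles(docs) {
  const titlesByGroup = new Map();
  for (const doc of docs) {
    const key = JSON.stringify([doc.year, doc.genres]);
    if (!titlesByGroup.has(key)) titlesByGroup.set(key, new Set());
    titlesByGroup.get(key).add(doc.title);
  }
  return [...titlesByGroup].map(([key, titles]) => {
    const [year, genre] = JSON.parse(key);
    return { _id: { year, genre }, count: titles.size };
  });
}

// Shortened copy of the sample data from the question.
const sample = [
  { title: "The Express", year: 2008, cast: "Dennis Quaid", genres: "Sports" },
  { title: "The Express", year: 2008, cast: "Rob Brown", genres: "Sports" },
  { title: "Semi-Pro", year: 2008, cast: "Will Ferrell", genres: "Sports" },
  { title: "Semi-Pro", year: 2008, cast: "Woody Harrelson", genres: "Sports" }
];

const result = countDistinctTitles(sample);
console.log(JSON.stringify(result));
// [{"_id":{"year":2008,"genre":"Sports"},"count":2}]
```

In the shell, the pipeline itself would be passed to `db.genres.aggregate(pipeline)`.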

Related

MongoDB Aggregation leap-year

I am learning MongoDB, and I have a problem with it.
Consider these documents:
{
"_id" : ObjectId("63a994974ac549c5ea982d2b"),
"title" : "Destroyer",
"year" : 2018
},
{
"_id" : ObjectId("63a994974ac549c5ea982d2a"),
"title" : "Aquaman",
"year" : 2014
},
{
"_id" : ObjectId("63a994974ac549c5ea982d29"),
"title" : "On the Basis of Sex",
"year" : 1996
},
{
"_id" : ObjectId("63a994974ac549c5ea982d28"),
"title" : "Holmes and Watson",
"year" : 1940
},
{
"_id" : ObjectId("63a994974ac549c5ea982d27"),
"title" : "Conundrum: Secrets Among Friends",
"year" : 1957
},
{
"_id" : ObjectId("63a994974ac549c5ea982d26"),
"title" : "Welcome to Marwen",
"year" : 2000
},
{
"_id" : ObjectId("63a994974ac549c5ea982d25"),
"title" : "Mary Poppins Returns",
"year" : 1997
},
{
"_id" : ObjectId("63a994974ac549c5ea982d24"),
"title" : "Bumblebee",
"year" : 2004
},
I am trying to get all titles whose year is a leap year, and I want the "count" of those titles.
So, I tried this code:
var q1 = {$project: {
leap: {
"$and": [
"$eq": ["$divide"["$year", 4], 0],
{
"$or":[{"$ne": ["$divide"["$year",100],0]},
{"$eq": ["$divide"["$year", 400],0]}]
}
]
}
}}
var q2 = {$group: {"_id": null, "total": {$sum:1}}}
var etapas = [q1,q2]
db.genres.aggregate(etapas)
But with this code, the 'leap' field produced by 'q1' is false for every document. So I do not know how to select the years for which the condition is true. I also want to count all the leap years in the documents given above.
The expected output for the documents above is this:
{
"_id": leap year
"count": 3
}
How can I do that? How do I handle the documents for which the result is false?
Thanks so much for your attention. Please ask if you need any more details.
To check if a number is divisible by another, you need to see if the remainder is 0 using $mod, such as:
db.collection.aggregate([
{$project: {
leap: {
"$and": [
{"$eq": [{"$mod": ["$year",4]}, 0]},
{"$or": [
{"$ne": [{"$mod": ["$year", 100]}, 0]},
{"$eq": [{"$mod": ["$year", 400]}, 0] }
]}
]
}
}},
{$group: {
"_id": "$leap",
"total": {$sum: 1}
}}
])
Playground
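If only the number of leap-year titles is needed, the same `$mod` test can be used as a filter instead of a grouping key. This is a sketch assuming MongoDB 3.6 or later for `$expr` (and 3.4 or later for the `$count` stage); `isLeapYear` is a hypothetical helper that mirrors the expression in plain JavaScript so the rule can be checked locally:

```javascript
// The leap-year rule from the answer, reusable as an expression.
const leapYearExpr = {
  $and: [
    { $eq: [{ $mod: ["$year", 4] }, 0] },
    { $or: [
      { $ne: [{ $mod: ["$year", 100] }, 0] },
      { $eq: [{ $mod: ["$year", 400] }, 0] }
    ] }
  ]
};

// Keep only leap-year documents, then count them.
const pipeline = [
  { $match: { $expr: leapYearExpr } },
  { $count: "count" }
];

// Hypothetical plain-JavaScript mirror of leapYearExpr.
function isLeapYear(year) {
  return year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0);
}

// The "year" values exactly as listed in the question.
const years = [2018, 2014, 1996, 1940, 1957, 2000, 1997, 2004];
const leapYears = years.filter(isLeapYear);
console.log(leapYears); // [ 1996, 1940, 2000, 2004 ]
```

With the years exactly as listed, four of them pass the test, since 1940 is also divisible by 4.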

MongoDB - is this query possible with denormalized model?

I have this simple Mongodb document:
{
"_id" : ObjectId("55663d9361cfa81a5c48d54f"),
"name" : "Oliver",
"surname" : "Queen",
"age" : 25,
"friends" : [
{
"name" : "Jhon",
"surname" : "Diggle",
"age" : "30"
},
{
"name" : "Barry",
"surname" : "Allen",
"age" : "24"
}
]
}
Is it possible, using a denormalized model as above, to find all of Oliver's friends who are 24 years old?
I think it's really simple with a normalized model; two queries are enough.
For example the following query:
db.collection.find({name:"Oliver", "friends.age":24}, {_id:0, friends:1})
returns the whole array of Oliver's friends. Is it possible to select only the matching embedded documents?
You can do this with the aggregation framework:
db.collection.aggregate(
[
{ $match: { "name": "Oliver" }},
{ $unwind: "$friends" },
{ $match: { "friends.age": 24 }},
{ $group: { "_id": "$_id", friends: { "$push": "$friends" }}},
{ $project: { "_id": 0, "friends": 1 }}
]
)
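On MongoDB 3.2 or later, `$filter` can trim the embedded array without the `$unwind` and `$group` round trip. This is a sketch under two assumptions: the collection is named `collection` as in the question, and ages are stored as strings, as they are in the sample document. Equality in MongoDB is type-sensitive, so the number 24 does not match the string "24"; compare against whichever type is actually stored. `friendsWithAge` is a hypothetical plain-JavaScript mirror of the filter:

```javascript
// Pipeline sketch: keep only the friends whose age equals "24".
const pipeline = [
  { $match: { name: "Oliver" } },
  { $project: {
    _id: 0,
    friends: {
      $filter: {
        input: "$friends",
        as: "friend",
        cond: { $eq: ["$$friend.age", "24"] } // string, as in the sample
      }
    }
  } }
];

// Hypothetical in-memory equivalent (strict, type-sensitive comparison).
function friendsWithAge(doc, age) {
  return doc.friends.filter((friend) => friend.age === age);
}

const oliver = {
  name: "Oliver",
  surname: "Queen",
  age: 25,
  friends: [
    { name: "Jhon", surname: "Diggle", age: "30" },
    { name: "Barry", surname: "Allen", age: "24" }
  ]
};

const matches = friendsWithAge(oliver, "24");
console.log(matches.map((friend) => friend.name)); // [ 'Barry' ]
```

In the shell, the pipeline would be passed to `db.collection.aggregate(pipeline)`.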

mongo query/aggregation with equality inside array

I am trying to formulate a query over the sample bios collection http://docs.mongodb.org/manual/reference/bios-example-collection/:
Retrieve all the persons who received two awards in the same year.
The expected answers are "Ole-Johan Dahl" and "Kristen Nygaard"; for instance, the document for Ole-Johan Dahl is:
{
"_id" : 5,
"name" : {
"first" : "Ole-Johan",
"last" : "Dahl"
},
"birth" : ISODate("1931-10-12T04:00:00Z"),
"death" : ISODate("2002-06-29T04:00:00Z"),
"contribs" : [
"OOP",
"Simula"
],
"awards" : [
{
"award" : "Rosing Prize",
"year" : 1999,
"by" : "Norwegian Data Association"
},
{
"award" : "Turing Award",
"year" : 2001,
"by" : "ACM"
},
{
"award" : "IEEE John von Neumann Medal",
"year" : 2001,
"by" : "IEEE"
}
]
}
So far, the best query that I could come up with is the following query using aggregation framework:
db.bios.aggregate([
{$project : { "first_name": "$name.first", "last_name": "$name.last" , "award1" :"$awards", "award2" :"$awards" } },
{$unwind : "$award1"},
{$unwind : "$award2"},
{$project : { "first_name": 1, "last_name": 1, "award1" : 1, "award2" : 1,
"super" : { $and : [ {$eq : ["$award1.year", "$award2.year"]},
{$lt: ["$award1.award", "$award2.award"]}
]
}}
},
{$match : {"super": true}}
])
However, I am not happy with this solution because:
the query projects the awards twice and unwinds both copies in the following steps, which generates quadratically many intermediate documents;
the query computes an auxiliary field "super" that is only used for filtering afterwards.
Is there a better way to formulate this query?
Try the following aggregation pipeline:
db.bios.aggregate([
{
"$unwind": "$awards"
},
{
"$group": {
"_id": {
"year": "$awards.year",
"firstName": "$name.first",
"lastName": "$name.last"
},
"count": { "$sum": 1 },
"award_recepients": { "$push": "$name" }
}
},
{
"$match": { "count": 2 }
},
{
"$project": {
"_id": 0,
"year": "$_id.year",
"award_recepients": 1,
"count": 1
}
}
])
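To see why grouping on person and year answers the question, the same tally can be written in plain JavaScript. `yearsWithRepeatedAwards` is a hypothetical helper, not part of MongoDB. It keeps years with two or more awards, whereas the pipeline above matches a count of exactly 2; that difference is an assumption about the intent:

```javascript
// Count awards per year for one person and keep the years that repeat.
function yearsWithRepeatedAwards(person) {
  const awardsPerYear = new Map();
  for (const award of person.awards) {
    awardsPerYear.set(award.year, (awardsPerYear.get(award.year) || 0) + 1);
  }
  return [...awardsPerYear]
    .filter(([, count]) => count >= 2)
    .map(([year]) => year);
}

// The awards of Ole-Johan Dahl, as shown in the question.
const dahl = {
  name: { first: "Ole-Johan", last: "Dahl" },
  awards: [
    { award: "Rosing Prize", year: 1999, by: "Norwegian Data Association" },
    { award: "Turing Award", year: 2001, by: "ACM" },
    { award: "IEEE John von Neumann Medal", year: 2001, by: "IEEE" }
  ]
};

const repeatedYears = yearsWithRepeatedAwards(dahl);
console.log(repeatedYears); // [ 2001 ]
```

The corresponding change in the pipeline would be `{ "$match": { "count": { "$gte": 2 } } }`.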

Group and count using aggregation framework

I'm trying to group and count the following structure:
[{
"_id" : ObjectId("5479c4793815a1f417f537a0"),
"status" : "canceled",
"date" : ISODate("2014-11-29T00:00:00.000Z"),
"offset" : 30,
"devices" : [
{
"name" : "Mouse",
"cost" : 150,
},
{
"name" : "Keyboard",
"cost" : 200,
}
],
},
{
"_id" : ObjectId("5479c4793815a1f417d557a0"),
"status" : "done",
"date" : ISODate("2014-10-20T00:00:00.000Z"),
"offset" : 30,
"devices" : [
{
"name" : "LCD",
"cost" : 150,
},
{
"name" : "Keyboard",
"cost" : 200,
}
],
}
,
{
"_id" : ObjectId("5479c4793815a1f417f117a0"),
"status" : "done",
"date" : ISODate("2014-12-29T00:00:00.000Z"),
"offset" : 30,
"devices" : [
{
"name" : "Headphones",
"cost" : 150,
},
{
"name" : "LCD",
"cost" : 200,
}
],
}]
I need to group and count, something like this:
"result" : [
{
"_id" : {
"status" : "canceled"
},
"count" : 1
},
{
"_id" : {
"status" : "done"
},
"count" : 2
},
totaldevicecost: 730,
],
"ok" : 1
}
My problem is calculating the sum of "cost" in the "devices" subarray. How can I do that?
It seems like you got a start on this but got lost on some of the other concepts. There are some basic truths when working with arrays in documents, but let's start where you left off:
db.sample.aggregate([
{ "$group": {
"_id": "$status",
"count": { "$sum": 1 }
}}
])
So that is just going to use the $group pipeline to gather up your documents on the different values of the "status" field and then also produce another field for "count" which of course "counts" the occurrences of the grouping key by passing a value of 1 to the $sum operator for each document found. This puts you at a point much like you describe:
{ "_id" : "done", "count" : 2 }
{ "_id" : "canceled", "count" : 1 }
That's the first stage of this and easy enough to understand, but now you need to know how to get values out of an array. You might then be tempted once you understand the "dot notation" concept properly to do something like this:
db.sample.aggregate([
{ "$group": {
"_id": "$status",
"count": { "$sum": 1 },
"total": { "$sum": "$devices.cost" }
}}
])
But what you will find is that the "total" will in fact be 0 for each of those results:
{ "_id" : "done", "count" : 2, "total" : 0 }
{ "_id" : "canceled", "count" : 1, "total" : 0 }
Why? Well MongoDB aggregation operations like this do not actually traverse array elements when grouping. In order to do that, the aggregation framework has a concept called $unwind. The name is relatively self-explanatory. An embedded array in MongoDB is much like having a "one-to-many" association between linked data sources. So what $unwind does is exactly that sort of "join" result, where the resulting "documents" are based on the content of the array and duplicated information for each parent.
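What `$unwind` produces can be mimicked in a few lines of plain JavaScript. The `unwind` function below is a hypothetical illustration, not a MongoDB API; it emits one copy of the parent document per array element, and (like the stage's default behaviour) emits nothing for a missing or empty array. The `orders` array is a trimmed copy of the sample data:

```javascript
// Hypothetical in-memory version of the $unwind stage.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

const orders = [
  { status: "canceled", devices: [{ name: "Mouse", cost: 150 }, { name: "Keyboard", cost: 200 }] },
  { status: "done", devices: [{ name: "LCD", cost: 150 }, { name: "Keyboard", cost: 200 }] },
  { status: "done", devices: [{ name: "Headphones", cost: 150 }, { name: "LCD", cost: 200 }] }
];

const unwound = unwind(orders, "devices");
console.log(unwound.length); // 6

// Each unwound document carries the parent's "status" and a single device.
console.log(unwound[0].status, unwound[0].devices.name);
```

Three parent documents with two devices each become six documents, which is why any per-parent count taken after this point is doubled.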
So in order to act on array elements you need to use $unwind first. This should logically lead you to code like this:
db.sample.aggregate([
{ "$unwind": "$devices" },
{ "$group": {
"_id": "$status",
"count": { "$sum": 1 },
"total": { "$sum": "$devices.cost" }
}}
])
And then the result:
{ "_id" : "done", "count" : 4, "total" : 700 }
{ "_id" : "canceled", "count" : 2, "total" : 350 }
But that isn't quite right, is it? Remember what you just learned about $unwind and how it does a de-normalized join with the parent information? The parent is now duplicated for every array member, and each document had two array members. So while the "total" field is correct, the "count" is twice as much as it should be in each case.
A bit more care needs to be taken, so instead of doing this in a single $group stage, it is done in two:
db.sample.aggregate([
{ "$unwind": "$devices" },
{ "$group": {
"_id": "$_id",
"status": { "$first": "$status" },
"total": { "$sum": "$devices.cost" }
}},
{ "$group": {
"_id": "$status",
"count": { "$sum": 1 },
"total": { "$sum": "$total" }
}}
])
Which now gets the result with correct totals in it:
{ "_id" : "canceled", "count" : 1, "total" : 350 }
{ "_id" : "done", "count" : 2, "total" : 700 }
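On MongoDB 3.2 or later, `$sum` also works as an expression over an array, so the per-document total can be computed inside a single `$group` without `$unwind`. This is a sketch assuming that server version; `summarize` is a hypothetical plain-JavaScript mirror used to check the numbers against a trimmed copy of the sample data:

```javascript
const pipeline = [
  { $group: {
    _id: "$status",
    count: { $sum: 1 },
    // The inner $sum adds up the costs within one document;
    // the outer $sum accumulates those totals across the group.
    total: { $sum: { $sum: "$devices.cost" } }
  } }
];

// Hypothetical in-memory equivalent of the stage above.
function summarize(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const entry = groups.get(doc.status) || { _id: doc.status, count: 0, total: 0 };
    entry.count += 1;
    entry.total += doc.devices.reduce((sum, device) => sum + device.cost, 0);
    groups.set(doc.status, entry);
  }
  return [...groups.values()];
}

const orders = [
  { status: "canceled", devices: [{ name: "Mouse", cost: 150 }, { name: "Keyboard", cost: 200 }] },
  { status: "done", devices: [{ name: "LCD", cost: 150 }, { name: "Keyboard", cost: 200 }] },
  { status: "done", devices: [{ name: "Headphones", cost: 150 }, { name: "LCD", cost: 200 }] }
];

const summary = summarize(orders);
console.log(summary);
// canceled: count 1, total 350; done: count 2, total 700
```

Because nothing is unwound, each parent document is counted once and no second `$group` is needed.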
Now the numbers are right, but it is still not exactly what you are asking for. I would think you should stop there as the sort of result you are expecting is really not suited to just a single result from aggregation alone. You are looking for the total to be "inside" the result. It really doesn't belong there, but on small data it is okay:
db.sample.aggregate([
{ "$unwind": "$devices" },
{ "$group": {
"_id": "$_id",
"status": { "$first": "$status" },
"total": { "$sum": "$devices.cost" }
}},
{ "$group": {
"_id": "$status",
"count": { "$sum": 1 },
"total": { "$sum": "$total" }
}},
{ "$group": {
"_id": null,
"data": { "$push": { "count": "$count", "total": "$total" } },
"totalCost": { "$sum": "$total" }
}}
])
And a final result form:
{
"_id" : null,
"data" : [
{
"count" : 1,
"total" : 350
},
{
"count" : 2,
"total" : 700
}
],
"totalCost" : 1050
}
But, "Do Not Do That". MongoDB limits a single BSON document to 16MB, and this response is a single document. On small results you can do this kind of convenience wrapping, but in the larger scheme of things you want the results in the earlier form, and then either run a separate query or iterate the whole result set to get the total from all documents.
You do appear to be using a MongoDB version less than 2.6, or copying output from a RoboMongo shell which does not support the latest version features. From MongoDB 2.6 though the results of aggregation can be a "cursor" rather than a single BSON array. So the overall response can be much larger than 16MB, but only when you are not compacting to a single document as results, shown for the last example.
This would be especially true in cases where you were "paging" the results, with 100's to 1000's of result lines but you just wanted a "total" to return in an API response when you are only returning a "page" of 25 results at a time.
Anyhow, that should give you a reasonable guide on how to get the type of results you are expecting from your common document form. Remember $unwind in order to process arrays, and generally $group multiple times in order to get totals at different grouping levels from your document and collection groupings.

How to query a mongo collection to return the full document with virtual fields containing calculated values from the sub-document?

I'm trying to query a collection for a specific document that contains a sub-document. From the values in that sub-document I'd like to obtain the highest and lowest scores and return them as virtual fields on the original document.
I have the following dataset:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
},
{
"name" : "vallum",
"score" : 100
},
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU"
}
In mongo 2.4, how can I query mongo once to return the following result:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
},
{
"name" : "vallum",
"score" : 100
},
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
"worst_test": {
"name" : "ipsum",
"score" : 65
},
"best_test": {
"name" : "vallum",
"score" : 100
}
}
Where "best_test" and "worst_test" are virtual fields representing the tests with the highest and lowest scores, respectively.
I've tried with many different ways and the closest I've gotten is with this query:
db.students.aggregate([
{ $match: {
'_id': 'd0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e'
}},
{ $unwind: '$tests' },
{ $sort: {'tests.score': 1} },
{ $group: {
_id: '$_id',
student_tests: {$push: "$$ROOT"},
worst_test: {$first: '$tests'},
best_test: { $last: '$tests' }
}}
]);
Which yields this result:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"student_tests" : [
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "vallum",
"score" : 100
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
],
"worst_test": {
"name" : "ipsum",
"score" : 65
},
"best_test": {
"name" : "vallum",
"score" : 100
}
}
If you are using $$ROOT then in fact you are using MongoDB 2.6 as this is an aggregation variable only introduced in that version.
But while handy for various things, all it does is represent the entire document at the present stage of the pipeline where used. To do what you want and return the original document unmodified but with additional fields, you could use it in $project stage before the $unwind to assign to the _id field, but really you don't have exactly the same document as you would still need to $project at the end in order to get the correct document shape out of those elements.
Your best bet is just projecting the fields, but keeping an unaltered copy of the array before any $sort is applied:
db.students.aggregate([
{ "$match": {
"_id": "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e"
}},
{ "$project": {
"name": 1,
"tests": 1,
"created_at": 1,
"class": 1,
"user_id": 1,
"testCopy": "$tests"
}},
{ "$unwind": "$testCopy" },
{ "$sort": { "testCopy.score": 1 } },
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"tests": { "$first": "$tests" },
"created_at": { "$first": "$created_at" },
"class": { "$first": "$class" },
"user_id": { "$first": "$user_id" },
"worst_test": { "$first": "$testCopy" },
"best_test": { "$last": "$testCopy" }
}}
]);
Or, using $$ROOT as mentioned before (alternatively, just place the fields under the _id individually in the $project):
db.students.aggregate([
{ "$match": {
"_id": "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e"
}},
{ "$project": {
"_id": "$$ROOT",
"tests": 1
}},
{ "$unwind": "$tests" },
{ "$sort": { "tests.score": 1 } },
{ "$group": {
"_id": "$_id",
"aworst_test": { "$first": "$tests" },
"abest_test": { "$last": "$tests" }
}},
{ "$project": {
"_id": "$_id._id",
"name": "$_id.name",
"tests": "$_id.tests",
"created_at": "$_id.created_at",
"class": "$_id.class",
"user_id": "$_id.user_id",
"worst_test": "$aworst_test",
"best_test": "$abest_test"
}}
]);
But as you see, you are still doing the $project work somewhere in order to get the structure you want, as well as the "renamed fields" to maintain the field order you want as the $project will otherwise "optimize" and "keep" any fields that have not been renamed and "append" new fields after the existing ones.
There really is no simple way to "get all fields" in the same way as you originally found them. Operations like $project and $group are an "all or nothing" affair, where they only explicitly produce what you tell them to.
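On MongoDB 3.4 or later, the `$addFields` stage keeps every existing field and appends new ones, which avoids re-projecting each field by hand. This is a sketch assuming that version (the question targets 2.4, where the stage does not exist). `testWithScore` and `withBestAndWorst` are hypothetical helpers: the first builds the expression, the second mirrors it in plain JavaScript. If two tests tie on score, both versions take the first one in array order:

```javascript
// Builds an expression that picks the first test whose score equals scoreExpr.
const testWithScore = (scoreExpr) => ({
  $arrayElemAt: [
    { $filter: { input: "$tests", as: "t", cond: { $eq: ["$$t.score", scoreExpr] } } },
    0
  ]
});

const pipeline = [
  { $match: { _id: "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e" } },
  { $addFields: {
    worst_test: testWithScore({ $min: "$tests.score" }),
    best_test: testWithScore({ $max: "$tests.score" })
  } }
];

// Hypothetical in-memory equivalent of the $addFields stage.
function withBestAndWorst(doc) {
  const scores = doc.tests.map((test) => test.score);
  const lowest = Math.min(...scores);
  const highest = Math.max(...scores);
  return {
    ...doc,
    worst_test: doc.tests.find((test) => test.score === lowest),
    best_test: doc.tests.find((test) => test.score === highest)
  };
}

const student = {
  _id: "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
  name: "Addison Hunt",
  tests: [
    { name: "lorem", score: 79 },
    { name: "vallum", score: 100 },
    { name: "ipsum", score: 65 }
  ],
  created_at: 1401488865684,
  class: "dolor sit amit",
  user_id: "005G5635231325O4VIAU"
};

const enriched = withBestAndWorst(student);
console.log(enriched.worst_test, enriched.best_test);
// { name: 'ipsum', score: 65 } { name: 'vallum', score: 100 }
```

The original fields, including the unsorted "tests" array, pass through untouched, and the pipeline would be run as `db.students.aggregate(pipeline)`.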