MongoDB using $sort on aggregation not sorting

I'm taking the MongoDB course and I'm on the first exercise of week 5, which consists of finding the author who has the most comments.
First I checked what the data looks like, then I started writing the query. This is what I got:
db.posts.aggregate([
{ $unwind: "$comments" },
{ $group:
{
_id: "$author",
num_posts:{ $sum:1 }
}
},
{ $sort:
{ "num_posts": -1 }
}
]);
The query works and counts the number of comments correctly, but when I try to sort the results, the sort doesn't work. I tried changing the $group stage to this:
{ $group:
{ _id: "$author" },
num_posts:{ $sum:1 }
}
But I get the error:
Error: command failed: {
"errmsg" : "exception: A pipeline stage specification object must contain exactly one field.",
"code" : 16435,
"ok" : 0
}

The problem with your query is that you are grouping by a non-existent key. To get the author with the most comments, you need to group by the author key of the comments (from the array of embedded comment subdocuments), as follows:
db.posts.aggregate([
{ "$unwind": "$comments"},
{
"$group": {
"_id": "$comments.author",
"num_posts": { "$sum": 1 }
}
},
{
"$sort": { "num_posts": -1 }
},
{ "$limit": 1 }
]);
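To see what each stage contributes, the pipeline above can be mimicked on plain arrays. This is only a minimal sketch in plain JavaScript, not MongoDB itself; the sample data and the helper name topCommenter are made up for illustration.

```javascript
// Hypothetical sample data shaped like the course's posts collection.
const coursePosts = [
  { author: "machine", comments: [{ author: "alice" }, { author: "bob" }] },
  { author: "machine", comments: [{ author: "alice" }] },
  { author: "other", comments: [{ author: "alice" }, { author: "carol" }] },
];

// Emulates $unwind -> $group by comments.author -> $sort desc -> $limit 1.
function topCommenter(docs) {
  // $unwind: one output document per element of the comments array.
  const unwound = docs.flatMap((doc) =>
    (doc.comments || []).map((comment) => ({ ...doc, comments: comment }))
  );
  // $group: count the unwound documents per comment author.
  const counts = new Map();
  for (const doc of unwound) {
    const key = doc.comments.author;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // $sort by num_posts descending, then $limit 1.
  const sorted = [...counts.entries()]
    .map(([_id, num_posts]) => ({ _id, num_posts }))
    .sort((a, b) => b.num_posts - a.num_posts);
  return sorted[0];
}

console.log(topCommenter(coursePosts)); // { _id: 'alice', num_posts: 3 }
```

Grouping on doc.author instead of doc.comments.author in this sketch would count comments per post author, which is not what the exercise asks for.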

Related

Count number of rows and get only the last row in MongoDB

I have a collection of posts as follows:
{
"author": "Rothfuss",
"text": "Name of the Wind",
"likes": 1007,
"date": ISODate("2013-03-20T11:30:05Z")
},
{
"author": "Rothfuss",
"text": "Doors of Stone",
"likes": 1,
"date": ISODate("2051-03-20T11:30:05Z")
}
I want to get the count of each author's posts along with his/her last post.
There is a SQL answer for the same question here. I am trying to find its MongoDB equivalent.
I have ended up with this query so far:
db.collection.aggregate([
{
"$group": {
"_id": "$author",
"count": {
"$sum": 1
},
"lastPost": {
"$max": {
"_id": "$date",
"post": "$text"
}
}
}
}
])
which seems to work, but different runs produce different results. It can be tested here in Mongo playground.
I don't understand how to use $max to select another property from the document containing the maximum. I am new to MongoDB, so an explanation of the basics would also be warmly appreciated.
Extra question
Is it possible to limit $sum to only count posts with more than 100 likes?
Regarding "its different runs generate different results" and "I don't understand how to use $max to select another property from the document containing the maximum":
$max does not work across multiple fields, and it is not effective on a field that holds a text/string value.
It may select any of the values from a group of posts, so the result can differ on every run.
For an accurate result, add a $sort stage before the $group stage to sort by date in descending order, and in the $group stage select each value with the $first operator:
{ $sort: { date: -1 } },
{
$group: {
_id: "$author",
count: { $sum: 1 },
date: { $first: "$date" },
post: { $first: "$text" }
}
}
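The sort-then-$first idea can be sketched on plain arrays. This is a minimal in-memory emulation in plain JavaScript, not MongoDB itself; the helper name countAndLastPost is hypothetical, and the dates from the question are represented as Date objects.

```javascript
// Sample documents from the question, with ISODate replaced by Date.
const authorPosts = [
  { author: "Rothfuss", text: "Name of the Wind", likes: 1007, date: new Date("2013-03-20T11:30:05Z") },
  { author: "Rothfuss", text: "Doors of Stone", likes: 1, date: new Date("2051-03-20T11:30:05Z") },
];

// Emulates { $sort: { date: -1 } } followed by $group with $sum and $first.
function countAndLastPost(docs) {
  const sorted = [...docs].sort((a, b) => b.date - a.date); // newest first
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc.author)) {
      // The first document seen per author is the newest one, like $first.
      groups.set(doc.author, { _id: doc.author, count: 0, date: doc.date, post: doc.text });
    }
    groups.get(doc.author).count += 1; // like { $sum: 1 }
  }
  return [...groups.values()];
}

console.log(countAndLastPost(authorPosts)); // one group: count 2, post "Doors of Stone"
```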
Regarding "Is it possible to limit $sum to only add posts with likes more than 100?":
Your requirement can be read in two ways. I am not sure which one you are asking about, so here are both solutions.
If you only want to exclude those posts from count, but still want such a post returned as the last post's date and text when it is the latest one:
Use $cond to check the condition: if likes is greater than 100, count 1; otherwise count 0.
db.collection.aggregate([
{ $sort: { date: -1 } },
{
$group: {
_id: "$author",
count: {
$sum: {
$cond: [{ $gt: ["$likes", 100] }, 1, 0]
}
},
date: { $first: "$date" },
post: { $first: "$text" }
}
}
])
If you want to exclude those posts from count and also don't want such a post returned as the last post:
Add a $match stage as the first stage to apply the greater-than condition. The final query would be:
db.collection.aggregate([
{ $match: { likes: { $gt: 100 } } },
{ $sort: { date: -1 } },
{
$group: {
_id: "$author",
count: { $sum: 1 },
date: { $first: "$date" },
post: { $first: "$text" }
}
}
])
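The difference between the two readings shows up on the question's sample data. The following is a rough in-memory sketch in plain JavaScript, not MongoDB itself; groupByAuthor is a hypothetical helper written only for this comparison.

```javascript
const likedPosts = [
  { author: "Rothfuss", text: "Name of the Wind", likes: 1007, date: new Date("2013-03-20T11:30:05Z") },
  { author: "Rothfuss", text: "Doors of Stone", likes: 1, date: new Date("2051-03-20T11:30:05Z") },
];

// Shared grouping step: newest post per author, plus a count of the
// documents for which shouldCount(doc) returns true.
function groupByAuthor(docs, shouldCount) {
  const sorted = [...docs].sort((a, b) => b.date - a.date);
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc.author)) {
      groups.set(doc.author, { _id: doc.author, count: 0, post: doc.text });
    }
    if (shouldCount(doc)) groups.get(doc.author).count += 1;
  }
  return [...groups.values()];
}

// Variant 1, like $cond inside $sum: every post can be the last post,
// but only posts with likes > 100 are counted.
const withCond = groupByAuthor(likedPosts, (doc) => doc.likes > 100);

// Variant 2, like a leading $match: posts with likes <= 100 are dropped
// before grouping, so they are neither counted nor eligible as last post.
const withMatch = groupByAuthor(likedPosts.filter((doc) => doc.likes > 100), () => true);

console.log(withCond);  // count 1, post "Doors of Stone"
console.log(withMatch); // count 1, post "Name of the Wind"
```

Both variants report the same count here, but they disagree on which post is the last one.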
Your query looks OK to me. Adding a $match stage filters out the posts that do not have likes > 100. (You could also do it inside $sum with $cond, but there is no need here.)
Query
The $max accumulator can also be used on documents.
Here you can see how MongoDB compares documents.
mongoplayground has a problem: it loses the order of fields in documents (it behaves as if they were hash maps, which they are not), so test it in your driver as well.
Test code here
db.collection.aggregate([
{
"$match": {
"likes": {
"$gt": 100
}
}
},
{
"$group": {
"_id": "$author",
"count": {
"$sum": 1
},
"lastPost": {
"$max": {
_id: "$date",
post: "$text"
}
}
}
}
])
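Why field order matters for $max on documents can be sketched as follows. MongoDB compares embedded documents field by field in the order the fields appear. The code below is a simplified illustration in plain JavaScript under stated assumptions: both documents have the same field names, and values are Dates, numbers, or strings of matching types. It is not the server's actual comparison routine, and compareDocs and maxDoc are hypothetical names.

```javascript
// Simplified ordered comparison: the first differing field decides.
// Assumption: same field names on both sides, comparable value types.
function compareDocs(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  for (let i = 0; i < Math.min(keysA.length, keysB.length); i++) {
    const va = a[keysA[i]];
    const vb = b[keysB[i]];
    if (va < vb) return -1;
    if (va > vb) return 1;
  }
  // A document with no further fields sorts before a longer one.
  return keysA.length - keysB.length;
}

// Picks the largest document under compareDocs, like $max in a group.
function maxDoc(docs) {
  return docs.reduce((best, doc) => (compareDocs(doc, best) > 0 ? doc : best));
}

const candidates = [
  { _id: new Date("2013-03-20T11:30:05Z"), post: "Name of the Wind" },
  { _id: new Date("2051-03-20T11:30:05Z"), post: "Doors of Stone" },
];
console.log(maxDoc(candidates).post); // Doors of Stone (the date decides)

// With the fields in the other order, the text decides instead of the date.
const swapped = candidates.map((doc) => ({ post: doc.post, _id: doc._id }));
console.log(maxDoc(swapped).post); // Name of the Wind
```

This is consistent with the remark above: a tool that does not preserve field order can return a different "maximum" document.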

a group specification must include an _id [duplicate]

Here is an example from the MongoDB tutorial (it uses the ZIP code collection):
db.zipcodes.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
If I replace _id with something else, such as the word Test, I get this error message:
"errmsg" : "exception: the group aggregate field 'Test' must be defined as an expression inside an object",
"code" : 15951,
"ok" : 0
Could anybody help me understand why I need _id in my command? I thought MongoDB assigns IDs automatically if the user does not provide one.
In a $group stage, _id designates the grouping condition, so you need it.
If you're familiar with the SQL world, think of it as the GROUP BY clause.
Note that in this context, too, _id really is a unique identifier in the generated collection, since by definition $group cannot produce two documents with the same value for that field.
The _id field is mandatory, but you can set it to null if you do not wish to aggregate with respect to a key or keys. Setting it to null results in a single aggregate value over the fields. It thus acts as a 'reserved word' in this context, indicating what the resulting 'identifier'/key is for each group.
In your case, grouping by _id: "$state" would result in n aggregate results of totalPop, provided there are n distinct values for state (akin to SELECT SUM() FROM table GROUP BY state). Whereas,
{ $group: { _id: null, totalPop: { $sum: "$pop" } } }
would provide a single result for totalPop (akin to SELECT SUM() FROM table).
This behaviour is well described in the group operator documentation.
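The contrast between grouping by a key and grouping by null can be sketched on a small array. This is a minimal emulation in plain JavaScript, not MongoDB itself; the sample numbers and the helper name sumPopBy are invented for illustration.

```javascript
// Hypothetical documents shaped like the zipcodes collection.
const zipSample = [
  { state: "NY", pop: 100 },
  { state: "NY", pop: 50 },
  { state: "CA", pop: 70 },
];

// Emulates { $group: { _id: <key>, totalPop: { $sum: "$pop" } } }.
// keyOf plays the role of the _id expression.
function sumPopBy(docs, keyOf) {
  const totals = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    totals.set(key, (totals.get(key) || 0) + doc.pop);
  }
  return [...totals.entries()].map(([_id, totalPop]) => ({ _id, totalPop }));
}

// _id: "$state" gives one result per distinct state.
console.log(sumPopBy(zipSample, (doc) => doc.state));
// _id: null puts every document in the same group, giving a single total.
console.log(sumPopBy(zipSample, () => null)); // [ { _id: null, totalPop: 220 } ]
```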
Let's look at the _id field within the $group stage and at some best practices for constructing _ids in group aggregation stages. Consider this query:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: {
founded_year: "$founded_year"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.founded_year": 1
}
}]).pretty()
One thing that might not be clear is why the _id field is constructed as a document. We could have done it this way as well:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: "$founded_year",
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id": 1
}
}]).pretty()
We don't do it this way because, in the output documents, it is not explicit what exactly this number means. In some cases, that may cause confusion when interpreting these documents. Another case is grouping on an _id document with multiple fields:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: {
founded_year: "$founded_year",
category_code: "$category_code"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.founded_year": 1
}
}]).pretty()
$push simply appends each element to the array being built. Often, you may need to group on a nested field promoted to the top level:
db.companies.aggregate([{
$group: {
_id: {
ipo_year: "$ipo.pub_year"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.ipo_year": 1
}
}]).pretty()
It's also perfectly fine to use an expression that resolves to a document as the _id key:
db.companies.aggregate([{
$match: {
"relationships.person": {
$ne: null
}
}
}, {
$project: {
relationships: 1,
_id: 0
}
}, {
$unwind: "$relationships"
}, {
$group: {
_id: "$relationships.person",
count: {
$sum: 1
}
}
}, {
$sort: {
count: -1
}
}])

Mongo query: group by and output only subtotals greater than 1

I am trying to group by user and email and only output the groups whose subtotal is > 1. I tried this, but it fails to compile:
db.member.aggregate(
{"$group" : {
_id : {user:"$user", email: "$email"},
count : { $sum : { if: { $gte: [ "$sum", 1 ] }, then: 1, else: 0 }
} } } )
You don't have to try to fit everything into a single $group stage. It's an aggregation "pipeline" and should be used as such. Just add a $match at the end:
db.member.aggregate([
{ "$group": {
"_id": { "user": "$user", "email": "$email" },
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } } }
])
It's basically required anyway, since you "first" accumulate and then filter, much like GROUP BY and HAVING in SQL.
Also see SQL To Aggregation Mapping Chart in the core documentation for common examples.
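The accumulate-then-filter order can be sketched in memory. This is a minimal emulation in plain JavaScript, not MongoDB itself. It keeps only groups whose count is greater than 1, as the question asks; the sample members and the helper name duplicatedMembers are hypothetical.

```javascript
// Hypothetical member documents; two of them share user and email.
const members = [
  { user: "ann", email: "ann@example.com" },
  { user: "ann", email: "ann@example.com" },
  { user: "bob", email: "bob@example.com" },
];

// Emulates $group on { user, email } with { $sum: 1 }, then a $match
// that keeps only groups with count > 1 (the SQL HAVING step).
function duplicatedMembers(docs) {
  const groups = new Map();
  for (const { user, email } of docs) {
    const key = JSON.stringify([user, email]); // compound grouping key
    if (!groups.has(key)) {
      groups.set(key, { _id: { user, email }, count: 0 });
    }
    groups.get(key).count += 1;
  }
  // The filter can only run after all counts have been accumulated.
  return [...groups.values()].filter((group) => group.count > 1);
}

console.log(duplicatedMembers(members)); // one group: ann, count 2
```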

Convert to lowercase in group aggregation

I want to return an aggregate of blog post tags and their total count. My blog posts are stored like so:
{
"_id" : ObjectId("532c323bb07ab5aace243c8e"),
"title" : "Fitframe.js - Responsive iframes made easy",
"tags" : [
"JavaScript",
"jQuery",
"RWD"
]
}
I'm then executing the following pipeline:
printjson(db.posts.aggregate(
{
$project: {
tags: 1,
count: { $add: 1 }
}
},
{
$unwind: '$tags'
},
{
$group: {
_id: '$tags',
count: {
$sum: '$count'
},
tags_lower: { $toLower: '$tags' }
}
},
{
$sort: {
_id: 1
}
}
));
So that the results are sorted correctly, I need to sort on a lowercase version of each tag. However, when executing the above code, I get the following error:
aggregate failed: {
"errmsg" : "exception: unknown group operator '$toLower'",
"code" : 15952,
"ok" : 0
}
Do I need to do another projection to add the lowercase tag?
Yes, you must add it to the projection. It will not work in $group: only specific operators such as $sum ( http://docs.mongodb.org/manual/reference/operator/aggregation-group/ ) count as $group operators and can be used at that level of the group.
You don't need to add another projection; you could fix it when you do the $group:
db.posts.aggregate(
{
$project: {
tags: 1,
count: { $add: 1 }
}
},
{
$unwind: '$tags'
},
{
$group: {
_id: { tag: '$tags', lower: { $toLower : '$tags' } },
count: {
$sum: '$count'
}
}
},
{
$sort: {
"_id.lower": 1
}
}
)
In the above example, I've preserved the original name and added the lower case version to the _id.
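The effect of sorting on the lowercase key can be checked on a small array. This is a minimal sketch in plain JavaScript, not MongoDB itself; the second sample post and the helper name tagCounts are invented for illustration.

```javascript
// Hypothetical posts; the first one uses the tags from the question.
const blogPosts = [
  { tags: ["JavaScript", "jQuery", "RWD"] },
  { tags: ["jQuery", "CSS"] },
];

// Emulates $unwind on tags, $group on { tag, lower } with a count,
// and $sort on "_id.lower".
function tagCounts(docs) {
  const counts = new Map();
  for (const tag of docs.flatMap((doc) => doc.tags)) {
    counts.set(tag, (counts.get(tag) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([tag, count]) => ({ _id: { tag, lower: tag.toLowerCase() }, count }))
    .sort((a, b) => (a._id.lower < b._id.lower ? -1 : a._id.lower > b._id.lower ? 1 : 0));
}

console.log(tagCounts(blogPosts).map((group) => group._id.tag));
// [ 'CSS', 'JavaScript', 'jQuery', 'RWD' ]
```

Sorting on the original tag instead would place "RWD" before "jQuery" in this sketch, because uppercase letters compare lower than lowercase ones.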
Add another projection step between $unwind and $group:
...
{$project: {
tags: {$toLower: '$tags'},
count: 1
}}
...
And remove tags_lower from $group