a group specification must include an _id [duplicate] - mongodb

Here is an example from the MongoDB tutorial (it uses the ZIP code collection):
db.zipcodes.aggregate( [
    { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
    { $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
If I replace _id with something else, such as the word Test, I get this error message:
"errmsg" : "exception: the group aggregate field 'Test' must be defined as an expression inside an object",
"code" : 15951,
"ok" : 0
Could anybody help me understand why I need _id in my command? I thought MongoDB assigns IDs automatically if the user does not provide one.

In a $group stage, _id is used to designate the group condition. You obviously need it.
If you're familiar with the SQL world, think of it as the GROUP BY clause.
Please note that in this context too, _id really is a unique identifier in the generated collection, since by definition $group cannot produce two documents with the same value for that field.

The _id field is mandatory, but you can set it to null if you do not wish to aggregate with respect to a key or keys. Setting it to null results in a single aggregate value over all input documents. It thus acts as a 'reserved word' in this context, indicating what the resulting 'identifier'/key is for each group.
In your case, grouping by _id: "$state" would result in n aggregate results of totalPop, provided there are n distinct values for state (akin to SELECT SUM() FROM table GROUP BY state). Whereas,
{ $group: { _id: null, totalPop: { $sum: "$pop" } } }
would provide a single result for totalPop (akin to SELECT SUM() FROM table).
This behaviour is well described in the group operator documentation.
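The difference between a key and null as _id can be modelled in plain JavaScript. This is only a sketch of the grouping semantics, not MongoDB's implementation; the helper name groupSum and the sample data are made up for illustration.

```javascript
// Plain JavaScript model of a $group stage with a single $sum accumulator.
// keyOf plays the role of the _id expression, valueOf the role of the $sum argument.
function groupSum(docs, keyOf, valueOf) {
  const totals = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    totals.set(key, (totals.get(key) || 0) + valueOf(doc));
  }
  // One output document per distinct key, as $group produces.
  return Array.from(totals, ([key, total]) => ({ _id: key, totalPop: total }));
}

// Hypothetical sample data, not taken from the real zipcodes collection.
const zipcodes = [
  { state: "NY", pop: 100 },
  { state: "NY", pop: 50 },
  { state: "CA", pop: 70 },
];

// _id: "$state" -> one result per distinct state (GROUP BY state)
const perState = groupSum(zipcodes, (d) => d.state, (d) => d.pop);

// _id: null -> every document falls into the same group (no GROUP BY)
const overall = groupSum(zipcodes, () => null, (d) => d.pop);

console.log(perState);
console.log(overall);
```

Because every document maps to the same key when _id is null, the second call returns exactly one result.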

Let's look at the _id field within the $group stage and at some best practices for constructing _ids in group aggregation stages. Consider this query:
db.companies.aggregate([{
    $match: {
        founded_year: {
            $gte: 2010
        }
    }
}, {
    $group: {
        _id: {
            founded_year: "$founded_year"
        },
        companies: {
            $push: "$name"
        }
    }
}, {
    $sort: {
        "_id.founded_year": 1
    }
}]).pretty()
One thing that might not be clear is why the _id field is constructed as a document. We could also have written it this way:
db.companies.aggregate([{
    $match: {
        founded_year: {
            $gte: 2010
        }
    }
}, {
    $group: {
        _id: "$founded_year",
        companies: {
            $push: "$name"
        }
    }
}, {
    $sort: {
        "_id": 1
    }
}]).pretty()
We avoid this form because in the resulting output documents it is not explicit what the number in _id means, which can cause confusion when interpreting them. Labelling the value inside an _id document makes it self-describing. The same approach lets us group on an _id document with multiple fields:
db.companies.aggregate([{
    $match: {
        founded_year: {
            $gte: 2010
        }
    }
}, {
    $group: {
        _id: {
            founded_year: "$founded_year",
            category_code: "$category_code"
        },
        companies: {
            $push: "$name"
        }
    }
}, {
    $sort: {
        "_id.founded_year": 1
    }
}]).pretty()
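Grouping on a multi-field _id document can be modelled in plain JavaScript as follows. This is a rough sketch only: the helper name groupPush and the sample companies are hypothetical, and serialising the key with JSON.stringify assumes the key fields always appear in the same order.

```javascript
// Model of $group with a document-valued _id and a $push accumulator.
function groupPush(docs, keyOf, valueOf) {
  const groups = new Map();
  for (const doc of docs) {
    const id = keyOf(doc);
    // Assumption: keys are plain JSON values built with a fixed field order,
    // so equal keys always serialise to the same string.
    const mapKey = JSON.stringify(id);
    if (!groups.has(mapKey)) {
      groups.set(mapKey, { _id: id, companies: [] });
    }
    groups.get(mapKey).companies.push(valueOf(doc)); // $push: "$name"
  }
  return Array.from(groups.values());
}

// Hypothetical sample data.
const companies = [
  { name: "A", founded_year: 2010, category_code: "web" },
  { name: "B", founded_year: 2010, category_code: "web" },
  { name: "C", founded_year: 2010, category_code: "mobile" },
  { name: "D", founded_year: 2011, category_code: "web" },
];

const byYearAndCategory = groupPush(
  companies,
  (d) => ({ founded_year: d.founded_year, category_code: d.category_code }),
  (d) => d.name
);

console.log(JSON.stringify(byYearAndCategory, null, 2));
```

Each distinct combination of founded_year and category_code becomes its own group, and the labelled fields inside _id say what each value means.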
$push simply appends each value to an array that is built up for its group. Often you will need to group on a nested field and promote it to the top level of the _id document:
db.companies.aggregate([{
    $group: {
        _id: {
            ipo_year: "$ipo.pub_year"
        },
        companies: {
            $push: "$name"
        }
    }
}, {
    $sort: {
        "_id.ipo_year": 1
    }
}]).pretty()
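A dotted path such as "$ipo.pub_year" reaches into an embedded document. The sketch below shows the idea in plain JavaScript; resolvePath is a hypothetical helper, it does not handle arrays, and it does not model how the server represents a missing field.

```javascript
// Resolve a dotted field path such as "ipo.pub_year" against a plain object.
// Returns undefined as soon as any step along the path is missing or null.
function resolvePath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Hypothetical sample document.
const company = { name: "Acme", ipo: { pub_year: 2012 } };

// The nested value is promoted to a top-level field of the _id document.
const groupKey = { ipo_year: resolvePath(company, "ipo.pub_year") };

console.log(groupKey);
```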
It's also perfectly fine to use an expression that resolves to a document as the _id key:
db.companies.aggregate([{
    $match: {
        "relationships.person": {
            $ne: null
        }
    }
}, {
    $project: {
        relationships: 1,
        _id: 0
    }
}, {
    $unwind: "$relationships"
}, {
    $group: {
        _id: "$relationships.person",
        count: {
            $sum: 1
        }
    }
}, {
    $sort: {
        count: -1
    }
}])

Related

Count number of rows and get only the last row in MongoDB

I have a collection of posts as follows:
{
    "author": "Rothfuss",
    "text": "Name of the Wind",
    "likes": 1007,
    "date": ISODate("2013-03-20T11:30:05Z")
},
{
    "author": "Rothfuss",
    "text": "Doors of Stone",
    "likes": 1,
    "date": ISODate("2051-03-20T11:30:05Z")
}
I want to get the count of each author's posts and his/her last post.
There is a SQL answer for the same question here. I am trying to find its MongoDB equivalent.
I have ended up with this query so far:
db.collection.aggregate([
    {
        "$group": {
            "_id": "$author",
            "count": {
                "$sum": 1
            },
            "lastPost": {
                "$max": {
                    "_id": "$date",
                    "post": "$text"
                }
            }
        }
    }
])
This seems to work, but different runs produce different results. It can be tested here in Mongo Playground.
I don't understand how to use $max to select another property from the document containing the maximum. I am new to MongoDB, so an explanation of the basics would also be warmly appreciated.
Extra question
Is it possible to limit $sum to only add posts with likes more than 100?
"its different runs generate different results."
"I don't understand how to use $max to select another property from the document containing the maximum."
$max is not meant to select across multiple fields, and it is not useful on a field that holds a text/string value.
It can end up picking a value from any post in the group, so the result may differ on every run.
For an accurate result, add a $sort stage before the $group stage to sort by date in descending order, then pick the values in the group stage with the $first operator:
{ $sort: { date: -1 } },
{
    $group: {
        _id: "$author",
        count: { $sum: 1 },
        date: { $first: "$date" },
        post: { $first: "$text" }
    }
}
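The sort-then-$first idea can be checked against the two sample posts with a plain JavaScript model. This is a sketch of the semantics only; countAndLatest is a hypothetical helper, and JavaScript Date objects stand in for ISODate values.

```javascript
// Model of: { $sort: { date: -1 } } followed by $group with $sum and $first.
function countAndLatest(posts) {
  // $sort: { date: -1 } -> newest first (copy so the input is not mutated)
  const sorted = [...posts].sort((a, b) => b.date - a.date);
  const groups = new Map();
  for (const p of sorted) {
    if (!groups.has(p.author)) {
      // $first: the first document seen for a group supplies date and post
      groups.set(p.author, { _id: p.author, count: 0, date: p.date, post: p.text });
    }
    groups.get(p.author).count += 1; // $sum: 1
  }
  return Array.from(groups.values());
}

// The two posts from the question.
const samplePosts = [
  { author: "Rothfuss", text: "Name of the Wind", likes: 1007, date: new Date("2013-03-20T11:30:05Z") },
  { author: "Rothfuss", text: "Doors of Stone", likes: 1, date: new Date("2051-03-20T11:30:05Z") },
];

const latestPerAuthor = countAndLatest(samplePosts);
console.log(latestPerAuthor);
```

Because the input is sorted newest first, the first post seen for each author is that author's latest one.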
"Is it possible to limit $sum to only add posts with likes more than 100?"
Your requirement can be read in two ways. I am not sure which one you mean, so here are both solutions.
1. You do not want such posts included in count, but you still want one of them returned as the last post's date and text if it happens to be the latest.
$cond checks the condition: if likes is greater than 100 it counts 1, otherwise it counts 0.
db.collection.aggregate([
    { $sort: { date: -1 } },
    {
        $group: {
            _id: "$author",
            count: {
                $sum: {
                    $cond: [{ $gt: ["$likes", 100] }, 1, 0]
                }
            },
            date: { $first: "$date" },
            post: { $first: "$text" }
        }
    }
])
2. You want such posts excluded from the count and also from the last post.
Add a $match stage as the first stage to check the greater-than condition. Your final query would be:
db.collection.aggregate([
    { $match: { likes: { $gt: 100 } } },
    { $sort: { date: -1 } },
    {
        $group: {
            _id: "$author",
            count: { $sum: 1 },
            date: { $first: "$date" },
            post: { $first: "$text" }
        }
    }
])
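The two readings give different answers on the sample posts, which a plain JavaScript model makes visible. This is a sketch under stated assumptions: summarize and its filterFirst option are hypothetical names, and the 100-likes threshold is the one from the question.

```javascript
// filterFirst = false models the $cond variant (count conditionally, keep every post);
// filterFirst = true models the $match variant (drop posts before sorting and grouping).
function summarize(posts, { filterFirst }) {
  const minLikes = 100;
  const input = filterFirst ? posts.filter((p) => p.likes > minLikes) : posts;
  const sorted = [...input].sort((a, b) => b.date - a.date); // $sort: { date: -1 }
  const groups = new Map();
  for (const p of sorted) {
    if (!groups.has(p.author)) {
      // $first picks date and post from the newest remaining document
      groups.set(p.author, { _id: p.author, count: 0, date: p.date, post: p.text });
    }
    // $cond: [{ $gt: ["$likes", 100] }, 1, 0]; after filtering this is always 1
    groups.get(p.author).count += p.likes > minLikes ? 1 : 0;
  }
  return Array.from(groups.values());
}

// The two posts from the question.
const likedPosts = [
  { author: "Rothfuss", text: "Name of the Wind", likes: 1007, date: new Date("2013-03-20T11:30:05Z") },
  { author: "Rothfuss", text: "Doors of Stone", likes: 1, date: new Date("2051-03-20T11:30:05Z") },
];

const withCond = summarize(likedPosts, { filterFirst: false });
const withMatch = summarize(likedPosts, { filterFirst: true });

console.log(withCond);
console.log(withMatch);
```

Both variants count one post, but only the $cond variant still reports the low-liked post as the latest one.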
Your query looks OK to me. Adding a $match stage filters out the posts whose likes are not greater than 100. (You could also do this inside $sum with $cond, but there is no need here.)
The $max accumulator can also be used on documents.
MongoDB compares documents field by field, in the order in which the fields appear.
Mongo Playground has a problem here: it loses the order of fields in documents (it behaves as if they were hash maps, which they are not), so test the query in your driver as well.
Query:
db.collection.aggregate([
    {
        "$match": {
            "likes": {
                "$gt": 100
            }
        }
    },
    {
        "$group": {
            "_id": "$author",
            "count": {
                "$sum": 1
            },
            "lastPost": {
                "$max": {
                    _id: "$date",
                    post: "$text"
                }
            }
        }
    }
])
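Why field order matters for $max on documents can be illustrated with a simplified comparator in plain JavaScript. This is only a rough model: the real BSON comparison also takes field types into account, and compareDocs and maxDoc are hypothetical helper names.

```javascript
// Simplified ordered comparison: walk the fields in insertion order,
// compare the names first and then the values.
function compareDocs(a, b) {
  const ea = Object.entries(a);
  const eb = Object.entries(b);
  const n = Math.min(ea.length, eb.length);
  for (let i = 0; i < n; i++) {
    const [ka, va] = ea[i];
    const [kb, vb] = eb[i];
    if (ka !== kb) return ka < kb ? -1 : 1;
    if (va < vb) return -1;
    if (va > vb) return 1;
  }
  return ea.length - eb.length;
}

// Model of the $max accumulator over a list of documents.
function maxDoc(docs) {
  return docs.reduce((best, d) => (compareDocs(d, best) > 0 ? d : best));
}

const d2013 = new Date("2013-03-20T11:30:05Z");
const d2051 = new Date("2051-03-20T11:30:05Z");

// Date first: the later date wins, which is what the question wants.
const dateFirst = maxDoc([
  { _id: d2013, post: "Name of the Wind" },
  { _id: d2051, post: "Doors of Stone" },
]);

// Same data with the text field first: the comparison is now decided by the string.
const textFirst = maxDoc([
  { post: "Name of the Wind", _id: d2013 },
  { post: "Doors of Stone", _id: d2051 },
]);

console.log(dateFirst.post, "/", textFirst.post);
```

If a tool reorders the fields, the maximum is decided by a different field, which would explain results that change between runs.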

Returns two results with database in one object MongoDb

I am trying to write a query that returns two results: the first should contain all boards where board.users matches user._id, and the second should contain all boards whose board._id is in a given array.
db.getCollection('boards').aggregate(
    [
        {
            $group:
            {
                _id: { users : ObjectId("59cd114cea98d9326ca1c421") },
                name: { $push: { name: "$name", _id: "$_id" } }
            }
        },
        {
            $group:
            {
                _id: { _id : { $in: [ObjectId("59cd1f9a71b8ad5f48eb74f6"), ObjectId("59ecf24ca3a60c06d06e5088")] } },
                favorite: { $push: { name: "$name", _id: "$_id" } },
            }
        }
    ]
)
I know this is badly written. Could someone please point me in the right direction and explain how to improve it?
I would start by doing a find instead of aggregate. Something like this:
db.getCollection('boards').find({$or:[CONDITION_1 , CONDITION_2]});
CONDITION_1 seems to be an $elemMatch
CONDITION_2 seems to be a $in
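One way the two conditions might be filled in is sketched below in plain JavaScript. The helper names buildBoardsFilter and splitBoards are hypothetical, and the sketch assumes that users is an array of ids and that ids are plain strings (real ObjectId values would need their own equality check).

```javascript
// Build the filter document for find(): boards the user belongs to, or favorite boards.
function buildBoardsFilter(userId, favoriteIds) {
  return {
    $or: [
      { users: { $elemMatch: { $eq: userId } } }, // CONDITION_1
      { _id: { $in: favoriteIds } },              // CONDITION_2
    ],
  };
}

// Split the returned boards into the two result lists in application code.
function splitBoards(boards, userId, favoriteIds) {
  const pick = (b) => ({ name: b.name, _id: b._id });
  return {
    name: boards.filter((b) => b.users.includes(userId)).map(pick),
    favorite: boards.filter((b) => favoriteIds.includes(b._id)).map(pick),
  };
}

// Hypothetical sample data with string ids.
const boards = [
  { _id: "b1", name: "Roadmap", users: ["u1", "u2"] },
  { _id: "b2", name: "Ideas", users: ["u3"] },
  { _id: "b3", name: "Bugs", users: ["u1"] },
];

const filter = buildBoardsFilter("u1", ["b2", "b3"]);
const result = splitBoards(boards, "u1", ["b2", "b3"]);

console.log(JSON.stringify(filter));
console.log(result);
```

The filter object would be passed to find(), and the split into two lists happens after the documents come back.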

MongoDB using $sort on aggregation not sorting

I'm taking the MongoDB course and I'm on the first exercise of week 5. The exercise consists of finding the author who has the most comments.
The first thing I did was check what the data looks like. After that I started writing the query, and this is what I got:
db.posts.aggregate([
    { $unwind: "$comments" },
    { $group:
        {
            _id: "$author",
            num_posts:{ $sum:1 }
        }
    },
    { $sort:
        { "num_posts": -1 }
    }
]);
The query works and counts the number of comments correctly, but when I try to sort the results it does not work. I tried changing the $group stage to this:
{ $group:
{ _id: "$author" },
num_posts:{ $sum:1 }
}
But I get the error:
Error: command failed: {
"errmsg" : "exception: A pipeline stage specification object must contain exactly one field.",
"code" : 16435,
"ok" : 0
}
The problem with your query is that you are grouping by a non-existent key. To get the author with the most comments, you need to group by the comments' author key (from the embedded comments subdocument array), as follows:
db.posts.aggregate([
    { "$unwind": "$comments"},
    {
        "$group": {
            "_id": "$comments.author",
            "num_posts": { "$sum": 1 }
        }
    },
    {
        "$sort": { "num_posts": -1 }
    },
    { "$limit": 1 }
]);
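The four stages can be traced on a small in-memory data set. The following plain JavaScript model is a sketch only: topCommenter is a hypothetical helper, and the sample posts are made up rather than taken from the course data.

```javascript
// Model of: $unwind comments -> $group by comments.author -> $sort desc -> $limit 1
function topCommenter(posts) {
  const counts = new Map();
  for (const post of posts) {
    // $unwind: treat every comment as its own document
    for (const comment of post.comments || []) {
      // $group with _id: "$comments.author" and num_posts: { $sum: 1 }
      counts.set(comment.author, (counts.get(comment.author) || 0) + 1);
    }
  }
  return Array.from(counts, ([author, n]) => ({ _id: author, num_posts: n }))
    .sort((a, b) => b.num_posts - a.num_posts) // $sort: { num_posts: -1 }
    .slice(0, 1);                              // $limit: 1
}

// Hypothetical sample data.
const blogPosts = [
  { author: "machine", comments: [{ author: "ann" }, { author: "bob" }] },
  { author: "machine", comments: [{ author: "ann" }] },
  { author: "machine", comments: [] },
];

const top = topCommenter(blogPosts);
console.log(top);
```

Grouping by the post-level author instead would put all three posts into a single "machine" group, which is not what the exercise asks for.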

Get original document field as part of aggregate result

I want to get all of the document fields in my aggregate results, but as soon as I use $group they are gone. Using $project lets me re-add whatever fields I have defined in $group, but I have had no luck getting the other fields:
var doc = {
    _id: '123',
    name: 'Bob',
    comments: [],
    attendances: [{
        answer: 'yes'
    }, {
        answer: 'no'
    }]
}
aggregate({
    $unwind: '$attendances'
}, {
    $match: {
        "attendances.answer": { $ne:"no" }
    }
}, {
    $group: {
        _id: '$_id',
        attendances: { $sum: 1 },
        comments: { $sum: { $size: { $ifNull: [ "$comments", [] ] }}}
    }
}, {
    $project: {
        comments: 1,
    }
})
This results in:
[{
_id: 5317b771b6504bd4a32395be,
comments: 12
},{
_id: 53349213cb41af00009a94d0,
comments: 0
}]
How do I get 'name' in there? I have tried adding to $group as:
name: '$name'
as well as in $project:
name: 1
But neither works.
You can't project fields that are removed during the $group operation.
Since you are grouping by the original document _id and there will only be one name value, you can preserve the name field using $first:
db.sample.aggregate(
    { $group: {
        _id: '$_id',
        comments: { $sum: { $size: { $ifNull: [ "$comments", [] ] }}},
        name: { $first: "$name" }
    }}
)
Example output would be:
{ "_id" : "123", "comments" : 0, "name" : "Bob" }
If you are grouping by criteria where there could be multiple values to preserve, you should either $push to an array in the $group or use $addToSet if you only want unique names.
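The difference between $first, $push and $addToSet can be seen in a plain JavaScript model. This is a sketch only: groupNames, the team field and the sample documents are hypothetical, and the model keeps insertion order for the unique names, which is not something to rely on with $addToSet.

```javascript
// Model of three accumulators applied to the name field within each group.
function groupNames(docs, keyOf) {
  const groups = new Map();
  for (const d of docs) {
    const key = keyOf(d);
    if (!groups.has(key)) {
      groups.set(key, { _id: key, first: d.name, pushed: [], unique: new Set() });
    }
    const g = groups.get(key);
    g.pushed.push(d.name); // $push: keeps every value, duplicates included
    g.unique.add(d.name);  // $addToSet: keeps each distinct value once
  }
  // Convert each Set to an array for output.
  return Array.from(groups.values(), (g) => ({ ...g, unique: [...g.unique] }));
}

// Hypothetical sample data grouped by a made-up "team" field.
const people = [
  { name: "Bob", team: "red" },
  { name: "Alice", team: "red" },
  { name: "Bob", team: "red" },
];

const namesByTeam = groupNames(people, (d) => d.team);
console.log(namesByTeam);
```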
Projecting all the fields
If you are using MongoDB 2.6 or newer and want to get all of the original document fields (not just name) without listing them individually, you can use the aggregation variable $$ROOT in place of a specific field name.
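In a pipeline this would typically look like an accumulator such as doc: { $first: "$$ROOT" } inside $group, where the output field name doc is an arbitrary choice made here for illustration. The plain JavaScript model below sketches what that produces; groupKeepingRoot is a hypothetical helper.

```javascript
// Model of $group with _id: "$_id", a comment count, and doc: { $first: "$$ROOT" }.
function groupKeepingRoot(docs) {
  const groups = new Map();
  for (const d of docs) {
    if (!groups.has(d._id)) {
      // $$ROOT refers to the whole input document, so every original field survives
      groups.set(d._id, { _id: d._id, comments: 0, doc: d });
    }
    // $sum of $size, with $ifNull supplying an empty array when comments is missing
    groups.get(d._id).comments += (d.comments || []).length;
  }
  return Array.from(groups.values());
}

// The sample document from the question, without the attendances array.
const people2 = [{ _id: "123", name: "Bob", comments: [] }];

const withRoot = groupKeepingRoot(people2);
console.log(JSON.stringify(withRoot));
```

The original fields then sit under the chosen output field, for example doc.name.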