MongoDB aggregation `$group`: Specify a default operator? - mongodb

I'm doing a $group within my aggregation pipeline, where I $push one property to an array, and for all the remaining properties I simply take the $first:
{ $group: {
'_id': '$_id',
property1: { $push: '$property1' },
property2: { $first: '$property2' },
property3: { $first: '$property3' },
property4: { $first: '$property4' },
property5: { $first: '$property5' },
property6: { $first: '$property6' },
property7: { $first: '$property7' },
// …
}},
Is there a possibility to specify this in a more concise way? I am hoping for something like the following (which is not working), to say “use $push for property1, and $first for anything else”:
{ $group: {
'_id': '$_id',
property1: { $push: '$property1' },
'*': { $first: '$*' }
}},

No, there is no wildcard syntax for this. Every field you want to keep has to be listed in the $group stage.
You can, however, avoid repeating $first for every field by wrapping the fields in a single sub-document. Something like this:
{ "$group": {
"_id": "$_id",
"property1": { "$push": "$property1" },
"data": {
"$first": {
"property2": "$property2",
"property3": "$property3",
"property4": "$property4",
"property5": "$property5",
"property6": "$property6",
"property7": "$property7"
}
}
}}
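If the field names are known in application code, the stage can also be generated rather than typed out. The following is only a sketch: buildGroupStage is a hypothetical helper (not part of MongoDB or any driver) that returns a plain object you would then pass into your pipeline.

```javascript
// Hypothetical helper, not a MongoDB API: builds a $group stage that uses
// $push for one field and $first for every other listed field.
function buildGroupStage(pushField, firstFields) {
  const spec = { _id: '$_id', [pushField]: { $push: '$' + pushField } };
  for (const field of firstFields) {
    spec[field] = { $first: '$' + field };
  }
  return { $group: spec };
}

// Example: $push property1, $first for property2 and property3.
const generatedStage = buildGroupStage('property1', ['property2', 'property3']);
console.log(JSON.stringify(generatedStage, null, 2));
```

The field list still has to come from somewhere (a schema definition, for example), so this only moves the repetition out of the pipeline text.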

As stated above by Anthony Winzlet, there's no way to achieve that through the $group operator alone. However, I used the following workaround, which saved me from having to list all those properties explicitly. It's probably not worth the hassle for a small number of properties, or if you don't need the flexibility to handle properties that might be added later, but in my case it made sense.
Here’s the idea of the aggregation pipeline:
Use $addFields to add the entire root document as a temporary copy.
Specify the group stage, where you add the $push for the desired property, and for the entire copied sub-document from before, use the $first operator.
Use $replaceRoot and $mergeObjects to take the copied root document, and replace the property with the $push aggregation.
Here’s an example:
db.getCollection('test').aggregate([
{ $addFields: { tempRoot: '$$ROOT' } },
{ $group: { '_id': '$property1', 'property2': { $push: '$property2' }, 'tempRoot': { $first: '$tempRoot' } } },
{ $replaceRoot: { newRoot: { $mergeObjects: [ '$tempRoot', { property2: '$property2' } ] } } }
]);
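To make the effect of the three stages easier to follow, here is a plain JavaScript emulation of what they compute. It is an illustration only, not how MongoDB implements it; the function name and the sample documents are made up, and it assumes the grouping key is a primitive value.

```javascript
// Illustration only: emulates $addFields + $group + $replaceRoot in plain JS.
function groupKeepingFirst(docs, groupKey, pushKey) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[groupKey];
    if (!groups.has(key)) {
      // $addFields + $first: remember the first full document of the group
      groups.set(key, { tempRoot: doc, pushed: [] });
    }
    // $push: collect the chosen property from every document in the group
    groups.get(key).pushed.push(doc[pushKey]);
  }
  // $replaceRoot + $mergeObjects: first document, overridden by the pushed array
  return [...groups.values()].map(g => ({ ...g.tempRoot, [pushKey]: g.pushed }));
}

// Made-up sample documents
const sampleDocs = [
  { _id: 1, property1: 'a', property2: 10, property3: 'x' },
  { _id: 2, property1: 'a', property2: 20, property3: 'y' },
  { _id: 3, property1: 'b', property2: 30, property3: 'z' }
];
console.log(groupKeepingFirst(sampleDocs, 'property1', 'property2'));
```

Note that each result keeps the _id of the first document in its group, because the merged tempRoot carries the original _id along.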

Related

Using $map in aggregate $group

I need to analyze some MongoDB collections. What I need is to extract the field names and values of a collection.
Here's how far I got:
db.collection(coll.name)
.aggregate([
{ $project: { arrayofkeyvalue: { $objectToArray: '$$ROOT' } } },
{ $unwind: '$arrayofkeyvalue' },
{
$group: {
_id: null,
allkeys: { $addToSet: '$arrayofkeyvalue.k' },
},
},
])
.toArray();
This works quite nicely. I get all the keys. However I'd like to get the values too.
So, I thought "piece o' cake" and replaced the allkeys section with the allkeysandvalues section, which is supposed to create a map with key and value pairs.
Like this:
db.collection(coll.name)
.aggregate([
{ $project: { arrayofkeyvalue: { $objectToArray: '$$ROOT' } } },
{ $unwind: '$arrayofkeyvalue' },
{
$group: {
_id: null,
allkeysandvalues: {
$map: {
input: '$arrayofkeyvalue',
as: 'kv',
in: {
k: '$$kv.k',
v: '$$kv.v',
},
},
},
},
},
])
.toArray();
But that's not working. I get the error message
MongoError: unknown group operator '$map'
Does anyone know how to solve this?
The $group pipeline stage requires an accumulator expression, so you have to use $push instead of $map:
{
$group: {
_id: null,
allkeysandvalues: {
$push: "$arrayofkeyvalue"
}
}
}
or
{
$group: {
_id: null,
allkeysandvalues: {
$push: {
k: "$arrayofkeyvalue.k",
v: "$arrayofkeyvalue.v"
}
}
}
}
which returns the same result.
Please note that arrayofkeyvalue is a single object rather than an array at this point, since you run $unwind prior to $group.
MongoError: unknown group operator '$map'
You cannot use the $map operator directly at the top level of a $group stage.
You can try adding one more $group stage:
$group by k (key) and take the first v (value)
$group by null and construct the array of key-value pairs
$arrayToObject converts the key-value pair array to an object
db.collection(coll.name).aggregate([
{ $project: { arrayofkeyvalue: { $objectToArray: "$$ROOT" } } },
{ $unwind: "$arrayofkeyvalue" },
{
$group: {
_id: "$arrayofkeyvalue.k",
value: { $first: "$arrayofkeyvalue.v" }
}
},
{
$group: {
_id: null,
allkeysandvalues: { $push: { k: "$_id", v: "$value" } }
}
},
{ $project: { allkeysandvalues: { $arrayToObject: "$allkeysandvalues" } } }
])
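As a rough illustration of what this two-stage grouping produces, here is the same idea in plain JavaScript. It is a sketch only: firstValuePerKey and the sample documents are made up, and unlike this loop, MongoDB does not guarantee which value $first sees unless the input is sorted.

```javascript
// Illustration only: collect every top-level key with the first value seen for it.
function firstValuePerKey(docs) {
  const firstSeen = new Map();
  for (const doc of docs) {
    // Object.entries plays the role of $objectToArray + $unwind
    for (const [k, v] of Object.entries(doc)) {
      // first $group: one entry per key, keeping the first value
      if (!firstSeen.has(k)) firstSeen.set(k, v);
    }
  }
  // second $group + $arrayToObject: fold the pairs back into one object
  return Object.fromEntries(firstSeen);
}

// Made-up sample documents
console.log(firstValuePerKey([{ _id: 1, a: 'x' }, { _id: 2, b: true, a: 'y' }]));
```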

Sort MongoDB Document by field/key name

I'm reviewing my MongoDB documents using Robo 3T, and I'd like to sort the keys in the document by their name.
My document might look like
{"Y":3,"X":"Example","A":{"complex_obj":{}}}
and when I run a find query and apply a sort to it, I'd like the returned document to look like this: {"A":{"complex_obj":{}},"X":"Example","Y":3}
Is there a way to sort the returned keys / fields of a document? All the examples I see are for applying sort based on the value of a field, rather than the name of the key.
I'm not sure why the order of fields matters in a JSON document, but you can try the aggregation query below:
db.collection.aggregate([
{
$project: { data: { $objectToArray: "$$ROOT" } }
},
{
$unwind: "$data"
},
{
$sort: { "data.k": 1 }
},
{
$group: { _id: "$_id", data: { $push: "$$ROOT.data" } }
},
{
$replaceRoot: { newRoot: { $arrayToObject: "$data" } }
},
{
$project: { _id: 0 }
}
])
There is a way but you won't like it. Technically you can do it with aggregation by converting objects to arrays, unwinding, sorting, grouping it back and converting the group to the object:
db.collection.aggregate([
{
$project: {
o: {
$objectToArray: "$$ROOT"
}
}
},
{
$unwind: "$o"
},
{
$sort: {
"o.k": 1
}
},
{
$group: {
_id: "$_id",
o: {
$push: "$o"
}
}
},
{
$replaceRoot: {
newRoot: {
$arrayToObject: "$o"
}
}
}
])
but you don't want to do it. Too much hassle, too expensive, too little benefit.
By design, Mongo preserves the order of keys as they were inserted, apart from _id and a few other edge cases.
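Since key order only matters for display here, another option is to sort the keys in client code after the query returns. This is a minimal sketch assuming a JavaScript environment with Object.fromEntries; sortKeys is a made-up name, and it only sorts the top level of the document.

```javascript
// Returns a copy of the document with its top-level keys in ascending order.
function sortKeys(doc) {
  const entries = Object.entries(doc).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return Object.fromEntries(entries);
}

const sortedDoc = sortKeys({ Y: 3, X: 'Example', A: { complex_obj: {} } });
console.log(JSON.stringify(sortedDoc));
// prints {"A":{"complex_obj":{}},"X":"Example","Y":3}
```

One caveat: JavaScript objects always list integer-like keys (such as "1" or "42") first in numeric order, so this approach only gives a purely alphabetical order for ordinary string keys.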

a group specification must include an _id [duplicate]

Here is an example from the MongoDB tutorial (it uses the ZIP code collection):
db.zipcodes.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
If I replace _id with something else, like the word Test, I get this error message:
"errmsg" : "exception: the group aggregate field 'Test' must be defined as an expression inside an object",
"code" : 15951,
"ok" : 0
Could anybody help me understand why I need _id in my command? I thought MongoDB assigns IDs automatically if the user does not provide one.
In a $group stage, _id is used to designate the group condition. You obviously need it.
If you're familiar with the SQL world, think of it as the GROUP BY clause.
Please note that in this context, too, _id really is a unique identifier in the generated collection, since by definition $group cannot produce two documents with the same value for that field.
The _id field is mandatory, but you can set it to null if you do not wish to aggregate with respect to a key, or keys. Setting it to null results in a single aggregate value over all input documents. It thus acts as a 'reserved word' in this context, indicating what the resulting 'identifier'/key is for each group.
In your case, grouping by _id: "$state" would result in n aggregate results of totalPop, provided there are n distinct values for state (akin to SELECT SUM() FROM table GROUP BY state). Whereas,
{ $group : {_id : null, totalPop: { $sum: "$pop" }}}
would provide a single result for totalPop (akin to SELECT SUM() FROM table).
This behaviour is well described in the group operator documentation.
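The difference between grouping by a key and grouping by null can be sketched in plain JavaScript. This is an illustration under assumptions, not MongoDB code: sumPop and the three sample documents are made up.

```javascript
// Illustration only: sums `pop` per group, where keyFn decides the group key
// in the same way the _id expression does in $group.
function sumPop(docs, keyFn) {
  const totals = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    totals.set(key, (totals.get(key) || 0) + doc.pop);
  }
  return [...totals].map(([_id, totalPop]) => ({ _id, totalPop }));
}

// Made-up sample data
const sampleZipcodes = [
  { state: 'NY', pop: 100 },
  { state: 'CA', pop: 250 },
  { state: 'NY', pop: 50 }
];

// Like _id: "$state": one result per distinct state
console.log(sumPop(sampleZipcodes, doc => doc.state));
// Like _id: null: every document falls into the same single group
console.log(sumPop(sampleZipcodes, () => null));
```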
Let's take a closer look at the _id field within the $group stage and at some best practices for constructing _ids in group aggregation stages. Consider this query:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: {
founded_year: "$founded_year"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.founded_year": 1
}
}]).pretty()
One thing that might not be clear is why the _id field is constructed as a document. We could also have done it this way:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: "$founded_year",
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id": 1
}
}]).pretty()
We don't do it this way because, in the output documents, it's not explicit what exactly this number means, and in some cases that may cause confusion when interpreting these documents. Another case is grouping on an _id document with multiple fields:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: {
founded_year: "$founded_year",
category_code: "$category_code"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.founded_year": 1
}
}]).pretty()
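To see what a multi-field _id means for the output, here is a plain JavaScript sketch of the same grouping. It is an illustration only: groupNames and the sample companies are made up, and the composite key is serialised with JSON.stringify because JavaScript Maps compare object keys by reference.

```javascript
// Illustration only: groups company names under a compound _id document.
function groupNames(companies, keyFields) {
  const groups = new Map();
  for (const company of companies) {
    // Build the _id document from the chosen fields
    const id = Object.fromEntries(keyFields.map(f => [f, company[f]]));
    const mapKey = JSON.stringify(id);
    if (!groups.has(mapKey)) groups.set(mapKey, { _id: id, companies: [] });
    // $push: append the name to the group's array
    groups.get(mapKey).companies.push(company.name);
  }
  return [...groups.values()];
}

// Made-up sample data
const sampleCompanies = [
  { name: 'A', founded_year: 2010, category_code: 'web' },
  { name: 'B', founded_year: 2010, category_code: 'web' },
  { name: 'C', founded_year: 2010, category_code: 'mobile' }
];
console.log(JSON.stringify(groupNames(sampleCompanies, ['founded_year', 'category_code'])));
```

Each distinct combination of the key fields yields one output document, and the field names inside _id say what the values mean.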
$push simply appends the elements to the generated arrays. Often it is necessary to group on a nested field that is promoted to the top level:
db.companies.aggregate([{
$group: {
_id: {
ipo_year: "$ipo.pub_year"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.ipo_year": 1
}
}]).pretty()
It's also perfectly fine to use an expression that resolves to a document as the _id key:
db.companies.aggregate([{
$match: {
"relationships.person": {
$ne: null
}
}
}, {
$project: {
relationships: 1,
_id: 0
}
}, {
$unwind: "$relationships"
}, {
$group: {
_id: "$relationships.person",
count: {
$sum: 1
}
}
}, {
$sort: {
count: -1
}
}])

Return original documents only from mongoose group/aggregation operation

I have a filter + group operation on a bunch of documents (books). The grouping is to return only latest versions of books that share the same book_id (name). The below code works, but it's untidy since it returns redundant information:
return Book.aggregate([
{ $match: generateMLabQuery(rawQuery) },
{
$sort: {
"published_date": -1
}
},
{
$group: {
_id: "$book_id",
books: {
$first: "$$ROOT"
}
}
}
])
I end up with an array of objects that looks like this:
[{ _id: "aedrtgt6854earg864", books: { singleBookObject } }, {...}, {...}]
Essentially I only need the singleBookObject part, which is the original document (and what I'd be getting if I had done only the $match operation). Is there a way to get rid of the redundant _id and books parts within the aggregation pipeline?
You can use $replaceRoot:
Book.aggregate([
{ "$match": generateMLabQuery(rawQuery) },
{ "$sort": { "published_date": -1 }},
{ "$group": {
"_id": "$book_id",
"books": { "$first": "$$ROOT" }
}},
{ "$replaceRoot": { "newRoot": "$books" } }
])
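For intuition, this is roughly what the $sort, $group with $first, and $replaceRoot combination returns, written as plain JavaScript. It is only a sketch: latestPerBook and the sample books are made up, and published_date is assumed to be a Date.

```javascript
// Illustration only: keeps the most recently published document per book_id.
function latestPerBook(books) {
  // $sort: newest first (copy the array so the input is not mutated)
  const sorted = [...books].sort((a, b) => b.published_date - a.published_date);
  const latest = new Map();
  for (const book of sorted) {
    // $group with $first: only the first (newest) document per book_id is kept
    if (!latest.has(book.book_id)) latest.set(book.book_id, book);
  }
  // $replaceRoot: return the original documents, without the wrapper
  return [...latest.values()];
}

// Made-up sample data
const sampleBooks = [
  { book_id: 'a', version: 1, published_date: new Date('2019-01-01') },
  { book_id: 'a', version: 2, published_date: new Date('2021-01-01') },
  { book_id: 'b', version: 1, published_date: new Date('2020-01-01') }
];
console.log(latestPerBook(sampleBooks));
```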
