Get original document field as part of aggregate result - mongodb

I want to get all of the document fields in my aggregate results, but as soon as I use $group they are gone. Using $project lets me re-add whatever fields I have defined in $group, but I have had no luck getting the other fields:
var doc = {
_id: '123',
name: 'Bob',
comments: [],
attendances: [{
answer: 'yes'
}, {
answer: 'no'
}]
}
db.collection.aggregate([{
  $unwind: '$attendances'
}, {
  $match: {
    "attendances.answer": { $ne: "no" }
  }
}, {
  $group: {
    _id: '$_id',
    attendances: { $sum: 1 },
    comments: { $sum: { $size: { $ifNull: [ "$comments", [] ] } } }
  }
}, {
  $project: {
    comments: 1
  }
}])
This results in:
[{
_id: 5317b771b6504bd4a32395be,
comments: 12
},{
_id: 53349213cb41af00009a94d0,
comments: 0
}]
How do I get 'name' in there? I have tried adding this to $group:
name: '$name'
as well as this to $project:
name: 1
but neither works.

You can't project fields that are removed during the $group operation.
Since you are grouping by the original document _id and there will only be one name value, you can preserve the name field using $first:
db.sample.aggregate(
{ $group: {
_id: '$_id',
comments: { $sum: { $size: { $ifNull: [ "$comments", [] ] }}},
name: { $first: "$name" }
}}
)
Example output would be:
{ "_id" : "123", "comments" : 0, "name" : "Bob" }
If you are grouping by criteria where there could be multiple values to preserve, you should either $push the values into an array in the $group or use $addToSet if you only want unique names.
Projecting all the fields
If you are using MongoDB 2.6 or newer and want to get all of the original document fields (not just name) without listing them individually, you can use the aggregation variable $$ROOT in place of a specific field name.
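A minimal sketch of that idea: the stage below keeps the whole original document under a field named `doc` (the field name is an assumption; any name works) via `{ $first: "$$ROOT" }`. Because the stage needs a running server, it is followed by a hypothetical plain-JavaScript function, `groupFirstRoot`, that only mimics the grouping in memory so the result shape is visible. The function and the sample data are illustrative, not part of MongoDB.

```javascript
// The stage you would pass to db.sample.aggregate() on MongoDB 2.6+ (shown for reference only).
const pipeline = [
  {
    $group: {
      _id: "$_id",
      comments: { $sum: { $size: { $ifNull: ["$comments", []] } } },
      doc: { $first: "$$ROOT" } // the entire original document
    }
  }
];

// Hypothetical in-memory emulation of that $group stage; not a MongoDB API.
// Keys are compared with ===, so this assumes primitive _id values such as strings.
function groupFirstRoot(docs) {
  const groups = new Map();
  for (const d of docs) {
    // $ifNull + $size: a missing or null comments field counts as 0
    const size = Array.isArray(d.comments) ? d.comments.length : 0;
    if (!groups.has(d._id)) {
      // $first: keep the first document seen for this _id
      groups.set(d._id, { _id: d._id, comments: 0, doc: d });
    }
    groups.get(d._id).comments += size; // $sum
  }
  return [...groups.values()];
}

const sample = [{ _id: "123", name: "Bob", comments: [], attendances: [{ answer: "yes" }] }];
console.log(groupFirstRoot(sample)[0].doc.name); // Bob
```

With this shape, the original name is available as doc.name in any later stage.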

Related

concat array fields of all matching documents mongodb

I have below document structure in mongodb:
{name: String, location: [String]}
Example documents are:
{name: "XYZ", location: ["A","B","C","D"]},
{name: "XYZ", location: ["M","N"]},
{name: "ABC", location: ["P","Q","R","S"]}
I want to write a query that, when searching for a specific name, concatenates the location arrays of all matching documents. For example, if I search for name XYZ, I should get:
{name:"XYZ",location:["A","B","C","D","M","N"]}
I guess this is possible with an aggregation, perhaps using the $unwind operator, but I am unable to frame the query.
Please help me to frame the query.
Thanks!
$match - Filter document(s).
$group - Group by name and add each location array to the location field via $push. This produces a location field whose value is a nested array (an array of arrays).
$project - Shape the output document. The $reduce operator flattens the nested location array into a single array by combining the inner arrays with $concatArrays.
db.collection.aggregate([
{
$match: {
name: "XYZ"
}
},
{
$group: {
_id: "$name",
location: {
$push: "$location"
}
}
},
{
$project: {
_id: 0,
name: "$_id",
location: {
$reduce: {
input: "$location",
initialValue: [],
in: {
$concatArrays: [
"$$value",
"$$this"
]
}
}
}
}
}
])
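The three stages can be mimicked in plain JavaScript to see what each one produces. This is a hypothetical in-memory sketch (the function name `concatLocations` is made up), not MongoDB code: `Array.prototype.reduce` stands in for `$reduce` and `concat` for `$concatArrays`. Element order here follows the input order; in MongoDB, without a $sort, the $push order follows the order in which documents reach $group.

```javascript
// Hypothetical in-memory emulation of $match -> $group ($push) -> $project ($reduce).
function concatLocations(docs, name) {
  // $match: keep only documents with the requested name
  const matched = docs.filter((d) => d.name === name);
  // $group with $push: collect each location array, producing an array of arrays
  const pushed = matched.map((d) => d.location);
  // $project with $reduce: start from [] ($$value) and append each inner array ($$this)
  const location = pushed.reduce((value, current) => value.concat(current), []);
  // Note: the real pipeline returns no document at all when nothing matches;
  // this sketch returns an empty location array instead.
  return { name, location };
}

const docs = [
  { name: "XYZ", location: ["A", "B", "C", "D"] },
  { name: "XYZ", location: ["M", "N"] },
  { name: "ABC", location: ["P", "Q", "R", "S"] }
];
console.log(JSON.stringify(concatLocations(docs, "XYZ")));
// {"name":"XYZ","location":["A","B","C","D","M","N"]}
```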
This should do the trick:
Match the required docs. Unwind the location array. Group by name, and project the necessary output.
db.collection.aggregate([
{
"$match": {
name: "XYZ"
}
},
{
"$unwind": "$location"
},
{
"$group": {
"_id": "$name",
"location": {
"$push": "$location"
}
}
},
{
"$project": {
name: "$_id",
location: 1,
_id: 0
}
}
])
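For comparison, here is a hypothetical in-memory sketch of the $unwind approach (the `unwindAndPush` name is made up; this is not MongoDB code). Because $unwind emits one document per array element, $push collects plain values and no flattening step is needed afterwards.

```javascript
// Hypothetical in-memory emulation of $match -> $unwind -> $group ($push) -> $project.
function unwindAndPush(docs, name) {
  const groups = new Map();
  for (const d of docs) {
    if (d.name !== name) continue; // $match
    // $unwind: one entry per array element; missing or empty arrays produce nothing
    for (const loc of d.location || []) {
      if (!groups.has(d.name)) groups.set(d.name, []);
      groups.get(d.name).push(loc); // $group on name, $push each scalar element
    }
  }
  // $project: expose the group key as name and drop _id
  return [...groups].map(([key, location]) => ({ name: key, location }));
}

const docs = [
  { name: "XYZ", location: ["A", "B", "C", "D"] },
  { name: "XYZ", location: ["M", "N"] },
  { name: "ABC", location: ["P", "Q", "R", "S"] }
];
console.log(JSON.stringify(unwindAndPush(docs, "XYZ")));
// [{"name":"XYZ","location":["A","B","C","D","M","N"]}]
```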

How do you convert an array of ObjectIds into an array of embedded documents with a field containing the original array element value

I have a collection of documents where one of the fields is currently an array of ObjectId items.
{
_id: ObjectId(...),
user: "jdoe",
docs: [
ObjectId(1),
ObjectId(2),
...
]
}
{
_id: ObjectId(...),
user: "jsmith",
docs: [
ObjectId(3),
ObjectId(4),
...
]
}
How can I update all of the documents in my collection to convert the docs field into an array of objects that contain a "docID" field equal to the original element value?
For example, I'd want my documents to end up looking like:
{
_id: ObjectId(...),
user: "jdoe",
docs: [
{ docID: ObjectId(1) },
{ docID: ObjectId(2) },
...
]
}
{
_id: ObjectId(...),
user: "jsmith",
docs: [
{ docID: ObjectId(3)},
{ docID: ObjectId(4)},
...
]
}
I'm hoping there is a command that I can run from the shell such as:
db.getCollection('myCollection').update(
{},
{
$set: {
'docs.$[]': { docID: '$$VALUE' }
}
},
{multi: true }
);
But I can't figure out how to reference the original value of the element.
Update:
I'm marking @mickl's answer as correct since it put me on the right track. Below is the final aggregation I ended up with. It only changes the docs field if its first element is an ObjectId; otherwise the existing value is left as-is, including in documents that don't have a docs field.
db.getCollection('myCollection').aggregate([
{ $addFields: {
'docs': { $cond: {
if : { $eq: [{ $type: { $arrayElemAt: [ '$docs', 0]} }, "objectId"]},
then: { $map: {
input: '$docs',
in: { docID: '$$this' }
}},
else : '$docs'
}}
}},
{ $out: "myCollection" }
])
You can use $map to reshape your data and $out to replace existing collection with aggregation result:
db.col.aggregate([
{
$addFields: {
docs: {
$map: {
input: "$docs",
in: { docID: "$$this" }
}
}
}
},
{ $out: "col" }
])
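The $map reshaping, combined with the first-element type check from the question's update, can be mimicked in plain JavaScript. This is a hypothetical sketch: `FakeObjectId` is a made-up stand-in for the shell's ObjectId, and `wrapDocs` handles one document at a time rather than a whole collection. It also shows why the check is useful: running the conversion a second time leaves already-converted documents unchanged.

```javascript
// Made-up stand-in for ObjectId so the sketch runs without a MongoDB driver.
class FakeObjectId {
  constructor(hex) {
    this.hex = hex;
  }
}

// Emulates { $cond: { if: <first element is an ObjectId>,
//                     then: { $map: { input: "$docs", in: { docID: "$$this" } } },
//                     else: "$docs" } } for a single document.
function wrapDocs(doc) {
  const first = Array.isArray(doc.docs) ? doc.docs[0] : undefined;
  if (!(first instanceof FakeObjectId)) return doc; // missing, empty or already converted
  return { ...doc, docs: doc.docs.map((id) => ({ docID: id })) }; // $map over each element
}

const before = { user: "jdoe", docs: [new FakeObjectId("1"), new FakeObjectId("2")] };
const after = wrapDocs(before);
console.log(after.docs[0].docID.hex); // 1
console.log(wrapDocs(after) === after); // true: a second run changes nothing
```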

a group specification must include an _id [duplicate]

Here is an example from the MongoDB tutorial (it uses the ZIP code collection, zipcodes):
db.zipcodes.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
If I replace _id with something else, such as the word Test, I get this error message:
"errmsg" : "exception: the group aggregate field 'Test' must be defined as an expression inside an object",
"code" : 15951,
"ok" : 0
Could anybody help me understand why I need _id in my command? I thought MongoDB assigns IDs automatically if the user does not provide one.
In a $group stage, _id designates the grouping condition, so it is required.
If you're familiar with the SQL world, think of it as the GROUP BY clause.
Note that in this context, too, _id is a unique identifier in the generated output, since by definition $group cannot produce two documents with the same value for that field.
The _id field is mandatory, but you can set it to null if you do not wish to group by a key (or keys); the result is then a single aggregate value computed over all input documents. In this context _id thus acts as a reserved word that specifies the resulting identifier (key) of each group.
In your case, grouping by _id: "$state" would produce n totalPop results, provided there are n distinct values for state (akin to SELECT SUM(pop) FROM table GROUP BY state). Whereas
{ $group: { _id: null, totalPop: { $sum: "$pop" } } }
would produce a single totalPop result (akin to SELECT SUM(pop) FROM table).
This behaviour is well described in the group operator documentation.
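The difference between grouping on a key and grouping on null can be mimicked in memory. The `groupSum` function and the ZIP data below are hypothetical, not MongoDB code; the `keyFn` argument plays the role of the _id expression.

```javascript
// Hypothetical in-memory emulation of { $group: { _id: <key>, totalPop: { $sum: "$pop" } } }.
// keyFn plays the role of the _id expression; returning null puts every document in one group.
function groupSum(docs, keyFn) {
  const totals = new Map();
  for (const d of docs) {
    const key = keyFn(d);
    totals.set(key, (totals.get(key) || 0) + d.pop); // $sum: "$pop"
  }
  return [...totals].map(([_id, totalPop]) => ({ _id, totalPop }));
}

// Made-up sample data.
const zips = [
  { state: "NY", pop: 100 },
  { state: "CA", pop: 250 },
  { state: "NY", pop: 50 }
];
console.log(groupSum(zips, (d) => d.state)); // like _id: "$state": two groups, NY 150 and CA 250
console.log(groupSum(zips, () => null)); // like _id: null: one group with totalPop 400
```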
Let's look at the _id field within the $group stage and at some best practices for constructing _id values in $group stages, starting with this query:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: {
founded_year: "$founded_year"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.founded_year": 1
}
}]).pretty()
One thing that may not be clear is why the _id field is constructed as a document. We could have written it this way as well:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: "$founded_year",
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id": 1
}
}]).pretty()
We avoid this form because, in the output documents, it is not explicit what the number in _id means, which can cause confusion when interpreting the results. A document-valued _id also lets us group on multiple fields:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: {
founded_year: "$founded_year",
category_code: "$category_code"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.founded_year": 1
}
}]).pretty()
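A hypothetical in-memory sketch of the compound-key grouping follows (the function name and sample companies are made up; this is not MongoDB code). MongoDB compares embedded _id documents by their fields and values; here that is approximated by serializing each _id with JSON.stringify, which works because the fields are always built in the same order.

```javascript
// Hypothetical emulation of $match (founded_year >= 2010) -> $group on a compound _id -> $sort.
function groupByYearAndCategory(companies) {
  const groups = new Map();
  for (const c of companies) {
    if (!(c.founded_year >= 2010)) continue; // $match: { founded_year: { $gte: 2010 } }
    const _id = { founded_year: c.founded_year, category_code: c.category_code };
    // Fields are always built in the same order, so equal _id documents serialize identically
    const key = JSON.stringify(_id);
    if (!groups.has(key)) groups.set(key, { _id, companies: [] });
    groups.get(key).companies.push(c.name); // $push: "$name"
  }
  // $sort: { "_id.founded_year": 1 }; JS sort is stable, MongoDB does not promise tie order
  return [...groups.values()].sort((a, b) => a._id.founded_year - b._id.founded_year);
}

const sampleCompanies = [
  { name: "A", founded_year: 2011, category_code: "web" },
  { name: "B", founded_year: 2010, category_code: "web" },
  { name: "C", founded_year: 2011, category_code: "web" },
  { name: "D", founded_year: 2011, category_code: "games" },
  { name: "E", founded_year: 2005, category_code: "web" }
];
console.log(JSON.stringify(groupByYearAndCategory(sampleCompanies).map((g) => g.companies)));
// [["B"],["A","C"],["D"]]
```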
$push appends each value to the array being built for its group. We can also group on fields nested inside subdocuments, promoting them to the top level of _id:
db.companies.aggregate([{
$group: {
_id: {
ipo_year: "$ipo.pub_year"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.ipo_year": 1
}
}]).pretty()
It is also perfectly valid to use an expression that resolves to a document as the _id key:
db.companies.aggregate([{
$match: {
"relationships.person": {
$ne: null
}
}
}, {
$project: {
relationships: 1,
_id: 0
}
}, {
$unwind: "$relationships"
}, {
$group: {
_id: "$relationships.person",
count: {
$sum: 1
}
}
}, {
$sort: {
count: -1
}
}])
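The last query groups on a value that is itself a subdocument. A hypothetical in-memory sketch of its $unwind, $group and $sort stages is below (function name and data are made up, and the $match and $project stages are left out). As before, equal subdocuments are detected by serializing them, which assumes their fields appear in the same order.

```javascript
// Hypothetical emulation of the $unwind -> $group -> $sort stages, where the _id is the whole
// relationships.person subdocument. Every relationship is assumed to carry a person.
function countByPerson(companies) {
  const counts = new Map();
  for (const c of companies) {
    for (const rel of c.relationships || []) { // $unwind: one entry per array element
      // Key on a JSON serialization so equal subdocuments share a group
      const key = JSON.stringify(rel.person);
      const entry = counts.get(key) || { _id: rel.person, count: 0 };
      entry.count += 1; // $sum: 1
      counts.set(key, entry);
    }
  }
  return [...counts.values()].sort((a, b) => b.count - a.count); // $sort: { count: -1 }
}

const someCompanies = [
  { relationships: [{ person: { first_name: "A", permalink: "a" } }, { person: { first_name: "B", permalink: "b" } }] },
  { relationships: [{ person: { first_name: "A", permalink: "a" } }] }
];
console.log(countByPerson(someCompanies)[0].count); // 2
```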

Retrieving a count that matches specified criteria in a $group aggregation

So I am looking to group documents in my collection on a specific field, and for the output results of each group, I am looking to include the following:
A count of all documents in the group that match a specific query (i.e. a count of documents that satisfy some expression { "$Property": "Value" })
The total number of documents in the group
(Bonus, as I suspect that this is not easily accomplished) Properties of a document that correspond to a $min/$max accumulator
I am very new to the Mongo query syntax and don't quite understand how it all works, but after some research I've managed to get it down to the following query (please note that I am currently using MongoDB 3.0.12, but I believe we will upgrade in a couple of months' time):
db.getCollection('myCollection').aggregate(
[
{
$group: {
_id: {
GroupID: "$GroupID",
Status: "$Status"
},
total: { $sum: 1 },
GroupName: { $first: "$GroupName" },
EarliestCreatedDate: { $min: "$DateCreated" },
LastModifiedDate: { $max: "$LastModifiedDate" }
}
},
{
$group: {
_id: "$_id.GroupID",
Statuses: {
$push: {
Status: "$_id.Status",
Count: "$total"
}
},
TotalCount: { $sum: "$total" },
GroupName: { $first: "$GroupName" },
EarliestCreatedDate: { $min: "$EarliestCreatedDate" },
LastModifiedDate: { $max: "$LastModifiedDate" }
}
}
]
)
Essentially what I am looking to retrieve is the Count for specific Status values, and project them into one final result document that looks like the following:
{
GroupName,
EarliestCreatedDate,
EarliestCreatedBy,
LastModifiedDate,
LastModifiedBy,
TotalCount,
PendingCount,
ClosedCount
}
Where PendingCount and ClosedCount are the total number of documents in each group that have a status Pending/Closed. I suspect I need to use $project with some other expression to extract this value, but I don't really understand the aggregation pipeline well enough to figure this out.
Also the EarliestCreatedBy and LastModifiedBy are the users who created/modified the document(s) corresponding to the EarliestCreatedDate and LastModifiedDate respectively. As I mentioned, I think retrieving these values will add another layer of complexity, so if there is no practical solution, I am willing to forgo this requirement.
Any suggestions/tips would be very much appreciated.
You can try the aggregation stages below.
$group
Calculate all the necessary counts (TotalCount, PendingCount and ClosedCount) for each GroupID.
Calculate $min and $max for EarliestCreatedDate and LastModifiedDate respectively, and $push the creator, modifier and date fields into a CreatedByLastModifiedBy array so they can be compared later to find EarliestCreatedBy and LastModifiedBy for each GroupID.
$project
Project all the fields for the response.
$filter the CreatedByLastModifiedBy entries whose DateCreated equals EarliestCreatedDate, $map each match to its CreatedBy value, and use $arrayElemAt to take the first element of the resulting array.
Use similar steps to calculate LastModifiedBy.
db.getCollection('myCollection').aggregate(
[{
$group: {
_id: "$GroupID",
TotalCount: {
$sum: 1
},
PendingCount: {
$sum: {
$cond: {
if: {
$eq: ["$Status", "Pending"]
},
then: 1,
else: 0
}
}
},
ClosedCount: {
$sum: {
$cond: {
if: {
$eq: ["$Status", "Closed"]
},
then: 1,
else: 0
}
}
},
GroupName: {
$first: "$GroupName"
},
EarliestCreatedDate: {
$min: "$DateCreated"
},
LastModifiedDate: {
$max: "$LastModifiedDate"
},
CreatedByLastModifiedBy: {
$push: {
CreatedBy: "$CreatedBy",
LastModifiedBy: "$LastModifiedBy",
DateCreated: "$DateCreated",
LastModifiedDate: "$LastModifiedDate"
}
}
}
}, {
$project: {
_id: 0,
GroupName: 1,
EarliestCreatedDate: 1,
EarliestCreatedBy: {
$arrayElemAt: [{
$map: {
input: {
$filter: {
input: "$CreatedByLastModifiedBy",
as: "CrBy",
cond: {
"$eq": ["$EarliestCreatedDate", "$$CrBy.DateCreated"]
}
}
},
as: "EaCrBy",
in: "$$EaCrBy.CreatedBy"
}
}, 0]
},
LastModifiedDate: 1,
LastModifiedBy: {
$arrayElemAt: [{
$map: {
input: {
$filter: {
input: "$CreatedByLastModifiedBy",
as: "MoBy",
cond: {
"$eq": ["$LastModifiedDate", "$$MoBy.LastModifiedDate"]
}
}
},
as: "LaMoBy",
in: "$$LaMoBy.LastModifiedBy"
}
}, 0]
},
TotalCount: 1,
PendingCount: 1,
ClosedCount: 1
}
}]
)
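To see what the $group and $project stages compute for one GroupID, here is a hypothetical in-memory sketch (the `summarizeGroup` name and the sample rows are made up; this is not MongoDB code). It assumes the input is the non-empty list of documents for a single group and that the date fields are Date objects. Using a strict comparison keeps the first document with the extreme date, which matches taking element 0 of the $filter result.

```javascript
// Hypothetical in-memory emulation of the $group + $project answer for ONE GroupID's documents.
// Assumes docs is non-empty and DateCreated / LastModifiedDate are Date objects.
function summarizeGroup(docs) {
  let pending = 0;
  let closed = 0;
  let earliest = null; // document with the smallest DateCreated
  let latest = null; // document with the largest LastModifiedDate
  for (const d of docs) {
    // $sum over $cond: add 1 when the status matches, otherwise 0
    pending += d.Status === "Pending" ? 1 : 0;
    closed += d.Status === "Closed" ? 1 : 0;
    // $min / $max, then $filter + $arrayElemAt 0 picks the first document with that date
    if (earliest === null || d.DateCreated < earliest.DateCreated) earliest = d;
    if (latest === null || d.LastModifiedDate > latest.LastModifiedDate) latest = d;
  }
  return {
    GroupName: docs[0].GroupName, // $first
    EarliestCreatedDate: earliest.DateCreated,
    EarliestCreatedBy: earliest.CreatedBy,
    LastModifiedDate: latest.LastModifiedDate,
    LastModifiedBy: latest.LastModifiedBy,
    TotalCount: docs.length, // $sum: 1
    PendingCount: pending,
    ClosedCount: closed
  };
}

const rows = [
  { GroupName: "G", Status: "Pending", DateCreated: new Date("2020-01-02"), CreatedBy: "ann",
    LastModifiedDate: new Date("2020-02-01"), LastModifiedBy: "bob" },
  { GroupName: "G", Status: "Closed", DateCreated: new Date("2020-01-01"), CreatedBy: "cat",
    LastModifiedDate: new Date("2020-03-01"), LastModifiedBy: "dan" }
];
console.log(summarizeGroup(rows).EarliestCreatedBy); // cat
```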
Update for Version < 3.2
$filter is also not available in your version. Below is the equivalent.
The comparison logic is the same: $map creates an array that holds false for every non-matching entry and the LastModifiedBy value otherwise.
The next step uses $setDifference to compare that array with the array [false]; it returns only the elements that exist in the first set and not in the second, i.e. everything except false.
LastModifiedBy: {
$setDifference: [{
$map: {
input: "$CreatedByLastModifiedBy",
as: "MoBy",
in: {
$cond: [{
$eq: ["$LastModifiedDate", "$$MoBy.LastModifiedDate"]
},
"$$MoBy.LastModifiedBy",
false
]
}
}
},
[false]
]
}
Add an $unwind stage after the $project stage to turn the resulting array into a plain value:
{$unwind:"$LastModifiedBy"}
Similar steps for calculating EarliestCreatedBy
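The pre-3.2 workaround can be checked with a hypothetical in-memory sketch (the `lastModifiedBy` function and sample entries are made up; this is not MongoDB code). Note that $setDifference has set semantics: duplicate names collapse into one, and MongoDB does not guarantee the order of the result.

```javascript
// Hypothetical emulation of the pre-3.2 workaround:
// $map + $cond yields the name or false, then $setDifference [..., [false]] removes false.
function lastModifiedBy(entries, lastModifiedDate) {
  const mapped = entries.map((e) =>
    e.LastModifiedDate.getTime() === lastModifiedDate.getTime() ? e.LastModifiedBy : false
  );
  // Set semantics: duplicates collapse (MongoDB does not promise any particular order)
  return [...new Set(mapped)].filter((v) => v !== false);
}

const entries = [
  { LastModifiedDate: new Date("2020-02-01"), LastModifiedBy: "bob" },
  { LastModifiedDate: new Date("2020-03-01"), LastModifiedBy: "dan" },
  { LastModifiedDate: new Date("2020-03-01"), LastModifiedBy: "dan" }
];
console.log(lastModifiedBy(entries, new Date("2020-03-01"))); // [ 'dan' ]
```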

Project first item in an array to new field (MongoDB aggregation)

I am using Mongoose aggregation (MongoDB version 3.2).
I have a field users which is an array. I want to $project first item in this array to a new field user.
I tried
{ $project: {
user: '$users[0]',
otherField: 1
}},
{ $project: {
user: '$users.0',
otherField: 1
}},
{ $project: {
user: { $first: '$users'},
otherField: 1
}},
None of these works.
How can I do it correctly? Thanks
Update:
Starting from v4.4 there is a dedicated operator $first:
{ $project: {
user: { $first: "$users" },
otherField: 1
}},
It is syntactic sugar for { $arrayElemAt: [ "$users", 0 ] }.
Original answer:
You can use $arrayElemAt:
{ $project: {
user: { $arrayElemAt: [ "$users", 0 ] },
otherField: 1
}},
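To make the $arrayElemAt semantics concrete, here is a hypothetical plain-JavaScript stand-in (not a MongoDB function; it assumes the input is an array). Index 0 gives the first element, a negative index counts from the end, and an out-of-range index yields no value, which MongoDB represents by leaving the field out of the result.

```javascript
// Hypothetical emulation of { $arrayElemAt: [ <array>, <idx> ] } for an array input.
function arrayElemAt(arr, idx) {
  // A negative idx counts back from the end of the array
  const i = idx < 0 ? arr.length + idx : idx;
  // Out-of-bounds indices produce no value (the field is missing), modelled here as undefined
  return i >= 0 && i < arr.length ? arr[i] : undefined;
}

const users = ["Jean", "Paul", "Jack"];
console.log(arrayElemAt(users, 0)); // Jean
console.log(arrayElemAt(users, -1)); // Jack
console.log(arrayElemAt(users, 5)); // undefined
```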
If it is an array of objects and you want just a single field from the first object, e.g.:
{
"users": [
{name: "John", surname: "Smith"},
{name: "Elon", surname: "Gates"}
]
}
you can use:
{ $project: { user: { $first: "$users.name" } } }
Edit (exclude case, after a comment from @haytham)
In order to exclude a single field from a nested document in array you have to do 2 projections:
{ $project: { user: { $first: "$users" } } }
This returns the whole first object; then exclude the field you do not want, e.g.:
{ $project: { "user.name": 0 } }
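Both variants can be mimicked in memory to compare their results. The sketch below is hypothetical (function names and data are made up; not MongoDB code) and assumes users is non-empty and every element has a name. A dotted path such as "$users.name" on an array of objects resolves to an array of the name values, which $first then reduces to its first element.

```javascript
// Hypothetical emulation of { $first: "$users.name" } and of the two-stage exclusion
// ({ $first: "$users" } followed by { "user.name": 0 }). Assumes a non-empty users array.
function firstName(doc) {
  const names = doc.users.map((u) => u.name); // "$users.name" resolves to an array of names
  return names[0]; // $first
}

function firstUserWithoutName(doc) {
  const user = doc.users[0]; // stage 1: { $first: "$users" }
  const { name: _excluded, ...rest } = user; // stage 2: { "user.name": 0 }
  return rest;
}

const example = { users: [{ name: "John", surname: "Smith" }, { name: "Elon", surname: "Gates" }] };
console.log(firstName(example)); // John
console.log(firstUserWithoutName(example)); // { surname: 'Smith' }
```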
Starting Mongo 4.4, the aggregation operator $first can be used to access the first element of an array:
// { "users": ["Jean", "Paul", "Jack"] }
// { "users": ["Claude"] }
db.collection.aggregate([
{ $project: { user: { $first: "$users" } } }
])
// { "user" : "Jean" }
// { "user" : "Claude" }