How to extract grouped results from array in $group stage and return as separate fields? - mongodb

I'm running an aggregation query, and the $group stage is as follows
$group:
{
_id:
{
year_month: { $dateToString: { "date": "$updated_at", "format": "%Y-%m" } }
,client_name: "$clients_docs.client_name"
,client_label: "$clients_docs.client_label"
,client_code: "$clients_docs.client_code"
,client_country: "$clients_docs.client_country"
,base_curr: "$clients_docs.client_base_currency"
,inv_curr: "$clients_docs.client_invoice_currency"
,dest_curr: "$store.destination_currency"
}
,total_vol: { $sum: "$USD_Value" }
,total_tran: { $sum: 1 }
}
It returns the correct results, and returns all the grouped results in the _id:{} array.
I now want to extract all those fields from the array and return them not within the array so I can more easily export the output to a spreadsheet.
I tried using this stage:
{
$project:
{
year_month: 1
,client_name: 1
,client_label: 1
,client_code: 1
,client_country: 1
,base_curr: 1
,inv_curr: 1
,dest_curr: 1
,total_vol: 1
,total_tran : 1
}
},
But that returned the same results as the $group stage:
{
"_id" : {
"year_month" : "2022-01",
"client_name" : "client A",
"client_label" : "client A",
"client_code" : NumberInt(0000),
"client_country" : "TH",
"base_curr" : "USD",
"inv_curr" : "USD",
"dest_curr" : "HKD"
},
"total_vol" : 100000,
"total_tran" : 100.0
}
I want the "year_month" through "dest_curr" fields at the same level as the "total_vol" and "total_tran", so that when the data is exported they all appear as separate columns (now it's all captured as one "_id" column, and a "total_vol" and "total_tran" column). What's the best way to do this?

From a terminology perspective, you currently have an embedded document (or nested fields) rather than an array.
The straightforward way to do this is to simply enumerate each field, eg:
"year_month": "$_id.year_month",
There are fancier ways to do this, but as you only have a handful of fields this should suffice. Working playground example here.
Edit
An alternative ("fancier") approach is to leverage the $replaceWith stage using the $mergeObjects operator inside of it. Then you can $unset the previous _id field afterwards. It would look like this:
db.collection.aggregate([
{
"$replaceWith": {
"$mergeObjects": [
"$$ROOT",
"$_id"
]
}
},
{
$unset: "_id"
}
])
Playground link here
I also fixed the earlier playground link that had a typo for the client_label field.

Related

Using cond to specify _id fields for group in mongodb aggregation

new to Mongo. Trying to group across different sub fields of a document based on a condition. The condition is a regex on a field value. Looks like -
db.collection.aggregate([{
{
"$group": {
"$cond": [{
"upper.leaf": {
$not: {
$regex: /flower/
}
}
},
{
"_id": {
"leaf": "$upper.leaf",
"stem": "$upper.stem"
}
},
{
"_id": {
"stem": "$upper.stem",
"petal": "$upper.petal"
}
}
]
}
}])
Using api v4.0: cond in the docs shows - { $cond: [ <boolean-expression>, <true-case>, <false-case> ] }
The error I get with the above code is - "Syntax error: dotted field name 'upper.leaf' can not used in a sub object."
Reading up on that I tried $let to re-assign the dotted field name. But started to hit various syntax errors with no obvious issue in the query.
Also tried using $project to rename the fields, but got - Field names may not start with '$'
Thoughts on the best approach here? I can always address this at the application level and split my query into two but it's attractive potentially to solve it natively in mongo.
$group syntax is wrong
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
You tried to do
{
$group:
<expression>
}
And even if your expression resulted in the same code, its invalid syntax for $group (check from the documentation where you are allowed to use expressions)
One other problem is that you use the query operator for regex, and not the aggregate regex operators (you can't do that, if you aggregate you can use only aggregate operators, only $match is the exception that you can use both if you add $expr)
You need this i think
[{
"$group" : {
"_id" : {
"$cond" : [ {
"$not" : [ {
"$regexMatch" : {
"input" : "$upper.leaf",
"regex" : "/flower/"}}]},
{"leaf" : "$upper.leaf","stem" : "$upper.stem"},
{"stem" : "$upper.stem","petal" : "$upper.petal"}]
}
}}]
Its similar code, but expression gets as value of the "_id" and $regexMatch
is used that is aggregate operator.
I didnt tested the code.

How do I update a field in a sub-document array with a field from the document in MongoDB?

I have a large amount of data (~160M items) where a date value wasn't populated on the sub-document array fields, but was populated on the parent document. I'm very new to MongoDB and having trouble figuring out how to $set the field to match. Here's a sample of the data:
{
"_id": "5f11d4c48663f32e940696ed",
"Widgets":[{
"WidgetId":663,
"Name":"Super Widget 2.0",
"Created":null,
"LastUpdated":null
}],
"Status":3,
"LastUpdated":null,
"Created": "2018-11-09T18:22:16.000Z"
}
}
My knowledge of MongoDB is pretty limited but here's the basic aggregation I have created for part of the pipeline and where I'm struggling:
db.sample.aggregate(
[
{
"$match" : {
"Donors.$.Created" : {
"$exists" : true
}
}
},
{
"$match" : {
"Widgets.$.Created" : null
}
},
{
"$set" : {
"Widgets.$.Created" : "Created" // <- This is where I can't figure out how to define the reference to the parent "Created" field
}
}
]
);
The desired output would be:
{
"_id": "5f11d4c48663f32e940696ed",
"Widgets":[{
"WidgetId":663,
"Name":"Super Widget 2.0",
"Created":"2018-11-09T18:22:16.000Z",
"LastUpdated":null
}],
"Status":3,
"LastUpdated":null,
"Created": "2018-11-09T18:22:16.000Z"
}
}
Thanks for any assitance
Are you attempting to add the Created field to sub documents on query/aggregation? Or are you attempting to update/save the Created field on the subdocuments?
The $ is an update operator, to be used with updateMany or updateOne. Not aggregate.
https://docs.mongodb.com/manual/reference/operator/query-array/
https://docs.mongodb.com/manual/reference/operator/update-array/
If you just want to add the parents Created field to all subdocuments on query/aggregation this is all you have to do: https://mongoplayground.net/p/yHDHULCSTIz
db.collection.aggregate([
{
"$addFields": {
"Widgets.Created": "$Created"
}
}
])
If your attempting to save the parents Created field to all subdocuments:
db.sample.updateMany({"Widgets.Created" : null}, [{$set: {"Widgets.Created" : "$Created"}}])
Note: This matches any doc that has a subdocument with a null Created field and updates all the subdocuments.

Get record having highest date inside nested group in Mongodb

I am having a record set like below :
I need to write a query where foreach datatype of every parent I show the data type with highest date i.e
So far I am able to create two groups one on parent id & other on data type but i am unable to understand how to get record with max date.
Below is my query :
db.getCollection('Maintenance').aggregate( [{ $group :
{ _id :{ parentName: "$ParentID" , maintainancename : "$DataType" }}},
{ $group : {
_id : "$_id.parentName",
maintainancename: {
$push: {
term:"$_id.DataType"
}
}
}
}] )
You don't have to $group twice, try below aggregation query :
db.collection.aggregate([
/** group on two fields `ParentID` & `Datatype`,
* which will leave docs with unique `ParentID + Datatype`
* & use `$max` to get max value on `Date` field in unique set of docs */
{
$group: {
_id: {
parentName: "$ParentID",
maintainancename: "$Datatype"
},
"Date": { $max: "$Date" }
}
}
])
Test : mongoplayground
Note : After group stage you can use $project or $addFieldsstages to transform fields the way you want.

MongoDB Aggregation with DBRef

Is it possible to aggregate on data that is stored via DBRef?
Mongo 2.6
Let's say I have transaction data like:
{
_id : ObjectId(...),
user : DBRef("user", ObjectId(...)),
product : DBRef("product", ObjectId(...)),
source : DBRef("website", ObjectId(...)),
quantity : 3,
price : 40.95,
total_price : 122.85,
sold_at : ISODate("2015-07-08T09:09:40.262-0700")
}
The trick is "source" is polymorphic in nature - it could be different $ref values such as "webpage", "call_center", etc that also have different ObjectIds. For example DBRef("webpage", ObjectId("1")) and DBRef("webpage",ObjectId("2")) would be two different webpages where a transaction originated.
I would like to ultimately aggregate by source over a period of time (like a month):
db.coll.aggregate( { $match : { sold_at : { $gte : start, $lt : end } } },
{ $project : { source : 1, total_price : 1 } },
{ $group : {
_id : { "source.$ref" : "$source.$ref" },
count : { $sum : $total_price }
} } );
The trick is you get a path error trying to use a variable starting with $ either by trying to group by it or by trying to transform using expressions via project.
Any way to do this? Actually trying to push this data via aggregation to a subcollection to operate on it there. Trying to avoid a large cursor operation over millions of records to transform the data so I can aggregate it.
Mongo 4. Solved this issue in the following way:
Having this structure:
{
"_id" : LUUID("144e690f-9613-897c-9eab-913933bed9a7"),
"owner" : {
"$ref" : "person",
"$id" : NumberLong(10)
},
...
...
}
I needed to use "owner.$id" field. But because of "$" in the name of field, I was unable to use aggregation.
I transformed "owner.$id" -> "owner" using following snippet:
db.activities.find({}).aggregate([
{
$addFields: {
"owner": {
$arrayElemAt: [{ $objectToArray: "$owner" }, 1]
}
}
},
{
$addFields: {
"owner": "$owner.v"
}
},
{"$group" : {_id:"$owner", count:{$sum:1}}},
{$sort:{"count":-1}}
])
Detailed explanations here - https://dev.to/saurabh73/mongodb-using-aggregation-pipeline-to-extract-dbref-using-lookup-operator-4ekl
You cannot use DBRef values with the aggregation framework. Instead you need to use JavasScript processing of mapReduce in order to access the property naming that they use:
db.coll.mapReduce(
function() {
emit( this.source.$ref, this["total_price"] )
},
function(key,values) {
return Array.sum( values );
},
{
"query": { "sold_at": { "$gte": start, "$lt": end } },
"out": { "inline": 1 }
}
)
You really should not be using DBRef at all. The usage is basically deprecated now and if you feel you need some external referencing then you should be "manually referencing" this with your own code or implemented by some other library, with which you can do so in a much more supported way.

how to use mongodb aggregate and retrieve entire documents

I am seriously baffled by mongodb's aggregate function. All I want is to find the newest document in my collection. Let's say each record has a field "created"
db.collection.aggregate({
$group: {
_id:0,
'id':{$first:"$_id"},
'max':{$max:"$created"}
}
})
yields the correct result, but I want the entire document in the result? How would I do that?
This is the structure of the document:
{
"_id" : ObjectId("52310da847cf343c8c000093"),
"created" : 1389073358,
"image" : ObjectId("52cb93dd47cf348786d63af2"),
"images" : [
ObjectId("52cb93dd47cf348786d63af2"),
ObjectId("52f67c8447cf343509d63af2")
],
"organization" : ObjectId("522949d347cf3402c3000001"),
"published" : 1392601521,
"status" : "PUBLISHED",
"tags" : [ ],
"updated" : 1392601521,
"user_id" : ObjectId("52214ce847cf344902000000")
}
In the documentation i found that the $$ROOT expression addresses this problem.
From the DOC:
http://docs.mongodb.org/manual/reference/operator/aggregation/group/#group-documents-by-author
query = [
{
'$sort': {
'created': -1
}
},
{
$group: {
'_id':null,
'max':{'$first':"$$ROOT"}
}
}
]
db.collection.aggregate(query)
db.collection.aggregate([
{
$group: {
'_id':"$_id",
'otherFields':{ $push: { fields: $ROOT } }
}
}
])
I think I figured it out. For example, I have a collection containing an array of images (or pointers). Now I want to find the document with the most images
results=[];
db.collection.aggregate([
{$unwind: "$images"},
{$group:{_id:"$_id", 'imagecount':{$sum:1}}},
{$group:{_id:"$_id",'max':{$max: "$imagecount"}}},
{$sort:{max:-1}},
{$group:{_id:0,'id':{$first:'$_id'},'max':{$first:"$max"}}}
]).result.forEach(function(d){
results.push(db.stories.findOne({_id:d.id}));
});
now the final array will contain the document with the most images. Since images is an array, I use $unwind, I then group by document id and $sum:1, pipe that into a $group that finds the max, pipe it into reverse $sort for max and $group out the first result. Finally I fetchOne the document and push it into the results array.
You should be using db.collection.find() rather than db.collection.aggregate():
db.collection.find().sort({"created":-1}).limit(1)