Aggregation with update in MongoDB

I have a collection with many similarly structured documents; two of them look like this:
Input:
{
    "_id": ObjectId("525c22348771ebd7b179add8"),
    "cust_id": "A1234",
    "score": 500,
    "status": "A",
    "clear": "No"
}
{
    "_id": ObjectId("525c22348771ebd7b179add9"),
    "cust_id": "A1234",
    "score": 1600,
    "status": "B",
    "clear": "No"
}
By default, clear is "No" for every document.
Requirement: I have to add up the scores of all documents with the same cust_id, provided they have status "A" or status "B". If the total score exceeds 2000, then I have to update the clear attribute to "Yes" for all documents with that cust_id.
Expected output:
{
    "_id": ObjectId("525c22348771ebd7b179add8"),
    "cust_id": "A1234",
    "score": 500,
    "status": "A",
    "clear": "Yes"
}
{
    "_id": ObjectId("525c22348771ebd7b179add9"),
    "cust_id": "A1234",
    "score": 1600,
    "status": "B",
    "clear": "Yes"
}
clear becomes "Yes" because 500 + 1600 = 2100, and 2100 > 2000.
My approach:
I was only able to get the sum with the aggregate function, but failed at updating:
db.aggregation.aggregate([
{$match: {
$or: [
{status: 'A'},
{status: 'B'}
]
}},
{$group: {
_id: '$cust_id',
total: {$sum: '$score'}
}},
{$match: {
total: {$gt: 2000}
}}
])
Please suggest how I should proceed.

After a lot of trouble experimenting in the mongo shell, I finally arrived at a solution to my question.
Pseudocode:
# To get the list of customers whose total score is greater than 2000
cust_to_clear = db.col.aggregate(
    {$match: {$or: [{status: 'A'}, {status: 'B'}]}},
    {$group: {_id: '$cust_id', total: {$sum: '$score'}}},
    {$match: {total: {$gt: 2000}}})
# To loop through the results fetched above and update clear
cust_to_clear.result.forEach(function(x) {
    db.col.update({cust_id: x._id}, {$set: {clear: 'Yes'}}, {multi: true});
});
Please comment if you have a different solution to the same question.
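Note for later readers: in recent shells aggregate() returns a cursor rather than a document with a result array, so the same idea can be written without the .result indirection. A minimal sketch, assuming the collection layout above:

db.col.aggregate([
    {$match: {$or: [{status: 'A'}, {status: 'B'}]}},
    {$group: {_id: '$cust_id', total: {$sum: '$score'}}},
    {$match: {total: {$gt: 2000}}}
]).forEach(function(x) {
    // flag every document belonging to a customer over the threshold
    db.col.update({cust_id: x._id}, {$set: {clear: 'Yes'}}, {multi: true});
});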

With MongoDB 4.2 it is now possible to do this using an update with an aggregation pipeline. Example 2 in the update command documentation shows how to do conditional updates:
db.runCommand(
{
update: "students",
updates: [
{
q: { },
u: [
{ $set: { average : { $avg: "$tests" } } },
{ $set: { grade: { $switch: {
branches: [
{ case: { $gte: [ "$average", 90 ] }, then: "A" },
{ case: { $gte: [ "$average", 80 ] }, then: "B" },
{ case: { $gte: [ "$average", 70 ] }, then: "C" },
{ case: { $gte: [ "$average", 60 ] }, then: "D" }
],
default: "F"
} } } }
],
multi: true
}
],
ordered: false,
writeConcern: { w: "majority", wtimeout: 5000 }
}
)
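For comparison, the same conditional update can be issued through the collection helper rather than runCommand. A sketch of what I believe the equivalent call looks like on 4.2+:

db.students.update(
    { },
    [
        // compute the average, then grade each document from its own average
        { $set: { average: { $avg: "$tests" } } },
        { $set: { grade: { $switch: {
            branches: [
                { case: { $gte: [ "$average", 90 ] }, then: "A" },
                { case: { $gte: [ "$average", 80 ] }, then: "B" },
                { case: { $gte: [ "$average", 70 ] }, then: "C" },
                { case: { $gte: [ "$average", 60 ] }, then: "D" }
            ],
            default: "F"
        } } } }
    ],
    { multi: true }
)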
Another example:
db.c.update({}, [
{$set:{a:{$cond:{
if: {}, // some condition
then:{} , // val1
else: {} // val2 or "$$REMOVE" to not set the field or "$a" to leave existing value
}}}}
]);
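To tie this back to the original question: a single update pipeline operates on one document at a time, so it cannot sum scores across documents. But if each document carried a precomputed total (the total_score field suggested in another answer here; hypothetical otherwise), the conditional skeleton above could be filled in like so:

db.col.update(
    {},
    [{$set: {clear: {$cond: {
        if: {$gt: ["$total_score", 2000]},  // assumes a maintained total_score field
        then: "Yes",
        else: "$clear"                      // leave the existing value untouched
    }}}}],
    {multi: true}
);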

You need to do this in two steps:
1. Identify customers (cust_id) with a total score greater than 2000
2. For each of these customers, set clear to "Yes"
You already have a good solution for the first part. The second part should be implemented as separate update() calls to the database.
Pseudocode:
# Get list of customers using the aggregation framework
cust_to_clear = db.col.aggregate(
{$match:{$or:[{status:'A'},{status:'B'}]}},
{$group:{_id:'$cust_id', total:{$sum:'$score'}}},
{$match:{total:{$gt:2000}}}
)
# Loop over customers and update "clear" to "Yes"
for customer in cust_to_clear:
    cust_id = customer["_id"]
    db.col.update(
        {"cust_id": cust_id},
        {"$set": {"clear": "Yes"}},
        {"multi": True}
    )
This isn't ideal because you have to make a database call for every customer. If you need to do this kind of operation often, you might revise your schema to include the total score in each document. (This would have to be maintained by your application.) In this case, you could do the update with a single command:
db.col.update(
{"total_score": {"$gt": 2000}},
{"$set": {"clear": "Yes"}},
{"multi": true}
)
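How the application maintains that field is up to you. One naive sketch (addScore is a hypothetical helper, and the two writes are not atomic):

// Hypothetical helper: insert a new score document, then fan the
// recomputed total out to every document for that customer.
function addScore(custId, score, status) {
    db.col.insert({cust_id: custId, score: score, status: status, clear: "No"});
    var total = db.col.aggregate([
        {$match: {cust_id: custId}},
        {$group: {_id: null, total: {$sum: "$score"}}}
    ]).toArray()[0].total;
    db.col.update({cust_id: custId}, {$set: {total_score: total}}, {multi: true});
}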

Short Answer: to avoid looping over a database query, just add $merge to the end and specify your collection, like so:
db.aggregation.aggregate([
{$match: {
$or: [
{status: 'A'},
{status: 'B'}
]
}},
{$group: {
_id: '$cust_id',
total: {$sum: '$score'}
}},
{$match: {
total: {$gt: 2000}
}},
{ $merge: "<collection name here>"}
])
Elaboration: the current solution loops through a database query, which is not good time-efficiency-wise, and is also a lot more code.
Mitar's answer is not updating through an aggregation, but the opposite: using an aggregation within Mongo's update. If you're wondering what the advantage of doing it this way is, it's that you can use the full aggregation pipeline, as opposed to being restricted to the few stages listed in the update documentation.
Here is an example of an aggregate that won't work with Mongo's update:
db.getCollection('foo').aggregate([
{ $addFields: {
testField: {
$in: [ "someValueInArray", '$arrayFieldInFoo']
}
}},
{ $merge : "foo" }]
)
This will output the updated collection with a new testField that is true if "someValueInArray" is in "arrayFieldInFoo", and false otherwise. This is currently NOT possible with Mongo's update, since $in cannot be used inside an update's aggregation pipeline.
Update: changed from $out to $merge, since $out only works if you are updating the entire collection: $out replaces the entire collection with the result of the aggregate. $merge only overwrites documents that the aggregation output matches (much safer).
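For reference, the bare string form of $merge used above is shorthand; spelled out with its documented defaults it reads:

{ $merge: {
    into: "<collection name here>",
    on: "_id",                  // field(s) used to match existing documents
    whenMatched: "merge",       // merge the aggregate's fields into the match
    whenNotMatched: "insert"    // unmatched results are inserted by default
} }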

With MongoDB 2.6, it is possible to write the output of an aggregation query to a collection within the same command.
More information here: http://docs.mongodb.org/master/reference/operator/aggregation/out/

The solution I found is using $out.
*) e.g. adding a field:
db.socios.aggregate(
[
{
$lookup: {
from: 'cuotas',
localField: 'num_socio',
foreignField: 'num_socio',
as: 'cuotas'
}
},
{
$addFields: { codigo_interno: 1001 }
},
{
$out: 'socios' //Collection to modify
}
]
)
*) e.g. modifying a field:
db.socios.aggregate(
[
{
$lookup: {
from: 'cuotas',
localField: 'num_socio',
foreignField: 'num_socio',
as: 'cuotas'
}
},
{
$set: { codigo_interno: 1001 }
},
{
$out: 'socios' //Collection to modify
}
]
)

Related

How to update a property of the last object of a list in mongo

I would like to update a property of the last object stored in a list in Mongo. For performance reasons, I cannot pop the object from the list, update the property, and then put the object back. I cannot change the code design either, as it does not depend on me. In short, I am looking for a way to select the last element of a list.
The closest I came to getting it working was to use arrayFilters, which I found while researching the subject (MongoDB core ticket: https://jira.mongodb.org/browse/SERVER-27089):
db.getCollection("myCollection")
.updateOne(
{
_id: ObjectId('638f5f7fe881c670052a9d08')
},
{
$set: {"theList.$[i].propertyToUpdate": 'NewValueToAssign'}
},
{
arrayFilters: [{'i.type': 'MyTypeFilter'}]
}
)
I use a filter to only update the objects in theList whose type property evaluates to MyTypeFilter.
What I am looking for is something like:
db.getCollection("maCollection")
.updateOne(
{
_id: ObjectId('638f5f7fe881c670052a9d08')
},
{
$set: {"theList.$[i].propertyToUpdate": 'NewValueToAssign'}
},
{
arrayFilters: [{'i.index': -1}]
}
)
I also tried using "theList.$last.propertyToUpdate" instead of "theList.$[i].propertyToUpdate", but the path is not recognized ($last is not valid there).
I could not find anything online matching my case.
Thank you for your help, have a great day
You want to be using Mongo's pipelined updates, which allow aggregation operators within the update body.
You do, however, need to consider edge cases that the previous answer does not (null list, empty list, and list.length == 1).
Overall it looks like so:
db.collection.update({
    _id: ObjectId("638f5f7fe881c670052a9d08")
},
[
    {$set: {
        list: {
            $concatArrays: [
                // everything except the last element; [] when the list is null, empty, or has a single element
                {$cond: [
                    {$gt: [{$size: {$ifNull: ["$list", []]}}, 1]},
                    {$slice: ["$list", 0, {$subtract: [{$size: "$list"}, 1]}]},
                    []
                ]},
                // the last element (or {} if there is none) with the property merged in
                [{$mergeObjects: [
                    {$ifNull: [{$last: "$list"}, {}]},
                    {propertyToUpdate: "NewValueToAssign"}
                ]}]
            ]
        }
    }}
])
Mongo Playground
One option is to use update with pipeline:
db.collection.update(
{_id: ObjectId("638f5f7fe881c670052a9d08")},
[
{$set: {
theList: {
$concatArrays: [
{$slice: ["$theList", 0, {$subtract: [{$size: "$theList"}, 1]}]},
[{$mergeObjects: [{$last: "$theList"}, {propertyToUpdate: "NewValueToAssign"}]}]
]
}
}}
]
)
See how it works on the playground example

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format displayed above, you can split the _id into two parts (on the dot character) and use aggregation to find the maximum ts_utc per first (numeric) part.
That way you can do it in one shot, instead of iterating per group.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
As @danh mentioned in the comments, the best thing you can do is probably to add an auxiliary field to indicate the grouping. You can further index the auxiliary field to boost performance.
Here is an ad hoc way to derive the field and get the latest result per group:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
Here is the Mongo playground for your reference.
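If you decide to persist the auxiliary field rather than derive it per query, a one-off backfill plus index might look like this (a sketch; the group field name is illustrative):

// Backfill: store the prefix of the string _id on each document.
db.collection.find().forEach(function(doc) {
    db.collection.update(
        {_id: doc._id},
        {$set: {group: doc._id.split(".")[0]}}
    );
});
// Index to serve "latest per group" queries efficiently.
db.collection.createIndex({group: 1, ts_utc: -1});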

Unable to access date value after lookup in MongoDB

Main Collection
{
"_id": ObjectID("5ea1a07bfd7e4965408a5171"),
"data": [],
"history_id": ObjectID("5e4e755b380054797d9db627"),
"sender_id": ObjectID("5e4e74eb380054797d9db623"),
"text": "Hi tester",
"date": 1587650683434
}
History collection
{
"_id": ObjectID("5ea1afd4f4151402efd234e3"),
"user_id": [
"5e4a8d2d3952132a08ae5764"
],
"dialog_id": ObjectID("5e4e755b380054797d9db627"),
"date": 1587549034211,
"__v": 1
}
const messages = await MainModal.aggregate([
{
$lookup: {
from: 'history',
localField: 'history_id',
foreignField: 'history_id',
as: 'History'
}
},
{ $unwind: '$History' },
{ "$match" : { date: {$gt: "History.date" } } }, // this is not working
])
I am getting values inside History, but I am unable to fetch the matched records. I don't know why; I read somewhere that $gt works on numbers, and my date is a number too.
When I put a literal value in place of "History.date" it does work, but with the field reference it does not.
Basically my idea is to hide the documents whose date is less than the date in the history collection, and where the user is not in 5e4a8d2d3952132a08ae5764.
You are mixing the query language with aggregation operators. In order to use $match to compare two fields from the same document, you will need to use $expr along with aggregation operators.
For that final match stage, use
{ "$match":{ "$expr":{ "$gt":[ "$date", "$History.date" ] } } }
Edit
Adding an additional comparison between other fields would need to use $and or $or, like:
{ "$match":{
"$expr":{
"$and":[
{"$gt":[ "$date", "$History.date" ] },
{"$not": {"$eq":[ "$sender_id", "$History.user_id" ]}}
]
}
}}
Playground

Mongo aggregation with paginated data and totals

I've crawled all over Stack Overflow and have not found any info on how to return proper pagination data included in the result set.
I'm trying to aggregate some data from my mongo store. What I want, is to have something return:
{
total: 5320,
page: 0,
pageSize: 10,
data: [
{
_id: 234,
currentEvent: "UPSTREAM_QUEUE",
events: [
{ ... }, { ... }, { ... }
]
},
{
_id: 235,
currentEvent: "UPSTREAM_QUEUE",
events: [
{ ... }, { ... }, { ... }
]
}
]
}
This is what I have so far:
// page and pageSize are variables
db.mongoAuditEvent.aggregate([
// Actual grouped data
{"$group": {
"_id" : "$corrId",
"currentEvent": {"$last": "$event.status"},
"events": { $push: "$$ROOT"}
}},
// Pagination group
{"$group": {
"_id": 0,
"total": { "$sum": "corrId" },
"page": page,
"pageSize": pageSize,
"data": {
"$push": {
"_id": "$_id",
"currentEvent": "$currentEvent",
"events": "$events"
}
}
}},
{"$sort": {"events.timestamp": -1} }, // Latest first
{"$skip": page },
{"$limit": pageSize }
], {allowDiskUse: true});
I'm trying to have a pagination group as root, containing the actual grouped data inside (so that I get actual totals, whilst still retaining skip and limits).
The above code will return the following error in mongo console:
The field 'page' must be an accumulator object
If I remove the page and pageSize from the pagination group, I still get the following error:
BSONObj size: 45707184 (0x2B96FB0) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 0
If I remove the pagination group altogether, the query works fine. But I really need to return how many documents I have stored in total, and although not strictly necessary, page and pageSize would be nice to return as well.
Can somebody please tell me what I am doing wrong? Or tell me if it is at all possible to do this in one go?
If you have a lot of events, {$push: "$$ROOT"} will make Mongo return an error; I solved it with $facet (only works with version 3.4+):
aggregate([
{ $match: options },
{
$facet: {
edges: [
{ $sort: sort },
{ $skip: skip },
{ $limit: limit },
],
pageInfo: [
{ $group: { _id: null, count: { $sum: 1 } } },
],
},
},
])
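Applied to the original question, the whole thing might look like this (a sketch; page and pageSize are the caller's variables, and the grouped documents are counted in pageInfo before skip/limit):

db.mongoAuditEvent.aggregate([
    {$group: {
        _id: "$corrId",
        currentEvent: {$last: "$event.status"},
        events: {$push: "$$ROOT"}
    }},
    {$facet: {
        data: [
            {$sort: {"events.timestamp": -1}},  // latest first
            {$skip: page * pageSize},
            {$limit: pageSize}
        ],
        // total number of groups, computed before skip/limit
        pageInfo: [{$group: {_id: null, total: {$sum: 1}}}]
    }}
], {allowDiskUse: true});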
A performance optimization tip:
When you use a $facet stage for pagination, try to add it as early in the pipeline as possible.
For example: if you want to add a $project or $lookup stage, add it after $facet, not before it.
This can have an impressive effect on aggregation speed, because a $project stage requires MongoDB to examine all documents and touch every field (which is not necessary).
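For instance, a sketch of the stage ordering this suggests (collection and field names are illustrative):

db.orders.aggregate([
    {$match: {status: "open"}},
    {$facet: {
        edges: [
            {$sort: {createdAt: -1}},
            {$skip: 0},
            {$limit: 10},
            // $lookup and $project now run on 10 documents, not the whole collection
            {$lookup: {from: "customers", localField: "custId",
                       foreignField: "_id", as: "customer"}},
            {$project: {custId: 1, customer: 1, createdAt: 1}}
        ],
        pageInfo: [{$group: {_id: null, count: {$sum: 1}}}]
    }}
])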
Did this in two steps instead of one:
// Get the totals
db.mongoAuditEvent.aggregate([{$group: {_id: "$corrId"}}, {$group: {_id: 1, total: {$sum: 1}}}]);
// Get the data
db.mongoAuditEvent.aggregate([
{$group: {
_id : "$corrId",
currentEvent: {"$last": "$event.status"},
"events": { $push: "$$ROOT"}
}},
{$sort: {"events.timestamp": -1} }, // Latest first
{$skip: 0 },
{$limit: 10}
], {allowDiskUse: true}).pretty();
I would be very happy if anybody has a better solution to this, though.

Compare document array size to other document field

The document might look like:
{
_id: 'abc',
programId: 'xyz',
enrollment: 'open',
people: ['a', 'b', 'c'],
maxPeople: 5
}
I need to return all documents where enrollment is open and the length of people is less than maxPeople
I got this to work with $where:
// assumes lodash is available as _
const
    exists = ['enrollment', 'maxPeople', 'people'],
    // build {field: {$exists: true}} for each required field
    query = _.reduce(exists, (existsQuery, field) => {
        existsQuery[field] = {'$exists': true};
        return existsQuery;
    }, {});
query['$and'] = [{enrollment: 'open'}];
query['$where'] = 'this.people.length < this.maxPeople';
return db.coll.find(query, {fields: {programId: 1, maxPeople: 1, people: 1}});
But could I do this with aggregation, and why would it be better?
Also, if aggregation is better/faster, I don't understand how I could convert the above query to use aggregation. I'm stuck at:
db.coll.aggregate([
{$project: {ab: {$cmp: ['$maxPeople','$someHowComputePeopleLength']}}},
{$match: {ab:{$gt:0}}}
]);
UPDATE:
Based on @chridam's answer, I was able to implement a solution like so (note the $and in the $match, for those of you who need a similar query):
return Coll.aggregate([
{
$match: {
$and: [
{"enrollment": "open"},
{"times.start.dateTime": {$gte: new Date()}}
]
}
},
{
"$redact": {
"$cond": [
{"$lt": [{"$size": "$students" }, "$maxStudents" ] },
"$$KEEP",
"$$PRUNE"
]
}
}
]);
The $redact pipeline operator in the aggregation framework should work for you in this case. It recursively descends through the document structure and acts on an evaluation of the specified conditions at each level. The concept can be a bit tricky to grasp, but basically the operator lets you process a logical condition with the $cond operator and uses the special operations $$KEEP to "keep" the document where the logical condition is true, or $$PRUNE to "remove" the document where the condition is false.
This operation is similar to having a $project pipeline that selects the fields in the collection and creates a new field holding the result of the logical condition, followed by a subsequent $match, except that $redact does it in a single pipeline stage and is more efficient.
To run a query on all documents where enrollment is open and the length of people is less than maxPeople, include a $redact stage as in the following:
db.coll.aggregate([
{ "$match": { "enrollment": "open" } },
{
"$redact": {
"$cond": [
{ "$lt": [ { "$size": "$people" }, "$maxPeople" ] },
"$$KEEP",
"$$PRUNE"
]
}
}
])
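As an aside, on MongoDB 3.6+ the same filter can be expressed with $expr in a plain find, without an aggregation pipeline. A minimal sketch, assuming people is always an array as in the $where version:

db.coll.find({
    enrollment: "open",
    // compare two fields of the same document with an aggregation expression
    $expr: {$lt: [{$size: "$people"}, "$maxPeople"]}
})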
You can do this with:
1. a $project that creates a new field holding the result of comparing the people array size to maxPeople
2. a $match on that comparison result and on enrollment being "open"
The query is:
db.coll.aggregate([{
$project: {
_id: 1,
programId: 1,
enrollment: 1,
cmp: {
$cmp: ["$maxPeople", { $size: "$people" }]
}
}
}, {
$match: {
$and: [
{ cmp: { $gt: 0 } },
{ enrollment: "open" }
]
}
}])