Update multiple fields based on condition in aggregation pipeline MongoDB Atlas trigger - mongodb

I have the following pipeline that calculates the rank (sort order) according to the score when the update flag is set to true:
const pipeline = [
{$match: {"score": {$gt: 0}, "update": true}},
{$setWindowFields: {sortBy: {"score": -1}, output: {"rank": {$denseRank: {}}}}},
{$merge: {into: "ranking"}}
];
await ranking_col.aggregate(pipeline).toArray();
What I do next is set the rank to 0 when the update flag is set to false:
ranking_col.updateMany({"update": false}, {$set: {"rank": parseInt(0, 10)}});
One of my documents looks like this:
{
"_id": "7dqe1kcA7R1YGjdwHsAkV83",
"score": 294,
"update": false,
"rank": 0,
}
I want to avoid the extra updateMany call and do the equivalent inside the pipeline. MongoDB support back then told me to use an $addFields stage this way:
const pipeline = [
{$match: {"score": {$gt: 0}, "update": true}},
{$setWindowFields: {sortBy: {"score": -1}, output: {"rank": {$denseRank: {}}}}},
{$addFields: {rank: {$cond: [{$eq: ['$update', false]},parseInt(0, 10),'$rank']}}},
{$merge: {into: "ranking"}}
];
This is not working in my Atlas Trigger.
Can you please correct my syntax or suggest a good way to do this?

This aggregation pipeline isn't particularly efficient (a fair amount of work in "$setWindowFields" gets thrown away - more comments about this below), but I think it does what you want. Please check to make sure it's correct as I don't have complete understanding of the collection, its use, etc.
N.B.: This aggregation pipeline is not very efficient because:
1. It processes every document; there is no leading "$match" to filter documents.
2. Because of 1., "$setWindowFields" has to "partitionBy": "$update" and sort/rank the "update": false partition as well as the "update": true documents with "score" <= 0, even though they are irrelevant.
3. All that irrelevant work is then thrown away by setting the "update": false partition's "rank" to 0 and excluding the "update": true, "score" <= 0 documents from the "$merge".
In a large collection, your original two-step update may well be more efficient.
db.ranking.aggregate([
{
"$setWindowFields": {
"partitionBy": "$update",
"sortBy": {"score": -1},
"output": {
"rank": {"$denseRank": {}}
}
}
},
{
"$set": {
"rank": {
"$cond": [
"$update",
"$rank",
0
]
}
}
},
{
"$match": {
"$expr": {
"$not": [{"$and": ["$update", {"$lte": ["$score", 0]}]}]
}
}
},
{"$merge": "ranking"}
])
Try it on mongoplayground.net.
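For completeness, here is a minimal sketch of running that pipeline inside an Atlas trigger function. The service name "mongodb-atlas" and the database name "mydb" are assumptions, so adjust them to your linked data source and database:
exports = async function () {
  // "mongodb-atlas" is the default linked data source name; "mydb" is a placeholder
  const ranking = context.services.get("mongodb-atlas").db("mydb").collection("ranking");
  const pipeline = [
    {$setWindowFields: {partitionBy: "$update", sortBy: {score: -1}, output: {rank: {$denseRank: {}}}}},
    {$set: {rank: {$cond: ["$update", "$rank", 0]}}},
    {$match: {$expr: {$not: [{$and: ["$update", {$lte: ["$score", 0]}]}]}}},
    {$merge: "ranking"}
  ];
  await ranking.aggregate(pipeline).toArray();
};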

Related

Limit number of objects pushed to array in MongoDB aggregation

I've been trying to find a way to limit the number of objects I'm pushing to arrays I'm creating while using "aggregate" on a MongoDB collection.
I have a collection of students - each has these relevant keys:
class number it takes this semester (only one value),
percentile in class (exists if is enrolled in class, null if not),
current score in class (> 0 if enrolled in class, else - 0),
total average (GPA),
max grade
I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.
This is my query:
db.getCollection("students").aggregate([
{"$match": {
"class_number":
{"$in": [49, 50, 16]},
"grades.curr_class.percentile":
{"$exists": true},
"grades.min": {"$gte": 80},
}},
{"$sort": {"grades.curr_class.score": -1}},
{"$group": {"_id": "$class_number",
"studentsWithHighGPA":
{"$push":
{"$cond": [{"$gte": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
}
},
"studentsWithoutHighGPA":
{"$push":
{"$cond": [{"$lt": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
},
},
},
},
])
What I'm trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but I'm not sure how to approach this.
Thanks in advance!
I've tried using limit in different variations, and slice too, but none seem to work.
Since MongoDB version 5.0, one option is to use $setWindowFields for this, and in particular its $rank operator. This allows keeping only the relevant students and limiting their count even before the $group step:
$match only relevant students as suggested by the OP
$set the groupId for the $setWindowFields (as it can currently partition by one key only)
$setWindowFields to define the rank of each student in their array
$match only students with the wanted rank
$group by class_number as suggested by the OP:
db.collection.aggregate([
{$match: {
class_number: {$in: [49, 50, 16]},
"grades.curr_class.percentile": {$exists: true},
"grades.min": {$gte: 80}
}},
{$set: {
groupId: {$concat: [
{$toString: "$class_number"},
{$toString: {$toBool: {$gte: ["$grades.gpa", 80]}}}
]}
}},
{$setWindowFields: {
partitionBy: "$groupId",
sortBy: {"grades.curr_class.score": -1},
output: {rank: {$rank: {}}}
}},
{$match: {rank: {$lte: rankLimit}}}, // rankLimit is the wanted per-array count, 16 here
{$group: {
_id: "$class_number",
studentsWithHighGPA: {$push: {
$cond: [{$gte: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}},
studentsWithoutHighGPA: {$push: {
$cond: [{$lt: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}}
}}
])
See how it works on the playground example
*This solution limits the rank of the students, so there is an edge case of more than n students ending up in an array (when several students share exactly rank n). It can be solved by adding a $slice step, as sketched below.
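For instance, appending a $set with $slice after the $group would trim any such ties (rankLimit is the same placeholder as above, 16 in the question):
{$set: {
  studentsWithHighGPA: {$slice: ["$studentsWithHighGPA", rankLimit]},
  studentsWithoutHighGPA: {$slice: ["$studentsWithoutHighGPA", rankLimit]}
}}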
Maybe MongoDB's $facet stage is a solution. You can specify different output pipelines in one aggregation call.
Something like this:
const pipeline = [
{
'$facet': {
'studentsWithHighGPA': [
{ '$match': { 'grade': { '$gte': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
],
'studentsWithoutHighGPA': [
{ '$match': { 'grade': { '$lt': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
]
}
}
];
coll.aggregate(pipeline)
This should end up with one document including two arrays.
studentsWithHighGPA (array)
0 (object)
1 (object)
...
studentsWithoutHighGPA (array)
0 (object)
1 (object)
See each facet as an aggregation pipeline of its own. So you can also include $group to group by classes or something else, as sketched after the link below.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/
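For example, a rough sketch that groups by class inside each facet and caps each class at 16 students with $slice (field names taken from the question, so treat the schema as an assumption):
const pipeline = [
  { '$match': {
      'class_number': { '$in': [49, 50, 16] },
      'grades.curr_class.percentile': { '$exists': true },
      'grades.min': { '$gte': 80 }
  } },
  { '$sort': { 'grades.curr_class.score': -1 } },
  { '$facet': {
      'studentsWithHighGPA': [
        { '$match': { 'grades.gpa': { '$gte': 80 } } },
        { '$group': { '_id': '$class_number', 'students': { '$push': { 'id': '$_id' } } } },
        { '$set': { 'students': { '$slice': ['$students', 16] } } }
      ],
      'studentsWithoutHighGPA': [
        { '$match': { 'grades.gpa': { '$lt': 80 } } },
        { '$group': { '_id': '$class_number', 'students': { '$push': { 'id': '$_id' } } } },
        { '$set': { 'students': { '$slice': ['$students', 16] } } }
      ]
  } }
];
coll.aggregate(pipeline)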
I don't think there is a MongoDB-provided operator to apply a limit inside of a $group stage.
You could use $accumulator, but that requires server-side scripting to be enabled, and may have performance impact.
Limiting studentsWithHighGPA to 16 throughout the grouping might look something like:
"studentsWithHighGPA": {
"$accumulator": {
init: "function(){
return {combined:[]};
}",
accumulate: "function(state, id, score){
if (score >= 80) {
state.combined.push({_id:id, score:score})
};
return {combined:state.combined.slice(0,16)}
}",
accumulateArgs: [ "$_id", "$grades.gpa"],
merge: "function(A,B){
return {combined:
A.combined.concat(B.combined).sort(
function(SA,SB){
return (SB.score - SA.score)
})
}
}",
finalize: "function(s){
return s.combined.slice(0,16).map(function(A){
return {_id:A._id}
})
}",
lang: "js"
}
}
Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.

Sort and assign the order to query in mongodb

I'd like to sort a collection, then add a virtual property to each result holding the numerical order in which the results were returned.
So for example, we have a collection called calls, and we'd like to ascertain the current call queue priority as a number so it can be synced to our CRM via reverse ETL.
We have to do this inside of the query itself because we don't have an intermediary step where we can introduce any logic to determine this order.
So my current query is
db.getCollection('callqueues').aggregate([
{
$match: {
'invalidated': false,
'assigned_agent': null
}
},
{ $sort: {
score: -1, _id: -1
} },
{
$addFields: {
order: "<NEW ORDER PROPERTY HERE>",
}
},
])
So I was wondering how I would insert their order as a virtual property, where the first element after the sort should be 1, the second 2, etc.
One option (since MongoDB version 5.0) is to use $setWindowFields for this:
db.collection.aggregate([
{$match: {invalidated: false, assigned_agent: null}},
{$setWindowFields: {
sortBy: {score: -1, _id: -1},
output: {
order: {
$sum: 1,
window: {documents: ["unbounded", "current"]}
}
}
}}
])
See how it works on the playground example
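MongoDB 5.0 also ships a $documentNumber window operator, so a slightly more direct sketch of the same idea (same collection and fields assumed) is:
db.collection.aggregate([
  {$match: {invalidated: false, assigned_agent: null}},
  {$setWindowFields: {
    sortBy: {score: -1, _id: -1},
    output: {
      // $documentNumber assigns 1, 2, 3, ... in the sorted order
      order: {$documentNumber: {}}
    }
  }}
])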
EDIT: If your MongoDB version is earlier than 5.0, you can use a less efficient query involving $group and $unwind:
db.collection.aggregate([
{$match: {invalidated: false, assigned_agent: null}},
{$sort: {score: -1, _id: -1}},
{$group: {_id: 0, data: {$push: "$$ROOT"}}},
{$unwind: {path: "$data", includeArrayIndex: "order"}},
{$replaceRoot: {newRoot: {$mergeObjects: ["$data", {order: {$add: ["$order", 1]}}]}}}
])
See how it works on the playground example < 5.0

Scala / MongoDB - removing duplicate

I have seen very similar questions with solutions to this problem, but I am unsure how I would incorporate it in to my own query. I'm programming in Scala and using a MongoDB Aggregates "framework".
val getItems = Seq (
Aggregates.lookup(Store...)...
Aggregates.lookup(Store.STORE_NAME, "relationship.itemID", "uniqueID", "item"),
Aggregates.unwind("$item"),
// filter duplicates here ?
Aggregates.lookup(Store.STORE_NAME, "item.content", "ID", "content"),
Aggregates.unwind("$content"),
Aggregates.project(Projections.fields(Projections.include("store", "item", "content")))
)
The query returns duplicate objects which is undesirable. I would like to remove these. How could I go about incorporating Aggregates.group and "$addToSet" to do this? Or any other reasonable solution would be great too.
Note: I have to omit some details about the query, so the store lookup aggregate is not there. However, I want to remove the duplicates later in the query so it hopefully shouldn't matter.
Please let me know if I need to provide more information.
Thanks.
EDIT: 31/07/2019 13:47
I have tried the following:
val getItems = Seq (
Aggregates.lookup(Store...)...
Aggregates.lookup(Store.STORE_NAME, "relationship.itemID", "uniqueID", "item"),
Aggregates.unwind("$item"),
Aggregates.group("$item.itemID,
Accumulators.first("ID", "$ID"),
Accumulators.first("itemName", "$itemName"),
Accumulators.addToSet("item", "$item")
Aggregates.unwind("$items"),
Aggregates.lookup(Store.STORE_NAME, "item.content", "ID", "content"),
Aggregates.unwind("$content"),
Aggregates.project(Projections.fields(Projections.include("store", "items", "content")))
)
But my query now returns zero results instead of the duplicate result.
You can use $first to remove the duplicates.
Suppose I have the following data:
[
{"_id": 1,"item": "ABC","sizes": ["S","M","L"]},
{"_id": 2,"item": "EFG","sizes": []},
{"_id": 3, "item": "IJK","sizes": "M" },
{"_id": 4,"item": "LMN"},
{"_id": 5,"item": "XYZ","sizes": null
}
]
Now, let's aggregate it using $first and $unwind and see the difference:
First let's aggregate it using $first
db.collection.aggregate([
{ $sort: {
item: 1
}
},
{ $group: {
_id: "$item",firstSize: {$first: "$sizes"}}}
])
Output
[
{"_id": "XYZ","firstSize": null},
{"_id": "ABC","firstSize": ["S","M","L" ]},
{"_id": "IJK","firstSize": "M"},
{"_id": "EFG","firstSize": []},
{"_id": "LMN","firstSize": null}
]
Now, let's aggregate it using $unwind
db.collection.aggregate([
{
$unwind: "$sizes"
}
])
Output
[
{"_id": 1,"item": "ABC","sizes": "S"},
{"_id": 1,"item": "ABC","sizes": "M"},
{"_id": 1,"item": "ABC","sizes": "L},
{"_id": 3,"item": "IJK","sizes": "M"}
]
You can see that $first removes the duplicates, whereas $unwind keeps them.
Using $unwind and $first together.
db.collection.aggregate([
{ $unwind: "$sizes"},
{
$group: {
_id: "$item",firstSize: {$first: "$sizes"}}
}
])
Output
[
{"_id": "IJK", "firstSize": "M"},
{"_id": "ABC","firstSize": "S"}
]
$group followed by $addToSet is an effective way to deal with your problem!
It looks like this in the mongo shell:
db.sales.aggregate(
[
{
$group:
{
_id: { day: { $dayOfYear: "$date"}, year: { $year: "$date" } },
itemsSold: { $addToSet: "$item" }
}
}
]
)
In Scala you can do it like this:
Aggregates.group("$groupfield", Accumulators.addToSet("fieldName","$expression"))
If you have multiple fields to group by:
Aggregates.group(new BasicDBObject().append("fieldAname","$fieldA").append("fieldBname","$fieldB"), Accumulators.addToSet("fieldName","$expression"))
Then unwind.

MongoDB - Select all documents by the count of an array field

In my current project I have a structure like this:
"squad": {
"members": [
{
"name": "xyz",
"empty": true
},
{
"name": "xyz",
"empty": true
},
{
"name": "xyz",
"empty": true
}
]
}
Now I want to use MongoDB to query for every squad which has at least, let's say, 3 empty member slots. I've googled and only found aggregate and $size, which seems to count the whole array rather than matching on a field inside each element.
Any idea how to do it?
You can try this query:
db.getCollection('collectionName').aggregate([
{$unwind:"$squad.members"},
{$group:{_id:"$_id",count:{$sum:{$cond: [{$eq: ['$squad.members.empty', true]}, 1, 0]}}}},
{$match: {count: {$gte: 3}}}
])
This query applies a conditional sum and then checks that the count is greater than or equal to 3.
It will return all documents with more than 3 empty slots:
db.squad.aggregate([
{$unwind:"$squad.members"},
{$match:{"squad.members.empty": true}},
{$group:{_id:"$_id",count:{$sum:1}}},
{$match: {count: {$gt: 3}}}
])
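If you only need the squads themselves and not the counts, a hedged alternative on MongoDB 3.6+ is a plain find with $expr, $filter and $size, no aggregation required:
db.squad.find({
  $expr: {
    $gte: [
      // count the members whose "empty" flag is true
      {$size: {$filter: {
        input: "$squad.members",
        as: "m",
        cond: {$eq: ["$$m.empty", true]}
      }}},
      3
    ]
  }
})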

counting record types in a mongo aggregation (pymongo)

I am doing the following aggregation using pymongo (users is defined previously as the list of users I want to query for):
pipeline = [
{ '$match': {'user': {'$in': users} } },
{ '$group': { '_id': "$user", 'badges': {'$push': '$badge'} } },
]
This gives me the following results:
{u'ok': 1.0,
u'result': [{u'_id': u'user22',
u'badges': [u'gold', u'silver', u'silver']},
{u'_id': u'user2',
u'badges': [u'gold', u'gold']},
{u'_id': u'user15',
u'badges': [u'gold', u'bronze', u'bronze']},
{u'_id': u'user11',
u'badges': [u'gold']},
{u'_id': u'user3',
u'badges': [u'silver', u'bronze']},
{u'_id': u'user18',
u'badges': [u'bronze']}
]
}
This is ok, but what I really want to get is a count per medal type (type=gold/silver/bronze). I can do this easily in post-processing in Python, but I feel like I should be able to do it in the same pipeline and I want to learn "how to mongo better" :)
So to be clear, what I really want is this (I generated this ideal output by hand so there might be an inconsistency w/ the data above or a syntax error, but I think it gets the point across):
{u'ok': 1.0,
u'result': [{u'_id': u'user22',
u'badges': {u'gold': 1, u'silver': 2}},
{u'_id': u'user2',
u'badges': {u'gold': 2}},
{u'_id': u'user15',
u'badges': {u'gold': 1, u'bronze': 2}},
{u'_id': u'user11',
u'badges': {u'gold': 1}},
{u'_id': u'user3',
u'badges': {u'silver': 1, u'bronze': 1}},
{u'_id': u'user18',
u'badges': {u'bronze': 1}}
]
}
My data-structure requirements aren't rigid. I would also be happy with using gold/silver/bronze as the keys and avoiding having the nested dict:
{u'_id': u'user22',
u'gold': 1, u'silver': 2},
{u'_id': u'user2',
u'gold': 2},
...
I tried doing a bunch of things with the $sum operator, but with no luck. When I try to dynamically generate a field name I get:
failed: exception: the group aggregate field name '$badge' cannot be an operator name
Any ideas? Thanks in advance!
(Also, semi-related...I don't know much about map-reduce. Maybe this is a candidate for that. I started using aggregations and they have worked so far for me, til now. I should probably learn about map-reduce too)
Rather than pushing the badges to an array, you could conditionally $sum on the badge type. This is generally done by testing an $eq condition inside a $cond operator in order to determine the amount to contribute to the "sum total":
collection.aggregate([
{ "$match": { "user": { "$in": users } } },
{ "$group": {
"_id": "$user",
"gold": {
"$sum": {
"$cond": [
{ "$eq": [ "$badge", "gold" ] },
1,
0
]
}
},
"silver": {
"$sum": {
"$cond": [
{ "$eq": [ "$badge", "silver" ] },
1,
0
]
}
},
"bronze": {
"$sum": {
"$cond": [
{ "$eq": [ "$badge", "bronze" ] },
1,
0
]
}
}
}}
])
That will correctly sum each type, though of course there will be a count for "gold/silver/bronze" for each user regardless of whether it is greater than 0 or not. What you cannot do is "dynamically" create fields in the aggregation framework.
If you really need "dynamic" fields then your only option is mapReduce, but of course that will not be as efficient as the aggregation framework. The conditional sum really does give you the best option.
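That said, if the server is newer than this answer assumes (MongoDB 3.4.4 or later), the nested badges document from the question can also be built with $arrayToObject; a minimal sketch:
collection.aggregate([
  { "$match": { "user": { "$in": users } } },
  // count each (user, badge) pair
  { "$group": { "_id": { "user": "$user", "badge": "$badge" }, "count": { "$sum": 1 } } },
  // collect the per-badge counts as k/v pairs per user
  { "$group": {
    "_id": "$_id.user",
    "badges": { "$push": { "k": "$_id.badge", "v": "$count" } }
  } },
  // turn the k/v array into an embedded document like {"gold": 1, "silver": 2}
  { "$addFields": { "badges": { "$arrayToObject": "$badges" } } }
])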