How to get best 5 results in $group method in mongodb? - mongodb

On production server I use mongodb 4.4
I have a query that works well
db.step_tournaments_results.aggregate([
{ "$match": { "tournament_id": "6377f2f96174982ef89c48d2" } },
{ "$sort": { "total_points": -1, "time_spent": 1 } },
{
$group: {
_id: "$club_name",
'total_points': { $sum: "$total_points"},
'time_spent': { $sum: "$time_spent"}
},
},
])
But the problem is in $group operator, because it sums all the points of every group for total_points, but I need only best 5 of every group. How to achieve that?

Query
like your query, match and sort
on group instead of sum, gather all members inside one array
(i collected the $ROOT but you can collect only the 2 fields you need inside a {}, if the documents have many fields)
take the first 5 of them
take the 2 sums you need from the first 5
remove the temp fields
*with mongodb 6, you can do this in the group, without need to collect th members in an array, in mongodb 5 you can also do those with window-fields without group, but for mongodb 4.4 i think this is a way to do it
aggregate(
[{"$match": {"tournament_id": {"$eq": "6377f2f96174982ef89c48d2"}}},
{"$sort": {"total_points": -1, "time_spent": 1}},
{"$group": {"_id": "$club_name", "group-members": {"$push": "$$ROOT"}}},
{"$set":
{"first-five": {"$slice": ["$group-members", 5]},
"group-members": "$$REMOVE"}},
{"$set":
{"total_points": {"$sum": "$first-five.total_points"},
"time_spent": {"$sum": "$first-five.time_spent"},
"first-five": "$$REMOVE"}}])

Related

Limit number of objects pushed to array in MongoDB aggregation

I've been trying to find a way to limit the number of objects i'm pushing to arrays I'm creating while using "aggregate" on a MongoDB collection.
I have a collection of students - each has these relevant keys:
class number it takes this semester (only one value),
percentile in class (exists if is enrolled in class, null if not),
current score in class (> 0 if enrolled in class, else - 0),
total average (GPA),
max grade
I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.
This is my query:
db.getCollection("students").aggregate([
{"$match": {
"class_number":
{"$in": [49, 50, 16]},
"grades.curr_class.percentile":
{"$exists": true},
"grades.min": {"$gte": 80},
}},
{"$sort": {"grades.curr_class.score": -1}},
{"$group": {"_id": "$class_number",
"studentsWithHighGPA":
{"$push":
{"$cond": [{"$gte": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
}
},
"studentsWithoutHighGPA":
{"$push":
{"$cond": [{"$lt": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
},
},
},
},
])
What i'm trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but i'm not sure how to approach this.
Thanks in advance!
I've tried using limit in different variations, and slice too, but none seem to work.
Since mongoDb version 5.0, one option is to use $setWindowFields for this, and in particular, its $rank option. This will allow to keep only the relevant students and limit their count even before the $group step:
$match only relevant students as suggested by the OP
$set the groupId for the setWindowFields (as it can currently partition by one key only
$setWindowFields to define the rank of each student in their array
$match only students with the wanted rank
$group by class_number as suggested by the OP:
db.collection.aggregate([
{$match: {
class_number: {$in: [49, 50, 16]},
"grades.curr_class.percentile": {$exists: true},
"grades.min": {$gte: 80}
}},
{$set: {
groupId: {$concat: [
{$toString: "$class_number"},
{$toString: {$toBool: {$gte: ["$grades.gpa", 80]}}}
]}
}},
{$setWindowFields: {
partitionBy: "$groupId",
sortBy: {"grades.curr_class.score": -1},
output: {rank: {$rank: {}}}
}},
{$match: {rank: {$lte: rankLimit}}},
{$group: {
_id: "$class_number",
studentsWithHighGPA: {$push: {
$cond: [{$gte: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}},
studentsWithoutHighGPA: {$push: {
$cond: [{$lt: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}}
}}
])
See how it works on the playground example
*This solution will limit the rank of the students, so there is an edge case of more than n students in the array (In case there are multiple students with the exact rank of n). it can be simply solved by adding a $slice step
Maybe MongoDB $facets are a solution. You can specify different output pipelines in one aggregation call.
Something like this:
const pipeline = [
{
'$facet': {
'studentsWithHighGPA': [
{ '$match': { 'grade': { '$gte': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
],
'studentsWithoutHighGPA': [
{ '$match': { 'grade': { '$lt': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
]
}
}
];
coll.aggregate(pipeline)
This should end up with one document including two arrays.
studentsWithHighGPA (array)
0 (object)
1 (object)
...
studentsWithoutHighGPA (array)
0 (object)
1 (object)
See each facet as an aggregation pipeline on its own. So you can also include $group to group by classes or something else.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/
I don't think there is a mongodb-provided operator to apply a limit inside of a $group stage.
You could use $accumulator, but that requires server-side scripting to be enabled, and may have performance impact.
Limiting studentsWithHighGPA to 16 throughout the grouping might look something like:
"studentsWithHighGPA": {
"$accumulator": {
init: "function(){
return {combined:[]};
}",
accumulate: "function(state, id, score){
if (score >= 80) {
state.combined.push({_id:id, score:score})
};
return {combined:state.combined.slice(0,16)}
}",
accumulateArgs: [ "$_id", "$grades.gpa"],
merge: "function(A,B){
return {combined:
A.combined.concat(B.combined).sort(
function(SA,SB){
return (SB.score - SA.score)
})
}
}",
finalize: "function(s){
return s.combined.slice(0,16).map(function(A){
return {_id:A._id}
})
}",
lang: "js"
}
}
Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.

How to merge multiple documents in MongoDB and convert fields' values into fields

I have a MongoDB collection that I have managed to process using an aggregation pipeline to produce the following result:
[
{
_id: 'Complex Numbers',
count: 2
},
{ _id: 'Calculus',
count: 1
}
]
But the result that I am aiming for is something like the following:
{
'Complex Numbers': 2,
'Calculus': 1
}
is there a way to achieve that?
Query
to convert to {} we need somethings like [[k1 v1] ...] OR [{"k" "..." :v "..."}]
first stage
converts each document to [{"k" ".." , "v" ".."}]
then arrayToObject
and replace root
so we have each document like "Complex Numbers": 2
the group is used to combine all those documents in 1 document
and then replace the root with that one document
Test code here
aggregate(
[{"$replaceRoot":
{"newRoot": {"$arrayToObject": [[{"k": "$_id", "v": "$count"}]]}}},
{"$group": {"_id": null, "data": {"$mergeObjects": "$$ROOT"}}},
{"$replaceRoot": {"newRoot": "$data"}}])

Is it possible to use` $sum` in the `$match` stage of mongo aggregation and how?

I have a gifts collection in mongodb with four items inside it. how do I query the db so that I get only gifts that the sum of their amount is less-than-or-equal-to 5500?
so for example from these four gifts in db:
{
"_id": 1,
"amount": 3000,
},
{
"_id": 2,
"amount": 2000,
},
{
"_id": 3,
"amount": 1000,
},
{
"_id": 4,
"amount": 5000,
}
The query should return the first two only:
{
"_id": 1,
"amount": 3000,
},
{
"_id": 1,
"amount": 2000,
},
I think I should use mongo aggregation? if so, what is the syntax?
I had some googling, I know how to use $sum in the $group stage, but I don't know how to use it in the $match stage. is it event possible to do so?
P.S: I assumend I should use $sum in $match, Am I supposed to group them first? if so, how do I tell mongo to make a group where the sum of amounts in that group is less-than-or-equal-to 5500?
Thanks for any help you are able to provide.
You're going the right way.
First store your $sum in a variable then filter them with $match:
db.gifts.aggregate([
{$match: {}}, // Initial query
{$group: {
_id: '$code', // Assume your gift could be grouped by a unique code
sum: {$sum: '$amount'}, // Sum all amount per group
items: {$push: '$$ROOT'} // Push all gift item to an array
}},
{$match: {sum: {$lte: 5500}}}, // Filter group where sum <= 5500
{$unwind: '$items'}, // Unwind items array to get all match field
{$replaceRoot: {newRoot: '$items'}} // Use this stage to get back the original items
])

Mongodb aggregation taking more than 15 seconds

I have more than 100k records in my collections, and for every 5 seconds it will add a record into collection. I have a aggregate query to get 720(approx) records from last one year data.
The aggregate query:
db.collectionName.aggregate([
{"$match": {
"Id": "****-id-****",
"receivedDate": {
"$gte": ISODate("2016-06-26T18:30:00.463Z"),
"$lt": ISODate("2017-06-26T18:30:00.463Z")
}
}
},
{"$group": {
"_id": {
"$add": [
{"$subtract": [
{"$subtract": ["$receivedDate", ISODate("1970-01-01T00:00:00.000Z")]},
{"$mod": [
{"$subtract": ["$receivedDate", ISODate("1970-01-01T00:00:00.000Z")]},
43200000
]}
]},
ISODate("1970-01-01T00:00:00.000Z")
]
},
"_rid": {"$first": "$_id"},
"_data": {"$first": "$receivedData.data"},
"count": {"$sum": 1}
}
},
{"$sort": {"_id": -1}},
{"$project": {
"_id": "$_rid",
"receivedDate": "$_id",
"receivedData": {"data": "$_data"}
}
}
])
I am not sure why its taking more than 15 seconds, when I try to get data for 1 month it is working fine.
Its too late to answer this question, This would be helpful for others,
Might be the compound index can help in this situation, Compound indexes can support queries that match on multiple fields.
You can create compound index on Id and receivedDate fields,
db.collectionName.createIndex({ Id: -1, receivedDate: -1 });
The order of the fields listed in a compound index is important. The index will contain references to documents sorted first by the values of the Id field and, within each value of the Id field, sorted by values of the receivedDate field.

Mongodb aggregation $group followed by $limit for pagination

In MongoDB aggregation pipeline, record flow from stage to stage happens one/batch at a time (or) will wait for the current stage to complete for whole collection before passing it to next stage?
For e.g., I have a collection classtest with following sample records
{name: "Person1", marks: 20}
{name: "Person2", marks: 20}
{name: "Person1", marks: 20}
I have total 1000 records for about 100 students and I have following aggregate query
db.classtest.aggregate(
[
{$sort: {name: 1}},
{$group: {_id: '$name',
total: {$sum: '$marks'}}},
{$limit: 5}
])
I have following questions.
The sort order is lost in final results. If I place another sort after $group, then results are sorted properly. Does that mean $group not maintains the previous sort order?
I would like to limit the results to 5. Does group operation has to be completely done (for all 1000 records) before passing to the limit. (or) The group operation passes the records to limit stage as and when it has record and stops processing when the requirement for limit stage is met?
My actual idea is to do pagination on results of aggregate. In above scenario, if $group maintains sort order and processes only required number of records, I want to apply $match condition {$ge: 'lastPersonName'} in subsequent page queries.
I do not want to apply $limit before $group as I want results for 5 students not first 5 records.
I may not want to use $skip as that means effectively traversing those many records.
I have solved the problem without need of maintaining another collection or even without $group traversing whole collection, hence posting my own answer.
As others have pointed:
$group doesn't retain order, hence early sorting is not of much help.
$group doesn't do any optimization, even if there is a following $limit, i.e., runs $group on entire collection.
My usecase has following unique features, which helped me to solve it:
There will be maximum of 10 records per each student (minimum of 1).
I am not very particular on page size. The front-end capable of handling varying page sizes.
The following is the aggregation command I have used.
db.classtest.aggregate(
[
{$sort: {name: 1}},
{$limit: 5 * 10},
{$group: {_id: '$name',
total: {$sum: '$marks'}}},
{$sort: {_id: 1}}
])
Explaining the above.
if $sort immediately precedes $limit, the framework optimizes the amount of data to be sent to next stage. Refer here
To get a minimum of 5 records (page size), I need to pass at least 5 (page size) * 10 (max records per student) = 50 records to the $group stage. With this, the size of final result may be anywhere between 0 and 50.
If the result is less than 5, then there is no further pagination required.
If the result size is greater than 5, there may be chance that last student record is not completely processed (i.e., not grouped all the records of student), hence I discard the last record from the result.
Then name in last record (among retained results) is used as $match criteria in subsequent page request as shown below.
db.classtest.aggregate(
[
{$match: {name: {$gt: lastRecordName}}}
{$sort: {name: 1}},
{$limit: 5 * 10},
{$group: {_id: '$name',
total: {$sum: '$marks'}}},
{$sort: {_id: 1}}
])
In above, the framework will still optimize $match, $sort and $limit together as single operation, which I have confirmed through explain plan.
The first few things to consider here is that the aggregation framework works with a "pipeline" of stages to be applied in order to get a result. If you are familiar with processing things on the "command line" or "shell" of your operating system, then you might have some experience with the "pipe" or | operator.
Here is a common unix idiom:
ps -ef | grep mongod | tee "out.txt"
In this case the output of the first command here ps -ef is being "piped" to the next command grep mongod which in turn has it's output "piped" to the tee out.txt which both outputs to terminal as well as the specified file name. This is a "pipeline" wher each stage "feeds" the next, and in "order" of the sequence they are written in.
The same is true of the aggregation pipeline. A "pipeline" here is in fact an "array", which is an ordered set of instructions to be passed in processing the data to a result.
db.classtest.aggregate([
{ "$group": {
"_id": "$name",
"total": { "$sum": "$marks"}
}},
{ "$sort": { "name": 1 } },
{ "$limit": 5 }
])
So what happens here is that all of the items in the collection are first processed by $group to get their totals. There is no specified "order" to grouping so there is not much sense in pre-ordering the data. Neither is there any point in doing so because you are yet to get to your later stages.
Then you would $sort the results and also $limit as required.
For your next "page" of data you will want ideally $match on the last unique name found, like so:
db.classtest.aggregate([
{ "$match": { "name": { "$gt": lastNameFound } }},
{ "$group": {
"_id": "$name",
"total": { "$sum": "$marks"}
}},
{ "$sort": { "name": 1 } },
{ "$limit": 5 }
])
It's not the best solution, but there really are not alternatives for this type of grouping. It will however notably get "faster" with each iteration towards the end. Alternately, storing all the unqiue names ( or reading that out of another collection ) and "paging" through that list with a "range query" on each aggregation statement may be a viable option, if your data permits it.
Something like:
db.classtest.aggregate([
{ "$match": { "name": { "$gte": "Allan", "$lte": "David" } }},
{ "$group": {
"_id": "$name",
"total": { "$sum": "$marks"}
}},
{ "$sort": { "name": 1 } },
])
Unfortunately there is not a "limit grouping up until x results" option, so unless you can work with another list, then you are basically grouping up everything ( and possibly a a gradually smaller set each time ) with each aggregation query you send.
"$group does not order its output documents." See http://docs.mongodb.org/manual/reference/operator/aggregation/group/
$limit limits the number of processed elements of an immediately preceding $sort operation, not only the number of elements passed to the next stage. See the note at http://docs.mongodb.org/manual/reference/operator/aggregation/limit/
For the very first question you asked, I am not sure, but it appears (see 1.) that a stage n+1 can influence the behaviour of stage n : the limit will limit the sort operation to its first n elements, and the sort operation will not complete just as if the following limit stage did not exist.
pagination on group data mongodb -
in $group items you can't directly apply pagination, but below trick will be used ,
if you want pagination on group data -
for example- i want group products categoryWise and then i want only 5 product per category then
step 1 - write aggregation on product table, and write groupBY
{ $group: { _id: '$prdCategoryId', products: { $push: '$$ROOT' } } },
step 2 - prdSkip for skipping , and limit for limiting data , pass it
dynamically
{
$project: {
// pagination for products
products: {
$slice: ['$products', prdSkip, prdLimit],
}
}
},
finally query looks like -
params - limit , skip - for category pagination
and prdSkip and PrdLimit for products pagination
db.products.aggregate([
{ $group: { _id: '$prdCategoryId', products: { $push: '$$ROOT' } } },
{
$lookup: {
from: 'categories',
localField: '_id',
foreignField: '_id',
as: 'categoryProducts',
},
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [{ $arrayElemAt: ['$categoryProducts', 0] }, '$$ROOT'],
},
},
},
{
$project: {
// pagination for products
products: {
$slice: ['$products', prdSkip, prdLimit],
},
_id: 1,
catName: 1,
catDescription: 1,
},
},
])
.limit(limit) // pagination for category
.skip(skip);
I used replaceRoot here to pullOut category.