Aggregate MongoDB query within nested arrays over multiple documents - mongodb

I am doing a little hobby project for friends' disc golf results.
I'm fairly new to MongoDB and I'm stuck on a query. What I want to do is return a JSON array of the form [{player: "Name", totalThrows: (sum of Totalt field), total+/-: (sum of +/- field)}].
So in short: I want to aggregate each player's total throws (Totalt field) and total +/-, over 3 different documents in my database.
The final JSON would look like this: [{player: "Tormod", totalThrows: 148, total+/-: 24}, {player: "Martin", totalThrows: 149, total+/-: 25}, {player: "Andreas", totalThrows: 158, total+/-: 34}]
The picture below shows my document as it's saved in MongoDB. The results array holds each player's disc golf results for that round. The order of the players in the results array is not fixed and can differ between documents. Each document represents a new round (think of playing on 3 different days).
Code:
aggregate(
[
{$unwind: "$results"},
{$group: {
player: "$results.PlayerName",
throws: "$results.Totalt"
}},
{$group: {
_id: "$_id",
totalThrows: {$sum: "$results.Totalt"}
}}
])
https://mongoplayground.net/p/aaKv3aaCeCi

You should pass an _id field (the group by expression) to the $group stage and use $sum to sum the Totalt and +/- values. Since the values are strings, you should convert them to integers using $toInt before summing them. Finally, you can project the values based on the desired structure.
By the way, most of the numeric fields appear to be stored as strings in the documents. I would recommend updating the existing documents to convert them to numbers, and making sure that only numbers are written to those fields in new documents.
db.collection.aggregate([
  {
    $unwind: "$results",
  },
  {
    $group: {
      _id: "$results.PlayerName",
      totalThrows: {
        $sum: {
          $toInt: "$results.Totalt",
        },
      },
      "total+/-": {
        $sum: {
          $toInt: "$results.+/-",
        },
      },
    },
  },
  {
    $project: {
      _id: 0,
      player: "$_id",
      totalThrows: "$totalThrows",
      "total+/-": "$total+/-",
    },
  },
])
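As a sanity check on what the pipeline computes, here is a minimal plain-JavaScript sketch of the same unwind, group, and sum logic. The two sample rounds and their numbers are made up for illustration (the real documents are only available through the playground link); only the field names PlayerName, Totalt, and +/- come from the question.

```javascript
// Hypothetical sample rounds; numeric values are stored as strings, as in the question.
const rounds = [
  { results: [
    { PlayerName: "Tormod", Totalt: "50", "+/-": "8" },
    { PlayerName: "Martin", Totalt: "52", "+/-": "10" },
  ] },
  { results: [
    { PlayerName: "Martin", Totalt: "47", "+/-": "5" },
    { PlayerName: "Tormod", Totalt: "49", "+/-": "7" },
  ] },
];

function totalsPerPlayer(docs) {
  const totals = new Map();
  for (const doc of docs) {
    for (const r of doc.results) {                 // like $unwind: "$results"
      // like $group with _id: "$results.PlayerName"
      const t = totals.get(r.PlayerName) ??
        { player: r.PlayerName, totalThrows: 0, "total+/-": 0 };
      t.totalThrows += parseInt(r.Totalt, 10);     // like $sum over $toInt
      t["total+/-"] += parseInt(r["+/-"], 10);
      totals.set(r.PlayerName, t);
    }
  }
  return [...totals.values()];                     // shape matches the $project output
}

console.log(totalsPerPlayer(rounds));
```

The order of players inside each round does not matter here, because every result is keyed by the player's name before it is summed.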

Related

Limit number of objects pushed to array in MongoDB aggregation

I've been trying to find a way to limit the number of objects I'm pushing to the arrays I create while using "aggregate" on a MongoDB collection.
I have a collection of students - each has these relevant keys:
class number it takes this semester (only one value),
percentile in class (exists if is enrolled in class, null if not),
current score in class (> 0 if enrolled in class, otherwise 0),
total average (GPA),
max grade
I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.
This is my query:
db.getCollection("students").aggregate([
{"$match": {
"class_number":
{"$in": [49, 50, 16]},
"grades.curr_class.percentile":
{"$exists": true},
"grades.min": {"$gte": 80},
}},
{"$sort": {"grades.curr_class.score": -1}},
{"$group": {"_id": "$class_number",
"studentsWithHighGPA":
{"$push":
{"$cond": [{"$gte": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
}
},
"studentsWithoutHighGPA":
{"$push":
{"$cond": [{"$lt": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
},
},
},
},
])
What I'm trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but I'm not sure how to approach this.
Thanks in advance!
I've tried using limit in different variations, and slice too, but none seem to work.
Since MongoDB version 5.0, one option is to use $setWindowFields for this, and in particular its $rank operator. This lets you keep only the relevant students and limit their count even before the $group step:
$match only the relevant students, as suggested by the OP
$set the groupId for $setWindowFields (as it can currently partition by one key only)
$setWindowFields to define the rank of each student in their array
$match only students with the wanted rank
$group by class_number, as suggested by the OP:
db.collection.aggregate([
{$match: {
class_number: {$in: [49, 50, 16]},
"grades.curr_class.percentile": {$exists: true},
"grades.min": {$gte: 80}
}},
{$set: {
groupId: {$concat: [
{$toString: "$class_number"},
{$toString: {$toBool: {$gte: ["$grades.gpa", 80]}}}
]}
}},
{$setWindowFields: {
partitionBy: "$groupId",
sortBy: {"grades.curr_class.score": -1},
output: {rank: {$rank: {}}}
}},
{$match: {rank: {$lte: rankLimit}}}, // rankLimit is a variable you define, e.g. 16 for the question's case
{$group: {
_id: "$class_number",
studentsWithHighGPA: {$push: {
$cond: [{$gte: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}},
studentsWithoutHighGPA: {$push: {
$cond: [{$lt: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}}
}}
])
*This solution limits the rank of the students, so there is an edge case where an array ends up with more than n students (when several students share the exact rank n). This can be solved by adding a $slice step.
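To make the tie edge case concrete, here is a small plain-JavaScript sketch that mimics how $rank assigns shared ranks. The student ids and scores are hypothetical, and `withRank` is an illustrative helper, not a MongoDB API.

```javascript
// Hypothetical scores, already sorted descending; two students tie for second place.
const students = [
  { id: "a", score: 95 },
  { id: "b", score: 90 },
  { id: "c", score: 90 },
  { id: "d", score: 85 },
];

// Mimics $rank: equal scores share a rank and the next rank skips ahead (1, 2, 2, 4).
function withRank(sorted) {
  return sorted.map((s) => ({
    ...s,
    rank: sorted.findIndex((o) => o.score === s.score) + 1,
  }));
}

const n = 2;
const kept = withRank(students).filter((s) => s.rank <= n); // like {$match: {rank: {$lte: n}}}
const limited = kept.slice(0, n);                            // like a final $slice
console.log(kept.length, limited.length); // 3 2
```

In the pipeline itself, one way to apply the same cap would be a stage after the $group such as {$set: {studentsWithHighGPA: {$slice: ["$studentsWithHighGPA", 16]}}}, and likewise for the second array.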
Maybe MongoDB's $facet stage is a solution. You can specify several output pipelines in one aggregation call.
Something like this:
const pipeline = [
{
'$facet': {
'studentsWithHighGPA': [
{ '$match': { 'grade': { '$gte': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
],
'studentsWithoutHighGPA': [
{ '$match': { 'grade': { '$lt': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
]
}
}
];
coll.aggregate(pipeline)
This should end up with one document including two arrays.
studentsWithHighGPA (array)
0 (object)
1 (object)
...
studentsWithoutHighGPA (array)
0 (object)
1 (object)
See each facet as an aggregation pipeline on its own. So you can also include $group to group by classes or something else.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/
I don't think there is a mongodb-provided operator to apply a limit inside of a $group stage.
You could use $accumulator, but that requires server-side scripting to be enabled, and may have performance impact.
Limiting studentsWithHighGPA to 16 throughout the grouping might look something like:
"studentsWithHighGPA": {
  "$accumulator": {
    // Start each group with an empty list
    init: function() {
      return {combined: []};
    },
    // Keep students with a GPA of at least 80, capped at 16 entries
    accumulate: function(state, id, score) {
      if (score >= 80) {
        state.combined.push({_id: id, score: score});
      }
      return {combined: state.combined.slice(0, 16)};
    },
    accumulateArgs: ["$_id", "$grades.gpa"],
    // Combine two partial states, highest score first
    merge: function(A, B) {
      return {combined:
        A.combined.concat(B.combined).sort(
          function(SA, SB) {
            return (SB.score - SA.score);
          })
      };
    },
    // Drop the score and return at most 16 ids
    finalize: function(s) {
      return s.combined.slice(0, 16).map(function(A) {
        return {_id: A._id};
      });
    },
    lang: "js"
  }
}
Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.
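To see why the score has to travel with each entry, the four callbacks can be exercised as plain JavaScript functions outside the server. This is only a sketch: the two "shards" and their rows are hypothetical, and the real $accumulator decides on its own when to call merge.

```javascript
// The four $accumulator callbacks from the answer, written as plain functions.
const init = () => ({ combined: [] });
const accumulate = (state, id, score) => {
  if (score >= 80) state.combined.push({ _id: id, score });
  return { combined: state.combined.slice(0, 16) };
};
const merge = (A, B) => ({
  combined: A.combined.concat(B.combined).sort((x, y) => y.score - x.score),
});
const finalize = (s) => s.combined.slice(0, 16).map((x) => ({ _id: x._id }));

// Hypothetical input: two shards, each already sorted by score descending.
const shard1 = [["s1", 95], ["s3", 85], ["s2", 70]];
const shard2 = [["s4", 90], ["s5", 82]];

// Fold the rows of one shard into a partial state.
const run = (rows) =>
  rows.reduce((state, [id, score]) => accumulate(state, id, score), init());

const merged = finalize(merge(run(shard1), run(shard2)));
console.log(merged.map((x) => x._id)); // [ 's1', 's4', 's3', 's5' ]
```

Without the score in the partial states, merge would have no way to interleave s4 between s1 and s3.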

How to access overall document count during arithmetic aggregation expression

I have a collection of documents in this format:
{
_id: ObjectId,
items: [
{
defindex: number,
...
},
...
]
}
Irrelevant parts of the schema are omitted. Each item's defindex is guaranteed to be unique within its items array: the same defindex can occur in different documents' items fields, but at most once in each array.
I currently call $unwind upon the items field, followed by $sortByCount upon items.defindex to get a sorted list of items with the highest count.
I now want to add a new field to this final sorted list using $set called usage, that shows the item's usage as a percentage of the initial number of total documents in the collection.
(i.e. if the item's count is 1300, and the overall document count pre-$unwind was 2600, the usage value will be 0.5)
My initial plan was to use $facet on the initial collection, creating a document like so:
{
total: number (achieved using $count),
documents: [{...}] (achieved using an empty $set)
}
And then calling $unwind on the documents field to add the total document count to each document. Calculating the usage value is then trivial using $set, since the total count is a field in the document itself.
This approach ran into memory issues though, since my collection is far larger than the 16MB limit.
How would I solve this?
One way to do it is to use $setWindowFields:
db.collection.aggregate([
  {
    $setWindowFields: {
      output: {
        totalCount: {$count: {}}
      }
    }
  },
  {
    $unwind: "$items"
  },
  {
    $group: {
      _id: "$items.defindex",
      count: {$sum: 1},
      totalCount: {$first: "$totalCount"}
    }
  },
  {
    $project: {
      count: 1,
      usage: {$divide: ["$count", "$totalCount"]}
    }
  },
  {$sort: {count: -1}}
])
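The arithmetic behind this pipeline can be checked with a short plain-JavaScript sketch. The four documents below are hypothetical; the key point is that the total is taken before the items are unwound, so it counts documents rather than items.

```javascript
// Hypothetical collection: 4 documents, each with unique defindex values in items.
const docs = [
  { items: [{ defindex: 1 }, { defindex: 2 }] },
  { items: [{ defindex: 1 }] },
  { items: [{ defindex: 2 }, { defindex: 3 }] },
  { items: [{ defindex: 1 }] },
];

function usageByDefindex(collection) {
  const totalCount = collection.length;   // like $setWindowFields with $count, before $unwind
  const counts = new Map();
  for (const doc of collection) {
    for (const item of doc.items) {       // like $unwind: "$items"
      counts.set(item.defindex, (counts.get(item.defindex) ?? 0) + 1); // $group with $sum: 1
    }
  }
  return [...counts]
    .map(([defindex, count]) => ({ _id: defindex, count, usage: count / totalCount }))
    .sort((a, b) => b.count - a.count);   // like $sort: {count: -1}
}

console.log(usageByDefindex(docs));
```

With these sample documents, defindex 1 appears in 3 of 4 documents, so its usage comes out as 0.75.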

How to group documents of a collection to a map with unique field values as key and count of documents as mapped value in mongodb?

I need a mongodb query to get the list or map of values with unique value of the field(f) as the key in the collection and count of documents having the same value in the field(f) as the mapped value. How can I achieve this ?
Example:
Document1: {"id":"1","name":"n1","city":"c1"}
Document2: {"id":"2","name":"n2","city":"c2"}
Document3: {"id":"3","name":"n1","city":"c3"}
Document4: {"id":"4","name":"n1","city":"c5"}
Document5: {"id":"5","name":"n2","city":"c2"}
Document6: {"id":"6","name":"n1","city":"c8"}
Document7: {"id":"7","name":"n3","city":"c9"}
Document8: {"id":"8","name":"n2","city":"c6"}
Query result should be something like this if group by field is "name":
{"n1":"4",
"n2":"3",
"n3":"1"}
It would be nice if the list is also sorted in the descending order.
It's worth noting, using data points as field names (keys) is somewhat considered an anti-pattern and makes tooling difficult. Nonetheless if you insist on having data points as field names you can use this complicated aggregation to perform the query output you desire...
Aggregation
db.collection.aggregate([
{
$group: { _id: "$name", "count": { "$sum": 1} }
},
{
$sort: { "count": -1 }
},
{
$group: { _id: null, "values": { "$push": { "name": "$_id", "count": "$count" } } }
},
{
$project:
{
_id: 0,
results:
{
$arrayToObject:
{
$map:
{
input: "$values",
as: "pair",
in: ["$$pair.name", "$$pair.count"]
}
}
}
}
},
{
$replaceRoot: { newRoot: "$results" }
}
])
Aggregation Explanation
This is a 5 stage aggregation consisting of the following...
$group - get the count of the data as required by name.
$sort - sort the results with count descending.
$group - place results into an array for the next stage.
$project - use $arrayToObject and $map to pivot the data such that a data point can be a field name.
$replaceRoot - make results the top level fields.
Sample Results
{ "n1" : 4, "n2" : 3, "n3" : 1 }
For whatever reason, you show desired results having count as a string, but my results show the count as an integer. I assume that is not an issue, and may actually be preferred.
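For comparison, the same result can be reproduced client-side in a few lines of plain JavaScript using the eight documents from the question. `countByField` is an illustrative helper name, and the comments map each step to the pipeline stage it mirrors.

```javascript
// The eight documents from the question.
const people = [
  { id: "1", name: "n1", city: "c1" },
  { id: "2", name: "n2", city: "c2" },
  { id: "3", name: "n1", city: "c3" },
  { id: "4", name: "n1", city: "c5" },
  { id: "5", name: "n2", city: "c2" },
  { id: "6", name: "n1", city: "c8" },
  { id: "7", name: "n3", city: "c9" },
  { id: "8", name: "n2", city: "c6" },
];

function countByField(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) ?? 0) + 1); // first $group with $sum: 1
  }
  const sorted = [...counts].sort((a, b) => b[1] - a[1]);      // $sort: {count: -1}
  return Object.fromEntries(sorted);                           // $arrayToObject on [key, value] pairs
}

console.log(countByField(people, "name")); // { n1: 4, n2: 3, n3: 1 }
```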

MongoDB aggregation pipeline: counting occurrences of words in list field from matching documents?

Here's a simplified example of what I'm trying to do. My documents all have various things and a keywords field with a list of strings as values. (The lists can contain duplicates, which are significant.) Suppose the following documents match the query:
{'original_id': 33, 'keywords': ['dog', 'cat', 'goat', 'dog']},
{'original_id': 34, 'keywords': ['dog', 'kitten', 'goat', 'moose']},
{'original_id': 35, 'keywords': ['moose', 'elk']}
I want to get back a map of the keywords found with the number of occurrences of each in the set of matching documents:
{'dog': 3, 'cat': 1, 'goat':2, 'kitten': 1, 'moose': 2, 'elk': 1}
(Note that dog in document 33 gets counted twice.)
I'm currently doing this from PyMongo by creating a Counter, calling collection_name.find(...) and then iterating through all the documents updating the counter with each keywords field. But I would like to make the process more efficient by doing it within MongoDB.
Is this kind of counting possible in an aggregation pipeline? If so, how?
$unwind deconstruct keywords array
$group by keywords and count total
$group by null and construct array of key-value pair
$arrayToObject convert above array to object key-value format
$replaceRoot to replace above converted object to root
db.collection.aggregate([
  { $unwind: "$keywords" },
  {
    $group: {
      _id: "$keywords",
      count: { $sum: 1 }
    }
  },
  {
    $group: {
      _id: null,
      keywords: {
        $push: {
          k: "$_id",
          v: "$count"
        }
      }
    }
  },
  { $replaceRoot: { newRoot: { $arrayToObject: "$keywords" } } }
])
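The k/v pair shape is what lets $arrayToObject build the final map. A plain-JavaScript sketch over the three documents from the question shows the intermediate pairs and the resulting object; the variable names are illustrative only.

```javascript
// The three matching documents from the question.
const matched = [
  { original_id: 33, keywords: ["dog", "cat", "goat", "dog"] },
  { original_id: 34, keywords: ["dog", "kitten", "goat", "moose"] },
  { original_id: 35, keywords: ["moose", "elk"] },
];

// $unwind + first $group: count every occurrence, duplicates included.
const counts = new Map();
for (const word of matched.flatMap((doc) => doc.keywords)) {
  counts.set(word, (counts.get(word) ?? 0) + 1);
}

// Second $group with $push: one {k, v} pair per distinct keyword.
const pairs = [...counts].map(([k, v]) => ({ k, v }));

// $arrayToObject + $replaceRoot: turn the pairs into a single object.
const keywordCounts = Object.fromEntries(pairs.map(({ k, v }) => [k, v]));
console.log(keywordCounts);
```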

Is it possible to use `$sum` in the `$match` stage of a MongoDB aggregation, and how?

I have a gifts collection in MongoDB with four items in it. How do I query the db so that I get only the gifts whose amounts sum to at most 5500?
So, for example, from these four gifts in the db:
{
"_id": 1,
"amount": 3000,
},
{
"_id": 2,
"amount": 2000,
},
{
"_id": 3,
"amount": 1000,
},
{
"_id": 4,
"amount": 5000,
}
The query should return the first two only:
{
"_id": 1,
"amount": 3000,
},
{
"_id": 2,
"amount": 2000,
},
I think I should use MongoDB aggregation? If so, what is the syntax?
I did some googling and I know how to use $sum in the $group stage, but I don't know how to use it in the $match stage. Is it even possible to do so?
P.S.: I assumed I should use $sum in $match. Am I supposed to group them first? If so, how do I tell MongoDB to make a group where the sum of the amounts in that group is less than or equal to 5500?
Thanks for any help you are able to provide.
You're going the right way.
First compute the $sum into a field in a $group stage, then filter on that field with $match:
db.gifts.aggregate([
{$match: {}}, // Initial query
{$group: {
_id: '$code', // Assume your gift could be grouped by a unique code
sum: {$sum: '$amount'}, // Sum all amount per group
items: {$push: '$$ROOT'} // Push all gift item to an array
}},
{$match: {sum: {$lte: 5500}}}, // Filter group where sum <= 5500
{$unwind: '$items'}, // Unwind items array to get all match field
{$replaceRoot: {newRoot: '$items'}} // Use this stage to get back the original items
])
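The pipeline above keeps or drops whole groups. If the intent is instead a running total in _id order, which is an assumption based on the expected output in the question, a different technique may fit better. The sketch below shows that logic in plain JavaScript, followed by a possible server-side form using $setWindowFields (MongoDB 5.0+), given as a pipeline definition only. The names `giftsWithinBudget` and `runningTotalPipeline` are illustrative, and amounts are assumed to be non-negative.

```javascript
// The four gifts from the question.
const gifts = [
  { _id: 1, amount: 3000 },
  { _id: 2, amount: 2000 },
  { _id: 3, amount: 1000 },
  { _id: 4, amount: 5000 },
];

// Keep gifts, in _id order, while the running total stays within the budget.
function giftsWithinBudget(items, budget) {
  let runningTotal = 0;
  const kept = [];
  for (const gift of [...items].sort((a, b) => a._id - b._id)) {
    runningTotal += gift.amount;
    if (runningTotal > budget) break;  // totals only grow, so later gifts cannot fit either
    kept.push(gift);
  }
  return kept;
}

console.log(giftsWithinBudget(gifts, 5500).map((g) => g._id)); // [ 1, 2 ]

// A possible aggregation equivalent; not executed here, pass it to db.gifts.aggregate(...).
const runningTotalPipeline = [
  { $setWindowFields: {
      sortBy: { _id: 1 },
      output: {
        runningTotal: { $sum: "$amount", window: { documents: ["unbounded", "current"] } },
      },
  } },
  { $match: { runningTotal: { $lte: 5500 } } },
  { $unset: "runningTotal" },
];
```

With the sample data the running totals are 3000, 5000, 6000, and 11000, so only the first two gifts pass the filter.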