MongoDB - Perform aggregations on bucketed data

I have a collection of nested documents, divided into buckets belonging to a single business id.
To illustrate, the following document represents an invoice from business no. 1022 in which 10 roses, 20 oranges and 15 apples were sold:
sample_doc = {
'business_id': '1022',
'dt_op': Timestamp('2018-10-02 12:16:12'),
'transactions': [
{'Product name': 'Rose', "Quantity": 10},
{'Product name': 'Orange', "Quantity": 20},
{'Product name': 'Apple', "Quantity": 15}
]
}
I would like to get the total number of sales (sum of 'Quantity') for each product ('Product name') within a defined 'business_id'.
Using Compass, I tried the following stages:
# Stage 1: $match
{
business_id: "1022"
}
#Stage 2: $group
{
_id: "$transactions.Product name",
TotalSum: {
$sum: "transactions.Quantity"
}
}
But this returns a nested list of documents without performing any sums.
How can I correctly perform the aggregation pipeline to get the total number of sales (sum of 'Quantity') for each product ('Product name') within a defined 'business_id'?

You are very close; all you're missing is a single $unwind before the $group stage, plus the leading $ on the field path inside $sum:
db.collection.aggregate([
{
$match: {
business_id: "1022"
}
},
{
$unwind: "$transactions"
},
{
$group: {
_id: "$transactions.Product name",
TotalSum: {
$sum: "$transactions.Quantity"
}
}
}
])
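To see why the $unwind matters, here is a minimal plain-JavaScript sketch of what the two stages compute, run in memory rather than against a server. The helper names `unwind` and `sumBy` and the second sample invoice are made up for illustration; they are not MongoDB APIs.

```javascript
// Hypothetical in-memory data shaped like the question's documents.
const docs = [
  { business_id: "1022", transactions: [
    { "Product name": "Rose", Quantity: 10 },
    { "Product name": "Orange", Quantity: 20 },
  ]},
  { business_id: "1022", transactions: [
    { "Product name": "Rose", Quantity: 5 },
    { "Product name": "Apple", Quantity: 15 },
  ]},
];

// Rough equivalent of { $unwind: "$transactions" }:
// one output document per array element.
function unwind(documents) {
  return documents.flatMap((d) =>
    d.transactions.map((t) => ({ ...d, transactions: t }))
  );
}

// Rough equivalent of the $group stage: sum Quantity per product name.
function sumBy(unwound) {
  const totals = new Map();
  for (const d of unwound) {
    const key = d.transactions["Product name"];
    totals.set(key, (totals.get(key) || 0) + d.transactions.Quantity);
  }
  return [...totals].map(([_id, TotalSum]) => ({ _id, TotalSum }));
}

const result = sumBy(unwind(docs.filter((d) => d.business_id === "1022")));
console.log(result);
```

Without the unwind step the group key is the whole array of product names, which is why the original attempt returned nested lists instead of per-product totals.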

Related

Limit number of objects pushed to array in MongoDB aggregation

I've been trying to find a way to limit the number of objects I'm pushing to the arrays I create while using "aggregate" on a MongoDB collection.
I have a collection of students - each has these relevant keys:
class number it takes this semester (only one value),
percentile in class (exists if is enrolled in class, null if not),
current score in class (> 0 if enrolled in class, else - 0),
total average (GPA),
max grade
I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.
This is my query:
db.getCollection("students").aggregate([
{"$match": {
"class_number":
{"$in": [49, 50, 16]},
"grades.curr_class.percentile":
{"$exists": true},
"grades.min": {"$gte": 80},
}},
{"$sort": {"grades.curr_class.score": -1}},
{"$group": {"_id": "$class_number",
"studentsWithHighGPA":
{"$push":
{"$cond": [{"$gte": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
}
},
"studentsWithoutHighGPA":
{"$push":
{"$cond": [{"$lt": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
},
},
},
},
])
What I'm trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but I'm not sure how to approach this.
Thanks in advance!
I've tried using limit in different variations, and slice too, but none of them seems to work.
Since MongoDB version 5.0, one option is to use $setWindowFields for this, and in particular its $rank operator. This lets you keep only the relevant students and limit their count even before the $group step:
$match only the relevant students, as in the OP's query
$set a groupId for $setWindowFields (as it can currently partition by one key only)
$setWindowFields to define the rank of each student within their array
$match only the students with the wanted rank
$group by class_number, as in the OP's query:
db.collection.aggregate([
{$match: {
class_number: {$in: [49, 50, 16]},
"grades.curr_class.percentile": {$exists: true},
"grades.min": {$gte: 80}
}},
{$set: {
groupId: {$concat: [
{$toString: "$class_number"},
{$toString: {$toBool: {$gte: ["$grades.gpa", 80]}}}
]}
}},
{$setWindowFields: {
partitionBy: "$groupId",
sortBy: {"grades.curr_class.score": -1},
output: {rank: {$rank: {}}}
}},
{$match: {rank: {$lte: rankLimit}}},
{$group: {
_id: "$class_number",
studentsWithHighGPA: {$push: {
$cond: [{$gte: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}},
studentsWithoutHighGPA: {$push: {
$cond: [{$lt: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}}
}}
])
* This solution limits students by rank, so there is an edge case in which an array ends up with more than n students (when several students tie at exactly rank n). This can be solved by adding a $slice step.
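The tie edge case can be sketched in plain JavaScript. The sample scores and the helper `withRank` are hypothetical; the helper only imitates how $rank treats ties (tied rows share a rank and the next rank skips ahead), and `n` stands in for rankLimit.

```javascript
// Hypothetical scores for one partition (one class and one GPA band).
const students = [
  { id: "a", score: 95 },
  { id: "b", score: 90 },
  { id: "c", score: 90 },
  { id: "d", score: 85 },
];

// Imitates $rank over a descending sort: ties share a rank,
// and the rank after a tie skips ahead (1, 2, 2, 4).
function withRank(rows) {
  const sorted = [...rows].sort((x, y) => y.score - x.score);
  const ranked = [];
  sorted.forEach((row, i) => {
    const tied = i > 0 && sorted[i - 1].score === row.score;
    ranked.push({ ...row, rank: tied ? ranked[i - 1].rank : i + 1 });
  });
  return ranked;
}

const n = 2; // stands in for rankLimit
// Filtering by rank keeps both students tied at rank 2, so 3 rows survive.
const kept = withRank(students).filter((s) => s.rank <= n);
// Slicing afterwards enforces the hard cap of n.
const capped = kept.slice(0, n);
console.log(kept.length, capped.map((s) => s.id));
```

In the pipeline itself, the same cap could be applied after the $group with a stage along the lines of {$set: {studentsWithHighGPA: {$slice: ["$studentsWithHighGPA", 16]}}}, and likewise for the second array.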
Maybe MongoDB's $facet stage is a solution. It lets you specify several output pipelines in one aggregation call.
Something like this:
const pipeline = [
{
'$facet': {
'studentsWithHighGPA': [
{ '$match': { 'grade': { '$gte': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
],
'studentsWithoutHighGPA': [
{ '$match': { 'grade': { '$lt': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
]
}
}
];
coll.aggregate(pipeline)
This should end up with one document including two arrays.
studentsWithHighGPA (array)
0 (object)
1 (object)
...
studentsWithoutHighGPA (array)
0 (object)
1 (object)
Treat each facet as an aggregation pipeline of its own, so you can also include $group to group by class or by something else.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/
I don't think there is a MongoDB-provided operator to apply a limit inside of a $group stage.
You could use $accumulator, but that requires server-side scripting to be enabled, and may have a performance impact.
Limiting studentsWithHighGPA to 16 throughout the grouping might look something like:
"studentsWithHighGPA": {
"$accumulator": {
// Start each group with an empty list.
init: function() {
return {combined: []};
},
// Keep a student only when the GPA is at least 80; cap the list at 16.
accumulate: function(state, id, score) {
if (score >= 80) {
state.combined.push({_id: id, score: score});
}
return {combined: state.combined.slice(0, 16)};
},
accumulateArgs: ["$_id", "$grades.gpa"],
// Combine two partial states, highest score first.
merge: function(A, B) {
return {combined:
A.combined.concat(B.combined).sort(
function(SA, SB) {
return (SB.score - SA.score);
})
};
},
// Return at most 16 ids, dropping the score.
finalize: function(s) {
return s.combined.slice(0, 16).map(function(A) {
return {_id: A._id};
});
},
lang: "js"
}
}
Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.
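To make the role of merge concrete, here is a small sketch that runs the same four callbacks as ordinary JavaScript functions, outside the server. The two "shards" and their rows are invented for illustration, and the `run` helper is hypothetical; it simply feeds rows through accumulate the way a single node would.

```javascript
// The four $accumulator callbacks as plain functions.
const init = () => ({ combined: [] });
const accumulate = (state, id, score) => {
  if (score >= 80) state.combined.push({ _id: id, score });
  return { combined: state.combined.slice(0, 16) };
};
const merge = (A, B) => ({
  combined: A.combined.concat(B.combined).sort((a, b) => b.score - a.score),
});
const finalize = (s) => s.combined.slice(0, 16).map((x) => ({ _id: x._id }));

// Hypothetical input: two shards, each already sorted by descending score.
const shard1 = [["s1", 99], ["s2", 85], ["s3", 70]];
const shard2 = [["s4", 92], ["s5", 81]];

// Feed one shard's rows through accumulate, starting from init().
const run = (rows) =>
  rows.reduce((state, [id, score]) => accumulate(state, id, score), init());

// Partial states are merged by score, then reduced to ids.
const merged = finalize(merge(run(shard1), run(shard2)));
console.log(merged.map((x) => x._id));
```

Because each partial state still carries the score, the merged list can be re-sorted across shards before the final cut to 16.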

MongoDB - How to use $bucketAuto aggregation where the buckets are grouped by another property

I need to create an aggregation pipeline that returns price ranges for each product category.
What I need to avoid is loading all available categories and then calling the database again, one by one, with a $match on each category. There must be a better way to do it.
Product documents
{
Price: 500,
Category: 'A'
},
{
Price: 7500,
Category: 'A'
},
{
Price: 340,
Category: 'B'
},
{
Price: 60,
Category: 'B'
}
Now I could use a $group stage to group the prices into an array by their category.
{
_id: "$Category",
Prices: {
$addToSet: "$Price"
}
}
Which would result in
{
_id: 'A',
Prices: [500, 7500]
},
{
_id: 'B',
Prices: [340, 60]
}
But if I use a $bucketAuto stage after this, I am unable to group by multiple properties, meaning it does not take the categories into account.
I have tried the following
{
groupBy: "$Prices",
buckets: 5,
output: {
Count: { $sum: 1}
}
}
This does not take categories into account, but I need the generated buckets to be organised by category. Either having the category field within the _id as well or have it as another field and have 5 buckets for each distinct category:
{
_id: {min: 500, max: 7500, category: 'A'},
Count: 2
},
{
_id: {min: 60, max: 340, category: 'B'},
Count: 2
}...
Query 1
If you want to group by category and find the minimum and maximum price for each category, you can do it like this:
aggregate(
[{"$group":
{"_id": "$Category",
"min-price": {"$min": "$Price"},
"max-price": {"$max": "$Price"}}}])
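As a rough in-memory illustration of what this grouping produces, the sketch below computes the price range and count per category in plain JavaScript, using the four sample products from the question. The function name `priceRangeByCategory` is made up, and the output is shaped like the question's desired result rather than like the $group output above.

```javascript
// Sample products from the question.
const products = [
  { Price: 500, Category: "A" },
  { Price: 7500, Category: "A" },
  { Price: 340, Category: "B" },
  { Price: 60, Category: "B" },
];

// Track min, max and count per category in a single pass.
function priceRangeByCategory(items) {
  const ranges = new Map();
  for (const { Price, Category } of items) {
    const r = ranges.get(Category) || { min: Infinity, max: -Infinity, count: 0 };
    ranges.set(Category, {
      min: Math.min(r.min, Price),
      max: Math.max(r.max, Price),
      count: r.count + 1,
    });
  }
  return [...ranges].map(([category, r]) => ({
    _id: { min: r.min, max: r.max, category },
    Count: r.count,
  }));
}

const ranges = priceRangeByCategory(products);
console.log(JSON.stringify(ranges));
```

This yields a single range per category; splitting each category into several buckets needs more than a plain $group.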
Query 2
If you want to group by category and then bucket the prices inside each category's array (for example into 5 buckets, as in your question), you can do it with a trick that lets stage operators work on the contents of an array.
The trick is to keep one extra collection that holds a single empty document, [{}].
You $lookup into that collection, bring the array in through a variable, $unwind it, and then run whatever stages you need on it.
Here the array is unwound and passed to $bucketAuto with 5 buckets, so the result is grouped by category with the prices split into 5 ranges (5 buckets).
aggregate(
[{"$group": {"_id": "$Category", "prices": {"$push": "$Price"}}},
{"$lookup":
{"from": "coll_with_1_empty_doc",
"pipeline":
[{"$set": {"prices": "$$prices"}}, {"$unwind": "$prices"},
{"$bucketAuto": {"groupBy": "$prices", "buckets": 5}}],
"as": "bucket-prices",
"let": {"prices": "$prices", "category": "$_id"}}}])
If neither of the above works, please provide sample documents and the expected output.

How to calculate percentage using MongoDB aggregation

I want to calculate percentages with the help of a MongoDB aggregation.
My collection has the following data:
subject_id  gender   other_data
1           Male     XYZ
1           Male     ABC
1           Male     LMN
2           Female   TBZ
3           Female   NDA
4           Unknown  UJY
I want output something like this:
[{
gender: 'Male',
total: 1,
percentage: 25.0
},{
gender: 'Female',
total: 2,
percentage: 50.0
},{
gender: 'Unknown',
total: 1,
percentage: 25.0
}]
I have tried various methods but none of them works; mainly, I am unable to count the totals for Male, Female and Unknown and their overall sum (needed to calculate the percentage). The trickiest part is that there are only 4 members in the above example, but a subject_id may be repeated once per other_data value.
Thanks in advance.
You can use this aggregation query:
First, group by subject_id to get the distinct values (the distinct persons).
Then use $facet to create two branches: one uses $count to get the total number of documents, and the other groups the documents by gender.
Next, with both values available (the per-gender groups and the total), take the first element of the nDocs result. $facet always produces an array, and the value we want is in the first position.
After that, use $unwind so that every groupValues entry is paired with the nDocs value.
Finally, output the fields you want using $project. To get the percentage, $divide total by nDocs and $multiply by 100.
db.collection.aggregate([
{
"$group": {
"_id": "$subject_id",
"gender": {
"$first": "$gender"
}
}
},
{
"$facet": {
"nDocs": [
{
"$count": "nDocs"
},
],
"groupValues": [
{
"$group": {
"_id": "$gender",
"total": {
"$sum": 1
}
}
},
]
}
},
{
"$addFields": {
"nDocs": {
"$arrayElemAt": [
"$nDocs",
0
]
}
}
},
{
"$unwind": "$groupValues"
},
{
"$project": {
"_id": 0,
"gender": "$groupValues._id",
"total": "$groupValues.total",
"percentage": {
"$multiply": [
{
"$divide": [
"$groupValues.total",
"$nDocs.nDocs"
]
},
100
]
}
}
}
])
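The same computation can be checked in memory with a short plain-JavaScript sketch over the question's sample rows. The function name `genderPercentages` is invented for illustration; it mirrors the pipeline's logic (deduplicate by subject_id, count per gender, divide by the number of distinct subjects) rather than calling MongoDB.

```javascript
// Sample rows from the question.
const rows = [
  { subject_id: 1, gender: "Male", other_data: "XYZ" },
  { subject_id: 1, gender: "Male", other_data: "ABC" },
  { subject_id: 1, gender: "Male", other_data: "LMN" },
  { subject_id: 2, gender: "Female", other_data: "TBZ" },
  { subject_id: 3, gender: "Female", other_data: "NDA" },
  { subject_id: 4, gender: "Unknown", other_data: "UJY" },
];

function genderPercentages(input) {
  // Like the first $group: one gender per distinct subject_id ($first wins).
  const genderBySubject = new Map();
  for (const r of input) {
    if (!genderBySubject.has(r.subject_id)) {
      genderBySubject.set(r.subject_id, r.gender);
    }
  }
  // Like the nDocs facet: number of distinct subjects.
  const nDocs = genderBySubject.size;
  // Like the groupValues facet: count subjects per gender.
  const totals = new Map();
  for (const g of genderBySubject.values()) {
    totals.set(g, (totals.get(g) || 0) + 1);
  }
  // Like the final $project: total / nDocs * 100.
  return [...totals].map(([gender, total]) => ({
    gender,
    total,
    percentage: (total / nDocs) * 100,
  }));
}

const percentages = genderPercentages(rows);
console.log(percentages);
```

With the sample data this gives Male 1 (25%), Female 2 (50%) and Unknown 1 (25%), matching the output requested in the question.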

MongoDB - Obtain full document of a group taking into account the minimum value of one property

Good afternoon, I'm just starting with MongoDB and I have a question about the $group aggregation stage.
From the following set of documents, I need to get the cheapest room among all similar ones (grouping by room identifier).
{"_id":"874521035","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"1"},"name":"Doble"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"fare":{"id":"NRF","name":"No reembolsable"},"price":{"cost":{"$numberInt":"115"},"net":{"$numberInt":"116"},"pvp":{"$numberInt":"126"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
{"_id":"123456789","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"1"},"name":"Doble"},"board":{"id":{"$numberInt":"2"},"name":"Alojamiento y desayuno"},"fare":{"id":"NOR","name":"Reembolsable"},"price":{"cost":{"$numberInt":"120"},"net":{"$numberInt":"121"},"pvp":{"$numberInt":"131"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
{"_id":"987654321","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"2"},"name":"Triple"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"fare":{"id":"NOR","name":"Reembolsable"},"price":{"cost":{"$numberInt":"125"},"net":{"$numberInt":"126"},"pvp":{"$numberInt":"136"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
{"_id":"852963147","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"3"},"name":"Doble uso individual"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"price":{"cost":{"$numberInt":"99"},"net":{"$numberInt":"100"},"pvp":{"$numberInt":"110"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
So far I have only managed to obtain the cheapest price, the room identifier and the number of repetitions.
db.consolidation.aggregate ([
{
$group: {
_id: "$room.id",
"cheapest": {$min: "$price.pvp"},
"qty": {$sum: 1}
}
}]);
{"_id": 2, "cheapest": 136, "qty": 1}
{"_id": 3, "cheapest": 110, "qty": 1}
{"_id": 1, "cheapest": 126, "qty": 2}
While investigating, I have seen that whole documents can be obtained with $first or $last, but that is not the data I need, since it depends on the position of the document.
What I need is to obtain, from the set of documents, the full document with the cheapest price for each room. This is the result I expect:
{"_id":"874521035","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"1"},"name":"Doble"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"fare":{"id":"NRF","name":"No reembolsable"},"price":{"cost":{"$numberInt":"115"},"net":{"$numberInt":"116"},"pvp":{"$numberInt":"126"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
{"_id":"987654321","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"2"},"name":"Triple"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"fare":{"id":"NOR","name":"Reembolsable"},"price":{"cost":{"$numberInt":"125"},"net":{"$numberInt":"126"},"pvp":{"$numberInt":"136"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
{"_id":"852963147","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"3"},"name":"Doble uso individual"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"price":{"cost":{"$numberInt":"99"},"net":{"$numberInt":"100"},"pvp":{"$numberInt":"110"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
I hope I have explained myself clearly.
Thanks in advance.
Regards.
You can capture $$ROOT as part of your $group stage and then use $filter to compare the list of documents for each room against the minimum value. $replaceRoot then lets you get back the original shape:
db.collection.aggregate([
{
$group: {
_id: "$room.id",
"cheapest": {
$min: "$price.pvp"
},
"qty": { $sum: 1 },
docs: { $push: "$$ROOT" }
}
},
{
$replaceRoot: {
newRoot: { $arrayElemAt: [ { $filter: { input: "$docs", cond: { $eq: [ "$$this.price.pvp", "$cheapest" ] } } }, 0 ] }
}
}
])
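For intuition, the effect of this pipeline can be imitated in memory with a few lines of plain JavaScript. The documents below are trimmed to the fields that matter (with the _id, room.id and price.pvp values from the question), and `cheapestPerRoom` is a hypothetical helper, not a MongoDB call.

```javascript
// Trimmed-down versions of the question's four documents.
const offers = [
  { _id: "874521035", room: { id: 1 }, price: { pvp: 126 } },
  { _id: "123456789", room: { id: 1 }, price: { pvp: 131 } },
  { _id: "987654321", room: { id: 2 }, price: { pvp: 136 } },
  { _id: "852963147", room: { id: 3 }, price: { pvp: 110 } },
];

// Keep, for each room.id, the whole document with the lowest price.pvp.
// On a tie the first document seen wins, like $arrayElemAt index 0.
function cheapestPerRoom(docs) {
  const best = new Map();
  for (const d of docs) {
    const current = best.get(d.room.id);
    if (!current || d.price.pvp < current.price.pvp) {
      best.set(d.room.id, d);
    }
  }
  return [...best.values()];
}

const cheapest = cheapestPerRoom(offers);
console.log(cheapest.map((d) => d._id));
```

The result holds complete documents, not just the minimum price, which is the difference from the $min-only grouping in the question.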

Counting data per user with mongo aggregation framework

I have a collection where each document contains a user_ids property, which is an array field. Example documents:
[{
_id: 'i3oi1u31o2yi12o3i1',
unique_prop: 33,
prop1: 'some string value',
prop2: 212,
user_ids: [1, 2, 3 ,4]
},
{
_id: 'i3oi1u88ffdfi12o3i1',
unique_prop: 34,
prop1: 'some string value',
prop2: 216,
user_ids: [2, 3 ,4]
},
{
_id: 'i3oi1u8834432ddsda12o3i1',
unique_prop: 35,
prop1: 'some string value',
prop2: 211,
user_ids: [2]
}]
My goal is to get the number of documents per user, so the sample output would be:
[
{user_id: 1, count: 1},
{user_id: 2, count: 3},
{user_id: 3, count: 2},
{user_id: 4, count: 2}
]
I've tried a couple of things, none of which worked. Most recently I tried:
aggregate([
{ $group: {
_id: { unique_prop: "$unique_prop"},
users: { "$addToSet": "$user_ids" },
count: { "$sum": 1 }
}}
])
But it just returned the users per document. I'm still learning, so any resource or advice would help.
You need to $unwind the "user_ids" array and, in the $group stage, count the number of times each id appears in the collection.
db.collection.aggregate([
{ "$unwind": "$user_ids" },
{ "$group": { "_id": "$user_ids", "count": {"$sum": 1 }}}
])
A MongoDB aggregation computes over groups of values from the documents in a collection and returns the result by executing its stages as a pipeline.
Based on that, try running the following aggregate query in the MongoDB shell.
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: "$user_ids"
},
// Stage 2
{
$group: {
_id:{user_id:'$user_ids'},
total:{$sum:1}
}
},
// Stage 3
{
$project: {
_id:0,
user_id:'$_id.user_id',
count:'$total'
}
},
]
);
In the above query, the $unwind operator first breaks the user_ids array of each document into one document per array element. The $group stage then groups those documents by their user_ids value and counts the documents for each value, and $project renames the fields to match the desired output.
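The unwind-then-count logic can be sketched in plain JavaScript against the question's sample data (only the user_ids field is kept). `countPerUser` is an illustrative name, and the final sort is added here only to make the printed order predictable; the pipeline above does not sort.

```javascript
// user_ids arrays from the question's three sample documents.
const sampleDocs = [
  { user_ids: [1, 2, 3, 4] },
  { user_ids: [2, 3, 4] },
  { user_ids: [2] },
];

function countPerUser(docs) {
  const counts = new Map();
  // flatMap plays the role of $unwind: one entry per array element.
  for (const id of docs.flatMap((d) => d.user_ids)) {
    // The Map update plays the role of $group with { $sum: 1 }.
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  // Shape the output like the $project stage, sorted by user_id.
  return [...counts]
    .map(([user_id, count]) => ({ user_id, count }))
    .sort((a, b) => a.user_id - b.user_id);
}

const perUser = countPerUser(sampleDocs);
console.log(perUser);
```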