MongoDB $sum always returning 0

I am new to MongoDB and I'm trying to create a query that prints all the combinations of points that were assigned to the accommodations, sorted by the number of accommodations that received those points. However, when I execute this query, $sum always returns 0 even though the three fields hold numeric values:
db.test.aggregate([
  {$addFields: {sumPoints: {$sum: ["$lodging.reviews.cleanliness", "lodging.reviews.location", "lodging.reviews.food"]}}},
  {$group: {
    _id: "$sumPoints",
    count: {$sum: 1}
  }},
  {$sort: {count: 1}},
  {$project: {_id: 0, count: 1, sumPoints: "$_id"}}
])
An example document is shown in the attached photo.
Does anyone know what can be the problem?
I tried that query and the result is just:
{ count: 5984, sumPoints: 0 }
because sumPoints is always returning 0.

I think there are two problems. The first is that you are missing the dollar sign (to indicate that you want to access the fields) on the second and third items. But on top of that, when $sum is given several operands it ignores non-numeric values, and because lodging.reviews is an array, each of those paths resolves to an array of numbers rather than a single number. Summing the inner sums first (a single array operand is summed element-wise) works:
{
  "$addFields": {
    "sumPoints": {
      $sum: [
        {$sum: ["$lodging.reviews.cleanliness"]},
        {$sum: ["$lodging.reviews.location"]},
        {$sum: ["$lodging.reviews.food"]}
      ]
    }
  }
}
Playground example here
Alternatively, you can use the $reduce operator here:
{
  "$addFields": {
    "sumPoints": {
      "$reduce": {
        "input": "$lodging.reviews",
        "initialValue": 0,
        "in": {
          $sum: ["$$value", "$$this.cleanliness", "$$this.location", "$$this.food"]
        }
      }
    }
  }
}
Playground example here
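The difference between the two behaviors can be reproduced in plain Node.js. This is only a sketch mimicking $sum's documented rules (multiple operands: non-numeric values such as arrays are ignored; a single array operand: summed element-wise), using a hypothetical document shaped like the one in the question:

```javascript
// Hypothetical document shaped like the one in the question,
// where lodging.reviews is an array of review subdocuments.
const doc = {
  lodging: {
    reviews: [
      { cleanliness: 4, location: 5, food: 3 },
      { cleanliness: 2, location: 4, food: 5 },
    ],
  },
};

// Mimic $sum: with several operands it adds only the numeric ones
// and ignores arrays; a single array operand is summed element-wise.
function aggSum(operands) {
  if (operands.length === 1 && Array.isArray(operands[0])) {
    return aggSum(operands[0]);
  }
  return operands.reduce(
    (acc, v) => (typeof v === "number" ? acc + v : acc),
    0
  );
}

// "$lodging.reviews.cleanliness" resolves to an ARRAY of values:
const cleanliness = doc.lodging.reviews.map(r => r.cleanliness); // [4, 2]

// The original query: three array operands -> all ignored -> 0.
const broken = aggSum([
  cleanliness,
  doc.lodging.reviews.map(r => r.location),
  doc.lodging.reviews.map(r => r.food),
]);

// The fixed query: sum each array first, then sum the three numbers.
const fixed = aggSum([
  aggSum([cleanliness]),
  aggSum([doc.lodging.reviews.map(r => r.location)]),
  aggSum([doc.lodging.reviews.map(r => r.food)]),
]);

console.log(broken, fixed); // 0 23
```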
In the future please also provide the text for your sample documents (or, better yet, a playground example directly) so that it is easier to assist.

Limit number of objects pushed to array in MongoDB aggregation

I've been trying to find a way to limit the number of objects I'm pushing to the arrays I'm creating while using "aggregate" on a MongoDB collection.
I have a collection of students; each has these relevant keys:
the class number taken this semester (only one value),
percentile in class (exists if enrolled in the class, null if not),
current score in class (> 0 if enrolled in the class, else 0),
total average (GPA),
max grade
I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.
This is my query:
db.getCollection("students").aggregate([
{"$match": {
"class_number":
{"$in": [49, 50, 16]},
"grades.curr_class.percentile":
{"$exists": true},
"grades.min": {"$gte": 80},
}},
{"$sort": {"grades.curr_class.score": -1}},
{"$group": {"_id": "$class_number",
"studentsWithHighGPA":
{"$push":
{"$cond": [{"$gte": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
}
},
"studentsWithoutHighGPA":
{"$push":
{"$cond": [{"$lt": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
},
},
},
},
])
What I'm trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but I'm not sure how to approach this.
Thanks in advance!
I've tried using $limit in different variations, and $slice too, but none seem to work.
Since MongoDB version 5.0, one option is to use $setWindowFields, and in particular its $rank option. This allows keeping only the relevant students, and limiting their count, even before the $group step:
$match only relevant students, as suggested by the OP
$set a groupId for the $setWindowFields stage (as it can currently partition by one key only)
$setWindowFields to compute the rank of each student within their partition
$match only students with the wanted rank
$group by class_number, as suggested by the OP:
db.collection.aggregate([
{$match: {
class_number: {$in: [49, 50, 16]},
"grades.curr_class.percentile": {$exists: true},
"grades.min": {$gte: 80}
}},
{$set: {
groupId: {$concat: [
{$toString: "$class_number"},
{$toString: {$toBool: {$gte: ["$grades.gpa", 80]}}}
]}
}},
{$setWindowFields: {
partitionBy: "$groupId",
sortBy: {"grades.curr_class.score": -1},
output: {rank: {$rank: {}}}
}},
{$match: {rank: {$lte: rankLimit}}}, // e.g. rankLimit = 16 for the top 16
{$group: {
_id: "$class_number",
studentsWithHighGPA: {$push: {
$cond: [{$gte: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}},
studentsWithoutHighGPA: {$push: {
$cond: [{$lt: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}}
}}
])
See how it works on the playground example
*This solution limits the students' rank, so there is an edge case in which an array holds more than n students (when multiple students share the exact rank n). It can be solved simply by adding a $slice step.
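The tie edge case above can be illustrated in plain JavaScript. This sketch mimics $rank's semantics (equal sort keys get the same rank, and the next distinct value skips ahead) on hypothetical scores:

```javascript
// Sketch of $rank semantics: ties share a rank, and the next
// distinct value skips ahead ("1, 2, 2, 4" style).
function ranks(scores) {
  const sorted = [...scores].sort((a, b) => b - a); // like sortBy: -1
  // Rank = 1 + position of the value's first occurrence.
  return sorted.map(s => sorted.indexOf(s) + 1);
}

// Four hypothetical students, two tied at the limit boundary.
const r = ranks([95, 90, 90, 70]); // [1, 2, 2, 4]

// Keeping rank <= 2 retains THREE students, not two -- the edge
// case the answer mentions, fixable with a trailing $slice.
const kept = r.filter(x => x <= 2).length;
console.log(r, kept); // [ 1, 2, 2, 4 ] 3
```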
Maybe MongoDB's $facet stage is a solution. You can specify different output pipelines in one aggregation call.
Something like this:
const pipeline = [
{
'$facet': {
'studentsWithHighGPA': [
{ '$match': { 'grade': { '$gte': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
],
'studentsWithoutHighGPA': [
{ '$match': { 'grade': { '$lt': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
]
}
}
];
coll.aggregate(pipeline)
This should end up with one document including two arrays.
studentsWithHighGPA (array)
0 (object)
1 (object)
...
studentsWithoutHighGPA (array)
0 (object)
1 (object)
See each facet as an aggregation pipeline of its own, so you can also include $group to group by class or anything else.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/
I don't think there is a MongoDB-provided operator to apply a limit inside a $group stage.
You could use $accumulator, but that requires server-side scripting to be enabled and may have a performance impact.
Limiting studentsWithHighGPA to 16 throughout the grouping might look something like this:
"studentsWithHighGPA": {
"$accumulator": {
init: "function(){
return {combined:[]};
}",
accumulate: "function(state, id, score){
if (score >= 80) {
state.combined.push({_id:id, score:score})
};
return {combined:state.combined.slice(0,16)}
}",
accumulateArgs: [ "$_id", "$grades.gpa"],
merge: "function(A,B){
return {combined:
A.combined.concat(B.combined).sort(
function(SA,SB){
return (SB.score - SA.score)
})
}
}",
finalize: "function(s){
return s.combined.slice(0,16).map(function(A){
return {_id:A._id}
})
}",
lang: "js"
}
}
Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.
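The init/accumulate/merge/finalize lifecycle can be exercised as plain functions in Node.js. This is a sketch of the same logic with hypothetical student IDs and GPAs, simulating two shards that are merged at the end:

```javascript
// Plain-JS version of the $accumulator's pieces:
// init/accumulate per document, merge across shards, finalize once.
const init = () => ({ combined: [] });

const accumulate = (state, id, score) => {
  if (score >= 80) {
    state.combined.push({ _id: id, score: score });
  }
  return { combined: state.combined.slice(0, 16) };
};

const merge = (A, B) => ({
  combined: A.combined
    .concat(B.combined)
    .sort((SA, SB) => SB.score - SA.score),
});

const finalize = s => s.combined.slice(0, 16).map(A => ({ _id: A._id }));

// Two "shards", each accumulating a few hypothetical students.
let s1 = init();
[["a", 95], ["b", 70], ["c", 85]].forEach(([id, gpa]) => {
  s1 = accumulate(s1, id, gpa); // "b" is dropped (gpa < 80)
});
let s2 = init();
[["d", 99], ["e", 81]].forEach(([id, gpa]) => {
  s2 = accumulate(s2, id, gpa);
});

// Merge sorts by score descending, finalize strips the scores.
const result = finalize(merge(s1, s2));
console.log(result); // [ { _id: 'd' }, { _id: 'a' }, { _id: 'c' }, { _id: 'e' } ]
```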

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format displayed above, you can split the _id into two parts (on the dot character) and use aggregation to find the max ts_utc per first (numeric) part.
That way you can do it in one shot, instead of iterating over each group.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
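What that $split/$group pipeline computes can be sketched in plain JavaScript over the sample documents from the question (ISO-8601 timestamp strings compare correctly as strings):

```javascript
// Sample documents from the question.
const docs = [
  { _id: "42.abc", ts_utc: "2019-05-27T23:43:16.963Z" },
  { _id: "42.def", ts_utc: "2019-05-27T23:43:17.055Z" },
  { _id: "69.abc", ts_utc: "2019-05-27T23:43:17.147Z" },
  { _id: "69.def", ts_utc: "2019-05-27T23:44:02.427Z" },
];

// Group key = text before the first dot; value = max ts_utc seen.
const maxPerGroup = {};
for (const d of docs) {
  const group = d._id.split(".")[0]; // like $arrayElemAt: [{$split: ...}, 0]
  if (!(group in maxPerGroup) || d.ts_utc > maxPerGroup[group]) {
    maxPerGroup[group] = d.ts_utc; // like $max
  }
}

console.log(maxPerGroup);
// { '42': '2019-05-27T23:43:17.055Z', '69': '2019-05-27T23:44:02.427Z' }
```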
As @danh mentioned in the comments, the best thing you can do is probably add an auxiliary field to indicate the grouping. You may further index the auxiliary field to boost performance.
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
Here is the Mongo playground for your reference.

Total count and field count with condition in a single MongoDB aggregation pipeline

I have a collection of components. Simplified, a document looks like this:
{
"_id" : "50c4f4f2-68b5-4153-80db-de8fcf716902",
"name" : "C156",
"posX" : "-136350",
"posY" : "-27350",
"posZ" : "962",
"inspectionIsFailed" : "False"
}
I would now like to calculate three things: the number of all components in the collection, the number of all faulty components ("inspectionIsFailed": "True"), and the ratio (the number of faulty components divided by the number of all components).
I know how to get the first two values separately, each with its own aggregation.
Number of all components:
db.components.aggregate([
{$group: {_id: null, totalCount: {$sum: 1}}}
]);
Number of all faulty components:
db.components.aggregate([
{$match: {inspectionIsFailed: "True"}},
{$group: {_id: null, failedCount: {$sum: 1}}}
]);
However, I want to calculate the two values in a single pipeline and not separately. Then I could use $divide to calculate the ratio at the end of the pipeline. My desired output should then only contain the ratio:
{ ratio: 0.2 }
My problem with a single pipeline is:
if I calculate the total number first, I can no longer count the faulty components; and if I first filter the faulty components with $match, I can no longer calculate the total number.
You can try:
$group by null; get totalCount with $sum, and get failedCount using $cond: if inspectionIsFailed is "True", add 1, otherwise 0
$project to get the ratio using $divide
db.collection.aggregate([
{
$group: {
_id: null,
totalCount: { $sum: 1 },
failedCount: {
$sum: {
$cond: [{ $eq: ["$inspectionIsFailed", "True"] }, 1, 0 ]
}
}
}
},
{
$project: {
_id: 0,
ratio: {
$divide: ["$failedCount", "$totalCount"]
}
}
}
])
Playground
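The single-pass idea (an unconditional count plus a conditional count, then a division) can be sketched in plain JavaScript with hypothetical components, matching the desired output of 0.2:

```javascript
// Five hypothetical components, one of which failed inspection.
const components = [
  { name: "C1", inspectionIsFailed: "True" },
  { name: "C2", inspectionIsFailed: "False" },
  { name: "C3", inspectionIsFailed: "False" },
  { name: "C4", inspectionIsFailed: "False" },
  { name: "C5", inspectionIsFailed: "False" },
];

// One pass computes both counts, like the single $group stage.
let totalCount = 0;
let failedCount = 0;
for (const c of components) {
  totalCount += 1;                                          // $sum: 1
  failedCount += c.inspectionIsFailed === "True" ? 1 : 0;   // $sum of $cond
}

const ratio = failedCount / totalCount; // $divide in $project
console.log(ratio); // 0.2
```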
As I found out, you cannot do it in one pipeline that way; you have to use $facet instead, as explained in this answer.
I also suggest using a boolean for inspectionIsFailed.
db.collection.aggregate([
{
$facet: {
totalCount: [
{
$count: "value"
}
],
pipelineResults: [
{
$match: {
inspectionIsFailed: true
}
},
{
$group: {
_id: "$_id",
failedCount: {
$sum: 1
}
}
}
]
}
}
])
You can test it here.

Cannot sort arrays using $unwind or $reduce in a MongoDB aggregation pipeline

I am having a problem sorting an array in a MongoDB aggregation pipeline. I couldn't find an answer on Stack Overflow, so I am posting my problem.
I have some data in MongoDB with this structure:
{
testOne:{ scores:[1, 5, 8]},
testTwo:{scores:[3, 4, 5]},
testThree:{scores:[9,0,1]},
........
testFifty:{scores:[0,8,1]}
}
Basically a series of many tests and the scores from those tests (this is a simplified example to illustrate the problem).
After a few hours banging my head against a wall, I found I could only unwind two arrays; with any more, no data was returned.
So I couldn't unwind all my arrays and group to get arrays of each test's scores from the db:
db.collection.aggregate([
  {$unwind: '$testOne.scores'},
  {$unwind: '$testTwo.scores'},
  {$unwind: '$testThree.scores'},
  {$group: {
    _id: {},
    concatTestOne: {$push: "$testOne.scores"},
    concatTestTwo: {$push: "$testTwo.scores"},
    concatTestThree: {$push: "$testThree.scores"}
  }}
])
To my frustration, even if I just did one unwind, the resulting array did not sort properly; see below:
db.collection.aggregate([
  {$unwind: '$testOne.scores'},
  {$group: {
    _id: {},
    concatTestOne: {$push: "$testOne.scores"}
  }},
  {$sort: {concatTestOne: 1}}
])
The result was [1,5,8,3,4,5,....,9,0,1]. No sorting
So to get all the test scores I used $reduce to flatten the nested arrays resulting from the grouping stage, with no unwinding, e.g.:
db.collection.aggregate([
  {$group: {
    _id: {},
    concatTestOne: {$push: "$testOne.scores"}
  }},
  {$addFields: {
    testOneScores: {$reduce: {
      input: "$concatTestOne",
      initialValue: [],
      in: {$concatArrays: ["$$value", "$$this"]}
    }}
  }}
])
Once again the resulting arrays do not sort. Can anyone tell me what I am doing wrong? The arrays are large (length approx. 3500); is it just that MongoDB aggregation doesn't handle large arrays?
Many thanks for any comments. I have spent a lot of time trying to sort my arrays and nothing works so far.
In my understanding, this is the query you need; otherwise, please update your question with your expected output.
db.col.aggregate([
{$unwind: '$testOne.scores'},
{$unwind: '$testTwo.scores'},
{$unwind: '$testThree.scores'},
{"$sort" : {"testOne.scores" : -1,"testTwo.scores" : -1,"testThree.scores" : -1}},
{$group:{
_id: {},
concatTestOne: { $addToSet: "$testOne.scores"},
concatTestTwo: { $addToSet: "$testTwo.scores"},
concatTestThree: { $addToSet: "$testThree.scores"}
}
}
])
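What the unwind-sort-group sequence above does for a single test can be sketched in plain JavaScript, using the scores from the question. Note that $addToSet also removes duplicates, which a plain $push would keep:

```javascript
// Two documents' worth of testOne scores, as in the question.
const docs = [
  { testOne: { scores: [1, 5, 8] } },
  { testOne: { scores: [3, 4, 5] } },
];

// $unwind: one element per score; then $sort descending.
const unwound = docs.flatMap(d => d.testOne.scores).sort((a, b) => b - a);
// unwound is [8, 5, 5, 4, 3, 1]

// $addToSet: keep each value once, preserving the sorted encounter order.
const concatTestOne = [...new Set(unwound)];

console.log(concatTestOne); // [ 8, 5, 4, 3, 1 ]
```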

Mongo aggregation with paginated data and totals

I've crawled all over Stack Overflow and have not found any info on how to return proper pagination data included in the result set.
I'm trying to aggregate some data from my Mongo store. What I want is for it to return something like:
{
total: 5320,
page: 0,
pageSize: 10,
data: [
{
_id: 234,
currentEvent: "UPSTREAM_QUEUE",
events: [
{ ... }, { ... }, { ... }
]
},
{
_id: 235,
currentEvent: "UPSTREAM_QUEUE",
events: [
{ ... }, { ... }, { ... }
]
}
]
}
This is what I have so far:
// page and pageSize are variables
db.mongoAuditEvent.aggregate([
// Actual grouped data
{"$group": {
"_id" : "$corrId",
"currentEvent": {"$last": "$event.status"},
"events": { $push: "$$ROOT"}
}},
// Pagination group
{"$group": {
"_id": 0,
"total": { "$sum": "corrId" },
"page": page,
"pageSize": pageSize,
"data": {
"$push": {
"_id": "$_id",
"currentEvent": "$currentEvent",
"events": "$events"
}
}
}},
{"$sort": {"events.timestamp": -1} }, // Latest first
{"$skip": page },
{"$limit": pageSize }
], {allowDiskUse: true});
I'm trying to have a pagination group as the root, containing the actual grouped data inside (so that I get actual totals while still retaining skip and limit).
The above code will return the following error in mongo console:
The field 'page' must be an accumulator object
If I remove the page and pageSize from the pagination group, I still get the following error:
BSONObj size: 45707184 (0x2B96FB0) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 0
If I remove the pagination group altogether, the query works fine. But I really need to return how many documents I have stored in total, and although not strictly necessary, page and pageSize would be nice to return as well.
Can somebody please tell me what I am doing wrong? Or tell me if it is at all possible to do this in one go?
If you have a lot of events, {$push: "$$ROOT"} will make Mongo return an error. I have solved it with $facet (only works with version 3.4+):
aggregate([
{ $match: options },
{
$facet: {
edges: [
{ $sort: sort },
{ $skip: skip },
{ $limit: limit },
],
pageInfo: [
{ $group: { _id: null, count: { $sum: 1 } } },
],
},
},
])
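What the two facet branches compute can be sketched in plain JavaScript: one branch pages the matched documents, the other counts them all, from the same input. The page/pageSize values here are hypothetical:

```javascript
// 23 hypothetical documents that survived the $match stage.
const matched = Array.from({ length: 23 }, (_, i) => ({ _id: i }));

const page = 1;      // zero-based page index (hypothetical)
const pageSize = 10;
const skip = page * pageSize;

// "edges" branch: $skip + $limit over the sorted input.
const edges = matched.slice(skip, skip + pageSize);

// "pageInfo" branch: $group with {$sum: 1} over the SAME input.
const pageInfo = [{ _id: null, count: matched.length }];

console.log(edges.length, pageInfo[0].count); // 10 23
```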
A performance optimization tip:
When you use a $facet stage for pagination, try to add it as early as possible.
For example, if you want to add a $project or $lookup stage, add them after $facet, not before it.
This can have an impressive effect on aggregation speed, because a $project stage requires MongoDB to examine all documents and touch every field (which is not necessary).
I did this in two steps instead of one:
// Get the totals
db.mongoAuditEvent.aggregate([{$group: {_id: "$corrId"}}, {$group: {_id: 1, total: {$sum: 1}}}]);
// Get the data
db.mongoAuditEvent.aggregate([
{$group: {
_id : "$corrId",
currentEvent: {"$last": "$event.status"},
"events": { $push: "$$ROOT"}
}},
{$sort: {"events.timestamp": -1} }, // Latest first
{$skip: 0 },
{$limit: 10}
], {allowDiskUse: true}).pretty();
I would be very happy if anybody got a better solution to this though.