How to determine if average is 0 vs null? - mongodb

I've got a Mongo database where I run some aggregation queries. Here's the simplified query I want to run:
db.coll.aggregate([
{ $group: {
_id: '$fieldA',
fieldB: { $avg: '$fieldB' }
} },
])
It groups documents by fieldA and calculates the average of fieldB. However, some rows in the result set have 0 as the value for fieldB. There can be two reasons for that:
The average value IS 0.
None of the documents in the group had fieldB (or it was null), and Mongo's behavior is to return 0 in that case.
Is it possible to determine which scenario took place for each row in the result without issuing another query and without leaving the aggregation pipeline?
UPDATE
I can't filter on non-null fields, because I'm aggregating several fields, like this:
db.coll.aggregate([
{ $group: {
_id: '$fieldA',
fieldB: { $avg: '$fieldB' },
fieldC: { $avg: '$fieldC' }
} },
])
Some of the documents may have fieldB but not fieldC and vice versa.

You can filter the data by using $match before your $group operation.
db.coll.aggregate([
{ $match: { fieldB: { $ne: null } } },
{ $group: {
_id: '$fieldA',
fieldB: { $avg: '$fieldB' }
} },
])
This way you will get only documents that have fieldB set.
UPDATE
You can't use $avg on its own for that, but you can find out whether all values are null using the $min operator:
db.coll.aggregate([
{ $group: {
_id: '$fieldA',
fieldB: { $avg: '$fieldB' } ,
fieldBAllNullOrMin: { $min: '$fieldB' }
} },
])
The $min operator will return null if all values are null; otherwise it will return the minimum value (but only in MongoDB 2.4+).

You can use the $max (or $min) operator to determine whether all
instances of fieldB in a group are null or missing, as the $max (or
$min) operator returns null in that case. Given this aggregation
pipeline:
c.aggregate([
{$group: {
_id: '$fieldA',
avg: {$avg: '$fieldB'},
max: {$max: '$fieldB'},
}}
])
with these documents:
c.insert({fieldA: 1, fieldB: 3})
c.insert({fieldA: 1, fieldB: -3})
the result is:
{"_id": 1, "avg": 0, "max": 3}
whereas with these documents:
c.insert({fieldA: 1})
c.insert({fieldA: 1})
the result is:
{"_id": 1, "avg": 0, "max": null}
The null value for the max field tells you that fieldB was null or
missing in all documents in the group.
Hope this helps,
Bruce
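As a sketch building on the same idea for the multi-field case from the question's update: instead of $max, one could count how many numeric values went into each average and blank the average when that count is zero. The fieldBCount and fieldCCount names are made up for illustration, and $isNumber is assumed to be available (MongoDB 4.4+).

```javascript
// Hypothetical pipeline: keep a count of numeric values next to each average.
// The *Count field names are illustrative; $isNumber needs MongoDB 4.4+.
const avgOrNullPipeline = [
  { $group: {
      _id: "$fieldA",
      fieldB: { $avg: "$fieldB" },
      fieldBCount: { $sum: { $cond: [{ $isNumber: "$fieldB" }, 1, 0] } },
      fieldC: { $avg: "$fieldC" },
      fieldCCount: { $sum: { $cond: [{ $isNumber: "$fieldC" }, 1, 0] } }
  } },
  // Replace an average with null when no numeric value contributed to it.
  { $project: {
      fieldB: { $cond: [{ $gt: ["$fieldBCount", 0] }, "$fieldB", null] },
      fieldC: { $cond: [{ $gt: ["$fieldCCount", 0] }, "$fieldC", null] }
  } }
];
// In the shell: db.coll.aggregate(avgOrNullPipeline)
```

With this shape, a 0 in the output would mean a real average of 0, while null would mean the field was missing or non-numeric in every document of the group.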

Related

Limit number of objects pushed to array in MongoDB aggregation

I've been trying to find a way to limit the number of objects I'm pushing to arrays I'm creating while using "aggregate" on a MongoDB collection.
I have a collection of students - each has these relevant keys:
class number it takes this semester (only one value),
percentile in class (exists if is enrolled in class, null if not),
current score in class (> 0 if enrolled in class, else - 0),
total average (GPA),
max grade
I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.
This is my query:
db.getCollection("students").aggregate([
{"$match": {
"class_number":
{"$in": [49, 50, 16]},
"grades.curr_class.percentile":
{"$exists": true},
"grades.min": {"$gte": 80},
}},
{"$sort": {"grades.curr_class.score": -1}},
{"$group": {"_id": "$class_number",
"studentsWithHighGPA":
{"$push":
{"$cond": [{"$gte": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
}
},
"studentsWithoutHighGPA":
{"$push":
{"$cond": [{"$lt": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
},
},
},
},
])
What I'm trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but I'm not sure how to approach this.
Thanks in advance!
I've tried using limit in different variations, and slice too, but none seem to work.
Since MongoDB version 5.0, one option is to use $setWindowFields for this, and in particular its $rank operator. This allows you to keep only the relevant students and limit their count even before the $group step:
$match only relevant students as suggested by the OP
$set the groupId for the $setWindowFields (as it can currently partition by one key only)
$setWindowFields to define the rank of each student in their array
$match only students with the wanted rank
$group by class_number as suggested by the OP:
db.collection.aggregate([
{$match: {
class_number: {$in: [49, 50, 16]},
"grades.curr_class.percentile": {$exists: true},
"grades.min": {$gte: 80}
}},
{$set: {
groupId: {$concat: [
{$toString: "$class_number"},
{$toString: {$toBool: {$gte: ["$grades.gpa", 80]}}}
]}
}},
{$setWindowFields: {
partitionBy: "$groupId",
sortBy: {"grades.curr_class.score": -1},
output: {rank: {$rank: {}}}
}},
{$match: {rank: {$lte: 16}}}, // keep the top 16 of each partition
{$group: {
_id: "$class_number",
studentsWithHighGPA: {$push: {
$cond: [{$gte: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}},
studentsWithoutHighGPA: {$push: {
$cond: [{$lt: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}}
}}
])
See how it works on the playground example
*This solution limits the rank of the students, so there is an edge case where an array ends up with more than n students (when several students share the exact rank n). It can be solved simply by adding a $slice step.
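A minimal sketch of such a $slice step, assuming the array names from the $group stage above and the limit of 16 from the question. The capArrays and LIMIT names are illustrative only.

```javascript
// Hypothetical extra stage: cap both arrays at 16 entries after the $group.
const LIMIT = 16; // the "top 16" from the question
const capArrays = {
  $set: {
    // $slice with [array, n] keeps the first n elements
    studentsWithHighGPA: { $slice: ["$studentsWithHighGPA", LIMIT] },
    studentsWithoutHighGPA: { $slice: ["$studentsWithoutHighGPA", LIMIT] }
  }
};
// In the shell, append it as the last stage of the pipeline, e.g.:
// db.collection.aggregate([...existingStages, capArrays])
```

Because the documents are sorted by score before they are pushed, the first 16 elements of each array should be the 16 highest-scoring students.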
Maybe MongoDB's $facet stage is a solution. You can specify different output pipelines in one aggregation call.
Something like this:
const pipeline = [
{
'$facet': {
'studentsWithHighGPA': [
{ '$match': { 'grade': { '$gte': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
],
'studentsWithoutHighGPA': [
{ '$match': { 'grade': { '$lt': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
]
}
}
];
coll.aggregate(pipeline)
This should end up with one document including two arrays.
studentsWithHighGPA (array)
0 (object)
1 (object)
...
studentsWithoutHighGPA (array)
0 (object)
1 (object)
See each facet as an aggregation pipeline on its own. So you can also include $group to group by classes or something else.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/
I don't think there is a mongodb-provided operator to apply a limit inside of a $group stage.
You could use $accumulator, but that requires server-side scripting to be enabled, and may have performance impact.
Limiting studentsWithHighGPA to 16 throughout the grouping might look something like:
"studentsWithHighGPA": {
"$accumulator": {
// start each group with an empty list
init: function() {
return {combined: []};
},
// keep students with GPA >= 80, capped at 16 entries
accumulate: function(state, id, score) {
if (score >= 80) {
state.combined.push({_id: id, score: score});
}
return {combined: state.combined.slice(0, 16)};
},
accumulateArgs: ["$_id", "$grades.gpa"],
// combine partial states (e.g. from different shards), highest score first
merge: function(A, B) {
return {combined:
A.combined.concat(B.combined).sort(
function(SA, SB) {
return (SB.score - SA.score);
})
};
},
// cap the merged list and return only the ids
finalize: function(s) {
return s.combined.slice(0, 16).map(function(A) {
return {_id: A._id};
});
},
lang: "js"
}
}
Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.
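To illustrate why the score is carried along, here is a rough sketch that runs the same four callbacks in plain JavaScript, outside MongoDB, over two made-up "shards". The sample ids and scores are invented, and the run helper is hypothetical.

```javascript
// Plain-JavaScript copies of the accumulator callbacks, run outside MongoDB.
const LIMIT = 16;
function init() { return { combined: [] }; }
function accumulate(state, id, score) {
  if (score >= 80) state.combined.push({ _id: id, score: score });
  return { combined: state.combined.slice(0, LIMIT) };
}
function merge(a, b) {
  // Partial states are re-sorted by score, highest first.
  return { combined: a.combined.concat(b.combined).sort((x, y) => y.score - x.score) };
}
function finalize(s) {
  return s.combined.slice(0, LIMIT).map((x) => ({ _id: x._id }));
}

// Two made-up shards, each seeing its rows already sorted by score descending.
const shardA = [["a1", 95], ["a2", 85], ["a3", 70]];
const shardB = [["b1", 99], ["b2", 81]];
const run = (rows) => rows.reduce((st, [id, score]) => accumulate(st, id, score), init());
const result = finalize(merge(run(shardA), run(shardB)));
// result holds the ids b1, a1, a2, b2 in that order; a3 is dropped (70 < 80)
```

Without the score in the intermediate state, merge would have no way to interleave the two partial lists correctly.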

Total count and field count with condition in a single MongoDB aggregation pipeline

I have a collection of components. Simplified, a document looks like this:
{
"_id" : "50c4f4f2-68b5-4153-80db-de8fcf716902",
"name" : "C156",
"posX" : "-136350",
"posY" : "-27350",
"posZ" : "962",
"inspectionIsFailed" : "False"
}
I would now like to calculate three things. The number of all components in the collection, the number of all faulty components "inspectionIsFailed": "True" and then the ratio (number of all faulty components divided by the number of all components).
I know how to get the first two things separately and in a row with one aggregation each.
Number of all components:
db.components.aggregate([
{$group: {_id: null, totalCount: {$sum: 1}}}
]);
Number of all faulty components:
db.components.aggregate([
{$match: {inspectionIsFailed: "True"}},
{$group: {_id: null, failedCount: {$sum: 1}}}
]);
However, I want to calculate the two values in a single pipeline and not separately. Then I could use $divide to calculate the ratio at the end of the pipeline. My desired output should then only contain the ratio:
{ ratio: 0.2 }
My problem with a single pipeline is:
If I try to calculate the total number first, then I can no longer calculate the number of the faulty components. If I first calculate the number of faulty components with $match, I can no longer calculate the total number.
You can try,
$group by null, get totalCount with $sum, and get failedCount with $sum over a $cond (condition): if inspectionIsFailed is "True" then return 1, otherwise 0
$project to get ratio using $divide
db.collection.aggregate([
{
$group: {
_id: null,
totalCount: { $sum: 1 },
failedCount: {
$sum: {
$cond: [{ $eq: ["$inspectionIsFailed", "True"] }, 1, 0 ]
}
}
}
},
{
$project: {
_id: 0,
ratio: {
$divide: ["$failedCount", "$totalCount"]
}
}
}
])
Playground
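A possible variation, sketched under the assumption that only the ratio is needed: since the ratio is the mean of a 0/1 indicator, $avg over the same $cond yields it in a single $group. The ratioPipeline name and the sample data are made up for illustration.

```javascript
// Hypothetical variation: the ratio is the average of a 0/1 "failed" indicator.
const ratioPipeline = [
  { $group: {
      _id: null,
      ratio: { $avg: { $cond: [{ $eq: ["$inspectionIsFailed", "True"] }, 1, 0] } }
  } },
  { $project: { _id: 0, ratio: 1 } }
];
// In the shell: db.components.aggregate(ratioPipeline)

// The same arithmetic in plain JavaScript on made-up sample values:
const sample = ["True", "False", "False", "False", "False"];
const indicators = sample.map((v) => (v === "True" ? 1 : 0));
const ratio = indicators.reduce((a, b) => a + b, 0) / indicators.length;
```

For the five sample values, one failure out of five gives the 0.2 from the question's desired output.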
As far as I found out, you cannot do it in a single plain pipeline, so you have to use $facet, as explained in this answer.
Also, I suggest using a boolean for inspectionIsFailed.
db.collection.aggregate([
{
$facet: {
totalCount: [
{
$count: "value"
}
],
pipelineResults: [
{
$match: {
inspectionIsFailed: true
}
},
{
$group: {
_id: null, // one group, so $sum counts all failed components
failedCount: {
$sum: 1
}
}
}
]
}
}
])
You can test it here.

How to retrieve specific keys when grouping on mongo while using $max on a field?

How can I retrieve keys beyond the grouped ones from MongoDB?
Documents example:
{code: 'x-1', discount_value: 10, type: 1}
{code: 'x-2', discount_value: 8, type: 1}
{code: 'x-3', discount_value: 5, type: 2}
Query:
{
$match: {
type: 1
}
},
{
$group: {
_id: null,
discount_value: {$max: '$discount_value'}
}
}
This query retrieves the max value of the discount_value key (10) and the _id key, but how can I also retrieve the code and type keys if I have no operation to apply to them?
The current result:
{_id: null, discount_value: 10}
Expected result:
{_id: null, discount_value: 10, type: 1, code: 'x-1'}
You can try below query :
db.collection.aggregate([
{
$match: { type: 1 }
},
{
$group: {
_id: null,
doc: {
$max: {
discount_value: "$discount_value",
type: "$type",
code: "$code"
}
}
}
}
])
I believe it would apply $max on the discount_value field and return the respective type and code values from the doc where discount_value is max.
Alternatively, since you're using $match as the first stage, I believe your data will be small enough to perform $sort efficiently:
db.collection.aggregate([
{
$match: { type: 1 }
},
{
$sort: { discount_value: -1 } // sort in desc order
},
{
$limit: 1
}
])
Test : mongoplayground
Note :
Test the first query on the DB itself rather than in the playground. In the first query you can use $replaceRoot as the last stage if you want to make the doc field the root of your document.
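A short sketch of the first query with that $replaceRoot stage appended. The maxDocPipeline name is illustrative, and it relies on the same assumption as the answer, namely that $max on an embedded document compares discount_value first.

```javascript
// Hypothetical: the first query with $replaceRoot appended as the last stage.
const maxDocPipeline = [
  { $match: { type: 1 } },
  { $group: {
      _id: null,
      // discount_value is listed first so it drives the comparison
      doc: { $max: { discount_value: "$discount_value", type: "$type", code: "$code" } }
  } },
  // Promote the embedded doc to the top level of the result.
  { $replaceRoot: { newRoot: "$doc" } }
];
// In the shell: db.collection.aggregate(maxDocPipeline)
```

Note that after $replaceRoot the result would contain discount_value, type and code, but no _id: null field.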

Ignoring NULL values within an aggregate operation in MongoDB

I have the following MongoDB aggregate operation which is working fine but it also seems to be returning NULL values.
How can I ignore NULL values against projectIP field?
db.inventory.aggregate(
[
{ $match: {projectIP: { $exists:true }}},
{ $project: {projectIP: "$projectIP",_id : 0}},
{ $group: {_id: "$projectIP"}},
{ $sort: {projectIP: 1}}
]
)
It seems some of the documents contain null values for this key. Add this as well:
{ $match: { projectIP: { $exists:true, $ne: null }}}
by replacing the first stage in your query
You can assign a value (0 or anything else) to them instead of null.
Here is how you do it:
projectIP: { $ifNull: [ "$projectIP", 0.0 ] }
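Putting the first suggestion together, a minimal sketch of the whole pipeline might look like this. The distinctIpPipeline name is made up; in a query, $ne: null also skips documents where the field is missing, and the final sort uses _id because that is where $group puts the value.

```javascript
// Hypothetical cleaned-up pipeline: distinct, non-null projectIP values, sorted.
const distinctIpPipeline = [
  // $ne: null excludes both explicit nulls and documents without the field
  { $match: { projectIP: { $ne: null } } },
  { $group: { _id: "$projectIP" } },
  // after $group the value is stored in _id, so sort on that
  { $sort: { _id: 1 } }
];
// In the shell: db.inventory.aggregate(distinctIpPipeline)
```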

MongoDB distinct aggregation

I'm working on a query to find cities with most zips for each state:
db.zips.distinct("state", db.zips.aggregate([
{ $group:
{ _id: {
state: "$state",
city: "$city"
},
numberOfzipcodes: {
$sum: 1
}
}
},
{ $sort: {
numberOfzipcodes: -1
}
}
])
)
The aggregate part of the query seems to work fine, but when I add the distinct I get an empty result.
Is this because I have state in the _id? Can I do something like distinct("_id.state")?
You can use $addToSet with the aggregation framework to count distinct objects.
For example:
db.collectionName.aggregate([{
$group: {_id: null, uniqueValues: {$addToSet: "$fieldName"}}
}])
Or extended to get your unique values into a proper list rather than a sub-document inside a null _id record:
db.collectionName.aggregate([
{ $group: {_id: null, myFieldName: {$addToSet: "$myFieldName"}}},
{ $unwind: "$myFieldName" },
{ $project: { _id: 0 }},
])
Distinct and the aggregation framework are not inter-operable.
Instead you just want:
db.zips.aggregate([
{$group:{_id:{city:'$city', state:'$state'}, numberOfzipcodes:{$sum:1}}},
{$sort:{numberOfzipcodes:-1}},
{$group:{_id:'$_id.state', city:{$first:'$_id.city'},
numberOfzipcode:{$first:'$numberOfzipcodes'}}}
]);
SQL Query: (group by & count of distinct)
select city,count(distinct(emailId)) from TransactionDetails group by city;
Equivalent mongo query would look like this:
db.TransactionDetails.aggregate([
{$group:{_id:{"CITY" : "$cityName"},uniqueCount: {$addToSet: "$emailId"}}},
{$project:{"CITY":1,uniqueCustomerCount:{$size:"$uniqueCount"}} }
]);
You can call $setUnion on a single array, which also filters dupes:
{ $project: {Package: 1, deps: {'$setUnion': '$deps.Package'}}}