MongoDB count number of non-missing fields

I'm using the following code to calculate average and standard deviation of a field named "b" in my collection.
db.ctg.aggregate(
[
{
$group:
{
_id: "b",
avg: { $avg: "$b" },
stdev: { $stdDevPop: "$b" }
}
}
]
)
The result is:
{ "_id" : "b", "avg" : 878.4397930385701, "stdev" : 893.8744489449962 }
I need to add number of non missing elements of "b" to my result so it looks like this:
{ "_id" : "b", "avg" : 878.4397930385701, "stdev" : 893.8744489449962, "nonmissing": 2126 }
How can I do this in the query above?

The results of $avg and $stdDevPop do not change when documents without b are removed, because both accumulators ignore documents where the field is missing or non-numeric. So you can filter out the documents that lack b and count the rest in the same $group.
Query:
db.ctg.aggregate([
{ $match: { b: { $exists: true } } },
{
$group:
{
_id: "b",
avg: { $avg: "$b" },
stdev: { $stdDevPop: "$b" },
nonMissing: { $sum: 1 }
}
}
])
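As a rough illustration of what the pipeline above computes, here is a plain JavaScript (Node.js) sketch that mimics the $match and $group stages on an in-memory array. It is not a MongoDB call, and the sample documents are invented for the example.

```javascript
// Plain-JavaScript emulation of the $match + $group pipeline above.
// The sample documents are hypothetical; one of them has no "b" field.
const docs = [{ b: 2 }, { b: 4 }, { a: 1 }, { b: 6 }];

// $match: { b: { $exists: true } } keeps documents that have a "b" field.
const matched = docs.filter((d) => "b" in d);

// $group: average, population standard deviation, and a count ($sum: 1).
const values = matched.map((d) => d.b);
const nonMissing = values.length;
const avg = values.reduce((s, v) => s + v, 0) / nonMissing;
const variance = values.reduce((s, v) => s + (v - avg) ** 2, 0) / nonMissing;
const result = { _id: "b", avg, stdev: Math.sqrt(variance), nonMissing };

console.log(result);
```

One caveat: { $exists: true } also matches documents where b is null or non-numeric, which $avg skips, so the count can be larger than the number of values actually averaged if the collection holds such documents.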

Related

I need more help getting aggregated data from mongodb

I have a table with documents that look like this:
{
"_id" : ObjectId("bbbbbb1d9486c90479aaaaaa"),
"record" : {
"debug" : false,
"type" : "MX_GTI",
"products" : [
"DAM"
],
"agents" : [
{
"services" : "mssql",
"hpsAvg" : 772,
"hpsMax" : 42901
},
{
"services" : "mssql",
"hpsAvg" : 95,
"hpsMax" : 21631
},
{
"services" : "oracle",
"hpsAvg" : 0,
"hpsMax" : 0
},
{
"services" : "db2",
"hpsAvg" : 0,
"hpsMax" : 0
}
]
}
}
I need to find the average and max HPS per DB type (the services field) across all the agents in all records that match the condition "type": "MX_GTI".
The max is the largest hpsMax across all agents with the database type, and the average is the average of all the non-zero values of hpsAvg.
The output should look like this:
[
{
"dbtype": "oracle",
"maxHPS": 123456,
"avgHPS": 12345
},…
]
Thank you
The difficult part is making the average skip the values that are 0.
The $avg accumulator ignores non-numeric values, so we need to replace each 0 with null before averaging. We can use $cond for this transformation.
db.collection.aggregate([
{
$match: {
"record.type": "MX_GTI"
}
},
{
$unwind: "$record.agents"
},
{
$addFields: {
"record.agents.hpsAvg": {
$cond: {
if: {
$eq: [
"$record.agents.hpsAvg",
0
]
},
then: null,
else: "$record.agents.hpsAvg"
}
}
}
},
{
$group: {
_id: "$record.agents.services",
maxHPS: {
$max: "$record.agents.hpsMax"
},
avgHPS: {
$avg: "$record.agents.hpsAvg"
}
}
},
{
$addFields: {
dbType: "$_id"
}
},
{
$project: {
_id: 0
}
}
])
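To check the expected numbers by hand, the same logic can be sketched in plain JavaScript (Node.js) over the sample document from the question. This is only an emulation of the stages above; the function name hpsByService is made up for the example.

```javascript
// Sample record taken from the question (other fields omitted).
const records = [
  { record: { type: "MX_GTI", agents: [
    { services: "mssql", hpsAvg: 772, hpsMax: 42901 },
    { services: "mssql", hpsAvg: 95, hpsMax: 21631 },
    { services: "oracle", hpsAvg: 0, hpsMax: 0 },
    { services: "db2", hpsAvg: 0, hpsMax: 0 },
  ] } },
];

// Hypothetical helper mirroring $match -> $unwind -> $group.
function hpsByService(recs) {
  const groups = new Map();
  for (const r of recs) {
    if (r.record.type !== "MX_GTI") continue; // $match
    for (const a of r.record.agents) { // $unwind
      const g = groups.get(a.services) ?? { max: -Infinity, sum: 0, n: 0 };
      g.max = Math.max(g.max, a.hpsMax); // $max
      if (a.hpsAvg !== 0) { // zeros are skipped, like the null replacement
        g.sum += a.hpsAvg;
        g.n += 1;
      }
      groups.set(a.services, g);
    }
  }
  // $avg yields null when a group has no numeric values left.
  return [...groups].map(([dbType, g]) => ({
    dbType,
    maxHPS: g.max,
    avgHPS: g.n ? g.sum / g.n : null,
  }));
}

console.log(hpsByService(records));
```

For the sample record, mssql averages 772 and 95, while oracle and db2 have only zero values and therefore end up with a null average.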
You can start from here
db.collection.aggregate([
{
$match: {
"record.type": "MX_GTI"
}
},
{
$unwind: "$record.agents"
},
{
$group: {
_id: "$record.agents.services",
"maxHPS": {
$max: "$record.agents.hpsMax"
},
"avgHPS": {
$avg: "$record.agents.hpsAvg" // zero values are still included in this average
}
}
}
])

MongoDB count documents for each array elements

I have documents like these:
{
code : "X1",
elements : ["A", "B", "C", "D"]
},
{
code : "X2",
elements : ["C", "D"]
},
{
code : "X3",
elements : ["A"]
}
...
I would like to know the number of documents present for each type of value in the "elements" array.
e.g.
"A" : 2
"B" : 1
"C" : 2
"D" : 2
Is it possible with a single query?
You can $unwind your array to get a single document per element and then run $group to count the elements:
db.collection.aggregate([
{
$unwind: "$elements"
},
{
$group: {
_id: "$elements",
count: { $sum: 1 }
}
}
])
EDIT: you can use additional group with $replaceRoot and $arrayToObject to return your ids as keys and counts as values:
db.collection.aggregate([
{
$unwind: "$elements"
},
{
$group: {
_id: "$elements",
count: { $sum: 1 }
}
},
{
$group: {
_id: null,
counts: { $push: { k: "$_id", v: "$count" } }
}
},
{
$replaceRoot: {
newRoot: { $arrayToObject: "$counts" }
}
}
])
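For comparison, the shape of the final result can be reproduced with a short plain JavaScript (Node.js) sketch over the three sample documents from the question. It only emulates the $unwind and $group steps in memory.

```javascript
// Sample documents from the question.
const docs = [
  { code: "X1", elements: ["A", "B", "C", "D"] },
  { code: "X2", elements: ["C", "D"] },
  { code: "X3", elements: ["A"] },
];

const counts = {};
for (const doc of docs) {
  for (const el of doc.elements) { // $unwind: one step per array element
    counts[el] = (counts[el] ?? 0) + 1; // $group with count: { $sum: 1 }
  }
}

console.log(counts); // { A: 2, B: 1, C: 2, D: 2 }
```

Like the pipeline, this counts array elements, so a value repeated inside one document's array would be counted more than once for that document.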

Need to sum from array object value in mongodb

I am trying to calculate the total of a value where that value exists, but my query is not working correctly. Can somebody help me solve this problem? Here are two sample documents; please look at them and suggest the best solution.
Document : 1
{
"_id" : 1,
"message_count" : 4,
"messages" : {
"data" : [
{
"id" : "11",
"saleValue": 1000
},
{
"id" : "112",
"saleValue": 1400
},
{
"id" : "22",
},
{
"id" : "234",
"saleValue": 111
}
],
},
"createdTime" : ISODate("2018-03-18T10:18:48.000Z")
}
Document : 2
{
"_id" : 444,
"message_count" : 4,
"messages" : {
"data" : [
{
"id" : "444",
"saleValue" : 2060
},
{
"id" : "444",
},
{
"id" : 234,
"saleValue" : 260
},
{
"id" : "34534",
}
]
},
"createdTime" : ISODate("2018-03-18T03:11:50.000Z")
}
Needed Output:
{
total : 4831
}
My query :
db.getCollection('myCollection').aggregate([
{
"$group": {
"_id": "$Id",
"totalValue": {
$sum: {
$sum: "$messages.data.saleValue"
}
}
}
}
])
So please if possible help me to solve this problem. Thanks in advance
It's not working correctly because it aggregates all the documents in the collection into one group: "_id": "tempId" is a constant string, not a field reference. You just need to reference the correct key by adding the $ prefix:
db.getCollection('myCollection').aggregate([
{ "$group": {
"_id": "$tempId",
"totalValue": {
"$sum": { "$sum": "$messages.data.saleValue" }
}
} }
])
This is in essence a single-stage version of a two-stage pipeline in which a $project stage first stores the per-document sum in an extra field and a $group stage then sums that field.
It works because, from MongoDB 3.2, $sum is available in both the $project and $group stages, and in the $project stage it returns the sum of a list of expressions. The expression "$messages.data.value" resolves to a list of numbers such as [120, 1200], which $sum then adds up:
db.getCollection('myCollection').aggregate([
{ "$project": {
"values": { "$sum": "$messages.data.value" },
"tempId": 1,
} },
{ "$group": {
"_id": "$tempId",
"totalValue": { "$sum": "$values" }
} }
])
You can add an $unwind stage before your $group. That deconstructs the data array into one document per element, so you can then group properly:
db.myCollection.aggregate([
{
"$unwind": "$messages.data"
},
{
"$group": {
"_id": "tempId",
"totalValue": {
$sum: {
$sum: "$messages.data.value"
}
}
}
}
])
Output:
{ "_id" : "tempId", "totalValue" : 1320 }
db.getCollection('myCollection').aggregate([
// each pipeline stage must be its own object
{ $unwind: "$messages.data" },
{
$group: {
"_id": "tempId",
"totalValue": { $sum: "$messages.data.value" }
}
}
])
Based on the description in the question, please try the following aggregate query:
db.myCollection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: {
path: '$messages.data'
}
},
// Stage 2
{
$group: {
_id: {
pageId: '$pageId'
},
total: {
$sum: '$messages.data.saleValue'
}
}
},
// Stage 3
{
$project: {
pageId: '$_id.pageId',
total: 1,
_id: 0
}
}
]
);
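You can do it without using $group. Grouping drops the other fields, which then have to be managed separately, so I prefer using $sum and $map as shown below. Note that this adds a total to each document rather than producing one grand total: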
db.getCollection('myCollection').aggregate([
{
$addFields: {
total: {
$sum: {
$map: {
input: "$messages.data",
as: "message",
in: "$$message.saleValue",
},
},
},
},
},
])
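To sanity-check the expected total from the question, here is a plain JavaScript (Node.js) sketch that flattens messages.data across both sample documents and sums saleValue wherever it is present. It is an in-memory emulation only, with the documents trimmed to the relevant fields.

```javascript
// The two sample documents from the question, trimmed to the relevant fields.
const docs = [
  { _id: 1, messages: { data: [
    { id: "11", saleValue: 1000 },
    { id: "112", saleValue: 1400 },
    { id: "22" },
    { id: "234", saleValue: 111 },
  ] } },
  { _id: 444, messages: { data: [
    { id: "444", saleValue: 2060 },
    { id: "444" },
    { id: 234, saleValue: 260 },
    { id: "34534" },
  ] } },
];

const total = docs
  .flatMap((d) => d.messages.data) // like $unwind on messages.data
  // like $sum, entries without a numeric saleValue contribute nothing
  .reduce((sum, m) => sum + (typeof m.saleValue === "number" ? m.saleValue : 0), 0);

console.log({ total }); // { total: 4831 }
```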

Finding all documents which share the same value in an array

Consider I have the following data below:
{
"id":123,
"name":"apple",
"codes":["ABC", "DEF", "EFG"]
}
{
"id":234,
"name":"pineapple",
"codes":["DEF"]
}
{
"id":345,
"name":"banana",
"codes":["HIJ","KLM"]
}
If I don't want to search by a specific code, is there a way to find all fruits in my MongoDB collection that share the same code?
db.collection.aggregate([
{ $unwind: '$codes' },
{ $group: { _id: '$codes', count: {$sum:1}, fruits: {$push: '$name'}}},
{ $match: {'count': {$gt:1}}},
{ $group:{_id:null, total:{$sum:1}, data:{$push:{fruits: '$fruits', code:'$_id'}}}}
])
result:
{ "_id" : null, "total" : 1, "data" : [ { "fruits" : [ "apple", "pineapple" ], "code" : "DEF" } ] }
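The grouping idea behind this pipeline can be sketched in plain JavaScript (Node.js) on the three sample documents. This is an in-memory emulation of $unwind, $group and $match, not a MongoDB query.

```javascript
// Sample documents from the question.
const fruits = [
  { id: 123, name: "apple", codes: ["ABC", "DEF", "EFG"] },
  { id: 234, name: "pineapple", codes: ["DEF"] },
  { id: 345, name: "banana", codes: ["HIJ", "KLM"] },
];

// $unwind + $group: collect the fruit names seen for every code.
const byCode = new Map();
for (const fruit of fruits) {
  for (const code of fruit.codes) {
    if (!byCode.has(code)) byCode.set(code, []);
    byCode.get(code).push(fruit.name);
  }
}

// $match: { count: { $gt: 1 } } keeps only codes used by more than one fruit.
const shared = [...byCode]
  .filter(([, names]) => names.length > 1)
  .map(([code, names]) => ({ code, fruits: names }));

console.log(shared);
```

Only the code DEF survives the filter here, with apple and pineapple as its fruits, which matches the result shown above.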

MongoDB Aggregation: Compute Running Totals from sum of previous rows

Sample Documents:
{ time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
{ time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
{ time: ISODate("2013-10-11T19:12:06Z"), value: 3 }
{ time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
{ time: ISODate("2013-10-12T04:15:38Z"), value: 5 }
It's easy to get the aggregated results that is grouped by date.
But what I want is to query results that returns a running total
of the aggregation, like:
{ time: "2013-10-10", total: 3, runningTotal: 3 }
{ time: "2013-10-11", total: 7, runningTotal: 10 }
{ time: "2013-10-12", total: 5, runningTotal: 15 }
Is this possible with the MongoDB Aggregation?
EDIT: Since MongoDB v5.0 the preferred approach is to use the new $setWindowFields aggregation stage, as shared by Xavier Guihot.
This does what you need. I have normalised the times in the data so that they group together. The idea is to $group and push the times and totals into separate arrays. Then $unwind the time array, which gives each time document its own copy of the totals array. You can then calculate the runningTotal (or something like a rolling average) from the array containing the data for all the times. The 'index' generated by $unwind is the array index of the total corresponding to that time. It is important to $sort before $unwinding, since this ensures the arrays are in the correct order.
db.temp.aggregate(
[
{
'$group': {
'_id': '$time',
'total': { '$sum': '$value' }
}
},
{
'$sort': {
'_id': 1
}
},
{
'$group': {
'_id': 0,
'time': { '$push': '$_id' },
'totals': { '$push': '$total' }
}
},
{
'$unwind': {
'path' : '$time',
'includeArrayIndex' : 'index'
}
},
{
'$project': {
'_id': 0,
'time': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' } },
'total': { '$arrayElemAt': [ '$totals', '$index' ] },
'runningTotal': { '$sum': { '$slice': [ '$totals', { '$add': [ '$index', 1 ] } ] } },
}
},
]
);
I have used something similar on a collection with ~80 000 documents, aggregating to 63 results. I am not sure how well it will work on larger collections, but I have found that performing transformations (projections, array manipulations) on aggregated data does not seem to have a large performance cost once the data is reduced to a manageable size.
Here is another approach.
Pipeline:
db.col.aggregate([
{$group : {
_id : { time :{ $dateToString: {format: "%Y-%m-%d", date: "$time", timezone: "-05:00"}}},
value : {$sum : "$value"}
}},
{$addFields : {_id : "$_id.time"}},
{$sort : {_id : 1}},
{$group : {_id : null, data : {$push : "$$ROOT"}}},
{$addFields : {data : {
$reduce : {
input : "$data",
initialValue : {total : 0, d : []},
in : {
total : {$sum : ["$$this.value", "$$value.total"]},
d : {$concatArrays : [
"$$value.d",
[{
_id : "$$this._id",
value : "$$this.value",
runningTotal : {$sum : ["$$value.total", "$$this.value"]}
}]
]}
}
}
}}},
{$unwind : "$data.d"},
{$replaceRoot : {newRoot : "$data.d"}}
]).pretty()
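The $reduce step is the core of this pipeline, and it behaves much like Array.prototype.reduce. The following plain JavaScript (Node.js) sketch mirrors it on already-grouped daily totals; the input array is assumed to be what the first $group and $sort stages would produce.

```javascript
// Assumed output of the $group + $sort stages: one document per day.
const daily = [
  { _id: "2013-10-10", value: 3 },
  { _id: "2013-10-11", value: 7 },
  { _id: "2013-10-12", value: 5 },
];

// Mirrors $reduce: the accumulator carries the total so far ("total")
// and the list of output documents built up to this point ("d").
const { d: withRunning } = daily.reduce(
  (acc, cur) => {
    const total = acc.total + cur.value;
    return { total, d: [...acc.d, { ...cur, runningTotal: total }] };
  },
  { total: 0, d: [] } // initialValue
);

console.log(withRunning);
```

Carrying the total in the accumulator means each day is visited once, instead of re-summing all earlier days for every day.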
collection
> db.col.find()
{ "_id" : ObjectId("4f442120eb03305789000000"), "time" : ISODate("2013-10-10T20:55:36Z"), "value" : 1 }
{ "_id" : ObjectId("4f442120eb03305789000001"), "time" : ISODate("2013-10-11T04:43:16Z"), "value" : 2 }
{ "_id" : ObjectId("4f442120eb03305789000002"), "time" : ISODate("2013-10-12T03:13:06Z"), "value" : 3 }
{ "_id" : ObjectId("4f442120eb03305789000003"), "time" : ISODate("2013-10-11T10:15:38Z"), "value" : 4 }
{ "_id" : ObjectId("4f442120eb03305789000004"), "time" : ISODate("2013-10-13T02:15:38Z"), "value" : 5 }
result
{ "_id" : "2013-10-10", "value" : 3, "runningTotal" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "runningTotal" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "runningTotal" : 15 }
>
Here is a solution that does not push the previous documents into a new array and then process them. (If the array gets too big, the document can exceed the maximum BSON document size of 16 MB.)
Calculating running totals is as simple as:
db.collection1.aggregate(
[
{
$lookup: {
from: 'collection1',
let: { date_to: '$time' },
pipeline: [
{
$match: {
$expr: {
$lt: [ '$time', '$$date_to' ]
}
}
},
{
$group: {
_id: null,
summary: {
$sum: '$value'
}
}
}
],
as: 'sum_prev_days'
}
},
{
$addFields: {
sum_prev_days: {
$arrayElemAt: [ '$sum_prev_days', 0 ]
}
}
},
{
$addFields: {
running_total: {
$sum: [ '$value', '$sum_prev_days.summary' ]
}
}
},
{
$project: { sum_prev_days: 0 }
}
]
)
What we did: within the lookup we selected all documents with smaller datetime and immediately calculated the sum (using $group as the second step of lookup's pipeline). The $lookup put the value into the first element of an array. We pull the first array element and then calculate the sum: current value + sum of previous values.
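The self-join described above can be pictured with a plain JavaScript (Node.js) sketch: for every document, sum the values of all documents with a strictly smaller time and add the document's own value. The sample data is invented, and times are ISO strings so that string comparison matches chronological order.

```javascript
// Hypothetical sample data; ISO-8601 strings compare chronologically.
const docs = [
  { time: "2013-10-10T20:55:36Z", value: 1 },
  { time: "2013-10-10T22:43:16Z", value: 2 },
  { time: "2013-10-11T10:15:38Z", value: 4 },
  { time: "2013-10-12T04:15:38Z", value: 5 },
];

const withTotals = docs.map((doc) => {
  // Inner "lookup": every document with a strictly earlier time ($lt).
  const sumPrev = docs
    .filter((other) => other.time < doc.time)
    .reduce((sum, other) => sum + other.value, 0);
  // running_total = current value + sum of previous values
  return { ...doc, running_total: doc.value + sumPrev };
});

console.log(withTotals);
```

Two things follow from this shape: the work grows roughly quadratically with the number of documents, and documents sharing an identical timestamp do not count each other because the comparison is strict.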
If you would like to group transactions into days and then calculate running totals, insert a $group stage at the beginning and also insert it into the $lookup's pipeline:
db.collection1.aggregate(
[
{
$group: {
_id: {
$substrBytes: ['$time', 0, 10]
},
value: {
$sum: '$value'
}
}
},
{
$lookup: {
from: 'collection1',
let: { date_to: '$_id' },
pipeline: [
{
$group: {
_id: {
$substrBytes: ['$time', 0, 10]
},
value: {
$sum: '$value'
}
}
},
{
$match: {
$expr: {
$lt: [ '$_id', '$$date_to' ]
}
}
},
{
$group: {
_id: null,
summary: {
$sum: '$value'
}
}
}
],
as: 'sum_prev_days'
}
},
{
$addFields: {
sum_prev_days: {
$arrayElemAt: [ '$sum_prev_days', 0 ]
}
}
},
{
$addFields: {
running_total: {
$sum: [ '$value', '$sum_prev_days.summary' ]
}
}
},
{
$project: { sum_prev_days: 0 }
}
]
)
The result is:
{ "_id" : "2013-10-10", "value" : 3, "running_total" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "running_total" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "running_total" : 15 }
Starting in MongoDB 5.0, this is a perfect use case for the new $setWindowFields aggregation stage:
// { time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
// { time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
// { time: ISODate("2013-10-11T12:12:06Z"), value: 3 }
// { time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
// { time: ISODate("2013-10-12T05:15:38Z"), value: 5 }
db.collection.aggregate([
{ $group: {
_id: { $dateToString: { format: "%Y-%m-%d", date: "$time" } },
total: { $sum: "$value" }
}},
// e.g.: { "_id" : "2013-10-11", "total" : 7 }
{ $set: { "date": "$_id" } }, { $unset: ["_id"] },
// e.g.: { "date" : "2013-10-11", "total" : 7 }
{ $setWindowFields: {
sortBy: { date: 1 },
output: {
running: {
$sum: "$total",
window: { documents: [ "unbounded", "current" ] }
}
}
}}
])
// { date: "2013-10-10", total: 3, running: 3 }
// { date: "2013-10-11", total: 7, running: 10 }
// { date: "2013-10-12", total: 5, running: 15 }
Let's focus on the $setWindowFields stage, which:
- chronologically sorts the grouped documents by date: sortBy: { date: 1 }
- adds the running field to each document (output: { running: { ... } })
- which is the $sum of totals ($sum: "$total")
- over a specified span of documents (the window)
- which in our case is every document up to and including the current one: window: { documents: [ "unbounded", "current" ] }
- as defined by [ "unbounded", "current" ], meaning the window covers all documents between the first document (unbounded) and the current document (current).
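To make the window bounds concrete, here is a plain JavaScript (Node.js) sketch of a documents window over already-sorted input. The helper windowSum is hypothetical and only emulates the behaviour described above for "unbounded", "current" and integer offsets relative to the current document.

```javascript
// Hypothetical helper: sums `field` over a documents window [lower, upper].
// Bounds may be "unbounded", "current", or an integer offset from the
// current position (for example -1 for the previous document).
function windowSum(sorted, field, [lower, upper]) {
  const resolve = (bound, i, edge) =>
    bound === "unbounded" ? edge : bound === "current" ? i : i + bound;
  return sorted.map((doc, i) => {
    const lo = Math.max(0, resolve(lower, i, 0));
    const hi = Math.min(sorted.length - 1, resolve(upper, i, sorted.length - 1));
    let sum = 0;
    for (let j = lo; j <= hi; j++) sum += sorted[j][field];
    return { ...doc, running: sum };
  });
}

// Grouped totals, already sorted by date (sortBy: { date: 1 }).
const grouped = [
  { date: "2013-10-10", total: 3 },
  { date: "2013-10-11", total: 7 },
  { date: "2013-10-12", total: 5 },
];

const running = windowSum(grouped, "total", ["unbounded", "current"]);
const lastTwo = windowSum(grouped, "total", [-1, "current"]);
console.log(running, lastTwo);
```

With [ "unbounded", "current" ] each document sums everything from the first document to itself, which is the running total; narrowing the lower bound to -1 turns the same mechanism into a two-document rolling sum.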