MongoDB Aggregation: Compute Running Totals from sum of previous rows

Sample Documents:
{ time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
{ time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
{ time: ISODate("2013-10-11T19:12:66Z"), value: 3 }
{ time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
{ time: ISODate("2013-10-12T04:15:38Z"), value: 5 }
It's easy to get aggregated results that are grouped by date.
But what I want is a query that also returns a running total
of the aggregation, like:
{ time: "2013-10-10", total: 3, runningTotal: 3 }
{ time: "2013-10-11", total: 7, runningTotal: 10 }
{ time: "2013-10-12", total: 5, runningTotal: 15 }
Is this possible with the MongoDB aggregation framework?

EDIT: Since MongoDB v5.0 the preferred approach is to use the new $setWindowFields aggregation stage, as shared by Xavier Guihot.
This does what you need. I have normalised the times in the data so that they group together. The idea is to $group and push the times and totals into separate arrays. Then $unwind the time array, which gives each time document its own copy of the totals array. You can then calculate the runningTotal (or something like a rolling average) from the array containing the data for all the different times. The 'index' generated by $unwind is the array index of the total corresponding to that time. It is important to $sort before pushing and $unwinding, since this ensures the arrays are in the correct order.
db.temp.aggregate(
[
{
'$group': {
'_id': '$time',
'total': { '$sum': '$value' }
}
},
{
'$sort': {
'_id': 1
}
},
{
'$group': {
'_id': 0,
'time': { '$push': '$_id' },
'totals': { '$push': '$total' }
}
},
{
'$unwind': {
'path' : '$time',
'includeArrayIndex' : 'index'
}
},
{
'$project': {
'_id': 0,
'time': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' } },
'total': { '$arrayElemAt': [ '$totals', '$index' ] },
'runningTotal': { '$sum': { '$slice': [ '$totals', { '$add': [ '$index', 1 ] } ] } },
}
},
]
);
I have used something similar on a collection with ~80 000 documents, aggregating to 63 results. I am not sure how well it will work on larger collections, but I have found that performing transformations (projections, array manipulations) on aggregated data does not seem to have a large performance cost once the data is reduced to a manageable size.
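To make the last stage easier to follow, here is a plain JavaScript sketch of what the $project computes for each unwound document. It runs without MongoDB. The `times` and `totals` arrays are an assumed result of the second $group for the question's expected output, and `runningTotals` is an illustrative helper, not part of the pipeline.

```javascript
// Mirrors the $project stage: for the document unwound at array position
// `index`, total is totals[index] and runningTotal is the sum of the first
// index + 1 entries, like $sum over $slice: [ '$totals', index + 1 ].
function runningTotals(times, totals) {
  return times.map((time, index) => ({
    time,
    total: totals[index],
    runningTotal: totals.slice(0, index + 1).reduce((a, b) => a + b, 0),
  }));
}

// Assumed output of the second $group (one entry per day, already sorted).
const times = ["2013-10-10", "2013-10-11", "2013-10-12"];
const totals = [3, 7, 5];

const rows = runningTotals(times, totals);
console.log(rows.map((r) => r.runningTotal)); // [ 3, 10, 15 ]
```

Each slice re-sums a prefix of the array, so the work grows quadratically with the number of grouped rows. That is unlikely to matter for a few dozen results, which matches the experience described above.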

Here is another approach.
Pipeline:
db.col.aggregate([
{$group : {
_id : { time :{ $dateToString: {format: "%Y-%m-%d", date: "$time", timezone: "-05:00"}}},
value : {$sum : "$value"}
}},
{$addFields : {_id : "$_id.time"}},
{$sort : {_id : 1}},
{$group : {_id : null, data : {$push : "$$ROOT"}}},
{$addFields : {data : {
$reduce : {
input : "$data",
initialValue : {total : 0, d : []},
in : {
total : {$sum : ["$$this.value", "$$value.total"]},
d : {$concatArrays : [
"$$value.d",
[{
_id : "$$this._id",
value : "$$this.value",
runningTotal : {$sum : ["$$value.total", "$$this.value"]}
}]
]}
}
}
}}},
{$unwind : "$data.d"},
{$replaceRoot : {newRoot : "$data.d"}}
]).pretty()
Collection:
> db.col.find()
{ "_id" : ObjectId("4f442120eb03305789000000"), "time" : ISODate("2013-10-10T20:55:36Z"), "value" : 1 }
{ "_id" : ObjectId("4f442120eb03305789000001"), "time" : ISODate("2013-10-11T04:43:16Z"), "value" : 2 }
{ "_id" : ObjectId("4f442120eb03305789000002"), "time" : ISODate("2013-10-12T03:13:06Z"), "value" : 3 }
{ "_id" : ObjectId("4f442120eb03305789000003"), "time" : ISODate("2013-10-11T10:15:38Z"), "value" : 4 }
{ "_id" : ObjectId("4f442120eb03305789000004"), "time" : ISODate("2013-10-13T02:15:38Z"), "value" : 5 }
Result:
{ "_id" : "2013-10-10", "value" : 3, "runningTotal" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "runningTotal" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "runningTotal" : 15 }
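The $reduce stage may be easier to read next to its plain JavaScript counterpart. The sketch below uses the same accumulator shape (`total` and `d`) on an in-memory array; `withRunningTotal` and the `grouped` sample are illustrative names and data, assumed to match the output of the stages before $reduce.

```javascript
// Same accumulator shape as the $reduce stage: `total` carries the sum so
// far and `d` collects the output documents in order.
function withRunningTotal(data) {
  return data.reduce(
    (acc, doc) => {
      const total = acc.total + doc.value;
      return { total, d: acc.d.concat([{ ...doc, runningTotal: total }]) };
    },
    { total: 0, d: [] }
  ).d;
}

// Assumed state of `data` after the $group / $sort / $push stages.
const grouped = [
  { _id: "2013-10-10", value: 3 },
  { _id: "2013-10-11", value: 7 },
  { _id: "2013-10-12", value: 5 },
];

const reduced = withRunningTotal(grouped);
console.log(reduced.map((d) => d.runningTotal)); // [ 3, 10, 15 ]
```

Unlike the $slice approach, each element is visited once, because the running sum is carried along in the accumulator.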

Here is a solution that does not push the previous documents into a new array and then process them. (If the array gets too big, you can exceed the maximum BSON document size of 16 MB.)
Calculating running totals is as simple as:
db.collection1.aggregate(
[
{
$lookup: {
from: 'collection1',
let: { date_to: '$time' },
pipeline: [
{
$match: {
$expr: {
$lt: [ '$time', '$$date_to' ]
}
}
},
{
$group: {
_id: null,
summary: {
$sum: '$value'
}
}
}
],
as: 'sum_prev_days'
}
},
{
$addFields: {
sum_prev_days: {
$arrayElemAt: [ '$sum_prev_days', 0 ]
}
}
},
{
$addFields: {
running_total: {
$sum: [ '$value', '$sum_prev_days.summary' ]
}
}
},
{
$project: { sum_prev_days: 0 }
}
]
)
What we did: within the $lookup we selected all documents with an earlier datetime and immediately calculated their sum (using $group as the second step of the $lookup's pipeline). The $lookup puts that value into the first element of an array. We pull out the first array element and then calculate the sum: current value + sum of previous values.
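As a rough illustration of that logic outside MongoDB, the following JavaScript sketch performs the same self-join on an in-memory array. `runningTotalBySelfJoin` and the `sample` documents are hypothetical; the comparison is strict, like the $lt in the $lookup.

```javascript
// For each document, sum the values of all documents with a strictly
// earlier time (the $lt match inside $lookup), then add the current value.
function runningTotalBySelfJoin(docs) {
  return docs.map((doc) => {
    const sumPrev = docs
      .filter((other) => other.time < doc.time)
      .reduce((sum, other) => sum + other.value, 0);
    return { ...doc, running_total: doc.value + sumPrev };
  });
}

// Hypothetical sample documents, in the shape used by the question.
const sample = [
  { time: new Date("2013-10-10T20:55:36Z"), value: 1 },
  { time: new Date("2013-10-10T22:43:16Z"), value: 2 },
  { time: new Date("2013-10-11T10:15:38Z"), value: 4 },
];

const joined = runningTotalBySelfJoin(sample);
console.log(joined.map((d) => d.running_total)); // [ 1, 3, 7 ]
```

Two things are visible here: documents that share exactly the same time do not count towards each other's totals, and every document scans all earlier ones, so the cost grows quickly with collection size.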
If you would like to group the transactions into days first and then calculate the running totals, we need to insert a $group at the beginning and also insert it into the $lookup's pipeline.
db.collection1.aggregate(
[
{
$group: {
_id: {
$substrBytes: ['$time', 0, 10]
},
value: {
$sum: '$value'
}
}
},
{
$lookup: {
from: 'collection1',
let: { date_to: '$_id' },
pipeline: [
{
$group: {
_id: {
$substrBytes: ['$time', 0, 10]
},
value: {
$sum: '$value'
}
}
},
{
$match: {
$expr: {
$lt: [ '$_id', '$$date_to' ]
}
}
},
{
$group: {
_id: null,
summary: {
$sum: '$value'
}
}
}
],
as: 'sum_prev_days'
}
},
{
$addFields: {
sum_prev_days: {
$arrayElemAt: [ '$sum_prev_days', 0 ]
}
}
},
{
$addFields: {
running_total: {
$sum: [ '$value', '$sum_prev_days.summary' ]
}
}
},
{
$project: { sum_prev_days: 0 }
}
]
)
The result is:
{ "_id" : "2013-10-10", "value" : 3, "running_total" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "running_total" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "running_total" : 15 }

Starting in Mongo 5, it's a perfect use case for the new $setWindowFields aggregation operator:
// { time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
// { time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
// { time: ISODate("2013-10-11T12:12:66Z"), value: 3 }
// { time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
// { time: ISODate("2013-10-12T05:15:38Z"), value: 5 }
db.collection.aggregate([
{ $group: {
_id: { $dateToString: { format: "%Y-%m-%d", date: "$time" } },
total: { $sum: "$value" }
}},
// e.g.: { "_id" : "2013-10-11", "total" : 7 }
{ $set: { "date": "$_id" } }, { $unset: ["_id"] },
// e.g.: { "date" : "2013-10-11", "total" : 7 }
{ $setWindowFields: {
sortBy: { date: 1 },
output: {
running: {
$sum: "$total",
window: { documents: [ "unbounded", "current" ] }
}
}
}}
])
// { date: "2013-10-10", total: 3, running: 3 }
// { date: "2013-10-11", total: 7, running: 10 }
// { date: "2013-10-12", total: 5, running: 15 }
Let's focus on the $setWindowFields stage, which:
chronologically sorts the grouped documents by date: sortBy: { date: 1 }
adds the running field to each document (output: { running: { ... } })
which is the $sum of totals ($sum: "$total")
over a specified span of documents (the window)
which in our case is the current document and every document before it: window: { documents: [ "unbounded", "current" ] }
as defined by [ "unbounded", "current" ], meaning the window spans all documents from the first one (unbounded) to the current one (current).
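To illustrate how the window bounds select documents, here is a small JavaScript emulation of a `documents` window over an already sorted array. It is only a sketch of the semantics described above, not how MongoDB implements the stage; `windowSum` and the `daily` sample are illustrative names and data.

```javascript
// Emulates $sum over a `documents` window on an already sorted array.
// Each bound is "unbounded", "current", or an integer offset relative to
// the position of the current document.
function windowSum(docs, field, [lower, upper]) {
  const resolve = (bound, index, unboundedValue) => {
    if (bound === "unbounded") return unboundedValue;
    if (bound === "current") return index;
    return index + bound;
  };
  return docs.map((doc, index) => {
    const from = Math.max(0, resolve(lower, index, 0));
    const to = Math.min(docs.length - 1, resolve(upper, index, docs.length - 1));
    let sum = 0;
    for (let i = from; i <= to; i++) sum += docs[i][field];
    return { ...doc, running: sum };
  });
}

// Assumed output of the $group / $set / $unset stages, sorted by date.
const daily = [
  { date: "2013-10-10", total: 3 },
  { date: "2013-10-11", total: 7 },
  { date: "2013-10-12", total: 5 },
];

const cumulative = windowSum(daily, "total", ["unbounded", "current"]);
const lastTwo = windowSum(daily, "total", [-1, "current"]);
console.log(cumulative.map((d) => d.running)); // [ 3, 10, 15 ]
console.log(lastTwo.map((d) => d.running)); // [ 3, 10, 12 ]
```

The second call shows why the window matters: narrowing the lower bound to -1 turns the running total into a sum over the current and previous day only.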

Related

Need help to MongoDB aggregate $group state

I have a collection of 1000 documents like this:
{
"_id" : ObjectId("628b63d66a5951db6bb79905"),
"index" : 0,
"name" : "Aurelia Gonzales",
"isActive" : false,
"registered" : ISODate("2015-02-11T04:22:39.000+0000"),
"age" : 41,
"gender" : "female",
"eyeColor" : "green",
"favoriteFruit" : "banana",
"company" : {
"title" : "YURTURE",
"email" : "aureliagonzales@yurture.com",
"phone" : "+1 (940) 501-3963",
"location" : {
"country" : "USA",
"address" : "694 Hewes Street"
}
},
"tags" : [
"enim",
"id",
"velit",
"ad",
"consequat"
]
}
I want to group those by year and gender. For example, in 2014 there were 105 male registrations and 131 female registrations. Finally, I want to return documents like this:
{
_id:2014,
male:105,
female:131,
total:236
},
{
_id:2015,
male:136,
female:128,
total:264
}
I have got as far as grouping by registered year and gender, like this:
db.persons.aggregate([
{ $group: { _id: { year: { $year: "$registered" }, gender: "$gender" }, total: { $sum: NumberInt(1) } } },
{ $sort: { "_id.year": 1,"_id.gender":1 } }
])
which returns documents like this:
{
"_id" : {
"year" : 2014,
"gender" : "female"
},
"total" : 131
}
{
"_id" : {
"year" : 2014,
"gender" : "male"
},
"total" : 105
}
Please guide me on how to get from this to the desired output.
db.collection.aggregate([
{
"$group": { //Group things
"_id": "$_id.year",
"gender": {
"$addToSet": {
k: "$_id.gender",
v: "$total"
}
},
sum: { //Sum it
$sum: "$total"
}
}
},
{
"$project": {//Reshape it
g: {
"$arrayToObject": "$gender"
},
_id: 1,
sum: 1
}
},
{
"$project": { //Reshape it
_id: 1,
"g.female": 1,
"g.male": 1,
sum: 1
}
}
])
Just add one more group stage to your aggregation pipeline, like this:
db.persons.aggregate([
{ $group: { _id: { year: { $year: "$registered" }, gender: "$gender" }, total: { $sum: NumberInt(1) } } },
{ $sort: { "_id.year": 1,"_id.gender":1 } },
{
$group: {
_id: "$_id.year",
male: {
$sum: {
$cond: {
if: {
$eq: [
"$_id.gender",
"male"
]
},
then: "$total",
else: 0
}
}
},
female: {
$sum: {
$cond: {
if: {
$eq: [
"$_id.gender",
"female"
]
},
then: "$total",
else: 0
}
}
},
total: {
$sum: "$total"
}
},
}
]);
We group by year in this last stage and calculate the count for each gender conditionally; the total is simply the sum of the counts irrespective of gender.
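For readers who find the nested $cond hard to scan, the following plain JavaScript sketch performs the same pivot on the output of the first $group stage. `pivotByYear` is an illustrative helper, and `firstStage` reuses the two sample rows from the question.

```javascript
// Mirrors the second $group: one entry per year, with the per-gender counts
// added conditionally and every row added to the overall total.
function pivotByYear(rows) {
  const byYear = new Map();
  for (const { _id, total } of rows) {
    const entry = byYear.get(_id.year) || { _id: _id.year, male: 0, female: 0, total: 0 };
    if (_id.gender === "male") entry.male += total;
    if (_id.gender === "female") entry.female += total;
    entry.total += total;
    byYear.set(_id.year, entry);
  }
  return [...byYear.values()];
}

// Output of the first $group stage, as shown in the question.
const firstStage = [
  { _id: { year: 2014, gender: "female" }, total: 131 },
  { _id: { year: 2014, gender: "male" }, total: 105 },
];

const pivoted = pivotByYear(firstStage);
console.log(pivoted); // [ { _id: 2014, male: 105, female: 131, total: 236 } ]
```

As in the pipeline, a row with any other gender value would contribute to total only, because both conditional sums add 0 for it.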
Besides the solution with two $group stages that @Gibbs proposed in the comments,
you can achieve the result as below:
1. $group - Group by the year of registered. Push each gender value into a genders array.
2. $sort - Order by _id.
3. $project - Decorate the output documents.
3.1. male - Get the size of the array that $filter returns for the value "male" in the genders array.
3.2. female - Get the size of the array that $filter returns for the value "female" in the genders array.
3.3. total - Get the size of the genders array.
This method fits if you only need to count and return the "male" and "female" gender fields.
db.collection.aggregate([
{
$group: {
_id: {
$year: "$registered"
},
genders: {
$push: "$gender"
}
}
},
{
$sort: {
"_id": 1
}
},
{
$project: {
_id: 1,
male: {
$size: {
$filter: {
input: "$genders",
cond: {
$eq: [
"$$this",
"male"
]
}
}
}
},
female: {
$size: {
$filter: {
input: "$genders",
cond: {
$eq: [
"$$this",
"female"
]
}
}
}
},
total: {
$size: "$genders"
}
}
}
])

MongoDB needs to get total counts from all objects

My aggregate query looks like this. I need to get the total count of the values.
db.getCollection('mydesk').aggregate([
{
$match: {
"accountId": ObjectId("616ea615edc5fa4278ccb7f6"),
"val" : { $ne : null},
"deskId": { "$in": [
ObjectId("61934f7efdb9dc5a7c1c3a01"),
ObjectId("61713730857c3243ec1d257c"),
ObjectId("629d9548e0c93e34e435e7b9"),
ObjectId("616eaf613bcd9655b8035a25"),
]}
}
},
{
$project: {
item: 1,
value: { $size: "$val.shapes" },
}
}
])
I get a result like this, but I need the total count across all the values.
/* 1 */
{
"_id" : ObjectId("616fab4f12b90d59d03f380e"),
"value" : 11
}
/* 2 */
{
"_id" : ObjectId("616fbad35700980a041cd190"),
"value" : 4
}
/* 3 */
{
"_id" : ObjectId("61713752857c3243ec1d257e"),
"value" : 12
}
Needed result :
{
"totalValueCount" : 27
}
Thanks in advance
One option is to use $group to $sum up the values:
db.getCollection('mydesk').aggregate([
{
$match: {
"accountId": ObjectId("616ea615edc5fa4278ccb7f6"),
"val" : { $ne : null},
"deskId": { "$in": [
ObjectId("61934f7efdb9dc5a7c1c3a01"),
ObjectId("61713730857c3243ec1d257c"),
ObjectId("629d9548e0c93e34e435e7b9"),
ObjectId("616eaf613bcd9655b8035a25"),
]}
}
},
{
$group: {
_id: null,
total: {$sum: { $size: "$val.shapes"}},
}
},
{$project: {_id: 0, total: 1}}
])
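As a sanity check of what the $group computes, here is the same sum written in plain JavaScript over in-memory documents. `totalShapes` and the `desks` sample are hypothetical; the array lengths are chosen to match the three values shown in the question. The guard for missing arrays is an addition of this sketch and is not part of the pipeline above.

```javascript
// In-memory counterpart of
// { $group: { _id: null, total: { $sum: { $size: "$val.shapes" } } } }.
// Documents without a val.shapes array are counted as 0 here; that guard is
// an assumption of this sketch, not something the pipeline does.
function totalShapes(docs) {
  return docs.reduce((total, doc) => {
    const shapes = doc.val && doc.val.shapes;
    return total + (Array.isArray(shapes) ? shapes.length : 0);
  }, 0);
}

// Hypothetical documents whose shapes arrays have 11, 4 and 12 elements.
const desks = [
  { val: { shapes: new Array(11).fill("shape") } },
  { val: { shapes: new Array(4).fill("shape") } },
  { val: { shapes: new Array(12).fill("shape") } },
];

const totalValueCount = totalShapes(desks);
console.log({ totalValueCount }); // { totalValueCount: 27 }
```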

MongoDB Count Items in array by name

I have documents like this:
{
"_id" : ObjectId("5b3ced158735f1196d73a743"),
"cid" : 1,
"foo" : [
{
"k" : "sport",
"v" : "climbing"
},
{
"k" : "sport",
"v" : "soccer"
},
{
"k" : "sport",
"v" : "soccer"
}
]
}
This query returns just the documents that contain a soccer element:
db.coll.find({foo:{$elemMatch:{ v: "soccer"}} }, {"foo.$" : 1,cid:1})
returns:
{ "_id" : ObjectId("5b3ced158735f1196d73a743"), "cid" : 1, "foo" : [ { "k" : "sport", "v" : "soccer" } ] }
But I want to know how many soccer elements are in each returned document. How can I count them?
db.coll.aggregate(
// Pipeline
[
// Stage 1
{
$match: {
foo: {
$elemMatch: {
v: 'soccer'
}
}
}
},
// Stage 2
{
$unwind: {
path: '$foo'
}
},
// Stage 3
{
$project: {
cid: 1,
count: {
$cond: {
if: {
$eq: ['$foo.v', 'soccer']
},
then: 1,
else: 0
}
}
}
},
// Stage 4
{
$group: {
_id: '$cid',
total_count: {
$sum: '$count'
}
}
}
]
);
You can use the query below, which applies $filter and then $size to the filtered array to count the number of matching occurrences.
db.coll.aggregate([
{"$project":{
"cid":1,
"count":{
"$size":{
"$filter":{
"input":"$foo",
"cond":{"$eq":["$$this.v","soccer"]
}
}
}
}
}}
])
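The $filter plus $size combination corresponds to a filter followed by a length check in plain JavaScript. The sketch below uses the sample document from the question; `countByValue` is an illustrative helper.

```javascript
// Counterpart of { $size: { $filter: { input: "$foo", cond: ... } } }:
// keep the elements whose v equals the wanted value and count them.
function countByValue(doc, wanted) {
  return (doc.foo || []).filter((item) => item.v === wanted).length;
}

// Sample document from the question (without its _id).
const sampleDoc = {
  cid: 1,
  foo: [
    { k: "sport", v: "climbing" },
    { k: "sport", v: "soccer" },
    { k: "sport", v: "soccer" },
  ],
};

const soccerCount = countByValue(sampleDoc, "soccer");
console.log({ cid: sampleDoc.cid, count: soccerCount }); // { cid: 1, count: 2 }
```

Note that the $project pipeline has no $match stage, so documents without any soccer element are returned with a count of 0. Adding the $match from the previous answer as a first stage would restrict the output to matching documents.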

using mongo aggregation how to replace the fields names [duplicate]

I have a large collection of documents that represent events. The collection contains events for different userId values.
{
"_id" : ObjectId("57fd7d00e4b011cafdb90d22"),
"userId" : "123123123",
"userType" : "mobile",
"event_type" : "clicked_ok",
"country" : "US",
"timestamp" : ISODate("2016-10-12T00:00:00.308Z")
}
{
"_id" : ObjectId("57fd7d00e4b011cafdb90d22"),
"userId" : "123123123",
"userType" : "mobile",
"event_type" : "clicked_cancel",
"country" : "US",
"timestamp" : ISODate("2016-10-12T00:00:00.308Z")
}
At midnight I need to run an aggregation over all documents from the previous day. The documents need to be aggregated so that I get the number of each kind of event for a particular userId.
{
"userId" : "123123123",
"userType" : "mobile",
"country" : "US",
"clicked_ok" : 23,
"send_message" : 14,
"clicked_cancel" : 100,
"date" : "2016-11-24",
}
During aggregation I need to perform two things:
calculate the number of events for a particular userId
add a "date" text field with the date
Any help is greatly appreciated! :)
You can do this with an aggregation like this:
db.user.aggregate([
{
$match:{
$and:[
{
timestamp:{
$gte: ISODate("2016-10-12T00:00:00.000Z")
}
},
{
timestamp:{
$lt: ISODate("2016-10-13T00:00:00.000Z")
}
}
]
}
},
{
$group:{
_id:"$userId",
timestamp:{
$first:"$timestamp"
},
send_message:{
$sum:{
$cond:[
{
$eq:[
"$event_type",
"send_message"
]
},
1,
0
]
}
},
clicked_cancel:{
$sum:{
$cond:[
{
$eq:[
"$event_type",
"clicked_cancel"
]
},
1,
0
]
}
},
clicked_ok:{
$sum:{
$cond:[
{
$eq:[
"$event_type",
"clicked_ok"
]
},
1,
0
]
}
}
}
},
{
$project:{
date:{
$dateToString:{
format:"%Y-%m-%d",
date:"$timestamp"
}
},
userId:1,
clicked_cancel:1,
send_message:1,
clicked_ok:1
}
}
])
Explanation:
keep only the documents for a specific day in the $match stage
group the documents by userId and count the occurrences of each event in the $group stage
finally, format the timestamp field as yyyy-MM-dd in the $project stage
For the data you provided, this will output:
{
"_id":"123123123",
"send_message":0,
"clicked_cancel":1,
"clicked_ok":1,
"date":"2016-10-12"
}
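The three $cond accumulators differ only in the event name, so the $group stage can also be generated from a list. The sketch below is plain JavaScript that only builds the stage object; `buildEventCounters` is a hypothetical helper and the list of event types is assumed to be known in advance.

```javascript
// Builds one { $sum: { $cond: [ ... ] } } accumulator per event type,
// in the same array form of $cond that the pipeline above uses.
function buildEventCounters(eventTypes) {
  const counters = {};
  for (const type of eventTypes) {
    counters[type] = { $sum: { $cond: [{ $eq: ["$event_type", type] }, 1, 0] } };
  }
  return counters;
}

// Same $group stage as above, with the counters generated from a list.
const groupStage = {
  $group: {
    _id: "$userId",
    timestamp: { $first: "$timestamp" },
    ...buildEventCounters(["send_message", "clicked_cancel", "clicked_ok"]),
  },
};

console.log(JSON.stringify(groupStage, null, 2));
```

The resulting object can be placed in the pipeline where the hand-written $group stage is. Each event type becomes an output field name, so this only suits event names that are valid field names.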
Check the following query
db.sandbox.aggregate([{
$group: {
_id: {
userId: "$userId",
date: {
$dateToString: { format: "%Y-%m-%d", date: "$timestamp" }}
},
send_message: {
$sum: {
$cond: { if: { $eq: ["$event_type", "send_message"] }, then: 1, else: 0 } }
},
clicked_cancel: {
$sum: {
$cond: { if: { $eq: ["$event_type", "clicked_cancel"] }, then: 1, else: 0 }
}
},
clicked_ok: {
$sum: {
$cond: { if: { $eq: ["$event_type", "clicked_ok"] }, then: 1, else: 0 }
}
}
}
}])

MongoDB aggregate using distinct

I have an aggregation that groups on a date and creates a sum.
db.InboundWorkItems.aggregate({
$match: {
notificationDate: {
$gte: ISODate("2013-07-18T04:00:00Z")
},
dropType: 'drop'
}
}, {
$group: {
_id: {
notificationDate: "$notificationDate"
},
nd: {
$first: "$notificationDate"
},
count: {
$sum: 1
}
}
}, {
$sort: {
nd: 1
}
})
The output is
"result" : [
{
"_id" : {
"notificationDate" : ISODate("2013-07-18T04:00:00Z")
},
"nd" : ISODate("2013-07-18T04:00:00Z"),
"count" : 484
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-19T04:00:00Z")
},
"nd" : ISODate("2013-07-19T04:00:00Z"),
"count" : 490
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-20T04:00:00Z")
},
"nd" : ISODate("2013-07-20T04:00:00Z"),
"count" : 174
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-21T04:00:00Z")
},
"nd" : ISODate("2013-07-21T04:00:00Z"),
"count" : 6
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-22T04:00:00Z")
},
"nd" : ISODate("2013-07-22T04:00:00Z"),
"count" : 339
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-23T04:00:00Z")
},
"nd" : ISODate("2013-07-23T04:00:00Z"),
"count" : 394
},
{
"_id" : {
"notificationDate" : ISODate("2013-07-24T04:00:00Z")
},
"nd" : ISODate("2013-07-24T04:00:00Z"),
"count" : 17
}
],
"ok" : 1
So far so good. What I need to do now is keep this, but also add a distinct to the criteria (for argument's sake I want to use AccountId). This would yield me the count of the grouped dates only using distinct AccountId. Is distinct even possible within the aggregation framework?
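You can use two $group stages in the pipeline: the first groups by accountId, and the second does the usual operation. Something like this:
db.InboundWorkItems.aggregate(
{$match: {notificationDate: {$gte: ISODate("2013-07-18T04:00:00Z")}, dropType:'drop' }},
// one document per account; every field other than _id needs an accumulator,
// so $first keeps one (arbitrary) notificationDate per account
{$group: {_id:"$accountId", notificationDate:{$first:"$notificationDate"}}},
// then the usual grouping by date, now counting accounts instead of items
{$group: {_id:"$notificationDate", nd: {$first:"$notificationDate"}, count:{$sum:1} }},
{$sort:{nd:1}} )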
db.InboundWorkItems.aggregate({
$match: {
notificationDate: {
$gte: ISODate("2013-07-18T04:00:00Z")
},
dropType: 'drop'
}
}, {
$group: {
_id: "$AccountId",
notificationDate: {
$max: "$notificationDate"
},
dropType: {
$max: "$dropType"
}
}
}, {
$group: {
_id: {
notificationDate: "$notificationDate"
},
nd: {
$first: "$notificationDate"
},
count: {
$sum: 1
}
}
}, {
$sort: {
nd: 1
}
})
I think you might actually be looking for a single group (English is a bit confusing) like so:
db.InboundWorkItems.aggregate({
$match: {
notificationDate: {
$gte: ISODate("2013-07-18T04:00:00Z")
},
dropType: 'drop'
}
}, {
$group: {
_id: {
notificationDate: "$notificationDate", accountId: '$accountId'
},
nd: {
$first: "$notificationDate"
},
count: {
$sum: 1
}
}
}, {
$sort: {
nd: 1
}
})
I add the compound _id in the $group because of:
This would yield me the count of the grouped dates only using distinct AccountId.
Which makes me think you want the grouped date count by account ID.
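Since the answers read the question differently, it may help to pin down one interpretation in plain JavaScript: for each date, count how many distinct accounts appear. This is only a sketch of that reading, which corresponds to grouping by (date, account) first and then by date. `distinctAccountsPerDate` and the `items` sample are hypothetical, with dates kept as ISO strings for brevity.

```javascript
// For each notificationDate, collect the accounts seen in a Set and report
// the Set size, i.e. the number of distinct accounts on that date.
function distinctAccountsPerDate(items) {
  const accountsByDate = new Map();
  for (const { notificationDate, accountId } of items) {
    if (!accountsByDate.has(notificationDate)) {
      accountsByDate.set(notificationDate, new Set());
    }
    accountsByDate.get(notificationDate).add(accountId);
  }
  return [...accountsByDate.entries()]
    .map(([nd, accounts]) => ({ nd, count: accounts.size }))
    .sort((a, b) => (a.nd < b.nd ? -1 : a.nd > b.nd ? 1 : 0));
}

// Hypothetical work items: account "a" appears twice on the first date.
const items = [
  { notificationDate: "2013-07-19T04:00:00Z", accountId: "a" },
  { notificationDate: "2013-07-18T04:00:00Z", accountId: "a" },
  { notificationDate: "2013-07-18T04:00:00Z", accountId: "a" },
  { notificationDate: "2013-07-18T04:00:00Z", accountId: "b" },
];

const perDate = distinctAccountsPerDate(items);
console.log(perDate);
// [ { nd: '2013-07-18T04:00:00Z', count: 2 }, { nd: '2013-07-19T04:00:00Z', count: 1 } ]
```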