I have a booking collection and I want to get the number of bookings per month, i.e. group by month.
I am confused about how to get the month from a date.
Here are some sample documents:
{
"_id" : ObjectId("5485dd6af4708669af35ffe6"),
"bookingid" : 1,
"operatorid" : 1,
...,
"bookingdatetime" : "2012-10-11T07:00:00Z"
}
{
"_id" : ObjectId("5485dd6af4708669af35ffe7"),
"bookingid" : 2,
"operatorid" : 1,
...,
"bookingdatetime" : "2014-07-26T05:00:00Z"
}
{
"_id" : ObjectId("5485dd6af4708669af35ffe8"),
"bookingid" : 3,
"operatorid" : 2,
...,
"bookingdatetime" : "2014-03-17T11:00:00Z"
}
This is what I have tried:
db.booking.aggregate([
{ $group: {
_id: new Date("$bookingdatetime").getMonth(),
numberofbookings: { $sum: 1 }
}}
])
but it returns:
{ "_id" : NaN, "numberofbookings" : 3 }
Where am I going wrong?
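You need to use the $month operator in your $group. Your new Date().getMonth() call is evaluated only once, by the shell, before the pipeline is sent to the server. It tries to create a date out of the literal string "$bookingdatetime", which is not a valid date, so getMonth() returns NaN and every document ends up in the same group.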
db.booking.aggregate([
{$group: {
_id: {$month: "$bookingdatetime"},
numberofbookings: {$sum: 1}
}}
]);
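To see where the NaN in the question comes from, here is a small Node.js sketch (the `pipeline` variable is only illustrative). The stage object is plain JavaScript built on the client, so the `new Date(...)` expression is evaluated once, with the literal text "$bookingdatetime" as its input:

```javascript
// The stage object is built client-side, so this expression runs exactly once.
const pipeline = [
  {
    $group: {
      // "$bookingdatetime" is only a literal string here, not a field reference,
      // so it parses to an Invalid Date and getMonth() returns NaN.
      _id: new Date("$bookingdatetime").getMonth(),
      numberofbookings: { $sum: 1 }
    }
  }
];

// The server receives _id: NaN, a constant, so all documents share one group.
console.log(pipeline[0].$group._id);
```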
You can't include arbitrary JavaScript in your aggregation pipeline, and because you're storing bookingdatetime as a string instead of a Date, you can't use the $month operator either.
However, because your date strings follow a strict format, you can use the $substr operator to extract the month from the string:
db.booking.aggregate([
{$group: {
_id: {$substr: ['$bookingdatetime', 5, 2]},
numberofbookings: {$sum: 1}
}}
])
Outputs:
{
"result" : [
{
"_id" : "03",
"numberofbookings" : 1
},
{
"_id" : "07",
"numberofbookings" : 1
},
{
"_id" : "10",
"numberofbookings" : 1
}
],
"ok" : 1
}
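As a rough plain-JavaScript analogue of this pipeline (a sketch, not what the server runs): `$substr: ['$bookingdatetime', 5, 2]` takes the two characters starting at index 5 of a string like "2012-10-11T07:00:00Z", which is the month. The `bookings` array holds the three sample values from the question; `monthKey` and `countByMonth` are hypothetical helper names.

```javascript
// Sample bookingdatetime values from the question.
const bookings = [
  "2012-10-11T07:00:00Z",
  "2014-07-26T05:00:00Z",
  "2014-03-17T11:00:00Z"
];

// Like { $substr: [s, 5, 2] }: start at index 5 and take 2 characters.
function monthKey(isoString) {
  return isoString.substring(5, 7);
}

// Like a $group on that key with numberofbookings: { $sum: 1 }.
function countByMonth(values) {
  const counts = {};
  for (const v of values) {
    const key = monthKey(v);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

console.log(countByMonth(bookings));
```

Note that this key ignores the year, so bookings from October 2012 and October 2014 would be counted in the same "10" group.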
Starting in Mongo 4.0, you can use the $toDate operator to convert your string to a date (building on the answer given by Will Shaver):
// { date: "2012-10-11T07:00:00Z" }
// { date: "2012-10-23T18:30:00Z" }
// { date: "2012-11-02T21:30:00Z" }
db.bookings.aggregate([
{ $group: {
_id: { month: { $month: { $toDate: "$date" } } },
bookings: { $sum: 1 }
}}
])
// { "_id" : { "month" : 10 }, "bookings" : 2 }
// { "_id" : { "month" : 11 }, "bookings" : 1 }
If you want to group by month while keeping the same month of different years apart (for data spanning multiple years), you can combine $dateFromString and $dateToString to format dates as "%Y-%m" (e.g. 2012-10):
// { date: "2012-10-11T07:00:00Z" }
// { date: "2012-10-23T18:30:00Z" }
// { date: "2012-11-02T21:30:00Z" }
// { date: "2013-01-11T18:30:00Z" }
// { date: "2013-10-07T14:15:00Z" }
db.bookings.aggregate([
{ $group: {
_id: {
$dateToString: {
date: { $dateFromString: { dateString: "$date" } },
format: "%Y-%m"
}
},
bookings: { $count: {} } // or { $sum: 1 } prior to Mongo 5
}}
])
// { _id: "2012-10", bookings: 2 }
// { _id: "2012-11", bookings: 1 }
// { _id: "2013-01", bookings: 1 }
// { _id: "2013-10", bookings: 1 }
This:
first parses the string into a date: $dateFromString: { dateString: "$date" }
then formats that date as %Y-%m: $dateToString: { date: { }, format: "%Y-%m" }
uses the combination of the two ($dateFromString/$dateToString) as the group key
and finally counts the grouped bookings with $count (or { $sum: 1 } prior to Mongo 5)
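For intuition, here is a plain-JavaScript sketch of the same grouping key: parse the string into a Date, then format it as year-month in UTC, which is what $dateToString uses when no timezone is given. The `dates` array is the sample data above; `yearMonthKey` and `countByYearMonth` are hypothetical names.

```javascript
// Sample date strings from the example above.
const dates = [
  "2012-10-11T07:00:00Z",
  "2012-10-23T18:30:00Z",
  "2012-11-02T21:30:00Z",
  "2013-01-11T18:30:00Z",
  "2013-10-07T14:15:00Z"
];

// Like $dateFromString followed by $dateToString with format "%Y-%m" (UTC).
function yearMonthKey(dateString) {
  const d = new Date(dateString);
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${d.getUTCFullYear()}-${month}`;
}

// Like $group on that key with bookings: { $count: {} }.
function countByYearMonth(values) {
  const counts = new Map();
  for (const v of values) {
    const key = yearMonthKey(v);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

console.log(countByYearMonth(dates));
```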
Related
I have time-series data in MongoDB. Using the query below, I calculate, for every sensor and for each day between two given dates, the difference between the day's maximum and minimum reading. I then want to sum those differences per day:
db.ts_events.aggregate([
{ $match: {
"metadata.assetCode": { $in: [
"h"
]
},
"timestamp": { $gte: ISODate("2022-07-01T02:39:02.000+0000"), $lte: ISODate("2022-07-01T06:30:00.000+0000")
}
}
},
{
$project: {
date: {
$dateToParts: { date: "$timestamp"
}
},
activeEnergy: 1,
"metadata.meterId": 1,
}
},
{
$group: {
_id: {
date: {
year: "$date.year",
month: "$date.month",
day: "$date.day"
},
meter: "$metadata.meterId",
},
maxValue: { $max: "$activeEnergy"
},
minValue: { $min: "$activeEnergy"
},
}
},
{
$addFields: {
differnce: { $subtract: [
"$maxValue",
"$minValue"
]
},
}
},
])
I get the following output
{
"_id" : {
"date" : {
"year" : NumberInt(2022),
"month" : NumberInt(7),
"day" : NumberInt(1)
},
"meter" : "B"
},
"maxValue" : 1979.78,
"minValue" : 1979.77,
"differnce" : 0.009999999999990905
}
{
"_id" : {
"date" : {
"year" : NumberInt(2022),
"month" : NumberInt(7),
"day" : NumberInt(1)
},
"meter" : "A"
},
"maxValue" : 7108.01,
"minValue" : 7098.18,
"differnce" : 9.829999999999927
}
I want to calculate the sum of the differences for both meters. How can I do that?
I am also facing one more problem, which I am adding in this edit: as you can see, the dates in the query are in ISODate format, but I will be receiving them as Unix epoch timestamps.
I tried to tweak the query, but it is not working:
db.ts_events.aggregate([
{
$project: {
date: {
$dateToParts: {
date: "$timestamp"
}
},
activeEnergy: 1,
"metadata.meterId": 1,
"metadata.assetCode": 1,
"timestamp": 1,
startDate: {
$toDate: 1656686342000
},
endDate: {
$toDate: 1656700200000
}
}
},
{
$match: {
"metadata.assetCode": {
$in: [
"h"
]
},
"timestamp": {
$gte: "$startDate", $lte: "$endDate"
}
}
},
{
$group: {
_id: {
date: {
year: "$date.year",
month: "$date.month",
day: "$date.day"
},
meter: "$metadata.meterId",
},
maxValue: {
$max: "$activeEnergy"
},
minValue: {
$min: "$activeEnergy"
},
}
},
{
$addFields: {
differnce: {
$subtract: [
"$maxValue",
"$minValue"
]
},
}
},
{
$group: {
_id: "$_id.date", res: {
$push: '$$ROOT'
}, differnceSum: {
$sum: '$differnce'
}
}
}
])
Can you help me solve the problem?
One option is to add one more step like this (depending on your expected output format):
This step groups your separate documents into one document, which lets you sum their values together. Be careful when grouping like this: the result is now one big document, and a document is subject to the 16 MB BSON size limit. To get one sum per day instead of one overall sum, group by "$_id.date" instead of 0.
We use $$ROOT to keep the original document structure (here inside a new array).
{$group: {_id: 0, res: {$push: '$$ROOT'}, differnceSum: {$sum: '$differnce'}}}
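Regarding the epoch part of the question: inside an ordinary $match query, a value such as "$startDate" is compared as a literal string, not read as a field, so the date range never matches (comparing against computed fields would need $expr). One simpler route, sketched below under the assumption that the epoch values are in milliseconds, is to convert them to Date objects on the client before building the $match stage; `new Date(ms)` works the same way in the mongo shell. `buildMatchStage` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds the $match stage from epoch milliseconds.
function buildMatchStage(assetCodes, startMs, endMs) {
  return {
    $match: {
      "metadata.assetCode": { $in: assetCodes },
      // Converted on the client, so the server compares Date values with Date values.
      timestamp: { $gte: new Date(startMs), $lte: new Date(endMs) }
    }
  };
}

const stage = buildMatchStage(["h"], 1656686342000, 1656700200000);
console.log(stage.$match.timestamp.$gte.toISOString()); // 2022-07-01T14:39:02.000Z
console.log(stage.$match.timestamp.$lte.toISOString()); // 2022-07-01T18:30:00.000Z
```

Placing this stage first (before the $project) also lets it use an index on timestamp. Note that these epoch values are 14:39:02 and 18:30:00 UTC, not the 02:39:02 and 06:30:00 used in the first version of the query, so it is worth checking which window you actually intend.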
I have the following documents stored in a collection:
{
"REQUESTTIMESTAMP" : "26-JUN-19 01.34.10.095000000 AM",
"UNHANDLED_INTENT" : 0,
"USERID" : "John",
"START_OF_INTENT_SKILL_CONVERSATION" : 0,
"PROPERTYCODE" : ""
}
I want to group these documents by hour (extracted from 'REQUESTTIMESTAMP').
Previously, I stored the documents differently, with a separate hour field, and grouped on that field.
Previous aggregation query:
collection.aggregate([
{'$match': query}, {
'$group': {
"_id": {
"hour": "$hour",
"sessionId": "$sessionId"
}
}
}, {
"$group": {
"_id": "$_id.hour",
"count": {
"$sum": 1
}
}
}
])
Previous collection structure:
{
"timestamp" : "1581533210921",
"date" : "12-02-2020",
"hour" : "13",
"month" : "02",
"time" : "13:46:50",
"weekDay" : "Wednesday",
"__v" : 0
}
How can I run the same aggregation as above with the new document structure, extracting the hour from the 'REQUESTTIMESTAMP' field?
You should convert your timestamp to a Date object and then take the hour from that Date.
db.collection.aggregate([{
'$match': query
}, {
$project: {
date: {
$dateFromString: {
dateString: '$REQUESTTIMESTAMP',
format: "%m-%d-%Y" //This should be your date format
}
}
}
}, {
$group: {
_id: {
hour: {
$hour: "$date"
}
}
}
}])
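The problem is that month names are not supported by MongoDB's date parsing. Either you write a lot of code or you use a library like Moment.js (which must be loaded into the shell first, e.g. with load()). First store REQUESTTIMESTAMP as a proper Date object (below, in a new date field), then you can group on it.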
db.collection.find().forEach(function (doc) {
var d = moment(doc.REQUESTTIMESTAMP, "DD-MMM-YY hh.mm.ss.SSS a");
db.collection.updateOne(
{ _id: doc._id },
{ $set: { date: d.toDate() } }
);
})
db.collection.aggregate([
{
$group: {
_id: { $hour: "$date" },
count: { $sum: 1 }
}
}
])
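If loading Moment.js is not an option, the string can also be parsed in plain JavaScript. This is a swapped-in technique, not the answer's method, and it assumes that the two-digit year means 20YY, that the timestamps are UTC, and that only the first three fractional digits (milliseconds) matter. `parseRequestTimestamp` is a hypothetical helper whose result could stand in for `d.toDate()` in the update loop above.

```javascript
const MONTHS = {
  JAN: 0, FEB: 1, MAR: 2, APR: 3, MAY: 4, JUN: 5,
  JUL: 6, AUG: 7, SEP: 8, OCT: 9, NOV: 10, DEC: 11
};

// Parses strings like "26-JUN-19 01.34.10.095000000 AM" into a UTC Date.
function parseRequestTimestamp(s) {
  const m = /^(\d{2})-([A-Z]{3})-(\d{2}) (\d{2})\.(\d{2})\.(\d{2})\.(\d+) (AM|PM)$/.exec(s);
  if (!m || !(m[2] in MONTHS)) {
    throw new Error(`Unrecognised timestamp: ${s}`);
  }
  const [, day, mon, yy, hh, mi, ss, frac, ampm] = m;
  // 12-hour clock: 12 AM is hour 0 and 12 PM is hour 12.
  const hour = (Number(hh) % 12) + (ampm === "PM" ? 12 : 0);
  // Keep only milliseconds from the nanosecond fraction.
  const ms = Number(frac.slice(0, 3).padEnd(3, "0"));
  // Assumption: two-digit years are in the 2000s.
  return new Date(Date.UTC(2000 + Number(yy), MONTHS[mon], Number(day),
                           hour, Number(mi), Number(ss), ms));
}

console.log(parseRequestTimestamp("26-JUN-19 01.34.10.095000000 AM").toISOString());
// 2019-06-26T01:34:10.095Z
```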
If you're not able to update the database with an actual date field and still want to work with the existing format, try this query; it adds an hour field extracted from the string field REQUESTTIMESTAMP:
Query:
db.collection.aggregate([
{
$addFields: {
hour: {
$let: {
/** split "26-JUN-19 01.34.10.095000000 AM" on spaces into [date, time, AM/PM] and keep [time, AM/PM] */
vars: { hour: { $slice: [{ $split: ["$REQUESTTIMESTAMP", " "] }, 1, 2] } },
in: {
$add: [
// int of the first 2 characters of the time part, modulo 12 so that 12 AM becomes 0
{ $mod: [{ $toInt: { $substr: [{ $arrayElemAt: ["$$hour", 0] }, 0, 2] } }, 12] },
// add 12 when PM is in the array (so 12 PM stays 12 and 01 PM becomes 13)
{ $cond: [{ $in: ["PM", "$$hour"] }, 12, 0] }
]
}
}
}
}
}
])
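A small JavaScript sketch that mirrors the string-splitting approach of this query ($split, $slice, the first two characters, then a 12-hour to 24-hour conversion) makes the edge cases easy to check: "12 AM" must map to hour 0 and "12 PM" to hour 12, so the hour is reduced modulo 12 before 12 is added for PM. `hourFromRequestTimestamp` is a hypothetical name.

```javascript
// Mirrors the pipeline: split on spaces, keep [time, AM/PM], read the hour.
function hourFromRequestTimestamp(s) {
  const [time, ampm] = s.split(" ").slice(1, 3); // like $slice: [..., 1, 2]
  const hour12 = parseInt(time.substring(0, 2), 10); // like $toInt of $substr
  // 12-hour clock to 24-hour clock: 12 AM -> 0, 12 PM -> 12.
  return (hour12 % 12) + (ampm === "PM" ? 12 : 0);
}

console.log(hourFromRequestTimestamp("26-JUN-19 01.34.10.095000000 AM")); // 1
console.log(hourFromRequestTimestamp("26-JUN-19 12.05.00.000000000 AM")); // 0
console.log(hourFromRequestTimestamp("26-JUN-19 12.05.00.000000000 PM")); // 12
console.log(hourFromRequestTimestamp("26-JUN-19 03.05.00.000000000 PM")); // 15
```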
Let's assume I have this set of data:
{ ValidFrom: "2019-03-25T16:01:55.714+0000", ValidTo: "2019-03-25T16:01:55.714+0000" },
{ ValidFrom: "2019-03-26T16:01:55.714+0000", ValidTo: "2019-03-25T16:01:55.714+0000" },
{ ValidFrom: "2019-03-25T16:01:55.714+0000", ValidTo: "2019-03-27T16:01:55.714+0000" }
I would like to see this result with one query:
{ "Day": "2019-03-25", ValidFromCount: 2, ValidToCount: 2 },
{ "Day": "2019-03-26", ValidFromCount: 1, ValidToCount: 0 },
{ "Day": "2019-03-27", ValidFromCount: 0, ValidToCount: 1 }
Currently I wrote this aggregation but I am stuck now:
{
$addFields: {
ValidFromDay: { $dateToString: { format: "%Y-%m-%d", date: "$ValidFrom" } },
ValidUntilDay: { $dateToString: { format: "%Y-%m-%d", date: "$ValidUntil" } }
}
},
{
$group : {
_id: { FromDate: '$ValidFromDay', ToDate: '$ValidUntilDay' },
Count: { "$sum": 1 },
}
},
{
$group : {
_id: null,
FromDates: { "$addToSet": { "Date": "$_id.FromDate", "FromCount": { "$sum": "$Count" } } },
ToDate: { "$addToSet": { "Date": "$_id.ToDate", "UntilCount": "$Count" } }
}
}
Is it possible to produce the results I am looking for in some way?
You need to add an array of two elements, not just two fields. That lets you $unwind it and count by date:
{
$addFields: {
boundary: [
{ day: {$dateToString: { format: "%Y-%m-%d", date: "$ValidFrom" } }, from: 1 },
{ day: { $dateToString: { format: "%Y-%m-%d", date: "$ValidTo" } } , to: 1 }
]
}
},
{
$unwind: "$boundary"
},
{
$group: {
_id: "$boundary.day",
ValidFromCount: {$sum: "$boundary.from"},
ValidToCount: {$sum: "$boundary.to"},
}
}
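To check that the boundary/$unwind approach gives the counts the question asks for, here is a plain JavaScript sketch of the same three steps: build a two-element array per document, unwind it, and sum per day. It assumes the values are ISO-8601 strings with a +0000 offset, so their first 10 characters are the UTC day that $dateToString with "%Y-%m-%d" would return. `countBoundaries` is a hypothetical name.

```javascript
const docs = [
  { ValidFrom: "2019-03-25T16:01:55.714+0000", ValidTo: "2019-03-25T16:01:55.714+0000" },
  { ValidFrom: "2019-03-26T16:01:55.714+0000", ValidTo: "2019-03-25T16:01:55.714+0000" },
  { ValidFrom: "2019-03-25T16:01:55.714+0000", ValidTo: "2019-03-27T16:01:55.714+0000" }
];

function countBoundaries(documents) {
  // $addFields + $unwind: each document yields one "from" entry and one "to" entry.
  const boundaries = documents.flatMap(d => [
    { day: d.ValidFrom.slice(0, 10), from: 1 },
    { day: d.ValidTo.slice(0, 10), to: 1 }
  ]);
  // $group by day; a missing from/to adds 0, just as $sum ignores missing fields.
  const byDay = new Map();
  for (const b of boundaries) {
    const row = byDay.get(b.day) || { Day: b.day, ValidFromCount: 0, ValidToCount: 0 };
    row.ValidFromCount += b.from || 0;
    row.ValidToCount += b.to || 0;
    byDay.set(b.day, row);
  }
  // Sorted by day for readability; $group itself does not guarantee output order.
  return [...byDay.values()].sort((a, b) => a.Day.localeCompare(b.Day));
}

console.log(countBoundaries(docs));
```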
I think this will do what you want. There are three stages to the pipeline. First, a $project that constructs separate day, month and year fields:
> projector
{
"$project" : {
"day" : {
"$dayOfMonth" : "$ValidFrom"
},
"month" : {
"$month" : "$ValidFrom"
},
"year" : {
"$year" : "$ValidFrom"
},
"ValidFrom" : 1
}
}
Then a $group to create the totals and count them by individual day by using an _id of {year, month, day}.
> grouper
{
"$group" : {
"_id" : {
"year" : "$year",
"month" : "$month",
"day" : "$day"
},
"ValidFromCount" : {
"$sum" : 1
},
"ValidToCount" : {
"$sum" : 1
}
}
}
Finally, a projection to eliminate the spurious fields and also get the Day field into the format you want.
> converter
{
"$project" : {
"_id" : 0,
"Day" : {
"$concat" : [
{
"$toString" : "$_id.year"
},
"-",
{
"$toString" : "$_id.month"
},
"-",
{
"$toString" : "$_id.day"
}
]
},
"ValidFromCount" : 1,
"ValidToCount" : 1
}
}
To run it, just execute the following (I created your data in collection so2):
> db.so2.find()
{ "_id" : ObjectId("5ca75adfd1a64a2919883a8d"), "ValidFrom" : "2019-03-25T16:01:55.714+0000", "ValidTo" : "2019-03-25T16:01:55.714+0000" }
{ "_id" : ObjectId("5ca75adfd1a64a2919883a8e"), "ValidFrom" : "2019-03-26T16:01:55.714+0000", "ValidTo" : "2019-03-25T16:01:55.714+0000" }
{ "_id" : ObjectId("5ca75adfd1a64a2919883a8f"), "ValidFrom" : "2019-03-25T16:01:55.714+0000", "ValidTo" : "2019-03-27T16:01:55.714+0000" }
>
> db.so3.aggregate([projector,grouper,converter])
{ "ValidFromCount" : 1, "ValidToCount" : 1, "Day" : "2019-3-26" }
{ "ValidFromCount" : 2, "ValidToCount" : 2, "Day" : "2019-3-25" }
>
I'm not sure the test data you supplied is correct, because the second document appears to go back in time: its ValidTo is before its ValidFrom.
I am trying to write a query that gets a count of distinct values and displays the relevant fields. The grouping is done by tempId and date, where a tempId can occur one or more times within a single day and within a time frame.
Following is my approach:
db.getCollection('targetCollection').aggregate(
{
$match:{
"user.vendor": 'vendor1',
tool: "tool1",
date: {
"$gte": ISODate("2016-04-01"),
"$lt": ISODate("2016-04-04")
}
}
},
{
$group:{
_id: {
tempId: '$tempId',
month: { $month: "$date" },
day: { $dayOfMonth: "$date" },
year: { $year: "$date" }
},
count: {$sum : 1}
}
},
{
$group:{
_id: 1,
count: {$sum : 1}
}
})
This query generates the following output,
{
"_id" : 1,
"count" : 107
}
This is correct, but I would like to show the counts separated by date, with the particular count for each date. For example, something like this:
{
"date" : "2016-04-01",
"count" : 50
},
{
"date" : "2016-04-02",
"count" : 30
},
{
"date" : "2016-04-03",
"count" : 27
}
P.S. I am not sure how to put this question together as I am quite new to this technology. Please let me know if refinements are required in the question.
Following is sample data from the MongoDB collection that I am trying to query:
{
"_id" : 1,
"tempId" : "temp1",
"user" : {
"_id" : "user1",
"email" : "user1#email.com",
"vendor" : "vendor1"
},
"tool" : "tool1",
"date" : ISODate("2016-03-09T08:30:42.403Z")
},...
I have come up with the solution myself. What I did was:
I first grouped by tempId and date
Then I grouped by date
This printed out the daily distinct count of tempId, which is the result I want. The query is as follows:
db.getCollection('targetCollection').aggregate(
{
$match:{
"user.vendor": 'vendor1',
tool: "tool1",
date: {
"$gte": ISODate("2016-04-01"),
"$lt": ISODate("2016-04-13")
}
}
},
{
$group:{
_id: {
tempId: "$tempId",
month: { $month: "$date" },
day: { $dayOfMonth: "$date" },
year: { $year: "$date" }
},
count: {$sum : 1}
}
},
{
$group:{
_id: {
month:"$_id.month" ,
day: "$_id.day" ,
year: "$_id.year"
},
count: {$sum : 1}
}
})
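The two-stage grouping can be sketched in plain JavaScript: the first $group collapses repeated (tempId, day) pairs into one entry, and the second counts how many entries, i.e. distinct tempIds, remain per day. The records below are made up for illustration, days are taken in UTC (the default for $year, $month and $dayOfMonth), and the sketch assumes tempId never contains "|". `distinctTempIdsPerDay` is a hypothetical name.

```javascript
// Illustrative records only, not the question's real data.
const records = [
  { tempId: "temp1", date: new Date("2016-04-01T08:00:00Z") },
  { tempId: "temp1", date: new Date("2016-04-01T09:30:00Z") },
  { tempId: "temp2", date: new Date("2016-04-01T10:00:00Z") },
  { tempId: "temp1", date: new Date("2016-04-02T11:00:00Z") }
];

function distinctTempIdsPerDay(docs) {
  // Stage 1: one entry per (tempId, day), like grouping on {tempId, year, month, day}.
  const pairs = new Set(docs.map(d => `${d.tempId}|${d.date.toISOString().slice(0, 10)}`));
  // Stage 2: count entries per day, like grouping on {year, month, day} with $sum: 1.
  const counts = new Map();
  for (const pair of pairs) {
    const day = pair.split("|")[1];
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  return counts;
}

console.log(distinctTempIdsPerDay(records));
```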
Group them by date:
db.getCollection('targetCollection').aggregate([
{
$match:{
"user.vendor": 'vendor1',
tool: "tool1",
date: {
"$gte": ISODate("2016-04-01"),
"$lt": ISODate("2016-04-04")
}
}
},
{
$group: {
_id: {
date: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
tempId: "$tempId"
},
count: { $sum: 1 }
}
}
]);
Sample Documents:
{ time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
{ time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
{ time: ISODate("2013-10-11T19:12:66Z"), value: 3 }
{ time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
{ time: ISODate("2013-10-12T04:15:38Z"), value: 5 }
It's easy to get aggregated results grouped by date, but what I want is a query that also returns a running total of the aggregation, like:
{ time: "2013-10-10", total: 3, runningTotal: 3 }
{ time: "2013-10-11", total: 7, runningTotal: 10 }
{ time: "2013-10-12", total: 5, runningTotal: 15 }
Is this possible with the MongoDB Aggregation?
EDIT: Since MongoDB v5.0, the preferred approach is to use the new $setWindowFields aggregation stage, as shared by Xavier Guihot.
This does what you need. I have normalised the times in the data so that they group together (you could do something like this). The idea is to $group and push the times and totals into separate arrays, then $unwind the time array, so that each resulting document carries its own copy of the totals array. You can then calculate the runningTotal (or something like a rolling average) from that array. The index generated by $unwind (includeArrayIndex) is the position in the totals array of the total corresponding to that time. It is important to $sort before pushing into the arrays, since this ensures the arrays are in the correct order.
db.temp.aggregate(
[
{
'$group': {
'_id': '$time',
'total': { '$sum': '$value' }
}
},
{
'$sort': {
'_id': 1
}
},
{
'$group': {
'_id': 0,
'time': { '$push': '$_id' },
'totals': { '$push': '$total' }
}
},
{
'$unwind': {
'path' : '$time',
'includeArrayIndex' : 'index'
}
},
{
'$project': {
'_id': 0,
'time': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' } },
'total': { '$arrayElemAt': [ '$totals', '$index' ] },
'runningTotal': { '$sum': { '$slice': [ '$totals', { '$add': [ '$index', 1 ] } ] } },
}
},
]
);
I have used something similar on a collection with ~80 000 documents, aggregating to 63 results. I am not sure how well it will work on larger collections, but I have found that performing transformations (projections, array manipulations) on aggregated data does not seem to have a large performance cost once the data is reduced to a manageable size.
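The index-and-slice idea can be sketched in plain JavaScript: once the per-day totals are in a sorted array, the running total at position `index` is the sum of the first `index + 1` totals, which is what `$sum` over `$slice: ['$totals', { $add: ['$index', 1] }]` computes. The input uses the per-day totals from the question's expected output.

```javascript
// Per-day totals, already grouped and sorted by day (the $group + $sort stages).
const times = ["2013-10-10", "2013-10-11", "2013-10-12"];
const totals = [3, 7, 5];

// One output row per array index, as produced by $unwind with includeArrayIndex.
const rows = times.map((time, index) => ({
  time,
  total: totals[index], // $arrayElemAt
  runningTotal: totals.slice(0, index + 1).reduce((a, b) => a + b, 0) // $sum of $slice
}));

console.log(rows);
```

Each row re-sums its whole prefix, so the work grows quadratically with the number of groups; that is cheap once the data has been reduced to a few dozen groups, as described above.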
Here is another approach.
pipeline
db.col.aggregate([
{$group : {
_id : { time :{ $dateToString: {format: "%Y-%m-%d", date: "$time", timezone: "-05:00"}}},
value : {$sum : "$value"}
}},
{$addFields : {_id : "$_id.time"}},
{$sort : {_id : 1}},
{$group : {_id : null, data : {$push : "$$ROOT"}}},
{$addFields : {data : {
$reduce : {
input : "$data",
initialValue : {total : 0, d : []},
in : {
total : {$sum : ["$$this.value", "$$value.total"]},
d : {$concatArrays : [
"$$value.d",
[{
_id : "$$this._id",
value : "$$this.value",
runningTotal : {$sum : ["$$value.total", "$$this.value"]}
}]
]}
}
}
}}},
{$unwind : "$data.d"},
{$replaceRoot : {newRoot : "$data.d"}}
]).pretty()
collection
> db.col.find()
{ "_id" : ObjectId("4f442120eb03305789000000"), "time" : ISODate("2013-10-10T20:55:36Z"), "value" : 1 }
{ "_id" : ObjectId("4f442120eb03305789000001"), "time" : ISODate("2013-10-11T04:43:16Z"), "value" : 2 }
{ "_id" : ObjectId("4f442120eb03305789000002"), "time" : ISODate("2013-10-12T03:13:06Z"), "value" : 3 }
{ "_id" : ObjectId("4f442120eb03305789000003"), "time" : ISODate("2013-10-11T10:15:38Z"), "value" : 4 }
{ "_id" : ObjectId("4f442120eb03305789000004"), "time" : ISODate("2013-10-13T02:15:38Z"), "value" : 5 }
result
{ "_id" : "2013-10-10", "value" : 3, "runningTotal" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "runningTotal" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "runningTotal" : 15 }
>
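The $reduce stage carries an accumulator `{ total, d }` through the sorted array: `total` is the sum so far and `d` collects the output rows. A JavaScript sketch of the same fold, using the per-day values from the result above:

```javascript
// Per-day sums after $group and $sort, as in the result above.
const data = [
  { _id: "2013-10-10", value: 3 },
  { _id: "2013-10-11", value: 7 },
  { _id: "2013-10-12", value: 5 }
];

// Same shape as the $reduce: initialValue { total: 0, d: [] }.
const { d } = data.reduce(
  (acc, cur) => ({
    total: acc.total + cur.value,
    // Append a row whose runningTotal is the previous total plus this value.
    d: acc.d.concat([{ _id: cur._id, value: cur.value, runningTotal: acc.total + cur.value }])
  }),
  { total: 0, d: [] }
);

console.log(d);
```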
Here is a solution that does not push previous documents into a new array and then process them. (If the array gets too big, you can exceed the maximum BSON document size of 16 MB.)
Calculating running totals is as simple as:
db.collection1.aggregate(
[
{
$lookup: {
from: 'collection1',
let: { date_to: '$time' },
pipeline: [
{
$match: {
$expr: {
$lt: [ '$time', '$$date_to' ]
}
}
},
{
$group: {
_id: null,
summary: {
$sum: '$value'
}
}
}
],
as: 'sum_prev_days'
}
},
{
$addFields: {
sum_prev_days: {
$arrayElemAt: [ '$sum_prev_days', 0 ]
}
}
},
{
$addFields: {
running_total: {
$sum: [ '$value', '$sum_prev_days.summary' ]
}
}
},
{
$project: { sum_prev_days: 0 }
}
]
)
What we did: within the $lookup we selected all documents with an earlier time and immediately calculated their sum (using $group as the second step of the $lookup's pipeline). The $lookup puts that result into the first element of an array. We pull out the first array element and then calculate the sum: current value + sum of previous values. For the earliest document the array is empty, so $arrayElemAt returns nothing and $sum simply ignores the missing value.
If you would like to group transactions by day first and then calculate running totals, insert a $group stage at the beginning of the pipeline and also at the beginning of the $lookup's pipeline.
db.collection1.aggregate(
[
{
$group: {
_id: {
$substrBytes: ['$time', 0, 10]
},
value: {
$sum: '$value'
}
}
},
{
$lookup: {
from: 'collection1',
let: { date_to: '$_id' },
pipeline: [
{
$group: {
_id: {
$substrBytes: ['$time', 0, 10]
},
value: {
$sum: '$value'
}
}
},
{
$match: {
$expr: {
$lt: [ '$_id', '$$date_to' ]
}
}
},
{
$group: {
_id: null,
summary: {
$sum: '$value'
}
}
}
],
as: 'sum_prev_days'
}
},
{
$addFields: {
sum_prev_days: {
$arrayElemAt: [ '$sum_prev_days', 0 ]
}
}
},
{
$addFields: {
running_total: {
$sum: [ '$value', '$sum_prev_days.summary' ]
}
}
},
{
$project: { sum_prev_days: 0 }
}
]
)
The result is:
{ "_id" : "2013-10-10", "value" : 3, "running_total" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "running_total" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "running_total" : 15 }
Starting in Mongo 5, this is a perfect use case for the new $setWindowFields aggregation stage:
// { time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
// { time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
// { time: ISODate("2013-10-11T12:12:66Z"), value: 3 }
// { time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
// { time: ISODate("2013-10-12T05:15:38Z"), value: 5 }
db.collection.aggregate([
{ $group: {
_id: { $dateToString: { format: "%Y-%m-%d", date: "$time" } },
total: { $sum: "$value" }
}},
// e.g.: { "_id" : "2013-10-11", "total" : 7 }
{ $set: { "date": "$_id" } }, { $unset: ["_id"] },
// e.g.: { "date" : "2013-10-11", "total" : 7 }
{ $setWindowFields: {
sortBy: { date: 1 },
output: {
running: {
$sum: "$total",
window: { documents: [ "unbounded", "current" ] }
}
}
}}
])
// { date: "2013-10-10", total: 3, running: 3 }
// { date: "2013-10-11", total: 7, running: 10 }
// { date: "2013-10-12", total: 5, running: 15 }
Let's focus on the $setWindowFields stage that:
chronologically $sorts grouped documents by date: sortBy: { date: 1 }
adds the running field in each document (output: { running: { ... }})
which is the $sum of totals ($sum: "$total")
on a specified span of documents (the window)
which in our case is every document up to and including the current one: window: { documents: [ "unbounded", "current" ] }
as defined by [ "unbounded", "current" ], meaning the window spans all documents from the first document (unbounded) to the current document (current).
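As a sketch of the window semantics in plain JavaScript: with `documents: ["unbounded", "current"]`, each document's `running` value is the sum of `total` over every document from the first one (after sorting by `date`) up to and including the current one. `withRunningSum` is a hypothetical helper name.

```javascript
// Grouped documents in arbitrary order, as they might leave $group.
const grouped = [
  { date: "2013-10-11", total: 7 },
  { date: "2013-10-10", total: 3 },
  { date: "2013-10-12", total: 5 }
];

function withRunningSum(docs) {
  // sortBy: { date: 1 }; "YYYY-MM-DD" strings sort chronologically.
  const sorted = [...docs].sort((a, b) => a.date.localeCompare(b.date));
  let sum = 0;
  // window ["unbounded", "current"]: from the first document through this one.
  return sorted.map(doc => {
    sum += doc.total;
    return { ...doc, running: sum };
  });
}

console.log(withRunningSum(grouped));
```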