I am trying to get the average time interval across all the documents in a collection.
The output I want is in the format Average: HH:MM:SS
It should compute the time interval for each document and then aggregate those intervals to get the average for the whole data set.
Here is some sample data:
{
"_id" : ObjectId("60dc1e7d72296329347b2bbe"),
"name": "firstupdate",
"starttime" : ISODate("2021-06-30T07:38:06.926Z"),
"endtime" : ISODate("2021-06-30T12:35:08.265Z"),
},
{
"_id" : ObjectId("60dc1e7d72296329347b2bce"),
"name": "secondupdate",
"starttime" : ISODate("2021-07-29T07:41:06.926Z"),
"endtime" : ISODate("2021-07-30T01:52:07.937Z"),
},
{
"_id" : ObjectId("60dc1ff472d9f809d6d2f23e"),
"name": "thirdupdate",
"starttime" : ISODate("2021-07-15T07:43:06.926Z"),
"endtime" : ISODate("2021-07-14T10:34:13.269Z"),
},
{
"_id" : ObjectId("60dc204e03362e293a5f5014"),
"name": "fourthupdate",
"starttime" : ISODate("2021-07-21T05:11:23.654Z"),
"endtime" : ISODate("2021-07-21T09:46:33.000Z"),
},
{
"_id" : ObjectId("60dc21436a9e0e09f9a551ae"),
"name": "fifthupdate",
"starttime" : ISODate("2021-07-07T02:34:06.926Z"),
"endtime" : ISODate("2021-07-07T08:11:06.926Z"),
},
Thank you in advance
You can use this one:
db.collection.aggregate([
{
$group: {
_id: null,
diff: {
$avg: {
$dateDiff: {
startDate: "$starttime",
endDate: "$endtime",
unit: "millisecond"
}
}
}
}
},
{
$set: {
diff: {
$dateToString: {
date: { $toDate: "$diff" },
format: "%H:%M:%S"
}
}
}
}
])
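Optionally use format: "%j %H:%M:%S".
%j is the day of the year, so this returns valid output for averages of up to one year; note that %j starts at 001, so it is one more than the number of whole days elapsed. With "%H:%M:%S" alone the result wraps at 24 hours. Beyond that you would need some Date math.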
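The "Date math" mentioned above can also be done on the client. This is a minimal sketch, assuming you keep only the $group stage so that diff stays a number of milliseconds; formatDuration is a hypothetical helper name, not a MongoDB function.

```javascript
// Hypothetical helper: format a duration in milliseconds as HH:MM:SS.
// Hours are not wrapped at 24, so long averages stay readable.
function formatDuration(ms) {
  const sign = ms < 0 ? "-" : "";
  const totalSeconds = Math.floor(Math.abs(ms) / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const pad = (n) => String(n).padStart(2, "0");
  return `${sign}${pad(hours)}:${pad(minutes)}:${pad(seconds)}`;
}

// Interval of the first sample document (07:38:06.926 to 12:35:08.265).
console.log("Average: " + formatDuration(17821339)); // Average: 04:57:01
```

A negative input (such as the third sample document, whose endtime is before its starttime) is printed with a leading minus sign rather than wrapping around.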
I have a collection "employees" with sample entries as below-
{
"_id" : ObjectId("62ccaa238a322322211"),
"employeeId" : "1234",
"date" : ISODate("2022-07-11T12:00:00.000+0000"),
"hours" : 15.0,
"createdBy" : "user1",
"createdDate" : ISODate("2023-02-19T21:54:27.213+0000"),
"updatedBy" : "user1",
"updatedDate" : ISODate("2023-02-19T21:54:27.213+0000"),
},
{
"_id" : ObjectId("62ccaa238a322388821"),
"employeeId" : "1234",
"date" : ISODate("2022-07-10T12:00:00.000+0000"),
"hours" : 25.0,
"createdBy" : "user1",
"createdDate" : ISODate("2023-02-19T22:54:27.213+0000"),
"updatedBy" : "user1",
"updatedDate" : ISODate("2023-02-19T22:54:27.213+0000"),
}
I am trying to get sum of hours for each employee along with the list of dates for those entries
{
employeeId :"1234"
hours : 40 // sum of the hours from both entries
dates : [2022-07-11, 2022-07-10] // list of `date` column
}
I tried the pipeline below, but I don't know how to adapt it to get the employeeId and the sum:
db.getCollection("employees").aggregate( [
{ $match : {
$and :[
{"employeeId" : "1234"},
{"date" : { $lte: new ISODate("2023-02-19") }}
]}},
{
$group:
{
"_id": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$date"
}
},
totalAmount: { $sum: "$hours" }
}
}
] )
You should group by employeeId instead of date.
db.collection.aggregate([
{
$group: {
_id: "$employeeId",
hours: {
$sum: "$hours"
},
dates: {
$push: {
$dateTrunc: {
date: "$date",
unit: "day"
}
}
}
}
}
])
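If you also want to keep the $match from the question and get the output shape shown there, a sketch could look like the following. It assumes the field names from the sample documents; the $dateToString format and the final $project are additions that are not part of the answer above, and new Date is used so the snippet also runs outside the shell.

```javascript
// Sketch: filter first, then group by employee and collect the dates as strings.
const pipeline = [
  {
    $match: {
      employeeId: "1234",
      date: { $lte: new Date("2023-02-19T00:00:00Z") }
    }
  },
  {
    $group: {
      _id: "$employeeId",
      hours: { $sum: "$hours" },
      // Format each date as YYYY-MM-DD, as in the expected output.
      dates: { $push: { $dateToString: { format: "%Y-%m-%d", date: "$date" } } }
    }
  },
  // Rename _id to employeeId to match the requested shape.
  { $project: { _id: 0, employeeId: "$_id", hours: 1, dates: 1 } }
];

console.log(JSON.stringify(pipeline, null, 2));
```

It would be run as db.getCollection("employees").aggregate(pipeline).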
I have a MongoDB aggregation pipeline that has been frustrating me for a while now, because it never seems to be accurate or correct to my needs. The aim is to count the number of new unique users each day per chatbot, starting from the very beginning.
Here's what my pipeline looks like right now.
[
{
"$project" : {
"_id" : 0,
"bot_id" : 1,
"customer_id" : 1,
"timestamp" : {
"$ifNull" : [
'$incoming_log.created_at', '$outcome_log.created_at'
]
}
}
},
{
"$project" : {
"customer_id" : 1,
"bot_id" : 1,
"timestamp" : {
"$dateFromString" : {
"dateString" : {
"$substr" : [
"$timestamp", 0, 10
]
}
}
}
}
},
{
"$group" : {
"_id" : "$customer_id",
"timestamp" : {
"$first" : "$timestamp"
},
"bot_id" : {
"$addToSet" : "$bot_id"
}
}
},
{
"$unwind" : "$bot_id"
},
{
"$group" : {
"_id" : {
"bot_id" : "$bot_id",
"customer_id" : "$_id"
},
"timestamp" : {
"$first" : "$timestamp"
}
}
},
{
"$project" : {
"_id" : 0,
"timestamp" : 1,
"customer_id" : "$_id.customer_id",
"bot_id" : "$_id.bot_id"
}
},
{
"$group" : {
"_id": {
"timestamp" : "$timestamp",
"bot_id" : "$bot_id"
},
"new_users" : {
"$sum" : 1
}
}
},
{
"$project" : {
"_id" : 0,
"timestamp" : "$_id.timestamp",
"bot_id" : "$_id.bot_id",
"new_users" : 1
}
}
]
Some sample data for an idea of what the data looks like...
{
"mid" : "...",
"bot_id" : "...",
"bot_name" : "JOBBY",
"customer_id" : "U122...",
"incoming_log" : {
"created_at" : ISODate("2020-12-08T09:14:16.237Z"),
"event_payload" : "",
"event_type" : "text"
},
"outcome_log" : {
"created_at" : ISODate("2020-12-08T09:14:18.145Z"),
"distance" : 0.25,
"incoming_msg" : "🥺"
}
}
My expected outcome is something along the lines of:
{
"new_users" : 1187.0,
"timestamp" : ISODate("2021-01-27T00:00:00.000Z"),
"bot_id" : "5ffd......."
},
{
"new_users" : 1359.0,
"timestamp" : ISODate("2021-01-27T00:00:00.000Z"),
"bot_id" : "6def......."
}
Have I overcomplicated my pipeline somewhere? I seem to get a reasonable number of new users per bot each day, but for some reason my colleague tells me that the number is too high. I need some tips, please!
I really have no idea what you are looking for.
"The aim is to count the number of new unique users each day per chatbot, starting from the very beginning."
What are "new unique users"? What do you mean by "starting from the very beginning"? You ask for a count per day, but you use {"$group": {"_id": "$customer_id", "timestamp": { "$first": "$timestamp" } } }
To me, your grouping does not make any sense. With only a single sample document, it is almost impossible to guess what you would like to count.
Regarding grouping per day: I prefer to always work with Date values rather than strings; it is less error-prone. You may have to consider time zones, because UTC midnight is not your local midnight. Working with Dates gives you better control over that.
The $project stages are useless when you do a $group afterwards. Typically you need only one $project stage, at the end.
So, here is something to start with:
db.collection.aggregate([
{
$set: {
day: {
$dateToParts: {
date: { $ifNull: ["$incoming_log.created_at", "$outcome_log.created_at"] }
}
}
}
},
{
$group: {
_id: "$customer_id",
timestamp: {$min: { $dateFromParts: { year: "$day.year", month: "$day.month", day: "$day.day" } }}
}
}
]);
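One possible continuation, purely as a sketch: it assumes that a "new user" means a customer_id seen for the first time with a given bot_id, which the question does not confirm. The names ts and firstSeen are made up for illustration, and $dateTrunc requires MongoDB 5.0 or later.

```javascript
// Sketch under the stated assumption: first appearance per (bot, customer),
// then count those first appearances per bot and UTC day.
const newUsersPerDay = [
  { $set: { ts: { $ifNull: ["$incoming_log.created_at", "$outcome_log.created_at"] } } },
  {
    // Earliest timestamp at which each customer talked to each bot.
    $group: {
      _id: { bot_id: "$bot_id", customer_id: "$customer_id" },
      firstSeen: { $min: "$ts" }
    }
  },
  {
    // Count the first appearances per bot and per day (UTC midnight boundaries).
    $group: {
      _id: {
        bot_id: "$_id.bot_id",
        day: { $dateTrunc: { date: "$firstSeen", unit: "day" } }
      },
      new_users: { $sum: 1 }
    }
  },
  { $project: { _id: 0, bot_id: "$_id.bot_id", timestamp: "$_id.day", new_users: 1 } },
  { $sort: { timestamp: 1, bot_id: 1 } }
];

console.log(JSON.stringify(newUsersPerDay, null, 2));
```

Because each (bot, customer) pair is reduced to a single firstSeen value before counting, a customer is counted at most once per bot, which may explain a lower number than a pipeline that attributes one timestamp to every bot the customer has used.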
I have a MongoDB collection named Bookings
{
"_id" : ObjectId("5fca982d219fee6f00e631a0"),
"price" : 45.9,
"createdAt" : ISODate("2020-12-04T20:12:29.117Z")
}
{
"_id" : ObjectId("5fca990b219fee6f00e631a1"),
"price" : 45.9,
"createdAt" : ISODate("2020-12-04T20:16:11.925Z")
}
{
"_id" : ObjectId("5fcab925a912a2064fe7b916"),
"price" : 45.9,
"createdAt" : ISODate("2020-12-04T22:33:09.958Z")
}
{
"_id" : ObjectId("5fcab938a912a2064fe7b917"),
"price" : 45.9,
"createdAt" : ISODate("2020-12-04T22:33:28.641Z")
}
{
"_id" : ObjectId("5fcab94aa912a2064fe7b918"),
"createdAt" : ISODate("2020-12-04T22:33:46.118Z")
}
{
"_id" : ObjectId("5fcb73e0e396cf18e6141dc6"),
"price" : 45.9,
"createdAt" : ISODate("2020-12-05T11:49:52.544Z")
}
{
"_id" : ObjectId("5fcb73eee396cf18e6141dc7"),
"price" : 45.9,
"createdAt" : ISODate("2020-12-05T11:50:06.914Z")
}
{
"_id" : ObjectId("5fcbee785ef206248fa9513e"),
"price" : 35.7,
"createdAt" : ISODate("2020-12-05T20:32:56.508Z")
}
{
"_id" : ObjectId("5fcbf0045ef206248fa9513f"),
"price" : 2047.66,
"createdAt" : ISODate("2020-12-05T20:39:32.369Z")
}
I need to display the data by week, together with the total price for each week. If I use the aggregation pipeline below, it only gives me the range of dates on which bookings were actually made.
{$group: {
_id: {
$week: "$createdAt"
},
start_date: {$min: "$createdAt"},
end_date: {$max: "$createdAt"}
}}
Suppose the previous week started on 14-06-21 and ended on 20-06-21.
Instead, I want a result that includes start_date as 14-06-21 and end_date as 20-06-21, with the total price for that week (or 0 if nothing was earned), and likewise for every other week of the year or month.
You can total the price by week number using the $week operator, but it is hard to get the week's start and end dates in MongoDB; I would suggest deriving them from the week number in your client-side language.
$group by the week of createdAt using the $week operator and get the total price with $sum:
db.collection.aggregate([
{
$group: {
_id: { $week: "$createdAt" },
totalPrice: { $sum: "$price" }
}
}
])
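The client-side step could be sketched as follows. It relies on the documented $week numbering: weeks begin on Sunday, week 1 begins with the first Sunday of the year, and earlier days are in week 0. The function name weekRange is hypothetical, and the year has to be supplied separately because the pipeline above groups by week number only.

```javascript
// Hypothetical helper: start (Sunday) and end (Saturday) of a $week number, in UTC.
function weekRange(year, week) {
  const jan1 = new Date(Date.UTC(year, 0, 1));
  // Days from January 1 to the first Sunday of the year (0 if Jan 1 is a Sunday).
  const firstSundayOffset = (7 - jan1.getUTCDay()) % 7;
  // Day of the year on which the requested week starts.
  const startDay = 1 + firstSundayOffset + (week - 1) * 7;
  return {
    start: new Date(Date.UTC(year, 0, startDay)),
    end: new Date(Date.UTC(year, 0, startDay + 6))
  };
}

// The sample bookings of 2020-12-04 and 2020-12-05 fall in $week 48.
const range = weekRange(2020, 48);
console.log(range.start.toISOString().slice(0, 10)); // 2020-11-29
console.log(range.end.toISOString().slice(0, 10));   // 2020-12-05
```

For week 0 the computed start falls in the previous year, since that week is only a partial one. If you need Monday-based weeks as in the 14-06-21 to 20-06-21 example, the ISO operators ($isoWeek, $isoWeekYear) use a different numbering and this arithmetic would have to be adapted.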
For fetching the start day of a week, you can use the $dateToString operator with %G and %V in the format, which represent the ISO year and the ISO week of the year respectively; parsing that string back with $dateFromString and the same format gives the start date of the week.
Similarly, add 518400000 (6 days in milliseconds) to the start date of the week to get the end date.
Also, these two operations will only work inside the _id field of the $group stage, so read the sub-object under the _id key to get the required values. The weekNo key is not needed by the $group stage, but keep it if you require it.
db.collection.aggregate([
{
$group: {
_id: {
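"weekNo": {
// $isoWeek is Monday-based like the %G/%V keys below; the Sunday-based $week would split one ISO week into two groups
$isoWeek: "$createdAt"
},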
"start_date": {
"$dateFromString": {
"dateString": {
"$dateToString": {
"date": "$createdAt",
"format": "%G/%V",
},
},
"format": "%G/%V"
}
},
"end_date": {
"$add": [
{
"$dateFromString": {
"dateString": {
"$dateToString": {
"date": "$createdAt",
"format": "%G/%V",
},
},
"format": "%G/%V"
}
},
518400000,
],
},
},
totalPrice: {
$sum: "$price"
}
}
}
])
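On MongoDB 5.0 or later, the same start and end dates can probably be obtained more directly with $dateTrunc and $dateAdd. This is a sketch of that alternative, not the method used in the answer above; the variable name weeklyTotals is made up.

```javascript
// Sketch (MongoDB 5.0+): group by the Monday that starts each week.
const weeklyTotals = [
  {
    $group: {
      _id: { $dateTrunc: { date: "$createdAt", unit: "week", startOfWeek: "monday" } },
      totalPrice: { $sum: "$price" }
    }
  },
  {
    $project: {
      _id: 0,
      start_date: "$_id",
      // Six days after the Monday is the Sunday that ends the week.
      end_date: { $dateAdd: { startDate: "$_id", unit: "day", amount: 6 } },
      totalPrice: 1
    }
  },
  { $sort: { start_date: 1 } }
];

console.log(JSON.stringify(weeklyTotals, null, 2));
```

Weeks without any booking still do not appear in the result, so a total of 0 for empty weeks has to be filled in separately.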
I have a collection called biosignals in my MongoDB, in which there exist entries that represent physical activity (e.g. walking).
Each such entry has a 'start_date_time' and an 'end_date_time', which are ISO strings (e.g. 2017-04-26T07:12:09.463Z).
I want to do the following query, where I group the physical activity entries by day and I calculate the total duration of activity for each day.
db.biosignals.aggregate([
{
$match: {
"name": "physical-activity"
}
},
{
$project: {
duration: {
"$subtract": [new Date("$end_date_time"), new Date("$start_date_time")]
},
date: {
$substr: [ "$start_date_time", 0, 10]
}
}
},
{
$group: {
_id: "$date",
total: { $sum: "$duration" }
}
},
{
$sort: {
_id: 1
}
}
])
However, I only get 0 as a result, as shown here:
{ "_id" : "2017-04-24", "total" : NumberLong(0) }
{ "_id" : "2017-04-25", "total" : NumberLong(0) }
{ "_id" : "2017-04-26", "total" : NumberLong(0) }
{ "_id" : "2017-04-27", "total" : NumberLong(0) }
If, instead, I hardcode the dates (e.g. start_date_time = 2017-04-26T07:12:08.463Z and end_date_time = 2017-04-26T07:12:09.463Z, that is one second difference), I get the expected result:
{ "_id" : "2017-04-24", "total" : NumberLong(16000) }
{ "_id" : "2017-04-25", "total" : NumberLong(3000) }
{ "_id" : "2017-04-26", "total" : NumberLong(7000) }
{ "_id" : "2017-04-27", "total" : NumberLong(12000) }
How could I fix that?
Thank you very much!
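A likely cause, offered as a sketch rather than a confirmed diagnosis: new Date("$end_date_time") is evaluated by the shell before the pipeline is sent, so the server never sees a field reference, only two identical client-side values whose difference is 0. Converting the strings on the server, for example with $dateFromString (available from MongoDB 3.6), avoids that. The field names are taken from the question; the variable names are made up.

```javascript
// The client evaluates this immediately; the literal text is not a parseable date.
const clientSide = new Date("$end_date_time");
console.log(Number.isNaN(clientSide.getTime()));

// Sketch: let the server convert the ISO strings for each document instead.
const pipeline = [
  { $match: { name: "physical-activity" } },
  {
    $project: {
      duration: {
        $subtract: [
          { $dateFromString: { dateString: "$end_date_time" } },
          { $dateFromString: { dateString: "$start_date_time" } }
        ]
      },
      date: { $substr: ["$start_date_time", 0, 10] }
    }
  },
  // Total duration in milliseconds per calendar day.
  { $group: { _id: "$date", total: { $sum: "$duration" } } },
  { $sort: { _id: 1 } }
];

console.log(JSON.stringify(pipeline, null, 2));
```

It would be run as db.biosignals.aggregate(pipeline); subtracting two dates yields the difference in milliseconds.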
I am logging data into MongoDB in the following format:
{ "_id" : ObjectId("54f2393f80b72b00079d1a53"), "outT" : 10.88, "inT3" : 22.3, "light" : 336, "humidity" : 41.4, "pressure" : 990.31, "inT1" : 22.81, "logtime" : ISODate("2015-02-28T21:55:11.838Z"), "inT2" : 21.5 }
{ "_id" : ObjectId("54f2394580b72b00079d1a54"), "outT" : 10.88, "inT3" : 22.3, "light" : 338, "humidity" : 41.4, "pressure" : 990.43, "inT1" : 22.75, "logtime" : ISODate("2015-02-28T21:55:17.690Z"), "inT2" : 311.72 }
...
As you can see, there is a single time element and multiple readings logged. I want to aggregate across all of the readings to provide a max, min and average for each variable, grouped by hour of day. I have managed to do this for a single variable using the following aggregation script:
db.logs.aggregate(
[
{
$match: {
logtime: {
$gte: ISODate("2015-03-01T00:00:00.000Z"),
$lt: ISODate("2015-03-03T00:00:00.000Z")
}
}
},
{
$project: {_id: 0, logtime: 1, outT: 1}
},
{
$group: {
_id: {
day: {$dayOfYear: "$logtime"},
hour: {$hour: "$logtime"}
},
average: {$avg: "$outT"},
max: {$max: "$outT"},
min:{$min: "$outT"}
}
}
]
)
which produces:
{ "_id" : { "day" : 61, "hour" : 22 }, "average" : 3.1878750000000116, "max" : 3.44, "min" : 3 }
{ "_id" : { "day" : 61, "hour" : 14 }, "average" : 13.979541666666638, "max" : 17.81, "min" : 8.81 }
...
I would like to produce output which looks like:
{"outT": { output from working aggregation above },
"inT1": { ... },
...
}
Everything I try seems to throw an error in the mongo console. Can anyone help?
Thanks
You can do this by including each statistic in your $group with a different name and then following that with a $project stage to reshape it into your desired format:
db.logs.aggregate([
{
$match: {
logtime: {
$gte: ISODate("2015-02-28T00:00:00.000Z"),
$lt: ISODate("2015-03-03T00:00:00.000Z")
}
}
},
{
$project: {_id: 0, logtime: 1, outT: 1, inT1: 1}
},
{
$group: {
_id: {
day: {$dayOfYear: "$logtime"},
hour: {$hour: "$logtime"}
},
outT_average: {$avg: "$outT"},
outT_max: {$max: "$outT"},
outT_min:{$min: "$outT"},
inT1_average: {$avg: "$inT1"},
inT1_max: {$max: "$inT1"},
inT1_min:{$min: "$inT1"}
}
},
{
$project: {
outT: {
average: '$outT_average',
max: '$outT_max',
min: '$outT_min'
},
inT1: {
average: '$inT1_average',
max: '$inT1_max',
min: '$inT1_min'
}
}
}
])
This gives you output that looks like:
{
"_id" : {
"day" : 59,
"hour" : 21
},
"outT" : {
"average" : 10.88,
"max" : 10.88,
"min" : 10.88
},
"inT1" : {
"average" : 22.78,
"max" : 22.81,
"min" : 22.75
}
}
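With seven readings per document, writing every accumulator by hand gets repetitive. The $group and $project stages can be generated instead; this is a sketch in which buildStatsStages is a hypothetical helper and the field list is taken from the sample documents in the question.

```javascript
// Hypothetical helper: build the $group and $project stages for any list of fields.
function buildStatsStages(fields) {
  const group = {
    _id: { day: { $dayOfYear: "$logtime" }, hour: { $hour: "$logtime" } }
  };
  const project = {};
  for (const f of fields) {
    // Flat accumulator names in $group, e.g. outT_average.
    group[`${f}_average`] = { $avg: `$${f}` };
    group[`${f}_max`] = { $max: `$${f}` };
    group[`${f}_min`] = { $min: `$${f}` };
    // Reshaped into one sub-document per field in $project.
    project[f] = { average: `$${f}_average`, max: `$${f}_max`, min: `$${f}_min` };
  }
  return [{ $group: group }, { $project: project }];
}

const stages = buildStatsStages(["outT", "inT1", "inT2", "inT3", "light", "humidity", "pressure"]);
console.log(JSON.stringify(stages, null, 2));
```

The two generated stages would follow the $match stage, for example db.logs.aggregate([matchStage, ...stages]), where matchStage stands for the $match shown above.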
In a $group stage, $max returns the maximum of the corresponding values across the documents in each group (here, each day and hour), $min returns the minimum, and $avg returns the average.
See the MongoDB documentation for these operators for more examples.