How to use $dayOfYear aggregation with epoch timestamps [duplicate] - mongodb

I am trying to aggregate records in a MongoDB collection by hour and need to convert dates stored as timestamps (milliseconds) to ISODate so that I can use the aggregation framework's built-in date operators ($hour, $month, etc.).
Records are stored as
{
"data" : { "UserId" : "abc", "ProjId" : "xyz"},
"time" : NumberLong("1395140780706"),
"_id" : ObjectId("532828ac338ed9c33aa8eca7")
}
I am trying to use an aggregate query of the following type:
db.events.aggregate(
    {
        $match : {
            "time" : { $gte : 1395186209804, $lte : 1395192902825 }
        }
    },
    {
        $project : {
            _id : "$_id",
            dt : {$concat : (Date("$time")).toString()} // need to project as ISODate
        }
    },
    // process records further in $project or $group clause
)
which produces results of the form:
{
    "result" : [
        {
            "_id" : ObjectId("5328da21fd207d9c3567d3ec"),
            "dt" : "Fri Mar 21 2014 17:35:46 GMT-0400 (EDT)"
        },
        {
            "_id" : ObjectId("5328da21fd207d9c3567d3ed"),
            "dt" : "Fri Mar 21 2014 17:35:46 GMT-0400 (EDT)"
        },
        ...
    ]
}
I want to extract hour, day, month, and year from the date, but since time is projected forward as a string I am unable to use the aggregation framework's built-in date operators ($hour, etc.).
How can I convert time from milliseconds to an ISO date to do something like the following:
db.events.aggregate(
    {
        $match : {
            "time" : { $gte : 1395186209804, $lte : 1395192902825 }
        }
    },
    {
        $project : {
            _id : "$_id",
            dt : <ISO date from "$time">
        }
    },
    {
        $project : {
            _id : "$_id",
            date : {
                hour : {$hour : "$dt"}
            }
        }
    }
)

Actually, it is possible. The trick is to add your millisecond timestamp to a zero-millisecond Date object, using syntax similar to:
dt : {$add: [new Date(0), "$time"]}
I modified your aggregation from above to produce the result:
db.events.aggregate(
    {
        $project : {
            _id : "$_id",
            dt : {$add: [new Date(0), "$time"]}
        }
    },
    {
        $project : {
            _id : "$_id",
            date : {
                hour : {$hour : "$dt"}
            }
        }
    }
);
The result is (with one entry of your sample data):
{
"result": [
{
"_id": ObjectId("532828ac338ed9c33aa8eca7"),
"date": {
"hour": 11
}
}
],
"ok": 1
}

I assume there's no way to do it, because the aggregation framework is written in native code and does not make use of the V8 engine. Thus nothing from JavaScript is going to work within the framework (and that's also why the aggregation framework runs much faster).
Map/Reduce is a way to work this out, but the aggregation framework definitely has much better performance.
About Map/Reduce performance, read this thread.
Another way to work it out would be to get a "raw" result from the aggregation framework, put it into a JSON array, and then do the conversion by running JavaScript, sort of like:
var results = db.events.aggregate(...);
results.forEach(function(data) {
    data.date = new Date(data.time); // "time" holds the millisecond timestamp, as in the documents above
    // date is now stored in the "date" property
});

To return a valid BSON date all you need is a little date "math" using the $add operator. You need to add new Date(0) to the timestamp. new Date(0) represents the Unix epoch (Jan 1, 1970), i.e. zero milliseconds, and is shorthand for new Date("1970-01-01").
db.events.aggregate([
{ "$match": { "time": { "$gte" : 1395136209804, "$lte" : 1395192902825 } } },
{ "$project": {
"hour": { "$hour": { "$add": [ new Date(0), "$time" ] } },
"day": { "$dayOfMonth": { "$add": [ new Date(0), "$time" ] } },
"month": { "$month": { "$add": [ new Date(0), "$time" ] } },
"year": { "$year": { "$add": [ new Date(0), "$time" ] } }
}}
])
Which yields:
{
"_id" : ObjectId("532828ac338ed9c33aa8eca7"),
"hour" : 11,
"day" : 18,
"month" : 3,
"year" : 2014
}
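If the end goal is the hourly aggregation from the question, the same $add expression can feed a $group stage directly. A minimal sketch, assuming the events collection and time field from the question (the $match bounds are just the ones used above):
db.events.aggregate([
  { "$match": { "time": { "$gte": 1395136209804, "$lte": 1395192902825 } } },
  { "$group": {
    "_id": { "$hour": { "$add": [ new Date(0), "$time" ] } },
    "count": { "$sum": 1 }
  }}
])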

Starting in Mongo 4.0, there is a new $toDate aggregation operator which can convert from various types to a date (in this case from a long):
// { time: NumberLong("1395140780706") }
db.collection.aggregate({ $set: { time: { $toDate: "$time" } } })
// { time: ISODate("2014-03-18T11:06:20.706Z") }
And to get the hour out of it:
// { time: NumberLong("1395140780706") }
db.collection.aggregate({ $project: { hour: { $hour: { $toDate: "$time" } } } })
// { hour: 11 }
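The same idea covers the original per-hour grouping as well. A minimal sketch, assuming MongoDB 4.0+ and the events collection and time field from the question:
db.events.aggregate([
  { $group: {
    _id: { $hour: { $toDate: "$time" } },  // hour of day, 0-23, in UTC
    count: { $sum: 1 }
  } }
])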

Use this if {$add: [new Date(0), "$time"]} returns a string type rather than an ISO date type.
I used all of those options but still failed, because the new date from $project came back as a string like '2000-11-2:xxxxxxx' rather than a date type like ISODate('2000-11-2:xxxxxxx'). For anyone who has the same problem as me, use this:
db.events.aggregate(
{
$project : {
_id : "$_id",
dt : {$add: [new Date(0), "$time"]}
}
},
{
$project : {
_id : "$_id",
"year": { $substr: [ "$dt", 0, 4 ] },
"month": { $substr: [ "$dt", 5, 2] },
"day": { $substr: [ "$dt", 8, 2 ] }
}
}
);
The result will be:
{ _id: '59f940eaea87453b30f42cf5',
year: '2017',
month: '07',
day: '04'
},
You can get hours or minutes as well, depending on which part of the string you take the substring of, and then you can group again by the same date, month, or year.
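As an alternative to slicing the stringified date, $dateToString (available since MongoDB 3.0) formats the computed date explicitly instead of relying on how the shell happens to print it. A minimal sketch, assuming the same time field:
db.events.aggregate([
  { $project: { dt: { $add: [ new Date(0), "$time" ] } } },
  { $project: {
    day: { $dateToString: { format: "%Y-%m-%d", date: "$dt" } },
    hour: { $dateToString: { format: "%H", date: "$dt" } }
  } }
])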

Related

MongoDB aggregate: get the expired records by using the current epoch timestamp

{
    "id" : "58",
    "topicHeader" : {
        "replayData" : {
            "messageDateInms" : NumberLong(1649448201357),
            "messageDelayInms" : NumberLong(600000)
        }
    },
    "status" : "IN_PROGRESS"
},
{
    "id" : "59",
    "topicHeader" : {
        "replayData" : {
            "messageDateInms" : NumberLong(1650220023677),
            "messageDelayInms" : NumberLong(600000)
        }
    },
    "status" : "IN_PROGRESS"
}
I need to get the expired records based on the current epoch timestamp, i.e.
(topicHeader.replayData.messageDateInms + topicHeader.replayData.messageDelayInms) <= current epoch timestamp
I am able to resolve this by using find(), but I am trying to find a better solution so it won't cause any performance issues:
db.getCollection("col1").find
({
$expr: {
$lte: [{ "$add": ["$topicHeader.replayData.messageDateInms", "$topicHeader.replayData.messageDelayInms"] }, 1650226443611]
}
})
Thank you in advance.
Use $add to sum up the 2 fields, use $toDate to cast the result into a date field, and compare it with $$NOW:
db.collection.aggregate([
{
"$match": {
$expr: {
$lte: [
{
$toDate: {
"$add": [
"$topicHeader.replayData.messageDateInms",
"$topicHeader.replayData.messageDelayInms"
]
}
},
"$$NOW"
]
}
}
}
])
Here is the Mongo playground for your reference.
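Since both fields are already millisecond values, a variant that skips the date cast and compares the sum against the current time in milliseconds also works. A minimal sketch, assuming MongoDB 4.2+ (already required for $$NOW above; $toLong on a date yields milliseconds since the epoch):
db.collection.aggregate([
  { "$match": {
    "$expr": {
      "$lte": [
        { "$add": [ "$topicHeader.replayData.messageDateInms", "$topicHeader.replayData.messageDelayInms" ] },
        { "$toLong": "$$NOW" }  // current time as milliseconds since the epoch
      ]
    }
  }}
])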

MongoDB - convert string to timestamp, group by hour

I have the following documents stored in a collection:
{
"REQUESTTIMESTAMP" : "26-JUN-19 01.34.10.095000000 AM",
"UNHANDLED_INTENT" : 0,
"USERID" : "John",
"START_OF_INTENT_SKILL_CONVERSATION" : 0,
"PROPERTYCODE" : ""
}
I want to group this by the hour (which we will get from 'REQUESTTIMESTAMP').
Earlier, I had this document stored in the collection in a different way, where I had a separate field for hours and used that hours field to group.
Previous aggregation query:
collection.aggregate([
{'$match': query}, {
'$group': {
"_id": {
"hour": "$hour",
"sessionId": "$sessionId"
}
}
}, {
"$group": {
"_id": "$_id.hour",
"count": {
"$sum": 1
}
}
}
])
Previous collection structure:
{
"timestamp" : "1581533210921",
"date" : "12-02-2020",
"hour" : "13",
"month" : "02",
"time" : "13:46:50",
"weekDay" : "Wednesday",
"__v" : 0
}
How can I do the same previous aggregation query with the new document structure (after extracting hours from the 'REQUESTTIMESTAMP' field)?
You should convert your timestamp to a Date object, then take the hour from that Date object.
db.collection.aggregate([{
'$match': query
}, {
$project: {
date: {
$dateFromString: {
dateString: '$REQUESTTIMESTAMP',
format: "%m-%d-%Y" //This should be your date format
}
}
}
}, {
$group: {
_id: {
hour: {
$hour: "$date"
}
}
}
}])
The problem is that month names are not supported by MongoDB. Either you write a lot of code or you use a library like moment.js. First update your REQUESTTIMESTAMP to a proper Date object, then you can group on it.
db.collection.find().forEach(function (doc) {
var d = moment(doc.REQUESTTIMESTAMP, "DD-MMM-YY hh.mm.ss.SSS a");
db.collection.updateOne(
{ _id: doc._id },
{ $set: { date: d.toDate() } }
);
})
db.collection.aggregate([
{
$group: {
_id: { $hour: "$date" },
count: { $sum: 1 }
}
}
])
In case you're not able to update the DB with an actual date field and still want to proceed with the existing format, try this query; it will add an hour field extracted from the given string field REQUESTTIMESTAMP.
Query:
db.collection.aggregate([
{
$addFields: {
hour: {
$let: {
/** split string into three parts date + hours + AM/PM */
vars: { hour: { $slice: [{ $split: ["$REQUESTTIMESTAMP", " "] }, 1, 2] } },
in: {
$cond: [{ $in: ["AM", "$$hour"] }, // Check AM exists in array
{ $toInt: { $substr: [{ $arrayElemAt: ["$$hour", 0] }, 0, 2] } }, // If yes then return int of first 2 letters of first element in hour array
{ $add: [{ $toInt: { $substr: [{ $arrayElemAt: ["$$hour", 0] }, 0, 2] } }, 12] } ] // If PM add 12 to int of first 2 letters of first element in hour array
}
}
}
}
}
])
Test : MongoDB-Playground
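Once an hour field exists on the documents, whether the stored field from the previous structure or the one computed by the $addFields stage above, the per-hour count is then a straightforward group. A minimal sketch (query stands for the same match filter used in the earlier examples):
db.collection.aggregate([
  { $match: query },
  { $group: { _id: "$hour", count: { $sum: 1 } } },
  { $sort: { _id: 1 } }
])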

MongoDB aggregate - average on specific values in array of documents

I'm currently working on a database with the following structure:
{"_id" : ObjectId("1abc2"),
"startdatetime" : ISODate("2016-09-11T18:00:37Z"),
"diveValues" : [
{
"temp" : 15.269,
"depth" : 0.0,
},
{
"temp" : 14.779257384,
"depth" : 1.0,
},
{
"temp" : 14.3940253165,
"depth" : 2.0,
},
{
"temp" : 13.9225795455,
"depth" : 3.0,
},
{
"temp" : 13.8214431818,
"depth" : 4.0,
},
{
"temp" : 13.6899553571,
"depth" : 5.0,
}
]}
The database has information about depth in metres in water, and the temperature at a given depth. This is stored in the "diveValues" array. I have been successful in averaging over all depths between two dates, both monthly and daily averages. What I'm having a serious issue with is getting the average between two depths, say between 1 and 4 metres, for every month over the last 6 months.
Here is an example of average temperature for each month from January to June, for all depths:
db.collection.aggregate(
[
{$unwind:"$diveValues"},
{$match:
{'startdatetime':
{$gt:new ISODate("2016-01-10T06:00:29Z"),
$lt:new ISODate("2016-06-10T06:00:29Z")}
}
},
{$group:
{_id:
{ year: { $year: "$startdatetime" },
month: { $month: "$startdatetime" }},
avgTemp: { $avg: "$diveValues.temp" }}
},
{$sort:{_id:1}}
]
)
Resulting in:
{ "_id" : { "year" : 2016, "month" : 1 }, "avgTemp" : 7.575706502958313 }
{ "_id" : { "year" : 2016, "month" : 3 }, "avgTemp" : 6.85037457740135 }
{ "_id" : { "year" : 2016, "month" : 4 }, "avgTemp" : 7.215702831902588 }
{ "_id" : { "year" : 2016, "month" : 5 }, "avgTemp" : 9.153453683614638 }
{ "_id" : { "year" : 2016, "month" : 6 }, "avgTemp" : 11.497953009390237 }
Now, I cannot seem to figure out how to get the average temperature between 1 and 4 metres for the same period.
I have been trying to group the values by the wanted depths, but have not managed it, more often than not ending up with bad syntax. Also, if I'm not wrong, the $match stage would return all depths as long as the dive has values for 1 and 4 metres, so that will not work.
With the find() tool I am using $slice to return the values I intend from the array, but have not been successful combining that with the aggregate() function.
Is there a way to solve this? Thanks in advance, much appreciated!
You'd need to place your $match pipeline stage before $unwind to optimize your aggregation operation. Doing an $unwind operation on the whole collection could potentially cause performance issues, since it produces a copy of each document per array entry and thus uses more memory (there is a possible memory cap on aggregation pipelines of 10% of total memory), so it takes "time" to produce the flattened arrays as well as "time" to process them. Hence it's better to limit the number of documents getting into the pipeline to be flattened.
db.collection.aggregate([
{
"$match": {
"startdatetime": {
"$gt": new ISODate("2016-01-10T06:00:29Z"),
"$lt": new ISODate("2016-06-10T06:00:29Z")
},
"diveValues.depth": { "$gte": 1, "$lte": 4 }
}
},
{ "$unwind": "$diveValues" },
{ "$match": { "diveValues.depth": { "$gte": 1, "$lte": 4 } } },
{
"$group": {
"_id": {
"year": { "$year": "$startdatetime" },
"month": { "$month": "$startdatetime" }
},
"avgTemp": { "$avg": "$diveValues.temp" }
}
}
])
If you want results to contain the average temps for all depths as well as for the 1-4 metre depth range, then you would need to run this pipeline, which uses the $cond ternary operator to feed the $avg operator only the temperatures within a group that fall in the depth range:
db.collection.aggregate([
{
"$match": {
"startdatetime": {
"$gt": new ISODate("2016-01-10T06:00:29Z"),
"$lt": new ISODate("2016-06-10T06:00:29Z")
}
}
},
{ "$unwind": "$diveValues" },
{
"$group": {
"_id": {
"year": { "$year": "$startdatetime" },
"month": { "$month": "$startdatetime" }
},
"avgTemp": { "$avg": "$diveValues.temp" },
"avgTempDepth1-4": {
"$avg": {
"$cond": [
{
"$and": [
{ "$gte": [ "$diveValues.depth", 1 ] },
{ "$lte": [ "$diveValues.depth", 4 ] }
]
},
"$diveValues.temp",
null
]
}
}
}
}
])
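On MongoDB 3.2+ another option, not shown in the answers here, is to narrow the array with $filter before $unwind so only the in-range readings ever get flattened. A sketch under that assumption:
db.collection.aggregate([
  { "$match": {
    "startdatetime": {
      "$gt": new ISODate("2016-01-10T06:00:29Z"),
      "$lt": new ISODate("2016-06-10T06:00:29Z")
    }
  }},
  { "$project": {
    "startdatetime": 1,
    "diveValues": {
      "$filter": {
        "input": "$diveValues",
        "as": "dv",
        "cond": {
          "$and": [
            { "$gte": [ "$$dv.depth", 1 ] },
            { "$lte": [ "$$dv.depth", 4 ] }
          ]
        }
      }
    }
  }},
  { "$unwind": "$diveValues" },
  { "$group": {
    "_id": {
      "year": { "$year": "$startdatetime" },
      "month": { "$month": "$startdatetime" }
    },
    "avgTemp": { "$avg": "$diveValues.temp" }
  }}
])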
First of all, the date $match operator should be used at the beginning of the pipeline so that indexes can be used.
Now, to the question, you just need to filter the depth interval like you did with the dates:
db.col.aggregate([
{"$match": {
'startdatetime': {
"$gt": new ISODate("2016-01-10T06:00:29Z"),
"$lt": new ISODate("2016-11-10T06:00:29Z")
}
}},
{"$unwind": "$diveValues"},
{"$match": {
"diveValues.depth": {
"$gte": 1.0,
"$lt": 4.0
}
}},
{"$group": {
"_id": {
"year": {"$year": "$startdatetime" },
"month": {"$month": "$startdatetime" }
},
"avgTemp": { "$avg": "$diveValues.temp" }}
}
])
This will give you the average only for the chosen depth interval.

Get distinct ISO dates by days, months, year

I want to get a distinct set of years and months for all document objects in my MongoDB.
For example, if documents have dates:
2015/08/11
2015/08/11
2015/08/12
2015/09/14
2014/10/30
2014/10/30
2014/08/11
Return unique months and years for all documents, ex:
2015/08
2015/09
2014/10
2014/08
Schema snippet:
var myObjSchema = mongoose.Schema({
date: Date,
request: {
...
I tried using distinct against schema field date:
db.mycollection.distinct('date', {}, {})
But this gave duplicate dates. Output snippet:
ISODate("2015-08-11T20:03:42.122Z"),
ISODate("2015-08-11T20:53:31.135Z"),
ISODate("2015-08-11T21:31:32.972Z"),
ISODate("2015-08-11T22:16:27.497Z"),
ISODate("2015-08-11T22:41:58.587Z"),
ISODate("2015-08-11T23:28:17.526Z"),
ISODate("2015-08-11T23:38:45.778Z"),
ISODate("2015-08-12T06:21:53.898Z"),
ISODate("2015-08-12T13:25:33.627Z"),
ISODate("2015-08-12T14:46:59.763Z")
So the question is:
a: How can I accomplish the above?
b: Is it possible to specify which part of the date you want distinct? Like distinct('date.month'...)?
EDIT: I've found you can get these dates and such with the following query; however, the results are not distinct:
db.mycollection.aggregate(
[
{
$project : {
month : {
$month: "$date"
},
year : {
$year: "$date"
},
day: {
$dayOfMonth: "$date"
}
}
}
]
);
Output: duplicates
{ "_id" : "", "month" : 7, "year" : 2015, "day" : 14 }
{ "_id" : "", "month" : 7, "year" : 2015, "day" : 15 }
{ "_id" : "", "month" : 7, "year" : 2015, "day" : 15 }
You need to group your documents after the projection and use the $addToSet accumulator operator:
db.mycollection.aggregate([
{ "$project": {
"year": { "$year": "$date" },
"month": { "$month": "$date" }
}},
{ "$group": {
"_id": null,
"distinctDate": { "$addToSet": { "year": "$year", "month": "$month" }}
}}
])
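If you would rather have one document per distinct year/month pair instead of a single array, grouping by the pair directly is equivalent. A minimal sketch:
db.mycollection.aggregate([
  { "$group": {
    "_id": { "year": { "$year": "$date" }, "month": { "$month": "$date" } }
  }},
  { "$sort": { "_id": 1 } }
])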
Indeed, you can get distinct values via a $group stage with _id: null and $addToSet.
I'm also including here the use of $dateToString, which formats your dates as "%Y-%m" (e.g. 2021-12).
// { date: ISODate("2021-12-05") }
// { date: ISODate("2021-12-08") }
// { date: ISODate("2022-04-05") }
// { date: ISODate("2022-12-14") }
db.collection.aggregate([
{ $group: {
_id: null,
months: { $addToSet: { $dateToString: { date: "$date", format: "%Y-%m" } } }
}}
])
// { _id: null, months: ["2021-12", "2022-04", "2022-12"] }
db.mycollection.aggregate([
    {
        "$project": {
            "year": { "$year": "$date" },
            "month": { "$month": "$date" }
        }
    },
    {
        "$group": {
            "_id": { "year": "$year", "month": "$month" }
        }
    },
    {
        "$sort": { "_id": -1 }
    }
])

Mongo aggregation within intervals of time

I have some log data stored in a mongo collection that includes basic information as a request_id and the time it was added to the collection, for example:
{
"_id" : ObjectId("55ae6ea558a5d3fe018b4568"),
"request_id" : "030ac9f1-aa13-41d1-9ced-2966b9a6g5c3",
"time" : ISODate("2015-07-21T16:00:00.00Z")
}
I was wondering if I could use the aggregation framework to aggregate some statistical data. I would like to get the counts of the objects created within each interval of N minutes for the last X hours.
So the output which I need for 10-minute intervals over the last 1 hour should be something like the following:
{ "_id" : 0, "time" : ISODate("2015-07-21T15:00:00.00Z"), "count" : 67 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:10:00.00Z"), "count" : 113 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:20:00.00Z"), "count" : 40 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:30:00.00Z"), "count" : 10 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:40:00.00Z"), "count" : 32 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:50:00.00Z"), "count" : 34 }
I would use that to get data for graphs.
Any advice is appreciated!
There are a couple of ways of approaching this depending on which output format best suits your needs. The main note is that with the "aggregation framework" itself, you cannot actually return something "cast" as a date, but you can get values that are easily reconstructed into a Date object when processing results in your API.
The first approach is to use the "Date Aggregation Operators" available to the aggregation framework:
db.collection.aggregate([
{ "$match": {
"time": { "$gte": startDate, "$lt": endDate }
}},
{ "$group": {
"_id": {
"year": { "$year": "$time" },
"dayOfYear": { "$dayOfYear": "$time" },
"hour": { "$hour": "$time" },
"minute": {
"$subtract": [
{ "$minute": "$time" },
{ "$mod": [ { "$minute": "$time" }, 10 ] }
]
}
},
"count": { "$sum": 1 }
}}
])
Which returns a composite key for _id containing all the values you want for a "date". Alternatively, if everything always falls within a single hour, just use the "minute" part and work out the actual date based on the startDate of your range selection.
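If the server is MongoDB 3.6 or newer (not assumed by this answer), the composite key can also be turned back into a real Date on the server with $dateFromParts. A sketch that keys on month/dayOfMonth instead of dayOfYear so the parts map directly onto that operator:
db.collection.aggregate([
  { "$match": { "time": { "$gte": startDate, "$lt": endDate } } },
  { "$group": {
    "_id": {
      "year": { "$year": "$time" },
      "month": { "$month": "$time" },
      "day": { "$dayOfMonth": "$time" },
      "hour": { "$hour": "$time" },
      "minute": {
        "$subtract": [
          { "$minute": "$time" },
          { "$mod": [ { "$minute": "$time" }, 10 ] }
        ]
      }
    },
    "count": { "$sum": 1 }
  }},
  { "$addFields": {
    "time": {
      "$dateFromParts": {
        "year": "$_id.year", "month": "$_id.month", "day": "$_id.day",
        "hour": "$_id.hour", "minute": "$_id.minute"
      }
    }
  }}
])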
Or you can just use plain "date math" to get the milliseconds since epoch, which can again be fed to a Date constructor directly.
db.collection.aggregate([
{ "$match": {
"time": { "$gte": startDate, "$lt": endDate }
}},
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$time", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$time", new Date(0) ] },
1000 * 60 * 10
]}
]
},
"count": { "$sum": 1 }
}}
])
In all cases what you do not want to do is use $project before actually applying $group. As a "pipeline stage", $project must "cycle" through all of the selected documents and "transform" the content.
This takes time, and adds to the total execution time of the query. You can simply apply the expressions directly within $group, as has been shown.
Or if you are really "pure" about a Date object being returned without post-processing, then you can always use "mapReduce", since the JavaScript functions actually allow recasting as a date, though it is slower than the aggregation framework and of course does not return a cursor response:
db.collection.mapReduce(
function() {
var date = new Date(
this.time.valueOf()
- ( this.time.valueOf() % ( 1000 * 60 * 10 ) )
);
emit(date,1);
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
Your best bet is using aggregation though, as transforming the response is quite easy:
db.collection.aggregate([
{ "$match": {
"time": { "$gte": startDate, "$lt": endDate }
}},
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$time", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$time", new Date(0) ] },
1000 * 60 * 10
]}
]
},
"count": { "$sum": 1 }
}}
]).forEach(function(doc) {
doc._id = new Date(doc._id); // _id holds milliseconds since epoch, so this yields a real Date
printjson(doc);
})
And then you have your interval grouping output with real Date objects.
Something like this?
pipeline = [
{"$project":
{"date": {
"year": {"$year": "$time"},
"month": {"$month": "$time"},
"day": {"$dayOfMonth": "$time"},
"hour": {"$hour": "$time"},
"minute": {"$subtract": [
{"$minute": "$time"},
{"$mod": [{"$minute": "$time"}, 10]}
]}
}}
},
{"$group": {"_id": "$date", "count": {"$sum": 1}}}
]
Example:
> db.foo.insert({"time": new Date(2015, 7, 21, 22, 21)})
> db.foo.insert({"time": new Date(2015, 7, 21, 22, 23)})
> db.foo.insert({"time": new Date(2015, 7, 21, 22, 45)})
> db.foo.insert({"time": new Date(2015, 7, 21, 22, 33)})
> db.foo.aggregate(pipeline)
and output:
{ "_id" : { "year" : 2015, "month" : 8, "day" : 21, "hour" : 20, "minute" : 40 }, "count" : 1 }
{ "_id" : { "year" : 2015, "month" : 8, "day" : 21, "hour" : 20, "minute" : 20 }, "count" : 2 }
{ "_id" : { "year" : 2015, "month" : 8, "day" : 21, "hour" : 20, "minute" : 30 }, "count" : 1 }
A pointer in lieu of a concrete answer: you can very easily do it for minutes, hours, and given periods using the date aggregation operators. Every 10 minutes will be a bit trickier, but likely possible with some wrangling. Nevertheless, the aggregation will be slow as nuts on large data sets.
I would suggest extracting the minutes post-insert:
{
"_id" : ObjectId("55ae6ea558a5d3fe018b4568"),
"request_id" : "030ac9f1-aa13-41d1-9ced-2966b9a6g5c3",
"time" : ISODate("2015-07-21T16:00:00.00Z"),
"minutes": 16
}
And, even though it sounds utterly absurd, adding quartiles and sextiles or whatever that N might be:
{
    "_id" : ObjectId("55ae6ea558a5d3fe018b4568"),
    "request_id" : "030ac9f1-aa13-41d1-9ced-2966b9a6g5c3",
    "time" : ISODate("2015-07-21T16:00:00.00Z"),
    "minutes" : 16,
    "quartile" : 1,
    "sextile" : 2
}
First try doing a $divide on the minutes. It doesn't do ceil and floor, but check out
Is there a floor function in Mongodb aggregation framework?
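For what it's worth, MongoDB 3.2 added a $floor operator, so an N-minute bucket can now be computed directly in the pipeline rather than stored post-insert. A minimal sketch for N = 10, assuming the time field from the documents above:
db.collection.aggregate([
  { "$group": {
    "_id": {
      "hour": { "$hour": "$time" },
      // round the minute down to the nearest 10-minute boundary
      "minuteBucket": {
        "$multiply": [
          { "$floor": { "$divide": [ { "$minute": "$time" }, 10 ] } },
          10
        ]
      }
    },
    "count": { "$sum": 1 }
  }}
])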