How to use aggregate to group by half hour with rounding?

I'm using $group to group my posts by hour, like this:
"$group" : {
"_id" : {
"$hour" : {
$add : ["$createdAt", 10*60*60*1000]
}
},
...
}
But now I also want to group by half hour, meaning:
2:30 => 3:00
2:29 => 2:00
How can I use the MongoDB aggregation framework to solve this?
Sorry for my bad English. :)

I gather the +10 here is for a timezone adjustment. The same basic principles apply to producing the date with 30 minute rounding, except you want to first just convert to a numeric value and work back the intervals via a modulo ( $mod ):
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [
{ "$add": [ "$createdAt", 1000 * 60 * 60 * 10 ] },
new Date(0)
]},
{ "$mod": [
{ "$subtract": [
{ "$add": [ "$createdAt", 1000 * 60 * 60 * 10 ] },
new Date(0)
]},
1000 * 60 * 30
]}
]},
new Date(0)
]
},
"count": { "$sum": 1 } // or whatever accumulation required
}}
Using a $subtract operation to take the epoch date ( new Date(0) ) away from the stored date ( adjusted ) returns the milliseconds since epoch as a numeric value. The modulo ( $mod ) of that value by the number of milliseconds in 30 minutes returns the remainder past the last interval boundary, which you then $subtract again to get a rounded interval.
The reverse applies to the $add operation, where adding a numeric value to the epoch date object returns a Date again.
So every interval start is now the grouping key, one for every 30 minutes.
You can alternatively use the date aggregation operators, but the approach here returns a BSON Date object, which the driver API translates into a native date type, rather than just a numeric value for the "minutes" interval.
It's just standard "date math", so all the same operations apply.
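To check the arithmetic outside the database, the same floor-to-interval logic can be sketched in plain JavaScript. The function name floorToInterval and its parameters are illustrative assumptions, not part of MongoDB; the 10 hour offset mirrors the adjustment in the question.

```javascript
// Plain JavaScript sketch of the pipeline's date math (names are hypothetical).
const HALF_HOUR_MS = 1000 * 60 * 30;
const OFFSET_MS = 1000 * 60 * 60 * 10; // the +10 hour adjustment from the question

function floorToInterval(date, intervalMs, offsetMs = 0) {
  // Date -> milliseconds since epoch, shifted by the offset
  const ms = date.getTime() + offsetMs;
  // Drop the remainder past the last interval boundary
  return new Date(ms - (ms % intervalMs));
}

const sample = new Date("2017-06-06T10:44:37.627Z");
const bucket = floorToInterval(sample, HALF_HOUR_MS, OFFSET_MS);
console.log(bucket.toISOString()); // 2017-06-06T20:30:00.000Z
```

Like the pipeline, this floors to the start of each 30 minute interval, and the returned Date still carries the offset that was added.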

Related

Get last inserted item per day

Is there a feature in mongodb that I can use to get the last inserted item per day ? I have a collection where I need to get the last inserted item per day, the data is grouped on an hourly basis like in the structure below.
{
timestamp: 2017-05-04T09:00:00.000+0000,
data: {}
},
{
timestamp: 2017-05-04T10:00:00.000+0000,
data: {}
}
I thought about using a projection but I am not quite sure how I could do this.
Edit: Also, since mongodb stores data in UTC, I would like to account for the offset as well.

You can $sort and use $last for the item, with rounding out the grouping key to each day:
db.collection.aggregate([
{ "$sort": { "timestamp": 1 } },
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]},
new Date(0)
]
},
"lastDoc": { "$last": "$$ROOT" }
}}
])
So the sort makes things appear in order, and then the grouping _id is rounded for each day by some date math. You subtract the epoch date from the current date to make it a number. Use the modulus to round to a day, then add the epoch date to the number to return a Date.
So stepping through the math, we start by getting the timestamp value from the date with the $subtract line. We do this a couple of times:
{ "$subtract": [ "$timestamp", new Date(0) ] }
// Is roughly internally like
ISODate("2017-06-06T10:44:37.627Z") - ISODate("1970-01-01T00:00:00Z")
1496745877627
Then there is the modulo with $mod, which when applied to the numeric value returns the remainder. The 1000 milliseconds * 60 seconds * 60 minutes * 24 hours gives the other argument:
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
// Equivalent to
1496745877627 % (1000 * 60 * 60 * 24)
38677627
Then there is the wrapping $subtract of the two numbers:
{ "$subtract": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]}
// Subtract "difference" of the modulo to a day
// from the milliseconds value of the current date
1496745877627 - 38677627
1496707200000
Then add back to the epoch date value to create a date rounded to the current day, which to the aggregation pipeline basically looks like providing the millisecond value to the constructor:
new Date(1496707200000)
ISODate("2017-06-06T00:00:00Z")
This takes the timestamp value, subtracts out the remainder left after dividing by "one day", and ends up at the time at the "start of day".
Just using $$ROOT here to represent the whole document, but any document path provided to $last here returns that field from the last document in each group.
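The question also asks about accounting for a UTC offset. One possible approach, sketched here on the assumption that a fixed offset in milliseconds is good enough (daylight saving changes are not handled), is to add the offset to the timestamp before rounding. The helper name dayKey and the +2 hour offset are hypothetical, chosen only for illustration.

```javascript
// Sketch: builds the grouping key expression with a fixed offset applied first.
// `dayKey` and the +2 hour offset are illustrative assumptions.
const DAY_MS = 1000 * 60 * 60 * 24;

function dayKey(field, offsetMs) {
  // Milliseconds since epoch for the offset-adjusted date
  const shifted = { "$subtract": [ { "$add": [ field, offsetMs ] }, new Date(0) ] };
  return {
    "$add": [
      // Remove the remainder past the start of the (shifted) day
      { "$subtract": [ shifted, { "$mod": [ shifted, DAY_MS ] } ] },
      new Date(0)
    ]
  };
}

const pipeline = [
  { "$sort": { "timestamp": 1 } },
  { "$group": {
    "_id": dayKey("$timestamp", 1000 * 60 * 60 * 2),
    "lastDoc": { "$last": "$$ROOT" }
  }}
];
```

In the shell this would be passed as db.collection.aggregate(pipeline). The resulting _id is the local start of day expressed as a shifted UTC date, so it should be read as a label for the local day rather than as an exact instant.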

MongoDB aggregation : time series with granularity

I have a MongoDB Analytics-style collection. It contains documents with a timestamp field and various data. Now I want to get a time series with the number of documents for a time period with a granularity parameter.
I'm currently using the aggregation framework like this (assuming that the granularity is DAY) :
db.collection.aggregate([{
$match: {
timestamp: {
$gte: start_time,
$lt: end_time
}
}
}, {
$group: {
_id: {
year: { $year: '$timestamp' },
month: { $month: '$timestamp' },
day: { $dayOfMonth: '$timestamp' }
},
count: { $sum: 1 }
}
}, {
$sort: {
_id: 1
}
}])
This way I have a count value for every day.
The problem is that the counts will depend on the timezone used when computing the $dayOfMonth part (each count is from 00:00:00.000 UTC to 23:59:59.999 UTC).
I would like to be able to achieve this without being dependent on the timezone, but relying on the start_time.
For example, if I use a start_time at 07:00 UTC, I will get counts for every day at 07:00 UTC to the next day at 07:00 UTC.
TL;DR : I want something like this : https://dev.twitter.com/ads/reference/get/stats/accounts/%3Aaccount_id/campaigns
Any idea on how to perform this ?

I found a solution that works pretty well. It's not very natural, but anyway.
The idea is to compute a "normalized" date based on the startDate and the date of the row. I use the $mod operator on the startDate to get the hours + minutes + seconds + milliseconds past midnight (in the case of a DAY granularity), and then I use $subtract to subtract that from the date of the row.
Here is an example for a DAY granularity :
var startDate = ISODate("2015-08-25 13:30:00.000Z")
var endDate = ISODate("2015-08-27 13:30:00.000Z")
db.collection.aggregate([{
$match: {
timestamp: {
$gte: startDate,
$lt: endDate
}
}
}, {
$project: {
timestamp_normalized: {
$subtract: [
"$timestamp",
{
$mod: [
{ "$subtract": [ startDate, new Date("1970-01-01") ] },
1000 * 60 * 60 * 24
]
}
]
}
}
}, {
// now $group with $dayOfMonth
}])
The $mod part computes the hours + minutes + seconds + milliseconds of the startDate after 00:00 UTC, in milliseconds.
The $subtract removes these milliseconds from the original timestamp.
Now I can use the $dayOfMonth operator on my timestamp_normalized field to get the day, considering intervals from 13:30 to 13:30 the next day, and use $group to get count values for these intervals.
EDIT: It's even easier to compute the value to remove from the timestamp for normalization before creating the query, using :
(startDate - new Date(0)) % (1000 * 60 * 60 * 24)
(for a DAY granularity)
Then subtract this value directly from timestamp instead of using $mod.
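A sketch of what that EDIT could look like as a full pipeline, assuming the same dates as above and the year/month/day grouping from the question. The variable names shiftMs and pipeline are illustrative, and new Date(...) stands in for ISODate(...) so the snippet also runs outside the shell.

```javascript
// Sketch of the EDIT: compute the normalization offset in application code.
const DAY_MS = 1000 * 60 * 60 * 24;

const startDate = new Date("2015-08-25T13:30:00.000Z");
const endDate = new Date("2015-08-27T13:30:00.000Z");

// Milliseconds between 00:00 UTC and the start time (13 h 30 min here)
const shiftMs = (startDate - new Date(0)) % DAY_MS;

const pipeline = [
  { $match: { timestamp: { $gte: startDate, $lt: endDate } } },
  // Subtracting a number of milliseconds from a Date yields a Date
  { $project: { timestamp_normalized: { $subtract: [ "$timestamp", shiftMs ] } } },
  { $group: {
    _id: {
      year: { $year: "$timestamp_normalized" },
      month: { $month: "$timestamp_normalized" },
      day: { $dayOfMonth: "$timestamp_normalized" }
    },
    count: { $sum: 1 }
  }},
  { $sort: { _id: 1 } }
];
```

Each resulting group then covers 13:30 UTC on one day up to 13:30 UTC on the next, labelled with the day on which the interval starts.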

How to Get the Max Daily Value per Week by Grouping

I am making a project which requires me to first calculate how much distance was traveled per day, and then on that data I have to show what the maximum, minimum and average distance traveled was for that particular week.
This is a mongoDB script I have written.
db = connect("localhost:27017/mydb");
var result = db.trips.aggregate([
{
"$unwind" : "$trips"
},
{
"$match" : {
"trips.startTime" : {"$lte" : ISODate("2015-10-31T23:59:59Z"), "$gte" : ISODate("2015-10-25T00:00:00Z")}
}
},
{
"$group" :
{
"_id" : {
"date" : {"$dayOfMonth" : "$trips.startTime"}
},
"distance" :{"$sum" : "$trips.distance"}
}
}
]);
while(result.hasNext())
{
print(tojson(result.next()));
}
Which when replaced by dynamic dates gives me correct values.
Now that leaves me with two options: either I modify the current group query or I write a double group query. A double group query seems the more valid approach. Here is my attempt at writing such a query:
{
"$group" :
{
"_id" : {
"week" : "$_id.date"
},
"max-distance" : {
"$max" : "$distance"
}
}
}
Adding these lines didn't make a difference. Clearly I am doing something wrong, but I don't know how to correct it. I would need help with that.
Thanks

You seem to want the $week operator, but of course you need a valid Date as input in order to extract the "week" from that.
What you may not know is that you can instead use "date math" to round out the date to a "day", where the result is still a Date object. Then you can use the $week operator to obtain your $max values:
db.trips.aggregate([
{ "$unwind" : "$trips" },
{ "$match": {
"trips.startTime" : {
"$lte": ISODate("2015-10-31T23:59:59Z"),
"$gte": ISODate("2015-10-25T00:00:00Z")
}
}},
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$trips.startTime", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$trips.startTime", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]},
new Date(0)
]
},
"distance": { "$sum": "$trips.distance" }
}},
{ "$group": {
"_id": { "$week": "$_id" },
"max-distance": { "$max": "$distance" }
}}
]);
The basic trick in the first part is that when you $subtract one Date object from another, the result is the difference in milliseconds. So using the epoch date, the date is converted to its milliseconds equivalent, and then you can use the math to round that number to a day.
(1000 * 60 * 60 * 24) is the number of milliseconds in a day, so finding the modulo ( $mod ) of that returns the remainder of milliseconds past the start of the day, which you can subtract from the date value in the document to round to a day.
The same is true of $add when adding a Date object to a number, the result is a Date. So this handles the conversion, and then the $week can be extracted from there.
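Since the question also asks for the minimum and average, the final $group stage could plausibly be extended along these lines. This is a sketch only: the "min-distance" and "avg-distance" field names are assumptions that follow the naming of "max-distance".

```javascript
// Sketch: second-stage $group producing max, min and average per week.
// The "min-distance" and "avg-distance" field names are assumptions.
const weeklyStats = {
  "$group": {
    "_id": { "$week": "$_id" },                  // week number of the rounded day
    "max-distance": { "$max": "$distance" },     // largest daily total in the week
    "min-distance": { "$min": "$distance" },     // smallest daily total in the week
    "avg-distance": { "$avg": "$distance" }      // mean of the daily totals
  }
};
```

This object would take the place of the last $group stage in the pipeline, after the stage that sums the distance per day.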

MongoDB split values into 5 minute intervals and return most recent within interval group

My Mongo database has documents as so:
{
"timestamp": ISODate("2015-09-27T15:28:06.0Z"),
"value": '123'
},
{
"timestamp": ISODate("2015-09-27T15:31:06.0Z"),
"value": '737'
},
{
"timestamp": ISODate("2015-09-27T15:35:00.0Z"),
"value": '456'
},
{
"timestamp": ISODate("2015-09-27T15:40:20.0Z"),
"value": '789'
}
...etc...
What I want to do is aggregate these in 5 minute intervals and then get the most recent (with the latest timestamp) value per 'group of 5 minutes'.
So basically the steps are:
1) split into groups of 5 minutes
2) return the 5-minute timestamp and the value of the document that has the newest timestamp within this 5 minute group
Based on that and my documents above the documents returned should be:
{
"timestamp": ISODate("2015-09-27T15:25:00.0Z"),
"value": '123'
},
{
"timestamp": ISODate("2015-09-27T15:35:00.0Z"),
"value": '456' // 456 has a newer timestamp than 737, which are in the same 5 minute range
},
{
"timestamp": ISODate("2015-09-27T15:40:00.0Z"),
"value": '789'
}
I have tried grouping into 5 minute intervals as described here: https://stackoverflow.com/a/26814496/1007236
Starting from there I can't find out how to return the value of the most recent within each 5 minute group.
How can I do that?

You solve this by a very simple application of Date math:
db.collection.aggregate([
{ "$sort": { "timestamp": 1 } },
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 5
]}
]},
new Date(0)
]
},
"value": { "$last": "$value" }
}}
])
Where the basic principle is finding the modulo ( $mod ), or "remainder", of the time by a five minute interval and subtracting that from the base time. This rounds down to "five minutes".
Of course the other part is that you $sort first, so that the "value" from the largest original "timestamp" comes last in each group, which is what $last then picks up as the most recent.
The other parts are that when you $subtract the "epoch" date as a BSON Date from another date, you receive an "integer" as the result. Similarly, adding ( $add ) an "integer" to a BSON Date type gives you another BSON Date.
The result is BSON Date objects rounded out to the interval you use with the math:
1000 milliseconds X 60 seconds X 5 minutes.
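As a rough check of the bucketing arithmetic, the sample documents from the question can be run through the same math in plain JavaScript, keeping the latest value per bucket as the question asks. This is not MongoDB code; the variable names are illustrative.

```javascript
// Plain JavaScript check of the five minute bucketing (not MongoDB code).
const FIVE_MIN_MS = 1000 * 60 * 5;

const docs = [
  { timestamp: new Date("2015-09-27T15:28:06Z"), value: "123" },
  { timestamp: new Date("2015-09-27T15:31:06Z"), value: "737" },
  { timestamp: new Date("2015-09-27T15:35:00Z"), value: "456" },
  { timestamp: new Date("2015-09-27T15:40:20Z"), value: "789" }
];

// Keep the value with the latest timestamp in each bucket
const latest = new Map();
for (const doc of [...docs].sort((a, b) => a.timestamp - b.timestamp)) {
  const ms = doc.timestamp.getTime();
  const key = new Date(ms - (ms % FIVE_MIN_MS)).toISOString();
  latest.set(key, doc.value); // later documents overwrite earlier ones
}

console.log([...latest.entries()]);
```

With this floor-based math the 15:31:06 document lands in the 15:30 bucket on its own rather than sharing a bucket with 15:35:00, so the sample data produces four buckets (15:25, 15:30, 15:35 and 15:40), not the three listed in the question.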

Getting unix timestamp in seconds out of MongoDB ISODate during aggregation

I was searching for this one but I couldn't find anything useful to solve my case. What I want is to get the unix timestamp in seconds out of MongoDB ISODate during aggregation. The problem is that I can get the timestamp out of ISODate but it's in milliseconds. So I would need to cut out those milliseconds. What I've tried is:
> db.data.aggregate([
{$match: {dt:2}},
{$project: {timestamp: {$concat: [{$substr: ["$md", 0, -1]}, '01', {$substr: ["$id", 0, -1]}]}}}
])
As you can see I'm trying to get the timestamp out of 'md' var and also concatenate this timestamp with '01' and the 'id' number. The above code gives:
{
"_id" : ObjectId("52f8fc693890fc270d8b456b"),
"timestamp" : "2014-02-10T16:20:56011141"
}
Then I improved the command with:
> db.data.aggregate([
{$match: {dt:2}},
{$project: {timestamp: {$concat: [{$substr: [{$subtract: ["$md", new Date('1970-01-01')]}, 0, -1]}, '01', {$substr: ["$id", 0, -1]}]}}}
])
Now I get:
{
"_id" : ObjectId("52f8fc693890fc270d8b456b"),
"timestamp" : "1392049256000011141"
}
What I really need is 1392049256011141, so without the 3 extra 000. I tried with $divide:
> db.data.aggregate([
{$match: {dt:2}},
{$project: {timestamp: {$concat: [{$substr: [{$divide: [{$subtract: ["$md", new Date('1970-01-01')]}, 1000]}, 0, -1]}, '01', {$substr: ["$id", 0, -1]}]}}}
])
What I get is:
{
"_id" : ObjectId("52f8fc693890fc270d8b456b"),
"timestamp" : "1.39205e+009011141"
}
Not exactly what I would expect from the command. Unfortunately the $substr operator doesn't allow negative length. Does anyone have any other solution?

I'm not sure why you think you need the value in seconds rather than milliseconds, as both forms are valid and most language implementations actually prefer milliseconds. But generally speaking, trying to coerce this into a string is the wrong way to go about it; you just do the math:
db.data.aggregate([
{ "$project": {
"timestamp": {
"$subtract": [
{ "$divide": [
{ "$subtract": [ "$md", new Date("1970-01-01") ] },
1000
]},
{ "$mod": [
{ "$divide": [
{ "$subtract": [ "$md", new Date("1970-01-01") ] },
1000
]},
1
]}
]
}
}}
])
This returns an epoch timestamp in seconds. It relies on the fact that subtracting one BSON date object from another gives the time interval in milliseconds, so subtracting the epoch date of "1970-01-01" essentially extracts the milliseconds value from the stored date. The $divide operator converts milliseconds to seconds, and subtracting the $mod remainder removes the fractional part, rounding down to whole seconds.
Really though you are better off doing the work in the native language for your application as all BSON dates will be returned there as a native "date/datetime" type where you can extract the timestamp value. Consider the JavaScript basics in the shell:
var date = new Date()
( date.valueOf() / 1000 ) - ( ( date.valueOf() / 1000 ) % 1 )
Typically with aggregation you want to do this sort of "math" to a timestamp value for use in something like aggregating values within a time period such as a day. There are date operators available to the aggregation framework, but you can also do it the date math way:
db.data.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$md", new Date("1970-01-01") ] },
{ "$mod": [
{ "$subtract": [ "$md", new Date("1970-01-01") ] },
1000 * 60 * 60 * 24
]}
]
},
"count": { "$sum": 1 }
}}
])
That form would be more typical to emit a timestamp rounded to a day, and aggregate the results within those intervals.
So using the aggregation framework just to extract a timestamp does not seem to be the best usage, nor should it be necessary to convert this to seconds rather than milliseconds. Your application code is where I think you should be doing that, unless of course you actually want results for intervals of time, where you can apply the date math as shown.
The methods are there, but unless you are actually aggregating, this would be the worst performance option for your application. Do the conversion in code instead.
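A minimal sketch of doing that conversion in application code, using the values visible in the question's output. The id value of 1141 is inferred from that output and is an assumption, as are the variable names.

```javascript
// Sketch: do the conversion in application code (values taken from the question).
const md = new Date(1392049256000);   // the stored date, as returned by a driver
const id = 1141;                      // assumed value of the "id" field

// Whole seconds since epoch, dropping any millisecond remainder
const seconds = Math.floor(md.getTime() / 1000);

// Concatenate seconds, the literal '01' and the id, as in the question
const timestamp = String(seconds) + "01" + String(id);
console.log(timestamp); // 1392049256011141
```

Math.floor here does the same job as subtracting the modulo remainder in the pipeline version.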