I searched for this but couldn't find anything that solves my case. I want to get the Unix timestamp in seconds from a MongoDB ISODate during aggregation. I can get the timestamp out of the ISODate, but it's in milliseconds, so I need to cut off the milliseconds. Here is what I've tried:
> db.data.aggregate([
{$match: {dt:2}},
{$project: {timestamp: {$concat: [{$substr: ["$md", 0, -1]}, '01', {$substr: ["$id", 0, -1]}]}}}
])
As you can see, I'm trying to get the timestamp out of the 'md' field and concatenate it with '01' and the 'id' number. The above code gives:
{
"_id" : ObjectId("52f8fc693890fc270d8b456b"),
"timestamp" : "2014-02-10T16:20:56011141"
}
Then I improved the command with:
> db.data.aggregate([
{$match: {dt:2}},
{$project: {timestamp: {$concat: [{$substr: [{$subtract: ["$md", new Date('1970-01-01')]}, 0, -1]}, '01', {$substr: ["$id", 0, -1]}]}}}
])
Now I get:
{
"_id" : ObjectId("52f8fc693890fc270d8b456b"),
"timestamp" : "1392049256000011141"
}
What I really need is 1392049256011141, i.e. without the 3 extra zeros. I tried with $divide:
> db.data.aggregate([
{$match: {dt:2}},
{$project: {timestamp: {$concat: [{$substr: [{$divide: [{$subtract: ["$md", new Date('1970-01-01')]}, 1000]}, 0, -1]}, '01', {$substr: ["$id", 0, -1]}]}}}
])
What I get is:
{
"_id" : ObjectId("52f8fc693890fc270d8b456b"),
"timestamp" : "1.39205e+009011141"
}
Not what I expected from the command. Unfortunately $substr can't take a negative length to trim characters from the end of the string. Does anyone have another solution?
I'm not sure why you think you need the value in seconds rather than milliseconds, as both forms are generally valid and most language implementations actually prefer milliseconds. But generally speaking, coercing this into a string is the wrong way to go about it; you just do the math:
db.data.aggregate([
{ "$project": {
"timestamp": {
"$subtract": [
{ "$divide": [
{ "$subtract": [ "$md", new Date("1970-01-01") ] },
1000
]},
{ "$mod": [
{ "$divide": [
{ "$subtract": [ "$md", new Date("1970-01-01") ] },
1000
]},
1
]}
]
}
}}
])
Which returns an epoch timestamp in seconds. It works because subtracting one BSON date object from another gives the time interval in milliseconds, so subtracting the epoch date "1970-01-01" essentially extracts the milliseconds value from the stored date. The $divide operator then converts milliseconds to (fractional) seconds, $mod by 1 yields the fractional part, and the outer $subtract removes it. That is truncation, which for dates after 1970 is the same as rounding down.
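To sanity-check that arithmetic outside MongoDB, here is a minimal plain JavaScript sketch (runs in Node or the shell) that mirrors the pipeline step by step. The `md` value and the `id` of 1141 are inferred from the outputs shown in the question, and `epochSeconds` / `compositeKey` are hypothetical helper names, not part of any API.

```javascript
// Mirrors: $subtract [ $divide [ms, 1000], $mod [ $divide [ms, 1000], 1 ] ]
function epochSeconds(date) {
  const ms = date - new Date("1970-01-01"); // Date - Date -> milliseconds since epoch
  const secs = ms / 1000;                   // fractional seconds
  return secs - (secs % 1);                 // drop the fractional part (truncation)
}

// Hypothetical helper reproducing the question's "<seconds>01<id>" string.
function compositeKey(date, id) {
  return String(epochSeconds(date)) + "01" + String(id);
}

// Values inferred from the question's output ("2014-02-10T16:20:56..." and id 1141).
const md = new Date("2014-02-10T16:20:56Z");
console.log(epochSeconds(md));       // 1392049256
console.log(compositeKey(md, 1141)); // 1392049256011141
```

Because `% 1` keeps the sign of its operand, for dates before 1970 this truncates toward zero rather than rounding down, exactly like the pipeline version.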
Really though you are better off doing the work in the native language for your application as all BSON dates will be returned there as a native "date/datetime" type where you can extract the timestamp value. Consider the JavaScript basics in the shell:
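var date = new Date();
// Whole seconds since epoch: ms / 1000 minus its fractional part.
// The semicolon above matters in a script: without it the next "(" is parsed as a call on new Date().
( date.valueOf() / 1000 ) - ( ( date.valueOf() / 1000 ) % 1 );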
Typically with aggregation you want to do this sort of "math" to a timestamp value for use in something like aggregating values within a time period such as a day. There are date operators available to the aggregation framework, but you can also do it the date math way:
db.data.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$md", new Date("1970-01-01") ] },
{ "$mod": [
{ "$subtract": [ "$md", new Date("1970-01-01") ] },
1000 * 60 * 60 * 24
]}
]
},
"count": { "$sum": 1 }
}}
])
That form would be more typical to emit a timestamp rounded to a day, and aggregate the results within those intervals.
So using the aggregation framework just to extract a timestamp does not seem the best use of it, and it should not be necessary to convert this to seconds rather than milliseconds anyway. I think your application code is where you should do that, unless of course you actually want results for intervals of time, where you can apply the date math as shown.
The methods are there, but unless you are actually aggregating then this would be the worst performance option for your application. Do the conversion in code instead.
Related
Is there a feature in MongoDB that I can use to get the last inserted item per day? I have a collection where I need to get the last inserted item per day; the data is grouped on an hourly basis, like in the structure below.
{
timestamp: 2017-05-04T09:00:00.000+0000,
data: {}
},
{
timestamp: 2017-05-04T10:00:00.000+0000,
data: {}
}
I thought about using a projection but I am not quite sure how I could do this.
Edit: Also, since mongodb stores data in UTC, I would like to account for the offset as well.
You can $sort and use $last for the item, rounding the grouping key down to each day:
db.collection.aggregate([
{ "$sort": { "timestamp": 1 } },
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]},
new Date(0)
]
},
"lastDoc": { "$last": "$$ROOT" }
}}
])
So the sort puts things in order, and then the grouping _id is rounded down to each day by some date math. You subtract the epoch date from the document's date to make it a number, use the modulus to round down to the day, then add the epoch date to the number to get a Date back.
So stepping through the math, we first get the timestamp value from the date with the $subtract line. We do this a couple of times:
{ "$subtract": [ "$timestamp", new Date(0) ] }
// Is roughly internally like
ISODate("2017-06-06T10:44:37.627Z") - ISODate("1970-01-01T00:00:00Z")
1496745877627
Then there is the modulo with $mod, which, applied to the numeric value, returns the remainder. The other argument is 1000 milliseconds * 60 seconds * 60 minutes * 24 hours:
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
// Equivalent to
1496745877627 % (1000 * 60 * 60 * 24)
38677627
Then there is the wrapping $subtract of the two numbers:
{ "$subtract": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]}
// Subtract "difference" of the modulo to a day
// from the milliseconds value of the current date
1496745877627 - 38677627
1496707200000
Then add back to the epoch date value to create a date rounded to the current day, which to the aggregation pipeline basically looks like providing the millisecond value to the constructor:
new Date(1496707200000)
ISODate("2017-06-06T00:00:00Z")
So subtracting the remainder of the division by "one day" from the timestamp value lands on the time at the "start of day" (in UTC).
I'm just using $$ROOT here to represent the whole document, but any document path given to $last works the same way.
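As a plain JavaScript sketch of what the $sort + $group pipeline computes (not how the server implements it), the following groups documents by UTC day and keeps the last one per day. `startOfUtcDay` and `lastPerDay` are hypothetical names, and the sample documents only mimic the question's hourly structure.

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;

// Same date math as the $group _id: ms since epoch, minus the remainder
// of a day, turned back into a Date (the $add of new Date(0)).
function startOfUtcDay(date) {
  const ms = date - new Date(0);       // { $subtract: [ "$timestamp", new Date(0) ] }
  return new Date(ms - (ms % DAY_MS)); // $subtract the $mod, then back to a Date
}

// Emulates { $sort: { timestamp: 1 } } followed by $group with $last: "$$ROOT".
function lastPerDay(docs) {
  const sorted = [...docs].sort((a, b) => a.timestamp - b.timestamp);
  const groups = new Map();
  for (const doc of sorted) {
    // Later documents overwrite earlier ones for the same day, like $last.
    groups.set(startOfUtcDay(doc.timestamp).toISOString(), doc);
  }
  return groups;
}

// Sample data shaped like the question's hourly documents (deliberately unsorted).
const docs = [
  { timestamp: new Date("2017-05-04T10:00:00Z"), data: {} },
  { timestamp: new Date("2017-05-04T09:00:00Z"), data: {} },
];
console.log(lastPerDay(docs).get("2017-05-04T00:00:00.000Z").timestamp.toISOString());
// 2017-05-04T10:00:00.000Z
```

Without the initial sort, "last" would just mean whichever document happened to be processed last, which is why the pipeline sorts first.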
I'm trying to use Mongo aggregation to group documents by the week of a timestamp on each document. All timestamps are stored in UTC, and I need to calculate the week using the client's time, not UTC time.
I can provide and add the client's UTC offset as shown below, but this doesn't always work due to daylight saving time. The offset differs depending on the date, so adjusting all the timestamps with the offset of the current date won't do.
Does anyone know of a way to group by week that consistently accounts for daylight savings time?
db.collection.aggregate([
{ $group: {
"_id": {
"$week": {
"$add": [ "$Timestamp", clientsUtcOffsetInMilliseconds ]
}
},
"firstTimestamp": { "$min": "$Timestamp" }
}}
]);
The basic concept here is to make your query "aware" of when "daylight savings" both "starts" and "ends" for the given query period and simply supply this test to $cond in order to determine which "offset" to use:
db.collection.aggregate([
{ "$group": {
"_id": {
"$week": {
"$add": [
"$Timestamp",
{ "$cond": [
{ "$and": [
{ "$gte": [
{ "$dayOfYear": "$Timestamp" },
daylightSavingsStartDay
]},
{ "$lt": [
{ "$dayOfYear": "$Timestamp" },
daylightSavingsEndDay
]}
]},
daylightSavingsOffset,
normalOffset
]}
]
}
},
"min": { "$min": "$Timestamp" }
}}
])
You can make that a little more complex when covering several years, but the basic principle is the same. In the southern hemisphere you are always spanning the end of the year, so each condition would be a "range" from "start" to "end of year" and from "beginning of year" to "end". Therefore you would use an $or with an inner $and around the "range" operators demonstrated.
If you have different values to apply, then detect when you "should" choose either and then apply using $cond.
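Here is a minimal plain JavaScript sketch of the same $cond decision, handy for checking which offset a given date would get. The day numbers and offsets used in the tests are hypothetical placeholders (just like daylightSavingsStartDay and friends in the pipeline), and `offsetNorth` / `offsetSouth` / `shiftedForWeek` are illustrative names only.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// 1-based day of the year in UTC, like MongoDB's $dayOfYear on a stored UTC date.
function dayOfYear(date) {
  const startOfYear = Date.UTC(date.getUTCFullYear(), 0, 1);
  return Math.floor((date.getTime() - startOfYear) / (24 * HOUR_MS)) + 1;
}

// Northern hemisphere: DST is one range inside the year ($and of $gte / $lt).
function offsetNorth(date, dstStartDay, dstEndDay, dstOffset, normalOffset) {
  const doy = dayOfYear(date);
  return doy >= dstStartDay && doy < dstEndDay ? dstOffset : normalOffset;
}

// Southern hemisphere: DST spans New Year, so it becomes an $or of two ranges:
// "start" to end of year, and beginning of year to "end".
function offsetSouth(date, dstStartDay, dstEndDay, dstOffset, normalOffset) {
  const doy = dayOfYear(date);
  const inDst = doy >= dstStartDay || doy < dstEndDay;
  return inDst ? dstOffset : normalOffset;
}

// The value the pipeline feeds to $week: the stored date shifted by the chosen offset.
function shiftedForWeek(date, offsetFn, ...args) {
  return new Date(date.getTime() + offsetFn(date, ...args));
}
```

Real DST switches happen at a local hour on a day that varies by year and zone, so fixed day numbers are an approximation, as in the pipeline itself.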
I'm using $group to group my posts by hour, like this:
"$group" : {
"_id" : {
"$hour" : {
$add : ["$createdAt", 10*60*60*1000]
}
},
...
}
But now I also want to group by half hour, that is:
2:30 => 3:00
2:29 => 2:00
How can I do this with MongoDB aggregation?
Sorry for my bad English. :)
I gather the +10 here is for a timezone adjustment. The same basic principles apply to producing a date rounded to 30 minutes, except that you first convert to a numeric value and then work back to the interval start via a modulo ($mod):
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [
{ "$add": [ "$createdAt", 1000 * 60 * 60 * 10 ] },
new Date(0)
]},
{ "$mod": [
{ "$subtract": [
{ "$add": [ "$createdAt", 1000 * 60 * 60 * 10 ] },
new Date(0)
]},
1000 * 60 * 30
]}
]},
new Date(0)
]
},
"count": { "$sum": 1 } // or whatever accumulation required
}}
Using the epoch date (Date(0)) in a $subtract from the stored (adjusted) date returns the milliseconds since epoch as a numeric value. The modulo by the number of milliseconds in 30 minutes returns the remainder past the last half-hour mark, and you then $subtract that to get the rounded interval start.
The same applies to the $add operation: adding the epoch date object to a numeric value returns a Date again.
So the start of each 30-minute interval is now the grouping key.
You could alternatively use the date aggregation operators, but the date math approach returns a BSON Date object, which your driver translates into a native date, rather than just a numeric value for the "minutes" interval.
It's just standard "date math", so all the same operations apply.
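To make the bucketing concrete, here is a minimal plain JavaScript sketch of the same grouping key (an illustration, not MongoDB code). `halfHourBucket` is a hypothetical name, and the +10 hour shift is the question's timezone adjustment; the resulting Date's UTC fields therefore read as UTC+10 wall-clock time.

```javascript
const HOUR_MS = 1000 * 60 * 60;
const HALF_HOUR_MS = 1000 * 60 * 30;

// Mirrors the $group _id: shift by +10h, convert to ms since epoch,
// subtract the remainder of 30 minutes, then turn back into a Date.
function halfHourBucket(createdAt, offsetMs = 10 * HOUR_MS) {
  const ms = (createdAt.getTime() + offsetMs) - new Date(0).getTime();
  return new Date(ms - (ms % HALF_HOUR_MS));
}

console.log(halfHourBucket(new Date("2017-06-06T10:44:37.627Z")).toISOString());
// 2017-06-06T20:30:00.000Z
```

Note this floors to the start of each half hour; it does not round to the nearest hour.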
I am creating a Mongo aggregation query that uses the $subtract operator in my $match block, as shown in the code below.
This query doesn't work:
db.coll.aggregate(
[
{
$match: {
timestamp: {
$gte: {
$subtract: [new Date(), 24 * 60 * 60 * 1000]
}
}
}
},
{
$group: {
_id: {
timestamp: "$timestamp"
},
total: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
timestamp: "$_id.timestamp",
total: "$total",
}
},
{
$sort: {
timestamp: -1
}
}
]
)
However, this second query works:
db.coll.aggregate(
[
{
$match: {
timestamp: {
$gte: new Date(new Date() - 24 * 60 * 60 * 1000)
}
}
},
{
$group: {
_id: {
timestamp: "$timestamp"
},
total: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
timestamp: "$_id.timestamp",
total: "$total",
}
},
{
$sort: {
timestamp: -1
}
}
]
)
I need to use $subtract on my $match block so I can't use the last query.
As of MongoDB 3.6 you can use $subtract in the $match stage via $expr. Here are the docs: https://docs.mongodb.com/manual/reference/operator/query/expr/
I was able to get a query like what you're describing via this $expr and a new system variable in mongodb 4.2 called $$NOW. Here is my query, which gives me orders that have been created within the last 4 hours:
[
{ $match:
{ $expr:
{ $gt: [
"$_created_at",
{ $subtract: [ "$$NOW", 4 * 60 * 60 * 1000] } ]
}
}
}
]
Well, you cannot do that (at least not before MongoDB 3.6 and $expr), and you are not meant to either. You also say you "need" to do this, but in reality you do not.
Pretty much all of the general aggregation operators, as opposed to the pipeline stage operators, are only valid within a $project or $group pipeline stage: mostly $project, but certainly not the others.
A $match pipeline is really the same as a general "query" operation, so the only things valid in there are the query operators.
As for your "need", any "value" submitted within an aggregation pipeline, and particularly within a $match, needs to be evaluated outside of the actual pipeline before the BSON representation is sent to the server.
The only exception is the notation that refers to fields in the document, such as "$fieldname", and then only really in $project or $group. That is something that "refers" to an existing value of a document, and that cannot be done within any kind of "query" document expression.
If you need to work with the value of another field in the document then you work it out with $project first, as in:
db.collection.aggregate([
{ "$project": {
"fieldMath": { "$subtract": [ "$fieldOne", "$fieldTwo" ] }
}},
{ "$match": { "fieldMath": { "$gt": 2 } }}
])
For any other purpose you really want to evaluate the value "outside" the pipeline.
The above answers the question you asked, but this answers the question you didn't ask.
Your pipeline doesn't make much sense, since grouping on the "timestamp" alone is unlikely to group anything: the values have millisecond accuracy, so even on very active systems there will be only a few documents per value at best.
It looks like you are after the math to group by "day", which you can do like this:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]
},
"total": { "$sum": 1 }
}}
])
That "rounds" your timestamp value to a single day and has a much better chance of "aggregating" something than you would otherwise have.
Or you can use the "date aggregation operators" to do much the same thing with a composite key.
So if you want to "query" then it evaluates externally. If you want to work on a value "within the document" then you must do so in either a $project or $group pipeline stage.
The $subtract operator is an aggregation expression operator, not a query operator, so it is only available in stages such as $project, not in a plain $match query document (unless you wrap it in $expr, available from MongoDB 3.6). So your options are:
(not recommended) Add a $project step before your $match step to compute the value for all documents, for use in the following match step. I would not recommend doing this, because the operation has to be performed on every single document in the collection and prevents the database from using an index on the timestamp field, so it could cost you a lot of performance.
(recommended) Generate the Date you want to match against in the shell / in your application. Generate a new Date() object, store it in a variable, subtract 24 hours from it and perform your 2nd query using that variable.
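A minimal sketch of the recommended option, assuming a Node or shell context: the cutoff is computed in application code and sent as a plain Date, so $match stays an ordinary, index-friendly query. `lastDayFilter` is a hypothetical helper name.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: compute the cutoff in the application, not in the pipeline.
function lastDayFilter(now = new Date()) {
  const cutoff = new Date(now.getTime() - DAY_MS); // evaluated client side
  return { $match: { timestamp: { $gte: cutoff } } };
}

const stage = lastDayFilter(new Date("2020-01-02T00:00:00Z"));
console.log(stage.$match.timestamp.$gte.toISOString()); // 2020-01-01T00:00:00.000Z
```

In the shell this would then be used as the first stage, e.g. `db.coll.aggregate([ lastDayFilter(), /* $group, $project, $sort as before */ ])`.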
I have a MongoDB database that stores dates in UTC. I want to aggregate by year, month and day in a different timezone (CET).
Doing this works fine for UTC:
BasicDBObject group_id = new BasicDBObject("_id", new BasicDBObject("year", new BasicDBObject("$year", "$tDate")).
append("month", new BasicDBObject("$month", "$tDate")).
append("day", new BasicDBObject("$dayOfMonth", "$tDate")).
append("customer", "$customer"));
BasicDBObject groupFields = group_id.
append("eventCnt", new BasicDBObject("$sum", "$eventCnt"));
BasicDBObject group = new BasicDBObject("$group", groupFields);
Or, if you use the command line (not tested; I only tested the Java version):
{
$group: {
_id: {
"year": { "$year": "$tDate" },
"month": { "$month": "$tDate" },
"day": { "$dayOfMonth": "$tDate" },
"customer": "$customer"
},
"eventCount": {
"$sum": "$eventCount"
}
}
}
How do I convert these dates into CET inside the aggregation framework?
For example '2013-09-16 23:45:00 UTC' is '2013-09-17 00:45:00 CET', this is a different day.
I'm not an expert on CET and its relation to UTC, but the following code (for the shell) should do a proper conversion (adding an hour) to a MongoDB date type:
db.dates.aggregate(
{$project: {"tDate":{$add: ["$tDate", 60*60*1000]}, "eventCount":1, "customer":1}}
)
If you run that project command before the rest of your pipeline, the results should be in CET.
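To see why the shift fixes the day boundary, here is a plain JavaScript sketch using the example from the question. `toCet` is a hypothetical name, and the fixed +1 hour ignores summer time (CEST is UTC+2), which is an assumption carried over from the answer above.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Same shift as {$add: ["$tDate", 60*60*1000]}: a fixed +1h offset (CET without summer time).
function toCet(date) {
  return new Date(date.getTime() + HOUR_MS);
}

const utc = new Date("2013-09-16T23:45:00Z");
const cet = toCet(utc);
// Read the shifted value with UTC getters, as $year/$month/$dayOfMonth do.
console.log(utc.getUTCDate(), cet.getUTCDate()); // 16 17
```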
Starting in MongoDB 3.6, you can provide a timezone to the date operators.
Replace "America/Chicago" with your own timezone (for CET, e.g. "Europe/Berlin").
{
"$group":{
"_id":{
"year":{"$year":{"date":"$tDate","timezone":"America/Chicago"}},
"month":{"$month":{"date":"$tDate","timezone":"America/Chicago"}},
"dayOfMonth":{"$dayOfMonth":{"date":"$tDate","timezone":"America/Chicago"}}
},
"count":{"$sum":1}
}
}
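As a client-side analog (plain JavaScript, not MongoDB), the standard Intl API can split a UTC instant into year/month/day in a named timezone, which is a quick way to predict what the timezone-aware operators should return. `ymdInZone` is a hypothetical helper, and it assumes a runtime with full timezone data (modern Node ships it by default).

```javascript
// Split a UTC instant into year/month/day as seen in a named IANA timezone (DST included).
function ymdInZone(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "numeric",
    day: "numeric",
  }).formatToParts(date);
  const get = (type) => Number(parts.find((p) => p.type === type).value);
  return { year: get("year"), month: get("month"), day: get("day") };
}

const t = new Date("2013-09-16T23:45:00Z");
console.log(ymdInZone(t, "America/Chicago")); // { year: 2013, month: 9, day: 16 }
console.log(ymdInZone(t, "Europe/Berlin"));   // { year: 2013, month: 9, day: 17 }
```

Unlike a fixed millisecond offset, a named zone applies whichever offset (standard or summer time) was in effect at each date.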
After searching for hours, this is the solution that worked for me. It is also very simple: just convert to your timezone by subtracting its offset from UTC in milliseconds (here 7 hours, for a UTC-7 zone).
25200000 = 7-hour offset // 420 min * 60 sec * 1000 ms
$group: {
_id: {
year: { $year : [{ $subtract: [ "$timestamp", 25200000 ]}] },
month: { $month : [{ $subtract: [ "$timestamp", 25200000 ]}] },
day: { $dayOfMonth : [{ $subtract: [ "$timestamp", 25200000 ]}] }
},
count: {
$sum : 1
}
}
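Use, for example, moment-timezone to determine the current offset for CET. This way you get either the summer or the winter offset, whichever is in effect when the code runs (dates on the other side of a DST switch still get that same offset).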
// moment-timezone reports the offset in minutes BEHIND UTC (e.g. -60 for Berlin in winter),
// so subtracting it moves the UTC date forward into CET/CEST.
var offsetCETmillisec = moment.tz.zone('Europe/Berlin').offset(moment()) * 60 * 1000;
{ $group: {
_id: {
'year': {'$year': [{ $subtract: [ '$createdAt', offsetCETmillisec ]}] },
'month': {'$month': [{ $subtract: [ '$createdAt', offsetCETmillisec ]}] },
'day': {'$dayOfMonth': [{ $subtract: [ '$createdAt', offsetCETmillisec ]}] }
},
count: {$sum: 1}
}
}
MongoDB's documentation suggests that you save the timezone offset alongside the timestamp:
var now = new Date();
db.data.save( { date: now,
offset: now.getTimezoneOffset() } );
This is of course not the ideal solution, but it works until MongoDB's aggregation pipeline has a proper $utcOffset function.
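The sign of the stored offset is easy to get backwards, so here is a small plain JavaScript sketch of reading such a document back. `localWallClock` is a hypothetical name and the sample document is invented; it relies only on the documented behaviour that `getTimezoneOffset()` returns minutes behind UTC (negative east of UTC).

```javascript
// Date#getTimezoneOffset() returns minutes BEHIND UTC: -60 for CET (UTC+1),
// 300 for US Eastern standard time (UTC-5).
function localWallClock(doc) {
  // Shift the stored UTC instant so its UTC fields read as local wall-clock time.
  return new Date(doc.date.getTime() - doc.offset * 60 * 1000);
}

// Hypothetical stored document, saved by a client running in CET in winter.
const saved = { date: new Date("2013-01-16T23:45:00Z"), offset: -60 };
console.log(localWallClock(saved).toISOString()); // 2013-01-17T00:45:00.000Z
```

Inside a pipeline the same shift would be `{ $subtract: [ "$date", { $multiply: [ "$offset", 60 * 1000 ] } ] }`, assuming the `date` and `offset` field names from the save() example above.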
The timezone solution is a good one, but in version 3.6 you can also format the output using the timezone, so you get a result that is ready to use:
[
{ "$project": {
"year_month_day": { "$dateToString": { "format": "%Y-%m-%d", "date": "$tDate", "timezone": "America/Chicago" } }
}},
{ "$group": {
"_id": "$year_month_day",
"count": { "$sum": 1 }
}}
]
Make sure that your "$match" also considers timezone, or else you will get wrong results.
Mongo stores dates in UTC, so this is the procedure to get them in another timezone:
- Check that Mongo saves the dates in UTC (insert some records, etc.).
- Get the offset for your timezone with moment-timezone.js, e.g. moment().tz('Europe/Zagreb').utcOffset(), which returns the offset in minutes.
- Prepare $gte and $lte for the $match stage (e.g. from user input for the dates 1.1.2019 - 13.1.2019): if the offset is positive, subtract() those minutes in the $match stage; if it is negative, add() them.
- Then normalize the dates to your timezone (because the $match stage returns them in UTC):
  - if the offset is positive, add() those minutes in the $project stage;
  - if the offset is negative, subtract() those minutes in the $project stage.
- $group goes last. This is important, because we want to group the normalized results, not the $match-ed ones.
Basically: shift the input(s) to UTC for $match, then normalize the results to your timezone.
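A plain JavaScript sketch of the two shifts, assuming the offset has already been obtained as minutes ahead of UTC (what moment's utcOffset() reports); it is hard-coded here, and the helper names and the createdAt field are hypothetical. It uses an exclusive upper bound at the start of the following local day instead of $lte, so all of 13.1.2019 is included.

```javascript
const MINUTE_MS = 60 * 1000;

// Local calendar day -> UTC instant for the $match stage
// (positive offset: subtract it; negative offset: this adds it).
function localDayToUtc(year, monthIndex, day, offsetMinutes) {
  return new Date(Date.UTC(year, monthIndex, day) - offsetMinutes * MINUTE_MS);
}

// UTC result -> local wall-clock time, the normalization done in $project.
function utcToLocal(date, offsetMinutes) {
  return new Date(date.getTime() + offsetMinutes * MINUTE_MS);
}

const offset = 60; // assumed Europe/Zagreb offset in January (UTC+1)
const from = localDayToUtc(2019, 0, 1, offset); // 1.1.2019 00:00 local
const to = localDayToUtc(2019, 0, 14, offset);  // start of 14.1.2019 local (exclusive)
const matchStage = { $match: { createdAt: { $gte: from, $lt: to } } };
console.log(from.toISOString(), to.toISOString());
// 2018-12-31T23:00:00.000Z 2019-01-13T23:00:00.000Z
```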
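<?php
// Compute today's date as seen in a fixed timezone (here Asia/Karachi).
date_default_timezone_set('Asia/Karachi');
$date = getdate(time()); // broken-down current local time
$day = $date['mday'];
$month = $date['mon'];
$year = $date['year'];
$currentDate = $year.'-'.$month.'-'.$day; // not zero-padded, e.g. "2019-1-5"
?>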