Time-span aggregation on MongoDB

Suppose a MongoDB collection Data holds the temperature history of some items. Its documents look like this:
{
  "itemId" : "ABCD",
  "timePoint" : NumberLong("1618922410288"),
  "temperature" : 15.15
}
meaning that item "ABCD" had a temperature of 15.15 at 1618922410288 ms since the epoch.
What query returns the history of span-averaged temperatures of a given item, starting from a given time point?
For example, for itemId="ABCD", span=60*60*1000, startingTimePoint=0, the query should return the hourly average temperatures of "ABCD" starting from 1970-01-01 00:00:00.000.
I would also like the query not to be "welded" to the average, but rather to accept an arbitrary aggregation function, so that the same technique works for avg, min, max, or something else. But this is a secondary question.
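One possible shape for such a query, as a minimal sketch rather than a definitive answer: group documents by the start of the span they fall into, and pass the accumulator operator in as a parameter. The helper names buildSpanPipeline and bucketStart and the output field "value" are made up for illustration; only $match, $group, $subtract, $mod and $sort are actual aggregation operators.

```javascript
// Sketch only: this builds a pipeline object and does not talk to a server.
// buildSpanPipeline, bucketStart and the "value" field are illustrative names.

// Start of the span that contains timePoint, counting spans from `start`.
function bucketStart(timePoint, start, span) {
  return timePoint - ((timePoint - start) % span);
}

// op is any single-expression $group accumulator: "$avg", "$min", "$max", "$sum", ...
function buildSpanPipeline(itemId, span, start, op = "$avg") {
  return [
    // Keep only the requested item, from the starting time point onward.
    { $match: { itemId: itemId, timePoint: { $gte: start } } },
    {
      $group: {
        // Same arithmetic as bucketStart, expressed with aggregation operators.
        _id: {
          $subtract: [
            "$timePoint",
            { $mod: [{ $subtract: ["$timePoint", start] }, span] },
          ],
        },
        value: { [op]: "$temperature" },
      },
    },
    // Return the spans in chronological order.
    { $sort: { _id: 1 } },
  ];
}

const pipeline = buildSpanPipeline("ABCD", 60 * 60 * 1000, 0);
console.log(JSON.stringify(pipeline, null, 2));
console.log(bucketStart(1618922410288, 0, 60 * 60 * 1000)); // 1618920000000
```

In a shell session this would presumably be used as db.Data.aggregate(buildSpanPipeline("ABCD", 60*60*1000, 0, "$max")). Each result document then carries the span start in _id and the aggregated temperature in value; spans without any data points simply do not appear.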

Related

Return documents created less than an hour ago - Elasticsearch query

I need to write a query in ES that only returns documents that have been created less than N minutes, or hours, ago.
There's a createdTimeStamp field in millis, and I am able to write a simple query like this:
{
  "query": {
    "match": {
      "createdTimeStamp": "1526011575731"
    }
  }
}
However, this query only returns documents whose createdTimeStamp exactly matches the value "1526011575731". I am not sure whether a range query would work here, as the field stores millisecond values.
You can use range together with now for that. E.g. to get the last hour:
{
  "query": {
    "range" : {
      "createdTimeStamp" : {
        "gte" : "now-1h"
      }
    }
  }
}
For minutes, use m instead of h. You can also round, etc., see Date Math documentation.
The fact that your date is in milliseconds should not make a difference as long as the field is mapped as a date, since internally all comparisons are handled that way anyway:
Internally, dates are converted to UTC (if the time-zone is specified) and stored as a long number representing milliseconds-since-the-epoch.
Queries on dates are internally converted to range queries on this long representation, and the result of aggregations and stored fields is converted back to a string depending on the date format that is associated with the field.
(from https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html)
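As a small sketch of how the "last N minutes or hours" query body could be built programmatically: the two helper names below are hypothetical and not part of any Elasticsearch client. The first variant uses date math and assumes createdTimeStamp is mapped as a date; the second computes the epoch-millisecond threshold on the client, which may be the safer choice if the field turns out to be mapped as a plain number.

```javascript
// Illustrative helpers; the function names are not part of any Elasticsearch client.

// Date-math variant: relies on createdTimeStamp being mapped as a date field.
// unit is "m" for minutes or "h" for hours.
function recentByDateMath(amount, unit) {
  return {
    query: { range: { createdTimeStamp: { gte: `now-${amount}${unit}` } } },
  };
}

// Numeric variant: compares raw epoch milliseconds, with "now" taken on the client.
function recentByMillis(minutes, nowMs = Date.now()) {
  return {
    query: { range: { createdTimeStamp: { gte: nowMs - minutes * 60 * 1000 } } },
  };
}

console.log(JSON.stringify(recentByDateMath(30, "m")));
console.log(JSON.stringify(recentByMillis(60, 1526011575731)));
```

Either object can be sent as the request body of a search; the numeric variant depends on the client clock rather than the cluster's notion of now.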

Mongodb: Get avg duration of products in inventory

I have a collection with $vehicleId and $Scraped Date. I am trying to get the avg days a car is in inventory. And I want to calculate it for all the historical days.
Sample Doc
{"_id":{"$oid":"5e1b46d853848fae2832e01a"},"Scraped Date":{"$date":{"$numberLong":"1578845911324"}},"vehicleId":{"$numberInt":"1376788"}}
{"_id":{"$oid":"5e1b46d853848fae2832e01b"},"Scraped Date":{"$date":{"$numberLong":"1578845911324"}},"vehicleId":{"$numberInt":"1376771"}}
{"_id":{"$oid":"5e1b46d853848fae2832e01c"},"Scraped Date":{"$date":{"$numberLong":"1578845911324"}},"vehicleId":{"$numberInt":"1376734"}}
{"_id":{"$oid":"5e1b46d853848fae2832e01d"},"Scraped Date":{"$date":{"$numberLong":"1578845911324"}},"vehicleId":{"$numberInt":"1376706"}}
{"_id":{"$oid":"5e1b46d853848fae2832e01e"},"Scraped Date":{"$date":{"$numberLong":"1578845911324"}},"vehicleId":{"$numberInt":"1376505"}}
collection.aggregate([
    {'$group': {
        '_id': {'vehicleId': '$vehicleId'},
        'date': {'$addToSet': "$Scraped Date"}
    }}
])
This code gives me the list of dates on which each vehicleId was found in the inventory. How can I convert this into a list of dates with the average length of time the cars had been in inventory on that day? I could take the average length of the date lists, but that won't give me the data day by day.
The current output looks like this in a dataframe:
[image: dataframe view]
I figured out a solution: I loop over every date, use a $match stage to filter the results for that date first, and then calculate the average length. The question is closed for now. I will update the code in the original question in a while.

Fetch documents from a MongoDB collection based on timestamp

I have a MongoDB collection with documents that look like this
{
  "_id" : ObjectId("5aab91b2caa256021558f3d2"),
  "Timestamp" : "2017-11-16T14:43:07.5357785+01:00",
  "status" : 1,
  "created_at" : 1521193394,
  "updated_at" : 1521193394,
  "deleted_at" : ""
}
Data gets entered into the collection every 15 minutes. Using the created_at field, which is in epoch time, I would like to find a way to fetch data at the top of every hour. So for example, data is entered at 12.00 12.15 12.30 12.45 13.00 13.15 13.30 13.45 14.00.
I would like to fetch entries from the collection that were entered at 12.00 13.00 and 14.00.
I am also open to suggestions as to whether or not using epoch time is the best way to go about it.
Using epoch time is really a good way to go.
Since you are storing the time in seconds, every timestamp that falls exactly on the hour is divisible by 3600 (the number of seconds in an hour) without a remainder. You can use this property to find your documents.
db.collection.find({created_at: {$mod: [ 3600, 0 ]}});
According to $mod documentation, it will,
Select documents where the value of a field divided by a divisor has
the specified remainder
We provided divisor as 3600 and remainder as 0. This should give what you expect.
To ignore seconds:
For this condition, mod(epoch, 3600) should be at most 59, which covers the first minute of every hour. This query can be written using $expr, available since MongoDB 3.6:
db.collection.find({$expr: {$lte: [{ $mod: [ '$created_at', 3600 ] }, 59]}});
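The arithmetic behind both queries can be checked in plain JavaScript. This is only an illustration of the remainder logic, with hypothetical helper names; it does not query a database. The first value is the created_at from the question, the others are made-up timestamps around the preceding full hour.

```javascript
const SECONDS_PER_HOUR = 3600;

// Mirrors {created_at: {$mod: [3600, 0]}}: exactly on the hour.
function isExactlyOnHour(epochSeconds) {
  return epochSeconds % SECONDS_PER_HOUR === 0;
}

// Mirrors the $expr variant: anywhere within the first minute of the hour.
function isInFirstMinute(epochSeconds) {
  return epochSeconds % SECONDS_PER_HOUR <= 59;
}

console.log(1521193394 % SECONDS_PER_HOUR); // 2594, i.e. 43 min 14 s past the hour
console.log(isExactlyOnHour(1521190800));   // true
console.log(isInFirstMinute(1521190830));   // true
console.log(isExactlyOnHour(1521190830));   // false
```

So the sample document from the question would not be returned by either query, while a document written 30 seconds after the hour is only caught by the $expr variant.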
Hope this helps!

MongoDB query to retrieve distinct documents by date

I have documents in the database with a dateTime value like so:
{
  "_id" : ObjectId("5a66fa22d29dbd0001521023"),
  "exportSuccessful" : true,
  "month" : 0,
  "week" : 4,
  "weekDay" : "Mon",
  "dateTime" : ISODate("2018-01-22T09:02:26.525Z"),
  "__v" : 0
}
I'd like to:
1. query the database for a given date and have it return the document that contains the dateTime if the date matches (I don't care about the time). This is mainly to test before inserting a document that there isn't already one for this date. In the above example, if my given date is 2018-01-22 I'd like the document to be returned.
2. retrieve all documents with a distinct date from the database (again, I don't care about the time portion). If there are two documents with the same date (but different times), just return the first one.
From what I understand Mongo's ISODate type does not allow me to store only a date, it will always have to be a dateTime value. And on my side, I don't have control over what goes in the database.
Try a range query that runs from the start of the day to the end of the day. In other words, create two dates one day apart and search between them.
Something like
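// Bounds of the current UTC day; toDate() turns the moment objects into plain Dates
var start = moment().utc().startOf('day').toDate();
var end = moment().utc().endOf('day').toDate();
db.collection.find({
  dateTime: {
    $gte: start,
    $lte: end
  }
})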
Get all distinct dates documents:
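db.collection.aggregate([
  // Sort first so that $first picks the earliest document of each day
  {"$sort": {"dateTime": 1}},
  {"$group": {
    "_id": {
      "$dateToString": {"format": "%Y-%m-%d", "date": "$dateTime"}
    },
    "first": {
      "$first": "$$ROOT"
    }
  }}
])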
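For a specific given date rather than today, the bounds can also be built without moment. The following is a minimal sketch assuming the date arrives as a "YYYY-MM-DD" string and should be interpreted as a UTC day; dayRangeFilter is a made-up helper name, not a driver API.

```javascript
// dayRangeFilter is an illustrative helper, not a driver API.
// It takes a calendar date "YYYY-MM-DD" and treats it as a UTC day.
function dayRangeFilter(isoDate) {
  const start = new Date(`${isoDate}T00:00:00.000Z`);
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);
  // Half-open interval: from midnight up to, but excluding, the next midnight.
  return { dateTime: { $gte: start, $lt: end } };
}

const filter = dayRangeFilter("2018-01-22");
console.log(filter.dateTime.$gte.toISOString()); // 2018-01-22T00:00:00.000Z
console.log(filter.dateTime.$lt.toISOString());  // 2018-01-23T00:00:00.000Z

// The sample document's dateTime falls inside that range.
const sample = new Date("2018-01-22T09:02:26.525Z");
console.log(sample >= filter.dateTime.$gte && sample < filter.dateTime.$lt); // true
```

The resulting object can be passed to find() as the filter. Using $lt with the next midnight avoids having to reason about the last millisecond of the day.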

elasticsearch filter dates based on datetime fields

Assume I have the following nested document structure, where my document contains nested routes, each with an array of date-time values.
{
  property_1: ...,
  routes: [
    {
      start_id: 1,
      end_id: 2,
      execution_times: ['2016-08-28T11:11:47+02:00', ...]
    }
  ]
}
Now I could filter my documents that match certain execution_times with something like this.
query: {
  filtered: {
    query: {
      match_all: { }
    },
    filter: {
      nested: {
        path: 'routes',
        filter: {
          bool: {
            must: [
              {
                terms: {
                  'routes.execution_times': ['2016-08-28T11:11:47+02:00', ...]
                }
              },
              ...
            ]
          }
        }
      }
    }
  }
}
But what if I would like to filter my documents based on execution dates? What's the best way of achieving this?
Should I use a range filter to map my dates to time ranges?
Or is it better to use a script query and convert the execution_times to dates there?
Or is the best way to change the document structure so that it contains both the execution_date and the execution_time?
Update
"The dates are not a range but individual dates like [today, day after tomorrow, 4 days from now, 10 days from now]"
Well, this is still a range, as a day means 24 hours. So if you store your field as a date-time, you can use a range query, e.g. from 20-Nov-2010 00:00:00 to 20-Nov-2010 23:59:59 with the appropriate time zone, for a specific day.
If you store it as a String, you will lose all the flexibility of date math and will only be able to do exact String matches. You will then have to do all the date manipulation on the client side to find exact matches and ranges.
I suggest playing with range queries using the Sense plugin; I am sure they will satisfy almost all of your requirements.
-----------------------
You should make sure that you use an appropriate date-time mapping for your field and use a range filter over that field. You don't need to split it into 2 separate fields. Date math will allow you to query based on the date alone.
This will also make your life much easier if you want to do aggregations over the date-time field.
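As a rough sketch of the "individual dates" case: one range clause per day, combined with should. This assumes routes.execution_times is mapped as a date, and it uses the newer nested/query form rather than the filtered query from the question, so it may need adapting to the Elasticsearch version in use. The helper name executionDatesQuery and the default time zone are illustrative assumptions.

```javascript
// Illustrative helper; field names follow the question, everything else is assumed.
function executionDatesQuery(dates, timeZone = "+02:00") {
  return {
    query: {
      nested: {
        path: "routes",
        query: {
          bool: {
            // A document matches if any route ran on any of the given days.
            should: dates.map((day) => ({
              range: {
                "routes.execution_times": {
                  gte: `${day}||/d`, // date math: rounded down to the start of the day
                  lte: `${day}||/d`, // date math: rounded up to the end of the day
                  time_zone: timeZone,
                },
              },
            })),
            minimum_should_match: 1,
          },
        },
      },
    },
  };
}

const body = executionDatesQuery(["2016-08-28", "2016-08-30"]);
console.log(JSON.stringify(body, null, 2));
```

The ||/d suffix is the date-math rounding mentioned above, so each clause should cover one whole calendar day in the given time zone without a separate execution_date field.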
Reference:
Date Math: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math
Date Mapping: https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html
Date Range Queries: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html