How Do I Structure MongoDB To Query By Date Time Per Minute?

I am trying to store stock prices per minute, so I can easily return results at a per-minute interval and keep historical data, allowing queries like the last 24 hours, the last 30 days, etc. (please also let me know if this is the wrong approach)
For example, if I check the current time with fmt.Println("time now: ", time.Now()) I get the following date time: 2022-01-29 11:47:02.398118591 +0000 UTC m=+499755.770119738
What I want is to truncate it to the minute level, so I can store one entry per minute;
that is, I would like to use this date time: 2022-01-29 11:47:00 +0000 UTC
I would like to use UTC, so I can stick to that universal time zone to store and retrieve data.
Each document will hold a list of multiple stocks' price data.
Do I need to have the _id field? I am not sure, so I am just looking for best practice here.
database name: "stock-price-db"
collection name: "stock-price"
I am thinking of something like this, just as an example:
[
  {
    "_id": ObjectId("5458b6ee09d76eb7326df3a4"),
    "2022-01-29 11:48:00 +0000 UTC": [
      {
        "stock": "TSLA",
        "price": "859.83",
        "marketcap": "8938289305"
      },
      {
        "stock": "AAPL",
        "price": "175.50",
        "marketcap": "3648289305"
      }
    ]
  },
  {
    "_id": ObjectId("5458b6ee09d76eb7326df3a5"),
    "2022-01-29 11:47:00 +0000 UTC": [
      {
        "stock": "TSLA",
        "price": "855.50",
        "marketcap": "8848289305"
      },
      {
        "stock": "AAPL",
        "price": "172.96",
        "marketcap": "3638289305"
      }
    ]
  }
]
First, is this the right way to store this type of data in MongoDB? And how do I structure the model so that I can store the data per minute interval and query per minute interval?

There are a few drawbacks in your design:
Do not use dynamic keys - you will end up needing a few extra aggregation pipeline stages just to reshape them.
Store the date in a static-key field, i.e. time: ISODate(...) (see the sketch after this list).
Better to store all the available time units, down to milliseconds; this will help when requirements change in the future.
If there are too many stock changes, it is not a scalable design.
If you want to find historical data for a single stock, the proposed design may have performance issues.
You will end up with issues in sharding.
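For illustration, a minimal sketch of the static-key layout in the mongo shell, with one document per stock per minute. The collection handle, the index and the truncation line are my additions rather than anything from the answer; in Go, time.Now().UTC().Truncate(time.Minute) produces the same minute boundary the question asks about.
// Truncate "now" down to the minute; BSON dates are stored in UTC.
var minute = new Date(Math.floor(Date.now() / 60000) * 60000);
var prices = db.getSiblingDB("stock-price-db").getCollection("stock-price");
prices.insertMany([
  { time: minute, stock: "TSLA", price: 859.83, marketcap: 8938289305 },
  { time: minute, stock: "AAPL", price: 175.50, marketcap: 3648289305 }
]);
// A compound index supports both per-minute and per-stock range queries,
prices.createIndex({ time: 1, stock: 1 });
// e.g. "everything from the last 24 hours":
prices.find({ time: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) } });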
What are the alternatives?
Not all use-cases can be solved by one design.
If this is purely a time series use case, I would recommend a time series design or a time series database, e.g. InfluxDB or another TSDB.
If you need to cover all the use-cases, normalise the data and use GQL.

Related

Time-span aggregation on MongoDB

Let there be a MongoDB collection Data that holds the history of temperatures of some items. The items look like this:
{
  "itemId": "ABCD",
  "timePoint": NumberLong("1618922410288"),
  "temperature": 15.15
}
meaning that the item "ABCD" had a temperature of 15.15 at 1618922410288 ms since the epoch.
What query will produce the history of span-average temperatures of a given item starting from a given time point?
So, e.g., for itemId="ABCD", span=60*60*1000, startingTimePoint=0, the query should produce the hourly-average temperatures of "ABCD" starting from 1970-01-01 00:00:00.000.
I would also like the query not to be "welded" to the average, but rather to accept an arbitrary aggregation function, so the same technique is easy to use for avg, min, max, or something else. But that is the secondary question.
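One hedged sketch of such a pipeline (my own, not from an answer): floor timePoint onto a multiple of span relative to startingTimePoint, group on that bucket start, and swap the accumulator to change the aggregation function:
// Bucket each reading by flooring timePoint to a span boundary,
// then aggregate per bucket; replace $avg with $min, $max, etc.
var span = 60 * 60 * 1000;   // one hour in ms
var startingTimePoint = 0;
db.Data.aggregate([
  { $match: { itemId: "ABCD", timePoint: { $gte: startingTimePoint } } },
  { $group: {
      _id: { $subtract: [ "$timePoint",
            { $mod: [ { $subtract: [ "$timePoint", startingTimePoint ] }, span ] } ] },
      spanAvg: { $avg: "$temperature" }
  } },
  { $sort: { _id: 1 } }
]);
// Each _id is the start of its span, in ms since the epoch.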

Fetch documents from a MongoDB collection based on timestamp

I have a MongoDB collection with documents that look like this:
{
  "_id": ObjectId("5aab91b2caa256021558f3d2"),
  "Timestamp": "2017-11-16T14:43:07.5357785+01:00",
  "status": 1,
  "created_at": 1521193394,
  "updated_at": 1521193394,
  "deleted_at": ""
}
Data gets entered into the collection every 15 minutes. Using the created_at field, which is in epoch time, I would like to find a way to fetch the data entered at the top of every hour. So if, for example, data is entered at 12:00, 12:15, 12:30, 12:45, 13:00, 13:15, 13:30, 13:45 and 14:00, I would like to fetch the entries that were entered at 12:00, 13:00 and 14:00.
I am also open to suggestions as to whether or not using epoch time is the best way to go about it.
Using epoch time is really a good way to go.
Since you are storing in seconds, every round hour is divisible by 3600 (the number of seconds in an hour) without remainder. You can make use of this property to find your documents:
db.collection.find({created_at: {$mod: [3600, 0]}});
According to the $mod documentation, it will
Select documents where the value of a field divided by a divisor has the specified remainder
We provided 3600 as the divisor and 0 as the remainder. This should give what you expect.
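As a quick sanity check of the arithmetic (hypothetical timestamps on 2018-03-16, the same day as the created_at values above):
var at1300 = Date.UTC(2018, 2, 16, 13, 0, 0) / 1000;  // 1521205200
var at1215 = Date.UTC(2018, 2, 16, 12, 15, 0) / 1000; // 1521202500
at1300 % 3600;  // 0   -> matched by the $mod query
at1215 % 3600;  // 900 -> not matched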
To ignore seconds:
For this, mod(created_at, 3600) should be at most 59, i.e. within the first minute of the hour. This query can be formed using the $expr operator of MongoDB 3.6:
db.collection.find({$expr: {$lte: [{$mod: ['$created_at', 3600]}, 59]}});
Hope this helps!

elasticsearch filter dates based on datetime fields

Assuming I have the following nested document structure, where my document contains nested routes with an array of date-time values:
{
  property_1: ...,
  routes: [
    {
      start_id: 1,
      end_id: 2,
      execution_times: ['2016-08-28T11:11:47+02:00', ...]
    }
  ]
}
Now I could filter my documents that match certain execution_times with something like this:
query: {
  filtered: {
    query: {
      match_all: {}
    },
    filter: {
      nested: {
        path: 'routes',
        filter: {
          bool: {
            must: [
              {
                terms: {
                  'routes.execution_times': ['2016-08-28T11:11:47+02:00', ...]
                }
              },
              ...
            ]
          }
        }
      }
    }
  }
}
But what if I would like to filter my documents based on execution dates? What is the best way of achieving this?
Should I use a range filter to map my dates to time ranges?
Or is it better to use a script query and do a conversion of the execution_times to dates there?
Or is it best to change the document structure to contain both the execution_date and the execution_time?
Update
"The dates are not a range but individual dates like [today, day after tomorrow, 4 days from now, 10 days from now]"
Well, each of these is still a range, since a day spans 24 hours. So if you store your field as a date-time, you can leverage a range query, e.g. from 20-Nov-2010 00:00:00 TO 20-Nov-2010 23:59:59 with the appropriate time zone, for a specific day.
If you store it as a String, you will lose all the flexibility of date maths and will only be able to do exact String matches. You will then have to do all the date manipulation on the client side to find exact matches and ranges.
I suggest playing with range queries using the Sense plugin; I am sure it will satisfy almost all your requirements.
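For concreteness, a hedged sketch of that day range as a nested range query, assuming routes.execution_times is mapped as a date field (the date and the +02:00 offset are just the example values from above):
{
  "query": {
    "nested": {
      "path": "routes",
      "query": {
        "range": {
          "routes.execution_times": {
            "gte": "2010-11-20T00:00:00+02:00",
            "lte": "2010-11-20T23:59:59+02:00"
          }
        }
      }
    }
  }
}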
-----------------------
You should make sure that you use an appropriate date-time mapping for your field and use a range filter over that field. You don't need to split it into 2 separate fields; date maths will allow you to query based on the date alone.
This will also make your life much easier if you want to run aggregations over the date-time field.
Reference:
Date Maths: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math
Date Mapping: https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html
Date Range Queries: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html

Unique hash/index for time interval

I am working on a simple resource booking app. The use of the resource is exclusive, so it can't be booked more than once at the same time. I am wondering if this constraint can be enforced by a unique index instead of having to build validation in code.
The resource can only be booked in blocks of 30 minutes, and the start and end time must be on the hour OR the half hour. So a booking can be modeled as an array of blocks that are unique (dividing the timestamp into chunks of 30 minutes).
Can anyone think of a way to hash that so any booking with one or more 30-min blocks in common would violate the unique index condition?
NB: I am using MongoDB (I don't think it really matters)
I am wondering if this constraint can be enforced by a unique index instead of having to build validation in code.
Use a unique compound index on the resource id, the day and the 30-minute chunk, then insert one document for each 30-minute period of the reservation:
> db.booking.createIndex({resource: 1, day: 1, period: 1}, {unique: true})
For example, to reserve resource id 123 on 6 September 2015 from 8:00 to 9:30 (the 16th, 17th and 18th 30-minute periods of the day, counting from 0), you insert 3 documents:
{
  resource: 123,
  day: ISODate("2015-09-06"),
  period: 16
},
{
  resource: 123,
  day: ISODate("2015-09-06"),
  period: 17
},
{
  resource: 123,
  day: ISODate("2015-09-06"),
  period: 18
}
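As a hedged illustration (my addition, not part of the original answer): once those three documents are in place, any overlapping reservation reuses an existing (resource, day, period) key and fails at insert time, which is what enforces the exclusivity:
db.booking.insertOne({ resource: 123, day: ISODate("2015-09-06"), period: 17 });
// => E11000 duplicate key error (index: resource_1_day_1_period_1)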
Depending on the number of entries, you might consider using embedded documents instead:
> db.resource.createIndex({_id: 1, "booking.day": 1, "booking.period": 1}, {unique: true})
And describe your resources like this:
{
  _id: 123,
  someOtherResourceAttributes: "...",
  booking: [
    {
      day: ISODate("2015-09-06"),
      period: 16
    },
    {
      day: ISODate("2015-09-06"),
      period: 17
    },
    {
      day: ISODate("2015-09-06"),
      period: 18
    }
  ]
}
This has the great advantage that the insert/update is atomic for the whole reservation. But beware that document size is limited to 16MB. (Also note that a unique multikey index only enforces uniqueness across documents, not within a single document's array, so updates that push additional periods into booking still need their own guard, e.g. $addToSet or a query predicate on the update.)

Designing a database for querying by dates?

For example, I need to query the last 6 months by the earliest instance for each day (and the last day by the earliest instance in each hour, and the last day by minute). I was thinking of using MongoDB and having a nested structure like
year: { month: { day: { hour: { minute: [array of seconds] } } } }
But to get the first instance I would have to sort the array, which is costly. Is there an easier way?
It would be better just to have a date field, with a query along the lines of:
db.collection.find({date: {$gt: startingDate, $lt: endingDate}})
where startingDate and endingDate are ISODate values.
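And a minimal sketch of the "earliest instance per day over the last 6 months" part under that flat design (collection and field names are placeholders, and 182 days is a rough 6 months):
// Earliest reading per calendar day, assuming each document
// carries an ISODate in `date`.
var sixMonthsAgo = new Date(Date.now() - 182 * 24 * 60 * 60 * 1000);
db.collection.aggregate([
  { $match: { date: { $gte: sixMonthsAgo } } },
  { $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
      earliest: { $min: "$date" }
  } },
  { $sort: { _id: 1 } }
]);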