After midnight times in postgresql - postgresql

I have data from a text file I'm reading into a postgres 9.1 table, and the data looks like this:
451,22:30:00,22:30:00,San Jose,1
451,22:35:00,22:35:00,Santa Clara,2
451,22:40:00,22:40:00,Lawrence,3
451,22:44:00,22:44:00,Sunnyvale,4
451,22:49:00,22:49:00,Mountain View,5
451,22:53:00,22:53:00,San Antonio,6
451,22:57:00,22:57:00,California Ave,7
451,23:01:00,23:01:00,Palo Alto,8
451,23:04:00,23:04:00,Menlo Park,9
451,23:07:00,23:07:00,Atherton,10
451,23:11:00,23:11:00,Redwood City,11
451,23:15:00,23:15:00,San Carlos,12
451,23:18:00,23:18:00,Belmont,13
451,23:21:00,23:21:00,Hillsdale,14
451,23:24:00,23:24:00,Hayward Park,15
451,23:27:00,23:27:00,San Mateo,16
451,23:30:00,23:30:00,Burlingame,17
451,23:33:00,23:33:00,Broadway,18
451,23:38:00,23:38:00,Millbrae,19
451,23:42:00,23:42:00,San Bruno,20
451,23:47:00,23:47:00,So. San Francisco,21
451,23:53:00,23:53:00,Bayshore,22
451,23:58:00,23:58:00,22nd Street,23
451,24:06:00,24:06:00,San Francisco,24
It is from a timetable for a commuter rail line, Caltrain. I'm trying to query stations, to get train arrival and departure times. I did this several months ago in MySql, and I got
select * from trains as a, trains as b where a.trip_id=b.trip_id and a.st
op_id='San Antonio' and b.stop_id='San Carlos' and a.arrival_time < b.arrival_ti
me;
So far so good, pretty straightforward. However, when I tried copying the data into a postgres database, I got an error for the various columns that had times after midnight, either 24 or 25:00:00 something. However, if I change them to be 00:00:00 and 01:00:00 something, won't that mess with the query? A time after midnight will appear to be before the starting time? MySql apparently didn't have a problem with those times, and I'm not sure what to do. I'm thinking I should use the last column, or maybe convert the times to something that doesn't take into account PM/AM?

You should try using the interval type for the time columns. Those will keep track of the number of hours, minutes, and seconds instead of trying to record a time of day.
See the PostgreSQL documentation on dates and times.
An interval can have a time component greater than 24 hours, unlike the time datatype that is confined to 00:00 <= x <= 23:59.

Related

RethinkDB Filter for range of dates and times

My database table has the timestamp format "2019-12-08T13:03:16.502639-0600". I am having difficult figuring out how to filter a 24 hour range of dates so I only get the data plucked from inside that 24 hour running period of time. I need something like now() then back 24 hours on a running basis. I have tried several ways but none of them work.
Do I need to parse the timestamp first and only use the parts I need?
In Python, the query could look like this:
r.table('MyDB').filter(r.row['timestamp'] > r.now() - 24*60*60).run(con)

How do i divide my minute data into tables containing each month in Timescaledb (PostgreSQL extension)

I am new to timescaledb and I want to store one minute ohlcv ticks for each stock in the table. There are 1440 ticks generated daily for one stock and 43200 ticks a month. I have a 100 assets whose ticks I would like to store every month and basically have the tables divide every 30 days or so, so that I dont have to build complex logic for this division. Any suggestions on how this can be done with timescale DB.
Currently, the way I am doing it is
Take incoming tick (ex timestamp 1535090420)
Round its timestamp to the nearest 30 day period (1535090420/(86400 * 30)) = 592.241674383
Round this number to 592 and multiply by interval to get 1534464000 which is the nearest 30 day bucket inside which all the ticks should be stored So I create a table called OHLC_1534464000 if not exists and add the ticks there
Is there a better way to do this
Unless I'm missing something, it sounds like you just want a table partitioned by 30 day intervals that will automatically know which table to put incoming data into based on the timestamp. This is exactly the functionality TimescaleDB's hypertables.
You can create your table using the following syntax:
SELECT create_hypertable('ohlc', 'timestamp', chunk_time_interval => interval '30 days');
This will create 30 day chunks (just Postgres tables under the hood) and transparently place each tick in the appropriate chunk based on its timestamp. It will automatically create a new chunk once a tick passing the current 30 day period comes in. You can just insert and query on the main table (ohlc) as though it were a single Postgres table.

Need to split out whole and partial hours from a time period (postgresql)

I have a table of data that data that contains a date, time-in, and time-out pulled from a postgresql db table. Currently the table shows the time a person checked in, and out over a period of hours. I need to use postgresql to split out the hours between time-in and time-out such that I capture each full hour (eg. 10am-11am) and each fraction (eg. 4.15pm to 5pm) of the hour during which that person was checked in. So if a person checked in at 9.30am, took a 1 hour lunch break at 12, and checked out at 4.30pm, my current table would show two rows for that member i.e. one row for the time before lunch break, and the time after the lunch break. I want to show each hour (whole or partial) on each row, with 1 representing that 1 whole hour, and the minute portion to capture the partial hour worked.
Below are the before and after images. Any help is appreciated.
Shows that I want to convert from
Shows that I want to convert to
Thanks
try this.
CAST(time_out - time_in AS TIME)

inconsistency between month, day, second representation of interval data type

I understand why postgresql uses month,day and second fields to representate the sql interval datatype. A month is not always the same length and a day can have 23, 24 or 25 hours if a daylight savings time adjustment is involved. this is from postgresql documentation.
But I then do not understand why this is not consequently handled both for months and days. see the following query which calculates an exact interval where the number of seconds between two points in time is exactly calculatable:
select ('2017-01-01'::timestamp-'2016-01-01'::timestamp); -->366 days.
postgresql chooses to give a result in days. not in months and not in seconds.
But why is the result days and not seconds? it is NOT defined how long days are (they can be 23,24 or 25 hours long). so why does he not give output in seconds?
Then since the length of months is also not defined, why doesn't postgresql give an output of 12 month instead of 366 days?
He does not care that the length of days is not defined, but obviously he cares that the length of month is not defined.
Why this asymmetrie?
For further explanation, see this query:
select ('10 days'::interval-'24 hours'::interval); --> 10 days -24:00:00
you see that postgresql correctly refuses to answer with 9 days. He is pretty aware of the problem that days and hours cannot be interchanged. But then again why does the first query return days?
I can't answer your question, but I think I can point you in the right direction. I think the book SQL-99 Complete, Really is the most accessible source for understanding SQL intervals. It's available online: https://mariadb.com/kb/en/sql-99/08-temporal-values/.
SQL standards describe two kinds of intervals: year-month intervals and day-time intervals. It does this to prevent month parts and day parts from appearing in the same interval, because, as you already know, the number of days in a month is ambiguous. The number of days in the interval '3' month depends on which three months you're talking about.
I think this is the verbose, standard SQL way to write your first query.
select cast(timestamp '2017-01-01' - timestamp '2016-01-01' as interval day to hour) as new_column;
new_column
interval day to hour
--
366 days
I suspect that you'll find that SQL standards have rules for what a SQL dbms is supposed to do when things like interval day to hour are omitted. PostgreSQL might or might not follow those rules.
postgresql chooses to give a result in days. not in months and not in seconds.
Standard SQL prevents month parts and day parts from appearing in the same interval. Also, the range of valid seconds is from 0 to 59.
select interval '59' second;
interval
interval second
--
00:00:59
select interval '60' second;
interval
interval second
--
00:01:00

Web analytics schema with postgres

I am building a web analytics tool and use Postgresql as a database. I will not insert postgres each user visit but only aggregated data each 5 seconds:
time country browser num_visits
========================================
0 USA Chrome 12
0 USA IE 7
5 France IE 5
As you can see each 5 seconds I insert multiple rows (one per each dimensions combination).
In order to reduce the number of rows need to be scanned in queries, I am thinking to have multiple tables with the above schema based on their resolution: 5SecondResolution, 30SecondResolution, 5MinResolution, ..., 1HourResolution. Now when the user asks about the last day I will go to the hour resolution table which is smaller than the 5 sec resolution table (although I could have used that one too - it's just more rows to scan).
Now what if the hour resolution table has data on hours 0,1,2,3,... but users asks to see hourly trend from 1:59 to 8:59. In order to get data for the 1:59-2:59 period I could do multiple queries to the different resolutions tables so I get 1:59:2:00 from 1MinResolution, 2:00-2:30 from 30MinResolution and etc. AFAIU I have traded one query to a huge table (that has many relevant rows to scan) with multiple queries to medium tables + combine results on client side.
Does this sound like a good optimization?
Any other considerations on this?
Now what if the hour resolution table has data on hours 0,1,2,3,... but users asks to see hourly trend from 1:59 to 8:59. In order to get data for the 1:59-2:59 period I could do multiple queries to the different resolutions tables so I get 1:59:2:00 from 1MinResolution, 2:00-2:30 from 30MinResolution and etc.
You can't do that if you want your results to be accurate. Imagine if they're asking for one hour resolution from 01:30 to 04:30. You're imagining that you'd get the first and last half hour from the 5 second (or 1 minute) res table, then the rest from the one hour table.
The problem is that the one-hour table is offset by half an hour, so the answers won't actually be correct; each hour will be from 2:00 to 3:00, etc, when the user wants 2:30 to 3:30. It's an even more serious problem as you move to coarser resolutions.
So: This is a perfectly reasonable optimisation technique, but only if you limit your users' search start precision to the resolution of the aggregated table. If they want one hour resolution, force them to pick 1:00, 2:00, etc and disallow setting minutes. If they want 5 min resolution, make them pick 1:00, 1:05, 1:10, ... and so on. You don't have to limit the end precision the same way, since an incomplete ending interval won't affect data prior to the end and can easily be marked as incomplete when displayed. "Current day to date", "Hour so far", etc.
If you limit the start precision you not only give them correct results but greatly simplify the query. If you limit the end precision too then your query is purely against the aggregated table, but if you want "to date" data it's easy enough to write something like:
SELECT blah, mytimestamp
FROM mydata_1hour
WHERE mytimestamp BETWEEN current_date + INTERVAL '1' HOUR AND current_date + INTERVAL '4' HOUR
UNION ALL
SELECT sum(blah), current_date + INTERVAL '5' HOUR
FROM mydata_5second
WHERE mytimestamp BETWEEN current_date + INTERVAL '4' HOUR AND current_date + INTERVAL '5' HOUR;
... or even use several levels of union to satisfy requests for coarser resolutions.
You could use inheritance/partition. One resolution master table and many hourly resolution children tables ( and, perhaps, many minutes and seconds resolution children tables).
Thus you only have to select from the master table only, let the constraint of each children tables decide which is which.
Of course you have to add a trigger function to separate insert into appropriate children tables.
Complexities in insert versus complexities in display.
PostgreSQL - View or Partitioning?