inconsistency between month, day, second representation of interval data type - postgresql

I understand why PostgreSQL uses month, day and second fields to represent the SQL interval data type: according to the PostgreSQL documentation, a month is not always the same length, and a day can have 23, 24 or 25 hours if a daylight saving time adjustment is involved.
But I do not understand why this is not handled consistently for both months and days. See the following query, which computes an interval between two points in time where the number of seconds is exactly calculable:
select ('2017-01-01'::timestamp-'2016-01-01'::timestamp); -->366 days.
postgresql chooses to give a result in days. not in months and not in seconds.
But why is the result in days and not seconds? The length of a day is NOT defined (it can be 23, 24 or 25 hours), so why isn't the output in seconds?
And since the length of a month is also not defined, why doesn't PostgreSQL output 12 months instead of 366 days?
It does not care that the length of a day is undefined, but it obviously cares that the length of a month is undefined.
Why this asymmetry?
For further explanation, see this query:
select ('10 days'::interval-'24 hours'::interval); --> 10 days -24:00:00
You can see that PostgreSQL correctly refuses to answer with 9 days. It is well aware that days and hours cannot be interchanged. But then why does the first query return days?

I can't answer your question, but I think I can point you in the right direction. I think the book SQL-99 Complete, Really is the most accessible source for understanding SQL intervals. It's available online: https://mariadb.com/kb/en/sql-99/08-temporal-values/.
The SQL standards describe two kinds of intervals: year-month intervals and day-time intervals. They do this to prevent month parts and day parts from appearing in the same interval because, as you already know, the number of days in a month is ambiguous. The number of days in interval '3' month depends on which three months you're talking about.
I think this is the verbose, standard SQL way to write your first query.
select cast(timestamp '2017-01-01' - timestamp '2016-01-01' as interval day to hour) as new_column;
new_column
interval day to hour
--
366 days
I suspect that you'll find that SQL standards have rules for what a SQL dbms is supposed to do when things like interval day to hour are omitted. PostgreSQL might or might not follow those rules.
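As a rough illustration of what PostgreSQL itself does with the queries from the question (a sketch based on PostgreSQL's documented behavior, not on the SQL standard): subtracting two timestamps yields days plus a time part and never a month field, whereas age() does produce years and months. The literals are the ones from the question.

```sql
-- Timestamp subtraction: whole 24-hour blocks are reported as days.
select timestamp '2017-01-01' - timestamp '2016-01-01';          -- 366 days

-- age() returns a "symbolic" result in years, months and days instead.
select age(timestamp '2017-01-01', timestamp '2016-01-01');      -- 1 year

-- The exact length in seconds (366 * 86400 = 31622400).
select extract(epoch from timestamp '2017-01-01' - timestamp '2016-01-01');

-- Interval arithmetic keeps the fields separate; justify_hours() folds
-- 24-hour blocks into days only when explicitly asked to.
select interval '10 days' - interval '24 hours';                 -- 10 days -24:00:00
select justify_hours(interval '10 days' - interval '24 hours');  -- 9 days
```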
Regarding your statement "postgresql chooses to give a result in days. not in months and not in seconds.":
Standard SQL prevents month parts and day parts from appearing in the same interval. Also, the range of valid seconds is from 0 to 59.
select interval '59' second;
interval
interval second
--
00:00:59
select interval '60' second;
interval
interval second
--
00:01:00

Related

How can I always get the full period when grouping by week in PostgreSQL?

I usually use the following syntax when analysing weekly data:
select week(creation_date)::date as week,
count(*) as n
from table_1
where creation_date > current_date - 30
group by 1
However, by doing this I get only part of the first week.
Is there a smart way to always get a whole week at the beginning?
For example, by starting from the first day of the week that I would otherwise get only half of.
First off, you need to define what you mean by "week". This is more difficult than it appears. While humans have an intuitive sense of a week, computers are just not that smart. There are 2 common conventions: the ISO-8601 standard and, for lack of a better term, Traditional. ISO-8601 defines a week as always beginning on Monday and always containing 7 days. Traditional weeks begin on Sunday (usually) but may have fewer than 7 days. This results from the 1st week of the year beginning on 1-Jan regardless of the day of the week, so the 1st and/or last weeks may have fewer than 7 days. ISO-8601 throws its own curve into the mix: the 1st week of the year is the week containing 4-Jan. Thus the last days of Dec may be in week 1 of the next year, and the first days of Jan may be in week 52/53 of the prior year.
All of the below assumes ISO-8601.
Secondly, there is no week function in Postgres. Instead you need the extract function. So for this particular case:
select extract(week from creation_date)::integer as week, ...
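To make the year-boundary behavior concrete, here is a small sketch using dates that the PostgreSQL documentation itself cites for ISO week numbering. The column aliases are only illustrative.

```sql
-- 2005-01-01 (a Saturday) still belongs to the last ISO week of 2004.
select extract(week from date '2005-01-01')    as week,      -- 53
       extract(isoyear from date '2005-01-01') as iso_year;  -- 2004

-- 2012-12-31 (a Monday) already belongs to the first ISO week of 2013.
select extract(week from date '2012-12-31')    as week,      -- 1
       extract(isoyear from date '2012-12-31') as iso_year;  -- 2013
```

If a query can span a year boundary, grouping by week number alone can merge weeks from different years, so it may be worth grouping by isoyear as well.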
Finally, your predicate (current_date - 30) means you will usually not begin on the 1st day of a week. To get the correct date, take that result back 1 week, then go forward to the next Monday.
with days_to_monday (day_adj) as
( values ('{7,6,5,4,3,2,1}'::int[]) )
select current_date - 30
, current_date - 30 - 7 + day_adj[extract (isodow from current_date - 30 )]
from table_1 cross join days_to_monday;
The CTE establishes an array which, for a given ISO day of the week (1 = Monday through 7 = Sunday), contains the number of days needed to reach the next Monday. The main query extracts the day of the week of current_date - 30 and uses it to index the array. The corresponding value is added to get the proper date.
Putting that together with your original query, we arrive at:
with next_week (monday) as
( values (current_date - 30 - 7
+ ('{7,6,5,4,3,2,1}'::int[])[extract (isodow from current_date - 30 )])
)
select extract(week from creation_date) as week,
count(*) as n
from table_1
where creation_date >= (select monday from next_week)
group by 1
order by 1;
For full example see fiddle.
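As a possible alternative (not the approach used above), date_trunc('week', ...) in PostgreSQL truncates to the Monday that starts the ISO week, so the array lookup can be avoided. This sketch reuses table_1 and creation_date from the question; the alias week_start is made up for illustration.

```sql
-- Group by the Monday that starts each week, beginning with the
-- Monday on or before (current_date - 30).
select date_trunc('week', creation_date)::date as week_start,
       count(*) as n
from table_1
where creation_date >= date_trunc('week', current_date - 30)::date
group by 1
order by 1;
```

Grouping by the week's start date rather than by week number also keeps weeks from different years apart.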

How do I compare two TIMESTAMP columns to check for a difference of at most 15 minutes?

I'm using Postgres 9.5. I have a column in my table, article, of type TIMESTAMP. I would like to write a query in which one of the conditions compares two articles whose dates are separated by at most 15 minutes. I tried this ...
where extract(minute from a2.created_on - a1.created_on) < 15
but I'm realizing this is incorrect. This returns articles separated by 15 minutes, but also articles separated by an hour and 15 minutes, two hours and 15 minutes, etc. How do I refine my condition so that it only considers articles separated by at most 15 minutes?
It should be simpler:
WHERE a2.created_on - a1.created_on < '15min'
The difference of two timestamp values is an interval value.
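A short sketch of why the extract() version misfires, followed by one way to make the check symmetric so that it does not matter which article came first. The table article and column created_on come from the question; comparing every pair with a self-join is an assumption about the intended query.

```sql
-- extract(minute ...) returns only the minutes field, not the total length.
select extract(minute from interval '1 hour 10 minutes');       -- 10
select interval '1 hour 10 minutes' < interval '15 minutes';    -- false

-- Pairs of articles created within 15 minutes of each other, in either order.
-- Note that each row also pairs with itself unless that is filtered out.
select a1.created_on, a2.created_on
from article a1
join article a2
  on a2.created_on between a1.created_on - interval '15 minutes'
                       and a1.created_on + interval '15 minutes';
```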

How to tweak the SET intervalstyle (change the Interval Output) in PostgreSQL?

I have read this online PostgreSQL documentation: http://www.postgresql.org/docs/9.4/static/datatype-datetime.html#INTERVAL-STYLE-OUTPUT-TABLE
Section 8.5.5 says something about how to tweak the default interval output. I mean, the default interval is shown like this:
00:00:00.000 (if the timedifference is lower than a day or month or year)
1 day 00:00:00.000 (if the timedifference reaches days, but is lower than a month or a year)
1 month 1 day 00:00:00.000 (if the timediffence reaches months, but is lower than a year)
1 year 1 month 1 day 00:00:00.000 (if it reaches years, months, days)
It even uses plural forms (years, mons, days) when the values are greater than one.
All these variations make it difficult for any other app that SELECTs these interval values (as text) to convert them to a proper time. So I would like PostgreSQL to always show years, months and days, even if their values are 0 (it would be even better if it could show the date part of the interval like this: 01-11-30, padding with zeros on the left when values are less than ten).
I know I can convert the interval to text using to_char(), but I would really like to avoid that. I would like some good fellow PostgreSQL programmer to tell me whether it is true that there is a way to tweak the interval output, as the PostgreSQL documentation says.
Thanks in advance.
PS: two more links about the subject
https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/SQLReferenceManual/Statements/SET/SETDATESTYLE.htm
http://my.vertica.com/docs/6.1.x/HTML/index.htm#13874.htm
You can set the interval output style, but only to one of a few pre-defined formats that are unambiguous on input and that PostgreSQL knows how to parse back into intervals. Per the documentation, these are the SQL standard interval format, two variants of PostgreSQL-specific syntax, and iso_8601 intervals.
If you want something familiar and easy to parse, consider using:
SET intervalstyle = 'iso_8601'
and using an off-the-shelf parser for ISO 8601 intervals.
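For reference, a brief sketch of what the iso_8601 style produces. The first two values are the examples from the interval output table in the PostgreSQL documentation; the setting applies to the current session only.

```sql
set intervalstyle = 'iso_8601';

select interval '1 year 2 mons';     -- P1Y2M
select interval '3 days 04:05:06';   -- P3DT4H5M6S
select interval '0';                 -- PT0S

-- Return to the default output style.
set intervalstyle = 'postgres';
```

Note that zero-valued fields are omitted in this style too, so it does not give the fixed-width output asked for in the question, only a format that standard parsers understand.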

After midnight times in postgresql

I have data from a text file I'm reading into a postgres 9.1 table, and the data looks like this:
451,22:30:00,22:30:00,San Jose,1
451,22:35:00,22:35:00,Santa Clara,2
451,22:40:00,22:40:00,Lawrence,3
451,22:44:00,22:44:00,Sunnyvale,4
451,22:49:00,22:49:00,Mountain View,5
451,22:53:00,22:53:00,San Antonio,6
451,22:57:00,22:57:00,California Ave,7
451,23:01:00,23:01:00,Palo Alto,8
451,23:04:00,23:04:00,Menlo Park,9
451,23:07:00,23:07:00,Atherton,10
451,23:11:00,23:11:00,Redwood City,11
451,23:15:00,23:15:00,San Carlos,12
451,23:18:00,23:18:00,Belmont,13
451,23:21:00,23:21:00,Hillsdale,14
451,23:24:00,23:24:00,Hayward Park,15
451,23:27:00,23:27:00,San Mateo,16
451,23:30:00,23:30:00,Burlingame,17
451,23:33:00,23:33:00,Broadway,18
451,23:38:00,23:38:00,Millbrae,19
451,23:42:00,23:42:00,San Bruno,20
451,23:47:00,23:47:00,So. San Francisco,21
451,23:53:00,23:53:00,Bayshore,22
451,23:58:00,23:58:00,22nd Street,23
451,24:06:00,24:06:00,San Francisco,24
It is from a timetable for a commuter rail line, Caltrain. I'm trying to query stations, to get train arrival and departure times. I did this several months ago in MySql, and I got
select * from trains as a, trains as b where a.trip_id=b.trip_id and a.stop_id='San Antonio' and b.stop_id='San Carlos' and a.arrival_time < b.arrival_time;
So far so good, pretty straightforward. However, when I tried copying the data into a postgres database, I got an error for the various columns that had times after midnight, either 24 or 25:00:00 something. However, if I change them to be 00:00:00 and 01:00:00 something, won't that mess with the query? A time after midnight will appear to be before the starting time? MySql apparently didn't have a problem with those times, and I'm not sure what to do. I'm thinking I should use the last column, or maybe convert the times to something that doesn't take into account PM/AM?
You should try using the interval type for the time columns. Those will keep track of the number of hours, minutes, and seconds instead of trying to record a time of day.
See the PostgreSQL documentation on dates and times.
An interval can have a time component greater than 24 hours, unlike the time data type, which is confined to a single day (00:00:00 through 24:00:00).
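A minimal sketch of the interval approach, using two rows from the sample data. The table name trains_demo and the column names departure_time and stop_sequence are assumptions; trip_id, arrival_time and stop_id follow the query in the question.

```sql
-- Times are stored as offsets from the start of the service day,
-- so values past 24:00:00 are accepted and still sort correctly.
create temp table trains_demo (
    trip_id        integer,
    arrival_time   interval,
    departure_time interval,
    stop_id        text,
    stop_sequence  integer
);

insert into trains_demo values
    (451, '23:58:00', '23:58:00', '22nd Street',   23),
    (451, '24:06:00', '24:06:00', 'San Francisco', 24);

select a.stop_id as from_stop,
       b.stop_id as to_stop,
       b.arrival_time - a.arrival_time as travel_time   -- 00:08:00
from trains_demo as a
join trains_demo as b on a.trip_id = b.trip_id
where a.stop_id = '22nd Street'
  and b.stop_id = 'San Francisco'
  and a.arrival_time < b.arrival_time;
```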

Can't understand values being returned by Facebook Insights API

I don't understand the way the API returns values. Here's a sample of a page_impressions call, with 'week' as the period.
"values"=>
[{"end_time"=>"2012-01-08T08:00:00+0000", "value"=>1116},
{"end_time"=>"2012-01-09T08:00:00+0000", "value"=>1171},
{"end_time"=>"2012-01-10T08:00:00+0000", "value"=>1175}]
It seems that they're showing how many hits I had in the last 7 days up to the date in "end_time", is that correct? If it is, then I don't understand what use this would have, there is a huge overlap in the data.
How can I get the number of impressions of the last weeks instead? And how can I get more than 3 values to display? I really can't understand the logic behind this or how it could be useful.
What's happening here is that you're being given the total number of page_impressions for the 7-day period ending on each of the dates shown (i.e., how many times was the page seen over the past 7 days, assuming the week ended on the end_time? And then on end_time+1? end_time+2?).
Facebook is returning three (3) separate readings, presumably so you can spot/review very local trends (e.g., "are my weekly impressions creeping up?") or perhaps because you missed a measurement and want to have values for every day.
To answer your question specifically:
The 7-day period 2012-01-01 through 2012-01-08 12:00am* had 1,116 impressions.
The 7-day period 2012-01-02 through 2012-01-09 12:00am* had 1,171 impressions.
The 7-day period 2012-01-03 through 2012-01-10 12:00am* had 1,175 impressions.
As is quoted below, the end_time itself is always midnight in PDT. Thus, an end_time of 2012-01-08 really means the measurement stopped the night before, i.e., at 1 minute past 11:59pm on 2012-01-07.
From https://developers.facebook.com/docs/reference/fql/insights/:
The end of the period during which the metrics were collected,
expressed as a UNIX time (which should always be midnight, Pacific
Daylight Time) or using the function end_time_date() which takes a
date string in 'YYYY-MM-DD' format. Note: If the UNIX time provided is
not midnight, Pacific Daylight Time, your query may return an empty
resultset. Example: To obtain data for the 24-hour period starting on
September 15th at 00:00 (i.e. 12:00 midnight) and ending on September
16th at 00:00 (i.e. 12:00 midnight), specify 1284620400 as the
end_time and 86400 as the period. Note: end_time should not be
specified when querying lifetime metrics.