PostgreSQL - return interval as time type

In a form we have, people enter an age by first specifying a category (minutes, hours, days, months, years) and then entering an integer value. The problem with this practice is that ages become inconsistent: 18 months is greater than 1 year, which makes statistical analysis difficult.
Internally I need to store this data not as two integers (as it is currently stored) but as an interval and an integer. The second integer indicates the original unit.
Here is the question: is there a simple way to tell PostgreSQL to return this interval in the unit it was entered in? For instance, the user enters 18 months. Right now, it is returned as 1 year 6 months, which can't be redisplayed on the form. I know from the other integer which unit they entered, so there will be no fractional component, i.e. if they entered it in days, there will be no hours, etc.
Ideally it would be something like
SELECT as_time((person.age).value,'months')
Note that these intervals are entered as-is, so the months would not ever be anything other than the 30-day interval 'months' (representational months). If need be I can use a plpgsql function for it, since I'll need to do a lookup on the time type code; but the strict requirement in all cases is that if they entered some time, they get back exactly that time, no rounding, no 'correction', etc.

WITH x(unit, t) AS (
VALUES
(1, interval '1 month')
,(1, interval '2 month')
,(1, interval '3 month')
,(1, interval '12 month')
,(1, interval '13 month')
,(1, interval '18 month')
,(1, interval '24 month')
,(1, interval '30 month')
,(1, interval '300 month')
,(2, interval '1 year')
,(2, interval '2 year')
,(2, interval '5 year')
,(2, interval '100 year')
)
SELECT CASE unit
WHEN 1 THEN EXTRACT(year FROM t)::int * 12
+ EXTRACT(month FROM t)::int
WHEN 2 THEN EXTRACT(year FROM t)::int
ELSE -1 -- should not occur
END AS units
,CASE unit
WHEN 1 THEN 'months'
WHEN 2 THEN 'years'
ELSE 'unknown' -- should not occur
END as unit
FROM x;
See EXTRACT() in the manual.
You could try to extract the epoch value (count of seconds) and convert it, like @Magnus demonstrates here, but EXTRACT(epoch ...) counts every whole year as 365.25 days and each leftover month as 30 days to account for the irregular length of months, so the calculation would be off with bigger intervals.
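To make the storage model concrete, here is a small Python model (not PostgreSQL code) of an interval as the three independent fields PostgreSQL keeps: months, days and microseconds. The names `Interval`, `as_unit` and `epoch_seconds` are hypothetical. Because input like '18 months' is stored as months = 18 and is not justified, reading back the field that matches the recorded unit returns exactly what was entered; the epoch route, by contrast, mixes 365.25-day years with 30-day months.

```python
from dataclasses import dataclass

USECS_PER_SEC = 1_000_000
SECS_PER_DAY = 86_400


@dataclass(frozen=True)
class Interval:
    """Hypothetical model of PostgreSQL's interval storage (non-negative values assumed)."""
    months: int = 0
    days: int = 0
    usecs: int = 0  # time part, in microseconds


def as_unit(iv: Interval, unit: str) -> int:
    """Return the value in the unit it was entered in, reading the matching field."""
    if unit == "years":
        return iv.months // 12
    if unit == "months":
        return iv.months  # '18 months' stays 18; only the display splits it into years
    if unit == "days":
        return iv.days
    if unit == "hours":
        return iv.usecs // (3600 * USECS_PER_SEC)
    if unit == "minutes":
        return iv.usecs // (60 * USECS_PER_SEC)
    raise ValueError(f"unknown unit: {unit}")


def epoch_seconds(iv: Interval) -> float:
    """Mimic EXTRACT(epoch FROM interval): whole years count as 365.25 days,
    leftover months as 30 days, days as 24 hours."""
    years, months = divmod(iv.months, 12)
    return (years * 365.25 * SECS_PER_DAY
            + months * 30 * SECS_PER_DAY
            + iv.days * SECS_PER_DAY
            + iv.usecs / USECS_PER_SEC)


if __name__ == "__main__":
    eighteen = Interval(months=18)
    print(as_unit(eighteen, "months"))  # 18
    # Dividing epochs does not give 18 back: (365.25 + 6 * 30) / 30
    print(epoch_seconds(eighteen) / epoch_seconds(Interval(months=1)))
```

This is why the CASE/EXTRACT approach above, which reads the year and month fields directly, gives back exact values while an epoch-based conversion drifts.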


Grouping data by quarter intervals (or any time interval) with a defined starting basis in postgresql

Let's say I have a table orders with columns amount and order_date.
I want to be able to group this data by quarters and aggregate the amount. The catch, however, is that the quarters do not start on January 1st but on an arbitrary date, say July 12th, and each quarter is 13 weeks long. From what I see, using date_trunc, such as:
SELECT SUM(orders.amount), DATE_TRUNC('quarter', orders.order_date) AS interval FROM orders WHERE orders.order_date BETWEEN [date_start] AND [date_end] GROUP BY interval
is out of the question as this forces quarters to start on Jan 1st and it has 'hardcoded' quarter starting dates (Apr 1st, Jul 1st, etc).
I have tried using something like:
SELECT SUM(orders.amount),
to_timestamp(floor(extract('epoch' from orders.order_date) / 7862400) * 7862400) AT TIME ZONE 'UTC' AS interval
FROM orders
WHERE orders.order_date BETWEEN [date_start] AND [date_end]
GROUP BY interval
(where 7862400 is the length of the interval I want in seconds, i.e. 13 weeks)
But with this method I cannot figure out how to set the offset for the initial grouping date, in my example I would like it to start from July 12th of each year (then count 13 weeks and start the next quarter, and so on). Hope I was clear and I would appreciate any help!
You can use generate_series() to create the first day of each quarter, join it and group by it.
SELECT quarters.first_day,
quarters.first_day + '13 weeks'::interval last_day,
sum(orders.amount) amount
FROM orders
LEFT JOIN generate_series('2019-07-12'::timestamp,
'2020-07-10'::timestamp,
'13 weeks'::interval) quarters (first_day)
ON quarters.first_day <= orders.order_date
AND quarters.first_day + '13 weeks'::interval > orders.order_date
WHERE orders.order_date BETWEEN [date_start]
AND [date_end]
GROUP BY quarters.first_day,
quarters.first_day + '13 weeks'::interval;
You just need to make sure that the boundary days you give generate_series() cover the whole period you want to query, so that depends on your [date_start] and [date_end].
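If a fixed-length 13-week period is all that is needed, the bucket can also be computed arithmetically instead of joined: subtract the anchor date, floor-divide by 13 weeks, and add that many periods back. A Python sketch of the arithmetic; `quarter_start` and `sum_by_quarter` are hypothetical names, and the anchor 2019-07-12 comes from the question.

```python
from collections import defaultdict
from datetime import date, timedelta

QUARTER = timedelta(weeks=13)  # 91 days = 7862400 seconds


def quarter_start(d: date, anchor: date) -> date:
    """First day of the 13-week period containing d, counting from anchor.
    timedelta // timedelta floors, so dates before the anchor work too."""
    n = (d - anchor) // QUARTER
    return anchor + n * QUARTER


def sum_by_quarter(orders, anchor: date) -> dict:
    """Group (order_date, amount) pairs by quarter start and sum the amounts."""
    totals = defaultdict(int)
    for order_date, amount in orders:
        totals[quarter_start(order_date, anchor)] += amount
    return dict(totals)


if __name__ == "__main__":
    anchor = date(2019, 7, 12)
    orders = [(date(2019, 7, 12), 10), (date(2019, 10, 10), 5), (date(2019, 10, 11), 7)]
    print(sum_by_quarter(orders, anchor))
    # {datetime.date(2019, 7, 12): 15, datetime.date(2019, 10, 11): 7}
```

The second period starts on 2019-10-11, the same date generate_series produces for '2019-07-12' + '13 weeks'.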
You can generate your own 'quarterly calendar' and use that in place of the Postgres 'quarter' date extraction.
create or replace function quarterly_calendar(annual_date text default extract('YEAR' from current_date)::text)
returns table( quarter integer
, quarter_start_date date
, quarter_end_date date
)
language sql immutable strict
as $$
with RECURSIVE quarters as
-- quarter 1 starts on July 12 of the given year and ends 90 days later (91 days inclusive)
(select 1 qtr, qdt::date q_start_dt, (qdt + interval '90 day' )::date q_end_dt, (qdt+interval '1 year' - interval '1 day')::date last_dt
from ( select make_date(annual_date::int, 1, 1) + interval '6 month 11 day' qdt) q
union all
-- each following quarter starts the day after the previous one ends; the 5th is capped at last_dt
select qtr+1, (q_end_dt + interval '1 day')::date, least ((q_end_dt + interval '91 day')::date,last_dt), last_dt
from quarters
where qtr+1 <=5
)
select qtr, q_start_dt, q_end_dt
from quarters;
$$;
-- test
select * from quarterly_calendar();
It does actually create 5 quarters. That is because a year is not a multiple of 13 weeks (91 days, or 7862400 seconds). Your given year, from 12 July 2019 through 11 July 2020, has 366 days, which is 2 days more than 4 times that interval. You'll have to decide how to handle that 5th quarter. It occurs every year, with either 1 or 2 days. Hope this helps.
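The leftover arithmetic can be checked with a few lines of Python (a sketch; `leftover_days` is a hypothetical helper): a year starting on 12 July contains 365 or 366 days depending on whether it spans a 29 February, and four 91-day quarters cover only 364 of them.

```python
from datetime import date

QUARTER_DAYS = 91  # 13 weeks


def leftover_days(start: date) -> int:
    """Days of the year beginning at `start` not covered by four 13-week quarters.
    Assumes start is not 29 February (replace() would fail in non-leap years)."""
    next_start = start.replace(year=start.year + 1)
    return (next_start - start).days - 4 * QUARTER_DAYS


if __name__ == "__main__":
    print(leftover_days(date(2019, 7, 12)))  # 2 (this year includes 29 Feb 2020)
    print(leftover_days(date(2020, 7, 12)))  # 1
```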

How to calculate how many intervals fit in a given date range? (simpler version)

I can write:
select count(*) from generate_series(
'2019-03-01'::date, '2019-05-01'::date,
interval '3 day 1 hour'
)
-- exclude upper boundary
where generate_series <> date '2019-05-01';
Is there a simpler way, something like:
daterange( '2019-03-01', '2019-05-01' ) / interval '3 day 1 hour'
You can use
EXTRACT(epoch FROM some_interval)
to get an interval's duration in seconds.
You could use that as follows:
SELECT EXTRACT(epoch FROM '2019-05-01'::timestamptz - '2019-03-01'::timestamptz)
/ EXTRACT(epoch FROM interval '3 day 1 hour');
Note that this only gives correct answers for intervals measured in days or smaller units; for months and larger units you have to stick with your original solution.
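For reference, the same calculation in Python (a sketch with naive datetimes, so daylight-saving shifts play no part; `count_steps` is a hypothetical name). Dividing the two durations gives the fractional ratio the epoch query returns; rounding it up counts the values start, start + step, ... that fall strictly before the upper bound, which for this example matches what the question's generate_series query counts once the upper boundary is excluded.

```python
import math
from datetime import datetime, timedelta


def count_steps(start: datetime, end: datetime, step: timedelta) -> int:
    """Number of values start, start + step, ... that are strictly before end."""
    return math.ceil((end - start) / step)  # timedelta / timedelta -> float


if __name__ == "__main__":
    start, end = datetime(2019, 3, 1), datetime(2019, 5, 1)
    step = timedelta(days=3, hours=1)
    print((end - start) / step)           # fractional ratio, about 20.05
    print(count_steps(start, end, step))  # 21
```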

Compare day in current month to same day previous month PostgreSQL

I'm trying to compare values of current month's data to previous months using PostgreSQL. So if today is 4/23/2018, I want the data for 3/23/2018.
I've tried current_date - interval '1 month' but it is problematic for months with 31 days.
My table structure is simply:
date, value
Check this example query:
WITH dates AS (SELECT date::date FROM generate_series('2018-01-01'::date, '2018-12-31'::date, INTERVAL '1 day') AS date)
SELECT
start_dates.date AS start_date,
end_dates.date AS end_date
FROM
dates AS start_dates
RIGHT JOIN dates AS end_dates
ON ( start_dates.date + interval '1 month' = end_dates.date AND
end_dates.date - interval '1 month' = start_dates.date);
It will output all end_dates and the corresponding start_dates. The corresponding dates are defined by interval '1 month' and checked in both directions:
start_dates.date + interval '1 month' = end_dates.date AND
end_dates.date - interval '1 month' = start_dates.date
The output looks like this:
....
2018-02-26 2018-03-26
2018-02-27 2018-03-27
2018-02-28 2018-03-28
2018-03-29
2018-03-30
2018-03-31
2018-03-01 2018-04-01
2018-03-02 2018-04-02
2018-03-03 2018-04-03
2018-03-04 2018-04-04
....
Note that there are 'gaps' for days without a corresponding date.
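To see where the gaps come from, here is a Python model of the two-way join condition. `add_months` and `matched_start` are hypothetical helpers; `add_months` clamps the day to the length of the target month, which is how PostgreSQL's date + interval '1 month' behaves (for example, 2018-03-31 minus one month gives 2018-02-28). An end date only keeps a partner if going back a month and then forward a month lands on the same day.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, n: int) -> date:
    """Shift d by n calendar months, clamping the day to the target month's length."""
    y, m = divmod(d.month - 1 + n, 12)
    year, month = d.year + y, m + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def matched_start(end: date) -> Optional[date]:
    """Start date paired with `end` by the two-way join condition, or None (a gap)."""
    start = add_months(end, -1)
    return start if add_months(start, 1) == end else None


if __name__ == "__main__":
    for day in (28, 29, 30, 31):
        print(date(2018, 3, day), matched_start(date(2018, 3, day)))
    # 2018-03-28 2018-02-28, then None for the 29th, 30th and 31st
```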
Back to your table: join the table with itself (using aliases) and use the same join condition, so the query would look like this:
SELECT
end_dates.value - start_dates.value AS change,
start_dates.date AS start_date,
end_dates.date AS end_date
FROM
_your_table_name_ AS start_dates
RIGHT JOIN _your_table_name_ AS end_dates
ON ( start_dates.date + interval '1 month' = end_dates.date AND
end_dates.date - interval '1 month' = start_dates.date);
Given the following table structure:
create table t (
d date,
v int
);
After populating it with some dates and values, there is a way to find the value of the previous month using simple calculations and the LAG function, without resorting to joins. Note that LAG counts rows, so this assumes the table has exactly one row per day with no gaps. I am not sure how it compares from a performance perspective, so please run your own tests before selecting which solution to use.
select
*,
lag(v, day_of_month) over (order by d) as v_end_of_last_month,
lag(v, last_day_of_previous_month + day_of_month - cast(extract(day from d - interval '1 month') as int)) over (order by d) as v_same_day_last_month
from (
select
*,
lag(day_of_month, day_of_month) over (order by d) as last_day_of_previous_month
from (
select
*,
cast(extract(day from d) as int) as day_of_month
from
t
) t_dom
) t_dom_ldopm;
You may note that between the 29th and 31st of March, the comparison will be made against the 28th of February, since the same day does not exist in February for those particular dates. The same logic applies to other months with different numbers of days.
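The row offset in the LAG query can be checked outside the database. Assuming exactly one row per day, going back last_day_of_previous_month + day_of_month - (day of d - 1 month) rows lands on the same day of the previous month, clamped to that month's last day. A Python sketch of that arithmetic (`lag_offset` is a hypothetical name):

```python
from datetime import date, timedelta


def lag_offset(d: date) -> int:
    """Rows to go back for v_same_day_last_month, assuming one row per day."""
    day_of_month = d.day
    # Last day of the previous month: step back day_of_month days from d,
    # just as lag(day_of_month, day_of_month) does on a gap-free table.
    last_day_prev = (d - timedelta(days=day_of_month)).day
    # Day number of d - interval '1 month' (clamped to the shorter month).
    same_day_prev = min(day_of_month, last_day_prev)
    return last_day_prev + day_of_month - same_day_prev


if __name__ == "__main__":
    for d in (date(2018, 3, 23), date(2018, 3, 31), date(2018, 5, 31)):
        print(d, d - timedelta(days=lag_offset(d)))
    # 2018-03-23 2018-02-23 / 2018-03-31 2018-02-28 / 2018-05-31 2018-04-30
```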

postgres '1 year' equals '360 days'?

Am wondering if anyone else has encountered this or knows information about it.
Today is November 3, 2014, and if I check whether or not November 5, 2013 is within the last year, I get different answers depending on how I check: 1 year versus 365 days
select now() - '20131105' as diff,
case when now() - '20131105' <= '1 year' then 'within year' else 'not within year' end as yr_check,
case when now() - '20131105' <= '365 days' then 'within 365 days' else 'not within 365 days' end as day_check
2014-11-03 16:27:38.39669-06; 363 days 16:27:38.39669; not within year; within 365 days
When querying against November 9, though, it looks OK:
select now() as right_now, now() - '20131109' as diff,
case when now() - '20131109' <= '1 year' then 'within year' else 'not within year' end as yr_check,
case when now() - '20131109' <= '365 days' then 'within 365 days' else 'not within 365 days' end as day_check
2014-11-03 16:31:12.464469-06; 359 days 16:31:12.464469; within year; within 365 days
Does anyone have an idea about this? Or is there something about date arithmetic that's funny?
postgres version is 9.2.4
or is there something about date arithmetic that's funny?
It's funny alright, but not in the way that makes you laugh.
Twelve months has to equal a year doesn't it?
=> SELECT '12 months'::interval = '1 year'::interval;
?column?
----------
t
Good. Makes sense. Hmm - wonder how long a month is.
=> SELECT '30 days'::interval = '1 month'::interval;
?column?
----------
t
Fair enough. Suppose they had to pick something.
Hmm - but that means...
=> SELECT '360 days'::interval = '12 months'::interval;
?column?
----------
t
Which seems to imply...
=> SELECT '360 days'::interval = '1 year'::interval;
?column?
----------
t
That can't be right! What they need to do is have a month equal to 30.41666 days. No hang on, what about leap years? Hmm - does this affect weeks? AARGH!
Basically, you can't convert sensibly between time units. There aren't 60 seconds in a minute, or 24 hours in a day, 52 weeks in a year or even 365 days. Unfortunately, humans (particularly customer-shaped humans) like converting between time units so we end up with a mess like this.
PostgreSQL's system is no more loony than any other and in fact is better than most.
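A quick way to see why these comparisons come out the way they do: when PostgreSQL compares two intervals, it collapses the months, days and time fields into a single number using 1 month = 30 days and 1 day = 24 hours. A Python model of that rule (`interval_cmp_key` is a hypothetical name, not a PostgreSQL function):

```python
USECS_PER_DAY = 86_400 * 1_000_000
DAYS_PER_MONTH = 30  # fixed conversion used when comparing intervals


def interval_cmp_key(months: int = 0, days: int = 0, usecs: int = 0) -> int:
    """Single comparable value for an interval stored as (months, days, microseconds)."""
    return (months * DAYS_PER_MONTH + days) * USECS_PER_DAY + usecs


if __name__ == "__main__":
    one_year = interval_cmp_key(months=12)
    print(one_year == interval_cmp_key(days=360))  # True
    # now() - '20131105' from the question, about 363 days 16:27:38
    diff = interval_cmp_key(days=363, usecs=(16 * 3600 + 27 * 60 + 38) * 1_000_000)
    print(diff <= one_year)                        # False -> 'not within year'
    print(diff <= interval_cmp_key(days=365))      # True  -> 'within 365 days'
```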
I'm not sure what the real problem with this check is, but it works the other way around:
select now() - interval '1 year' <= date '2013-11-05'
I'm no expert in Postgres, but it may have something to do with type comparisons, because:
select pg_typeof(now() - date '2013-11-05'),
pg_typeof(now() - interval '1 year')
yields result:
interval, timestamp with time zone
so your example compares an interval with an interval on different scales (days vs. year), while my solution compares a timestamp with a date, which seems to work.
UPDATE:
You can check that interval '1 year', when not attached to a specific year (not added to a date or timestamp), compares equal to 360 days:
select interval '1 year' <= interval '359 days',
interval '1 year' <= interval '360 days'
which yields:
f, t
As I understand it, you can't meaningfully compare a year interval on its own when you don't know which year it is attached to. Always compare dates, and just use the interval to compute the new date:
select now() - interval '1 year' <= now() - interval '365 days'
t
From www.postgresql.org/docs/current/static/datatype-datetime.html:
Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. Because intervals are usually created from constant strings or timestamp subtraction, this storage method works well in most cases. Functions justify_days and justify_hours are available for adjusting days and hours that overflow their normal ranges.
Because you compare two intervals, PostgreSQL internally normalizes the values (like justify_interval()) before comparing:
SELECT INTERVAL '31 days' > INTERVAL '1 mon' -- yields 't'
But if you apply interval subtraction/addition, the varying lengths of days and months are taken into account:
SELECT (timestamptz '2014-11-03 00:00:00 America/New_York' - INTERVAL '1 day') AT TIME ZONE 'America/New_York',
timestamptz '2014-11-03 00:00:00 America/New_York' - timestamptz '2014-11-02 00:00:00 America/New_York' <= interval '1 day';
-- | timestamp | boolean |
-- +---------------------+---------+
-- | 2014-11-02 01:00:00 | f |
So, if you need to test whether a timestamp/date is within a range, you should manipulate timestamps/dates (or use timestamp/date ranges) and compare those values with <, > or BETWEEN.
SELECT timestamp '2014-11-03 00:00:00' - timestamp '2014-10-03 00:00:00' <= interval '1 mon',
timestamp '2014-11-03 00:00:00' - interval '1 mon' <= timestamp '2014-10-03 00:00:00';
-- | boolean | boolean |
-- +---------+---------+
-- | f | t |
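The same advice as a Python sketch: shift the timestamp by a calendar year, then compare timestamps. `years_ago` and `within_last_year` are hypothetical helpers; clamping 29 February to 28 February mirrors what PostgreSQL does for timestamp - interval '1 year'.

```python
from datetime import datetime


def years_ago(ts: datetime, n: int) -> datetime:
    """Calendar shift back by n years; 29 Feb becomes 28 Feb in non-leap years."""
    try:
        return ts.replace(year=ts.year - n)
    except ValueError:  # 29 February does not exist in the target year
        return ts.replace(year=ts.year - n, day=28)


def within_last_year(d: datetime, now: datetime) -> bool:
    """True if d is on or after the same moment one calendar year earlier."""
    return d >= years_ago(now, 1)


if __name__ == "__main__":
    now = datetime(2014, 11, 3, 16, 27, 38)
    print(within_last_year(datetime(2013, 11, 5), now))  # True
    print(within_last_year(datetime(2013, 11, 2), now))  # False
```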

How to get the number of days in a month?

I am trying to get the following in Postgres:
select day_in_month(2);
Expected output:
28
Is there any built-in way in Postgres to do that?
SELECT
DATE_PART('days',
DATE_TRUNC('month', NOW())
+ '1 MONTH'::INTERVAL
- '1 DAY'::INTERVAL
)
Substitute NOW() with any other date.
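The same trick outside SQL, as a Python sketch (`days_in_month` is a hypothetical name): truncate to the first of the month, move to the first of the next month, step back one day, and read the day number.

```python
from datetime import date, timedelta


def days_in_month(d: date) -> int:
    """Number of days in the month containing d (leap years come from the calendar)."""
    first = d.replace(day=1)
    # First day of the following month; December rolls over into January.
    next_first = date(first.year + first.month // 12, first.month % 12 + 1, 1)
    return (next_first - timedelta(days=1)).day


if __name__ == "__main__":
    print(days_in_month(date(2018, 2, 10)))  # 28
    print(days_in_month(date(2016, 2, 10)))  # 29
```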
This uses the same smart "trick" as Quassnoi's answer, extracting the day part from the last date of the month, but it can be a bit simpler / faster:
SELECT extract(days FROM date_trunc('month', now()) + interval '1 month - 1 day');
Rationale
extract is standard SQL, so maybe preferable, but it resolves to the same function internally as date_part(). The manual:
The date_part function is modeled on the traditional Ingres equivalent to the SQL-standard function extract:
But we only need to add a single interval. Postgres allows multiple time units at once. The manual:
interval values can be written using the following verbose syntax:
[@] quantity unit [quantity unit...] [direction]
where quantity is a number (possibly signed); unit is microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium, or abbreviations or plurals of these units;
ISO 8601 or standard SQL format are also accepted. Either way, the manual again:
Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions.
(Output / display depends on the setting of IntervalStyle.)
The above example uses default Postgres format: interval '1 month - 1 day'. These are also valid (while less readable):
interval '1 mon - 1 d' -- unambiguous abbreviations of time units are allowed
Standard SQL format:
interval '0-1 -1 0:0'
ISO 8601 format:
interval 'P1M-1D';
All the same.
Note that the expected output of day_in_month(2) can be 29 because of leap years. You might want to pass a date instead of an int.
Also, beware of daylight saving time: remove the time zone, or else some month calculations could be wrong (next example in CET/CEST):
SELECT DATE_TRUNC('month', '2016-03-12'::timestamptz) + '1 MONTH'::INTERVAL
- DATE_TRUNC('month', '2016-03-12'::timestamptz) ;
------------------
30 days 23:00:00
SELECT DATE_TRUNC('month', '2016-03-12'::timestamp) + '1 MONTH'::INTERVAL
- DATE_TRUNC('month', '2016-03-12'::timestamp) ;
----------
31 days
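The 23-hour result comes from measuring elapsed time across the switch to summer time (27 March 2016 in CET/CEST zones). Python's timezone-aware datetimes behave the same way when subtracted; this sketch hard-codes the +01:00 and +02:00 offsets as an assumption instead of using a time zone database:

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1))   # winter offset (assumed)
CEST = timezone(timedelta(hours=2))  # summer offset (assumed)

# Aware datetimes: subtraction measures real elapsed time, like timestamptz.
aware = datetime(2016, 4, 1, tzinfo=CEST) - datetime(2016, 3, 1, tzinfo=CET)
# Naive datetimes: plain calendar difference, like timestamp without time zone.
naive = datetime(2016, 4, 1) - datetime(2016, 3, 1)

if __name__ == "__main__":
    print(aware)  # 30 days, 23:00:00
    print(naive)  # 31 days, 0:00:00
```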
This works as well.
WITH date_ AS (SELECT your_date AS d)
SELECT d + INTERVAL '1 month' - d FROM date_;
Or just:
SELECT your_date + INTERVAL '1 month' - your_date;
Both of these return an interval, not an integer.
SELECT cnt_dayofmonth(2016, 2); -- 29
create or replace function cnt_dayofmonth(_year int, _month int)
returns int2 as
$BODY$
-- ZU 2017.09.15, returns the count of days in month, inputs are year and month
declare
-- first day of the requested month; make_date (PostgreSQL 9.4+) avoids
-- depending on the DateStyle setting to parse a 'DD.MM.YYYY' string
datetime_month date := make_date(_year, _month, 1);
begin
-- last day of the month = first day + 1 month - 1 day; its day number is the count
return extract(day from datetime_month + interval '1 month - 1 day')::int2;
end;
$BODY$
language plpgsql;
You can write a function:
CREATE OR REPLACE FUNCTION get_total_days_in_month(timestamp)
RETURNS decimal
IMMUTABLE
AS $$
-- day number of the last day of the month = number of days in the month
select extract(day from date_trunc('month', $1) + interval '1 month - 1 day')::decimal
$$ LANGUAGE sql;