How to get the difference in minutes between two timestamps excluding weekends? - postgresql

I need to get the difference in minutes excluding weekends (Saturday, Sunday), between 2 timestamps in postgres, but I'm not getting the expected result.
Examples:
Get the difference in minutes; however, weekends are included:
SELECT EXTRACT(EPOCH FROM (NOW() - '2021-08-01 08:00:00') / 60)::BIGINT as diff_in_minutes;
$ diff_in_minutes = 17566
Get the difference in weekdays, excluding Saturday and Sunday:
SELECT COUNT(*) as diff_in_days
FROM generate_series('2021-08-01 08:00:00', NOW(), interval '1d') d
WHERE extract(isodow FROM d) < 6;
$ diff_in_days = 10
Expected:
From '2021-08-12 08:00:00' to '2021-08-13 08:00:00' = 1440
From '2021-08-13 08:00:00' to '2021-08-16 08:00:00' = 1440
From '2021-08-13 08:00:00' to '2021-08-17 08:00:00' = 2880
and so on ...

The solution is:
SELECT GREATEST(COUNT(*) - 1, 0)
FROM generate_series(from_ts, to_ts, interval'1 minute') AS x
WHERE extract(isodow FROM x) <= 5
so
SELECT GREATEST(COUNT(*) - 1, 0)
FROM generate_series('2021-08-13 08:00:00'::timestamp, '2021-08-17 08:00:00', '1 minute') AS x
WHERE extract(isodow FROM x) <= 5
returns 2880
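To sanity-check the expected values from the question outside the database, the same counting logic can be mirrored in a few lines of Python. This is only a sketch: the function name weekday_minutes_bruteforce is made up for illustration, and it works on naive datetimes, so time zones and DST are ignored.

```python
from datetime import datetime, timedelta

def weekday_minutes_bruteforce(from_ts, to_ts):
    """Mirror of the generate_series query: count minute marks on Mon-Fri."""
    count = 0
    t = from_ts
    while t <= to_ts:              # generate_series includes both endpoints
        if t.isoweekday() <= 5:    # 1 = Monday ... 5 = Friday, same as isodow
            count += 1
        t += timedelta(minutes=1)
    return max(count - 1, 0)       # GREATEST(COUNT(*) - 1, 0)

print(weekday_minutes_bruteforce(datetime(2021, 8, 12, 8), datetime(2021, 8, 13, 8)))  # 1440
print(weekday_minutes_bruteforce(datetime(2021, 8, 13, 8), datetime(2021, 8, 16, 8)))  # 1440
print(weekday_minutes_bruteforce(datetime(2021, 8, 13, 8), datetime(2021, 8, 17, 8)))  # 2880
```

One thing worth checking with this approach: when the end timestamp itself falls on a weekend, subtracting 1 appears to undercount by a minute (Friday 08:00 to Saturday 08:00 gives 959 rather than 960), because the closing minute mark was already filtered out.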

This is not an optimal solution - but I will leave finding the optimal one as homework for you.
First, create an SQL function
CREATE OR REPLACE FUNCTION public.time_overlap (
b_1 timestamptz,
e_1 timestamptz,
b_2 timestamptz,
e_2 timestamptz
)
RETURNS interval AS
$body$
SELECT GREATEST(interval '0 second',e_1 - b_1 - GREATEST(interval '0 second',e_1 - e_2) - GREATEST(interval '0 second',b_2 - b_1));
$body$
LANGUAGE 'sql'
IMMUTABLE
RETURNS NULL ON NULL INPUT
SECURITY INVOKER
PARALLEL SAFE
COST 100;
Then, call it like this:
WITH frame AS (SELECT generate_series('2021-08-13 00:00:00', '2021-08-17 23:59:59', interval '1d') AS d)
SELECT SUM(EXTRACT(epoch FROM time_overlap('2021-08-13 08:00:00', '2021-08-17 08:00:00',d,d + interval '1 day'))/60) AS total
FROM frame
WHERE extract(isodow FROM d) < 6
In the CTE, round the earlier of the 2 timestamps down to the start of its day and the later one up to the end of its day. The idea is to generate the series over whole days, not starting in the middle of a day.
When calling the time_overlap function, pass the exact values of your 2 timestamps so that it correctly calculates the overlap between each generated day and the timeframe between your 2 timestamps.
Finally, summing over all the overlaps gives the total number of minutes excluding the weekends.
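The per-day overlap idea can also be sketched outside the database. The following Python version is a rough equivalent under stated assumptions: naive datetimes (no DST handling), and the helper names time_overlap and weekday_minutes are chosen here for illustration. The overlap is written as min(end) - max(start), which is algebraically the same as the GREATEST expression in the SQL function.

```python
from datetime import datetime, timedelta

def time_overlap(b_1, e_1, b_2, e_2):
    """Length of the intersection of [b_1, e_1] and [b_2, e_2], never negative."""
    return max(timedelta(0), min(e_1, e_2) - max(b_1, b_2))

def weekday_minutes(from_ts, to_ts):
    # Start the day series at midnight of the first day (the "round down" step).
    day = from_ts.replace(hour=0, minute=0, second=0, microsecond=0)
    total = timedelta(0)
    while day < to_ts:
        if day.isoweekday() <= 5:  # Monday..Friday only
            total += time_overlap(from_ts, to_ts, day, day + timedelta(days=1))
        day += timedelta(days=1)
    return total.total_seconds() / 60

print(weekday_minutes(datetime(2021, 8, 13, 8), datetime(2021, 8, 17, 8)))  # 2880.0
```

Because it iterates over days instead of minutes, the loop runs once per calendar day, which is the main reason to prefer it over counting minute marks for long ranges.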

Related

Postgresql - query to get difference in data count

I have two tables, today's_table and yesterday's_table.
I need to compare the data for a 15-minute interval at exactly the same times today and yesterday.
For example, for the data below, let's say I need to check between 00:00:00 and 00:15:00 on 20201202 and 20201202. The difference should come out as '3', since yesterday's_table has 8 records and today's_table has 5 records.
today's_table:
Yesterday's table:
I tried something like; (consider now() is 00:15:00)
select count(*) from yeterday's_table where time between now() - interval "24 hours" and now() - interval "23 hours 45 mins"
minus
select count(*) from today's_table where time = now() - interval "15 minutes";
Is there any other way to do this?
You can easily do this with subqueries:
-- yesterday's count minus today's count (8 - 5 = 3 in the example)
SELECT a.c - b.c AS diff
FROM (select count(*) as c from yeterdays_table where time between now() - interval '24 hours' and now() - interval '23 hours 45 mins') a,
(select count(*) as c from todays_table where time between now() - interval '15 minutes' and now()) b;
Bear in mind you need to single-quote your intervals, and your table names cannot have quotes in them.

Find difference between timestamps in amount of custom intervals in PostgreSQL

I would like to find the difference between two timestamps (with time zone) expressed as a number of custom intervals. So the function should look like custom_diff(timestamptz from, timestamptz to, interval custom).
Keep in mind that this is not equivalent to (to - from) / custom: custom_diff('2016-08-01 00:00:00', '2016-09-01 00:00:00', '1 day') is exactly 31, but ('2016-09-01 00:00:00' - '2016-08-01 00:00:00') / '1 day' = '1 month' / '1 day', which is ambiguous.
I also understand that in general there is no exact result for such an operation (for example custom_diff('2016-08-01 00:00:00', '2016-09-01 00:00:00', '1 month 1 day')), so it could be a group of functions (round-to-nearest, round-to-lower, round-to-upper and truncating), all of which should return an integer.
Is there any standard/common way to do such a calculation in PostgreSQL (PL/pgSQL)? My main interest is the round-to-nearest function.
The best way I have come up with is to iteratively add/subtract interval custom to/from timestamptz from and compare with timestamptz to. It can be optimized by first finding an approximate result (for example, dividing [difference in seconds between timestamps] by [approximation of interval custom in seconds]) to reduce the number of iterations.
UPD 1:
Why
SELECT EXTRACT(EPOCH FROM (timestamp '2016-08-01 10:00'
- timestamp '2016-08-01 00:00'))
/ EXTRACT(EPOCH FROM interval '1 day');
is a wrong solution; try it yourself:
SELECT EXTRACT(EPOCH FROM ( TIMESTAMPTZ '2016-01-01 utc' -
TIMESTAMPTZ '1986-01-01 utc' ))
/ EXTRACT(EPOCH FROM INTERVAL '1 month');
Result is 365.23.... Then check result:
SELECT ( TIMESTAMPTZ '1986-01-01 utc' + 365 * INTERVAL '1 month' )
AT TIME ZONE 'utc';
Result is 2016-06-01 00:00:00.000000. Of course 365 is the wrong result, because the timestamps in this example span exactly 30 years and every year has exactly 12 months, so the right answer is 12*30=360.
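The arithmetic behind this example can be reproduced with a short Python sketch. It assumes, as PostgreSQL does when extracting the epoch of interval '1 month', that a month counts as 30 days; the variable names are illustrative only.

```python
from datetime import datetime

start = datetime(1986, 1, 1)
end = datetime(2016, 1, 1)

# EXTRACT(EPOCH FROM interval '1 month') treats a month as 30 days.
seconds_per_month = 30 * 24 * 60 * 60
by_epoch = (end - start).total_seconds() / seconds_per_month

# Counting calendar months directly: 30 years of 12 months each.
by_calendar = (end.year - start.year) * 12 + (end.month - start.month)

print(round(by_epoch, 2))  # 365.23
print(by_calendar)         # 360
```

The gap between the two numbers comes purely from the fixed 30-day month used in the division, which is why the epoch-based approach only works for intervals with a fixed length such as hours or days.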
UPD 2:
My solution is
CREATE OR REPLACE FUNCTION custom_diff(
_from TIMESTAMPTZ, _to TIMESTAMPTZ, _custom INTERVAL, OUT amount INTEGER)
RETURNS INTEGER
LANGUAGE plpgsql
AS $function$
DECLARE
max_iterations INTEGER :=10;
t INTEGER;
BEGIN
amount:=0;
WHILE max_iterations > 0 AND NOT (
extract(EPOCH FROM _to) <= ( extract(EPOCH FROM _from) + extract(EPOCH FROM _from + _custom) ) / 2
AND
extract(EPOCH FROM _to) >= ( extract(EPOCH FROM _from) + extract(EPOCH FROM _from - _custom) ) / 2
) LOOP
-- RAISE NOTICE 'iter: %', max_iterations;
t:=EXTRACT(EPOCH FROM ( _to - _from )) / EXTRACT(EPOCH FROM _custom);
_from:=_from + t * _custom;
amount:=amount + t;
max_iterations:=max_iterations - 1;
END LOOP;
RETURN;
END;
$function$
but I am not sure that it is correct, and I am still waiting for suggestions about an existing/common solution.
You can get exact result after extracting the epoch from both intervals:
SELECT EXTRACT(EPOCH FROM (timestamp '2016-08-01 10:00'
- timestamp '2016-08-01 00:00'))
/ EXTRACT(EPOCH FROM interval '1 day'); -- any given interval
If you want rounded (truncated) result, a simple option is to cast both to integer. Integer division cuts off the remainder.
SELECT EXTRACT(EPOCH FROM (ts_to - ts_from))::int
/ EXTRACT(EPOCH FROM interval '1 day')::int; -- any given interval
You can easily wrap the logic into an IMMUTABLE SQL function.
You are drawing the wrong conclusions from what you read in the manual. The result of a timestamp subtraction is an exact interval, storing only days and seconds (not months). So the result is exact. Try my query, it isn't "ambiguous".
You can avoid involving the data type interval:
SELECT (EXTRACT(EPOCH FROM ts_to) - EXTRACT(EPOCH FROM ts_from))
/ 86400 -- = 24*60*60 -- any given interval as number of seconds
But the result is the same.
Aside:
"Exact" is an elusive term when dealing with timestamps. You may have to take DST rules and other corner cases of your time zone into consideration. You might convert to UTC time or use timestamptz before doing the math.

Checking for the minimum variability of a temporal database in postgresql

I have a table like this:
+------------+------------------+
|temperature |Date_time_of_data |
+------------+------------------+
| 4.5 |9/15/2007 12:12:12|
| 4.56 |9/15/2007 12:14:16|
| 4.44 |9/15/2007 12:16:02|
| 4.62 |9/15/2007 12:18:23|
| 4.89 |9/15/2007 12:21:01|
+------------+------------------+
The data-set contains more than 1000 records and I want to check for the minimum variability.
For every 30 minutes if the variance of temperature doesn't exceed 0.2, I want all the temperature values of that half an hour replaced by NULL.
Here is a SELECT to get the start of a period for every record:
SELECT temperature,
Date_time_of_data,
date_trunc('hour', Date_time_of_data)+
CASE WHEN date_part('minute', Date_time_of_data) >= 30
THEN interval '30 minutes'
ELSE interval '0 minutes'
END as start_of_period
FROM your_table
It truncates the date to the hour (9/15/2007 12:12:12 becomes 9/15/2007 12:00:00)
and then adds 30 minutes if the minute part was originally 30 or more.
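As a quick illustration of the bucketing rule just described, here is the same truncation written in Python. It is a sketch only; the function name start_of_period simply reuses the column alias from the query.

```python
from datetime import datetime

def start_of_period(ts):
    """Truncate to the hour, then add 30 minutes when the minute part is >= 30."""
    return ts.replace(minute=30 if ts.minute >= 30 else 0, second=0, microsecond=0)

print(start_of_period(datetime(2007, 9, 15, 12, 12, 12)))  # 2007-09-15 12:00:00
print(start_of_period(datetime(2007, 9, 15, 12, 45, 10)))  # 2007-09-15 12:30:00
```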
Next - use start_of_period to group results and get min and max for every group:
SELECT temperature,
Date_time_of_data,
max(temperature) OVER (PARTITION BY start_of_period) as max_temp,
min(temperature) OVER (PARTITION BY start_of_period) as min_temp
FROM (previous_select_here) AS s
Next - keep only the records from periods where the temperature spread (max minus min, used here as a simple stand-in for variance) does not exceed 0.2
SELECT temperature,
Date_time_of_data
FROM (previous_select_here) AS s
WHERE (max_temp - min_temp) <=0.2
And finally update your table
UPDATE your_table
SET temperature = NULL
WHERE Date_time_of_data IN (SELECT Date_time_of_data FROM (previous_select_here) AS s)
You may need to correct some spelling mistakes in these queries before they work. I haven't tested them.
And you can simplify them, if you need to.
P.S. If you need to filter out the data with a spread of 0.2 or less, you can simply create a VIEW from the third SELECT with
WHERE (max_temp - min_temp) > 0.2
And use the VIEW instead of table.
This query should do the job:
with intervals as (
select
date_trunc('hour', Date_time_of_data) + interval '30 min' * round(date_part('minute', Date_time_of_data) / 30.0) as valid_interval
from T
group by 1
having var_samp(temperature) > 0.2
)
select * from T
where
date_trunc('hour', Date_time_of_data) + interval '30 min' * round(date_part('minute', Date_time_of_data) / 30.0) in (select valid_interval from intervals)
The inner query (labeled as intervals) returns times when variance is over 0.2 (having var_samp(temperature) > 0.2). date_trunc ... expression rounds Date_time_of_data to half hour intervals.
The query returns nothing on the provided dataset.
create table T (temperature float8, Date_time_of_data timestamp without time zone);
insert into T values
(4.5, '2007-9-15 12:12:12'),
(4.56, '2007-9-15 12:14:16'),
(4.44, '2007-9-15 12:16:02'),
(4.62, '2007-9-15 12:18:23'),
(4.89, '2007-9-15 12:21:01')
;
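To see how the two answers treat the sample rows, the grouping can be replayed in Python. This sketch assumes the floor-style half-hour bucket from the first answer (the second answer rounds to the nearest half hour instead), and it uses statistics.variance, which computes the sample variance like var_samp.

```python
from collections import defaultdict
from datetime import datetime
from statistics import variance

rows = [
    (4.5, datetime(2007, 9, 15, 12, 12, 12)),
    (4.56, datetime(2007, 9, 15, 12, 14, 16)),
    (4.44, datetime(2007, 9, 15, 12, 16, 2)),
    (4.62, datetime(2007, 9, 15, 12, 18, 23)),
    (4.89, datetime(2007, 9, 15, 12, 21, 1)),
]

def half_hour_start(ts):
    """Floor a timestamp to the start of its half-hour period."""
    return ts.replace(minute=30 if ts.minute >= 30 else 0, second=0, microsecond=0)

groups = defaultdict(list)
for temperature, ts in rows:
    groups[half_hour_start(ts)].append(temperature)

for start, temps in sorted(groups.items()):
    spread = max(temps) - min(temps)                          # max - min, as in the first answer
    sample_var = variance(temps) if len(temps) > 1 else None  # var_samp is NULL for a single row
    print(start, round(spread, 2), sample_var)
```

For these five rows the spread is 0.45 while the sample variance is roughly 0.03, so the max-minus-min test and the var_samp test disagree about whether the 0.2 threshold is exceeded. Which one matches the intended meaning of "variance" is something the asker would need to decide.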

Count months between two timestamp on postgresql?

I want to count the number of months between two dates.
Doing :
SELECT TIMESTAMP '2012-06-13 10:38:40' - TIMESTAMP '2011-04-30 14:38:40';
Returns :
0 years 0 mons 409 days 20 hours 0 mins 0.00 secs
and so:
SELECT extract(month from TIMESTAMP '2012-06-13 10:38:40' - TIMESTAMP '2011-04-30 14:38:40');
returns 0.
age function returns interval:
age(timestamp1, timestamp2)
Then we try to extract year and month out of the interval and add them accordingly:
select extract(year from age(timestamp1, timestamp2)) * 12 +
extract(month from age(timestamp1, timestamp2))
Please note that the most voted answer by @ram and @angelin is not accurate when you are trying to get the calendar month difference using:
select extract(year from age(timestamp1, timestamp2))*12 + extract(month from age(timestamp1, timestamp2))
for example, if you try to do:
select extract(year from age('2018-02-02'::date, '2018-03-01'::date))*12 + extract(month from age('2018-02-02'::date , '2018-03-01'::date))
the result will be 0, but in terms of calendar months the difference between February and March should be 1, no matter how many days lie between the dates.
So, starting from timestamp1 and timestamp2, the formula should be:
((year2 - year1)*12) - month1 + month2 = calendar months between two timestamps
in pg that would be translated to:
select ((extract('years' from '2018-03-01 00:00:00'::timestamp)::int - extract('years' from '2018-02-02 00:00:00'::timestamp)::int) * 12)
- extract('month' from '2018-02-02 00:00:00'::timestamp)::int + extract('month' from '2018-03-01 00:00:00'::timestamp)::int;
you can create a function like:
CREATE FUNCTION months_between (t_start timestamp, t_end timestamp)
RETURNS integer
AS $$
select ((extract('years' from $2)::int - extract('years' from $1)::int) * 12)
- extract('month' from $1)::int + extract('month' from $2)::int
$$
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
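The difference between counting calendar month boundaries and counting fully elapsed months can be sketched in Python as follows. Both function names are hypothetical, full_months_elapsed is only a rough approximation of what extracting years and months from age() yields (end-of-month edge cases may differ), and it assumes t_start is not later than t_end.

```python
from datetime import datetime

def calendar_months_between(t_start, t_end):
    """(year2 - year1) * 12 - month1 + month2, ignoring days and times."""
    return (t_end.year - t_start.year) * 12 - t_start.month + t_end.month

def full_months_elapsed(t_start, t_end):
    """Calendar months minus one when the last month is not yet complete."""
    months = calendar_months_between(t_start, t_end)
    if (t_end.day, t_end.time()) < (t_start.day, t_start.time()):
        months -= 1
    return months

print(calendar_months_between(datetime(2018, 2, 2), datetime(2018, 3, 1)))  # 1
print(full_months_elapsed(datetime(2018, 2, 2), datetime(2018, 3, 1)))      # 0
```

Which of the two is "right" depends on the report: the first answers "how many month boundaries were crossed", the second "how many whole months have passed".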
The age function gives a justified interval to work with:
SELECT age(TIMESTAMP '2012-06-13 10:38:40', TIMESTAMP '2011-04-30 14:38:40');
returns 1 year 1 mon 12 days 20:00:00, and with that you can easily use EXTRACT to count the number of months:
SELECT EXTRACT(YEAR FROM age) * 12 + EXTRACT(MONTH FROM age) AS months_between
FROM age(TIMESTAMP '2012-06-13 10:38:40', TIMESTAMP '2011-04-30 14:38:40') AS t(age);
If you will do this multiple times, you could define the following function:
CREATE FUNCTION months_between (t_start timestamp, t_end timestamp)
RETURNS integer
AS $$
SELECT
(
12 * extract('years' from a.i) + extract('months' from a.i)
)::integer
from (
values (justify_interval($2 - $1))
) as a (i)
$$
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
so that you can then just
SELECT months_between('2015-01-01', now()::timestamp); -- explicit cast: now() returns timestamptz
SELECT date_part ('year', f) * 12
+ date_part ('month', f)
FROM age ('2015-06-12', '2014-12-01') f
Result: 6 months
This gives the difference in months between two dates.
SELECT ((extract( year FROM TIMESTAMP '2012-06-13 10:38:40' ) - extract( year FROM TIMESTAMP '2011-04-30 14:38:40' )) *12) + extract(MONTH FROM TIMESTAMP '2012-06-13 10:38:40' ) - extract(MONTH FROM TIMESTAMP '2011-04-30 14:38:40' );
The Result : 14
You have to extract the years and months separately for both dates and then take the difference of the results.
Here is a PostgreSQL function with the exact same behavior as the Oracle MONTHS_BETWEEN function.
It has been tested on a wide range of years (including leap years) and more than 700k combinations of dates (including the end of every month).
CREATE OR REPLACE FUNCTION months_between
( DATE,
DATE
)
RETURNS float
AS
$$
SELECT
(EXTRACT(YEAR FROM $1) - EXTRACT(YEAR FROM $2)) * 12
+ EXTRACT(MONTH FROM $1) - EXTRACT(MONTH FROM $2)
+ CASE
WHEN EXTRACT(DAY FROM $2) = EXTRACT(DAY FROM LAST_DAY($2))
AND EXTRACT(DAY FROM $1) = EXTRACT(DAY FROM LAST_DAY($1))
THEN
0
ELSE
(EXTRACT(DAY FROM $1) - EXTRACT(DAY FROM $2)) / 31
END
;
$$
LANGUAGE SQL
IMMUTABLE STRICT;
This function requires a LAST_DAY function (behaving the same as Oracle's one) :
CREATE OR REPLACE FUNCTION last_day
( DATE
)
RETURNS DATE
AS
$$
SELECT
(DATE_TRUNC('MONTH', $1) + INTERVAL '1 MONTH' - INTERVAL '1 DAY')::date
;
$$
LANGUAGE SQL
IMMUTABLE STRICT;
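For cross-checking results outside the database, the two SQL functions above can be transcribed into Python almost line by line. This is a sketch of the same rule as written in the answer (whole months plus a day difference divided by 31, with no fraction when both dates are month ends); it is not claimed to match Oracle in every corner case.

```python
import calendar
from datetime import date

def last_day(d):
    """Last day of the month containing d."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

def months_between(d1, d2):
    whole = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    # Both dates at month end: treat the difference as a whole number of months.
    if d1 == last_day(d1) and d2 == last_day(d2):
        return float(whole)
    return whole + (d1.day - d2.day) / 31

print(months_between(date(2015, 2, 28), date(2015, 1, 31)))  # 1.0
print(months_between(date(2015, 2, 2), date(2014, 12, 1)))
```

The second call corresponds to 2 + 1/31, i.e. about 2.0323.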
I had the same problem once upon a time and wrote this ... it's quite ugly:
postgres=> SELECT floor((extract(EPOCH FROM TIMESTAMP '2012-06-13 10:38:40' ) - extract(EPOCH FROM TIMESTAMP '2005-04-30 14:38:40' ))/30.43/24/3600);
floor
-------
85
(1 row)
In this solution "one month" is defined to be 30.43 days long, so it may give some unexpected results over shorter timespans.
Extract by year and months will floor on months:
select extract(year from age('2016-11-30'::timestamp, '2015-10-15'::timestamp)); --> 1
select extract(month from age('2016-11-30'::timestamp, '2015-10-15'::timestamp)); --> 1
--> Total 13 months
This approach maintains fractions of months (thanks to tobixen for the divisor)
select round(('2016-11-30'::date - '2015-10-15'::date)::numeric /30.43, 1); --> 13.5 months
Try this solution:
SELECT extract (MONTH FROM age('2014-03-03 00:00:00'::timestamp, '2013-02-03 00:00:00'::timestamp))
+ 12 * extract (YEAR FROM age('2014-03-03 00:00:00'::timestamp, '2013-02-03 00:00:00'::timestamp)) as age_in_month;
SELECT floor(extract(days from TIMESTAMP '2012-06-13 10:38:40' - TIMESTAMP
'2011-04-30 14:38:40')/30.43)::integer as months;
Gives an approximate value but avoids duplication of timestamps. This uses hint from tobixen's answer to divide by 30.43 in place of 30 to be less incorrect for long timespans while computing months.
I made a function like this:
/* similar to ORACLE's MONTHS_BETWEEN */
CREATE OR REPLACE FUNCTION ORACLE_MONTHS_BETWEEN(date_from DATE, date_to DATE)
RETURNS REAL LANGUAGE plpgsql
AS
$$
DECLARE age INTERVAL;
declare rtn real;
BEGIN
age := age(date_from, date_to);
rtn := date_part('year', age) * 12 + date_part('month', age) + date_part('day', age)/31::real;
return rtn;
END;
$$;
Oracle Example)
SELECT MONTHS_BETWEEN
(TO_DATE('2015-02-02','YYYY-MM-DD'), TO_DATE('2014-12-01','YYYY-MM-DD') )
"Months" FROM DUAL;
--result is: 2.03225806451612903225806451612903225806
My PostgreSQL function example)
select ORACLE_MONTHS_BETWEEN('2015-02-02'::date, '2014-12-01'::date) Months;
-- result is: 2.032258
From the result you can use CEIL()/FLOOR() for rounding.
select ceil(2.032258) --3
select floor(2.032258) --2
Try;
select extract(month from age('2012-06-13 10:38:40'::timestamp, '2011-04-30 14:38:40'::timestamp)) as my_months;

How to split start/end time columns into discrete chunks with PostgreSQL?

We have some tables, which have a structure like:
start, -- datetime
end, -- datetime
cost -- decimal
So, for example, there might be a row like:
01/01/2010 10:08am, 01/01/2010 1:56pm, 135.00
01/01/2010 11:01am, 01/01/2010 3:22pm, 118.00
01/01/2010 06:19pm, 01/02/2010 1:43am, 167.00
Etc...
I'd like to get this into a format (with a function?) that returns data in a format like:
10:00am, 10:15am, X, Y, Z
10:15am, 10:30am, X, Y, Z
10:30am, 10:45am, X, Y, Z
10:45am, 11:00am, X, Y, Z
11:00am, 11:15am, X, Y, Z
....
Where:
X = the number of rows that match
Y = the cost / expense for that chunk of time
Z = the total amount of time during this duration
IE, for the above data, we might have:
10:00am, 10:15am, 1, (135/228 minutes*7), 7
The first row starts at 10:08am, so only 7 minutes are used from 10:00-10:15.
There are 228 minutes in the start->end time.
....
11:00am, 11:15am, 2, ((135+118)/(228+261) minutes*(15+14)), 29
The second row starts right after 11:00am, so we need 15 minutes from the first row, plus 14 minutes from the second row
There are 261 minutes in the second start->end time
....
I believe I've done the math right here, but need to figure out how to make this into a PG function, so that it can be used within a report.
Ideally, I'd like to be able to call the function with some arbitrary duration, ie 15minute, or 30minute, or 60minute, and have it split up based on that.
Any ideas?
Here is my try. Given this table definition:
CREATE TABLE interval_test
(
"start" timestamp without time zone,
"end" timestamp without time zone,
"cost" integer
)
This query seems to do what you want. Not sure if it is the best solution, though.
Also note that it needs Postgres 8.4 to work, because it uses WINDOW functions and WITH queries.
WITH RECURSIVE intervals(period_start) AS (
SELECT
date_trunc('hour', MIN(start)) AS period_start
FROM interval_test
UNION ALL
SELECT intervals.period_start + INTERVAL '15 MINUTES'
FROM intervals
WHERE (intervals.period_start + INTERVAL '15 MINUTES') < (SELECT MAX("end") FROM interval_test)
)
SELECT DISTINCT period_start, intervals.period_start + INTERVAL '15 MINUTES' AS period_end,
COUNT(*) OVER (PARTITION BY period_start ) AS record_count,
SUM (LEAST(period_start + INTERVAL '15 MINUTES', "end")::timestamp - GREATEST(period_start, "start")::timestamp)
OVER (PARTITION BY period_start ) AS total_time,
(SUM(cost) OVER (PARTITION BY period_start ) /
(EXTRACT(EPOCH FROM SUM("end" - "start") OVER (PARTITION BY period_start )) / 60)) *
((EXTRACT (EPOCH FROM SUM (LEAST(period_start + INTERVAL '15 MINUTES', "end")::timestamp - GREATEST(period_start, "start")::timestamp)
OVER (PARTITION BY period_start )))/60)
AS expense
FROM interval_test
INNER JOIN intervals ON (intervals.period_start, intervals.period_start + INTERVAL '15 MINUTES') OVERLAPS (interval_test.start, interval_test.end)
ORDER BY period_start ASC
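To check the query's numbers against the worked example in the question, the per-chunk calculation can be replayed in Python. This sketch uses the three sample rows from the question, a hypothetical helper chunk_stats, and the same expense formula as the query (total cost of the matching rows divided by their total duration, multiplied by the minutes that fall inside the chunk).

```python
from datetime import datetime, timedelta

rows = [
    (datetime(2010, 1, 1, 10, 8), datetime(2010, 1, 1, 13, 56), 135.0),
    (datetime(2010, 1, 1, 11, 1), datetime(2010, 1, 1, 15, 22), 118.0),
    (datetime(2010, 1, 1, 18, 19), datetime(2010, 1, 2, 1, 43), 167.0),
]

def minutes(td):
    return td.total_seconds() / 60

def chunk_stats(rows, period_start, width=timedelta(minutes=15)):
    """Return (record_count, expense, total_minutes) for one chunk, or None if empty."""
    period_end = period_start + width
    # Same idea as OVERLAPS: a row matches when it intersects the chunk.
    hits = [(s, e, c) for s, e, c in rows if s < period_end and e > period_start]
    if not hits:
        return None
    overlap = sum(minutes(min(e, period_end) - max(s, period_start)) for s, e, _ in hits)
    total_cost = sum(c for _, _, c in hits)
    total_duration = sum(minutes(e - s) for s, e, _ in hits)
    return len(hits), total_cost / total_duration * overlap, overlap

print(chunk_stats(rows, datetime(2010, 1, 1, 10, 0)))  # 1 row, 7 minutes
print(chunk_stats(rows, datetime(2010, 1, 1, 11, 0)))  # 2 rows, 29 minutes
```

Passing a different width (for example timedelta(minutes=30)) covers the "arbitrary duration" requirement from the question without changing the rest of the logic.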