How to split start/end time columns into discrete chunks with PostgreSQL? - postgresql

We have some tables, which have a structure like:
start, -- datetime
end, -- datetime
cost -- decimal
So, for example, there might be rows like:
01/01/2010 10:08am, 01/01/2010 1:56pm, 135.00
01/01/2010 11:01am, 01/01/2010 3:22pm, 118.00
01/01/2010 06:19pm, 01/02/2010 1:43am, 167.00
Etc...
I'd like to transform this (with a function?) into output like:
10:00am, 10:15am, X, Y, Z
10:15am, 10:30am, X, Y, Z
10:30am, 10:45am, X, Y, Z
10:45am, 11:00am, X, Y, Z
11:00am, 11:15am, X, Y, Z
....
Where:
X = the number of rows that match
Y = the cost / expense for that chunk of time
Z = the total amount of time during this duration
IE, for the above data, we might have:
10:00am, 10:15am, 1, (135/228 minutes*7), 7
The first row starts at 10:08am, so only 7 minutes are used from 10:00-10:15.
There are 228 minutes in the start->end time.
....
11:00am, 11:15am, 2, ((135+118)/(228+261) minutes*(15+14)), 29
The second row starts right after 11:00am, so we need 15 minutes from the first row, plus 14 minutes from the second row
There are 261 minutes in the second start->end time
....
I believe I've done the math right here, but need to figure out how to make this into a PG function, so that it can be used within a report.
Ideally, I'd like to be able to call the function with some arbitrary duration, ie 15minute, or 30minute, or 60minute, and have it split up based on that.
Any ideas?

Here is my try. Given this table definition:
CREATE TABLE interval_test
(
"start" timestamp without time zone,
"end" timestamp without time zone,
"cost" integer
)
This query seems to do what you want. Not sure if it is the best solution, though.
Also note that it needs Postgres 8.4 to work, because it uses WINDOW functions and WITH queries.
WITH RECURSIVE intervals(period_start) AS (
    SELECT date_trunc('hour', MIN("start")) AS period_start
    FROM interval_test
    UNION ALL
    SELECT intervals.period_start + INTERVAL '15 MINUTES'
    FROM intervals
    WHERE (intervals.period_start + INTERVAL '15 MINUTES') < (SELECT MAX("end") FROM interval_test)
)
SELECT DISTINCT
    period_start,
    intervals.period_start + INTERVAL '15 MINUTES' AS period_end,
    COUNT(*) OVER (PARTITION BY period_start) AS record_count,
    SUM(LEAST(period_start + INTERVAL '15 MINUTES', "end")::timestamp - GREATEST(period_start, "start")::timestamp)
        OVER (PARTITION BY period_start) AS total_time,
    (SUM(cost) OVER (PARTITION BY period_start) /
        (EXTRACT(EPOCH FROM SUM("end" - "start") OVER (PARTITION BY period_start)) / 60)) *
    ((EXTRACT(EPOCH FROM SUM(LEAST(period_start + INTERVAL '15 MINUTES', "end")::timestamp - GREATEST(period_start, "start")::timestamp)
        OVER (PARTITION BY period_start))) / 60)
        AS expense
FROM interval_test
INNER JOIN intervals
    ON (intervals.period_start, intervals.period_start + INTERVAL '15 MINUTES') OVERLAPS (interval_test."start", interval_test."end")
ORDER BY period_start ASC
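To sanity-check the arithmetic outside the database, here is a minimal Python sketch of the same chunking logic. It is not part of the query; the name split_into_chunks and the tuple layout are made up for illustration, and it assumes naive timestamps with no time zone handling.

```python
from datetime import datetime, timedelta

def split_into_chunks(rows, step=timedelta(minutes=15)):
    """rows: list of (start, end, cost) tuples.

    Returns (chunk_start, chunk_end, row_count, expense, minutes_used)
    for every chunk that overlaps at least one row.
    """
    # Start at the top of the hour of the earliest row, like date_trunc('hour', MIN(start)).
    chunk_start = min(r[0] for r in rows).replace(minute=0, second=0, microsecond=0)
    last_end = max(r[1] for r in rows)
    result = []
    while chunk_start < last_end:
        chunk_end = chunk_start + step
        # Rows whose [start, end) range overlaps this chunk.
        hits = [(s, e, c) for s, e, c in rows if s < chunk_end and e > chunk_start]
        if hits:
            # Minutes of each row that fall inside the chunk.
            used = sum((min(e, chunk_end) - max(s, chunk_start)).total_seconds()
                       for s, e, _ in hits) / 60
            # Full duration of the matching rows, in minutes.
            total = sum((e - s).total_seconds() for s, e, _ in hits) / 60
            cost = sum(c for _, _, c in hits)
            result.append((chunk_start, chunk_end, len(hits), cost / total * used, used))
        chunk_start = chunk_end
    return result

rows = [
    (datetime(2010, 1, 1, 10, 8), datetime(2010, 1, 1, 13, 56), 135),
    (datetime(2010, 1, 1, 11, 1), datetime(2010, 1, 1, 15, 22), 118),
    (datetime(2010, 1, 1, 18, 19), datetime(2010, 1, 2, 1, 43), 167),
]
chunks = split_into_chunks(rows)
```

With the three sample rows from the question, the 10:00 chunk comes out as 1 row and 7 minutes, and the 11:00 chunk as 2 rows and 29 minutes, which matches the figures worked out in the question. Passing a different step (for example timedelta(minutes=30)) changes the chunk size.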

Related

How to get the difference in minutes between two timestamps excluding weekends?

I need to get the difference in minutes excluding weekends (Saturday, Sunday), between 2 timestamps in postgres, but I'm not getting the expected result.
Examples:
Get the diff in minutes; however, weekends are included
SELECT EXTRACT(EPOCH FROM (NOW() - '2021-08-01 08:00:00') / 60)::BIGINT as diff_in_minutes;
$ diff_in_minutes = 17566
Get diff in weekdays, excluding saturday and sunday
SELECT COUNT(*) as diff_in_days
FROM generate_series('2021-08-01 08:00:00', NOW(), interval '1d') d
WHERE extract(isodow FROM d) < 6;
$ diff_in_days = 10
Expected:
From '2021-08-12 08:00:00' to '2021-08-13 08:00:00' = 1440
From '2021-08-13 08:00:00' to '2021-08-16 08:00:00' = 1440
From '2021-08-13 08:00:00' to '2021-08-17 08:00:00' = 2880
and so on ...
The solution is:
SELECT GREATEST(COUNT(*) - 1, 0)
FROM generate_series(from_ts, to_ts, interval'1 minute') AS x
WHERE extract(isodow FROM x) <= 5
so
SELECT GREATEST(COUNT(*) - 1, 0)
FROM generate_series('2021-08-13 08:00:00'::timestamp, '2021-08-17 08:00:00', '1 minute') AS x
WHERE extract(isodow FROM x) <= 5
returns 2880
This is not an optimal solution - but I will leave finding the optimal solution as a homework for you.
First, create an SQL function
CREATE OR REPLACE FUNCTION public.time_overlap (
b_1 timestamptz,
e_1 timestamptz,
b_2 timestamptz,
e_2 timestamptz
)
RETURNS interval AS
$body$
SELECT GREATEST(interval '0 second',e_1 - b_1 - GREATEST(interval '0 second',e_1 - e_2) - GREATEST(interval '0 second',b_2 - b_1));
$body$
LANGUAGE 'sql'
IMMUTABLE
RETURNS NULL ON NULL INPUT
SECURITY INVOKER
PARALLEL SAFE
COST 100;
Then, call it like this:
WITH frame AS (SELECT generate_series('2021-08-13 00:00:00', '2021-08-17 23:59:59', interval '1d') AS d)
SELECT SUM(EXTRACT(epoch FROM time_overlap('2021-08-13 08:00:00', '2021-08-17 08:00:00',d,d + interval '1 day'))/60) AS total
FROM frame
WHERE extract(isodow FROM d) < 6
In the CTE you should round down the left/earlier of the 2 timestamps and round up the right/later of the 2 timestamps. The idea is that you should generate the series over whole days - not in the middle of the day.
When calling the time_overlap function you should use the exact values of your 2 timestamps so that it properly calculates the overlapping in minutes between each day of the generated series and the given timeframe between your 2 timestamps.
In the end, when you sum over all the overlappings - you will get the total number of minutes excluding the weekends.
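The same day-by-day overlap idea can be checked in plain Python. This is only a sketch under assumptions: the name weekday_minutes is illustrative, time_overlap mirrors the SQL function's formula, and it uses naive datetimes, so any daylight saving shifts that timestamptz would account for are ignored.

```python
from datetime import datetime, timedelta

ZERO = timedelta(0)

def time_overlap(b_1, e_1, b_2, e_2):
    """Length of the overlap between [b_1, e_1) and [b_2, e_2), never negative."""
    return max(ZERO, e_1 - b_1 - max(ZERO, e_1 - e_2) - max(ZERO, b_2 - b_1))

def weekday_minutes(from_ts, to_ts):
    """Minutes between two timestamps, skipping Saturdays and Sundays."""
    # Walk over whole days, starting at midnight of the first day.
    day = from_ts.replace(hour=0, minute=0, second=0, microsecond=0)
    total = ZERO
    while day < to_ts:
        if day.isoweekday() < 6:  # 1..5 = Monday..Friday
            total += time_overlap(from_ts, to_ts, day, day + timedelta(days=1))
        day += timedelta(days=1)
    return total.total_seconds() / 60

print(weekday_minutes(datetime(2021, 8, 13, 8), datetime(2021, 8, 17, 8)))  # 2880.0
```

Because it loops over days rather than minutes, the work grows with the number of days in the range, not the number of minutes.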

Postgresql - query to get difference in data count

I have two tables, today's_table and yeterday's_table.
I need to compare the data for an interval of 15 mins at exact same times for today and yesterday.
For example, for the data below, let's say I need to check between 00:00:00 and 00:15:00 on 20201202 and 20201202. The difference should come out as '3', since yesterday's_table has 8 records and today's_table has 5 records.
today's_table:
Yesterday's table:
I tried something like; (consider now() is 00:15:00)
select count(*) from yeterday's_table where time between now() - interval "24 hours" and now() - interval "23 hours 45 mins"
minus
select count(*) from today's_table where time = now() - interval "15 minutes";
is there any other way to do this?
You can easily do this with subqueries:
-- yesterday's count minus today's count, each over a 15-minute window
SELECT a.c - b.c
FROM (select count(*) as c from yeterdays_table where time between now() - interval '24 hours' and now() - interval '23 hours 45 mins') a,
(select count(*) as c from todays_table where time between now() - interval '15 minutes' and now()) b;
Bear in mind you need to single-quote your intervals, and your table names cannot have quotes in them.

How to query max and min records of every 5 seconds (or any other user-defined period) within a selected period of time in postgres

I have a Postgres 9.6 table, 'prices', with hundreds of millions of records and only four columns: uid, price, unit, dt. dt is a timestamp in the standard format with fractional seconds, like '2017-05-01 00:00:00.585'. There may be anywhere from zero to dozens of records each second.
I can find the MAX and MIN price records within some time period. I can quite easily select a period using
SELECT date_trunc('second', dt) as time, min(price), max(price)
FROM prices
WHERE dt >= '2017-05-01 00:00:00' AND dt < '2017-05-01 00:00:59'
GROUP BY time
ORDER BY time;
But date_trunc is not flexible and does not allow setting an arbitrary period, for example 5 seconds or 10 minutes. Is there a way to solve this?
Use generate_series to get the ranges on the interval of time you need to search. Then use dd + '5 seconds'::interval to get the upper bound of the range
In this example we look for one day of data every 5 seconds
WITH ranges AS (
    SELECT dd AS start_range,
           dd + '5 seconds'::interval AS end_range,
           ROW_NUMBER() OVER () AS grp
    FROM generate_series(
             '2017-05-01 00:00:00'::timestamp,
             '2017-05-02 00:00:00'::timestamp,
             '5 seconds'::interval) dd
), create_grp AS (
    SELECT r.grp, r.start_range, r.end_range, p.price
    FROM prices p
    JOIN ranges r
      ON p.dt >= r.start_range  -- dt is the timestamp column from the question
     AND p.dt < r.end_range
)
SELECT grp, start_range, end_range, MIN(price), MAX(price)
FROM create_grp
GROUP BY grp, start_range, end_range
ORDER BY grp
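For illustration only, the bucketing idea (assign every record to the start of its 5-second or 10-minute period, then take MIN and MAX per period) can be sketched in Python. The names floor_to_period and min_max_per_period are hypothetical, and this sketch floors each timestamp to a multiple of the period counted from 1970-01-01 instead of joining against a generated series, so periods with no records simply do not appear.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def floor_to_period(dt, period_seconds):
    """Round dt down to the start of its period (periods are aligned to EPOCH)."""
    period = timedelta(seconds=period_seconds)
    whole_periods = (dt - EPOCH) // period  # integer number of complete periods
    return EPOCH + whole_periods * period

def min_max_per_period(rows, period_seconds):
    """rows: iterable of (dt, price). Returns {period_start: (min_price, max_price)}."""
    buckets = {}
    for dt, price in rows:
        key = floor_to_period(dt, period_seconds)
        lo, hi = buckets.get(key, (price, price))
        buckets[key] = (min(lo, price), max(hi, price))
    return buckets
```

Calling min_max_per_period(rows, 5) groups by 5 seconds and min_max_per_period(rows, 600) by 10 minutes; only the period length changes.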

postgres: generate series of timestamps respecting time zone

I'm stumped by a tricky issue regarding time zone changes from daylight savings to non daylight savings.
I'm trying to generate a series of timestamps, 6 hrs apart. This is later joined with data with corresponding timestamps at the 00, 06, 12, 18 hrs for each day in the dataset.
This works fine normally, using:
generate_series(extract(epoch from start_ts)::integer, extract(epoch from end_ts)::integer, 21600)
where start_ts is 00 hr on the first date, and end_ts is 00 hr on the last date exclusive.
However, when timezone offset goes from +11 to +10 half way through the series, it will no longer match any records since the series elements become 1 hr off.
Does anyone have suggestions on how to generate a series of 'epoch integers' or timestamps which would match 00,06,12,18 hr timestamps while respecting the timezone's offset?
This will generate it (using PostgreSQL 9.5+), starting from today and for 10 days:
select (current_date::timestamp + ((a-1)||' days')::interval)::timestamptz
from generate_series(1, 10, .25) a
Test it on a whole year:
select *, date_part('hour', d::timestamp), d::timestamp
from (
select (current_date::timestamp + ((a-1)||' days')::interval)::timestamptz AS d
from generate_series(1, 365, .25) a
) x
where date_part('hour', d) not in (0, 6, 12, 18)
Edit: The version below works with versions of PostgreSQL older than 9.5:
select (current_date::timestamp + (((a-1)/4.0)||' days')::interval)::timestamptz
from generate_series(1, 4* 10 ) a -- 10 days
@Ziggy's answer is great; use that. However, here's how I solved it in my application, which can't use decimals in generate_series (v9.4):
_min timestamp with time zone, -- the first timestamp in the series
_max timestamp with time zone, -- the last timestamp in the series
_inc integer, -- the increment in seconds, eg 21600 (6hr)
_tz text
creates a series from the _max down using the tz offset of the _max,
creates a series from the _min up using the tz offset of the _min,
merges the results
validates each result is divisible by the _inc in the tz of the result, discards if not
query:
select t1 from (
select ser,
to_timestamp(ser) t1,
extract(epoch from
to_timestamp(ser) at time zone _tz
- date_trunc('day', to_timestamp(ser) at time zone _tz)
)::integer % _inc = 0 is_good
from (
select 'ser1' s, generate_series(extract(epoch from _min)::integer, extract(epoch from _max)::integer, _inc) ser
union all
select 'ser2' s, generate_series(extract(epoch from _max)::integer, extract(epoch from _min)::integer, _inc * -1) ser
) x
group by ser, _tz, _inc
order by ser asc
) x
where is_good
;

Checking for the minimum variability of a temporal database in postgresql

I have a table like this:
+------------+------------------+
|temperature |Date_time_of_data |
+------------+------------------+
| 4.5 |9/15/2007 12:12:12|
| 4.56 |9/15/2007 12:14:16|
| 4.44 |9/15/2007 12:16:02|
| 4.62 |9/15/2007 12:18:23|
| 4.89 |9/15/2007 12:21:01|
+------------+------------------+
The data-set contains more than 1000 records and I want to check for the minimum variability.
For every 30 minutes if the variance of temperature doesn't exceed 0.2, I want all the temperature values of that half an hour replaced by NULL.
Here is a SELECT to get the start of a period for every record:
SELECT temperature,
Date_time_of_data,
date_trunc('hour', Date_time_of_data)+
CASE WHEN date_part('minute', Date_time_of_data) >= 30
THEN interval '30 minutes'
ELSE interval '0 minutes'
END as start_of_period
FROM your_table
It truncates the date to the hour (9/15/2007 12:12:12 becomes 9/15/2007 12:00:00)
and then adds 30 minutes if the original time was 30 minutes or more past the hour.
Next, use start_of_period to group the results and get the min and max temperature for every group:
SELECT temperature,
Date_time_of_data,
max(temperature) OVER (PARTITION BY start_of_period) as max_temp,
min(temperature) OVER (PARTITION BY start_of_period) as min_temp
FROM (previous_select_here) AS periods
Next, filter out the records where the variation (max minus min) is more than 0.2:
SELECT temperature,
Date_time_of_data
FROM (previous_select_here) AS spreads
WHERE (max_temp - min_temp) <= 0.2
And finally update your table
UPDATE your_table
SET temperature = NULL
WHERE Date_time_of_data IN (SELECT Date_time_of_data FROM (previous_select_here) AS low_variation)
You may need to correct some spelling mistakes in these queries before they work. I haven't tested them.
And you can simplify them, if you need to.
P.S. If you need to filter out the data with variance less than 0.2 , you can simply create a VIEW from the third SELECT with
WHERE (max_temp - min_temp) > 0.2
And use the VIEW instead of table.
This query should do the job:
with intervals as (
select
date_trunc('hour', Date_time_of_data) + interval '30 min' * round(date_part('minute', Date_time_of_data) / 30.0) as valid_interval
from T
group by 1
having var_samp(temperature) > 0.2
)
select * from T
where
date_trunc('hour', Date_time_of_data) + interval '30 min' * round(date_part('minute', Date_time_of_data) / 30.0) in (select valid_interval from intervals)
The inner query (labeled as intervals) returns times when variance is over 0.2 (having var_samp(temperature) > 0.2). date_trunc ... expression rounds Date_time_of_data to half hour intervals.
The query returns nothing on the provided dataset.
create table T (temperature float8, Date_time_of_data timestamp without time zone);
insert into T values
(4.5, '2007-9-15 12:12:12'),
(4.56, '2007-9-15 12:14:16'),
(4.44, '2007-9-15 12:16:02'),
(4.62, '2007-9-15 12:18:23'),
(4.89, '2007-9-15 12:21:01')
;
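As a rough cross-check of the "returns nothing" result, the same grouping can be reproduced in Python on the five sample rows. This is a sketch, not part of the answer: half_hour_bucket is an illustrative name, statistics.variance is the sample variance (the counterpart of var_samp), and Python's round() may treat exact ties (minute 15 or 45) differently from PostgreSQL, which does not matter for this data.

```python
from datetime import datetime, timedelta
from statistics import variance

readings = [
    (4.5, datetime(2007, 9, 15, 12, 12, 12)),
    (4.56, datetime(2007, 9, 15, 12, 14, 16)),
    (4.44, datetime(2007, 9, 15, 12, 16, 2)),
    (4.62, datetime(2007, 9, 15, 12, 18, 23)),
    (4.89, datetime(2007, 9, 15, 12, 21, 1)),
]

def half_hour_bucket(ts):
    """Round a timestamp to the nearest half hour, like the date_trunc expression."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    return hour + timedelta(minutes=30) * round(ts.minute / 30)

# Collect the temperatures that fall into each half-hour bucket.
groups = {}
for temp, ts in readings:
    groups.setdefault(half_hour_bucket(ts), []).append(temp)

# Sample variance per bucket; a bucket with a single reading has no sample variance.
variances = {bucket: variance(temps) for bucket, temps in groups.items() if len(temps) > 1}
for bucket, var in sorted(variances.items()):
    print(bucket, round(var, 4))
```

The two buckets (12:00 and 12:30) have sample variances of about 0.0018 and 0.0513, both well under 0.2, which is why the query selects no rows here.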