postgresql: create graph based on count of two different timestamp columns vs create_time

So I have a table with three columns:
create_time (date of table entry), process_time (date the order was processed), report_time (date the order was reported). Chronologically, process_time comes first, then report_time, then create_time (process_time ≤ report_time ≤ create_time).
Both process_time and report_time can differ from create_time and from each other, but the main column I want to compare against is create_time.
I would like to create a graph where the X axis is the create_time date and the Y axis is a count of how many times that date appears in the process_time or report_time columns. Not a count of process_time / report_time cells that have a value, but a count of the actual date.
Very simple example:
| create_time | process_time | report_time |
|-------------|--------------|-------------|
| 2019-02-01 | 2019-01-27 | 2019-01-28 |
| 2019-02-20 | 2019-02-20 | 2019-02-20 |
| 2019-02-26 | 2019-02-20 | 2019-02-25 |
In this example the graph would show a count of 0 for the first create_time date, since no process_time or report_time values match that date. For the second create_time it would show a process_time count of 2 and a report_time count of 1, and for the third one it would show a count of 0.
Hope this makes sense.

Creating the sample table:
CREATE TABLE example_table(create_time DATE, process_time DATE, report_time DATE);
INSERT INTO example_table(create_time, process_time, report_time)
VALUES ('2019-02-01', '2019-01-27', '2019-01-28'),
('2019-02-20', '2019-02-20', '2019-02-20'),
('2019-02-26', '2019-02-20', '2019-02-25');
The following query first selects all distinct create_time values and then calculates the number of appearances of each date in the process_time and report_time columns:
WITH create_dates AS (
    SELECT DISTINCT create_time FROM example_table
)
SELECT * FROM create_dates cd
CROSS JOIN LATERAL (
    SELECT
        COUNT(*) FILTER (WHERE cd.create_time = et.process_time) AS process_time_count,
        COUNT(*) FILTER (WHERE cd.create_time = et.report_time) AS report_time_count
    FROM example_table et
) temp;
The result:
+------------+--------------------+-------------------+
| create_time | process_time_count | report_time_count |
+------------+--------------------+-------------------+
| 2019-02-20 | 2 | 1 |
+------------+--------------------+-------------------+
| 2019-02-01 | 0 | 0 |
+------------+--------------------+-------------------+
| 2019-02-26 | 0 | 0 |
+------------+--------------------+-------------------+
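The per-date counting the LATERAL subquery performs can be sketched outside the database as well. A minimal Python illustration of the same logic, using the sample rows from above (the helper name per_create_date_counts is made up for illustration):

```python
from datetime import date

# Sample rows mirroring example_table: (create_time, process_time, report_time)
rows = [
    (date(2019, 2, 1),  date(2019, 1, 27), date(2019, 1, 28)),
    (date(2019, 2, 20), date(2019, 2, 20), date(2019, 2, 20)),
    (date(2019, 2, 26), date(2019, 2, 20), date(2019, 2, 25)),
]

def per_create_date_counts(rows):
    """For each distinct create_time, count how often that date appears
    in the process_time and report_time columns across all rows."""
    result = {}
    for create_time, _, _ in rows:
        process_count = sum(1 for _, p, _ in rows if p == create_time)
        report_count = sum(1 for _, _, r in rows if r == create_time)
        result[create_time] = (process_count, report_count)
    return result

counts = per_create_date_counts(rows)
# 2019-02-20 appears twice in process_time and once in report_time
```

This mirrors the two `COUNT(*) FILTER (...)` aggregates: one scan of the table per distinct create_time value.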


Calculate duration of time ranges without overlap in PostgreSQL

I'm on Postgres 13 and have a table like this
| key | from | to
-------------------------------------------
| A | 2022-11-27T08:00 | 2022-11-27T09:00
| B | 2022-11-27T09:00 | 2022-11-27T10:00
| C | 2022-11-27T08:30 | 2022-11-27T10:30
I want to calculate the duration of each record, but without overlaps. So the desired result would be
| key | from | to | duration
----------------------------------------------------------
| A | 2022-11-27T08:00 | 2022-11-27T09:00 | '1 hour'
| B | 2022-11-27T09:00 | 2022-11-27T09:45 | '45 minutes'
| C | 2022-11-27T08:30 | 2022-11-27T10:00 | '15 minutes'
I guess I need a subquery and subtract the overlap somehow, but how would I factor in multiple overlaps? In the example above C overlaps A and B, so I must subtract 30 minutes from A and then 45 minutes from B... But I'm stuck here:
SELECT key, (("to" - "from")::interval - s.overlap) as duration
FROM time_entries, (
SELECT (???) as overlap
) s
select
    key,
    fromDT,
    toDT,
    (toDT - fromDT)::interval -
    COALESCE((SELECT SUM(LEAST(te2.toDT, te1.toDT) - GREATEST(te2.fromDT, te1.fromDT))::interval
              FROM time_entries te2
              WHERE te2.fromDT < te1.toDT AND te2.toDT > te1.fromDT  -- intervals actually overlap
                AND te2.key < te1.key), '0 minutes') as duration
from time_entries te1;
output:
 key | fromdt              | todt                | duration
-----+---------------------+---------------------+----------
 A   | 2022-11-27 08:00:00 | 2022-11-27 09:00:00 | 01:00:00
 B   | 2022-11-27 09:00:00 | 2022-11-27 10:00:00 | 01:00:00
 C   | 2022-11-27 08:30:00 | 2022-11-27 10:30:00 | 00:30:00
I renamed the columns from and to to fromDT and toDT to avoid using reserved words.
A step-by-step explanation is in the DBFIDDLE.
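The per-row subtraction the correlated subquery performs can be sketched in Python as well; entries and values are from the example, and the helper name durations_without_overlap is made up for illustration. The overlap of two intersecting intervals is LEAST of the ends minus GREATEST of the starts, summed over all entries with a smaller key:

```python
from datetime import datetime, timedelta

# Entries mirroring time_entries: (key, fromDT, toDT)
entries = [
    ("A", datetime(2022, 11, 27, 8, 0),  datetime(2022, 11, 27, 9, 0)),
    ("B", datetime(2022, 11, 27, 9, 0),  datetime(2022, 11, 27, 10, 0)),
    ("C", datetime(2022, 11, 27, 8, 30), datetime(2022, 11, 27, 10, 30)),
]

def durations_without_overlap(entries):
    """For each entry, subtract its total overlap with entries of smaller key."""
    out = {}
    for key, start, end in entries:
        overlap = timedelta(0)
        for key2, start2, end2 in entries:
            # Only count earlier keys, and only if the intervals intersect
            if key2 < key and start2 < end and end2 > start:
                overlap += min(end, end2) - max(start, start2)
        out[key] = (end - start) - overlap
    return out

durs = durations_without_overlap(entries)
```

With these inputs, C loses 30 minutes to A and a full hour to B, matching the query output above.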
Another approach.
WITH DATA AS
(SELECT KEY,
FROMDT,
TODT,
MIN(FROMDT) OVER(PARTITION BY FROMDT::DATE
ORDER BY KEY) AS START_DATE,
MAX(TODT) OVER(PARTITION BY FROMDT::DATE
ORDER BY KEY) AS END_DATE
FROM TIME_ENTRIES
ORDER BY KEY) ,STAGING_DATA AS
(SELECT KEY,
FROMDT,
TODT,
COALESCE(LAG(START_DATE) OVER (PARTITION BY FROMDT::DATE
ORDER BY KEY),FROMDT) AS T1_DATE,
COALESCE(LAG(END_DATE) OVER (PARTITION BY FROMDT::DATE
ORDER BY KEY),TODT) AS T2_DATE
FROM DATA)
SELECT KEY,
FROMDT,
TODT,
CASE
WHEN FROMDT = T1_DATE
AND TODT = T2_DATE THEN (TODT - FROMDT) ::Interval
WHEN T2_DATE < TODT THEN (TODT - T2_DATE)::Interval
ELSE (T2_DATE - TODT)::interval
END
FROM STAGING_DATA;

How to count rows using a variable date range provided by a table in PostgreSQL

I suspect I require some sort of windowing function to do this. I have the following item data as an example:
count | date
------+-----------
3 | 2017-09-15
9 | 2017-09-18
2 | 2017-09-19
6 | 2017-09-20
3 | 2017-09-21
So there are gaps in my data first off, and I have another query here:
select until_date, until_date - (lag(until_date) over ()) as delta_days from ranges
Which I have generated the following data:
until_date | delta_days
-----------+-----------
2017-09-08 |
2017-09-11 | 3
2017-09-13 | 2
2017-09-18 | 5
2017-09-21 | 3
2017-09-22 | 1
So I'd like my final query to produce this result:
start_date | ending_date | total_items
-----------+-------------+--------------
2017-09-08 | 2017-09-10 | 0
2017-09-11 | 2017-09-12 | 0
2017-09-13 | 2017-09-17 | 3
2017-09-18 | 2017-09-20 | 15
2017-09-21 | 2017-09-22 | 3
Which tells me the total count of items from the first table, per day, based on the custom ranges from the second table.
In this particular example, I would be summing up total_items BETWEEN start AND end (since there would be overlap on the dates, I'd subtract 1 from the end date to not count duplicates)
Anyone know how to do this?
Thanks!
Use the daterange type. Note that you do not have to calculate delta_days; just convert the ranges to dateranges and use the operator <@ (element is contained by).
with counts(count, date) as (
    values
        (3, '2017-09-15'::date),
        (9, '2017-09-18'),
        (2, '2017-09-19'),
        (6, '2017-09-20'),
        (3, '2017-09-21')
),
ranges (until_date) as (
    values
        ('2017-09-08'::date),
        ('2017-09-11'),
        ('2017-09-13'),
        ('2017-09-18'),
        ('2017-09-21'),
        ('2017-09-22')
)
select daterange, coalesce(sum(count), 0) as total_items
from (
    select daterange(lag(until_date) over (order by until_date), until_date)
    from ranges
) s
left join counts on date <@ daterange
where not lower_inf(daterange)
group by 1
order by 1;
daterange | total_items
-------------------------+-------------
[2017-09-08,2017-09-11) | 0
[2017-09-11,2017-09-13) | 0
[2017-09-13,2017-09-18) | 3
[2017-09-18,2017-09-21) | 17
[2017-09-21,2017-09-22) | 3
(5 rows)
Note that in the dateranges above the lower bounds are inclusive while the upper bounds are exclusive.
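The half-open bucketing can be sketched without ranges at all; a minimal Python illustration of the same pairing of consecutive until_date values into [lo, hi) intervals (the helper name totals_per_range is made up for illustration):

```python
from datetime import date

# Item counts and range boundaries from the example
counts = {date(2017, 9, 15): 3, date(2017, 9, 18): 9, date(2017, 9, 19): 2,
          date(2017, 9, 20): 6, date(2017, 9, 21): 3}
until_dates = [date(2017, 9, 8), date(2017, 9, 11), date(2017, 9, 13),
               date(2017, 9, 18), date(2017, 9, 21), date(2017, 9, 22)]

def totals_per_range(counts, until_dates):
    """Pair consecutive until_dates into half-open ranges [lo, hi)
    (lower bound inclusive, upper exclusive) and sum the counts inside."""
    out = {}
    for lo, hi in zip(until_dates, until_dates[1:]):
        out[(lo, hi)] = sum(c for d, c in counts.items() if lo <= d < hi)
    return out

totals = totals_per_range(counts, until_dates)
```

The `zip(until_dates, until_dates[1:])` pairing plays the role of `lag(until_date) over (order by until_date)`, and the `lo <= d < hi` test plays the role of `date <@ daterange`.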
If you want to calculate items per day in the dateranges:
select
    daterange, total_items,
    round(total_items::dec/(upper(daterange) - lower(daterange)), 2) as items_per_day
from (
    select daterange, coalesce(sum(count), 0) as total_items
    from (
        select daterange(lag(until_date) over (order by until_date), until_date)
        from ranges
    ) s
    left join counts on date <@ daterange
    where not lower_inf(daterange)
    group by 1
) s
order by 1;
daterange | total_items | items_per_day
-------------------------+-------------+---------------
[2017-09-08,2017-09-11) | 0 | 0.00
[2017-09-11,2017-09-13) | 0 | 0.00
[2017-09-13,2017-09-18) | 3 | 0.60
[2017-09-18,2017-09-21) | 17 | 5.67
[2017-09-21,2017-09-22) | 3 | 3.00
(5 rows)

Postgresql Time Series for each Record

I'm having issues trying to wrap my head around how to extract some time series stats from my Postgres DB.
For example, I have several stores. I record how many sales each store made each day in a table that looks like:
+------------+----------+-------+
| Date | Store ID | Count |
+------------+----------+-------+
| 2017-02-01 | 1 | 10 |
| 2017-02-01 | 2 | 20 |
| 2017-02-03 | 1 | 11 |
| 2017-02-03 | 2 | 21 |
| 2017-02-04 | 3 | 30 |
+------------+----------+-------+
I'm trying to display this data on a bar/line graph with different lines per Store and the blank dates filled in with 0.
I have been successful getting it to show the sum per day (combining all the stores into one sum) using generate_series, but I can't figure out how to separate it out so each store has a value for each day... the result being something like:
["Store ID 1", 10, 0, 11, 0]
["Store ID 2", 20, 0, 21, 0]
["Store ID 3", 0, 0, 0, 30]
It is necessary to build a cross join of dates × stores:
select store_id, array_agg(total order by date) as total
from (
    select store_id, date, coalesce(sum(total), 0) as total
    from t
    right join (
        generate_series(
            (select min(date) from t),
            (select max(date) from t),
            '1 day'
        ) gs (date)
        cross join
        (select distinct store_id from t) s
    ) using (date, store_id)
    group by 1, 2
) s
group by 1
order by 1;
store_id | total
----------+-------------
1 | {10,0,11,0}
2 | {20,0,21,0}
3 | {0,0,0,30}
Sample data:
create table t (date date, store_id int, total int);
insert into t (date, store_id, total) values
('2017-02-01',1,10),
('2017-02-01',2,20),
('2017-02-03',1,11),
('2017-02-03',2,21),
('2017-02-04',3,30);
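The dense grid the cross join produces can be sketched in Python as well, using the same sample data (the helper name totals_per_store is made up for illustration):

```python
from datetime import date, timedelta

# Rows mirroring table t: (date, store_id, total)
rows = [
    (date(2017, 2, 1), 1, 10),
    (date(2017, 2, 1), 2, 20),
    (date(2017, 2, 3), 1, 11),
    (date(2017, 2, 3), 2, 21),
    (date(2017, 2, 4), 3, 30),
]

def totals_per_store(rows):
    """Build the full dates x stores grid and fill missing cells with 0,
    like generate_series CROSS JOIN distinct store_ids."""
    sales = {(d, s): n for d, s, n in rows}
    dates = [r[0] for r in rows]
    lo, hi = min(dates), max(dates)
    all_dates = [lo + timedelta(days=i) for i in range((hi - lo).days + 1)]
    stores = sorted({s for _, s, _ in rows})
    return {s: [sales.get((d, s), 0) for d in all_dates] for s in stores}

series = totals_per_store(rows)
```

Every store gets one value per calendar day between the minimum and maximum date, which is exactly the shape needed for a multi-line graph.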

Query to combine two tables into one based on timestamp

I have three tables in Postgres. They are all about a single event (an occurrence, not "sports event"). Each table is about a specific item during the event.
table_header columns
gid, start_timestamp, end_timestamp, location, positions
table_item1 columns
gid, side, visibility, item1_timestamp
table_item2 columns
gid, position_id, name, item2_timestamp
I've tried the following query:
SELECT h.gid, h.location, h.start_timestamp, h.end_timestamp, i1.side,
i1.visibility, i2.position_id, i2.name, i2.item2_timestamp AS timestamp
FROM table_header AS h
LEFT OUTER JOIN table_item1 i1 on (i1.gid = h.gid)
LEFT OUTER JOIN table_item2 i2 on (i2.gid = i1.gid AND
i1.item1_timestamp = i2.item2_timestamp)
WHERE h.start_timestamp BETWEEN '2016-03-24 12:00:00'::timestamp AND now()::timestamp
The problem is that I'm losing some data from rows when item1_timestamp and item2_timestamp do not match.
So if I have in table_item1 and table_item2:
gid | item1_timestamp | side      gid | item2_timestamp | name
----+-----------------+-------    ----+-----------------+--------
  1 | 17:00:00        | left        1 | 17:00:00        | charlie
  1 | 17:00:05        | right       1 | 17:00:03        | frank
  1 | 17:00:10        | left        1 | 17:00:06        | dee
I would want the final output to be:
gid | timestamp | side | name
-----------------------------
1 | 17:00:00 | left | charlie
1 | 17:00:03 | | frank
1 | 17:00:05 | right |
1 | 17:00:06 | | dee
1 | 17:00:10 | left |
based purely on the timestamp (and gid). Naturally I would have the header info in there too, but that's trivial.
I tried playing around with the query I posted using different JOINs and UNIONs, but I cannot seem to get it right. The one I posted gives the best results I could manage, but it's incomplete.
Side note: every minute or so there will be a new "event". So the gid will be unique to each event and the query needs to ensure that each dataset is paired with data from the same gid. Which is the reason for my i1.gid = h.gid lines. Data between different events should not be compared.
select t1.gid, t1.timestamp, t1.side, t2.name
from t1
left join t2 on t2.timestamp = t1.timestamp and t2.gid = t1.gid
union
select t2.gid, t2.timestamp, t1.side, t2.name
from t2
left join t1 on t2.timestamp = t1.timestamp and t2.gid = t1.gid
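A union of two left joins like this emulates a FULL OUTER JOIN: every (gid, timestamp) pair from either table appears once, with NULL in the columns the other table cannot supply. The pairing can be sketched in Python with the example data (the helper name full_join_on_time is made up for illustration):

```python
# Rows mirroring table_item1 (gid, timestamp, side) and table_item2 (gid, timestamp, name)
item1 = [(1, "17:00:00", "left"), (1, "17:00:05", "right"), (1, "17:00:10", "left")]
item2 = [(1, "17:00:00", "charlie"), (1, "17:00:03", "frank"), (1, "17:00:06", "dee")]

def full_join_on_time(item1, item2):
    """Emit one row per (gid, timestamp) seen in either table,
    like a FULL OUTER JOIN on (gid, timestamp); None stands in for NULL."""
    sides = {(g, t): s for g, t, s in item1}
    names = {(g, t): n for g, t, n in item2}
    keys = sorted(set(sides) | set(names))
    return [(g, t, sides.get((g, t)), names.get((g, t))) for g, t in keys]

merged = full_join_on_time(item1, item2)
```

The result matches the desired output in the question: timestamps present in only one table keep their side or name, and the missing column is NULL.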

DB2 SQL to aggregate value for months with no gaps

I have 2 tables which I need to join against, along with a table that is generated inline using WITH. The WITH is a daterange, and I need to display all rows from 1 table for all months, even where no data exists in the 2nd table.
This is the data within the tables :
Table REFERRAL_GROUPINGS
referral_group
--------------
VER
FRD
FCC
Table DATA_VALUES
referral_group | task_date | task_id | over_threshold
---------------+------------+---------+---------------
VER | 2015-10-01 | 10 | 0
FRD | 2015-11-04 | 20 | 1
The date range will need to select 3 months :
Oct-2015
Nov-2015
Dec-2015
The data I expect to end up with will be :
MonthYear | referral_group | count_of_group | total_over_threshold
----------+----------------+----------------+---------------------
Oct-2015 | VER | 1 | 0
Oct-2015 | FRD | 0 | 0
Oct-2015 | FCC | 0 | 0
Nov-2015 | VER | 0 | 0
Nov-2015 | FRD | 1 | 1
Nov-2015 | FCC | 0 | 0
Dec-2015 | VER | 0 | 0
Dec-2015 | FRD | 0 | 0
Dec-2015 | FCC | 0 | 0
DDL to create the 2 tables and populate them with data is as below:
CREATE TABLE test_data (
referral_group char(3),
task_date date,
task_id integer,
over_threshold integer);
insert into test_data values
('VER','2015-10-01',10,1),
('FRD','2015-11-04',20,0);
CREATE TABLE referral_grouper (
referral_group char(3));
insert into referral_grouper values
('FRD'),
('VER'),
('FCC');
This is a very cut-down example which uses the minimal tables/columns for this example, which is why I have no primary keys/indexes.
I can get this running under LUW no problem, by using NOT EXISTS in the joins as per this SQL.
WITH DATERANGE(FROM_DTE,yyyymm, TO_DTE) AS
(
SELECT DATE('2015-10-01'), YEAR('2015-10-01')*100+MONTH('2015-10-01'), '2015-12-31'
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT FROM_DTE + 1 DAY, YEAR(FROM_DTE+1 DAY)*100+MONTH(FROM_DTE+1 DAY), TO_DTE
FROM DATERANGE
WHERE FROM_DTE < TO_DTE
)
select
referral_grouper.referral_group,
daterange.yyyymm,
count(test_data.task_id) AS total_count,
COALESCE(SUM(over_threshold),0) AS total_over_threshold
FROM
test_data
RIGHT OUTER JOIN daterange ON (daterange.from_dte=test_data.task_date OR NOT EXISTS (SELECT 1 FROM daterange d2 WHERE d2.from_dte=test_data.task_date))
RIGHT OUTER JOIN referral_grouper ON (referral_grouper.referral_group=test_data.referral_group OR NOT EXISTS (SELECT 1 FROM referral_grouper g2 WHERE g2.referral_group=test_data.referral_group))
GROUP BY
referral_grouper.referral_group,
daterange.yyyymm
However... This needs to work on ZOS, and under ZOS you cannot use subqueries with EXISTS in a join. Removing the NOT EXISTS means the non existing rows no longer show up.
There must be a way to write the SQL to return all rows from the 2 linking tables without using NOT EXISTS, but I just cannot seem to find it. Any help with this would be very appreciated, as it has me stumped.
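One common portable pattern (a sketch of the shape, not tested on z/OS here) is to build the month × group grid first with a plain CROSS JOIN and then LEFT JOIN the fact rows onto it, so empty combinations survive as zero rows without any EXISTS subquery. The same shape sketched in Python, with hypothetical names:

```python
# Months and groups mirroring the example; facts: (group, yyyymm, task_id, over_threshold)
months = [201510, 201511, 201512]
groups = ["VER", "FRD", "FCC"]
facts = [("VER", 201510, 10, 0), ("FRD", 201511, 20, 1)]

def grid_counts(months, groups, facts):
    """Cross join months x groups, then aggregate the matching facts;
    combinations with no facts keep (0, 0), like LEFT JOIN + COUNT/COALESCE."""
    out = {}
    for m in months:
        for g in groups:
            matching = [f for f in facts if f[0] == g and f[1] == m]
            out[(m, g)] = (len(matching), sum(f[3] for f in matching))
    return out

grid = grid_counts(months, groups, facts)
```

In SQL terms: FROM daterange CROSS JOIN referral_grouper LEFT OUTER JOIN test_data ON the month and group columns, then GROUP BY month and group, which produces the 9-row result in the question.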