Calculate duration of time ranges without overlap in PostgreSQL

I'm on Postgres 13 and have a table like this:
| key | from | to
-------------------------------------------
| A | 2022-11-27T08:00 | 2022-11-27T09:00
| B | 2022-11-27T09:00 | 2022-11-27T10:00
| C | 2022-11-27T08:30 | 2022-11-27T10:30
I want to calculate the duration of each record, but without overlaps. So the desired result would be
| key | from | to | duration
----------------------------------------------------------
| A | 2022-11-27T08:00 | 2022-11-27T09:00 | '1 hour'
| B | 2022-11-27T09:00 | 2022-11-27T09:45 | '45 minutes'
| C | 2022-11-27T08:30 | 2022-11-27T10:00 | '15 minutes'
I guess I need a subquery that subtracts the overlap somehow, but how would I factor in multiple overlaps? In the example above C overlaps both A and B, so I must subtract the 30-minute overlap with A and then the 45-minute overlap with B. But I'm stuck here:
SELECT key, (("to" - "from")::interval - s.overlap) as duration
FROM time_entries, (
SELECT (???) as overlap
) s

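-- For each entry te1, subtract its overlap with every earlier entry te2 (earlier = smaller key).
SELECT
    key,
    fromDT,
    toDT,
    (toDT - fromDT)::interval -
    COALESCE((SELECT SUM(LEAST(te2.toDT, te1.toDT) - GREATEST(te2.fromDT, te1.fromDT))::interval
              FROM time_entries te2
              -- two ranges overlap only if each one starts before the other ends
              WHERE (te2.fromDT < te1.toDT AND te2.toDT > te1.fromDT)
                AND te2.key < te1.key), '0 minutes') AS duration
FROM time_entries te1;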
Output:
key | fromdt              | todt                | duration
----+---------------------+---------------------+----------
A   | 2022-11-27 08:00:00 | 2022-11-27 09:00:00 | 01:00:00
B   | 2022-11-27 09:00:00 | 2022-11-27 10:00:00 | 01:00:00
C   | 2022-11-27 08:30:00 | 2022-11-27 10:30:00 | 00:30:00
I renamed the columns from and to to fromDT and toDT to avoid using reserved words.
A step-by-step explanation is in the DBFIDDLE.
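To see which overlaps get subtracted, it can help to list them pair by pair first. The following is only a sketch: it inlines the three sample rows as a CTE (so no table is needed), assumes "earlier" means a smaller key as in the query above, and treats two ranges as overlapping when each one starts before the other ends. The column aliases entry, earlier_entry, overlap_from, overlap_to and overlap are made up for illustration.

```sql
-- Sample rows inlined so the query runs without creating a table.
WITH time_entries(key, fromDT, toDT) AS (
    VALUES
        ('A', timestamp '2022-11-27 08:00', timestamp '2022-11-27 09:00'),
        ('B', timestamp '2022-11-27 09:00', timestamp '2022-11-27 10:00'),
        ('C', timestamp '2022-11-27 08:30', timestamp '2022-11-27 10:30')
)
SELECT te1.key                               AS entry,
       te2.key                               AS earlier_entry,
       GREATEST(te1.fromDT, te2.fromDT)      AS overlap_from,
       LEAST(te1.toDT, te2.toDT)             AS overlap_to,
       LEAST(te1.toDT, te2.toDT)
         - GREATEST(te1.fromDT, te2.fromDT)  AS overlap
FROM time_entries te1
JOIN time_entries te2
  ON te2.key < te1.key            -- only compare against earlier entries
 AND te2.fromDT < te1.toDT        -- te2 starts before te1 ends
 AND te2.toDT > te1.fromDT        -- te2 ends after te1 starts
ORDER BY te1.key, te2.key;
```

With the sample data this should list two pairs, C with A (30 minutes) and C with B (1 hour), which together account for the 01:30 removed from C's two hours. Note that if two earlier entries also overlap each other, summing the pairwise overlaps would subtract the shared part twice.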

Another approach.
WITH DATA AS
  (SELECT KEY,
          FROMDT,
          TODT,
          MIN(FROMDT) OVER(PARTITION BY FROMDT::DATE
                           ORDER BY KEY) AS START_DATE,
          MAX(TODT) OVER(PARTITION BY FROMDT::DATE
                         ORDER BY KEY) AS END_DATE
   FROM TIME_ENTRIES
   ORDER BY KEY),
STAGING_DATA AS
  (SELECT KEY,
          FROMDT,
          TODT,
          COALESCE(LAG(START_DATE) OVER (PARTITION BY FROMDT::DATE
                                         ORDER BY KEY), FROMDT) AS T1_DATE,
          COALESCE(LAG(END_DATE) OVER (PARTITION BY FROMDT::DATE
                                       ORDER BY KEY), TODT) AS T2_DATE
   FROM DATA)
SELECT KEY,
       FROMDT,
       TODT,
       CASE
           WHEN FROMDT = T1_DATE
                AND TODT = T2_DATE THEN (TODT - FROMDT)::interval
           WHEN T2_DATE < TODT THEN (TODT - T2_DATE)::interval
           ELSE (T2_DATE - TODT)::interval
       END
FROM STAGING_DATA;

Related

How to detect streaks of continuous activity in Postgres?

I want to aggregate data based on streaks of continuous activity.
DDL:
CREATE TABLE t_series (t date, data int);
INSERT INTO t_series VALUES
(date '2018-03-01',12),
(date '2018-03-02',43),
(date '2018-03-03',9),
(date '2018-03-04',13),
(date '2018-03-09',23),
(date '2018-03-10',26),
(date '2018-03-11',28),
(date '2018-03-14',21),
(date '2018-03-15',15);
I want an intermediate output as:
          t | data | period
------------+------+------
 2018-03-01 | 12 | 1
 2018-03-02 | 43 | 1
 2018-03-03 | 9 | 1
 2018-03-04 | 13 | 1
 2018-03-09 | 23 | 2
 2018-03-10 | 26 | 2
 2018-03-11 | 28 | 2
 2018-03-14 | 21 | 3
 2018-03-15 | 15 | 3
And the final output as:
period | sum
--------+-----
      1 | 77
      2 | 77
      3 | 36
I have tried the query below, but it doesn't seem to work:
SELECT *, SUM(CASE WHEN diff IS NULL
                     OR diff <2 THEN 1 ELSE NULL END) OVER (ORDER BY t) AS period
       FROM (SELECT *, t - lag(t, 1) OVER (ORDER BY t) AS diff
             FROM t_series
       ) AS x;
Could anyone please suggest a fix?
Thanks in advance.
I came up with this solution:
SELECT period, SUM(data) AS sum
FROM (
    SELECT t, data, SUM(groups) OVER (ORDER BY t) AS period
    FROM (
        SELECT t, data,
               CASE
                   WHEN diff IS NULL OR diff = 1 THEN 0
                   ELSE 1
               END AS groups
        FROM (
            SELECT t, data, t - LAG(t) OVER (ORDER BY t) AS diff
            FROM t_series
        ) d
    ) g -- your intermediate output
) p
GROUP BY period
ORDER BY period
;
Result:
period | sum
--------+-----
0 | 77
1 | 77
2 | 36
The only difference is that my period starts at 0, but I think that's OK.
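If the period numbering should start at 1, as in the intermediate output the question asks for, one option is to flag the very first row as the start of a streak too. The sketch below assumes exactly that: a row starts a new streak unless it is exactly one day after the previous row. The CTE name marked and the column starts_streak are made up for illustration, and the sample rows are inlined so the query runs on its own.

```sql
WITH t_series(t, data) AS (
    VALUES
        (date '2018-03-01', 12),
        (date '2018-03-02', 43),
        (date '2018-03-03', 9),
        (date '2018-03-04', 13),
        (date '2018-03-09', 23),
        (date '2018-03-10', 26),
        (date '2018-03-11', 28),
        (date '2018-03-14', 21),
        (date '2018-03-15', 15)
),
marked AS (
    SELECT t, data,
           -- date - date yields an integer number of days in PostgreSQL.
           -- For the first row LAG() is NULL, so the comparison is NULL
           -- and the ELSE branch marks it as a streak start.
           CASE WHEN t - LAG(t) OVER (ORDER BY t) = 1 THEN 0 ELSE 1 END AS starts_streak
    FROM t_series
)
SELECT t, data,
       -- running count of streak starts = 1-based period number
       SUM(starts_streak) OVER (ORDER BY t) AS period
FROM marked
ORDER BY t;
```

Grouping this result by period and summing data should then give the same sums as above (77, 77, 36), labelled 1 to 3 instead of 0 to 2.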

Data from last 12 months each month with trailing 12 months

This is T-SQL, and I'm trying to calculate the repeat purchase rate for the last 12 months. This is achieved by looking at the sum of customers who have bought more than once in the last 12 months and the total number of customers in the last 12 months.
The SQL code below gives me just that, but I would like to do this dynamically for each of the last 12 months. This is the part where I'm stuck and not sure how best to achieve this.
Each month should include data going back 12 months, i.e. June should hold data between June 2018 and June 2019, and May should hold data from May 2018 till May 2019.
[Order Date] is a normal datefield (yyyy-mm-dd hh:mm:ss)
DECLARE @startdate1 DATETIME
DECLARE @enddate1 DATETIME
SET @enddate1 = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE())-1, 0) -- Ending June 2019
SET @startdate1 = DATEADD(mm,DATEDIFF(mm,0,GETDATE())-13,0) -- Starting June 2018
;
with dataset as (
    select [Phone No_] as who_identifier,
           count(distinct([Order No_])) as mycount
    from [MyCompany$Sales Invoice Header]
    where [Order Date] between @startdate1 and @enddate1
    group by [Phone No_]
),
frequentbuyers as (
    select who_identifier, sum(mycount) as frequentbuyerscount
    from dataset
    where mycount > 1
    group by who_identifier),
allpurchases as (
    select who_identifier, sum(mycount) as allpurchasescount
    from dataset
    group by who_identifier
)
select sum(frequentbuyerscount) as frequentbuyercount, (select sum(allpurchasescount) from allpurchases) as allpurchasecount
from frequentbuyers
I'm hoping to achieve end result looking something like this:
...Dec, Jan, Feb, March, April, May, June each month holding both values for frequentbuyercount and allpurchasescount.
Here is the code. I made a small modification for frequentbuyerscount and allpurchasescount: if you use a SUMIF-like expression, you don't need a second CTE.
if object_id('tempdb.dbo.#tmpMonths') is not null drop table #tmpMonths
create table #tmpMonths ( MonthID datetime, StartDate datetime, EndDate datetime)
declare @MonthCount int = 12
declare @Month datetime = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0)
-- one row per month, each covering the trailing 12 months
while @MonthCount > 0 begin
    insert into #tmpMonths( MonthID, StartDate, EndDate )
    select @Month, dateadd(month, -12, @Month), @Month
    set @Month = dateadd(month, -1, @Month)
    set @MonthCount = @MonthCount - 1
end
;with dataset as (
    select m.MonthID as MonthID, [Phone No_] as who_identifier,
           count(distinct([Order No_])) as mycount
    from [MyCompany$Sales Invoice Header]
    inner join #tmpMonths m on [Order Date] between m.StartDate and m.EndDate
    group by m.MonthID, [Phone No_]
),
buyers as (
    select MonthID, who_identifier
         , sum(iif(mycount > 1, mycount, 0)) as frequentbuyerscount --sum only if count > 1
         , sum(mycount) as allpurchasescount
    from dataset
    group by MonthID, who_identifier
)
select
    b.MonthID
    , max(tm.StartDate) StartDate, max(tm.EndDate) EndDate
    , sum(b.frequentbuyerscount) as frequentbuyercount
    , sum(b.allpurchasescount) as allpurchasecount
from buyers b inner join #tmpMonths tm on tm.MonthID = b.MonthID
group by b.MonthID
Be aware that the code was only tested syntax-wise.
With the test data, this is the result:
MonthID | StartDate | EndDate | frequentbuyercount | allpurchasecount
-----------------------------------------------------------------------------
2018-08-01 | 2017-08-01 | 2018-08-01 | 340 | 3702
2018-09-01 | 2017-09-01 | 2018-09-01 | 340 | 3702
2018-10-01 | 2017-10-01 | 2018-10-01 | 340 | 3702
2018-11-01 | 2017-11-01 | 2018-11-01 | 340 | 3702
2018-12-01 | 2017-12-01 | 2018-12-01 | 340 | 3703
2019-01-01 | 2018-01-01 | 2019-01-01 | 340 | 3703
2019-02-01 | 2018-02-01 | 2019-02-01 | 2 | 8
2019-03-01 | 2018-03-01 | 2019-03-01 | 2 | 3
2019-04-01 | 2018-04-01 | 2019-04-01 | 2 | 3
2019-05-01 | 2018-05-01 | 2019-05-01 | 2 | 3
2019-06-01 | 2018-06-01 | 2019-06-01 | 2 | 3
2019-07-01 | 2018-07-01 | 2019-07-01 | 2 | 3

Show complete date range with NULL in PostgreSQL

I'm trying to write a query that returns every date in a range, with NULLs in the other columns when the date does not exist in the table.
For example, this is my tbl_example:
Original data:
id | userid(str) | comment(str) | mydate(date)
1 0001 sample1 2019-06-20T16:00:00.000Z
2 0002 sample2 2019-06-21T16:00:00.000Z
3 0003 sample3 2019-06-24T16:00:00.000Z
4 0004 sample4 2019-06-25T16:00:00.000Z
5 0005 sample5 2019-06-26T16:00:00.000Z
Then:
select * from tbl_example where mydate between '2019-06-20' AND
DATE('2019-06-20') + interval '5 day'
How can I output all the dates in the range, with NULLs where there is no data, like this?
Expected output:
id | userid(str) | comment(str) | mydate(date)
1 0001 sample1 2019-06-20T16:00:00.000Z
2 0002 sample2 2019-06-21T16:00:00.000Z
null null null 2019-06-22T16:00:00.000Z
null null null 2019-06-23T16:00:00.000Z
3 0003 sample3 2019-06-24T16:00:00.000Z
4 0004 sample4 2019-06-25T16:00:00.000Z
This is my sample test environment: http://www.sqlfiddle.com/#!17/f5285/2
See my SQL below:
with all_dates as (
select generate_series(min(mydate),max(mydate),'1 day'::interval) as dates from tbl_example
)
,null_dates as (
select
a.dates
from
all_dates a
left join
tbl_example t on a.dates = t.mydate
where
t.mydate is null
)
select null as id, null as userid, null as comment, dates as mydate from null_dates
union
select * from tbl_example order by mydate;
id | userid | comment | mydate
----+--------+---------+---------------------
1 | 0001 | sample1 | 2019-06-20 16:00:00
2 | 0002 | sample1 | 2019-06-21 16:00:00
| | | 2019-06-22 16:00:00
| | | 2019-06-23 16:00:00
3 | 0003 | sample1 | 2019-06-24 16:00:00
4 | 0004 | sample1 | 2019-06-25 16:00:00
5 | 0005 | sample1 | 2019-06-26 16:00:00
(7 rows)
Alternatively, you can pass the date arguments you want directly to generate_series, as below:
select generate_series('2019-06-20 16:00:00','2019-06-20 16:00:00'::timestamp + '5 days'::interval,'1 day'::interval) as dates
SELECT id, userid, "comment", d.mydate
FROM generate_series('2019-06-20'::date, '2019-06-25'::date, INTERVAL '1 day') d (mydate)
LEFT JOIN tbl_example ON d.mydate = tbl_example.mydate
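A self-contained variant of this LEFT JOIN is sketched below. It assumes mydate is a plain date column (the question's header says date, although the displayed values carry a time), inlines the sample rows as a CTE, and casts each generated value back to date before joining. The alias series_day is made up for illustration.

```sql
WITH tbl_example(id, userid, "comment", mydate) AS (
    VALUES
        (1, '0001', 'sample1', date '2019-06-20'),
        (2, '0002', 'sample2', date '2019-06-21'),
        (3, '0003', 'sample3', date '2019-06-24'),
        (4, '0004', 'sample4', date '2019-06-25'),
        (5, '0005', 'sample5', date '2019-06-26')
)
SELECT t.id, t.userid, t."comment", d.series_day::date AS mydate
FROM generate_series(timestamp '2019-06-20',
                     timestamp '2019-06-25',
                     interval '1 day') AS d(series_day)
-- every generated day is kept; table columns are NULL when no row matches
LEFT JOIN tbl_example t ON t.mydate = d.series_day::date
ORDER BY d.series_day;
```

Under these assumptions the rows for 2019-06-22 and 2019-06-23 come back with NULL in id, userid and comment. If mydate is actually a timestamp with a time of day, the join condition would need to compare on t.mydate::date instead.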

postgresql: create graph based on count of two different timestamp columns vs create_time

So I have a table with three columns:
create_time (date of table entry), process_time (date order was processed), report_time (date order was reported). Chronologically, the events always occur in this order: process_time comes first, then report_time, then create_time.
Both process_time and report_time can differ from create_time and from each other. But the main column I want to compare against is create_time.
I would like to create a graph where the X axis is the create_time date and the Y axis is a count of how many times that create_time date appears in the process_time or report_time columns. That is, not a count of process_time / report_time cells that have a value, but a count of occurrences of the actual date.
Very simple example:
| create_time | process_time | report_time |
|-------------|--------------|-------------|
| 2019-02-01 | 2019-01-27 | 2019-01-28 |
| 2019-02-20 | 2019-02-20 | 2019-02-20 |
| 2019-02-26 | 2019-02-20 | 2019-02-25 |
In this example the graph would show a count of 0 for the first create_time date, since there are no process_time or report_time values that match that date. For the second create_time it would show a count of 2 for process_time and 1 for report_time, and for the third one it would show a count of 0.
Hope this makes sense.
Creating the sample table:
CREATE TABLE example_table(create_time DATE, process_time DATE, report_time DATE);
INSERT INTO example_table(create_time, process_time, report_time)
VALUES ('2019-02-01', '2019-01-27', '2019-01-28'),
('2019-02-20', '2019-02-20', '2019-02-20'),
('2019-02-26', '2019-02-20', '2019-02-25');
The query below first selects all distinct create_time values and then counts how many times each date appears in the process_time and report_time columns.
WITH create_dates AS (
SELECT DISTINCT create_time FROM example_table
)
SELECT * FROM create_dates cd
CROSS JOIN LATERAL (
SELECT
COUNT(*) FILTER (WHERE cd.create_time = et.process_time) as process_time_count,
COUNT(*) FILTER (WHERE cd.create_time = et.report_time) as report_time_count
FROM example_table et
) temp;
The result:
+------------+--------------------+-------------------+
| create_time | process_time_count | report_time_count |
+------------+--------------------+-------------------+
| 2019-02-20 | 2 | 1 |
+------------+--------------------+-------------------+
| 2019-02-01 | 0 | 0 |
+------------+--------------------+-------------------+
| 2019-02-26 | 0 | 0 |
+------------+--------------------+-------------------+

SQL calculating stock per month

I have a specific task and don't know how to implement it. I hope someone can help me =)
I have stock_move table:
product_id |location_id |location_dest_id |product_qty |date_expected |
-----------|------------|-----------------|------------|--------------------|
327 |80 |84 |10 |2014-05-28 00:00:00 |
327 |80 |84 |10 |2014-05-23 00:00:00 |
327 |80 |84 |10 |2014-02-26 00:00:00 |
327 |80 |85 |10 |2014-02-21 00:00:00 |
327 |80 |84 |10 |2014-02-12 00:00:00 |
327 |84 |85 |20 |2014-02-06 00:00:00 |
322 |84 |80 |120 |2015-12-16 00:00:00 |
322 |80 |84 |30 |2015-12-10 00:00:00 |
322 |80 |84 |30 |2015-12-04 00:00:00 |
322 |80 |84 |15 |2015-11-26 00:00:00 |
i.e. it's a table of product moves from one warehouse to another.
I can calculate stock at custom date if I use something like this:
select
coalesce(si.product_id, so.product_id) as "Product",
(coalesce(si.stock, 0) - coalesce(so.stock, 0)) as "Stock"
from
(
select
product_id
,sum(product_qty * price_unit) as stock
from stock_move
where
location_dest_id = 80
and date_expected < now()
group by product_id
) as si
full outer join (
select
product_id
,sum(product_qty * price_unit) as stock
from stock_move
where
location_id = 80
and date_expected < now()
group by product_id
) as so
on si.product_id = so.product_id
As a result I get the current stock:
Product |Stock |
--------|------|
325 |1058 |
313 |34862 |
304 |2364 |
BUT what should I do if I need stock per month?
Something like this?
Month |Total Stock |
--------|------------|
Jan |130238 |
Feb |348262 |
Mar |2323364 |
How can I sum the product qty from the start of the period to the end of each month?
I have just one idea: use 24 subqueries to get the stock for each month (example below)
Jan |Feb | Mar |
----|----|-----|
123 |234 |345 |
And after this rotate rows and columns?
I think this is stupid, but I don't know another way... Help me pls =)
Something like this could give you monthly "ending" inventory snapshots. The trick is that your data may omit certain months for certain parts, but that part will still have a balance (i.e. 50 received in January, nothing happened in February, but you still want to show February with a running total of 50).
One way to handle this is to generate all possible part/month combinations. I assumed 1/1/14 + 24 months in this example, but that's easily changed in the all_months subquery. For example, you may only want to start with the minimum date from the stock_move table.
with all_months as (
    select '2014-01-01'::date + interval '1 month' * generate_series(0, 23) as month_begin
),
stock_calc as (
    select
        product_id, date_expected,
        date_trunc ('month', date_expected)::date as month_expected,
        case
            when location_id = 80 then -product_qty * price_unit
            when location_dest_id = 80 then product_qty * price_unit
            else 0
        end as qty
    from stock_move
    union all
    select distinct
        s.product_id, m.month_begin::date, m.month_begin::date, 0
    from
        stock_move s
        cross join all_months m
),
running_totals as (
    select
        product_id, date_expected, month_expected,
        sum (qty) over (partition by product_id order by date_expected) as end_qty,
        row_number() over (partition by product_id, month_expected
                           order by date_expected desc) as rn
    from stock_calc
)
select
    product_id, month_expected, end_qty
from running_totals
where
    rn = 1
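The answer mentions that the month list could start at the minimum date in stock_move instead of a hard-coded 2014-01-01. A possible sketch of such an all_months definition follows. It assumes date_expected is a timestamp column, and it inlines a few rows from the question (product_id and date_expected only) so that it runs on its own; in the real query the inlined CTE would be dropped and the SELECT would read from the actual table.

```sql
WITH stock_move(product_id, date_expected) AS (
    VALUES
        (327, timestamp '2014-02-06 00:00:00'),
        (327, timestamp '2014-05-28 00:00:00'),
        (322, timestamp '2015-11-26 00:00:00'),
        (322, timestamp '2015-12-16 00:00:00')
)
-- one row per month, from the month of the earliest move
-- to the month of the latest move
SELECT generate_series(
           date_trunc('month', min(date_expected)),
           date_trunc('month', max(date_expected)),
           interval '1 month'
       )::date AS month_begin
FROM stock_move;
```

With these rows the series runs from 2014-02-01 to 2015-12-01. Ending at the latest move is a design choice made here for illustration; to report up to the current month, the upper bound could be date_trunc('month', now()) instead.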