Valid periods - SQL VIEW - tsql

I have 2 tables (actually there are 4, but for now let's say it's 2) with data like this:
Table PersonA
+----------+----+----------+-----------+
| ClientID | ID | From     | Till      |
+----------+----+----------+-----------+
| 1        | 10 | 1.1.2017 | 30.4.2017 |
| 1        | 12 | 1.8.2017 | 2.1.2018  |
+----------+----+----------+-----------+
Table PersonB
+----------+----+----------+-----------+
| ClientID | ID | From     | Till      |
+----------+----+----------+-----------+
| 1        | 6  | 1.3.2017 | 30.6.2017 |
+----------+----+----------+-----------+
And I need to generate a view that would show something like this:
+----------+----------+-----------+---------+---------+
| ClientID | From     | Till      | PersonA | PersonB |
+----------+----------+-----------+---------+---------+
| 1        | 1.1.2017 | 28.2.2017 | 10      | NULL    |
| 1        | 1.3.2017 | 30.4.2017 | 10      | 6       |
| 1        | 1.5.2017 | 30.6.2017 | NULL    | 6       |
| 1        | 1.8.2017 | 2.1.2018  | 12      | NULL    |
+----------+----------+-----------+---------+---------+
So basically I need to create a view that shows which "persons" each client had in a given period.
When there is an overlap, the client has both PersonA and PersonB (the same should apply for PersonC and PersonD).
In the final view one client can't have any overlapping date ranges.
I don't know how to approach this.

With an adaptation of this algorithm, we can handle the overlaps:
declare @PersonA table(ClientID int, ID int, [From] date, Till date);
insert into @PersonA values (1,10,'20170101','20170430'),(1,12,'20170801','20180112');
declare @PersonB table(ClientID int, ID int, [From] date, Till date);
insert into @PersonB values (1,6,'20170301','20170630');
declare @PersonC table(ClientID int, ID int, [From] date, Till date);
insert into @PersonC values (1,12,'20170401','20170625');
declare @PersonD table(ClientID int, ID int, [From] date, Till date);
insert into @PersonD values (1,14,'20170501','20170525'),(1,14,'20170510','20171122');

with X(ClientID, EdgeDate) as
(
    -- every From/Till of every Person table becomes a potential period edge
    select ClientID
          ,case when toggle = 1 then Till else [From] end as EdgeDate
    from
    (
        select ClientID,[From],Till from @PersonA
        union all
        select ClientID,[From],Till from @PersonB
        union all
        select ClientID,[From],Till from @PersonC
        union all
        select ClientID,[From],Till from @PersonD
    ) as concated
    cross join
    (
        select -1 as toggle
        union all
        select 1 as toggle
    ) as toggler
),
merged as
(
    -- smallest intervals between consecutive edge dates per client
    select distinct
           S.ClientID
          ,S.EdgeDate as [From]
          ,min(E.EdgeDate) as Till
    from X as S
    inner join X as E
        on S.ClientID = E.ClientID
       and S.EdgeDate < E.EdgeDate
    group by S.ClientID
            ,S.EdgeDate
),
prds as
(
    -- attach the Person active in each interval; drop intervals where nobody is active
    select distinct
           merged.ClientID
          ,merged.[From]
          ,merged.Till
          ,A.ID as PersonA
          ,B.ID as PersonB
          ,C.ID as PersonC
          ,D.ID as PersonD
    from merged
    left join @PersonA as A
        on merged.ClientID = A.ClientID
       and A.[From] <= merged.[From]
       and merged.Till <= A.Till
    left join @PersonB as B
        on merged.ClientID = B.ClientID
       and B.[From] <= merged.[From]
       and merged.Till <= B.Till
    left join @PersonC as C
        on merged.ClientID = C.ClientID
       and C.[From] <= merged.[From]
       and merged.Till <= C.Till
    left join @PersonD as D
        on merged.ClientID = D.ClientID
       and D.[From] <= merged.[From]
       and merged.Till <= D.Till
    where not (A.ID is null
           and B.ID is null
           and C.ID is null
           and D.ID is null)
)
select ClientID
      ,[From]
      ,case
           when Till = lead([From]) over(order by Till)
           then dateadd(d,-1,Till)
           else Till
       end as Till
      ,PersonA
      ,PersonB
      ,PersonC
      ,PersonD
from prds
order by ClientID
        ,[From]
        ,Till;
Output with just the two Person tables given in the question:
+----------+------------+------------+---------+---------+
| ClientID | From | Till | PersonA | PersonB |
+----------+------------+------------+---------+---------+
| 1 | 2017-01-01 | 2017-02-28 | 10 | NULL |
| 1 | 2017-03-01 | 2017-04-29 | 10 | 6 |
| 1 | 2017-04-30 | 2017-06-30 | NULL | 6 |
| 1 | 2017-08-01 | 2018-01-12 | 12 | NULL |
+----------+------------+------------+---------+---------+
Output of the script as written above, with all four Person tables:
+----------+------------+------------+---------+---------+---------+---------+
| ClientID | From | Till | PersonA | PersonB | PersonC | PersonD |
+----------+------------+------------+---------+---------+---------+---------+
| 1 | 2017-01-01 | 2017-02-28 | 10 | NULL | NULL | NULL |
| 1 | 2017-03-01 | 2017-03-31 | 10 | 6 | NULL | NULL |
| 1 | 2017-04-01 | 2017-04-29 | 10 | 6 | 12 | NULL |
| 1 | 2017-04-30 | 2017-04-30 | NULL | 6 | 12 | NULL |
| 1 | 2017-05-01 | 2017-05-09 | NULL | 6 | 12 | 14 |
| 1 | 2017-05-10 | 2017-05-24 | NULL | 6 | 12 | 14 |
| 1 | 2017-05-25 | 2017-06-24 | NULL | 6 | 12 | 14 |
| 1 | 2017-06-25 | 2017-06-29 | NULL | 6 | NULL | 14 |
| 1 | 2017-06-30 | 2017-07-31 | NULL | NULL | NULL | 14 |
| 1 | 2017-08-01 | 2017-11-21 | 12 | NULL | NULL | 14 |
| 1 | 2017-11-22 | 2018-01-12 | 12 | NULL | NULL | NULL |
+----------+------------+------------+---------+---------+---------+---------+
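Since the question asks for a view: the table variables above are only there to make the script self-contained, and a view cannot reference them (or contain ORDER BY). A minimal sketch of the same logic as a view, assuming permanent tables dbo.PersonA and dbo.PersonB (hypothetical names) with the columns shown in the question, and with the lead() additionally partitioned by ClientID so that several clients don't bleed into each other:
-- Sketch only: table names dbo.PersonA / dbo.PersonB and the view name are assumptions.
create view dbo.ClientPersonPeriods
as
with X(ClientID, EdgeDate) as
(
    select ClientID
          ,case when toggle = 1 then Till else [From] end as EdgeDate
    from
    (
        select ClientID,[From],Till from dbo.PersonA
        union all
        select ClientID,[From],Till from dbo.PersonB
    ) as concated
    cross join (select -1 as toggle union all select 1 as toggle) as toggler
),
merged as
(
    select distinct S.ClientID, S.EdgeDate as [From], min(E.EdgeDate) as Till
    from X as S
    inner join X as E
        on S.ClientID = E.ClientID
       and S.EdgeDate < E.EdgeDate
    group by S.ClientID, S.EdgeDate
),
prds as
(
    select distinct merged.ClientID, merged.[From], merged.Till,
           A.ID as PersonA, B.ID as PersonB
    from merged
    left join dbo.PersonA as A
        on merged.ClientID = A.ClientID and A.[From] <= merged.[From] and merged.Till <= A.Till
    left join dbo.PersonB as B
        on merged.ClientID = B.ClientID and B.[From] <= merged.[From] and merged.Till <= B.Till
    where not (A.ID is null and B.ID is null)
)
select ClientID, [From],
       case when Till = lead([From]) over(partition by ClientID order by Till)
            then dateadd(d,-1,Till)
            else Till end as Till,
       PersonA, PersonB
from prds;
go
select * from dbo.ClientPersonPeriods order by ClientID, [From], Till;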

Related

PostgreSQL - Check if column value exists in any previous row

I'm working on a problem where I need to check if an ID exists in any previous records within another ID set, and create a tag if it does.
Suppose I have the following table
| client_id | order_date | supplier_id |
|-----------|------------|-------------|
| 1         | 2022-01-01 | 1           |
| 1         | 2022-02-01 | 2           |
| 1         | 2022-03-01 | 1           |
| 1         | 2022-04-01 | 3           |
| 2         | 2022-05-01 | 1           |
| 2         | 2022-06-01 | 1           |
| 2         | 2022-07-01 | 2           |
And I want to create a column with a "is new supplier" tag (for each client):
| client_id | order_date | supplier_id | is_new_supplier |
|-----------|------------|-------------|-----------------|
| 1         | 2022-01-01 | 1           | True            |
| 1         | 2022-02-01 | 2           | True            |
| 1         | 2022-03-01 | 1           | False           |
| 1         | 2022-04-01 | 3           | True            |
| 2         | 2022-05-01 | 1           | True            |
| 2         | 2022-06-01 | 1           | False           |
| 2         | 2022-07-01 | 2           | True            |
First I tried doing this by creating a dense_rank and filtering out repeated ranks, but it didn't work:
with aux as (
    SELECT client_id,
           order_date,
           supplier_id
    FROM table
)
SELECT *,
       dense_rank() over (
           partition by client_id
           order by supplier_id
       ) as _dense_rank
FROM aux
Another way I thought about doing this is by creating an auxiliary id from client_id + supplier_id, ordering by date and checking whether the aux id exists in any previous row, but I don't know how to do this in SQL.
You are on the right track.
Instead of dense_rank, you can just use row_number and add supplier_id to your partition by.
Don't forget to order by order_date.
with aux as (
    SELECT client_id,
           order_date,
           supplier_id,
           row_number() over (
               partition by client_id, supplier_id
               order by order_date
           ) as rank
    FROM table
)
SELECT client_id,
       order_date,
       supplier_id,
       rank,
       (rank = 1) as is_new_supplier
FROM aux
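The auxiliary-id idea from the question can also be written directly with a correlated NOT EXISTS instead of a window function. A minimal sketch, assuming the table is actually called orders (a made-up name; the question only calls it table):
-- "Is there any earlier row for the same (client_id, supplier_id) pair?"
SELECT client_id,
       order_date,
       supplier_id,
       NOT EXISTS (
           SELECT 1
           FROM orders prev
           WHERE prev.client_id   = o.client_id
             AND prev.supplier_id = o.supplier_id
             AND prev.order_date  < o.order_date
       ) AS is_new_supplier
FROM orders o
ORDER BY client_id, order_date;
This rescans earlier rows per pair, so the row_number answer above is usually the better fit for large tables.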

Leaderboards in PostgreSQL and get 2 next and previous rows

We use PostgreSQL 14.1.
I have sample data that contains over 50 million records.
Base table:
+------+----------+--------+--------+--------+
| id | item_id | battles| wins | damage |
+------+----------+--------+--------+--------+
| 1 | 255 | 35 | 52.08 | 1245.2 |
| 2 | 255 | 35 | 52.08 | 1245.2 |
| 3 | 255 | 35 | 52.08 | 1245.3 |
| 4 | 255 | 35 | 52.08 | 1245.3 |
| 5 | 255 | 35 | 52.09 | 1245.4 |
| 6 | 255 | 35 | 52.08 | 1245.3 |
| 7 | 255 | 35 | 52.08 | 1245.3 |
| 8 | 255 | 35 | 52.08 | 1245.7 |
| 1 | 460 | 18 | 47.35 | 1010.1 |
| 2 | 460 | 27 | 49.18 | 1518.9 |
| 3 | 460 | 16 | 50.78 | 1171.2 |
+------+----------+--------+--------+--------+
We need to get the target row number and 2 next and 2 previous rows as quickly as possible.
Indexed columns:
id
item_id
Sorting:
damage (DESC)
wins (DESC)
battles (ASC)
id (ASC)
In the example, we need to find the row number and the 2 rows on either side where id = 4 and item_id = 255. The result table should be:
+------+----------+--------+--------+--------+------+
| id | item_id | battles| wins | damage | rank |
+------+----------+--------+--------+--------+------+
| 5 | 255 | 35 | 52.09 | 1245.4 | 2 |
| 3 | 255 | 35 | 52.08 | 1245.3 | 3 |
| 4 | 255 | 35 | 52.08 | 1245.3 | 4 |
| 6 | 255 | 35 | 52.08 | 1245.3 | 5 |
| 7 | 255 | 35 | 52.08 | 1245.3 | 6 |
+------+----------+--------+--------+--------+------+
How can I do this with the row_number window function?
Is there any way to optimize the query to make it faster, given that the other columns have no indexes?
CREATE OR REPLACE FUNCTION find_top(in_id integer, in_item_id integer) RETURNS TABLE (
    r_id int,
    r_item_id int,
    r_battles int,
    r_wins real,
    r_damage real,
    r_rank bigint
) AS $$
DECLARE
    center_place bigint;
BEGIN
    -- find the rank of the target row
    SELECT place INTO center_place FROM
        (SELECT
            id, item_id,
            ROW_NUMBER() OVER (ORDER BY damage DESC, wins DESC, battles, id) AS place
        FROM
            public.my_table
        WHERE
            item_id = in_item_id
            AND battles >= 20
        ) AS s
    WHERE s.id = in_id;

    -- return the target row plus the 2 rows before and after it
    RETURN QUERY SELECT
        pt.id, pt.item_id, pt.battles, pt.wins, pt.damage, s.place
    FROM
    (
        SELECT * FROM
            (SELECT
                ROW_NUMBER() OVER (ORDER BY damage DESC, wins DESC, battles, id) AS place,
                id, item_id
            FROM
                public.my_table
            WHERE
                item_id = in_item_id
                AND battles >= 20) x
        WHERE x.place BETWEEN (center_place - 2) AND (center_place + 2)
    ) s
    JOIN
        public.my_table pt
        ON pt.id = s.id AND pt.item_id = s.item_id;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION find_top(in_id integer, in_item_id integer) RETURNS TABLE (
    r_id int,
    r_item_id int,
    r_battles int,
    r_wins real,
    r_damage real,
    r_rank bigint
) AS $$
BEGIN
RETURN QUERY
SELECT c.*, B.ord -3 AS row_number
FROM
( SELECT array_agg(id) OVER w AS id
, array_agg(item_id) OVER w AS item_id
FROM public.my_table
WINDOW w AS (ORDER BY damage DESC, wins DESC, battles, id ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
) AS a
CROSS JOIN LATERAL unnest(a.id, a.item_id) WITH ORDINALITY AS b(id, item_id, ord)
INNER JOIN public.my_table AS c
ON c.id = b.id
AND c.item_id = b.item_id
WHERE a.item_id[3] = in_item_id
AND a.id[3] = in_id
ORDER BY b.ord ;
END ; $$ LANGUAGE plpgsql;
test result in dbfiddle
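On the indexing part of the question: both variants filter on item_id and sort by damage DESC, wins DESC, battles, id, so a composite index matching that shape is the usual first thing to try. A sketch, with a made-up index name:
-- Hypothetical index matching the WHERE clause and the window ORDER BY above.
CREATE INDEX idx_my_table_leaderboard
    ON public.my_table (item_id, damage DESC, wins DESC, battles, id);
If the battles >= 20 filter is always applied, the same idea can be narrowed with a partial index (adding WHERE battles >= 20) to keep it smaller.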

How to fill Null with the previous value in PostgreSQL?

I have a table which contains Null values. I need to replace them with the previous non-Null value.
This is an example of data which I have:
date       | category | start_period | period_number |
-------------------------------------------------------
2018-01-01 | A        | 1            | 1             |
2018-01-02 | A        | 0            | Null          |
2018-01-03 | A        | 0            | Null          |
2018-01-04 | A        | 0            | Null          |
2018-01-05 | B        | 1            | 2             |
2018-01-06 | B        | 0            | Null          |
2018-01-07 | B        | 0            | Null          |
2018-01-08 | A        | 1            | 3             |
2018-01-09 | A        | 0            | Null          |
2018-01-10 | A        | 0            | Null          |
The result should look like this:
date       | category | start_period | period_number |
-------------------------------------------------------
2018-01-01 | A        | 1            | 1             |
2018-01-02 | A        | 0            | 1             |
2018-01-03 | A        | 0            | 1             |
2018-01-04 | A        | 0            | 1             |
2018-01-05 | B        | 1            | 2             |
2018-01-06 | B        | 0            | 2             |
2018-01-07 | B        | 0            | 2             |
2018-01-08 | A        | 1            | 3             |
2018-01-09 | A        | 0            | 3             |
2018-01-10 | A        | 0            | 3             |
I tried the following query, but in this case, only the first Null value will be replaced.
select
    date,
    category,
    start_period,
    case
        when period_number isnull then lag(period_number) over()
        else period_number
    end as period_number
from period_table;
Also, I tried to use first_value() window function, but I don't know how to set up the correct window.
Any help is highly appreciated.
You can join the table with itself and get the desired value, assuming your date column is the primary key or unique.
update your_table upd set period_number = tbl.period_number
from
(
    select b.date, max(b2.date) as d2 from your_table b
    inner join your_table b2 on b2.date < b.date and b2.period_number is not null
    group by b.date
) t
inner join your_table tbl on tbl.date = t.d2
where t.date = upd.date
If you don't need to update the table but only need a select statement, then:
select yt.date, yt.category, yt.start_period, tbl.period_number
from your_table yt
inner join
(
    select b.date, max(b2.date) as d2 from your_table b
    inner join your_table b2 on b2.date < b.date and b2.period_number is not null
    group by b.date
) t on yt.date = t.date
inner join your_table tbl on tbl.date = t.d2
If you replace your case statement with:
(
    select
        _.period_number
    from
        period_table as _
    where
        _.period_number is not null
        and _.category = period_table.category
        and _.date <= period_table.date
    order by
        _.date desc
    limit 1
) as period_number
Then it should have the intended effect. It's nowhere near as elegant as a window function but I don't think window functions are quite flexible enough for your specific use case here (Or at least, if they are, I don't know how to flex them that much)
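For reference, that subquery dropped into the original select from the question reads:
select
    date,
    category,
    start_period,
    (
        select _.period_number
        from period_table as _
        where _.period_number is not null
          and _.category = period_table.category
          and _.date <= period_table.date
        order by _.date desc
        limit 1
    ) as period_number
from period_table;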
Examples of window functions and frame clauses:
select
    date, category, score,
    FIRST_VALUE(score) OVER (
        PARTITION BY category
        ORDER BY date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as first_score
from testing.rec_test
order by date, category

select
    date, category, score,
    LAST_VALUE(score) OVER (
        PARTITION BY category
        ORDER BY date
        RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) as last_score
from testing.rec_test
order by date, category
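For completeness, there is a window-only pattern that does handle this fill-down case: build a group key with a running count of the non-null values, then take the max within each group. A sketch against the table from the question (count(period_number) ignores nulls, so the key only increases on the rows that start a period):
select
    date,
    category,
    start_period,
    -- each group contains one non-null value followed by its nulls
    max(period_number) over (partition by category, grp) as period_number
from (
    select
        *,
        count(period_number) over (partition by category order by date) as grp
    from period_table
) t
order by date;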

Create trip number on tracking data

I am currently working on a Postgres database with data for car tracking which looks similar to this:
+----+--------+------------+----------+
| id | car_id | date | time |
+----+--------+------------+----------+
| 11 | 1 | 2014-12-20 | 12:12:12 |
| 12 | 1 | 2014-12-20 | 12:12:13 |
| 13 | 1 | 2014-12-20 | 12:12:14 |
| 23 | 1 | 2015-12-20 | 23:42:10 |
| 24 | 1 | 2015-12-20 | 23:42:11 |
| 31 | 2 | 2014-12-20 | 15:12:12 |
| 32 | 2 | 2014-12-20 | 15:12:14 |
+----+--------+------------+----------+
Here is the setup:
CREATE TABLE test (
    id int
    , car_id int
    , date text
    , time text
);
INSERT INTO test VALUES
(11, 1, '2014-12-20', '12:12:12'),
(12, 1, '2014-12-20', '12:12:13'),
(13, 1, '2014-12-20', '12:12:14'),
(23, 1, '2015-12-20', '23:42:10'),
(24, 1, '2015-12-20', '23:42:11'),
(31, 2, '2014-12-20', '15:12:12'),
(32, 2, '2014-12-20', '15:12:14');
I want to create a column where the traces are assigned a trip number, sorted by id:
+----+--------+------------+----------+------+
| id | car_id | date       | time     | trip |
+----+--------+------------+----------+------+
| 11 | 1      | 2014-12-20 | 12:12:12 | 1    |
| 12 | 1      | 2014-12-20 | 12:12:13 | 1    |
| 13 | 1      | 2014-12-20 | 12:12:14 | 1    |
| 23 | 1      | 2015-12-20 | 23:42:10 | 2    | (trip +1 because the time difference is bigger than 5 sec)
| 24 | 1      | 2015-12-20 | 23:42:11 | 2    |
| 31 | 2      | 2014-12-20 | 15:12:12 | 3    | (trip +1 because car_id is different)
| 32 | 2      | 2014-12-20 | 15:12:14 | 3    |
+----+--------+------------+----------+------+
I have set up the following rules:
the first row (lowest id) gets the value trip = 1
for the following rows: if car_id is equal to the row above and the time difference between the row and the row above is smaller than 5 seconds, then trip is the same as the row above, else trip is the row above + 1
I have tried with the following
Create table test as select
"id", "date", "time", car_id,
extract(epoch from "date" + "time") - lag(extract(epoch from "date" + "time")) over (order by "id") as diff,
Case
when t_diff < 5 and car_id - lag(car_id) over (order by "id") = 0
then lag(trip) over (order by "id")
else lag(trip) over (order by "id") + 1
end as trip
From road_1 order by "id"
but it does not work :( How can I compute the trip column?
First, use (date || ' ' || time)::timestamp AS datetime to form a timestamp out of date and time
SELECT id, test.car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test
which yields
| id | car_id | datetime |
|----+--------+---------------------|
| 11 | 1 | 2014-12-20 12:12:12 |
| 12 | 1 | 2014-12-20 12:12:13 |
| 13 | 1 | 2014-12-20 12:12:14 |
| 23 | 1 | 2015-12-20 23:42:10 |
| 24 | 1 | 2015-12-20 23:42:11 |
| 31 | 2 | 2014-12-20 15:12:12 |
| 32 | 2 | 2014-12-20 15:12:14 |
It is helpful to do this since we'll be using datetime - prev > '5 seconds'::interval
to identify rows which are more than 5 seconds apart. Notice that
2014-12-20 23:59:59 and 2014-12-21 00:00:00 are only 1 second apart,
but it would be difficult/tedious to determine this if all we had were separate date and time columns.
Now we can express the rule that the trip is increased by 1 when
NOT ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval))
(More on why the condition is expressed in this seemingly backwards way, below).
SELECT id, car_id, prev_car_id, datetime, prev_date
, (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
FROM (
SELECT id, car_id, datetime
, lag(datetime) OVER () AS prev_date
, lag(car_id) OVER () AS prev_car_id
FROM (
SELECT id, car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test ) t1
) t2
yields
| id | car_id | prev_car_id | datetime | prev_date | new_trip |
|----+--------+-------------+---------------------+---------------------+----------|
| 11 | 1 | | 2014-12-20 12:12:12 | | 1 |
| 12 | 1 | 1 | 2014-12-20 12:12:13 | 2014-12-20 12:12:12 | 0 |
| 13 | 1 | 1 | 2014-12-20 12:12:14 | 2014-12-20 12:12:13 | 0 |
| 23 | 1 | 1 | 2015-12-20 23:42:10 | 2014-12-20 12:12:14 | 1 |
| 24 | 1 | 1 | 2015-12-20 23:42:11 | 2015-12-20 23:42:10 | 0 |
| 31 | 2 | 1 | 2014-12-20 15:12:12 | 2015-12-20 23:42:11 | 1 |
| 32 | 2 | 2 | 2014-12-20 15:12:14 | 2014-12-20 15:12:12 | 0 |
Now trip can be expressed as the cumulative sum over the new_trip column:
SELECT id, car_id, datetime, sum(new_trip) OVER (ORDER BY datetime) AS trip
FROM (
SELECT id, car_id, prev_car_id, datetime, prev_date
, (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
FROM (
SELECT id, car_id, datetime
, lag(datetime) OVER () AS prev_date
, lag(car_id) OVER () AS prev_car_id
FROM (
SELECT id, car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test ) t1
) t2
) t3
yields
| id | car_id | datetime | trip |
|----+--------+---------------------+------|
| 11 | 1 | 2014-12-20 12:12:12 | 1 |
| 12 | 1 | 2014-12-20 12:12:13 | 1 |
| 13 | 1 | 2014-12-20 12:12:14 | 1 |
| 31 | 2 | 2014-12-20 15:12:12 | 2 |
| 32 | 2 | 2014-12-20 15:12:14 | 2 |
| 23 | 1 | 2015-12-20 23:42:10 | 3 |
| 24 | 1 | 2015-12-20 23:42:11 | 3 |
I used
(CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END)
instead of
(CASE WHEN ((car_id != prev_car_id) OR (datetime-prev_date > '5 seconds'::interval)) THEN 1 ELSE 0 END)
because prev_car_id and prev_date may be NULL. Thus, on the first row, (car_id != prev_car_id) returns NULL when instead we want TRUE.
By expressing the condition in the opposite way, we can identify the uninteresting rows correctly:
((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval))
and use the ELSE clause to return 1 when the condition is FALSE or NULL. You can see the difference here:
SELECT id
, (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
, (CASE WHEN ((car_id != prev_car_id) OR (datetime-prev_date > '5 seconds'::interval)) THEN 1 ELSE 0 END) AS new_trip_wrong
, car_id, prev_car_id, datetime, prev_date
FROM (
SELECT id, car_id, datetime
, lag(datetime) OVER () AS prev_date
, lag(car_id) OVER () AS prev_car_id
FROM (
SELECT id, car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test ) t1
) t2
yields
| id | new_trip | new_trip_wrong | car_id | prev_car_id | datetime | prev_date |
|----+----------+----------------+--------+-------------+---------------------+---------------------|
| 11 | 1 | 0 | 1 | | 2014-12-20 12:12:12 | |
| 12 | 0 | 0 | 1 | 1 | 2014-12-20 12:12:13 | 2014-12-20 12:12:12 |
| 13 | 0 | 0 | 1 | 1 | 2014-12-20 12:12:14 | 2014-12-20 12:12:13 |
| 23 | 1 | 1 | 1 | 1 | 2015-12-20 23:42:10 | 2014-12-20 12:12:14 |
| 24 | 0 | 0 | 1 | 1 | 2015-12-20 23:42:11 | 2015-12-20 23:42:10 |
| 31 | 1 | 1 | 2 | 1 | 2014-12-20 15:12:12 | 2015-12-20 23:42:11 |
| 32 | 0 | 0 | 2 | 2 | 2014-12-20 15:12:14 | 2014-12-20 15:12:12 |
Note the difference in the new_trip versus new_trip_wrong columns.
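One caveat, not part of the original answer: lag(...) OVER () with an empty window depends on whatever order the rows happen to arrive in, which PostgreSQL does not guarantee. A variant of the final query with an explicit ORDER BY id in every window (matching the asker's "sorted by id" requirement) would be:
-- Same approach as above, but every window orders by id explicitly so the
-- result does not depend on incidental row order.
SELECT id, car_id, datetime,
       sum(new_trip) OVER (ORDER BY id) AS trip
FROM (
    SELECT id, car_id, datetime,
           CASE WHEN car_id = lag(car_id) OVER (ORDER BY id)
                 AND datetime - lag(datetime) OVER (ORDER BY id) <= '5 seconds'::interval
                THEN 0 ELSE 1 END AS new_trip
    FROM (
        SELECT id, car_id,
               (date || ' ' || time)::timestamp AS datetime
        FROM test
    ) t1
) t2
ORDER BY id;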

Grouping in t-sql with latest dates

I have a table like this
Event ID | Contract ID | Event date | Amount |
----------------------------------------------
1 | 1 | 2009-01-01 | 100 |
2 | 1 | 2009-01-02 | 20 |
3 | 1 | 2009-01-03 | 50 |
4 | 2 | 2009-01-01 | 80 |
5 | 2 | 2009-01-04 | 30 |
For each contract I need to fetch the latest event and amount associated with the event and get something like this
Event ID | Contract ID | Event date | Amount |
----------------------------------------------
3 | 1 | 2009-01-03 | 50 |
5 | 2 | 2009-01-04 | 30 |
I can't figure out how to group the data correctly. Any ideas?
Thanks in advance.
SQL 2k5/2k8:
with cte_ranked as (
    select *
         , row_number() over (
               partition by ContractId order by EventDate desc) as [rank]
    from [table])
select *
from cte_ranked
where [rank] = 1;
SQL 2k:
select t.*
from [table] as t
join (
    select max(EventDate) as MaxDate
         , ContractId
    from [table]
    group by ContractId) as mt
    on t.ContractId = mt.ContractId
   and t.EventDate = mt.MaxDate
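One caveat with both versions, not raised above: if two events of the same contract share the latest date, the row_number variant keeps an arbitrary one while the join variant returns all of them. If ties should also be returned by the CTE version, swapping row_number() for rank() is one option:
-- rank() assigns the same rank to ties, so every event on the latest date is kept
with cte_ranked as (
    select *
         , rank() over (
               partition by ContractId order by EventDate desc) as [rank]
    from [table])
select *
from cte_ranked
where [rank] = 1;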