How can I retrieve the next row's value in a set? - tsql

I want to iterate over a table and, for each row, get a value from the next row ordered by StartingDate, handling the rows ID by ID. For example:
Main table:
ID | StartingDate | PlannedFinishDate
1 2017-01-13 17:48:05.150 2017-01-15 12:00:00.00
1 2017-01-14 14:15:09.000 2017-01-16 12:00:00.00
1 2017-01-16 09:40:30.000 2017-01-18 12:00:00.00
2 2017-02-06 12:00:00.000 2017-02-10 12:00:00.00
2 2017-03-01 13:45:00.000 2017-03-05 12:00:00.00
3 2017-02-09 11:31:16.830 2017-02-11 12:00:00.00
Table after taking PlannedFinishDate from the next row of the same ID:
ID | StartingDate | PlannedFinishDate
1 2017-01-13 17:48:05.150 2017-01-16 12:00:00.00
1 2017-01-14 14:15:09.000 2017-01-18 12:00:00.00
1 2017-01-16 09:40:30.000 NULL
2 2017-02-06 12:00:00.000 2017-03-05 12:00:00.00
2 2017-03-01 13:45:00.000 NULL
3 2017-02-09 11:31:16.830 NULL

It seems that you aren't actually asking about iteration. You want to retrieve the next planned finish date for each starting date.
You can do that easily with the LEAD function, which returns a value from a following row of the result set. It requires an ORDER BY clause so that it knows which row comes next. It also accepts a PARTITION BY clause, which splits the result set into independent groups.
Since you want the next planned finish date per ID, you need to PARTITION BY ID:
SELECT
    ID,
    StartingDate,
    LEAD(PlannedFinishDate) OVER (PARTITION BY ID ORDER BY StartingDate ASC)
        AS NextFinishDate
FROM SomeTable;
If there is no next row in the partition, LEAD returns NULL, unless you pass a default value as its optional third argument.
This snippet:
DECLARE @table TABLE (ID int, StartingDate datetime, PlannedFinishDate datetime);
INSERT INTO @table (ID, StartingDate, PlannedFinishDate)
VALUES
( 1,'2017-01-13 17:48:05.150','2017-01-15 12:00:00.008'),
( 1,'2017-01-14 14:15:09.000','2017-01-16 12:00:00.008'),
( 1,'2017-01-16 09:40:30.000','2017-01-18 12:00:00.008'),
( 2,'2017-02-06 12:00:00.000','2017-02-10 12:00:00.008'),
( 2,'2017-03-01 13:45:00.000','2017-03-05 12:00:00.008'),
( 3,'2017-02-09 11:31:16.830','2017-02-11 12:00:00.008');
SELECT
    ID,
    StartingDate,
    LEAD(PlannedFinishDate) OVER (PARTITION BY ID ORDER BY StartingDate ASC)
        AS NextFinishDate
FROM @table;
Returns (datetime stores fractional seconds in increments of .000, .003 or .007, so .008 is rounded to .007):
ID StartingDate NextFinishDate
1 2017-01-13 17:48:05.150 2017-01-16 12:00:00.007
1 2017-01-14 14:15:09.000 2017-01-18 12:00:00.007
1 2017-01-16 09:40:30.000 NULL
2 2017-02-06 12:00:00.000 2017-03-05 12:00:00.007
2 2017-03-01 13:45:00.000 NULL
3 2017-02-09 11:31:16.830 NULL
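To try the LEAD query without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions) and stores the dates as ISO-8601 text, since SQLite has no datetime type and such strings sort chronologically. The second query also shows LEAD's optional offset and default arguments, which T-SQL's LEAD accepts as well; the 'none' default is only an illustrative value.

```python
import sqlite3

# In-memory database; window functions such as LEAD need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SomeTable (ID INTEGER, StartingDate TEXT, PlannedFinishDate TEXT)"
)
conn.executemany(
    "INSERT INTO SomeTable VALUES (?, ?, ?)",
    [
        (1, "2017-01-13 17:48:05.150", "2017-01-15 12:00:00.000"),
        (1, "2017-01-14 14:15:09.000", "2017-01-16 12:00:00.000"),
        (1, "2017-01-16 09:40:30.000", "2017-01-18 12:00:00.000"),
        (2, "2017-02-06 12:00:00.000", "2017-02-10 12:00:00.000"),
        (2, "2017-03-01 13:45:00.000", "2017-03-05 12:00:00.000"),
        (3, "2017-02-09 11:31:16.830", "2017-02-11 12:00:00.000"),
    ],
)

# LEAD looks one row ahead inside each ID partition, ordered by StartingDate.
# The last row of each partition has no next row, so it gets NULL (None in Python).
rows = conn.execute(
    """
    SELECT ID,
           StartingDate,
           LEAD(PlannedFinishDate) OVER (PARTITION BY ID ORDER BY StartingDate) AS NextFinishDate
    FROM SomeTable
    ORDER BY ID, StartingDate
    """
).fetchall()

# LEAD(expr, offset, default): the third argument replaces that NULL.
with_default = conn.execute(
    """
    SELECT LEAD(PlannedFinishDate, 1, 'none') OVER (PARTITION BY ID ORDER BY StartingDate)
    FROM SomeTable
    ORDER BY ID, StartingDate
    """
).fetchall()

for row in rows:
    print(row)
```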

Related

BigQuery SQL: Group rows with shared ID that occur within 7 days of each other, and return values from most recent occurrence

I have a table of datestamped events that I need to bundle into 7-day groups, starting with the earliest occurrence of each event_id.
The final output should return each bundle's start and end date and the 'value' column of the most recent event in each bundle.
There is no predetermined start date, and the 7-day windows are arbitrary, not 'week of the year'.
I've tried many examples from other posts, but none quite fit my needs, or they use features I'm not sure how to refactor for BigQuery.
Sample data:
Event_Id | Event_Date | Value
1 | 2022-01-01 | 010203
1 | 2022-01-02 | 040506
1 | 2022-01-03 | 070809
1 | 2022-01-20 | 101112
1 | 2022-01-23 | 131415
2 | 2022-01-02 | 161718
2 | 2022-01-08 | 192021
3 | 2022-02-12 | 212223
Expected output:
Event_Id | Start_Date | End_Date | Value
1 | 2022-01-01 | 2022-01-03 | 070809
1 | 2022-01-20 | 2022-01-23 | 131415
2 | 2022-01-02 | 2022-01-08 | 192021
3 | 2022-02-12 | 2022-02-12 | 212223
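You might consider the query below. It computes the day gap to the previous event with LAG, a JavaScript UDF walks the running sum of those gaps and starts a new bin whenever the sum would exceed 6 days, and the final GROUP BY returns each bin's start date, end date and latest value.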
CREATE TEMP FUNCTION cumsumbin(a ARRAY<INT64>) RETURNS INT64
LANGUAGE js AS """
let bin = 0;
a.reduce((c, v) => {
if (c + Number(v) > 6) { bin += 1; return 0; }
else return c + Number(v);
}, 0);
return bin;
""";
WITH sample_data AS (
select 1 event_id, DATE '2022-01-01' event_date, '010203' value union all
select 1 event_id, '2022-01-02' event_date, '040506' value union all
select 1 event_id, '2022-01-03' event_date, '070809' value union all
select 1 event_id, '2022-01-20' event_date, '101112' value union all
select 1 event_id, '2022-01-23' event_date, '131415' value union all
select 2 event_id, '2022-01-02' event_date, '161718' value union all
select 2 event_id, '2022-01-08' event_date, '192021' value union all
select 3 event_id, '2022-02-12' event_date, '212223' value
),
binning AS (
SELECT *, cumsumbin(ARRAY_AGG(diff) OVER w1) bin
FROM (
SELECT *, DATE_DIFF(event_date, LAG(event_date) OVER w0, DAY) AS diff
FROM sample_data
WINDOW w0 AS (PARTITION BY event_id ORDER BY event_date)
) WINDOW w1 AS (PARTITION BY event_id ORDER BY event_date)
)
SELECT event_id,
MIN(event_date) start_date,
ARRAY_AGG(
STRUCT(event_date AS end_date, value) ORDER BY event_date DESC LIMIT 1
)[OFFSET(0)].*
FROM binning GROUP BY event_id, bin;
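The binning rule can be checked outside BigQuery with the plain-Python sketch below. It is not BigQuery code, only a restatement of what the cumsumbin UDF does, assuming events are processed per event_id in date order: keep a running sum of day gaps since the current bundle began, and start a new bundle when adding the next gap would push that sum past 6. The helper names bundle_events and _summarise are hypothetical.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (event_id, event_date, value).
events = [
    (1, date(2022, 1, 1), "010203"),
    (1, date(2022, 1, 2), "040506"),
    (1, date(2022, 1, 3), "070809"),
    (1, date(2022, 1, 20), "101112"),
    (1, date(2022, 1, 23), "131415"),
    (2, date(2022, 1, 2), "161718"),
    (2, date(2022, 1, 8), "192021"),
    (3, date(2022, 2, 12), "212223"),
]


def _summarise(event_id, bundle):
    # Start = earliest date; end date and value come from the most recent event.
    end_date, value = bundle[-1]
    return (event_id, bundle[0][0], end_date, value)


def bundle_events(rows, window_days=6):
    """Hypothetical helper mirroring the UDF's `c + v > 6` reset rule."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for event_id, group in groupby(ordered, key=lambda r: r[0]):
        bundle = []       # (event_date, value) pairs of the current bundle
        running = 0       # summed day gaps since the current bundle started
        prev_date = None
        for _, event_date, value in group:
            gap = 0 if prev_date is None else (event_date - prev_date).days
            if bundle and running + gap > window_days:
                # Close the current bundle; this event opens a new one.
                result.append(_summarise(event_id, bundle))
                bundle, running = [], 0
            else:
                running += gap
            bundle.append((event_date, value))
            prev_date = event_date
        result.append(_summarise(event_id, bundle))
    return result


for row in bundle_events(events):
    print(row)
```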

Select previous different value PostgreSQL

I have a table:
id | date | value
1 | 2022-01-01 | 1
1 | 2022-01-02 | 1
1 | 2022-01-03 | 2
1 | 2022-01-04 | 2
1 | 2022-01-05 | 3
1 | 2022-01-06 | 3
I want to detect changes of the value column by date, so that diff holds the previous, different value:
id | date | value | diff
1 | 2022-01-01 | 1 | null
1 | 2022-01-02 | 1 | null
1 | 2022-01-03 | 2 | 1
1 | 2022-01-04 | 2 | 1
1 | 2022-01-05 | 3 | 2
1 | 2022-01-06 | 3 | 2
I tried the window function lag(), but all I got was:
id | date | value | diff
1 | 2022-01-01 | 1 | null
1 | 2022-01-02 | 1 | 1
1 | 2022-01-03 | 2 | 1
1 | 2022-01-04 | 2 | 2
1 | 2022-01-05 | 3 | 2
1 | 2022-01-06 | 3 | 3
I'm fairly sure you need a gaps-and-islands approach to "group" your changes.
There may be a more concise way to get the result you want, but this is how I would solve it:
with changes as ( -- mark the changes and lag values
  select id, date, value,
         coalesce((value != lag(value) over w)::int, 1) as changed_flag,
         lag(value) over w as last_value
  from a_table
  window w as (partition by id order by date)
), groupnums as ( -- number the groups, carrying the lag values forward
  select id, date, value,
         sum(changed_flag) over w as group_num,
         last_value
  from changes
  window w as (partition by id order by date)
) -- final query that uses group numbering to return the correct lag value
select id, date, value,
       first_value(last_value) over (partition by id, group_num
                                     order by date) as diff
from groupnums;
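The same gaps-and-islands query can be run locally with Python's sqlite3 module as a quick check; this is a sketch assuming SQLite 3.25+ for window functions. SQLite has no ::int cast, but a comparison already yields 0 or 1 there, so COALESCE alone produces the flag. In this sketch the lagged column is renamed prev_value to avoid confusion with the last_value() window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.execute("CREATE TABLE a_table (id INTEGER, date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO a_table VALUES (?, ?, ?)",
    [(1, "2022-01-01", 1), (1, "2022-01-02", 1), (1, "2022-01-03", 2),
     (1, "2022-01-04", 2), (1, "2022-01-05", 3), (1, "2022-01-06", 3)],
)

query = """
WITH changes AS (      -- 1 where the value differs from the previous row (or there is none)
    SELECT id, date, value,
           COALESCE(value != LAG(value) OVER w, 1) AS changed_flag,
           LAG(value) OVER w AS prev_value
    FROM a_table
    WINDOW w AS (PARTITION BY id ORDER BY date)
), groupnums AS (      -- running sum of the flags numbers each island
    SELECT id, date, value,
           SUM(changed_flag) OVER (PARTITION BY id ORDER BY date) AS group_num,
           prev_value
    FROM changes
)
-- the lagged value on an island's first row is the previous, different value
SELECT id, date, value,
       FIRST_VALUE(prev_value) OVER (PARTITION BY id, group_num ORDER BY date) AS diff
FROM groupnums
ORDER BY id, date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```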

Difference between the max date and the penultimate max for specific employee - postgresql

I'm a bit stuck on a problem: I'm trying to find the difference between two dates in PostgreSQL.
I have a table emp with many employees in it:
emp_id, date
1, 31-10-2017
1, 08-08-2017
1, 02-06-2017
I want it to look like this:
emp_id, max_date, penultimate_date, difference
1, 31-10-2017, 08-08-2017, 84 days
Obviously I can use max(date) grouped by emp_id, but how do I retrieve the penultimate date? I have tried things like:
order by date desc limit 1 offset 1
I have also tried putting these in subqueries, but that hasn't worked because there are many employees and I need one row per employee.
Can anyone help?
Thanks,
pp84
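As kindly pointed out by @Haleemur Ali, order by date desc limit 1 offset 1 does not work with several emp_id values. Window functions return one row per employee instead:
with d(emp_id, date) as (values (1, '2017-10-31'::date), (1, '2017-08-08'), (1, '2017-06-02'), (2, '2016-01-01'), (2, '2016-02-02'), (2, '2016-03-03'))
select distinct emp_id
, max(date) over w max_date
, nth_value(date, 2) over w penultimate_date
, max(date) over w - nth_value(date, 2) over w diff
from d
window w as (partition by emp_id order by date desc
             rows between unbounded preceding and unbounded following)
order by emp_id
;
emp_id | max_date | penultimate_date | diff
--------+------------+------------------+------
1 | 2017-10-31 | 2017-08-08 | 84
2 | 2016-03-03 | 2016-02-02 | 30
(2 rows)
The window is ordered by date descending so that the second value is the penultimate date, and its frame covers the whole partition: with the default frame, which ends at the current row, nth_value(date, 2) would return NULL on the newest row.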
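Here is a hedged sketch of the nth_value approach in Python's sqlite3 module (assuming SQLite 3.25+), using the question's dates in ISO form plus the second employee from the answer. Because SQLite has no date subtraction operator, julianday() is used to get the difference in days. The second query, pitfall, shows what happens with the default frame: on the newest row the frame contains only that row, so NTH_VALUE(date, 2) is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.execute("CREATE TABLE emp (emp_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "2017-10-31"), (1, "2017-08-08"), (1, "2017-06-02"),
     (2, "2016-01-01"), (2, "2016-02-02"), (2, "2016-03-03")],
)

# Ordering the window by date DESC makes the 2nd value the penultimate date;
# the full frame lets NTH_VALUE see the whole partition from every row.
query = """
SELECT DISTINCT emp_id,
       MAX(date) OVER w AS max_date,
       NTH_VALUE(date, 2) OVER w AS penultimate_date,
       CAST(julianday(MAX(date) OVER w) - julianday(NTH_VALUE(date, 2) OVER w) AS INTEGER) AS diff
FROM emp
WINDOW w AS (PARTITION BY emp_id ORDER BY date DESC
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY emp_id
"""
rows = conn.execute(query).fetchall()

# Default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): the newest
# row's frame holds only itself, so there is no 2nd value yet.
pitfall = conn.execute(
    "SELECT NTH_VALUE(date, 2) OVER (PARTITION BY emp_id ORDER BY date DESC) "
    "FROM emp ORDER BY emp_id, date DESC"
).fetchall()

print(rows)
print(pitfall)
```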
WITH emps (emp_id, date) AS (
    VALUES (1, '2017-10-31'::DATE)
         , (1, '2017-08-08'::DATE)
         , (1, '2017-06-02'::DATE)
)
SELECT DISTINCT ON (emp_id)
       emp_id
     , "date" max_date
     , LEAD("date") OVER w penultimate_date
     , "date" - LEAD("date") OVER w difference
FROM emps
WINDOW w AS (PARTITION BY emp_id ORDER BY "date" DESC)
ORDER BY emp_id, "date" DESC;
With the rows ordered by date in descending order, LEAD("date") OVER w returns the date from the next row, i.e. the next-earlier date.
DISTINCT ON limits the result set to one row (the first row encountered) per emp_id.
With this ordering, that first row contains the greatest date, so LEAD(...) OVER w returns the penultimate date. This gives the following result:
emp_id | max_date | penultimate_date | difference
--------+------------+------------------+------------
1 | 2017-10-31 | 2017-08-08 | 84
(1 row)
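SQLite has no DISTINCT ON, so a sketch of the LEAD approach there needs another way to keep one row per employee. The version below, written with Python's sqlite3 module and assuming SQLite 3.25+, numbers each employee's rows with ROW_NUMBER() over the same date-descending window and keeps rn = 1. Employee 1's dates are the question's (in ISO form); employee 2's are borrowed from the other answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.execute("CREATE TABLE emps (emp_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO emps VALUES (?, ?)",
    [(1, "2017-10-31"), (1, "2017-08-08"), (1, "2017-06-02"),
     (2, "2016-01-01"), (2, "2016-02-02"), (2, "2016-03-03")],
)

# ROW_NUMBER() = 1 marks each employee's newest row (the DISTINCT ON substitute);
# on that row, LEAD over the date-descending window is the penultimate date.
query = """
SELECT emp_id, max_date, penultimate_date,
       CAST(julianday(max_date) - julianday(penultimate_date) AS INTEGER) AS difference
FROM (
    SELECT emp_id,
           date AS max_date,
           LEAD(date) OVER w AS penultimate_date,
           ROW_NUMBER() OVER w AS rn
    FROM emps
    WINDOW w AS (PARTITION BY emp_id ORDER BY date DESC)
)
WHERE rn = 1
ORDER BY emp_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```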

Difference of dates using lag function postgres

I have customer IDs and transaction dates (yyyy-mm-dd), as shown below:
Cust_id Trans_date
1 2017-01-01
1 2017-01-03
1 2017-01-06
2 2017-01-01
2 2017-01-04
2 2017-01-05
I need to find the number of days between consecutive transactions for each Cust_id.
I tried date_diff and extract with the lag function, but I get this error:
function lag(timestamp without time zone) may only be called as a window function
I'm looking for a result like this:
Cust_id Trans_date difference
1 2017-01-01 0
1 2017-01-03 2
1 2017-01-06 3
2 2017-01-01 0
2 2017-01-04 3
2 2017-01-05 1
How can I compute this difference in PostgreSQL?
Is this what you want?
with t(Cust_id,Trans_date) as(
select 1 ,'2017-01-01'::timestamp union all
select 1 ,'2017-01-03'::timestamp union all
select 1 ,'2017-01-06'::timestamp union all
select 2 ,'2017-01-01'::timestamp union all
select 2 ,'2017-01-04'::timestamp union all
select 2 ,'2017-01-05'::timestamp
)
select
Cust_id,
Trans_date,
coalesce(Trans_date::date - lag(Trans_date::date) over(partition by Cust_id order by Trans_date), 0) as difference
from t;
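For comparison, here is a sketch of the same calculation with Python's sqlite3 module, assuming SQLite 3.25+. The error in the question (lag may only be called as a window function) arises when lag() is used without an OVER clause; here it has one. SQLite cannot subtract dates directly, so julianday() converts them to day numbers first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.execute("CREATE TABLE t (Cust_id INTEGER, Trans_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "2017-01-01"), (1, "2017-01-03"), (1, "2017-01-06"),
     (2, "2017-01-01"), (2, "2017-01-04"), (2, "2017-01-05")],
)

# LAG(...) OVER (...) fetches the previous transaction date per customer;
# each customer's first row has no previous row, so COALESCE turns NULL into 0.
query = """
SELECT Cust_id,
       Trans_date,
       COALESCE(CAST(julianday(Trans_date)
                     - julianday(LAG(Trans_date) OVER (PARTITION BY Cust_id ORDER BY Trans_date))
                     AS INTEGER), 0) AS difference
FROM t
ORDER BY Cust_id, Trans_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```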

How to generate a date to be included in UNPIVOT results without a loop?

Say I have an example like this, where I'm transposing columns into rows with UNPIVOT:
DECLARE @pvt AS TABLE (VendorID int, Emp1 int, Emp2 int, Emp3 int, Emp4 int, Emp5 int);
INSERT INTO @pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (1,4,3,5,4,4);
INSERT INTO @pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (2,4,1,5,5,5);
INSERT INTO @pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (3,4,3,5,4,4);
INSERT INTO @pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (4,4,2,5,5,4);
INSERT INTO @pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (5,5,1,5,5,5);
--Unpivot the table.
SELECT VendorID, Employee, Orders
FROM
(SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
FROM @pvt) p
UNPIVOT
(Orders FOR Employee IN
(Emp1, Emp2, Emp3, Emp4, Emp5)
)AS unpvt;
GO
This produces results like the following (only the first three vendors shown):
VendorID Employee Orders
1 Emp1 4
1 Emp2 3
1 Emp3 5
1 Emp4 4
1 Emp5 4
2 Emp1 4
2 Emp2 1
2 Emp3 5
2 Emp4 5
2 Emp5 5
3 Emp1 4
3 Emp2 3
3 Emp3 5
3 Emp4 4
3 Emp5 4
However, I want to include an incremental date that repeats the same sequence for each vendor, so the results would look like this:
VendorID Employee Orders OrderDate
1 Emp1 4 01/01/2014
1 Emp2 3 02/01/2014
1 Emp3 5 03/01/2014
1 Emp4 4 04/01/2014
1 Emp5 4 05/01/2014
2 Emp1 4 ..
2 Emp2 1
2 Emp3 5
2 Emp4 5
2 Emp5 5
3 Emp1 4
3 Emp2 3
3 Emp3 5
3 Emp4 4
3 Emp5 4
The kicker is that I want to do this without resorting to a loop, since the transposed result will be about 100K records. Is there a way to generate that date column without looping over the results?
[Edit]
I think, though I'm not sure yet, that this post might help, using ROW_NUMBER.
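You can use:
DATEADD(DAY, ROW_NUMBER() OVER (PARTITION BY VendorID ORDER BY Employee), @startdate)
Based on your example, you can partition by VendorID and order by Employee, but you can change the ordering just like a regular ORDER BY. Note that ROW_NUMBER() starts at 1, so subtract 1 from it if the first row of each vendor should fall on @startdate itself.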
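Here is a runnable sketch of the ROW_NUMBER idea, using Python's sqlite3 module and assuming SQLite 3.25+. SQLite has no UNPIVOT, so the rows are rebuilt with UNION ALL, and date() with a '+N days' modifier stands in for DATEADD(DAY, ...). START_DATE is a hypothetical value for @startdate, and the dates are assumed to advance one day per row, as the DATEADD(DAY, ...) answer implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.execute(
    "CREATE TABLE pvt (VendorID INTEGER, Emp1 INTEGER, Emp2 INTEGER, "
    "Emp3 INTEGER, Emp4 INTEGER, Emp5 INTEGER)"
)
conn.executemany(
    "INSERT INTO pvt VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 4, 3, 5, 4, 4), (2, 4, 1, 5, 5, 5), (3, 4, 3, 5, 4, 4),
     (4, 4, 2, 5, 5, 4), (5, 5, 1, 5, 5, 5)],
)

START_DATE = "2014-01-01"  # hypothetical stand-in for @startdate

# UNION ALL rebuilds the unpivoted (VendorID, Employee, Orders) rows.
# ROW_NUMBER() restarts at 1 for each vendor; subtracting 1 puts the first
# row of every vendor on START_DATE itself.
query = """
WITH unpvt AS (
    SELECT VendorID, 'Emp1' AS Employee, Emp1 AS Orders FROM pvt
    UNION ALL SELECT VendorID, 'Emp2', Emp2 FROM pvt
    UNION ALL SELECT VendorID, 'Emp3', Emp3 FROM pvt
    UNION ALL SELECT VendorID, 'Emp4', Emp4 FROM pvt
    UNION ALL SELECT VendorID, 'Emp5', Emp5 FROM pvt
)
SELECT VendorID, Employee, Orders,
       date(?, '+' || (ROW_NUMBER() OVER (PARTITION BY VendorID ORDER BY Employee) - 1) || ' days')
           AS OrderDate
FROM unpvt
ORDER BY VendorID, Employee
"""
rows = conn.execute(query, (START_DATE,)).fetchall()
for row in rows[:6]:
    print(row)
```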