Postgres update table using CTE with priority - postgresql

I have an employee leave balance table as follows:
emp_code  leave_type  yearmonth  Balance  Priority
1         PL          202205     2        0
1         SL          202205     1        1
2         PL          202205     3        0
2         SL          202205     1        1
3         PL          202205     1        0
3         SL          202205     1        1
and an attendance table as follows:
emp_code  date        yearmonth  Attendance  Leave
3         2022-05-01  202205     1
3         2022-05-02  202205     1
3         2022-05-03  202205     1
1         2022-05-01  202205     0
1         2022-05-02  202205     0
1         2022-05-03  202205     0
1         2022-05-04  202205     0
2         2022-05-01  202205     1
2         2022-05-02  202205     1
I want to update the attendance table with the respective leave type (based on priority and availability) wherever the attendance value is 0.
For example, employee 1 has a leave balance of 3 and was absent for 4 days.
After the update, the records for emp_code 1 in attendance should be as follows:
emp_code  date        yearmonth  Attendance  Leave
1         2022-05-01  202205     0           PL
1         2022-05-02  202205     0           PL
1         2022-05-03  202205     0           SL
1         2022-05-04  202205     0
I know this can be done with a stored procedure or a function, but my company policy does not allow me to create either. (I could do the update in my backend code, but there are millions of records to update, so I am worried about performance.)
Is there any way to achieve this in Postgres using a CTE, window functions, or any other means?
Here is a fiddle: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/4952
Thanks

If the last attendance record in your fiddle is an error and is removed, then my approach would be:
1) Expand the leave balance for the month into one row per available day using generate_series(), and assign row numbers ordered by priority.
2) Assign row numbers to the absences within each month, ordered by date.
3) Compute the changes with a left join from the absences to the leave rows, so that absences beyond the balance get a null leave.
with leave_rows as (
  select b.*,
         row_number() over (partition by emp_code, yearmonth
                            order by priority) as use_order
    from emp_leave_balance b
         cross join lateral generate_series(1, b.balance, 1)
), absence_rows as (
  select a.*,
         row_number() over (partition by emp_code, yearmonth
                            order by date) as use_order
    from attendance a
   where attendance = 0
), changes as (
  select a.emp_code, a.date, a.yearmonth, a.attendance, l.leave_type
    from absence_rows a
         left join leave_rows l
           on (l.emp_code, l.yearmonth, l.use_order) =
              (a.emp_code, a.yearmonth, a.use_order)
)
update attendance
   set leave = c.leave_type
  from changes c
 where (c.emp_code, c.date) = (attendance.emp_code, attendance.date)
;
Your updated fiddle
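To make the pairing logic easier to follow, here is a minimal Python sketch of the same three steps (expand balances by priority, order absences by date, join by position). It is only an illustration of the idea, not a replacement for the SQL above; the function name assign_leave and the tuple layouts are assumptions made for this example.

```python
from collections import defaultdict

def assign_leave(balances, absences):
    """Pair each absence with a leave type, by priority and availability.

    balances: (emp_code, yearmonth, leave_type, balance, priority) tuples
    absences: (emp_code, yearmonth, date) tuples for rows with attendance = 0
    """
    # Step 1: one pool entry per available leave day, lowest priority value first.
    pool = defaultdict(list)
    for emp, ym, leave_type, balance, _prio in sorted(
        balances, key=lambda b: (b[0], b[1], b[4])
    ):
        pool[(emp, ym)].extend([leave_type] * balance)

    # Step 2: absences per employee and month, ordered by date.
    days = defaultdict(list)
    for emp, ym, day in sorted(absences):
        days[(emp, ym)].append(day)

    # Step 3: positional left join; absences beyond the balance get None.
    assigned = {}
    for (emp, ym), dates in days.items():
        leaves = pool.get((emp, ym), [])
        for i, day in enumerate(dates):
            assigned[(emp, day)] = leaves[i] if i < len(leaves) else None
    return assigned

# Employee 1 from the question: 2 PL + 1 SL available, 4 days absent.
balances = [(1, 202205, "PL", 2, 0), (1, 202205, "SL", 1, 1)]
absences = [(1, 202205, f"2022-05-0{d}") for d in range(1, 5)]
assigned = assign_leave(balances, absences)
for (emp, day), leave in sorted(assigned.items()):
    print(emp, day, leave)
```

The fourth absence has no leave left to consume, so it stays empty, which matches the expected output in the question.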

Related

query customer retention over range

I am trying to find the best way to accomplish the following.
Get the beginning customer count, which carries from the previous day
Get New Customer count
Get the number of Customers who have not come in since the prior month
Get the number of Customers who have come back after lapsing
Get the number of total customers
Given the following example data:
Customer ID  Store ID  Date    Amount
1            1         1/2/22  1.00
2            2         1/2/22  2.00
1            1         2/2/22  1.00
3            2         3/2/22  1.00
2            2         3/2/22  1.00
1            1         3/2/22  1.00
1            1         4/2/22  1.00
4            1         4/2/22  1.00
2            2         4/2/22  1.00
The result would be
Date    Store  Beginning  New  Dropped  Returned  Total
1/2/22  1      0          1    0        0         1
1/2/22  2      0          1    0        0         1
2/2/22  1      1          0    0        0         1
2/2/22  2      1          0    1        0         0
3/2/22  1      1          0    0        0         1
3/2/22  2      0          1    0        1         2
4/2/22  1      1          1    0        0         2
4/2/22  2      2          0    1        0         1
I have a partial query, but it does not return the right results:
WITH customerset AS (
    SELECT
        location_id,
        date,
        array_agg(DISTINCT customer_id ORDER BY customer_id ASC) AS customer_ids
    FROM customer_orders
    GROUP BY
        location_id,
        date
)
SELECT
    cset.location_id,
    cset.date,
    array_length(cset2.customer_ids, 1) AS beginning,
    array_length((past2.customer_ids - cset.customer_ids), 1) AS dropped,
    array_length((cset.customer_ids - past2.customer_ids), 1) AS returned
FROM
    (
        SELECT
            ords.location_id,
            ords.date,
            array_agg(DISTINCT ords.customer_id ORDER BY ords.customer_id ASC) AS customers_id
        FROM customer_orders ords
        GROUP BY
            ords.location_id,
            ords.date
    ) cset
JOIN
    customerset cset2 ON cset.date - '1 month'::interval = cset2.date
    AND cset2.location_id = cset.location_id
GROUP BY
    cset.location_id,
    cset.date,
    cset2.customer_ids,
    cset.customer_ids
ORDER BY
    cset.date ASC

Difference of dates using lag function postgres

I have customer IDs and transaction dates (yyyy-mm-dd) as shown below:
Cust_id Trans_date
1 2017-01-01
1 2017-01-03
1 2017-01-06
2 2017-01-01
2 2017-01-04
2 2017-01-05
I need to find the difference in days between consecutive transactions for each Cust_id.
I tried date_diff and extract with the lag function, but I get this error:
function lag(timestamp without time zone) may only be called as a window function
I am looking for a result like the one below:
Cust_id Trans_date difference
1 2017-01-01 0
1 2017-01-03 3
1 2017-01-05 2
2 2017-01-01 0
2 2017-01-04 4
2 2017-01-05 1
How can I find the difference in PostgreSQL?
Is this what you want?
with t(Cust_id,Trans_date) as(
    select 1 ,'2017-01-01'::timestamp union all
    select 1 ,'2017-01-03'::timestamp union all
    select 1 ,'2017-01-06'::timestamp union all
    select 2 ,'2017-01-01'::timestamp union all
    select 2 ,'2017-01-04'::timestamp union all
    select 2 ,'2017-01-05'::timestamp
)
select
    Cust_id,
    Trans_date,
    coalesce(Trans_date::date - lag(Trans_date::date) over(partition by Cust_id order by Trans_date), 0) as difference
from t;
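For readers who want to check what the lag-based query computes, the following Python sketch does the same thing by hand: within each customer, ordered by date, subtract the previous date and use 0 for the first row. The function name day_gaps is made up for this illustration.

```python
from datetime import date
from itertools import groupby

def day_gaps(rows):
    """Return (cust_id, trans_date, days since previous transaction)."""
    out = []
    # Sorting by (cust_id, date) mirrors "partition by Cust_id order by Trans_date".
    for cust_id, group in groupby(sorted(rows), key=lambda r: r[0]):
        prev = None
        for _, trans_date in group:
            # The first row of each customer has no previous date, like coalesce(..., 0).
            diff = 0 if prev is None else (trans_date - prev).days
            out.append((cust_id, trans_date, diff))
            prev = trans_date
    return out

rows = [
    (1, date(2017, 1, 1)), (1, date(2017, 1, 3)), (1, date(2017, 1, 6)),
    (2, date(2017, 1, 1)), (2, date(2017, 1, 4)), (2, date(2017, 1, 5)),
]
gaps = day_gaps(rows)
for row in gaps:
    print(*row)
```

With the sample data from the question, customer 1 gets the differences 0, 2, 3 and customer 2 gets 0, 3, 1.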

One SQL Stored Procedure to get cut off date of two different cut off date format

I have one system that reads from two client databases. The two clients use different cut off date schedules:
1) Client A: Every month at 15th. Example: 15-12-2016.
2) Client B: Every first day of the month. Example: 1-1-2017.
The cut off dates are stored in the table as below:
Now I need a single query to retrieve the current month's cut off date for each client. For instance, if today is 15-2-2017, the expected cut off dates for the two clients are:
1) Client A: 15-1-2017
2) Client B: 1-2-2017
How can I accomplish this in a single stored procedure? For client B, I can always take the first day of the month, but that does not work for client A, since their cut off falls on last month's date.
Something like this might be what you are looking for:
-- Table variables holding the sample clients and their first cut off dates
DECLARE @DummyClient TABLE(ID INT IDENTITY,ClientName VARCHAR(100));
DECLARE @DummyDates TABLE(ClientID INT,YourDate DATE);
INSERT INTO @DummyClient VALUES
 ('A'),('B');
INSERT INTO @DummyDates VALUES
 (1,{d'2016-12-15'}),(2,{d'2017-01-01'});
-- Month offsets 0..24
WITH Numbers AS
(   SELECT 0 AS Nr
    UNION ALL SELECT 1
    UNION ALL SELECT 2
    UNION ALL SELECT 3
    UNION ALL SELECT 4
    UNION ALL SELECT 5
    UNION ALL SELECT 6
    UNION ALL SELECT 7
    UNION ALL SELECT 8
    UNION ALL SELECT 9
    UNION ALL SELECT 10
    UNION ALL SELECT 11
    UNION ALL SELECT 12
    UNION ALL SELECT 13
    UNION ALL SELECT 14
    UNION ALL SELECT 15
    UNION ALL SELECT 16
    UNION ALL SELECT 17
    UNION ALL SELECT 18
    UNION ALL SELECT 19
    UNION ALL SELECT 20
    UNION ALL SELECT 21
    UNION ALL SELECT 22
    UNION ALL SELECT 23
    UNION ALL SELECT 24
)
-- Each client together with its earliest cut off date
,ClientExt AS
(
    SELECT c.*
          ,MIN(d.YourDate) AS MinDate
    FROM @DummyClient AS c
    INNER JOIN @DummyDates AS d ON c.ID=d.ClientID
    GROUP BY c.ID,c.ClientName
)
-- One row per client and month offset
SELECT ID,ClientName,D
FROM ClientExt
CROSS APPLY(SELECT DATEADD(MONTH,Numbers.Nr,MinDate)
            FROM Numbers) AS RunningDate(D);
The result
ID Cl Date
1 A 2016-12-15
1 A 2017-01-15
1 A 2017-02-15
1 A 2017-03-15
1 A 2017-04-15
1 A 2017-05-15
1 A 2017-06-15
1 A 2017-07-15
1 A 2017-08-15
1 A 2017-09-15
1 A 2017-10-15
1 A 2017-11-15
1 A 2017-12-15
1 A 2018-01-15
1 A 2018-02-15
1 A 2018-03-15
1 A 2018-04-15
1 A 2018-05-15
1 A 2018-06-15
1 A 2018-07-15
1 A 2018-08-15
1 A 2018-09-15
1 A 2018-10-15
1 A 2018-11-15
1 A 2018-12-15
2 B 2017-01-01
2 B 2017-02-01
2 B 2017-03-01
2 B 2017-04-01
2 B 2017-05-01
2 B 2017-06-01
2 B 2017-07-01
2 B 2017-08-01
2 B 2017-09-01
2 B 2017-10-01
2 B 2017-11-01
2 B 2017-12-01
2 B 2018-01-01
2 B 2018-02-01
2 B 2018-03-01
2 B 2018-04-01
2 B 2018-05-01
2 B 2018-06-01
2 B 2018-07-01
2 B 2018-08-01
2 B 2018-09-01
2 B 2018-10-01
2 B 2018-11-01
2 B 2018-12-01
2 B 2019-01-01
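The query above only produces the running list of cut off dates. As a rough sketch of how one date could be picked from such a list, the Python below generates the monthly dates from a first cut off and returns the latest one that falls strictly before today. That "strictly before" rule is an assumption read off the question's example (today 15-2-2017 gives 15-1-2017 for client A), and the helper names add_months and latest_cutoff_before are invented for this illustration.

```python
import calendar
from datetime import date

def add_months(start, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    total = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(start.day, last_day))

def latest_cutoff_before(first_cutoff, today, max_months=24):
    """Latest monthly cut off strictly before today, or None if there is none."""
    candidates = [add_months(first_cutoff, n) for n in range(max_months + 1)]
    earlier = [d for d in candidates if d < today]
    return max(earlier) if earlier else None

today = date(2017, 2, 15)
cutoff_a = latest_cutoff_before(date(2016, 12, 15), today)  # client A: 15th of the month
cutoff_b = latest_cutoff_before(date(2017, 1, 1), today)    # client B: 1st of the month
print(cutoff_a, cutoff_b)
```

In SQL the same idea would be a filter on the generated dates followed by MAX per client; whether the comparison should be strict depends on how a cut off that falls on today is meant to be treated.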

How to insert row data between consecutive dates in HIVE?

Sample Data:
customer txn_date tag
A 1-Jan-17 1
A 2-Jan-17 1
A 4-Jan-17 1
A 5-Jan-17 0
B 3-Jan-17 1
B 5-Jan-17 0
I need to fill in every missing txn_date within the date range (1-Jan-17 to 5-Jan-17), as shown below.
Output should be:
customer txn_date tag
A 1-Jan-17 1
A 2-Jan-17 1
A 3-Jan-17 0 (inserted)
A 4-Jan-17 1
A 5-Jan-17 0
B 1-Jan-17 0 (inserted)
B 2-Jan-17 0 (inserted)
B 3-Jan-17 1
B 4-Jan-17 0 (inserted)
B 5-Jan-17 0
select      c.customer
           ,d.txn_date
           ,coalesce(t.tag,0) as tag
from       (select  date_add (from_date,i) as txn_date
            from   (select  date '2017-01-01' as from_date
                           ,date '2017-01-05' as to_date
                    ) p
                    lateral view
                    posexplode(split(space(datediff(p.to_date,p.from_date)),' ')) pe as i,x
            ) d
            cross join (select distinct
                               customer
                        from   t
                        ) c
            left join   t
            on          t.customer = c.customer
                    and t.txn_date = d.txn_date
;
c.customer d.txn_date tag
A 2017-01-01 1
A 2017-01-02 1
A 2017-01-03 0
A 2017-01-04 1
A 2017-01-05 0
B 2017-01-01 0
B 2017-01-02 0
B 2017-01-03 1
B 2017-01-04 0
B 2017-01-05 0
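The query builds every (customer, date) combination and then left joins the real rows onto it. A small Python sketch of that same idea, under the assumption that a missing day should get tag 0, may make the shape of the result easier to see; the function name fill_missing is hypothetical.

```python
from datetime import date, timedelta

def fill_missing(rows, from_date, to_date):
    """Return one row per customer and day, with tag 0 where no row existed."""
    known = {(cust, day): tag for cust, day, tag in rows}
    customers = sorted({cust for cust, _, _ in rows})
    # Every day in the inclusive range, like the posexplode-generated dates.
    span = (to_date - from_date).days
    days = [from_date + timedelta(days=i) for i in range(span + 1)]
    # Cross join customers with days, then look up the tag (left join + coalesce).
    return [(c, d, known.get((c, d), 0)) for c in customers for d in days]

rows = [
    ("A", date(2017, 1, 1), 1), ("A", date(2017, 1, 2), 1),
    ("A", date(2017, 1, 4), 1), ("A", date(2017, 1, 5), 0),
    ("B", date(2017, 1, 3), 1), ("B", date(2017, 1, 5), 0),
]
filled = fill_missing(rows, date(2017, 1, 1), date(2017, 1, 5))
for cust, day, tag in filled:
    print(cust, day, tag)
```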
Put just the delta, i.e. the missing rows, in a file (input.txt), using the same delimiter you specified when you created the table.
Then use the load data command to insert these records into the table.
load data local inpath '/tmp/input.txt' into table tablename;
Your data won't be in the order you showed; the new rows are appended at the end. You can get them back in order by adding order by txn_date to the select query.

Counting dates that fall between two dates in the same column

I have two tables. For each ID and Level combination in table1, I need to count how many times the matching ID appears in table2 between the Time of that level and the Time of the next level in table1.
For example, for ID = 1 and Level = 1 in table1, two Time entries from table2 for ID = 1 fall between the Time of Level = 1 and the Time of Level = 2 in table1, so the count in the result table is 2.
table1:
ID Level Time
1 1 6/7/13 7:03
1 2 6/9/13 7:05
1 3 6/12/13 12:02
1 4 6/17/13 5:01
2 1 6/18/13 8:38
2 3 6/20/13 9:38
2 4 6/23/13 10:38
2 5 6/28/13 1:38
table2:
ID Time
1 6/7/13 11:51
1 6/7/13 14:15
1 6/9/13 16:39
1 6/9/13 19:03
2 6/20/13 11:02
2 6/20/13 15:50
Result would be
ID Level Count
1 1 2
1 2 2
1 3 0
1 4 0
2 1 0
2 3 2
2 4 0
2 5 0
select transformed_tab1.id, transformed_tab1.level, count(tab2.id)
from
    (select tab1.id, tab1.level, tm, lead(tm) over (partition by id order by tm) as next_tm
     from
        (
        select 1 as id, 1 as level, '2013-06-07 07:03'::timestamp as tm union
        select 1 as id, 2 as level, '2013-06-09 07:05'::timestamp as tm union
        select 1 as id, 3 as level, '2013-06-12 12:02'::timestamp as tm union
        select 1 as id, 4 as level, '2013-06-17 05:01'::timestamp as tm union
        select 2 as id, 1 as level, '2013-06-18 08:38'::timestamp as tm union
        select 2 as id, 3 as level, '2013-06-20 09:38'::timestamp as tm union
        select 2 as id, 4 as level, '2013-06-23 10:38'::timestamp as tm union
        select 2 as id, 5 as level, '2013-06-28 01:38'::timestamp as tm) tab1
    ) transformed_tab1
left join
    (select 1 as id, '2013-06-07 11:51'::timestamp as tm union
     select 1 as id, '2013-06-07 14:15'::timestamp as tm union
     select 1 as id, '2013-06-09 16:39'::timestamp as tm union
     select 1 as id, '2013-06-09 19:03'::timestamp as tm union
     select 2 as id, '2013-06-20 11:02'::timestamp as tm union
     select 2 as id, '2013-06-20 15:50'::timestamp as tm) tab2
on transformed_tab1.id=tab2.id and tab2.tm between transformed_tab1.tm and transformed_tab1.next_tm
group by transformed_tab1.id, transformed_tab1.level
order by transformed_tab1.id, transformed_tab1.level
;
SQL Fiddle
select t1.id, level, count(t2.id)
from
    (
        select id, level,
            tsrange(
                "time",
                lead("time", 1, 'infinity') over(
                    partition by id order by level
                ),
                '[)'
            ) as time_range
        from t1
    ) t1
    left join
    t2 on t1.id = t2.id and t1.time_range @> t2."time"
group by t1.id, level
order by t1.id, level
The solution starts by building a range of timestamps with the lead window function. Notice the '[)' parameter to the tsrange constructor: it means the lower bound is included and the upper bound is excluded.
Then it joins the two tables with the @> range operator, which tests whether the range contains the element.
The left join from t1 is necessary to keep the levels that have a zero count.
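As a cross-check of the half-open range logic, here is a short Python sketch that counts, for each (ID, Level), the table2 times t with start <= t < next start, treating the last level's range as unbounded. The function name count_in_ranges and the tuple layouts are assumptions for this illustration only.

```python
from datetime import datetime

def count_in_ranges(levels, events):
    """levels: (id, level, time) rows; events: (id, time) rows."""
    by_id = {}
    for id_, level, ts in levels:
        by_id.setdefault(id_, []).append((level, ts))
    counts = {}
    for id_, items in by_id.items():
        items.sort()  # order by level, like the window's "order by level"
        for i, (level, start) in enumerate(items):
            # The next level's time is the exclusive upper bound; open-ended for the last.
            end = items[i + 1][1] if i + 1 < len(items) else datetime.max
            counts[(id_, level)] = sum(
                1 for eid, ts in events if eid == id_ and start <= ts < end
            )
    return counts

levels = [
    (1, 1, datetime(2013, 6, 7, 7, 3)), (1, 2, datetime(2013, 6, 9, 7, 5)),
    (1, 3, datetime(2013, 6, 12, 12, 2)), (1, 4, datetime(2013, 6, 17, 5, 1)),
    (2, 1, datetime(2013, 6, 18, 8, 38)), (2, 3, datetime(2013, 6, 20, 9, 38)),
    (2, 4, datetime(2013, 6, 23, 10, 38)), (2, 5, datetime(2013, 6, 28, 1, 38)),
]
events = [
    (1, datetime(2013, 6, 7, 11, 51)), (1, datetime(2013, 6, 7, 14, 15)),
    (1, datetime(2013, 6, 9, 16, 39)), (1, datetime(2013, 6, 9, 19, 3)),
    (2, datetime(2013, 6, 20, 11, 2)), (2, datetime(2013, 6, 20, 15, 50)),
]
counts = count_in_ranges(levels, events)
for key in sorted(counts):
    print(*key, counts[key])
```

Levels with no matching events still appear with a count of 0, which is the role the left join plays in the SQL.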