how to update a Db2 table row and delete an existing row? - db2

DB2 Table ZB_BILL_ERR :
PROCESS_DATE CURR_PROCESS_DT ACCOUNT_NUMBER SEQ_NUM ERROR_REASON
07/14/2013 07/14/2013 A123456789 1 Trancode Invalid
07/15/2013 07/15/2013 B987654321 1 Adjustment code invalid
07/16/2013 07/16/2013 A123456789 2 Multi Single ind invalid
Expected Output :
PROCESS_DATE CURR_PROCESS_DT ACCOUNT_NUMBER SEQ_NUM ERROR_REASON
07/15/2013 07/15/2013 B987654321 1 Adjustment code invalid
07/14/2013 07/16/2013 A123456789 2 Multi Single ind invalid
The process date of the latest row of A123456789 will have the oldest process date of the account in the table, in this case which is 07/14/2013 And delete the oldest row of A123456789

To identify the most recent record, use
row_number() over (partition by account_number
order by curr_processs_dt descending
) as aging
which will rank your rows in descending date order within an account. Pick rows where aging = 1 to get the most recent.
To get the earliest date for an account:
select account_number, min(process_dt) as first_processed
from input
group by account_number
So let's put this together into one (somewhat complex) SQL statement
MERGE into ZB_BILL_ERR m
USING (
with g as
(
select account_number
, min(process_dt) as first_processed
from ZB_BILL_ERR
group by account_number
)
select i.*
, row_number() over (partition by account_number
order by curr_processs_dt descending
) as aging
, first_processed
from g
join ZB_BILL_ERR i on i.account_number = g.account_number
) as x
ON m.account_number = x.accout_number
WHEN MATCHED and x.aging = 1 THEN
UPDATE process_dt = x.first_processed
WHEN MATCHED and x.aging > 1 THEN
DELETE
;

Related

How to find the average of the three maximum values in a specific group in a moving window in Big Query?

I have a data set as in the table below. I want to find the average of the maximum three values in a rolling 12 month window grouped by id.
id date value
id1 2020/01/01 500
id1 2021/02/01 300
id1 2021/03/01 150
id1 2021/08/01 100
id1 2021/12/01 400
id2 2020/01/01 50
id2 2020/02/01 900
id2 2021/12/01 100
So my expected output is:
id date value
id1 2020/01/01 500
id1 2021/02/01 300
id1 2021/03/01 225
id1 2021/08/01 183.33
id1 2021/12/01 283.33
id2 2020/01/01 50
id2 2020/02/01 500
id2 2021/12/01 100
I.e. for id1 2021/12/01: (400+300+150)/3 = 283.33 which is the average of the three largest values in a rolling 12 month window for group ID1.
I managed to get to this point:
CREATE TEMP FUNCTION avg_array(arr ANY TYPE) AS ((
SELECT AVG(val) FROM(
SELECT val FROM UNNEST(arr) val ORDER BY val DESC LIMIT 3)
)
);
SELECT id, date, avg_array(val_arr)
FROM (
SELECT
id, date, ARRAY_AGG(value) OVER (
PARTITION BY id
ORDER BY id, date DESC ROWS BETWEEN CURRENT ROW AND 11 FOLLOWING
) as val_arr
FROM `table` )
Which works, but I feel like there must be a better way to do this. Specifically, I can't figure out how to get the average of the maximum three from the OVER as well rather than creating a seperate function.
(If not possible to combine date window with finding maximum values, it would also be useful for me to know how to find the average of the maximum three in any group by group without creating a seperate function)
`
In your code, the year of the date in the “PARTITION BY id,EXTRACT(YEAR FROM date) “ statement is missing.
CREATE TEMP FUNCTION avg_array(arr ANY TYPE) AS ((
SELECT AVG(val) FROM(
SELECT val FROM UNNEST(arr) val ORDER BY val DESC LIMIT 3))
);
SELECT id, date, avg_array(val_arr)
FROM (
SELECT
id, date, ARRAY_AGG(value) OVER (
PARTITION BY id,EXTRACT(YEAR FROM date)
ORDER BY id, date DESC ROWS BETWEEN CURRENT ROW AND 11 FOLLOWING
) as val_arr
FROM `table` )
order by id,date asc
Here, you can see a sample code to get the maximum 3 numbers of a group:
select id,AVG(value) as vg from (
select id,date,value from (
select id, date, value from `table`
order by value desc) a limit 3
) b group by id
You can see more information about over function in this link.
Consider below approach
select id, date,
(select round(avg(value), 2) from (
select value from t.arr value
order by value desc
limit 3
)) value
from (
select *, array_agg(value) over last_12_month arr from table
window last_12_month as (partition by id
order by 12 * (extract(year from date)) + extract(month from date)
range between 11 preceding and current row
)
) t
if applied to sample data in your question - output is

Can lead() return the next row only when a condition is met?

Recently my company upgraded from SQL Server 2008 to 2016, so I want to take advantage of some "new" features, one of which is lead().
I understand the basic usage, but I want to know if I can return the next row only when a condition is met. My original query looked like the following, where x.next_id is null if the next row isn't more than 12 days past the current row.
SELECT
a.id,
a.date_a,
x.next_id
FROM
table a
OUTER APPLY
(SELECT TOP 1
next_id = i.intIndex
FROM
table i
WHERE
i.date_a > DATEADD(DAY, 12, a.date_a)
ORDER BY
date_a, id ASC) x
ORDER BY
date_a, id ASC
Data might look like the following, where the third column is added by the query:
id date_a next_id
--------------------------------
1798678 2014-12-01 NULL
1798689 2013-01-05 1798688
1798688 2014-03-31 NULL
1798696 2013-04-03 1798694
1798694 2013-08-12 1798691
1798691 2014-09-30 NULL
1798698 2013-05-14 1798697
1798697 2013-08-29 NULL
Assuming this data set (your result table; minus the result column):
CREATE TABLE some_table(id INT PRIMARY KEY,date_a DATE);
INSERT INTO some_table(id,date_a)
VALUES (1798678,'2014-12-01'),
(1798689,'2013-01-05'),
(1798688,'2014-03-31'),
(1798696,'2013-04-03'),
(1798694,'2013-08-12'),
(1798691,'2014-09-30'),
(1798698,'2013-05-14'),
(1798697,'2013-08-29');
This query returns the same result set as what the query you have returns:
SELECT
id,
date_a,
next_id=
CASE WHEN LEAD(date_a) OVER (ORDER BY date_a,id)>DATEADD(DAY,12,date_a)
THEN LEAD(id) OVER (ORDER BY date_a,id)
ELSE NULL
END
FROM
some_table
ORDER BY
date_a,id;

PostgreSQL: set a column with the ordinal of the row sorted via another field

I have a table segnature describing an item with a varchar field deno and a numeric field ord. A foreign key fk_collection tells which collection the row is part of.
I want to update field ord so that it contains the ordinal of that row per each collection, sorted by field deno.
E.g. if I have something like
[deno] ord [fk_collection]
abc 10
aab 10
bcd 10
zxc 20
vbn 20
Then I want a result like
[deno] ord [fk_collection]
abc 1 10
aab 0 10
bcd 2 10
zxc 1 20
vbn 0 20
I tried with something like
update segnature s1 set ord = (select count(*)
from segnature s2
where s1.fk_collection=s2.fk_collection and s2.deno<s1.deno
)
but query is really slow: 150 collections per about 30000 items are updated in 10 minutes about.
Any suggestion to speed up the process?
Thank you!
You can use a window function to generate the "ordinal" number:
with numbered as (
select deno, fk_collection,
row_number() over (partition by fk_collection order by deno) as rn,
ctid as id
from segnature
)
update segnature
set ord = n.rn
from numbered n
where n.id = segnature.ctid;
This uses the internal column ctid to uniquely identify each rows. The ctid comparison is quite slow, so if you have a real primary (or unique) key in that table, use that column instead.
Alternatively without the common table expression:
update segnature
set ord = n.rn
from (
select deno, fk_collection,
row_number() over (partition by fk_collection order by deno) as rn,
ctid as id
from segnature
) as n
where n.id = segnature.ctid;
SQLFiddle example: http://sqlfiddle.com/#!15/e997f/1

PostgreSQL: getting ordinal rank (row index? ) efficiently

You have a table like so:
id dollars dollars_rank points points_rank
1 20 1 35 1
2 18 2 30 3
3 10 3 33 2
I want a query that updates the table's rank columns (dollars_rank and points_rank) to set the rank for the given ID, which is just the row's index for that ID sorted by the relevant column in a descending order. How best to do this in PostgreSQL?
The window function dense_rank() is what you need - or maybe rank(). The UPDATE could look like this:
UPDATE tbl
SET dollars_rank = r.d_rnk
, points_rank = r.p_rnk
FROM (
SELECT id
, dense_rank() OVER (ORDER BY dollars DESC NULLS LAST) AS d_rnk
, dense_rank() OVER (ORDER BY points DESC NULLS LAST) AS p_rnk
FROM tbl
) r
WHERE tbl.id = r.id;
fiddle
NULLS LAST is only relevant if the involved columns can be NULL:
Sort by column ASC, but NULL values first?

Date Range between the Last 2 Records Sql Server 2008

Hi Given that I have a table with 2 columns.
Table Booking
Column Amount-TransactionDate
Get me total Amount between Last 2 transactionDate.
How do you do that ?How do you get the last transaction but 01
Any suggestions?
You can use a common table expression (CTE) to assign a sequence number to each row based on descending order of the transaction date. And then select the rows with a filter to get the last 2 rows.
This query displays the last two transactions in the table
WITH BookingCTE AS (
SELECT ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) as Sequence,
Amount, TransactionDate
FROM Booking
)
SELECT Sequence, Amount, TransactionDate
FROM BookingCTE
WHERE Sequence <= 2
;
This query give you the total amount for the last two transactions.
WITH BookingCTE AS (
SELECT ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) as Sequence, Amount, TransactionDate
FROM Booking
)
SELECT SUM(Amount) AS TotalAmount
FROM BookingCTE
WHERE Sequence <= 2
;