When I execute the query below in DB2, it does not delete the data in one go.
I have 800 records made up of duplicate pairs, and I want to delete one record from each pair so that 400 records remain in the table.
Below is a sample of RESERVATION_NUMBER.
DELETE
FROM reservation_number
WHERE reservation_id IN (SELECT reservation_id
FROM (SELECT ROW_NUMBER()
OVER() AS RN,
msr1.reservation_number,
msr1.reservation_id,
msr1.used_flag
FROM reservation_number msr1,
reservation_number msr2
WHERE
msr1.reservation_number = msr2.reservation_number
AND msr1.reservation_id <> msr2.reservation_id
ORDER BY msr1.reservation_number)
WHERE Mod (rn, 2) = 0
ORDER BY reservation_number)
If I execute this query multiple times, it eventually deletes all of the data. The data is deleted in the following pattern across runs:
400, 168, 76, 38, 19, 3, 1
Would this not be easier?
DELETE FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY RESERVATION_NUMBER
ORDER BY RESERVATION_ID ) AS RN
FROM
RESERVATION_NUMBER
) WHERE RN > 1
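The PARTITION BY idea can be tried out locally. Below is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or newer is needed for window functions) rather than DB2, with 400 made-up reservation numbers that each appear twice. SQLite cannot DELETE from a derived table, so the same ROW_NUMBER() filter is expressed as an IN subquery on reservation_id. Because each RESERVATION_NUMBER keeps its lowest RESERVATION_ID, running the statement a second time deletes nothing.

```python
import sqlite3

# In-memory stand-in for the DB2 table; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservation_number ("
    "reservation_id INTEGER PRIMARY KEY, reservation_number TEXT, used_flag TEXT)"
)
# 400 reservation numbers, each stored twice -> 800 rows.
rows = []
for n in range(400):
    for copy in range(2):
        rows.append((2 * n + copy + 1, f"R{n:04d}", "N"))
conn.executemany("INSERT INTO reservation_number VALUES (?, ?, ?)", rows)

# Keep the row with the lowest reservation_id per reservation_number and
# delete every other row (RN > 1) in the same partition.
DEDUPE = """
DELETE FROM reservation_number
WHERE reservation_id IN (
    SELECT reservation_id
    FROM (SELECT reservation_id,
                 ROW_NUMBER() OVER (PARTITION BY reservation_number
                                    ORDER BY reservation_id) AS rn
          FROM reservation_number)
    WHERE rn > 1
)
"""
first_run = conn.execute(DEDUPE).rowcount
second_run = conn.execute(DEDUPE).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM reservation_number").fetchone()[0]
print(first_run, second_run, remaining)  # 400 0 400
```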
I found the fix: I was missing an ORDER BY inside OVER().
Here is the corrected query:
DELETE
FROM
RESERVATION_NUMBER
WHERE
RESERVATION_ID IN (
SELECT
RESERVATION_ID
FROM
(
SELECT
ROW_NUMBER() OVER(ORDER BY msr1.RESERVATION_NUMBER) AS RN,
msr1.RESERVATION_NUMBER,
msr1.RESERVATION_ID,
msr1.USED_FLAG
FROM
RESERVATION_NUMBER msr1 ,
RESERVATION_NUMBER msr2
WHERE
msr1.RESERVATION_NUMBER = msr2.RESERVATION_NUMBER
AND msr1.RESERVATION_ID <> msr2.RESERVATION_ID
ORDER BY
msr1.RESERVATION_NUMBER )
WHERE
MOD (RN,2)=1
ORDER BY
RESERVATION_NUMBER )
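The corrected self-join can be checked on invented data where every RESERVATION_NUMBER appears exactly twice. This sketch runs it in SQLite via Python's sqlite3 module (SQLite writes MOD(RN, 2) as rn % 2 and needs 3.25+ for window functions). One assumption to flag: msr1.reservation_id is added as a tie-breaker in OVER(ORDER BY ...) so the row removed from each pair is deterministic; ordering by RESERVATION_NUMBER alone still removes exactly one row per pair as long as every number has exactly two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservation_number (reservation_id INTEGER PRIMARY KEY,"
    " reservation_number TEXT, used_flag TEXT)"
)
# Invented data: 400 reservation numbers, each stored twice (800 rows).
conn.executemany(
    "INSERT INTO reservation_number VALUES (?, ?, 'N')",
    [(i + 1, f"R{i // 2:04d}") for i in range(800)],
)

# OVER() now has an ORDER BY, so the two rows of each pair get consecutive
# numbers (one odd, one even) and rn % 2 = 1 hits exactly one row per pair.
FIX = """
DELETE FROM reservation_number
WHERE reservation_id IN (
    SELECT reservation_id
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY msr1.reservation_number,
                                             msr1.reservation_id) AS rn,
                 msr1.reservation_id
          FROM reservation_number msr1
          JOIN reservation_number msr2
            ON msr1.reservation_number = msr2.reservation_number
           AND msr1.reservation_id <> msr2.reservation_id)
    WHERE rn % 2 = 1
)
"""
deleted = conn.execute(FIX).rowcount
# With no duplicates left the self-join is empty, so a rerun deletes nothing.
deleted_again = conn.execute(FIX).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM reservation_number").fetchone()[0]
print(deleted, deleted_again, remaining)  # 400 0 400
```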
Quick question: I'm trying to update a column only for duplicate rows (row number within the partition > 1), which I select using a window function partitioned by title, but the current query updates the whole table! Please check the query below; any leads would be greatly appreciated :)
UPDATE public.database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
FROM (
SELECT *,
row_number() over (partition by title order by created_at) as RN
FROM public.database_tag
ORDER BY RN DESC) X
WHERE X.RN > 1
Thanks very much!
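Your query updates every row because nothing in the WHERE clause links database_tag to the derived table X, so every row of database_tag matches the rows of X that have RN > 1. Assuming every row has a unique ID, it can be done like this: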
UPDATE database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
WHERE <some_unique_id> in (
select <some_unique_id> from (
SELECT <some_unique_id>,
row_number() over (partition by title order by created_at) as RN
FROM public.database_tag
) X
WHERE X.RN > 1
)
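A small sketch of the IN version, run in SQLite through Python's sqlite3 module (3.25+ for window functions). The id column stands in for <some_unique_id>, and the rows are invented; only rows whose row_number() within their title is greater than 1 receive the timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE database_tag (id INTEGER PRIMARY KEY, title TEXT,"
    " created_at TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO database_tag (id, title, created_at) VALUES (?, ?, ?)",
    [
        (1, "python", "2022-01-01"),
        (2, "python", "2022-02-01"),  # duplicate title
        (3, "sql", "2022-01-05"),
        (4, "python", "2022-03-01"),  # duplicate title
    ],
)

# Mark every row except the earliest one per title.
conn.execute(
    """
    UPDATE database_tag
    SET deleted_at = '2022-04-25 19:33:29.087133+00'
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   row_number() OVER (PARTITION BY title ORDER BY created_at) AS rn
            FROM database_tag
        ) x
        WHERE x.rn > 1
    )
    """
)
marked = [r[0] for r in conn.execute(
    "SELECT id FROM database_tag WHERE deleted_at IS NOT NULL ORDER BY id")]
print(marked)  # [2, 4]
```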
Or we can reverse the query and update every row except the earliest one per title:
UPDATE database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
WHERE <some_unique_id> not in (
select distinct on (title)
<some_unique_id> from database_tag
order by title, created_at
)
I am writing a query in Redshift to answer the question "Give the average lifetime spend of users who spent more on their first order than their second order." This is based on an order_items table which has one row for every item ordered (so an order with 3 items would be represented in 3 rows). Here's a snapshot of the first 10 rows:
First 10 rows of order_items:
Here is my solution:
with
cte1_lifetime as (
select oi.user_id, sum(oi.sale_price) as lifetime_spend
from order_items as oi
group by oi.user_id
),
cte2_order as (
select oi.user_id, oi.order_id, sum(oi.sale_price) as order_total, rank() over(partition by oi.user_id order by oi.created_at) as order_rank
from order_items as oi
group by oi.user_id, oi.order_id, oi.created_at
order by oi.user_id, oi.order_id
),
cte3_first_order as (
select user_id, order_id, order_total
from cte2_order
where order_rank=1
order by user_id, order_id
),
cte4_second_order as (
select user_id, order_id, order_total
from cte2_order
where order_rank=2
order by user_id, order_id
)
select avg(cte1.lifetime_spend) as average_lifetime_spend
from cte1_lifetime as cte1
where exists (
select *
from cte3_first_order as cte3, cte4_second_order as cte4
where cte3.user_id=cte4.user_id
and cte1.user_id=cte3.user_id
and cte3.order_total > cte4.order_total)
And here is the answer key:
WITH
table1 AS
(SELECT user_id, order_id,
SUM(sale_price) OVER (PARTITION BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as order_total,
RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS "sequence"
FROM order_items)
,
table2 AS
(SELECT user_id, SUM(sale_price) AS lifetime_spend
FROM order_items
WHERE EXISTS
(SELECT t1.user_id
FROM table1 t1, table1 t2
WHERE t1.user_id = t2.user_id AND t1.sequence = 1 AND t2.sequence = 2 AND t1.order_total>t2.order_total
AND t1.user_id = order_items.user_id)
GROUP BY 1
ORDER BY 1)
SELECT AVG(lifetime_spend)
FROM table2
These two queries yield slightly different results on the same data: an average lifetime spend of $215 vs. $220. I'd really like to understand why they differ, but so far I can't figure it out. Any ideas?
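One difference worth checking (a hypothesis, not something confirmed in this thread): the answer key ranks individual item rows, while the first solution ranks whole orders. RANK() gives tied rows the same value and then skips, so if a user's first order has several items sharing one created_at, those items are all ranked 1 and the second order starts at 4 rather than 2, and the answer key's sequence = 2 filter finds nothing for that user. The sketch below runs both rankings in SQLite via Python's sqlite3 module on invented rows; DENSE_RANK() is shown for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items (user_id INTEGER, order_id INTEGER,"
    " sale_price REAL, created_at TEXT)"
)
# Invented data: user 1's first order has 3 items, the second order has 1.
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?)",
    [
        (1, 10, 50.0, "2021-01-01"),
        (1, 10, 30.0, "2021-01-01"),
        (1, 10, 20.0, "2021-01-01"),
        (1, 11, 40.0, "2021-02-01"),
    ],
)

# Answer-key style: rank item rows directly (ties share a rank, then skip).
item_ranks = [r[0] for r in conn.execute(
    "SELECT RANK() OVER (PARTITION BY user_id ORDER BY created_at) "
    "FROM order_items ORDER BY created_at")]

# Same item rows with DENSE_RANK(), which does not skip after ties.
item_dense = [r[0] for r in conn.execute(
    "SELECT DENSE_RANK() OVER (PARTITION BY user_id ORDER BY created_at) "
    "FROM order_items ORDER BY created_at")]

# First-solution style: collapse to one row per order, then rank the orders.
order_ranks = [tuple(r) for r in conn.execute(
    """
    SELECT order_id, SUM(sale_price) AS order_total,
           RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS order_rank
    FROM order_items
    GROUP BY user_id, order_id, created_at
    ORDER BY order_rank
    """)]

print(item_ranks)   # [1, 1, 1, 4]
print(item_dense)   # [1, 1, 1, 2]
print(order_ranks)  # [(10, 100.0, 1), (11, 40.0, 2)]
```

Here the first solution sees order 10 (100.0) beat order 11 (40.0), while the item-level RANK() never produces a sequence of 2 for this user.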
I just started learning Postgres, and I'm trying to make an aggregation table that has the columns:
user_id
booking_sequence
booking_created_time
booking_paid_time
booking_price_amount
total_spent
All columns are provided except booking_sequence. I need a query that shows the first five flights of each user who has at least x purchases and has spent more than a certain amount of money, sorted by the total amount the user spent and then by booking_sequence.
I've tried:
select user_id,
row_number() over(partition by user_id order by user_id) as booking_sequence,
booking_created_time as booking_created_date,
booking_price_amount,
sum(booking_price_amount) as total_booking_price_amount
from fact_flight_sales
group by user_id, booking_created_time, booking_price_amount
having count(user_id) > 5
and total_booking_price_amount > 1000
order by total_booking_price_amount;
I get 0 rows when I add count(user_id) > 5, and total_booking_price_amount is reported as not found when I add the second condition to the HAVING clause.
Edit:
I got the code working; for those who are curious:
select x.user_id, row_number() over(partition by x.user_id)
as booking_sequence, x.booking_created_time::date as booking_created_date, x.booking_price_amount,
sum(y.booking_price_amount) as total_booking_price_amount from
(
select user_id, booking_created_time, booking_price_amount from fact_flight_sales
group by user_id, booking_created_time, booking_price_amount
) as x
join
(
select user_id, booking_price_amount
from fact_flight_sales group by user_id, booking_price_amount
) as y
on x.user_id = y.user_id
group by x.user_id, x.booking_created_time, x.booking_price_amount
having count(x.user_id) >= 1 and sum(y.booking_price_amount) >250000
order by total_booking_price_amount desc, booking_sequence asc;
Big thanks to Laurenz for the help!
About count(user_id) > 5:
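Your query groups by user_id, booking_created_time and booking_price_amount, so each group is usually a single booking and count(user_id) is 1, which is why the filter returns nothing. Also note that HAVING is evaluated before window functions, so result rows excluded by the HAVING clause are not used when the window function is calculated.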
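A tiny illustration of that evaluation order, sketched in SQLite through Python's sqlite3 module (3.25+) with invented rows: the window function COUNT(*) OVER () only sees the groups that survived HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_flight_sales (user_id INTEGER, booking_price_amount REAL)")
conn.executemany(
    "INSERT INTO fact_flight_sales VALUES (?, ?)",
    [(1, 100.0), (1, 200.0), (2, 50.0), (3, 75.0)],
)

rows = conn.execute(
    """
    SELECT user_id,
           COUNT(*) AS bookings,               -- aggregate, computed per group
           COUNT(*) OVER () AS groups_visible  -- window, computed after HAVING
    FROM fact_flight_sales
    GROUP BY user_id
    HAVING COUNT(*) > 1
    """
).fetchall()
print(rows)  # [(1, 2, 1)]
```

Three users exist, but groups_visible is 1 because users 2 and 3 were already removed by HAVING.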
About total_booking_price_amount in HAVING:
You cannot use aliases from the SELECT list in the HAVING clause. You will have to repeat the expression (or use a subquery).
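Putting both points together, one way to structure the query is to aggregate per user in a subquery (repeating SUM(...) in HAVING instead of using the alias), join that back to the bookings, number each user's bookings, and keep the first five. The sketch below runs in SQLite via Python's sqlite3 module; the first_bookings helper, its parameters, the thresholds, and the data are hypothetical, and numbering bookings by booking_created_time is an assumption about what "first" means.

```python
import sqlite3


def first_bookings(conn, min_bookings, min_spent, max_seq=5):
    """Hypothetical helper: first max_seq bookings of qualifying users."""
    sql = """
    SELECT user_id, booking_sequence, booking_created_time,
           booking_price_amount, total_spent
    FROM (
        SELECT f.user_id,
               ROW_NUMBER() OVER (PARTITION BY f.user_id
                                  ORDER BY f.booking_created_time) AS booking_sequence,
               f.booking_created_time,
               f.booking_price_amount,
               u.total_spent
        FROM fact_flight_sales f
        JOIN (
            -- Per-user totals; HAVING repeats the expressions instead of
            -- referring to a SELECT-list alias.
            SELECT user_id, SUM(booking_price_amount) AS total_spent
            FROM fact_flight_sales
            GROUP BY user_id
            HAVING COUNT(*) >= ? AND SUM(booking_price_amount) > ?
        ) u ON u.user_id = f.user_id
    ) ranked
    WHERE booking_sequence <= ?
    ORDER BY total_spent DESC, user_id, booking_sequence
    """
    return conn.execute(sql, (min_bookings, min_spent, max_seq)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_flight_sales (user_id INTEGER,"
    " booking_created_time TEXT, booking_price_amount REAL)"
)
data = [(1, f"2023-01-{d:02d}", 300.0) for d in range(1, 7)]   # 6 bookings, 1800
data += [(2, f"2023-02-{d:02d}", 600.0) for d in range(1, 3)]  # 2 bookings, 1200
data += [(3, f"2023-03-{d:02d}", 10.0) for d in range(1, 8)]   # 7 bookings, 70
conn.executemany("INSERT INTO fact_flight_sales VALUES (?, ?, ?)", data)

result = first_bookings(conn, min_bookings=2, min_spent=1000)
print([(r[0], r[1]) for r in result])
# [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2)]
```

Filtering users before numbering their bookings keeps the window function away from the HAVING restriction described above.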
I'm using SQL Server 2000 (80), so it's not possible to use the LAG function.
I have a data set with four columns:
Purchase_Date
Facility_no
Seller_id
Sale_id
I need to identify missing Sale_ids. Sale_ids are strictly sequential, so there should not be any gaps.
The code below works for a specific date and store. But I need it to run over the entire data set, looping through every facility and every Seller_id for every Purchase_Date.
declare @MAXCOUNT int
set @MAXCOUNT =
(
select MAX(Sale_Id)
from #table
where
Facility_no in (124) and
Purchase_date = '2/7/2020'
and Seller_id = 1
)
;WITH TRX_COUNT AS
(
SELECT 1 AS Number
union all
select Number + 1 from TRX_COUNT
where Number < @MAXCOUNT
)
select * from TRX_COUNT
where
Number NOT IN
(
select Sale_Id
from #table
where
Facility_no in (124)
and Purchase_Date = '2/7/2020'
and seller_id = 1
)
order by Number
OPTION (maxrecursion 0)
My Dataset
This column:
case when
Sale_Id=0 or 1=Sale_Id-LAG(Sale_Id) over (partition by Facility_no, Purchase_Date, Seller_id order by Sale_Id)
then 'OK' else 'Previous Missing' end
will tell you which Seller_Ids have missing sales. To get exactly your desired output, take the distinct 'Previous Missing' rows and join them to a tally (numbers) table using NOT EXISTS.
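Where LAG() is available (SQL Server 2012 and later, or SQLite 3.25+), the flag column can be tried out as below. This sketch uses SQLite through Python's sqlite3 module; the table name sales and its rows are invented stand-ins for the question's data. ORDER BY Sale_id inside OVER() tells LAG which row is "previous", and, like the expression above, 0 is treated as the first Sale_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (Purchase_Date TEXT, Facility_no INTEGER,"
    " Seller_id INTEGER, Sale_id INTEGER)"
)
# Invented data: seller 1 is missing Sale_id 2, seller 2 has no gaps.
conn.executemany(
    "INSERT INTO sales VALUES ('2020-02-07', 124, ?, ?)",
    [(1, 0), (1, 1), (1, 3), (2, 0), (2, 1)],
)

flags = conn.execute(
    """
    SELECT Seller_id, Sale_id,
           CASE WHEN Sale_id = 0
                  OR Sale_id - LAG(Sale_id) OVER (
                         PARTITION BY Facility_no, Purchase_Date, Seller_id
                         ORDER BY Sale_id) = 1
                THEN 'OK' ELSE 'Previous Missing' END AS status
    FROM sales
    ORDER BY Seller_id, Sale_id
    """
).fetchall()
for row in flags:
    print(row)
```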
Edit: OP mentions in comments they can't use LAG(). My suggestion, then, would be:
Make a temp table that holds max(sale_id) grouped by facility and seller_id (add Purchase_Date to the grouping if the sequence restarts each day).
Then you can get the missing IDs with this pseudocode query:
Select t.facility, t.seller, N.num
from temptable t
inner join tally N on N.num <= t.maxsale
where not exists( select 1 from sourcetable s where s.facility=t.facility and s.seller=t.seller and s.sale=N.num)
This works because the only way to "construct" nonexistent combinations is to generate all of them and then remove the ones that exist.
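The tally-table approach can be sketched end to end without LAG(). This version runs in SQLite via Python's sqlite3 module: a recursive CTE plays the role of the tally table, a grouped CTE replaces the temp table of max(Sale_id), and NOT EXISTS removes the IDs that are present. The sales table, the 0-based numbering, grouping by Purchase_Date as well, and the data are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (Purchase_Date TEXT, Facility_no INTEGER,"
    " Seller_id INTEGER, Sale_id INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2020-02-07", 124, 1, 0), ("2020-02-07", 124, 1, 1),
        ("2020-02-07", 124, 1, 4),                      # 2 and 3 missing
        ("2020-02-07", 124, 2, 0), ("2020-02-07", 124, 2, 1),
        ("2020-02-08", 124, 1, 0), ("2020-02-08", 124, 1, 2),  # 1 missing
    ],
)

missing = conn.execute(
    """
    WITH RECURSIVE tally(num) AS (          -- numbers 0..max(Sale_id)
        SELECT 0
        UNION ALL
        SELECT num + 1 FROM tally WHERE num < (SELECT MAX(Sale_id) FROM sales)
    ),
    maxes AS (                              -- stands in for the temp table
        SELECT Facility_no, Seller_id, Purchase_Date, MAX(Sale_id) AS maxsale
        FROM sales
        GROUP BY Facility_no, Seller_id, Purchase_Date
    )
    SELECT t.Purchase_Date, t.Facility_no, t.Seller_id, n.num AS missing_sale
    FROM maxes t
    JOIN tally n ON n.num <= t.maxsale
    WHERE NOT EXISTS (
        SELECT 1 FROM sales s
        WHERE s.Facility_no = t.Facility_no
          AND s.Seller_id = t.Seller_id
          AND s.Purchase_Date = t.Purchase_Date
          AND s.Sale_id = n.num
    )
    ORDER BY t.Purchase_Date, t.Facility_no, t.Seller_id, n.num
    """
).fetchall()
print(missing)
```

Generating every number up to each group's maximum and subtracting the existing IDs finds all gaps, including ones not adjacent to each other.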
This one worked out
; WITH cte_Rn AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Facility_no, Purchase_Date, Seller_id ORDER BY Purchase_Date) AS [Rn_Num]
FROM (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id
FROM MyTable WITH (NOLOCK)
) a
)
, cte_Rn_0 as (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id,
-- [Rn_Num] AS 'Skipped Sale'
-- , case when Sale_id = 0 Then [Rn_Num] - 1 Else [Rn_Num] End AS 'Skipped Sale for 0'
, [Rn_Num] - 1 AS 'Skipped Sale for 0'
FROM cte_Rn a
)
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id,
-- [Skipped Sale],
[Skipped Sale for 0]
FROM cte_Rn_0 a
WHERE NOT EXISTS
(
select * from cte_Rn_0 b
where b.Sale_id = a.[Skipped Sale for 0]
and a.Facility_no = b.Facility_no
and a.Purchase_Date = b.Purchase_Date
and a.Seller_id = b.Seller_id
)
--ORDER BY Purchase_Date ASC
I have a table with 3 columns (ID, Signature, and Datetime), and I group it by Signature with HAVING COUNT(*) > 9:
select * from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
I now want to select only the 1st and 10th records per Signature, ranked by Datetime descending, so I would expect every Signature to have 2 rows.
Thanks,
I would go with a couple of common table expressions.
The first selects all records from the table along with a count of records per signature. The second selects from the first where the record count is > 9 and adds a row_number partitioned by signature. Then just select the rows where the row_number is 1 or 10:
With cte1 AS
(
SELECT ID, Signature, Datetime, COUNT(*) OVER(PARTITION BY Signature) As NumberOfRows
FROM #Sigs
), cte2 AS
(
SELECT ID, Signature, Datetime, ROW_NUMBER() OVER(PARTITION BY Signature ORDER BY DateTime DESC) As Rn
FROM cte1
WHERE NumberOfRows > 9
)
SELECT ID, Signature, Datetime
FROM cte2
WHERE Rn IN (1, 10)
ORDER BY Signature desc
Because I don't know what your data looks like, this might need some adjustment.
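The two-CTE query can be checked on made-up data. This sketch runs the same query shape in SQLite via Python's sqlite3 module (3.25+ for window functions), with the #Sigs temp table replaced by an ordinary table named sigs. Signature 'A' has 12 rows and 'B' only 5, so only 'A' should come back, with its newest and 10th-newest rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sigs (ID INTEGER PRIMARY KEY, Signature TEXT, DateTime TEXT)")
rows = [(None, "A", f"2024-01-{d:02d}") for d in range(1, 13)]  # 12 rows
rows += [(None, "B", f"2024-02-{d:02d}") for d in range(1, 6)]  # 5 rows
conn.executemany("INSERT INTO sigs VALUES (?, ?, ?)", rows)

picked = conn.execute(
    """
    WITH cte1 AS (
        SELECT ID, Signature, DateTime,
               COUNT(*) OVER (PARTITION BY Signature) AS NumberOfRows
        FROM sigs
    ), cte2 AS (
        SELECT ID, Signature, DateTime,
               ROW_NUMBER() OVER (PARTITION BY Signature
                                  ORDER BY DateTime DESC) AS Rn
        FROM cte1
        WHERE NumberOfRows > 9
    )
    SELECT Signature, DateTime, Rn
    FROM cte2
    WHERE Rn IN (1, 10)
    ORDER BY Signature DESC, Rn
    """
).fetchall()
print(picked)  # [('A', '2024-01-12', 1), ('A', '2024-01-03', 10)]
```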
The simplest way here, since you already know your sort order (DateTime DESC) and partitioning (Signature), is probably to assign row numbers and then select the rows you want.
SELECT *
FROM
(
select o.Signature
,o.DateTime
,ROW_NUMBER() OVER (PARTITION BY o.Signature ORDER BY o.DateTime DESC) [Row]
from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
) t
WHERE [Row] IN (1,10)
ORDER BY Signature desc, DateTime