Update a deleted_at column on partition in PostgreSQL - postgresql

Quick question, I'm trying to update a column only when there are duplicates(partition column > 1) in the table and have selected it based on partition concept, But the current query updates the whole table! please check the query below, Any leads would be greatly appreciated :)
UPDATE public.database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
FROM (
SELECT *,
row_number() over (partition by title order by created_at) as RN
FROM public.database_tag
ORDER BY RN DESC) X
WHERE X.RN > 1
Thanks very much!

Assuming that every row have unique ID it can be done like below.
UPDATE database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
WHERE <some_unique_id> in (
select <some_unique_id> from (
SELECT <some_unique_id>,
row_number() over (partition by title order by created_at) as RN
FROM public.database_tag
) X
WHERE X.RN > 1
)
Or we can reverse query to update all but set of ID's
UPDATE database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
WHERE <some_unique_id> not in (
select distinct on (title)
<some_unique_id> from database_tag
order by title, created_at
)

Related

simpler query for counting total row in the column

not sure if my below query script right to execute as I am trying to find just one query in Oracle SQL query
select distinct (master_id) , last_name from (
select q2.* , max (count_a) over (partition by master_id)
count_b from ( select q1.* , count (*) over (partition by
master_id order by purchased_date desc ) count_a from profile q1)
q2) where count_b > 2
I am trying to minimise timer to execute get data by reducing sub query
for example above it has two subqueries
max (count_a) over (partition by master_id) count_b
count (*) over (partition by master_id order by purchased_date desc ) count_a
so I played around until this query
max (count (*)) over (partition by master_id) count
SQL query script;
select * from profile a
join( select * from (
select master_id, max (count(*)) over (partition by
master_id) count from profile) where count >2) b
ON a. master_id = b. master_id
Thank you in advance for your help

Optimizing row_number to not scan all table

I have a table created as
CREATE TABLE T0
(
id text,
kind_of_datetime text,
... 10 more text fields
),
PRIMARY KEY(id, kind_of_datetime)
It is about 31M rows with about 800K of unique id values and 13K unique kind_of_datetime values.
I want to make query
SELECT *
FROM (
SELECT
*,
ROW_NUMBER() over(PARTITION BY id ORDER BY kind_of_datetime DESC) as rn_col
FROM TO
WHERE kind_of_datetime <= 'some_value'
) as tmp
WHERE rn_col = 1
It makes WindowAgg with actual reading of all table + sorting and works really long (minutes).
I tried to make index
CREATE INDEX index_name ON T0 (id, kind_of_datetime DESC NULLS LAST)
and in works better but only if final select consists of two key columns id + kind_of_datetime. Otherwise it's always fullscan.
Maybe I should change a way of storing data? Or create some other index?
What I don't want to do is to add UNCLUDE 10 other columns because it will take much RAM.
Try a subquery:
SELECT *,
ROW_NUMBER() over(PARTITION BY id ORDER BY kind_of_datetime DESC) as rn_col
FROM (SELECT * FROM tab
WHERE kind_of_datetime <= 'some_value'
) AS t
That will definitely apply the filter first.

Reset increment in PostgreSQL

I just started learning Postgres, and I'm trying to make an aggregation table that has the columns:
user_id
booking_sequence
booking_created_time
booking_paid_time
booking_price_amount
total_spent
All columns are provided, except for the booking_sequence column. I need to make a query that shows the first five flights of each user that has at least x purchases and has spent more than a certain amount of money, then sort it by the amount of money spent by the user, and then sort it by the booking sequence column.
I've tried :
select user_id,
row_number() over(partition by user_id order by user_id) as booking_sequence,
booking_created_time as booking_created_date,
booking_price_amount,
sum(booking_price_amount) as total_booking_price_amount
from fact_flight_sales
group by user_id, booking_created_time, booking_price_amount
having count(user_id) > 5
and total_booking_price_amount > 1000
order by total_booking_price_amount;
I got 0 when I added count(user_id) > 5, and total_booking_price_amount is not found when I add the second condition in the HAVING clause.
Edit:
I managed to make the code function correctly, for those who are curious:
select x.user_id, row_number() over(partition by x.user_id)
as booking_sequence, x.booking_created_time::date as booking_created_date, x.booking_price_amount,
sum(y.booking_price_amount) as total_booking_price_amount from
(
select user_id, booking_created_time, booking_price_amount from fact_flight_sales
group by user_id, booking_created_time, booking_price_amount
) as x
join
(
select user_id, booking_price_amount
from fact_flight_sales group by user_id, booking_price_amount
) as y
on x.user_id = y.user_id
group by x.user_id, x.booking_created_time, x.booking_price_amount
having count(x.user_id) >= 1 and sum(y.booking_price_amount) >250000
order by total_booking_price_amount desc, booking_sequence asc;
Big thanks to Laurenz for the help!
About count(user_id) > 5:
HAVING is calculated before window functions are evaluated, So result rows excluded by the HAVING clause will not be used to calculate the window function.
About total_booking_price_amount in HAVING:
You cannot use aliases from the SELECT list in the HAVING clause. You will have to repeat the expression (or use a subquery).

Selecting the 1st and 10th Records Only

Have a table with 3 columns: ID, Signature, and Datetime, and it's grouped by Signature Having Count(*) > 9.
select * from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
I now want to select the 1st and 10th records only, per Signature. What determines rank is the Datetime descending. Thus, I would expect every Signature to have 2 rows.
Thanks,
I would go with a couple of common table expressions.
The first will select all records from the table as well as a count of records per signature, and the second one will select from the first where the record count > 9 and add row_number partitioned by signature - and then just select from that where the row_number is either 1 or 10:
With cte1 AS
(
SELECT ID, Signature, Datetime, COUNT(*) OVER(PARTITION BY Signature) As NumberOfRows
FROM #Sigs
), cte2 AS
(
SELECT ID, Signature, Datetime, ROW_NUMBER() OVER(PARTITION BY Signature ORDER BY DateTime DESC) As Rn
FROM cte1
WHERE NumberOfRows > 9
)
SELECT ID, Signature, Datetime
FROM cte2
WHERE Rn IN (1, 10)
ORDER BY Signature desc
Because I don't know what your data looks like, this might need some adjustment.
The simplest way here, since you already know your sort order (DateTime DESC) and partitioning (Signature), is probably to assign row numbers and then select the rows you want.
SELECT *
FROM
(
select o.Signature
,o.DateTime
,ROW_NUMBER() OVER (PARTITION BY o.Signature ORDER BY o.DateTime DESC) [Row]
from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
)
WHERE [Row] IN (1,10)

Update Postgresql table using rank()

I'm trying to update a column (pop_1_rank) in a postgresql table with the results from a rank() like so:
UPDATE database_final_form_merge
SET
pop_1_rank = r.rnk
FROM (
SELECT pop_1, RANK() OVER ( ORDER BY pop_1 DESC) FROM database_final_form_merge WHERE territory_name != 'north' AS rnk)r
The SELECT query by itself works fine, but I just can't get it to update correctly. What am I doing wrong here?
I rather use the CTE notation.
WITH cte as (
SELECT pop_1,
RANK() OVER ( ORDER BY pop_1 DESC) AS rnk
FROM database_final_form_merge
WHERE territory_name <> 'north'
)
UPDATE database_final_form_merge
SET pop_1_rank = cte.rnk
FROM cte
WHERE database_final_form_merge.pop_1 = cte.pop_1
As far as I know, Postgres updates tables not subqueries. So, you can join back to the table:
UPDATE database_final_form_merge
SET pop_1_rank = r.rnk
FROM (SELECT pop_1, RANK() OVER ( ORDER BY pop_1 DESC) as rnk
FROM database_final_form_merge
WHERE territory_name <> 'north'
) r
WHERE database_final_form_merge.pop_1 = r.pop_1;
In addition:
The column alias goes by the column name.
This assumes that pop_1 is the id connecting the two tables.
You're missing WHERE on UPDATE query, because when doing UPDATE ... FROM you're basically doing joins.
So you need to select primary key and then match on primary key to update just the columns are computing rank over.