How to delete from a table in a performant way - postgresql

I need to delete some rows based on a ROW_NUMBER. Since Postgres doesn't support DELETE on a subquery, I have to do:
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY col1, col2
                              ORDER BY col3 DESC, col4 DESC) AS rank_id
    FROM {working_table}
)
DELETE FROM {working_table} AS s
USING cte AS t
WHERE s.col1 = t.col1
  AND s.col2 = t.col2
  AND s.col3 = t.col3
  AND …
  AND t.rank_id > 3
  AND s.time <= …
That is, I have to do a self-join on all the PK columns, and performance is very bad on a big table. I'm thinking of inserting the surviving rows into another table with a plain SELECT, so I don't have to self-join, and then dropping the original table and renaming the new one. The original table will keep getting new rows, so I need to make sure no rows are lost when I do this. What's the best way to do this?
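One way to avoid the self-join entirely is to delete by Postgres's ctid system column instead of the PK columns. This is only a sketch, reusing the window definition above; {working_table} and the column names are placeholders from the question:

```sql
-- Rank rows once, then delete by physical row id (ctid) instead of
-- joining back on every PK column.
DELETE FROM {working_table}
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               ROW_NUMBER() OVER (PARTITION BY col1, col2
                                  ORDER BY col3 DESC, col4 DESC) AS rank_id
        FROM {working_table}
    ) ranked
    WHERE rank_id > 3
);
```

If you go the copy-and-rename route instead, doing the final catch-up copy, DROP, and RENAME inside a single transaction that locks the original table against writes is one way to make sure concurrently inserted rows are not lost.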

Are rows selected in the same order as they were inserted in Postgres/Oracle?

I have a multi-million-row table, let's say table1. I created another table, table2, using the statement:
Create table table2 as select column1, column2 from table1 order by column1, column2;
Is there any guarantee that when fetching data from table2, it is fetched in the same order it was inserted?
For example:
SELECT column1, column2 from table2 limit 100000 offset 2000000;
should have the same result as
SELECT column1, column2 from table1 order by column1, column2 limit 100000 offset 2000000;
Is there any documentation from which I can confirm this?
I am using both Oracle and Postgres.
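As far as I know, neither Postgres nor Oracle guarantees any result order without an explicit ORDER BY, even if the table was populated with CREATE TABLE ... AS SELECT ... ORDER BY. The deterministic way to page is to repeat the sort key on every read; a sketch against the table2 above:

```sql
-- Row order is unspecified without ORDER BY, so repeat the sort key when paging:
SELECT column1, column2
FROM table2
ORDER BY column1, column2
LIMIT 100000 OFFSET 2000000;
```

(LIMIT/OFFSET is Postgres syntax; Oracle 12c and later would use OFFSET 2000000 ROWS FETCH NEXT 100000 ROWS ONLY.)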

Sum and average total columns in PostgreSQL

I'm using this query to find duplicate dates, but I'm not sure how to sum the datapoints for each duplicate date, average them, and remove the duplicate dates.
DB Schema
date_time
datapoint_1
datapoint_2
SQL Query
SELECT date_time, COUNT(date_time)
FROM MYTABLE
GROUP BY date_time
HAVING COUNT(date_time) > 1
ORDER BY COUNT(date_time)
I would create a new table to replace the old one. That is easier and might even perform better:
CREATE TABLE mytable2 (LIKE mytable);
INSERT INTO mytable2 (date_time, datapoint_1, datapoint_2)
SELECT m.date_time, avg(m.datapoint_1), avg(m.datapoint_2)
FROM mytable AS m
GROUP BY m.date_time;
Then you can drop mytable and rename mytable2 to replace it.
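The drop-and-rename step can be done atomically in one transaction; a sketch, assuming no views or foreign keys reference mytable:

```sql
BEGIN;
DROP TABLE mytable;
ALTER TABLE mytable2 RENAME TO mytable;
COMMIT;
```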
To prevent new rows from creating duplicates, you could change the way you insert data:
-- to keep track of counts
ALTER TABLE mytable ADD numval integer DEFAULT 1;
-- to prevent duplicates
ALTER TABLE mytable ADD UNIQUE (date_time);
-- to insert new rows
INSERT INTO mytable (date_time, datapoint_1, datapoint_2)
VALUES ('2021-06-30', 42.0, -34.9)
ON CONFLICT (date_time)
DO UPDATE SET numval = mytable.numval + 1,
              datapoint_1 = mytable.datapoint_1 + excluded.datapoint_1,
              datapoint_2 = mytable.datapoint_2 + excluded.datapoint_2;
-- to select the averages
SELECT date_time,
datapoint_1 / numval AS datapoint_1,
datapoint_2 / numval AS datapoint_2
FROM mytable;
When you use GROUP BY you can also use aggregate functions to reduce multiple rows to a single one (COUNT, which you used, is one such function). In your case the query would be:
SELECT date_time, avg(datapoint_1), avg(datapoint_2)
FROM MYTABLE
GROUP BY date_time
For every distinct date_time you will get a single row with the average of datapoint_1 and datapoint_2.

How do I delete records based on query results

How do I delete records from a table that is referenced in my query? For example, below is my query, which returns the correct number of results, but I then want to delete those records from the same table that is referenced in the query.
;with cte as (select *,
row_number() over (partition by c.[Trust Discharge], c.[AE Admission], c.[NHS Number]
order by c.[Hospital Number]) as Rn,
count(*) over (partition by c.[Trust Discharge], c.[AE Admission], c.[NHS Number]) as cntDups
from CommDB.dbo.tblNHFDArchive as c)
Select * from cte
Where cte.Rn>1 and cntDups >1
As you can already select the rows with Select * from cte Where cte.Rn>1 and cntDups >1, you can delete them by running delete from your_table where unique_column in (Select unique_column from cte Where cte.Rn>1 and cntDups >1).
Note that unique_column is a column in your table that cannot have duplicate values, and your_table is the table where the rows reside.
And don't forget to back up your table first if it's in production.
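Alternatively, SQL Server lets you delete through an updatable CTE directly, so no unique column is needed; a sketch reusing the CTE from the question:

```sql
;with cte as (select *,
       row_number() over (partition by c.[Trust Discharge], c.[AE Admission], c.[NHS Number]
                          order by c.[Hospital Number]) as Rn
  from CommDB.dbo.tblNHFDArchive as c)
-- Rn > 1 already implies the group has duplicates, so cntDups is not needed
delete from cte
where Rn > 1;
```

Deleting from a single-table CTE like this removes the underlying rows, keeping the first row of each group (Rn = 1).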

TSQL query to delete all duplicate records but one [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
delete duplicate records in SQL Server
I have a table in which unique records are denoted by a composite key, such as (COL_A, COL_B).
I have checked and confirmed that I have duplicate rows in my table by using the following query:
select COL_A, COL_B, COUNT(*)
from MY_TABLE
group by COL_A, COL_B
having count(*) > 1
order by count(*) desc
Now, I would like to remove all duplicate records but keep only one.
Could someone please shed some light on how to achieve this with 2 columns?
EDIT:
Assume the table only has COL_A and COL_B
1st solution
It is flexible, because you can add more columns besides COL_A and COL_B:
-- create table with identity field
-- using the identity we can decide which row to delete
create table MY_TABLE_COPY
(
id int identity,
COL_A varchar(30),
COL_B varchar(30)
/*
other columns
*/
)
go
-- copy data
insert into MY_TABLE_COPY (COL_A,COL_B/*other columns*/)
select COL_A, COL_B /*other columns*/
from MY_TABLE
group by COL_A, COL_B
having count(*) > 1
-- delete data from MY_TABLE
-- only duplicates (!)
delete MY_TABLE
from MY_TABLE_COPY c, MY_TABLE t
where c.COL_A=t.COL_A
and c.COL_B=t.COL_B
go
-- copy data without duplicates
insert into MY_TABLE (COL_A, COL_B /*other columns*/)
select t.COL_A, t.COL_B /*other columns*/
from MY_TABLE_COPY t
where t.id = (
select max(id)
from MY_TABLE_COPY c
where t.COL_A = c.COL_A
and t.COL_B = c.COL_B
)
go
2nd solution
If you have really two columns in MY_TABLE you can use:
-- create table and copy data
select distinct COL_A, COL_B
into MY_TABLE_COPY
from MY_TABLE
-- delete data from MY_TABLE
-- only duplicates (!)
delete MY_TABLE
from MY_TABLE_COPY c, MY_TABLE t
where c.COL_A=t.COL_A
and c.COL_B=t.COL_B
go
-- copy data without duplicates
insert into MY_TABLE
select t.COL_A, t.COL_B
from MY_TABLE_COPY t
go
Try:
-- Copy Current Table
SELECT * INTO #MY_TABLE_COPY FROM MY_TABLE
-- Delete all rows from current table
DELETE FROM MY_TABLE
-- Insert only unique values, removing your duplicates
INSERT INTO MY_TABLE
SELECT DISTINCT * FROM #MY_TABLE_COPY
-- Remove Temp Table
DROP TABLE #MY_TABLE_COPY
That should work as long as you don't break any foreign keys when deleting rows from MY_TABLE.

oracle index in table join

If I do
select *
from table1
where table1.col1 = 'xx'
and table1.col2 = 'yy'
and table1.col3 = 'zz'
the execution plan shows a full table scan.
Indexes exist on this table for col4 and col5.
Do I need to create an index on each of col1, col2, and col3 to make the query perform better?
Also, if the query is like this:
select *
from table1,table2
where table1.col1=table2.col2
and table1.col2 = 'yy'
and table1.col3= 'zz'
If we create an index on col1 and col2, will it suffice?
You should try adding indexes on the columns that you are using in the query:
table1 col1
table1 col2
table1 col3
table2 col2
Note that it can also be advantageous in some cases to have multi-column indexes, for example:
table1 (col2, col3)
It's hard to predict which index will work best without knowing more about your data, but you can try a few different possibilities and see what works best.
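For instance, the suggestions above could be created like this (index names are my own; the syntax works in both Oracle and Postgres):

```sql
-- single-column indexes on the filtered and joined columns
CREATE INDEX idx_table1_col1 ON table1 (col1);
CREATE INDEX idx_table2_col2 ON table2 (col2);

-- a composite index covering both equality filters;
-- often more effective than two separate single-column indexes for this query
CREATE INDEX idx_table1_col2_col3 ON table1 (col2, col3);
```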