TSQL query to delete all duplicate records but one [duplicate]

Possible Duplicate:
delete duplicate records in SQL Server
I have a table in which unique records are denoted by a composite key, such as (COL_A, COL_B).
I have checked and confirmed that I have duplicate rows in my table by using the following query:
select COL_A, COL_B, COUNT(*)
from MY_TABLE
group by COL_A, COL_B
having count(*) > 1
order by count(*) desc
Now, I would like to remove all duplicate records but keep only one.
Could someone please shed some light on how to achieve this with 2 columns?
EDIT:
Assume the table only has COL_A and COL_B
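The duplicate check from the question can be tried without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module as a stand-in. SQLite is not T-SQL, but this particular query is plain SQL. The sample rows are made up for illustration:

```python
import sqlite3

def find_duplicates(conn):
    """Return (COL_A, COL_B, count) for every key pair stored more than once."""
    return conn.execute(
        """
        SELECT COL_A, COL_B, COUNT(*)
        FROM MY_TABLE
        GROUP BY COL_A, COL_B
        HAVING COUNT(*) > 1
        ORDER BY COUNT(*) DESC
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_TABLE (COL_A TEXT, COL_B TEXT)")
# Made-up sample data: ('x', '1') three times, ('y', '2') twice, ('z', '3') once.
conn.executemany(
    "INSERT INTO MY_TABLE VALUES (?, ?)",
    [("x", "1"), ("x", "1"), ("x", "1"), ("y", "2"), ("y", "2"), ("z", "3")],
)
dupes = find_duplicates(conn)
print(dupes)  # [('x', '1', 3), ('y', '2', 2)]
```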

1st solution
It is flexible, because you can add more columns besides COL_A and COL_B:
-- create a table with an identity field
-- using the identity we can decide which row to delete
create table MY_TABLE_COPY
(
id int identity,
COL_A varchar(30),
COL_B varchar(30)
/*
other columns
*/
)
go
-- copy every row that belongs to a duplicated key pair
insert into MY_TABLE_COPY (COL_A,COL_B/*other columns*/)
select t.COL_A, t.COL_B /*other columns*/
from MY_TABLE t
join (select COL_A, COL_B
from MY_TABLE
group by COL_A, COL_B
having count(*) > 1) d
on d.COL_A = t.COL_A
and d.COL_B = t.COL_B
-- delete data from MY_TABLE
-- only duplicates (!)
delete t
from MY_TABLE t
join MY_TABLE_COPY c
on c.COL_A = t.COL_A
and c.COL_B = t.COL_B
go
-- copy data without duplicates
insert into MY_TABLE (COL_A, COL_B /*other columns*/)
select t.COL_A, t.COL_B /*other columns*/
from MY_TABLE_COPY t
where t.id = (
select max(id)
from MY_TABLE_COPY c
where t.COL_A = c.COL_A
and t.COL_B = c.COL_B
)
go
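A runnable sketch of the 1st solution, using Python's built-in sqlite3 module as a stand-in for SQL Server. Assumptions: SQLite's INTEGER PRIMARY KEY AUTOINCREMENT stands in for the identity column, and because SQLite has no DELETE ... FROM join syntax, the delete uses a correlated EXISTS instead. The sample rows are made up:

```python
import sqlite3

def dedupe_via_identity_copy(conn):
    # AUTOINCREMENT id plays the role of the T-SQL identity column.
    conn.execute(
        "CREATE TABLE MY_TABLE_COPY "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, COL_A TEXT, COL_B TEXT)"
    )
    # Copy every row that belongs to a duplicated key pair.
    conn.execute(
        """
        INSERT INTO MY_TABLE_COPY (COL_A, COL_B)
        SELECT t.COL_A, t.COL_B
        FROM MY_TABLE t
        JOIN (SELECT COL_A, COL_B FROM MY_TABLE
              GROUP BY COL_A, COL_B HAVING COUNT(*) > 1) d
          ON d.COL_A = t.COL_A AND d.COL_B = t.COL_B
        """
    )
    # Delete only the duplicated key pairs from the original table.
    conn.execute(
        """
        DELETE FROM MY_TABLE
        WHERE EXISTS (SELECT 1 FROM MY_TABLE_COPY c
                      WHERE c.COL_A = MY_TABLE.COL_A AND c.COL_B = MY_TABLE.COL_B)
        """
    )
    # Put back exactly one row per key pair: the one with the highest id.
    conn.execute(
        """
        INSERT INTO MY_TABLE (COL_A, COL_B)
        SELECT t.COL_A, t.COL_B
        FROM MY_TABLE_COPY t
        WHERE t.id = (SELECT MAX(c.id) FROM MY_TABLE_COPY c
                      WHERE c.COL_A = t.COL_A AND c.COL_B = t.COL_B)
        """
    )
    conn.execute("DROP TABLE MY_TABLE_COPY")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_TABLE (COL_A TEXT, COL_B TEXT)")
conn.executemany(
    "INSERT INTO MY_TABLE VALUES (?, ?)",
    [("x", "1"), ("x", "1"), ("x", "1"), ("y", "2"), ("y", "2"), ("z", "3")],
)
dedupe_via_identity_copy(conn)
remaining = sorted(conn.execute("SELECT COL_A, COL_B FROM MY_TABLE").fetchall())
print(remaining)  # [('x', '1'), ('y', '2'), ('z', '3')]
```

Keeping MAX(id) is an arbitrary choice. With only COL_A and COL_B every copy of a pair is identical, so any id works; with other columns, the id decides which version of the row survives.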
2nd solution
If MY_TABLE really has only two columns, you can use:
-- create table and copy one row per duplicated key pair
select COL_A, COL_B
into MY_TABLE_COPY
from MY_TABLE
group by COL_A, COL_B
having count(*) > 1
-- delete data from MY_TABLE
-- only duplicates (!)
delete t
from MY_TABLE t
join MY_TABLE_COPY c
on c.COL_A = t.COL_A
and c.COL_B = t.COL_B
go
-- copy data without duplicates
insert into MY_TABLE
select t.COL_A, t.COL_B
from MY_TABLE_COPY t
go
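The 2nd solution can be sketched the same way with Python's sqlite3 as a stand-in. Here SQLite's CREATE TABLE ... AS SELECT replaces T-SQL's SELECT ... INTO, and a correlated EXISTS replaces the DELETE ... FROM join. The sample rows are made up:

```python
import sqlite3

def dedupe_two_column_table(conn):
    # One row per duplicated key pair (SELECT ... INTO in T-SQL).
    conn.execute(
        """
        CREATE TABLE MY_TABLE_COPY AS
        SELECT COL_A, COL_B FROM MY_TABLE
        GROUP BY COL_A, COL_B HAVING COUNT(*) > 1
        """
    )
    # Remove every copy of those pairs from the original table.
    conn.execute(
        """
        DELETE FROM MY_TABLE
        WHERE EXISTS (SELECT 1 FROM MY_TABLE_COPY c
                      WHERE c.COL_A = MY_TABLE.COL_A AND c.COL_B = MY_TABLE.COL_B)
        """
    )
    # Re-insert a single copy of each pair, then clean up.
    conn.execute("INSERT INTO MY_TABLE (COL_A, COL_B) SELECT COL_A, COL_B FROM MY_TABLE_COPY")
    conn.execute("DROP TABLE MY_TABLE_COPY")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_TABLE (COL_A TEXT, COL_B TEXT)")
conn.executemany(
    "INSERT INTO MY_TABLE VALUES (?, ?)",
    [("x", "1"), ("x", "1"), ("x", "1"), ("y", "2"), ("y", "2"), ("z", "3")],
)
dedupe_two_column_table(conn)
deduped = sorted(conn.execute("SELECT COL_A, COL_B FROM MY_TABLE").fetchall())
print(deduped)  # [('x', '1'), ('y', '2'), ('z', '3')]
```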

Try:
-- Copy Current Table
SELECT * INTO #MY_TABLE_COPY FROM MY_TABLE
-- Delete all rows from the current table
DELETE FROM MY_TABLE
-- Insert only unique values, removing your duplicates
INSERT INTO MY_TABLE
SELECT DISTINCT * FROM #MY_TABLE_COPY
-- Remove Temp Table
DROP TABLE #MY_TABLE_COPY
That should work as long as you don't break any foreign keys when deleting rows from MY_TABLE.
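A sketch of this copy/empty/refill approach with Python's sqlite3 as a stand-in for SQL Server (the sample data and the extra column COL_C are hypothetical). Running it inside one transaction means a failure after the DELETE rolls back instead of leaving MY_TABLE empty. Note that DISTINCT * only collapses rows that are identical in every column:

```python
import sqlite3

def dedupe_with_temp_copy(conn):
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so an error after the DELETE does not leave MY_TABLE empty.
    with conn:
        conn.execute("CREATE TEMP TABLE MY_TABLE_COPY AS SELECT * FROM MY_TABLE")
        conn.execute("DELETE FROM MY_TABLE")
        conn.execute("INSERT INTO MY_TABLE SELECT DISTINCT * FROM MY_TABLE_COPY")
        conn.execute("DROP TABLE MY_TABLE_COPY")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_TABLE (COL_A TEXT, COL_B TEXT, COL_C TEXT)")
conn.executemany(
    "INSERT INTO MY_TABLE VALUES (?, ?, ?)",
    [("x", "1", "p"), ("x", "1", "p"), ("x", "1", "q"), ("y", "2", "r")],
)
conn.commit()
dedupe_with_temp_copy(conn)
rows = sorted(conn.execute("SELECT * FROM MY_TABLE").fetchall())
print(rows)  # [('x', '1', 'p'), ('x', '1', 'q'), ('y', '2', 'r')]
```

The two ('x', '1') rows that differ in COL_C both survive, which is why this approach only removes exact duplicates.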

Related

DELETE else UPDATE from temp table

This may seem easy but I have been looking for some hours now.
How do I insert rows into a target table that do not exist in the temp table,
and at the same time delete rows in the target table that do exist in the temp table? It has to be transaction-safe, in Teradata, and if possible performant.
MERGE does not support delete and insert at the same time apparently.
Along with a temporary/work table, you can use BT/ET (Teradata's BEGIN TRANSACTION / END TRANSACTION) to make the delete and the insert transaction-safe.
A sample is shown below:
CREATE TABLE temp_Table_2 AS (SELECT * FROM target_Table WHERE EXISTS (SELECT 1 FROM temp_Table WHERE target_Table.key_cols = temp_Table.key_cols)) WITH DATA;
BT;
-- First, delete rows that are present in the temp table
DELETE FROM target_Table WHERE EXISTS (SELECT 1 FROM temp_Table WHERE target_Table.key_cols = temp_Table.key_cols);
-- Second, insert rows that did not exist in the target table
INSERT INTO target_Table (
col_1,
col_2,
...
col_n
)
SELECT
col_1,
col_2,
...
col_n
FROM temp_Table
WHERE NOT EXISTS (SELECT 1 FROM temp_Table_2 WHERE temp_Table_2.key_cols = temp_Table.key_cols);
ET;
DROP TABLE temp_Table_2;
Note: You can adapt the example if there is a third table from which you want to INSERT rows into the target table that do not exist in the temp table.
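A sketch of the same delete-then-insert pattern using Python's sqlite3 as a stand-in for Teradata, with hypothetical two-column tables (key_col/val instead of key_cols and col_1..col_n). The snapshot table matters: after the DELETE, the target no longer contains the matched keys, so a NOT EXISTS check against target_Table would re-insert them; checking against temp_Table_2 avoids that. A Python `with conn:` block plays the role of BT ... ET:

```python
import sqlite3

def delete_matched_insert_unmatched(conn):
    # Snapshot the target rows whose key appears in temp_Table.
    conn.execute(
        """
        CREATE TEMP TABLE temp_Table_2 AS
        SELECT * FROM target_Table
        WHERE EXISTS (SELECT 1 FROM temp_Table
                      WHERE temp_Table.key_col = target_Table.key_col)
        """
    )
    # Both statements commit together or not at all (like BT ... ET).
    with conn:
        conn.execute(
            """
            DELETE FROM target_Table
            WHERE EXISTS (SELECT 1 FROM temp_Table
                          WHERE temp_Table.key_col = target_Table.key_col)
            """
        )
        conn.execute(
            """
            INSERT INTO target_Table (key_col, val)
            SELECT key_col, val FROM temp_Table
            WHERE NOT EXISTS (SELECT 1 FROM temp_Table_2
                              WHERE temp_Table_2.key_col = temp_Table.key_col)
            """
        )
    conn.execute("DROP TABLE temp_Table_2")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_Table (key_col INTEGER, val TEXT)")
conn.execute("CREATE TABLE temp_Table (key_col INTEGER, val TEXT)")
conn.executemany("INSERT INTO target_Table VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO temp_Table VALUES (?, ?)", [(2, "B"), (4, "D")])
conn.commit()
delete_matched_insert_unmatched(conn)
result = sorted(conn.execute("SELECT key_col, val FROM target_Table").fetchall())
print(result)  # [(1, 'a'), (3, 'c'), (4, 'D')]
```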

PostgreSQL - Append a table to another and add a field without listing all fields

I have two tables:
table_a with fields item_id, rank, and 50 other fields.
table_b with fields item_id, and the same 50 fields as table_a
I need to write a SELECT query that adds the rows of table_b to table_a but with rank set to a specific value, let's say 4.
Currently I have:
SELECT * FROM table_a
UNION
SELECT item_id, 4 AS rank, field_1, field_2, ...
FROM table_b
How can I join the two tables together without writing out all of the fields and without using an INSERT query?
EDIT:
My idea is to join table_b to table_a somehow with the rank field remaining empty, then simply replace the null rank fields. The rank field is never null, but item_id can be duplicated and table_a may have item_id values that are not in table_b, and vice-versa.
I am not sure I understand why you need this, but you can use jsonb functions:
select (jsonb_populate_record(null::table_a, row)).*
from (
select to_jsonb(a) as row
from table_a a
union
select to_jsonb(b) || '{"rank": 4}'
from table_b b
) s
order by item_id;
Working example in rextester.
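I'm pretty sure I've got it. The predefined rank column can be added to table_b's rows by joining table_b to a derived table that contains only the columns to the left of the insertion point (here just item_id) plus the new rank value. In PostgreSQL, JOIN ... USING outputs the join columns first, then the remaining columns of the left table, then those of the right table, so _leftcols must be the left side of the join for rank to end up second. DISTINCT stops duplicated item_id values from multiplying rows. This assumes table_a's columns are item_id, rank, and then the same fields in the same order as table_b:
WITH
_leftcols AS ( SELECT DISTINCT item_id, 4 AS rank FROM table_b ),
_combined AS ( SELECT * FROM _leftcols JOIN table_b USING (item_id) )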
SELECT * FROM _combined
UNION
SELECT * FROM table_a
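A cut-down sketch of the CTE approach in Python's sqlite3 (two fields instead of fifty; table contents made up). SQLite drops the right-hand copy of a USING column from SELECT * and keeps the left table's column order, so with _leftcols on the left of the join the output is item_id, rank, field_1, field_2, matching table_a's layout. PostgreSQL orders USING joins slightly differently (join columns first), but because item_id is _leftcols' first column, both give the same order here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down tables: two "other" fields instead of fifty.
conn.execute("CREATE TABLE table_a (item_id INTEGER, rank INTEGER, field_1 TEXT, field_2 TEXT)")
conn.execute("CREATE TABLE table_b (item_id INTEGER, field_1 TEXT, field_2 TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?, ?)", [(1, 1, "a", "b")])
# item_id 2 is duplicated in table_b on purpose.
conn.executemany("INSERT INTO table_b VALUES (?, ?, ?)", [(2, "c", "d"), (2, "e", "f")])

query = """
WITH
_leftcols AS (SELECT DISTINCT item_id, 4 AS rank FROM table_b),
_combined AS (SELECT * FROM _leftcols JOIN table_b USING (item_id))
SELECT * FROM _combined
UNION
SELECT * FROM table_a
ORDER BY 1, 3
"""
cur = conn.execute(query)
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['item_id', 'rank', 'field_1', 'field_2']
print(rows)     # [(1, 1, 'a', 'b'), (2, 4, 'c', 'd'), (2, 4, 'e', 'f')]
```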

How to retrieve duplicate records and delete them in table A, also insert these duplicate records in another table B (in postgres)

How do I retrieve duplicate records and delete them from table A, and also insert the retrieved duplicate records into another table B (in a Postgres DB)?
SQL queries are required for my project.
To delete duplicates without having a unique column, you can use the ctid system column, which is essentially the same thing as the rowid in Oracle:
delete from table_A t1
where ctid <> (select min(t2.ctid)
from table_A t2
where t1.unique_column = t2.unique_column);
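SQLite has no ctid, but its rowid serves the same purpose in this sketch (Python sqlite3 as a stand-in; sample data made up). The correlated subquery keeps the lowest rowid for each unique_column value and deletes the rest:

```python
import sqlite3

def delete_duplicates_keep_first(conn):
    # rowid plays the role of PostgreSQL's ctid: keep the lowest one per key.
    conn.execute(
        """
        DELETE FROM table_A
        WHERE rowid <> (SELECT MIN(t2.rowid) FROM table_A t2
                        WHERE t2.unique_column = table_A.unique_column)
        """
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_A (unique_column TEXT, some_other_column INTEGER)")
conn.executemany(
    "INSERT INTO table_A VALUES (?, ?)",
    [("k1", 1), ("k1", 2), ("k2", 3), ("k1", 4)],
)
delete_duplicates_keep_first(conn)
remaining = conn.execute(
    "SELECT unique_column, some_other_column FROM table_A ORDER BY rowid"
).fetchall()
print(remaining)  # [('k1', 1), ('k2', 3)]
```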
You can use the returning clause to get the deleted rows and insert them into the other table:
with deleted as (
delete from table_A t1
where ctid <> (select min(t2.ctid)
from table_A t2
where t1.unique_column = t2.unique_column)
returning *
)
insert into table_B (col_1, col_2)
select unique_column, some_other_column
from deleted;
If you also want to see the moved rows, you can add another CTE:
with deleted as (
delete from table_A t1
where ctid <> (select min(t2.ctid)
from table_A t2
where t1.unique_column = t2.unique_column)
returning *
), moved as (
insert into table_B (col_1, col_2)
select unique_column, some_other_column
from deleted
returning *
)
select *
from moved;
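SQLite does not allow DELETE inside a WITH clause, so this sketch (Python sqlite3 as a stand-in, rowid in place of ctid, sample data made up) swaps in a different technique: an INSERT ... SELECT followed by a DELETE that uses the same condition, both inside one transaction:

```python
import sqlite3

# Rows whose rowid is not the smallest for their unique_column are duplicates.
DUPLICATE_CONDITION = """
    rowid <> (SELECT MIN(t2.rowid) FROM table_A t2
              WHERE t2.unique_column = table_A.unique_column)
"""

def move_duplicates(conn):
    """Copy duplicate rows from table_A into table_B, then delete them from table_A."""
    with conn:  # both statements commit together or roll back together
        conn.execute(
            "INSERT INTO table_B (col_1, col_2) "
            "SELECT unique_column, some_other_column FROM table_A WHERE "
            + DUPLICATE_CONDITION
        )
        conn.execute("DELETE FROM table_A WHERE " + DUPLICATE_CONDITION)
    return conn.execute("SELECT col_1, col_2 FROM table_B ORDER BY col_2").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_A (unique_column TEXT, some_other_column INTEGER)")
conn.execute("CREATE TABLE table_B (col_1 TEXT, col_2 INTEGER)")
conn.executemany(
    "INSERT INTO table_A VALUES (?, ?)",
    [("k1", 1), ("k1", 2), ("k2", 3), ("k1", 4)],
)
conn.commit()
moved = move_duplicates(conn)
kept = conn.execute(
    "SELECT unique_column, some_other_column FROM table_A ORDER BY rowid"
).fetchall()
print(moved)  # [('k1', 2), ('k1', 4)]
print(kept)   # [('k1', 1), ('k2', 3)]
```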

SQL Union exclude row if value already exists in first table

I have two tables which I wish to combine. However, there is a field in both tables, and if a record in the second table has a value in that field that already exists in the first table, that record should be excluded.
These are MSSQL 2012 tables.
The only way I can think of is something nasty like this:
Select A, B
from Tab1
Union
Select C, D
from Tab2
where Tab2.c not in (Select A from Tab1)
It looks relatively clean in my example, but the selects for Tab1 and Tab2 have long and complex WHERE clauses, and I would need to duplicate them in the "not in" select statement.
I've seen other solutions, but not in MSSQL. Does anyone out there have a better example?
Thanks
Well, in T-SQL I use this kind of code:
USE tempdb
GO
CREATE TABLE StudentDetails
(
StudentID INTEGER PRIMARY KEY,
StudentName VARCHAR(15)
)
GO
INSERT INTO StudentDetails
VALUES(1,'SMITH')
INSERT INTO StudentDetails
VALUES(2,'ALLEN')
INSERT INTO StudentDetails
VALUES(3,'JONES')
INSERT INTO StudentDetails
VALUES(4,'MARTIN')
INSERT INTO StudentDetails
VALUES(5,'JAMES')
GO
CREATE TABLE StudentTotalMarks
(
StudentID INTEGER REFERENCES StudentDetails,
StudentMarks INTEGER
)
GO
INSERT INTO StudentTotalMarks
VALUES(1,230)
INSERT INTO StudentTotalMarks
VALUES(2,255)
INSERT INTO StudentTotalMarks
VALUES(3,200)
GO
-- Select from Table
SELECT *
FROM StudentDetails
GO
SELECT *
FROM StudentTotalMarks
GO
-- Merge Statement
MERGE StudentTotalMarks AS stm
USING (SELECT StudentID,StudentName FROM StudentDetails) AS sd
ON stm.StudentID = sd.StudentID
WHEN MATCHED AND stm.StudentMarks > 250 THEN DELETE
WHEN MATCHED THEN UPDATE SET stm.StudentMarks = stm.StudentMarks + 25
WHEN NOT MATCHED THEN
INSERT(StudentID,StudentMarks)
VALUES(sd.StudentID,25);
GO
-- Select from Table
SELECT *
FROM StudentDetails
GO
SELECT *
FROM StudentTotalMarks
GO
-- Clean up
DROP TABLE StudentDetails
GO
DROP TABLE StudentTotalMarks
GO
The MERGE statement performs very well, and the following result is obtained:
http://www.pinaldave.com/bimg/MergeStatement.gif
Hope this helps
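For the original union question, one way to avoid repeating the long WHERE clauses is to write each filtered SELECT once in a CTE and reference it both in the UNION and in the exclusion check; CTEs are available in SQL Server 2012. The sketch below runs the idea in Python's sqlite3; the filters `B IS NOT NULL` and `D IS NOT NULL` are hypothetical placeholders for the real WHERE clauses. NOT EXISTS is used instead of NOT IN because NOT IN filters out every row if the subquery yields a NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tab1 (A TEXT, B TEXT)")
conn.execute("CREATE TABLE Tab2 (C TEXT, D TEXT)")
conn.executemany("INSERT INTO Tab1 VALUES (?, ?)", [("k1", "from tab1"), ("k2", "from tab1")])
conn.executemany("INSERT INTO Tab2 VALUES (?, ?)", [("k2", "from tab2"), ("k3", "from tab2")])

# Each (potentially long) WHERE clause is written once, inside its CTE.
query = """
WITH t1 AS (SELECT A, B FROM Tab1 WHERE B IS NOT NULL),
     t2 AS (SELECT C, D FROM Tab2 WHERE D IS NOT NULL)
SELECT A, B FROM t1
UNION
SELECT C, D FROM t2
WHERE NOT EXISTS (SELECT 1 FROM t1 WHERE t1.A = t2.C)
ORDER BY 1
"""
combined = conn.execute(query).fetchall()
print(combined)  # [('k1', 'from tab1'), ('k2', 'from tab1'), ('k3', 'from tab2')]
```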

T-SQL Delete Inserted Records

I know the title may seem strange, but this is what I want to do:
I have a table with many records.
I want to take some of these records and insert them into another table. Something like this:
INSERT INTO TableNew SELECT * FROM TableOld WHERE ...
The tricky part is that I want the rows that I have inserted to be deleted from the original table as well.
Is there an easy way to do this? The only thing I have managed to do is to use a temporary table to save the selected records, then put them into the second table and delete the matching rows from the first table. It is a solution, but with so many records (over 3.5 million) I am looking for another idea...
In SQL Server 2005+ use the OUTPUT clause like this:
DELETE FROM TableOld
OUTPUT DELETED.* INTO TableNew
WHERE YourCondition
It runs as a single statement, so the delete and the insert either both complete or both roll back.
You can use the INSERT ... OUTPUT clause to store the IDs of the copied rows in a temporary table. Then you can delete the rows from the original table based on the temporary table.
declare @Table1 table (id int, name varchar(50))
declare @Table2 table (id int, name varchar(50))
insert @Table1 (id,name)
select 1, 'Mitt'
union all select 2, 'Newt'
union all select 3, 'Rick'
union all select 4, 'Ron'
-- collects the ids of the rows copied into @Table2
declare @copied table (id int)
insert @Table2
(id, name)
output inserted.id
into @copied
select id
, name
from @Table1
where name <> 'Mitt'
-- remove the copied rows from the source
delete @Table1
where id in
(
select id
from @copied
)
select *
from @Table1
Working example at Data Explorer.
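SQLite has no OUTPUT clause, so this sketch of the id-tracking answer (Python sqlite3 as a stand-in, same sample names) records the ids first and then uses them for both the insert and the delete, inside one transaction:

```python
import sqlite3

def move_non_mitt_rows(conn):
    """Copy rows from Table1 to Table2 and delete exactly the copied ids from Table1."""
    with conn:
        # No OUTPUT clause in SQLite, so capture the ids to move up front.
        conn.execute("CREATE TEMP TABLE copied (id INTEGER PRIMARY KEY)")
        conn.execute("INSERT INTO copied (id) SELECT id FROM Table1 WHERE name <> 'Mitt'")
        conn.execute(
            "INSERT INTO Table2 (id, name) "
            "SELECT id, name FROM Table1 WHERE id IN (SELECT id FROM copied)"
        )
        conn.execute("DELETE FROM Table1 WHERE id IN (SELECT id FROM copied)")
        conn.execute("DROP TABLE copied")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE Table2 (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, "Mitt"), (2, "Newt"), (3, "Rick"), (4, "Ron")],
)
conn.commit()
move_non_mitt_rows(conn)
table1 = conn.execute("SELECT id, name FROM Table1 ORDER BY id").fetchall()
table2 = conn.execute("SELECT id, name FROM Table2 ORDER BY id").fetchall()
print(table1)  # [(1, 'Mitt')]
print(table2)  # [(2, 'Newt'), (3, 'Rick'), (4, 'Ron')]
```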
You should do something like this:
INSERT INTO "table1" ("column1", "column2", ...)
SELECT "column3", "column4", ...
FROM "table2"
WHERE ...
-- delete the copied rows from the source table, using the same condition
DELETE FROM "table2"
WHERE ...
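A sketch of the insert-then-delete answer in Python's sqlite3, with a hypothetical status column and condition standing in for the elided WHERE clause. Both statements share one condition and run in one transaction, so a failure rolls back the insert as well:

```python
import sqlite3

# Hypothetical condition; the INSERT and the DELETE must use the same one,
# otherwise rows could be deleted without having been copied.
MOVE_CONDITION = "\"status\" = 'archived'"

def move_archived(conn):
    with conn:  # commit both statements or neither
        conn.execute(
            'INSERT INTO "table1" ("column1", "column2") '
            'SELECT "column3", "column4" FROM "table2" WHERE ' + MOVE_CONDITION
        )
        conn.execute('DELETE FROM "table2" WHERE ' + MOVE_CONDITION)

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "table1" ("column1" INTEGER, "column2" TEXT)')
conn.execute('CREATE TABLE "table2" ("column3" INTEGER, "column4" TEXT, "status" TEXT)')
conn.executemany(
    'INSERT INTO "table2" VALUES (?, ?, ?)',
    [(1, "x", "archived"), (2, "y", "active"), (3, "z", "archived")],
)
conn.commit()
move_archived(conn)
in_table1 = conn.execute('SELECT "column1", "column2" FROM "table1" ORDER BY "column1"').fetchall()
in_table2 = conn.execute('SELECT "column3", "column4", "status" FROM "table2"').fetchall()
print(in_table1)  # [(1, 'x'), (3, 'z')]
print(in_table2)  # [(2, 'y', 'active')]
```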