Delete duplicates from a huge table in Postgresql - postgresql

I have an unusual problem: I need to delete duplicate records from a table in Postgresql. As i have duplicate records so i dont have primary key and unique index in this table. The table conatins like 20million records and it has duplicate records in it. While i am trying the below query it is taking too long time.
'DELETE FROM temp a using temp b where a.recordid=b.recordid and a.ctid < b.ctid;'
So what should be a better approach to handle such huge table with no index in it?
Appreciate for help.

if you have enough empty space, your can copy table without duplicates, then remove old table and rename new table
like this
INSERT INTO new_table
VALUES
SELECT
DISTINCT ON (column)
*
FROM old_table
ORDER BY column ASC

Use COPY TO to dump the table.
Then Unix sort -u to de-duplicate it.
Drop or truncate the table in Postgres, use COPY FROM to read it back in.
Add a primary key column.

Related

Unable to drop index on a db2 table

I have a table MAIN_SCHEMA.TEST in which I created a Index on a column CHECK_ID.
CHECK_ID is also a FOREIGN_KEY constraint in TEST table.
This table contains only 50 records.
By Mistake the index got created in Default schema DEFAULT_SCHEMA.CHECK_ID_IDX.
CREATE INDEX DEFAULT_SCHEMA.CHECK_ID_IDX(CHECK_ID ASC);
So I am trying to drop this index but the drop query gets stuck for long time.
DROP INDEX DEFAULT_SCHEMA.CHECK_ID_IDX.
there are no locks on this table when I checked.
Instead of dropping and recreating the index with the right schema, could you just try to RENAME the index? It requires the existing SCHEMA.NAME pair together with the new as input. It will not move any data, but just update the metadata.

Efficient way to move large number of rows from one table to another new table using postgres

I am using PostgreSQL database for live project. In which, I have one table with 8 columns.
This table contains millions of rows, so to make search faster from table, I want to delete and store old entries from this table to new another table.
To do so, I know one approach:
first select some rows
create new table
store this rows in that table
than delete from main table.
But it takes too much time and it is not efficient.
So I want to know what is the best possible approach to perform this in postgresql database?
Postgresql version: 9.4.2.
Approx number of rows: 8000000
I want to move rows: 2000000
You can use CTE (common table expressions) to move rows in a single SQL statement (more in the documentation):
with delta as (
delete from one_table where ...
returning *
)
insert into another_table
select * from delta;
But think carefully whether you actually need it. Like a_horse_with_no_name said in the comment, tuning your queries might be enough.
This is a sample code for copying data between two table of same.
Here i used different DB, one is my production DB and other is my testing DB
INSERT INTO "Table2"
select * from dblink('dbname=DB1 dbname=DB2 user=postgres password=root',
'select "col1","Col2" from "Table1"')
as t1(a character varying,b character varying);

How to clone or copy records in same table in postgres?

How to clone or copy records in same table in PostgreSQL by creating temporary table.
trying to create clones of records from one table to the same table with changed name(which is basically composite key in that table).
You can do it all in one INSERT combined with a SELECT.
i.e. say you have the following table definition and data populated in it:
create table original
(
id serial,
name text,
location text
);
INSERT INTO original (name, location)
VALUES ('joe', 'London'),
('james', 'Munich');
And then you can INSERT doing the kind of switch you're talking about without using a TEMP TABLE, like this:
INSERT INTO original (name, location)
SELECT 'john', location
FROM original
WHERE name = 'joe';
Here's an sqlfiddle.
This should also be faster (although for tiny data sets probably not hugely so in absolute time terms), since it's doing only one INSERT and SELECT as opposed to an extra SELECT and CREATE TABLE plus an UPDATE.
Did a bit of research, came up with a logic :
Create temp table
Copy records into it
Update the records in temp table
Copy it back to original table
CREATE TEMP TABLE temporary AS SELECT * FROM ORIGINAL WHERE NAME='joe';
UPDATE TEMP SET NAME='john' WHERE NAME='joe';
INSERT INTO ORIGINAL SELECT * FROM temporary WHERE NAME='john';
Was wondering if there was any shorter way to do it.

Delete all the records

How to delete all the records in SQL Server 2008?
To delete all records from a table without deleting the table.
DELETE FROM table_name use with care, there is no undo!
To remove a table
DROP TABLE table_name
from a table?
You can use this if you have no foreign keys to other tables
truncate table TableName
or
delete TableName
if you want all tables
sp_msforeachtable 'delete ?'
Use the DELETE statement
Delete From <TableName>
Eg:
Delete from Student;
I can see the that the others answers shown above are right, but I'll make your life easy.
I even created an example for you. I added some rows and want delete them.
You have to right click on the table and as shown in the figure Script Table a> Delete to> New query Editor widows:
Then another window will open with a script. Delete the line of "where", because you want to delete all rows. Then click Execute.
To make sure you did it right right click over the table and click in "Select Top 1000 rows". Then you can see that the query is empty.
If you want to reset your table, you can do
truncate table TableName
truncate needs privileges, and you can't use it if your table has dependents (another tables that have FK of your table,
For one table
truncate table [table name]
For all tables
EXEC sp_MSforeachtable #command1="truncate table ?"
Delete rows in the Results pane if you want to delete records in the database. If you want to delete all of the rows you can use a Delete query.
Delete from Table_name
delete from TableName
isn't a good practice.
Like in Google BigQuery, it don't let to use delete without "where" clause.
use
truncate table TableName
instead
When the table is very large, it's better to delete table itself with drop table TableName and recreate it, if one has create table query; rather than deleting records one by one, using delete from statement because that can be time consuming.
The statement is DELETE FROM YourDatabaseName.SomeTableName; if you are willing to remove all the records with reasonable permission. But you may see errors in the constraints that you defined for your Foreign Keys. So that you need to change your constraints before removing the records otherwise there is a command for MySQL (which may work for others) to ignore the constraints.
SET foreign_key_checks = 0;
Please be aware that this command will disable your foreign keys constrain check, so this can be dangerous for the relationships you created within your schema.

Is there a way to quickly duplicate record in T-SQL?

I need to duplicate selected rows with all the fields exactly same except ID ident int which is added automatically by SQL.
What is the best way to duplicate/clone record or records (up to 50)?
Is there any T-SQL functionality in MS SQL 2008 or do I need to select insert in stored procedures ?
The only way to accomplish what you want is by using Insert statements which enumerate every column except the identity column.
You can of course select multiple rows to be duplicated by using a Select statement in your Insert statements. However, I would assume that this will violate your business key (your other unique constraint on the table other than the surrogate key which you have right?) and require some other column to be altered as well.
Insert MyTable( ...
Select ...
From MyTable
Where ....
If it is a pure copy (minus the ID field) then the following will work (replace 'NameOfExistingTable' with the table you want to duplicate the rows from and optionally use the Where clause to limit the data that you wish to duplicate):
SELECT *
INTO #TempImportRowsTable
FROM (
SELECT *
FROM [NameOfExistingTable]
-- WHERE ID = 1
) AS createTable
-- If needed make other alterations to the temp table here
ALTER TABLE #TempImportRowsTable DROP COLUMN Id
INSERT INTO [NameOfExistingTable]
SELECT * FROM #TempImportRowsTable
DROP TABLE #TempImportRowsTable
If you're able to check the duplication condition as rows are inserted, you could put an INSERT trigger on the table. This would allow you to check the columns as they are inserted instead of having to select over the entire table.