I have a table "studies" and 14 dependent tables, each connected to "studies" via a foreign key that references its primary key.
I need to delete all rows from "studies" where the column "overall_status" is neither "Recruiting" nor "Not yet recruiting", and I need to delete the corresponding rows in the dependent tables, too.
I tried with
DELETE FROM ctgov.studies WHERE overall_status NOT IN ('Recruiting', 'Not yet recruiting');
But this takes hours. Is there a faster solution?
The cascading DELETE was extremely slow in pgAdmin; after it had run overnight, I cancelled it. Exactly the same query ran without problems in psql. I also dropped some of the indexes before deleting a large number of rows.
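A rough sketch of that approach (only ctgov.studies and the DELETE itself come from the question; the child table and index names below are hypothetical, and it assumes the foreign keys on the dependent tables are declared ON DELETE CASCADE, otherwise the child rows have to be deleted first):
-- Drop secondary indexes on the dependent tables before the bulk delete
-- (idx_child_one_study_id etc. are placeholder names).
DROP INDEX IF EXISTS ctgov.idx_child_one_study_id;
DROP INDEX IF EXISTS ctgov.idx_child_two_study_id;
-- Run the DELETE in psql; the ON DELETE CASCADE foreign keys remove the
-- matching rows in the 14 dependent tables as well.
DELETE FROM ctgov.studies
WHERE overall_status NOT IN ('Recruiting', 'Not yet recruiting');
-- Recreate the indexes afterwards.
CREATE INDEX idx_child_one_study_id ON ctgov.child_one (study_id);
CREATE INDEX idx_child_two_study_id ON ctgov.child_two (study_id);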
We have a PostgreSQL database with many tables. All of the tables seem to be functioning perfectly except for one. In the last day or two it has stopped performing row deletes. When we try something simple like
delete from bad_table where id_foo = 123;
It acts as if it has successfully deleted the row. But when we do
select * from bad_table where id_foo = 123;
the row is still there.
The same type of queries work fine on all the other tables we tried.
In addition, the foreign keys on this table are not working. There is a foreign key constraint on a column that references a different table. There is an id in the "bad_table", but that id does not exist in the referenced table. Again, foreign key constraints appear to be working fine in all other tables, it is just this one. We tried dropping and recreating the foreign key (which seemed to be successful), but it had no effect.
Between my coworkers and myself we probably have 80 years of relational database experience across Oracle, SQL Server, Postgres, etc., and none of us has ever seen anything like this. We've been banging our heads against a wall and are now reaching out to the wider world to see if anyone has any ideas of what we could try. Has anyone else ever seen something like this in Postgres?
It turned out that the issue with foreign keys was solved by dropping the foreign key constraint and then immediately adding it again.
The issue with not being able to delete rows was fixed by dropping a trigger that was fired on row delete and then immediately recreating the same trigger.
I don't know what happened, but it is acting as if a constraint and a trigger on that particular table had become corrupted.
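For reference, a rough sketch of the two drop-and-recreate fixes (the constraint, trigger, function, and column names below are placeholders, not the real ones):
-- Drop and re-add the foreign key constraint.
ALTER TABLE bad_table DROP CONSTRAINT bad_table_other_id_fkey;
ALTER TABLE bad_table
    ADD CONSTRAINT bad_table_other_id_fkey
    FOREIGN KEY (other_id) REFERENCES other_table (id);
-- Drop and recreate the on-delete trigger (same trigger function as before).
DROP TRIGGER bad_table_on_delete ON bad_table;
CREATE TRIGGER bad_table_on_delete
    AFTER DELETE ON bad_table
    FOR EACH ROW EXECUTE PROCEDURE bad_table_on_delete_fn();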
I have a table in my PostgreSQL database that became huge, filled with a lot of useless rows.
As these useless rows represent 99.9% of my table data (about 3.3M rows), I was wondering if deleting them could have a bad impact on my DB:
I know that this operation could take some time and I will be able to block writes on the table during the maintenance operation
But I was wondering if this huge change in the data could also impact performance after the operation itself.
I found solutions like creating a new table or using TRUNCATE to drop all rows, but as this operation is a specific, one-shot task, I would like to choose the most suitable solution.
I know that PostgreSQL has a VACUUM mechanism, but I'm not a DBA expert: could anyone please confirm that this delete will not impact my table integrity / data structure, and that freed space will be reclaimed if needed for new data?
PostgreSQL 11.12, with default settings on AWS RDS. I don't have any index on my table, and the criterion for row deletion will not be based on the PK.
Deleting rows typically does not shrink a PostgreSQL table, so you would then have to run VACUUM (FULL) to compact it, during which the table is inaccessible.
If you are deleting many rows, both the DELETE and the VACUUM (FULL) will take a long time, and you would be much better off like this:
create a new table that is defined like the old one
INSERT INTO new_tab SELECT * FROM old_tab WHERE ... to copy over the rows you want to keep
drop foreign key constraints that point to the old table
create all indexes and constraints on the new table
drop the old table and rename the new one
By planning that carefully, you can get away with a short downtime.
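A minimal sketch of those steps, assuming hypothetical names (old_tab, new_tab, child_tab, the id columns, and the keep condition are placeholders, not taken from the question):
-- 1. New table with the same definition (indexes and constraints come later).
CREATE TABLE new_tab (LIKE old_tab INCLUDING DEFAULTS);
-- 2. Copy only the rows you want to keep.
INSERT INTO new_tab SELECT * FROM old_tab WHERE keep_this_row;  -- placeholder condition
-- 3. Drop foreign keys that point to the old table.
ALTER TABLE child_tab DROP CONSTRAINT child_tab_old_tab_id_fkey;
-- 4. Create indexes and constraints on the new table.
ALTER TABLE new_tab ADD PRIMARY KEY (id);
ALTER TABLE child_tab ADD CONSTRAINT child_tab_old_tab_id_fkey
    FOREIGN KEY (old_tab_id) REFERENCES new_tab (id);
-- 5. Drop the old table and rename the new one.
DROP TABLE old_tab;
ALTER TABLE new_tab RENAME TO old_tab;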
I have a table with around 200 million records and I have added 2 new columns to it. Now the 2 columns need values from a different table. Nearly 80% of the rows will be updated.
I tried a plain UPDATE, but it takes more than 2 hours to complete.
The main table has a composite primary key of 4 columns. I dropped it, and also dropped an index on one of the columns, before updating. Now the update takes a little over an hour.
Is there any other way to speed up this update process (like batch processing)?
Edit: I used the other table (from which the values for the update are taken) in the FROM clause of the UPDATE statement.
Not really. Make sure that max_wal_size is high enough that you don't get too many checkpoints.
After the update, the table will be bloated to about twice its original size.
That bloat can be avoided if you update in batches and VACUUM in between, but that will not make processing faster.
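To raise max_wal_size as suggested above, something like this works (the value is only an illustration; max_wal_size can be changed without a restart):
ALTER SYSTEM SET max_wal_size = '16GB';
SELECT pg_reload_conf();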
Do you need the whole update in a single transaction? I had a quite similar problem, with a table that was under heavy load and a column that required a NOT NULL constraint. To deal with it, I took these steps:
Add the columns without constraints like NOT NULL, but with defaults. That way it went really fast.
Update the columns in steps of, say, 1000 rows per transaction. In my case the load on the DB rose, so I had to add a small delay between batches.
Finally, alter the columns to add the NOT NULL constraints.
That way you don't block the table for a long time, but that is not a direct answer to your question.
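A rough sketch of that pattern, with hypothetical names (big_table, new_col, source_table, the join columns, and the "not yet updated" test are assumptions, not taken from the question):
-- Step 1: add the column with a default but without NOT NULL.
ALTER TABLE big_table ADD COLUMN new_col integer DEFAULT 0;
-- Step 2: fill it in small batches, each batch in its own transaction,
-- repeating this statement (e.g. from a script) until no rows are left.
UPDATE big_table b
SET new_col = s.val
FROM source_table s
WHERE b.source_id = s.id
  AND b.id IN (SELECT id FROM big_table
               WHERE new_col = 0      -- placeholder "not yet updated" test
               LIMIT 1000);
-- Step 3: once everything is filled in, add the constraint.
ALTER TABLE big_table ALTER COLUMN new_col SET NOT NULL;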
First, to validate where you are, I would check I/O statistics (e.g. iostat) to see whether disk I/O is the limiting factor... To speed things up, I would consider:
a larger free space map, to be sure the DB is aware of entries that can be reused, but note that if pages are packed to the limit it would not bring much...
maybe the foreign keys referring to the table can also be removed, to avoid locking the referenced table;
removing all indexes, since they slow the update down, and creating them afterwards; that attacks the problem from the other side, but it is an option, so it counts...
There are two types of solutions to your problem.
1) This approach works if your main table is not updated or inserted into during this process.
First, create a table with the same schema under a different name, without the composite primary key and the index.
Then insert the data into the new table, joining the other table to fill the new columns.
Apply all constraints and indexes to the new table after the insert.
Drop the old table and rename the new table to the old table name.
2) Or you can use a trigger to populate those two columns on insert or update events. (This will make insert and update operations slightly slower.)
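A sketch of option 2, with placeholder names (main_table, lookup_table, lookup_id, col_a, and col_b are assumptions):
-- Trigger function that fills the two new columns from the other table.
CREATE FUNCTION fill_new_columns() RETURNS trigger AS $$
BEGIN
    SELECT l.col_a, l.col_b
      INTO NEW.col_a, NEW.col_b
      FROM lookup_table l
     WHERE l.id = NEW.lookup_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER fill_new_columns_trg
    BEFORE INSERT OR UPDATE ON main_table
    FOR EACH ROW EXECUTE PROCEDURE fill_new_columns();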
I have a simple table with a primary key ID and other columns that are not so interesting. There is not much data in it, just several thousand records. However, there are many constraints on this table: it is referenced by other tables (more than 200) with foreign keys to the ID column. Every time I try to insert something into it, all the constraints are checked, and it takes around 2-3 seconds for each insert to complete. There are btree indexes on all tables, so the query planner uses index scans as well as sequential ones.
My question: is there any option or setting I can apply in order to speed up these inserts? I tried disabling sequential scans and using index scans only, but this didn't help. Partitioning wouldn't be helpful either, I think. So please advise.
The PostgreSQL version is 10.0.
Thank you!
I have about 10 tables with over 2 million records and one with 30 million. I would like to efficiently remove older data from each of these tables.
My general algorithm is:
create a temp table for each large table and populate it with newer data
truncate the original tables
copy tmp data back to original tables using: "insert into originaltable (select * from tmp_table)"
However, the last step of copying the data back is taking longer than I'd like. I thought about deleting the original tables and making the temp tables "permanent", but I lose constraint/foreign key info.
If I delete from the tables directly, it takes much longer. Given that I need to preserve all foreign keys and constraints, are there any faster ways of removing the older data?
Thanks.
The fastest process is likely to be exactly as you've outlined:
Copy new data into a temporary table
Drop indexes and foreign keys
Drop the old table
Copy the temporary table back to the old table name
Rebuild indexes and foreign keys.
The Postgres manual has some suggestions on performance, too, that may or may not apply. Frankly, however, it is significantly quicker to drop a table than to delete millions of rows (since each delete is performed tuple by tuple), and it is significantly quicker to insert millions of rows into a table with no constraints or indexes (as each constraint must be checked and each index must be updated for every inserted record; by removing all constraints and indexes, you limit this to a single build of each index and a single validation of each constraint).
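A compact sketch of that sequence, combining the TRUNCATE-and-reload idea from the question with the drop-indexes-and-foreign-keys refinement from this answer (big_table, child_table, created_at, the index and constraint names, and the 90-day cutoff are placeholder assumptions):
-- 1. Copy the rows to keep into a temporary table.
CREATE TEMP TABLE keep_rows AS
SELECT * FROM big_table WHERE created_at >= now() - interval '90 days';
-- 2. Drop foreign keys and indexes that would slow the reload down.
ALTER TABLE child_table DROP CONSTRAINT child_table_big_table_id_fkey;
DROP INDEX IF EXISTS big_table_created_at_idx;
-- 3. Remove the old data in one cheap operation and reload.
TRUNCATE big_table;
INSERT INTO big_table SELECT * FROM keep_rows;
-- 4. Rebuild indexes and foreign keys.
CREATE INDEX big_table_created_at_idx ON big_table (created_at);
ALTER TABLE child_table ADD CONSTRAINT child_table_big_table_id_fkey
    FOREIGN KEY (big_table_id) REFERENCES big_table (id);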
The "standard" solution for these problems typically involves partitioning your tables on the appropriate key, such that when you need to delete old data, you can simply drop a whole partition -- certainly the fastest deletion that you will ever get.
However, partitioning in PostgreSQL isn't as easy as in some other databases -- you need to relocate data manually using triggers, and there are caveats (e.g. no global primary keys).
See the PostgreSQL manual on Partitioning
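A minimal sketch of that trigger-based (inheritance) partitioning scheme, with placeholder table and column names; the point is that removing a month of old data becomes a single DROP TABLE:
-- Parent table plus one child table per month.
CREATE TABLE measurements (
    id      bigint,
    logdate date NOT NULL,
    payload text
);
CREATE TABLE measurements_2023_01 (
    CHECK (logdate >= DATE '2023-01-01' AND logdate < DATE '2023-02-01')
) INHERITS (measurements);
-- A trigger on the parent routes inserts to the right child table.
CREATE FUNCTION measurements_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= DATE '2023-01-01' AND NEW.logdate < DATE '2023-02-01' THEN
        INSERT INTO measurements_2023_01 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for %', NEW.logdate;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER measurements_insert_trg
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert();
-- Deleting that month of old data later is simply:
DROP TABLE measurements_2023_01;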