Currently in Oracle I run a procedure monthly to delete some data. For performance reasons in Oracle I have used BULK COLLECT and FORALL .. DELETE to perform the deletes.
Anyone know if there is anything similar in Postgres? Do I need to be concerned about performance if I use the following to delete a lot of data?
DELETE FROM sample WHERE id IN (SELECT id FROM test);
Use WHERE EXISTS not WHERE IN.
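For example, the delete from the question rewritten with EXISTS:
DELETE FROM sample s
WHERE EXISTS (SELECT 1 FROM test t WHERE t.id = s.id);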
Otherwise, it should be fine so long as sample isn't the target of any foreign key refs. If it is, you'll need indexes on the referencing ends.
For really big deletes on FKs with ON DELETE CASCADE it can be preferable to do a join to delete the referring side in a batch, then delete the referred-to side. That helps prevent millions of individual DELETE statements having to run for cascade deletes.
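A sketch of that batched approach, assuming a hypothetical referencing table child(sample_id) with an ON DELETE CASCADE foreign key to sample(id) (names are illustrative, not from the question):
-- Delete the referring rows in one set-based statement first;
-- an index on child(sample_id) is what keeps this join (and any cascade) cheap.
DELETE FROM child c
USING sample s
WHERE c.sample_id = s.id
  AND EXISTS (SELECT 1 FROM test t WHERE t.id = s.id);
-- Then delete the referred-to rows with the EXISTS form shown above.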
Alright, so I have a table that is extremely slow on delete, even after removing FK constraints and all extra indexes aside from the one created for id, and then reindexing the table. When I run the delete with an explain plan it doesn't show anything interesting, and the table is only around 7k rows; what else is there to check? Another table in the same database deletes 184k rows in 3 minutes. There are also no triggers associated with the table. Doesn't reindexing after dropping the FKs also decouple it from any referencing tables? That's the only thing I can think of. I also checked for table bloat and that seems fine. Any ideas?
I'm currently working on dumping one of our customers' databases in a way that allows us to create new databases from this customer's basic structure, but without bringing along their private data.
So far, I've had success with pg_dump combined with the --exclude-table and --exclude-table-data options, which allowed me to bring only the data I'll effectively need for this task.
However, there are a few tables that mix rows referencing some of the data I left behind with rows referencing data that I did bring, and this is causing me a few issues during the restore operation. Specifically, when the dump tries to enforce FOREIGN KEY constraints for certain columns on these tables, it fails because some rows have keys with no matching data in the respective foreign table - because I chose not to bring that table's data!
I know I can log into the database after the restore is complete, delete any rows that reference data that no longer exists, and create the constraint myself, but I'd like to automate the process as much as possible. Is there a way to tell pg_dump or pg_restore (or any other program) not to bring rows from table A if they reference table B and table B's data was excluded from the backup? Or to tell Postgres that I'd like that specific foreign key to be active before the table's data is imported?
For reference, I'm working with PostgreSQL 9.2 on a RHEL 7 server.
What if you disable foreign key checking when you restore your database dump, and after that remove the orphaned rows from the referring table?
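A minimal sketch of that approach, with hypothetical table names (a referring table child with parent_id pointing at an excluded table parent):
-- Restore the data with FK triggers disabled (only applies to data-only restores, superuser required):
--   pg_restore --data-only --disable-triggers -d target_db dump_file
-- Afterwards, remove rows whose referenced data was excluded from the dump:
DELETE FROM child c
WHERE NOT EXISTS (SELECT 1 FROM parent p WHERE p.id = c.parent_id);
-- If you dropped the constraint instead of disabling it, re-create it once the orphans are gone:
ALTER TABLE child
  ADD CONSTRAINT child_parent_id_fkey
  FOREIGN KEY (parent_id) REFERENCES parent (id);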
By the way, I recommend fixing your database schema so there is no chance of wrong tuples being inserted into your database.
I need to insert a big amount of data (some millions of rows) and I need to do it quickly.
I have read about bulk insert via ODBC in .NET and Java, but I need to perform it directly on the database.
I have also read about batch insert, but what I have tried has not seemed to work:
Batch Insert, Example
I'm executing an INSERT ... SELECT, but it's taking something like 0.360 s per row; this is very slow and I need to make some improvements here.
I would really appreciate some guidance here with examples and documentation if possible.
DATABASE: SYBASE ASE 15.7
Expanding on some of the comments ...
blocking, slow disk IO, and any other 'wait' events (i.e., anything other than actual insert/update activity) can be ascertained from the master..monProcessWaits table (where SPID = spid_of_your_insert_update_process) [see the P&T manual for Monitoring Tables (aka MDA tables)]; a query sketch follows this list
master..monProcessObject and master..monProcessStatement will show logical/physical IOs for currently running queries [again, see P&T manual for MDA tables]
master..monSysStatement will show logical/physical IOs for recently completed queries [again, see P&T manual for MDA tables]
for UPDATE statements you'll want to take a look at the query plan to see if you're suffering from a poor join order; also of key importance ... direct (fast/good) updates vs deferred (slow/bad) updates; deferred updates can occur for many reasons ... some fixable, some not ... updating indexed columns, poor join order, updates that cause page splits and/or row forwardings
RI (PK/FK) constraints can be viewed with sp_helpconstraint table_name; query plans will also show the under-the-covers joins required when performing RI (PK/FK) validations during inserts/updates/deletes
triggers are a bit harder to locate (an official sp_helptrigger doesn't show up until ASE 16); check the sysobjects.[ins|upd|del]trig where name = your_table - these represent the object id(s) of any insert/update/delete triggers on the table; also check sysobjects records where type = 'TR' and deltrig = object_id(your_table) - provides support for additional insert/update/delete triggers (don't recall at moment if this is just ASE 16+)
if triggers are being fired, need to review the associated query plans to make sure the inserted and deleted tables (if referenced) are driving any queries where these pseudo tables are joined with permanent tables
There are likely some areas I'm forgetting (off the top of my head) ... the key takeaway is that there could be many reasons for 'slow' DML statements.
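As a starting point, a sketch of the wait-event check mentioned above, assuming the MDA tables are enabled and you have mon_role (the SPID value 123 is a placeholder for your insert/update process):
select w.SPID, w.WaitEventID, i.Description, w.Waits, w.WaitTime
from master..monProcessWaits w
join master..monWaitEventInfo i on i.WaitEventID = w.WaitEventID
where w.SPID = 123          -- spid of your insert/update process
order by w.WaitTime desc
go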
One (relatively) quick way to find out if RI (PK/FK) constraints or triggers are at play ...
set showplan on
go
insert/update/delete statements
go
Then review the resulting query plan(s); if you see references to any tables other than the ones explicitly listed in the insert/update/delete statements then you're likely dealing with RI constraints and/or triggers.
To give some context, the command is issued inside a task, and many tasks might issue the same command from multiple workers at the same time.
Each task tries to create a Postgres schema. I often get the following error:
IntegrityError: (IntegrityError) duplicate key value violates unique constraint "pg_namespace_nspname_index"
DETAIL: Key (nspname)=(9621584361) already exists.
'CREATE SCHEMA IF NOT EXISTS "9621584361"'
Postgres version is PostgreSQL 9.4rc1.
Is it a bug in Postgres?
This is a bit of a wart in the implementation of IF NOT EXISTS for tables and schemas. Basically, they're an upsert attempt, and PostgreSQL doesn't handle the race conditions cleanly. It's safe, but ugly.
If the schema is being concurrently created in another session but isn't yet committed, then it both exists and does not exist, depending on who you are and how you look. Other transactions cannot "see" the new schema in the system catalogs because it's uncommitted, so its entry in pg_namespace is not visible to them. CREATE SCHEMA / CREATE TABLE therefore tries to create it because, as far as it's concerned, the object doesn't exist.
However, that inserts a row into a table with a unique constraint. Unique constraints must be able to see uncommitted rows in order to function. So the insert blocks (stops) until the first transaction that did the CREATE either commits or rolls back. If it commits, the second transaction aborts, because it tried to insert a row that violates a unique constraint. CREATE SCHEMA isn't smart enough to catch this case and re-try.
To properly fix this PostgreSQL would probably need predicate locking, where it could lock the potential for a row. This might get added as part of the current work going on for implementing UPSERT.
For these particular commands, PostgreSQL could probably do a dirty read of the system catalogs, where it can see uncommitted changes. Then it could wait for the uncommitted transaction to commit or roll back, re-do the dirty read to see if someone else is waiting, and retry. But this would have a race condition where someone else might create the schema between when you do the read to check for it and when you try to create it.
So the IF NOT EXISTS variants would have to:
Check to see if the schema exists; if it does, finish without doing anything.
Attempt to create the table
If creation fails due to a unique constraint error, retry at the start
If table creation succeeds, finish
As far as I know nobody's implemented that, or they tried and it wasn't accepted. There would be possible issues with transaction ID burn rate, etc, with this approach.
I think this is a bug of sorts, but it's a "yeah, we know" kind of bug, not a "we'll get right on fixing that" kind of bug. Feel free to post to pgsql-bugs about it; at the very least the documentation should mention this caveat about IF NOT EXISTS.
I don't recommend doing DDL concurrently like that.
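If the concurrent calls can't be avoided, one possible client-side workaround (a sketch along the lines of the retry steps above, not something built into PostgreSQL; the schema name is the one from the question) is to catch the unique_violation and retry:
DO $$
BEGIN
  LOOP
    BEGIN
      CREATE SCHEMA IF NOT EXISTS "9621584361";
      EXIT;   -- created, or it already existed and was visible
    EXCEPTION WHEN unique_violation THEN
      NULL;   -- another session won the race; loop and try again
    END;
  END LOOP;
END
$$;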
I needed to work around this limitation in an application where schemas are created concurrently. What worked for me was adding
LOCK TABLE pg_catalog.pg_namespace
in the transaction including CREATE SCHEMA. It looks like a dirty and unsafe thing to do, but it helped me solve the problem, which occurred only in tests anyway.
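For reference, a minimal sketch of that workaround (LOCK TABLE with no mode takes ACCESS EXCLUSIVE, which serializes the concurrent CREATE SCHEMA calls; locking a system catalog will generally require superuser privileges; the schema name is the one from the question):
BEGIN;
LOCK TABLE pg_catalog.pg_namespace;        -- blocks other sessions' catalog inserts until commit
CREATE SCHEMA IF NOT EXISTS "9621584361";
COMMIT;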
My requirement is to be able to bulk insert data into multiple tables (having foreign key constraints). I won't get the foreign keys until I submit the batch to the parent table. How do I achieve this using bulk copy?
I cannot use linked servers or OPENROWSET due to policy constraints.
You may be able to get away with turning off the constraints temporarily during the bulk load.
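Assuming SQL Server (the mention of linked servers and OPENROWSET suggests it), a sketch of that approach with hypothetical table and constraint names (Child referencing Parent via FK_Child_Parent):
-- Disable the FK check for the duration of the load (hypothetical names):
ALTER TABLE dbo.Child NOCHECK CONSTRAINT FK_Child_Parent;
-- ... run the bulk copy / BULK INSERT into Parent and then Child here ...
-- Re-enable the constraint and re-validate existing rows so it is trusted again:
ALTER TABLE dbo.Child WITH CHECK CHECK CONSTRAINT FK_Child_Parent;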