Avoid exclusive access locks on referenced tables when DROPping in PostgreSQL - postgresql

Why does dropping a table in PostgreSQL require ACCESS EXCLUSIVE locks on any referenced tables? How can I reduce this to an ACCESS SHARED lock or no lock at all? i.e. is there a way to drop a relation without locking the referenced table?
I can't find any mention in the documentation of which locks are required, but unless I explicitly acquire locks in the correct order when dropping multiple tables during concurrent operations, I see deadlocks waiting on an AccessExclusiveLock in the logs, and acquiring this restrictive lock on commonly-referenced tables causes momentary delays to other processes when tables are deleted.
To clarify,
CREATE TABLE base (
id SERIAL,
PRIMARY KEY (id)
);
CREATE TABLE main (
id SERIAL,
base_id INT,
PRIMARY KEY (id),
CONSTRAINT fk_main_base FOREIGN KEY (base_id)
REFERENCES base (id)
ON DELETE CASCADE ON UPDATE CASCADE
);
DROP TABLE main; -- why does this need to lock base?

For anyone googling and trying to understand why their drop table (or drop foreign key or add foreign key) got stuck for a long time:
PostgreSQL (I looked at versions 9.4 to 13) foreign key constraints are actually implemented using triggers on both ends of the foreign key.
If you have a company table (id as primary key) and a bank_account table (id as primary key, company_id as foreign key pointing to company.id), then there are actually 2 triggers on the bank_account table and also 2 triggers on the company table.
table_name   | timing       | trigger_name                   | function_name
-------------+--------------+--------------------------------+----------------------
bank_account | AFTER UPDATE | RI_ConstraintTrigger_c_1515961 | RI_FKey_check_upd
bank_account | AFTER INSERT | RI_ConstraintTrigger_c_1515960 | RI_FKey_check_ins
company      | AFTER UPDATE | RI_ConstraintTrigger_a_1515959 | RI_FKey_noaction_upd
company      | AFTER DELETE | RI_ConstraintTrigger_a_1515958 | RI_FKey_noaction_del
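You can list these internal RI triggers yourself by querying pg_trigger; a rough sketch, assuming the company and bank_account tables above:
SELECT tgrelid::regclass AS table_name,
       tgname            AS trigger_name,
       tgfoid::regproc   AS function_name
FROM pg_trigger
WHERE tgisinternal
  AND tgrelid IN ('company'::regclass, 'bank_account'::regclass);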
Initial creation of those triggers (when creating the foreign key) requires a SHARE ROW EXCLUSIVE lock on both tables (it used to be an ACCESS EXCLUSIVE lock in version 9.4 and earlier). This lock does not conflict with "data reading locks", but it does conflict with all other locks, for example a simple INSERT/UPDATE/DELETE into the company table.
Deletion of those triggers (when dropping the foreign key, or the whole table) requires an ACCESS EXCLUSIVE lock on both tables. This lock conflicts with every other lock!
So imagine a scenario where you have a transaction A running that first did a simple SELECT from the company table (causing it to hold an ACCESS SHARE lock on company until the transaction is committed or rolled back) and is now doing some other work for 3 minutes. You try to drop the bank_account table in transaction B. This requires an ACCESS EXCLUSIVE lock, which will have to wait until the ACCESS SHARE lock is released.
On top of that, all other transactions that want to access the company table (even just for a SELECT, let alone INSERT/UPDATE/DELETE) will be queued behind the ACCESS EXCLUSIVE lock, which itself is waiting on the ACCESS SHARE lock.
Long running transactions and DDL changes require delicate handling.
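If such DDL has to run while other transactions are active, one common mitigation (not part of the original answer, just standard practice) is to cap how long the DDL will wait for its ACCESS EXCLUSIVE lock with lock_timeout, and retry later on failure, so other sessions never queue behind it for long; a sketch:
BEGIN;
SET LOCAL lock_timeout = '2s';  -- give up quickly instead of queueing everyone behind us
DROP TABLE bank_account;        -- errors out if the lock is not granted within 2 seconds
COMMIT;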

-- SESSION#1
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
BEGIN;
CREATE TABLE base (
id SERIAL
, dummy INTEGER
, PRIMARY KEY (id)
);
CREATE TABLE main (
id SERIAL
, base_id INTEGER
, PRIMARY KEY (id)
, CONSTRAINT fk_main_base FOREIGN KEY (base_id) REFERENCES base (id)
-- comment the next line out (plus maybe the previous one)
ON DELETE CASCADE ON UPDATE CASCADE
);
-- make some data ...
INSERT INTO base (dummy)
SELECT generate_series(1,10)
;
-- make some FK references
INSERT INTO main(base_id)
SELECT id FROM base
WHERE random() < 0.5
;
COMMIT;
BEGIN;
DROP TABLE main; -- why does this need to lock base?
SELECT pg_backend_pid();
-- allow other session to check the locks
-- and attempt an update to "base"
SELECT pg_sleep(20);
-- On rollback the other session will fail.
-- On commit the other session will succeed.
-- In both cases the other session must wait for us to complete.
-- ROLLBACK;
COMMIT;
-- SESSION#2
-- (Start this after session#1 from a different terminal)
SET search_path = tmp, pg_catalog;
PREPARE peeklock(text) AS
SELECT dat.datname
, rel.relname as relrelname
, cat.relname as catrelname
, lck.locktype
-- , lck.database, lck.relation
, lck.page, lck.tuple
-- , lck.virtualxid, lck.transactionid
-- , lck.classid
, lck.objid, lck.objsubid
-- , lck.virtualtransaction
, lck.pid, lck.mode, lck.granted, lck.fastpath
FROM pg_locks lck
LEFT JOIN pg_database dat ON dat.oid = lck.database
LEFT JOIN pg_class rel ON rel.oid = lck.relation
LEFT JOIN pg_class cat ON cat.oid = lck.classid
WHERE EXISTS(
SELECT * FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation AND c.relname = $1
WHERE l.pid =lck.pid
)
;
EXECUTE peeklock( 'base' );
BEGIN;
-- attempt to perform some DDL
ALTER TABLE base ALTER COLUMN id TYPE BIGINT;
-- attempt to perform some DML
UPDATE base SET id = id+100;
COMMIT;
EXECUTE peeklock( 'base' );
\d base
SELECT * FROM base;

I suppose DDL locks everything it touches exclusively for the sake of simplicity; you're not supposed to run DDL involving non-temporary tables during normal operation anyway.
To avoid deadlock you may use advisory lock:
start transaction;
select pg_advisory_xact_lock(0);
drop table main;
commit;
This would ensure that only one client is concurrently running DDL involving referenced tables, so it wouldn't matter in which order the other locks are acquired.
You can avoid locking the table for a long time by dropping the foreign key first:
start transaction;
select pg_advisory_xact_lock(0);
alter table main drop constraint fk_main_base;
commit;
start transaction;
drop table main;
commit;
This would still need to lock base exclusively, but for a much shorter time.

Related

Import CSV Data into Postgres DB with two tables which depend on each other

Hi,
I have a question about best practice here. I have a Golang project which uses a Postgres database and specific migrations. The database has many tables and some depend on each other (table A has a FK to table B, table B has a FK to table A). My problem is that I now have to import data from CSV files, which I do with the COPY ... FROM ... WITH command. Each CSV file contains the data for a specific table.
If I try to use the COPY command I get the error: "insert or update on table "b" violates foreign key constraint". That's expected, because table a has no data yet, and because of the FKs the problem happens on both sides.
So what is the best way to import the data?
Thanks :)
Possible solution approach:
1.) create the tables without the FK constraints
2.) load the data into the tables
3.) add the FK constraints to the tables with ALTER TABLE:
Example:
ALTER TABLE table_name
ADD CONSTRAINT constraint_name constraint_definition;
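For the two tables described in the question, step 3 might look roughly like this (the column and constraint names are assumptions):
ALTER TABLE a
    ADD CONSTRAINT fk_a_b FOREIGN KEY (b_id) REFERENCES b (id);
ALTER TABLE b
    ADD CONSTRAINT fk_b_a FOREIGN KEY (a_id) REFERENCES a (id);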
You can defer deferrable constraints until the end of a transaction:
create table a (id serial primary key, b_id bigint);
create table b (id serial primary key, a_id bigint references a(id) deferrable);
alter table a
add constraint fk_b_id foreign key (b_id) references b(id) deferrable;
begin transaction;
SET CONSTRAINTS ALL DEFERRED;
--your `COPY...FROM...WITH` goes here
insert into b values (1,1);--without deferring constraints it fails here
insert into a values (1,1);
commit;
Problem is, you have to make sure your foreign key constraints are deferrable in the first place - by default they are not, so set constraints all deferred; won't affect them.
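For a single pre-existing constraint you can do this by hand; a sketch (the table and constraint names here are placeholders):
ALTER TABLE your_table ALTER CONSTRAINT your_fk_constraint DEFERRABLE;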
You can do this dynamically for the time of your import (online demo):
CREATE TEMP TABLE queries_to_make_constraints_deferrable AS
SELECT format('alter table %I.%I alter constraint %I deferrable',
v.schemaname, v.tablename, con.conname) as query
FROM pg_catalog.pg_constraint con
INNER JOIN pg_catalog.pg_class rel ON rel.oid = con.conrelid
INNER JOIN pg_catalog.pg_namespace nsp ON nsp.oid = connamespace
INNER JOIN (VALUES
('public','table1'),--list your tables here
('public','some_other_table'),
('public','a'),
('public','b')) v(schemaname,tablename)
ON nsp.nspname = v.schemaname AND rel.relname=v.tablename
WHERE con.contype='f' --foreign keys
AND con.condeferrable is False; --non-deferrable
do $$
declare rec record;
begin
for rec in select query from queries_to_make_constraints_deferrable
loop execute rec.query;
end loop;
end $$ ;
Carry out your import in a transaction with deferred constraints, then undo your alterations by replacing deferrable with not deferrable:
begin transaction;
SET CONSTRAINTS ALL DEFERRED;
--your `COPY...FROM...WITH` goes here
insert into b values (1,1);--without deferring constraints it fails here
insert into a values (1,1);
commit;
do $$
declare rec record;
begin
for rec in select query from queries_to_make_constraints_deferrable
loop execute replace(rec.query,'deferrable','not deferrable');
end loop;
end $$ ;
As already stated, an alternative would be to set up your schema without these constraints and add them after importing the data. That might require you to find and separate them from their table definitions, which again calls for a similar dynamic sql.
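If you go that route, the existing foreign keys can be scripted out with pg_get_constraintdef before dropping them, so they can be re-added after the import; a sketch, assuming everything lives in the public schema:
SELECT format('ALTER TABLE %s ADD CONSTRAINT %I %s;',
              con.conrelid::regclass, con.conname, pg_get_constraintdef(con.oid)) AS add_back
FROM pg_catalog.pg_constraint con
WHERE con.contype = 'f'                          -- foreign keys only
  AND con.connamespace = 'public'::regnamespace; -- adjust to your schema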

Why is table-swapping in Postgres so verbose?

I'd like to backfill a column of a large (20M rows), frequently-read but rarely-written table. From various articles and questions on SO, it seems like the best way to do this is create a table with identical structure, load in the backfilled data, and live-swap (since renaming is pretty quick). Sounds good!
But when I actually write the script to do this, it is mind-blowingly long. Here's a taste:
BEGIN;
CREATE TABLE foo_new (LIKE foo);
-- I don't use INCLUDING ALL, because that produces Indexes/Constraints with different names
-- This is the only part of the script that is specific to my case.
-- Everything else is standard for any table swap
INSERT INTO foo_new (id, first_name, last_name, email, full_name)
SELECT id, first_name, last_name, email, first_name || last_name FROM foo;
CREATE SEQUENCE foo_new_id_seq
START 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
SELECT setval('foo_new_id_seq', COALESCE((SELECT MAX(id)+1 FROM foo_new), 1), false);
ALTER SEQUENCE foo_new_id_seq OWNED BY foo_new.id;
ALTER TABLE ONLY foo_new ALTER COLUMN id SET DEFAULT nextval('foo_new_id_seq'::regclass);
ALTER TABLE foo_new
ADD CONSTRAINT foo_new_pkey
PRIMARY KEY (id);
COMMIT;
-- Indexes are made concurrently, otherwise they would block reads for
-- a long time. Concurrent index creation cannot occur within a transaction.
CREATE INDEX CONCURRENTLY foo_new_on_first_name ON foo_new USING btree (first_name);
CREATE INDEX CONCURRENTLY foo_new_on_last_name ON foo_new USING btree (last_name);
CREATE INDEX CONCURRENTLY foo_new_on_email ON foo_new USING btree (email);
-- One more line for each index
BEGIN;
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE foo_new RENAME TO foo;
ALTER SEQUENCE foo_id_seq RENAME TO foo_old_id_seq;
ALTER SEQUENCE foo_new_id_seq RENAME TO foo_id_seq;
ALTER TABLE foo_old RENAME CONSTRAINT foo_pkey TO foo_old_pkey;
ALTER TABLE foo RENAME CONSTRAINT foo_new_pkey TO foo_pkey;
ALTER INDEX foo_on_first_name RENAME TO foo_old_on_first_name;
ALTER INDEX foo_on_last_name RENAME TO foo_old_on_last_name;
ALTER INDEX foo_on_email RENAME TO foo_old_on_email;
-- One more line for each index
ALTER INDEX foo_new_on_first_name RENAME TO foo_on_first_name;
ALTER INDEX foo_new_on_last_name RENAME TO foo_on_last_name;
ALTER INDEX foo_new_on_email RENAME TO foo_on_email;
-- One more line for each index
COMMIT;
-- TODO: drop old table (CASCADE)
And this doesn't even include foreign keys, or other constraints! Since the only part of this that is specific to my case is the INSERT INTO bit, I'm surprised that there's no built-in Postgres function to do this sort of swapping. Is this operation less common than I make it out to be? Am I underestimating the variety of ways this can be accomplished? Is my desire to keep naming consistent an atypical one?
It's probably not all that common. Most tables aren't big enough to warrant it, and most applications can tolerate some amount of downtime here and there.
More importantly, different applications can afford to cut corners in different ways depending on their workload. The database server can't; it needs to handle (or to very deliberately not handle) every possible obscure edge-case, which is likely a lot harder than you might expect. Ultimately, writing tailored solutions for different use cases probably makes more sense.
Anyway, if you're just trying to implement a calculated field as first_name || last_name, there are better ways of doing it:
ALTER TABLE foo RENAME TO foo_base;
CREATE VIEW foo AS
SELECT
id,
first_name,
last_name,
email,
(first_name || last_name) AS full_name
FROM foo_base;
Assuming that your real case is more complicated, all of this effort may still be unnecessary. I believe that the copy-and-rename approach is largely based on the assumption that you need to lock the table against concurrent modifications for the duration of this process, and so the goal is to get it done as quickly as possible. If all concurrent operations are read-only - which appears to be the case, since you're not locking the table - then you're probably better off with a simple UPDATE (which won't block SELECTs), even if it does take a bit longer (though it does have the advantage of avoiding foreign key re-checks and TOAST table rewrites).
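As a rough sketch of that simpler route (the new column, batch boundaries, and type are assumptions, not part of the original question):
ALTER TABLE foo ADD COLUMN IF NOT EXISTS full_name text;  -- quick: no table rewrite
-- Backfill in id ranges so each transaction stays short; repeat for the next range.
UPDATE foo
SET full_name = first_name || last_name
WHERE id BETWEEN 1 AND 100000
  AND full_name IS DISTINCT FROM (first_name || last_name);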
If this approach really is justified, I think there are a few opportunities for improvement:
You don't need to recreate/reset the sequence; you can just link the existing sequence to the new table.
CREATE INDEX CONCURRENTLY seems unnecessary, as nobody else should be trying to access foo_new yet. In fact, if the whole script were in one transaction, it wouldn't even be externally visible at this point.
Table names only need to be unique within a schema. If you temporarily create a schema for the new table, you should be able to replace all of those RENAMEs with a single ALTER TABLE foo SET SCHEMA public.
Even if you don't expect concurrent writes, it wouldn't hurt to LOCK foo IN SHARE MODE anyway...
EDIT:
The sequence reassignment is a little more involved than I expected, as it seems that they need to stay in the same schema as their parent table. But here is (what appears to be) a working example:
BEGIN;
LOCK public.foo IN SHARE MODE;
CREATE SCHEMA tmp;
CREATE TABLE tmp.foo (LIKE public.foo);
INSERT INTO tmp.foo (id, first_name, last_name, email, full_name)
SELECT id, first_name, last_name, email, (first_name || last_name) FROM public.foo;
ALTER TABLE tmp.foo ADD CONSTRAINT foo_pkey PRIMARY KEY (id);
CREATE INDEX foo_on_first_name ON tmp.foo (first_name);
CREATE INDEX foo_on_last_name ON tmp.foo (last_name);
CREATE INDEX foo_on_email ON tmp.foo (email);
ALTER TABLE tmp.foo ALTER COLUMN id SET DEFAULT nextval('public.foo_id_seq');
ALTER SEQUENCE public.foo_id_seq OWNED BY NONE;
DROP TABLE public.foo;
ALTER TABLE tmp.foo SET SCHEMA public;
ALTER SEQUENCE public.foo_id_seq OWNED BY public.foo.id;
DROP SCHEMA tmp;
COMMIT;

Loading a big table, referenced by others, efficiently

My use case is the following:
I have a big table users (~200 million rows) with user_id as the primary key. users is referenced by several other tables using foreign keys with ON DELETE CASCADE.
Every day I have to replace the whole content of users using a lot of csv files. (Please don't ask why I have to do that, I just have to...)
My idea was to set the primary key and all foreign keys as DEFERRED, then, in the same transaction, DELETE the whole table and copy in all the csvs using the COPY command. The expected result was that all check and index calculation would happen at the end of the transaction.
But actually the insert process is super slow (4 hours, against 10 minutes if I insert and then add the primary key) AND no foreign key can refer to a deferrable primary key.
I can't remove the primary key during the insertion because of the foreign keys. I don't want to get rid of the foreign keys either, because I would have to simulate the behavior of ON DELETE CASCADE manually.
So basically I am looking for a way to tell postgres to not care about primary key index or foreign key check until the very end of the transaction.
PS1: I made up the users table, I am actually working with very different kind of data but it's not really relevant to the problem.
PS2: As a rough estimation I would say that every day, out of my 200+ million records, I have 10 records removed, 1 million updated and 1 million added.
A full delete + a full insert will cause a flood of cascading FK actions,
which will have to be postponed by DEFERRED,
which will cause an avalanche of aftermath for the DBMS at commit time.
Instead, don't {delete+create} the keys, but keep them right where they are.
Also, don't touch records that don't need to be touched.
-- staging table
CREATE TABLE tmp_users AS SELECT * FROM big_users WHERE 1=0;
COPY tmp_users (...) FROM '...' WITH CSV;
-- ... and more copying ...
-- ... from more files ...
-- If this fails, you have a problem!
ALTER TABLE tmp_users
ADD PRIMARY KEY (id);
-- [EDIT]
-- I added this later, because the user_comments table
-- was not present in the original question.
DELETE FROM user_comments c
WHERE NOT EXISTS (
SELECT * FROM tmp_users u WHERE u.id = c.user_id
);
-- These deletes are allowed to cascade
-- [we assume that the import of the CSV files was complete, here ...]
DELETE FROM big_users b
WHERE NOT EXISTS (
SELECT *
FROM tmp_users t
WHERE t.id = b.id
);
-- Only update the records that actually **change**
-- [ updates are expensive in terms of I/O, because they create row-versions
-- , and the need to delete the old row-versions, afterwards ]
-- Note that the key (id) does not change, so there will be no cascading.
-- ------------------------------------------------------------
UPDATE big_users b
SET name_1 = t.name_1
, name_2 = t.name_2
, address = t.address
-- , ... ALL THE COLUMNS here, except the key(s)
FROM tmp_users t
WHERE t.id = b.id
AND (t.name_1, t.name_2, t.address, ...) -- ALL THE COLUMNS, except the key(s)
IS DISTINCT FROM
(b.name_1, b.name_2, b.address, ...)
;
-- Maybe there were some new records in the CSV files. Add them.
INSERT INTO big_users (id,name_1,name_2,address, ...)
SELECT id,name_1,name_2,address, ...
FROM tmp_users t
WHERE NOT EXISTS (
SELECT *
FROM big_users x
WHERE x.id = t.id
);
I found a hacky solution:
begin;
update pg_index set indisvalid = false, indisready = false where indexrelid = 'users_pkey'::regclass;
DELETE FROM users;
COPY users FROM 'file.csv';
REINDEX INDEX users_pkey;
DELETE FROM user_comments c WHERE NOT EXISTS (SELECT * FROM users u WHERE u.id = c.user_id);
commit;
The magic dirty hack is to disable the primary key index in the postgres catalog and, at the end, to force the reindexing (which will override what we changed). I can't use a foreign key with ON DELETE CASCADE because for some reason it makes the constraint be checked immediately... So instead my foreign keys are ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED and I have to do the delete myself.
This works well in my case because only a few users are referenced in other tables.
I wish there was a cleaner solution though...
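For reference, declaring such a deferred, non-cascading foreign key might look like this (a sketch; the constraint name is an assumption and user_comments comes from the earlier example):
ALTER TABLE user_comments
    ADD CONSTRAINT user_comments_user_id_fkey
    FOREIGN KEY (user_id) REFERENCES users (id)
    ON DELETE NO ACTION
    DEFERRABLE INITIALLY DEFERRED;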

Foreign keys in postgresql can be violated by trigger

I've created some tables in postgres, added a foreign key from one table to another and set ON DELETE to CASCADE. Strangely enough, I have some fields that appear to be violating this constraint.
Is this normal behaviour? And if so, is there a way to get the behaviour I want (no violations possible)?
Edit:
I originally created the foreign key as part of CREATE TABLE, just using
... REFERENCES product (id) ON UPDATE CASCADE ON DELETE CASCADE
The current code pgAdmin3 gives is
ALTER TABLE cultivar
ADD CONSTRAINT cultivar_id_fkey FOREIGN KEY (id)
REFERENCES product (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE;
Edit 2:
To Clarify, I have a sneaking suspicion that the constraints are only checked when updates/inserts happen but are then never looked at again. Unfortunately I don't know enough about postgres to find out if this is true or how fields could end up in the database without those checks being run.
If this is the case, is there some way to check all the foreign keys and fix those problems?
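One way to check an existing foreign key for violations is an anti-join against the referenced table; a sketch using the cultivar/product constraint shown above (column names assumed from the constraint definition):
SELECT c.*
FROM cultivar c
LEFT JOIN product p ON p.id = c.id
WHERE c.id IS NOT NULL
  AND p.id IS NULL;  -- cultivar rows whose referenced product row is missing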
Edit 3:
A constraint violation can be caused by a faulty trigger, see below
I tried to create a simple example that shows the foreign key constraint being enforced. With this example I show that I'm not allowed to enter data that violates the FK, and that if the FK is not in place during the insert and I then enable the FK, the constraint throws an error telling me the data violates it. So I'm not seeing how you have data in the table that violates an FK that is in place. I'm on 9.0, but this should not be different on 8.3. If you can show a working example that demonstrates your issue, that might help.
--CREATE TABLES--
CREATE TABLE parent
(
parent_id integer NOT NULL,
first_name character varying(50) NOT NULL,
CONSTRAINT pk_parent PRIMARY KEY (parent_id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE parent OWNER TO postgres;
CREATE TABLE child
(
child_id integer NOT NULL,
parent_id integer NOT NULL,
first_name character varying(50) NOT NULL,
CONSTRAINT pk_child PRIMARY KEY (child_id),
CONSTRAINT fk1_child FOREIGN KEY (parent_id)
REFERENCES parent (parent_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
)
WITH (
OIDS=FALSE
);
ALTER TABLE child OWNER TO postgres;
--CREATE TABLES--
--INSERT TEST DATA--
INSERT INTO parent(parent_id,first_name)
SELECT 1,'Daddy'
UNION
SELECT 2,'Mommy';
INSERT INTO child(child_id,parent_id,first_name)
SELECT 1,1,'Billy'
UNION
SELECT 2,1,'Jenny'
UNION
SELECT 3,1,'Kimmy'
UNION
SELECT 4,2,'Billy'
UNION
SELECT 5,2,'Jenny'
UNION
SELECT 6,2,'Kimmy';
--INSERT TEST DATA--
--SHOW THE DATA WE HAVE--
select parent.first_name,
child.first_name
from parent
inner join child
on child.parent_id = parent.parent_id
order by parent.first_name, child.first_name asc;
--SHOW THE DATA WE HAVE--
--DELETE PARENT WHO HAS CHILDREN--
BEGIN TRANSACTION;
delete from parent
where parent_id = 1;
--Check to see if any children that were linked to Daddy are still there?
--None there so the cascade delete worked.
select parent.first_name,
child.first_name
from parent
right outer join child
on child.parent_id = parent.parent_id
order by parent.first_name, child.first_name asc;
ROLLBACK TRANSACTION;
--TRY ALLOW NO REFERENTIAL DATA IN--
BEGIN TRANSACTION;
--Get rid of fk constraint so we can insert red headed step child
ALTER TABLE child DROP CONSTRAINT fk1_child;
INSERT INTO child(child_id,parent_id,first_name)
SELECT 7,99999,'Red Headed Step Child';
select parent.first_name,
child.first_name
from parent
right outer join child
on child.parent_id = parent.parent_id
order by parent.first_name, child.first_name asc;
--Will throw FK check violation because parent 99999 doesn't exist in parent table
ALTER TABLE child
ADD CONSTRAINT fk1_child FOREIGN KEY (parent_id)
REFERENCES parent (parent_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE;
ROLLBACK TRANSACTION;
--TRY ALLOW NO REFERENTIAL DATA IN--
--DROP TABLE parent;
--DROP TABLE child;
Everything I've read so far seems to suggest that constraints are only checked when the data is inserted (or when the constraint is created) - see for example the manual on SET CONSTRAINTS.
This makes sense and - if the database works properly - should be good enough. I'm still curious how I managed to circumvent this or if I just read the situation wrong and there was never a real constraint violation to begin with.
Either way, case closed :-/
------- UPDATE --------
There was definitely a constraint violation, caused by a faulty trigger. Here's a script to replicate:
-- Create master table
CREATE TABLE product
(
id INT NOT NULL PRIMARY KEY
);
-- Create second table, referencing the first
CREATE TABLE example
(
id int PRIMARY KEY REFERENCES product (id) ON DELETE CASCADE
);
-- Create a (broken) trigger function
--CREATE LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION delete_product()
RETURNS trigger AS
$BODY$
BEGIN
DELETE FROM product WHERE product.id = OLD.id;
-- This is an error!
RETURN null;
END;
$BODY$
LANGUAGE plpgsql;
-- Add it to the second table
CREATE TRIGGER example_delete
BEFORE DELETE
ON example
FOR EACH ROW
EXECUTE PROCEDURE delete_product();
-- Now lets add a row
INSERT INTO product (id) VALUES (1);
INSERT INTO example (id) VALUES (1);
-- And now lets delete the row
DELETE FROM example WHERE id = 1;
/*
Now if everything is working, this should return two columns:
(pid,eid)=(1,1). However, it returns only the example id, so
(pid,eid)=(NULL,1). This means the foreign key constraint on the
example table is violated.
*/
SELECT product.id AS pid, example.id AS eid FROM product FULL JOIN example ON product.id = example.id;

CASCADE DELETE just once

I have a Postgresql database on which I want to do a few cascading deletes. However, the tables aren't set up with the ON DELETE CASCADE rule. Is there any way I can perform a delete and tell Postgresql to cascade it just this once? Something equivalent to
DELETE FROM some_table CASCADE;
The answers to this older question make it seem like no such solution exists, but I figured I'd ask this question explicitly just to be sure.
No. To do it just once you would simply write the delete statement for the table you want to cascade.
DELETE FROM some_child_table WHERE some_fk_field IN (SELECT some_id FROM some_Table);
DELETE FROM some_table;
If you really want DELETE FROM some_table CASCADE; which means "remove all rows from table some_table", you can use TRUNCATE instead of DELETE; CASCADE is always supported. However, if you want a selective delete with a WHERE clause, TRUNCATE is not good enough.
USE WITH CARE - This will drop all rows of all tables which have a foreign key constraint on some_table, and all tables that have constraints on those tables, and so on.
Postgres supports CASCADE with the TRUNCATE command:
TRUNCATE some_table CASCADE;
This command will delete all data from all tables that have a foreign key to the specified table, plus everything that foreign keys to those tables, and so on. Proceed with extreme caution.
Handily this is transactional (i.e. can be rolled back), although it is not fully isolated from other concurrent transactions, and has several other caveats. Read the docs for details.
I wrote a (recursive) function to delete any row based on its primary key. I wrote this because I did not want to create my constraints as "on delete cascade". I wanted to be able to delete complex sets of data (as a DBA) but not allow my programmers to be able to cascade delete without thinking through all of the repercussions.
I'm still testing out this function, so there may be bugs in it -- but please don't try it if your DB has multi column primary (and thus foreign) keys. Also, the keys all have to be able to be represented in string form, but it could be written in a way that doesn't have that restriction. I use this function VERY SPARINGLY anyway, I value my data too much to enable the cascading constraints on everything.
Basically this function is passed the schema, table name, and primary key value (in string form); it starts by finding any foreign keys on that table and checking whether referencing data exists -- if it does, it recursively calls itself on the found data. It uses an array of data already marked for deletion to prevent infinite loops. Please test it out and let me know how it works for you. Note: it's a little slow.
I call it like so:
select delete_cascade('public','my_table','1');
create or replace function delete_cascade(p_schema varchar, p_table varchar, p_key varchar, p_recursion varchar[] default null)
returns integer as $$
declare
rx record;
rd record;
v_sql varchar;
v_recursion_key varchar;
recnum integer;
v_primary_key varchar;
v_rows integer;
begin
recnum := 0;
select ccu.column_name into v_primary_key
from
information_schema.table_constraints tc
join information_schema.constraint_column_usage AS ccu ON ccu.constraint_name = tc.constraint_name and ccu.constraint_schema=tc.constraint_schema
and tc.constraint_type='PRIMARY KEY'
and tc.table_name=p_table
and tc.table_schema=p_schema;
for rx in (
select kcu.table_name as foreign_table_name,
kcu.column_name as foreign_column_name,
kcu.table_schema foreign_table_schema,
kcu2.column_name as foreign_table_primary_key
from information_schema.constraint_column_usage ccu
join information_schema.table_constraints tc on tc.constraint_name=ccu.constraint_name and tc.constraint_catalog=ccu.constraint_catalog and tc.constraint_schema=ccu.constraint_schema
join information_schema.key_column_usage kcu on kcu.constraint_name=ccu.constraint_name and kcu.constraint_catalog=ccu.constraint_catalog and kcu.constraint_schema=ccu.constraint_schema
join information_schema.table_constraints tc2 on tc2.table_name=kcu.table_name and tc2.table_schema=kcu.table_schema
join information_schema.key_column_usage kcu2 on kcu2.constraint_name=tc2.constraint_name and kcu2.constraint_catalog=tc2.constraint_catalog and kcu2.constraint_schema=tc2.constraint_schema
where ccu.table_name=p_table and ccu.table_schema=p_schema
and TC.CONSTRAINT_TYPE='FOREIGN KEY'
and tc2.constraint_type='PRIMARY KEY'
)
loop
v_sql := 'select '||rx.foreign_table_primary_key||' as key from '||rx.foreign_table_schema||'.'||rx.foreign_table_name||'
where '||rx.foreign_column_name||'='||quote_literal(p_key)||' for update';
--raise notice '%',v_sql;
--found a foreign key, now find the primary keys for any data that exists in any of those tables.
for rd in execute v_sql
loop
v_recursion_key=rx.foreign_table_schema||'.'||rx.foreign_table_name||'.'||rx.foreign_column_name||'='||rd.key;
if (v_recursion_key = any (p_recursion)) then
--raise notice 'Avoiding infinite loop';
else
--raise notice 'Recursing to %,%',rx.foreign_table_name, rd.key;
recnum:= recnum +delete_cascade(rx.foreign_table_schema::varchar, rx.foreign_table_name::varchar, rd.key::varchar, p_recursion||v_recursion_key);
end if;
end loop;
end loop;
begin
--actually delete original record.
v_sql := 'delete from '||p_schema||'.'||p_table||' where '||v_primary_key||'='||quote_literal(p_key);
execute v_sql;
get diagnostics v_rows= row_count;
--raise notice 'Deleting %.% %=%',p_schema,p_table,v_primary_key,p_key;
recnum:= recnum +v_rows;
exception when others then recnum=0;
end;
return recnum;
end;
$$
language PLPGSQL;
If I understand correctly, you should be able to do what you want by dropping the foreign key constraint, adding a new one (which will cascade), doing your stuff, and recreating the restricting foreign key constraint.
For example:
testing=# create table a (id integer primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "a_pkey" for table "a"
CREATE TABLE
testing=# create table b (id integer references a);
CREATE TABLE
-- put some data in the table
testing=# insert into a values(1);
INSERT 0 1
testing=# insert into a values(2);
INSERT 0 1
testing=# insert into b values(2);
INSERT 0 1
testing=# insert into b values(1);
INSERT 0 1
-- restricting works
testing=# delete from a where id=1;
ERROR: update or delete on table "a" violates foreign key constraint "b_id_fkey" on table "b"
DETAIL: Key (id)=(1) is still referenced from table "b".
-- find the name of the constraint
testing=# \d b;
Table "public.b"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
Foreign-key constraints:
"b_id_fkey" FOREIGN KEY (id) REFERENCES a(id)
-- drop the constraint
testing=# alter table b drop constraint b_id_fkey;
ALTER TABLE
-- create a cascading one
testing=# alter table b add FOREIGN KEY (id) references a(id) on delete cascade;
ALTER TABLE
testing=# delete from a where id=1;
DELETE 1
testing=# select * from a;
id
----
2
(1 row)
testing=# select * from b;
id
----
2
(1 row)
-- it works, do your stuff.
-- [stuff]
-- recreate the previous state
testing=# \d b;
Table "public.b"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
Foreign-key constraints:
"b_id_fkey" FOREIGN KEY (id) REFERENCES a(id) ON DELETE CASCADE
testing=# alter table b drop constraint b_id_fkey;
ALTER TABLE
testing=# alter table b add FOREIGN KEY (id) references a(id) on delete restrict;
ALTER TABLE
Of course, you should abstract stuff like that into a procedure, for the sake of your mental health.
Yeah, as others have said, there's no convenient 'DELETE FROM my_table ... CASCADE' (or equivalent). To delete non-cascading foreign key-protected child records and their referenced ancestors, your options include:
Perform all the deletions explicitly, one query at a time, starting with child tables (though this won't fly if you've got circular references); or
Perform all the deletions explicitly in a single (potentially massive) query; or
Assuming your non-cascading foreign key constraints were created as 'ON DELETE NO ACTION DEFERRABLE', perform all the deletions explicitly in a single transaction; or
Temporarily drop the 'no action' and 'restrict' foreign key constraints in the graph, recreate them as CASCADE, delete the offending ancestors, drop the foreign key constraints again, and finally recreate them as they were originally (thus temporarily weakening the integrity of your data); or
Something probably equally fun.
It's on purpose that circumventing foreign key constraints isn't made convenient, I assume; but I do understand why in particular circumstances you'd want to do it. If it's something you'll be doing with some frequency, and if you're willing to flout the wisdom of DBAs everywhere, you may want to automate it with a procedure.
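A sketch of the third option above, assuming the constraints really were created DEFERRABLE (parent_table/child_table are placeholder names):
BEGIN;
SET CONSTRAINTS ALL DEFERRED;            -- FK checks move to commit time
DELETE FROM parent_table WHERE id = 42;
DELETE FROM child_table WHERE parent_id = 42;
COMMIT;                                  -- constraints are verified here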
I came here a few months ago looking for an answer to the "CASCADE DELETE just once" question (originally asked over a decade ago!). I got some mileage out of Joe Love's clever solution (and Thomas C. G. de Vilhena's variant), but in the end my use case had particular requirements (handling of intra-table circular references, for one) that forced me to take a different approach. That approach ultimately became recursively_delete (PG 10.10).
I've been using recursively_delete in production for a while, now, and finally feel (warily) confident enough to make it available to others who might wind up here looking for ideas. As with Joe Love's solution, it allows you to delete entire graphs of data as if all foreign key constraints in your database were momentarily set to CASCADE, but offers a couple additional features:
Provides an ASCII preview of the deletion target and its graph of dependents.
Performs deletion in a single query using recursive CTEs.
Handles circular dependencies, intra- and inter-table.
Handles composite keys.
Skips 'set default' and 'set null' constraints.
I cannot comment on Palehorse's answer, so I added my own.
Palehorse's logic is ok, but efficiency can be bad with big data sets.
DELETE FROM some_child_table sct
WHERE EXISTS (SELECT 1 FROM some_table st
WHERE sct.some_fk_field = st.some_id);
DELETE FROM some_table;
It is faster if you have indexes on columns and data set is bigger than few records.
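A sketch of such an index, reusing the placeholder names from the statement above:
CREATE INDEX IF NOT EXISTS some_child_table_fk_idx
    ON some_child_table (some_fk_field);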
To automate this, you could define the foreign key constraint with ON DELETE CASCADE.
I quote the manual on foreign key constraints:
CASCADE specifies that when a referenced row is deleted, row(s)
referencing it should be automatically deleted as well.
I took Joe Love's answer and rewrote it using the IN operator with sub-selects instead of = to make the function faster (according to Hubbitus's suggestion):
create or replace function delete_cascade(p_schema varchar, p_table varchar, p_keys varchar, p_subquery varchar default null, p_foreign_keys varchar[] default array[]::varchar[])
returns integer as $$
declare
rx record;
rd record;
v_sql varchar;
v_subquery varchar;
v_primary_key varchar;
v_foreign_key varchar;
v_rows integer;
recnum integer;
begin
recnum := 0;
select ccu.column_name into v_primary_key
from
information_schema.table_constraints tc
join information_schema.constraint_column_usage AS ccu ON ccu.constraint_name = tc.constraint_name and ccu.constraint_schema=tc.constraint_schema
and tc.constraint_type='PRIMARY KEY'
and tc.table_name=p_table
and tc.table_schema=p_schema;
for rx in (
select kcu.table_name as foreign_table_name,
kcu.column_name as foreign_column_name,
kcu.table_schema foreign_table_schema,
kcu2.column_name as foreign_table_primary_key
from information_schema.constraint_column_usage ccu
join information_schema.table_constraints tc on tc.constraint_name=ccu.constraint_name and tc.constraint_catalog=ccu.constraint_catalog and tc.constraint_schema=ccu.constraint_schema
join information_schema.key_column_usage kcu on kcu.constraint_name=ccu.constraint_name and kcu.constraint_catalog=ccu.constraint_catalog and kcu.constraint_schema=ccu.constraint_schema
join information_schema.table_constraints tc2 on tc2.table_name=kcu.table_name and tc2.table_schema=kcu.table_schema
join information_schema.key_column_usage kcu2 on kcu2.constraint_name=tc2.constraint_name and kcu2.constraint_catalog=tc2.constraint_catalog and kcu2.constraint_schema=tc2.constraint_schema
where ccu.table_name=p_table and ccu.table_schema=p_schema
and TC.CONSTRAINT_TYPE='FOREIGN KEY'
and tc2.constraint_type='PRIMARY KEY'
)
loop
v_foreign_key := rx.foreign_table_schema||'.'||rx.foreign_table_name||'.'||rx.foreign_column_name;
v_subquery := 'select "'||rx.foreign_table_primary_key||'" as key from '||rx.foreign_table_schema||'."'||rx.foreign_table_name||'"
where "'||rx.foreign_column_name||'" in ('||coalesce(p_keys, p_subquery)||') for update';
if p_foreign_keys @> ARRAY[v_foreign_key] then
--raise notice 'circular recursion detected';
else
p_foreign_keys := array_append(p_foreign_keys, v_foreign_key);
recnum:= recnum + delete_cascade(rx.foreign_table_schema, rx.foreign_table_name, null, v_subquery, p_foreign_keys);
p_foreign_keys := array_remove(p_foreign_keys, v_foreign_key);
end if;
end loop;
begin
if (coalesce(p_keys, p_subquery) <> '') then
v_sql := 'delete from '||p_schema||'."'||p_table||'" where "'||v_primary_key||'"in('||coalesce(p_keys, p_subquery)||')';
--raise notice '%',v_sql;
execute v_sql;
get diagnostics v_rows = row_count;
recnum := recnum + v_rows;
end if;
exception when others then recnum=0;
end;
return recnum;
end;
$$
language PLPGSQL;
The delete with the cascade option only applies to tables with foreign keys defined. If you do a delete, and it says you cannot because it would violate the foreign key constraint, the cascade will cause it to delete the offending rows.
If you want to delete associated rows in this way, you will need to define the foreign keys first. Also, remember that unless you explicitly instruct it to begin a transaction, or you change the defaults, it will do an auto-commit, which could be very time consuming to clean up.
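A sketch of wrapping such deletes in an explicit transaction so the result can be checked before it is made permanent (table names reuse the placeholders from the earlier answer):
BEGIN;
DELETE FROM some_child_table WHERE some_fk_field IN (SELECT some_id FROM some_table);
DELETE FROM some_table;
-- inspect the result, then either COMMIT; to keep it or:
ROLLBACK;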
When creating a new table, you can add constraints like UNIQUE or NOT NULL, and you can also tell SQL what action to take when you try to DELETE rows that are REFERENCED by other tables:
CREATE TABLE company (
id SERIAL PRIMARY KEY,
name VARCHAR(128),
year DATE);
CREATE TABLE employee (
id SERIAL PRIMARY KEY,
first_name VARCHAR(128) NOT NULL,
last_name VARCHAR(128) NOT NULL,
company_id INT REFERENCES company(id) ON DELETE CASCADE,
salary INT,
UNIQUE (first_name, last_name));
So after that you can just DELETE any rows which you need, for example:
DELETE
FROM company
WHERE id = 2;