Trigger sometimes fails with duplicate key error - postgresql

I'm using a PostgreSQL RDS instance on AWS. A query inserts data into a first table, let's call it table. That data can contain duplicates in some fields (except for the primary key, obviously).
Then there is a trigger that populates another table, infotable, which allows no duplicates.
The trigger:
CREATE TRIGGER insert_infotable AFTER INSERT ON table
FOR EACH ROW EXECUTE PROCEDURE insert_infotable();
The relevant part of the trigger function looks like this:
CREATE OR REPLACE FUNCTION insert_infotable() RETURNS trigger AS $insert_infotable$
BEGIN
--some irrelevant code
IF NOT EXISTS (SELECT * FROM infotable WHERE col1 = NEW.col1 AND col2 = NEW.col2) THEN
INSERT INTO infotable(col1, col2, col3, col4, col5, col6) values (--some values--);
END IF;
RETURN NEW;
END;
$insert_infotable$ LANGUAGE plpgsql;
The table infotable has a UNIQUE constraint on the columns col1 and col2.
In general everything works fine, but rarely, about once in 1,000 inserts, the trigger fails with the error 'duplicate key value violates unique constraint "unique_col1_and_col2"' on table infotable. This shouldn't happen, since the trigger function has the IF NOT EXISTS check.
The first question is: what might be causing this? The only thing I can think of is a race condition: two users submit the same info simultaneously, both fire the trigger, one of them inserts into the second table, and the other gets the duplicate-key error. Because of that, their whole insert query fails, including the insert into the main table.
If that's the case, what can I do about it? Is using a lock on insert a good idea for a table that is supposed to have 100+ users inserting data simultaneously?
And if so, what type of lock should I use, and which table should I lock: the main table, or the second one that gets modified by the trigger? (And should the lock go with my main insert statement or inside the trigger function?)

Yes, this is a race condition. Two such triggers running concurrently won't see each other's modifications, because the transactions are not yet committed.
Since you have a unique constraint on infotable, you can simply use
INSERT INTO infotable ...
ON CONFLICT (col1, col2) DO NOTHING;
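Applied to the trigger function in the question, the IF NOT EXISTS check can go away and the unique constraint resolves the race instead. A sketch (the NEW.col1 … NEW.col6 value list is only a stand-in for the question's "--some values--" placeholder):
CREATE OR REPLACE FUNCTION insert_infotable() RETURNS trigger AS $insert_infotable$
BEGIN
    --some irrelevant code

    -- The unique constraint resolves the race: if a concurrent transaction
    -- already inserted this (col1, col2) pair, the row is silently skipped.
    INSERT INTO infotable (col1, col2, col3, col4, col5, col6)
    VALUES (NEW.col1, NEW.col2, NEW.col3, NEW.col4, NEW.col5, NEW.col6)  -- stand-in for "--some values--"
    ON CONFLICT (col1, col2) DO NOTHING;

    RETURN NEW;
END;
$insert_infotable$ LANGUAGE plpgsql;
Note that ON CONFLICT requires PostgreSQL 9.5 or later.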

Related

Postgres: raise exception from trigger if column is in INSERT or UPDATE statement

I want to audit created_by, created_timestamp, modified_by, and modified_timestamp columns in my PostgreSQL table with triggers. Creating BEFORE INSERT and BEFORE UPDATE triggers to set these values to current_user and now() is reasonably straightforward.
However, if someone tries to do:
INSERT INTO SOMETABLE(someColumn, created_by) VALUES ('test', 'someOtherUser');
I'd rather throw an exception like 'Manually setting created_by in an INSERT query is not allowed.' instead of having the trigger silently change 'someOtherUser' to current_user.
I thought I could accomplish this in the trigger with:
if new.created_by is not null then raise exception 'Manually setting created_by in an INSERT query is not allowed.'; end if;
This works as expected for INSERT queries and triggers.
However, using the same strategy for UPDATE triggers, I'm finding it a bit more difficult, because the NEW record has the unchanged values from the existing row in addition to the changed values in the UPDATE query. (At least, I think that's what's happening.)
I can compare new.created_by to old.created_by to ensure they're the same, thus preventing the query from changing the value, but even though the end result is similar (i.e. the value in the table doesn't get changed), this really isn't the same as disallowing the column from being in the UPDATE query at all.
Is there an elegant way to determine if a column is present in the INSERT or UPDATE query? I've seen some suggestions here to convert to JSON and test that way, but that seems to be a rather ugly solution to me.
Are there other solutions to ensure these columns (created_by, created_timestamp, etc.) are only set by the trigger functions and are not manually settable in INSERT and UPDATE queries?
Create a special trigger for UPDATE with a name that is early in the alphabet, so that it is called before your other trigger:
CREATE FUNCTION yell() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
RAISE EXCEPTION 'direct update of "created_by" is forbidden';
END;$$;
CREATE TRIGGER aa_nosuchupdate
BEFORE UPDATE OF created_by ON sometable
FOR EACH ROW
EXECUTE PROCEDURE yell();
The INSERT case can be handled in your other trigger.
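For completeness, a sketch of what that other trigger could look like, assuming the table is called sometable and uses the four audit columns from the question (the function name set_audit_columns and the trigger name bb_set_audit_columns are made up here):
CREATE FUNCTION set_audit_columns() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
    IF NEW.created_by IS NOT NULL THEN
        RAISE EXCEPTION 'Manually setting created_by in an INSERT query is not allowed.';
    END IF;
    -- Fill the audit columns from the session.
    NEW.created_by := current_user;
    NEW.created_timestamp := now();
    NEW.modified_by := current_user;
    NEW.modified_timestamp := now();
    RETURN NEW;
END;$$;

CREATE TRIGGER bb_set_audit_columns
BEFORE INSERT ON sometable
FOR EACH ROW EXECUTE PROCEDURE set_audit_columns();
The corresponding BEFORE UPDATE trigger for modified_by and modified_timestamp can be written the same way; the aa_nosuchupdate trigger above fires first because trigger names are applied in alphabetical order.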

Postgresql: trigger on foreign table to execute function to truncate/insert into local table

I would like to create a trigger that executes a function to truncate a local database table and insert new data.
The trigger should fire after a new row has been inserted into a foreign database table.
I have read a lot about creating triggers on foreign tables, but it is not working for me. The trigger doesn't seem to execute the function when a new row is inserted into the foreign table; it is as if the trigger can't see the insert event at all.
What I did:
Created a foreign table in my local database, let's call it 'foreign_table'. I tested it and can read its data.
Created a function to truncate the local table and insert the new data:
CREATE OR REPLACE FUNCTION public.reset_insert_table()
RETURNS TRIGGER
LANGUAGE plpgsql
SET search_path = public
AS $BODY$
BEGIN
    -- snapshot the source table, then rebuild the local table from it
    CREATE TEMPORARY TABLE temporary_table_tmp
    AS SELECT * FROM public.table1;
    TRUNCATE TABLE public.table2;
    INSERT INTO table2
    SELECT * FROM temporary_table_tmp;
    DROP TABLE temporary_table_tmp;
    RETURN NULL;  -- a trigger function must return something; NULL is fine for an AFTER trigger
END;
$BODY$;
Created a trigger to launch the function reset_insert_table():
CREATE TRIGGER local_table_update
AFTER INSERT
ON foreign_table
FOR EACH ROW EXECUTE PROCEDURE reset_insert_table();
Made a test: inserted a new row into 'foreign_table', but the local table is not truncated and the new data is not inserted. The insert into foreign_table was done on the foreign database.
I also had trouble testing whether the trigger function itself works; executing it manually produces an error:
EXECUTE PROCEDURE reset_insert_table();
ERROR: syntax error at or near "execute"
I also tried CALL and SELECT.
I created the same function for testing, but declared it RETURNS VOID instead of RETURNS TRIGGER, and that version works.
Can anyone tell me why my solution is not working? Do triggers on foreign tables even see events that happen on the foreign server?
According to your comments, you seem to be using logical replication.
While data modifications are replayed on the subscriber with logical replication, the parameter session_replication_role is set to replica, which keeps ordinary triggers and foreign key constraints from firing.
If you want a trigger to be triggered by the replay of data via logical replication, you have to declare it as a replica trigger:
ALTER TABLE a2 ENABLE REPLICA TRIGGER trigger_name;

How to properly emulate statement level triggers with access to data in postgres

I am using PostgreSQL as my database for a project at work. We use triggers in quite a few places to either maintain computed columns, or tables that essentially act as a materialized view.
All this worked just fine when simply utilizing row-level triggers to keep everything in sync. However, when we wrote scripts to periodically import our customers' data into the database, we ran into issues with performance and with the number of locks taken in a single transaction.
To alleviate this I wanted to create a statement-level trigger with access to the modified rows (inserted, updated or deleted). However as this is not possible I instead created a BEFORE statement-level trigger that would create a temporary table. Then an AFTER row-level trigger that would insert the changed data into the temporary table. At last an AFTER statement-level trigger that would read the changes and perform necessary updates, and then drop the temporary table.
All this works just fine, assuming that within the triggers, no one would re-trigger the same flow again (as the temporary table would then already exist).
However, I then learned that a foreign key constraint with ON DELETE SET NULL is simply implemented with a system trigger that sets the column to NULL. That is not a problem in itself, except when you have several such foreign key constraints on a single table, all referencing the same table (let's just call it files). When a row is deleted from the files table, the system triggers that handle the ON DELETE SET NULL clauses all fire for that one delete, each as its own statement, which presents a serious issue for me.
How would I go about implementing something like this? Here is a short SQL script to illustrate the problem:
CREATE TABLE files (
id serial PRIMARY KEY,
"name" TEXT NOT NULL
);
CREATE TABLE profiles (
id serial PRIMARY KEY,
NAME TEXT NOT NULL,
cv_file_id INT REFERENCES files(id) ON DELETE SET NULL,
photo_file_id INT REFERENCES files(id) ON DELETE SET NULL
);
CREATE TABLE profile_audit (
profile_id INT NOT NULL,
modified_at timestamptz NOT NULL
);
CREATE FUNCTION pre_stmt_create_temp_table()
RETURNS TRIGGER
AS $$
BEGIN
CREATE TEMPORARY TABLE tmp_modified_profiles (
id INT NOT NULL
) ON COMMIT DROP;
RETURN NULL;
END;
$$ LANGUAGE 'plpgsql';
CREATE FUNCTION insert_modified_profile_to_temp_table()
RETURNS TRIGGER
AS $$
BEGIN
INSERT INTO tmp_modified_profiles(id) VALUES (NEW.id);
RETURN NULL;
END;
$$ LANGUAGE 'plpgsql';
CREATE FUNCTION post_stmt_insert_rows_and_drop_temp_table()
RETURNS TRIGGER
AS $$
BEGIN
INSERT INTO profile_audit (profile_id, modified_at)
SELECT t.id, CURRENT_TIMESTAMP FROM tmp_modified_profiles t;
DROP TABLE tmp_modified_profiles;
RETURN NULL;
END;
$$ LANGUAGE 'plpgsql';
CREATE TRIGGER tr_create_working_table BEFORE UPDATE ON profiles FOR EACH STATEMENT EXECUTE PROCEDURE pre_stmt_create_temp_table();
CREATE TRIGGER tr_insert_row_to_working_table AFTER UPDATE ON profiles FOR EACH ROW EXECUTE PROCEDURE insert_modified_profile_to_temp_table();
CREATE TRIGGER tr_insert_modified_rows_and_drop_working_table AFTER UPDATE ON profiles FOR EACH STATEMENT EXECUTE PROCEDURE post_stmt_insert_rows_and_drop_temp_table();
INSERT INTO files ("name") VALUES ('photo.jpg'), ('my_cv.pdf');
INSERT INTO profiles ("name") VALUES ('John Doe');
DELETE FROM files WHERE "name" = 'photo.jpg';
It would be a serious hack, but meanwhile, until PostgreSQL 9.5 is out, I would try to use CONSTRAINT triggers deferred to the end of the transaction. I am not really sure this will work, but might be worth trying.
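A sketch of that idea, re-declaring the row-level trigger from the script above as a deferrable constraint trigger (syntax only; as noted, whether deferring it actually avoids the problem would need testing):
-- Assumes the plain row-level trigger of the same name has been dropped first.
CREATE CONSTRAINT TRIGGER tr_insert_row_to_working_table
AFTER UPDATE ON profiles
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE PROCEDURE insert_modified_profile_to_temp_table();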
You could use a status column to track inserts and updates for your statement-level triggers.
In a BEFORE INSERT OR UPDATE row-level trigger:
NEW.status := TG_OP;
Now you can use statement-level AFTER triggers:
BEGIN
DO FUNNY THINGS
WHERE status = 'INSERT';
-- reset the status
UPDATE mytable
SET status = NULL
WHERE status = 'INSERT';
END;
However, if you want to deal with deletes as well, you'll need something like this in your row-level trigger:
INSERT INTO status_table (table_name, op, id) VALUES (TG_TABLE_NAME, TG_OP, OLD.id);
Then, in your statement-level AFTER trigger, you can go like:
BEGIN
DO FUNNY THINGS
WHERE id IN (SELECT id FROM status_table
WHERE table_name = TG_TABLE_NAME AND op = TG_OP); -- just an example
-- reset the status
DELETE FROM status_table
WHERE table_name = TG_TABLE_NAME AND op = TG_OP;
END;
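To make the second variant concrete, here is a minimal sketch of the pieces it assumes; the status_table layout, the function name log_row_change, and attaching it to the profiles table from the question are all illustrative, not from the original answer:
-- Change-log table written to by the row-level triggers.
CREATE TABLE status_table (
    table_name text NOT NULL,
    op         text NOT NULL,
    id         integer NOT NULL
);

-- Row-level trigger function recording which rows changed and how.
CREATE FUNCTION log_row_change() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO status_table (table_name, op, id)
        VALUES (TG_TABLE_NAME, TG_OP, OLD.id);
        RETURN OLD;
    ELSE
        INSERT INTO status_table (table_name, op, id)
        VALUES (TG_TABLE_NAME, TG_OP, NEW.id);
        RETURN NEW;
    END IF;
END;
$$;

CREATE TRIGGER tr_log_row_change
AFTER INSERT OR UPDATE OR DELETE ON profiles
FOR EACH ROW EXECUTE PROCEDURE log_row_change();
Unlike the temp-table approach, status_table is a regular table, so re-entrant firing (for example from the ON DELETE SET NULL system triggers) does not hit a "table already exists" error.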

postgres autoincrement not updated on explicit id inserts

I have the following table in postgres:
CREATE TABLE "test" (
"id" serial NOT NULL PRIMARY KEY,
"value" text
)
I am doing following insertions:
insert into test (id, value) values (1, 'alpha')
insert into test (id, value) values (2, 'beta')
insert into test (value) values ('gamma')
In the first 2 inserts I am explicitly mentioning the id. However the table's auto increment pointer is not updated in this case. Hence in the 3rd insert I get the error:
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (id)=(1) already exists.
I never faced this problem in MySQL with either the MyISAM or InnoDB engine. Explicit or not, MySQL always updates the auto-increment counter based on the maximum row id.
What is the workaround for this problem in postgres? I need it because I want a tighter control for some ids in my table.
UPDATE:
I need it because for some values I need to have a fixed id. For other new entries I don't mind creating new ones.
I think it may be possible to manually advance the sequence to max(id) + 1 whenever I explicitly insert ids, but I am not sure how to do that.
That's how it's supposed to work - nextval('test_id_seq') is only called when the system needs a value for this column and you have not provided one. If you provide a value, no such call is performed, and consequently the sequence is not "updated".
You could work around this by manually setting the value of the sequence after your last insert with explicitly provided values:
SELECT setval('test_id_seq', (SELECT MAX(id) from "test"));
The name of the sequence is autogenerated and is always tablename_columnname_seq.
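If you would rather not hard-code that name, pg_get_serial_sequence looks it up from the table and column names; this is equivalent to the setval call above:
SELECT setval(pg_get_serial_sequence('test', 'id'), (SELECT MAX(id) FROM "test"));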
In recent versions of Django, this topic is discussed in the documentation:
Django uses PostgreSQL’s SERIAL data type to store auto-incrementing primary keys. A SERIAL column is populated with values from a sequence that keeps track of the next available value. Manually assigning a value to an auto-incrementing field doesn’t update the field’s sequence, which might later cause a conflict.
Ref: https://docs.djangoproject.com/en/dev/ref/databases/#manually-specified-autoincrement-pk
There is also the management command manage.py sqlsequencereset app_label ..., which can generate SQL statements for resetting sequences for the given app name(s).
Ref: https://docs.djangoproject.com/en/dev/ref/django-admin/#django-admin-sqlsequencereset
For example these SQL statements were generated by manage.py sqlsequencereset my_app_in_my_project:
BEGIN;
SELECT setval(pg_get_serial_sequence('"my_project_aaa"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_aaa";
SELECT setval(pg_get_serial_sequence('"my_project_bbb"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_bbb";
SELECT setval(pg_get_serial_sequence('"my_project_ccc"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_ccc";
COMMIT;
This can also be done automatically with a trigger. That way the sequence is always kept in line with the largest id in the table, so the next default value is max(id) + 1.
CREATE OR REPLACE FUNCTION set_serial_id_seq()
RETURNS trigger AS
$BODY$
BEGIN
    -- TG_ARGV[0] is the name of the serial column; the default sequence name
    -- is <table>_<column>_seq, so the setval call can be built dynamically.
    EXECUTE (FORMAT('SELECT setval(''%s_%s_seq'', (SELECT MAX(%s) FROM %s));',
                    TG_TABLE_NAME,
                    TG_ARGV[0],
                    TG_ARGV[0],
                    TG_TABLE_NAME));
    RETURN NULL;  -- the return value of an AFTER statement-level trigger is ignored
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER set_mytable_id_seq
AFTER INSERT OR UPDATE OR DELETE
ON mytable
FOR EACH STATEMENT
EXECUTE PROCEDURE set_serial_id_seq('mytable_id');
The function can be reused for multiple tables: change mytable to the table of interest and pass the name of its serial column (here mytable_id) as the trigger argument.
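For the test table from the question, attaching the function would look roughly like this (assuming, as the FORMAT call above implies, that the trigger argument is the serial column's name):
CREATE TRIGGER set_test_id_seq
AFTER INSERT OR UPDATE OR DELETE
ON test
FOR EACH STATEMENT
EXECUTE PROCEDURE set_serial_id_seq('id');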
For more info regarding triggers:
https://www.postgresql.org/docs/9.1/plpgsql-trigger.html
https://www.postgresql.org/docs/9.1/sql-createtrigger.html

Need help writing a PostgreSQL trigger function

I have two tables representing two different types of imagery. I am using PostGIS to represent the boundaries of those images. Here is a simplified example of those tables:
CREATE TABLE img_format_a (
id SERIAL PRIMARY KEY,
file_path VARCHAR(1000),
boundary GEOGRAPHY(POLYGON, 4326)
);
CREATE TABLE img_format_p (
id SERIAL PRIMARY KEY,
file_path VARCHAR(1000),
boundary GEOGRAPHY(POLYGON, 4326)
);
I also have a cross reference table, which I want to contain all the IDs of the images that overlap each other. Whenever an image of type "A" gets inserted into the database, I want to check to see whether it overlaps any of the existing imagery of type "P" (and vice versa) and insert corresponding entries into the img_a_img_p cross reference table. This table should represent a many-to-many relationship.
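For reference, one possible shape for that cross-reference table; the column names img_a_id and img_p_id are assumptions, but they match the ones used in the answer below:
CREATE TABLE img_a_img_p (
    img_a_id INT NOT NULL REFERENCES img_format_a(id) ON DELETE CASCADE,
    img_p_id INT NOT NULL REFERENCES img_format_p(id) ON DELETE CASCADE,
    PRIMARY KEY (img_a_id, img_p_id)
);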
My first instinct is to write a trigger to manage this img_a_img_p table. I've never created a trigger before, so let me know if this is a silly thing to do, but it seems to make sense to me. So I create the following trigger:
CREATE TRIGGER update_a_p_cross_reference
AFTER INSERT OR DELETE OR UPDATE OF boundary
ON img_format_p FOR EACH ROW
EXECUTE PROCEDURE check_p_cross_reference();
The part where I am getting stuck is with writing the trigger function. My code is in Java and I see that there are tools like PL/pgSQL, but I'm not sure if that's what I should use or if I even need one of those special add-ons.
Essentially all I need the trigger to do is update the cross-reference table each time a new image gets inserted into either img_format_a or img_format_p. When a new image is inserted, I would like to use a PostGIS function like ST_Intersects to determine whether the new image overlaps any of the images in the other table. For each image pair where ST_Intersects returns true, I would like to insert a new entry into img_a_img_p with the IDs of both images. Can someone help me figure out how to write this trigger function? Here is some pseudocode:
SELECT * FROM img_format_p P
WHERE ST_Intersects(A.boundary, P.boundary);
for each match in selection {
INSERT INTO img_a_img_p VALUES (A.id, P.id);
}
You could wrap the usual INSERT ... SELECT idiom in a PL/pgSQL function sort of like this:
create function check_p_cross_reference() returns trigger as
$$
begin
    insert into img_a_img_p (img_a_id, img_p_id)
    select a.id, p.id
    from img_format_a a, img_format_p p
    where p.id = NEW.id
    and ST_Intersects(a.boundary, p.boundary);
    return null;
end;
$$ language plpgsql;
Triggers have two extra variables, NEW and OLD:
NEW
Data type RECORD; variable holding the new database row for INSERT/UPDATE operations in row-level triggers. This variable is NULL in statement-level triggers and for DELETE operations.
OLD
Data type RECORD; variable holding the old database row for UPDATE/DELETE operations in row-level triggers. This variable is NULL in statement-level triggers and for INSERT operations.
So you can use NEW.id to access the new img_format_p value that's going in. You (currently) can't use the plain SQL language for triggers:
It is not currently possible to write a trigger function in the plain SQL function language.
but PL/pgSQL is pretty close. This would make sense as an AFTER INSERT trigger:
CREATE TRIGGER update_a_p_cross_reference
AFTER INSERT
ON img_format_p FOR EACH ROW
EXECUTE PROCEDURE check_p_cross_reference();
Deletes could be handled with a foreign key on img_a_img_p and a cascading delete. You could use your trigger for UPDATEs as well:
CREATE TRIGGER update_a_p_cross_reference
AFTER INSERT OR UPDATE OF boundary
ON img_format_p FOR EACH ROW
EXECUTE PROCEDURE check_p_cross_reference();
but you'd probably want to clear out the old entries before inserting the new ones with something like:
delete from img_a_img_p where img_p_id = NEW.id;
before the INSERT...SELECT statement.
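Putting those pieces together, the UPDATE-aware version of the trigger function might look like this (a sketch based on the snippets above; it joins only img_format_a and uses NEW.boundary directly instead of re-reading img_format_p):
create or replace function check_p_cross_reference() returns trigger as
$$
begin
    -- On UPDATE, discard the pairs computed for the old boundary first.
    if TG_OP = 'UPDATE' then
        delete from img_a_img_p where img_p_id = NEW.id;
    end if;

    -- Record every type-A image whose boundary intersects the new one.
    insert into img_a_img_p (img_a_id, img_p_id)
    select a.id, NEW.id
    from img_format_a a
    where ST_Intersects(a.boundary, NEW.boundary);

    return null;
end;
$$ language plpgsql;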