duplicate key value error that is difficult to understand - postgresql

I have a table:
user_id | project_id | permission
--------------------------------------+--------------------------------------+------------
5911e84b-ab6f-4a51-942e-dab979882725 | b4f6d926-ac69-461f-9fd7-1992a1b1c5bc | owner
7e3581a4-f542-4abc-bbda-36fb91ea4bff | eff09e2a-c54b-4081-bde5-68de5d32dd73 | owner
46f9f2e3-edd1-40df-aa52-4bdc354abd38 | 59df2db8-5067-4bc2-b268-3fb1308d9d41 | owner
9089038d-4b77-4774-a095-a621fb73059a | 4f26ace1-f072-42d0-bd0d-ffbae9103b3f | owner
5911e84b-ab6f-4a51-942e-dab979882725 | 59df2db8-5067-4bc2-b268-3fb1308d9d41 | rw
I have a trigger on update:
--------------------------------------------------------------------------------
-- trigger that consumes the queue once the user responds
\set obj_name 'sharing_queue_on_update_trigger'
create or replace function :obj_name()
returns trigger as $$
begin
    if new.status = 'accepted' then
        -- add to the user_permissions table
        insert into core.user_permissions (project_id, user_id, permission)
        values (new.project, new.grantee, new.permission);
    end if;
    -- remove from the queue
    delete from core.sharing_queue
    where core.sharing_queue.grantee = new.grantee
      and core.sharing_queue.project = new.project;
    return null;
end;
$$ language plpgsql;
create trigger "Create a user_permission entry when user accepts invitation"
after update on core.sharing_queue
for each row
when (new.status != 'awaiting')
execute procedure :obj_name();
When I run the following update:
update sharing_queue set status='accepted' where project = 'eff09e2a-c54b-4081-bde5-68de5d32dd73';
The record in the following queue is supposed to feed a new record into the first table presented.
grantor | maybe_grantee_email | project | permission | creation_date | grantee | status
--------------------------------------+---------------------+--------------------------------------+------------+---------------+--------------------------------------+----------
7e3581a4-f542-4abc-bbda-36fb91ea4bff | edmund#gmail.com | eff09e2a-c54b-4081-bde5-68de5d32dd73 | rw | | 46f9f2e3-edd1-40df-aa52-4bdc354abd38 | awaiting
(1 row)
Specifically, the grantee with the id ending in 38 and the project_id ending in 73 is supposed to feed a new record into the first table.
However, I get the following duplicate key error:
ERROR: duplicate key value violates unique constraint "pk_project_permissions_id"
DETAIL: Key (user_id, project_id)=(46f9f2e3-edd1-40df-aa52-4bdc354abd38, eff09e2a-c54b-4081-bde5-68de5d32dd73) already exists.
CONTEXT: SQL statement "insert into core.user_permissions (project_id, user_id, permission)
values (new.project, new.grantee, new.permission)
returning new"
I don't see how I'm violating the index. There is no record with the user and project combination in the first table presented. Right?
I'm new to using triggers this much. I'm wondering if somehow I might be triggering a "double" entry that cancels the transaction.
Any pointers would be greatly appreciated.
Requested Addendum
Here is the schema for user_permissions
--------------------------------------------------------------------------------
-- 📖 user_permissions
drop table if exists user_permissions;
create table user_permissions (
    user_id uuid not null,
    project_id uuid not null,
    permission project_permission not null,
    constraint pk_project_permissions_id primary key (user_id, project_id)
);
comment on column user_permissions.permission is 'Enum owner | rw | read';
comment on table user_permissions is 'Cannot add users directly; use sharing_queue';
-- ⚠️ deleted when the user is deleted
alter table user_permissions
    add constraint fk_permissions_users
    foreign key (user_id) references users(id)
    on delete cascade;
-- ⚠️ deleted when the project is deleted
alter table user_permissions
    add constraint fk_permissions_projects
    foreign key (project_id) references projects(id)
    on delete cascade;

Depending on the contents of the queue, the issue may be that your trigger doesn't require the record to have actually changed:
create trigger "Create a user_permission entry when user accepts invitation"
after update on core.sharing_queue
for each row
when ((new.status != 'awaiting')
and (old.status IS DISTINCT FROM new.status))
execute procedure :obj_name();
Without the distinct check, the trigger runs once for each row where project = 'eff09e2a-c54b-4081-bde5-68de5d32dd73', even for rows whose status is already 'accepted', so the insert can fire more than once for the same (user_id, project_id) key.

The suggestions were helpful as they inspired the direction of the subsequent implementation. The initial fix using @TonyArra's additional when clause seemed to do the trick. The clause was no longer required once I created a series of on conflict UPSERT contingencies.
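For reference, a minimal sketch of what such an upsert might look like inside the trigger function (the conflict target matches the pk_project_permissions_id primary key; whether to overwrite or keep the existing permission is a design choice, not something taken from the code above):
-- sketch: resolve the duplicate instead of erroring out
insert into core.user_permissions (project_id, user_id, permission)
values (new.project, new.grantee, new.permission)
on conflict (user_id, project_id)
do update set permission = excluded.permission;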


How to implement dml error logging in Postgresql?

In the context of data warehousing, an ETL process must have a strategy for error handling. On that front, Oracle has a great DML error logging feature that lets you insert/merge/update a million records without failing or rolling back when a constraint violation occurs on one or more rows; the offending rows can be logged in a dedicated error table. After that you can investigate what is wrong with each row and correct the errors before repeating the insert/merge/update.
Is there any way to implement this feature in PostgreSQL?
Since there is nothing built in and no useful extension exists, I searched for a solution based on a PL/pgSQL procedure and eventually found one. It works well in my use case, where some CSV files must be loaded once a month into a staging DB using foreign tables.
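For context, the foreign-table setup I mean looks roughly like this (a sketch only: the server name, table name and CSV path are placeholders, not my real objects; file_fdw ships with PostgreSQL):
-- sketch: read a monthly CSV through file_fdw
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_srv FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE t2_csv (f1 int, f2 int, f3 numeric)
    SERVER csv_srv
    OPTIONS (filename '/path/to/monthly_load.csv', format 'csv', header 'true');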
In the following test some records are inserted into the destination tables while other records that break an integrity constraint are inserted in an error table along with the error info.
test=# create table t1(c1 int primary key);
create table t2( f1 int ,f2 int, f3 numeric);
insert into t1 values(2),(11),(5),(12);
insert into t2 values(100,2,234),(57,11,25),(5,5,1231),(2,2,173),(2,12,240),(11,22,101),(3,12,99);
create table t3 as select * from t2 where 1+1=11; -- always-false predicate: copies t2's structure with zero rows
alter table t3 add constraint t3_pk primary key(f1),add foreign key (f2) references t1(c1),add constraint f3_ck check(f3>100);
create table t3$err(f1 int,f2 int,f3 numeric, error_code varchar, error_message varchar, constraint_name varchar);
test=# do
$$
declare
    rec Record;
    v_err_code text;
    v_err_message text;
    v_constraint text;
begin
    for rec in
        select f1, f2, f3
        from t2 -- in my use case this is the foreign table reading a csv file
    loop
        begin
            insert into t3
            values (rec.f1, rec.f2, rec.f3);
        exception
            when others then
                get stacked diagnostics
                    v_err_code = returned_sqlstate,
                    v_err_message = MESSAGE_TEXT,
                    v_constraint = CONSTRAINT_NAME;
                if left(v_err_code, 2) = '23' then -- exception Class 23: Integrity Constraint Violation
                    insert into t3$err
                    values (rec.f1, rec.f2, rec.f3,
                            v_err_code, v_err_message, v_constraint);
                    raise notice 'record % inserted in error table', rec;
                else
                    raise; -- anything else is unexpected: re-raise it instead of swallowing it
                end if;
        end;
    end loop;
exception
    when others then -- outer exceptions different from constraint violations
        get stacked diagnostics v_err_code = returned_sqlstate;
        raise notice 'sqlstate: %', v_err_code;
end;
$$;
NOTICE: record (57,11,25) inserted in error table
NOTICE: record (2,12,240) inserted in error table
NOTICE: record (11,22,101) inserted in error table
NOTICE: record (3,12,99) inserted in error table
test=# select * from t3;
f1 | f2 | f3
-----+----+------
100 | 2 | 234
5 | 5 | 1231
2 | 2 | 173
(3 rows)
test=# select * from t3$err;
f1 | f2 | f3 | error_code | error_message | constraint_name
----+----+-----+------------+-----------------------------------------------------------------------------+-----------------
57 | 11 | 25 | 23514 | new row for relation "t3" violates check constraint "f3_ck" | f3_ck
2 | 12 | 240 | 23505 | duplicate key value violates unique constraint "t3_pk" | t3_pk
11 | 22 | 101 | 23503 | insert or update on table "t3" violates foreign key constraint "t3_f2_fkey" | t3_f2_fkey
3 | 12 | 99 | 23514 | new row for relation "t3" violates check constraint "f3_ck" | f3_ck
(4 rows)
All the magic happens within the nested BEGIN..END block: each row that passes the constraints is inserted into the target table, and each row that fails is inserted into the error table.
The above solution has many limitations, such as:
the Oracle feature mentioned in the question is fully integrated with SQL (apart from the PL/SQL preliminaries needed to create the error table), while here a PL/pgSQL procedure is needed,
iterating over all the records is not the most efficient way to load data compared with a set-based bulk insert (see the sketch after this list),
moreover, the loop carries an overhead due to the context switch between the procedural environment and the SQL environment,
the error handling is not generic but must address specific errors,
when a record has more than one error, only one of them is inserted in the error table (there could be a solution for this point).
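On the bulk-loading point: when the only expected problem is duplicate keys, a set-based load with ON CONFLICT avoids the loop entirely. The sketch below pre-filters the check and foreign-key violations by hand (otherwise a single bad row would abort the whole statement) and lets ON CONFLICT drop the duplicates; note that the rejected rows are simply lost instead of being logged:
-- set-based sketch for the same test data; no error table is populated
insert into t3
select f1, f2, f3
from t2
where f3 > 100                     -- satisfies check constraint f3_ck
  and f2 in (select c1 from t1)    -- satisfies the foreign key on f2
on conflict on constraint t3_pk do nothing;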

ERROR: more than one owned sequence found in Postgres

I'm adding an identity column to the existing Patient table.
Here I would like to use GENERATED ALWAYS AS IDENTITY.
So I set up the identity column with the following statement (previously it was serial):
ALTER TABLE Patient ALTER PatientId
ADD GENERATED ALWAYS AS IDENTITY (START WITH 1);
The existing Patient table has a total of 5 records (patientId 1 to 5).
When I insert a new record after the identity setup, it will throw an error like:
more than one owned sequence found
Even after resetting the identity column, I still get the same error.
ALTER TABLE Patient ALTER COLUMN PatientId RESTART WITH 6;
Let me know if you have any solutions.
Update: This bug has been fixed in PostgreSQL v12 with commit 19781729f78.
The rest of the answer is relevant for older versions.
A serial column has a sequence that is owned by the column and a DEFAULT value that gets the next sequence value.
If you try to change that column into an identity column, you'll get an error that there is already a default value for the column.
Now you must have dropped the default value, but not the sequence that belongs to the serial column. Then when you converted the column into an identity column, a second sequence owned by the column was created.
Now when you try to insert a row, PostgreSQL tries to find and use the sequence owned by the column, but there are two, hence the error message.
I'd argue that this is a bug in PostgreSQL: in my opinion, it should either have repurposed the existing sequence for the identity column or given you an error that there is already a sequence owned by the column, and you should drop it. I'll try to get this bug fixed.
Meanwhile, you should manually drop the sequence left behind from the serial column.
Run the following query:
SELECT d.objid::regclass
FROM pg_depend AS d
JOIN pg_attribute AS a ON d.refobjid = a.attrelid AND
d.refobjsubid = a.attnum
WHERE d.classid = 'pg_class'::regclass
AND d.refclassid = 'pg_class'::regclass
AND d.deptype <> 'i'
AND a.attname = 'patientid'
AND d.refobjid = 'patient'::regclass;
That should give you the name of the sequence left behind from the serial column. Drop it, and the identity column should behave as desired.
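For example, if the leftover sequence has the name the usual serial naming convention would produce (a placeholder here; use whatever name the query above actually returned):
DROP SEQUENCE patient_patientid_seq;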
This is not an answer -- apologies, but this allows me to show, with a vivid image, the crazy behavior that I (unintentionally) uncovered this morning...
All I had to do was this:
alter TABLE db.generic_items alter column generic_item_id drop default;
alter TABLE db.generic_items alter column generic_item_id add generated by default as identity;
and now when scripting the table to SQL I get (abbreviated):
CREATE TABLE db.generic_items
(
generic_item_id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
generic_item_id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
generic_item_name character varying(50) COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT pk_generic_items PRIMARY KEY (generic_item_id),
)
I am thankful for the answer posted above, by Laurenz Albe! As he explains, just delete the sequence that was used for the serial default, and this craziness goes away and the table looks normal again.
Again, this is NOT AN ANSWER, but commenting did not let me add enough text.
Apology. Continues from my earlier comment(s).
This is what I executed, and it shows, imo, that the manual fix is not sufficient; with large tables, the repetitive trick I used (see below) would be impractical and potentially wrong because it could adopt an id belonging to a deleted row.
-- pls disregard the absence of 2 id rows, this is the final situation
\d vaste_data.studie_type
Table "vaste_data.studie_type"
Column | Type | Collation | Nullable | Default
--------+-----------------------+-----------+----------+----------------------------------
id | integer | | not null | generated by default as identity
naam | character varying(25) | | not null |
Indexes:
"pk_tstudytype_tstudytype_id" PRIMARY KEY, btree (id)
Referenced by:
TABLE "stuwadoors" CONSTRAINT "fk_t_stuwadoors_t_studytype" FOREIGN KEY (study_type_id) REFERENCES vaste_data.studie_type(id)
TABLE "psux" CONSTRAINT "study_studytype_fk" FOREIGN KEY (studie_type_id) FOREIGN KEY (studie_type_id) REFERENCES vaste_data.studie_type(id)
alter table vaste_data.studie_type alter column id drop default;
ALTER TABLE
alter table vaste_data.studie_type alter column id add generated by default as identity;
ALTER TABLE
-- I chose to show both sequences so I could try to drop either one.
SELECT d.objid::regclass
FROM pg_depend AS d
JOIN pg_attribute AS a ON d.refobjid = a.attrelid AND
d.refobjsubid = a.attnum
WHERE d.classid = 'pg_class'::regclass
AND d.refclassid = 'pg_class'::regclass
AND a.attname = 'id'
AND d.refobjid = 'vaste_data.studie_type'::regclass;
objid
-----------------------------------------
vaste_data.studie_type_id_seq
vaste_data.tstudytype_tstudytype_id_seq
(2 rows)
drop sequence vaste_data.studie_type_id_seq;
ERROR: cannot drop sequence vaste_data.studie_type_id_seq because column id of table vaste_data.studie_type requires it
HINT: You can drop column id of table vaste_data.studie_type instead.
\d vaste_data.studie_type_id_seq
Sequence "vaste_data.studie_type_id_seq"
Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
---------+-------+---------+------------+-----------+---------+-------
integer | 1 | 1 | 2147483647 | 1 | no | 1
Sequence for identity column: vaste_data.studie_type.id
alter sequence vaste_data.studie_type_id_seq start 6;
ALTER SEQUENCE
drop sequence vaste_data.tstudytype_tstudytype_id_seq;
DROP SEQUENCE
insert into vaste_data.studie_type (naam) values('Overige leiding');
ERROR: duplicate key value violates unique constraint "pk_tstudytype_tstudytype_id"
DETAIL: Key (id)=(1) already exists.
...
ERROR: duplicate key value violates unique constraint "pk_tstudytype_tstudytype_id"
DETAIL: Key (id)=(5) already exists.
insert into vaste_data.studie_type (naam) values('Overige leiding');
INSERT 0 1
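A side note on the repetitive trick above: alter sequence ... start 6 only sets the value that a future RESTART would use; it does not move the sequence's current position, which is why the inserts had to fail through ids 1 to 5 before succeeding. Restarting the identity column itself moves the position immediately, for example:
-- moves the sequence position at once; no duplicate-key retries needed
alter table vaste_data.studie_type alter column id restart with 6;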

Using primary key & foreign key to build ltree

I am pretty new to postgres & especially new to ltree.
Searching the web for ltree brought me to examples where the tree was built by chaining characters. But I want to use the primary key & foreign key.
Therefore I built the following table:
create table fragment(
id serial primary key,
description text,
path ltree
);
create index tree_path_idx on fragment using gist (path);
Instead of A.B.G I want to have 1.3.5.
A root in the examples online is added like so:
insert into fragment (description, path) values ('A', 'A');
Instead of A I want to have the primary key (which I don't know at that moment). Is there a way to do that?
When adding a child I got the same problem:
insert into tree (letter, path) values ('B', '0.??');
I know the id of the parent but not of the child that I want to append.
Is there a way to do that or am I completely off track?
Thank you very much!
You could create a trigger which modifies path before each insert. For example, using this setup:
DROP TABLE IF EXISTS fragment;
CREATE TABLE fragment(
id serial primary key
, description text
, path ltree
);
CREATE INDEX tree_path_idx ON fragment USING gist (path);
Define the trigger:
CREATE OR REPLACE FUNCTION before_insert_on_fragment()
RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
    -- append this row's freshly assigned id as the last label of the path
    new.path := new.path || new.id::text;
    return new;
END $$;
DROP TRIGGER IF EXISTS before_insert_on_fragment ON fragment;
CREATE TRIGGER before_insert_on_fragment
BEFORE INSERT ON fragment
FOR EACH ROW EXECUTE PROCEDURE before_insert_on_fragment();
Test the trigger:
INSERT INTO fragment (description, path) VALUES ('A', '');
SELECT * FROM fragment;
-- | id | description | path |
-- |----+-------------+------|
-- | 1 | A | 1 |
Now insert B under id = 1:
INSERT INTO fragment (description, path) VALUES ('B', (SELECT path FROM fragment WHERE id=1));
SELECT * FROM fragment;
-- | id | description | path |
-- |----+-------------+------|
-- | 1 | A | 1 |
-- | 2 | B | 1.2 |
Insert C under B:
INSERT INTO fragment (description, path) VALUES ('C', (SELECT path FROM fragment WHERE description='B'));
SELECT * FROM fragment;
-- | id | description | path |
-- |----+-------------+-------|
-- | 1 | A | 1 |
-- | 2 | B | 1.2 |
-- | 3 | C | 1.2.3 |
For anyone checking this in the future, I had the same issue and I figured out a way to do it without triggers and within the same INSERT query:
INSERT INTO fragment (description, path)
VALUES ('description', text2ltree('1.' || currval(pg_get_serial_sequence('fragment', 'id'))));
Explanation:
We can get the id of the current insert operation using currval(pg_get_serial_sequence('fragment', 'id')), concatenate it as a string with the parent's full path ('parent_path' ||), and finally convert it to ltree using text2ltree(). The id from currval() doesn't need to be adjusted, because by the time it is called during the INSERT the sequence has already been incremented.
One edge case to be aware of: when you insert a node without any parent, you can't just drop the string concatenation '1.' ||, because the argument to text2ltree() must be text while the id on its own is an integer. Instead you have to concatenate the id with an empty string: '' ||.
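Putting that edge case into code, a root insert looks like this (same pattern as above, just with the empty-string concatenation):
-- root node: no parent path, but text2ltree() still needs a text argument
INSERT INTO fragment (description, path)
VALUES ('root', text2ltree('' || currval(pg_get_serial_sequence('fragment', 'id'))));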
However, I prefer to create this function to get the path and clean up the insert query:
CREATE FUNCTION get_tree_path("table" TEXT, "column" TEXT, parent_path TEXT)
RETURNS LTREE
LANGUAGE PLPGSQL
AS
$$
BEGIN
    IF NOT (parent_path = '') THEN
        parent_path = parent_path || '.';
    END IF;
    RETURN text2ltree(parent_path || currval(pg_get_serial_sequence("table", "column")));
END;
$$;
Then, you can call it like this:
INSERT INTO fragment (description, path)
VALUES ('description', get_tree_path('fragment', 'id', '1.9.32'));
If you don't have any parent, then replace the parent_path '1.9.32' with empty text ''.
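For example, a root insert with the helper would be:
INSERT INTO fragment (description, path)
VALUES ('root', get_tree_path('fragment', 'id', ''));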
I came up with this; it needs the full parent path for inserts, but updates and deletes are simply cascaded :)
create table if not exists tree
(
    -- primary key
    id serial,
    -- surrogate key
    path ltree generated always as (coalesce(parent_path::text,'')::ltree || id::text::ltree) stored unique,
    -- foreign key
    parent_path ltree,
    constraint fk_parent
        foreign key(parent_path)
        references tree(path)
        on delete cascade
        on update cascade,
    -- content
    name text
);
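For example, inserts would look like this (a sketch assuming an empty table, so the root gets id 1):
insert into tree (parent_path, name) values (null, 'root');  -- path becomes '1'
insert into tree (parent_path, name) values ('1', 'child');  -- path becomes '1.2'
insert into tree (parent_path, name) values ('1.2', 'leaf'); -- path becomes '1.2.3'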

How to prevent insert, update and delete on inherited tables in PostgreSQL using BEFORE triggers

When using table inheritance, I would like to enforce that insert, update and delete statements should be done against descendant tables. I thought a simple way to do this would be using a trigger function like this:
CREATE FUNCTION test.prevent_action() RETURNS trigger AS $prevent_action$
BEGIN
RAISE EXCEPTION
'% on % is not allowed. Perform % on descendant tables only.',
TG_OP, TG_TABLE_NAME, TG_OP;
END;
$prevent_action$ LANGUAGE plpgsql;
...which I would reference from a trigger specified using BEFORE INSERT OR UPDATE OR DELETE.
This seems to work fine for inserts, but not for updates and deletes.
The following test sequence demonstrates what I've observed:
DROP SCHEMA IF EXISTS test CASCADE;
psql:simple.sql:1: NOTICE: schema "test" does not exist, skipping
DROP SCHEMA
CREATE SCHEMA test;
CREATE SCHEMA
-- A function to prevent anything
-- Used for tables that are meant to be inherited
CREATE FUNCTION test.prevent_action() RETURNS trigger AS $prevent_action$
BEGIN
RAISE EXCEPTION
'% on % is not allowed. Perform % on descendant tables only.',
TG_OP, TG_TABLE_NAME, TG_OP;
END;
$prevent_action$ LANGUAGE plpgsql;
CREATE FUNCTION
CREATE TABLE test.people (
person_id SERIAL PRIMARY KEY,
last_name text,
first_name text
);
psql:simple.sql:17: NOTICE: CREATE TABLE will create implicit sequence "people_person_id_seq" for serial column "people.person_id"
psql:simple.sql:17: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "people_pkey" for table "people"
CREATE TABLE
CREATE TRIGGER prevent_action BEFORE INSERT OR UPDATE OR DELETE ON test.people
FOR EACH ROW EXECUTE PROCEDURE test.prevent_action();
CREATE TRIGGER
CREATE TABLE test.students (
student_id SERIAL PRIMARY KEY
) INHERITS (test.people);
psql:simple.sql:24: NOTICE: CREATE TABLE will create implicit sequence "students_student_id_seq" for serial column "students.student_id"
psql:simple.sql:24: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "students_pkey" for table "students"
CREATE TABLE
--The trigger successfully prevents this INSERT from happening
--INSERT INTO test.people (last_name, first_name) values ('Smith', 'Helen');
INSERT INTO test.students (last_name, first_name) values ('Smith', 'Helen');
INSERT 0 1
INSERT INTO test.students (last_name, first_name) values ('Anderson', 'Niles');
INSERT 0 1
UPDATE test.people set first_name = 'Oh', last_name = 'Noes!';
UPDATE 2
SELECT student_id, person_id, first_name, last_name from test.students;
student_id | person_id | first_name | last_name
------------+-----------+------------+-----------
1 | 1 | Oh | Noes!
2 | 2 | Oh | Noes!
(2 rows)
DELETE FROM test.people;
DELETE 2
SELECT student_id, person_id, first_name, last_name from test.students;
student_id | person_id | first_name | last_name
------------+-----------+------------+-----------
(0 rows)
So I'm wondering what I've done wrong that allows updates and deletes directly against the test.people table in this example.
The trigger is set to execute FOR EACH ROW, but there are no rows in test.people itself (the data physically lives in the child table test.students), so it never fires for the UPDATE and DELETE.
As a side note, you may issue select * from ONLY test.people to list the rows of test.people that don't belong to child tables.
The solution seems easy: set a trigger FOR EACH STATEMENT instead of FOR EACH ROW, since you want to forbid the whole statement anyway.
CREATE TRIGGER prevent_action BEFORE INSERT OR UPDATE OR DELETE ON test.people
FOR EACH STATEMENT EXECUTE PROCEDURE test.prevent_action();

Using Rule to Insert Into Secondary Table Auto-Increments Sequence

To automatically add a record to a second table tying it to the first table via a unique index, I have a rule such as the following:
CREATE OR REPLACE RULE auto_insert AS ON INSERT TO user DO ALSO
INSERT INTO lastlogin (id) VALUES (NEW.userid);
This works fine if user.userid is an integer. However, if it is a sequence (e.g., type serial or bigserial), what is inserted into table lastlogin is the next sequence id. So this command:
INSERT INTO user (username) VALUES ('john');
would insert the row [1, 'john', ...] into user but the row [2, ...] into lastlogin. The following 2 workarounds do work, except that the second one consumes twice as many serials since the sequence is still auto-incrementing:
CREATE OR REPLACE RULE auto_insert AS ON INSERT TO user DO ALSO
INSERT INTO lastlogin (id) VALUES (lastval());
CREATE OR REPLACE RULE auto_insert AS ON INSERT TO user DO ALSO
INSERT INTO lastlogin (id) VALUES (NEW.userid-1);
Unfortunately, the workarounds do not work if I'm inserting multiple rows:
INSERT INTO user (username) VALUES ('john'), ('mary');
The first workaround would use the same id for every row, and the second workaround is all kinds of screwed up.
Is it possible to do this via postgresql rules or should I simply do the 2nd insertion into lastlogin myself or use a row trigger? Actually, I think the row trigger would also auto-increment the sequence when I access NEW.userid.
Forget rules altogether. They're bad.
Triggers are way better for you, and for 99% of the cases where someone thinks they need a rule. Unlike a rule, a row trigger sees the actual NEW row, so NEW.userid is the value that was already assigned, and referencing it does not advance the sequence. Try this:
create table users (
    userid serial primary key,
    username text
);
create table lastlogin (
    userid int primary key references users(userid),
    lastlogin_time timestamp with time zone
);
create or replace function lastlogin_create_id() returns trigger as $$
begin
    insert into lastlogin (userid) values (NEW.userid);
    return NEW;
end;
$$
language plpgsql volatile;
create trigger lastlogin_create_id
    after insert on users for each row execute procedure lastlogin_create_id();
Then:
insert into users (username) values ('foo'),('bar');
select * from users;
userid | username
--------+----------
1 | foo
2 | bar
(2 rows)
select * from lastlogin;
userid | lastlogin_time
--------+----------------
1 |
2 |
(2 rows)