How to implement DML error logging in PostgreSQL?

In the context of data warehousing, an ETL process must have a strategy for error handling. For that, Oracle has a great DML error logging feature that lets you insert/merge/update a million records without failing or rolling back when a constraint violation occurs on one or more rows; the offending rows can be logged in a dedicated error table. Afterwards you can investigate what is wrong with each row and correct the errors before repeating the insert/merge/update.
Is there any way to implement this feature in PostgreSQL?

Since there is nothing built in and no useful extension exists, I searched for a solution based on a PL/pgSQL procedure and eventually found one. It works well in my use case, where some CSV files must be loaded once a month into a staging DB using foreign tables.
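For reference, such a foreign table can be set up with the file_fdw extension, roughly like this (the server name, column list, and file path below are placeholders, not my actual setup):
create extension if not exists file_fdw;
create server csv_files foreign data wrapper file_fdw;
create foreign table staging_rows (f1 int, f2 int, f3 numeric)
    server csv_files
    options (filename '/path/to/monthly_load.csv', format 'csv', header 'true');
-- now "select * from staging_rows" reads the csv file directly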
In the following test, some records are inserted into the destination table while the records that break an integrity constraint are inserted into an error table along with the error info.
test=# create table t1(c1 int primary key);
create table t2(f1 int, f2 int, f3 numeric);
insert into t1 values (2),(11),(5),(12);
insert into t2 values (100,2,234),(57,11,25),(5,5,1231),(2,2,173),(2,12,240),(11,22,101),(3,12,99);
create table t3 as select * from t2 where 1+1=11; -- always false: t3 is an empty copy of t2's structure
alter table t3
    add constraint t3_pk primary key (f1),
    add foreign key (f2) references t1(c1),
    add constraint f3_ck check (f3 > 100);
create table t3$err(f1 int, f2 int, f3 numeric,
    error_code varchar, error_message varchar, constraint_name varchar);
test=# do
$$
declare
    rec record;
    v_err_code text;
    v_err_message text;
    v_constraint text;
begin
    for rec in
        select f1, f2, f3
        from t2 -- in my use case this is the foreign table reading a csv file
    loop
        begin
            insert into t3
            values (rec.f1, rec.f2, rec.f3);
        exception
            when others then
                get stacked diagnostics
                    v_err_code = returned_sqlstate,
                    v_err_message = message_text,
                    v_constraint = constraint_name;
                if left(v_err_code, 2) = '23' then -- Class 23: Integrity Constraint Violation
                    insert into t3$err
                    values (rec.f1, rec.f2, rec.f3,
                            v_err_code, v_err_message, v_constraint);
                    raise notice 'record % inserted in error table', rec;
                else
                    raise; -- re-raise anything that is not a constraint violation
                end if;
        end;
    end loop;
exception
    when others then -- errors other than constraint violations end up here
        get stacked diagnostics v_err_code = returned_sqlstate;
        raise notice 'sqlstate: %', v_err_code;
end;
$$;
NOTICE: record (57,11,25) inserted in error table
NOTICE: record (2,12,240) inserted in error table
NOTICE: record (11,22,101) inserted in error table
NOTICE: record (3,12,99) inserted in error table
test=# select * from t3;
 f1  | f2 | f3
-----+----+------
 100 |  2 |  234
   5 |  5 | 1231
   2 |  2 |  173
(3 rows)
test=# select * from t3$err;
f1 | f2 | f3 | error_code | error_message | constraint_name
----+----+-----+------------+-----------------------------------------------------------------------------+-----------------
57 | 11 | 25 | 23514 | new row for relation "t3" violates check constraint "f3_ck" | f3_ck
2 | 12 | 240 | 23505 | duplicate key value violates unique constraint "t3_pk" | t3_pk
11 | 22 | 101 | 23503 | insert or update on table "t3" violates foreign key constraint "t3_f2_fkey" | t3_f2_fkey
3 | 12 | 99 | 23514 | new row for relation "t3" violates check constraint "f3_ck" | f3_ck
(4 rows)
All the magic happens within the nested BEGIN..END block: each row that passes the constraints is inserted into the target table, otherwise it is inserted into the error table together with the error information.
The above solution has several limitations:
the Oracle feature mentioned in the question is fully integrated with SQL (apart from the PL/SQL preliminaries needed to create the error table), while here a PL/pgSQL block is needed;
iterating over the records one by one is far less efficient than a bulk load;
moreover, the loop incurs overhead from the context switches between the procedural environment and the SQL environment;
the error handling is not generic but must address specific error classes (see the sketch after this list for a partial workaround);
when a record violates more than one constraint, only one error is captured in the error table (there could be a solution for this point).
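As a partial workaround for the genericity limitation, the handler can catch the whole integrity_constraint_violation class by name and store the offending row as jsonb in a single shared error table, so no per-table error table is needed. This is only a sketch; the err_log table and its columns are my own invention, not part of the original solution:
create table err_log(
    target_table text,
    row_data jsonb,
    error_code text,
    error_message text,
    constraint_name text,
    logged_at timestamptz default clock_timestamp()
);
do
$$
declare
    rec record;
    v_err_code text;
    v_err_message text;
    v_constraint text;
begin
    for rec in select f1, f2, f3 from t2
    loop
        begin
            insert into t3 values (rec.f1, rec.f2, rec.f3);
        exception
            -- the class-level condition name matches every error in Class 23
            when integrity_constraint_violation then
                get stacked diagnostics
                    v_err_code = returned_sqlstate,
                    v_err_message = message_text,
                    v_constraint = constraint_name;
                insert into err_log(target_table, row_data, error_code, error_message, constraint_name)
                values ('t3', to_jsonb(rec), v_err_code, v_err_message, v_constraint);
        end;
    end loop;
end;
$$;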

Related

How to deadlock an INSERT query in PostgreSQL

I have an application that I'm using to insert some data into the DB. This application has a field where you can put SQL like:
INSERT INTO public.test_table ("message") VALUES (%s::text) # %s will be used as a parameter in each iteration
What I want to check is how this application behaves in case of a deadlock. So my question is: how can I deadlock this INSERT query? What should I run to make this happen?
I'm using this table:
CREATE TABLE public."test_table" (
    "number" integer NOT NULL GENERATED ALWAYS AS IDENTITY,
    "date" time with time zone NOT NULL DEFAULT NOW(),
    "message" text,
    PRIMARY KEY ("number")
);
While I was using MariaDB I managed to create a lock timeout using:
START TRANSACTION;
UPDATE test_table SET message = 'foo';
INSERT INTO test_table (message) VALUES ('test');
DO SLEEP(60);
COMMIT;
But in PostgreSQL this doesn't even create a lock timeout.
EDIT:
Let's say I add this to the application; is it possible to get a deadlock using this:
BEGIN;
INSERT INTO public.test_table ("message") VALUES (%s::text);
I don't think you can force a deadlock with INSERTs given the table definition you have, as the primary key value is generated automatically. But if you use a manually assigned PK value (or any other unique constraint), you can get a deadlock when inserting the same unique values in different transactions. Given a table like
create table test_table
(
    id integer primary key,
    code varchar(10) not null unique
);
a deadlock is possible by following the usual recipe for deadlocks: interleaving locks on multiple resources in different orders.
The following will result in a deadlock in step #4
Step | Transaction 1 | Transaction 2
----------|-----------------------------|----------------------------------
#1 | insert into test_table |
| (id, code) |
| values |
| (1, 'one'), |
| (2, 'two'); |
----------|-----------------------------|----------------------------------
#2 | | insert into test_table
| | (id, code)
| | values
| | (3, 'three');
----------|-----------------------------|----------------------------------
| -- this waits |
#3 | insert into test_table |
| (id, code) |
| values |
| (3, 'three'); |
----------|-----------------------------|----------------------------------
#4 | | -- this results in a deadlock
| | insert into test_table
| | (id, code)
| | values
| | (2, 'two');
There are an infinite number of ways you could change it to create a deadlock, but most of them would essentially throw it away and start over with something else entirely. If you want to make as few changes as possible, then I suppose it would look something like putting a unique index on "message" and then doing:
BEGIN;
INSERT INTO public.test_table ("message") VALUES ('a');
INSERT INTO public.test_table ("message") VALUES ('b');
but you would have to run this in two different sessions at the same time, with the order of 'a' and 'b' reversed in one of them.
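Putting it together, the sequence would look roughly like this (the unique index is an assumption added on top of the original table):
-- once: make "message" conflict-prone
CREATE UNIQUE INDEX test_table_message_key ON public.test_table ("message");

-- session 1
BEGIN;
INSERT INTO public.test_table ("message") VALUES ('a');

-- session 2
BEGIN;
INSERT INTO public.test_table ("message") VALUES ('b');

-- session 1: blocks, waiting to see whether session 2 commits its 'b'
INSERT INTO public.test_table ("message") VALUES ('b');

-- session 2: waits for session 1's 'a' in turn; the deadlock is detected
-- and one of the two transactions is rolled back
INSERT INTO public.test_table ("message") VALUES ('a');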

duplicate key value error that is difficult to understand

I have a table:
user_id | project_id | permission
--------------------------------------+--------------------------------------+------------
5911e84b-ab6f-4a51-942e-dab979882725 | b4f6d926-ac69-461f-9fd7-1992a1b1c5bc | owner
7e3581a4-f542-4abc-bbda-36fb91ea4bff | eff09e2a-c54b-4081-bde5-68de5d32dd73 | owner
46f9f2e3-edd1-40df-aa52-4bdc354abd38 | 59df2db8-5067-4bc2-b268-3fb1308d9d41 | owner
9089038d-4b77-4774-a095-a621fb73059a | 4f26ace1-f072-42d0-bd0d-ffbae9103b3f | owner
5911e84b-ab6f-4a51-942e-dab979882725 | 59df2db8-5067-4bc2-b268-3fb1308d9d41 | rw
I have a trigger on update:
--------------------------------------------------------------------------------
-- trigger that consumes the queue once the user responds
\set obj_name 'sharing_queue_on_update_trigger'
create or replace function :obj_name()
returns trigger as $$
begin
    if new.status = 'accepted' then
        -- add to the user_permissions table
        insert into core.user_permissions (project_id, user_id, permission)
        values (new.project, new.grantee, new.permission);
    end if;
    -- remove from the queue
    delete from core.sharing_queue
    where core.sharing_queue.grantee = new.grantee
      and core.sharing_queue.project = new.project;
    return null;
end;
$$ language plpgsql;
create trigger "Create a user_permission entry when user accepts invitation"
    after update on core.sharing_queue
    for each row
    when (new.status != 'awaiting')
    execute procedure :obj_name();
When I run the following update:
update sharing_queue set status='accepted' where project = 'eff09e2a-c54b-4081-bde5-68de5d32dd73';
The record in the following queue is supposed to feed a new record into the first table presented.
grantor | maybe_grantee_email | project | permission | creation_date | grantee | status
--------------------------------------+---------------------+--------------------------------------+------------+---------------+--------------------------------------+----------
7e3581a4-f542-4abc-bbda-36fb91ea4bff | edmund@gmail.com | eff09e2a-c54b-4081-bde5-68de5d32dd73 | rw | | 46f9f2e3-edd1-40df-aa52-4bdc354abd38 | awaiting
(1 row)
Specifically, the grantee with the id ending in 38 and the project ending in 73 is supposed to feed a new record into the first table.
However, I get the following duplicate index error:
ERROR: duplicate key value violates unique constraint "pk_project_permissions_id"
DETAIL: Key (user_id, project_id)=(46f9f2e3-edd1-40df-aa52-4bdc354abd38, eff09e2a-c54b-4081-bde5-68de5d32dd73) already exists.
CONTEXT: SQL statement "insert into core.user_permissions (project_id, user_id, permission)
values (new.project, new.grantee, new.permission)
returning new"
I don't see how I'm violating the index. There is no record with the user and project combination in the first table presented. Right?
I'm new to using triggers this much. I'm wondering if somehow I might be triggering a "double" entry that cancels the transaction.
Any pointers would be greatly appreciated.
Requested Addendum
Here is the schema for user_permissions
--------------------------------------------------------------------------------
-- 📖 user_permissions
drop table if exists user_permissions;
create table user_permissions (
    user_id uuid not null,
    project_id uuid not null,
    permission project_permission not null,
    constraint pk_project_permissions_id primary key (user_id, project_id)
);
comment on column user_permissions.permission is 'Enum owner | rw | read';
comment on table user_permissions is 'Cannot add users directly; use sharing_queue';
-- ⚠️ deleted when the user is deleted
alter table user_permissions
add constraint fk_permissions_users
foreign key (user_id) references users(id)
on delete cascade;
-- ⚠️ deleted when the project is deleted
alter table user_permissions
add constraint fk_permissions_projects
foreign key (project_id) references projects(id)
on delete cascade;
Depending on the contents of the queue, the issue may be that your trigger doesn't check that the record actually changed:
create trigger "Create a user_permission entry when user accepts invitation"
after update on core.sharing_queue
for each row
when ((new.status != 'awaiting')
and (old.status IS DISTINCT FROM new.status))
execute procedure :obj_name();
Without the distinct check, the trigger would run once for each row where project = 'eff09e2a-c54b-4081-bde5-68de5d32dd73'.
The suggestions were helpful as they inspired the direction of the subsequent implementation. The initial fix using @TonyArra's additional WHEN clause seemed to do the trick. The clause was no longer required once I created a series of ON CONFLICT upsert contingencies, as sketched below.
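For reference, the UPSERT variant of the trigger's insert would look roughly like this (a sketch; it assumes that overwriting the existing permission is the desired conflict resolution):
insert into core.user_permissions (project_id, user_id, permission)
values (new.project, new.grantee, new.permission)
on conflict (user_id, project_id)
do update set permission = excluded.permission;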

Dynamically get column names using OLD in triggers

I want to write a generic trigger function (Postgres procedure). There are many main tables like TableA, TableB, etc., with corresponding audit tables TableA_Audit, TableB_Audit, respectively. The structure is given below.
TableA and TableA_Audit have the columns aa integer and ab integer.
TableB and TableB_Audit have the column ba integer.
Similarly there can be many more main tables along with their audit tables.
The requirement is that if any main table gets updated, the old rows should be inserted into its respective audit table.
e.g. if TableA has entries like this:
 aa | ab
----+----
  5 | 10
and I then run an update like
update TableA set aa = aa + 15;
then the old values for TableA should be inserted into the TableA_Audit table like below.
TableA_Audit contains:
 aa | ab
----+----
  5 | 10
To facilitate the above scenario I have written a generic function called insert_in_audit. Whenever there is an update in any of the main tables, the function insert_in_audit should be called. The function should achieve the following:
dynamically insert entries into the audit table corresponding to the main table; if there is an update in TableB, then entries should be inserted only into TableB_Audit.
So far I have managed to get the names of all the columns of the main table where the update happened.
e.g. for the query update TableA set aa = aa + 15, I am able to get all the column names of TableA in a varchar array:
column_names varchar[] := '{"aa", "ab"}';
My question is: how do I get the old values of columns aa and ab? I tried this:
foreach i in array column_names
loop
    raise notice '%', old.i;
end loop;
But the above gave me the error record "old" has no field "i". Can anyone help me get the old values?
Here is a code sample showing how you can dynamically extract values from OLD in PL/pgSQL:
CREATE FUNCTION dynamic_col() RETURNS trigger
LANGUAGE plpgsql AS
$$DECLARE
    v_col name;
    v_val text;
BEGIN
    FOREACH v_col IN ARRAY TG_ARGV
    LOOP
        -- access OLD.<column> via dynamic SQL, passing OLD in as a parameter
        EXECUTE format('SELECT (($1).%I)::text', v_col)
            USING OLD
            INTO v_val;
        RAISE NOTICE 'OLD.% = %', v_col, v_val;
    END LOOP;
    RETURN OLD;
END;$$;
CREATE TABLE trigtest (
id integer,
val text
);
INSERT INTO trigtest VALUES
(1, 'one'), (2, 'two');
CREATE TRIGGER dynamic_col AFTER DELETE ON trigtest
FOR EACH ROW EXECUTE FUNCTION dynamic_col('id', 'val');
DELETE FROM trigtest WHERE id = 1;
NOTICE: OLD.id = 1
NOTICE: OLD.val = one
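On PostgreSQL 9.4 or later, a variant of the same idea avoids dynamic SQL altogether by converting OLD to jsonb and indexing it by column name. This is a sketch, not part of the original answer:
CREATE FUNCTION dynamic_col_jsonb() RETURNS trigger
LANGUAGE plpgsql AS
$$DECLARE
    v_col name;
    v_row jsonb;
BEGIN
    v_row := to_jsonb(OLD); -- the whole old row as one jsonb object
    FOREACH v_col IN ARRAY TG_ARGV
    LOOP
        RAISE NOTICE 'OLD.% = %', v_col, v_row ->> v_col;
    END LOOP;
    RETURN OLD;
END;$$;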

Generating incremental numbers based on a different column

I have a composite primary key in a table in PostgreSQL (I am using pgAdmin 4).
Let's call the two primary key columns productno and version.
version represents the version of productno.
So if I create a new dataset, then it needs to be checked if a dataset with this productno already exists.
If productno doesn't exist yet, then version should be 1
If productno exists once, then version should be 2
If productno exists twice, then version should be 3
... and so on
So that we get something like:
 productno | version
-----------+---------
         1 |       1
         1 |       2
         1 |       3
         2 |       1
         2 |       2
I found a quite similar problem: auto increment on composite primary key
But I can't use that solution because the PostgreSQL syntax is a bit different; I tried a lot with functions and triggers but couldn't figure out the right way to do it.
You can keep the version numbers in a separate table (one row for each "base PK" value). That is far more efficient than doing a max() + 1 on every insert and has the additional benefit of being safe for concurrent transactions.
So first we need a table that keeps track of the version numbers:
create table version_counter
(
    product_no integer primary key,
    version_nr integer not null
);
Then we create a function that increments the version for a given product_no and returns that new version number:
create function next_version(p_product_no int)
    returns integer
as
$$
    insert into version_counter (product_no, version_nr)
    values (p_product_no, 1)
    on conflict (product_no)
    do update
        set version_nr = version_counter.version_nr + 1
    returning version_nr;
$$
language sql
volatile;
The trick here is the insert ... on conflict, which increments an existing value or inserts a new row if the passed product_no does not yet exist.
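A quick sanity check of the function on its own (assuming version_counter starts out empty):
select next_version(1); -- returns 1 (a new counter row is inserted)
select next_version(1); -- returns 2 (the existing row is updated)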
For the product table:
create table product
(
    product_no integer not null,
    version_nr integer not null,
    created_at timestamp default clock_timestamp(),
    primary key (product_no, version_nr)
);
then create a trigger:
create function increment_version()
    returns trigger
as
$$
begin
    new.version_nr := next_version(new.product_no);
    return new;
end;
$$
language plpgsql;

create trigger base_table_insert_trigger
    before insert on product
    for each row
    execute procedure increment_version();
This is safe for concurrent transactions because the row in version_counter will be locked for that product_no until the transaction inserting the row into the product table is committed - which will commit the change to the version_counter table as well (and free the lock on that row).
If two concurrent transactions insert the same value for product_no, one of them will wait until the other finishes.
If two concurrent transactions insert different values for product_no, they can work without having to wait for the other.
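The locking behaviour can be sketched like this:
-- session 1
begin;
insert into product (product_no) values (1); -- locks the version_counter row for product_no = 1

-- session 2: blocks here until session 1 commits or rolls back
insert into product (product_no) values (1);

-- session 1
commit; -- session 2 resumes and gets the next version_nr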
If we then insert these rows:
insert into product (product_no) values (1);
insert into product (product_no) values (2);
insert into product (product_no) values (3);
insert into product (product_no) values (1);
insert into product (product_no) values (3);
insert into product (product_no) values (2);
The product table looks like this:
select *
from product
order by product_no, version_nr;
 product_no | version_nr | created_at
------------+------------+-------------------------
          1 |          1 | 2019-08-23 10:50:57.880
          1 |          2 | 2019-08-23 10:50:57.947
          2 |          1 | 2019-08-23 10:50:57.899
          2 |          2 | 2019-08-23 10:50:57.989
          3 |          1 | 2019-08-23 10:50:57.926
          3 |          2 | 2019-08-23 10:50:57.966
Online example: https://rextester.com/CULK95702
You can do it like this:
-- check whether the pk already exists (inside a PL/pgSQL block;
-- mytable, pk and versionpk are placeholders)
SELECT pk INTO temp_pk FROM mytable a WHERE a.pk = v_pk1;
-- if it exists, insert the new row reusing it as the version
IF temp_pk IS NOT NULL THEN
    INSERT INTO mytable(pk, versionpk) VALUES (v_pk1, temp_pk);
END IF;
So, I got it to work now.
If you want a column to update depending on another column in PostgreSQL, have a look at this:
This is the function I use:
CREATE FUNCTION public.testfunction()
    RETURNS trigger
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE NOT LEAKPROOF
AS $BODY$
DECLARE
    v_productno INTEGER := NEW.productno;
BEGIN
    IF NOT EXISTS (SELECT *
                   FROM testtable
                   WHERE productno = v_productno)
    THEN
        NEW.version := 1;
    ELSE
        NEW.version := (SELECT MAX(testtable.version) + 1
                        FROM testtable
                        WHERE testtable.productno = v_productno);
    END IF;
    RETURN NEW;
END;
$BODY$;
And this is the trigger that runs the function:
CREATE TRIGGER testtrigger
BEFORE INSERT
ON public.testtable
FOR EACH ROW
EXECUTE PROCEDURE public.testfunction();
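For example, with the trigger in place (assuming testtable has at least the columns productno and version), inserts get their version assigned automatically:
INSERT INTO testtable (productno) VALUES (1); -- version = 1
INSERT INTO testtable (productno) VALUES (1); -- version = 2
INSERT INTO testtable (productno) VALUES (2); -- version = 1
Note that, unlike the version_counter approach above, MAX(version) + 1 is not safe against concurrent inserts of the same productno.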
Thank you @ChechoCZ, you definitely helped me head in the right direction.

How to prevent insert, update and delete on inherited tables in PostgreSQL using BEFORE triggers

When using table inheritance, I would like to enforce that insert, update and delete statements should be done against descendant tables. I thought a simple way to do this would be using a trigger function like this:
CREATE FUNCTION test.prevent_action() RETURNS trigger AS $prevent_action$
BEGIN
    RAISE EXCEPTION
        '% on % is not allowed. Perform % on descendant tables only.',
        TG_OP, TG_TABLE_NAME, TG_OP;
END;
$prevent_action$ LANGUAGE plpgsql;
...which I would reference from a trigger defined using BEFORE INSERT OR UPDATE OR DELETE.
This seems to work fine for inserts, but not for updates and deletes.
The following test sequence demonstrates what I've observed:
DROP SCHEMA IF EXISTS test CASCADE;
psql:simple.sql:1: NOTICE: schema "test" does not exist, skipping
DROP SCHEMA
CREATE SCHEMA test;
CREATE SCHEMA
-- A function to prevent anything
-- Used for tables that are meant to be inherited
CREATE FUNCTION test.prevent_action() RETURNS trigger AS $prevent_action$
BEGIN
RAISE EXCEPTION
'% on % is not allowed. Perform % on descendant tables only.',
TG_OP, TG_TABLE_NAME, TG_OP;
END;
$prevent_action$ LANGUAGE plpgsql;
CREATE FUNCTION
CREATE TABLE test.people (
person_id SERIAL PRIMARY KEY,
last_name text,
first_name text
);
psql:simple.sql:17: NOTICE: CREATE TABLE will create implicit sequence "people_person_id_seq" for serial column "people.person_id"
psql:simple.sql:17: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "people_pkey" for table "people"
CREATE TABLE
CREATE TRIGGER prevent_action BEFORE INSERT OR UPDATE OR DELETE ON test.people
FOR EACH ROW EXECUTE PROCEDURE test.prevent_action();
CREATE TRIGGER
CREATE TABLE test.students (
student_id SERIAL PRIMARY KEY
) INHERITS (test.people);
psql:simple.sql:24: NOTICE: CREATE TABLE will create implicit sequence "students_student_id_seq" for serial column "students.student_id"
psql:simple.sql:24: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "students_pkey" for table "students"
CREATE TABLE
--The trigger successfully prevents this INSERT from happening
--INSERT INTO test.people (last_name, first_name) values ('Smith', 'Helen');
INSERT INTO test.students (last_name, first_name) values ('Smith', 'Helen');
INSERT 0 1
INSERT INTO test.students (last_name, first_name) values ('Anderson', 'Niles');
INSERT 0 1
UPDATE test.people set first_name = 'Oh', last_name = 'Noes!';
UPDATE 2
SELECT student_id, person_id, first_name, last_name from test.students;
student_id | person_id | first_name | last_name
------------+-----------+------------+-----------
1 | 1 | Oh | Noes!
2 | 2 | Oh | Noes!
(2 rows)
DELETE FROM test.people;
DELETE 2
SELECT student_id, person_id, first_name, last_name from test.students;
student_id | person_id | first_name | last_name
------------+-----------+------------+-----------
(0 rows)
So I'm wondering what I've done wrong that allows updates and deletes directly against the test.people table in this example.
The trigger is set to execute FOR EACH ROW, but no rows are physically stored in test.people itself (they all live in the child table test.students), which is why it doesn't fire for the UPDATE and DELETE.
As a side note, you may issue select * from ONLY test.people to list the rows of test.people that don't belong to child tables.
The solution seems easy: set a trigger FOR EACH STATEMENT instead of FOR EACH ROW, since you want to forbid the whole statement anyway.
CREATE TRIGGER prevent_action BEFORE INSERT OR UPDATE OR DELETE ON test.people
FOR EACH STATEMENT EXECUTE PROCEDURE test.prevent_action();
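With the statement-level trigger in place, the problematic statements from the test sequence should now fail up front; the expected output (reconstructed from the function's message format, not a captured session) is:
UPDATE test.people SET first_name = 'Oh', last_name = 'Noes!';
ERROR:  UPDATE on people is not allowed. Perform UPDATE on descendant tables only.
DELETE FROM test.people;
ERROR:  DELETE on people is not allowed. Perform DELETE on descendant tables only.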