Trigger to insert rows in remote database after deletion - postgresql

I have created a trigger that works like this:
After deleting data from the table flux_tresorerie_historique, it inserts the deleted row into a table of the same name, flux_tresorerie_historique, located in another database, archive.
I use dblink to insert the data into the remote database. The problem is that building the query is tedious, especially since the table contains more than 20 columns, and I want to create similar functions for 10 other tables.
Is there a faster way to accomplish this task?
Here is an example that works fine:
CREATE OR REPLACE FUNCTION flux_tresorerie_historique_backup_row()
RETURNS trigger AS
$BODY$
DECLARE
date_rapprochement_flux TEXT;
code_commission TEXT;
reference_flux TEXT;
BEGIN
IF OLD.date_rapprochement_flux is null
THEN
date_rapprochement_flux = 'NULL';
ELSE
date_rapprochement_flux = ''''||to_char(OLD.date_rapprochement_flux, 'YYYY-MM-DD')||'''';
END IF;
IF OLD.code_commission is null
THEN
code_commission = 'NULL';
ELSE
code_commission = ''''||replace(OLD.code_commission,'''','''''')||'''';
END IF;
IF OLD.reference_flux is null
THEN
reference_flux = 'NULL';
ELSE
reference_flux = ''''||replace(OLD.reference_flux,'''','''''')||'''';
END IF;
perform dblink_connect('dbname=gtr_bd_archive user=postgres password=postgres');
perform dblink_exec('insert into flux_tresorerie_historique values('||OLD.id_flux_historique||','''||OLD.date_operation_flux||''','''||OLD.date_valeur_flux||''','||date_rapprochement_flux||','''||replace(OLD.libelle_flux,'''','''''')||''','||OLD.montant_flux||','||OLD.contre_valeur_dzd||','''||replace(OLD.rib_compte_bancaire,'''','''''')||''','||OLD.frais_flux||','''||replace(OLD.sens_flux,'''','''''')||''','''||replace(OLD.statut_flux,'''','''''')||''','''||replace(OLD.code_devise,'''','''''')||''','''||replace(OLD.code_mode_paiement,'''','''''')||''','''||replace(OLD.code_agence,'''','''''')||''','''||replace(OLD.code_compte,'''','''''')||''','''||replace(OLD.code_banque,'''','''''')||''','''||OLD.date_maj_flux||''','''||replace(OLD.statut_frais,'''','''''')||''','||reference_flux||','||code_commission||','||OLD.id_flux||');');
perform dblink_disconnect();
RETURN NULL;
END;
$BODY$ LANGUAGE plpgsql;

This is a limited application of replication. Requirements vary a lot, so there are a number of different established solutions, addressing different situations. Consider the overview in the manual.
Your hand-knit, trigger-based solution is one viable option for relatively few deletions. Opening and closing a separate connection for every row incurs quite an overhead, though. There are various other options.
While working with dblink, I suggest some modifications. Most importantly:
Use format() to escape strings more elegantly.
Pass the whole row instead of passing and escaping every single column.
Don't place the password in every single trigger function.
Use a FOREIGN SERVER plus USER MAPPING. Detailed instructions here:
Persistent inserts in a UDF even if the function aborts
Basically, run once on the source server:
CREATE SERVER myserver FOREIGN DATA WRAPPER dblink_fdw
OPTIONS (hostaddr '127.0.0.1', dbname 'gtr_bd_archive');
CREATE USER MAPPING FOR role_source SERVER myserver
OPTIONS (user 'postgres', password 'secret');
Preferably, don't log in as superuser at the target server. Use a dedicated role with limited privileges to avoid privilege escalation.
And use a password file to allow password-less access to the target server. This way you don't even have to store the password in the USER MAPPING. Instructions in the last chapter of this related answer:
Run batch file with psql command without password
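For illustration, a minimal sketch of both steps; the role name and the password-file entry are assumptions:
-- on the target server: a dedicated role with only the privileges it needs
CREATE ROLE archiver LOGIN;
GRANT INSERT ON flux_tresorerie_historique TO archiver;
-- ~/.pgpass of the OS user running the source cluster (host:port:db:user:password):
-- 127.0.0.1:5432:gtr_bd_archive:archiver:secret
The USER MAPPING above would then name user 'archiver' and omit the password option.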
Then:
CREATE OR REPLACE FUNCTION flux_tresorerie_historique_backup_row()
RETURNS trigger AS
$func$
BEGIN
PERFORM dblink_connect('myserver'); -- name of foreign server from above
PERFORM dblink_exec( format(
$$
INSERT INTO flux_tresorerie_historique -- provide target column list!
SELECT (r).id_flux_historique
, (r).date_operation_flux
, (r).date_valeur_flux
, (r).date_rapprochement_flux::date -- 'YYYY-MM-DD' is default ISO format anyway
, (r).libelle_flux
, (r).montant_flux
, (r).contre_valeur_dzd
, (r).rib_compte_bancaire
, (r).frais_flux
, (r).sens_flux
, (r).statut_flux
, (r).code_devise
, (r).code_mode_paiement
, (r).code_agence
, (r).code_compte
, (r).code_banque
, (r).date_maj_flux
, (r).statut_frais
, (r).reference_flux
, (r).code_commission
, (r).id_flux
FROM (SELECT %L::flux_tresorerie_historique) t(r)
$$, OLD::text)); -- cast whole row type
PERFORM dblink_disconnect();
RETURN NULL; -- only for AFTER trigger
END
$func$ LANGUAGE plpgsql;
You should spell out the list of columns for the target table if the row types don't match.
If you are serious about this:
insert this row in the table flux_tresorerie_historique
I.e., you insert the whole row and the target row type is identical (no extracting a date from a timestamp etc.), you can simplify much further by passing the whole row.
CREATE OR REPLACE FUNCTION flux_tresorerie_historique_backup_row()
RETURNS trigger AS
$func$
BEGIN
PERFORM dblink_connect('myserver'); -- name of foreign server
PERFORM dblink_exec( format(
$$
INSERT INTO flux_tresorerie_historique
SELECT (%L::flux_tresorerie_historique).*
$$
, OLD::text));
PERFORM dblink_disconnect();
RETURN NULL; -- only for AFTER trigger
END
$func$ LANGUAGE plpgsql;
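In both variants the function is then attached with a per-row AFTER DELETE trigger, along these lines (the trigger name is an assumption):
CREATE TRIGGER flux_tresorerie_historique_backup
AFTER DELETE ON flux_tresorerie_historique
FOR EACH ROW EXECUTE PROCEDURE flux_tresorerie_historique_backup_row();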
Related:
How do I do large non-blocking updates in PostgreSQL?

You can use quote_nullable for this! Also, concat_ws comes very handy:
CREATE OR REPLACE FUNCTION flux_tresorerie_historique_backup_row()
RETURNS trigger AS
$BODY$
BEGIN
perform dblink_connect('dbname=gtr_bd_archive user=postgres password=postgres');
perform dblink_exec('insert into flux_tresorerie_historique values('||
concat_ws(', ', quote_nullable(OLD.id_flux_historique),
quote_nullable(OLD.date_operation_flux),
quote_nullable(OLD.date_valeur_flux),
quote_nullable(to_char(OLD.date_rapprochement_flux, 'YYYY-MM-DD')),
quote_nullable(OLD.libelle_flux),
quote_nullable(OLD.montant_flux),
quote_nullable(OLD.contre_valeur_dzd),
quote_nullable(OLD.rib_compte_bancaire),
quote_nullable(OLD.frais_flux),
quote_nullable(OLD.sens_flux),
quote_nullable(OLD.statut_flux),
quote_nullable(OLD.code_devise),
quote_nullable(OLD.code_mode_paiement),
quote_nullable(OLD.code_agence),
quote_nullable(OLD.code_compte),
quote_nullable(OLD.code_banque),
quote_nullable(OLD.date_maj_flux),
quote_nullable(OLD.statut_frais),
quote_nullable(OLD.reference_flux),
quote_nullable(OLD.code_commission),
quote_nullable(OLD.id_flux)
)||');');
perform dblink_disconnect();
RETURN NULL;
END;
$BODY$ LANGUAGE plpgsql;
Note that it is OK to place non-string values between single quotes, since a quoted literal is just as good a literal value to PostgreSQL as one without the quotes, so it is convenient to run all of the columns through quote_nullable. Also note that quote_nullable already outputs dates in YYYY-MM-DD format (e.g. select quote_nullable(now()::date) results in '2016-05-04'), so you may want to simplify OLD.date_rapprochement_flux even further by dropping the to_char.
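A quick demonstration of both points (date output shown for the default ISO DateStyle):
SELECT quote_nullable(NULL::text);    -- returns the unquoted keyword: NULL
SELECT quote_nullable('O''Brien');    -- returns 'O''Brien', embedded quote doubled
SELECT quote_nullable(now()::date);   -- returns e.g. '2016-05-04', no to_char needed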


Declare and return value for DELETE and INSERT

I am trying to remove duplicated data from some of our databases based upon unique id's. All deleted data should be stored in a separate table for auditing purposes. Since this concerns quite a few databases and different schemas and tables, I wanted to start using variables to reduce the chance of errors and the amount of work it will take me.
This is the best example query I could think of, but it doesn't work:
do $$
declare #source_schema varchar := 'my_source_schema';
declare #source_table varchar := 'my_source_table';
declare #target_table varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare #unique_keys varchar := ('1', '2', '3')
begin
select into #target_table
from #source_schema.#source_table
where id in (#unique_keys);
delete from #source_schema.#source_table where export_id in (#unique_keys);
end ;
$$;
The query syntax works with hard-coded values.
Most of the time my variables are treated as columns, or not recognized at all. :(
You need to create and then call a plpgsql procedure with input parameters:
CREATE OR REPLACE PROCEDURE duplicates_suppress
(my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE FORMAT(
'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE array[id] <@ %4$L :: integer[] RETURNING id)
DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id', my_target_schema, my_source_schema, my_source_table, unique_keys :: text) ;
END ;
$$ ;
The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys, and then deletes these rows from the table my_source_schema.my_source_table.
See the test result in dbfiddle.
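A call could then look like this, using the names from the question:
CALL duplicates_suppress('my_target_schema', 'my_source_schema', 'my_source_table', ARRAY['1','2','3']);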
As has been commented, you need some kind of dynamic SQL. In a FUNCTION, PROCEDURE or a DO statement to do it on the server.
You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.
Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead if you want to wrap it in an outer transaction (as you very well might). See:
When to use stored procedure / user-defined function?
CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
LANGUAGE plpgsql AS
$proc$
-- target schema and appendix are always the same, source_table is a variable input
DECLARE
_target_schema CONSTANT text := 's2'; -- hardcoded
_target_table text := _source_table || '_duplicates';
_sql text := format(
'WITH del AS (
DELETE FROM %I.%I
WHERE id = ANY($1)
RETURNING *
)
INSERT INTO %I.%I TABLE del', _source_schema, _source_table
, _target_schema, _target_table);
BEGIN
RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _unique_keys; -- execute
GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;
Call:
CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);
db<>fiddle here
I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per database. See:
How to create a temporary function in PostgreSQL?
Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL-injection when concatenating SQL dynamically. See:
Are PostgreSQL column names case-sensitive?
Table name as a PostgreSQL function parameter
Made _unique_keys type int[] (array of integer) since your sample values look like integers. Use the actual data type of your id columns!
The variable _sql holds the query string, so it can easily be debugged before actually executing it; that's what the RAISE NOTICE '%', _sql; is for.
I suggest commenting out the EXECUTE line until you are sure.
I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient, at hardly any cost. See:
Dynamic SQL (EXECUTE) as condition for IF statement
Best way to get result count before LIMIT was applied
Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since that has to find rows only once it comes at about half the cost of separate SELECT and DELETE. And it's perfectly safe. If anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands can also run into concurrency issues or race conditions which are ruled out this way, as DELETE implicitly locks the rows to delete. Example:
Replicating data between Postgres DBs
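For reference, here is the same pattern with the dynamic parts stripped away (schema and table names assumed):
WITH del AS (
   DELETE FROM s1.t1
   WHERE  id = ANY ('{1,3}'::int[])
   RETURNING *
   )
INSERT INTO s2.t1_duplicates
TABLE  del;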
Or you can build the statements in a client program. Like psql, and use \gexec. Example:
Filter column names from existing table for SQL DDL statement
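A hedged sketch of that client-side approach, generating one archive-and-delete statement per matching table (the schema filter and the target schema s2 are assumptions):
SELECT format(
   'WITH del AS (DELETE FROM %1$I.%2$I WHERE id = ANY (''{1,3}'') RETURNING *)
    INSERT INTO s2.%2$I_duplicates TABLE del'
 , schemaname, tablename)
FROM   pg_tables
WHERE  schemaname = 's1' \gexec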
Based on Erwin's answer, minor optimization...
create or replace procedure pg_temp.p_archive_dump
(_source_schema text, _source_table text,
_unique_key int[],_target_schema text)
language plpgsql as
$$
declare
_row_count bigint;
_target_table text := '';
BEGIN
_target_table := quote_ident(_source_table) || '_' || array_to_string(_unique_key, '_');
raise notice 'the deleted table records will be stored in %.%', _target_schema, _target_table;
execute format('create table %I.%I as select * from %I.%I limit 0',_target_schema, _target_table,_source_schema,_source_table );
execute format('with mm as ( delete from %I.%I where id = any (%L) returning * ) insert into %I.%I table mm'
,_source_schema,_source_table,_unique_key, _target_schema, _target_table);
GET DIAGNOSTICS _row_count = ROW_COUNT;
RAISE NOTICE 'rows affected: %', _row_count;
end
$$;
If your _unique_key array has only a few elements, this solution also creates the target table for you (you still need to create the target schema yourself). If it has many elements, you may want to customize how the dumped table is named.
Let's call it:
call pg_temp.p_archive_dump('s1','t1', '{1,2}','s2');
s1 is the source schema, t1 is the source table, {1,2} is the array of unique keys you want to extract to the new table, and s2 is the target schema.

Capture number of rows affected by dynamic sql?

I am trying to get the result of a QUERY EXECUTE in a plpgsql function to be able to check how many rows were affected by a dynamic update query. My use case is adding an event (with a custom payload) to a separate table on insert or update to a dynamically set table. Because my event has a custom payload, I have not been able to use a database trigger (e.g. trigger before insert). As a simplified example, assume I have this table:
CREATE TABLE users (user_id text primary key, name text)
Here is my simplified events table:
CREATE TABLE events(event_id text primary key, payload json)
Here is my simplified function:
CREATE OR REPLACE FUNCTION my_function(_rowtype anyelement, q text, payload jsonb)
RETURNS SETOF anyelement AS
$func$
DECLARE
event_id text;
BEGIN
SELECT jsonb_object_field_text (payload, 'id')::text INTO STRICT event_id;
execute format('insert into events(event_id, payload) values ($1, $2)') using event_id, payload;
RETURN QUERY EXECUTE format('%s', q);
END
$func$ LANGUAGE plpgsql;
The goal is to have this work exactly the same as if someone had created these in a transaction. In pseudocode, for insert:
BEGIN
insert into events(id, payload) values($1, $2)
insert into users(columns) values(<any values>)
COMMIT
and similarly for update:
BEGIN
insert into events(id, payload) values($1, $2)
result, error := query(`update users set name = 'hello' where id = 'Not Exists Thus No Rows Modified'`);
if result.rowsAffected() == 0 {
ROLLBACK
}
COMMIT
The function my_function almost works except for one edge case: when an update actually doesn't affect any rows.
For example, this works:
select * from my_function(NULL::users,
'insert into users(user_id, name) values(''u1'', ''a2'') returning *',
payload => '{"id": "e1", "custom": "s1", "field": "2019-10-12T07:20:50.52Z"}')
As expected, after this is done both a row in the users table and the events table is created.
What fails is the following:
select * from my_function(NULL::users,
'update users set name = ''hello'' where user_id = ''NotExists'' returning *',
payload => '{"id": "e2", "custom": "s3", "field": "2019-10-12T07:20:50.52Z"}')
Here, a row is created in the events table (my goal is that it should not be created).
I know this approach is not elegant, and I know it is vulnerable to SQL injection. I'd love suggestions on better ways to solve this (including scrapping what we're doing now). But to answer the question directly: I'm looking to store the result of QUERY EXECUTE, check if any rows were affected, and raise an error so that a row is never created in the events table without a real corresponding change in the users table. The users table is just an example; in general, it could be any dynamically set table.
A RETURN QUERY doesn't need to go at the end of the function; it only says: "the results of this query are part of the resulting set".
So you can use RETURN QUERY, check FOUND and act accordingly. Here is your function, modified to work this way:
CREATE OR REPLACE FUNCTION public.my_function(_rowtype anyelement, q text, payload jsonb)
RETURNS SETOF anyelement
LANGUAGE plpgsql
AS $function$
DECLARE
event_id text;
BEGIN
SELECT jsonb_object_field_text (payload, 'id')::text INTO STRICT event_id;
RETURN QUERY EXECUTE format('%s', q);
IF FOUND THEN
execute format('insert into events(event_id, payload) values ($1, $2)') using event_id, payload;
END IF;
RETURN;
END
$function$
P.S.: Maybe you can also solve your problem with FOR EACH STATEMENT triggers using transition tables (available since Postgres 10, https://www.postgresql.org/docs/10/sql-createtrigger.html).
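A minimal sketch of that idea (Postgres 10+); how to derive event_id from the inserted rows is an assumption here:
CREATE OR REPLACE FUNCTION users_log_events()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO events(event_id, payload)
   SELECT 'evt_' || n.user_id, row_to_json(n)  -- event_id derivation assumed
   FROM   new_rows n;
   RETURN NULL;
END
$func$;

CREATE TRIGGER users_log_events
AFTER INSERT ON users
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE PROCEDURE users_log_events();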
CREATE OR REPLACE FUNCTION my_function(_rowtype anyelement, q text, payload jsonb)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE q;
IF NOT FOUND THEN
RETURN; -- nothing happened yet, we can exit silently.
-- Or you WANT an error for this case. Then do this instead:
-- RAISE EXCEPTION 'Query passed in parameter "q" did not affect any rows. Doing nothing!';
END IF;
INSERT INTO events(event_id, payload)
VALUES (payload->>'id', payload);
END
$func$;
As has been commented, RETURN QUERY does not return from the function. The manual:
RETURN NEXT and RETURN QUERY do not actually return from the
function — they simply append zero or more rows to the function's
result set. Execution then continues with the next statement in the
PL/pgSQL function. As successive RETURN NEXT or RETURN QUERY
commands are executed, the result set is built up. A final RETURN,
which should have no argument, causes control to exit the function (or
you can just let control reach the end of the function).
There's a code example for your case exactly at the bottom of that chapter in the manual. From me, actually. Originating here:
FUNCTION syntax error
It was suggested to use GET DIAGNOSTICS instead of the simpler FOUND. It's true that EXECUTE does not set the state of FOUND. But RETURN QUERY does. So keep using the simpler FOUND. Related:
Dynamic SQL (EXECUTE) as condition for IF statement
You have format() in your original twice. And while that's typically very useful for dynamic SQL, it's useless in your case. EXECUTE format('%s', q) is exactly the same as just EXECUTE q, with added cost. Both are open doors for SQL injection when passing user input.
Since there is a good chance that the transaction might be rolled back, start with the critical step and do the rest later, to avoid wasting the work. So I moved executing q to the top, assuming it does not depend on the "payload" row, which is now inserted later.
Also, INSERT INTO events can be plain SQL. Nothing dynamic there. No need for format() or EXECUTE.
Finally, your jsonb_object_field_text(payload, 'id')::text is just a fancy way of saying payload->>'id'. No need for an additional variable and another SELECT INTO.
Warning against SQL injection
Converting user input (parameter q in the example) to code to execute dynamically is the most direct SQL injection vulnerability of all. I wouldn't want to be caught in my underwear doing that.

sync two tables after insert

I am using PostgreSQL. I have two schemas, main and sec, each containing one table datastore with the same structure (this is only an extract).
I am trying unsuccessfully to create a trigger to keep both tables in sync when an insert occurs in either of them. The problem is some kind of circular or recursive reference.
Can you create some example to solve this?
I am working on this; I'll post my solution later.
You can use this code as reference for creating schemas and tables
CREATE SCHEMA main;
CREATE SCHEMA sec;
SET search_path = main, pg_catalog;
CREATE TABLE datastore (
fullname character varying,
age integer
);
SET search_path = sec, pg_catalog;
CREATE TABLE datastore (
fullname character varying,
age integer
);
An updatable view is the best solution and is as simple as (Postgres 9.3+):
drop table sec.datastore;
create view sec.datastore
as select * from main.datastore;
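Simple views like this are automatically updatable in Postgres 9.3+, so an insert through either name ends up in the same table:
insert into sec.datastore (fullname, age) values ('John Doe', 42);
-- stored once, in main.datastore, and visible through both names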
However, if you cannot do that for some inscrutable reason, use the pg_trigger_depth() function (Postgres 9.2+) to ensure that the trigger function is not executed during replication. The trigger on main.datastore may look like this:
create or replace function main.datastore_insert_trigger()
returns trigger language plpgsql as $$
begin
insert into sec.datastore
select new.fullname, new.age;
return new;
end $$;
create trigger datastore_insert_trigger
before insert on main.datastore
for each row when (pg_trigger_depth() = 0)
execute procedure main.datastore_insert_trigger();
The trigger on sec.datastore should be defined analogously.
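Spelled out, the analogous pair is a direct mirror of the above:
create or replace function sec.datastore_insert_trigger()
returns trigger language plpgsql as $$
begin
insert into main.datastore
select new.fullname, new.age;
return new;
end $$;
create trigger datastore_insert_trigger
before insert on sec.datastore
for each row when (pg_trigger_depth() = 0)
execute procedure sec.datastore_insert_trigger();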
create OR REPLACE function copytosec() RETURNS TRIGGER AS $$
BEGIN
insert into sec.datastore(fullname,age) values (NEW.fullname,NEW.age);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
create trigger copytosectrigger after insert on main.datastore
for each row
execute procedure copytosec();

Get values from varying columns in a generic trigger

I am new to PostgreSQL and found a trigger which serves my purpose completely, except for one little thing. The trigger is quite generic and runs across different tables, logging different field changes. I found it here.
What I now need to do is test for a specific field which changes as the tables change on which the trigger fires. I thought of using substr, as all the columns have the same name format, e.g. XXX_cust_no, but the XXX prefix can vary between 2 and 4 characters. I need to log the value in the XXX_cust_no field with every record that is written to the history / audit table. Using a bunch of IF / ELSE statements to accomplish this is not something I would like to do.
The trigger as it now works logs the table_name, column_name, old_value and new_value. However, I also need to log the XXX_cust_no of the record that was changed.
Basically you need dynamic SQL for dynamic column names. format() helps to build the DML command safely. Pass values from NEW and OLD with the USING clause.
Given these tables:
CREATE TABLE tbl (
t_id serial PRIMARY KEY
,abc_cust_no text
);
CREATE TABLE log (
id int
,table_name text
,column_name text
,old_value text
,new_value text
);
It could work like this:
CREATE OR REPLACE FUNCTION trg_demo()
RETURNS TRIGGER AS
$func$
BEGIN
EXECUTE format('
INSERT INTO log(id, table_name, column_name, old_value, new_value)
SELECT ($2).t_id
, $3
, $4
,($1).%1$I
,($2).%1$I', TG_ARGV[0])
USING OLD, NEW, TG_RELNAME, TG_ARGV[0];
RETURN NEW;
END
$func$ LANGUAGE plpgsql;
CREATE TRIGGER demo
BEFORE UPDATE ON tbl
FOR EACH ROW EXECUTE PROCEDURE trg_demo('abc_cust_no'); -- col name here.
SQL Fiddle.
Related answer on dba.SE:
How to access NEW or OLD field given only the field's name?
List of special variables visible in plpgsql trigger functions in the manual.
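As an aside: if you'd rather avoid dynamic SQL, Postgres 9.3+ can read a column by name through JSON instead, at some conversion cost. A sketch against the same tables; attach it exactly like trg_demo above, passing the column name as argument:
CREATE OR REPLACE FUNCTION trg_demo_json()
RETURNS TRIGGER AS
$func$
BEGIN
INSERT INTO log(id, table_name, column_name, old_value, new_value)
VALUES ( NEW.t_id
       , TG_TABLE_NAME
       , TG_ARGV[0]
       , row_to_json(OLD)->>TG_ARGV[0]
       , row_to_json(NEW)->>TG_ARGV[0]);
RETURN NEW;
END
$func$ LANGUAGE plpgsql;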

How to get postgres (8.4) query results with unknown columns

Edit: After posting I found Erwin Brandstetter's answer to a similar question. It sounds like in 9.2+ I could use the last option he listed, but none of the other alternatives sound workable for my situation. However, the comment from Jakub Kania and reiterated by Craig Ringer suggesting I use COPY, or \copy, in psql appears to solve my problem.
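Following that suggestion, something along these lines should work; the database name is an assumption, and the generated pivot query would first be spliced into the COPY command:
$ psql -d mydb -o "myfile.txt" -c "COPY (SELECT * FROM carrier_eligibility.rule_result) TO STDOUT WITH CSV HEADER"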
My goal is to get the results of executing a dynamically created query into a text file.
The names and number of columns are unknown; the query generated at run time is a 'pivot' one, and the names of columns in the SELECT list are taken from values stored in the database.
What I envision is being able, from the command line, to run:
$ psql -o "myfile.txt" -c "EXECUTE mySQLGeneratingFunction(param1, param2)"
But what I'm finding is that I can't get results from an EXECUTEd query unless I know the number of columns and their types that are in the results of the query.
create or replace function carrier_eligibility.createSQL() returns varchar AS
$$
begin
return 'SELECT * FROM carrier_eligibility.rule_result';
-- actual procedure writes a pivot query whose columns aren't known til run time
end
$$ language plpgsql;
create or replace function carrier_eligibility.RunSQL() returns setof record AS
$$
begin
return query EXECUTE carrier_eligibility.createSQL();
end
$$ language plpgsql;
-- this works, but I want to be able to get the results into a text file without knowing
-- the number of columns
select * from carrier_eligibility.RunSQL() AS (id int, uh varchar, duh varchar, what varchar)
Using psql isn't a requirement. I just want to get the results of the query into a text file, with the column names in the first row.
What format of a text file do you want? Something like csv?
How about something like this:
CREATE OR REPLACE FUNCTION sql_to_csv(in_sql text) returns setof text
SECURITY INVOKER -- CRITICAL DO NOT CHANGE THIS TO SECURITY DEFINER
LANGUAGE PLPGSQL AS
$$
DECLARE t_row RECORD;
t_out text;
BEGIN
FOR t_row IN EXECUTE in_sql LOOP
t_out := t_row::text;
t_out := regexp_replace(regexp_replace(t_out, E'^\\(', ''), E'\\)$', '');
return next t_out;
END LOOP;
END;
$$;
This should create properly quoted csv strings without the header. Embedded newlines may be a problem but you could write a quick Perl script to connect and write the data or something.
Note this presumes that the tuple structure (parenthesized csv) does not change with future versions, but it currently should work with 8.4 at least through 9.2.
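Usage might then look like this, redirecting psql output to a file; the -tA flags suppress psql's own headers and alignment:
$ psql -tA -o "myfile.txt" -c "SELECT sql_to_csv('SELECT * FROM carrier_eligibility.rule_result')"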