trigger function to refresh remote materialized view on UPDATE - postgresql

I have the following trigger and trigger function setup in order to refresh a MATERIALIZED VIEW on a remote server every time a local table A gets updated. The MV in turn is created from a foreign table of the local table A. After the trigger runs, the materialized view is updated, however, it is only updated to the state BEFORE the UPDATE happened. I'm not sure why this is the case. Either the trigger function runs before the UPDATE is commited, but that should be what the 'AFTER' part of the trigger is for, right? Or the MV refresh is to fast(?), but adding pg_sleep doesn't change the result.
CREATE OR REPLACE FUNCTION public.refresh_remote_mv()
RETURNS TRIGGER AS
$func$
BEGIN
PERFORM dblink_connect('remote_server');
PERFORM dblink_exec(
$$
REFRESH MATERIALIZED VIEW m_config;
$$);
PERFORM dblink_disconnect();
RETURN NULL;
END
$func$ LANGUAGE plpsql;
Trigger:
CREATE TRIGGER tr_remote_refresh
AFTER UPDATE ON m_config
EXECUTE PROCEDURE refresh_remote_mv()

That's because transaction isolation (your changes will be commited after all the triggers were fired, so another transaction from dblink won't see it).
It would be better to refresh materialized view with some frequency and not for every change. But if you wan't to do it that way you can change your dblink query to async dblink query, it should work then (remember fire it with some delay to be sure that transaction is commited).

Related

Return trigger before refresh concurrently on materialized view is finished

I have a database where data from different source tables are processed and stored in a materialized view.
I choosed to store it as a MV because the query to process this data takes a while - about 3 seconds - and needs to be called all the time.
So, I created a trigger to refresh the MV every time the source table is modified (INSERT, DELETE or UPDATE).
The problem is, it seems the trigger function waits for the materialized view to finish refreshing to return, and I don't want this.
I want the insert operation to return as fast as possible, and the MV to refresh in parallel.
My function:
CREATE OR REPLACE FUNCTION "MOBILIDADE".atualizar_mv_solicitacao()
RETURNS TRIGGER AS
$$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY "MOBILIDADE"."MV_SOLICITACAO";
RETURN NULL;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER solicitacao_atualizar_mv_solicitacao
AFTER INSERT OR DELETE OR UPDATE ON "MOBILIDADE"."GESTAOPROJETOS_SOLICITACAO"
FOR EACH STATEMENT
EXECUTE PROCEDURE "MOBILIDADE".atualizar_mv_solicitacao();
When I run an INSERT operation with the trigger function enabled, it takes about 3 seconds to finish, while when I execute it with the trigger disabled it takes only seconds 0.07 seconds.
INSERT INTO "MOBILIDADE"."GESTAOPROJETOS_SOLICITACAO" (documento_tipo,documento_numero,documento_sigla,documento_ano,requerente,solicitacao,data,data_recebimento_semob,categorias,geom,endereco_regiao,endereco_bairro,endereco_logradouro,anexo,created_by,created_at,acao) VALUES('Indicação',12345,'TESTE',2022,'TESTE','TESTE','2022-09-15','2022-09-15','{"Barreiras físicas" , "Pavimentação"}',ST_Transform(ST_SetSRID(ST_MakePoint(-45.888675631640105,-23.236909838714148),4326),4326),'Sul','Bosque dos Eucaliptos','Rua Lima Duarte',false,1,NOW(),1) RETURNING id
This is the wrong way to go about it. If refreshing the materialized view takes long and you modify the table often, then you cannot refresh the materialized view on every data change. Even if the refresh runs asynchronously (which is not possible with a trigger), it will still put a lot of load on your system.
Consider alternative solutions:
Refresh the materialized view every five minutes or so.
Don't use a materialized view, but a regular table that contains the aggregates and update that table from a trigger whenever the underlying data change. That will only work if the "materialized view" is simple enough.

Trigger function to refresh Postgres materialized view and capture refresh end time?

I am trying to build a summary table in our Postgres database that contains information about when various materialized views were refreshed. I would also like this table to trigger the actual refreshes.
The desired format for the table is as below, call it mv_refresh_monitor:
view_name
refresh_time_start
refresh_time_end
view_one
2022-02-01 22:10:59.234567
2022-02-01 22:11:59.234567
The table shows view_one was last refreshed late at night February 1st, and the refresh took 1 minute to complete.
What I would like to do is trigger the materialized refreshes by updating the refresh_time_start field; doing so would trigger the materialized view in the view_name field to refresh, and then also update the same row's refresh_time_end field to capture the time when the refresh is done.
My current implementation uses a function (to update the monitor table), a trigger function (to both refresh the view and call the function), and a trigger on the monitor table to call the trigger function. This is scoped only for a single materialized view:
CREATE OR REPLACE FUNCTION "ingested_digital_spend"."timestamp_refresh_end"()
RETURNS "pg_catalog"."void" AS $BODY$
UPDATE schema.mv_refresh_monitor SET refresh_time_end = CURRENT_TIMESTAMP AT TIME ZONE 'America/Los_Angeles';
$BODY$
LANGUAGE SQL;
CREATE OR REPLACE FUNCTION "schema"."refresh_materialized_view"()
RETURNS "pg_catalog"."trigger" AS $BODY$
BEGIN
IF NEW.view_name = 'view_one' AND NEW.refresh_time_start IS DISTINCT FROM OLD.refresh_time_start THEN
REFRESH MATERIALIZED VIEW schema.view_one;
PERFORM timestamp_refresh_end();
END IF;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER "refresh_mv" AFTER UPDATE ON "schema"."mv_refresh_monitor"
FOR EACH ROW
EXECUTE PROCEDURE "schema"."refresh_materialized_view"();
This almost works, but I'm trying to improve / fix three things:
Make this parameterized so that I don't have to write new IF THEN END IF clauses in the trigger function each time I add a new materialized view to the schema.
This seems like it shouldn't be terribly hard, I just haven't figured out the right way to parameterize PL/pgSQL functions yet.
Currently the time recorded in refresh_time_start and refresh_time_end are identical, despite the refresh operation itself taking 80 seconds. I am not sure how to scope the CURRENT_TIMESTAMP operations so that they don't evaluate to the same timestamp when the trigger function is initially called.
This feels like it should be possible, but I'm not as sure of this.
I would like the actual refresh to happen "in the background" if possible. That is, for the UPDATE to complete immediately and release the session that is performing the UPDATE. Right now the transaction on the monitor table doesn't complete until the view REFRESH transaction itself completes, so the client session hangs until the refresh completes.
This might be impossible.
Any suggestions or solutions for getting closer to these three requirements?
I think you are going about this the wrong way by pushing to much into the trigger/function. I would go with:
A function that you provide the view name and start name to. It does the REFRESH MATERIALIZED VIEW some_view> and updates mv_refresh_monitor with information. For information on how to parametrize this see Dynamic Queries
For the CURRENT_TIMESTAMP issue see Current date/time. CURRENT_TIMESTAMP by design captures the timestamp at the start of a transaction and does not change in the transaction. You are looking for transaction_timestamp()/statement_timestamp().
If you don't tie the REFRESH MATERIALIZED VIEW to the UPDATE you eliminate this issue.

PostgreSQL Trigger that updates table where original trigger runs from

I have two tables in this scenario. One is my "hot sync" table which is near-realtime bi-directional sync of data from my Salesforce Org to a Postgres table. As data changes in the source system (Salesforce), it updates that table on Postgres.
On this table in Postgres, I have a trigger that runs some logic. It basically checks to see if the record triggering it has a sent date that meets some business logic, copy that row into another schema/table to "archive" it.
This all works fine.
What I need to do however is once this row has been copied into the other table, I need to update the status of the record hot sync table. Since it is bi-directional, this will allow the data in Salesforce to reflect the changes I make from the Postgres side.
Can I place this update statement within the originating trigger or is this going to cause recursion issues?
CREATE FUNCTION salesforce.archivelogicfunc()
RETURNS trigger
LANGUAGE 'plpgsql'
COST 100
VOLATILE NOT LEAKPROOF
AS $BODY$ BEGIN
IF (DATE(NEW.et4ae5__datesent__c) < NOW() - INTERVAL '180 days'
AND DATE(NEW.et4ae5__datesent__c) > NOW() - INTERVAL '540 days')
THEN
INSERT INTO archive.individualemailresult__c
(dateopened__c,
numberoftotalclicks__c,
datebounced__c,
fromname__c,
hardbounce__c,
fromaddress__c,
softbounce__c,
name,
lastmodifieddate,
opened__c,
ownerid,
subjectline__c,
isdeleted,
contact__c,
systemmodstamp,
lastmodifiedbyid,
datesent__c,
dateunsubscribed__c,
createddate,
createdbyid,
lead__c,
tracking_as_of__c,
numberofuniqueclicks__c,
senddefinition__c,
mergeid__c,
triggeredsenddefinition__c,
sfid,
id,
_hc_lastop,
_hc_err,
isarchived)
VALUES
(NEW.et4ae5__dateopened__c,
NEW.et4ae5__numberoftotalclicks__c,
NEW.et4ae5__datebounced__c,
NEW.et4ae5__fromname__c,
NEW.et4ae5__hardbounce__c,
NEW.et4ae5__fromaddress__c,
NEW.et4ae5__softbounce__c,
NEW.name,
NEW.lastmodifieddate,
NEW.et4ae5__opened__c,
NEW.ownerid,
NEW.et4ae5__subjectline__c,
NEW.isdeleted,
NEW.et4ae5__contact__c,
NEW.systemmodstamp,
NEW.lastmodifiedbyid,
NEW.et4ae5__datesent__c,
NEW.et4ae5__dateunsubscribed__c,
NEW.createddate,
NEW.createdbyid,
NEW.et4ae5__lead__c,
NEW.et4ae5__tracking_as_of__c,
NEW.et4ae5__numberofuniqueclicks__c,
NEW.et4ae5__senddefinition__c,
NEW.et4ae5__mergeid__c,
NEW.et4ae5__triggeredsenddefinition__c,
NEW.sfid,
NEW.id,
NEW._hc_lastop,
NEW._hc_err,
NEW.isarchived__c)
ON CONFLICT (id)
DO NOTHING;
-- Update SF to reflect the archive
UPDATE salesforce."et4ae5__individualemailresult__c" SET isarchived__c = true, isdeleted = true WHERE id = NEW.id;
END IF;
RETURN NULL;
END;
$BODY$;
ALTER FUNCTION salesforce.archivelogicfunc()
OWNER TO ....;
My understanding is that the NEW.* is only going to contain the rows that caused the trigger to fire in the first place. Therefore if my trigger was fired for a single record, the update statement NEW.id should only update one record on the source table?
Trying to ensure the trigger isn't going to fire again with the update statement causing some recursive loop that I am not expecting.
My concern is:
Record is Updated
Trigger Fires and inserts record into an archive table
Update runs on the source table to update the record for the new.id
This update causes the trigger to run again. The insert would fail due to the on conflict, but the update would then run again, and again etc..
The original trigger is fired AFTER INSERT/UPDATE.
TRIGGER:
CREATE TRIGGER archivelogic_firetrigger
AFTER INSERT OR UPDATE
ON salesforce.et4ae5__individualemailresult__c
FOR EACH ROW
EXECUTE PROCEDURE salesforce.archivelogicfunc();
UPDATE:
I added a WHEN condition to my trigger. It appeared to work on a basic test, but willing to take any other advice if suggested.
CREATE TRIGGER archivelogic_firetrigger
AFTER INSERT OR UPDATE
ON salesforce.et4ae5__individualemailresult__c
FOR EACH ROW
WHEN (pg_trigger_depth() = 0) // <-- Added to prevent recursion
EXECUTE PROCEDURE salesforce.archivelogicfunc();
The easiest would be to make it a before trigger, and to replace the update by
NEW.isarchived__c = true;
NEW. isdeleted = true;
[...]
RETURN NEW;
Otherwise, you can filter the rows before running the trigger: it will be called only when isarchived__c and isdeleted have NOT changed (it may be dangerous though, just imagine someone updating ALL fields)
CREATE TRIGGER archivelogic_firetrigger
AFTER INSERT OR UPDATE
ON salesforce.et4ae5__individualemailresult__c
FOR EACH ROW
WHEN (NEW.isarchived__c IS NOT DISTINCT FROM OLD.isarchived__c
AND NEW.isdeleted IS NOT DISTINCT FROM OLD.isdeleted )
EXECUTE PROCEDURE salesforce.archivelogicfunc();

How to invoke a trigger on all rows manually in postgres

My trigger is defined the following way:
CREATE TRIGGER update_contract_finished_at
AFTER INSERT OR DELETE OR UPDATE OF performed_on
ON task
FOR EACH ROW
EXECUTE PROCEDURE update_contract_finished_at_function();
I now want to evoke this trigger to set the variables which are updated by the trigger. How do I do that?
Something like
for each row in task
execute procedure update_contract_finished_at_function();
I know I can update with a standard update set statement. I also want to verifiy that my trigger works on all the data correctly.
I'd write a slightly modified copy of update_contract_finished_at_function that takes type task as input and returns void.
Then replace NEW in the trigger function with $1 and call the function like this:
SELECT copy_func(task) FROM task;
If the functions are almost identical, it should be good enough to test the trigget function.
The way to manually trigger your on update trigger once would be:
UPDATE task SET performed_on = performed_on
however depending on how complicated your logic is in there and how many rows you have in the table a separate query might be significantly faster for initializing a large number of rows.
Since you mentioned you want to test the behaviour of your trigger you can clone the table or do a table or database dump and restore the data afterwards. If this is a live system you should instead do a database dump, restore to another system, add your trigger, test it, repeat from restore until you nail it... and only after you're sure it does what you want update the live system with it.
I ended up writing a PL/pgSQL function that in a loop processes all events in chronological order and calling it:
create or replace function process_event_history()
returns void
language plpgsql
as
$$
declare
event record;
begin
for event in
select id, timestamp
from events
order by timestamp
loop
update events set timestamp = event.timestamp
where id = event.id;
end loop;
end;
$$;
--;;
-- Execute the above function causing the trigger to run for all events.
select process_event_history();
--;;
-- Remove the temporary processing function.
drop function process_event_history();

Postgresql: run trigger AFTER update FOR EACH STATEMENT ONLY if data changed

In Postgresql I can have two kinds of triggers: FOR EACH ROW and FOR EACH STATEMENT. If I do a FOR EACH ROW trigger, I can add a WHERE clause something like OLD.* != NEW.* so it only fires if something has actually changed. Is there any way to do something similar with STATEMENT level triggers? I know I can't do the same thing since OLD and NEW aren't available, but I was thinking perhaps there might be a way to check the number of rows changed from within my function itself or the like.
Usage case: I am using the postgresql NOTIFY system to notify my app when data changes. Ideally, the app would get a single notification each time one or more records changes, and not get notified at all if data stays the same (even if an UPDATE was run). With a basic AFTER UPDATE FOR EACH STATEMENT trigger, I am getting notified every time an update statement runs - even if it doesn't actually change anything.
You should create two triggers: before update for each row and after update for each statement.
The first trigger checks if the table is being updated and sets a flag if so.
The second trigger checks the flag and performs notify if it was set.
You can use a custom configuration parameter as the flag (e.g. flags.the_table).
The solution is simple and safe, as the parameter is local in the current session.
create or replace function before_each_row_on_the_table()
returns trigger language plpgsql
as $$
begin
if new <> old then
set flags.the_table to 'on';
end if;
return new;
end $$;
create or replace function after_each_statement_on_the_table()
returns trigger language plpgsql
as $$
begin
if current_setting('flags.the_table', true) = 'on' then
notify your_channel, 'the_table was updated';
set flags.the_table to 'off';
end if;
return null;
end $$;
create trigger before_each_row_on_the_table
before update on the_table
for each row execute procedure before_each_row_on_the_table();
create trigger after_each_statement_on_the_table
after update on the_table
for each statement execute procedure after_each_statement_on_the_table();
The function current_setting() with two arguments is available in Postgres 9.6 or later.