I am trying to build a summary table in our Postgres database that contains information about when various materialized views were refreshed. I would also like this table to trigger the actual refreshes.
The desired format for the table, which I'll call mv_refresh_monitor, is as follows:
view_name | refresh_time_start | refresh_time_end
view_one | 2022-02-01 22:10:59.234567 | 2022-02-01 22:11:59.234567
The table shows that view_one was last refreshed late at night on February 1st, and that the refresh took 1 minute to complete.
What I would like to do is trigger the materialized view refreshes by updating the refresh_time_start field: doing so would refresh the materialized view named in the view_name field, and then update the same row's refresh_time_end field to capture the time when the refresh finished.
My current implementation uses a function (to update the monitor table), a trigger function (to both refresh the view and call the function), and a trigger on the monitor table to call the trigger function. This is scoped only for a single materialized view:
CREATE OR REPLACE FUNCTION "ingested_digital_spend"."timestamp_refresh_end"()
RETURNS "pg_catalog"."void" AS $BODY$
UPDATE schema.mv_refresh_monitor SET refresh_time_end = CURRENT_TIMESTAMP AT TIME ZONE 'America/Los_Angeles';
$BODY$
LANGUAGE SQL;
CREATE OR REPLACE FUNCTION "schema"."refresh_materialized_view"()
RETURNS "pg_catalog"."trigger" AS $BODY$
BEGIN
IF NEW.view_name = 'view_one' AND NEW.refresh_time_start IS DISTINCT FROM OLD.refresh_time_start THEN
REFRESH MATERIALIZED VIEW schema.view_one;
PERFORM timestamp_refresh_end();
END IF;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER "refresh_mv" AFTER UPDATE ON "schema"."mv_refresh_monitor"
FOR EACH ROW
EXECUTE PROCEDURE "schema"."refresh_materialized_view"();
This almost works, but I'm trying to improve / fix three things:
Make this parameterized so that I don't have to write a new IF ... THEN ... END IF clause in the trigger function each time I add a new materialized view to the schema.
This seems like it shouldn't be terribly hard; I just haven't figured out the right way to parameterize PL/pgSQL functions yet.
Currently the times recorded in refresh_time_start and refresh_time_end are identical, even though the refresh operation itself takes 80 seconds. I am not sure how to scope the CURRENT_TIMESTAMP calls so that they don't both evaluate to the timestamp from when the trigger function is initially called.
This feels like it should be possible, but I'm not as sure of this.
I would like the actual refresh to happen "in the background" if possible. That is, for the UPDATE to complete immediately and release the session that is performing the UPDATE. Right now the transaction on the monitor table doesn't complete until the view REFRESH transaction itself completes, so the client session hangs until the refresh completes.
This might be impossible.
Any suggestions or solutions for getting closer to these three requirements?
I think you are going about this the wrong way by pushing too much into the trigger/function. I would go with:
A function that you pass the view name and start time to. It runs REFRESH MATERIALIZED VIEW <some_view> and updates mv_refresh_monitor with the timing information. For how to parameterize this, see Dynamic Queries.
For the CURRENT_TIMESTAMP issue, see Current date/time. CURRENT_TIMESTAMP by design captures the time at the start of the transaction and does not change within it. transaction_timestamp() is equivalent to it, and statement_timestamp() is fixed at the start of the statement sent by the client, so neither advances inside a trigger. You are looking for clock_timestamp(), which returns the actual current time and changes even within a single statement.
As for running the refresh in the background: if you don't tie the REFRESH MATERIALIZED VIEW to the UPDATE on the monitor table, the UPDATE no longer has to wait for the refresh, which eliminates that issue.
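A minimal sketch of the function this answer describes, under a few assumptions: the monitor table is keyed on view_name, its columns are timestamptz, and refresh_and_log with its parameters p_schema and p_view is a hypothetical name. It reads clock_timestamp() before and after the refresh so the two columns actually differ, and uses format() with %I so any view name can be refreshed without an IF branch per view.

```sql
-- Assumed monitor table layout (one row per materialized view).
CREATE TABLE IF NOT EXISTS mv_refresh_monitor (
    view_name          text PRIMARY KEY,
    refresh_time_start timestamptz,
    refresh_time_end   timestamptz
);

-- Hypothetical helper: refresh any materialized view by name and log timings.
CREATE OR REPLACE FUNCTION refresh_and_log(p_schema text, p_view text)
RETURNS void
LANGUAGE plpgsql AS
$func$
DECLARE
    -- clock_timestamp() is the real wall-clock time, not the transaction start.
    v_start timestamptz := clock_timestamp();
BEGIN
    -- %I quotes each identifier, so the view name cannot inject SQL.
    EXECUTE format('REFRESH MATERIALIZED VIEW %I.%I', p_schema, p_view);

    -- Upsert the timing row; the second clock_timestamp() runs after the refresh.
    INSERT INTO mv_refresh_monitor (view_name, refresh_time_start, refresh_time_end)
    VALUES (p_view, v_start, clock_timestamp())
    ON CONFLICT (view_name) DO UPDATE
        SET refresh_time_start = EXCLUDED.refresh_time_start,
            refresh_time_end   = EXCLUDED.refresh_time_end;
END
$func$;
```

With the placeholder schema from the question, a call would look like SELECT refresh_and_log('schema', 'view_one'); issued as its own statement (for example from a scheduled job) rather than from a trigger on the monitor table.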
Related
I have a database where data from different source tables are processed and stored in a materialized view.
I chose to store it as an MV because the query that processes this data takes a while (about 3 seconds) and needs to be called all the time.
So, I created a trigger to refresh the MV every time the source table is modified (INSERT, DELETE or UPDATE).
The problem is, it seems the trigger function waits for the materialized view to finish refreshing to return, and I don't want this.
I want the insert operation to return as fast as possible, and the MV to refresh in parallel.
My function:
CREATE OR REPLACE FUNCTION "MOBILIDADE".atualizar_mv_solicitacao()
RETURNS TRIGGER AS
$$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY "MOBILIDADE"."MV_SOLICITACAO";
RETURN NULL;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER solicitacao_atualizar_mv_solicitacao
AFTER INSERT OR DELETE OR UPDATE ON "MOBILIDADE"."GESTAOPROJETOS_SOLICITACAO"
FOR EACH STATEMENT
EXECUTE PROCEDURE "MOBILIDADE".atualizar_mv_solicitacao();
When I run an INSERT with the trigger enabled, it takes about 3 seconds to finish, while with the trigger disabled it takes only 0.07 seconds.
INSERT INTO "MOBILIDADE"."GESTAOPROJETOS_SOLICITACAO" (documento_tipo,documento_numero,documento_sigla,documento_ano,requerente,solicitacao,data,data_recebimento_semob,categorias,geom,endereco_regiao,endereco_bairro,endereco_logradouro,anexo,created_by,created_at,acao) VALUES('Indicação',12345,'TESTE',2022,'TESTE','TESTE','2022-09-15','2022-09-15','{"Barreiras físicas" , "Pavimentação"}',ST_Transform(ST_SetSRID(ST_MakePoint(-45.888675631640105,-23.236909838714148),4326),4326),'Sul','Bosque dos Eucaliptos','Rua Lima Duarte',false,1,NOW(),1) RETURNING id
This is the wrong way to go about it. If refreshing the materialized view takes long and you modify the table often, then you cannot refresh the materialized view on every data change. Even if the refresh runs asynchronously (which is not possible with a trigger), it will still put a lot of load on your system.
Consider alternative solutions:
Refresh the materialized view every five minutes or so.
Don't use a materialized view, but a regular table that contains the aggregates and update that table from a trigger whenever the underlying data change. That will only work if the "materialized view" is simple enough.
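A minimal sketch of the second alternative, assuming the "materialized view" is just a row count per category. The tables requests_demo and request_counts and the function sync_request_counts are hypothetical stand-ins; a row-level trigger adjusts the counts incrementally instead of recomputing everything, so each INSERT/UPDATE/DELETE only pays for touching one or two summary rows.

```sql
-- Hypothetical source table standing in for the frequently modified table.
CREATE TABLE requests_demo (
    id       serial PRIMARY KEY,
    category text NOT NULL
);

-- Plain table that replaces the materialized view: one row count per category.
CREATE TABLE request_counts (
    category text PRIMARY KEY,
    total    bigint NOT NULL
);

CREATE OR REPLACE FUNCTION sync_request_counts()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
    -- Take away the old row's contribution (UPDATE and DELETE have OLD).
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE request_counts
        SET    total = total - 1
        WHERE  category = OLD.category;
    END IF;

    -- Add the new row's contribution (INSERT and UPDATE have NEW).
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO request_counts (category, total)
        VALUES (NEW.category, 1)
        ON CONFLICT (category) DO UPDATE
            SET total = request_counts.total + 1;
    END IF;

    RETURN NULL;  -- the return value of an AFTER ROW trigger is ignored
END
$func$;

CREATE TRIGGER requests_demo_sync
AFTER INSERT OR UPDATE OR DELETE ON requests_demo
FOR EACH ROW EXECUTE PROCEDURE sync_request_counts();
```

Categories whose count drops to zero keep a row with total = 0 in this sketch; delete them in the trigger if that matters.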
My trigger is defined the following way:
CREATE TRIGGER update_contract_finished_at
AFTER INSERT OR DELETE OR UPDATE OF performed_on
ON task
FOR EACH ROW
EXECUTE PROCEDURE update_contract_finished_at_function();
I now want to invoke this trigger to set the values that it updates. How do I do that?
Something like
for each row in task
execute procedure update_contract_finished_at_function();
I know I can update with a standard UPDATE ... SET statement. I also want to verify that my trigger works correctly on all the data.
I'd write a slightly modified copy of update_contract_finished_at_function that takes type task as input and returns void.
Then, in the copy, replace NEW with $1 and call the function like this:
SELECT copy_func(task) FROM task;
If the functions are almost identical, that should be good enough for testing the trigger function.
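A sketch of what such a copy could look like. The original trigger body isn't shown, so the contract table, the finished_at column and the "latest performed_on" logic are all hypothetical; the point is the shape: the function takes a task row (here a named parameter r, which plays the role of NEW or $1) and returns void, so it can be applied to every row with a plain SELECT.

```sql
-- Hypothetical schema; only task and performed_on come from the question.
CREATE TABLE contract (
    id          int PRIMARY KEY,
    finished_at date
);
CREATE TABLE task (
    id           int PRIMARY KEY,
    contract_id  int REFERENCES contract (id),
    performed_on date
);

-- Copy of the (hypothetical) trigger body: every NEW became the row parameter r.
CREATE OR REPLACE FUNCTION copy_func(r task)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
    UPDATE contract c
    SET    finished_at = (SELECT max(t.performed_on)
                          FROM task t
                          WHERE t.contract_id = r.contract_id)
    WHERE  c.id = r.contract_id;
END
$func$;
```

It is then called exactly as in the answer: SELECT copy_func(task) FROM task;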
The way to manually trigger your on update trigger once would be:
UPDATE task SET performed_on = performed_on
However, depending on how complicated the logic in there is and how many rows the table has, a separate query might be significantly faster for initializing a large number of rows.
Since you mentioned you want to test the behaviour of your trigger you can clone the table or do a table or database dump and restore the data afterwards. If this is a live system you should instead do a database dump, restore to another system, add your trigger, test it, repeat from restore until you nail it... and only after you're sure it does what you want update the live system with it.
I ended up writing a PL/pgSQL function that in a loop processes all events in chronological order and calling it:
create or replace function process_event_history()
returns void
language plpgsql
as
$$
declare
event record;
begin
for event in
select id, timestamp
from events
order by timestamp
loop
update events set timestamp = event.timestamp
where id = event.id;
end loop;
end;
$$;
--;;
-- Execute the above function causing the trigger to run for all events.
select process_event_history();
--;;
-- Remove the temporary processing function.
drop function process_event_history();
I have the following trigger and trigger function set up to refresh a MATERIALIZED VIEW on a remote server every time a local table A gets updated. The MV in turn is created from a foreign table of the local table A. After the trigger runs, the materialized view is updated; however, it only reflects the state BEFORE the UPDATE happened. I'm not sure why this is the case. Either the trigger function runs before the UPDATE is committed, but isn't that what the 'AFTER' part of the trigger is for? Or the MV refresh is too fast(?), but adding pg_sleep doesn't change the result.
CREATE OR REPLACE FUNCTION public.refresh_remote_mv()
RETURNS TRIGGER AS
$func$
BEGIN
PERFORM dblink_connect('remote_server');
PERFORM dblink_exec(
$$
REFRESH MATERIALIZED VIEW m_config;
$$);
PERFORM dblink_disconnect();
RETURN NULL;
END
$func$ LANGUAGE plpgsql;
Trigger:
CREATE TRIGGER tr_remote_refresh
AFTER UPDATE ON m_config
EXECUTE PROCEDURE refresh_remote_mv();
That's because of transaction isolation: your changes are committed only after all the triggers have fired, so another transaction (the one opened through dblink) can't see them yet.
It would be better to refresh the materialized view at some interval rather than on every change. But if you want to do it that way, you can change your dblink query to an asynchronous one (dblink_send_query), and it should work then (remember to fire it with some delay to be sure the transaction is committed).
So I am working on adding a last updated time to the database for my app's server. The idea is that it will record the time an update is applied to one of our trips, and then the app can send a GET request to figure out whether it has all of the correct, up-to-date information.
I've added the column to our table, provided the service for it all, and finally managed to get a trigger going that updates the column every time a change is made to a trip in its trip table. My problem now is that the information pertaining to a trip is also stored across many other tables (for instance, there are tables for the routes that make up a trip and the photos a user can see on the trip), and if any of that data changes, the trip's update time also needs to change. I can't for the life of me figure out how to set up the trigger so that when I change some route information, the last updated time for the trip(s) the route belongs to is updated in its table.
This is my trigger code as it stands now: it updates the trip table's last updated column when that trip's row is updated.
CREATE OR REPLACE FUNCTION record_update_time() RETURNS TRIGGER AS
$$
BEGIN
NEW.last_updated=now();
RETURN NEW;
END;
$$
LANGUAGE PLPGSQL;
CREATE TRIGGER update_entry_on_entry_change
BEFORE UPDATE ON mydatabase.trip FOR EACH ROW
EXECUTE PROCEDURE record_update_time();
--I used the next two queries just to test that the trigger works. It
--probably doesn't make a difference to you but I'll keep it here for reference
UPDATE mydatabase.trip
SET title='Sample New Title'
WHERE id = 2;
SELECT *
FROM mydatabase.trip
WHERE mydatabase.trip.id < 5;
Now I need it to update when the rows referencing the trip row with a foreign key get updated. Any ideas from someone more experienced with SQL triggers than I?
"mydatabase" is a remarkably unfortunate name for a schema.
The trigger function could look like this:
CREATE OR REPLACE FUNCTION trg_upaft_upd_trip()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
UPDATE mydatabase.trip t -- "mydatabase" = schema name (?!)
SET last_updated = now()
WHERE t.id = NEW.trip_id;  -- guessing column names
RETURN NULL; -- calling this AFTER UPDATE
END
$func$;
And needs to be used in a trigger on every related table (not on trip itself):
CREATE TRIGGER upaft_upd_trip
AFTER UPDATE ON mydatabase.trip_detail
FOR EACH ROW EXECUTE PROCEDURE trg_upaft_upd_trip();
You also need to cover INSERT and DELETE (and possibly COPY) on all sub-tables ...
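A sketch of one function that covers all three operations on a sub-table by branching on TG_OP: DELETE only has OLD, INSERT only has NEW, and an UPDATE may move a row to another trip, so both parents are touched. The function name trg_touch_trip and the trip_detail columns are hypothetical. COPY FROM fires row-level INSERT triggers, so it goes through the same path; TRUNCATE does not fire row-level triggers at all.

```sql
CREATE SCHEMA IF NOT EXISTS mydatabase;
CREATE TABLE mydatabase.trip (
    id           int PRIMARY KEY,
    title        text,
    last_updated timestamptz
);
-- Hypothetical sub-table referencing a trip.
CREATE TABLE mydatabase.trip_detail (
    id      int PRIMARY KEY,
    trip_id int NOT NULL REFERENCES mydatabase.trip (id),
    note    text
);

CREATE OR REPLACE FUNCTION trg_touch_trip()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
    -- UPDATE and DELETE: touch the trip the row belonged to.
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE mydatabase.trip t
        SET    last_updated = now()
        WHERE  t.id = OLD.trip_id;
    END IF;
    -- INSERT and UPDATE: touch the trip the row belongs to now.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        UPDATE mydatabase.trip t
        SET    last_updated = now()
        WHERE  t.id = NEW.trip_id;
    END IF;
    RETURN NULL;  -- ignored for AFTER triggers
END
$func$;

CREATE TRIGGER touch_trip
AFTER INSERT OR UPDATE OR DELETE ON mydatabase.trip_detail
FOR EACH ROW EXECUTE PROCEDURE trg_touch_trip();
```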
This approach has many potential points of failure. As an alternative, consider a query or view that computes the latest last_updated from the sub-tables dynamically. If you update often, this might be the superior approach.
If you rarely UPDATE and SELECT often, your first approach might pay off.
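A sketch of the dynamic alternative, assuming each sub-table keeps its own last_updated column (for example via the record_update_time() trigger from the question). The tables trip_route and trip_photo and the view trip_last_updated are hypothetical. GREATEST ignores NULLs in PostgreSQL, so trips without routes or photos still get a value. One caveat: a deleted route or photo does not advance the computed time, since only surviving rows are looked at.

```sql
CREATE TABLE trip (
    id           int PRIMARY KEY,
    last_updated timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE trip_route (
    id           int PRIMARY KEY,
    trip_id      int NOT NULL REFERENCES trip (id),
    last_updated timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE trip_photo (
    id           int PRIMARY KEY,
    trip_id      int NOT NULL REFERENCES trip (id),
    last_updated timestamptz NOT NULL DEFAULT now()
);

-- Latest change across the trip row and all of its sub-rows, computed on read.
CREATE VIEW trip_last_updated AS
SELECT t.id AS trip_id,
       GREATEST(t.last_updated,
                (SELECT max(r.last_updated) FROM trip_route r WHERE r.trip_id = t.id),
                (SELECT max(p.last_updated) FROM trip_photo p WHERE p.trip_id = t.id)
       ) AS last_updated
FROM trip t;
```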
I have this trigger in PostgreSQL that I just can't get to work (it does nothing). For context, here's how I defined it:
CREATE TABLE documents (
...
modification_time timestamp with time zone DEFAULT now()
);
CREATE FUNCTION documents_update_mod_time() RETURNS trigger
AS $$
begin
new.modification_time := now();
return new;
end
$$
LANGUAGE plpgsql;
CREATE TRIGGER documents_modification_time
BEFORE INSERT OR UPDATE ON documents
FOR EACH ROW
EXECUTE PROCEDURE documents_update_mod_time();
Now, to make it a bit more interesting: how do you debug triggers?
Use the following code within a trigger function, then watch the 'messages' tab in pgAdmin3 or the output in psql:
RAISE NOTICE 'myplpgsqlval is currently %', myplpgsqlval; -- either this
RAISE EXCEPTION 'failed'; -- or that
To see which triggers actually get called, how many times etc, the following statement is the life-saver of choice:
EXPLAIN ANALYZE UPDATE my_table SET foo = 'bar'; -- shows the called triggers (note: this really executes the UPDATE)
Note that if your trigger is not getting called and you use inheritance, it may be that you've only defined a trigger on the parent table, whereas triggers are not inherited by child tables automatically.
To step through the function, you can use the debugger built into pgAdmin3, which on Windows is enabled by default; all you have to do is execute the code found in ...\8.3\share\contrib\pldbgapi.sql against the database you're debugging, restart pgAdmin3, right-click your trigger function, hit 'Set Breakpoint', and then execute a statement that would cause the trigger to fire, such as the UPDATE statement above.
It turns out I was using inheritance in the problem above and forgot to mention it. For everybody who might run into this as well, here are some debugging hints:
Use the following code to debug what a trigger is doing:
RAISE NOTICE 'test'; -- either this
RAISE EXCEPTION 'failed'; -- or that
To see which triggers actually get called, how many times etc, the following statement is the life-saver of choice:
EXPLAIN ANALYZE UPDATE my_table SET foo = 'bar'; -- shows the called triggers (note: this really executes the UPDATE)
Then there's the one thing I didn't know before: triggers only fire when updating the exact table they're defined on. If you use inheritance, you MUST define them on the child tables as well!
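A small sketch reproducing the pitfall with hypothetical tables doc_parent and doc_child: a row inserted directly into the child only gets its timestamp if the child has its own trigger. The RAISE NOTICE line doubles as the kind of debug output suggested above, reporting which trigger fired on which table via TG_NAME and TG_TABLE_NAME.

```sql
CREATE TABLE doc_parent (id int, modification_time timestamptz);
CREATE TABLE doc_child () INHERITS (doc_parent);

CREATE OR REPLACE FUNCTION set_mod_time()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
    NEW.modification_time := clock_timestamp();
    RAISE NOTICE 'trigger % fired on %', TG_NAME, TG_TABLE_NAME;  -- debug output
    RETURN NEW;
END
$func$;

-- On the parent only: rows inserted directly into doc_child do not fire this.
CREATE TRIGGER set_mod_time BEFORE INSERT OR UPDATE ON doc_parent
FOR EACH ROW EXECUTE PROCEDURE set_mod_time();

-- So each child table needs its own copy of the trigger.
CREATE TRIGGER set_mod_time BEFORE INSERT OR UPDATE ON doc_child
FOR EACH ROW EXECUTE PROCEDURE set_mod_time();
```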
You can use 'raise notice' statements inside your trigger function to debug it. To debug the trigger not being called at all is another story.
If you add a 'raise exception' inside your trigger function, can you still do inserts/updates?
Also, if your update test occurs in the same transaction as your insert test, now() will be the same (since it's only calculated once per transaction) and therefore the update won't seem to do anything. If that's the case, either do them in separate transactions, or if this is a unit test and you can't do that, use clock_timestamp().
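To see the difference for yourself, a quick psql session sketch: within one transaction now() keeps returning the transaction start time, while clock_timestamp() keeps advancing.

```sql
BEGIN;
SELECT now() AS tx_start, clock_timestamp() AS wall_clock;
SELECT pg_sleep(1);
-- tx_start is unchanged here; wall_clock is about one second later.
SELECT now() AS tx_start, clock_timestamp() AS wall_clock;
COMMIT;
```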
I have a unit test that depends on some time going by between transactions, so at the beginning of the unit test I have something like:
ALTER TABLE documents
ALTER COLUMN modification_time SET DEFAULT clock_timestamp();
Then in the trigger, use "set modification_time = default".
So normally it doesn't do the extra calculation, but during a unit test this allows me to do inserts with pg_sleep in between to simulate time passing and actually have that be reflected in the data.