Confirming if this trigger will do as I intend - postgresql

I have been using Postgres for a while now, but I have not implemented any triggers yet. I wanted to check if this will do what I intend it to do.
On a daily basis, I am adding new rows to a table (COPY) and also updating existing rows if there is a primary key conflict (ON CONFLICT DO UPDATE SET). I then have a materialized view using that table and a few other joins, and this view is used for a lot of reporting.
I want the materialized view to update when the original table has been updated, without me needing to schedule it or run it manually. (Right now I have it scheduled with a Python psycopg2 execute command).
CREATE OR REPLACE FUNCTION refresh_mat_view()
RETURNS TRIGGER LANGUAGE plpgsql
AS $$
BEGIN
REFRESH MATERIALIZED VIEW schema_name.materialized_view_name;
RETURN NULL; -- the return value of an AFTER statement-level trigger is ignored
END $$;
CREATE TRIGGER refresh_view
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
ON sutherland.dimension_peoplesoft FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_mat_view();
Would that refresh the view for every single row that is updated too? I am imagining it triggering a refresh for each individual row, which might be 100k+. It would be better for it to happen AFTER all inserts have been done (I have Python looping through each row of a pandas DataFrame to UPSERT into the database).

When you use a plain materialized view, every refresh rebuilds the whole thing. So if your data changes a lot and you need the data available quickly, it's not an optimal choice.
You should use eager or lazy materialized views, which are essentially a disciplined use of triggers. They're obviously harder to set up than a plain materialized view, but the results are better (depending on the use case).
You should check the article Materialized View Strategies Using PostgreSQL.
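For a flavor of the eager approach from that article: keep an ordinary table in place of the materialized view and maintain it with a row-level trigger. A minimal sketch, handling INSERTs only (all table and column names here are hypothetical, not from the question):
-- Ordinary summary table standing in for the materialized view.
CREATE TABLE daily_totals (
day date PRIMARY KEY,
total numeric NOT NULL DEFAULT 0
);
CREATE OR REPLACE FUNCTION bump_daily_totals()
RETURNS trigger LANGUAGE plpgsql
AS $$
BEGIN
-- Fold the new row into the summary instead of rebuilding everything.
INSERT INTO daily_totals (day, total)
VALUES (NEW.created_on, NEW.amount)
ON CONFLICT (day) DO UPDATE SET total = daily_totals.total + NEW.amount;
RETURN NULL;
END $$;
CREATE TRIGGER bump_daily_totals
AFTER INSERT ON source_rows
FOR EACH ROW EXECUTE PROCEDURE bump_daily_totals();
Handling UPDATE and DELETE requires the matching inverse adjustments, which is the extra work the article is referring to.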

Related

Using materialized views

I am looking for a mechanism to reload tables from a production environment.
Using materialized views ultimately brings little benefit compared to a basic TRUNCATE / INSERT ... SELECT on an existing table.
What added value do materialized views give?
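For concreteness, the two approaches being compared look like this (table names are illustrative):
-- 1) Materialized view, rebuilt on demand:
CREATE MATERIALIZED VIEW report_snapshot AS
SELECT * FROM prod_table;
REFRESH MATERIALIZED VIEW report_snapshot;
-- 2) Basic reload of an existing plain table:
TRUNCATE report_snapshot_tab;
INSERT INTO report_snapshot_tab SELECT * FROM prod_table;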

postgresql trigger after insert or update or delete

I have a table called mdl_user, and I added two new columns to it: the first is called "LastOperation" and the second "TimeStamp".
I need to create a trigger after insert or update or delete that writes "I", "U", or "D" to the "LastOperation" column, and the time the action happened to the "TimeStamp" column.
NOTE: All of this is to stay in the same table, not be triggered into another table.
You can get what you want and still use standard DML (including DELETE). It does, however, require a little behind-the-scenes manipulation of database objects. What it involves is doing your DML against not a table but an updatable view.
1. Add the new columns as usual to the table.
2. Rename the table, say to mdl_user_tab.
3. Create a VIEW with the same name and definition as the old table: a simple SELECT (no join) from the new table name.
4. Create a trigger function which interprets the DML and manipulates the actual table.
5. Create an INSTEAD OF trigger for all DML operations against the VIEW.
You now process everything against the view (most existing code will not require updating). But you may need some additional functionality for actually processing the table. See example here.
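Since the linked example isn't reproduced here, a rough sketch of steps 2 through 5 might look like the following (the column list is hypothetical; the real view must select every column of the table):
ALTER TABLE mdl_user RENAME TO mdl_user_tab;
CREATE VIEW mdl_user AS
SELECT id, name, "LastOperation", "TimeStamp" FROM mdl_user_tab;
CREATE OR REPLACE FUNCTION mdl_user_dml()
RETURNS trigger LANGUAGE plpgsql
AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO mdl_user_tab (id, name, "LastOperation", "TimeStamp")
VALUES (NEW.id, NEW.name, 'I', now());
RETURN NEW;
ELSIF TG_OP = 'UPDATE' THEN
UPDATE mdl_user_tab
SET name = NEW.name, "LastOperation" = 'U', "TimeStamp" = now()
WHERE id = OLD.id;
RETURN NEW;
ELSE
-- The question wants the audit columns in the same table, so a DELETE
-- is recorded as a flag here rather than removing the row.
UPDATE mdl_user_tab
SET "LastOperation" = 'D', "TimeStamp" = now()
WHERE id = OLD.id;
RETURN OLD;
END IF;
END $$;
CREATE TRIGGER mdl_user_instead
INSTEAD OF INSERT OR UPDATE OR DELETE ON mdl_user
FOR EACH ROW EXECUTE PROCEDURE mdl_user_dml();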

Triggers to refresh multiple materialized views based on same table

I have a spatial table in a Postgres database from which I create three separate materialized views (each based on specific spatial queries). I want to create trigger functions to refresh each of the materialized views on updates, inserts, and deletes.
I have created three separate functions and triggers, but the performance (as is to be expected) is horrendous. If I run a single trigger on update, insert, or delete, it performs fine. Below is a sample function and trigger I am using:
CREATE OR REPLACE FUNCTION refresh_mvw_taxa_hex5km()
RETURNS trigger
AS $BODY$
BEGIN
REFRESH MATERIALIZED VIEW mvw_taxa_hex5km;
RETURN NULL; -- NEW is not defined in a statement-level trigger
END;
$BODY$ LANGUAGE plpgsql;
CREATE TRIGGER refresh_mvw_taxa_hex5km
AFTER INSERT OR UPDATE OR DELETE ON occurrence
FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_mvw_taxa_hex5km();
Is there a more efficient way to do this? I considered running scheduled tasks, but I really need the refresh to happen on changes to the table. I have read a little about "concurrently," but I'm not sure if this is the answer.
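"Concurrently" refers to the variant sketched below. It lets readers keep querying the materialized view while it refreshes, at the cost of a slower refresh, and it requires a unique index on the view (the indexed column here is a guess):
-- One-time setup: CONCURRENTLY requires a UNIQUE index on the matview.
CREATE UNIQUE INDEX ON mvw_taxa_hex5km (occurrence_id);
-- Then, e.g. in the trigger function:
REFRESH MATERIALIZED VIEW CONCURRENTLY mvw_taxa_hex5km;
Note that it doesn't reduce the total work per refresh; it helps with blocking readers, not with the refresh cost itself.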

Best practices for performing a table swap in Redshift

We're in the process of running a handful of hourly scripts on our Redshift cluster which build summary tables for data consumers. After assembling a staging table, the script then runs a transaction which deletes the existing table and replaces it with the staging table, as such:
BEGIN;
DROP TABLE IF EXISTS public.data_facts;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
COMMIT;
The problem with this operation is that long-running analysis queries will place an AccessShareLock on public.data_facts, preventing it from being dropped and thrashing our ETL cycle. I'm thinking a better solution would be one which renames the existing table, as such:
ALTER TABLE public.data_facts RENAME TO data_facts_old;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
DROP TABLE public.data_facts_old;
However, this approach presupposes that 1) public.data_facts exists, and 2) public.data_facts_old does not exist.
Do you know if there's a way to conduct this operation safely in SQL, without relying on application logic? (e.g. something like ALTER TABLE IF EXISTS).
I haven't tried it, but looking at the documentation of CREATE VIEW, it seems this can be done with late-binding views.
The main idea would be a view public.data_facts that users interact with. Behind the scenes, you can load new data and then swap the view to “point” to the new table.
Bootstrap
-- load data into public.data_facts_v0
CREATE VIEW public.data_facts AS
SELECT * from public.data_facts_v0 WITH NO SCHEMA BINDING;
Update
-- load data into public.data_facts_v1
CREATE OR REPLACE VIEW public.data_facts AS
SELECT * from public.data_facts_v1 WITH NO SCHEMA BINDING;
DROP TABLE public.data_facts_v0;
The WITH NO SCHEMA BINDING means the view will be late-binding. “A late-binding view doesn't check the underlying database objects, such as tables and other views, until the view is queried.” This means the update can even introduce a table with renamed columns or a completely new structure.
Notes:
It might be a good idea to wrap the swap operations in a transaction, to make sure we don't drop the previous table if the view swap fails, e.g.:
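-- A sketch of the wrapped swap, assuming Redshift permits both
-- statements inside one transaction block:
BEGIN;
CREATE OR REPLACE VIEW public.data_facts AS
SELECT * from public.data_facts_v1 WITH NO SCHEMA BINDING;
DROP TABLE public.data_facts_v0;
COMMIT;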
You can add a new load_time column (timestamp, ENCODE runlength, DEFAULT getdate()) to your target table, and make your ETL do this:
-- Append the new batch; load_time is filled in by its DEFAULT.
INSERT INTO public.data_facts
SELECT * FROM public.data_facts_staging;
-- Drop every row older than the batch just loaded.
DELETE FROM public.data_facts
WHERE load_time < (SELECT max(load_time) FROM public.data_facts);
DROP TABLE public.data_facts_staging;
Note: public.data_facts_staging should have exactly the same structure as public.data_facts, except that the last column of public.data_facts is load_time, so that on insert it will be populated with the current timestamp.
The only implication is that it requires extra disk space for a moment between inserting the new rows and deleting the old ones, and load_time always has to be the last column. Also, you have to VACUUM the table every time you do this.
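Setting up that column and the follow-up maintenance might look like this (a sketch; check the exact clause order against the Redshift ALTER TABLE documentation):
ALTER TABLE public.data_facts
ADD COLUMN load_time timestamp DEFAULT getdate() ENCODE runlength;
-- After each insert/delete cycle, reclaim the space left by the deletes:
VACUUM public.data_facts;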
Another good thing about this approach is that if your ETL fails and the staging table is empty, or there is no staging table at all, you won't lose your data. In the pure-SQL scenario of swapping tables with DDL, you're not protected from dropping the target table when the staging table is missing. In the suggested scenario, if no new rows are inserted the DELETE statement deletes nothing (there are no rows older than the max load time), so the worst case is just having the old version of the data.
P.S. There is a command that, instead of INSERT ... SELECT ..., just moves the data from the staging table to the target table (ALTER TABLE ... APPEND FROM ...), but I guess it requires the same type of lock as ALTER TABLE, so I don't suggest it.

How do I INSERT and SELECT data with partitioned tables?

I set up a set of partitioned tables per the docs at http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html
CREATE TABLE t (year int, a int);
CREATE TABLE t_1980 ( CHECK (year = 1980) ) INHERITS (t);
CREATE TABLE t_1981 ( CHECK (year = 1981) ) INHERITS (t);
CREATE RULE t_ins_1980 AS ON INSERT TO t WHERE (year = 1980)
DO INSTEAD INSERT INTO t_1980 VALUES (NEW.year, NEW.a);
CREATE RULE t_ins_1981 AS ON INSERT TO t WHERE (year = 1981)
DO INSTEAD INSERT INTO t_1981 VALUES (NEW.year, NEW.a);
From my understanding, if I INSERT INTO t (year, a) VALUES (1980, 5), it will go to t_1980, and if I INSERT INTO t (year, a) VALUES (1981, 3), it will go to t_1981. But my understanding seems to be incorrect. First, I can't understand the following from the docs:
"There is currently no simple way to specify that rows must not be inserted into the master table. A CHECK (false) constraint on the master table would be inherited by all child tables, so that cannot be used for this purpose. One possibility is to set up an ON INSERT trigger on the master table that always raises an error. (Alternatively, such a trigger could be used to redirect the data into the proper child table, instead of using a set of rules as suggested above.)"
Does the above mean that in spite of setting up the CHECK constraints and the RULEs, I also have to create TRIGGERs on the master table so that the INSERTs go to the correct tables? If that were the case, what would be the point of the db supporting partitioning? I could just set up the separate tables myself? I inserted a bunch of values into the master table, and those rows are still in the master table, not in the inherited tables.
Second question. When retrieving the rows, do I select from the master table, or do I have to select from the individual tables as needed? How would the following work?
SELECT year, a FROM t WHERE year IN (1980, 1981);
Update: Seems like I have found the answer to my own question
"Be aware that the COPY command ignores rules. If you are using COPY to insert data, you must copy the data into the correct child table rather than into the parent. COPY does fire triggers, so you can use it normally if you create partitioned tables using the trigger approach."
I was indeed using COPY FROM to load data, so RULEs were being ignored. Will try with TRIGGERs.
Definitely try triggers.
If you think you want to implement a rule, don't (the only exception that comes to mind is updatable views). See this great article by depesz for more explanation.
In reality, Postgres only supports partitioning on the reading side of things. You're going to have to set up the method of insertion into the partitions yourself, in most cases with TRIGGERs. Depending on the needs and applications, it can sometimes be faster to teach your application to insert directly into the partitions.
When selecting from partitioned tables, you can indeed just SELECT ... WHERE ... on the master table, so long as your CHECK constraints are properly set up (they are in your example) and the constraint_exclusion parameter is set correctly.
For 8.4:
SET constraint_exclusion = partition;
For < 8.4:
SET constraint_exclusion = on;
All this being said, I actually really like the way Postgres does it and use it myself often.
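For the schema in the question, the trigger approach would look roughly like this (a sketch of the standard pattern from the partitioning docs):
CREATE OR REPLACE FUNCTION t_insert_redirect()
RETURNS trigger LANGUAGE plpgsql
AS $$
BEGIN
IF NEW.year = 1980 THEN
INSERT INTO t_1980 VALUES (NEW.*);
ELSIF NEW.year = 1981 THEN
INSERT INTO t_1981 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'year % has no partition', NEW.year;
END IF;
RETURN NULL; -- prevent the row from also landing in the master table
END $$;
CREATE TRIGGER t_ins_trigger
BEFORE INSERT ON t
FOR EACH ROW EXECUTE PROCEDURE t_insert_redirect();
Unlike the rules, this also fires for COPY.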
Does the above mean that in spite of setting up the CHECK constraints and the RULEs, I also have to create TRIGGERs on the master table so that the INSERTs go to the correct tables?
Yes. Read point 5 (section 5.9.2)
If that were the case, what would be the point of the db supporting partitioning? I could just set up the separate tables myself?
Basically: the INSERTs into the child tables must be done explicitly (either by creating TRIGGERs, or by specifying the correct child table in the query). But the partitioning is transparent for SELECTs, and (given the storage and indexing advantages of this scheme) that's the point. (Besides, because the partitioned tables are inherited, the schema is inherited from the parent, hence consistency is enforced.)
Triggers are definitely better than rules.
Today I played with partitioning a materialized view table and ran into a problem with the trigger solution.
Why? I'm using RETURNING, and the standard trigger solution returns NULL :)
But here's a solution which works for me; correct me if I'm wrong.
1. I have 3 tables which get inserted with some data, and there's a view (let's call it viewfoo) which contains the data that needs to be materialized.
2. An insert into the last table has a trigger which inserts into the materialized view table via INSERT INTO matviewtable SELECT * FROM viewfoo WHERE recno=NEW.recno;
That works fine, and I'm using RETURNING recno; (recno is a SERIAL column, i.e. a sequence).
The materialized view (table) needs to be partitioned because it's huge; according to my tests, SELECTs are at least 10x faster that way.
Problems with partitioning:
* The standard trigger solution RETURNs NULL, so I cannot use RETURNING recno.
(The standard trigger solution = the trigger explained on the depesz page.)
Solution:
I changed the trigger on my 3rd table to NOT insert into the materialized view table (that table is the parent of the partitioned tables), and instead created a new trigger which inserts into the correct partition directly from the 3rd table and RETURNs NEW.
The materialized view table is automagically updated (the partitions are its children, so their rows show up through inheritance), and RETURNING recno works fine.
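A rough sketch of that workaround, reusing the names above (the partition name, routing, and trigger timing are guesses):
CREATE OR REPLACE FUNCTION third_table_ins()
RETURNS trigger LANGUAGE plpgsql
AS $$
BEGIN
-- Insert straight into the child partition rather than the parent
-- matviewtable; a redirect trigger on the parent would have to
-- RETURN NULL, which breaks INSERT ... RETURNING on the original insert.
INSERT INTO matviewtable_part1
SELECT * FROM viewfoo WHERE recno = NEW.recno;
RETURN NEW; -- the original INSERT completes, so RETURNING recno works
END $$;
CREATE TRIGGER third_table_ins
AFTER INSERT ON third_table
FOR EACH ROW EXECUTE PROCEDURE third_table_ins();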
I'll be glad if this helped anybody.