Is it possible to partially refresh a materialized view in PostgreSQL?

In Oracle, it is possible to refresh just part of the data. In PostgreSQL, however, materialized views have only been supported since 9.3 (the current version), which is not long ago. So I wonder: is it possible to refresh just part of the data in a materialized view in PostgreSQL 9.3? If yes, how?

PostgreSQL doesn't support progressive / partial updates of materialized views yet.
9.4 adds REFRESH MATERIALIZED VIEW CONCURRENTLY but it still has to be regenerated entirely.
Hopefully we'll see support in 9.5 if someone's enthusiastic enough. It's only possible to do this without user-defined triggers/rules for simple materialized views though, and special support would be needed to even handle things like incremental update of a count(...) ... GROUP BY ....
The Oracle answer you refer to isn't actually incremental refresh, though. It's refresh by-partitions. For PostgreSQL to support that natively, it'd first have to support real declarative partitioning - which it doesn't, though we're discussing whether it can be done for 9.5.

I just came across a similar problem. Having learned from Craig's answer that it is not possible, I used a workaround: I split the materialized view into parts and joined and/or unioned the individual parts in a VIEW:
1. Create a MATERIALIZED VIEW for each row or column group in question (material_col1, material_col2, etc., or with more complex disjoint WHERE conditions), using e.g. a common id column.
2. Use a regular VIEW (fake_materialized_view) that joins the MATERIALIZED VIEWs on the id column. In the case of disjoint row groups, combine them with UNION ALL.
3. REFRESH MATERIALIZED VIEW for the individual parts as needed.
4. Run your query against fake_materialized_view instead.
The VIEW would look somewhat like this:
CREATE VIEW fake_materialized_view AS
SELECT m1.id, m1.col1, m2.col2
FROM material_col1 as m1 LEFT JOIN
material_col2 as m2
ON m1.id = m2.id
-- in case of additional row partitioning, e.g.
-- UNION ALL SELECT m3.id, m3.col1, m3.col2
-- FROM material_col3 m3
(Upd1: Thx to Barry for his comment utilizing row partitioning, which I added to the answer.)
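The effect of this workaround can be tried out with a minimal Python sketch. It uses the standard-library sqlite3 module purely as a stand-in: SQLite has no materialized views, so plain tables rebuilt from a query play the role of material_col1 and material_col2, and the hypothetical helper refresh() plays the role of REFRESH MATERIALIZED VIEW. The base table source and its contents are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO source VALUES (1, 'a', 'x'), (2, 'b', 'y');
    -- Stand-ins for the two materialized views (one per column group).
    CREATE TABLE material_col1 (id INTEGER PRIMARY KEY, col1 TEXT);
    CREATE TABLE material_col2 (id INTEGER PRIMARY KEY, col2 TEXT);
    -- The regular view that queries use instead of one big materialized view.
    CREATE VIEW fake_materialized_view AS
        SELECT m1.id, m1.col1, m2.col2
        FROM material_col1 AS m1
        LEFT JOIN material_col2 AS m2 ON m1.id = m2.id;
""")


def refresh(part, column):
    """Rebuild one part only; stands in for REFRESH MATERIALIZED VIEW <part>."""
    # part and column are trusted, hard-coded identifiers here, not user input.
    with conn:
        conn.execute(f"DELETE FROM {part}")
        conn.execute(f"INSERT INTO {part} SELECT id, {column} FROM source")


refresh("material_col1", "col1")
refresh("material_col2", "col2")

# The source changes in both column groups...
conn.execute("UPDATE source SET col1 = 'A', col2 = 'X' WHERE id = 1")
# ...but only the col2 part is refreshed, so col1 stays stale on purpose.
refresh("material_col2", "col2")

rows = conn.execute(
    "SELECT id, col1, col2 FROM fake_materialized_view ORDER BY id"
).fetchall()
print(rows)
```

The view picks up the new col2 value while col1 still shows the old one, which is the "partial refresh" behaviour the workaround is after. The price is that the parts can be out of sync with each other between refreshes.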

Related

Using materialized views

I am looking for a mechanism to reload tables from a production environment.
Using materialized views ultimately brings little benefit compared to a basic truncate / insert as select on an existing table.
What added value do materialized views give?

Confirming if this trigger will do as I intend

I have been using Postgres for a while now, but I have not implemented any triggers yet. I wanted to check if this will do what I intend it to do.
On a daily basis, I am adding new rows to a table (COPY) and also updating existing rows if there is a primary key conflict (ON CONFLICT DO UPDATE SET). I then have a materialized view using that table and a few other joins, and this view is used for a lot of reporting.
I want the materialized view to update when the original table has been updated, without me needing to schedule it or run it manually. (Right now I have it scheduled with a Python psycopg2 execute command).
CREATE OR REPLACE FUNCTION refresh_mat_view()
RETURNS TRIGGER LANGUAGE plpgsql
AS $$
BEGIN
    REFRESH MATERIALIZED VIEW schema_name.materialized_view_name;
    RETURN NULL;
END $$;
CREATE TRIGGER refresh_view
    AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
    ON sutherland.dimension_peoplesoft FOR EACH STATEMENT
    EXECUTE PROCEDURE refresh_mat_view();
Would that also refresh the view for every single row that is updated? I am imagining it triggering a refresh for each individual row, of which there might be 100k+. It would be better for it to happen AFTER all inserts have been done (I have Python looping through each row in a pandas DataFrame to UPSERT into the database).
When you use a plain materialized view, every refresh rebuilds the whole thing. So if your data changes a lot and you need it available quickly, it's not an optimal choice.
You should use eager materialized views or lazy materialized views, which are basically "a good use of triggers". Obviously they are harder to build than a plain materialized view, but the results are better (depending on the use case).
You should check the article Materialized View Strategies Using PostgreSQL.
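On the per-row worry in the question: a FOR EACH STATEMENT trigger fires per statement, not per row, but a Python loop that sends one UPSERT per DataFrame row sends one statement per row, so the refresh would still run for every row. One way around that is to send the rows as a single multi-row statement. The sketch below only builds the SQL text and the parameter list, so it runs without a database; the helper build_batch_upsert and the column names employee_id and name are hypothetical, and the %s placeholders assume psycopg2-style parameter binding.

```python
def build_batch_upsert(table, columns, key, rows):
    """Return (sql, params) for one multi-row INSERT ... ON CONFLICT DO UPDATE.

    table, columns and key are trusted identifiers (hypothetical names in the
    usage below); only the row values travel as bound parameters.
    """
    if not rows:
        raise ValueError("rows must not be empty")

    # Within a single statement, ON CONFLICT DO UPDATE may not touch the same
    # row twice, so keep only the last occurrence of each key.
    key_index = columns.index(key)
    latest = {}
    for row in rows:
        latest[row[key_index]] = tuple(row)
    deduped = list(latest.values())

    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values_clause = ", ".join([row_placeholder] * len(deduped))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values_clause} "
        f"ON CONFLICT ({key}) DO UPDATE SET {updates}"
    )
    # Flatten the rows so the values line up with the placeholders.
    params = [value for row in deduped for value in row]
    return sql, params


sql, params = build_batch_upsert(
    "sutherland.dimension_peoplesoft",
    ["employee_id", "name"],  # hypothetical columns
    "employee_id",
    [(1, "Ann"), (2, "Bob"), (1, "Anne")],
)
print(sql)
print(params)
# With psycopg2 this would be sent as a single statement:
#     cur.execute(sql, params)
```

For very large loads, another option is to COPY the data into a staging table and then run one INSERT ... SELECT ... ON CONFLICT from it, which is likewise a single statement against the target table.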

Postgres GRANT not applied on parent

I'm having trouble with GRANT in PostgreSQL (version 9.3).
I'm trying to restrict a ROLE 'client_1'. I want it to be able to SELECT from only one table. But there is inheritance between the tables.
Here is my table structure:
CREATE TABLE public.table_a (...);
CREATE TABLE table_a_partitions.child_1 (...) INHERITS (public.table_a);
CREATE TABLE table_a_partitions.child_2 (...) INHERITS (public.table_a);
GRANT SELECT ON table_a_partitions.child_1 TO client_1;
A direct SELECT on child_2 fails with an error, as expected, but if I do a SELECT * FROM table_a; for example, it also reads the forbidden table child_2. I would like my client to get only the child_1 results (and those of some other tables in the future) when he does SELECT * FROM table_a;.
Is there a simple way to solve this problem?
Thank you
You would need to use a VIEW in PostgreSQL 9.3 to solve this problem. If you upgrade to 9.5, however, you could use row-level security.
As a note as to why, the grant check only occurs on the level of the initial relation queried. This means if you query a view, you need access to the view's contents, but the view owner (NOT YOU) needs access to the underlying relations. This allows a view to be useful for information hiding. Similarly with inheritance, this structure allows you to forbid rows to be inserted or queried directly from partitions of a table, but to allow different queries via the parent table. So this is a consequence of design priorities, not a bug.
Before row-level security, you would basically create a view and fold user privilege criteria into the view (with partitioning/inheritance this is also a good idea for other reasons since your insert/update/delete triggers can return exactly what the db would do even though it cannot on a table).
As for row-level security, PostgreSQL 9.5 does allow you to specify row-level policies (conditions that get appended to insert/select/update/delete queries) and that provides something a little more manageable in some cases than the view approach.

Can we set up a table in postgres to always view the latest inserts first?

Right now when I create a table and do a
select * from table
I always see the first-inserted rows first. I'd like to have my latest inserts displayed first. Is it possible to achieve this with minimal performance impact?
I believe that Postgres uses an internal field called OID that can be sorted by. Try the following.
select *,OID from table order by OID desc;
There are some limitations to this approach as described in SQL, Postgres OIDs, What are they and why are they useful?
Apparently the OID sequence "does" wrap if it exceeds 4B. So in essence it's a global counter that can wrap. If it does wrap, some slowdown may start occurring when it's used and "searched" for unique values, etc.
See also https://wiki.postgresql.org/wiki/FAQ#What_is_an_OID.3F
NB: in more recent versions of Postgres this could be deprecated ( https://www.postgresql.org/docs/8.4/static/runtime-config-compatible.html#GUC-DEFAULT-WITH-OIDS )
You should still be able to create tables with OIDs even in the most recent version if you request them explicitly at table creation, as per https://www.postgresql.org/docs/9.5/static/sql-createtable.html
Although the behaviour you are observing in the CLI appears consistent, it isn't a standard and cannot be depended on. If you regularly need to see the most recently added rows of a specific table manually, you could add a timestamp field or some other sortable field, and perhaps even wrap the query in a stored function. I guess the approach depends on your particular use case.
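A minimal sketch of the "add a sortable field" suggestion follows. It uses Python's standard-library sqlite3 module as a stand-in so it runs anywhere; the table events and its columns are hypothetical. In PostgreSQL the equivalent would be a serial (or bigserial) column, or a timestamp column with a default of now(); note that now() returns the transaction start time, so rows inserted in the same transaction share a timestamp and a serial column is the safer tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # increases with every insert
    " payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [("first",), ("second",), ("third",)],
)

# Row order is only defined when the query asks for it explicitly.
latest_first = [
    payload
    for (payload,) in conn.execute("SELECT payload FROM events ORDER BY id DESC")
]
print(latest_first)
```

With an index on the sort column (a primary key has one automatically), fetching the newest rows with ORDER BY ... DESC and a LIMIT is typically cheap.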

How do I INSERT and SELECT data with partitioned tables?

I set up a set of partitioned tables per the docs at http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html
CREATE TABLE t (year integer, a integer); -- column types assumed; the original omitted them
CREATE TABLE t_1980 ( CHECK (year = 1980) ) INHERITS (t);
CREATE TABLE t_1981 ( CHECK (year = 1981) ) INHERITS (t);
CREATE RULE t_ins_1980 AS ON INSERT TO t WHERE (year = 1980)
    DO INSTEAD INSERT INTO t_1980 VALUES (NEW.year, NEW.a);
CREATE RULE t_ins_1981 AS ON INSERT TO t WHERE (year = 1981)
    DO INSTEAD INSERT INTO t_1981 VALUES (NEW.year, NEW.a);
From my understanding, if I INSERT INTO t (year, a) VALUES (1980, 5), it will go to t_1980, and if I INSERT INTO t (year, a) VALUES (1981, 3), it will go to t_1981. But, my understanding seems to be incorrect. First, I can't understand the following from the docs
"There is currently no simple way to specify that rows must not be inserted into the master table. A CHECK (false) constraint on the master table would be inherited by all child tables, so that cannot be used for this purpose. One possibility is to set up an ON INSERT trigger on the master table that always raises an error. (Alternatively, such a trigger could be used to redirect the data into the proper child table, instead of using a set of rules as suggested above.)"
Does the above mean that in spite of setting up the CHECK constraints and the RULEs, I also have to create TRIGGERs on the master table so that the INSERTs go to the correct tables? If that were the case, what would be the point of the db supporting partitioning? I could just set up the separate tables myself? I inserted a bunch of values into the master table, and those rows are still in the master table, not in the inherited tables.
Second question. When retrieving the rows, do I select from the master table, or do I have to select from the individual tables as needed? How would the following work?
SELECT year, a FROM t WHERE year IN (1980, 1981);
Update: Seems like I have found the answer to my own question
"Be aware that the COPY command ignores rules. If you are using COPY to insert data, you must copy the data into the correct child table rather than into the parent. COPY does fire triggers, so you can use it normally if you create partitioned tables using the trigger approach."
I was indeed using COPY FROM to load data, so RULEs were being ignored. Will try with TRIGGERs.
Definitely try triggers.
If you think you want to implement a rule, don't (the only exception that comes to mind is updatable views). See this great article by depesz for more explanation there.
In reality, Postgres only supports partitioning on the reading side of things. You're going to have to set up the method of insertion into partitions yourself, in most cases with TRIGGERs. Depending on your needs and applications, it can sometimes be faster to teach your application to insert directly into the partitions.
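As a rough illustration of teaching the application to insert directly into the partitions, the following Python sketch groups rows by the child table they belong in. It assumes the t_<year> naming from the question; the helper names and the PARTITIONED_YEARS set are hypothetical, and the actual INSERT or COPY per child table is left out so the sketch runs without a database.

```python
from collections import defaultdict

# Years for which a child table t_<year> exists (assumed from the question).
PARTITIONED_YEARS = {1980, 1981}


def child_table_for(year):
    """Map a year to its child table, refusing years without a partition."""
    if year not in PARTITIONED_YEARS:
        raise ValueError(f"no partition for year {year}")
    return f"t_{year}"


def group_by_partition(rows):
    """Group (year, a) rows by the child table they belong in."""
    grouped = defaultdict(list)
    for year, a in rows:
        grouped[child_table_for(year)].append((year, a))
    return dict(grouped)


batches = group_by_partition([(1980, 5), (1981, 3), (1980, 7)])
for table, batch in batches.items():
    # Each batch could then be loaded into its own child table, for example
    # with INSERT INTO <table> (year, a) VALUES ... or a per-table COPY.
    print(table, batch)
```

Raising an error for an unknown year mirrors what the CHECK constraints would do on the database side: a row never lands silently in the wrong place.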
When selecting from partitioned tables, you can indeed just SELECT ... WHERE ... on the master table, so long as your CHECK constraints are properly set up (they are in your example) and the constraint_exclusion parameter is set correctly.
For 8.4 and later:
SET constraint_exclusion = partition;
For < 8.4:
SET constraint_exclusion = on;
All this being said, I actually really like the way Postgres does it and use it myself often.
"Does the above mean that in spite of setting up the CHECK constraints and the RULEs, I also have to create TRIGGERs on the master table so that the INSERTs go to the correct tables?"
Yes. Read point 5 (section 5.9.2)
"If that were the case, what would be the point of the db supporting partitioning? I could just set up the separate tables myself?"
Basically: the INSERTS in the child tables must be done explicitly (either creating TRIGGERS, or by specifying the correct child table in the query). But the partitioning is transparent for SELECTS, and (given the storage and indexing advantages of this schema) that's the point.
(Besides, because the partitioned tables are inherited, the schema is inherited from the parent, hence consistency is enforced).
Triggers are definitely better than rules.
Today I played with partitioning a materialized view table and ran into a problem with the trigger solution.
Why? I'm using RETURNING, and the current solution returns NULL :)
But here's a solution that works for me - correct me if I'm wrong.
1. I have 3 tables that receive data, and there's a view (let's call it viewfoo) containing the data that needs to be materialized.
2. An insert into the last table fires a trigger that inserts into the materialized view table via INSERT INTO matviewtable SELECT * FROM viewfoo WHERE recno=NEW.recno;
That works fine, and I'm using RETURNING recno; (recno is of type SERIAL, i.e. backed by a sequence).
The materialized view (table) needs to be partitioned because it's huge, and according to my tests SELECTs are at least 10x faster in this case.
Problems with partitioning:
* The current trigger solution does RETURN NULL, so I cannot use RETURNING recno. (Current trigger solution = the trigger explained on the depesz page.)
Solution:
I changed the trigger on my 3rd table so that it does NOT insert into the materialized view table (that table is the parent of the partitioned tables), and created a new trigger that inserts into the partition directly from the 3rd table; that trigger does RETURN NEW.
The materialized view table is automagically updated, and RETURNING recno works fine.
I'll be glad if this helps anybody.