pg_restore: creating MATERIALIZED VIEW DATA taking long time

I'm using PostgreSQL 9.5.3. When I restore the database and it reaches the step creating MATERIALIZED VIEW DATA, it takes a long time, more than 3 hours. Is this really normal for PostgreSQL?
pg_restore: creating MATERIALIZED VIEW DATA "public.mydata"
It is still in progress, and my database.backup file is 15 GB.

That depends on the view definition, the current table statistics and the data present.
You could examine the query plan for the query defining the materialized view with EXPLAIN and see if there are any problems with the plan.
If it is a complicated query, maybe the problem is that autoanalyze didn't have time yet to calculate table statistics.
You can interrupt the statement with pg_cancel_backend() and recreate the materialized view later, perhaps after an ANALYZE, if it helps you to bring the rest of the database up quickly.
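As a rough sketch of that workflow, assuming the view is public.mydata as in the question and that pg_restore populates it with a REFRESH MATERIALIZED VIEW statement (the ILIKE filter and the pid 12345 are placeholders, not values from the question):

```sql
-- 1. Find the backend running the slow statement (run from a separate session).
SELECT pid, state, query_start, query
FROM pg_stat_activity
WHERE query ILIKE 'REFRESH MATERIALIZED VIEW%';

-- 2. Cancel only that statement; replace 12345 with the pid found above.
SELECT pg_cancel_backend(12345);

-- 3. After the rest of the restore has finished, collect table statistics.
ANALYZE;

-- 4. Fetch the defining query so its plan can be inspected with EXPLAIN.
SELECT definition
FROM pg_matviews
WHERE schemaname = 'public' AND matviewname = 'mydata';

-- 5. Populate the view once the plan looks reasonable.
REFRESH MATERIALIZED VIEW public.mydata;
```

Whether pg_restore carries on after the cancelled statement depends on how it was invoked, so it is worth checking its output before relying on the rest of the restore.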

Related

REFRESH MATERIALIZED VIEW suddenly taking more time to complete

We have a materialized view in our Postgres DB (11.12, managed by AWS RDS). We have a scheduled task that updates it every 5 minutes using REFRESH MATERIALIZED VIEW <view_name>. At some specific point last week, the time needed to refresh the view suddenly went from ~1s to ~20s. The view contains ~70k rows, with around 15 columns, all of them being integers, booleans or UUIDs.
Prior to this:
There were no changes in the server configuration.
There were no changes to the view itself. In fact, running EXPLAIN ANALYZE <expression used to create the view> shows that the query still executes in less than a second. If the query is run with a client like Postico, it takes ~20s to fetch all the results (roughly consistent with the time needed to materialize it, although we assume this is due to the time needed for network transmission).
There were no changes in the schema or any significant record increase in the contents of the tables needed to compute the view.
RDS Performance Insights indicates that the query is mostly using CPU resources.
I know this is probably not enough to get a solution, but:
Are there any server performance metrics or logs that could help us better understand this situation?
Is this just the normal time the server needs to persist the view to disk? If so, any idea of possible reasons why it started to take so long recently?
Here is a link to the execution plan.
EDIT: creating another materialized view with the same JOINs but selecting fewer columns performs as expected (~1s).
EDIT 2: setting enable_nestloop = false greatly speeds up the REFRESH operation (same performance as before). Would this indicate that refactoring the underlying query could solve the issue?
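One way to repeat the enable_nestloop experiment without changing the setting for the whole session or server is to scope it to a single transaction. This is only a sketch, and my_view is a hypothetical name standing in for the real view:

```sql
BEGIN;
-- SET LOCAL reverts automatically when this transaction ends.
SET LOCAL enable_nestloop = off;
REFRESH MATERIALIZED VIEW my_view;  -- hypothetical view name
COMMIT;
```

If the refresh is only fast with nested loops disabled, that usually points at row-count misestimates in the underlying query rather than at the refresh itself.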

Try REFRESH MATERIALIZED VIEW CONCURRENTLY.
When you refresh a materialized view, PostgreSQL locks the entire view, so you cannot query it while the refresh runs. To avoid this, you can use the CONCURRENTLY option.
REFRESH MATERIALIZED VIEW CONCURRENTLY view_name;
With the CONCURRENTLY option, PostgreSQL builds a temporary updated version of the materialized view, compares the two versions, and applies only the differences to the existing view.
You can query a materialized view while it is being refreshed this way. One requirement for using the CONCURRENTLY option is that the materialized view must have a UNIQUE index.
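A minimal sketch of meeting that requirement, assuming a hypothetical view my_view with a column id that is unique per row:

```sql
-- The index must be UNIQUE, use only plain column names (no expressions),
-- and have no WHERE clause, so that it covers every row of the view.
CREATE UNIQUE INDEX my_view_id_idx ON my_view (id);

-- With the index in place, readers are not blocked during the refresh.
REFRESH MATERIALIZED VIEW CONCURRENTLY my_view;
```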

Original poster here. This is more than a year old, but here's what happened and how we eventually fixed it.
TLDR:
- REFRESH MATERIALIZED VIEW <query> started to take much longer than executing the query used to construct the view (~20s vs ~1s).
- A couple of weeks after this question was asked, the query itself started to behave similarly (taking ~20s to complete). At this point, EXPLAIN ANALYZE started to show indications of performance issues with the query. So we ended up optimising the underlying query (the biggest performance gain came from replacing some JOINs with a CTE).
- After this, both the REFRESH MATERIALIZED VIEW <query> and the standalone query performed well again (execution time < 1s).
A still open question is why the REFRESH MATERIALIZED VIEW <query> and the standalone query had different performance at some point in time. Was the query planner choosing different plans depending on whether it was going to materialize the view or not? If someone knows whether such a thing is possible, please comment.

Refreshing the materialized view unconditionally every time (or every 5 minutes) is not a good approach, because it defeats much of the purpose of using a materialized view. Here is one approach I worked out from my own experience; you may find a better one later. Assume the materialized view is built from two tables, and we want to refresh it only when data in one of them has changed. To do this, whenever one of those tables is updated or deleted from, insert one record into a table (for example, refresh_materialized); you can also do this with a trigger. That record signals that the materialized view needs to be refreshed.
For example:
insert into refresh_materialized
(
refresh_status,
insert_date,
executed_date
)
values (
false,
now(),
null
);
Then, in the scheduled task, we can use this query:
select count(*) from refresh_materialized
where refresh_status = false;
If count(*) is greater than 0, refresh the materialized view; otherwise do nothing. After refreshing the materialized view, update this table:
update refresh_materialized
set
refresh_status = true,
executed_date = now()
where
refresh_status = false;
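The answer does not show the table definition or the trigger it mentions. A possible sketch follows; the column types, the function and trigger names, and source_table (one of the tables the view is built from) are all assumptions:

```sql
-- Flag table used by the scheduled task (column types are assumed).
CREATE TABLE refresh_materialized (
    refresh_status boolean     NOT NULL DEFAULT false,
    insert_date    timestamptz NOT NULL DEFAULT now(),
    executed_date  timestamptz
);

-- Trigger function that records "a refresh is needed".
CREATE FUNCTION request_matview_refresh() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO refresh_materialized (refresh_status, insert_date, executed_date)
    VALUES (false, now(), NULL);
    RETURN NULL;  -- the return value is ignored for AFTER statement-level triggers
END;
$$;

-- Fire once per modifying statement on a (hypothetical) source table.
CREATE TRIGGER source_table_changed
AFTER INSERT OR UPDATE OR DELETE ON source_table
FOR EACH STATEMENT
EXECUTE PROCEDURE request_matview_refresh();
```

A statement-level trigger keeps the flag table small when a single statement touches many rows; one such trigger would be needed on each source table.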

postgresql materialized view refresh history time

I'm working on a project which requires me to write a query to create a materialized view in PostgreSQL. How can I get the refresh history (the times at which it was refreshed) for a specific materialized view?

PostgreSQL does not store the time when an SQL statement like REFRESH MATERIALIZED VIEW is run.
Any attempt to rely on the file modification time of the underlying data file is in vain, as jobs like autovacuum may modify the file.
The only way to retain such information is to store the times when you run the statement in a table yourself.
An alternative could be to log all DDL statements (log_statement = 'ddl') and retrieve the information from the log file.
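Storing the refresh times yourself could look roughly like this. The log table matview_refresh_log and the view name my_view are made up for illustration:

```sql
-- Hypothetical log table, one row per completed refresh.
CREATE TABLE matview_refresh_log (
    matview      text        NOT NULL,
    refreshed_at timestamptz NOT NULL
);

-- Refresh and record the time in one transaction, so a failed refresh
-- leaves no log entry behind.
BEGIN;
REFRESH MATERIALIZED VIEW my_view;
-- clock_timestamp() gives the actual current time; now() would return
-- the time at which the transaction started.
INSERT INTO matview_refresh_log (matview, refreshed_at)
VALUES ('my_view', clock_timestamp());
COMMIT;

-- Time of the most recent refresh of one view.
SELECT max(refreshed_at)
FROM matview_refresh_log
WHERE matview = 'my_view';
```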

performance of refreshing postgres materialized view

I am exploring materialized views to create de-normalized view to avoid joining multiple tables for read performance. APIs will read from the materialized views to provide data to clients.
I am using Amazon Aurora PostgreSQL (version 11).
I am using a unique index on the materialized view (MV) so that I can use the “refresh concurrently” option.
What I am noticing, though, is that when only a fraction of the rows in one of the source tables is updated and I refresh the view, it is pretty slow, in fact slower than populating the view for the first time. For example, populating the MV the first time takes ~30 minutes, while a refresh takes more than an hour, even though less than 1% of the rows have been updated. The three main tables involved in generating the MV have ~18 million, 27 million and 40 million rows.
The timeliness of the materialized view refresh is important so that data is not stale for too long.
I could go with custom tables to store the denormalized data instead of materialized views, but I would then have to implement the logic to refresh the data myself, so I plan to avoid that if possible.
Is there anything that can be done to speed up the refresh process of the materialized views?
Please let me know if you need more details.
thanks
Kiran

You can create a second materialised view, refresh it (not concurrently), and then swap the names of the two views in a transaction.
I actually have no idea why postgres didn't implement CONCURRENTLY this way.
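A sketch of the swap, assuming two materialized views with identical definitions already exist, named my_view (the one readers query) and my_view_standby. Both names are hypothetical:

```sql
-- Refresh the copy nobody is reading; this blocks no readers of my_view.
REFRESH MATERIALIZED VIEW my_view_standby;

-- Swap the two names atomically.
BEGIN;
ALTER MATERIALIZED VIEW my_view         RENAME TO my_view_tmp;
ALTER MATERIALIZED VIEW my_view_standby RENAME TO my_view;
ALTER MATERIALIZED VIEW my_view_tmp     RENAME TO my_view_standby;
COMMIT;
```

Note that indexes keep their own names through the rename, and objects that depend on a view (such as other views) follow the underlying object rather than the name, so this fits best when my_view is only queried directly.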

Refreshing a materialized view is slow even if little has changed, because every time the view is refreshed, the defining query is run.
Using CONCURRENTLY makes the operation even slower, because it is not a wholesale replacement of the materialized view contents, but modification of the existing data.
Perhaps you could create a denormalized table that is updated by a trigger whenever the underlying tables are modified.
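To illustrate the trigger idea on a deliberately tiny case, the sketch below maintains a per-customer total. The tables orders and customer_totals and all column names are invented for the example; the real tables from the question would need more involved logic:

```sql
-- Hypothetical source table and denormalized summary table.
CREATE TABLE orders (
    id          bigint  PRIMARY KEY,
    customer_id bigint  NOT NULL,
    amount      numeric NOT NULL
);
CREATE TABLE customer_totals (
    customer_id bigint  PRIMARY KEY,
    total       numeric NOT NULL DEFAULT 0
);

CREATE FUNCTION maintain_customer_totals() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Remove the contribution of the old row version.
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE customer_totals
        SET total = total - OLD.amount
        WHERE customer_id = OLD.customer_id;
    END IF;
    -- Add the contribution of the new row version (ON CONFLICT needs 9.5+).
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO customer_totals (customer_id, total)
        VALUES (NEW.customer_id, NEW.amount)
        ON CONFLICT (customer_id)
        DO UPDATE SET total = customer_totals.total + EXCLUDED.total;
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row-level triggers
END;
$$;

CREATE TRIGGER orders_maintain_totals
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW
EXECUTE PROCEDURE maintain_customer_totals();
```

The trade-off is that every write to the source table now pays a small extra cost, in exchange for the summary never needing a full rebuild.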

Maybe I'm a bit late to the party, but if you are on Postgres 13 or later you could try this extension: https://github.com/sraoss/pg_ivm
It has some limitations but promises much faster rebuild times.
Here's a bit more on it from pganalyze: https://pganalyze.com/blog/5mins-postgres-15-beta1-incremental-materialized-views-pg-ivm (contrary to the project authors, they want you to be on v14 or later)

Using views in postgresql to enable transparent replacement of backing tables

We have a view that aggregates from a backing table. The idea is to reduce cpu load by using a pre-aggregated table, and to periodically refresh it with the following:
create new_backing_table (fill it)
begin
drop backing_table
rename new_backing_table to backing_table
commit
while in production. The latency caused by the refresh interval is acceptable. Incremental updates are possible but not desirable.
Does anyone have a comment on this scheme?

Check out materialized views. This may suit your use case. It can be used to store query results at creation then refreshed at a later time.
A materialized view is defined as a table which is actually physically stored on disk, but is really just a view of other database tables. In PostgreSQL, like many database systems, when data is retrieved from a traditional view it is really executing the underlying query or queries that build that view.
https://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html

Postgres materialized views or CREATE TABLE AS if not updating incrementally?

If I don't plan to implement incremental updates for materialized views in Postgres, are there any advantages to using them over CREATE TABLE AS? From what I have read, when you refresh a materialized view, the view is locked and is not readable. Since it's unavailable, it seems to have the same effect as dropping and recreating a table at the same rate you run refresh on a materialized view.

As of PostgreSQL 9.3.2 you also cannot use materialized view data as the basis of an UPDATE query.
So if you need to use this view as the basis of some updates, you are better off using regular tables.
