Postgres materialized views with hot standby

We are using Postgres (9.3) Hot Standby to build a read-only copy of a database. We have a UI which reads from a materialized view.
When we try to read from the materialized view in the standby database, the query hangs.
The materialized view takes ~10 seconds to rebuild in the master database. We've waited over 30 minutes for the query in the standby database and it seems to never complete.
Notably, the materialized view does exist in the standby database. We can't refresh it, of course, since the database is read-only.
We can't find anything in the documentation which indicates that materialized views can't be used in standby databases, but that appears to be the case.
Has anyone got this to work, and/or what is the recommended work-around?

According to the PostgreSQL documentation on Hot Standby, query conflicts can be handled by setting appropriate values for max_standby_archive_delay and max_standby_streaming_delay. These define the maximum delay allowed in WAL application before conflicting standby queries are canceled. In your case a high value may be preferable.
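As a rough sketch of what that could look like on the standby (the values are illustrative assumptions, not recommendations), the settings live in postgresql.conf and can be inspected from SQL. The two catalog queries are one way to check whether the hanging query is waiting on a lock, or whether standby queries are being canceled by recovery conflicts:

```sql
-- In the standby's postgresql.conf (illustrative values; -1 means wait forever):
--   max_standby_streaming_delay = 5min
--   max_standby_archive_delay   = 5min
-- Both settings take effect after a configuration reload.

-- Current values on the standby.
SHOW max_standby_streaming_delay;
SHOW max_standby_archive_delay;

-- Lock requests that have not been granted yet.
SELECT pid, locktype, relation::regclass AS relation, mode
FROM pg_locks
WHERE NOT granted;

-- Per-database counts of standby queries canceled by recovery conflicts.
SELECT datname, confl_lock, confl_snapshot, confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;
```

Note that a higher delay lets standby queries run longer at the cost of the standby falling further behind the master.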

Related

Refresh materialized view on Postgresql 11 on RDS

We are currently on Postgres 11.13 on AWS RDS. I am trying to create a materialized view whose query takes about 6-7 minutes to run. What is the best way to keep this MV mostly up to date? I was thinking of pg_cron, but I believe that is only available on PG 12.5+. A trigger on the underlying tables could be an option, but these tables either are not updated at all or receive a very large number of inserts in a short period of time, and I don't want to trigger excessive refreshes.
Any suggestions for this scenario?

performance of refreshing postgres materialized view

I am exploring materialized views to create de-normalized view to avoid joining multiple tables for read performance. APIs will read from the materialized views to provide data to clients.
I am using amazon aurora postgres (version 11).
I am using a unique index on the materialized view (MV) so that I can use the “refresh concurrently” option.
What I am noticing, though, is that when only a fraction of the rows in one of the source tables is updated and I refresh the view, it is quite slow, in fact slower than populating the view for the first time. For example, populating the MV for the first time takes ~30 minutes, while a refresh takes more than an hour even though less than 1% of the rows have been updated. The three main tables involved in generating the MV have ~18 million, 27 million, and 40 million rows.
The timeliness of the materialized view refresh is important so that data is not stale for too long.
I could use custom tables to store the denormalized data instead of materialized views, but then I would have to implement the refresh logic myself, so I would like to avoid that if possible.
Is there anything that can be done to speed up the refresh process of the materialized views?
Please let me know if you need more details.
thanks
Kiran
You can create a second materialized view, refresh it (not concurrently), and then swap the names of the two views in a single transaction.
I actually have no idea why postgres didn't implement CONCURRENTLY this way.
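A minimal sketch of that swap, assuming two materialized views with identical definitions. The names mv_live, mv_staging, and mv_old are hypothetical:

```sql
-- Rebuild the copy that no one is reading; readers of mv_live are unaffected.
REFRESH MATERIALIZED VIEW mv_staging;

-- Swap the names atomically. The renames need a brief exclusive lock on
-- each view, but no query runs while the lock is held.
BEGIN;
ALTER MATERIALIZED VIEW mv_live    RENAME TO mv_old;
ALTER MATERIALIZED VIEW mv_staging RENAME TO mv_live;
ALTER MATERIALIZED VIEW mv_old     RENAME TO mv_staging;
COMMIT;
```

Two caveats: index names do not follow the rename, and other views that depend on mv_live keep pointing at the same underlying object rather than at the name, so this works best when the materialized view is only referenced by name in application queries.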
Refreshing a materialized view is slow even if little has changed, because every time the view is refreshed, the defining query is run.
Using CONCURRENTLY makes the operation even slower, because it is not a wholesale replacement of the materialized view contents, but modification of the existing data.
Perhaps you could create a denormalized table that is updated by a trigger whenever the underlying tables are modified.
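A trigger-maintained table could look roughly like this. The schema is entirely hypothetical (orders(id, customer_id, amount) and customers(id, name)), and the sketch assumes PostgreSQL 11 or later for EXECUTE FUNCTION:

```sql
-- Denormalized target table that replaces the materialized view.
CREATE TABLE order_summary (
    order_id      bigint PRIMARY KEY,
    customer_name text,
    amount        numeric
);

-- Keeps order_summary in step with row-level changes to orders.
CREATE FUNCTION sync_order_summary() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        DELETE FROM order_summary WHERE order_id = OLD.id;
        RETURN OLD;
    END IF;

    -- Insert the joined row, or overwrite it if it already exists.
    INSERT INTO order_summary (order_id, customer_name, amount)
    SELECT NEW.id, c.name, NEW.amount
    FROM customers c
    WHERE c.id = NEW.customer_id
    ON CONFLICT (order_id) DO UPDATE
        SET customer_name = EXCLUDED.customer_name,
            amount        = EXCLUDED.amount;
    RETURN NEW;
END;
$$;

CREATE TRIGGER orders_sync_summary
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION sync_order_summary();
```

Each source table that feeds the denormalized rows needs its own trigger (here, a change to customers.name would not be picked up), and row-level triggers add overhead to every write on the source tables.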
Maybe I'm a bit late to the party, but if you are on Postgres 13 or later you could try this extension: https://github.com/sraoss/pg_ivm
It has some limitations but promises much faster rebuild times.
Here's a bit more on it from pganalyze: https://pganalyze.com/blog/5mins-postgres-15-beta1-incremental-materialized-views-pg-ivm (contrary to the project authors, they want you to be on v14 or later)
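For orientation, creating an incrementally maintained view with pg_ivm looks roughly like the following. The tables and the view name are hypothetical, and the schema qualification of create_immv depends on the pg_ivm release, so check the README of the version you install:

```sql
-- Assumes the pg_ivm extension is installed on the server.
CREATE EXTENSION pg_ivm;

-- orders and customers are hypothetical tables.
-- Recent pg_ivm releases place the function in the pgivm schema;
-- older releases expose it as plain create_immv(...).
SELECT pgivm.create_immv(
    'order_summary_immv',
    'SELECT o.id, c.name, o.amount
       FROM orders o JOIN customers c ON c.id = o.customer_id'
);
```

Whether a managed service offers the extension is something to verify before planning around it.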

How to see changes in a postgresql database

My PostgreSQL database is updated each night.
At the end of each nightly update, I need to know what data changed.
The update process is complex, taking a couple of hours and requires dozens of scripts, so I don't know if that influences how I could see what data has changed.
The database is around 1 TB in size, so any method that requires starting a temporary database may be very slow.
The database is an AWS instance (RDS). I have automated backups enabled (these are different to RDS snapshots which are user initiated). Is it possible to see the difference between two RDS automated backups?
I do not know whether it is possible to see the difference between RDS snapshots, but in the past we tested several solutions for a similar problem. Maybe you can take some inspiration from them.
The obvious solution is, of course, an auditing system. That way you can see what was changed relatively simply and, depending on the granularity of your auditing system, down to individual column values. Of course, the audit triggers and the queries against the audit tables have an impact on your application.
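A bare-bones audit trigger might look like this. The table audit_log, the function audit_row_change, and the audited table items are all hypothetical names, and the sketch assumes PostgreSQL 11 or later (to_jsonb needs 9.5+, EXECUTE FUNCTION needs 11+):

```sql
-- One row per change, with the before and after images stored as JSON.
CREATE TABLE audit_log (
    id         bigserial   PRIMARY KEY,
    table_name text        NOT NULL,
    operation  text        NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now(),
    old_row    jsonb,
    new_row    jsonb
);

CREATE FUNCTION audit_row_change() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, operation, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, operation, old_row, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    ELSE
        INSERT INTO audit_log (table_name, operation, old_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
    END IF;
    RETURN NULL;  -- the return value of an AFTER trigger is ignored
END;
$$;

-- Attach the trigger to each table that should be audited.
CREATE TRIGGER items_audit
AFTER INSERT OR UPDATE OR DELETE ON items
FOR EACH ROW EXECUTE FUNCTION audit_row_change();
```

After the nightly run, the changes are whatever rows in audit_log have a changed_at later than the start of the run.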
Another possibility, for tables with primary keys, is to store the primary key values together with the hidden system columns 'xmin' and 'ctid' (https://www.postgresql.org/docs/current/static/ddl-system-columns.html) for each row before the update and compare them with the values after the update. This only identifies which rows were changed, inserted, or deleted, not which columns changed.
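That comparison could be sketched as follows, assuming a hypothetical table items with primary key id. The system columns are stored under different names because xmin and ctid cannot be used as ordinary column names:

```sql
-- Before the nightly update: remember each row's identity and version.
CREATE TABLE items_before AS
SELECT id,
       xmin::text AS row_xmin,
       ctid::text AS row_ctid
FROM items;

-- After the update: classify rows that differ from the remembered state.
SELECT COALESCE(b.id, a.id) AS id,
       CASE
           WHEN b.id IS NULL THEN 'inserted'
           WHEN a.id IS NULL THEN 'deleted'
           ELSE 'changed'
       END AS change
FROM items_before b
FULL JOIN items a ON a.id = b.id
WHERE b.id IS NULL
   OR a.id IS NULL
   OR a.xmin::text <> b.row_xmin
   OR a.ctid::text <> b.row_ctid;
```

Operations that rewrite the table, such as VACUUM FULL or CLUSTER, move rows and change ctid without any logical change, so they would show up as false positives if they run between the two steps.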
You can set up a streaming replica with replication slots (and, to be on the safe side, WAL archiving as well). Then stop replication on the replica before the updates and compare the data after the updates using dblink selects. These queries can be very heavy, though.

Using views in postgresql to enable transparent replacement of backing tables

We have a view that aggregates from a backing table. The idea is to reduce cpu load by using a pre-aggregated table, and to periodically refresh it with the following:
create new_backingtable (fill it)
begin
drop backingtable
rename new_backingtable to backingtable
commit
while in production. The latency caused by the refresh interval is acceptable. Incremental updates are possible but not desirable.
Does anyone have comments on this scheme?
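In actual SQL the scheme could be sketched like this. The view name report, the source table events, and the aggregate query are hypothetical stand-ins for the real ones. Because a view refers to its table by internal identity rather than by name, the sketch repoints the view at the new table before dropping the old one:

```sql
-- Build and fill the replacement outside the swap transaction.
CREATE TABLE new_backingtable AS
SELECT category, count(*) AS n
FROM events
GROUP BY category;

BEGIN;
-- Repoint the view first; otherwise DROP TABLE fails because the view
-- still depends on the old table.
CREATE OR REPLACE VIEW report AS
SELECT category, n FROM new_backingtable;

DROP TABLE backingtable;
ALTER TABLE new_backingtable RENAME TO backingtable;
COMMIT;
```

CREATE OR REPLACE VIEW requires the new query to produce the same column names and types in the same order, which holds here because both tables are built by the same query.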
Check out materialized views. This may suit your use case. It can be used to store query results at creation then refreshed at a later time.
A materialized view is defined as a table which is actually physically stored on disk, but is really just a view of other database tables. In PostgreSQL, like many database systems, when data is retrieved from a traditional view it is really executing the underlying query or queries that build that view.
https://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
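A minimal sketch of the materialized view route, with a hypothetical source table events and aggregate query:

```sql
-- Stores the result of the aggregate query on disk.
CREATE MATERIALIZED VIEW category_counts AS
SELECT category, count(*) AS n
FROM events
GROUP BY category;

-- Re-runs the query and replaces the stored contents.
REFRESH MATERIALIZED VIEW category_counts;
```

A plain REFRESH locks out readers while it runs. From PostgreSQL 9.4 on, REFRESH MATERIALIZED VIEW CONCURRENTLY avoids that, provided the materialized view has a unique index.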

What operations does pg_dump block?

I am trying to back up a live Postgres database using pg_dump; however, whenever I attempt to do so, it causes things to blow up.
I have many live queries aggressively reading from a materialized view (it's a cache), which is aggressively refreshed (every minute or more).
I am starting to believe that pg_dump blocks the REFRESH MATERIALIZED VIEW from occurring, which blocks the reads from the materialized view, which causes things to explode.
Is this line of reasoning correct? What other operations does pg_dump block, and how should I do the backup?
From the documentation (emphasis mine):
pg_dump does not block other users accessing the database (readers or writers).
Your problem lies elsewhere.
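One way to check what is actually waiting on what while the dump runs is to look at the locks on the materialized view. The name my_cache_mv is a hypothetical placeholder. For context, pg_dump takes ACCESS SHARE locks on the relations it dumps, and a REFRESH MATERIALIZED VIEW without CONCURRENTLY requests an ACCESS EXCLUSIVE lock, so the output shows whether a refresh is queued behind the dump:

```sql
-- Sessions holding or waiting for a lock on the materialized view.
SELECT a.pid,
       a.application_name,
       l.mode,
       l.granted,
       a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'my_cache_mv'::regclass
ORDER BY l.granted DESC, a.pid;
```

Rows with granted = false are the waiters; the granted rows above them are the candidates for what they are waiting on.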