We have a view that aggregates from a backing table. The idea is to reduce CPU load by using a pre-aggregated table instead, and to refresh it periodically as follows:
CREATE TABLE new_backingtable AS SELECT ...;  -- create and fill it
BEGIN;
DROP TABLE backingtable;
ALTER TABLE new_backingtable RENAME TO backingtable;
COMMIT;
This would run while the system is in production. The latency caused by the refresh interval is acceptable. Incremental updates are possible but not desirable.
Does anyone have comments on this scheme?
Check out materialized views; they may suit your use case. A materialized view stores the results of its query when it is created, and it can be refreshed later.
A materialized view is physically stored on disk, like a table, but its contents are defined by a query over other database tables. By contrast, in PostgreSQL, as in many database systems, retrieving data from a traditional view actually executes the underlying query or queries that define that view.
https://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
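Here is a minimal sketch of how this could apply to the scheme in the question. The events table and the aggregation are hypothetical stand-ins for the real schema. The materialized view account_totals takes the place of backingtable, and a single REFRESH replaces the create/drop/rename cycle.

```sql
-- Hypothetical source table standing in for the real data.
DROP TABLE IF EXISTS events CASCADE;  -- also drops account_totals on rerun
CREATE TABLE events (
    id         bigint  PRIMARY KEY,
    account_id int     NOT NULL,
    amount     numeric NOT NULL
);
INSERT INTO events VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7);

-- The pre-aggregated data, stored on disk like a table.
CREATE MATERIALIZED VIEW account_totals AS
    SELECT account_id, count(*) AS n_events, sum(amount) AS total
    FROM events
    GROUP BY account_id;

-- New data arrives; the materialized view is NOT updated automatically.
INSERT INTO events VALUES (4, 2, 3);

-- Periodic refresh (e.g. from a scheduler): reruns the defining query
-- and replaces the stored contents.
REFRESH MATERIALIZED VIEW account_totals;
```

There is one difference from the question's swap. In the swap, only the short DROP/RENAME transaction blocks readers. A plain REFRESH holds an ACCESS EXCLUSIVE lock on the view for as long as the query runs.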
I would like to do something like this:
results = select * from quick_table
if no results:
insert into quick_table select slow_query
results = select * from quick_table
return results
This is a pretty standard caching pattern. Is there a way to do this in Postgres that is more clever than literally writing a function that does what I listed above?
The PostgreSQL feature that comes closest to what you want to do is a materialized view.
This creates a copy on disk of the results of your view, which you can then query as if it were a table. You can also add indexes to it in the usual way.
A caveat is that when you generate a materialized view, its data does not update automatically when the source tables’ data change. To reflect changes, you must issue a REFRESH MATERIALIZED VIEW command.
Typical approaches to refreshing are:
Run the refresh as a background task (e.g., in a cron job)
Add triggers to the source tables such that changing data in them causes the view to refresh.
Each approach has advantages and disadvantages, so the route you take will depend on your circumstances. It may also be useful to add a unique index to your MV, because that allows you to run concurrent refreshes (REFRESH MATERIALIZED VIEW CONCURRENTLY). Without one, a refresh takes an ACCESS EXCLUSIVE lock on the view, so the view won't be readable until the refresh has finished.
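Here is a minimal sketch of the unique-index-plus-concurrent-refresh setup. The orders table and order_totals view are hypothetical. A background job (for example cron running psql -c with the last statement) would repeat only the final REFRESH.

```sql
-- Hypothetical source table for the example.
DROP TABLE IF EXISTS orders CASCADE;  -- also drops order_totals on rerun
CREATE TABLE orders (
    id          serial  PRIMARY KEY,
    customer_id int     NOT NULL,
    amount      numeric NOT NULL
);
INSERT INTO orders (customer_id, amount) VALUES (1, 10), (1, 5), (2, 7);

CREATE MATERIALIZED VIEW order_totals AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;

-- CONCURRENTLY requires a unique index on plain columns
-- (no expressions, no WHERE clause) that covers every row.
CREATE UNIQUE INDEX order_totals_customer_id_idx ON order_totals (customer_id);

INSERT INTO orders (customer_id, amount) VALUES (2, 3), (3, 4);

-- Readers can keep querying order_totals while this runs.
-- It only works once the view has been populated.
REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;
```

The concurrent form still reruns the whole defining query. It trades refresh speed for availability rather than making the refresh cheaper.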
Main question: I have several views that depend on a PostgreSQL/PostGIS table, and a final materialized view created by querying the other views. I need a fast and updatable final result (i.e. an MV) to use in a QGIS project.
My aim is to update the source table by overwriting it with a large number of new values, and have the views and the materialized view reflect the update. I use QGIS DB Manager to overwrite the existing table, but I get an error because the MV depends on it. If I drop the MV, overwrite the table, and then recreate the MV, everything works, but I'd like to avoid manual steps as much as possible.
Is there a better way to reach my goal?
Another question: if I set a trigger to refresh an MV when I insert, update, or delete rows in a table, would it still work when the entire table is overwritten with a new one?
Refreshing a materialized view runs the complete defining query, so it is a long-running, heavy operation for a complicated query.
It is possible to launch REFRESH MATERIALIZED VIEW from a trigger (it had better be a FOR EACH STATEMENT trigger then), but that would make every data modification so slow that I don't think that is practically feasible.
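Here is a sketch of such a statement-level trigger. The parcels table and parcel_summary view are hypothetical, and PostgreSQL 11+ is assumed for EXECUTE FUNCTION (older versions use EXECUTE PROCEDURE). It also bears on the second question. Including TRUNCATE means a truncate-and-reload refreshes the view. If a tool instead overwrites the table by dropping and recreating it, the trigger is dropped along with the table, and the drop is blocked anyway while the materialized view depends on it.

```sql
-- Hypothetical table and materialized view for the example.
DROP TABLE IF EXISTS parcels CASCADE;  -- also drops the dependent MV and trigger
CREATE TABLE parcels (
    id   int     PRIMARY KEY,
    zone text    NOT NULL,
    area numeric NOT NULL
);
CREATE MATERIALIZED VIEW parcel_summary AS
    SELECT zone, count(*) AS n, sum(area) AS total_area
    FROM parcels
    GROUP BY zone;

-- Trigger function: rerun the whole defining query once per statement.
CREATE OR REPLACE FUNCTION refresh_parcel_summary() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    REFRESH MATERIALIZED VIEW parcel_summary;
    RETURN NULL;  -- ignored for AFTER ... FOR EACH STATEMENT triggers
END;
$$;

-- TRUNCATE is included so that a truncate-and-reload also refreshes the view.
CREATE TRIGGER parcels_refresh_summary
    AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON parcels
    FOR EACH STATEMENT
    EXECUTE FUNCTION refresh_parcel_summary();

-- One multi-row INSERT causes one refresh, not one per row.
INSERT INTO parcels VALUES (1, 'A', 100), (2, 'A', 50), (3, 'B', 20);
```

Every writing statement now pays for a full refresh, which is exactly the cost the answer warns about.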
One thing that might work is to implement something like a materialized view that refreshes immediately “by hand”:
create a regular table for the “materialized view” and fill it with data by running the query
on each of the underlying tables, define a row level trigger that modifies the materialized view in accordance with the changes that triggered it
This should work for views where the definition is simple enough, for complicated queries it will not be possible.
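Here is a minimal sketch of the two steps for a simple count/sum aggregate. The sales and sales_by_region tables are hypothetical, and PostgreSQL 11+ (EXECUTE FUNCTION) and 9.5+ (ON CONFLICT) are assumed. It is not hardened for every concurrent-write edge case.

```sql
-- Hypothetical source table and the hand-maintained "materialized view".
DROP TABLE IF EXISTS sales CASCADE;
DROP TABLE IF EXISTS sales_by_region;
CREATE TABLE sales (
    id     int     PRIMARY KEY,
    region text    NOT NULL,
    amount numeric NOT NULL
);
CREATE TABLE sales_by_region (
    region text    PRIMARY KEY,
    n      bigint  NOT NULL,
    total  numeric NOT NULL
);

-- Step 1: fill it by running the query once.
INSERT INTO sales_by_region (region, n, total)
SELECT region, count(*), sum(amount) FROM sales GROUP BY region;

-- Step 2: a row-level trigger applies each change as a delta.
CREATE OR REPLACE FUNCTION sales_by_region_maintain() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        -- Subtract the old row's contribution; drop groups that become empty.
        UPDATE sales_by_region
           SET n = n - 1, total = total - OLD.amount
         WHERE region = OLD.region;
        DELETE FROM sales_by_region WHERE region = OLD.region AND n = 0;
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        -- Add the new row's contribution, creating the group if needed.
        INSERT INTO sales_by_region (region, n, total)
        VALUES (NEW.region, 1, NEW.amount)
        ON CONFLICT (region) DO UPDATE
           SET n = sales_by_region.n + 1,
               total = sales_by_region.total + EXCLUDED.total;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$;

CREATE TRIGGER sales_by_region_maintain
    AFTER INSERT OR UPDATE OR DELETE ON sales
    FOR EACH ROW
    EXECUTE FUNCTION sales_by_region_maintain();

INSERT INTO sales VALUES (1, 'east', 10), (2, 'east', 5), (3, 'west', 7);
UPDATE sales SET region = 'west' WHERE id = 2;
DELETE FROM sales WHERE id = 1;
```

Aggregates like count and sum can be maintained from deltas like this. Others, such as min, max, or DISTINCT, may need a recomputation of the affected group, which is where "too complicated" starts.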
I am exploring materialized views to create a denormalized view, so that reads don't have to join multiple tables. APIs will read from the materialized views to serve data to clients.
I am using Amazon Aurora PostgreSQL (version 11).
I am using a unique index on the materialized view (MV) so that I can use the “refresh concurrently” option.
What I am noticing, though, is that when only a fraction of the rows in one of the source tables are updated and I refresh the view, the refresh is quite slow; in fact, it is slower than populating the view for the first time. For example, populating the MV the first time takes ~30 minutes, while the refresh takes more than an hour, even though less than 1% of the rows have been updated. The three main tables involved in generating the MV have ~18 million, ~27 million, and ~40 million rows.
The timeliness of the materialized view refresh is important so that data is not stale for too long.
I could use custom tables to store the denormalized data instead of materialized views, but then I would have to implement the refresh logic myself, so I'd like to avoid that if possible.
Is there anything that can be done to speed up the refresh process of the materialized views?
Please let me know if you need more details.
thanks
Kiran
You can create and populate a second materialised view (not concurrently), and then swap the names in a transaction.
I actually have no idea why Postgres didn't implement CONCURRENTLY this way.
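Here is a minimal sketch of the build-then-swap idea. The names are hypothetical (measurements, sensor_stats, and the _new suffix). It assumes nothing else depends on sensor_stats, because dependent views would make the DROP fail. Queries already running against the old view make the DROP wait, but readers are otherwise only blocked during the short swap transaction.

```sql
-- Hypothetical source table and the materialized view that readers use.
DROP TABLE IF EXISTS measurements CASCADE;  -- also drops both MVs on rerun
CREATE TABLE measurements (
    sensor_id int     NOT NULL,
    reading   numeric NOT NULL
);
INSERT INTO measurements VALUES (1, 10), (1, 20), (2, 5);

CREATE MATERIALIZED VIEW sensor_stats AS
    SELECT sensor_id, avg(reading) AS avg_reading
    FROM measurements
    GROUP BY sensor_id;
CREATE UNIQUE INDEX sensor_stats_sensor_id_idx ON sensor_stats (sensor_id);

INSERT INTO measurements VALUES (2, 15);

-- 1. Build the replacement; sensor_stats stays readable meanwhile.
CREATE MATERIALIZED VIEW sensor_stats_new AS
    SELECT sensor_id, avg(reading) AS avg_reading
    FROM measurements
    GROUP BY sensor_id;
CREATE UNIQUE INDEX sensor_stats_new_sensor_id_idx ON sensor_stats_new (sensor_id);

-- 2. Swap the names in one short transaction. The index is renamed too,
--    so the next cycle can reuse the *_new names.
BEGIN;
DROP MATERIALIZED VIEW sensor_stats;  -- also drops its old index
ALTER MATERIALIZED VIEW sensor_stats_new RENAME TO sensor_stats;
ALTER INDEX sensor_stats_new_sensor_id_idx RENAME TO sensor_stats_sensor_id_idx;
COMMIT;
```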
Refreshing a materialized view is slow even if little has changed, because every time the view is refreshed, the defining query is run.
Using CONCURRENTLY makes the operation even slower. It does not replace the materialized view contents wholesale; instead, it computes the new result, compares it with the existing data, and applies the differences as individual row changes.
Perhaps you could create a denormalized table that is updated by a trigger whenever the underlying tables are modified.
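Here is a minimal sketch of such a trigger-maintained denormalized table for a two-table join. The customers, invoices, and invoices_denorm tables are hypothetical, and PostgreSQL 11+ is assumed for EXECUTE FUNCTION. Each write then costs a few extra row changes instead of rerunning the whole join. A three-table join would need a trigger on each table whose columns are copied.

```sql
-- Hypothetical normalized tables.
DROP TABLE IF EXISTS invoices CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS invoices_denorm;
CREATE TABLE customers (id int PRIMARY KEY, name text NOT NULL);
CREATE TABLE invoices (
    id          int     PRIMARY KEY,
    customer_id int     NOT NULL REFERENCES customers,
    amount      numeric NOT NULL
);

-- Denormalized copy of invoices JOIN customers that the API would read.
CREATE TABLE invoices_denorm (
    invoice_id    int     PRIMARY KEY,
    customer_id   int     NOT NULL,
    customer_name text    NOT NULL,
    amount        numeric NOT NULL
);
CREATE INDEX ON invoices_denorm (customer_id);

-- Invoice changes: replace the matching denormalized row.
CREATE OR REPLACE FUNCTION invoices_denorm_sync() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        DELETE FROM invoices_denorm WHERE invoice_id = OLD.id;
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO invoices_denorm (invoice_id, customer_id, customer_name, amount)
        SELECT NEW.id, NEW.customer_id, c.name, NEW.amount
        FROM customers AS c
        WHERE c.id = NEW.customer_id;
    END IF;
    RETURN NULL;
END;
$$;
CREATE TRIGGER invoices_denorm_sync
    AFTER INSERT OR UPDATE OR DELETE ON invoices
    FOR EACH ROW EXECUTE FUNCTION invoices_denorm_sync();

-- Customer renames: propagate the copied column. The foreign key prevents
-- deleting customers that still have invoices.
CREATE OR REPLACE FUNCTION customers_denorm_sync() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    UPDATE invoices_denorm SET customer_name = NEW.name WHERE customer_id = NEW.id;
    RETURN NULL;
END;
$$;
CREATE TRIGGER customers_denorm_sync
    AFTER UPDATE OF name ON customers
    FOR EACH ROW EXECUTE FUNCTION customers_denorm_sync();

INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO invoices VALUES (10, 1, 100), (11, 2, 50);
UPDATE customers SET name = 'Acme Corp' WHERE id = 1;
UPDATE invoices SET customer_id = 1 WHERE id = 11;
```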
Maybe I'm a bit late to the party, but if you are on Postgres 13 or later, you could try this extension: https://github.com/sraoss/pg_ivm
It has some limitations, but it promises much faster refreshes, since it maintains the view incrementally as the base tables change instead of rerunning the whole query.
Here's a bit more on it from pganalyze: https://pganalyze.com/blog/5mins-postgres-15-beta1-incremental-materialized-views-pg-ivm (unlike the project authors, they say you need v14 or later)
If I store my query results as views, do they take up more memory than a table containing the query results?
Another question about views: can I write a new query based on the results of a query that is stored as a view?
Views don't store query results, they store queries.
Some RDBMSs provide a way to store query results (for some queries): these are called materialized views in Oracle and indexed views in SQL Server.
PostgreSQL did not support those when this was written (materialized views arrived in 9.3), though, as @CalvinCheng mentioned, you can emulate them using triggers or rules.
Yes, you can use views in your queries. However, a view is just a convenient way to refer to a complex query by name, not a way to store its results.
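Here is a small illustration using a hypothetical products table. One view is built on top of another, and neither stores any rows. Rows inserted after the views were created appear immediately, because both queries rerun on every SELECT.

```sql
-- Hypothetical base table.
DROP TABLE IF EXISTS products CASCADE;  -- also drops both views on rerun
CREATE TABLE products (
    id       int     PRIMARY KEY,
    category text    NOT NULL,
    price    numeric NOT NULL
);
INSERT INTO products VALUES (1, 'tools', 25), (2, 'tools', 5), (3, 'toys', 40);

-- A view stores only this query, not its result rows.
CREATE VIEW expensive_products AS
    SELECT id, category, price FROM products WHERE price > 20;

-- A second view (or any ad hoc query) can build on the first one.
CREATE VIEW expensive_per_category AS
    SELECT category, count(*) AS n
    FROM expensive_products
    GROUP BY category;

-- Added after the views exist; visible at once through both of them.
INSERT INTO products VALUES (4, 'toys', 30);
```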
For Question 1
To answer your first question: you cannot store your query results as views, but you can achieve similar functionality using PostgreSQL's trigger feature.
Before 9.3, PostgreSQL supported views natively but not materialized views (views that store their results); this can be handled using triggers. See http://wiki.postgresql.org/wiki/Materialized_Views
Views do not take up RAM ("memory").
For Question 2
To answer the second question: to update a view in PostgreSQL, you can use CREATE RULE (since 9.1, INSTEAD OF triggers also work, and since 9.3 simple views are automatically updatable). See http://www.postgresql.org/docs/devel/static/sql-createrule.html
CREATE RULE defines a new rule applying to a specified table or view. CREATE OR REPLACE RULE will either create a new rule, or replace an existing rule of the same name for the same table.
I would like to point out that, as of Postgres 9.3, materialized views are supported.
A PostgreSQL view is a saved query. Once it is created, selecting from the view is exactly the same as running the original query: the query is executed each time. So views do not take up memory.
You cannot store query results in views, since views are just queries, but you can achieve similar functionality with materialized views. However, materialized views are only updated on demand, and the whole materialized view must be refreshed; there is no way to update only a single stale row.
If you need fresher data than that, you have to eagerly update the stored results yourself whenever a change occurs that would invalidate a row. This can be done with triggers.
Greetings Overflowers,
Is there an SQL DBMS that allows me to create an indexed view into which I can insert new rows without modifying the original tables of the view? I will need to query this view after performing the in-view-only inserts. If the answer is no, what other methods can do the job? I simply want to merge a set of rows that comes from another server with the set of rows in the created view (in a specific order), so that I can run fast queries against the merged set, i.e. the indexed view, without having to persist the received set to disk. I am not sure whether an in-memory database would perform well, since the merged sets grow very large.
What do you think guys?
Kind regards
Well, there's no supported way to do that, since the view has to be based on some table(s).
Besides that, indexed views are not meant to be used like that. You shouldn't push data into the indexed view in the hope of making data retrieval faster.
I suggest you keep your view just the way it is, and add a staging table, with the proper indexes, into which you insert the data coming from the external system.
The staging table should be truncated whenever you want to discard its data (i.e., right before inserting new data). That should be done in a SNAPSHOT isolation transaction, so that your existing queries neither read dirty data nor deadlock.
Then you have two options:
Use a UNION ALL to merge the results from the view and the staging table when you retrieve your data.
If the staging table should be inner-joined rather than merged, you may be able to integrate it into the indexed view.
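The answer targets SQL Server, but the staging-table-plus-UNION ALL idea carries over. Below is a minimal sketch in PostgreSQL syntax, the dialect used elsewhere on this page, with hypothetical table and column names. In PostgreSQL, TRUNCATE is transactional and holds its lock until COMMIT, so readers wait for the new batch instead of seeing a half-loaded one. This stands in for the SQL Server SNAPSHOT isolation suggested above.

```sql
-- Hypothetical local data and a staging table for rows from the other server.
DROP TABLE IF EXISTS local_rows CASCADE;    -- also drops merged_rows on rerun
DROP TABLE IF EXISTS staging_rows CASCADE;
CREATE TABLE local_rows   (id int PRIMARY KEY, sort_key int NOT NULL, payload text);
CREATE TABLE staging_rows (id int PRIMARY KEY, sort_key int NOT NULL, payload text);
CREATE INDEX ON local_rows (sort_key);
CREATE INDEX ON staging_rows (sort_key);

-- Merge both sets at query time; the source column tells them apart.
CREATE VIEW merged_rows AS
    SELECT id, sort_key, payload, 'local'::text AS source FROM local_rows
    UNION ALL
    SELECT id, sort_key, payload, 'remote'::text AS source FROM staging_rows;

INSERT INTO local_rows VALUES (1, 10, 'a'), (2, 30, 'b');

-- Replace the received batch atomically.
BEGIN;
TRUNCATE staging_rows;
INSERT INTO staging_rows VALUES (100, 20, 'x'), (101, 40, 'y');
COMMIT;

-- Queries then read the merged set in the desired order, e.g.:
SELECT * FROM merged_rows ORDER BY sort_key;
```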