Changing a table into a continuous view - issue with dependent continuous views - pipelinedb

We're refactoring the structure of our tables and views, and one of the improvements is changing one table (which is updated "manually" from a Java class) into a continuous view.
The name of the view has to be the same as the old table's, and the old data has to be kept, so I think these steps would be logical:
ALTER TABLE old_table RENAME TO old_table_temp
Create the new continuous view
INSERT INTO new_continuous_view SELECT * FROM old_table_temp
DROP TABLE old_table_temp
The problem I have right now is that after renaming the table, all the dependent views still depend on the renamed table, so I cannot drop it. The error looks like this:
analytics=> drop table renamed_table;
ERROR: cannot drop table renamed_table because other objects depend on it
DETAIL: continuous view cview1 depends on table renamed_table
continuous view cview2 depends on table renamed_table
continuous view cview3 depends on table renamed_table
continuous view cview4 depends on table renamed_table
Any idea would be appreciated, even if it's a different approach.

This is not possible. Each continuous view is backed by a materialization table (suffixed with _mrel), which stores the transition states of all aggregates being calculated in the continuous view. At read time these transition states are converted into finalized aggregate values. A simple example would be avg where the materialization table stores the sum and count, and at read time we calculate the average by dividing the two.
These transition states are mostly byte arrays whose internal implementation is not exposed to the user, so the only way to create them is to back-fill data into the stream the continuous view reads from and let our continuous query execution pipeline re-calculate them.
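For example, assuming the continuous view reads from a stream named old_table_stream (all names here are placeholders), the back-fill could look something like:

-- Re-play the historical rows through the stream so the continuous query
-- pipeline recomputes the transition states; names are illustrative only.
INSERT INTO old_table_stream (col1, col2)
SELECT col1, col2 FROM old_table_temp;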
Modifications to materialization tables are disabled by default, but you can enable them by setting the configuration parameter continuous_query_materialization_table_updatable. This can be useful if you want to truncate old data, or to delete the rows for a particular group that you intend to back-fill.
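As a rough sketch, with placeholder names, enabling that and pruning the stored states for one group before back-filling it might look like:

-- May need to be set in postgresql.conf rather than per session.
SET continuous_query_materialization_table_updatable = on;
-- my_cview_mrel is the materialization table backing a continuous view named my_cview.
DELETE FROM my_cview_mrel WHERE group_col = 'group-to-backfill';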
As far as migrating the dependent views is concerned, I think the easiest way would be to re-define them on the continuous view. The whole dependency management is internal to PostgreSQL, and manually tweaking it is not recommended.

Related

Problem with linked materialized view when overwriting existing PostGIS table

Main question: I have several views depending on a PostgreSQL/PostGIS table and a final materialized view created by querying the other views. I need a fast and updatable final result (i.e. MV) to use in a QGIS project.
My aim is to update the starting table by overwriting it with lots of new values and, hopefully, have the views and the materialized view update as well. I use QGIS DB Manager to overwrite the existing table, but I get an error because the MV depends on it. If I delete the MV, overwrite the table and then recreate the MV, everything is fine, but I'd like to avoid manual operations as much as possible.
Is there a better way to reach my goal?
Another question: if I set a trigger to refresh an MV when I update/insert/delete values in a table, would it work even when overwriting the entire table with a new one?
Refreshing a materialized view runs the complete defining query, so that is a long running and heavy operation for a complicated query.
It is possible to launch REFRESH MATERIALIZED VIEW from a trigger (it had better be a FOR EACH STATEMENT trigger then), but that would make every data modification so slow that I don't think that is practically feasible.
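For completeness, such a statement-level trigger would look roughly like this (the view and table names are made up), though as said it is rarely practical:

CREATE FUNCTION refresh_my_matview() RETURNS trigger AS $$
BEGIN
  REFRESH MATERIALIZED VIEW my_matview;  -- re-runs the full defining query
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER refresh_my_matview_trg
AFTER INSERT OR UPDATE OR DELETE ON base_table
FOR EACH STATEMENT EXECUTE PROCEDURE refresh_my_matview();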
One thing that might work is to implement something like a materialized view that refreshes immediately “by hand”:
create a regular table for the “materialized view” and fill it with data by running the query
on each of the underlying tables, define a row level trigger that modifies the materialized view in accordance with the changes that triggered it
This should work for views whose definition is simple enough; for complicated queries it will not be possible.
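A minimal sketch of that idea, for a hypothetical per-customer order count (all names are invented):

-- The "materialized view" is a plain table, filled once from the defining query.
CREATE TABLE order_counts AS
SELECT customer_id, count(*) AS n_orders
FROM orders
GROUP BY customer_id;

-- Row-level trigger that keeps it in sync with inserts on the underlying table.
CREATE FUNCTION order_counts_ins() RETURNS trigger AS $$
BEGIN
  UPDATE order_counts SET n_orders = n_orders + 1
   WHERE customer_id = NEW.customer_id;
  IF NOT FOUND THEN
    INSERT INTO order_counts VALUES (NEW.customer_id, 1);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_counts_ins_trg
AFTER INSERT ON orders
FOR EACH ROW EXECUTE PROCEDURE order_counts_ins();

UPDATE and DELETE on orders need analogous triggers, and a unique index on order_counts(customer_id) helps guard against concurrent inserts creating duplicate groups.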

Using views in PostgreSQL to enable transparent replacement of backing tables

We have a view that aggregates from a backing table. The idea is to reduce CPU load by using a pre-aggregated table, and to periodically refresh it with the following:
create table new_backingtable ... (fill it)
begin;
drop table backingtable;
alter table new_backingtable rename to backingtable;
commit;
while in production. The latency caused by the refresh interval is acceptable. Incremental updates are possible but not desirable.
Does anyone have a comment on this scheme?
Check out materialized views. They may suit your use case: a materialized view stores the query results at creation time and can be refreshed later.
A materialized view is defined like a view, but its result is physically stored on disk as a table. In PostgreSQL, as in many database systems, retrieving data from a traditional view really executes the underlying query or queries that define that view.
https://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
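For the scheme above that might look roughly like this (names are placeholders):

CREATE MATERIALIZED VIEW backing_agg AS
SELECT some_key, count(*) AS cnt, sum(amount) AS total
FROM source_table
GROUP BY some_key;

-- Re-run the defining query whenever the pre-aggregated data should be renewed.
REFRESH MATERIALIZED VIEW backing_agg;

From 9.4 on, REFRESH MATERIALIZED VIEW CONCURRENTLY (which requires a unique index on the materialized view) avoids locking out readers during the refresh.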

Concurrency problems from eager materialization?

Using PostgreSQL 9.6.5, I am moving away from the built-in MATERIALIZED VIEW model because it lacks incremental refresh, and for my purposes (replication through SymmetricDS) I actually need the data in table storage, not in a view. My solution:
Create an "unmaterialized view" VIEW_XXX_UNMAT (really just a select)
Create table VIEW_XXX as select * from VIEW_XXX_UNMAT (a snapshot)
Add primary key to VIEW_XXX
Create a refresh function which deletes from and reinserts into VIEW_XXX based on the primary key
Create INSERT/DELETE/UPDATE triggers for each table involved in VIEW_XXX_UNMAT, which call the refresh function with the appropriate PK
My inspiration comes from this PGCon 2008 talk, and once we get over the hurdle of creating all these triggers it all works nicely. Obviously we limit the UPDATE triggers to the columns that are involved, and only refresh the view if the NEW data is distinct from the OLD data.
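A rough sketch of steps 4 and 5, assuming a single-column key named id (all names are made up):

-- Step 4: delete-and-reinsert refresh based on the primary key.
CREATE FUNCTION refresh_view_xxx(p_id integer) RETURNS void AS $$
BEGIN
  DELETE FROM view_xxx WHERE id = p_id;
  INSERT INTO view_xxx SELECT * FROM view_xxx_unmat WHERE id = p_id;
END;
$$ LANGUAGE plpgsql;

-- Step 5: row-level trigger on each underlying table calling the refresh.
CREATE FUNCTION view_xxx_sync() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    PERFORM refresh_view_xxx(OLD.id);
  ELSE
    PERFORM refresh_view_xxx(NEW.id);  -- an UPDATE that changes id would need both keys
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER view_xxx_sync_trg
AFTER INSERT OR UPDATE OR DELETE ON some_base_table
FOR EACH ROW EXECUTE PROCEDURE view_xxx_sync();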
At this point I'd like to know:
If there are any better solutions out there (extension?) keeping in mind that I need tables in the end
Performance issues (I understand the refresh cost is write-bound, but because the materialized table is indexed, it's pretty fast?)
Concurrency issues (what if two people are updating the same field at the same time, which refreshes the materialized view through triggers, will one of them fail or will MVCC "take care of it"?)

Inserts to indexed views

Greetings Overflowers,
Is there an SQL DBMS that allows me to create an indexed view into which I can insert new rows without modifying the original tables of the view? I will need to query this view after performing the in-view-only inserts. If the answer is no, what other methods can do the job? I simply want to merge a set of rows coming from another server with the set of rows in the created view, in a specific order, so that I can perform fast queries against the merged set (i.e. the indexed view) without having to persist the received set on disk. I'm not sure an in-memory database would perform well as the merged sets grow ridiculously large.
What do you think guys?
Kind regards
Well, there's no supported way to do that, since the view has to be based on some table(s).
Besides that, indexed views are not meant to be used like that. You shouldn't push data into the indexed view hoping to make data retrieval faster.
I suggest you keep your view just the way it is. And then have a staging table, with the proper indexes created on it, in which you insert the data coming from the external system.
The staging table should be truncated anytime you want to get rid of the data (so right before you're inserting new data). That should be done in a SNAPSHOT ISOLATION transaction, so your existing queries don't read dirty data, or deadlock.
Then you have two options:
Use a UNION ALL clause to merge the results from the view and the staging table when you want to retrieve your data (as sketched below).
If the staging table shouldn't be merged but inner joined, then perhaps you can integrate it into the indexed view.
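For the first option, the read query would be along these lines (column and table names are illustrative):

SELECT key_col, value_col FROM my_indexed_view
UNION ALL
SELECT key_col, value_col FROM staging_rows
ORDER BY key_col;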

Most efficient way of bulk loading unnormalized dataset into PostgreSQL?

I have loaded a huge CSV dataset -- Eclipse's Filtered Usage Data -- using PostgreSQL's COPY, and it's taking a huge amount of space because it's not normalized: three of the TEXT columns would be much more efficiently refactored into separate tables, referenced from the main table via foreign key columns.
My question is: is it faster to refactor the database after loading all the data, or to create the intended tables with all the constraints, and then load the data? The former involves repeatedly scanning a huge table (close to 10^9 rows), while the latter would involve doing multiple queries per CSV row (e.g. has this action type been seen before? If not, add it to the actions table, get its ID, create a row in the main table with the correct action ID, etc.).
Right now each refactoring step is taking roughly a day or so, and the initial loading also takes about the same time.
In my experience you want to get all the data you care about into a staging table in the database and go from there; after that, do as much set-based logic as you can, most likely via stored procedures. When you load into the staging table, don't have any indexes on it. Create the indexes after the data is loaded.
Check out this link for some tips: http://www.postgresql.org/docs/9.0/interactive/populate.html
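To illustrate the set-based approach (the column and table names below are invented for the example), extracting one of the TEXT columns into its own table could look like:

-- Build the lookup table from the distinct values already sitting in the staging table.
CREATE TABLE actions (
    id   serial PRIMARY KEY,
    name text NOT NULL UNIQUE
);
INSERT INTO actions (name)
SELECT DISTINCT action FROM staging;

-- Populate the normalized main table in one pass, resolving the foreign key by join.
INSERT INTO events (event_time, action_id)
SELECT s.event_time, a.id
FROM staging s
JOIN actions a ON a.name = s.action;

Indexes on the final table are then created after this load, as the page linked above recommends.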