Trigger for two databases - postgresql

I have two databases, db1 and db2,
each has a able. The two databases have the same data in each table.
For the databases remain the same, I need a trigger that causes, when inserting a new data table in db1, this same data is inserted into db2 table simultaneously, thus keeping the two databases always equal.
There tables are equal.

Google Slony, that's it, now you have triggers and replicas.
The other way to do it, would be the streaming replication.

Related

Creating denormalized tables with triggers too slow

Assume I'm doing everything in one postgresql database. I have 10 source tables I'm using to create one huge denormalized table. These source tables change frequently and have triggers firing after insert/update/delete to modify denormalized table in near-real-time. The problem is, some of these source tables I'm joining are huge (one table has 120M and other 25M rows) and statements for inserting new rows into denormalized table execute for a long time (20+ minutes for 50-100k rows).
So, I was thinking on what would be the best solution for updating(IUD)changes on this denormalized table, based on changes coming to source tables? Should I run these operations on a schedule, should I dedicate a specific database replica just for this, or should I continue trying to use triggers?
I'm open to using a totally different approach, as long as it's doable on the same database.
That sounds like there is no good and simple solution.
Perhaps you don't need that one huge denormalized table, and denormalizing a few attributes would be good enough for your query speed.
If not, you will probably need a kind of data warehouse for the denormalized data, and refresh that daily with increments. Ideally, tables there are already pre-aggregated.

Postgresql can a logical replica have DB objects of its own?

I have 2 postgresql 12 DB servers, say A and B. A is the main DB.
B consists of some Foreign tables pointing to A and some materialized views built with those foreign table joins. The materialized views refresh nightly and with increasing data, refresh over FDWs are taking awfully long as SQLs over FDWs can’t parallelized.
I wanted to know if -
a logical replica ( which gives me the ability to have only few tables replicated) can have some of its own objects ( mat views in my case, so that the refresh does not have to pull and join tables over FDW)
For those familiar with Oracle’s Golden Gate, is there anything similar for postgres? i.e log based not trigger based? open source would be better!
Thanks
It sounds like logical replication would indeed be the solution for you:
You can replicate individual tables with it, which you should not modify on B, but otherwise B is a normal database where you can have other tables.
Logical replication works by parsing the transaction log, just like you want. So all data modifications are replicated incrementally.
The replicated tables on B will be duplicates of the tables on A, so they are physically present on B (with foreign tables, there are no data on B, and accessing the foreign tables will actually access data on A). So there is no immediate need for materialized views.
Note that there are some limitations to logical replication. Most notable, ALTER TABLE and other DDL statements are not replicated.

How does AWS postgres RDS read replication handle schema switching?

I am wanting to know how an AWS postgres RDS does replication where I rename schemas to "swap" them within the read/write instance of the database.
Does it replicate this action to the read-replicas by sending on the "alter schema" rename commands I gave to my read/write instance? Or after my renames, does it see wholly different sets of data in the schemas and do a whole new copy of each out to the read-replicas?
For example...
In my RDS instance I have a read/write instance of "my_mega_database" which I want to create read-replicas of for my applications to connect to.
Typically, in "my_mega_database" there are two schemas "my_data" and "my_data_old", whereby "my_data" contains data that was delivered last night, and "my_data_old" contains data from the previous night. Each contains many tables and huge amounts of data.
If I were to do the following...
ALTER SCHEMA my_data_old RENAME TO my_data_tmp;
ALTER SCHEMA my_data RENAME TO my_data_old;
ALTER SCHEMA my_data_tmp RENAME TO my_data;
... I have affectively swapped these around.
My expectation is that these actions are replicated via the postgres WAL (ie: it sends the rename commands out to the replicas) and AWS RDS replication won't try and waste time copying huge amounts of data all over the place.
Is this correct?
(Speaking about PostgreSQL here, but RDS is probably similar.)
Renaming a schema (or any other object) is a small update in a catalog table, and no data are moved. Internally PostgreSQL uses only the numeric object ID, which stays the same.
You might wrap the three statements in a transaction to make the whole magic atomic.
The same is true on the standby, it is a trivial (meta)data modification.
The only thing that might be a problem are concurrent sessions holding locks.

How does pglogical-2 handle logical replication on same table while allowing it to be writeable on both databases?

Based on the above image, there are certain tables I want to be in the Internal Database (right hand side). The other tables I want to be replicated in the external database.
In reality there's only one set of values that SHOULD NOT be replicated across. The rest of the database can be replicated. Basically the actual price columns in the prices table cannot be replicated across. It should stay within the internal database.
Because the vendors are external to the network, they have no access to the internal app.
My plan is to create a replicated version of the same app and allow vendors to submit quotations and picking items.
Let's say the replicated tables are at least quotations and quotation_line_items. These tables should be writeable (in terms of data for INSERTs, UPDATEs, and DELETEs) at both the external database and the internal database. Hence at both databases, the data in the quotations and quotation_line_items table are writeable and should be replicated across in both directions.
The data in the other tables are going to be replicated in a single direction (from internal to external) except for the actual raw prices columns in the prices table.
The quotation_line_items table will have a price_id column. However, the raw price values in the prices table should not appear in the external database.
Ultimately, I want the data to be consistent for the replicated tables on both databases. I am okay with synchronous replication, so a bit of delay (say, a couple of second for the write operations) is fine.
I came across pglogical https://github.com/2ndQuadrant/pglogical/tree/REL2_x_STABLE
and they have the concept of PUBLISHER and SUBSCRIBER.
I cannot tell based on the readme which one would be acting as publisher and subscriber and how to configure it for my situation.
That won't work. With the setup you are dreaming of, you will necessarily end up with replication conflicts.
How do you want to prevent that data are modified in a conflicting fashion in the two databases? If you say that that won't happen, think again.
I believe that you would be much better off using a single database with two users: one that can access the “secret” table and one that cannot.
If you want to restrict access only to certain columns, use a view. Simple views are updateable in PostgreSQL.
It is possible with BDR replication which uses pglogical. On a basic level by allocating ranges of key ids to each node so writes are possible in both locations without conflict. However BDR is now a commercial paid for product.

Postgresql archiving old data

I need some expert advice on Postgres
I have few tables in my database that can grow huge, may be a hundred million records and have to implement some sort of data archiving in place. Say I have a subscriber table and subscriber_logs table. The subscriber_logs table will grow huge with time, affecting performance. I wanted to create a separate table called archive_subscriber_logs and create a scheduled task which will read from subscriber_logs and insert the data into archive_subscriber_logs, then delete the dumped data from subscriber_logs.
But my concern is, should I create the archive_subscriber_logs in the same database or in a different database. The problem with storing in a different db is the foreign key constraints that already exists on the main tables.
Anyone can suggest whether same db or different db is preferable? Or any other solutions?
Consider table partitioning, which is implemented in Postgres using table inheritance. This will improve performance on very large tables. Of course you would do measurements first to make sure it is worth implementing. The details are in the excellent Postgres documentation.
Using separate databases is not recommended because you won't be able to have foreign key constraints easily.