Stop BDR from replicating DROP TABLE or CREATE TABLE - postgresql

I have two databases with tables that I want to sync. I don't want to sync any other table. I'm using Postgres-BDR to do that.
Those tables are part of the replication set common. Under some circumstances other tables share a name across nodes (but are NOT in common), and a node will run DROP TABLE and then CREATE TABLE on them. Even though those tables aren't part of the common replication set, these commands are still replicated to the other nodes, so the other node drops its table, losing all of its data, and then creates an empty one.
How can I stop this? I only want commands that affect common to be replicated to the other nodes.

Never mind, I found it: bdr.skip_ddl_replication.
I just put bdr.skip_ddl_replication = on in postgresql.conf, restarted the server, and BOOM! Works like a charm.
EDIT
It would be prudent of me to point out that the documentation warns that this option could break database replication if used improperly. But since I'll be VERY tightly controlling the table schema, it shouldn't cause any problems.
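For reference, this is all it takes (SHOW is just the standard way to verify the running value after the restart):
# postgresql.conf
bdr.skip_ddl_replication = on

-- in psql, after restarting:
SHOW bdr.skip_ddl_replication;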

Related

Different select results when using multimaster via pglogical in PostgreSQL

There are two PostgreSQL 9.6 nodes subscribed to each other via pglogical. If node A inserts a row into the replicated table then node B sees it and vice versa.
However, when I update a row on one node, subsequent SELECT queries on both nodes keep returning different results: sometimes the current row version, sometimes one of the previous ones.
Moreover, there are log entries about replication conflicts in the logs of both nodes.
Why does that happen and how do I fix that?
upd: setting pglogical.conflict_resolution to last_update_wins helps. Other conflict resolution options might be worth considering too.
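For anyone else hitting this, a minimal sketch of that setting, assuming it goes into postgresql.conf on both nodes (last_update_wins relies on commit timestamps, so track_commit_timestamp must be enabled as well):
# postgresql.conf on both nodes
track_commit_timestamp = on                          # needs a server restart
pglogical.conflict_resolution = 'last_update_wins'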
Multi-master replication is difficult.
There are conflicts that are bound to occur unless your application is aware of and specifically tailored to multi-master replication:
Rows inserted on different nodes with the same (automatically generated) primary key must conflict.
If you modify the primary key of a row on one node while updating or deleting it on another, the databases will “drift apart”, leading to future conflicts.
You will have to fix your application so that it avoids problems like the above, and you will have to manually find and resolve all the conflicts that occurred so far.
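As a sketch of the first case, assume a person table whose generated primary key comes from a local sequence on each node:
-- node one:
INSERT INTO person (name) VALUES ('Alice');  -- the local sequence hands out id 42
-- at the same time on node two:
INSERT INTO person (name) VALUES ('Bob');    -- its local sequence also hands out id 42
-- each row then replicates to the other node and collides with the locally
-- inserted row, causing a primary key conflict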
Here is an example of the second case:
-- node one:
UPDATE person
SET id = 1234
WHERE id = 6543;
-- at the same time on node two
DELETE FROM person
WHERE id = 6543;
Both statements will be replicated to the other node, but do nothing there, because neither node has a person with id 6543 any more. There will be no replication conflict right away, but node one now has a person that node two doesn't have. It is easy to see how this can lead to replication conflicts later (imagine you insert a row on node one that has a foreign key relationship to person 1234).
This is why it is in most cases a good idea to consider an architecture that does not include multi-master replication.

Insert data into remote DB tables from multiple databases through trigger or replication or foreign data wrapper

I need some advice about the following scenario.
I have multiple embedded systems supporting PostgreSQL database running at different places and we have a server running on CentOS at our premises.
Each system runs at a remote location and has multiple tables in its database. These tables have the same names as the server's tables, but each system's table names differ from those of the other systems, e.g.:
system 1 has tables:
sys1_table1
sys1_table2
system 2 has tables
sys2_table1
sys2_table2
I want to update the tables sys1_table1, sys1_table2, sys2_table1 and sys2_table2 on the server on every insert done on system 1 and system 2.
One solution is to write a trigger on each table, which runs on every insert on both systems' tables and inserts the same data into the server's tables. This trigger would also delete the records on the systems after inserting the data into the server. The problem with this solution is that if the connection to the server cannot be established due to a network issue, the trigger will not execute or the insert will be lost. I have checked the following solution for this:
Trigger to insert rows in remote database after deletion
The second solution is to replicate the tables from system 1 and system 2 to the server's tables. The problem with replication is that if we delete data on the systems, the records on the server are deleted as well. I could add a trigger on the server's tables that copies each row into a duplicate table, so the replicated tables could be emptied without affecting the data, but that would result in a very long list of tables if we have more than 200 systems.
The third solution is to write a foreign table using postgres_fdw or dblink and update the data inside the server's tables, but won't this affect the data on the server when we delete the data in the system's tables? And what will happen if there is no connectivity to the server?
The fourth solution is to write an application in Python on each system that connects to the server's database and writes the data in real time; if there is no connectivity to the server, it stores the data in sys1_table1, sys2_table2, or whichever table the data belongs to, and after reconnecting, the code sends that data to the server's tables.
Which option is best for this scenario? I like the trigger solution best, but is there any way to avoid data loss in case of a lost connection to the server?
I'd go with the fourth solution, or perhaps with the third, as long as it is triggered from outside the database. That way you can easily survive connection loss.
The first solution with triggers has the problems you already detected. It is also a bad idea to start potentially long operations, like data replication across a network of uncertain quality, inside a database transaction. Long transactions mean long locks and inefficient autovacuum.
The second solution may actually also be an option if you have a recent PostgreSQL version that supports logical replication. You can use a publication WITH (publish = 'insert,update'), so that DELETE and TRUNCATE are not replicated. Replication can deal well with lost connectivity (for a while), but it is not an option if you want the data at the source to be deleted after it has been replicated.
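A minimal sketch of that setup with native logical replication (PostgreSQL 10 or later), using the table names from the question; the connection string is only a placeholder and the tables must already exist on the server:
-- on the embedded system (publisher), e.g. system 1:
CREATE PUBLICATION sys1_pub
    FOR TABLE sys1_table1, sys1_table2
    WITH (publish = 'insert,update');

-- on the central server (subscriber):
CREATE SUBSCRIPTION sys1_sub
    CONNECTION 'host=... port=5432 dbname=... user=...'
    PUBLICATION sys1_pub;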

How does pglogical-2 handle logical replication on same table while allowing it to be writeable on both databases?

Based on the diagram above, there are certain tables I want to keep only in the internal database (right-hand side). The other tables I want replicated to the external database.
In reality there is only one set of values that SHOULD NOT be replicated across; the rest of the database can be replicated. Basically, the actual price columns in the prices table cannot be replicated across and should stay within the internal database.
Because the vendors are external to the network, they have no access to the internal app.
My plan is to create a replicated version of the same app and allow vendors to submit quotations and picking items.
Let's say the replicated tables are at least quotations and quotation_line_items. These tables should be writeable (in terms of data for INSERTs, UPDATEs, and DELETEs) at both the external database and the internal database. Hence at both databases, the data in the quotations and quotation_line_items table are writeable and should be replicated across in both directions.
The data in the other tables are going to be replicated in a single direction (from internal to external) except for the actual raw prices columns in the prices table.
The quotation_line_items table will have a price_id column. However, the raw price values in the prices table should not appear in the external database.
Ultimately, I want the data in the replicated tables to be consistent on both databases. I am okay with synchronous replication, so a bit of delay (say, a couple of seconds for the write operations) is fine.
I came across pglogical https://github.com/2ndQuadrant/pglogical/tree/REL2_x_STABLE
and they have the concept of PUBLISHER and SUBSCRIBER.
I cannot tell based on the readme which one would be acting as publisher and subscriber and how to configure it for my situation.
That won't work. With the setup you are dreaming of, you will necessarily end up with replication conflicts.
How do you want to prevent data from being modified in conflicting ways in the two databases? If you say that won't happen, think again.
I believe that you would be much better off using a single database with two users: one that can access the “secret” table and one that cannot.
If you want to restrict access only to certain columns, use a view. Simple views are updateable in PostgreSQL.
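A minimal sketch of that idea, with hypothetical column and role names (a view that simply selects a subset of columns from one table is automatically updatable in PostgreSQL 9.3 and later):
-- hypothetical roles
CREATE ROLE internal_app LOGIN;
CREATE ROLE vendor_app LOGIN;

CREATE TABLE prices (
    price_id  bigint PRIMARY KEY,
    item_id   bigint NOT NULL,
    price     numeric NOT NULL     -- the column vendors must not see
);

-- vendors only ever see the view, which hides the raw price
CREATE VIEW prices_public AS
    SELECT price_id, item_id FROM prices;

GRANT SELECT ON prices_public TO vendor_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON prices TO internal_app;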
It is possible with BDR replication, which builds on pglogical: on a basic level, it allocates ranges of key ids to each node so that writes are possible in both locations without conflict. However, BDR is now a commercial, paid-for product.
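The key-range idea can also be approximated by hand with interleaved sequences, so the two nodes never generate the same id (a sketch using the quotations table from the question; this is a do-it-yourself approximation, not BDR's own mechanism):
-- on the internal node (odd ids):
CREATE SEQUENCE quotations_id_seq START WITH 1 INCREMENT BY 2;
-- on the external node (even ids):
CREATE SEQUENCE quotations_id_seq START WITH 2 INCREMENT BY 2;
-- both nodes then use nextval('quotations_id_seq') as the default for quotations.id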

Is it possible to add tables to bucardo automatically?

I want all tables to be replicated by bucardo (at least for a given database), but it looks like I have to add them all manually:
bucardo_ctl add all tables
Can I have it so that every table in a database is replicated, or added to bucardo automatically?
If not, is there another replication strategy in Postgresql that might be better fit for me? I'm hoping to have all nodes available for reads/writes, to avoid administering any routing process to route the writes to the master. If the routing of writes can be done natively in Postgresql, then that could be a solution as well.
Use the command below to add them all:
bucardo add all tables

Slony "duplicate key value violates unique constraint" error

I have a problem that has been going on for a long time. I use Slony to replicate a database from the master to a slave and from that slave to three other backup servers. Once every 2-3 weeks there is a duplicate key problem that happens only on one specific table (big, but not the biggest in the database).
It started to occur about a year ago on Postgres 8.4 and Slony 1, and we switched to 2.0.1. Later we upgraded to 2.0.4, then successfully upgraded Slony to 2.1.3, which is our current version. We started fresh replication on the same computers and it was all going well until today: we got the same duplicate key error on the same table (with different keys every time, of course).
The way to clean it up is just to delete the invalid key on the slaves (it spreads across all nodes) and everything works again. The data is not corrupted, but the source of the problem remains unsolved.
Searching the web I found nothing related to this problem (we did not use TRUNCATE on any table, and we did not change the structure of the table).
Any ideas what can be done about it?
When this problem occurred in our setup, it turned out that the schema of the master database was older than the slaves' and didn't have the UNIQUE constraint on this particular column. So my advice would be:
make sure the master table does in fact have the constraint
if not:
clean up the table
add the constraint
else:
revoke write privileges on the replicated tables from all clients except Slony (a sketch of the relevant statements follows below).
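A sketch of those steps with hypothetical table, column, and role names:
-- add the missing constraint on the master (after cleaning out any duplicates):
ALTER TABLE the_table
    ADD CONSTRAINT the_table_some_col_key UNIQUE (some_col);

-- make sure ordinary clients cannot write to the replicated table on the slaves:
REVOKE INSERT, UPDATE, DELETE, TRUNCATE ON the_table FROM app_user;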
As Craig has said, usually this is caused by a write transaction on a replica, so the first thing to do is to verify permissions. If it keeps happening, you can start logging connections on the replicas and keep the logs around, so that when the issue occurs you can track down where the bad tuple came from. This can generate a LOT of logs, however, so you probably want to see to what extent you can narrow things down first. You presumably know which replica this starts on, so you can go from there.
A particular area of concern would be what happens if you have a user-defined function that writes. A casual observer might not spot that in the query, nor might a connection pooler.
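A sketch of the kind of logging that suggests, using standard settings in postgresql.conf on the affected replica (note that log_statement = 'mod' only captures top-level data-modifying statements, so writes hidden inside a function body will not show up there; log_connections at least tells you who was connected when the bad row appeared):
# postgresql.conf on the replica where the bad rows first appear
log_connections = on
log_disconnections = on
log_line_prefix = '%m [%p] user=%u db=%d app=%a host=%r '
log_statement = 'mod'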