Help me choose a simple (lightweight) solution for master-slave replication of one table between two PostgreSQL databases. The table contains a large object.
Here you'll find a very good overview of the replication tools for PostgreSQL. Please take a look, and hopefully you'll be able to pick one.
Otherwise, if you need something really lightweight, you can do it yourself. You'll need a trigger and a couple of functions, plus the dblink module if you need near-immediate change propagation; otherwise you can get by with cron.
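A do-it-yourself push could look roughly like the following. This is only a minimal sketch: the table, column, and connection-string names are hypothetical, it assumes the payload is stored inline (e.g. as bytea) rather than as a true large object in pg_largeobject (which would need extra handling), and ON CONFLICT requires PostgreSQL 9.5 or later.

-- Hypothetical names throughout; push each insert/update to the slave via dblink
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION push_row_to_slave() RETURNS trigger AS $$
BEGIN
  PERFORM dblink_exec(
    'host=slavehost dbname=mydb user=repl password=secret',
    format('INSERT INTO my_table (id, payload) VALUES (%L, %L)
            ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload',
           NEW.id, NEW.payload));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER push_row_trg
AFTER INSERT OR UPDATE ON my_table
FOR EACH ROW EXECUTE PROCEDURE push_row_to_slave();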
I have a request for a read-only schema replica for a role in PostgreSQL. After reading the documentation and getting a better understanding of replication in PostgreSQL, I'm trying to determine whether I can create the publisher and subscriber within the same database.
Any thoughts on the best approach without having a second server would be greatly appreciated.
You asked two different questions. Same database? No. Since pub/sub requires tables to have the same name (including schema) on both ends, you would be trying to replicate a table onto itself. Using logical replication plugins other than the built-in one might get around this restriction.
Same server? Yes. You can replicate between two databases of the same instance (but see the note in the docs about some extra hoops you need to jump through) or between two instances on the same host. So whichever of those things you meant by "same server", yes, you can.
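For the same-instance case, the extra hoop in the docs is that CREATE SUBSCRIPTION cannot create its replication slot when the connection points back to the same cluster, so you create the slot yourself and pass create_slot = false. A minimal sketch, with hypothetical database, table, and user names:

-- On the source database
CREATE PUBLICATION mypub FOR TABLE public.mytable;
SELECT pg_create_logical_replication_slot('mysub', 'pgoutput');

-- On the destination database (same instance)
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=localhost dbname=sourcedb user=repuser'
  PUBLICATION mypub
  WITH (create_slot = false);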
But it seems like an odd way to do this. If the access is read only, why does it matter whether it is to a replica of the real data or to the real data itself?
I want all tables to be replicated by bucardo (at least for a given database), but it looks like I have to add them all manually:
bucardo_ctl add all tables
Can I have it so that every table in a database is replicated, or added to bucardo automatically?
If not, is there another replication strategy in PostgreSQL that might be a better fit for me? I'm hoping to have all nodes available for reads/writes, to avoid administering any routing process to route the writes to the master. If the routing of writes can be done natively in PostgreSQL, then that could be a solution as well.
Use the command below to add all tables:
bucardo add all tables
We have a large table in our Postgres production database which we want to start "sharding" using foreign tables and inheritance.
The desired architecture is to have one (empty) table that defines the schema and several foreign tables inheriting from the empty "parent" table (possible since Postgres 9.5).
I found this well-written article, https://www.depesz.com/2015/04/02/waiting-for-9-5-allow-foreign-tables-to-participate-in-inheritance/, which explains how to do it from scratch.
My question is how to reduce the needed migration of data to a minimum.
We have this 100+ GB table now, which should become our first "shard". And in the future we will regularly add new "shards". At some point, the older shards will be moved to another tablespace (on cheaper hardware, since they become less important).
My question now:
Is there a way to "ALTER" an existing table to be a foreign table instead?
There is no way to use ALTER TABLE to do this.
You basically have to do it manually. It is really no different from setting up table partitioning: you create your partitions, you load the data, and you direct reads and writes to the partitions.
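A minimal sketch of that setup with postgres_fdw, using hypothetical server, table, and column names (foreign tables can participate in inheritance as of 9.5):

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER shard1
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'shard1.example.com', dbname 'mydb');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER shard1
  OPTIONS (user 'appuser', password 'secret');

-- Empty parent table that only defines the schema
CREATE TABLE measurements (
  id          bigint,
  recorded_at timestamptz,
  payload     jsonb
);

-- Remote shard attached via inheritance
CREATE FOREIGN TABLE measurements_2015 ()
  INHERITS (measurements)
  SERVER shard1
  OPTIONS (table_name 'measurements_2015');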
Now in your case, there are a number of tools I would look at to make the sharding less painful. First, once your tables are split the way you want them, you can use a logical replication solution like Bucardo to replicate the writes while you are moving everything over.
There are some other approaches (parallelized readers and writers) that may save you some time at the expense of db load, but those are niche tools.
There is no native solution for shard management in standard PostgreSQL (and I don't know enough about Postgres-XL in this regard to know how well it can manage changing shard criteria). However, pretty much anything is possible with a little work and knowledge.
What would be the best way to replicate individual DB tables from a master PostgreSQL server to a slave machine? It could be done with cron+rsync, with whatever PostgreSQL might have built in, or with some sort of OSS tool, but so far the Postgres docs don't seem to cover how to do table replication. I'm not able to do a full DB replication because some tables have license->IP stuff tied to them, and I can't replicate those to the slave machine. I don't need instant replication; hourly would be acceptable as well.
If I just need to rsync, can someone help identify which files within the /var/lib/pgsql directory would need to be synced, or how I would know which tables they correspond to?
Starting with Postgres 10, logical replication is built into Postgres! This is often a better solution than external tools. The Postgres docs are great and easy to follow. See the quick setup docs, which in essence boil down to running this:
-- On publisher DB
CREATE PUBLICATION mypub FOR TABLE users, departments;
-- On subscriber DB
CREATE SUBSCRIPTION mysub CONNECTION 'dbname=foo host=bar user=repuser' PUBLICATION mypub;
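One prerequisite worth calling out from the quick setup: the publisher must run with wal_level = logical, and changing that setting requires a server restart. One way to set it:

-- On the publisher, followed by a server restart
ALTER SYSTEM SET wal_level = 'logical';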
You might want to try Bucardo, an open-source tool that synchronizes rows between tables even when they are in remote locations. It's very simple to use, and it can create one-way synchronization relationships as well.
Check out http://bucardo.org/wiki/Bucardo
You cannot get anything useful by copying individual table files from the data directory. If you want to replicate selected tables, there are a number of good options.
http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling
Is there an easy way (built-in, add-on, open-source, or commercial) to do master-slave replication on PostgreSQL where the data on the slave is scrubbed for PCI compliance as it is replicated? How about ETL tools? It does not have to be instantaneous ... a lag of up to an hour is acceptable, but the faster the better, of course.
If this doesn't work, how about possibly using triggers on the slave database to achieve this?
Perhaps you should try creating a view of the tables you wish to scrub (performing your scrubbing in the SELECT), and then replicating the view to your offsite location.
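A minimal sketch of such a scrubbing view, with hypothetical table and column names:

CREATE VIEW customers_scrubbed AS
SELECT id,
       right(card_number, 4) AS card_last4,   -- expose only the last four digits
       created_at
FROM customers;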
I believe triggers on the slave would put you at risk of non-compliance, since data could leak out. If you want a packaged solution, I'd probably look at Bucardo, looking specifically at custom replication hooks on the slave to filter out (or modify) the columns you don't need/want. If that won't work, the idea of using views is probably your next best bet.
Yes. Use Slony: add triggers to the master to materialize what you want to replicate, and replicate only those materialized views. If you scrub on the master, that should do what you want. Since Slony will happily replicate only part of your database, that should work fine (on the other hand, remember, Slony will happily replicate only part of your database).
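A rough sketch of one way to do that trigger-based materialization, with hypothetical table and column names: the scrubbed copy is the only table you would configure Slony to replicate (ON CONFLICT requires PostgreSQL 9.5+).

-- Scrubbed copy maintained on the master; replicate only this table
CREATE TABLE customers_scrubbed (
  id         bigint PRIMARY KEY,
  card_last4 text,
  created_at timestamptz
);

CREATE OR REPLACE FUNCTION scrub_customer() RETURNS trigger AS $$
BEGIN
  INSERT INTO customers_scrubbed (id, card_last4, created_at)
  VALUES (NEW.id, right(NEW.card_number, 4), NEW.created_at)
  ON CONFLICT (id) DO UPDATE
    SET card_last4 = EXCLUDED.card_last4,
        created_at = EXCLUDED.created_at;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER scrub_customer_trg
AFTER INSERT OR UPDATE ON customers
FOR EACH ROW EXECUTE PROCEDURE scrub_customer();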