Replicate selected postgresql tables between two servers? - postgresql

What would be the best way to replicate individual DB tables from a Master postgresql server to a slave machine? It can be done with cron+rsync, or with whatever postgresql might have build in, or some sort of OSS tool, but so far the postgres docs don't seem to cover how to do table replication. I'm not able to do a full DB replication because some tables have license->IP stuff connected to it, and I can't replicate those on the slave machine. I don't need instant replication, hourly would be acceptable as well.
If I need to just rsync, can someone help identify what files within the /var/lib/pgsql directory would need to be synced, or how I would know what tables they are.

Starting with Postgres 10, logical replication is built into Postgres! This is often a better solution than external solutions. The Postgres docs are great and easy to follow. It's very easy. See the quick setup docs, which in essense boils down to running this:
-- On publisher DB
CREATE PUBLICATION mypub FOR TABLE users, departments;
-- On subscriber DB
CREATE SUBSCRIPTION mysub CONNECTION 'dbname=foo host=bar user=repuser' PUBLICATION mypub;

You might want to try Bucardo, which is an open source software to synchronize rows between tables even if they are in a remote location. It's a very simple software, and it is capable of creating one-way synchronization relationships as well.
Check out http://bucardo.org/wiki/Bucardo

You cannot get anything useful by copying individual tables files in the data directory. If you want to replicate selected tables, there are a number of good options.
http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling

Related

Can A Postgres Replication Publication And Subscription Exist On The Same Server

I have a request asking for a read only schema replica for a role in postgresql. After reading documentation and better understanding replication in postgresql, I'm trying to identify whether or not I can create the publisher and subscriber within the same database.
Any thoughts on the best approach without having a second server would be greatly appreciated.
You asked two different question. Same database? No. Since Pub/Sub requires tables to have the same name (including schema) on both ends, you would be trying to replicate a table onto itself. Using logical replication plugins other than the built-in one might get around this restriction.
Same server? Yes. You can replicate between two databases of the same instance (but see the note in the docs about some extra hoops you need to jump through) or between two instances on the same host. So whichever of those things you meant by "same server", yes, you can.
But it seems like an odd way to do this. If the access is read only, why does it matter whether it is to a replica of the real data or to the real data itself?

Postgres replication betwenn 2 databases on same server

I need to create a replica of existing database, that would copy any changing operation from master to slave, I.e create a mirror some sort of. I found a lot of examples in web but they all describes process when master and slave are on different servers.
I would like to create a write replica on the same server where master is located , without spinning up second instance of Postgres.
Is it possible to do so and could you point me a direction where I could find a solution how to do it?
Thank you.
P.S. I understand that replication on 2 servers is better, but I just need to do it on one common server.
If you want physical replication, you will need to run two instances of PostgreSQL. If they are on the same server machine, they will need to have different port numbers. The different port numbers is the only complexity, otherwise it is just like running on two different servers.
If you want logical replication, you can do that within a single instance, but you will need to jump through some hoops to create the subscription intra-instance, as described in the "Notes" section
You could consider using a simple trigger to insert/update/delete data on the other database as soon as the main one get modified.
A more "professional" way would be to use synchronous replication.

Is there an alternative to temporary tables I can use on a Hot Standby copy?

as suggested here's a TL;DR bit. I'm looking for an alternative to temporary tables I could use on a Hot Standby copy of a database. Is there anything or do I have to re-write everything and try and do it all in subqueries?
When I joined our company last year, our ERP was hosted locally and although I didn’t have admin access to the Postgres database I at least had read write access to the tables.
I wrote a number of reports (using SQL Command option in Crystal Reports)/SQL scripts that use temporary tables however, we’ve just migrated to a hosted version of the ERP and rather than access the live database we have been given access to a Hot Standby copy, mainly due to load balancing issues.
Unfortunately, the software company didn’t warn us that this would be the case or that it would be read only access. I found this out when I was testing some scripts when obviously anything with a temporary table failed.
I use temporary tables for things like storing dates and bank holiday information, holding temporary calculations and so on.
So I'm looking for an alternative to temporary tables I could use on the Hot Standby copy, or do I have to re-write everything and try and do it all in subqueries?
I’ve looked at using CTE (WITH) but the scope is far too small as I’d need access throughout the script.
Then I thought maybe I could read the data from the Hot Standby but create temporary tables in a different database/schema, but I don’t think that’s viable. If it is I might have to speak to the software house to be given access to another database/schema. postgres_fdw would seem the most likely candidate as you can update the external table but I can't see anywhere about dropping and creating tables.
I’ve only been using Postgres since last July having previously used MSSQL which I could probably have used a table variable, but I can’t find an equivalent for that.
I've tried looking at the Postgres documentation but, to some embarrassment, I do find a lot of documentation hard to follow without relatable examples, so I might well have missed something.
Sorry for the long post!
Thanks

How to create read replicas from multiple postgres databases into a single database?

I'd like to preface this by saying I'm not a DBA, so sorry for any gaps in technical knowledge.
I am working within a microservices architecture, where we have about a dozen or applications, each supported by its Postgres database instance (which is in RDS, if that helps). Each of the microservices' databases contains a few tables. It's safe to assume that there's no naming conflicts across any of the schemas/tables, and that there's no sharding of any data across the databases.
One of the issues we keep running into is wanting to analyze/join data across the databases. Right now, we're relying on a 3rd Party tool that caches our data and makes it possible to query across multiple database sources (via the shared cache).
Is it possible to create read-replicas of the schemas/tables from all of our production databases and have them available to query in a single database?
Are there any other ways to configure Postgres or RDS to make joining across our databases possible?
Is it possible to create read-replicas of the schemas/tables from all of our production databases and have them available to query in a single database?
Yes, that's possible and it's actually quite easy.
Setup one Postgres server that acts as the master.
For each remote server, create a foreign server then you then use to create a foreign table that makes the data accessible from the master server.
If you have multiple tables in multiple server that should be viewed as a single table in the master, you can setup inheritance to make all those tables appear like one. If you can define a "sharding" key that identifies a distinct attribute between those server, you can even make Postgres request the data only from the specific server.
All foreign tables can be joined as if they were local tables. Depending on the kind of query, some (or a lot) of the filter and join criteria can even be pushed down to the remote server to distribute the work.
As the Postgres Foreign Data Wrapper is writeable, you can even update the remote tables from the master server.
If the remote access and joins is too slow, you can create materialized views based on the remote tables to create a local copy of the data. This however means that it's not a real time copy and you have to manage the regular refresh of the tables.
Other (more complicated) options are the BDR project or pglogical. It seems that logical replication will be built into the next Postgres version (to be released a the end of this year).
Or you could use a distributed, shared-nothing system like Postgres-XL (which probably is the most complicated system to setup and maintain)

Postgres Multi-tenant administration/maintenance

We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databses? For example if I want to add a table or add a column to a table, I have to either write a program that loops through all databases and execute a SQL against them or using pgadmin, go through them one by one.
Is there smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to have the column TENANT and add this to the primary key. Then insert into this database every record for all tenants and drop the other databases (obviously considerably more work than this as your application will need to be changed).
The differences with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database then I'm afraid you have to alter them all individually, and doing it programatically would be simplest.
At a higher level, all multi-tenant applications follow one of three approaches:
One tenant's data lives in one database,
One tenant's data lives in one schema, or
Add a tenant_id / account_id column to your tables (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database in one hand, and have tenants share the same table on the other, this becomes the most apparent dimension. If you provide your users raw SQL access or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row level security policies that makes this less of a concern for most applications.
Extensibility: If your tenants are sharing the same schema (approach #3), and your tenants have fields that varies between them, then you need to think about how to merge these fields.
This article on multi-tenant databases has a great summary of different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual data in this column based on the tenant_id. PostgresQL 9.4 comes with JSONB support and natively allows you to use semi-structured fields to express variations between different tenants' data.
Scaling: Another criteria is how easily your database would scale-out. If you create a tenant per database or schema (#1 or #2 above), your application can make use of existing Ruby Gems or [Django packages][1] to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
it's time for me to give back to the community :) So after 4 years, our multi-tenant platform is in production and I would like to share the following observations/experiences with all of you.
We used a database per each tenant. This has given us extreme flexibility as the size of the databases in the backups are not huge and hence we can easily import them into our staging environment for customers issues.
We use Liquibase for database development and upgrades. This has been a tremendous help to us, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of learning curve here an there but nothing substantial. 2-5 days can significantly save you time.
Given that we use Spring/JPA/Hibernate, we use a technique called Dynamic Data Source Routing. So when a user logs-in, we find the related datasource with a lookup and connect them to the session to the right database. That's also when the Liquibase scripts get applied for updates.
This is, for now, I will come back with more later on.
Well, there are problems with one database for all tenants in our case for sure.
The backup file gets huge and becomes almost not practical hard to manage
For troubleshooting, we need to restore customer's data in our dev env, we just use that customer's backup file and usually the file is not as big as if we were to use one database for all customers.
Again, Liquibase has been key in allowing to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So Liquibase, Liquibase and more Liquibase.
I also suspect that we would need a more powerful hardware to manage a huge database with large joins across millions of records vs much lighter database with much smaller queries.
In case of problems, the service doesn't go down for everyone and there will be limited to one or few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. One time we had one customer that didn't have their archiving active and their database size grew to over 3 GB. With offshore teams and slower internet as well as storage/bandwidth prices, one can see how things may become complicated very quickly.
Hope this helps someone.
--Rex