Which PostgreSQL replication solution to use for my specific scenario - postgresql

I need to replicate a PostgreSQL database server as follows:
1. Two servers are adjacent to each other: one is the master and the other is the standby. If the master fails, the standby takes over. Replication from master to slave needs to be failsafe, hence synchronous. The standby will not be used for any querying unless it has become the master. So, no high-availability/load-balancing is required.
2. There is another backup server at a remote location. Data from the master server mentioned above will be replicated to this remote server asynchronously and in batches. Time is not a factor at all in this replication; a delay of a couple of hours is just fine. This server would be used just for backup.
I've studied the currently available replication solutions from the PostgreSQL docs as well as from Google, but can't decide which combination of synchronous and asynchronous solutions I would need.
The closest I came up with is using pgpool-II for scenario 1 and Mammoth for scenario 2. However, as pgpool is statement-based, what would happen to queries containing rand() and now()?
Please note that I'd rather use free and open-source replication tools.
Also, just a side question: according to scenario 1 above, when the master fails, the standby will take over. Would the master-slave roles stay reversed after that, or would the slave go back to its standby state once the master server has recovered?
Any suggestion would be highly appreciated. Thanks.

I suggest using DRBD for scenario 1 and either 9.0 built-in replication or Slony for scenario 2.
Before PostgreSQL 9.1 (not yet released), there is no other synchronous replication solution available, and DRBD is widely established for this purpose. Together with Pacemaker or Heartbeat, which come with all the scripts needed for PostgreSQL monitoring and switchover, you have a very robust and fairly easy to manage solution. (In fact, I'd consider continuing to use DRBD even after 9.1 comes out; it's just a lot easier and has a longer track record.)
For the cross-site asynchronous replication, you could try the built-in replication of PostgreSQL 9.0, perhaps in conjunction with repmgr for monitoring and management. Alternatively, you could try the (now a bit) old-school Slony, but I'd guess it will be more complicated than your needs warrant.

You didn't mention if the server in question was on a specific version or if this was a new project with the freedom to choose the version. The answers vary based on that information.
If you are starting with a clean slate, I would recommend designing based on the PostgreSQL 9.1 beta. The final version will be released long before you would be ready to go into a production environment and it has binary synchronous replication built-in.
I've been using the built-in asynchronous replication in PostgreSQL for years in almost the exact same scenario you describe and it has always been rock-solid for me. It's become even better with 9.0 with Hot standby and it's become much easier to configure and maintain. 9.1 provides the only missing piece you require.
However, if you are trying to replicate an existing server, built-in asynchronous replication with aggressive settings for "checkpoint_timeout" and a very frequent backup of unarchived WAL files could be sufficient until you can upgrade to 9.1.
The bottom line here is that you can get exactly what you want with stock PostgreSQL 9.1--no third-party products required.
As for failover, it is not an automatic process; you'll need to handle that yourself. After a failover, I would recommend keeping the roles of the two machines switched until either the next failover event or a controlled manual failover during a scheduled outage in a slow period of use. Again, this is not automatic and must be managed by the administrator (via shell scripts, presumably).
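Since failover has to be scripted by the administrator, here is a minimal sketch of the decision logic such a script might use: promote the standby only after several consecutive failed health checks, and keep the roles switched afterwards. The class name, the threshold of 3 and the sample check results are assumptions made for illustration; the health check itself and the promotion step (for example `pg_ctl promote` on 9.1 and later) are left out.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks consecutive failed health checks of the master (hypothetical helper)."""
    threshold: int = 3      # assumed: failures in a row before acting
    failures: int = 0
    promoted: bool = False

    def record(self, master_is_up: bool) -> bool:
        """Feed one health-check result; return True when the standby should be promoted."""
        if self.promoted:
            return False    # roles stay switched until an operator resets them
        if master_is_up:
            self.failures = 0   # any successful check clears the streak
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.promoted = True
            return True
        return False


# Simulated results of periodic checks against the master (True = reachable).
monitor = FailoverMonitor(threshold=3)
checks = [True, False, False, True, False, False, False, True]
decisions = [monitor.record(ok) for ok in checks]
print(decisions)
```

Requiring a streak of failures rather than a single one is a deliberate choice here: a brief network hiccup should not trigger a promotion that then has to be undone by hand.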

Related

Check postgresql replication

I have created a replicated PostgreSQL database (master - slave). I did this with an already existing Ansible playbook (role), which I don't fully understand yet. The cluster currently consists of only 2 databases on different VMs.
So I want to test this replication now.
Unfortunately I have little experience with Postgresql.
How can I check that the connection between them is stable?
And does the slave really take over if the master fails?
Many thanks for any information, tips & tricks.
Postgresql v. 9.6
Official PostgreSQL does not yet support automatic failover (although there are multiple third-party projects which support this feature). Therefore, if the deployment you have mentioned uses only official PostgreSQL, none of the replicas takes over the write task after a master failure. But they can answer read queries if they are configured as hot_standby.
If you want to check the state of replication, you can query the pg_stat_replication view on the master.
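As a rough sketch of how the output of pg_stat_replication could be interpreted, the snippet below takes rows as plain dictionaries (however you fetch them) and reports the lag in bytes for each standby. The column names are the 9.6 ones (PostgreSQL 10 renamed `*_location` to `*_lsn`); the function names, the 16 MB lag threshold and the sample rows are assumptions for illustration only.

```python
# Query to run on the master (9.6 column names).
QUERY = """
SELECT application_name, state, sent_location, replay_location
FROM pg_stat_replication;
"""


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/3000060' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def check_standbys(rows, max_lag_bytes=16 * 1024 * 1024):
    """Return (application_name, lag_bytes, healthy) for each connected standby."""
    report = []
    for row in rows:
        lag = lsn_to_int(row["sent_location"]) - lsn_to_int(row["replay_location"])
        healthy = row["state"] == "streaming" and lag <= max_lag_bytes
        report.append((row["application_name"], lag, healthy))
    return report


# Made-up sample rows standing in for a real query result.
sample = [
    {"application_name": "standby1", "state": "streaming",
     "sent_location": "0/3000060", "replay_location": "0/3000000"},
    {"application_name": "standby2", "state": "catchup",
     "sent_location": "1/0", "replay_location": "0/F0000000"},
]
report = check_standbys(sample)
for name, lag, healthy in report:
    print(name, lag, "OK" if healthy else "CHECK")
```

An empty result from the view means no standby is currently connected to the master, which is worth alerting on by itself.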
Also these official docs would help you understand Postgres streaming replication & failover better:
https://www.postgresql.org/docs/9.6/warm-standby.html#STREAMING-REPLICATION
https://www.postgresql.org/docs/9.6/warm-standby-failover.html

PostgreSQL: on-the-fly duplication of the database on the same server

We are using the same PostgreSQL 9.3 Server both for production and development.
So we would like to get the copy of the existing production database for the development purposes. To be precise, all the INSERT/UPDATE/DELETE events that come to production should be also placed into its copy. Reverse replication is not needed. How can we do that?
PS: Please take into account that full master-slave replication is not suitable. We have no opportunity to deploy one more PostgreSQL server at the moment.
UPD: pg_dump/pg_restore is not an option either, because it would overwrite all the updates done by developers in their database.
Sounds like you are looking for a replication system like Slony or Bucardo.
Slony uses triggers to replicate the data so that should work without much hassle here. And Bucardo uses NOTIFY to do pretty much the same.
For just a local setup I would recommend Slony but if you would like to offer the developers a local database (i.e. local machine) I would recommend Bucardo instead as it offers asynchronous replication.

What's the difference between pgpool II replication and postgresql replication?

I'm not exactly a DBA, so I would appreciate easy-to-understand responses. I have to provide replication for our DB, and pgpool seems more convenient because if one PostgreSQL instance fails, the clients are not required to change anything to keep on working, right? So in this case it makes more sense to use pgpool, but the configuration part seems (to me) a lot more complicated and confusing. For instance, do I need to set up WAL on both PostgreSQL servers? Or is this only needed if I want to set up PostgreSQL replication? The more I try to get an answer to these questions, the less clear it becomes. Maybe I forgot how to google...
The built-in replication, provided by PostgreSQL itself, includes streaming replication, warm standby, and hot standby. These options are based on shipping Write-Ahead Logs (WAL) to all the standby servers. Write statements (e.g., INSERT, UPDATE) will go to the master, and the master will send logs (WALs) to the standby servers (or other masters, in the case of master-master replication).
pgpool, on the other hand, is a type of statement-based replication middleware (like a database proxy). All the statements actually go to pgpool, and pgpool forwards everything to all the servers to be replicated.
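To make the difference concrete, here is a toy model (not pgpool or PostgreSQL code) of the two approaches. Each `Node` is a hypothetical stand-in for a server with its own source of randomness; `execute` plays the role of a statement such as `INSERT ... VALUES (id, random())`, and `apply_change` plays the role of a standby replaying an already-computed change from a log.

```python
import random


class Node:
    """A toy database node: one table mapping id -> value, with its own RNG."""

    def __init__(self, seed):
        self.rows = {}
        self.rng = random.Random(seed)  # each server has independent randomness

    def execute(self, row_id):
        """Stands in for: INSERT INTO t VALUES (row_id, random())."""
        value = self.rng.random()
        self.rows[row_id] = value
        return (row_id, value)  # the resulting change, like a log record

    def apply_change(self, change):
        """Apply an already-computed change, as a standby replaying a log would."""
        row_id, value = change
        self.rows[row_id] = value


# Statement-based: the middleware sends the same statement to both nodes,
# and each node evaluates random() on its own.
a, b = Node(seed=1), Node(seed=2)
for node in (a, b):
    node.execute(row_id=1)
statement_based_equal = a.rows == b.rows

# Log shipping: only the master evaluates the statement; the standby
# replays the result.
master, standby = Node(seed=1), Node(seed=2)
standby.apply_change(master.execute(row_id=1))
log_shipping_equal = master.rows == standby.rows

print(statement_based_equal, log_shipping_equal)
```

In this model the statement-based pair ends up with different data while the log-shipping pair stays identical, which is the general concern with non-deterministic functions under statement-based replication. How a particular middleware version handles specific functions is something to check in its own documentation.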
One big disadvantage with pgpool is that you have a single point of failure; if the server running pgpool crashes, your whole cluster fails.
The PostgreSQL documentation has some basic info on the various types of replication that are possible: https://www.postgresql.org/docs/current/different-replication-solutions.html

Slony and PGPool for fail-over

We're considering Slony and PGPool as alternatives to handle failover in our application (and it seems like since we're gonna need at least two DB servers, we could take advantage of load balancing too).
Under which circumstances is Slony better than PGPool, and vice versa?
This is anecdotal, so take it with a grain of salt.
PGPool and streaming WAL replication (with hot standby or not) work the way database replication ought to. Your application doesn't need to know anything about replication or whether it is part of a cluster; it just talks to the database as it would any other. Streaming replication is robust, and has the ability to fall back to WAL shipping if streaming breaks. PGPool makes managing this process easy and gives good heartbeats and monitoring info.
Slony, on the other hand, is an administrative tar-pit, which needs to add trigger functions and numerous other objects to your database to work. Furthermore, Slony doesn't (easily) support the ability to replicate schema changes (DDL) in the same way it replicates ordinary insert/update/delete type operations (DML). Taken as a whole, these characteristics mean that, in many cases, your application needs to have special cases to handle Slony's eccentricities. In my opinion, that's bad: the application shouldn't have to be aware of/make changes to deal with the database replication solution that it runs on. I spent the better part of a year hacking Slony to work as a stable replication solution, and eventually came to the conclusion that it's a bad idea/bad replication mechanic implemented in an obtuse, illegible way, that is anything but stable or enterprise-ready. EDIT: as of PostgreSQL 9.3 you can install triggers (which Slony uses to detect changes) on DDL objects, which might allow Slony to replicate more aspects of a database.
That said, Slony does do two things better than streaming replication (administered via PGPool or no):
Slony allows per-table or per-database replication. Streaming replication only permits the replication of an entire Postgres instance. If that kind of granularity is important to you, you might want Slony.
Slony allows cascading (master -> slave -> slave) replication. Streaming replication only allows one level. EDIT: This is now supported in PostgreSQL native streaming replication, as of Postgres version 9.2.
At literally everything else, streaming replication is better and more stable.
But you're not just asking about streaming replication: PGPool does a great deal more than that. It allows the balancing of read and write loads between read-only slave databases and masters, and the implementation of backup plans, as well as a whole host of other things. Especially when compared to Slony's arcane command syntax and godawful administration scripts, PGPool wins in nearly every instance.
With regards to failover specifically, PGPool has heartbeat monitors and the ability to automatically route database traffic in a cluster. Slony only has a "fail over to slave now" command, leaving all of the monitoring and app-side routing up to you.
TL;DR: PGPool good. Slony bad.

Load Balancing and Failover for Read-Only PostgreSQL Database

Scenario
Multiple application servers host web services written in Java, running in SpringSource dm Server. To implement a new requirement, they will need to query a read-only PostgreSQL database.
Issue
To support redundancy, at least two PostgreSQL instances will be running. Access to PostgreSQL must be load balanced and must auto-fail over to currently running instances if an instance should go down. Auto-discovery of newly running instances is desirable but not required.
Research
I have reviewed the official PostgreSQL documentation on this issue. However, that focuses on the more general case of read/write access to the database. Top Google results tend to lead to older newsgroup messages or dead projects such as Sequoia or DB Balancer, as well as one active project, PG Pool II.
Question
What are your real-world experiences with PG Pool II? What other simple and reliable alternatives are available?
PostgreSQL's wiki also lists clustering solutions, and the page on Replication, Clustering, and Connection Pooling has a table showing which solutions are suitable for load balancing.
I'm looking forward to PostgreSQL 9.0's combination of Hot Standby and Streaming Replication.
Have you looked at SQL Relay?
The standard solution for something like this is to look at Slony, Londiste or Bucardo. They all provide async replication to many slaves, where the slaves are read-only.
You then implement the load balancing independently of this, on the TCP layer, with something like HAProxy. Such a solution will be able to do failover of the read connections (though you'll still lose transaction visibility on a failover, and have to start a new transaction on the new slave, but that's fine for most people).
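The behaviour a TCP-level balancer gives you for read connections can be sketched in a few lines. This is only an illustration of the logic (round-robin over replicas, skipping any that fail a health check), not a replacement for HAProxy; the class name, host names and the health-check callable are hypothetical.

```python
import itertools


class ReadBalancer:
    """Round-robin over read-only replicas, skipping ones that fail a health check."""

    def __init__(self, backends, is_healthy):
        self.backends = list(backends)
        self.is_healthy = is_healthy  # callable: backend -> bool
        self._cycle = itertools.cycle(self.backends)

    def pick(self):
        """Return the next healthy backend, or raise if none is available."""
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if self.is_healthy(backend):
                return backend
        raise RuntimeError("no healthy replica available")


# The set of failed hosts stands in for a real TCP health check.
down = set()
lb = ReadBalancer(["replica1:5432", "replica2:5432"], lambda b: b not in down)

first = [lb.pick() for _ in range(4)]          # alternates between both replicas
down.add("replica1:5432")                      # simulate replica1 going down
after_failure = [lb.pick() for _ in range(3)]  # all traffic goes to replica2
print(first, after_failure)
```

A client that was in the middle of a transaction on the failed replica still has to reconnect and start over; the balancer only decides where new connections go.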
Then all you have left is failover of the master role. There are supported ways of doing it on all these systems. None of them are automatic by default (because automatic failover of a database master role is really dangerous - consider the situation you are in once you've got split brain), but they can be automated easily if the requirement needs this for the master as well.