Check postgresql replication - postgresql

I have created a replicated Postgresql database (Master - Slave). I did this with an already existing Ansible Playbook (Role) , which I don't fully understand yet. The cluster currently consists of only 2 databases on different VMs.
So I want to test this replication now.
Unfortunately I have little experience with Postgresql.
How can I control whether they connect stable?
If the slave really takes over the task if the master should fail?
Many thanks for any information, tips & tricks.
Postgresql v. 9.6

Official PostgreSQL does not yet support automatic failover (Although there are multiple third-party projects which support this feature). Therefore if the deployment you have mentioned is only official PostgreSQL, after master failure, none of replicas take over the write task. But they can answer read queries if they are configured as hot_standby.
If you want to check the state of replication, in master you can check out pg_stat_replication in master.
Also these official docs would help you understand Postgres streaming replication & failover better:
https://www.postgresql.org/docs/9.6/warm-standby.html#STREAMING-REPLICATION
https://www.postgresql.org/docs/9.6/warm-standby-failover.html

Related

Postgres Patroni and etcd on the Same Machine

Assuming I have 2 postgres servers (1 master and 1 slave) and I'm using Patroni for high availability
1) I intend to have three-machine etcd cluster. Is it OK to use the 2 postgres machines also for etcd + another server, or it is preferable to use machines that are not used by Postgres?
2) What are my options of directing the read request to the slave and the write requests to the master without using pgpool?
Thanks!
yes, it is the best practice to run etcd on the two PostgreSQL machines.
the only safe way to do that is in your application. The application has to be taught to read from one database connection and write to another.
There is no safe way to distinguish a writing query from a non-writing one; consider
SELECT delete_some_rows();
The application also has to be aware that changes will not be visible immediately on the replica.
Streaming replication is of limited use when it comes to scaling...

Clustering PostgreSQL clusters

This will be confusing for some due to poor terminology choices by the PostgreSQL folks, but please bear with me...
We have a need to be able to support multiple PostgreSQL (PG) clusters, and cluster them on multiple servers using, e.g. repmgr. For example, to support both server availability and also PITR for each PG cluster. A single PG cluster per server is too expensive in many cases, so we multi-tenant (small) customers on separate PG clusters, for data separation, recovery, etc., but also want to be able to support HA via replication/fail-over.
The closest analogy for a PG cluster is a SQL Server instance - each can host multiple DB's, has its own port, etc. Like SQL Server, you can run multiple instances (PG clusters) on the same server, and set up replication for each.
Basic repmgr setup is no problem - that seems fairly clear in the single PG cluster model. But, is there any recommended/supported approach to multiple PG clusters using repmgr? I can kind of imagine faking repmgr into thinking each PG cluster is in effect a separate repmgr cluster (with separate repmgr.conf, connection info/port). But, I'm not yet sure that will work.
I'd typically expect to fail-over all PG clusters on the same server - not one at a time.
I recognize this may not be the best idea in all cases, but am mostly exploring what's possible. I have some alternatives, but this is closest to our current single-node model.
To clarify, I need to support many thousands of customers across many server clusters. Ideally, each cluster uses the same repmgr DB (in the main PG cluster, e.g.), and essentially stands alone from the other server clusters.
Thanks...
Answering my own question, but I hope someone eventually posts a better answer, as I otherwise quite like repmgr. In the end, it appears repmgr just isn't suitable for multiple PG clusters (instances), as there is an implied relationship between the repmgr cluster connection strings and the PG cluster (port). Thus, you'd essentially have to create a separate repmgr environment (DB) for every clustered PG cluster/instance, losing a lot of the operational simplicity that repmgr brings to the table.
I will investigate a more generic solution using Corosync/Pacemaker/etc., as at least in that case, the virtual cluster IP handling is built-in to the solution, and doesn't require additional software/resources to pull off.
I'm sure I'm probably over-simplifying things, but it seems like repmgr was tantilizingly close to solving much of the problem, had it allowed the repmgr DB to be fully independent of the PG cluster and allow each repmgr cluster to specify its own connection info, not (only) the connection info for the repmgr DB itself.

How to setup cross region replica of AWS RDS for PostgreSQL

I have a RDS for PostgreSQL setup in ASIA and would like to have a read copy in US.
But unfortunately just found from the official site that only RDS for MySQL has cross-region replica but not for PostgreSQL.
And I saw this page introduced other ways to migrate data in to and out of RDS for PostgreSQL.
If not buy an EC2 to install a PostgreSQL by myself in US, is there any way the synchronize data from ASIA RDS to US RDS?
It all depends on the purpose of your replication. Is it to provide a local data source and avoid network latencies ?
Assuming that your goal is to have cross-region replication, you have a couple of options.
Custom EC2 Instances
You can create your own EC2 instances and install PostgreSQL so you can customize replication behavior.
I've documented configuring master-slave replication with PostgreSQL on my blog: http://thedulinreport.com/2015/01/31/configuring-master-slave-replication-with-postgresql/
Of course, you lose some of the benefits of AWS RDS, namely automated multi-AZ redundancy, etc., and now all of a sudden you have to be responsible for maintaining your configuration. This is far from perfect.
Two-Phase Commit
Alternate option is to build replication into your application. One approach is to use a database driver that can do this, or to do your own two-phase commit. If you are using Java, some ideas are described here: JDBC - Connect Multiple Databases
Use SQS to uncouple database writes
Ok, so this one is the one I would personally prefer. For all of your database writes you should use SQS and have background writer processes that take messages off the queue.
You will need to have a writer in Asia and a writer in the US regions. To publish on SQS across regions you can utilize SNS configuration that publishes messages onto multiple queues: http://docs.aws.amazon.com/sns/latest/dg/SendMessageToSQS.html
Of course, unlike a two phase commit, this approach is subject to bugs and it is possible for your US database to get out of sync. You will need to implement a reconciliation process -- a simple one can be a pg_dump from Asian and pg_restore into US on a weekly basis to re-sync it, for instance. Another approach can do something like a Cassandra read-repair: every 10 reads out of your US database, spin up a background process to run the same query against Asian database and if they return different results you can kick off a process to replay some messages.
This approach is common, actually, and I've seen it used on Wall St.
So, pick your battle: either you create your own EC2 instances and take ownership of configuration and devops (yuck), implement a two-phase commit that guarantees consistency, or relax consistency requirements and use SQS and asynchronous writers.
This is now directly supported by RDS.
Example of creating a cross region replica using the CLI:
aws rds create-db-instance-read-replica \
--db-instance-identifier DBInstanceIdentifier \
--region us-west-2 \
--source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:my-postgres-instance

Which PostgreSQL replication solution to use for my specific scenario

I need to replicate a PostgreSQL database server as follows:
Two servers are adjacent to each-other - one is the master and the other standby. If the master fails, the standby takes over. Replication from master to slave needs to be failsafe, hence, synchronous. The standby will not be used for any querying unless it has become a master. So, no high-availability/load-balancing is required.
There is another backup server at a remote location. Data from the master server mentioned above will be replicated to this remote server asynchronously and in batches. Time is not a factor at all in this replication - a couple of hours is just fine. This server would be used just for backup.
I've studied the currently available replication solutions from the PostgreSQL docs as well as from Google, but can't decide which combination of synchronous-asynchronous solutions would I need.
The closest I came up with is using pgpool-II for scenario 1 and Mammoth for scenario 2. However, as pgpool is statement-based, what would happen to queries containing rand() and now()?
Please note that I'd rather use free and open-source replication tools.
Also, just a side question - according to scenario 1 above, when the master fails, the standby will take over. Would the master-slave role be reversed after that, or would after the recovery of the master server the slave would go back to its standby state?
Any suggestion would be highly appreciated. Thanks.
I suggest using DRBD for scenario 1 and either 9.0 built-in replication or Slony for scenario 2.
Before PostgreSQL 9.1 (not yet released), there is no other synchronous replication solution available, and DRBD is widely established for this purpose. Together with Pacemaker or Heartbeat, which come with all the scripts needed for PostgreSQL monitoring and switchover, you have a very robust and fairly easy to manage solution. (In fact, I'd consider continuing to use DRBD even after 9.1 comes out; it's just a lot easier and has a longer track record.)
For the cross-site asynchronous, you could try the built-in replication of PostgreSQL 9.0, perhaps in conjunction with repmgr for monitoring and management. Alternatively, you could try the (now a bit) old-school Slony, but I'd guess it will more complicated for your needs.
You didn't mention if the server in question was on a specific version or if this was a new project with the freedom to choose the version. The answers vary based on that information.
If you are starting with a clean slate, I would recommend designing based on the PostgreSQL 9.1 beta. The final version will be released long before you would be ready to go into a production environment and it has binary synchronous replication built-in.
I've been using the built-in asynchronous replication in PostgreSQL for years in almost the exact same scenario you describe and it has always been rock-solid for me. It's become even better with 9.0 with Hot standby and it's become much easier to configure and maintain. 9.1 provides the only missing piece you require.
However, if you are trying to replicate an existing server, built-in asynchronous replication with aggressive settings for "checkpoint_timeout" a very frequent backup of unarchived WAL files could be sufficient until you can upgrade to 9.1.
The bottom line here is that you can get exactly what you want is with stock PostgreSQL 9.1--no third-party products required.
As for failover, it is not an automatic process, you'll need to handle that yourself. I would recommend that after a failover, switching the roles of the two machines until either the next failover event or until a controlled manual failover during a scheduled outage during a slow period of use. Again, this is not automatic and much be managed by the administrator (via shell scripts, presumably).

Load Balancing and Failover for Read-Only PostgreSQL Database

Scenario
Multiple application servers host web services written in Java, running in SpringSource dm Server. To implement a new requirement, they will need to query a read-only PostgreSQL database.
Issue
To support redundancy, at least two PostgreSQL instances will be running. Access to PostgreSQL must be load balanced and must auto-fail over to currently running instances if an instance should go down. Auto-discovery of newly running instances is desirable but not required.
Research
I have reviewed the official PostgreSQL documentation on this issue. However, that focuses on the more general case of read/write access to the database. Top google results tend to lead to older newsgroup messages or dead projects such as Sequoia or DB Balancer, as well as one active project PG Pool II
Question
What are your real-world experiences with PG Pool II? What other simple and reliable alternatives are available?
PostgreSQL's wiki also lists clustering solutions, and the page on Replication, Clustering, and Connection Pooling has a table showing which solutions are suitable for load balancing.
I'm looking forward to PostgreSQL 9.0's combination of Hot Standby and Streaming Replication.
Have you looked at SQL Relay?
The standard solution for something like this is to look at Slony, Londiste or Bucardo. They all provide async replication to many slaves, where the slaves are read-only.
You then implement the load-balancing independent of this - on the TCP layer with something like HAProxy. Such a solution will be able to do failover of the read connections (though you'll still loose transaction visibility on a failover, and have to start new transaction on the new slave - but that's fine for most people)
Then all you have left is failover of the master role. There are supported ways of doing it on all these systems. None of them are automatic by default (because automatic failover of a database master role is really dangerous - consider the situation you are in once you've got split brain), but they can be automated easily if the requirement needs this for the master as well.