Does my master server crash using Log-Shipping Synchronous Replication in Postgresql when the replica is down? - postgresql

I'm searching for HA solutions without load balancing in the master-slave model, using postgresql. My favorite solution so far is log shipping synchronous replication. But I have one main concern, and that is, if my slave server becomes unavailable, will my master server continue it's operation? Or will it wait for the acknowledgment of my slave server until it's up again?

If you have only one standby, the master will halt ( by design ).
The master will still serve read-only statements, but all writes will be blocked until the standby comes back.
You can avoid this scenario by providing multiple candidates in synchronous_standby_names.
See SYNCHRONOUS-REPLICATION in the PostgreSQL Docs.

I found another way to prevent the master halt at slave crash. We can use wal_sender_timeout in masters postgresql.conf file to disconnect from the slave if it's been crashed.

Related

Zalando operator- load balance read-write pgbouncer

I have installed Postgres cluster using zalando operator.
Also enabled pgbouncer for replicas and master.
But I would like to combine or load balance replicase and master connections,
So that read requests can be routed to read replicas and write requests can be routed to master.
Can anyone help me out in achieving this.
Thanks in advance.
Tried enabling pgbouncer.
pgbouncer is getting enabled either to master or to slave.
But I need a single point where it can route read requests to slaves and write requests to master.
There is no safe way to distinguish reading and writing statements in PostgreSQL. pgPool tries to do that, but I think any such solution is flaky. You will have to teach your application to direct reads and writes to different data sources.
I don't think Pgbouncer provides any out of the box way to load balance read and write queries. An alternative to that is the use of pgpool as a connection pooler. Pgpool provides a mode known as load_balance_mode which you can turn it on and it will try to load balance queries and send write queries to master and read queries to replica. You can read more about the load_balance_mode here

Attach additional node to postgres primary server as warm standby

I have set up Postgres 11 streaming replication cluster. Standby is a "hot standby". Is it possible to attach the second standby as a warm standby?
I assume that you are talking about WAL file shipping when you are speaking of a “warm standby”.
Sure, there is nothing that keeps you from adding a second standby that ships WAL files rather than directly attaching to the primary, but I don't see the reason for that.
According to this decent documentation of Postgres 11 streaming replication architecture, you can set the sync_state of a 2nd slave instance to be potential. This means that if/when the 1st sync slave fails, the detected failure (through ACK communication) will result in the 2nd slave will move from potential to sync becoming the active replication server. --see Section 11.3 - Managing Multiple Stand-by Servers in that link for more details.

Slow replication recovery due to communication problems

We had lately several times the same problems on Google compute engine environment with PostgreSQL streaming replication and I would like to understand reasons and if I can repair it in some smoother way.
From time to time we see some communication problems in Google's internal network in GCE datacenter and they always trigger replication lags between our PG master and its replicas. All machines are Debian-8 and PostgreSQL 9.5.
When situation happens everything seems to be OK - no errors in PG logs on master or replicas just communication between master and replicas seems to be incredibly slow or repeatedly failing so new WAL logs are transfered to replicas with big delays and therefore replication lag is still growing.
Restart of replication from within PostgreSQl or restart of PostgreSQL on replica does not really help - after several WAL logs copied using scp in recovery command communication is back in previous incredibly slow status. Only restart of the whole instance help. When whole VM is restarted communication is back to normal and recovery even from lag many hours long is done in a few minutes. So main reason for this behavior seems to be on OS level. I tried to check net traffic but without finding anything significant. I also do not see anything relevant in any OS log.
Could restart of some OS service help? So I do not need to restart the whole VM? Thank you very much for any ideas.

Replicate via pglogical on a hot_standby setup

I am running two databases (PostgreSQL 9.5.7) in a master/slave setup. My application is connecting to a pgpool instance which routes to the master database (and slave for read only queries).
Now I am trying to scale out some data to another read-only database instance containing only a few tables.
This works perfectly using pglogical directly on the master database.
However if the master transitions to slave for some reason, pglogical can't replicate any longer because the node is in standby.
Tried following things:
subscribed on the slave since it's less likely to go down, or overheated: Can't replicate on standby node.
subscribed via pgpool server: pgpool doesn't accept replication connections.
subscribed to both servers: pglogical config gets replicated along, so can't give them different node names.
The only thing I can think of now is to write my own tcp proxy which regularly checks for the state of the server to which I can subscribe to.
Is there any other/easier way I can solve this ?
Am I using the wrong tools perhaps ?
Ok so it seems that there are no solutions for this problem just yet.
Since the data in my logically replicated database is not changing fast, there is no harm if the replication stops for a moment.
Actions on failover could be:
Re-subscribe to the promoted master.
or promote standby node back to master after failover.

Automatic failover with PostgreSQL 9.1

PostgreSql 9.1 has master-slave synchronous replication. Suppose the master is machine A and the slave is machine B.
If the master fails, how does PostgreSQL know when to make the slave the master? What if the slave incorrectly thought the master was down because of a temporary network glitch on the master where the client program could still contact the master though.
And moreover, how would my client program know the slave in the new master and more importantly is ready to accept writes. Does the slave send a message to the client?
Check repmgr, it's one of its jobs is to deal with this issue.
Typically you want to use a promotion-management system like repmgr or patroni. Then you want to use some sort of a high availability proxy (could be pgbouncer or haproxy) to handle the actual abstraction so your applications do not need to know what system is master.
In answer to your question, most of these systems use a heartbeat to determine if there is a problem. Patroni goes out over the etcd heartbeat. Repmgr has its own heartbeat check. With Repmgr you need to write hook scripts to take care of stonith, and so forth.