Attach additional node to postgres primary server as warm standby - postgresql

I have set up a Postgres 11 streaming replication cluster. The standby is a "hot standby". Is it possible to attach a second standby as a warm standby?

I assume that you are talking about WAL file shipping when you speak of a “warm standby”.
Sure, nothing keeps you from adding a second standby that ships WAL files rather than attaching directly to the primary, but I don't see the reason for that.
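A minimal sketch of such a log-shipping ("warm") standby on PostgreSQL 11; the archive path is a placeholder:

    # recovery.conf on the second standby (PostgreSQL 11)
    standby_mode = 'on'
    # no primary_conninfo, so the server only restores archived WAL files
    restore_command = 'cp /mnt/wal_archive/%f %p'

    # postgresql.conf on the second standby
    hot_standby = off    # keep it "warm": no read-only client connections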

According to this decent documentation of the Postgres 11 streaming replication architecture, you can set the sync_state of a 2nd slave instance to potential. This means that if/when the 1st sync slave fails, the detected failure (through ACK communication) will cause the 2nd slave to move from potential to sync, becoming the active synchronous standby. See Section 11.3 - Managing Multiple Stand-by Servers in that link for more details.
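For illustration, a sketch of that setup, assuming the two standbys report themselves as standby1 and standby2 via application_name in their primary_conninfo:

    # postgresql.conf on the primary: standby1 is sync, standby2 is potential
    synchronous_standby_names = 'FIRST 1 (standby1, standby2)'

    -- check on the primary; standby2 should show sync_state = 'potential'
    SELECT application_name, sync_state FROM pg_stat_replication;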

Related

Postgresql doesn't reestablish delayed replication

I'm running a master and a replica on PG 13.3. I decided to use delayed replication (30 minutes, configured via the recovery_min_apply_delay parameter). On top of that, WAL archiving is configured and working well.
When the load on the master is very high for a long time, replication falls behind until max_slot_wal_keep_size is exceeded (see my other, related question: Replication lag - exceeding max_slot_wal_keep_size, WAL segments not removed). Once it falls too far behind, the slot is "lost" and the replica falls back to restoring WAL from the archive. So far so good. The problem is, it never tries replication again. Restarting the replica does not help.
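The relevant settings look roughly like this (a sketch reconstructed from the description above; the archive path and the size cap are placeholders):

    # postgresql.conf on the replica (PG 13: recovery.conf no longer exists)
    recovery_min_apply_delay = '30min'
    restore_command = 'cp /mnt/wal_archive/%f %p'   # archive fallback

    # postgresql.conf on the primary
    max_slot_wal_keep_size = '50GB'                 # placeholder cap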
There are two ways I managed to restore replication:
1) Restarts and config edits: remove the delay config from the replica and restart Postgres. It then restores all the WAL from the archive, and once there is nothing left it starts replicating again, but without any delay. Then I edit the config again to reintroduce the delay, and it sometimes works, sometimes doesn't. I think it depends on the load.
2) Removing a WAL segment from the archive: look at the currently restored WAL segments in the postgresql log and temporarily move the next segment out of the WAL archive. When PG tries to restore it, it fails and falls back to replication.
This doesn't seem like the right way to do it, does it?
Thanks,
-- Marcin
As far as I can see, this is a non-problem.
If you want replication delayed by 30 minutes, and you archive more than one 16MB WAL segment per half hour, there is no need to replicate. The information can just as well be read from the archive. If the latest entry in the latest archived WAL segment happens to be older than recovery_min_apply_delay, the standby will contact the primary and replicate.
If you insist on replication rather than archive recovery, remove restore_command and max_slot_wal_keep_size from the configuration. But I don't see the point.
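Concretely, that would mean something like this (paths are placeholders):

    # on the replica: drop the archive fallback
    #restore_command = 'cp /mnt/wal_archive/%f %p'

    # on the primary: no cap on WAL retained for the slot (the default)
    max_slot_wal_keep_size = -1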
If you are concerned about losing the active WAL segment in case of a catastrophe on the primary, use pg_receivewal rather than archive_command to populate the WAL archive.
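A sketch of that, with host, user and slot names as placeholders:

    # on the primary, create a physical slot so no segment is missed:
    #   SELECT pg_create_physical_replication_slot('archive_slot');
    # then stream WAL into the archive directory in real time
    pg_receivewal -h primary.example.com -U replicator \
        -D /mnt/wal_archive --slot=archive_slot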

Streaming replication solution in Postgres

I'm reading the article below on how to achieve streaming replication in a Postgres DB.
https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql
Some things are not quite clear to me:
1) Are both DB instances active, or is the slave instance just a clone of the master (i.e. it communicates with the master, but not with the backend)?
2) If the DB master node fails, what happens until the second node comes back online? Is this covered by default by just having the WAL sender and WAL receiver processes, or does something else need to be added?
3) Which DB_HOST:PORT should be configured in the backend app if, for example, I have two backend nodes (both of them active)?
If hot_standby = on in postgresql.conf, clients can connect to the standby, but they can only read data, not modify them. The standby is an identical physical copy of the primary, just as if you had copied it file by file.
If the primary fails, the standby will remain up and running, but you still can only read data until somebody promotes the standby. You have to understand that PostgreSQL does not ship with cluster software that allows this to happen automatically. You have to use some other software like Patroni for that.
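To illustrate (the data directory path is a placeholder):

    -- returns true on a standby, false on a (promoted) primary
    SELECT pg_is_in_recovery();

    # manual promotion, absent cluster software such as Patroni
    pg_ctl promote -D /var/lib/postgresql/data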
That depends on the API your software is using. With libpq (the C API) or JDBC you can have a connection string that contains both servers and will select the primary automatically, but with other clients you may have to use external load balancing software.
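For example, with libpq (PostgreSQL 10 or later) and with the JDBC driver; host and database names are placeholders:

    # libpq: list both hosts, connect only to the one that accepts writes
    postgresql://db1.example.com:5432,db2.example.com:5432/mydb?target_session_attrs=read-write

    # JDBC equivalent
    jdbc:postgresql://db1.example.com:5432,db2.example.com:5432/mydb?targetServerType=primary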

Does my master server crash using Log-Shipping Synchronous Replication in Postgresql when the replica is down?

I'm searching for HA solutions without load balancing in the master-slave model, using PostgreSQL. My favorite solution so far is log-shipping synchronous replication. But I have one main concern: if my slave server becomes unavailable, will my master server continue its operation? Or will it wait for the acknowledgment of my slave server until it's up again?
If you have only one standby, the master will halt (by design).
The master will still serve read-only statements, but all writes will be blocked until the standby comes back.
You can avoid this scenario by providing multiple candidates in synchronous_standby_names.
See SYNCHRONOUS-REPLICATION in the PostgreSQL Docs.
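On PostgreSQL 10 or later that could look like this (standby names are placeholders):

    # postgresql.conf on the master: any one of the two standbys may
    # acknowledge a commit, so either can be down without blocking writes
    synchronous_standby_names = 'ANY 1 (standby1, standby2)'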
I found another way to prevent the master from halting when the slave crashes. We can use wal_sender_timeout in the master's postgresql.conf file to disconnect from the slave if it has crashed.
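For example (the value is just an illustration; the default is 60 seconds):

    # postgresql.conf on the master: consider a standby connection dead
    # if it has not responded for 5 seconds
    wal_sender_timeout = '5s'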

Replicate via pglogical on a hot_standby setup

I am running two databases (PostgreSQL 9.5.7) in a master/slave setup. My application connects to a pgpool instance which routes to the master database (and to the slave for read-only queries).
Now I am trying to scale out some data to another read-only database instance containing only a few tables.
This works perfectly using pglogical directly on the master database.
However, if the master is demoted to slave for some reason, pglogical can no longer replicate because the node is in standby.
I tried the following things:
Subscribing on the slave, since it's less likely to go down or be overloaded: can't replicate on a standby node.
Subscribing via the pgpool server: pgpool doesn't accept replication connections.
Subscribing to both servers: the pglogical config gets replicated along, so I can't give them different node names.
The only thing I can think of now is to write my own TCP proxy which regularly checks the state of the servers to find one I can subscribe to.
Is there any other/easier way I can solve this ?
Am I using the wrong tools perhaps ?
Ok so it seems that there are no solutions for this problem just yet.
Since the data in my logically replicated database is not changing fast, there is no harm if the replication stops for a moment.
Actions on failover could be:
Re-subscribe to the promoted master (a sketch follows below).
Or promote the standby node back to master after failover.
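The re-subscribe could look roughly like this (a sketch; the subscription name and DSN are placeholders):

    -- on the read-only subscriber, after failover:
    SELECT pglogical.drop_subscription('scale_out_sub');
    SELECT pglogical.create_subscription(
        subscription_name := 'scale_out_sub',
        provider_dsn := 'host=promoted-master.example.com port=5432 dbname=mydb'
    );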

Automatic failover with PostgreSQL 9.1

PostgreSQL 9.1 has master-slave synchronous replication. Suppose the master is machine A and the slave is machine B.
If the master fails, how does PostgreSQL know when to make the slave the master? What if the slave incorrectly thinks the master is down because of a temporary network glitch on the master's side, while the client program can still contact the master?
Moreover, how would my client program know that the slave is the new master and, more importantly, that it is ready to accept writes? Does the slave send a message to the client?
Check out repmgr; one of its jobs is to deal with this issue.
Typically you want to use a promotion-management system like repmgr or Patroni. Then you want to use some sort of high-availability proxy (pgbouncer or haproxy could work) to handle the actual abstraction, so your applications do not need to know which system is the master.
In answer to your question, most of these systems use a heartbeat to determine if there is a problem. Patroni does this through its etcd heartbeat; repmgr has its own heartbeat check. With repmgr you need to write hook scripts to take care of STONITH and so forth.
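With a current repmgr, the per-node configuration for automatic failover looks roughly like this (a sketch; paths and connection details are placeholders, and repmgr versions from the 9.1 era used a different format):

    # /etc/repmgr.conf
    node_id=1
    node_name='node1'
    conninfo='host=node1.example.com user=repmgr dbname=repmgr'
    data_directory='/var/lib/postgresql/data'
    failover='automatic'
    promote_command='repmgr standby promote -f /etc/repmgr.conf'
    follow_command='repmgr standby follow -f /etc/repmgr.conf'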