Postgres: How do I safely remove a replica?

Do I need to do anything on the primary if I permanently remove its only replica? I'm concerned about WAL files filling up the disk.
I want to remove the only replica from a single-node replication setup:
P -> R
I want to remove R.

Your concerns are absolutely correct. A replica creates a replication slot on the primary server, where restart_lsn is stored. According to the docs, restart_lsn is:
The address (LSN) of oldest WAL which still might be required by the consumer of this slot and thus won't be automatically removed during checkpoints.
If the replica does not advance the LSN in this replication slot, the primary will keep all WAL segments starting from this position, ignoring the max_wal_size limit.
If you want to remove the replica and let WAL rotation resume, you have to drop the replication slot as well:
postgres=# SELECT * FROM pg_replication_slots;
postgres=# SELECT pg_drop_replication_slot('replication_slot_name');
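Before dropping the slot, it can be useful to check how much WAL it is currently holding back. A minimal sketch, assuming PostgreSQL 10 or later (on 9.x the equivalent functions are pg_current_xlog_location and pg_xlog_location_diff):
postgres=# SELECT slot_name, active,
postgres-#        pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
postgres-#   FROM pg_replication_slots;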
There is a patch on the Postgres Commitfest which introduces a new GUC to limit the volume of WAL held by a replication slot. However, the patch has been around for a long time and is still not committed.

Related

Alternative to a PostgreSQL cluster restore by promoting the secondary offline replica

We have an old PostgreSQL cluster, version 10.6, running on RHEL 7.
We use repmgr version 5.0.0.
We have one primary and one secondary replica, and we use repmgr for managing manual switchover, promotion, etc.
We do not use repmgr daemon.
The primary uses replication slots for barman and for the secondary replica.
We have to do an update of the database content which will last several hours and we want to be able to restore the database back to the status before the update - in case the update fails.
One option is to restore a backup from barman. This would take a while.
We want to use another approach:
Before starting with the update on the primary, we stop the secondary replica and leave it down until the database content update is completed.
If the database content update fails, we stop the primary.
We promote the secondary replica by running: “repmgr standby promote”
This gives us a new primary in the same state the former primary was in before the database update started.
Then we delete the former primary and rebuild it as the secondary replica with “repmgr standby clone”. Roughly, the sequence we have in mind is sketched below.
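(A sketch only; the repmgr.conf path, host name, and service names below are placeholders, not our actual configuration.)
# before the content update: stop the secondary and leave it down
systemctl stop postgresql                          # on the secondary

# if the update fails: stop the primary, then promote the secondary
systemctl stop postgresql                          # on the primary
repmgr -f /etc/repmgr.conf standby promote         # on the secondary

# rebuild the former primary as the new secondary
repmgr -h new-primary-host -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --force
systemctl start postgresql
repmgr -f /etc/repmgr.conf standby register --force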
Any issue with this approach?
Should I add more steps (like dropping the replication slot before stopping the secondary replica)?
Many thanks Mari

Does the pg_archivecleanup command have any effect on replication slots?

As per the definition of a replication slot, it is a feature in PostgreSQL that ensures the master server retains the WAL files needed by the replicas, even when they are disconnected from the server.
Is there any effect if I run the pg_archivecleanup command every 15th day of the month to free up my storage? Does it have any effect on replication slots, since they track which WAL files are still required by the standby server?
I run pg_archivecleanup to remove WAL files up to the last checkpoint, but I am not sure whether it removes WAL files that are still required by another replica.
If it does not remove them, how does it actually keep track of that?
I am looking for an explanation from experts.
When you run pg_archivecleanup, it deletes all WAL segments that are older than the WAL segment you specify as the argument. It ignores replication slots, so you may end up removing WAL segments that are still needed by standby servers to catch up (if they do that using restore_command).
Note that this is normally not a problem, because pg_archivecleanup deletes WAL segments from the archive, while replication slots deal with WAL segments on the primary server (in the pg_wal directory), which are not affected by pg_archivecleanup. Since the standby consumes WAL directly from the primary (as specified in primary_conninfo), it does not have to rely on the WAL archive.
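For reference, a typical invocation looks like this (a sketch; the archive path and the oldest segment to keep are placeholders). You pass the archive directory and the oldest WAL file that must be kept, and everything older is removed; when run from a standby's archive_cleanup_command, %r is substituted automatically with the oldest file the standby still needs:
pg_archivecleanup /mnt/server/archive 000000010000000000000010
# or, on a standby (recovery.conf before v12, postgresql.conf from v12 onward):
archive_cleanup_command = 'pg_archivecleanup /mnt/server/archive %r'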

Postgres and multiple locations of data storage

Postgres's default storage location is on my C: drive. I would like to restore a backup to another database but access it via the same Postgres server instance. The issue is that the DB is too big to be restored on the same C: drive. Would it be possible to tell Postgres that the second database should be restored to another location/drive (while keeping the first one where it is)? Like database1 on my C: drive and database2 on my D: drive?
Otherwise, the second-best solution would be to install two separate Postgres instances - but that also seems a bit like overkill?
That should be entirely achievable, if you've used the pg_dump command.
The pg_dump command does not create the database, so you create it yourself first. Use CREATE TABLESPACE to specify the location.
CREATE TABLESPACE secondspace LOCATION 'D:\postgresdata';
CREATE DATABASE seconddb TABLESPACE secondspace;
This creates an empty database on the D: drive.
Then the standard restore from a pg_dump should work:
psql seconddb < dumpfile
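If the dump was instead taken in pg_dump's custom format (-Fc), the restore would go through pg_restore rather than psql (a sketch; the file name is a placeholder):
pg_restore -d seconddb dumpfile.dump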
Replication
Sounds like you need database replication.
There are several ways to do this with Postgres, one built-in, and other approaches using add-on libraries.
Built-in replication feature
The built-in replication feature is likely to suit your needs. See the manual. In this approach, you have an instance of Postgres running on your primary server, doing reads and writes of your data. On a second server, an entirely separate computer, you run another instance of Postgres known as the replica. You first set up the replica by taking a full backup of your database on the first server and restoring it to the second server.
Next you configure the replication feature. The replica needs to know it is playing the role of a replica rather than a regular database server. And the primary server needs to know the replica exists, so that every database change, every insert, modification, and deletion, can be communicated.
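For orientation, here is a minimal sketch of what that configuration can look like, assuming PostgreSQL 12 or later; the host name, user, and data directory below are placeholders:
# on the primary, postgresql.conf (plus a replication entry in pg_hba.conf for the replication role)
wal_level = replica
max_wal_senders = 5

# on the replica: seed the data directory from the primary and generate standby settings
pg_basebackup -h primary.example.com -U replicator -D /var/lib/postgresql/data -R -X stream
# -R creates standby.signal and writes primary_conninfo into postgresql.auto.conf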
WAL
This communication happens via WAL files.
With the Write-Ahead Log (WAL) feature in Postgres, the database writes all changes to the WAL first, and only after that is complete does it write to the actual database. In case of a crash, power outage, or other failure, the database upon restarting can detect a transaction left incomplete. If incomplete, the transaction is rolled back, and the database server can try again by consulting the "To-Do" list of work recorded in the WAL.
Every so often the current WAL is closed, with a new WAL file created to take over the work. With replication enabled, the closed WAL file is copied to the replica. The replica then incorporates that WAL file, to follow the same "To-Do" list of changes as written in that WAL file. So all changes are made to the replica database exactly as they were made to the primary database. Your replica is an exact match to the primary, except for a slight lag in time. The replica is always just one WAL file behind the progress of the primary.
In times of trouble, the replica serves as a warm stand-by. You can shut down the primary, then tell the replica that it is now the primary. You can even configure the replica to be a hot stand-by, meaning it will automatically take over when the primary seems to have failed. There are pros and cons to hot stand-by.
Offload read-only queries
As a bonus feature, the replica can be used for read-only queries. If your database is heavily used, you can offload some of the work burden from your primary to the replica. Any queries that do not require the absolute latest information can be shifted by connecting to the replica rather than the original. For example, a quarterly sales report likely does not need the latest data stored in the active WAL file that has not yet arrived on the replica.
Physical replication means all databases are copied
Caveat: This built-in replication feature is physical replication. This means that all changes to the entire Postgres installation (formally known as a cluster, not to be confused with a hardware cluster) are copied to the replica. If you use one Postgres server to serve multiple databases, all those databases must be replicated – you cannot pick and choose which get copied over. There may be alternative replication features in the future related to logical replication.
More to learn
I am being brief here. The topics of replication, high-availability, and disaster-recovery are broad and complex, too much for an Answer on Stack Overflow.
Tip: This kind of Question might have been better asked on the sister site, DBA.StackExchange.com.

Postgres Logical Replication disaster recovery

We are looking to use Postgres Logical Replication to move changes from an upstream server ("source" server) to a downstream server ("sink" server).
We run into issues when we simulate a disaster recovery scenario. In order to simulate this, we delete the source database while the replication is still active. We then bring up a new source database and try to: a) move data from the sink into the source, and b) set up replication. At this stage we get one of two errors, depending on when we set up the replication (before or after moving the data).
The errors we get after testing the above are one of the below:
Replication slot already in use, difficulty in re-enabling slot without deletion
LOG: logical replication apply worker for subscription "test_sub" has started
ERROR: could not start WAL streaming: ERROR: replication slot "test_sub" does not exist
LOG: worker process: logical replication worker for subscription 16467 (PID 205) exited with exit code 1
Tried amending using:
ALTER SUBSCRIPTION "test_sub" disable;
ALTER SUBSCRIPTION "test_sub" SET (slot_name = NONE);
DROP SUBSCRIPTION "test_sub";
Cannot create subscription due to PK conflicts
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (id)=(701) already exists.
CONTEXT: COPY test, line 1
Some possible resolutions:
Have the Logical Replication set up after a given WAL record number. This might avoid the PK issues we are facing (see the sketch after this list)
Find a way to recreate the replication slot on the source database
Backup the Postgres server, including the replication slot, and re-import
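For the PK-conflict case in particular, one idea we are considering is to recreate the subscription without the initial table copy, reusing a slot created manually on the new source (a sketch only; the publication name, slot name, and connection string are placeholders, not our real ones):
-- on the new source (publisher)
SELECT pg_create_logical_replication_slot('test_sub', 'pgoutput');

-- on the sink (subscriber)
CREATE SUBSCRIPTION test_sub
  CONNECTION 'host=source-host dbname=app user=repl'
  PUBLICATION test_pub
  WITH (create_slot = false, slot_name = 'test_sub', copy_data = false);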
Is this a use case that Postgres Logical Replication caters for well? This is a typical disaster recovery scenario, so we would like to know how best to implement it. Thanks!

Postgres Streaming Replication Error: requested WAL segment has already been removed

I have setup streaming replication between a primary and secondary server. I have enabled archiving. In the Postgres log file I am seeing the below error.
< 2017-12-05 03:08:45.374 UTC > WARNING: archive_mode enabled, yet archive_command is not set
< 2017-12-05 03:08:46.668 UTC > ERROR: requested WAL segment 0000000100000000000000E3 has already been removed
< 2017-12-05 03:08:51.675 UTC > ERROR: requested WAL segment 0000000100000000000000E3 has already been removed
< 2017-12-05 03:08:56.682 UTC > ERROR: requested WAL segment 0000000100000000000000E3 has already been removed
Do we need to enable archive_mode = on for streaming replication? How can I avoid above error?
max_wal_senders = 3
wal_keep_segments = 32
https://www.postgresql.org/docs/current/static/warm-standby.html
If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. If you set up a WAL archive that's accessible from the standby, these solutions are not required, since the standby can always use the archive to catch up provided it retains enough segments.
emphasis mine.
So either increase wal_keep_segments to a value large enough for your volume of WAL changes, or configure archive_command and set up some storage to keep WAL removed from the master available to the standby, or configure a replication slot for the standby...
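A replication slot, for instance, would be set up roughly like this (a sketch; the slot name is a placeholder, and on PostgreSQL 11 and earlier the standby setting goes in recovery.conf rather than postgresql.conf):
-- on the primary
SELECT pg_create_physical_replication_slot('standby1_slot');
# on the standby
primary_slot_name = 'standby1_slot'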
In my case I had to reinitialize the replica in maintenance mode using the commands below, and that fixed the issue. The error was due to lag between the leader and the replica.
patronictl list
patronictl pause
patronictl reinit patroni    # choose the replica pod when prompted
patronictl resume