PostgreSQL 12 streaming replication: restored replica but still not working

The primary and slave servers started showing a "requested WAL segment has already been removed" error after I changed the PostgreSQL configs.
So I decided to rebuild the replica from a base backup of the primary, using the following steps:
Shut down the replica server.
Removed the PostgreSQL data directory on the replica (/var/lib/postgresql/12/main).
Ran the base backup (sudo -u postgres pg_basebackup -h [PRIMARY_IP] -D /var/lib/postgresql/12/main -U replication -P -v -R), which completed successfully.
Started the replica again.
But the error mentioned above still appears.
Configs on primary and slave:
wal_level = 'replica'
archive_mode = on
archive_command = 'cd .'
max_wal_senders = 48
wal_keep_segments = 50
hot_standby = on
When I ran the pg_basebackup command, I got these log lines:
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 188/92000148 on timeline 1
pg_basebackup: starting background WAL receiver
I'm curious why the start point begins with 188 rather than 0.
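On the 188: the start point is an LSN (log sequence number), a byte position in the WAL counted from when the cluster was initialized, so it does not restart at 0 for a new base backup. The part before the slash is the high 32 bits in hexadecimal. As a rough sketch, assuming the default 16 MB WAL segment size, this is how an LSN maps to a WAL segment file name (wal_segment_name is a hypothetical helper written for this example, not a PostgreSQL tool):

```shell
# Hypothetical helper: map a timeline and an LSN (e.g. 188/92000148) to the
# name of the WAL segment file that contains it.
# Assumption: default wal_segment_size of 16 MB.
wal_segment_name() (
    timeline=$1
    lsn=$2
    hi=${lsn%/*}                     # high 32 bits, hex (before the slash)
    lo=${lsn#*/}                     # low 32 bits, hex (after the slash)
    seg_size=$((16 * 1024 * 1024))
    # File name = timeline, high word, segment number within the high word.
    printf '%08X%08X%08X\n' "$timeline" "$((0x$hi))" "$((0x$lo / seg_size))"
)

wal_segment_name 1 188/92000148
```

For the start point in the log above this prints 000000010000018800000092, the segment that contains the backup start point.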

You need a replication slot to keep the primary from removing WAL that is still needed by the standby.
Create a physical replication slot with pg_create_physical_replication_slot on the primary (a logical slot is for logical decoding, not for a streaming standby).
Use the replication slot with the -S option of pg_basebackup.
Make sure primary_slot_name is set in the standby configuration. When -S is given, pg_basebackup's -R option will do that automatically.
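The steps above could look like the following sketch. It only prints the two commands so they can be reviewed before being run by hand. The slot name standby1, the postgres superuser for psql, and the helper name standby_rebuild_commands are assumptions for illustration; the pg_basebackup options other than -X stream and -S are taken from the question.

```shell
# Hypothetical helper: print (do not run) the commands for rebuilding a
# standby that streams through a physical replication slot.
# Arguments: primary host, slot name, optional standby data directory.
standby_rebuild_commands() (
    primary=$1
    slot=$2
    datadir=${3:-/var/lib/postgresql/12/main}
    # 1. On the primary: create the physical slot (superuser name is assumed).
    printf '%s\n' "psql -h $primary -U postgres -c \"SELECT pg_create_physical_replication_slot('$slot');\""
    # 2. On the standby, with an empty data directory: take the base backup
    #    through the slot; -R writes primary_conninfo and primary_slot_name.
    printf '%s\n' "sudo -u postgres pg_basebackup -h $primary -D $datadir -U replication -P -v -R -X stream -S $slot"
)

standby_rebuild_commands PRIMARY_IP standby1
```

pg_basebackup can also create the slot itself if -C is added alongside -S, in which case the first command is not needed.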

Related

Disable postgres replication but keep slave server up as own individual server

We currently have a postgres server on AWS pulling updates from an internal server. We want to stop the replication, but keep the data on both the internal and the AWS server - the internal server is no longer being updated.
Is it just a case of removing the
host replication replicator x.x.x.x/32 md5
line from the pg_hba.conf file on both master and slave and restarting Postgres,
and then running
pg_ctl -D /target_directory promote
on the old slave server to promote the read-only slave to a read/write master?
You should drop the (physical) replication slot on the master database:
SELECT pg_drop_replication_slot('your_slot_name');
Then promote the replica server:
/usr/lib/postgresql/12/bin/pg_ctl promote -D /pg_data/12/main/
Finally, edit pg_hba.conf, remove the replication line, and run SELECT pg_reload_conf(); on the master.

How to configure pgpool with repmgr to make node information sync?

I am trying to configure Pgpool-II with PostgreSQL and repmgr, but after configuring it I found it is not working as expected.
When I switch a standby over to primary using repmgr, the switchover works as expected, but the primary connection info in pgpool does not change.
My question is: how can I sync repmgr state with pgpool for events such as switchover and failover?
When I run pcp_watchdog_info, it shows server1 as leader and server2 and server3 as standby.
But in repmgr, after the switchover, the new primary is server2, and server3 and server4 are standbys.
Why does pgpool still show server1 as leader? As I understand the architecture, the watchdog only monitors pgpool status.
Is there any relation between the repmgr primary and the pgpool LEADER?
Need expert opinion, thanks in advance.
If you are letting repmgr handle automatic failover, you need to add another step to the follow_command parameter in repmgr.conf, as follows:
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr/15/repmgr.conf --log-to-file --upstream-node-id=%n && /usr/sbin/pcp_attach_node -w -h 172.16.0.100 -U pgpool -p 9898 -n %n'
Where 172.16.0.100 is the VIP of Pgpool-II cluster.
Why?
Because when the primary node is unavailable and a standby is promoted to be the new primary, repmgr has to resync the other standby nodes to the new primary. It shuts down the PostgreSQL service on each of them before doing so, which detaches that standby node from the Pgpool-II cluster.
So you need to re-attach the node to the Pgpool-II cluster in follow_command.

Postgresql 12 Replication Fail

I'm trying to replicate a database server that is still running and actively accepting requests from users, such as inserts and updates.
I ran the command below to start copying my primary server to the replica server:
root@replica:~# sudo -u postgres pg_basebackup -h [PRIMARY_IP] -D /var/lib/postgresql/12/main -U replication -P -v
The backup finished without errors, but I get the following errors when trying to start the PostgreSQL server:
root@replica:~# tail /var/log/postgresql/postgresql-12-main.log
2020-10-03 01:15:12.198 UTC [552567] FATAL: database system identifier differs between the primary and standby
2020-10-03 01:15:12.198 UTC [552567] DETAIL: The primary's identifier is 6805716485467355646, the standby's identifier is 6875279138564418280.
2020-10-03 01:15:17.204 UTC [552568] FATAL: database system identifier differs between the primary and standby
2020-10-03 01:15:17.204 UTC [552568] DETAIL: The primary's identifier is 6805716485467355646, the standby's identifier is 6875279138564418280.
2020-10-03 01:15:22.207 UTC [552570] FATAL: database system identifier differs between the primary and standby
2020-10-03 01:15:22.207 UTC [552570] DETAIL: The primary's identifier is 6805716485467355646, the standby's identifier is 6875279138564418280.
2020-10-03 01:15:27.212 UTC [552579] FATAL: database system identifier differs between the primary and standby
2020-10-03 01:15:27.212 UTC [552579] DETAIL: The primary's identifier is 6805716485467355646, the standby's identifier is 6875279138564418280.
2020-10-03 01:15:32.216 UTC [552581] FATAL: database system identifier differs between the primary and standby
2020-10-03 01:15:32.216 UTC [552581] DETAIL: The primary's identifier is 6805716485467355646, the standby's identifier is 6875279138564418280.
Does anyone have an idea how to fix this without running pg_basebackup again? It cost me a lot of time and bandwidth.
Yes, I had never set up replication before. I followed @jjanes' suggestion: I deleted my previous copy and then ran:
sudo -u postgres pg_basebackup -h [PRIMARY_IP] -D /var/lib/postgresql/12/main -U replication -P -v -R -X stream -C -S pgbackup1
And it's working.
This answer is in the context of Postgres 14.
I faced the same error while bringing up a standby in streaming replication mode. In my case, these were the steps to reproduce the error:
1. Brought up an empty standby server.
2. All required configs were already in place on the standby to enable streaming replication.
3. Took a backup with the pg_basebackup utility:
pg_basebackup -D /backup -F t -P -v -U replicator -X stream -w --no-password
4. The backup was created successfully. But note that the -h option was left out of the command above, so the backup was taken from the standby itself instead of from the primary.
5. Stopped the standby and cleared its data directory.
6. Extracted /backup/base.tar into the empty standby data directory.
7. Created an empty standby.signal in the standby data directory.
8. Restarted the standby.
9. The standby failed to start with "database system identifier differs between the primary and standby".
10. Executed step 3 again with an additional -h option specifying the primary host.
11. Repeated steps 4 to 8.
12. The standby came up without error.
Also, it's important that the standby data directory is fully replaced from the backup. The same error may reappear if the standby data directory is only partially replaced.
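One way to check for this situation before starting the standby is to compare the system identifiers directly. The sketch below pulls the identifier out of pg_controldata output; the helper names are hypothetical, and the sample input is just the "Database system identifier" line of that output with the primary's identifier from the log above.

```shell
# Hypothetical helper: read pg_controldata output on stdin and print only
# the database system identifier.
system_identifier() {
    sed -n 's/^Database system identifier:[[:space:]]*//p'
}

# Hypothetical helper: succeed only if both identifiers are non-empty and equal.
same_cluster() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Real use would be something like (data directory path is an assumption):
#   pg_controldata /var/lib/postgresql/12/main | system_identifier
# Here a sample line stands in for the pg_controldata output.
printf 'Database system identifier:           6805716485467355646\n' | system_identifier
```

If the identifier of the restored data directory differs from the primary's, the backup was not taken from that primary and the standby will refuse to stream from it.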

How to bring back the replication system in postgresql if the master ip address has been changed in ubuntu?

A PostgreSQL replication setup has two servers, a master and a slave. For some reason the master's IP address changed, and that address was used in several places on the slave server. After replacing the old address with the new one on the slave, replication no longer works as before. Can someone help resolve this issue?
These are the steps that were used to set up the slave server:
1. Add the master IP address to the pg_hba.conf file for the replication user:
nano /etc/postgresql/11/main/pg_hba.conf
host replication master-IP/24 md5
2. Modify the following lines in the postgresql.conf file of the slave server, where listen_addresses should be the IP of the slave server:
nano /etc/postgresql/11/main/postgresql.conf
listen_addresses = 'localhost,slave-IP'
wal_level = replica
max_wal_senders = 10
wal_keep_segments = 64
3. Take a backup of the master server, entering its IP:
pg_basebackup -h master-ip -D /var/lib/postgresql/11/main/ -P -U replication --wal-method=fetch
4. Create a recovery file and add the following settings:
standby_mode = 'on'
primary_conninfo = 'host=master-ip port=5432 user=replication password= '
trigger_file = '/tmp/MasterNow'
Below is the error from the log file:
started streaming WAL from primary at A/B3000000 on timeline 2
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000020000000A000000B3 has already been removed
FATAL: could not connect to the primary server: could not connect to server: Connection timed out
Is the server running on host "master ip" and accepting
TCP/IP connections on port 5432?
record with incorrect prev-link 33018C00/0 at 0/D8E15D18
The standby server was down long enough that the primary server no longer has the required transaction log (WAL) information.
There are three remedies:
1. Set the restore_command parameter in the standby server's recovery configuration to restore WAL segments from the archive (it should be the inverse of archive_command on your primary server). Then restart the standby.
This is the only option that allows you to recover without rebuilding the standby server from scratch.
2. Set wal_keep_segments on the primary server high enough that it retains enough WAL to cover the outage.
This won't help you recover now, but it will avoid the problem in the future.
3. Define a physical replication slot on the primary and put its name in the primary_slot_name parameter in the standby server's recovery configuration.
This won't help you recover now, but it will avoid the problem in the future.
Note: When using replication slots, monitor the replication. Otherwise a standby that is down will lead to WAL segments piling up on the primary, eventually filling up the disk.
All options but the first require that you rebuild your standby with pg_basebackup, because the required WAL information is no longer available.
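For the wal_keep_segments remedy, a rough way to get a feel for the number is to measure how many WAL segments lie between two LSNs, for example the primary's pg_current_wal_lsn() and the standby's pg_last_wal_replay_lsn(). The sketch below does that arithmetic in the shell, assuming the default 16 MB segment size; the helper names and the second LSN are made up for illustration, and the result is only a lower bound for what an outage of similar length would need.

```shell
# Hypothetical helper: convert an LSN such as A/B3000000 to a byte position.
lsn_to_bytes() (
    lsn=$1
    printf '%s\n' "$(( (0x${lsn%/*} << 32) + 0x${lsn#*/} ))"
)

# Hypothetical helper: number of WAL segments between a newer and an older
# LSN, rounded up. Assumption: default 16 MB wal_segment_size.
segments_behind() (
    seg_size=$((16 * 1024 * 1024))
    diff=$(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
    echo $(( (diff + seg_size - 1) / seg_size ))
)

# Primary at A/B3000000, standby (made-up value) at A/B0000000.
segments_behind A/B3000000 A/B0000000
```

On the server side the same byte difference is available from pg_wal_lsn_diff, which is usually the more convenient choice when a database connection is at hand.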
host replication master-IP/24 md5
This line is missing a field: the USER field.
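A simplified check for this kind of mistake is to count the fields on each host line, since TYPE, DATABASE, USER, ADDRESS and METHOD make at least five. The sketch below assumes the address is written in CIDR form (one column) and uses a hypothetical helper name; it is not a full pg_hba.conf parser.

```shell
# Hypothetical helper: read pg_hba.conf-style text on stdin and print, with
# line numbers, every host/hostssl/hostnossl line that has fewer than five
# fields. Comment lines and local lines are ignored.
hba_short_lines() {
    awk '$1 ~ /^host/ && NF < 5 { print NR ": " $0 }'
}

# The line from the question, followed by a complete line for comparison
# (the user name "replication" is taken from the pg_basebackup command).
printf '%s\n' \
    'host replication master-IP/24 md5' \
    'host replication replication master-IP/24 md5' | hba_short_lines
```

Only the first line is reported, because it has four fields.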
listen_addresses = 'localhost,slave-IP'
It is rarely necessary for this to be anything other than '*'. If you don't try to micromanage it, that is one less thing you will need to change. Also, changing wal_keep_segments on the replica doesn't do much unless you are using cascading replication. It needs to be changed on the master.
pg_basebackup -h master-ip -D /var/lib/postgresql/11/main/ -P -U replication --wal-method=fetch
Did this indicate that it succeeded?
FATAL: could not receive data from WAL stream: ERROR: requested WAL
segment 000000020000000A000000B3 has already been removed
FATAL: could not connect to the primary server: could not connect to
server: Connection timed out
Is the server running on host "master ip" and accepting
TCP/IP connections on port 5432?
This is strange. In order to be informed that the file "has already been removed", it necessarily had to have connected. But the next line says it can't connect. It is not unusual to have a misconfiguration that prevents you from connecting, but in that case it wouldn't have been able to connect the first time. Did you change configuration between these two log messages? Is your network connection flaky?

the slave postgresql is not working

I have set up master (192.168.1.10) and slave (192.168.1.11) PostgreSQL servers. I get this error when logging in to the slave:
postgres@sonia-System-Product-Name:~$ psql
psql: FATAL: the database system is starting up
I transferred the data using:
psql -c "select pg_start_backup('initial_backup');"
rsync -cva --inplace --exclude=*pg_xlog* /data/dbs 192.168.1.11:/var/lib/postgresql/9.3/main/
psql -c "select pg_stop_backup();"
The postgresql.conf is:
listen_addresses = 'localhost,192.168.1.10'
wal_level = 'hot_standby'
archive_mode = on
archive_command = 'cd .'
max_wal_senders = 3
hot_standby = on
I do not know why I cannot log in to the slave PostgreSQL. There are no problems with the SSH connection. I have tried restarting the service, but it did not help. Can someone please help me out?
The problem is solved; I'm sharing the solution. Thanks.
master server = 192.168.1.10
slave server = 192.168.1.20
Set up master and slave PostgreSQL
Configuring the master server
ssh-keygen
ssh-copy-id the slave-ip-address
vi /etc/postgresql/9.3/main/postgresql.conf
listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 3
Create the replication role:
CREATE USER repuser WITH REPLICATION PASSWORD 'password';
Add the slave's IP address on the master so that the slave can access the master:
vi /etc/postgresql/9.3/main/pg_hba.conf
#rep
host replication repuser 192.168.1.20/24 md5
hostssl replication repuser 192.168.1.20/24 md5
Add the master's IP address on the slave server:
vi /etc/postgresql/9.3/main/pg_hba.conf
host replication repuser 192.168.1.10/24 md5
Stop the PostgreSQL service on the master server (it is very important to do this step; otherwise you will get a log error on the slave server when you run psql):
/etc/init.d/postgresql stop
Configuring the slave server
vi /etc/postgresql/9.3/main/postgresql.conf
listen_addresses = '*'
hot_standby = on
Stop the service and clean up the data on the slave server:
#/etc/init.d/postgresql stop
#cd /var/lib/postgresql/9.3/main/
#rm -rf *
Create the recovery file:
vi /var/lib/postgresql/9.3/main/recovery.conf
primary_conninfo = 'host=192.168.1.10 port=5432 user=repuser password=password'
standby_mode = on
Copy the data from the master to the slave database:
rsync -av /var/lib/postgresql/9.3/main/* 192.168.1.20:/var/lib/postgresql/9.3/main/
Start the service on both slave and master:
slave:~# /etc/init.d/postgresql start
master:~# /etc/init.d/postgresql start
Verify the state of replication by running the following command on the master server:
psql -x -c "select * from pg_stat_replication;"