PostgreSQL 9.4.1 Switchover & Switchback without recover_target_timeline=latest - postgresql

I have tested different scenarios to do switchover and switchback in postgreSQL 9.4.1 Version.
Scenario 1:- PostgreSQL Switchover and Switchback in 9.4.1
Scenario 2:- Is it mandatory parameter recover_target_timeline='latest' in switchover and switchback in PostgreSQL 9.4.1?
Scenario 3:- On this page
To test scenario 3 I have followed below steps to perform.
1) Stop the application connected to primary server.
2) Confirm all application was stopped and all thread was disconnected from primary DB.
#192.x.x.129(Primary)
3) Clean shutdown primary using
pg_ctl -D$PGDATA stop --mf
#DR(192.x.x.128) side check sync status:
postgres=# select pg_last_xlog_receive_location(),pg_last_xlog_replay_location();
-[ RECORD 1 ]-----------------+-----------
pg_last_xlog_receive_location | 4/57000090
pg_last_xlog_replay_location | 4/57000090
4)Stop DR server.DR(192.x.x.128)
pg_ctl -D $PGDATA stop -mf
pg_log:
2019-12-02 13:16:09 IST LOG: received fast shutdown request
2019-12-02 13:16:09 IST LOG: aborting any active transactions
2019-12-02 13:16:09 IST LOG: shutting down
2019-12-02 13:16:09 IST LOG: database system is shut down
#192.x.x.128(DR)
5) Make following changes on DR server.
mv recovery.conf recovery.conf_bkp
6)make changes in 192.x.x.129(Primary):
[postgres#localhost data]$ cat recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=replication password=postgres host=192.x.x.128 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
restore_command = 'cp %p /home/postgres/restore/%f'
trigger_file='/tmp/promote'
7)Start DR as read write mode:
pg_ctl -D $DATA start
pg_log:
2019-12-02 13:20:21 IST LOG: database system was shut down in recovery at 2019-12-02 13:16:09 IST
2019-12-02 13:20:22 IST LOG: database system was not properly shut down; automatic recovery in progress
2019-12-02 13:20:22 IST LOG: consistent recovery state reached at 4/57000090
2019-12-02 13:20:22 IST LOG: invalid record length at 4/57000090
2019-12-02 13:20:22 IST LOG: redo is not required
2019-12-02 13:20:22 IST LOG: database system is ready to accept connections
2019-12-02 13:20:22 IST LOG: autovacuum launcher started
(END)
We can see in above log OLD primary is now DR of Primary(Which was OLD DR) and not showing any error because timeline id same on new primary which is already exit in new DR.
8)Start Primary as read only mode:-
pg_ctl -D$PGDATA start
logs:
2019-12-02 13:24:50 IST LOG: database system was shut down at 2019-12-02 11:14:50 IST
2019-12-02 13:24:51 IST LOG: entering standby mode
cp: cannot stat ‘pg_xlog/RECOVERYHISTORY’: No such file or directory
cp: cannot stat ‘pg_xlog/RECOVERYXLOG’: No such file or directory
2019-12-02 13:24:51 IST LOG: consistent recovery state reached at 4/57000090
2019-12-02 13:24:51 IST LOG: record with zero length at 4/57000090
2019-12-02 13:24:51 IST LOG: database system is ready to accept read only connections
2019-12-02 13:24:51 IST LOG: started streaming WAL from primary at 4/57000000 on timeline 9
2019-12-02 13:24:51 IST LOG: redo starts at 4/57000090
(END)
Question 1:- In This scenario i have perform only switch-over to show you. using this method we can do switch-over and switchback. but using below method Switch-over-switchback is work, then why PostgreSQL Community invented recovery_target_timeline=latest and apply patches see blog: https://www.enterprisedb.com/blog/switchover-switchback-in-postgresql-9-3 from PostgrSQL 9.3...to latest version.
Question 2:- What mean to say in above log cp: cannot stat ‘pg_xlog/RECOVERYHISTORY’: No such file or directory ?
Question 3:- I want to make sure from scenarios 1 and scenario 3 which method/Scenarios is correct way to do switchover and switchback? because scenario 2 is getting error because we must use recover_target_timeline=latest which all community experts know.

Answers:
If you shut down the standby cleanly, then remove recovery.conf and restart it, it will come up, but has to perform crash recovery (database system was not properly shut down).
The proper way to promote a standby to a primary is by using the trigger file or running pg_ctl promote (or, from v12 on, by running the SQL function pg_promote). Then you have no down time and don't need to perform crash recovery.
Promoting the standby will make it pick a new time line, so you need recovery_target_timeline = 'latest' if you want the new standby to follow that time line switch.
That is caused by your restore_command.
The method shown in 1. above is the correct one.

Related

PostgreSQL - Recovery process triggering after startup

I m trying to run PostgreSQL 9.5 on Ubuntu 16.04. Our PostreSQL setup looks as following:
PostgreSQL Primary (9.5 on Ubuntu 16.04, Docker container) --rsync--> Walstore (backup server, Docker Container) --rsync--> PostgreSQL Standby (v9.5 on Ubuntu 16.04, Docker container).
In case of Primary crashes we are going to copy the Standby PostgreSQL data (which is in sync with Primary PostgreSQL data, excluding only recovery.conf file) to Primary PostgreSQL data directory and startup Primary PostgreSQL server again.
After the Primary PostgreSQL server is up and running again, a recovery process is going to be triggered as follow (logs):
postgresql-docker-primary-1 | * Starting PostgreSQL 9.5 database server
postgresql-docker-primary-1 | ...done.
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-1] LOG: database system was shut down in recovery at 2022-11-28 10:11:14 UTC
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-2] LOG: database system was not properly shut down; automatic recovery in progress
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-3] LOG: redo starts at 0/1700DB8
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-4] LOG: invalid record length at 0/3000060
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-5] LOG: redo done at 0/3000028
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-6] LOG: last completed transaction was at log time 2022-11-28 10:04:37.388433+00
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-7] LOG: MultiXact member wraparound protections are now enabled
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [26-1] LOG: database system is ready to accept connections
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [31-1] LOG: autovacuum launcher started
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [34-1] [unknown]#[unknown] LOG: incomplete startup packet
Since the Standby PostgreSQL data is in sync with PostgreSQL Primary data, why the Primary startup triggers the automatic recovery? My expectation is that the Primary PostgreSQL database server is not going to perform an automatic recovery. How does PostgreSQL know that a recovery process has to be triggered or not?
In our production case the automatic recovery would take around 3 hours in our case, so that the PostgreSQL server would be unavailable for around 3 hours and we would like to avoid this behaviour.
Is there any documentation where I can find the recovery process triggering of PostgreSQL database server?
Thank you very much for your feedback.
ctn

PostgreSQL connection issue after service restart

I have edited my pg_hba file and copied it to server and restarted the services by "sudo service postgresql restart" but after that the server is not connecting.
Showing the below error, Your database returned: "Connection to 138.2xx.1xx.xx:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections."
The Jenkins job and data visualization tools are failing which was working fine previously. What could be the reason.
Getting this in PostgreSQL Log
2019-10-23 07:21:25.829 CEST [11761] LOG: received fast shutdown request
2019-10-23 07:21:25.829 CEST [11761] LOG: aborting any active transactions
2019-10-23 07:21:25.829 CEST [11766] LOG: autovacuum launcher shutting down
2019-10-23 07:21:25.832 CEST [11763] LOG: shutting down
2019-10-23 07:21:25.919 CEST [11761] LOG: database system is shut down
2019-10-23 07:21:27.068 CEST [22633] LOG: database system was shut down at 2019-10-23 07:21:25 CEST
2019-10-23 07:21:27.073 CEST [22633] LOG: MultiXact member wraparound protections are now enabled
2019-10-23 07:21:27.075 CEST [22631] LOG: database system is ready to accept connections
2019-10-23 07:21:27.075 CEST [22637] LOG: autovacuum launcher started
2019-10-23 07:21:27.390 CEST [22639] [unknown]#[unknown] LOG: incomplete startup packet
Below shows no response.
root#Ubuntu-1604-xenial-64-minimal ~ # pg_isready -h localhost -p 5432
localhost:5432 - no response
Below was already added to the postgresql.config file.
listen_addresses = '*'
Do i need to restart the entire server?
Can anyone please help me to resolve this.

Slow postgresql startup in docker container

We built a debian docker image with postgresql to run one of our service. The database is for internal container use and does not need port mapping. I believe it is installed via apt-get in the Dockerbuild file.
We stop and start this service often, and it is a performance issue that the database is slow to startup. Although empty, takes sightly over 20s to accept connection on the first time we start the docker image. The log is as follow :
2019-04-05 13:05:30.924 UTC [19] LOG: could not bind IPv6 socket: Cannot assign requested address
2019-04-05 13:05:30.924 UTC [19] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2019-04-05 13:05:30.982 UTC [20] LOG: database system was shut down at 2019-04-05 12:57:16 UTC
2019-04-05 13:05:30.992 UTC [20] LOG: MultiXact member wraparound protections are now enabled
2019-04-05 13:05:30.998 UTC [19] LOG: database system is ready to accept connections
2019-04-05 13:05:30.998 UTC [24] LOG: autovacuum launcher started
2019-04-05 13:05:31.394 UTC [26] [unknown]#[unknown] LOG: incomplete startup packet
2019-04-19 13:21:58.974 UTC [37] LOG: could not bind IPv6 socket: Cannot assign requested address
2019-04-19 13:21:58.974 UTC [37] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2019-04-19 13:21:59.025 UTC [38] LOG: database system was interrupted; last known up at 2019-04-05 13:05:34 UTC
2019-04-19 13:21:59.455 UTC [39] [unknown]#[unknown] LOG: incomplete startup packet
2019-04-19 13:21:59.971 UTC [42] postgres#postgres FATAL: the database system is starting up
[...]
2019-04-19 13:22:15.221 UTC [85] root#postgres FATAL: the database system is starting up
2019-04-19 13:22:15.629 UTC [38] LOG: database system was not properly shut down; automatic recovery in progress
2019-04-19 13:22:15.642 UTC [38] LOG: redo starts at 0/14EEBA8
2019-04-19 13:22:15.822 UTC [38] LOG: invalid record length at 0/24462D0: wanted 24, got 0
2019-04-19 13:22:15.822 UTC [38] LOG: redo done at 0/24462A8
2019-04-19 13:22:15.822 UTC [38] LOG: last completed transaction was at log time 2019-04-05 13:05:36.602318+00
2019-04-19 13:22:16.084 UTC [38] LOG: MultiXact member wraparound protections are now enabled
2019-04-19 13:22:16.094 UTC [37] LOG: database system is ready to accept connections
2019-04-19 13:22:16.094 UTC [89] LOG: autovacuum launcher started
2019-04-19 13:22:21.528 UTC [92] root#test LOG: could not receive data from client: Connection reset by peer
2019-04-19 13:22:21.528 UTC [92] root#test LOG: unexpected EOF on client connection with an open transaction
Any suggetion in fixing this startup issue ?
EDIT : Some requested the dockerfile, here is relevant lines
RUN apt-get update \
&& apt-get install -y --force-yes \
postgresql-9.6-pgrouting \
postgresql-9.6-postgis-2.3 \
postgresql-9.6-postgis-2.3-scripts \
[...]
# Download, compile and install GRASS 7.2
[...]
USER postgres
# Create a database 'grass_backend' owned by the "root" role.
RUN /etc/init.d/postgresql start \
&& psql --command "CREATE USER root WITH SUPERUSER [...];" \
&& psql --command "CREATE EXTENSION postgis; CREATE EXTENSION plpython3u;" --dbname [dbname] \
&& psql --command "CREATE EXTENSION postgis_sfcgal;" --dbname [dbname] \
&& psql --command "CREATE EXTENSION postgis; CREATE EXTENSION plpython3u;" --dbname grass_backend
WORKDIR [...]
End of file after workdir, meaning I guess the database isn't properly shut down
Answer I stopped properly postgresql inside the docker install. It now starts 15s faster. Thanks for replying
Considering the line database system was not properly shut down; automatic recovery in progress that would definitely explain slow startup, please don't kill the service, send the stop command and wait for it to close properly.
Please note that the system might kill the process if it takes to long to stop, this will happen in the case of postgresql if there are connections still held to it (probably from your application). If you disconnect all the connections and than stop, postgresql should be able to stop relatively quickly.
Also make sure you stop the postgresql service inside the container before turning it off.
TCP will linger connections for a while, if you are starting and stopping in quick succession without properly stopping the service inside that would explain your error of why the port is unavailable, normally the service can start/stop in very quick succession on my machine if nothing is connected to it.
3 start-stop cycles of postgresql on my machine (I have 2 decently sized databases)
$ time bash -c 'for i in 1 2 3; do /etc/init.d/postgresql-11 restart; done'
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
real 0m1.188s
user 0m0.260s
sys 0m0.080s

PostgreSQL PITR not working properly

I am trying to restore a PostgreSQL database to a point in time.
When I am using only restore_command in recovery.conf then its working fine.
restore_command = 'cp /var/lib/pgsql/pg_log_archive/%f %p'
When I am using the recovery_target_time parameter, it is not restoring to the target time.
restore_command = 'cp /var/lib/pgsql/pg_log_archive/%f %p'
recovery_target_time='2018-06-05 06:43:00.0'
Below is the log file content:
2018-06-05 07:31:39.166 UTC [22512] LOG: database system was interrupted; last known up at 2018-06-05 06:35:52 UTC
2018-06-05 07:31:39.664 UTC [22512] LOG: starting point-in-time recovery to 2018-06-05 06:43:00+00
2018-06-05 07:31:39.671 UTC [22512] LOG: restored log file "00000005.history" from archive
2018-06-05 07:31:39.769 UTC [22512] LOG: restored log file "00000005000000020000008F" from archive
2018-06-05 07:31:39.816 UTC [22512] LOG: redo starts at 2/8F000028
2018-06-05 07:31:39.817 UTC [22512] LOG: consistent recovery state reached at 2/8F000130
2018-06-05 07:31:39.818 UTC [22510] LOG: database system is ready to accept read only connections
2018-06-05 07:31:39.912 UTC [22512] LOG: restored log file "000000050000000200000090" from archive
2018-06-05 07:31:39.996 UTC [22512] LOG: recovery stopping before abort of transaction 9525, time 2018-06-05 06:45:02.088502+00
2018-06-05 07:31:39.996 UTC [22512] LOG: recovery has paused
I am trying to restore the database instance to 06:43:00. Why is it recovering up to 06:45:02?
EDIT
In first scenario recovery.conf converted into recovery.done but this didn't happen in second scenario
What could be the reason of this?
You forgot to set
recovery_target_action = 'promote'
After point-in-time-recovery, recovery_target_action determines how PostgreSQL will proceed.
The default value is pause which means that PostgreSQL will do nothing and wait for you to tell it how to proceed.
To complete recovery, connect to the database and run
SELECT pg_wal_replay_resume();
It seems that there has been no database activity logged between 06:43:00 and 06:45:02. Observe that the log says recovery stopping before abort of transaction 9525.

What is the purpose of `pg_logical` directory inside PostgreSQL data?

I've just stumbled upon this error while testing failover of a PostgreSQL 9.4 cluster I've set up. Here I'm trying to promote a slave to be the new master:
$ repmgr -f /etc/repmgr/repmgr.conf --verbose standby promote
2014-09-22 10:46:37 UTC LOG: database system shutdown was interrupted; last known up at 2014-09-22 10:44:02 UTC
2014-09-22 10:46:37 UTC LOG: database system was not properly shut down; automatic recovery in progress
2014-09-22 10:46:37 UTC LOG: redo starts at 0/18000028
2014-09-22 10:46:37 UTC LOG: consistent recovery state reached at 0/19000600
2014-09-22 10:46:37 UTC LOG: record with zero length at 0/1A000090
2014-09-22 10:46:37 UTC LOG: redo done at 0/1A000028
2014-09-22 10:46:37 UTC LOG: last completed transaction was at log time 2014-09-22 10:36:22.679806+00
2014-09-22 10:46:37 UTC FATAL: could not open directory "pg_logical/snapshots": No such file or directory
2014-09-22 10:46:37 UTC LOG: startup process (PID 2595) exited with exit code 1
2014-09-22 10:46:37 UTC LOG: aborting startup due to startup process failure
pg_logical/snapshots dir in fact exists on master node and it is empty.
UPD: I've just manually created empty directories pg_logical/snapshots and pg_logical/mappings and server has started without complaining. repmgr standby clone seems to omit this dirs while syncing. But the question still remains because I'm just curious what this directory is for, maybe I'm missing something in my setup. Simply Googling it did not yield any meaningful results.
It's for the new logical changeset extraction / logical replication feature in 9.4.
This shouldn't happen, though... it suggests a significant bug somewhere, probably repmgr. I'll wait for details (repmgr version, etc).
Update: Confirmed, it's a repmgr bug. It's fixed in git master already (and was before this report) and will be in the next release. Which had better be soon, given the significance of this issue.