PostgreSQL - Recovery process triggering after startup - postgresql

I m trying to run PostgreSQL 9.5 on Ubuntu 16.04. Our PostreSQL setup looks as following:
PostgreSQL Primary (9.5 on Ubuntu 16.04, Docker container) --rsync--> Walstore (backup server, Docker Container) --rsync--> PostgreSQL Standby (v9.5 on Ubuntu 16.04, Docker container).
In case of Primary crashes we are going to copy the Standby PostgreSQL data (which is in sync with Primary PostgreSQL data, excluding only recovery.conf file) to Primary PostgreSQL data directory and startup Primary PostgreSQL server again.
After the Primary PostgreSQL server is up and running again, a recovery process is going to be triggered as follow (logs):
postgresql-docker-primary-1 | * Starting PostgreSQL 9.5 database server
postgresql-docker-primary-1 | ...done.
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-1] LOG: database system was shut down in recovery at 2022-11-28 10:11:14 UTC
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-2] LOG: database system was not properly shut down; automatic recovery in progress
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-3] LOG: redo starts at 0/1700DB8
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-4] LOG: invalid record length at 0/3000060
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-5] LOG: redo done at 0/3000028
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-6] LOG: last completed transaction was at log time 2022-11-28 10:04:37.388433+00
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [27-7] LOG: MultiXact member wraparound protections are now enabled
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [26-1] LOG: database system is ready to accept connections
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [31-1] LOG: autovacuum launcher started
postgresql-docker-primary-1 | 2022-11-28 10:14:59 UTC [34-1] [unknown]#[unknown] LOG: incomplete startup packet
Since the Standby PostgreSQL data is in sync with PostgreSQL Primary data, why the Primary startup triggers the automatic recovery? My expectation is that the Primary PostgreSQL database server is not going to perform an automatic recovery. How does PostgreSQL know that a recovery process has to be triggered or not?
In our production case the automatic recovery would take around 3 hours in our case, so that the PostgreSQL server would be unavailable for around 3 hours and we would like to avoid this behaviour.
Is there any documentation where I can find the recovery process triggering of PostgreSQL database server?
Thank you very much for your feedback.
ctn

Related

How can I run the official docker postgres image as a non-root user?

I'm trying to use the official postgres docker files (with the aim of extending them) but if I run them as a non-root local user, they simply refuse to start.
i.e. If I follow the basic instructions from https://hub.docker.com/_/postgres and run:
docker run -e POSTGRES_PASSWORD=mysecretpassword postgres
then I get:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
waiting for server to start....2023-02-18 09:55:58.427 UTC [48] LOG: starting PostgreSQL 15.2 (Debian 15.2-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-02-18 09:55:58.452 UTC [48] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-02-18 09:55:58.525 UTC [51] LOG: database system was shut down at 2023-02-18 09:55:49 UTC
2023-02-18 09:55:58.550 UTC [48] LOG: database system is ready to accept connections
done
server started
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
2023-02-18 09:55:58.603 UTC [48] LOG: received fast shutdown request
waiting for server to shut down....2023-02-18 09:55:58.625 UTC [48] LOG: aborting any active transactions
2023-02-18 09:55:58.626 UTC [48] LOG: background worker "logical replication launcher" (PID 54) exited with exit code 1
2023-02-18 09:55:58.626 UTC [49] LOG: shutting down
2023-02-18 09:55:58.650 UTC [49] LOG: checkpoint starting: shutdown immediate
2023-02-18 09:55:58.835 UTC [49] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.047 s, sync=0.023 s, total=0.209 s; sync files=2, longest=0.012 s, average=0.012 s; distance=0 kB, estimate=0 kB
2023-02-18 09:55:58.840 UTC [48] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2023-02-18 09:55:58.967 UTC [1] LOG: starting PostgreSQL 15.2 (Debian 15.2-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-02-18 09:55:58.968 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-02-18 09:55:58.969 UTC [1] LOG: could not create IPv6 socket for address "::": Address family not supported by protocol
2023-02-18 09:55:59.017 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-02-18 09:55:59.063 UTC [62] LOG: database system was shut down at 2023-02-18 09:55:58 UTC
2023-02-18 09:55:59.091 UTC [1] LOG: database system is ready to accept connections
and everything is happy.
However, if I following the instructions under "Arbitrary --user Notes"
and run:
docker run -it --rm --user "$(id -u):$(id -g)" -v /etc/passwd:/etc/passwd:ro -e POSTGRES_PASSWORD=mysecretpassword postgres
(or without the it or the rm or with just the user and not the group - makes no difference)
then I get:
chmod: changing permissions of '/var/lib/postgresql/data': Operation not permitted
chmod: changing permissions of '/var/run/postgresql': Operation not permitted
The files belonging to this database system will be owned by user "richard".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... initdb: error: could not change permissions of directory "/var/lib/postgresql/data": Operation not permitted
What am I missing / doing wrong?
Note that Switching Between Root and Non-Root Users in Docker is very out of date and that the answer given as a comment under How to create a postgres container with a non-root user? is simply what I'm trying to do here.
Running unprivileged, the official postgres image needs to run from a fixed UID - matching their image, you can't just re-use your own uid and /etc/passwd.
Try this:
docker run -it --rm --user "999:999" -e POSTGRES_PASSWORD=mysecretpassword postgres

postgresql running in docker compose doesn't create role

I am trying to run postgreSQL via docker-compose and I am getting the issue that user/password is not created when I started the service.
version: "3"
services:
db:
image: postgres:latest
container_name: postgres
#volumes:
#- postgres-data:/var/lib/postgresql/data
ports:
- 5432:5432
environment:
- POSTGRES_PASSWORD=postgrespassword
- POSTGRES_USER=postgres
- POSTGRES_DB=random_db_name
restart: always
I have this block of code in my docker-compose.yml and I run the following command:
docker-compose up -d (this allow me to start the service in background)
and when I check the logs I got:
docker logs -f 0e1731f95396
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
pg_ctl -D /var/lib/postgresql/data -l logfile start
waiting for server to start....2021-04-27 16:20:44.592 UTC [49] LOG: starting PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-04-27 16:20:44.594 UTC [49] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-04-27 16:20:44.603 UTC [50] LOG: database system was shut down at 2021-04-27 16:20:44 UTC
2021-04-27 16:20:44.609 UTC [49] LOG: database system is ready to accept connections
done
server started
CREATE DATABASE
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
waiting for server to shut down...2021-04-27 16:20:44.889 UTC [49] LOG: received fast shutdown request
.2021-04-27 16:20:44.891 UTC [49] LOG: aborting any active transactions
2021-04-27 16:20:44.892 UTC [49] LOG: background worker "logical replication launcher" (PID 56) exited with exit code 1
2021-04-27 16:20:44.892 UTC [51] LOG: shutting down
2021-04-27 16:20:44.907 UTC [49] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2021-04-27 16:20:45.018 UTC [1] LOG: starting PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-04-27 16:20:45.019 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-04-27 16:20:45.019 UTC [1] LOG: listening on IPv6 address "::", port 5432
2021-04-27 16:20:45.023 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-04-27 16:20:45.029 UTC [77] LOG: database system was shut down at 2021-04-27 16:20:44 UTC
2021-04-27 16:20:45.034 UTC [1] LOG: database system is ready to accept connections
But when I try to connect to this database locally I get the message: "FATAL: role "postgres" does not exist"
Do you have any input about how to solve this problem?
I already made a few attempts after reading a few comments from different places but I got always the same problem.
I was expecting to run locally postgreSQL and setup already a user/password and a Database with that name in the docker-compose

PostgreSQL 9.4.1 Switchover & Switchback without recover_target_timeline=latest

I have tested different scenarios to do switchover and switchback in postgreSQL 9.4.1 Version.
Scenario 1:- PostgreSQL Switchover and Switchback in 9.4.1
Scenario 2:- Is it mandatory parameter recover_target_timeline='latest' in switchover and switchback in PostgreSQL 9.4.1?
Scenario 3:- On this page
To test scenario 3 I have followed below steps to perform.
1) Stop the application connected to primary server.
2) Confirm all application was stopped and all thread was disconnected from primary DB.
#192.x.x.129(Primary)
3) Clean shutdown primary using
pg_ctl -D$PGDATA stop --mf
#DR(192.x.x.128) side check sync status:
postgres=# select pg_last_xlog_receive_location(),pg_last_xlog_replay_location();
-[ RECORD 1 ]-----------------+-----------
pg_last_xlog_receive_location | 4/57000090
pg_last_xlog_replay_location | 4/57000090
4)Stop DR server.DR(192.x.x.128)
pg_ctl -D $PGDATA stop -mf
pg_log:
2019-12-02 13:16:09 IST LOG: received fast shutdown request
2019-12-02 13:16:09 IST LOG: aborting any active transactions
2019-12-02 13:16:09 IST LOG: shutting down
2019-12-02 13:16:09 IST LOG: database system is shut down
#192.x.x.128(DR)
5) Make following changes on DR server.
mv recovery.conf recovery.conf_bkp
6)make changes in 192.x.x.129(Primary):
[postgres#localhost data]$ cat recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=replication password=postgres host=192.x.x.128 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
restore_command = 'cp %p /home/postgres/restore/%f'
trigger_file='/tmp/promote'
7)Start DR as read write mode:
pg_ctl -D $DATA start
pg_log:
2019-12-02 13:20:21 IST LOG: database system was shut down in recovery at 2019-12-02 13:16:09 IST
2019-12-02 13:20:22 IST LOG: database system was not properly shut down; automatic recovery in progress
2019-12-02 13:20:22 IST LOG: consistent recovery state reached at 4/57000090
2019-12-02 13:20:22 IST LOG: invalid record length at 4/57000090
2019-12-02 13:20:22 IST LOG: redo is not required
2019-12-02 13:20:22 IST LOG: database system is ready to accept connections
2019-12-02 13:20:22 IST LOG: autovacuum launcher started
(END)
We can see in above log OLD primary is now DR of Primary(Which was OLD DR) and not showing any error because timeline id same on new primary which is already exit in new DR.
8)Start Primary as read only mode:-
pg_ctl -D$PGDATA start
logs:
2019-12-02 13:24:50 IST LOG: database system was shut down at 2019-12-02 11:14:50 IST
2019-12-02 13:24:51 IST LOG: entering standby mode
cp: cannot stat ‘pg_xlog/RECOVERYHISTORY’: No such file or directory
cp: cannot stat ‘pg_xlog/RECOVERYXLOG’: No such file or directory
2019-12-02 13:24:51 IST LOG: consistent recovery state reached at 4/57000090
2019-12-02 13:24:51 IST LOG: record with zero length at 4/57000090
2019-12-02 13:24:51 IST LOG: database system is ready to accept read only connections
2019-12-02 13:24:51 IST LOG: started streaming WAL from primary at 4/57000000 on timeline 9
2019-12-02 13:24:51 IST LOG: redo starts at 4/57000090
(END)
Question 1:- In This scenario i have perform only switch-over to show you. using this method we can do switch-over and switchback. but using below method Switch-over-switchback is work, then why PostgreSQL Community invented recovery_target_timeline=latest and apply patches see blog: https://www.enterprisedb.com/blog/switchover-switchback-in-postgresql-9-3 from PostgrSQL 9.3...to latest version.
Question 2:- What mean to say in above log cp: cannot stat ‘pg_xlog/RECOVERYHISTORY’: No such file or directory ?
Question 3:- I want to make sure from scenarios 1 and scenario 3 which method/Scenarios is correct way to do switchover and switchback? because scenario 2 is getting error because we must use recover_target_timeline=latest which all community experts know.
Answers:
If you shut down the standby cleanly, then remove recovery.conf and restart it, it will come up, but has to perform crash recovery (database system was not properly shut down).
The proper way to promote a standby to a primary is by using the trigger file or running pg_ctl promote (or, from v12 on, by running the SQL function pg_promote). Then you have no down time and don't need to perform crash recovery.
Promoting the standby will make it pick a new time line, so you need recovery_target_timeline = 'latest' if you want the new standby to follow that time line switch.
That is caused by your restore_command.
The method shown in 1. above is the correct one.

Slow postgresql startup in docker container

We built a debian docker image with postgresql to run one of our service. The database is for internal container use and does not need port mapping. I believe it is installed via apt-get in the Dockerbuild file.
We stop and start this service often, and it is a performance issue that the database is slow to startup. Although empty, takes sightly over 20s to accept connection on the first time we start the docker image. The log is as follow :
2019-04-05 13:05:30.924 UTC [19] LOG: could not bind IPv6 socket: Cannot assign requested address
2019-04-05 13:05:30.924 UTC [19] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2019-04-05 13:05:30.982 UTC [20] LOG: database system was shut down at 2019-04-05 12:57:16 UTC
2019-04-05 13:05:30.992 UTC [20] LOG: MultiXact member wraparound protections are now enabled
2019-04-05 13:05:30.998 UTC [19] LOG: database system is ready to accept connections
2019-04-05 13:05:30.998 UTC [24] LOG: autovacuum launcher started
2019-04-05 13:05:31.394 UTC [26] [unknown]#[unknown] LOG: incomplete startup packet
2019-04-19 13:21:58.974 UTC [37] LOG: could not bind IPv6 socket: Cannot assign requested address
2019-04-19 13:21:58.974 UTC [37] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2019-04-19 13:21:59.025 UTC [38] LOG: database system was interrupted; last known up at 2019-04-05 13:05:34 UTC
2019-04-19 13:21:59.455 UTC [39] [unknown]#[unknown] LOG: incomplete startup packet
2019-04-19 13:21:59.971 UTC [42] postgres#postgres FATAL: the database system is starting up
[...]
2019-04-19 13:22:15.221 UTC [85] root#postgres FATAL: the database system is starting up
2019-04-19 13:22:15.629 UTC [38] LOG: database system was not properly shut down; automatic recovery in progress
2019-04-19 13:22:15.642 UTC [38] LOG: redo starts at 0/14EEBA8
2019-04-19 13:22:15.822 UTC [38] LOG: invalid record length at 0/24462D0: wanted 24, got 0
2019-04-19 13:22:15.822 UTC [38] LOG: redo done at 0/24462A8
2019-04-19 13:22:15.822 UTC [38] LOG: last completed transaction was at log time 2019-04-05 13:05:36.602318+00
2019-04-19 13:22:16.084 UTC [38] LOG: MultiXact member wraparound protections are now enabled
2019-04-19 13:22:16.094 UTC [37] LOG: database system is ready to accept connections
2019-04-19 13:22:16.094 UTC [89] LOG: autovacuum launcher started
2019-04-19 13:22:21.528 UTC [92] root#test LOG: could not receive data from client: Connection reset by peer
2019-04-19 13:22:21.528 UTC [92] root#test LOG: unexpected EOF on client connection with an open transaction
Any suggetion in fixing this startup issue ?
EDIT : Some requested the dockerfile, here is relevant lines
RUN apt-get update \
&& apt-get install -y --force-yes \
postgresql-9.6-pgrouting \
postgresql-9.6-postgis-2.3 \
postgresql-9.6-postgis-2.3-scripts \
[...]
# Download, compile and install GRASS 7.2
[...]
USER postgres
# Create a database 'grass_backend' owned by the "root" role.
RUN /etc/init.d/postgresql start \
&& psql --command "CREATE USER root WITH SUPERUSER [...];" \
&& psql --command "CREATE EXTENSION postgis; CREATE EXTENSION plpython3u;" --dbname [dbname] \
&& psql --command "CREATE EXTENSION postgis_sfcgal;" --dbname [dbname] \
&& psql --command "CREATE EXTENSION postgis; CREATE EXTENSION plpython3u;" --dbname grass_backend
WORKDIR [...]
End of file after workdir, meaning I guess the database isn't properly shut down
Answer I stopped properly postgresql inside the docker install. It now starts 15s faster. Thanks for replying
Considering the line database system was not properly shut down; automatic recovery in progress that would definitely explain slow startup, please don't kill the service, send the stop command and wait for it to close properly.
Please note that the system might kill the process if it takes to long to stop, this will happen in the case of postgresql if there are connections still held to it (probably from your application). If you disconnect all the connections and than stop, postgresql should be able to stop relatively quickly.
Also make sure you stop the postgresql service inside the container before turning it off.
TCP will linger connections for a while, if you are starting and stopping in quick succession without properly stopping the service inside that would explain your error of why the port is unavailable, normally the service can start/stop in very quick succession on my machine if nothing is connected to it.
3 start-stop cycles of postgresql on my machine (I have 2 decently sized databases)
$ time bash -c 'for i in 1 2 3; do /etc/init.d/postgresql-11 restart; done'
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
real 0m1.188s
user 0m0.260s
sys 0m0.080s

PostgreSQL PITR not working properly

I am trying to restore a PostgreSQL database to a point in time.
When I am using only restore_command in recovery.conf then its working fine.
restore_command = 'cp /var/lib/pgsql/pg_log_archive/%f %p'
When I am using the recovery_target_time parameter, it is not restoring to the target time.
restore_command = 'cp /var/lib/pgsql/pg_log_archive/%f %p'
recovery_target_time='2018-06-05 06:43:00.0'
Below is the log file content:
2018-06-05 07:31:39.166 UTC [22512] LOG: database system was interrupted; last known up at 2018-06-05 06:35:52 UTC
2018-06-05 07:31:39.664 UTC [22512] LOG: starting point-in-time recovery to 2018-06-05 06:43:00+00
2018-06-05 07:31:39.671 UTC [22512] LOG: restored log file "00000005.history" from archive
2018-06-05 07:31:39.769 UTC [22512] LOG: restored log file "00000005000000020000008F" from archive
2018-06-05 07:31:39.816 UTC [22512] LOG: redo starts at 2/8F000028
2018-06-05 07:31:39.817 UTC [22512] LOG: consistent recovery state reached at 2/8F000130
2018-06-05 07:31:39.818 UTC [22510] LOG: database system is ready to accept read only connections
2018-06-05 07:31:39.912 UTC [22512] LOG: restored log file "000000050000000200000090" from archive
2018-06-05 07:31:39.996 UTC [22512] LOG: recovery stopping before abort of transaction 9525, time 2018-06-05 06:45:02.088502+00
2018-06-05 07:31:39.996 UTC [22512] LOG: recovery has paused
I am trying to restore the database instance to 06:43:00. Why is it recovering up to 06:45:02?
EDIT
In first scenario recovery.conf converted into recovery.done but this didn't happen in second scenario
What could be the reason of this?
You forgot to set
recovery_target_action = 'promote'
After point-in-time-recovery, recovery_target_action determines how PostgreSQL will proceed.
The default value is pause which means that PostgreSQL will do nothing and wait for you to tell it how to proceed.
To complete recovery, connect to the database and run
SELECT pg_wal_replay_resume();
It seems that there has been no database activity logged between 06:43:00 and 06:45:02. Observe that the log says recovery stopping before abort of transaction 9525.