Invalid resource manager ID in primary checkpoint record - postgresql

I've update my Airbyte image from 0.35.2-alpha to 0.35.37-alpha.
[running in kubernetes]
When the system rolled out the db pod wouldn't terminate and I [a terrible mistake] deleted the pod.
When it came back up, I get an error -
PostgreSQL Database directory appears to contain a database; Skipping initialization
2022-02-24 20:19:44.065 UTC [1] LOG: starting PostgreSQL 13.6 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
2022-02-24 20:19:44.065 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-02-24 20:19:44.065 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-02-24 20:19:44.071 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-02-24 20:19:44.079 UTC [21] LOG: database system was shut down at 2022-02-24 20:12:55 UTC
2022-02-24 20:19:44.079 UTC [21] LOG: invalid resource manager ID in primary checkpoint record
2022-02-24 20:19:44.079 UTC [21] PANIC: could not locate a valid checkpoint record
2022-02-24 20:19:44.530 UTC [1] LOG: startup process (PID 21) was terminated by signal 6: Aborted
2022-02-24 20:19:44.530 UTC [1] LOG: aborting startup due to startup process failure
2022-02-24 20:19:44.566 UTC [1] LOG: database system is shut down
Pretty sure the WAL file is corrupted, but I'm not sure how to fix this.

Warning - there is a potential for data loss
This is a test system, so I wasn't concerned with keeping the latest transactions, and had no backup.
First I overrode the container command to keep the container running but not try to start postgres.
...
spec:
containers:
- name: airbyte-db-container
image: airbyte/db
command: ["sh"]
args: ["-c", "while true; do echo $(date -u) >> /tmp/run.log; sleep 5; done"]
...
And spawned a shell on the pod -
kubectl exec -it -n airbyte airbyte-db-xxxx -- sh
Run pg_reset_wal
# dry-run first
pg_resetwal --dry-run /var/lib/postgresql/data/pgdata
Success!
pg_resetwal /var/lib/postgresql/data/pgdata
Write-ahead log reset
Then removed the temp command in the container, and postgres started up correctly!

Related

How can I run the official docker postgres image as a non-root user?

I'm trying to use the official postgres docker files (with the aim of extending them) but if I run them as a non-root local user, they simply refuse to start.
i.e. If I follow the basic instructions from https://hub.docker.com/_/postgres and run:
docker run -e POSTGRES_PASSWORD=mysecretpassword postgres
then I get:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
waiting for server to start....2023-02-18 09:55:58.427 UTC [48] LOG: starting PostgreSQL 15.2 (Debian 15.2-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-02-18 09:55:58.452 UTC [48] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-02-18 09:55:58.525 UTC [51] LOG: database system was shut down at 2023-02-18 09:55:49 UTC
2023-02-18 09:55:58.550 UTC [48] LOG: database system is ready to accept connections
done
server started
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
2023-02-18 09:55:58.603 UTC [48] LOG: received fast shutdown request
waiting for server to shut down....2023-02-18 09:55:58.625 UTC [48] LOG: aborting any active transactions
2023-02-18 09:55:58.626 UTC [48] LOG: background worker "logical replication launcher" (PID 54) exited with exit code 1
2023-02-18 09:55:58.626 UTC [49] LOG: shutting down
2023-02-18 09:55:58.650 UTC [49] LOG: checkpoint starting: shutdown immediate
2023-02-18 09:55:58.835 UTC [49] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.047 s, sync=0.023 s, total=0.209 s; sync files=2, longest=0.012 s, average=0.012 s; distance=0 kB, estimate=0 kB
2023-02-18 09:55:58.840 UTC [48] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2023-02-18 09:55:58.967 UTC [1] LOG: starting PostgreSQL 15.2 (Debian 15.2-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-02-18 09:55:58.968 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-02-18 09:55:58.969 UTC [1] LOG: could not create IPv6 socket for address "::": Address family not supported by protocol
2023-02-18 09:55:59.017 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-02-18 09:55:59.063 UTC [62] LOG: database system was shut down at 2023-02-18 09:55:58 UTC
2023-02-18 09:55:59.091 UTC [1] LOG: database system is ready to accept connections
and everything is happy.
However, if I following the instructions under "Arbitrary --user Notes"
and run:
docker run -it --rm --user "$(id -u):$(id -g)" -v /etc/passwd:/etc/passwd:ro -e POSTGRES_PASSWORD=mysecretpassword postgres
(or without the it or the rm or with just the user and not the group - makes no difference)
then I get:
chmod: changing permissions of '/var/lib/postgresql/data': Operation not permitted
chmod: changing permissions of '/var/run/postgresql': Operation not permitted
The files belonging to this database system will be owned by user "richard".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... initdb: error: could not change permissions of directory "/var/lib/postgresql/data": Operation not permitted
What am I missing / doing wrong?
Note that Switching Between Root and Non-Root Users in Docker is very out of date and that the answer given as a comment under How to create a postgres container with a non-root user? is simply what I'm trying to do here.
Running unprivileged, the official postgres image needs to run from a fixed UID - matching their image, you can't just re-use your own uid and /etc/passwd.
Try this:
docker run -it --rm --user "999:999" -e POSTGRES_PASSWORD=mysecretpassword postgres

Postgresql - PANIC: could not open file "pg_replslot/slot_name/state": No such file or directory

Is there any way we could stop replication without logging into psql shell.
Disk-full situation lead to some corruption in PG files and keep on restarting.
2023-02-06 08:17:54 UTC [1] LOG: starting PostgreSQL 13.7 (Ubuntu 13.7-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
2023-02-06 08:17:54 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-02-06 08:17:54 UTC [1] LOG: listening on IPv6 address "::", port 5432
2023-02-06 08:17:54 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-02-06 08:17:54 UTC [8] LOG: database system was shut down at 2023-02-06 08:17:45 UTC
2023-02-06 08:17:54 UTC [8] PANIC: could not open file "pg_replslot/slot_name/state": No such file or directory
2023-02-06 08:17:55 UTC [1] LOG: startup process (PID 8) was terminated by signal 6: Aborted
2023-02-06 08:17:55 UTC [1] LOG: aborting startup due to startup process failure
2023-02-06 08:17:55 UTC [1] LOG: database system is shut down
Tried removing pg_replslot/slot_name which lead to "password auth failure" and After resetting DB password(via pg_hba.conf) DB is not showing up !
Is there any proper way to recover in this state? /pg/main files and pgdata directories seem to be available except this slot information.
Done below steps:
I'm using PSQL docker container.
disk used for PG got full. Cleaned up some log files and docker system prune was used to remove unused images which freed some space. But lead to this issue.
Multiple times, we have seen similar issue in Dev environments, Disk full leading to some corrupted files (unable to read/ No such file or directory) kind of errors.
Tried removing pg_replslot/slot_name directory and it allowed me to start PSQL container.(previously is was keep on restarting container)
Reset password by using trust in auth column in pg_hbda.conf.
Now \l in psql shell showing only postgres DB and default DB's. Not showing our custom DB.
We have main DB in a separate tablespace and is not showing up in the list.
_ MOST importantly, Standby is also having SAME errors ! Probably someone messed it?

postgresql running in docker compose doesn't create role

I am trying to run postgreSQL via docker-compose and I am getting the issue that user/password is not created when I started the service.
version: "3"
services:
db:
image: postgres:latest
container_name: postgres
#volumes:
#- postgres-data:/var/lib/postgresql/data
ports:
- 5432:5432
environment:
- POSTGRES_PASSWORD=postgrespassword
- POSTGRES_USER=postgres
- POSTGRES_DB=random_db_name
restart: always
I have this block of code in my docker-compose.yml and I run the following command:
docker-compose up -d (this allow me to start the service in background)
and when I check the logs I got:
docker logs -f 0e1731f95396
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
pg_ctl -D /var/lib/postgresql/data -l logfile start
waiting for server to start....2021-04-27 16:20:44.592 UTC [49] LOG: starting PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-04-27 16:20:44.594 UTC [49] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-04-27 16:20:44.603 UTC [50] LOG: database system was shut down at 2021-04-27 16:20:44 UTC
2021-04-27 16:20:44.609 UTC [49] LOG: database system is ready to accept connections
done
server started
CREATE DATABASE
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
waiting for server to shut down...2021-04-27 16:20:44.889 UTC [49] LOG: received fast shutdown request
.2021-04-27 16:20:44.891 UTC [49] LOG: aborting any active transactions
2021-04-27 16:20:44.892 UTC [49] LOG: background worker "logical replication launcher" (PID 56) exited with exit code 1
2021-04-27 16:20:44.892 UTC [51] LOG: shutting down
2021-04-27 16:20:44.907 UTC [49] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2021-04-27 16:20:45.018 UTC [1] LOG: starting PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-04-27 16:20:45.019 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-04-27 16:20:45.019 UTC [1] LOG: listening on IPv6 address "::", port 5432
2021-04-27 16:20:45.023 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-04-27 16:20:45.029 UTC [77] LOG: database system was shut down at 2021-04-27 16:20:44 UTC
2021-04-27 16:20:45.034 UTC [1] LOG: database system is ready to accept connections
But when I try to connect to this database locally I get the message: "FATAL: role "postgres" does not exist"
Do you have any input about how to solve this problem?
I already made a few attempts after reading a few comments from different places but I got always the same problem.
I was expecting to run locally postgreSQL and setup already a user/password and a Database with that name in the docker-compose

Cannot connect to port 8080 adminer (Postgres+Adminer docker stack)

Problem
Following the postgres docker official page:
https://hub.docker.com/_/postgres
I've created "stack.yml"
and it's contain:
# Use postgres/example user/password credentials
version: '3.1'
services:
db:
image: postgres
restart: always
environment:
POSTGRES_PASSWORD: example
adminer:
image: adminer
restart: always
ports:
- 8080:8080
And then running command:
$ docker stack deploy -c stack.yml postgres
But after it is finished, I cannot open http://localhost:8080
it keep "Waiting for ..."
I followed everything from the docs, and keep retrying but keep failing, any help would be very appreciated?
update:
using docker-compose is working, but I'm still curious why running with docker stack ... won't work
Additional details
Software version:
Pop!_OS 20.04 LTS
Docker version 19.03.12, build 48a66213fe
Here is $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f0fa7f4ce6ef postgres:latest "docker-entrypoint.s…" 23 minutes ago Up 23 minutes 5432/tcp postgres_db.1.pq28sm95br3hhr92gvxpsrgwd
4a8b54019f7d adminer:latest "entrypoint.sh docke…" 23 minutes ago Up 23 minutes 8080/tcp postgres_adminer.1.kya7f232pjc4975ubj9ywa13x
Here is $ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
j7mxwf0rpi7g postgres_adminer replicated 1/1 adminer:latest *:8080->8080/tcp
we8izke0tb34 postgres_db replicated 1/1 postgres:latest
Here is docker logs for postgres:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
waiting for server to start....2020-09-08 06:21:18.981 UTC [46] LOG: starting PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2020-09-08 06:21:18.982 UTC [46] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-09-08 06:21:18.994 UTC [47] LOG: database system was shut down at 2020-09-08 06:21:18 UTC
2020-09-08 06:21:18.997 UTC [46] LOG: database system is ready to accept connections
done
server started
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
2020-09-08 06:21:19.074 UTC [46] LOG: received fast shutdown request
waiting for server to shut down....2020-09-08 06:21:19.075 UTC [46] LOG: aborting any active transactions
2020-09-08 06:21:19.076 UTC [46] LOG: background worker "logical replication launcher" (PID 53) exited with exit code 1
2020-09-08 06:21:19.077 UTC [48] LOG: shutting down
2020-09-08 06:21:19.090 UTC [46] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2020-09-08 06:21:19.186 UTC [1] LOG: starting PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2020-09-08 06:21:19.186 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-09-08 06:21:19.186 UTC [1] LOG: listening on IPv6 address "::", port 5432
2020-09-08 06:21:19.188 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-09-08 06:21:19.216 UTC [55] LOG: database system was shut down at 2020-09-08 06:21:19 UTC
2020-09-08 06:21:19.219 UTC [1] LOG: database system is ready to accept connections
Here is docker logs for adminer:
[Tue Sep 8 06:21:01 2020] PHP 7.4.10 Development Server (http://[::]:8080) started
Use an IP address instead of hostname

Slow postgresql startup in docker container

We built a debian docker image with postgresql to run one of our service. The database is for internal container use and does not need port mapping. I believe it is installed via apt-get in the Dockerbuild file.
We stop and start this service often, and it is a performance issue that the database is slow to startup. Although empty, takes sightly over 20s to accept connection on the first time we start the docker image. The log is as follow :
2019-04-05 13:05:30.924 UTC [19] LOG: could not bind IPv6 socket: Cannot assign requested address
2019-04-05 13:05:30.924 UTC [19] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2019-04-05 13:05:30.982 UTC [20] LOG: database system was shut down at 2019-04-05 12:57:16 UTC
2019-04-05 13:05:30.992 UTC [20] LOG: MultiXact member wraparound protections are now enabled
2019-04-05 13:05:30.998 UTC [19] LOG: database system is ready to accept connections
2019-04-05 13:05:30.998 UTC [24] LOG: autovacuum launcher started
2019-04-05 13:05:31.394 UTC [26] [unknown]#[unknown] LOG: incomplete startup packet
2019-04-19 13:21:58.974 UTC [37] LOG: could not bind IPv6 socket: Cannot assign requested address
2019-04-19 13:21:58.974 UTC [37] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2019-04-19 13:21:59.025 UTC [38] LOG: database system was interrupted; last known up at 2019-04-05 13:05:34 UTC
2019-04-19 13:21:59.455 UTC [39] [unknown]#[unknown] LOG: incomplete startup packet
2019-04-19 13:21:59.971 UTC [42] postgres#postgres FATAL: the database system is starting up
[...]
2019-04-19 13:22:15.221 UTC [85] root#postgres FATAL: the database system is starting up
2019-04-19 13:22:15.629 UTC [38] LOG: database system was not properly shut down; automatic recovery in progress
2019-04-19 13:22:15.642 UTC [38] LOG: redo starts at 0/14EEBA8
2019-04-19 13:22:15.822 UTC [38] LOG: invalid record length at 0/24462D0: wanted 24, got 0
2019-04-19 13:22:15.822 UTC [38] LOG: redo done at 0/24462A8
2019-04-19 13:22:15.822 UTC [38] LOG: last completed transaction was at log time 2019-04-05 13:05:36.602318+00
2019-04-19 13:22:16.084 UTC [38] LOG: MultiXact member wraparound protections are now enabled
2019-04-19 13:22:16.094 UTC [37] LOG: database system is ready to accept connections
2019-04-19 13:22:16.094 UTC [89] LOG: autovacuum launcher started
2019-04-19 13:22:21.528 UTC [92] root#test LOG: could not receive data from client: Connection reset by peer
2019-04-19 13:22:21.528 UTC [92] root#test LOG: unexpected EOF on client connection with an open transaction
Any suggetion in fixing this startup issue ?
EDIT : Some requested the dockerfile, here is relevant lines
RUN apt-get update \
&& apt-get install -y --force-yes \
postgresql-9.6-pgrouting \
postgresql-9.6-postgis-2.3 \
postgresql-9.6-postgis-2.3-scripts \
[...]
# Download, compile and install GRASS 7.2
[...]
USER postgres
# Create a database 'grass_backend' owned by the "root" role.
RUN /etc/init.d/postgresql start \
&& psql --command "CREATE USER root WITH SUPERUSER [...];" \
&& psql --command "CREATE EXTENSION postgis; CREATE EXTENSION plpython3u;" --dbname [dbname] \
&& psql --command "CREATE EXTENSION postgis_sfcgal;" --dbname [dbname] \
&& psql --command "CREATE EXTENSION postgis; CREATE EXTENSION plpython3u;" --dbname grass_backend
WORKDIR [...]
End of file after workdir, meaning I guess the database isn't properly shut down
Answer I stopped properly postgresql inside the docker install. It now starts 15s faster. Thanks for replying
Considering the line database system was not properly shut down; automatic recovery in progress that would definitely explain slow startup, please don't kill the service, send the stop command and wait for it to close properly.
Please note that the system might kill the process if it takes to long to stop, this will happen in the case of postgresql if there are connections still held to it (probably from your application). If you disconnect all the connections and than stop, postgresql should be able to stop relatively quickly.
Also make sure you stop the postgresql service inside the container before turning it off.
TCP will linger connections for a while, if you are starting and stopping in quick succession without properly stopping the service inside that would explain your error of why the port is unavailable, normally the service can start/stop in very quick succession on my machine if nothing is connected to it.
3 start-stop cycles of postgresql on my machine (I have 2 decently sized databases)
$ time bash -c 'for i in 1 2 3; do /etc/init.d/postgresql-11 restart; done'
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
real 0m1.188s
user 0m0.260s
sys 0m0.080s