Postgresql 10 pg_basebackup not backing up pg_control file - postgresql

I'm new to administrating postgresql 10 database so please bear with me (I'm from Oracle background). I was hoping a kind soul could point me to the right direction with our current problem.
We use pg_base to backup our current production system and for some reason since a couple of days ago pg_base has stopped backing up the database's pg_control file as well as postgres.conf which ofcourse renders the backup useless for restoration.
Below is the command that we use to initiate the backup
/usr/bin/docker exec -i --user postgres asf-db1 pg_basebackup --wal-method=fetch -D /var/lib/postgresql/data/pgdata/master_backup/latest -P -v --format=tar
The logs does not indicate anything suspicious and the issue only became apparent when I did a test restore.
Below is an extract of the backup logs
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 6/BF000178 on timeline 1
0/1519047 kB (0%), 0/1 tablespace (...ta/master_backup/latest/base.tar)
397408/1519047 kB (26%), 0/1 tablespace (...ta/master_backup/latest/base.tar
)^M1019400/1519047 kB (67%), 0/1 tablespace (...ta/master_backup/latest/base.
tar)^M1565927/1565927 kB (100%), 0/1 tablespace (...ta/master_backup/latest/b
t/base.tar)^M2164615/2164615 kB (100%), 0/1 tablespace (...ta/master_backup/l
p/latest/base.tar)^M2751015/2751015 kB (100%), 0/1 tablespace (...ta/master_b
r_backup/latest/base.tar)^M3061585/3061585 kB (100%), 0/1 tablespace (...ta/m
pg_basebackup: write-ahead log end point: 6/C0000050
pg_basebackup: base backup completed
Backup completed at: Wed 14 Nov 18:00:32 GMT 2018

Related

Recover Postgresql pgBarman

I've setup a postgresql DB and I want to backup it.
I've 1 server with my main DB et 1 with Barman.
All the setup is working, I can backup my DB with barman.
I just don't understand how I can recover my DB on a exact time point between the backups that I do everyday.
barman#ubuntu:~$ barman check main-db-server
WARNING: No backup strategy set for server 'main-db-server' (using default 'exclusive_backup').
WARNING: The default backup strategy will change to 'concurrent_backup' in the future. Explicitly set 'backup_options' to silence this warning.
Server main-db-server:
PostgreSQL: OK
is_superuser: OK
wal_level: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (interval provided: 1 day, latest backup age: 9 minutes, 59 seconds)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 6 backups, expected at least 0)
ssh: OK (PostgreSQL server)
not in recovery: OK
systemid coherence: OK (no system Id available)
archive_mode: OK
archive_command: OK
continuous archiving: OK
archiver errors: OK
And when I backup my DB
barman#ubuntu:~$ barman backup main-db-server
WARNING: No backup strategy set for server 'main-db-server' (using default 'exclusive_backup').
WARNING: The default backup strategy will change to 'concurrent_backup' in the future. Explicitly set 'backup_options' to silence this warning.
Starting backup using rsync-exclusive method for server main-db-server in /var/lib/barman/main-db-server/base/20210427T150505
Backup start at LSN: 0/1C000028 (00000005000000000000001C, 00000028)
Starting backup copy via rsync/SSH for 20210427T150505
Copy done (time: 2 seconds)
Asking PostgreSQL server to finalize the backup.
Backup size: 74.0 MiB. Actual size on disk: 34.9 KiB (-99.95% deduplication ratio).
Backup end at LSN: 0/1C0000C0 (00000005000000000000001C, 000000C0)
Backup completed (start time: 2021-04-27 15:05:05.289717, elapsed time: 11 seconds)
Processing xlog segments from file archival for main-db-server
00000005000000000000001B
00000005000000000000001C
00000005000000000000001C.00000028.backup
I don't know how to restore my DB on a time between 2 backups :/
Thanks

Dockerized Postgresql cannot access postgresql.conf on custom image

I am in the process of experimenting/tinkering/learning/breaking with Docker. I am currently writing Docker code to create a snapshotted testing environment for my application.
By snapshotted I mean that my database is reset on purpose on every restart, so that I can work with old data at a certain time. What is peculiar in my case is that I want to populate a Postgresql database at build time, not at start time. Postgresql image is ready for populating the db with sql scripts at container start, but it takes hours.
My application is made by a Tomcat 8.5 server running my WAR and a Postgresql database, which is the focus of my question now. I am creating a Gist while I write for full code.
The code I have done
Full code on Gist
I have followed a tutorial on how to build a Docker image of Postgres with a full database, rather than have Postgres populate itself on boot. This because I have a million record database and only a .sql.gz dump that sysop gave me.
So the relevant parts of the Dockerfile are
WORKDIR /opt/setup/
COPY db-setup.sh /opt/setup/
COPY db-pack.sh /opt/setup/
COPY db-run.sh /opt/setup/
RUN ./db-setup.sh
RUN ./db-pack.sh
#VOLUME $PGDATA (Note it is commented out, now)
EXPOSE 5432
The db-setup.sh is run on image build, and picks files from data-scripts.d. Of course I am not allowed to share the contents of the dump, but it's a plain .sql.gz with plenties of OIDs that take a huge amount of time to restore. The db-setup.sh shown in Gist is derived from both the tutorial and the original Postgres image so that it handles correctly the compression (the tutorial only uses plain SQL)
Build succeeds, startup fails
When I build the image, it takes considerable amount of time to load the data, which is what I want
2019-08-07 07:57:04.149 UTC [49] LOG: database system was shut down at 2019-08-07 07:57:03 UTC
2019-08-07 07:57:04.231 UTC [48] LOG: database system is ready to accept connections
done
server started
./db-setup.sh: running methodinv_pcp3.sql.gz
2019-08-07 08:49:52.052 UTC [117] ERROR: canceling autovacuum task
2019-08-07 08:49:52.052 UTC [117] CONTEXT: automatic analyze of table "postgres.public.ftt_interactive_data_492"
2019-08-07 08:49:59.086 UTC [118] ERROR: canceling autovacuum task
2019-08-07 08:49:59.086 UTC [118] CONTEXT: automatic analyze of table "postgres.public.ftt_oper_492"
2019-08-07 08:50:34.086 UTC [118] ERROR: canceling autovacuum task
2019-08-07 08:50:34.086 UTC [118] CONTEXT: automatic analyze of table "postgres.public.ftt_validation_492"
2019-08-07 08:51:11.889 UTC [119] ERROR: canceling autovacuum task
2019-08-07 08:51:11.889 UTC [119] CONTEXT: automatic analyze of table "postgres.public.ftt_oper_492"
2019-08-07 08:54:21.131 UTC [123] ERROR: canceling autovacuum task
2019-08-07 08:54:21.131 UTC [123] CONTEXT: automatic analyze of table "postgres.public.ftt_oper_492"
waiting for server to shut down...2019-08-07 08:54:28.652 UTC [48] LOG: received fast shutdown request
.2019-08-07 08:54:28.797 UTC [48] LOG: aborting any active transactions
2019-08-07 08:54:28.799 UTC [48] LOG: worker process: logical replication launcher (PID 55) exited with exit code 1
2019-08-07 08:54:28.800 UTC [50] LOG: shutting down
..2019-08-07 08:54:31.407 UTC [48] LOG: database system is shut down
done
When I run the image with docker run, startup fails because it can't find Postgres configuration
D:\IdeaProjects\pcp\ftt-containers\ftt-db-method>docker run -p 5432:5432 -l ftt-db-method ftt-db-method:latest
Restoring /var/lib/postgresql/data ...
Done.
Launching command: postgres ...
postgres: could not access the server configuration file "/var/lib/postgresql/data/postgresql.conf": No such file or directory
Originally, my Dockerfile exposed a VOLUME which is now commented out. The above output occurs both when I declare a volume (which is not exactly what I want, I am new to Docker and copied&pasted on first chance) and when I comment the volume out.
Question
What is wrong with the Docker image of Postgres fully loaded with s**tloads of data I am experimenting?
How can I effectively start Postgres with an already full database that will not (necessarily) survive container restarts?
Edit 1
By bash-ing into the container I have found that the data dump created during build time is 10K, so basically empty.
This doesn't solve my problem yet, but answers why Postgres is unable to find its beloved data dir
Edit 2
I was able to bash into a temporary container, in particular between the moment the database is restored and the data lib is packed.
Basically the Dockerfile does
RUN ./db-setup.sh
Which executes the restore of the sql
echo "$0: running $f"; gunzip -c "$f" | "${psql[#]}" > /dev/null 2>&1 ; echo ;;
The output is saved to a temporary container.
Now Dockerfile does
RUN ./db-pack.sh
Which tars /var/lib/postgresql/data into /zdata. I have
2019-08-07 16:43:51.532 UTC [42] LOG: received fast shutdown request
waiting for server to shut down....2019-08-07 16:43:51.676 UTC [42] LOG: aborting any active transactions
2019-08-07 16:43:51.679 UTC [42] LOG: worker process: logical replication launcher (PID 49) exited with exit code 1
2019-08-07 16:43:51.681 UTC [44] LOG: shutting down
...2019-08-07 16:43:54.952 UTC [42] LOG: database system is shut down
done
server stopped
Removing intermediate container 8dbe2a4e776a
---> 263896b905ce
Step 15/19 : RUN ./db-pack.sh
---> Running in 56132ecb90cc
Packing data folder: /var/lib/postgresql/data
Pack & clean finished successfully.
Removing intermediate container 56132ecb90cc
---> 1a7f8d68e8df
Step 16/19 : VOLUME $PGDATA
---> Running in 10d222beed81
Removing intermediate container 10d222beed81
---> e1a9355882d1
So I tagged 263896b905ce (YHMV if you replicate on your pc) into a new image, then executed bash on it. The data dir was empty, the script would have packed nothing
docker tag 263896b905ce examine
docker run -it --entrypoint /bin/bash examine
root#ab963ace16a1:/opt/setup# ls
data-scripts.d db-pack.sh db-run.sh db-setup.sh
root#ab963ace16a1:/opt/setup# cd /zdata/
root#ab963ace16a1:/zdata# ls
root#ab963ace16a1:/zdata# cd /var/lib/postgresql/
root#ab963ace16a1:/var/lib/postgresql# ls
data
root#ab963ace16a1:/var/lib/postgresql# cd data/
root#ab963ace16a1:/var/lib/postgresql/data# ls
root#ab963ace16a1:/var/lib/postgresql/data# ls -lah
total 8.0K
drwxrwxrwx 2 postgres postgres 4.0K Jul 17 23:55 .
drwxr-xr-x 1 postgres postgres 4.0K Jul 17 23:55 ..
root#ab963ace16a1:/var/lib/postgresql/data#
root#ab963ace16a1:/var/lib/postgresql/data# ls^C
root#ab963ace16a1:/var/lib/postgresql/data# exit
exit
Fixed
According to https://stackoverflow.com/a/52762779/471213
"why doesn't VOLUME work?" When you define a VOLUME in the Dockerfile, you can only define the target, not the source of the volume. During the build, you will only get an anonymous volume from this. That anonymous volume will be mounted at every RUN command, prepopulated with the contents of the image, and then discarded at the end of the RUN command. Only changes to the container are saved, not changes to the volume.
So I had basically to run both RUNs at the same time
RUN ./db-setup.sh && ./db-pack.sh
#RUN ./db-pack.sh

Docker disk space issue with space left on host

I have PostgreSQL running in a Docker container (Docker 17.09.0-ce-mac35 on OS X 10.11.6) and I'm inserting data from a Python application on the host. After a while I consistently get the following error in Python while there is still plenty of disk space available on the host:
psycopg2.OperationalError: could not extend file "base/16385/24599.49": wrote only 4096 of 8192 bytes at block 6543502
HINT: Check free disk space.
This is my docker-compose.yml:
version: "2"
services:
rabbitmq:
container_name: rabbitmq
build: ../messaging/
ports:
- "4369:4369"
- "5672:5672"
- "25672:25672"
- "15672:15672"
- "5671:5671"
database:
container_name: database
build: ../database/
ports:
- "5432:5432"
The database Dockerfile looks like this:
FROM ubuntu:17.04
RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ zesty-pgdg main" > /etc/apt/sources.list.d/pgdg.list
RUN apt-get update && apt-get install -y --allow-unauthenticated python-software-properties software-properties-common postgresql-10 postgresql-client-10 postgresql-contrib-10
USER postgres
RUN /etc/init.d/postgresql start &&\
psql --command "CREATE USER ****** WITH SUPERUSER PASSWORD '******';" &&\
createdb -O ****** ******
RUN echo "host all all 0.0.0.0/0 md5" >> /etc/postgresql/10/main/pg_hba.conf
RUN echo "listen_addresses='*'" >> /etc/postgresql/10/main/postgresql.conf
EXPOSE 5432
VOLUME ["/etc/postgresql", "/var/log/postgresql", "/var/lib/postgresql"]
CMD ["/usr/lib/postgresql/10/bin/postgres", "-D", "/var/lib/postgresql/10/main", "-c", "config_file=/etc/postgresql/10/main/postgresql.conf"]
df -k output:
Filesystem 1024-blocks Used Available Capacity iused ifree %iused Mounted on
/dev/disk2 1088358016 414085004 674017012 39% 103585249 168504253 38% /
devfs 190 190 0 100% 658 0 100% /dev
map -hosts 0 0 0 100% 0 0 100% /net
map auto_home 0 0 0 100% 0 0 100% /home
Update 1:
It seems like the container has now shut down. I'll start over and try to df -k in the container before it shuts down.
2017-11-14 14:48:25.117 UTC [18] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-11-14 14:48:25.120 UTC [17] WARNING: terminating connection because of crash of another server process
2017-11-14 14:48:25.120 UTC [17] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-11-14 14:48:25.120 UTC [17] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-11-14 14:48:25.132 UTC [1] LOG: all server processes terminated; reinitializing
2017-11-14 14:48:25.175 UTC [1] FATAL: could not access status of transaction 0
2017-11-14 14:48:25.175 UTC [1] DETAIL: Could not write to file "pg_notify/0000" at offset 0: No space left on device.
2017-11-14 14:48:25.181 UTC [1] LOG: database system is shut down
Update 2:
This is df -k on the container, /dev/vda2 seems to be filling up quickly:
$ docker exec -it database df -k
Filesystem 1K-blocks Used Available Use% Mounted on
none 61890340 15022448 43700968 26% /
tmpfs 65536 0 65536 0% /dev
tmpfs 1023516 0 1023516 0% /sys/fs/cgroup
/dev/vda2 61890340 15022448 43700968 26% /etc/postgresql
shm 65536 8 65528 1% /dev/shm
tmpfs 1023516 0 1023516 0% /sys/firmware
Update 3:
This seems to be related to the ~64 GB file size limit on Docker.qcow2. Solved using qemu and gparted as follows:
cd ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
qemu-img info Docker.qcow2
qemu-img resize Docker.qcow2 +200G
qemu-img info Docker.qcow2
qemu-system-x86_64 -drive file=Docker.qcow2 -m 512 -cdrom ~/Downloads/gparted-live-0.30.0-1-i686.iso -boot d -device usb-mouse -usb

Fatal error starting postgres

I'm unfamiliar with how to use postgres and need some help. I'm currently running OSX Yosemite.
When I start postgres I get this:
pg_ctl: could not start server
Examine the log output.
There was an error executing [start] on postgres. Check /Users/work/git/proj/var/log/postgres.log for details.
createuser: could not connect to database postgres: FATAL: could not open relation mapping file "global/pg_filenode.map": No such file or directory
The log is below.
When I try to stop postgres I get this:
Postgres not running
And when I run ps -ef |grep postgres I get this:
20010 13398 1 0 Jul07 ? 00:00:00 /usr/pgsql-9.3/bin/postgres -h -k /Users/work/git/proj/var/pg
20010 13399 13398 0 Jul07 ? 00:00:09 postgres: logger process
20010 13401 13398 0 Jul07 ? 00:00:10 postgres: checkpointer process
20010 13402 13398 0 Jul07 ? 00:00:00 postgres: writer process
20010 13403 13398 0 Jul07 ? 00:00:00 postgres: wal writer process
20010 13404 13398 0 Jul07 ? 00:00:36 postgres: autovacuum launcher process
20010 13405 13398 0 Jul07 ? 00:00:02 postgres: stats collector process
20010 18112 17723 0 10:22 pts/0 00:00:00 grep postgres
What does this all mean and how could I possibly fix this?
log text
Postgres data dir doesn't exist. Creating
The files belonging to this database system will be owned by user "rose.smith".
This user must also own the server process.
The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".
Data page checksums are disabled.
creating directory /Users/work/git/proj/postgres ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
creating configuration files ... ok
creating template1 database in /Users/work/git/proj/postgres/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
/usr/pgsql-9.3/bin/postgres -D /Users/work/git/proj/postgres
or
/usr/pgsql-9.3/bin/pg_ctl -D /Users/work/git/proj/postgres -l logfile start
waiting for server to start....< 2015-06-04 17:24:57.966 GMT >LOG: redirecting log output to logging collector process
< 2015-06-04 17:24:57.966 GMT >HINT: Future log output will appear in directory "pg_log".
done
server started
waiting for server to shut down.... done
server stopped
waiting for server to start....< 2015-06-04 18:10:18.044 GMT >LOG: redirecting log output to logging collector process
< 2015-06-04 18:10:18.044 GMT >HINT: Future log output will appear in directory "pg_log".
done
server started
"/Users/work/git/proj/var/log/postgres.log" 413L, 20935C
after running /usr/pgsql-9.3/bin/postgres -D /Users/work/git/proj/postgres
< 2015-07-08 14:40:36.331 GMT >FATAL: lock file "postmaster.pid" already exists
< 2015-07-08 14:40:36.331 GMT >HINT: Is another postmaster (PID 18145) running in data directory "/Users/work/git/proj/postgres"?
I can't speak to why this worked after trying these commands just a few minutes ago, but it is now working. Good luck to anyone else with the same problem.
stop postgres
killall postgres
remove postgres database with rm -rf postgres
start postgres
This website was helpful. I think my problem may have been the same as his.
I had deleted ~/Library/Containers/com.heroku.postgres or ~/Application Support/Postgres/ while the Postgres.app was still running. The old version was still running since I deleted the pid file, and it didn't know how to shut it down.
Source: https://github.com/PostgresApp/PostgresApp/issues/96
I faced same issue. I solved the problem with the following commands.
If you install postgresql using HomeBrew...
rm /usr/local/var/postgres/postmaster.pid
pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
Hope this helps you!

Postresql 9.3 replication not starting after pg_basebackup completes

I am trying to create a hot_standby server, and I receive the following error after pg_basebackup completes. Notice I use a shell script, replicator.sh, to start the replication. Can anyone give me some insight?
My specs:
Debian Wheezy 7.6
Postgresql 9.3
Database size: ~115GB
Error:
postgres#database-master:/etc/postgresql/9.3/main$ sh replicator.sh
Stopping PostgreSQL
[ ok ] Stopping PostgreSQL 9.3 database server: main.
Cleaning up old cluster directory
Starting base backup as replicator
Password:
113720266/113720266 kB (100%), 1/1 tablespace
NOTICE: WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
pg_basebackup: base backup completed
Starting Postgresql
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start.
Please check the log output: 2014-09-11 17:56:33 UTC LOG: database system was interrupted; last
known up at 2014-09-11 16:54:29 UTC 2014-09-11 17:56:33 UTC LOG: creating missing WAL directory
"pg_xlog/archive_status" 2014-09-11 17:56:33 UTC LOG: incomplete startup packet 2014-09-11 17:56:33
UTC LOG: invalid checkpoint record 2014-09-11 17:56:33 UTC FATAL: could not locate required
checkpoint record 2014-09-11 17:56:33 UTC HINT: If you are not restoring from a backup, try
removing the file "/var/lib/p[FAILesql/9.3/main/backup_label". 2014-09-11 17:56:33 UTC LOG: startup
process (PID 21972) exited with exit code 1 2014-09-11 17:56:33 UTC LOG: aborting startup due to
startup process failure ... failed! failed!
Contents of replicator.sh:
#!/bin/bash
echo Stopping PostgreSQL
/etc/init.d/postgresql stop
echo Cleaning up old cluster directory
rm -rf /var/lib/postgresql/9.3/main
echo Starting base backup as replicator
pg_basebackup -h 123.456.789.123 -D /var/lib/postgresql/9.3/main -U replicator -v -P
echo Writing recovery.conf file
sudo -u postgres bash -c "cat > /var/lib/postgresql/9.3/main/recovery.conf <<- _EOF1_
standby_mode = 'on'
primary_conninfo = 'host=123.456.789.123 port=5432 user=replicator password=XXXXX sslmode=require'
trigger_file = '/tmp/postgresql.trigger'
_EOF1_
"
echo Starting Postgresql
/etc/init.d/postgresql start
Thank you,
Jake
My best guess from the above is that the pg_basebackup failed and your shell script doesn't check for error return codes or use set -e to automatically abort after errors, so it just carried on regardless.
It's also possible that you don't have WAL archiving configured, or don't have a restore_command set in the replica. In that case, the transaction logs required to start the base backup will not be available and startup will fail.
I strongly recommend that you:
Use pg_basebackup -X stream so that the required transaction logs get copied along with the backup; and
Use set -e in your shell script, or test for errors with a suitable if ! pg_basebackup .... ; then block.