I am trying to run pg_upgrade on my existing PostgreSQL 12.8 and migrate it to 14.3 with a Docker-based installation. I got the error below:
Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows in the new cluster ok
Deleting files from new pg_xact ok
Copying old pg_xact to new server ok
Setting oldest XID for new cluster ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
template1
myschema
*failure*
Consult the last few lines of "pg_upgrade_dump_1.log" for
the probable cause of the failure.
Can anyone help me with the location of pg_upgrade_dump_1.log and pg_upgrade_internal.log files?
Yes, as the user a_horse_with_no_name mentioned in his comment, the logs are created in the directory from which you executed pg_upgrade.
Several files are created there, including the logs, the dumps, and others.
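For example, if you ran pg_upgrade from /var/lib/postgresql (a hypothetical path; use whatever directory you actually ran it from, inside the container for a Docker-based setup), you can list and inspect the logs like this:
cd /var/lib/postgresql
ls pg_upgrade_*.log
tail -n 50 pg_upgrade_dump_1.log
The last lines of pg_upgrade_dump_1.log usually show the statement that failed while restoring the schema. (On PostgreSQL 15 and later these files are written under $PGDATA/pg_upgrade_output.d/ instead, but for a 14.x target they land in the working directory.)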
I am trying to understand Docker a little better, and in doing so, it appears I corrupted my PostgreSQL DB for my application.
I am using Docker Swarm to start my application and I'm getting the following error in a loop in the PostgreSQL Container:
2021-02-10 15:38:51.304 UTC 120 LOG: database system was shut down at 2021-02-10 14:49:14 UTC
2021-02-10 15:38:51.304 UTC 120 LOG: invalid primary checkpoint record
2021-02-10 15:38:51.304 UTC 120 LOG: invalid secondary checkpoint record
2021-02-10 15:38:51.304 UTC 120 PANIC: could not locate a valid checkpoint record
2021-02-10 15:38:51.447 UTC 1 LOG: startup process (PID 120) was terminated by signal 6
2021-02-10 15:38:51.447 UTC 1 LOG: aborting startup due to startup process failure
2021-02-10 15:38:51.455 UTC 1 LOG: database system is shut down
Initially, I was trying to modify the pg_hba.conf file in the container by going to the mounted volume in the filesystem, which is at
/var/lib/docker/volumes/postgres96-data-volume/_data
However, every time I restarted the container, my changes to pg_hba.conf were reverted. So this morning I added a dummy file called test in the mount folder and restarted the container, expecting the file to be deleted, as a visual confirmation that restarting the container automatically resets everything in that mount to its original state. After restarting it again, that's when I started getting those error messages preventing my application from starting.
I deleted the test file and restarted the container again, but the error message continues.
I read many solutions on how to fix it, but my question is more to understand why adding a file would cause that? Is my volume corrupted simply because I added a file in there?
Thanks
WARNING
For the people who jump straight to using the solution in the accepted answer, here's your warning:
The solution in the accepted answer asks you to remove the Docker volume, which means that all the data in the PostgreSQL instance will be lost!
Refer to my answer here if you wish to preserve the data of the database instance.
Context in which I faced the same error
I am also using docker swarm to deploy containers and recently encountered this issue when I tried to scale the postgres db to create 2 replicas, both pointing to the same physical volume (mounted using docker, shared using NFS).
This was needed so that the data is in sync across both replicas.
But this led me to the same error that you got:
PANIC: could not locate a valid checkpoint record
My findings
Firstly, the database volume is not corrupted; only the transaction WAL (write-ahead log) is corrupted or has lost consensus. I did a lot of digging on this and found two scenarios in which this error may occur:
The database was executing a live transaction but suddenly it shut down due to some error. In this case, the WAL tells the database what it was supposed to be doing when it unexpectedly shut down. However, if the DB shut down during a WAL update, the WAL may reflect some transactions which were actually executed but have improper execution info. This leads to an inconsistency in DB data vs WAL or a corrupt transaction log which leads to a checkpoint error.
You create multiple replicas of the db which point to the same volume. Consider the case of 2 replicas that I faced. When both replicas simultaneously try to execute a transaction on the same db volume, the transaction WAL loses consensus as there are two simultaneous checkpoints. The db fails to execute any further transactions as it is unable to determine which checkpoint to consider as the correct one. This can also happen if two containers (not necessarily replicas) point to the same mount path for PG_DATA.
Eventually, the db fails to start, and the container does not stay up because the db exits with an error.
You may reset the WAL to fix this issue. When WAL is reset, you will lose the data for transactions that are yet to be executed on the DB. However, data that is already written and transactions that are already processed are preserved.
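As a rough sketch (only after taking a copy of the volume, and only while no container is using it), you can run the reset tool from a throwaway container; the volume name here is the one from the question, and the image tag assumes a 9.6 cluster:
docker run --rm -u postgres \
  -v postgres96-data-volume:/var/lib/postgresql/data \
  postgres:9.6 pg_resetxlog -f /var/lib/postgresql/data
On PostgreSQL 10 and later the tool is called pg_resetwal; in both cases, transactions that were in flight at the time of the crash are lost, as described above.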
This error means the Postgres volume is corrupted. This can happen when two containers try to connect to the same volume at the same time. See this answer for slightly more info. Not sure how modifying a file corrupted the drive. You'll need to delete and recreate the volume though. To do this you can:
$ docker stop <your_container_name> # stops a running container
$ docker image prune # removes dangling (untagged) images
$ docker volume ls # list out active volumes
$ docker volume rm <volume_name> # Remove the volume that's corrupted
I had to run the above commands to stop the container, clean up images that somehow weren't attached to any containers, and finally delete the offending volume where the corrupted data was held.
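If you want to keep a copy of the (possibly corrupted) data before removing the volume, something along these lines should work; <volume_name> is the volume shown by docker volume ls, and the archive lands in your current directory:
$ docker run --rm -v <volume_name>:/source -v $(pwd):/backup alpine \
    tar czf /backup/pg-volume-backup.tar.gz -C /source .
This is only a file-level copy, useful if you later manage to repair the cluster.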
To resolve this error, you can try the following steps:
Stop and remove the existing PostgreSQL container:
docker stop <container_name>
docker rm <container_name>
Delete the old PostgreSQL data directory, which is usually located at /var/lib/postgresql/data. This will delete all of your database data, so make sure to back up any important data before doing this.
Create a new PostgreSQL container with a fresh data directory:
docker run --name <postgres_container_name> -d postgres
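If you want the new container's data to survive restarts, you would typically mount a fresh named volume and set a password as well; the names below are placeholders:
docker run --name <postgres_container_name> \
  -e POSTGRES_PASSWORD=<password> \
  -v <new_volume_name>:/var/lib/postgresql/data \
  -d postgres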
We need some help with a point-in-time recovery (PITR) test. We have followed the steps below, but after creating the recovery.conf file and restarting the machine our DB gets corrupted and does not start.
Please look at the steps below and help us analyze what is going wrong. We used the offline PostgreSQL installer to set up the database. The required configuration in postgresql.conf, such as wal_level=hot_standby, archive_mode=on, and archive_command='cp %p /mnt/server/archive/%f', is also set properly.
After the DB setup, we created a tablespace, created a db, and mapped the db to the tablespace. We also created some tables to generate transaction files in pg_wal and the archive directory (which is /mnt/server/archive).
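For reference, the relevant part of our postgresql.conf looks roughly like this (wal_level = hot_standby is accepted as an alias for replica on PostgreSQL 10):
wal_level = hot_standby
archive_mode = on
archive_command = 'cp %p /mnt/server/archive/%f'
The steps we then followed are: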
1. SELECT pg_start_backup('TestPITR');
2. Took a backup (tar) of the PostgreSQL data directory (/opt/PostgreSQL/10/data) >> tar zcf backup20180810.tar data/
3. SELECT pg_stop_backup();
4. Created some more tables.
5. Noted down their timestamp (for PITR; this will be added to the recovery.conf file as recovery_target_time).
6. Stopped the DB >> service postgresql-10 stop
7. Created a new directory >> mkdir pgbackup
8. Moved the tar file into the pgbackup directory >> mv backup20180810.tar /opt/PostgreSQL/10/pgbackup
9. Copied the archive files into the pgbackup directory >> cp -r /mnt/server/archive/ /opt/PostgreSQL/10/pgbackup/
10. Renamed the old data directory (/opt/PostgreSQL/10/data) to bad.data >> mv data bad.data (now /opt/PostgreSQL/10/bad.data)
11. Untarred the backup taken in step 2 >> tar -xvf backup20180810.tar
12. Moved the data directory back to the old path >> mv /opt/PostgreSQL/10/pgbackup/data/ /opt/PostgreSQL/10/ (now I have /opt/PostgreSQL/10/data derived from the tar file, with the old transaction logs in pg_wal)
13. Copied the updated pg_wal logs from bad.data to data >> cp -r /opt/PostgreSQL/10/bad.data/pg_wal/0* /opt/PostgreSQL/10/data/pg_wal
14. Started the database >> service postgresql-10 start (our db works fine)
15. Created a recovery.conf file inside the data folder (/opt/PostgreSQL/10/data), owned by the postgres user with the proper permissions.
16. Our recovery.conf file has the following two parameters; recovery_target_time was taken from step 5:
restore_command = 'cp /opt/PostgreSQL/10/pgbackup/archive/%f %p'
recovery_target_time = '2018-08-10 02:56:31'
17. Restarted the db server >> service postgresql-10 restart
It doesn't restart, and there is no log entry in the log folder (/opt/PostgreSQL/10/data/log); it looks like it stopped creating logs after the restart.
Steps 14 and 15 are the wrong way around. You have to create recovery.conf before starting the server.
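Concretely, a sketch of the corrected tail end of the procedure, using the paths and values from the question (restore the base backup and copy the newer WAL as before, then create recovery.conf in the data directory before starting the service):
cat > /opt/PostgreSQL/10/data/recovery.conf <<'EOF'
restore_command = 'cp /opt/PostgreSQL/10/pgbackup/archive/%f %p'
recovery_target_time = '2018-08-10 02:56:31'
EOF
chown postgres:postgres /opt/PostgreSQL/10/data/recovery.conf
service postgresql-10 start
Once recovery ends, PostgreSQL renames recovery.conf to recovery.done and resumes normal operation.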
I was recently updating my postgres from 9.5 to 9.6.2 (installed with Homebrew, though the old binaries were downloaded from postgresql directly), and I encountered a strange problem. I have been following this guide, and everything went okay. I had to modify some commands as my encoding info was strange, but everything worked (modified commands listed below).
initdb --locale=C /usr/local/var/postgres -E utf8 --lc-ctype=en_US.UTF-8
pg_upgrade -d /usr/local/var/postgres96 -D /usr/local/var/postgres -b /Users/MyUser/Downloads/pgsql/bin/ -B /usr/local/Cellar/postgresql/9.6.2/bin/
Note: I moved the old data cluster to postgres96 before beginning the process.
However, none of the data transferred over. I ran du -sh /usr/local/var/*/, and all of my data is still in postgres96, but none transferred over to the new cluster.
4.0K /usr/local/var/db/
80K /usr/local/var/homebrew/
48K /usr/local/var/log/
141M /usr/local/var/postgres/
386M /usr/local/var/postgres96/
0B /usr/local/var/run/
I re-ran everything and noticed a weird anomaly in the output of pg_upgrade.
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for reg* system OID user data types ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for roles starting with 'pg_' ok
Creating dump of global objects ok
Creating dump of database schemas
ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.
Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows in the new cluster ok
Deleting files from new pg_clog ok
Copying old pg_clog to new server ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
ok
Copying user relation files
ok
Setting next OID for new cluster ok
Sync data directory to disk ok
Creating script to analyze new cluster ok
Creating script to delete old cluster ok
Upgrade Complete
It seems like the database schemas never got dumped? Regardless, all of my data is trapped in postgres96, and I can't get it out. Any help or insight would be appreciated.
Edit: I ended up just reinstalling an older postgres and dumping my data into the new server (which worked). I'd still be interested to know why pg_upgrade wasn't working though.
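For reference, the dump-and-restore route looked roughly like this; this is a sketch rather than my exact commands, and the port numbers and file name are placeholders (old cluster served by the downloaded binaries on 5432, the new Homebrew server on 5433):
/Users/MyUser/Downloads/pgsql/bin/pg_dumpall --port 5432 > old_cluster.sql
/usr/local/Cellar/postgresql/9.6.2/bin/psql --port 5433 -d postgres -f old_cluster.sql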
What I am trying to accomplish is a recovery using a continuous archive backup.
I am running a VM of CentOS 6.8 and Postgres 9.1. Postgres 9.1 is the same version as the DB that I am pulling from.
I installed Postgres and initialized the DB, started up fine.
Then, following these directions: https://www.postgresql.org/docs/9.3/static/continuous-archiving.html
Stopped the destination pSQL server (as root: service postgresql-9.1 stop)
Copied the destination cluster data folder to the side (as postgres)
Removed the cluster data files (as postgres)
Copied in my source data folder (as postgres)
Copied WAL files into a clean pg_xlog folder under the data folder (as postgres)
Created a recovery.conf file which contained:
restore_command = 'cp /var/lib/pgsql/database_sample_backup/wal_archives/0A/%f %p'
This being another location for the WAL files other than the copy I placed in pg_xlog (was not sure if I needed both)
But when I attempt to restart my server, it fails. (as root: service postgresql-9.1 start)
My pgstartup.log at one point spit out "runuser: cannot set groups: Operation not permitted" but it doesn't consistently do this with every attempt to start.
I've also tried turning off the archiving and replication directives in postgresql.conf (so that it can run standalone) and tried copying over the pg_hba.conf from the new DB I had created to see if that would resolve the issue. Neither did.
I've also done a netstat -ntap | grep 5432 which confirmed that I don't have anything else running on the port.
What else can I provide in the way of details, and what else may I attempt in this restoration process?
Thank you for your help!
I apologize for the long post. I have a PostgreSQL 9.3 server running on an Amazon Linux AMI. I also have a compressed dump file from another server, which I created using pg_dumpall. Now I want to restore the data from this dump file into my Postgres. However, I want to load this data into a specific location (say /data).
I'm having a fresh installation of Postgres. So when I tried to do a:
sudo service postgresql93 start
I got an error message asking me to initialize the db. So I did a:
sudo service postgresql initdb
which created the required files in /var/lib/pgsql93/data. After that, I changed the 'data_directory' configuration in /var/lib/pgsql93/data/postgresql.conf and pointed it to /data (I had to do this as root user. I couldn't open the file as the default user).
Now when I try to do a
sudo service postgresql93 start
it fails to start, and when I check the /var/lib/pgsql93/pg_startup.log file, it says:
FATAL: "/data/postgresql" is not a valid data directory
DETAIL: File "/data/postgresql/PG_VERSION" is missing.
So I copied the files from the default (/var/lib/pgsql9.3/data) to /data, changed the permissions to 700 and owner to postgres.
However, when I try to start the service again, it still fails, and in the pgstartup.log, it only says:
LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "pg_log".
And when I check the log in /data/pg_log, it says:
LOG: database system was shut down at 2014-12-30 21:31:18 UTC
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
What else could be the problem? I haven't restored the data yet. I just have the files which were created by the initdb command.
@BMW http://www.linuxquestions.org/questions/linux-server-73/change-postgresql-data-directory-649911/ is exactly what I was looking for. Thanks.
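For anyone else who lands here, a rough sketch of one common way to relocate the data directory, using the paths from this question (the linked article and your service name may differ in details):
sudo service postgresql93 stop
sudo rsync -av /var/lib/pgsql93/data/ /data/
sudo chown -R postgres:postgres /data
sudo chmod 700 /data
# point data_directory in postgresql.conf at '/data', then:
sudo service postgresql93 start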