Fatal error when trying to start postgresql container in docker

I'm deploying Dataverse using docker.
The containers were working nicely; however, a few days ago, without any changes on my side, the db container (postgresql) stopped starting when I run docker-compose up -d. This is the error shown by docker logs db:
PostgreSQL Database directory appears to contain a database; Skipping initialization
LOG: could not create IPv6 socket: Address family not supported by protocol
FATAL: could not open directory "pg_tblspc": No such file or directory
LOG: database system is shut down
Can anyone help me, please?

What are the contents of the PGDATA directory, and can you share the database section of your docker-compose file?
According to your logs, one of the important directories underneath PGDATA is missing. This can happen if the data is not persisted at all and the container is handed only an empty data directory. I assume that you have a volume mounted into the container, so showing the contents of that directory (PGDATA) will help us understand whether only pg_tblspc is missing or everything.
If it is only pg_tblspc, there are already threads on SO that discuss a recovery. I don't know the cause yet, but knowing exactly what is missing (pg_tblspc or everything) is important for understanding the problem.
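To see quickly which of the usual subdirectories are missing, a small shell helper along these lines could be run against the host path of the mounted volume. This is only a sketch: the function name missing_pgdata_dirs is made up for illustration, and the directory list is a subset that, as far as I know, exists in every release since 9.0. It is not an exhaustive check.

```shell
# Sketch: print the standard subdirectories missing from a data directory.
# missing_pgdata_dirs is a hypothetical helper name; the list is a subset.
missing_pgdata_dirs() {
    pgdata="$1"
    for d in base global pg_multixact pg_notify pg_subtrans pg_tblspc pg_twophase; do
        # Report every expected directory that is not present.
        [ -d "$pgdata/$d" ] || echo "$d"
    done
}
```

Calling it with the data directory as the only argument prints one missing directory name per line, and prints nothing when all of the listed directories exist.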

I had a similar issue when restarting my container:
ERROR: could not open file "pg_tblspc/18684/PG_9.6_201608131/16385/26851": No such file or directory
Inspecting the container internally, I found the following:
├── pg_tblspc
│   └── 18684 -> /var/lib/postgres/pg_dirs/pg_indices
In my docker-compose configuration I only had this volume:
volumes:
- postgres/data:/var/lib/postgresql/data
So when the container restarts, all data referenced by 18684 -> /var/lib/postgres/pg_dirs/pg_indices is lost.
To avoid this, I added that path to the volumes like this:
volumes:
- postgres/data:/var/lib/postgresql/data
- postgres/pg_indices:/var/lib/postgres/pg_dirs/pg_indices
The above does not restore data that is already lost (I had to restore my entire database), but it prevents further loss.
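To find out whether a cluster has tablespaces that live outside the mounted data directory, the symlinks under pg_tblspc can be listed. The following is a rough sketch; external_tablespaces is a name made up for this example, and it assumes the links hold absolute paths, as they did in my case.

```shell
# Sketch: print each tablespace link whose target is outside the data directory.
# external_tablespaces is a hypothetical helper name.
external_tablespaces() {
    pgdata="${1%/}"                               # drop a trailing slash, if any
    for link in "$pgdata"/pg_tblspc/*; do
        [ -L "$link" ] || continue                # skip anything that is not a symlink
        target=$(readlink "$link")
        case "$target" in
            "$pgdata"/*) ;;                       # inside PGDATA, covered by the data volume
            *) echo "${link##*/} -> $target" ;;   # outside PGDATA, needs its own volume
        esac
    done
}
```

Every line it prints is a path that needs its own entry under volumes, otherwise its contents disappear with the container.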

Related

Using Docker, what triggered PANIC: could not locate a valid checkpoint record

I am trying to understand Docker a little better, and in doing so, it appears I corrupted my PostgreSQL DB for my application.
I am using Docker Swarm to start my application and I'm getting the following error in a loop in the PostgreSQL Container:
2021-02-10 15:38:51.304 UTC 120 LOG: database system was shut down at 2021-02-10 14:49:14 UTC
2021-02-10 15:38:51.304 UTC 120 LOG: invalid primary checkpoint record
2021-02-10 15:38:51.304 UTC 120 LOG: invalid secondary checkpoint record
2021-02-10 15:38:51.304 UTC 120 PANIC: could not locate a valid checkpoint record
2021-02-10 15:38:51.447 UTC 1 LOG: startup process (PID 120) was terminated by signal 6
2021-02-10 15:38:51.447 UTC 1 LOG: aborting startup due to startup process failure
2021-02-10 15:38:51.455 UTC 1 LOG: database system is shut down
Initially, I was trying to modify the pg_hba.conf file in the container by going to the mount drive in the FS, which is in
/var/lib/docker/volumes/postgres96-data-volume/_data
However, every time I restarted the container my changes to pg_hba.conf were reverted. So this morning I added a dummy file called test in the mount folder and restarted the container, expecting the file to be deleted. That would have given me visual confirmation that restarting the container automatically resets everything in that mount to its original state. After restarting it again, that's when I started getting those error messages preventing my application from starting.
I deleted the test file and restarted the container again, but the error message continues.
I read many solutions on how to fix it, but my question is more to understand why adding a file would cause that? Is my volume corrupted simply because I added a file in there?
Thanks

WARNING
For the people who jump onto using the solution in the accepted answer, here's your WARNING:
The solution in the accepted answer asks to remove the docker volume which means that all the data in the PostgreSQL instance will be lost!!!
Refer to my answer here if you wish to preserve the data of the database instance.
Context in which I faced the same error
I am also using Docker Swarm to deploy containers and recently encountered this issue when I tried to scale the postgres db to 2 replicas, both pointing to the same physical volume (mounted using docker, shared using NFS).
I did this so that the data would stay in sync across both replicas.
But this led me to the same error as you have:
PANIC: could not locate a valid checkpoint record
My findings
Firstly, the database volume as a whole is not corrupted; only the write-ahead log (WAL) is corrupted or inconsistent. I did a lot of digging on this and found two scenarios in which this error may occur:
1. The database was executing a live transaction but suddenly shut down due to some error. Normally the WAL tells the database what it was doing when it shut down unexpectedly. However, if the DB shut down in the middle of a WAL update, the WAL may contain incomplete records for transactions that were actually executed. This leaves the WAL inconsistent with the data files, or corrupt, which leads to the checkpoint error.
2. You create multiple replicas of the db which point to the same volume. Consider the case of 2 replicas that I faced. When both replicas write to the same db volume at the same time, each one writes its own WAL and checkpoint records over the other's, so the log no longer agrees with itself. The db cannot determine which checkpoint is valid and refuses to continue. This can also happen if two containers (not necessarily replicas) point to the same mount path for PGDATA.
Eventually, the db fails to start, and because the database process exits with an error, the container stops too.
You may reset the WAL to fix this issue. When the WAL is reset, you lose any changes that were recorded only in the WAL and had not yet been written to the data files. Data that has already been written to the data files is preserved, although partially applied transactions may leave it inconsistent, so check the data after the reset.
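The tool that resets the WAL has a different name depending on the server version: PostgreSQL 10 renamed pg_resetxlog to pg_resetwal. A minimal sketch for picking the right one from the PG_VERSION file in the data directory follows. The helper name wal_reset_tool is made up for illustration, and the docker invocation in the trailing comment is an assumption about the official image (volume name from the question, image tag guessed from it), not a tested recipe. Copy the volume somewhere safe before trying it.

```shell
# Sketch: echo the WAL reset tool that matches the cluster in a data directory.
# wal_reset_tool is a hypothetical helper name.
wal_reset_tool() {
    version=$(cat "$1/PG_VERSION") || return 1
    major=${version%%.*}            # "9.6" -> 9, "13" -> 13
    if [ "$major" -ge 10 ]; then
        echo pg_resetwal
    else
        echo pg_resetxlog
    fi
}

# Possible use while the database container is stopped (assumed, untested):
#   docker run --rm -u postgres \
#     -v postgres96-data-volume:/var/lib/postgresql/data \
#     postgres:9.6 pg_resetxlog /var/lib/postgresql/data
```

Only one process may touch the data directory while the tool runs, so all replicas that share the volume have to be stopped first.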

This error means the Postgres volume is corrupted. This can happen when two containers try to connect to the same volume at the same time. See this answer for slightly more info. Not sure how modifying a file corrupted the drive. You'll need to delete and recreate the volume though. To do this you can:
$ docker stop <your_container_name> # stops a running container
$ docker rm <your_container_name> # removes the stopped container so the volume is no longer in use
$ docker image prune # removes dangling images (untagged and not used by any container)
$ docker volume ls # lists the volumes
$ docker volume rm <volume_name> # removes the corrupted volume (this deletes all data in it)
I had to run the above code to stop a container, clean images that somehow weren't attached to any containers and then finally delete the offending volume where corrupted data was held.

To resolve this error, you can try the following steps:
Stop and remove the existing PostgreSQL container:
docker stop <container_name>
docker rm <container_name>
Delete the old PostgreSQL data directory, which is usually located at /var/lib/postgresql/data. This will delete all of your database data, so make sure to back up any important data before doing this.
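Create a new PostgreSQL container with a fresh data directory (recent versions of the official image refuse to initialize a new database unless POSTGRES_PASSWORD is set):
docker run --name <postgres_container_name> -e POSTGRES_PASSWORD=<password> -d postgres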

Postgres Recovery Failure

What I am trying to accomplish is a recovery using a continuous archive backup.
I am running a VM with CentOS 6.8 and Postgres 9.1, the same Postgres version as the DB that I am pulling from.
I installed Postgres and initialized the DB, started up fine.
Then, following these directions: https://www.postgresql.org/docs/9.3/static/continuous-archiving.html
Stopped the destination pSQL server (as root: service postgresql-9.1 stop)
Copied the destination cluster data folder to the side (as postgres)
Removed the cluster data files (as postgres)
Copied in my source data folder (as postgres)
Copied WAL files into a clean pg_xlog folder under the data folder (as postgres)
Created a recovery.conf file which contained:
restore_command = 'cp /var/lib/pgsql/database_sample_backup/wal_archives/0A/%f %p'
This is another location for the WAL files, separate from the copy I placed in pg_xlog (I was not sure if I needed both).
But when I attempt to restart my server, it fails. (as root: service postgresql-9.1 start)
My pgstartup.log at one point spit out "runuser: cannot set groups: Operation not permitted" but it doesn't consistently do this with every attempt to start.
I've also tried turning off the archiving and replication directives in postgresql.conf (so that it can run standalone) and tried copying over the pg_hba.conf from the new DB I had created to see if that would resolve the issue. Neither did.
I've also run netstat -ntap | grep 5432, which confirmed that I don't have anything else running on the port.
What else can I provide in the form of details, and what else may I attempt in this restoration process?
Thank you for your help!

Restoring Database PostgreSQL

One of my servers has a virus and the Postgres service in Windows is not running, and I have no backup. I'm using Odoo 8, and even the Odoo service is not running.
Is it possible to restore a database using only an OID directory, which from what I know is the database file of Postgres?

I assume you mean the /data/base/<oid> directory. Unfortunately, it's not enough. Some of the cluster's state is stored outside the database OID directory, as you called it.
Ex:
/data/global/ - cluster-wide settings (users, passwords, roles etc.)
/data/pg_xlog/ - WAL entries, possibly with transaction changes not yet transferred to the database files
/data/pg_tblspc/ - tablespaces
You need the whole /data directory. Read more about PHYSICAL BACKUP.
Edit:
So, if the whole /data directory is available to you, you can restore the database on another server. There's one thing you should remember: the destination Postgres cluster must be the same version, e.g. 9.4.1. When the first and second numbers match (e.g. 9.2.10 and 9.2.16), this should also work most of the time. Keeping that in mind, you just need to replace the /data/ directory on the destination server with your source /data directory (the destination server must be stopped during that operation).
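The version a data directory belongs to is recorded in its PG_VERSION file (for 9.x releases this holds the first two numbers, e.g. 9.4), so the match can be checked before swapping directories. A minimal sketch, written for a Unix-like shell; same_major_version is a name made up for this example, and both arguments are assumed to be paths to data directories.

```shell
# Sketch: succeed only if both data directories carry the same PG_VERSION.
# same_major_version is a hypothetical helper name.
same_major_version() {
    # Return 2 when either directory has no PG_VERSION file at all.
    [ -f "$1/PG_VERSION" ] && [ -f "$2/PG_VERSION" ] || return 2
    [ "$(cat "$1/PG_VERSION")" = "$(cat "$2/PG_VERSION")" ]
}
```

Run it against the source /data directory and the freshly initialized data directory of the destination server; a non-zero exit status means the directory should not be copied over as-is.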

Changing postgresql 9.3 database location on Amazon ec2 linux

I apologize for the long post. I have a Postgresql 9.3 server running on a Amazon linux AMI. I also have a compressed dump file from another server which I created using pg_dumpall. Now, I want to restore the data from this dump file in my Postgres. However, I want to load this data into a specific location (say /data).
I have a fresh installation of Postgres. So when I tried to do a:
sudo service postgresql93 start
I got an error message asking me to initialize the db. So I did a:
sudo service postgresql93 initdb
which created the required files in /var/lib/pgsql93/data. After that, I changed the 'data_directory' configuration in /var/lib/pgsql93/data/postgresql.conf and pointed it to /data (I had to do this as root user. I couldn't open the file as the default user).
Now when I try to do a
sudo service postgresql93 start
it fails to start, and when I check the /var/lib/pgsql93/pgstartup.log file, it says:
FATAL: "/data/postgresql" is not a valid data directory
DETAIL: File "/data/postgresql/PG_VERSION" is missing.
So I copied the files from the default location (/var/lib/pgsql93/data) to /data, changed the permissions to 700 and the owner to postgres.
However, when I try to start the service again, it still fails, and in the pgstartup.log, it only says:
LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "pg_log".
And when I check the log in /data/pg_log, it says:
LOG: database system was shut down at 2014-12-30 21:31:18 UTC
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
What else could be the problem? I haven't restored the data yet. I just have the files which were created by the initdb command.

@BMW http://www.linuxquestions.org/questions/linux-server-73/change-postgresql-data-directory-649911/ is exactly what I was looking for. Thanks.
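For anyone checking a relocated data directory by hand, the two conditions mentioned in the question (a PG_VERSION file and 700 permissions) can be verified with something like the following. This is a sketch only: check_data_dir is a made-up name, stat -c is the GNU form of the command, and the ownership of the directory is not checked here.

```shell
# Sketch: check that a directory looks like a usable PostgreSQL 9.x data directory.
# check_data_dir is a hypothetical helper name; stat -c assumes GNU coreutils.
check_data_dir() {
    dir="$1"
    [ -f "$dir/PG_VERSION" ] || { echo "PG_VERSION missing in $dir"; return 1; }
    mode=$(stat -c '%a' "$dir")     # octal permission bits, e.g. 700
    [ "$mode" = "700" ] || { echo "mode is $mode, expected 700"; return 1; }
    echo "ok"
}
```

It prints ok when both conditions hold and a short reason otherwise.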

MongoDB does not see database or collections after migrating from localhost to EBS volume

full disclosure: I am a complete n00b to mongodb and am just getting my feet wet with using mongo on AWS (but have 2 decades working in IT so not a total n00b :P)
I set up an EBS volume and installed mongo on an EC2 instance.
My problem is that I provisioned too small an EBS volume initially.
When I realized this I:
created a new larger EBS volume
mounted it on the server
stopped mongo ( $ sudo service mongod stop)
copied all my /data/db files into the new volume
updated conf files and fstab (dbpath, logpath, pidfilepath and mount point for new volume respectively)
restarted mongod
When I execute: $ sudo service mongod start
- everything runs fine.
- I can futz about in the admin and local databases.
However, when I run this command in the mongo shell: > show databases
- I only see the admin and local.
- the database I copied into the new volume (named encompass) is not listed.
I still have a working local copy of the database so my data is not lost, just not sure how best to move mongo data around other than:
A) start all over importing the data to the db on the AWS server (not what I would like since it is already loaded in my local db)
B) copy the local db to the new EBS volume again (also not preferred but better than importing all the data from scratch again!).
NOTE: originally I secure copied the data into the EBS volume with this command:
$ scp -r -i / / ec2-user@:/
then when I copied between volumes I used a vanilla cp command.
Did I miss something here?
The best I could find on SO and the web was this process (How to scale MongoDB?), but perhaps I missed a switch in a command or a nuance to the process that rendered my database files inert/useless?
Any idea how I can get mongo to see my other database files and collections?
Or did I make an irreversible error somewhere along the way?
Thanks for any help!!

Are you sure your conf file is being loaded? As a test, you can start mongod.exe and point it directly at your db path, i.e.:
mongod --dbpath c:\mongo\data\db (this is Windows syntax; Unix syntax may vary a bit)
run this from the command line and see what, if anything, mongo complains about.
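On Linux, a quick way to see which data path the conf file actually names is to pull the dbpath setting out of it and compare that with the directory the files were copied to. A rough sketch; configured_dbpath is a made-up helper name, and it only understands the two simple forms shown in the comment.

```shell
# Sketch: print the data path named in a mongod config file.
# configured_dbpath is a hypothetical helper name. It handles the old
# "dbpath=/path" form and the YAML "dbPath: /path" form, nothing fancier.
configured_dbpath() {
    grep -i '^[[:space:]]*dbpath' "$1" | sed 's/^[^=:]*[=:][[:space:]]*//'
}
```

The path it prints can then be compared with what the running server reports from db.serverCmdLineOpts() in the mongo shell.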

Database files are finicky and easy to damage when copied by hand. Before copying from one database to another you should probably seed the target database; a few dummy entries will tell you that the database is working.
