PostgreSQL log files are saturating the disk, and I intend to delete them all after backing them up. Should I restart the PostgreSQL service, or can PostgreSQL see the newly freed space after the deletion without a restart? If not, is there a command that forces PostgreSQL to see the new free space while it is running?
You can delete PostgreSQL log files at any time. Note, however, that deleting (unlinking) a file does not actually delete it as long as a process still holds it open. So you have to notify PostgreSQL with
pg_ctl logrotate
just like the documentation describes.
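The point about unlinking can be seen outside PostgreSQL as well. The following is a minimal sketch, assuming a POSIX system such as Linux; the function name is made up for illustration. It deletes a file while a handle to it is still open and shows that the data stays reachable (and so keeps occupying disk space) until the handle is closed.

```python
import os
import tempfile


def unlink_while_open_file():
    """Delete a file that is still open and report what the OS says about it."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as f:
        f.write(b"x" * 1024)
        f.flush()

        os.unlink(path)                      # remove the directory entry
        name_gone = not os.path.exists(path)

        st = os.fstat(f.fileno())            # the inode is still alive
        f.seek(0)
        readable_bytes = len(f.read())       # and the data is still readable
    # Only here, when the handle is closed, can the space be released.
    return name_gone, st.st_nlink, st.st_size, readable_bytes


if __name__ == "__main__":
    print(unlink_while_open_file())
```

A long-running server holding its log file open is in the same position as the open handle above, which is why it has to be told to close and reopen its log.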
I accidentally deleted the Docker volume mongo-data:/data/db. I have a copy of that folder, but now when I run docker-compose up the mongodb container doesn't start and fails with mongo_1 exited with code 14. More details of the error and of the mongo-data folder are below. Can someone help me, please?
in docker-compose.yml
volumes:
- ./mongo-data:/data/db
Restore from backup files
A step-by-step process to repair the corrupted files from a failed mongodb in a docker container:
! Before you start, make a copy of the files. !
Make sure you know which version of the image was running in the container.
Spawn a new container to run the repair process as follows:
docker run -it -v <data folder>:/data/db <image-name>:<image-version> mongod --repair
Once the files are repaired, you can start the containers from docker-compose again.
If the repair fails, it usually means that the files are corrupted beyond repair. There is still a chance to recover the data by exporting it, as described here.
How to secure proper backup files
The database is constantly working with the files, so the files on disk are constantly changing. In addition, the database keeps some changes in internal memory buffers before they are flushed to the filesystem. Although database engines do a very good job of ensuring that the database can recover from an abrupt failure by writing in two stages (first updating the transaction log, then the data file), copying the files while the database is running can still produce a corrupted copy that prevents the database from recovering.
The reason for such corruption is that the copy process is not aware of how far the database's write process has progressed, which creates a race condition. In very simple words: while the database is in the middle of writing, the copy process creates a copy of the file(s) that is half-updated and hence corrupted.
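To make the race concrete, here is a small simulation. It is only a sketch and does not use MongoDB's real file format: the "data file" is a hypothetical layout of a 4-byte CRC32 followed by a payload, and the writer updates the payload first and the checksum second. A copy taken between those two steps contains the new payload with the old checksum.

```python
import shutil
import struct
import tempfile
import zlib
from pathlib import Path


def pack(payload: bytes) -> bytes:
    """Hypothetical file layout: CRC32 of the payload, then the payload."""
    return struct.pack("<I", zlib.crc32(payload)) + payload


def is_valid(path: Path) -> bool:
    raw = path.read_bytes()
    (stored_crc,) = struct.unpack("<I", raw[:4])
    return stored_crc == zlib.crc32(raw[4:])


def copy_during_write():
    """Return (live file valid, snapshot valid) after a mid-write copy."""
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "data.db"
        snapshot = Path(tmp) / "data.db.copy"
        live.write_bytes(pack(b"balance=100"))

        new_payload = b"balance=250"
        with open(live, "r+b") as f:
            f.seek(4)
            f.write(new_payload)             # step 1: payload updated
            f.flush()
            shutil.copyfile(live, snapshot)  # the backup copy runs right now
            f.seek(0)
            f.write(struct.pack("<I", zlib.crc32(new_payload)))  # step 2

        return is_valid(live), is_valid(snapshot)


if __name__ == "__main__":
    print(copy_during_write())
```

The live file ends up consistent because the writer finished its second step, while the snapshot stays frozen in the half-updated state.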
When the database writer is in the middle of writing to the files, we call them hot files. "Hot files" is a term from the OS perspective; MongoDB also uses the term hot backup, which is a term from the MongoDB perspective. A hot backup is a backup that was taken while the database was running.
To take a proper snapshot (ensuring the files are cold), you need to follow the procedure explained here. In short, the db.fsyncLock() command issued during this process tells the database engine to flush all buffers and stop writing to the files. This makes the files cold while the database itself remains hot, hence the difference between the terms hot files and hot backup. Once the copy is done, the database is told to resume writing to the filesystem by issuing db.fsyncUnlock().
Note that the process is more complex and can change between versions of the database. I give a simplification here to illustrate the problems with file snapshots. To get a proper and consistent backup, always follow the documented procedure for the database version that you use.
Suggested backup method
The preferred backup should always be the data dump method, since it ensures that you can restore even after the database engine has been upgraded or downgraded. MongoDB provides a very useful tool called mongodump that creates database backups by dumping the data instead of copying the files.
For more details on how to use the backup tools, as well as on the other backup methods, read the MongoDB Backup Methods chapter of the MongoDB documentation.
I am trying to understand Docker a little better, and in doing so, it appears I corrupted my PostgreSQL DB for my application.
I am using Docker Swarm to start my application and I'm getting the following error in a loop in the PostgreSQL Container:
2021-02-10 15:38:51.304 UTC 120 LOG: database system was shut down at 2021-02-10 14:49:14 UTC
2021-02-10 15:38:51.304 UTC 120 LOG: invalid primary checkpoint record
2021-02-10 15:38:51.304 UTC 120 LOG: invalid secondary checkpoint record
2021-02-10 15:38:51.304 UTC 120 PANIC: could not locate a valid checkpoint record
2021-02-10 15:38:51.447 UTC 1 LOG: startup process (PID 120) was terminated by signal 6
2021-02-10 15:38:51.447 UTC 1 LOG: aborting startup due to startup process failure
2021-02-10 15:38:51.455 UTC 1 LOG: database system is shut down
Initially, I was trying to modify the pg_hba.conf file in the container by going to the mount drive in the FS, which is in
/var/lib/docker/volumes/postgres96-data-volume/_data
However, every time I restarted the container my changes to pg_hba.conf were reverted. So this morning I added a dummy file called test in the mount folder and restarted the container, expecting the file to be deleted, to get a visual validation that restarting the container automatically restores everything in that mount to its original state. After restarting it again, I started getting those error messages, which prevent my application from starting.
I deleted the test file and restarted the container again, but the error message continues.
I read many solutions on how to fix it, but my question is more to understand why adding a file would cause that? Is my volume corrupted simply because I added a file in there?
Thanks
WARNING
For the people who jump onto using the solution in the accepted answer, here's your WARNING:
The solution in the accepted answer asks to remove the docker volume which means that all the data in the PostgreSQL instance will be lost!!!
Refer to my answer here if you wish to preserve the data of the database instance.
Context in which I faced the same error
I am also using docker swarm to deploy containers and recently encountered this issue when I tried to scale the postgres db to create 2 replicas, both pointing to the same physical volume (mounted using docker, shared using NFS).
This was needed so that the data is in sync across both replicas.
But this led me to the same error as you have
PANIC: could not locate a valid checkpoint record
My findings
Firstly, the database volume is not corrupted; only the transaction WAL has become corrupted or has lost consensus. I did a lot of digging on it. I found two scenarios in which this error may occur:
The database was executing a live transaction but suddenly it shut down due to some error. In this case, the WAL tells the database what it was supposed to be doing when it unexpectedly shut down. However, if the DB shut down during a WAL update, the WAL may reflect some transactions which were actually executed but have improper execution info. This leads to an inconsistency in DB data vs WAL or a corrupt transaction log which leads to a checkpoint error.
You create multiple replicas of the db which point to the same volume. Consider the case of 2 replicas that I faced. When both replicas simultaneously try to execute a transaction on the same db volume, the transaction WAL loses consensus as there are two simultaneous checkpoints. The db fails to execute any further transactions as it is unable to determine which checkpoint to consider as the correct one. This can also happen if two containers (not necessarily replicas) point to the same mount path for PG_DATA.
Eventually, the db fails to start. The container does not start as the db throws an error which closes the container.
You may reset the WAL to fix this issue. When WAL is reset, you will lose the data for transactions that are yet to be executed on the DB. However, data that is already written and transactions that are already processed are preserved.
This error means the Postgres volume is corrupted. This can happen when two containers try to connect to the same volume at the same time. See this answer for slightly more info. Not sure how modifying a file corrupted the drive. You'll need to delete and recreate the volume though. To do this you can:
$ docker stop <your_container_name> # stops a running container
$ docker image prune # removes dangling images (add -a to remove all images not used by any container)
$ docker volume ls # list out active volumes
$ docker volume rm <volume_name> # Remove the volume that's corrupted
I had to run the above code to stop a container, clean images that somehow weren't attached to any containers and then finally delete the offending volume where corrupted data was held.
To resolve this error, you can try the following steps:
Stop and remove the existing PostgreSQL container:
docker stop <container_name>
docker rm <container_name>
Delete the old PostgreSQL data directory, which is usually located at /var/lib/postgresql/data. This will delete all of your database data, so make sure to back up any important data before doing this.
Create a new PostgreSQL container with a fresh data directory:
docker run --name <postgres_container_name> -e POSTGRES_PASSWORD=<password> -d postgres
I am not a DBA, but I am using PostgreSQL 10 on a production server. I installed it from BigSQL and started replicating my production server to another server. On the replication server everything is working, but on my production server there is no space left. After running du on the production server, I see that the pg_wal folder holds 17 GB of files, each 16 MB in size.
After some Google searching, I changed my postgresql.conf file as follows:
wal_level = logical
archive_mode = on
archive_command = 'cp -i %p /etc/bigsql/data/pg10/pg_wal/archive_status/%f'
I installed PostgreSQL 10 from BigSQL and made the above changes.
After the changes, the directory /pg_wal/archive_status had 16 GB of logs. So my question is: should I delete them manually, or do I have to wait for the system to delete them automatically?
Also, if I set archive_mode to on, should the WAL files get removed automatically?
Thanks for your precious time.
This depends on how you do your backups and whether you'd ever need to restore the database to some point in time.
Only a full offline filesystem backup (offline meaning with database turned off) or an on-line logical backup with pg_dumpall will not need those files for a restore.
You'd need those files to restore a filesystem backup created while the database is running. Without them the backup will fail to restore. Though there exist backup solutions that copy needed WAL files automatically (like Barman).
You'd also need those files if your replica database ever falls behind the master for some reason, or if you need to restore the database to some past point in time.
But these files compress pretty well - should be less than 10% size after compression - you can write your archive_command to compress them automatically instead of just copying.
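As an illustration of the compression idea, here is a minimal sketch of a helper that an archive_command could call instead of cp. The script name, function name, and directory layout are assumptions made for this example, not something BigSQL or PostgreSQL ships. It gzips one WAL segment into an archive directory, refuses to overwrite an existing archive, and only exposes the final file name once the compressed copy is complete.

```python
import gzip
import os
import shutil
import sys
from pathlib import Path


def archive_segment(src: str, dest_dir: str) -> Path:
    """Compress one WAL segment into dest_dir as <segment name>.gz."""
    dest = Path(dest_dir) / (Path(src).name + ".gz")
    if dest.exists():
        # An archive command should fail rather than overwrite an archive.
        raise FileExistsError(f"refusing to overwrite {dest}")

    tmp = dest.parent / (dest.name + ".tmp")
    with open(src, "rb") as fin, gzip.open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    os.replace(tmp, dest)  # rename into place only after a complete write
    return dest


if __name__ == "__main__" and len(sys.argv) == 3:
    # An uncaught exception makes the script exit non-zero, which signals
    # failure to PostgreSQL so that it retries the segment later.
    archive_segment(sys.argv[1], sys.argv[2])
```

Assuming the script were saved as /path/to/archive_wal.py (a placeholder), the setting might look like archive_command = 'python3 /path/to/archive_wal.py %p /path/to/archive', with the archive directory located outside pg_wal. The matching restore_command would have to decompress the file again.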
And you should delete them eventually from the archive. I'd recommend not deleting them until they're at least a month old and at least 2 full successful backups have been made after they were created.
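The retention rule above can be written down as a small selection function. This is a sketch under assumptions: file age is taken from the modification time, and the timestamps of successful full backups are supplied by the caller (how you record them is up to your backup tooling). The function only lists candidates and deletes nothing.

```python
import time
from pathlib import Path
from typing import Iterable, List, Optional

SECONDS_PER_DAY = 86400


def deletable_segments(
    archive_dir: str,
    backup_times: Iterable[float],
    min_age_days: int = 30,
    min_backups: int = 2,
    now: Optional[float] = None,
) -> List[Path]:
    """List archived files that are old enough AND covered by enough backups."""
    backups = list(backup_times)
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY

    result = []
    for path in sorted(Path(archive_dir).iterdir()):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        # Count full backups that completed after this file was archived.
        newer_backups = sum(1 for t in backups if t > mtime)
        if mtime <= cutoff and newer_backups >= min_backups:
            result.append(path)
    return result
```

Review the returned list before removing anything; both conditions must hold, so a very old file with only one later backup is kept.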
Trying to do a DB2 import as part of a system copy, and the transaction logs filled up. The import was cancelled, a transaction log backup ran, and the number of logs was increased to approximately 90% of the available disk (previously 70%).
Restarted the DB and kicked off the import again, but now it errors due to the tablespace state: running db2 list tablespaces show detail shows I have 4 tablespaces in Backup Pending state.
So I tried db2 backup database <SID> tablespace <SID>#BTABI online but I get the error:
SQL2059W A device full warning was encountered on device "/db2/db2". Do you want to continue(c), terminate this device only(d), abort the utility(t) ? (c/d/t) t
No option works but to terminate.
The thing is, the device isn't full. There's no activities on the DB, running db2 list applications gives:
SQL1611W No data was returned by Database System Monitor.
Running db2 "select log_utilization_percent,dbpartitionnum from sysibmadm.log_utilization order by 2" to show the log utilization returns 0.
There's no logs in use. The filesystem has space free. I even tried reducing the number of logs again to make sure but get the same issue.
I tried db2 "alter tablespace <SID>#BTABI switch online" instead and although this returns a 'success' statement it doesn't actually do anything - my tablespaces are still in Backup pending?
Any ideas please
You're trying to write the backup images to the /db2/db2 file system, which doesn't have enough space to hold the backup image(s).
Note: When you execute BACKUP DATABASE as in your example above without specifying where to send the backup (i.e. you don't use the to /dir/ectory or another option like use TSM), DB2 will write the backup image to the current directory. Make sure you specify where to store the backup image (and that it has enough free space to hold the backup image). If you don't care about recoverability and are just trying to get the table space out of backup pending state, you can specify /dev/null as your location as @mustaccio suggests in the comments above.
Also: You may want to look at the COMMITCOUNT option for the import utility so you're not trying to insert all data in a single massive transaction.
As per above comments - just kept running the import, resetting the 'pending load' status each time with:
load from /dev/null of del terminate into SAPECD.
A few packages fail each time but the rest process. Letting it finish, resetting again, and restarting the import gets through a little more each time.
The system is FC21 with Postgresql 9.3.9. The cluster has 6 databases and uses 38 GB of storage in the pgsql directory. Recently over 20GB of redundant data has been removed. Each db has been vacuumed with a 'vacuum all' command twice, additionally the entire cluster has been vacuumed twice with a vacuumdb -a command. All ran successfully. Postgresql has been stopped and restarted.
For verification, a pg_dumpall command creates a 12GB file.
All the tables from one db were removed:
select pg_size_pretty(pg_database_size('db'));
Shows over 6GB remaining.
How can the space be recovered? It seems unreasonable to have to do a pg_restore to recover the space. I have read and re-read the 'recovering disk space' document.
A VACUUM command will only reclaim space that is at the end of table-files. You will want VACUUM FULL or vacuumdb -f.
You might also want to consider reindexdb since all this row rewriting might leave your indexes a little bloated.