Data in a given schema got truncated 4 times after every new build of the database - postgresql

I have a database with two schemas. The first schema was built weeks ago and has been stable ever since. The second schema, in the same database on the same server, is populated through an ETL process from the first schema. I built the data twice (approx. 20 hours to build). I can see that the schema takes up the 100 GB it requires on the hard drive. As soon as I connect with pgAdmin 4 or DataGrip, the data gets truncated (deleted), freeing up the space it took. After the second build, before connecting to anything, I made a file system level backup (tar file).
For the first restore (the 3rd attempt to keep the schema alive), I uncompressed the tar file and moved it into position, keeping a second uncompressed copy of the folder in place.
I connected with pgAdmin 4 and the data disappeared again. I then edited the data directory setting in the Postgres configuration to point at the folder where I had initially uncompressed the tar file, to avoid another 2 hours of copying. I launched the Postgres server again and, boom, the data in the schema was truncated again.
I have no clue how or why this happens. Any advice on where to look next time before relaunching the server to pinpoint where that truncate command is fired from?
PS:
The tar file is a compression of "main" folder inside .../postgresql/11/ folder.
Thanks in advance.

Related

DB2 SQL3706N A disk full error was encountered

I have 600+ files to load into a DB2 database, version 10.5.9. Each file is nearly 200 MB. I have a batch script that loads each file in a loop.
My disk "/mnt/blumeta0/db2/copy" is 16 GB.
If I run the load in NONRECOVERABLE mode it works, but I can't do that in my prod database.
I tried db2 connect reset and db2 terminate after each file was loaded, but that did not work.
I manually cleaned up /mnt/blumeta0/db2/copy, but the total size of all files is more than 16 GB, so I got the same error.
I cannot clean the folder from the script because the cleanup can only be done by a superuser.
db2 "LOAD FROM $i OF DEL INSERT INTO <table_name>"
SQL3706N A disk full error was encountered on "/mnt/blumeta0/db2/copy".
How does the DB2 server clean the copy folder? Is there any other alternative I can try?
You mentioned that the Load succeeds in NONRECOVERABLE mode but otherwise fails with the error SQL3706N A disk full error was encountered on "/mnt/blumeta0/db2/copy".
I'm guessing that the Load is being performed using the COPY YES option. Since the Load command that you pasted does not show the COPY YES option, I'm guessing that you have a special configuration setting enabled that forces Load operations to use COPY YES in order to prevent the table from becoming inaccessible in a rollforward recovery event or HADR standby takeover event. The name of this configuration setting (registry variable) is "DB2_LOAD_COPY_NO_OVERRIDE".
When the Load is performed with COPY YES, a copy of the table pages/extents that were generated during the Load operation is written into a copy image file.
I suspect that you have the registry variable "DB2_LOAD_COPY_NO_OVERRIDE=COPY YES /mnt/blumeta0/db2/copy" configured (you can use db2set -all on the database server to display all configured registry variables). If so, the copy image files are being stored in this path, which at 16GB appears to be too small to contain them all.
You can consider moving this path to a location with more disk space. However, the path must always be accessible in the event of a database rollforward recovery or HADR standby takeover; otherwise the table will not be accessible after such an event.
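As a rough sanity check of the sizing argument, the figures from the question can be compared with the 16 GB path. This is only a sketch: it assumes each copy image is about as large as its input file, which DB2 does not guarantee, and the variable names are made up for illustration.

```shell
#!/bin/bash
# Rough estimate only: assumes each load copy image is about as large as
# its input file, which DB2 does not guarantee.
files=600          # number of input files (from the question)
mb_per_file=200    # approximate size of each file in MB (from the question)
disk_gb=16         # size of /mnt/blumeta0/db2/copy (from the question)

needed_gb=$(( files * mb_per_file / 1024 ))   # integer division
echo "estimated copy images: ${needed_gb} GB, available: ${disk_gb} GB"

if [ "$needed_gb" -gt "$disk_gb" ]; then
    echo "copy path is too small to hold every copy image"
fi
```

Under that assumption the copy images need several times the space the path offers, which is consistent with the load failing partway through the batch.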

PostgreSQL data recovery after migration fail

My team and I are not professional database administrators, and we were trying to copy our database from one machine to another for backup purposes. Unfortunately, we made the mistake of moving the data directory instead of copying it. Something unexpected happened during the process and the data was not moved completely. Right now we are missing the data for the past month, and both machines are in the same state, i.e. neither the original nor the copy has the data for that month. Is there any possibility of recovering this lost data, and if so, how do we go about it? The PostgreSQL version is 9.4, running on CentOS 7.

Postgres Continuous Archiving and Point-in-Time Recovery (PITR)

I am trying to setup Continuous Archiving and Point-in-Time Recovery (PITR) in Postgres. When I go through the documentation it says:
The archive command should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory).
But I see that the same WAL file changes multiple times when I open a connection and make changes from time to time. For example, when I first connect to the database and make some changes (like deleting or inserting some rows), it creates a WAL file named 000000010000000000000090 and my archive_command is run immediately. My archive_command is
test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f
This is based on the documentation: it checks whether the file already exists in the archive directory and copies it only if it does not. So the first time the condition passes and the file is copied, but when I make more changes over the same connection (the same happens when I reconnect from the same PC) the original WAL file changes. The next copy then fails because the file already exists.
If this is allowed to happen, we may lose some changes in the backup. Does anyone know of a solution that creates a new file for every change instead of modifying the old file?
I am using Postgres version 10.2 on my local computer (Mac).
Does that really happen to you? Because it shouldn't.
PostgreSQL writes transaction logs in “WAL files” (WAL for Write Ahead Log) of 16MB size.
Whenever a WAL file is full, the log is switched to a new WAL file, and the old WAL file is archived with archive_command.
If archive_command completes with an exit status of 0 (success), the WAL file is recycled, otherwise archiving is retried until it succeeds. Failures will be logged.
Anyway, as long as there are no errors, each WAL file will only be archived once.
The behavior you describe shouldn't happen.
Check the PostgreSQL log to see if any errors were reported from archive_command. Once you fix the error condition, normal operation will resume.
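To see how the exit status of that archive_command behaves, here is a minimal sketch that runs the same test-and-copy logic against throwaway directories instead of a real server. The function name archive_wal and the temporary paths are made up for illustration; only the test/cp line comes from the question.

```shell
#!/bin/bash
# Hypothetical helper that mirrors the archive_command from the question:
#   test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f
# $1 plays the role of %p (path to the WAL file), $2 is the archive directory.
archive_wal() {
    local src=$1 dir=$2
    local name
    name=$(basename "$src")
    test ! -f "$dir/$name" && cp "$src" "$dir/$name"
}

# Demo with throwaway directories instead of a real server.
work=$(mktemp -d)
mkdir "$work/pg_wal" "$work/archive"
echo "wal contents" > "$work/pg_wal/000000010000000000000090"

# First attempt: the file is not in the archive yet, so cp runs.
first=0
archive_wal "$work/pg_wal/000000010000000000000090" "$work/archive" || first=$?

# Second attempt for the same file name: the test fails, nothing is copied.
second=0
archive_wal "$work/pg_wal/000000010000000000000090" "$work/archive" || second=$?

echo "first attempt exit status: $first, second attempt exit status: $second"
```

The second call returns a non-zero status, which the server would treat as a failed archive attempt and log. That is why repeated attempts for the same file name should show up in the PostgreSQL log if they really occur.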

How can I force drop a broken Postgres database?

I have a database that seems to be broken for some reason. It's a development db for rails so I don't have a backup but I do need to continue development. I tried to just drop it but that's not working.
$ dropdb "database-name"
dropdb: database removal failed: ERROR: could not open file "global/2964": No such file or directory
Thanks in advance for any help!
There's more wrong here than a "broken" database. Something is badly wrong with your PostgreSQL data directory.
global/2964 looks like it's pg_catalog.pg_db_role_setting, which stores ALTER DATABASE ... SET ... and ALTER ROLE ... SET ... settings. This is not database-specific; it's a global table.
If you have missing files in your data directory your whole PostgreSQL data directory is probably damaged. You should back up what you can, if there's anything you care about, then rename or delete the damaged data directory and initdb a new blank one.
You won't be able to DROP this database (or do much else) because PostgreSQL can't load the files for the pg_db_role_setting table, but it needs to delete entries referring to the dropped database from there.
As for how this happened:
Have you ever run with fsync = off in postgresql.conf?
Do you have SSD storage? If so, have you had any recent sudden power loss?
Have you ever done any direct modifications of any kind inside the PostgreSQL data directory?
Is the PostgreSQL data directory on external storage that might have been suddenly removed?
Have you ever deleted postmaster.pid ?
See also https://wiki.postgresql.org/wiki/Corruption

Which Postgresql WAL files can I safely remove from the WAL archive folder

Current situation
So I have WAL archiving set up to an independent internal harddrive on a data logging computer running Postgres. The harddrive containing the WAL archives is filling up and I'd like to remove and archive all the WAL archive files, including the initial base backup, to external backup drives.
The directory structure is like:
D:/WALBACKUP/ which is the parent folder for all the WAL files (00000110000.CA00000004 etc)
D:/WALBACKUP/BASEBACKUP/ which holds the .tar of the initial base backup
The question I have then is:
Can I safely move literally every single WAL file except the current WAL archive file, (000000000001.CA0000.. and so on), including the base backup, and move them to another hdd. (Note that the database is live and receiving data)
cheers!
WAL archives
You can use the pg_archivecleanup command to remove WAL from an archive (not pg_xlog) that's not required by a given base backup.
In general I suggest using PgBarman or a similar tool to automate your base backups and WAL retention though. It's easier and less error prone.
pg_xlog
Never remove WAL from pg_xlog manually. If you have too much WAL then:
your wal_keep_segments setting is keeping WAL around;
you have archive_mode on and archive_command set but it isn't working correctly (check the logs);
your checkpoint_segments is ridiculously high so you're just generating too much WAL; or
you have a replication slot (see the pg_replication_slots view) that's preventing the removal of WAL.
You should fix the problem that's causing WAL to be retained. If nothing seems to have happened after changing a setting run a manual CHECKPOINT command.
If you have an offline server and need to remove WAL to start it, you can use pg_archivecleanup if you must. It knows how to remove only WAL that isn't needed by the server itself ... but it might break your archive-based backups, streaming replicas, etc. So don't use it unless you must.
WAL files are incremental, so the simple answer is: You cannot throw any files out. The solution is to make a new base backup and then all previous WALs can be deleted.
The WAL files contain individual statements that modify tables so if you throw some older WALs out, then the recovery process will fail (it will not silently skip missing WAL files) because the state of the database cannot be restored reliably. You can move the WAL files to some other location without upsetting the WAL process but then you'd have to make all WAL files available again from a single location if you ever need to recover your database from some point in the past; if you are running out of disk space then that may mean recovering from some location where you have enough space to store the base backup and all WAL files. The main issue here is if you can do that fast enough to restore a full database after an incident.
Another issue is that if you cannot identify where/when a problem occurred that needs to be corrected your only option is to start with the base backup and then replay all the WAL files. This procedure is not difficult, but if you have an old base backup and many WAL files to process, this simply takes a lot of time.
The best approach for your case, in general, is to make a new base backup every x months and collect WALs with that base backup. After every new base backup you can delete the old base backup and its subsequent WALs or move them to cheap offline storage (DVD, tape, etc). In the case of a major incident you can quickly restore the database to a known correct state from the recent base backup and the relatively few WAL files collected since then.
The solution we went for is to run pg_basebackup every night. This creates a base backup, and afterwards we use pg_archivecleanup to clean up all the "old" WAL files from before that base backup, using something like
"%POSTGRES_INSTALLDIR%\bin\pg_archivecleanup" -d %WAL_backup_dir% %newestBaseFile%
Fortunately, we never had to recover yet, but it should work in theory.
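To illustrate which archived files such a cleanup would target, here is a small sketch that lists the segments sorting before a given one. The helper wal_older_than and the file names are hypothetical, and the comparison is simplified to whole file names, which assumes a single timeline.

```shell
#!/bin/bash
# Hypothetical helper: list the WAL segments in an archive directory that
# sort before the segment a base backup starts at. Simplification: whole
# file names are compared, which assumes a single timeline.
wal_older_than() {
    local dir=$1 keep=$2
    local f name
    for f in "$dir"/*; do
        name=$(basename "$f")
        # Only consider plain 24-hex-digit segment names.
        [[ $name =~ ^[0-9A-F]{24}$ ]] || continue
        [[ $name < $keep ]] && echo "$name"
    done
    return 0
}

# Demo with empty placeholder files in a throwaway directory.
archive=$(mktemp -d)
touch "$archive/00000001000000000000000E" \
      "$archive/00000001000000000000000F" \
      "$archive/000000010000000000000010" \
      "$archive/000000010000000000000011"

# Segments before ...10 are candidates for removal; ...10 and later stay.
removable=$(wal_older_than "$archive" 000000010000000000000010)
echo "$removable"
```

On a real archive, pg_archivecleanup with the -n option performs a dry run and prints the files it would remove, which is a safer way to check before deleting anything.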
In case someone found this while searching for how to safely clean up the WAL directory in a replication setup, consider the scenario where there are leftovers from offline replicas: unused replication slots that wait for the replica to come back online and therefore keep a lot of WAL files on the master DB.
In our case a replica went down due to a hardware failure. We had to recreate it along with its replication slot on the master DB, but forgot to get rid of the previously used slot. Once we dropped it, PostgreSQL removed the unused WAL files and all was good.
You can add a script to automatically clean up old pg_wal files. This one works with PostgreSQL 11. For another PostgreSQL version, simply replace "/usr/pgsql-11/bin/pg_archivecleanup" with /usr/pgsql-12/bin/pg_archivecleanup (or 13, and so on).
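#!/bin/bash
# Read the WAL segment that holds the latest checkpoint's REDO record from the control file.
redo_wal=$(/usr/pgsql-11/bin/pg_controldata -D /var/lib/pgsql/11/data/ | awk '/REDO WAL file/ {print $6}')
# Remove every segment in pg_wal that sorts before it (-d prints debug output).
/usr/pgsql-11/bin/pg_archivecleanup -d /var/lib/pgsql/11/data/pg_wal "$redo_wal"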