I need your help with this one: I'm on db2 luw 10.5.8 and have set logarchmeth1 to tsm. I noticed my logpath is getting full (only 10% of available space left) and so I decided to use the db2 prune history command to free up some space.
Here is the command I used:
db2 prune history 20170813 and delete
I was under the impression that this command would free up some space by deleting all log files prior to or equal to the specified date. Unfortunately it didn't work that way. The available free space in my logpath remained unchanged at 10%.
I also tried the deprecated prune logfile command (same result):
db2 prune logfile prior to S0000100.LOG
What can I do to free up some space in my logpath?
Any ideas?
Did you verify that the archival to TSM is completing successfully? Also you may need to manually archive any log files that completed before you switched on TSM.
You can see whether archival to TSM is completing in db2diag.log and in the DB_HISTORY administrative view. If you set AUTO_DEL_REC_OBJ to ON, DB2 should do the prune for you at the end of a database backup, according to the related retention settings.
If there is a TSM outage, the log files waiting to be archived will stay in the active log path (unless you additionally define FAILARCHPATH) until either the active log filesystem fills up or TSM resumes service, whichever happens first. DB2 will keep retrying TSM and automatically resume archival when TSM responds.
Make sure your active log filesystem (and/or the FAILARCHPATH filesystem) is large enough to tolerate a TSM outage of your chosen duration.
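A rough sketch of those checks from the command line (the database name SAMPLE is just a placeholder; adjust names and retention settings to your system):
# Check the archiving and auto-delete settings
db2 get db cfg for SAMPLE | grep -iE "LOGARCHMETH1|FAILARCHPATH|AUTO_DEL_REC_OBJ"
# Look at the most recent archive attempts recorded in the recovery history ('X' = archive log)
db2 "select start_time, firstlog, location, sqlcode from sysibmadm.db_history where operation = 'X' order by start_time desc fetch first 10 rows only"
# Force DB2 to archive the current active log right now
db2 archive log for database SAMPLE
# Let DB2 delete archived logs and old backup images according to your retention settings
db2 update db cfg for SAMPLE using AUTO_DEL_REC_OBJ ON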
We have a custom backup solution utilizing the ibx controls in Delphi to perform nightly automatic backups. As part of our current validation for a successful backup, we read the output logs generated by the backup looking for the "closing file, committing and finishing" verbiage that's last in the log file. Additionally we perform a full restore to a separate area to ensure the ibk file is valid. That's turning out to be problematic in terms of available drive space so looking for other ideas to make sure the backup is successful.
How else might we ensure that our ibk file is valid?
Jeff,
Not sure what your database size or backup file size is, and if they are too big for the remaining disk space. Can you share the database and backup size details?
Older InterBase (2017 and earlier) had a way for the command line tool, gbak, to pipe the output from the backup to another gbak process restoring from the backup. This would allow you to save the disk space on the backup file. But, since you are using the IBX backup/restore service, this is not possible. Also, InterBase 2020 has a different backup format which requires random (not sequential) write access to the backup file, thereby not allowing any pipe output even via the 'gbak' command line tool.
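For reference, on those older versions the piping looked roughly like this. This is a sketch only; the stdout/stdin pseudo-file names, credentials and database names here are assumptions and may differ on your InterBase release:
# Back up straight into a restore, never writing the backup file to disk
gbak -b -user SYSDBA -password masterkey mydb.ib stdout | gbak -c -user SYSDBA -password masterkey stdin verify_restore.ib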
Here are a couple of ways to "reduce" the disk storage requirements that may work for you.
**Backup file**
You can have the InterBase backup service (from your application) store the target backup file in an external storage medium (HDD, USB stick etc., or a SAN disk/network file share). The backup/restore service can read/write backup file(s) from network shares/external medium.
**Restored database**
When restoring the database you can use the service parameter option UseAllSpace (http://docwiki.embarcadero.com/Libraries/Sydney/en/IBX.IBServices.TRestoreOptions), equivalent to gbak option "-use_all_space". This will save you about 20% space on restored data pages.
Turn off index creation, thereby reducing page consumption (possibly quite a bit, depending on your index definitions). But you will lose index validation because of this. This is the "DeactivateIndexes" option (gbak option "-inactive") on the same page linked above.
Restore the database to a remote InterBase server with its own storage medium, or, to an attached USB stick or SAN disk. Since you are using the restored database only for validating the backup file, you can have this restored database on a slower I/O medium or a slower server over the network.
I am trying to setup Continuous Archiving and Point-in-Time Recovery (PITR) in Postgres. When I go through the documentation it says:
The archive command should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory).
But I see that the same WAL file is changing multiple times when I open a connection and make some changes from time to time. So for example, when I first connect to the database and make some changes (like deleting or inserting some rows), it creates a WAL file named 000000010000000000000090 and my archive_command is run immediately. My archive_command is
test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f
This is based on the documentation: it checks whether the file already exists in the archive directory and copies it only if it doesn't. So the first time the condition passes and the file is copied, but when I make some more changes over the same connection (I even have the same issue when I reconnect from the same PC) the original WAL file is modified again. The next time, the copy doesn't happen because the file already exists.
If this is allowed to happen, we may lose some changes in the backup. Does anyone know of a solution, so that a new file is created for every change instead of the old file being modified?
I am using Postgres version 10.2 on my local computer (Mac).
Does that really happen to you? Because it shouldn't.
PostgreSQL writes transaction logs in “WAL files” (WAL for Write Ahead Log) of 16MB size.
Whenever a WAL file is full, the log is switched to a new WAL file, and the old WAL file is archived with archive_command.
If archive_command completes with an exit status of 0 (success), the WAL file is recycled, otherwise archiving is retried until it succeeds. Failures will be logged.
Anyway, as long as there are no errors, each WAL file will only be archived once.
The behavior you describe shouldn't happen.
Check the PostgreSQL log to see if there were errors reported from archive_command. If you fix the error condition, normal operation will be resumed.
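If you want to watch it happen, you can force a segment switch and then look at the archiver statistics and the server log. A sketch for PostgreSQL 10; the log path below is an assumption and only applies if logging_collector is on with the default log_directory:
# Close the current WAL segment so it is handed to archive_command
psql -c "SELECT pg_switch_wal();"
# See how many segments have been archived and whether any attempts failed
psql -c "SELECT archived_count, last_archived_wal, failed_count, last_failed_wal FROM pg_stat_archiver;"
# Look for archiving errors in the server log (path is an assumption)
grep -i "archive command failed" "$(psql -At -c 'SHOW data_directory')"/log/*.log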
Trying to do a DB2 import as part of a system copy and the transaction logs filled up. The import was cancelled, a transaction log backup ran, and the number of logs was increased to approximately 90% of the available disk (previously 70%).
Restarted the DB and kicked off the import again, but now it errors due to the tablespace state - running db2 list tablespaces show detail shows I have 4 tablespaces in Backup Pending state.
So I tried db2 backup database <SID> tablespace <SID>#BTABI online but I get the error:
SQL2059W A device full warning was encountered on device "/db2/db2". Do you want to continue(c), terminate this device only(d), abort the utility(t) ? (c/d/t) t
No option works but to terminate.
The thing is, the device isn't full. There's no activity on the DB, running db2 list applications gives:
SQL1611W No data was returned by Database System Monitor.
Running db2 "select log_utilization_percent,dbpartitionnum from sysibmadm.log_utilization order by 2" to show the log utilization returns 0.
There's no logs in use. The filesystem has space free. I even tried reducing the number of logs again to make sure but get the same issue.
I tried db2 "alter tablespace <SID>#BTABI switch online" instead and although this returns a 'success' statement it doesn't actually do anything - my tablespaces are still in Backup pending?
Any ideas please
You're trying to write the backup images to the /db2/db2 file system, which doesn't have enough space to hold the backup image(s).
Note: When you execute BACKUP DATABASE as in your example above without specifying where to send the backup (i.e. you don't use the to /dir/ectory or another option like use TSM), DB2 will write the backup image to the current directory. Make sure you specify where to store the backup image (and that it has enough free space to hold the backup image). If you don't care about recoverability and are just trying to get the table space out of backup pending state, you can specify /dev/null as your location as #mustaccio suggests in the comments above.
Also: You may want to look at the COMMITCOUNT option for the import utility so you're not trying to insert all data in a single massive transaction.
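For example (the target path, SID and table names below are placeholders, not taken from your system):
# Write the backup image somewhere with enough free space...
db2 "backup database SID tablespace (SID#BTABI) online to /backupfs"
# ...or to /dev/null if you only need to clear the backup pending state
db2 "backup database SID tablespace (SID#BTABI) online to /dev/null"
# Commit the import in smaller chunks so a single huge transaction doesn't fill the logs
db2 "import from export.del of del commitcount 50000 insert into SAPSR3.SOMETABLE"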
As per above comments - just kept running the import, resetting the 'pending load' status each time with:
load from /dev/null of del terminate into SAPECD.
A few packages fail each time but the rest process. Letting finish, resetting again and restarting the import gets through a little more each time.
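(If it's unclear which tables are still blocked, something like the following can show the table state; the table name is just an example.)
# Show whether a table is in Load Pending / Not Load Restartable state
db2 "load query table SAPECD.SOMETABLE"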
Current situation
So I have WAL archiving set up to an independent internal harddrive on a data logging computer running Postgres. The harddrive containing the WAL archives is filling up and I'd like to remove and archive all the WAL archive files, including the initial base backup, to external backup drives.
The directory structure is like:
D:/WALBACKUP/ which is the parent folder for all the WAL files (00000110000.CA00000004 etc)
D:/WALBACKUP/BASEBACKUP/ which holds the .tar of the initial base backup
The question I have then is:
Can I safely move literally every single WAL file except the current one (000000000001.CA0000.. and so on), including the base backup, to another HDD? (Note that the database is live and receiving data.)
cheers!
WAL archives
You can use the pg_archivecleanup command to remove WAL from an archive (not pg_xlog) that's not required by a given base backup.
In general I suggest using PgBarman or a similar tool to automate your base backups and WAL retention though. It's easier and less error prone.
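For example, against the layout in the question (a sketch; the segment name is a placeholder for the oldest WAL your newest base backup still needs, and you can instead pass the .backup label file that the base backup left in the archive):
# Dry run: show which archived segments would be removed
pg_archivecleanup -n "D:/WALBACKUP" 000000010000000000000004
# Actually remove them (with debug output)
pg_archivecleanup -d "D:/WALBACKUP" 000000010000000000000004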
pg_xlog
Never remove WAL from pg_xlog manually. If you have too much WAL then:
your wal_keep_segments setting is keeping WAL around;
you have archive_mode on and archive_command set but it isn't working correctly (check the logs);
your checkpoint_segments is ridiculously high, so WAL is kept around much longer before being recycled; or
you have a replication slot (see the pg_replication_slots view) that's preventing the removal of WAL.
You should fix the problem that's causing WAL to be retained. If nothing seems to have happened after changing a setting run a manual CHECKPOINT command.
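To check the points above, a quick sketch (run against the server in question):
# See which retention-related settings apply
psql -c "SHOW wal_keep_segments;"
psql -c "SHOW archive_mode;"
psql -c "SHOW archive_command;"
# After fixing a setting, force a checkpoint so obsolete segments can be recycled
psql -c "CHECKPOINT;"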
If you have an offline server and need to remove WAL to start it, you can use pg_archivecleanup if you must. It knows how to remove only WAL that isn't needed by the server itself ... but it might break your archive-based backups, streaming replicas, etc. So don't use it unless you must.
WAL files are incremental, so the simple answer is: You cannot throw any files out. The solution is to make a new base backup and then all previous WALs can be deleted.
The WAL files contain the individual changes made to the database, so if you throw some older WALs out, the recovery process will fail (it will not silently skip missing WAL files) because the state of the database cannot be restored reliably. You can move the WAL files to some other location without upsetting the WAL process, but then you'd have to make all WAL files available again from a single location if you ever need to recover your database from some point in the past; if you are running out of disk space, that may mean recovering from some location where you have enough space to store the base backup and all WAL files. The main issue here is whether you can do that fast enough to restore a full database after an incident.
Another issue is that if you cannot identify where/when a problem occurred that needs to be corrected your only option is to start with the base backup and then replay all the WAL files. This procedure is not difficult, but if you have an old base backup and many WAL files to process, this simply takes a lot of time.
The best approach for your case, in general, is to make a new base backup every x months and collect WALs with that base backup. After every new base backup you can delete the old base backup and its subsequent WALs or move them to cheap offline storage (DVD, tape, etc). In the case of a major incident you can quickly restore the database to a known correct state from the recent base backup and the relatively few WAL files collected since then.
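A minimal sketch of that rotation (the directory and options are examples, not a recommendation for your exact setup):
# Take a fresh base backup in tar format, compressed, including the WAL needed to make it consistent
pg_basebackup -D /archive/base_$(date +%Y%m%d) -F tar -z -X fetch
# Everything in the WAL archive older than this backup can now be moved to offline storage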
A solution that we went for is executing pg_basebackup every night. This creates a base backup, and later on we can use pg_archivecleanup to clean up all the "old" WAL files from before that base backup, using something like
"%POSTGRES_INSTALLDIR%\bin\pg_archivecleanup" -d %WAL_backup_dir% %newestBaseFile%
Fortunately, we never had to recover yet, but it should work in theory.
In case someone finds this by searching for how to safely clean up the WAL directory under a replication architecture, consider the scenario where there are leftovers from offline replicas: unused replication slots waiting for a replica to come back online and thus keeping a lot of WAL archives on the master DB.
In our case we had an issue with a replica going down due to hardware failure; we had to recreate it along with its replication slot on the master DB, but forgot to get rid of the previously used one. Once we cleared that out, PostgreSQL got rid of the unused WALs and all was good.
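For anyone else in the same spot, finding and dropping the stale slot looks like this (the slot name is an example):
# List slots; an inactive slot with an old restart_lsn is what pins the WAL
psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"
# Drop the slot that no replica will ever use again
psql -c "SELECT pg_drop_replication_slot('old_replica_slot');"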
You can add the script below to automatically clean up or remove pg_wal files. This works on PostgreSQL 11. If you want to use another version, simply replace "/usr/pgsql-11/bin/pg_archivecleanup" with /usr/pgsql-12/bin/pg_archivecleanup (or 13, as you wish).
#!/bin/bash
# Capture the cluster's control data, which includes the latest checkpoint's REDO WAL file
/usr/pgsql-11/bin/pg_controldata -D /var/lib/pgsql/11/data/ > pgwalfile.txt
# Remove WAL segments older than the checkpoint's REDO WAL file reported above
/usr/pgsql-11/bin/pg_archivecleanup -d /var/lib/pgsql/11/data/pg_wal "$(grep "Latest checkpoint's REDO WAL file" pgwalfile.txt | awk '{print $6}')"
I have looked at the postgres documentation and the synopsis below is given:
pg_resetxlog [-f] [-n] [-o oid] [-x xid] [-e xid_epoch] [-m mxid] [-O mxoff] [-l timelineid,fileid,seg] datadir
But at no point in the documentation do they explain what the datadir is.
Is it the %postgres-path%/9.0/data or could it be %postgres-path%/9.0/data/pg_xlog ?
Also, if I want to change my xlog directory, can I simply move the items in my current pg_xlog directory and run the command to point to another directory? (Assume my current pg_xlog directory is in /data1/postgres/data/pg_xlog AND the directory I want it the logs to go to is: /data2/pg_xlog)
Would the following command achieve what I've just described?
mv /data1/postgres/data/pg_xlog /data2/pg_xlog
pg_resetxlog /data2
pg_resetxlog is a tool of last resort for getting your database running again after:
You deleted files you shouldn't have from pg_xlog;
You restored a file system level backup that omitted the pg_xlog directory due to a backup system configuration mistake (this happens more than you'd think, people think "it has log in the name so it must be unimportant; I'll leave it out of the backups").
File-system corruption due to a hardware fault or hard drive failure damaged your data directory; or potentially even
a PostgreSQL bug or operating system bug damaged the write-ahead logs (exceedingly rare).
As the manual says:
pg_resetxlog clears the write-ahead log (WAL) [...]. This function is sometimes needed if these files have become corrupted. It should be used only as a last resort, when the server will not start due to such corruption.
Do not run pg_resetxlog unless you know exactly what you are doing and why. If you are unsure, ask on the pgsql-general mailing list or on https://dba.stackexchange.com/.
pg_resetxlog may corrupt your database, as the documentation warns. If you have to use it, you should REINDEX, dump your database(s), re-initdb, and reload your databases. Do not just continue using the damaged cluster. As per the documentation:
After running this command, it should be possible to start the server, but bear in mind that the database might contain inconsistent data due to partially-committed transactions. You should immediately dump your data, run initdb, and reload. After reload, check for inconsistencies and repair as needed.
If you simply want to move your write-ahead log directory to another location, you should:
Stop PostgreSQL
Move pg_xlog
Add a symbolic link from the old location to the new location
Start PostgreSQL
Or, as the documentation says:
It is advantageous if the log is located on a different disk from the main database files. This can be achieved by moving the pg_xlog directory to another location (while the server is shut down, of course) and creating a symbolic link from the original location in the main data directory to the new location.
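In shell terms, the move looks roughly like this (paths are taken from the question as examples; use your real data directory and target disk):
# Stop the server before touching pg_xlog
pg_ctl -D /data1/postgres/data stop
# Move the WAL directory to the new disk and leave a symlink behind
mv /data1/postgres/data/pg_xlog /data2/pg_xlog
ln -s /data2/pg_xlog /data1/postgres/data/pg_xlog
# Start the server again
pg_ctl -D /data1/postgres/data start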
If PostgreSQL fails to start, you've done something wrong. Do not use pg_resetxlog to "fix" it. Undo your changes and work out what you did wrong.
Move the contents of your pg_xlog directory (with the server stopped, working from inside the data directory) to the desired location like '/home/foo/pg_xlog'
mv pg_xlog/* /home/foo/pg_xlog
Delete the pg_xlog directory
rm -rf pg_xlog
Create a soft-link of pg_xlog
ln -s /home/foo/pg_xlog pg_xlog
Verify the link
ls -lrt pg_xlog
Note: pg_resetxlog is not the right tool for moving pg_xlog; please read
http://www.postgresql.org/docs/9.2/static/app-pgresetxlog.html
The data directory corresponds to the data_directory entry in the postgresql.conf file, or the PGDATA environment variable, and it can also be queried live in SQL with the SHOW data_directory statement. It does not point to the pg_xlog directory, but one level above.
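For example:
# Ask the running server where its data directory is; that path is what pg_resetxlog expects
psql -At -c "SHOW data_directory;"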
To change the location of the WAL files, the PG server must be shut down, the pg_xlog directory and its contents moved to the new location, a symbolic link created from the old location to the new location, and the server restarted. pg_resetxlog should not be used for this, as it may discard the latest transactions (this tool is typically used in crash recovery situations when all else fails).
You should never manually touch the WAL files, that is perfectly clear.
If there are dangling files in the pg_xlog directory, that is, segments with a corresponding file ending in .done in the archive_status sub-folder that still need to be cleaned up, that can be accomplished with the SQL command
CHECKPOINT;
which forces a checkpoint; checkpoints include recycling or removing WAL segment files that are no longer needed.
See the documentation for 9.3; the command exists in all current versions of PostgreSQL.