Modify data in a database backup file - progress-4gl

I have a .bak file of a database that contains PHI located on a server in a PHI environment. I have a .r script that will anonymize and remove PHI data in the database and any other connections strings that point to client systems. Is there a way to run this script (or another type of script) to modify the .bak file in the PHI environment before I move it to a non-PHI compliant environment to be restored?

I am guessing that a ".bak file" is a backup (there is no standard Progress-wide naming convention but that is a common usage). If so then, no, there is no way to directly manipulate the backup without restoring it.
You could restore it in a safe environment, modify the contents of the restored db and then make a new backup of the anonymized database. (If I were to do something like that I would use a different extension, perhaps .BKX, for the scrubbed backup so that it is obvious to people which one is which - that would help to reduce the chances of accidental release...)

Related

Interbase Backup Validation

We have a custom backup solution utilizing the ibx controls in Delphi to perform nightly automatic backups. As part of our current validation for a successful backup, we read the output logs generated by the backup looking for the "closing file, committing and finishing" verbiage that's last in the log file. Additionally we perform a full restore to a separate area to ensure the ibk file is valid. That's turning out to be problematic in terms of available drive space so looking for other ideas to make sure the backup is successful.
How else might we ensure that our ibk file is valid?
Jeff,
Not sure what your database size or backup file size is, and if they are too big for the remaining disk space. Can you share the database and backup size details?
Older InterBase (2017 and earlier) had a way for the command line tool, gbak, to pipe the output from the backup to another gbak process restoring from the backup. This would allow you to save the disk space on the backup file. But, since you are using the IBX backup/restore service, this is not possible. Also, InterBase 2020 has a different backup format which requires random (not sequential) write access to the backup file, thereby not allowing any pipe output even via the 'gbak' command line tool.
Here are a couple of ways to "reduce" the disk storage requirements that may work for you.
** Backup file **
You can have the InterBase backup service (from your application) store the target backup file in an external storage medium (HDD, USB stick etc., or a SAN disk/network file share). The backup/restore service can read/write backup file(s) from network shares/external medium.
** Restored database **
When restoring the database you can use the service parameter option UseAllSpace (http://docwiki.embarcadero.com/Libraries/Sydney/en/IBX.IBServices.TRestoreOptions), equivalent to gbak option "-use_all_space". This will save you about 20% space on restored data pages.
Turn off index creation, thereby reducing page consumption (possibly quite a bit depending on your index definitions). But, you will lose index validation because of this. "DeactivateIndexes" option (gbak option "-inactive") in the same page above.
Restore the database to a remote InterBase server with its own storage medium, or, to an attached USB stick or SAN disk. Since you are using the restored database only for validating the backup file, you can have this restored database on a slower I/O medium or a slower server over the network.

Change path of Firebird Secondary database files

I have created a Firebird multi-file database
Main Database file D:\Database\MainDB.fdb
Secondary files (240 Files) located under D:\Database\DBFiles\Data001.fdb to D:\Database\DBFiles\Data240.fdb
When copy database to another location and trying to open it Firebird doesn't locate the files if they are not in D:\ partition
I want Firebird to locate the secondary files under Database\DBFiels folder at the new path.
So if I copy the database to C:\Database\MainDB.fdb
Firebird would open Data001.fdb in new path like C:\Database\DBFiles instead of old path in D:\Database\DBFiles where they were initially created
Can that be done with Firebird? if not, then how it should be done?
Update:
Finally I found out it's not possiable to change Firebird database secondary files usign Firebird.
but I found this Firebird FAQ mention GLINK tool but It doesn't support Firebird 3.x so I didn't test it, and It's not recommended to use it even with supported versions of Firebird.
Done what exactly?
UPD. I edited the very vague original question to make clear WHAT the topic starter wants.
You can not reliably "copy files with Firebird" - Firebird is not files copying tool. You can to a degree use EXTERNAL TABLE for raw files access, but very limited and not upon the database itself.
It is dangerous practice to "copy databases" while Firebird is working, because you would only copy part of the data. The recently updated data that is in memory cache but did not yet made it on disk would be lost. The database file would be inconsistent with some data updated and some not yet. When you "copy database files" you have first to shutdown either those databases or even the whole Firebird server.
Firebird has it's own tools for moving databases around - and those are called backup/restore tools. Maybe what you need is nbackup tool, if gbak is too slow for you.
Finally, you can list files that comprise the database. You can do it via gstat utility or via "Services API" it uses. You also can select from RDB$FILES system table. However what would you do after you did it? The very access to the database makes it badly suited for consequent copying (#2). You would perhaps need to shutdown database, turn it to read-only AND single-user state, and only then attach to it and read RDB$FILES. And after copying done - you would have to de-shutdown the database. Kinda much more complex than nbackup.
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/gstat-example-header.html
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/gfix-dbstartstop.html
https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref-appx04-files.html
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/gbak.html
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/nbackup.html

Postgres equivalent to Oracle's "DIRECTORY" objects

Is it possible to create "DIRECTORY" object in Postgres?
If not can some help me with a solution how implement it on PostgreSQL.
Not the best option, but you could use:
COPY (select 1) TO PROGRAM 'mkdir --mode=777 -p /path/to/your/directory/'
Note that only the last part of directory get the permissions set in mode.
There is no equivalent concept to an "Oracle directory" in Postgres.
The alternatives depend on why the "Oracle directory" is needed.
If the directory is needed to read and write files on the database server, then this can be done through Generic File Access Functions. Access to those functions is restricted to superusers (details in the linked section of the manual). If regular users should be able to use them, the best thing would be to create wrapper functions and then grant execute on those functions to the users in question.
For security reasons, only directories inside the database cluster can be accessed.
But it's possible to create symlinks inside the data directory that point to directories outside the data directory. Access privileges on those directories need to be properly setup for the postgres operating system user (the one under which the postgres process is started)
If the directory is needed to access e.g. CSV files through Oracle's external tables, then there is no need for a "directory". The file FDW foreign data wrapper, can access files outside the data directory (provided access privileges have been setup correctly on the file system level).
The question doesn't even make sense really. PostgreSQL is a database management system. It doesn't have files and directories.
The closest parallel I can think of is schemas - see CREATE SCHEMA.
Now, if you want to use COPY to write output to the server's disk and want to create a directory to put that output in... then no, there's nothing like that. But you can use PL/Perlu or PL/Pythonu to do it easily enough.

Backing up the DB vs. backing up the VM

We're serving a Django/Postgres site running on a VM hypervisor. We're now trying to figure out our back up strategy and have two probable options:
Back up the DB directly using pg_dump
Back up the VM directly by copying the VM image
I'm with the latter as I think, I could simply back up everything that has to do with the site. I'm not sure whether I have to shut down the VM for this though.
What is a better and more recommended way of backing up a DB? Are there any reasons for not using the VM backup?
Thanks
The question basically boils down to, can you consider a hot copy of PostgreSQL's data files a backup?
The answer is: not really. PostgreSQL tries very hard through the use of WAL to ensure that its files are in a consistent state all the time and that it can survive a power failure, but starting it up from a copy of these files puts PostgreSQL into recovery mode. If the backup happened at the wrong second and PostgreSQL can't recover from the state of these files, your backup is useless. You don't want your backup/restore mechanism to depend on the recovery mechanism (unless you're dealing with "crash only" software, which PostgreSQL is not).
The probability of PostgreSQL not being able to recover from these files is not high, but it's not zero either. The probability of PostgreSQL not being able to load an SQL dump that it made, on the other hand, is zero. I prefer backup choices with lower probabilities of failure. pg_dump was designed for doing backups.
PostgreSQL recommends using pg_dump for backups, as a file system (or VM) backup requires the database to be shut down (and has other drawbacks):
http://www.postgresql.org/docs/8.1/static/backup-file.html
Edit: Also, a pg_dump backup will be significantly smaller than a filesystem dump of the same database.
There is an additional option. With PostgreSQL you can make an online backup that allows you to snapshot the file system and maintain consistency. You can see details here:
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html
We use this exact method for making backups when we run PostgreSQL in a VM.

TSQL syntax to restore .bak to new db

I need to automate the creation of a duplicate db from the .bak of my production db. I've done the operation plenty of times via the GUI but when executing from the commandline I'm a little confused by the various switches, in particular, the filenames and being sure ownership is correctly replicated.
Just looking for the TSQL syntax for RESTORE that accomplishes that.
Assuming you're using SQL Server 2005 or 2008, the simplest way is to use the "Script" button at the top of the restore database dialog in SQL Server Management Studio. This will automatically create a T-SQL script with all the options/settings configured in the way you've filled in the dialog.
look here: How to: Restore a Database to a New Location and Name (Transact-SQL), which has a good example:
This example creates a new database
named MyAdvWorks. MyAdvWorks is a
copy of the existing AdventureWorks
database that includes two files:
AdventureWorks_Data and
AdventureWorks_Log. This database uses
the simple recovery model. The
AdventureWorks database already
exists on the server instance, so the
files in the backup must be restored
to a new location. The RESTORE
FILELISTONLY statement is used to
determine the number and names of the
files in the database being restored.
The database backup is the first
backup set on the backup device.
USE master
GO
-- First determine the number and names of the files in the backup.
-- AdventureWorks_Backup is the name of the backup device.
RESTORE FILELISTONLY
FROM AdventureWorks_Backup
-- Restore the files for MyAdvWorks.
RESTORE DATABASE MyAdvWorks
FROM AdventureWorks_Backup
WITH RECOVERY,
MOVE 'AdventureWorks_Data' TO 'D:\MyData\MyAdvWorks_Data.mdf',
MOVE 'AdventureWorks_Log' TO 'F:\MyLog\MyAdvWorks_Log.ldf'
GO
This may help also: Copying Databases with Backup and Restore