DB2 append backups into one unique db - db2

I am using the restore db command to import a backup into a test environment database, which works fine, but I need to extend this import process to restore several backups from several dates into a single test environment db... What is the command to append backups into a single database?
thanks
Phil

You write "I get only full offline database backup files from several time date (2 weeks of production) that I need to restore in a test database for analysis... example 4 backups files of 2 weeks of data = 2 months of data ...."
and you also write "What is the command to append backups into a unique database..."
While there is no explicit method for full-offline-backup images to be combined for Db2-LUW, there's always another way to get what you need... given the right skills and tools.
If you have a FULL backup image, it can either be restored into a new database or used to fully overwrite an existing database. If you have 4 FULL backup images, each can be restored into its own uniquely named database, or each can overwrite one of 4 existing databases.
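As a minimal sketch from the DB2 command line (the source database name PRODDB, the backup directory /backups, the timestamps and the TESTDBn target names are all placeholders for your own values):
# Restore each full offline image into its own, uniquely named test database.
db2 "RESTORE DATABASE PRODDB FROM /backups TAKEN AT 20240107235959 INTO TESTDB1"
db2 "RESTORE DATABASE PRODDB FROM /backups TAKEN AT 20240114235959 INTO TESTDB2"
db2 "RESTORE DATABASE PRODDB FROM /backups TAKEN AT 20240121235959 INTO TESTDB3"
db2 "RESTORE DATABASE PRODDB FROM /backups TAKEN AT 20240128235959 INTO TESTDB4"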
You can also restore specific tablespaces from a backup image, if properly configured. Some sites have designed discrete tablespaces for specific time periods (one per day/week/month) to help with such activities. Some sites have designed their tables to be range partitioned, with each time period having its own partition (and sometimes a dedicated tablespace as well), which makes subsequent merging of content easier with the right skills.
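As a hedged illustration of the range-partitioning idea (table, column and tablespace names are made up, and the exact clauses depend on your Db2 version and layout):
# One partition per week, each in its own tablespace, so a week's worth of data
# can be handled (detached, moved, backed up) independently.
db2 "CREATE TABLE MYSCHEMA.SALES (
       SALE_DATE DATE NOT NULL,
       AMOUNT    DECIMAL(10,2))
     PARTITION BY RANGE (SALE_DATE)
     (STARTING FROM ('2024-01-01') ENDING AT ('2024-01-07') IN TBSP_W01,
      STARTING FROM ('2024-01-08') ENDING AT ('2024-01-14') IN TBSP_W02)"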
If you are competent with scripting, you can restore the first (earliest) image, export the relevant table contents to flat-files, restore the next backup image and export the relevant tables to new flat-files (repeat as needed), then load these flat-files into a table for analysis. If your database size is small then this can be considered a keep-it-simple approach.
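Roughly, such a script could look like the sketch below; every name, path and timestamp is an assumption, and the single table MYSCHEMA.ORDERS stands in for whichever tables you actually need:
#!/bin/sh
# Restore each full offline image into a scratch database and export the table of interest.
for TS in 20240107235959 20240114235959 20240121235959 20240128235959
do
  db2 "RESTORE DATABASE PRODDB FROM /backups TAKEN AT $TS INTO SCRATCHDB REPLACE EXISTING"
  db2 "CONNECT TO SCRATCHDB"
  db2 "EXPORT TO /work/orders_$TS.del OF DEL SELECT * FROM MYSCHEMA.ORDERS"
  db2 "CONNECT RESET"
done
# Load all exported flat-files into one analysis table.
db2 "CONNECT TO ANALYSISDB"
for F in /work/orders_*.del
do
  db2 "LOAD FROM $F OF DEL INSERT INTO MYSCHEMA.ORDERS_ALL"
done
db2 "CONNECT RESET"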
You can also do clever things with federation if you restore to discrete databases.
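For example, a very rough sketch of Db2 federation between two locally restored databases (server name, schema, credentials and version are placeholders, and the instance must be configured with FEDERATED=YES):
# From TESTDB1, create a nickname pointing at the same table in TESTDB2,
# so both restored copies can be queried (e.g. with UNION ALL) from one connection.
db2 "CONNECT TO TESTDB1"
db2 "CREATE WRAPPER DRDA"
db2 "CREATE SERVER SRV_TESTDB2 TYPE DB2/UDB VERSION 11.5 WRAPPER DRDA AUTHORIZATION \"db2inst1\" PASSWORD \"secret\" OPTIONS (DBNAME 'TESTDB2')"
db2 "CREATE USER MAPPING FOR USER SERVER SRV_TESTDB2 OPTIONS (REMOTE_AUTHID 'db2inst1', REMOTE_PASSWORD 'secret')"
db2 "CREATE NICKNAME MYSCHEMA.ORDERS_WEEK2 FOR SRV_TESTDB2.MYSCHEMA.ORDERS"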
Separately purchasable tools exist to let you extract selected content from a backup image (which can then be loaded into a Db2 database), without needing to do a restore action. These are not included with the Db2 product. So you could extract specific table contents from a backup image if you pay for the right tools and learn how to use them. Speak with your IBM Salesperson. Such tools may require currently supported versions of Db2 however.

Related

PostgreSQL: even read access changes data files disk leading to large incremental backups using pgbackrest

We are using pgbackrest to backup our database to Amazon S3. We do full backups once a week and an incremental backup every other day.
The size of our database is around 1 TB; a full backup is around 600 GB, and even an incremental backup is around 400 GB!
We found out that even read access (pure select statements) on the database has the effect that the underlying data files (in /usr/local/pgsql/data/base/xxxxxx) change. This results in large incremental backups and also in very large storage (costs) on Amazon S3.
Usually the files with low index names (e.g. 391089.1) change on read access.
On an update, we see changes in one or more files - the index could correlate to the age of the row in the table.
Some more facts:
Postgres version 13.1
Database is running in docker container (docker version 20.10.0)
OS is CentOS 7
We see the phenomenon on multiple servers.
Can someone explain why PostgreSQL changes data files on pure read access?
We tested on a dedicated database with nothing else accessing it.
This is normal. Some cases I can think of right away are:
a SELECT or other SQL statement setting a hint bit
This is a shortcut for subsequent statements that access the data, so they don't have to consult the commit log any more.
a SELECT ... FOR UPDATE writing a row lock
autovacuum removing dead row versions
These are leftovers from DELETE or UPDATE.
autovacuum freezing old visible row versions
This is necessary to prevent data corruption if the transaction ID counter wraps around.
The only way to fairly reliably prevent PostgreSQL from modifying a table in the future is:
never perform an INSERT, UPDATE or DELETE on it
run VACUUM (FREEZE) on the table and make sure that there are no concurrent transactions
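A minimal sketch of that second step (the database and table names are hypothetical):
# Check that nothing else is running, then freeze all visible rows so that later
# read-only access no longer needs to set hint bits or freeze tuples.
psql -d mydb -c "SELECT pid, state, query FROM pg_stat_activity WHERE state <> 'idle';"
psql -d mydb -c "VACUUM (FREEZE, VERBOSE) mytable;"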

Best way to backup and restore data in PostgreSQL for testing

I'm trying to migrate our database engine from MsSql to PostgreSQL. In our automated test, we restore the database back to "clean" state at the start of every test. We do this by comparing the "diff" between the working copy of the database with the clean copy (table by table). Then copying over any records that have changed. Or deleting any records that have been added. So far this strategy seems to be the best way to go about for us because per test, not a lot of data is changed, and the size of the database is not very big.
Now I'm looking for a way to essentially do the same thing but with PostgreSQL. I'm considering doing the exact same thing with PostgreSQL. But before doing so, I was wondering if anyone else has done something similar and what method you used to restore data in your automated tests.
On a side note - I considered using MsSql's snapshot or backup/restore strategy. The main problem with these methods is that I have to re-establish the db connection from the app after every test, which is not possible at the moment.
If you're okay with some extra storage, and if you (like me) are not particularly interested in re-inventing the wheel by writing your own diff-checking code, you should try creating a new DB (per run) via the template feature of the createdb command (or the CREATE DATABASE statement) in PostgreSQL.
For example:
(from bash) createdb todayDB -T snapshotDB
or
(from psql) CREATE DATABASE todayDB TEMPLATE snapshotDB;
Pros:
In theory, always exact same DB by design (no custom logic)
Replication is a file transfer, not a DB restore, so it takes far less time (i.e. it doesn't run SQL again, recreate indexes, restore tables, etc.)
Cons:
Takes 2x the disk space (although template could be on a low performance NFS etc)
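With that approach the per-run reset can be as small as the sketch below (todayDB and snapshotDB as in the example above; it only works while nobody is connected to either database):
# Throw away the working copy and clone a fresh one from the template.
dropdb --if-exists todayDB
createdb -T snapshotDB todayDB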
For my specific situation, I decided to go back to the original solution, which is to compare the "working" copy of the database with the "clean" copy of the database.
There are three types of changes (INSERT, UPDATE, DELETE), handled in two steps:
For INSERT records - find max(id) from the clean table and delete any record in the working table that has a higher ID
For UPDATE or DELETE records - find all records in the clean table EXCEPT records found in the working table, then UPSERT those records into the working table (see the sketch below)
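As a rough sketch of those two steps for a single hypothetical table t(id, name, amount) with primary key id, kept in schemas clean and working of the same database:
psql -d testdb <<'SQL'
-- 1) Undo INSERTs made by the test: drop rows whose id is beyond the clean maximum.
DELETE FROM working.t
 WHERE id > (SELECT max(id) FROM clean.t);
-- 2) Undo UPDATEs and DELETEs: rows present in clean but not identical in working
--    are upserted back (deleted rows are re-inserted, changed rows are overwritten).
INSERT INTO working.t (id, name, amount)
SELECT id, name, amount
  FROM (SELECT id, name, amount FROM clean.t
        EXCEPT
        SELECT id, name, amount FROM working.t) AS changed
ON CONFLICT (id) DO UPDATE
   SET name = EXCLUDED.name,
       amount = EXCLUDED.amount;
SQL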

Using Data compare to copy one database over another

I've used the Data Compare tool to update schema between the same DBs on different servers, but what if so many things have changed (including data) that I simply want to REPLACE the target database?
In the past I've just used T-SQL: take a backup, then restore onto the target with the REPLACE option and/or MOVE if the data & log files are on different drives. I'd rather have an easier way to do this.
You can use Schema Compare (also by Red Gate) to compare the schema of your source database to a blank target database (and update), then use Data Compare to compare the data in them (and update). This should leave you with the target the same as the source. However, it may well be easier to use the backup/restore method in that instance.

postgres copy database to another server reduces database size

I installed Postgres 9.1 on both machines.
Initially the DB size was 7052 MB; then I used the following command to copy it to the other server.
pg_dump -C dbname | bzip2 | ssh remoteuser@remotehost "bunzip2 | psql dbname"
After the copy completed successfully, I checked the size on the destination machine: it shows 6653 MB.
Then I checked the table count; it's the same.
Has there been data loss? Is there missing data?
Note:
Two machines have same hardware and software configuration.
I used:
SELECT pg_size_pretty(pg_database_size('dbname'));
One of the PostgreSQL's most sophisticated features is so called Multi-Version Concurrency Control (MVCC), a standard technique for avoiding conflicts between reads and writes of the same object in database. MVCC guarantees that each transaction sees a consistent view of the database by reading non-current data for objects modified by concurrent transactions. Thanks to MVCC, PostgreSQL has great scalability, a robust hot backup tool and many other nice features comparable to the most advanced commercial databases.
Unfortunately, there is one downside to MVCC, the databases tend to grow over time and sometimes it can be a problem. In recent versions of PostgreSQL there is a separate server process called the autovacuum daemon (pg_autovacuum), whose purpose is to keep the database size reasonable. It does that by trying to recover reusable chunks of the database files. Still, there are many scenarios that will force the database to grow, even if the amount of the useful data in it doesn't really change. That happens typically if you have lots of UPDATE and/or DELETE statements in the applications that are using the database.
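If you want a rough idea of how much reusable space your tables are carrying, the per-table statistics views can show it (the database name is a placeholder):
# Tables with the most dead (reusable) row versions waiting for autovacuum.
psql -d dbname -c "SELECT relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;"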
When you do a COPY, you recover extraneous space and so your copied DB appears smaller.
That looks normal. Databases are often smaller after restore, because a newly created b-tree index is more compact than one that's been progressively built by inserts. Additionally, UPDATEs and DELETEs leave empty space in the tables.
So you have nothing to worry about. You'll find that if you diff an SQL dump from the old DB and a dump taken from the just-restored DB, they'll be the same except for comments.
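If you want to verify that yourself, one simple sketch (host and database names as in the question) is to dump both sides and compare:
# Dump the original and the restored copy, then compare the two plain-text dumps.
pg_dump dbname > before.sql
ssh remoteuser@remotehost "pg_dump dbname" > after.sql
diff before.sql after.sql    # expect only differences in comments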

Physical location of objects in a PostgreSQL database?

I'm interested in getting the physical locations of the tables, views, functions and data/content of PostgreSQL on Linux. I have a scenario where PostgreSQL could be installed on an SD card or on a hard disk. If I have tables, views, functions and data on the SD card, I want to get their physical locations and merge/copy them onto my hard disk whenever I wish to replace the storage. I assume the database is stored on disk as plain files.
Also, is it possible to view the contents of the files? I mean, can I access them?
Kevin and Mike already provided pointers where to find the data directory. For the physical location of a table in the file system, use:
SELECT pg_relation_filepath('my_table');
Don't mess with the files directly unless you know exactly what you are doing.
A database as a whole is represented by a subdirectory in PGDATA/base.
If you use tablespaces it gets more complicated. Read details in the chapter Database File Layout in the manual:
For each database in the cluster there is a subdirectory within
PGDATA/base, named after the database's OID in pg_database. This
subdirectory is the default location for the database's files; in
particular, its system catalogs are stored there.
...
Each table and index is stored in a separate file. For ordinary
relations, these files are named after the table or index's filenode
number, which can be found in pg_class.relfilenode.
...
The pg_relation_filepath() function shows the entire path (relative to
PGDATA) of any relation.
See the manual for details on the function pg_relation_filepath().
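Putting that together with the server's data directory, a small sketch to build the absolute path of a table's main file (database and table names are hypothetical):
# pg_relation_filepath() is relative to the data directory, so prepend it.
DATADIR=$(psql -d mydb -Atc "SHOW data_directory;")
RELPATH=$(psql -d mydb -Atc "SELECT pg_relation_filepath('my_table');")
echo "$DATADIR/$RELPATH"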
The query show data_directory; will show you the main data directory. But that doesn't necessarily tell you where things are stored.
PostgreSQL lets you define new tablespaces. A tablespace is a named directory in the filesystem. PostgreSQL lets you store individual tables, indexes, and entire databases in any permissible tablespace. So if a database were created in a specific tablespace, I believe none of its objects would appear in the data directory.
For solid run-time information about where things are stored on disk, you'll probably need to query pg_database, pg_tablespace, or pg_tables from the system catalogs. Tablespace information might also be available in the information_schema views.
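For instance, a hedged sketch that lists each tablespace with its on-disk location (pg_tablespace_location() assumes PostgreSQL 9.2 or later; the built-in tablespaces report an empty location because they live inside the data directory):
# Where each user-defined tablespace lives on disk.
psql -d mydb -c "SELECT spcname, pg_tablespace_location(oid) AS location FROM pg_tablespace;"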
But for merging or copying to your hard disk, using these files is almost certainly a Bad Thing. For that kind of work, pg_dump is your friend.
If you're talking about copying the disk files as a form of backup, you should probably read this, especially the section on Continuous Archiving and Point-in-Time Recovery (PITR):
http://www.postgresql.org/docs/current/interactive/backup.html
If you're thinking about trying to directly access and interpret data in the disk files, bypassing the database management system, that is a very bad idea for a lot of reasons. For one, the storage scheme is very complex. For another, it tends to change in every new major release (issued once per year). Thirdly, the ghost of E.F. Codd will probably haunt you; see rules 8, 9, 11, and 12 of Codd's 12 rules.