What affects DB2 restored database size?

I have a database TESTDB with the following details:
Database size: 3.2GB
Database Capacity: 302 GB
One of its tablespaces has its HWM (high water mark) held up by an SMP extent, so I cannot reduce the high water mark.
My backup size is around 3.2 GB (as backups contain only used pages).
If I restore this database backup image via a redirected restore, what will the size of the newly restored database be?
Will it be around 3.2 GB or around 302 GB?

The short answer is that RESTORE DATABASE will produce a target database that occupies about as much disk space as the source database did when it was backed up.
On its own, the size of a DB2 backup image is not a reliable indicator of how big the target database will be. For one thing, DB2 provides the option to compress the data being backed up, which can make the backup image significantly smaller than the DB2 object data it contains.
As you correctly point out, the backup image only contains non-empty extents (blocks of contiguous pages), but the RESTORE DATABASE command will recreate each tablespace container to its original size (including empty pages) unless you specify different container locations and sizes via the REDIRECT parameter.
The 302 GB of capacity you're seeing is from GET_DBSIZE_INFO and similar utilities, and is quite often larger than the total storage the database currently occupies. This is because DB2's capacity calculation includes not only unused pages in DMS tablespaces, but also any free space on volumes or drives that are used by an SMS tablespace (most DB2 LUW databases contain at least one SMS tablespace).
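To make that concrete, here is a minimal sketch of a redirected restore from the DB2 command line. The backup directory /backups, the target name TESTDB2, the tablespace ID 2, the container path and the page count are all hypothetical; on a real system you take the IDs and required minimum sizes from the redirect output, and each container you define must still be large enough to hold that tablespace's high water mark:
db2 "RESTORE DATABASE TESTDB FROM /backups INTO TESTDB2 REDIRECT"
db2 "SET TABLESPACE CONTAINERS FOR 2 USING (FILE '/newdb/ts2_cont0' 25600)"   # 25600 pages of 4 KB, i.e. roughly 100 MB for this container
db2 "RESTORE DATABASE TESTDB CONTINUE"
db2 "CALL GET_DBSIZE_INFO(?, ?, ?, -1)"   # reports the restored database's size and capacity
The REDIRECT clause is what lets you choose smaller or relocated containers; without it, the containers come back at their original sizes, as described above.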

Related

Use Azure Blob storage for postgresql database

Azure blob storage has a maximum limit of 500 TB. This is sufficient, as I have (or will have) a lot of data.
I'd like to set up a server in Azure with Postgres installed, but I do not want to use the local SSD/HDD, because its size is limited... and the only way I can resize the disk (if I need to) is to shut down the VM and bang out some cryptic commands in PowerShell.
Is it possible to tell Postgres to use Azure blob storage?
If so, how can I do this?
If not, what options do I have? How can I make my HDD/SSD scale as I need more space, without any intervention on my part?
You cannot directly read from or write to blob storage via Postgres, because it expects a normal file system (with normal file I/O operations). Supporting blob storage would require modifications to the Postgres storage layer.
The maximum storage footprint, from a normal OS file system perspective, would then be limited by the number of disks (which sit in page blobs) attached to your VM: a maximum of 2 disks per core and a maximum of 4 TB per disk (the larger disk sizes are new as of May 2017). That gives you a range of 8 TB (single core) to 128 TB (32 cores). And you'd likely need to RAID those disks (unless Postgres supports a JBOD model); a sketch of striping them follows below.
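As a rough, non-Azure-specific sketch, striping several attached data disks into one larger volume on Linux could look like this (the device names /dev/sdc through /dev/sdf and the mount point are hypothetical):
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdc /dev/sdd /dev/sde /dev/sdf   # RAID 0 stripe across the four data disks
mkfs.ext4 /dev/md0                                                                       # put a file system on the striped volume
mount /dev/md0 /var/lib/postgresql                                                       # point the Postgres data directory at it
RAID 0 buys capacity and throughput but no redundancy; Azure already keeps replicas of the underlying page blobs, so some deployments accept that trade-off, but judge it against your own durability requirements.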
As suggested by @Gaurav, you can also use Postgres-as-a-Service. Currently, while in Preview, that service has a limit of 1 TB per database. Larger sizes will eventually be available.

Postgres database dump size larger than physical size

I just made a pg_dump backup of my database and its size is about 95 GB, but the size of the directory /pgsql/data is about 38 GB.
I ran a VACUUM FULL and the size of the dump did not change. My Postgres installation is version 9.3.4, on a CentOS release 6.3 server.
Is it strange that the dump is so much larger than the physical size, or can I consider this normal?
The size of pg_dump output and the size of a Postgres cluster (aka 'instance') on disk have very, very little correlation. Consider:
pg_dump has multiple output formats (plain text, custom, directory, tar), several of which can compress the data on the fly
pg_dump output contains only schema definition and raw data in a text (or possibly "binary" format). It contains no index data.
The text/"binary" representation of different data types can be larger or smaller than actual data stored in the database. For example, the number 1 stored in a bigint field will take 8 bytes in a cluster, but only 1 byte in pg_dump.
This is also why VACUUM FULL had no effect on the size of the backup.
Note that a Point In Time Recovery (PITR) based backup is entirely different from a pg_dump backup. PITR backups are essentially copies of the data on disk.
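For example, the format and compression options alone can change the size of the dump considerably. Illustrative invocations against a hypothetical database named mydb:
pg_dump -Fp mydb > mydb.sql            # plain SQL text, uncompressed
pg_dump -Fc -Z 9 -f mydb.dump mydb     # custom format with maximum compression
pg_dump -Fd -j 4 -f mydb_dir mydb      # directory format, compressed per table, dumped with 4 parallel jobs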
Postgres does compress its data in certain situations, using a technique called TOAST:
PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as TOAST (or "the best thing since sliced bread").
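A quick way to see TOAST compression at work, using a throwaway temporary table:
CREATE TEMP TABLE toast_demo (v text);
INSERT INTO toast_demo VALUES (repeat('x', 100000));   -- a 100,000-character value
SELECT pg_column_size(v) FROM toast_demo;              -- reports far fewer than 100,000 bytes, because the stored value is compressed
The same value written out by pg_dump is the full 100,000 characters again (before any dump-level compression), which is another reason a dump can end up larger than the data directory.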

Is database backup size the same as database size?

I am working with PostgreSQL. I ran the following command and it returns 12 MB:
SELECT pg_size_pretty(pg_database_size('itcs'));
but when I took a backup using pgAdmin, the backup size is 1 MB. Why this difference?
If you are taking a logical backup (with pg_dump), the backup contains only the data, no empty pages, no old versions of rows, no padding, no indexes. It may also be compressed. All that can greatly reduce the size.
If you are taking a physical backup, the backup more or less consists of the actual database files as they are, plus recovery logs to get them to a consistent state. So that would be roughly the same size as the database itself (but you can also compress it).
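As a rough illustration with the database from the question (the dump and backup paths are hypothetical):
pg_dump -Fc -f itcs.dump itcs              # logical backup: schema plus row data, no indexes or dead rows, compressed
pg_basebackup -D /backups/base -Ft -z -P   # physical backup: a compressed tar copy of the cluster's data files
Note that pg_basebackup copies the whole cluster, not a single database, so its size tracks the data directory rather than pg_database_size() of one database.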

Mongodb data files become smaller after migration

On my first server I get:
root@prod ~ # du -hs /var/lib/mongodb/
909G /var/lib/mongodb/
After migrating this database with mongodump/mongorestore, on my second server I get:
root@prod ~ # du -hs /var/lib/mongodb/
30G /var/lib/mongodb/
After I waited a few hours and mongo had finished indexing, I got:
root@prod ~ # du -hs /var/lib/mongodb/
54G /var/lib/mongodb/
I tested the database and there is no corrupted or missing data.
Why is there such a big difference in size before and after the migration?
MongoDB does not return disk space to the operating system when the actual data size drops due to deletions or other causes. There's a decent explanation in the online docs:
Why are the files in my data directory larger than the data in my database?
The data files in your data directory, which is the /data/db directory
in default configurations, might be larger than the data set inserted
into the database. Consider the following possible causes:
Preallocated data files.
In the data directory, MongoDB preallocates data files to a particular
size, in part to prevent file system fragmentation. MongoDB names the
first data file <dbname>.0, the next <dbname>.1, etc. The
first file mongod allocates is 64 megabytes, the next 128 megabytes,
and so on, up to 2 gigabytes, at which point all subsequent files are
2 gigabytes. The data files include files with allocated space but
that hold no data. mongod may allocate a 1 gigabyte data file that may
be 90% empty. For most larger databases, unused allocated space is
small compared to the database.
On Unix-like systems, mongod preallocates an additional data file and
initializes the disk space to 0. Preallocating data files in the
background prevents significant delays when a new database file is
next allocated.
You can disable preallocation by setting preallocDataFiles to false.
However do not disable preallocDataFiles for production environments:
only use preallocDataFiles for testing and with small data sets where
you frequently drop databases.
On Linux systems you can use hdparm to get an idea of how costly
allocation might be:
time hdparm --fallocate $((1024*1024)) testfile
The oplog.
If this mongod is a member of a replica set, the data directory
includes the oplog.rs file, which is a preallocated capped collection
in the local database. The default allocation is approximately 5% of
disk space on 64-bit installations, see Oplog Sizing for more
information. In most cases, you should not need to resize the oplog.
However, if you do, see Change the Size of the Oplog.
The journal.
The data directory contains the journal files, which store write
operations on disk prior to MongoDB applying them to databases. See
Journaling Mechanics.
Empty records.
MongoDB maintains lists of empty records in data files when deleting
documents and collections. MongoDB can reuse this space, but will
never return this space to the operating system.
To de-fragment allocated storage, use compact, which de-fragments
allocated space. By de-fragmenting storage, MongoDB can effectively
use the allocated space. compact requires up to 2 gigabytes of extra
disk space to run. Do not use compact if you are critically low on
disk space.
Important
compact only removes fragmentation from MongoDB data files and does
not return any disk space to the operating system.
To reclaim deleted space, use repairDatabase, which rebuilds the
database which de-fragments the storage and may release space to the
operating system. repairDatabase requires up to 2 gigabytes of extra
disk space to run. Do not use repairDatabase if you are critically low
on disk space.
http://docs.mongodb.org/manual/faq/storage/
What they don't tell you are the two other ways to restore/recover disk space: mongodump/mongorestore as you did, or adding a new member to the replica set with an empty disk so that it writes its database files from scratch.
If you are interested in monitoring this, the db.stats() command returns a wealth of data on data, index, storage and file sizes:
http://docs.mongodb.org/manual/reference/command/dbStats/
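For example, from the mongo shell:
db.stats()            // dataSize = the documents themselves, storageSize = space allocated for them (including unused extents), fileSize = total size of the data files on disk
db.stats(1024*1024)   // the same figures, scaled to megabytes
Comparing dataSize with storageSize and fileSize shows how much of the on-disk footprint is preallocation and fragmentation rather than actual data.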
Over time the MongoDB files develop fragmentation. When you do a "migration", or whack the data directory and force a re-sync, the files pack down. If your application does a lot of deletes, or updates that grow the documents, fragmentation develops fairly quickly. In our deployment it is updates that grow the documents that cause this. MongoDB moves a document when it sees that the updated version can't fit in the space of the original document. There are ways to add padding to the collection to avoid this; see the sketch below.
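On the MMAPv1 storage engine used in those versions, one way to do that is the usePowerOf2Sizes allocation strategy, or an explicit padding factor during a compact; a sketch from the mongo shell, for a hypothetical collection mycoll:
db.runCommand({ collMod: "mycoll", usePowerOf2Sizes: true })   // allocate record space in power-of-2 sizes so freed slots are easier to reuse
db.runCommand({ compact: "mycoll", paddingFactor: 1.1 })       // defragment and leave ~10% padding for in-place document growth
These knobs only apply to MMAPv1; the WiredTiger storage engine manages space reuse differently.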

mongodb Excessive Disk Space

My mongodb takes 114 GB, which is 85% of my disk.
Trying to free some space using db.repairDatabase() fails, as I don't have enough free space.
I know that my data shouldn't take so much space: I used to have a big collection that took 90% of the disk,
then dropped this collection and re-inserted only 20% of its data.
How can I free some space?
The disk space for data storage is preallocated by MongoDB and can only be reclaimed by rebuilding the database with new preallocated files. Typically this is done through db.repairDatabase() or a backup & restore. As you've noted, a repair requires enough space to create a new copy of the database, so it may not be an option.
Here are a few possible solutions, but all involve having some free space available elsewhere:
if there is enough free space to mongodump that database, you could mongodump, drop, and mongorestore it; db.dropDatabase() will remove the data files on disk (see the example at the end of this answer).
if you are using a volume manager such as LVM or ZFS, you could add extra disk space to the logical volume in order to repair or dump & restore the database.
if you have another server, you could set up a replica set to sync the data without taking down your current server (which would be a "primary" in the replica set). Once the data is synced to a secondary server, you could then step down the original primary and resync the database.
Note that for a replica set you need a minimum of three nodes .. so either three data nodes, or two data nodes plus an arbiter. In a production environment the arbiter would normally run on a third server so it can allow either of the data nodes to become a primary if the current primary is unavailable. In your reclaiming disk space scenario it would be OK to run the arbiter on one of the servers instead (presumably the new server).
Replica sets are generally very helpful for administrative purposes, as they allow you to step down a server for maintenance (such as running a database compact or repair) while still having a server available for your application. They have other benefits as well, such as maintaining redundant copies of your data for automatic failover/recovery.
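For the first option in the list above, the dump-and-reload cycle looks roughly like this (the database name mydb and the dump path are hypothetical, and writes to the database should be stopped for the duration):
mongodump --db mydb --out /backups/dump      # logical dump of the database
mongo mydb --eval "db.dropDatabase()"        # drop the database, which deletes its data files and frees the disk space
mongorestore --db mydb /backups/dump/mydb    # reload into freshly allocated, unfragmented data files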