do not undsrstand file structure of pgsql - postgresql

Can someone provide a bit of clarification?
I understand that the /base folder show a data folder for each database. In PgAdmin, I have 13 databases listed under 1 server. In the /base folder, there are 14 folders. So that should be 1 per database and 1 for the the server equalling 14.
I do not know how do know what folder is for what database. However, only one has a lot of data. When I search for large files on my system, this displays:
16M: /var/lib/pgsql/9.2/data/base/18642/18652
13M: /var/lib/pgsql/9.2/data/base/18642/18751
1.0G: /var/lib/pgsql/9.2/data/base/21719/21804
12M: /var/lib/pgsql/9.2/data/base/21719/21806
15M: /var/lib/pgsql/9.2/data/base/21719/21750
20M: /var/lib/pgsql/9.2/data/base/21719/21837
118M: /var/lib/pgsql/9.2/data/base/21719/21834
Now, if this is (21719) actually the only running database used by staff, when I archive (pgdump) it, the size of the dump is approx 6 Gig. The size of the dump and the data listed above do not match.
Can someone shed some light on my confusion?
Thanks a bunch.
This was a result of trying to find out why I have almost 700 gig of drive space being used when the only stuff on it is postgresql and an occasional runaway vnc-error-log that eats up drive space (figured out how to solve that). However, I still have over 60% of my drive used, I cannot account for it, and found the data sizes in postgresql.
Thanks for any insight that can be provided on postgresql db data

I do not know how do know what folder is for what database
The folder name is the OID of the database, which you can get with the following SQL query, along with each db size according to the SQL engine:
select oid,datname,pg_database_size(datname) from pg_database;
If there are 13 databases and 14 folders, the additional folder is probably the pgsql_tmp directory used for temporary files. The concept of server of pgAdmin does not come into play in a specific server's data directory.
Also as said in the comments, the dump size may be greater than the disk size due compression. It can also be smaller since it doesn't contain any index data. On the whole, knowing the size on disk does not help much to predict the size of the SQL dump and vice versa.

Related

PostgreSQL - Recovery of Functions' code following accidental deletion of data files

So, I am (well... I was) running PostgreSQL within a container (Ubuntu 14.04LTS with all the recent updates, back-end storage is "dir" because of convince).
To cut the long story short, the container folder got deleted. Following the use of extundelete and ext4magic, I have managed to extract some of the database physical files (it appears as if most of the files are there... but not 100% sure if and what is missing).
I have two copies of the database files. One from 9.5.3 (which appears to be more complete) and one from 9.6 (I upgraded the container very recently to 9.6, however it appears to be missing datafiles).
All I am after is to attempt and extract the SQL code the relates to the user defined functions. Is anyone aware of an approach that I could try?
P.S.: Last backup is a bit dated (due to bad practices really) so it would be last resort if the task of extracting the needed information is "reasonable" and "successful".
Regards,
G
Update - 20/4/2017
I was hoping for a "quick fix" by somehow extracting the function body text off the recovered data files... however, nothing's free in this life :)
Starting from the old-ish backup along with the recovered logs, we managed to cover a lot of ground into bringing the DB back to life.
Lessons learned:
1. Do implement a good backup/restore strategy
2. Do not store backups on the same physical machine
3. Hardware failure can be disruptive... Human error can be disastrous!
If you can reconstruct enough of a data directory to start postgres in single user mode you might be able to dump pg_proc. But this seems unlikely.
Otherwise, if you're really lucky you'll be able to find the relation for pg_proc and its corresponding pg_toast relation. The latter will often contain compressed text, so searches for parts of variables you know appear in function bodies may not help you out.
Anything stored inline in pg_proc will be short functions, significantly less than 8k long. Everything else will be in the toast relation.
To decode that you have to unpack the pages to get the toast hunks, then reassemble them and uncompress them (if compressed).
If I had to do this, I would probably create a table with the exact same schema as pg_proc in a new postgres instance of the same version. I would then find the relfilenode(s) for pg_catalog.pg_proc and its toast table using the relfilenode map file (if it survived) or by pattern matching and guesswork. I would replace the empty relation files for the new table I created with the recovered ones, restart postgres, and if I was right, I'd be able to select from the tables.
Not easy.
I suggest reading up on postgres's storage format as you'll need to understand it.
You may consider https://www.postgresql.org/support/professional_support/ . (Disclaimer, I work for one of the listed companies).
P.S.: Last backup is a bit dated (due to bad practices really) so it would be last resort if the task of extracting the needed information is "reasonable" and "successful".
Backups are your first resort here.
If the 9.5 files are complete and undamaged (or enough so to dump the schema) then simply copying them in place, checking permissions and starting the server will get you going. Don't trust the data though, you'll need to check it all.
Although it is possible to partially recover given damaged files, it's a long complicated process and the fact that you are asking on Stack Overflow probably means it's not for you.

Postgres database dump size larger than physical size

I just made an pg_dump backup from my database and its size is about 95GB but the size of the direcory /pgsql/data is about 38GB.
I run a vacuum FULL and the size of the dump does not change. The version of my postgres installation is 9.3.4, on a CentOS release 6.3 server.
It is very weird the size of the dump comparing with the physical size or I can consider this normal?
Thanks in advance!
Regards.
Neme.
The size of pg_dump output and the size of a Postgres cluster (aka 'instance') on disk have very, very little correlation. Consider:
pg_dump has 3 different output formats, 2 of which allow compression on-the-fly
pg_dump output contains only schema definition and raw data in a text (or possibly "binary" format). It contains no index data.
The text/"binary" representation of different data types can be larger or smaller than actual data stored in the database. For example, the number 1 stored in a bigint field will take 8 bytes in a cluster, but only 1 byte in pg_dump.
This is also why VACUUM FULL had no effect on the size of the backup.
Note that a Point In Time Recovery (PITR) based backup is entirely different from a pg_dump backup. PITR backups are essentially copies of the data on disk.
Postgres does compress its data in certain situations, using a technique called TOAST:
PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as TOAST (or "the best thing since sliced bread").

Is database back up size is same as database size

am working with PostgreSQL i checked following command then it returns 12MB
SELECT pg_size_pretty(pg_database_size('itcs'));
but when i took back up using pgadmin back up size is 1MB why this difrence
If you are taking a logical backup (with pg_dump), the backup contains only the data, no empty pages, no old versions of rows, no padding, no indexes. It may also be compressed. All that can greatly reduce the size.
If you are taking a physical backup, the backup more or less consists of the actual database files as they are, plus recovery logs to get them to a consistent state. So that would be roughly the same size as the database itself (but you can also compress it).

Dumping postgres DB, time and .sql file weight

I have a big db (nominatim db, for address geocoding reverse), is about 408gb big.
Now, to provide an estimate to the customer, I would like to know how long will take the export/reimport procedure and how big will .sql dump file be.
My postgresql version is 9.4, is installed on a centOS 6.7 virtual machine, with 16gb RAM and 500 gb disk space.
Can you help me?
Thank you all guys for your answer, anyway to restore the dumped db I don't use the command pg_restore but psql -d newdb -f dump.sql (I read this way to do in a official doc). This because I have to set-up this db on another machine to avoid the nominatim db indexing procedure! I don't know if someone knows nominatim (is a openstreetmap opensource product) but the db indexing process of European map (15.8 gb), in a CentOS 6.7 machine with 16gb ram tooks me 32 days...
Than another possible question should be: pg_restore is equal to psql -d -f? Wich is faster?
Thanks again
As #a_horse_with_no_name says, nobody will be able to give you exact answers for your environment. But this is the procedure I would use to get some estimates.
I have generally found that a compressed backup of my data is 1/10th or less the size of the live database. You can also usually deduct the on-disk size of the indexes from the backup size as well. Examine the size of things in-database to get a better idea. You can also try forming a subset of the database you have which is much smaller and compare the live size to the compressed backup; this may give you a ratio that should be in the ballpark. SQL files are gassy and compress well; the on-disk representation Postgres uses seems to be even gassier though. Price of performance probably.
The best way to estimate time is just to do some exploratory runs. In my experience this usually takes longer than you expect. I have a ~1 TB database that I'm fairly sure would take about a month to restore, but it's also aggressively indexed. I have several ~20 GB databases that backup/restore in about 15 minutes. So it's pretty variable, but indexes add time. If you can set up a similar server, you can try the backup-restore procedure and see how long it will take. I would recommend doing this anyway, just to build confidence and suss out any lingering issues before you pull the trigger.
I would also recommend you try out pg_dump's "custom format" (pg_dump -Fc) which makes compressed archives that are easy for pg_restore to use.

What affects DB2 restored database size?

I have database TESTDB with following details:
Database size: 3.2GB
Database Capacity: 302 GB
One of its tablespaces has its HWM too high due to an SMP extent, so it is not letting me reduce the high water mark.
My backup size is around 3.2 GB (As backups contains only used pages)
If I restore this database backup image via a redirected restore, what will be the newly restored database's size?
Will it be around 3.2 GB or around 302 GB?
The short answer is that RESTORE DATABASE will produce a target database that occupies about as much disk space as the source database did when it was backed up.
On its own, the size of a DB2 backup image is not a reliable indicator of how big the target database will be. For one thing, DB2 provides the option to compress the data being backed up, which can make the backup image significantly smaller than the DB2 object data it contains.
As you correctly point out, the backup image only contains non-empty extents (blocks of contiguous pages), but the RESTORE DATABASE command will recreate each tablespace container to its original size (including empty pages) unless you specify different container locations and sizes via the REDIRECT parameter.
The 302GB of capacity you're seeing is from GET_DBSIZE_INFO and similar utilities, and is quite often larger than the total storage the database currently occupies. This is because DB2's capacity calculation includes not only unused pages in DMS tablespaces, but also any free space on volumes or drives that are used by an SMS tablespace (most DB2 LUW databases contain at least one SMS tablespace).