Converting .pbf to PostgreSQL using osm2pgsql (not enough disk space)

I must say I don't understand this completely.
When I try to convert the binary .pbf file for my country, Germany, which is 3 GB in size, using osm2pgsql (slim mode), it writes to the PostgreSQL tables for 3 hours and then fails with the message 'not enough disk space'. I have 50 GB of free space on my Linux machine.
I understand that the temporary data is normally kept in RAM, and that because I am using slim mode it is saved to the database instead.
Please enlighten me: how can a 3 GB OSM file take up 50 GB of space while being converted to PostgreSQL (PostGIS) tables and then throw that error?
How do I solve this?

Yes, it could exceed 50 GB. For comparison, the India .pbf is around 375 MB, and the resulting PostgreSQL data folder is 11 GB, which also includes the world boundaries from OSM.
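To put numbers on that comparison, here is a rough back-of-the-envelope sketch. It assumes the database size scales roughly linearly with the .pbf size, which real imports only approximate, and the 11 GB reference figure also contains the world boundaries, so treat the result as an order-of-magnitude hint rather than a prediction. The function name is made up for illustration.

```python
# Rough extrapolation from the India figures quoted above.
# Assumption: database size scales about linearly with .pbf size.
INDIA_PBF_GB = 0.375     # ~375 MB extract
INDIA_DB_GB = 11.0       # resulting PostgreSQL data folder (incl. world boundaries)
GERMANY_PBF_GB = 3.0     # extract size from the question
FREE_DISK_GB = 50.0      # free space from the question


def estimate_db_size_gb(pbf_gb, ref_pbf_gb=INDIA_PBF_GB, ref_db_gb=INDIA_DB_GB):
    """Scale a reference pbf-to-database ratio to another extract."""
    return pbf_gb * (ref_db_gb / ref_pbf_gb)


estimate = estimate_db_size_gb(GERMANY_PBF_GB)
fits = estimate <= FREE_DISK_GB

print(f"expansion factor: about {INDIA_DB_GB / INDIA_PBF_GB:.0f}x")
print(f"estimated Germany database: about {estimate:.0f} GB")
print(f"fits in {FREE_DISK_GB:.0f} GB of free space: {fits}")
```

With these inputs the estimate lands well above the 50 GB that is free, which is consistent with the import running out of disk space.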

Related

Postgres: 'Wrong tuple length' on Vacuum. Disk space running full

We have a PGSQL server (v13) with a lot of data in it.
The database contains documents.
The total database is around 1.5 TB. Today, someone called to tell me the disk space was almost full. They put in 1 TB of extra storage some time ago, but that extra storage filled up extremely quickly, which is very abnormal. The disk was 2 TB, now 3 TB with the extra storage.
If I look at the table containing the documents, it only added around 10 GB since 20/07/2022, so I really don't understand why the disk is running full this fast. If I do this query on the database:
SELECT pg_size_pretty( pg_total_relation_size('documents') );
It returns '2.7 TB', which is impossible, since there haven't been that many documents added recently.
I did a Vacuum as a test on a certain table (total: 20 gb). The vacuum failed with Error:
ERROR: wrong tuple length
What does it mean? I have the same errors in the PGSQL logfiles. They recently installed a new antivirus system on the server. I already asked for exclusions but it didn't seem to solve the problem.
I now only have +/- 130 gb free disk space and it keeps getting full.
Is it possible the vacuum takes the disk space and does not return it to Windows because of the error?
Any help is appreciated. I'm not a database expert but I really need to solve this.

Postgres database dump size larger than physical size

I just made a pg_dump backup of my database and its size is about 95 GB, but the size of the directory /pgsql/data is about 38 GB.
I ran VACUUM FULL and the size of the dump did not change. The version of my Postgres installation is 9.3.4, on a CentOS release 6.3 server.
Is it strange for the dump to be so much larger than the physical size, or can I consider this normal?
Thanks in advance!
Regards.
Neme.
The size of pg_dump output and the size of a Postgres cluster (aka 'instance') on disk have very, very little correlation. Consider:
pg_dump has 3 different output formats, 2 of which allow compression on-the-fly
pg_dump output contains only schema definitions and raw data in a text (or possibly "binary") format. It contains no index data.
The text/"binary" representation of different data types can be larger or smaller than actual data stored in the database. For example, the number 1 stored in a bigint field will take 8 bytes in a cluster, but only 1 byte in pg_dump.
This is also why VACUUM FULL had no effect on the size of the backup.
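The bigint example can be illustrated with a small sketch. It does not touch Postgres at all: it only compares a fixed-width 64-bit encoding with the decimal text form of the same number, ignoring delimiters, row headers and everything else a real table or dump contains. The helper names are made up for illustration.

```python
import struct


def binary_size_bigint(value):
    """Bytes used by a fixed-width signed 64-bit integer, like a bigint value."""
    return len(struct.pack(">q", value))


def text_size(value):
    """Bytes used by the decimal text form, as it would appear in a plain-text dump."""
    return len(str(value).encode("ascii"))


# Small numbers are shorter as text; large ones are longer than 8 bytes.
for v in (1, 1234567890, 9223372036854775807):
    print(f"{v}: {binary_size_bigint(v)} bytes fixed-width, {text_size(v)} bytes as text")
```

The fixed-width form is always 8 bytes, while the text form ranges from 1 byte for the number 1 to 19 bytes for the largest bigint, so the text representation can come out either smaller or larger depending on the data.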
Note that a Point In Time Recovery (PITR) based backup is entirely different from a pg_dump backup. PITR backups are essentially copies of the data on disk.
Postgres does compress its data in certain situations, using a technique called TOAST:
PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as TOAST (or "the best thing since sliced bread").

Dumping postgres DB, time and .sql file weight

I have a big database (a Nominatim DB, for reverse address geocoding) that is about 408 GB.
Now, to provide an estimate to the customer, I would like to know how long the export/reimport procedure will take and how big the .sql dump file will be.
My PostgreSQL version is 9.4, installed on a CentOS 6.7 virtual machine with 16 GB RAM and 500 GB disk space.
Can you help me?
Thank you all for your answers. Anyway, to restore the dumped DB I don't use the pg_restore command but psql -d newdb -f dump.sql (I read about this approach in an official doc). This is because I have to set up this DB on another machine to avoid the Nominatim DB indexing procedure! I don't know if anyone here knows Nominatim (it is an open-source OpenStreetMap product), but the DB indexing process for the European map (15.8 GB) on a CentOS 6.7 machine with 16 GB RAM took me 32 days...
So another possible question would be: is pg_restore equivalent to psql -d -f? Which is faster?
Thanks again
As @a_horse_with_no_name says, nobody will be able to give you exact answers for your environment. But this is the procedure I would use to get some estimates.
I have generally found that a compressed backup of my data is 1/10th or less the size of the live database. You can also usually deduct the on-disk size of the indexes when estimating, since the backup does not contain index data. Examine the size of things in-database to get a better idea. You can also try forming a much smaller subset of the database you have and comparing its live size to its compressed backup; this may give you a ratio that should be in the ballpark. SQL files are gassy and compress well; the on-disk representation Postgres uses seems to be even gassier though. Price of performance probably.
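As a sketch of the two estimates described above: the first function applies the 1/10th rule of thumb to the 408 GB figure from the question, and the second scales a ratio measured on a small subset. The sample numbers (a 5 GB subset producing a 0.4 GB compressed dump) are entirely hypothetical placeholders; you would replace them with your own measurements.

```python
LIVE_DB_GB = 408.0  # live database size from the question


def rule_of_thumb_gb(live_gb, ratio=0.10):
    """Compressed dump size using the '1/10th or less' rule of thumb (an upper bound)."""
    return live_gb * ratio


def from_sample_gb(live_gb, sample_live_gb, sample_dump_gb):
    """Scale the dump/live ratio measured on a small subset to the full database."""
    return live_gb * (sample_dump_gb / sample_live_gb)


upper_bound = rule_of_thumb_gb(LIVE_DB_GB)
# Hypothetical subset measurement: 5 GB live, 0.4 GB compressed dump.
scaled = from_sample_gb(LIVE_DB_GB, sample_live_gb=5.0, sample_dump_gb=0.4)

print(f"rule of thumb: up to about {upper_bound:.1f} GB compressed")
print(f"scaled from sample: about {scaled:.1f} GB compressed")
```

Neither number says anything about an uncompressed .sql file, which can be many times larger than the compressed archive.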
The best way to estimate time is just to do some exploratory runs. In my experience this usually takes longer than you expect. I have a ~1 TB database that I'm fairly sure would take about a month to restore, but it's also aggressively indexed. I have several ~20 GB databases that backup/restore in about 15 minutes. So it's pretty variable, but indexes add time. If you can set up a similar server, you can try the backup-restore procedure and see how long it will take. I would recommend doing this anyway, just to build confidence and suss out any lingering issues before you pull the trigger.
I would also recommend you try out pg_dump's "custom format" (pg_dump -Fc) which makes compressed archives that are easy for pg_restore to use.

What affects DB2 restored database size?

I have database TESTDB with following details:
Database size: 3.2GB
Database Capacity: 302 GB
One of its tablespaces has its HWM too high due to an SMP extent, so it is not letting me reduce the high water mark.
My backup size is around 3.2 GB (as backups contain only used pages).
If I restore this database backup image via a redirected restore, what will be the newly restored database's size?
Will it be around 3.2 GB or around 302 GB?
The short answer is that RESTORE DATABASE will produce a target database that occupies about as much disk space as the source database did when it was backed up.
On its own, the size of a DB2 backup image is not a reliable indicator of how big the target database will be. For one thing, DB2 provides the option to compress the data being backed up, which can make the backup image significantly smaller than the DB2 object data it contains.
As you correctly point out, the backup image only contains non-empty extents (blocks of contiguous pages), but the RESTORE DATABASE command will recreate each tablespace container to its original size (including empty pages) unless you specify different container locations and sizes via the REDIRECT parameter.
The 302GB of capacity you're seeing is from GET_DBSIZE_INFO and similar utilities, and is quite often larger than the total storage the database currently occupies. This is because DB2's capacity calculation includes not only unused pages in DMS tablespaces, but also any free space on volumes or drives that are used by an SMS tablespace (most DB2 LUW databases contain at least one SMS tablespace).

Do not understand file structure of pgsql

Can someone provide a bit of clarification?
I understand that the /base folder shows a data folder for each database. In pgAdmin, I have 13 databases listed under 1 server. In the /base folder, there are 14 folders. So that should be 1 per database and 1 for the server, equalling 14.
I do not know how to tell which folder is for which database. However, only one has a lot of data. When I search for large files on my system, this displays:
16M: /var/lib/pgsql/9.2/data/base/18642/18652
13M: /var/lib/pgsql/9.2/data/base/18642/18751
1.0G: /var/lib/pgsql/9.2/data/base/21719/21804
12M: /var/lib/pgsql/9.2/data/base/21719/21806
15M: /var/lib/pgsql/9.2/data/base/21719/21750
20M: /var/lib/pgsql/9.2/data/base/21719/21837
118M: /var/lib/pgsql/9.2/data/base/21719/21834
Now, if this (21719) is actually the only running database used by staff, when I archive it (pg_dump), the size of the dump is approx. 6 GB. The size of the dump and the data listed above do not match.
Can someone shed some light on my confusion?
Thanks a bunch.
This was a result of trying to find out why I have almost 700 gig of drive space being used when the only stuff on it is postgresql and an occasional runaway vnc-error-log that eats up drive space (figured out how to solve that). However, I still have over 60% of my drive used, I cannot account for it, and found the data sizes in postgresql.
Thanks for any insight that can be provided on postgresql db data
I do not know how to tell which folder is for which database
The folder name is the OID of the database, which you can get with the following SQL query, along with each db size according to the SQL engine:
select oid,datname,pg_database_size(datname) from pg_database;
If there are 13 databases and 14 folders, the additional folder is probably the pgsql_tmp directory used for temporary files. The concept of server of pgAdmin does not come into play in a specific server's data directory.
Also, as said in the comments, the dump size may be greater than the disk size due to compression. It can also be smaller since it doesn't contain any index data. On the whole, knowing the size on disk does not help much to predict the size of the SQL dump and vice versa.
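Since the folder under /base is the database OID, the large-file listing from the question can be grouped by that folder to see how much each database accounts for. This is only a sketch: it covers just the files that showed up in that search, not the whole data directory, and it assumes the M and G suffixes are binary units. The helper names are made up for illustration.

```python
from collections import defaultdict

# Large-file listing copied from the question.
LISTING = """\
16M: /var/lib/pgsql/9.2/data/base/18642/18652
13M: /var/lib/pgsql/9.2/data/base/18642/18751
1.0G: /var/lib/pgsql/9.2/data/base/21719/21804
12M: /var/lib/pgsql/9.2/data/base/21719/21806
15M: /var/lib/pgsql/9.2/data/base/21719/21750
20M: /var/lib/pgsql/9.2/data/base/21719/21837
118M: /var/lib/pgsql/9.2/data/base/21719/21834
"""

# Assumption: suffixes are binary units (1M = 1024 * 1024 bytes).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a size such as '16M' or '1.0G' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


def totals_by_database(listing):
    """Sum the listed file sizes per database OID (the folder right under base/)."""
    totals = defaultdict(float)
    for line in listing.splitlines():
        size, path = line.split(": ", 1)
        parts = path.split("/")
        oid = parts[parts.index("base") + 1]
        totals[oid] += parse_size(size)
    return dict(totals)


totals = totals_by_database(LISTING)
for oid, size_bytes in sorted(totals.items()):
    print(f"database OID {oid}: {size_bytes / 1024**2:.0f} MiB in listed files")
```

The OIDs can then be matched to database names with the pg_database query above, which also reports the full size of each database as the SQL engine sees it.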