Importing data from osm to postgis - openstreetmap

I have worked with osm2pgsql to import data from OSM into PostGIS.
What other options do I have for this?
That is, what other tools exist, and which one is better?
I also have problems importing large amounts of data into my database. Do I need a very large amount of memory for this, such as 64 GB of RAM?

Try ogr2ogr with the OSM driver. Apparently, you can even use hstore for "other_tags". E.g.:
ogr2ogr -f PostgreSQL "PG:dbname=osm" test.pbf \
-lco COLUMN_TYPES=other_tags=hstore \
--config OSM_MAX_TMPFILE_SIZE 1024
The config option sets the size threshold of the in-memory SQLite database used for internal processing to 1 GB (1024 MB); you can adjust it to any number of MB (the default is 100 MB). Also, you will get five resulting layers, roughly one per geometry type: points, lines, multilinestrings, multipolygons, and other_relations. Give it a whirl.
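On the memory question: with osm2pgsql itself, slim mode keeps the intermediate node and way data in the database instead of in RAM, so a large import should not need anything like 64 GB. A minimal sketch, assuming a database named osm with the postgis and hstore extensions already created; the function name import_osm and the 2000 MB cache value are illustrative, not recommendations:

```shell
# Hypothetical wrapper; the database name "osm" and the cache size are assumptions.
import_osm() {
    # --slim   : store temporary data in the database rather than in memory
    # --cache  : node cache size in MB
    # --hstore : keep tags that have no dedicated column in an hstore column
    osm2pgsql --create --slim --cache 2000 --hstore --database osm "$1"
}

# Usage: import_osm test.pbf
```

Slim mode trades memory for disk I/O, so the import is usually slower but far less likely to run out of RAM.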

Related

Recompress PostgreSQL custom format dump file -Z 6 to -Z 9

Could anybody explain how to recompress a pg_dump file in custom format without restoring it?
My setup is a main database with replication, 5 nodes in total, with a size of about 300 GB. I'm using one of the replicas to create the dump file. With -Z 9 it took about 4 hours; with -Z 6 it took about 2 hours. The problem with -Z 9 is that it takes so long that the replica falls too far behind the master node, which is why I'm using -Z 6 compression. I can't add one more node just for making the dump files.
Restoring the database on the dump storage and creating a new dump file with better compression is not an option, because the dump storage lacks the resources to restore the database.
I have already tried compressing the pg_dump file with bz2 or 7z, but that saved only about 1 GB on a pg_dump file of about 40 GB. With -Z 9 compression the pg_dump file is about 32 GB.
Is there any way to use pg_restore and pg_dump to recompress a pg_dump file from -Z 6 to -Z 9?
I don't think there is a simple way to rewrite a PostgreSQL dump file with higher compression, but what about creating the dump with -Z 0 and later compressing it with gzip or something similar?
That would make pg_dump as fast as possible, and you could still have compression.
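A minimal sketch of that idea, assuming gzip is on the PATH and there is enough disk space to hold the uncompressed dump for a while. The function, database, and file names are hypothetical:

```shell
dump_uncompressed() {
    # -Z 0 disables pg_dump's own compression, so the replica is busy
    # for as short a time as possible. $1 = database, $2 = output file.
    pg_dump -Fc -Z 0 -f "$2" "$1"
}

compress_dump() {
    # Run later, e.g. on the dump storage host. gzip replaces FILE with FILE.gz.
    gzip -9 "$1"
}

restore_compressed() {
    # pg_restore reads the archive from stdin when no file name is given.
    # Parallel restore (-j) is not possible from a pipe.
    gzip -dc "$1" | pg_restore -d "$2"
}

# Usage:
#   dump_uncompressed mydb mydb.dump
#   compress_dump mydb.dump              # produces mydb.dump.gz
#   restore_compressed mydb.dump.gz mydb_copy
```

Any stream compressor can stand in for gzip here; the point is only that the slow compression step no longer runs while pg_dump holds its connection to the replica.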

How to limit pg_dump's memory usage?

I have a ~140 GB PostgreSQL database on Heroku / AWS. I want to create a dump of it on a Windows Azure virtual machine (Windows Server 2012 R2), as I need to move the database into the Azure environment.
The DB has a couple of smaller tables, but mainly consists of a single table taking ~130 GB, including indexes. It has ~500 million rows.
I've tried to use pg_dump for this, with:
./pg_dump -Fc --no-acl --no-owner --host * --port 5432 -U * -d * > F:/051418.dump
I've tried various Azure virtual machine sizes, including some fairly large ones (D12_V2: 28 GB RAM, 4 vCPUs, 12000 max IOPS), but in all cases pg_dump stalls completely due to memory swapping.
On the above machine it is currently using all available memory and has spent the past 12 hours swapping memory to disk. I don't expect it to complete, due to the swapping.
From other posts I've understood that it could be an issue with the network being much faster than the disk I/O, causing pg_dump to consume all available memory and more, so I've tried the Azure machine with the most IOPS. This hasn't helped.
So is there another way I can force pg_dump to cap its memory usage, or to wait before pulling more data until it has written to disk and cleared memory?
Looking forward to your help!
Krgds.
Christian

Backing up mongo database of about 120 GB size

We have a mongo database of about 120 GB. I started mongodump with nohup about 3 days ago, redirecting the logs to /dev/null; the dump file is now ~40 GB and the dump is still running. Is this expected?
If yes, what is the approximate compression ratio for a mongo database? That is, for a 120 GB database, how large is the backup file going to be?
This would help me estimate the time remaining for the dump to finish. I have no clue why it is taking so long. I would also like to know whether there is a faster or better way of backing up the mongo database (remote copy is not something I'm considering).

A faster way to copy a postgresql database (or the best way)

I did a pg_dump of a database and am now trying to install the resulting .sql file on to another server.
I'm using the following command.
psql -f databasedump.sql
I started the database install earlier today, and now, 7 hours later, the database is still being populated. I don't know if this is how long it is supposed to take, but I continue to monitor it; so far I've seen over 12 million inserts and counting. I suspect there's a faster way to do this.
Create your dumps with
pg_dump -Fc -Z 9 --file=file.dump myDb
-Fc
Output a custom archive suitable for input into pg_restore. This is the most flexible format in that it allows reordering of loading data as well as object definitions. This format is also compressed by default.
-Z 9: --compress=0..9
Specify the compression level to use. Zero means no compression. For the custom archive format, this specifies compression of individual table-data segments, and the default is to compress at a moderate level. For plain text output, setting a nonzero compression level causes the entire output file to be compressed, as though it had been fed through gzip; but the default is not to compress. The tar archive format currently does not support compression at all.
and restore it with
pg_restore -Fc -j 8 -d myDb file.dump
-j: --jobs=number-of-jobs
Run the most time-consuming parts of pg_restore — those which load data, create indexes, or create constraints — using multiple concurrent jobs. This option can dramatically reduce the time to restore a large database to a server running on a multiprocessor machine.
Each job is one process or one thread, depending on the operating system, and uses a separate connection to the server.
The optimal value for this option depends on the hardware setup of the server, of the client, and of the network. Factors include the number of CPU cores and the disk setup. A good place to start is the number of CPU cores on the server, but values larger than that can also lead to faster restore times in many cases. Of course, values that are too high will lead to decreased performance because of thrashing.
Only the custom and directory archive formats are supported with this option. The input must be a regular file or directory (not, for example, a pipe). This option is ignored when emitting a script rather than connecting directly to a database server. Also, multiple jobs cannot be used together with the option --single-transaction.
Links:
pg_dump
pg_restore
Improve pg dump&restore
PG_DUMP | always use format directory with -j option
time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external
PG_RESTORE | always tune postgresql.conf and use the directory format with the -j option
work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1
time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/
For more info
https://gitlab.com/yanar/Tuning/wikis/improve-pg-dump&restore
Why are you producing a raw .sql dump? The opening description of pg_dump recommends the "custom" format -Fc.
Then you can use pg_restore which will restore your data (or selected parts of it). There is a "number of jobs" option -j which can use multiple cores (assuming your disks aren't already the limiting factor). In most cases, on a modern machine you can expect at least some gains from this.
Now you say "I don't know how long this is supposed to take". Well, until you've done a few restores you won't know. Do monitor what your system is doing and whether you are limited by CPU or disk I/O.
Finally, the configuration settings you want for restoring a database are not those you want for running it. A couple of useful starters:
Increase maintenance_work_mem so you can build indexes in larger chunks
Turn off fsync during the restore. If your machine crashes, you'll start from scratch again anyway.
Do remember to reset them after the restore though.
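One way to make sure the settings are put back is to wrap the restore in a small script. This is only a sketch under several assumptions: the connecting role is a superuser, the target database already exists, and psql accepts several -c options (PostgreSQL 9.6 or later). The function name, the 2GB value, and the job count of 8 are illustrative:

```shell
# Hypothetical helper: $1 = target database, $2 = custom-format dump file.
fast_restore() {
    db=$1
    dump=$2
    # Relax durability and enlarge index-build memory for the restore only.
    psql -d postgres \
        -c "ALTER SYSTEM SET fsync = off" \
        -c "ALTER SYSTEM SET maintenance_work_mem = '2GB'" \
        -c "SELECT pg_reload_conf()"
    pg_restore -d "$db" -j 8 "$dump"
    status=$?
    # Put the settings back whether or not the restore succeeded.
    psql -d postgres \
        -c "ALTER SYSTEM RESET fsync" \
        -c "ALTER SYSTEM RESET maintenance_work_mem" \
        -c "SELECT pg_reload_conf()"
    return $status
}

# Usage: fast_restore mydb mydb.dump
```

ALTER SYSTEM RESET falls back to whatever postgresql.conf specifies, so values set there are not lost.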
It is generally recommended to pair pg_dump with pg_restore instead of psql. The restore can then be split across cores to speed up loading by passing the --jobs flag:
$ pg_dump -Fc db > db.Fc.dump
$ pg_restore -d db --jobs=8 db.Fc.dump
The PostgreSQL documentation has a guide on bulk loading of data ("Populating a Database").
I would also strongly recommend tuning your postgresql.conf configuration file and setting appropriately high values for maintenance_work_mem and checkpoint_segments (replaced by max_wal_size in PostgreSQL 9.5 and later); higher values for these may dramatically increase your write performance.

PostgreSQL backup with smallest output files

We have a PostgreSQL database that is over 732 GB when backed up as a file system backup. When we do a pg_dump we can get it down to 585 GB. If I combine pg_dump with the PITR method, will this give me the best backup with the smallest backup file size? My plan was to run pg_start_backup, then pg_dump, then pg_stop_backup. I know the documentation says to run a file system backup, but I want a smaller backup data set. I would then copy off the WAL files and back them up at night.
To truly get the smallest file, you'll have to try compressing your pg_dump -Fc dump file with one of many compression tools and settings. Using gzip or xz with maximum possible compression would be a start. This will of course require an excellent CPU and lots of CPU time.
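Which compressor wins depends on the data, so it can be worth measuring on a sample before committing CPU time to the full dump. A rough sketch (the function name is made up for illustration) that prints the compressed size each installed tool achieves at its highest standard level. It is most informative on a dump taken with -Z 0, since an already-compressed -Fc dump leaves little to gain:

```shell
# Hypothetical helper: $1 = dump file (or a sample of one).
compare_compressors() {
    dump=$1
    for tool in "gzip -9" "bzip2 -9" "xz -9"; do
        # Skip compressors that are not installed on this machine.
        command -v "${tool%% *}" >/dev/null 2>&1 || continue
        # $tool is left unquoted on purpose so it splits into command and flag.
        size=$($tool -c "$dump" | wc -c | tr -d ' ')
        printf '%-10s %s bytes\n' "$tool" "$size"
    done
}

# Usage: compare_compressors mydb.dump
```

The input file is left untouched because each tool writes to stdout, so the comparison needs no extra disk space.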