Is it possible to restore a MongoDB database from a .bson file quicker than mongorestore? - mongodb

I have a huge database from an old backup. It's about 500 GB in total, in a single .bson file. At the current rate of my hard drive and CPU, I'll be done in probably 10-20 hours. EDIT: About 9 hours.
I simply ran:
mongorestore -d database -c collection C:\very_large_backup.bson
Is it possible for MongoDB to simply access the .bson file directly, or is mongorestore the only option I have?
I plan on moving this data to Microsoft SQL Server (discarding the extra bits of information that might overlap). Maybe there's a faster way via that route?
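For what it's worth, mongod cannot read a .bson dump file directly as its data files, so a restore tool is needed; the restore itself can sometimes be sped up with mongorestore's own options, though. A minimal sketch, assuming 3.0+ tools (these flags may not exist in very old versions):
# --noIndexRestore skips index builds entirely (indexes would have to be recreated manually afterwards);
# --numInsertionWorkersPerCollection runs several insert workers against the one collection
mongorestore -d database -c collection --noIndexRestore --numInsertionWorkersPerCollection=4 C:\very_large_backup.bson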

Related

Mongodump while writing

Is it safe to run mongodump against a running server with many writes per second? Is it possible to get a corrupted dump this way?
From here:
Use --oplog to capture incoming write operations during the mongodump operation to ensure that the backups reflect a consistent data state.
Does it mean that no matter how many writes hit the database, the dump will be consistent?
If I run mongodump --oplog at 1 AM and it finishes at 2 AM, and I then run mongorestore --oplogReplay, what state will I get?
From here:
However, the use of mongodump and mongorestore as a backup strategy can be problematic for sharded clusters and replica sets.
But why? I have a replica set of 1 primary and 2 secondaries. What is the problem with running mongodump against one of the secondaries? It should be the same as the primary (apart from the replication lag).
The docs are quite clear about it:
--oplog
Creates a file named oplog.bson as part of the mongodump output. The oplog.bson file, located in the top level of the output directory, contains oplog entries that occur during the mongodump operation. This file provides an effective point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup.
--oplog has no effect when running mongodump against a mongos instance to dump the entire contents of a sharded cluster. However, you can use --oplog to dump individual shards.
Without --oplog you still get a valid dump, just a bit inconsistent - some of the writes done between 1 AM and 2 AM will be missing.
With --oplog you additionally capture the oplog entries written up to 2 AM. The dump itself is still inconsistent, but replaying the oplog during the restore fixes this and brings the data to its 2 AM state.
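To make that concrete, a minimal sketch of the dump-and-replay cycle described above (the output directory is a placeholder):
# 1 AM: dump the data and capture the oplog entries written while the dump runs
mongodump --oplog --out /backups/dump-1am
# later: restore the data, then replay oplog.bson to roll it forward to the 2 AM state
mongorestore --oplogReplay /backups/dump-1am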
The problems with dumping sharded clusters deserve a dedicated page in the docs, essentially because of the complexity of synchronising backups across all of the nodes:
To create backups of a sharded cluster, you will stop the cluster balancer, take a backup of the config database, and then take backups of each shard in the cluster using mongodump to capture the backup data. To capture a more exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the filesystem snapshots; otherwise the snapshot will only approximate a moment in time.
There is no problem with dumping a replica set.
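As an illustration, a dump can be taken from a secondary by asking for a secondary read preference; a sketch with placeholder hostnames, assuming a tools version that supports --readPreference:
mongodump --host "rs0/db1.example.com:27017,db2.example.com:27017" --readPreference=secondary --oplog --out /backups/rs-dump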

mongorestore hangs while restoring fs.chunks

I am trying to upgrade from the MongoDB sandbox option to a shared cluster, and to keep my current data I have to do a mongodump and mongorestore to migrate the old data into the new database.
This is what I put in the command line.
mongorestore -h url:host -d heroku_zc -u heroku_zc -p 470grupv030prq5uj0fm mongo-dump-dir/heroku_9r
It all seems to go fine and restores all the data entries, but while uploading the file chunks it hangs partway through. Sometimes 5% of the way through, sometimes 20%, sometimes 50%.
As I say, when I look at the new database, all the rows are there correctly, and only the actual data files are missing.
This is what happens in the terminal; it doesn't give an error, it just stops.
2017-02-09T15:45:20.509+0100 [#.......................] heroku_z25kbwmc.fs.chunks 15.8 MB/299.6 MB (5.3%)
2017-02-09T15:45:23.509+0100 [#.......................] heroku_z25kbwmc.fs.chunks 15.8 MB/299.6 MB (5.3%)
2017-02-09T15:45:26.510+0100 [#.......................] heroku_z25kbwmc.fs.chunks 15.8 MB/299.6 MB (5.3%)
Both DBs were created from Heroku as add-ons to my Parse server.
EDIT: I also don't know if this is a problem, but the local system database says 2.03 GB. I don't understand how this can be, as the total database size is only 500 MB.
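One way to see where it stalls is to re-run the restore with higher verbosity and capture the tool's log output to a file (a sketch reusing the command above; -vvv just repeats mongorestore's -v flag):
# the tools log to stderr, so redirect that to a file and watch it for the last chunk written
mongorestore -vvv -h url:host -d heroku_zc -u heroku_zc -p 470grupv030prq5uj0fm mongo-dump-dir/heroku_9r 2> restore.log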

Transfer a MongoDB database over an unstable connection

I have a fairly small MongoDB instance (15 GB) running on my local machine, but I need to push it to a remote server in order for my partner to work on it. The problem is twofold:
The server only has 30GB of free space
My local internet connection is very unstable
I tried copyDatabase to transfer it directly, but it would take approximately 2 straight days to finish, during which the connection is almost guaranteed to fail at some point. I have also tried both mongoexport and mongodump, but both produce files of ~40 GB, which won't fit on the server, and that's ignoring the difficulty of transferring 40 GB in the first place.
Is there another, more stable method that I am unaware of?
Since your mongodump output is much larger than your data, I'm assuming you are using MongoDB 3.0+ with the WiredTiger storage engine and your data is compressed but your mongodump output is not.
As at MongoDB 3.2, the mongodump and mongorestore tools now have support for compression (see: Archiving and Compression in MongoDB Tools). Compression is not used by default.
For your use case as described I'd suggest the following (pulled together into a single sketch after these steps):
Use mongodump --gzip to create a dump directory with compressed backups of all of your collections.
Use rsync --partial SRC .... DEST or similar for a (resumable) file transfer over your unstable internet connection.
NOTE: There may be some directories you can tell rsync to ignore with --exclude; for example the local and test databases can probably be skipped. Alternatively, you may want to specify a database to backup with mongodump --gzip --db dbname.
Your partner can use a similar rsync commandline to transfer to their environment, and a command line like mongorestore --gzip /path/to/backup to populate their local MongoDB instance.
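Putting those steps together, a minimal end-to-end sketch (hostnames and paths are placeholders; the flags assume MongoDB 3.2+ tools):
# on the local machine: compressed dump of just the one database
mongodump --gzip --db dbname --out /backups/dump
# resumable transfer over the unstable link; re-run the same command after any dropped connection
rsync --partial -av /backups/dump/ user@remote.example.com:/backups/dump/
# on the receiving side: restore from the compressed dump
mongorestore --gzip /backups/dump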
If you are going to transfer dumps on an ongoing basis, you will probably find rsync's --checksum option useful to include. Normally rsync transfers "updated" files based on a quick comparison of file size and modification time. A checksum involves more computation but would allow skipping collections that have identical data to previous backups (aside from the modification time).
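For those repeat transfers the command might look like this (again with placeholder paths):
# only re-send dump files whose contents actually changed since the last run
rsync --partial --checksum -av /backups/dump/ user@remote.example.com:/backups/dump/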
If you need to sync data changes on an ongoing basis, you may also be better off moving your database to a cloud service (e.g. a Database-as-a-Service provider like MongoDB Atlas, or your own MongoDB instance hosted in the cloud).

Backing up mongo database of about 120 GB size

We have a mongo database of about 120 GB. About 3 days ago I ran mongodump using nohup, redirecting the logs to /dev/null, but the dump is ~40 GB in size now and is still running. Is this expected?
If yes, what is the approximate compression ratio for a mongo database? I.e. for a 120 GB database, how big is the backup going to be?
This would help me estimate the time remaining for the dump to finish. I have no clue why it is taking so much time. I also wanted to know whether there is a faster/better way of backing up the mongo database (remote copy is not something I'm considering).
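For what it's worth, a sketch of how the same dump could be re-run so that progress stays visible and the output is compressed (assuming MongoDB 3.2+ tools; paths are placeholders):
# keep mongodump's own log instead of sending it to /dev/null, and compress the output as it is written
nohup mongodump --gzip --out /backups/dump > /backups/mongodump.log 2>&1 &
# per-collection progress is then visible in the log
tail -f /backups/mongodump.log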

PostgreSQL backup with smallest output files

We have a PostgreSQL database that is over 732 GB when backed up as a file system backup. When we do a pg_dump we can get it down to 585 GB. If I combine pg_dump with the PITR method, will this give me the best backup with the smallest backup data file size? My plan was to run pg_start_backup, then pg_dump, then pg_stop_backup. I know the documentation says to take a file system backup, but I want a smaller backup data set. I would then copy off the WAL files and back them up at night.
To truly get the smallest file, you'll have to try compressing your pg_dump -Fc dump file with one of many compression tools and settings. Using gzip or xz with maximum possible compression would be a start. This will of course require an excellent CPU and lots of CPU time.
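As a rough sketch of that suggestion (database name and file names are placeholders), one comparison would be pg_dump's built-in compression at its maximum level versus a plain-format dump piped through xz:
# custom format using pg_dump's own maximum compression level
pg_dump -Fc -Z 9 mydb > mydb.dump
# plain-format dump piped through xz at maximum compression (slower, often smaller)
pg_dump -Fp mydb | xz -9 > mydb.sql.xz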