Transfer a MongoDB database over an unstable connection - mongodb

I have a fairly small MongoDB instance (15GB) running on my local machine, but I need to push it to a remote server in order for my partner to work on it. The problem is twofold:
The server only has 30GB of free space
My local internet connection is very unstable
I tried copyDatabase to transfer it directly, but it would take approximately two straight days to finish, during which the connection is almost guaranteed to fail at some point. I have also tried both mongoexport and mongodump, but both produce files of roughly 40GB, which won't fit on the server, and that's ignoring the difficulties of transferring 40GB in the first place.
Is there another, more stable method that I am unaware of?

Since your mongodump output is much larger than your data, I'm assuming you are using MongoDB 3.0+ with the WiredTiger storage engine and your data is compressed but your mongodump output is not.
As of MongoDB 3.2, the mongodump and mongorestore tools support compression (see: Archiving and Compression in MongoDB Tools). Compression is not used by default.
For your use case as described I'd suggest:
Use mongodump --gzip to create a dump directory with compressed backups of all of your collections.
Use rsync --partial SRC .... DEST or similar for a (resumable) file transfer over your unstable internet connection.
NOTE: There may be some directories you can tell rsync to ignore with --exclude; for example the local and test databases can probably be skipped. Alternatively, you may want to specify a database to backup with mongodump --gzip --db dbname.
Your partner can use a similar rsync commandline to transfer to their environment, and a command line like mongorestore --gzip /path/to/backup to populate their local MongoDB instance.
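Putting those steps together, a minimal sketch of the workflow (the paths, remote host name, and user below are placeholders; adjust to your environment):
# on your machine: create a compressed dump of just the database you need
mongodump --gzip --db dbname --out /backups/dump
# resumable transfer; simply re-run the same command after any disconnect
rsync --partial --progress -avz /backups/dump user@remote.example.com:/backups/
# on the receiving side: restore from the compressed dump
mongorestore --gzip /backups/dump
Re-running the same rsync command after a dropped connection picks up where the partial transfer left off, which is what makes this workable over an unstable link.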
If you are going to transfer dumps on an ongoing basis, you will probably find rsync's --checksum option useful to include. Normally rsync transfers "updated" files based on a quick comparison of file size and modification time. A checksum involves more computation but would allow skipping collections that have identical data to previous backups (aside from the modification time).
If you need to sync data changes on an ongoing basis, you may also be better off moving your database to a cloud service (e.g. a Database-as-a-Service provider like MongoDB Atlas, or your own cloud-hosted MongoDB instance).

Related

mongodump a db on archlinux

I'm trying to back up my local MongoDB. I use archlinux and installed mongodb-tools in order to use mongodump.
I tried :
mongodump --host localhost --port 27017
mongodump --host localhost --port 27017 --db mydb
Every time I get the same response:
Failed: error connecting to db server: no reachable servers
I'm however able to connect to the database using
mongo --host localhost --port 27017
or just
mongo
My mongodb version is 3.0.7.
I did not set any username/password
How can I properly use mongodump to back up my local database?
This appears to be a bug in the mongodump tool; see this JIRA ticket for more detail. You should be able to use mongodump if you explicitly specify the IP address:
mongodump --host 127.0.0.1 --port 27017
"Properly" is a highly subjective term in this context. To give you an impression:
mongodump and mongorestore aren't incredibly fast. In sharded environments, they can take days (note the plural!) for reasonably sized databases, which in turn means that in a worst-case scenario you can lose days' worth of data. Furthermore, during the backup the data may change quite a bit, so the state of your backup may be inconsistent. It is better to think of mongodump as "mongodumb" in this respect.
Your application has to be able to deal with the lack of consistency gracefully, which can be quite a pain in the neck to develop. Furthermore, long restore times cost money and (sometimes even more important) reputation.
I personally use mongodump in only two scenarios: for backing up a sharded cluster's metadata (which is only a couple of MB in size) and for (relatively) cheap data, which is easy to re-obtain by other means.
For doing a MongoDB backup properly, imho, there are only three choices:
MongoDB Inc's cloud backup
MongoDB Ops Manager
Filesystem snapshots
Cloud backup
Cloud backup has several advantages. You can do point-in-time recoveries, guaranteeing that the database is restored to a consistent state as it was at the chosen point in time. It is extremely easy to set up and maintain.
However, you guessed it, it comes with a price tag based on data volatility and overall size, which, imho, is reasonable for small to medium sized data with low to moderate volatility.
MongoDB Ops Manager
Being an on-premises version of the cloud backup (it has quite a few other features beyond the scope of this answer, too), it offers the same benefits. It is better suited to upper-medium-size to large databases, or to databases with disproportionately high volatility (as indicated by a high "OplogGb/h" value relative to the data size).
Filesystem snapshots
Well, it is sort of cheap. Just make a snapshot, mount it, copy it to some backup space, unmount and destroy the snapshot, optionally compress the copied data and you are done. There are some caveats, though.
Synchronization
To get a backup of consistent data, you need to synchronize your snapshots across a sharded cluster, especially since the sharded cluster's metadata needs to be consistent with the backups, too, if you want a halfway fast recovery. That can become a bit tricky. To make sure your data is consistent, you'd need to disconnect all mongos, stop the balancer, fsync the data to the files on each node, make the snapshots, start the balancer again and restart all mongos. To have this properly synced, you need a maintenance window of some minutes every time you make a backup.
Note that for a simple replica set, synchronization is not required and backups work flawlessly.
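A rough sketch of that sequence in the mongo shell (run against the appropriate hosts; the exact procedure depends on your topology and MongoDB version, so treat this as an outline rather than a recipe):
// on a mongos: stop chunk migrations
sh.stopBalancer()
// on each shard member being snapshotted: flush pending writes and block new ones
db.fsyncLock()
// ... take the filesystem snapshots here ...
db.fsyncUnlock()
// back on the mongos: resume the balancer
sh.startBalancer()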
Overprovisioning
Filesystem snapshots work with what is called "copy-on-write" (CoW). A bit simplified: when you make a snapshot and a file is then modified, the file is copied and the changes are applied to the new copy, while the snapshot still points to the old file. It is obvious that in order to make a snapshot, as per CoW, you need some additional disk space so that MongoDB can keep working while you deal with the snapshot. Let us assume a worst-case scenario in which all the data is changed: you'd need to overprovision your partition for MongoDB by at least 100% of your data size or, to put it in other terms, your critical disk utilization would be 50% minus some threshold for the time you need to scale up. Of course, this is a bit exaggerated, but you get the picture.
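To make the basic single-node snapshot flow described above concrete, here is a sketch assuming LVM (volume group, volume, and path names are placeholders; other snapshot-capable volume managers work similarly):
# snapshot the volume holding the MongoDB data files
lvcreate --size 10G --snapshot --name mongo-snap /dev/vg0/mongo-data
# mount the snapshot read-only and copy the data off, compressing on the way
mount -o ro /dev/vg0/mongo-snap /mnt/mongo-snap
tar -czf /backups/mongo-$(date +%F).tar.gz -C /mnt/mongo-snap .
# clean up
umount /mnt/mongo-snap
lvremove -f /dev/vg0/mongo-snap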
Conclusion
IMHO, proper backups should be done this way:
mongodump/mongorestore for cheap data where consistency is of little concern
Filesystem snapshots for replica sets
Cloud Backups for small to medium sized sharded databases with low to moderate volatility
Ops Manager Backups for large databases or small to medium ones with disproportionately high volatility
As said: "properly" is a highly subjective term when it comes to backups. ;)

How to import data into MongoDB from another MongoDB?

I've installed a new MongoDB server and now I want to import data from the old one. My MongoDB stores monitoring data, and it's a bit problematic to export the data from the old database (it's over 10GB), so I thought it might be possible to import directly from the old DB, but I haven't found how to do that with mongoimport.
The export/import would be the fastest option.
But if you really want to bypass it you can use the new server as a replica of the old one, and wait for full replication.
It takes longer but it's an easy way to set up a full copy without impact on the first one.
Follow this:
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
And then, once it's done, change configuration again.
It's easier than it seems, but I recommend doing a dry run with a sample database before doing it for real...
Note that another benefit is that the new replica will probably be smaller on disk than the initial database, because MongoDB is not very good at reclaiming space from deleted documents.
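A rough sketch of the shell commands involved (host names are placeholders; follow the linked tutorial for the authoritative procedure):
// on the old server, after restarting mongod with a --replSet name
rs.initiate()
// add the new server and let it perform an initial sync
rs.add("new-server.example.com:27017")
// watch rs.status() until the new member reaches SECONDARY state
rs.status()
// once it is fully synced, remove it from the set and revert both configurations
rs.remove("new-server.example.com:27017")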
mongoimport/mongoexport operate per collection, so they are not suitable for this kind of operation.
Use mongodump/mongorestore instead.
If the old MongoDB instance can be shut down for this task, you can stop it, copy all of its data files to the new server as its own data directory, and then run the new instance.
Also, db.cloneDatabase() can copy data directly from the old instance to the new one, though it will likely be slower than copying the data files directly.
You can use mongodump and pipe directly to the new database with mongorestore like:
mongodump --archive --db=test | mongorestore --archive --nsFrom='test.*' --nsTo='examples.*'
Add --host, --port, and --username to mongorestore to connect to the remote db.
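For example (the remote host name below is a placeholder):
mongodump --archive --db=test | mongorestore --archive --nsFrom='test.*' --nsTo='examples.*' --host=remote.example.com --port=27017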
db.cloneDatabase() has been deprecated for a while.
You can use the copydb command described here.
Copies a database from a remote host to the current host or copies a database to another database within the current host.
copydb runs on the destination mongod instance, i.e. the host receiving the copied data.
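A minimal sketch of invoking it from the mongo shell on the destination host (host and database names are placeholders; note that copydb is deprecated in recent MongoDB versions and requires extra steps when authentication is enabled):
// run against the admin database of the destination mongod
db.adminCommand({
    copydb: 1,
    fromhost: "old-server.example.com:27017",
    fromdb: "monitoring",
    todb: "monitoring"
})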

MongoDB 2.2: why didn't replication catch up a collection following a dump/restore?

We have a three-server replicaset running MongoDB 2.2 on Ubuntu 10.04, and recently had to upgrade the hard drive for each server where one particular database resides. This database contains log information for web service requests, where they write to collections in hourly buckets using the current timestamp to determine the name, e.g. log_yyyymmddhh.
I performed this process:
back up the database on the primary server with mongodump --db log_db
take a secondary server offline, replace the disk
bring the secondary server up in standalone mode (i.e. comment out the replSet entry in /etc/mongodb.conf before starting the service)
restore the database on the secondary server with mongorestore --drop --db log_db
add the secondary server back into the replicaset and bring it online, letting replication catch up the hourly buckets that were updated/created while it had been offline
Everything seemed to go as expected, except that the collection which was the current bucket at the time of the backup was not brought up to date by replication. I had to copy that collection over by hand to get it up to date. Note that collections which were created after the backup were synced just fine.
What did I miss in this process that caused MongoDB not to get things back in synch for that one collection? I assume something got out of whack with regard to the oplog?
Edit 1:
The oplog on the primary showed that its earliest timestamp went back a couple of days, so there should have been plenty of space to maintain transactions for a few hours (which was the time the secondary was offline).
Edit 2:
Our MongoDB installation uses two disk partitions: /dev/sda1 and /dev/sdb1. The primary MongoDB directory /var/lib/mongodb/ is on /dev/sda1, and holds several databases, while the log database resides by itself on /dev/sdb1. There's a sym link /var/lib/mongodb/log_db which points to a directory on /dev/sdb1. Since the log db was getting full, we needed to upgrade the disk for /dev/sdb1.
You should be using mongodump with the --oplog option. Running a full database backup with mongodump on a replicaset that is updating collections at the same time may not leave you with a consistent backup. This becomes worse with larger databases, more collections and more frequent updates/inserts/deletes.
From the documentation for your version (2.2) of MongoDB (it's the same for 2.6 but just to be as accurate as possible):
--oplog
Use this option to ensure that mongodump creates a dump of the database that includes an oplog, to create a point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup.
http://docs.mongodb.org/v2.2/reference/mongodump/
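In practice that means dumping the entire instance with --oplog and replaying the captured oplog on restore; a minimal sketch (the output path is a placeholder):
# --oplog applies to a dump of the whole instance, so no --db filter here
mongodump --oplog --out /backups/log_db_dump
# restore everything and replay the oplog entries captured during the dump
mongorestore --oplogReplay /backups/log_db_dump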
This is not covered well in most MongoDB tutorials around backups and restores. Generally you are better off if you can perform a live snapshot of the storage volume your database resides on (assuming your storage solution has a live snapshot capability compatible with MongoDB). Failing that, your next best bet is taking a secondary offline and then performing a snapshot or backup of its database files. Mongodump on a live database is an increasingly poor option for larger databases due to performance issues.
I'd definitely take a look at the MongoDB overview of backup options: http://docs.mongodb.org/manual/core/backups/
I would guess this has to do with the oplog not being long enough, although it seems like you checked that and it looked reasonably big.
Still, when adding new members to a replica set you shouldn't be snapshotting and restoring them. It's better to simply add a new member and let replication happen by itself. This is described in the Mongo docs and is the process I've always followed.

Mongodump with --oplog for hot backup

I'm looking for the right way to do a Mongodb backup on a replica set (non-sharded).
By reading the Mongodb documentation, I understand that a "mongodump --oplog" should be enough, even on a replica (slave) server.
From the mongodb / mongodump documentation:
--oplog
Use this option to ensure that mongodump creates a dump of the database that includes an oplog, to create a point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup
I'm still having a very hard time understanding how MongoDB can keep writing to the database during the backup and yet produce a consistent backup, even with --oplog.
Should I lock my collections first, or is it safe to run "mongodump --oplog"?
Is there anything else I should know about?
Thanks.
The following document explains how mongodump with the --oplog option works to create a point-in-time backup.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-binary-database-dumps/
However, using mongodump and mongorestore to back up and restore MongoDB can be slow. If a filesystem snapshot is an option, you may want to consider using snapshots for MongoDB backups. The following link details two snapshot options for performing a hot backup of MongoDB.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-filesystem-snapshots/
You can also look into MongoDB backup service.
http://www.10gen.com/products/mongodb-backup-service

Does mongodump lock the database?

I'm in the middle of setting up a backup strategy for MongoDB and was just curious to know whether mongodump locks the database before performing the dump.
I found this on mongo's google group:
Mongodump does a simple query on the live system and does not require a shutdown. Like all queries it requires a read lock while running, but it doesn't block any more than normal queries do.
If you have a replica set you will probably want to use the --oplog flag to do your backups.
See the docs for more information
http://docs.mongodb.org/manual/administration/backups/
Additionally I found this previously asked question
MongoDB: mongodump/restore vs. backup up files directly
Excerpt from above question
Locking and copying files is only an option when you don't have a heavy write load.
mongodump can be run against a live server. It will create some additional load, so don't do it during peak hours. Also, it is advised to do it on a secondary node (if you don't use replica sets, you should).
There are some complications when you have a DB so large that no single machine can hold it. See this document.
Also, if you have a replica set, you can take down one of the secondaries and copy its files directly. See http://www.mongodb.org/display/DOCS/Backups
Mongodump does not lock the db, which means other read and write operations will continue normally.
Actually, both mongodump and mongorestore are non-blocking. So if you want to mongodump/mongorestore a db, it is your responsibility to make sure the result is really the snapshot backup/restore you want. To do this, you must stop all other write operations while taking/restoring backups with mongodump/mongorestore. If you are running a sharded environment, it's also recommended that you stop the balancer.