MongoDB Creating Backups and Point In Time Restores - mongodb

I'm a SQL Server DBA trying to use MongoDB for some particular cases. What I have at the moment is a 3-node replica set with 2 data-bearing nodes and 1 arbiter. What I'm struggling with is finding a clear answer on how to create backups that allow point-in-time restores, similar to what you get in MS SQL with FULL and LOG backups. How can I do that?

MongoDB provides several methods for backup and restore:
1. Back Up with Atlas (MongoDB's hosted cloud database service)
2. Back Up with MongoDB Cloud Manager or Ops Manager (Cloud Manager is a hosted service; Ops Manager is the on-premises version available with MongoDB Enterprise Advanced. Both support backing up and restoring MongoDB replica sets and sharded clusters from a graphical user interface.)
3. Back Up with file system snapshots on the OS (on Linux, the Logical Volume Manager (LVM) can create snapshots; similarly, Amazon’s EBS storage system for EC2 supports snapshots)
To get a correct snapshot of a running mongod process, you must have journaling enabled. Without journaling enabled, there is no guarantee that the snapshot will be consistent or valid.
To create a snapshot with LVM, issue a command as root in the following format:
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb
This command creates an LVM snapshot (with the --snapshot option) named mdb-snap01 of the mongodb volume in the vg0 volume group.
This example creates a snapshot named mdb-snap01 located at /dev/vg0/mdb-snap01. The location and paths to your system's volume groups and devices may vary slightly depending on your operating system’s LVM configuration.
To restore a snapshot, issue the following sequence of commands:
umount /dev/vg0/mdb-snap01
lvcreate --size 1G --name mdb-new vg0
dd if=/dev/vg0/mdb-snap01 of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
For more details, see https://docs.mongodb.com/manual/tutorial/backup-with-filesystem-snapshots/#back-up-and-restore-using-lvm-on-linux
4. Back Up with mongodump (command-line MongoDB tools)
mongodump and mongorestore are simple and efficient tools for backing up and restoring small MongoDB deployments.
mongodump and mongorestore operate against a running mongod process.
If you don't specify a database, mongodump captures all databases and writes each one to a separate folder, along with the index definitions (stored in a JSON metadata file per collection).
By default, mongodump does not back up the local database (which contains the replica set configuration and the oplog.rs collection).
For replica sets, mongodump provides the --oplog option to include in its output oplog entries that occur during the mongodump operation. This allows the corresponding mongorestore operation to replay the captured oplog. To restore a backup created with --oplog, use mongorestore with the --oplogReplay option.
mongodump captures only the documents, not the index data, so mongorestore must rebuild the indexes after restoring the data.
https://docs.mongodb.com/manual/tutorial/backup-and-restore-tools/#
commands:
mongodump --out /data/backup/ (backs up all databases, including their index definitions)
mongodump --collection myCollection --db test (specified database & collection)
mongorestore --port
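To get closer to the FULL plus LOG model from the question, one common approach is to combine a full mongodump --oplog with separate dumps of the local.oplog.rs collection, and to cap the replay with mongorestore --oplogLimit. The following is a minimal sketch of that idea, not a tested procedure: the host, the /data/backup layout, the function names, and the example timestamp are all assumptions, and the functions only print the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: each function prints commands for review instead of running them.
# HOST and BACKUP_ROOT are assumptions, not values taken from the answer above.
HOST="localhost:27017"
BACKUP_ROOT="/data/backup"

# Full dump of every database plus the oplog entries written while it runs
# (comparable in role to a FULL backup).
full_dump_cmd() {
    printf 'mongodump --host %s --oplog --out %s/full\n' "$HOST" "$BACKUP_ROOT"
}

# Periodic dump of the oplog collection itself
# (comparable in role to a LOG backup).
oplog_dump_cmd() {
    printf 'mongodump --host %s --db local --collection oplog.rs --out %s/oplog\n' "$HOST" "$BACKUP_ROOT"
}

# Restore the full dump, then replay the saved oplog up to (not including) $1,
# a timestamp in <seconds-since-epoch>:<ordinal> form.
restore_cmds() {
    limit="$1"
    printf 'mongorestore --host %s --oplogReplay %s/full\n' "$HOST" "$BACKUP_ROOT"
    printf 'mkdir -p %s/replay\n' "$BACKUP_ROOT"
    printf 'cp %s/oplog/local/oplog.rs.bson %s/replay/oplog.bson\n' "$BACKUP_ROOT" "$BACKUP_ROOT"
    printf 'mongorestore --host %s --oplogReplay --oplogLimit %s %s/replay\n' "$HOST" "$limit" "$BACKUP_ROOT"
}

full_dump_cmd
oplog_dump_cmd
restore_cmds "1700000000:1"
```

This can only reach a target time for which the saved oplog still holds every entry since the full dump. The oplog is a capped collection, so the oplog dumps have to run often enough that nothing rolls off in between.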

Related

Mongodump while writing

Is it safe to run mongodump against a running server with many writes per second? Is it possible to get a corrupted dump this way?
From here:
Use --oplog to capture incoming write operations during the mongodump operation to ensure that the backups reflect a consistent data state.
Does this mean the dump will be consistent no matter how many writes hit the database?
If I run mongodump --oplog at 1 AM and it finishes at 2 AM, and I then run mongorestore --oplogReplay, what state will I get?
From here:
However, the use of mongodump and mongorestore as a backup strategy can be problematic for sharded clusters and replica sets.
But why? I have a replica set of 1 primary and 2 secondaries. What is the problem with running mongodump against one of the secondaries? It should hold the same data as the primary (apart from replication lag).
The docs are quite clear about it:
--oplog
Creates a file named oplog.bson as part of the mongodump output. The oplog.bson file, located in the top level of the output directory, contains oplog entries that occur during the mongodump operation. This file provides an effective point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup.
--oplog has no effect when running mongodump against a mongos instance to dump the entire contents of a sharded cluster. However, you can use --oplog to dump individual shards.
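Without --oplog you still get a valid dump, just a slightly inconsistent one: a write made between 1 AM and 2 AM may or may not be included, depending on whether the affected collection had already been dumped.
With --oplog, the dump also contains the oplog entries recorded until the dump finished at 2 AM. The dumped collections on their own are still inconsistent, but replaying the oplog on restore (mongorestore --oplogReplay) brings them to the state the database had at 2 AM.
The problems with dumping sharded clusters deserve a dedicated page in the docs. They essentially come down to the complexity of synchronising the backups of all nodes: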
To create backups of a sharded cluster, you will stop the cluster balancer, take a backup of the config database, and then take backups of each shard in the cluster using mongodump to capture the backup data. To capture a more exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the filesystem snapshots; otherwise the snapshot will only approximate a moment in time.
There are no such problems with dumping a replica set.
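To make the 1 AM / 2 AM timing concrete, here is a small sketch of a wrapper that records when a dump started and finished. The function name timed_dump and the host and path in the example call are hypothetical. The point it illustrates is that a dump taken with --oplog and restored with --oplogReplay corresponds to the finish time.

```shell
#!/bin/sh
# Sketch: run a dump command and record when it started and finished.
# With --oplog, a later mongorestore --oplogReplay yields the state at the
# finish time, not the start time.
timed_dump() {
    started=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    # Run whatever command was passed in; give up if it fails.
    "$@" || return 1
    finished=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    printf 'dump started:  %s\n' "$started"
    printf 'dump finished: %s (state restored by --oplogReplay)\n' "$finished"
}

# Example call (assumes a replica set member on localhost and /data/backup):
# timed_dump mongodump --host localhost:27017 --oplog --out /data/backup
```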

AWS EC2 taking snapshot or LVM of only folder

I am working on an AWS EC2 instance and have configured MongoDB on it.
1. I have 1 TB of storage space for the mongo data.
2. Another 50 GB for the application to run.
Since the cost of taking a snapshot of everything is huge, can I take a snapshot of only the folder where my MongoDB data is stored?
e.g. my folder for MongoDB storage is /home/ubuntu/mongodb
So I want a snapshot (EBS or LVM) of only the mongodb folder instead of the whole 1 TB instance on AWS.
You can take dumps of your database:
mongodump --dbpath /data/db/ --out /data/backup/
or
mongodump --host mongodb.example.net --port 27017
and then store the dump on S3. You can also run a cron job to take the backup at whatever frequency you need. Note that the --dbpath option was removed from mongodump in MongoDB 3.0, so the first form only works with older versions.
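The dump-then-upload idea could be scripted roughly as follows. This is a sketch under assumptions: the bucket name, the working directory, and the function names are placeholders, and it relies on the mongodump --archive and --gzip options (MongoDB 3.2 and later) and on an installed, configured AWS CLI.

```shell
#!/bin/sh
# Sketch: dump to a compressed archive and copy it to S3.
# The bucket name, host, and paths are placeholders (assumptions).
MONGO_HOST="mongodb.example.net"
MONGO_PORT="27017"
BUCKET="s3://my-backup-bucket/mongodb"   # hypothetical bucket
WORKDIR="/data/backup"

# Build the archive file name for the date passed in $1.
archive_name() {
    printf 'dump-%s.archive.gz' "$1"
}

# Dump, upload, and remove the local copy only if both steps succeeded.
backup_to_s3() {
    file="$WORKDIR/$(archive_name "$(date +%F)")"
    mongodump --host "$MONGO_HOST" --port "$MONGO_PORT" --gzip --archive="$file" &&
        aws s3 cp "$file" "$BUCKET/" &&
        rm -f "$file"
}

# Run only when asked, so the file can also be sourced for inspection.
if [ "${1:-}" = "run" ]; then
    backup_to_s3
fi
```

A crontab entry such as `0 1 * * * /usr/local/bin/mongo-s3-backup.sh run` (the script path is hypothetical) would then take the backup every night at 1 AM.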

How does restoring a db backup affect the oplog?

I have a standalone MongoDB instance with many databases in it. I am only concerned with backing up and restoring one of those databases, though; let's call it DbOne.
Using the instructions in (http://www.mongodb.com/blog/post/dont-let-your-standalone-mongodb-server-stand-alone), I can create an oplog on this standalone server.
Using the tool Tayra, I can record/store the oplog entries. Being able to create incremental backups is the main reason I enabled the oplog on my standalone instance.
I intend to take full backups once a day, using the command
mongodump --db DbOne --oplog
From my understanding, this backup will contain a point-in-time snapshot of my db.
Assuming I want to discard all updates since this backup, I delete all the backed-up oplog entries and restore only this full backup, using the command
mongorestore --drop --db DbOne --oplogReplay
At this point, do I need to do something to the oplog collection in the local db? Will MongoDB automatically drop the entries pertaining to this db from the oplog? Because if not, wouldn't Tayra end up finding those oplog entries and backing them up all over again?
Tbh, I haven't tried this yet on my machine. I am hoping someone can point to a document that lists supported/expected behaviour in this scenario.
I experimented with a MongoDB server, set up as a replica set with only 1 member, shortly after asking the question, but forgot to post an answer here.
I took a backup using mongodump --db DbOne --oplog and then executed some additional updates. With the server left as is, i.e. still running as a replica set, running the mongorestore command created thousands of oplog entries, one for each document of each collection in the db. It was a big mess.
The alternative was to shut down MongoDB and start it as a standalone instance (i.e. not running as a replica set). Restoring with mongorestore then left the oplog untouched. This was bad too, because the oplog now contained entries that were not present in the actual collections.
I wanted a mechanism that would restore both my database and the oplog in the local database to the time the backup took place, but mongodump doesn't back up the local database.
Eventually I had to stop using mongodump and switched to backing up the entire data directory (after stopping MongoDB). Once we switch to AWS, I can use the EBS snapshot feature to do the same.
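A cold backup of the data directory like the one described could look roughly like this. It is only a sketch: the function name cold_backup, the service name mongod, and the paths in the example are assumptions. Because the whole directory is archived, the local database (and with it the oplog) is captured together with the data.

```shell
#!/bin/sh
# Sketch: archive a data directory while mongod is stopped.
# The service name "mongod" and the paths in the example are assumptions.
cold_backup() {
    data_dir="$1"
    dest="$2"
    # -C makes paths inside the archive relative to the parent of the data directory.
    tar -czf "$dest" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
}

# Example (not run here):
#   systemctl stop mongod
#   cold_backup /var/lib/mongodb /data/backup/mongodb-$(date +%F).tar.gz
#   systemctl start mongod
```

Restoring is the reverse: stop mongod, unpack the archive over an empty data directory, and start mongod again.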
I understand you want a link to the docs about mongorestore:
http://docs.mongodb.org/manual/reference/program/mongorestore/
From what I understand you want to make a point in time backup and then restore that backup. The commands you listed above will do that:
1) mongodump --db DbOne --oplog
2) mongorestore --drop --db DbOne --oplogReplay
However, please note that the "point in time" at which the backup is effectively taken is the moment the dump ends, not the moment the command started. This is a fine detail that might not matter to you, but it is included for completeness.
Let me know if there is anything else.
Best,
Charlie

Mongodump with --oplog for hot backup

I'm looking for the right way to do a MongoDB backup on a replica set (non-sharded).
From reading the MongoDB documentation, I understand that "mongodump --oplog" should be enough, even on a replica (slave) server.
From the mongodb / mongodump documentation :
--oplog
Use this option to ensure that mongodump creates a dump of the database that includes an oplog, to create a point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup
I'm still having a very hard time understanding how MongoDB can keep writing to the database during a backup and still produce a consistent backup, even with --oplog.
Should I lock my collections first, or is it safe to run "mongodump --oplog"?
Is there anything else I should know about?
Thanks.
The following document explains how mongodump with the --oplog option works to create a point-in-time backup.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-binary-database-dumps/
However, using mongodump and mongorestore to back up and restore MongoDB can be slow. If a file system snapshot is an option, you may want to consider using snapshots for MongoDB backup. The following link details two snapshot options for performing a hot backup of MongoDB.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-filesystem-snapshots/
You can also look into the MongoDB backup service.
http://www.10gen.com/products/mongodb-backup-service

How to get a consistent MongoDB backup for a single node setup

I'm using MongoDB in a pretty simple setup and need a consistent backup strategy. I found out the hard way that wrapping a mongodump in a lock/unlock is a bad idea. Then I read that the --oplog option should be able to provide consistency without lock/unlock. However, when I tried that, it said that I could only use the --oplog option on a "full dump." I've poked around the docs and lots of articles but it still seems unclear on how to dump a mongo database from a single point in time.
For now I'm just going with a normal dump but I'm assuming that if there are writes during the dump it would make the backup not from a single point in time, correct?
mongodump -h $MONGO_HOST:$MONGO_PORT -d $MONGO_DATABASE -o ./${EXPORT_FILE} -u backup -p password --authenticationDatabase admin
In a production environment, MongoDB is typically deployed as one or more replica sets to ensure redundancy and high availability. There are a few options available for point-in-time backup if you are running a standalone mongod instance.
One option, as you have mentioned, is to do a mongodump with the --oplog option. However, this option is only available if you are running a replica set. You can easily convert a standalone mongod instance to a single-node replica set without adding any new replica set members. Please check the following document for details.
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
This way, if there are writes while mongodump is running, they will be part of your backup. Please see Point in Time Operation Using Oplogs section from the following link.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-binary-database-dumps/#point-in-time-operation-using-oplogs
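The conversion and the subsequent dump can be sketched as the steps below. The set name rs0, the port, and the paths are assumptions, and newer installations use the mongosh shell where older ones use mongo. The function only prints the steps rather than running them.

```shell
#!/bin/sh
# Sketch: print the steps for turning a standalone mongod into a one-member
# replica set so that mongodump --oplog becomes available.
# The set name rs0, the port, and the paths are assumptions.
REPL_SET="rs0"
DB_PATH="/data/db"
PORT="27017"

print_steps() {
    printf '# 1. restart mongod with a replica set name\n'
    printf 'mongod --port %s --dbpath %s --replSet %s\n' "$PORT" "$DB_PATH" "$REPL_SET"
    printf '# 2. initiate the set from the shell (older installs use "mongo")\n'
    printf 'mongosh --port %s --eval "rs.initiate()"\n' "$PORT"
    printf '# 3. full dump including oplog entries written during the dump\n'
    printf 'mongodump --port %s --oplog --out /data/backup\n' "$PORT"
}

print_steps
```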
Be aware that using mongodump and mongorestore to back up and restore MongoDB can be slow.
A file system snapshot is another option. The following link details two snapshot options for performing a hot backup of MongoDB.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-filesystem-snapshots/
You can also look into the MongoDB backup service.
http://www.10gen.com/products/mongodb-backup-service
In addition, mongodump with the --oplog option does not currently work with a single database or collection. There are plans to implement the feature. You can follow the ticket and vote for the feature under the More Actions button.
https://jira.mongodb.org/browse/SERVER-4273