How can I backup a MongoDB GridFS database the easiest way? - mongodb

Like the title says, I have a MongoDB GridFS database with a whole range of file types (e.g., text, pdf, xls), and I want to backup this database the easiest way.
Replication is not an option. Preferably I'd like to do it the usual database way of dumping the database to file and then backup that file (which could be used to restore the entire database 100% later on if needed). Can that be done with mongodump? I also want the backup to be incremental. Will that be a problem with GridFS and mongodump?
Most importantly, is that the best way of doing it? I am not that familiar with MongoDB, will mongodump work as well as mysqldump does with MySQL? Whats the best practice for MongoDB GridFS and incremental backups?
I am running Linux if that makes any difference.

GridFS stores files in two collections: fs.files and fs.chunks.
More information on this may be found in the GridFS Specification document:
http://www.mongodb.org/display/DOCS/GridFS+Specification
Both collections may be backed up using mongodump, the same as any other collection. The documentation on mongodump may be found here:
http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-mongodump
From a terminal, this would look something like the following:
For this demonstration, my db name is "gridFS":
First, mongodump is used to back the fs.files and fs.chunks collections to a folder on my desktop:
$ bin/mongodump --db gridFS --collection fs.chunks --out /Desktop
connected to: 127.0.0.1
DATABASE: gridFS to /Desktop/gridFS
gridFS.fs.chunks to /Desktop/gridFS/fs.chunks.bson
3 objects
$ bin/mongodump --db gridFS --collection fs.files --out /Desktop
connected to: 127.0.0.1
DATABASE: gridFS to /Desktop/gridFS
gridFS.fs.files to /Users/mbastien/Desktop/gridfs/gridFS/fs.files.bson
3 objects
Now, mongorestore is used to pull the backed-up collections into a new (for the purpose of demonstration) database called "gridFScopy"
$ bin/mongorestore --db gridFScopy --collection fs.chunks /Desktop/gridFS/fs.chunks.bson
connected to: 127.0.0.1
Thu Jan 19 12:38:43 /Desktop/gridFS/fs.chunks.bson
Thu Jan 19 12:38:43 going into namespace [gridFScopy.fs.chunks]
3 objects found
$ bin/mongorestore --db gridFScopy --collection fs.files /Desktop/gridFS/fs.files.bson
connected to: 127.0.0.1
Thu Jan 19 12:39:37 /Desktop/gridFS/fs.files.bson
Thu Jan 19 12:39:37 going into namespace [gridFScopy.fs.files]
3 objects found
Now the Mongo shell is started, so that the restore can be verified:
$ bin/mongo
MongoDB shell version: 2.0.2
connecting to: test
> use gridFScopy
switched to db gridFScopy
> show collections
fs.chunks
fs.files
system.indexes
>
The collections fs.chunks and fs.files have been successfully restored to the new DB.
You can write a script to perform mongodump on your fs.files and fs.chunks collections periodically.
As for incremental backups, they are not really supported by MongoDB. A Google search for "mongodb incremental backup" reveals a good mongodb-user Google Groups discussion on the subject:
http://groups.google.com/group/mongodb-user/browse_thread/thread/6b886794a9bf170f
For continuous back-ups, many users use a replica set. (Realizing that in your original question, you stated that this is not an option. This is included for other members of the Community who may be reading this response.) A member of a replica set can be hidden to ensure that it will never become Primary and will never be read from. More information on this may be found in the "Member Options" section of the Replica Set Configuration documentation.
http://www.mongodb.org/display/DOCS/Replica+Set+Configuration#ReplicaSetConfiguration-Memberoptions

Related

Faster way to remove all entries from mongodb collection by dropping collection and recreating schema

When I want to remove all objects from my mongoDB collection comments I do this with this command:
mongo $MONGODB_URI --eval 'db.comments.deleteMany({});'
However, this is super slow when there are millions of records inside the collection.
In a relational db like Postgres I'd simply copy the structure of the collection, create a comments2 collection, drop the comments collection, and rename comments2 to comments.
Is this possible to do in MongoDB as well?
Or are there any other tricks to speed up the progress?
Thanks, the answers inspired my own solution. I forgot that MongoDB doesn't have a schema like a relationalDB.
So what I did is this:
1. dump an empty collection + the indexes of the collection
mongodump --host=127.0.0.1 --port=7001 --db=coral --collection=comments --query='{"id": "doesntexist"}' --out=./dump
This will create a folder ./dump with the contents comments.bson (empty) and comments.metadata.json
2. Drop the comments collection
mongo mongodb://127.0.0.1:7001/coral --eval 'db.comments.drop();'
3. Import new data new_comments.json (different from comments.bson)
mongoimport --uri=mongodb://127.0.0.1:7001/coral --file=new_comments.json --collection comments --numInsertionWorkers 12
This is way faster than first adding the indexes, and then importing.
4. Add indexes back
mongorestore --uri=mongodb://127.0.0.1:7001/coral --dir dump/coral --nsInclude coral.comments --numInsertionWorkersPerCollection 12
Note that --numInsertionWorkers speeds up to process by dividing the work over 12 cpus.
How many cpus do you have can be found on OSx with:
sysctl -n hw.ncpu
db.cities.aggregate([{ $match: {} }, { $out: "collection2" }]) in case you can login to the mongo prompt and simply drop the previous collection.
Otherwise, the approach you have posted is the one.
mongoexport.exe /host: /port: /db:test /collection:collection1 /out:collection1.json
mongoimport.exe /host: /port: /db:test /collection:collection2 /file:collection1.json
Thanks,
Neha
For mongodb version >=4.0 you can do this via db.comments.renameCollection("comments2") ,but it is kind of resource intensive operation and for bigger collections better you do mongodump/mongorestore. So the best action steps are:
mongodump -d x -c comments -out dump.bson
>use x
>db.comments.drop()
mongorestore -d x -c comments2 dump.bson
Plese, note deleteMany({}) is even more resource intensive operation since it will create oplog single entry for every document you delete and propagate to all replicaSet members.

MongoDB migration

Hello I have an ubuntu 14.04 server that is running mongodb 2.4.14. I need to move the mongo instance to a new server. I have installed mongo 3.4.2 on the new server and need to move the databases over. I am pretty new with mongo. I have 2 databases that are pretty big but when I do a mongo dump the file is nowhere near the site of the databases that mongo is showing.I cannot figure out how to get mongoexport to work. What would be the best way to move those databases? If possible can we just export the data from mongo and then import it?
You'll need to give more information on your issue with mongodump and what mongodump parameters you were using.
Since you are doing a migration, you'll want to use mongodump and not mongoexport. mongoexport only outputs a JSON/CSV format of a collection. Because of this, mongoexport cannot retain certain datatypes that exist in BSON and thus MongoDB does not suggest that anyone uses mongoexport for full backups; this consideration is listed on mongo's site.
mongodump will be able to accurately create a backup of your database/collection that mongorestore will be able to restore that dump to your new server.
If you haven't already, check out Back Up and Restore with MongoDB Tools

How to restore Mongo(WT engine) only with collection-0-****.wt file?

My mongodb can't lanuch now, when I want start mongo got error ***aborting after invariant() failure
Now I want to restore collection-0-****.wt file to a new db, is this possible?
As at MongoDB 3.2, only full backups of WiredTiger data directories can be copied into a new instance. WiredTiger collection or index files aren't self-contained; they rely on other metadata in the WiredTiger.* catalog files. The invariant/assertion you are getting on startup is expected if data files are incomplete or inconsistent.
If you want to backup and restore a single collection, you should use mongodump and mongorestore, eg:
mongodump --db test --collection northwind --host host1
mongorestore --db test dump/test/northwind.bson --host host2
For supported full backup procedures, see: MongoDB Backup Methods.
I had the same issue and after spending 5 hours doing everything, found this.
https://medium.com/#imunscarred/repairing-mongodb-when-wiredtiger-wt-file-is-corrupted-9405978751b5
You will need to restore 1 collection at a time(a few at once when you get the hang of it), but it works!

How to dump (backup) several tables (collection) and recover from MongoDB?

I'm new in MongoDB and my english is too bad - so, i can't understand mongodb documentation.
I have to dump (backup) several tables (collection) from MongoDB base - how I can do it?
Which utility I have to use for recover collections from backup?
Thank you for your attention for my question!
Start your mongod server. Assuming that your mongod server is running on localhost and port 27017. Now open a command prompt and go to bin directory of your mongodb instance and type the command:
mongodump
This command will back all data of the server to directory /bin/dump/
To backup only specific collection, use:
mongodump --collection YOUR_COLLECTION_NAME --db YOUR_DB_NAME
On a good-bye note, to restore/recover data use command:
mongorestore

Migrate mongodb collections and indexes excluding data

I have two environments DEV and STAGE and I need to take all of my collections from a DEV database to a database in STAGE.
Currently using mongodump to get all of my collections and indexes, which appears to work very well.
>mongodump --host 192.168.#.# --port 27019 --db MyDB
Then I am using mongorestore to populate STAGE with the appropriate collections and indexes.
>mongorestore --host 192.168.#.# --port 27019 --db test "C:\Program Files\MongoDB 2.6 Standard\bin\dump\MyDB"
My collections and indexes come across perfectly, however, my collection content comes as well. I have not found a way to exclude my actual data inside my collections...is this doable? Can I remove files from the output of mongodump to only have my collections and their 'indexes'.
MongoDB is schemaless so basically there is no point of restoring empty collections. Collections are created on the fly when you do a write.
I understand you just want to restore collection's metadata as indexes etc., I don't know what driver are you using but I would suggest you deal with this problem on the application level by writing a routine that creates the indexes etc.
Also removing the bson files and keeping only the metadata files from mongodump as you suggest will work (or at least it works in my case with mongo V 3.0.5 with wiredtiger engine) but is not documented as far as I know.
An other alternative could be to use -query option on mongodump to specify documents to include i.e: {_id:a_non_existing_id} but this option is applicable only to collection level dumps.