Can I restore data from mongo oplog? - mongodb

My MongoDB deployment was hacked today: all data was deleted, and the attacker demands payment to return it. I will not pay him, because I know he will not send my database back.
However, I had the oplog turned on, and I can see it contains over 300,000 documents recording all operations.
Is there any tool that can restore my data from these logs?

Depending on how far back your oplog is, you may be able to restore the deployment. I would recommend taking a backup of the current state of your dbpath just in case.
Note that there are many variables in play for doing a restore like this, so success is never a guarantee. It can be done using mongodump and mongorestore, but only if your oplog goes back to the beginning of time (i.e. when the deployment was first created). If it does, you may be able to restore your data. If it does not, you'll see errors during the process.
Step 1: Secure your deployment before doing anything else. This situation arises from a lack of security; MongoDB provides extensive security features. Check out the Security Checklist page for details.
Step 2: Dump the oplog collection using mongodump --host <old_host> --username <user> --password <pwd> -d local -c oplog.rs -o oplogDump.
Step 3: Check the content of the oplog to determine the timestamp at which the offending drop operation occurred, using bsondump oplogDump/local/oplog.rs.bson. You're looking for a line that looks approximately like this:
{"ts":{"$timestamp":{"t":1502172266,"i":1}},"t":{"$numberLong":"1"},"h":{"$numberLong":"7041819298365940282"},"v":2,"op":"c","ns":"test.$cmd","o":{"dropDatabase":1}}
This line means that a dropDatabase() command was executed on the test database. Keep note of the t value in {"$timestamp":{"t":1502172266,"i":1}}.
Step 4: Restore to a secure new deployment using mongorestore --host <new_host> --username <user> --password <pwd> --oplogReplay --oplogLimit=1502172266 --oplogFile=oplogDump/local/oplog.rs.bson oplogDump
Note the parameter to --oplogLimit, which tells mongorestore to stop replaying the oplog once it hits that timestamp (the timestamp of the dropDatabase command found in Step 3).
The --oplogFile parameter is new in MongoDB 3.4. For older versions, copy oplogDump/local/oplog.rs.bson into the root of the dump directory as a file named oplog.bson (i.e. oplogDump/oplog.bson) and drop the --oplogFile parameter from the command above.
After Step 4, if your oplog goes back to the beginning of time and you stop the replay at the right timestamp, you should see your data as it was just before the dropDatabase command was executed.
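As a concrete sketch of the timestamp hunt: bsondump emits one JSON document per line, so standard text tools can extract the t value to pass to --oplogLimit. The sample line below is the dropDatabase entry shown earlier; the grep pattern and sed expression are assumptions based on that output format.

```shell
# bsondump output is one JSON document per line, so grep/sed are enough
# to pull out the $timestamp "t" of the dropDatabase entry.
# Sample line, as produced by: bsondump oplogDump/local/oplog.rs.bson
line='{"ts":{"$timestamp":{"t":1502172266,"i":1}},"op":"c","ns":"test.$cmd","o":{"dropDatabase":1}}'

drop_ts=$(printf '%s\n' "$line" \
  | grep '"dropDatabase"' \
  | sed -n 's/.*"ts":{"\$timestamp":{"t":\([0-9]*\).*/\1/p')

echo "$drop_ts"   # 1502172266 -> pass as --oplogLimit=1502172266
```

In a real recovery you would run the pipeline over the whole bsondump output file rather than a single sample line.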

Related

How does restoring a db backup affect the oplog?

I have a standalone MongoDB instance with many databases in it. I am only concerned with backing up and restoring one of those databases; let's call it DbOne.
Using the instructions in (http://www.mongodb.com/blog/post/dont-let-your-standalone-mongodb-server-stand-alone), I can create an oplog on this standalone server.
Using the tool Tayra, I can record/store the oplog entries. Being able to create incremental backups is the main reason I enabled the oplog on my standalone instance.
I intend to take full backups once a day, using the command
mongodump --db DbOne --oplog
From my understanding, this backup will contain a point-in-time snapshot of my db.
Assuming I want to discard all updates since this backup, I delete all backed-up oplog entries and restore only this full backup, using the command
mongorestore --drop --db DbOne --oplogReplay
At this point, do I need to do something to the oplog collection in the local db? Will MongoDB automatically drop the entries pertaining to this db from the oplog? If not, wouldn't Tayra end up finding those oplog entries and backing them up all over again?
To be honest, I haven't tried this yet on my machine. I am hoping someone can point me to a document that lists the supported/expected behaviour in this scenario.
I experimented with a MongoDB server set up as a replica set with only one member shortly after asking the question, but forgot to answer it.
I took a backup using mongodump --db DbOne --oplog, then executed some additional updates. Keeping the MongoDB server as is, i.e. still running under replication, running the mongorestore command created thousands of oplog entries, one for each document of each collection in the db. It was a big mess.
The alternative was to shut down MongoDB and start it as a standalone instance (i.e. not running as a replica set). Restoring with mongorestore then left the oplog untouched. This was also bad, because the oplog now contained entries that were not present in the actual collections.
I wanted a mechanism that would restore both my database and the oplog info in the local database to the time the backup took place, but mongodump doesn't back up the local database.
Eventually I had to stop using mongodump and instead switched to backing up the entire data directory (after stopping MongoDB). Once we switch to AWS, I could use the EBS snapshot feature for the same purpose.
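The data-directory approach described above can be sketched as a cold backup. In this sketch a scratch directory stands in for the real dbpath (a placeholder assumption); substitute the actual path in production, and note that the method requires mongod to be fully stopped before copying.

```shell
# Cold backup of a stopped mongod: archive the entire data directory.
# A scratch directory stands in for the real dbpath in this sketch;
# substitute the actual path (e.g. /var/lib/mongodb) in production.
dbpath=$(mktemp -d)
echo "fake collection data" > "$dbpath/collection-0.wt"

# mongod must NOT be running while the files are copied, otherwise the
# archive can capture an inconsistent on-disk state.
backup="$dbpath.tar.gz"
tar -czf "$backup" -C "$(dirname "$dbpath")" "$(basename "$dbpath")"

tar -tzf "$backup"   # list archive contents to verify the backup
```

This captures everything, including the local database and therefore the oplog, which is exactly what mongodump could not do here.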
I understand you want a link to the docs about mongorestore:
http://docs.mongodb.org/manual/reference/program/mongorestore/
From what I understand you want to make a point in time backup and then restore that backup. The commands you listed above will do that:
1) mongodump --db DbOne --oplog
2) mongorestore --drop --db DbOne --oplogReplay
However, please note that the "point in time" the backup is effectively taken at is when the dump ends, not the moment the command started. This is a fine detail that might not matter to you, but is included for completeness.
Let me know if there is anything else.
Best,
Charlie

Mongodump with --oplog for hot backup

I'm looking for the right way to do a MongoDB backup on a replica set (non-sharded).
From reading the MongoDB documentation, I understand that "mongodump --oplog" should be enough, even on a replica (slave) server.
From the MongoDB / mongodump documentation:
--oplog
Use this option to ensure that mongodump creates a dump of the database that includes an oplog, to create a point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup.
I'm still having a very hard time understanding how MongoDB can keep writing to the database during the backup and still produce a consistent backup, even with --oplog.
Should I lock my collections first, or is it safe to run "mongodump --oplog"?
Is there anything else I should know about?
Thanks.
The following document explains how mongodump with the --oplog option works to create a point-in-time backup.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-binary-database-dumps/
However, using mongodump and mongorestore to back up and restore MongoDB can be slow. If a file system snapshot is an option, you may want to consider using snapshots for MongoDB backup. The following link details two snapshot options for performing a hot backup of MongoDB.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-filesystem-snapshots/
You can also look into MongoDB backup service.
http://www.10gen.com/products/mongodb-backup-service

How to get a consistent MongoDB backup for a single node setup

I'm using MongoDB in a pretty simple setup and need a consistent backup strategy. I found out the hard way that wrapping a mongodump in a lock/unlock is a bad idea. Then I read that the --oplog option should be able to provide consistency without lock/unlock. However, when I tried that, it said that I could only use the --oplog option on a "full dump." I've poked around the docs and lots of articles but it still seems unclear on how to dump a mongo database from a single point in time.
For now I'm just going with a normal dump, but I'm assuming that if there are writes during the dump, the backup would not be from a single point in time, correct?
mongodump -h $MONGO_HOST:$MONGO_PORT -d $MONGO_DATABASE -o ./${EXPORT_FILE} -u backup -p password --authenticationDatabase admin
In production environments, MongoDB is typically deployed as a replica set to ensure redundancy and high availability. There are a few options available for point-in-time backup if you are running a standalone mongod instance.
One option, as you have mentioned, is to do a mongodump with the --oplog option. However, this option is only available if you are running a replica set. You can convert a standalone mongod instance to a single-node replica set easily, without adding any new replica set members. Please check the following document for details.
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
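Per the linked procedure, the conversion amounts to starting mongod with a replication section and initiating the set once. A minimal sketch follows; the set name rs0 and the oplog size are arbitrary assumptions, not values from the linked tutorial.

```yaml
# mongod.conf fragment: enables the oplog on a single-node deployment
replication:
  replSetName: rs0
  oplogSizeMB: 1024   # pick a size large enough to cover your backup interval
```

After restarting mongod with this configuration, run rs.initiate() once in the mongo shell; no additional members are required.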
This way, if there are writes while mongodump is running, they will be part of your backup. Please see Point in Time Operation Using Oplogs section from the following link.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-binary-database-dumps/#point-in-time-operation-using-oplogs
Be aware that using mongodump and mongorestore to back up and restore MongoDB can be slow.
A file system snapshot is another option. The following link details two snapshot options for performing a hot backup of MongoDB.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-filesystem-snapshots/
You can also look into MongoDB backup service.
http://www.10gen.com/products/mongodb-backup-service
In addition, mongodump with the --oplog option does not currently work with a single db/collection. There are plans to implement this feature; you can follow the ticket and vote for it under the More Actions button.
https://jira.mongodb.org/browse/SERVER-4273

Initial sync of mongodb and Elasticsearch

What is an easy way to do the initial sync between MongoDB and Elasticsearch? I use https://github.com/richardwilly98/elasticsearch-river-mongodb to sync any updates. The river works by tracking the changes in the MongoDB replica set logs and applying them to ES, but how can I sync what is already in MongoDB to Elasticsearch?
A proposed solution I have seen is to dump (mongodump) the data and restore it (mongorestore), but I am not sure of its impact on a live MongoDB database.
That is actually the solution.
mongodump -u root -p 'yourpassword' --oplog
The --oplog option will also copy the transaction log, which I think is required for your script to work.
After that, you run mongorestore on the other side:
mongorestore --oplogReplay
Another solution is to use the "OplogReplay" script instead of the one you are using.
This script does the initial sync from source to destination automatically when you run it for the first time.
https://pypi.python.org/pypi/OplogReplay/
I recommend downloading the latest code directly from GitHub:
https://github.com/uberVU/mongo-oplogreplay

Modify and replay MongoDB oplog

Is it possible to modify the MongoDB oplog and replay it?
A bug caused an update to be applied to more documents than it was supposed to be, overwriting some data. Data was recovered from backup and reintegrated, so nothing was actually lost, but I was wondering if there was a way to modify the oplog to remove or modify the offending update and replay it.
I don't have in-depth knowledge of MongoDB internals, so informative answers along the lines of "you don't understand how it works; it's like this" will also be considered for acceptance.
One of the big issues with data corruption caused by application bugs or human error is that the offending write to the primary will immediately be replicated to the secondaries.
This is one of the reasons that users take advantage of "slaveDelay" - an option to run one of your secondary nodes with a fixed time delay (of course that only helps you if you discover the error or bug during the time period that's shorter than the delay on that secondary).
In case you don't have such a set-up, you have to rely on a backup to recreate the state of the records you need to restore to their pre-bug state.
Perform all the operations on a separate stand-alone copy of your data - only after verifying that everything was properly recreated should you then move the corrected data into your production system.
What is required to do this is a recent copy of the backup (say the backup is X hours old), and the oplog on your cluster must hold more than X hours' worth of data. I didn't specify which node's oplog because (a) every member of the replica set has the same contents in its oplog, and (b) your oplog size may differ between members, in which case you want to check the largest one.
So let's say your most recent backup is 52 hours old, but luckily you have an oplog that holds 75 hours worth of data (yay).
You have already realized that all of your nodes (primary and secondaries) have the "bad" data, so what you will do is restore this most recent backup into a new mongod. There you will restore these records to what they were right before the offending update; then you can move them into the current primary, from where they will be replicated to all the secondaries.
While restoring your backup, create a mongodump of your oplog collection via this command:
mongodump -d local -c oplog.rs -o oplogD
Move the oplog to its own directory, renaming it to oplog.bson:
mkdir oplogR
mv oplogD/local/oplog.rs.bson oplogR/oplog.bson
Now you need to find the "offending" operation. You can dump the oplog to human-readable form using the bsondump command on the oplogR/oplog.bson file (and then use grep or what-not to find the "bad" update). Alternatively, you can query the original oplog in the replica set via the use local and db.oplog.rs.find() commands in the shell.
Your goal is to find this entry and note its ts field.
It might look like this:
"ts" : Timestamp( 1361497305, 2789 )
Note that the mongorestore command has two relevant options, one called --oplogReplay and the other called --oplogLimit. You will now replay this oplog on the restored stand-alone server, BUT you will stop before the offending update operation.
The command would be (host and port are where your newly restored backup is):
mongorestore -h host --port NNNN --oplogReplay --oplogLimit 1361497305:2789 oplogR
This will restore each operation from the oplog.bson file in oplogR directory stopping right before the entry with ts value Timestamp(1361497305, 2789).
Recall that the reason you are doing this on a separate instance is so you can verify that the restore and replay produced correct data; once you have verified it, you can write the restored records to the appropriate place in the real primary (and let replication propagate the corrected records to the secondaries).
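As a small sketch of the ts bookkeeping: the shell-format Timestamp(t, i) maps mechanically to the seconds:increment form that --oplogLimit expects. The input string below is the example entry from above; the sed expression is an assumption about its exact formatting.

```shell
# Convert the mongo-shell form of the ts field into the seconds:increment
# form that mongorestore's --oplogLimit option expects.
ts='"ts" : Timestamp( 1361497305, 2789 )'

limit=$(printf '%s\n' "$ts" \
  | sed -n 's/.*Timestamp( *\([0-9]*\), *\([0-9]*\) ).*/\1:\2/p')

echo "$limit"   # 1361497305:2789
# -> mongorestore -h host --port NNNN --oplogReplay --oplogLimit "$limit" oplogR
```

This is just string bookkeeping, but getting it wrong replays one operation too many, so it is worth scripting rather than retyping by hand.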