Is it possible to delete data from a single Mongo secondary instance, by running delete command directly on a secondary, without affecting the primary and other secondary instances?
Explanation: I want to purge a large collection of ~500 GB, holding ~500 million records. I want to keep the last few months of data, so I will have to remove ~400 million records. It is a replica set with one primary and 2 secondaries; the storage engine is WiredTiger. I do not want any downtime or slowness, as it is the production DB of a live transactional system. I am considering the options below:
Create a new collection, copy the last few months of records into it, and drop the old one. But copying such a huge amount of data will slow down the DB server.
Take a backup of the entire collection, then run a bulk delete with a batch size of 1000. This will take weeks to delete so many records, and it will also create a huge oplog, as every delete produces an oplog entry that is synced to the secondaries. These oplog entries will take up huge disk space.
Another option is to run the bulk delete on one secondary only. Once the data is deleted, I promote it to primary, then run the same delete on the other 2 secondary instances. This would not affect the prod environment. Hence the question: can we run a delete on a secondary only? Once this secondary rejoins the cluster after the deletion, what will be the behaviour of the sync process between primary and secondary?
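For reference, the batched delete in option 2 could be sketched like this in the shell; the collection name `events`, the `createdAt` field, and the host are assumptions, not part of the question:

```shell
# Sketch only: batched delete on the primary, assuming a collection
# "events" with an indexed "createdAt" field (names are made up).
mongosh "mongodb://primary.example:27017/mydb" --eval '
  const cutoff = new Date(Date.now() - 90 * 24 * 3600 * 1000); // keep ~3 months
  let deleted;
  do {
    // Delete in small batches so oplog growth and replication lag stay bounded.
    const ids = db.events.find({ createdAt: { $lt: cutoff } }, { _id: 1 })
                         .limit(1000).toArray().map(d => d._id);
    deleted = db.events.deleteMany({ _id: { $in: ids } }).deletedCount;
    sleep(100); // brief pause between batches to reduce load on the primary
  } while (deleted > 0);
'
```

The pause between batches is what keeps the secondaries from falling behind; tune the batch size and sleep to your hardware.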
I ran a small test on a local MongoDB cluster. In principle it seems to work when you follow this procedure:
Shut down the secondary
Restart the secondary as a standalone (you cannot perform any changes while it is a SECONDARY)
Connect to the standalone and delete the old data
Shut down the standalone
Restart the standalone normally as a replica set member
Repeat steps (1) to (5) with the other secondary. You may run the above steps in parallel on all secondaries, but then you have no redundancy in case of problems.
Promote one of the secondaries from above to primary
Repeat steps (1) to (5) with the last node
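Steps (1) to (5) could look roughly like this; the dbpath, ports, replica set name, and the database/collection in the delete are all assumptions:

```shell
# Sketch, assuming dbpath /data/db, normal port 27017, replica set "rs0".

# 1. Shut down the secondary cleanly.
mongosh --port 27017 --eval 'db.getSiblingDB("admin").shutdownServer()'

# 2. Restart it as a standalone on a different port, without --replSet,
#    so clients and the rest of the set cannot reach it.
mongod --dbpath /data/db --port 27218 --fork --logpath /data/db/standalone.log

# 3. Connect to the standalone and delete the old data
#    (database and collection names here are made up).
mongosh --port 27218 --eval '
  db.getSiblingDB("mydb").events.deleteMany(
    { createdAt: { $lt: new Date("2018-01-01") } });
'

# 4. Shut the standalone down again.
mongosh --port 27218 --eval 'db.getSiblingDB("admin").shutdownServer()'

# 5. Restart it normally as a replica set member.
mongod --dbpath /data/db --port 27017 --replSet rs0 --fork --logpath /data/db/mongod.log
```

Running the standalone on a non-standard port is a common precaution so nothing writes to it while it is out of the set.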
As I said, I did a "quick and dirty" test with a few documents and it seems to work.
However, I don't think it will work in your setup because:
Step (3) "delete old data" will take some time, maybe hours or even days. When you have finished the deletion, you will most likely run into this situation:
Resync a Member of a Replica Set:
A replica set member becomes "stale" when its replication process falls so far behind that the primary overwrites oplog entries the member has not yet replicated. The member cannot catch up and becomes "stale." When this occurs, you must completely resynchronize the member by removing its data and performing an initial sync.
I.e. the initial sync will copy all the deleted data back again.
Perhaps there are hacks to force a "stale" member back to "SECONDARY". You would then have to drop the old PRIMARY and add it again as a SECONDARY. But by doing this you would lose all data newly inserted in production while step (3) was running. I assume the application is constantly inserting new data (otherwise you would not get such an amount of documents); that data would be lost.
Related
Consider these circumstances when a client writes to a server in replica set mode:
Successful write & acknowledgement
Unsuccessful write and an error.
If (1) happens but the primary goes down right afterwards, before replicating the data to the secondary nodes, there will be trouble. When the node rejoins, it will roll back: although the client got an acknowledgement, the data is discarded.
Question
Why does it roll back instead of sending the data to the remaining nodes when the old primary comes back in? Does this happen because of an election? And what if the result of the election is the same node?
Conjecture: the server goes down, triggering an election, and a different server takes its place. When the old primary catches up with the new primary, the acknowledged write is not in the new primary's oplog, and I guess they continue with different oplogs?
I know we can change this behaviour using a majority write concern, but I would like to understand why this rollback happens.
Any ideas?
MongoDB implements single-master replication, which mandates that only one server is the authoritative source of replication at any time. If it were to replicate the data that is rolled back, it would have to merge it into the new primary and this is complicated and error-prone as the data could have been changed multiple times while the old primary was down.
When a primary goes down and later rejoins the cluster, it reconciles its own copy of the oplog with the one that is in the server that is currently the primary. Since other write operations could have happened in the meantime on the new primary, the new authoritative source of replication is the oplog of the new primary. So, the old primary has to purge its oplog of any operations that are not present in the oplog of the new primary and these are rolled back.
If no primary was available in the cluster when the server rejoins, election takes care of selecting the server with the newest copy of the data (based on the timestamp of the last operation in the oplog). This becomes the new primary and all other servers will sync their oplog to this. So, if the old primary becomes primary again and no newer writes happened in the cluster, then it will not rollback.
Rolled-back data is not lost but set aside in files so that it can be examined and eventually recovered by DBAs if needed. However, you should consider the nature of the data you are storing and, if it is crucial that rollbacks never happen, use the appropriate write concern to ensure additional copies are made so the data is never lost.
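As an illustration of that last point, a single write with a majority write concern could look like this; the database and collection names are assumptions:

```shell
mongosh --eval '
  // With w: "majority" the write is acknowledged only after a majority of
  // replica set members have replicated it, so it can no longer be rolled
  // back by a failover. wtimeout bounds how long the client will wait.
  db.getSiblingDB("mydb").orders.insertOne(
    { item: "abc", qty: 1 },
    { writeConcern: { w: "majority", wtimeout: 5000 } }
  );
'
```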
I am dealing with MongoDB's rollback procedures. The problem is that the data to roll back may be 300 MB or more.
Is there any solution for this problem? Error log is
replSet syncThread: replSet too much data to roll back
In official MongoDB document, I could not see a solution.
Thanks for the answers.
The cause
The page Rollbacks During Replica Set Failover states:
A rollback is necessary only if the primary had accepted write operations that the secondaries had not successfully replicated before the primary stepped down. When the primary rejoins the set as a secondary, it reverts, or “rolls back,” its write operations to maintain database consistency with the other members.
and:
When a rollback does occur, it is often the result of a network partition.
In other words, a rollback scenario typically occurs like this:
You have a 3-node replica set set up as primary-secondary-secondary.
There is a network partition, separating the current primary from the secondaries.
The two secondaries cannot see the former primary and elect one of them to be the new primary. Applications that are replica-set aware are now writing to the new primary.
However, some writes keep coming into the old primary before it realizes that it cannot see the rest of the set and steps down.
The data written to the old primary in step 4 above is the data that gets rolled back, since for a period of time it was acting as a "false" primary (i.e., the "real" primary is the secondary elected in step 3).
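One way to gauge how much data a partition like this could put at risk is to watch how far the secondaries lag behind the primary:

```shell
# Show each secondary's replication lag relative to the primary.
# (Older shells call this rs.printSlaveReplicationInfo().)
mongosh --eval 'rs.printSecondaryReplicationInfo()'

# Or inspect member states and optimes directly from rs.status().
mongosh --eval '
  rs.status().members.forEach(m =>
    print(m.name, m.stateStr, m.optimeDate));
'
```

A consistently large lag means any failover will roll back correspondingly more writes.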
The fix
MongoDB will not perform a rollback if there is more than 300 MB of data to roll back. The message you are seeing (replSet too much data to roll back) means that you are hitting this limit; you would either have to manually recover using the rollback data saved under the node's dbpath, or perform an initial sync.
Preventing rollbacks
Configuring your application using w: majority (see Write Concern and Write Concern for Replica Sets) would prevent rollbacks. See Avoid Replica Set Rollbacks for more details.
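Rather than setting the write concern on every operation, newer servers (4.4+, so not the version in this thread) let you set a cluster-wide default; this is a sketch:

```shell
mongosh --eval '
  // Make w: "majority" the default write concern for the whole deployment,
  // so all writes wait for a majority of members before acknowledging.
  db.adminCommand({
    setDefaultRWConcern: 1,
    defaultWriteConcern: { w: "majority", wtimeout: 5000 }
  });
'
```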
My mongodb version is 3.2.4.
I have a replica set with 2 database nodes and 1 arbiter.
All DBs had been running fine for a long time at my customer's site. One day, the primary was brought down for maintenance. After about 2 hours, the former primary was brought back up and became primary again, and the secondary is now in ROLLBACK state.
I have a few questions regarding above mentioned scenarios:
When the primary was brought down the first time, if there were entries that hadn't synced to the secondary, what would happen?
When the failed primary is brought back up, does it become primary right away? Does it sync with the now-primary (formerly secondary) before becoming primary again?
How do I recover the lost data in the rollback folder, given my latest primary and secondary states?
Thanks and regards.
When you want to bring down a primary for maintenance, you should first run the rs.stepDown() command on the primary. This will cause the other DB node to be elected primary:
The primary steps down and rejects writes. Your application will get brief write errors until the next bullet point is completed.
The secondary that gets elected will make sure it has synced up with the old primary before becoming primary itself. This should happen in a split second, but if you have a write-heavy application it can take longer.
When the old primary is brought back up, it will become primary again only if you give it the highest priority. I would still recommend having equal priority for the 2 data nodes and not failing back to the original primary. The process of promoting the other node is exactly the same as in the 2 bullet points above.
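A maintenance step-down could be sketched as follows; the timeouts are illustrative:

```shell
mongosh --eval '
  // Step down for 120 seconds; wait up to 15 seconds for an electable
  // secondary to catch up before forcing the step-down.
  rs.stepDown(120, 15);
'
# After maintenance, restart the node and let it rejoin as a secondary;
# with equal member priorities it will not force itself back to primary.
```

Letting a secondary catch up first is what avoids the rollback described in this thread.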
The rollback state you're in doesn't seem normal if you do a proper stepDown. Here's a good link to apply your rollback:
https://scalegrid.io/blog/how-to-recover-from-a-mongodb-rollback/
Rick wrote a good answer, but did not answer your last question... If a rollback happened, there will be a rollback directory under your dbpath. In that directory you can find all rolled-back documents in per-collection BSON files. Those files can be inspected with bsondump and loaded back into the primary with mongorestore.
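Recovering those rollback files might look like this; the dbpath, host, and the database/collection in the filename are hypothetical:

```shell
# Rollback files live under <dbpath>/rollback and are BSON, named like
# <db>.<collection>.<timestamp>.<n>.bson (everything below is made up).
# First, review what was rolled back:
bsondump /data/db/rollback/mydb.orders.2016-05-10T12-30-00.0.bson | head

# Then re-insert the documents into the current primary:
mongorestore --host primary.example --port 27017 \
  --db mydb --collection orders \
  /data/db/rollback/mydb.orders.2016-05-10T12-30-00.0.bson
```

Review the dump first: some rolled-back documents may conflict with writes that happened on the new primary in the meantime.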
How do I rescue a sharded MongoDB cluster when one shard is permanently damaged?
I have a MongoDB cluster with 48 shards. Each shard is a replica set with a primary and one secondary. Due to Bad Planning (tm), one of the boxes ran out of disk space and died. The other one, already close to full, then ran out of space too. Due to bad circumstances (probably a compact() or repairDatabase() going on at the time), the entire shard was corrupted.
I stopped the daemons and tried to repair, but it would not succeed.
So, the question is: how do I accept the loss of one shard but keep the other good shards? With 48 shards, the loss of one is only ~2% of my data. I'm okay with losing that data, but I have to get back to a normal healthy state.
What do I do?
ANSWER OBSOLETE, REDOING ANSWER:
Stop all daemons on all boxes.
Change config files for the primaries to make them come up as standalone instances.
Use mongoexport or mongodump to dump that shard's data into a file. Ensure that the file contains the collections you want. Try to get it so it doesn't include the _id field.
When you have backups completed and moved off the boxes to appropriately safe locations, clean up: delete all data files, etc., and essentially re-create your cluster.
Re-load your data from your backups.
Note that when you re-create the cluster, you should probably pre-split it into a fairly large number of chunks so the chunk-splitting process doesn't take forever.
If you end up with unbalanced shards (lots of chunks in one, few in another), pause loading, turn off the balancer's throttle so it goes Real Fast, and once the cluster is balanced again, resume reloading.
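The dump-and-reload part of the procedure above might be sketched like this; hosts, ports, paths, and the database name are assumptions:

```shell
# On each surviving shard primary, restarted as a standalone on another port:
mongod --dbpath /data/shard7 --port 27218 --fork --logpath /tmp/standalone.log

# Dump that shard's data to files (one BSON file per collection).
mongodump --port 27218 --db mydb --out /backup/shard7

# ... after the cluster has been re-created and pre-split,
# reload the data through mongos so it is distributed across shards:
mongorestore --host mongos.example --port 27017 /backup/shard7
```

Restoring through mongos (rather than into a shard directly) is what lets the balancer place the reloaded chunks across the healthy shards.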
I'm new to MongoDB, and I have one question:
I'm setting up a MongoDB test environment with one mongos, 3 config servers, and 2 shards (3 servers as a replica set per shard).
Let's say that, for some reason, there is a big replication lag (the secondary is being backed up, there is a network issue, or something else is happening).
During this time, the primary server goes down.
What will happen? Auto-failover will elect one secondary as the new primary, but what about the data that hasn't replicated yet?
Are we going to lose data?
If so, how can we get the data back, and what needs to be done to avoid such an issue?
Thanks a lot.
During this time, the primary server is down. What will happen?
No writes will be allowed
Auto-failover selects one secondary as the new primary, but what about the data that hasn't replicated yet?
If data has not replicated from the primary to the secondary which then becomes primary, a rollback will occur when the old primary comes back into the set as a secondary: http://docs.mongodb.org/manual/core/replica-set-rollbacks/
Of course, whether you lose data or not depends on whether the write went to the journal and/or data files, and whether the member cleanly left the set or crashed. If the member crashed before the write could go to the journal, then there is a chance the write could be lost, yes.
How can we get the data back, and what needs to be done to avoid such an issue?
You can use w=majority for most cases, but this still has certain pitfalls in edge cases that just cannot be taken care of, for example if the primary fails over before the write can be propagated to other members.