Multi-threaded inserts into MongoDB caused stale secondary data - mongodb

In my test environment:
node1: shard1 primary, shard2 primary
node2: shard1 secondary, shard2 secondary
node3: shard1 arbiter, shard2 arbiter
I wrote a multi-threaded program to write concurrently to the MongoDB replica-set shards. After about 1 hour (the primary had 6 GB of data),
I found that the secondary's status was RECOVERING.
I checked the secondary log, which said: stale data from primary oplog.
So is the reason that my write requests were too frequent, so the secondary could not replicate in time?
Or is there another cause?
I'm puzzled...
Thanks in advance.

This situation can happen if the size of the oplog is not sufficient to keep a record of all the operations occurring on the primary, or if the secondary simply cannot keep up with the primary. What happens in that case is that the position in the oplog that the secondary has replicated up to gets overwritten by new inserts from the primary. At that point the secondary reports its status as RECOVERING and you will see an RS102 message in the log, indicating that it is too stale to catch up.
To fix the issue you would need to follow the steps outlined in the documentation.
To prevent the problem from happening in the future, you would need to tune the size of the oplog and make sure that the secondaries have hardware equivalent to the primary.
To help tune the oplog you can look at the output of db.printReplicationInfo(), which will tell you how much time your oplog covers. The documentation outlines how to resize the oplog if it is too small.
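For example, a quick check from the mongo shell looks something like this (the output values are only illustrative):

    // On the primary: configured oplog size and the time window it covers
    db.printReplicationInfo()
    // configured oplog size:   1024MB          <- illustrative output
    // log length start to end: 3820secs (1.06hrs)

    // Replication lag of each secondary
    // (rs.printSecondaryReplicationInfo() in newer shells)
    rs.printSlaveReplicationInfo()

If the "log length start to end" window is shorter than the longest lag or outage you expect a secondary to experience, the oplog is too small.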

Related

MongoDB autoresync stale data

The MongoDB configuration has a parameter called "autoresync".
This is what the documentation says about it:
Automatically resync if slave data is stale
autoresync
So if we enable this parameter, when one of the secondaries goes into the RECOVERING state, can it automatically heal non-primary members that have stale data and are unable to replicate? Sometimes we see that the data is too stale. So if we enable this parameter, can it automatically heal the member and bring it back to a good state?
That parameter is legacy and has not been supported for a long time. It dates from when MongoDB still had the master-slave replication paradigm.
With current versions of MongoDB, secondaries always recover (auto heal), IF the primary has an oplog big enough to cover the "missing" data.
So, if your secondary cannot replicate/recover, check that your PRIMARY node's oplog is big enough. You can resize the oplog without a restart.
To see how much time your oplog can cover, use db.getReplicationInfo().timeDiff.
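A minimal shell sketch of both checks, assuming MongoDB 3.6+ with WiredTiger for the online resize (the 16384 MB size is only an example value):

    // On the PRIMARY: how many hours of operations the oplog currently covers
    var info = db.getReplicationInfo();
    print("oplog window (hours): " + info.timeDiffHours);

    // MongoDB 3.6+ (WiredTiger): resize this member's oplog without a restart.
    // The size is given in megabytes.
    db.adminCommand({ replSetResizeOplog: 1, size: 16384 })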

MongoDB Replica Disk Writes

Our secondary instances are reporting a much higher disk write rate than the primary. Is this expected behavior in a replica set? Given that the oplog gets copied and replayed from the primary periodically, what's contributing to the additional writes on the secondaries?
Version: 3.4.1
StorageEngine: WiredTiger
Primary: i3.8xlarge
Secondaries: i3en.3xlarge
Update:
1. The issue disappeared after a couple of days.
2. The secondaries are now showing disk write rates comparable to the primary.
We are assuming a one-off issue caused this behavior. In the absence of enough historical data, we chose to pause the investigation, given that the problem has disappeared.
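If it recurs, one way to compare write activity between the primary and a secondary is to sample serverStatus() on each member over an interval; the exact metric paths below are assumptions and may vary by version:

    // Run on each member and compare successive samples
    var s = db.serverStatus();
    // Bytes the WiredTiger block manager has written to disk
    print("bytes written: " + s.wiredTiger["block-manager"]["bytes written"]);
    // Operations this member has applied from the oplog (secondaries only)
    printjson(s.opcountersRepl);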

Can MongoDB manage the rollback procedure for more than 300 MB of streaming data?

I am dealing with MongoDB's rollback procedure. The problem is that a rollback for a large amount of data may be 300 MB or more.
Is there any solution for this problem? The error log is:
replSet syncThread: replSet too much data to roll back
I could not find a solution in the official MongoDB documentation.
Thanks for the answers.
The cause
The page Rollbacks During Replica Set Failover states:
A rollback is necessary only if the primary had accepted write operations that the secondaries had not successfully replicated before the primary stepped down. When the primary rejoins the set as a secondary, it reverts, or “rolls back,” its write operations to maintain database consistency with the other members.
and:
When a rollback does occur, it is often the result of a network partition.
In other words, a rollback scenario typically occurs like this:
1. You have a three-node replica set of primary-secondary-secondary.
2. A network partition separates the current primary from the secondaries.
3. The two secondaries cannot see the former primary, so they elect one of them to be the new primary. Applications that are replica-set aware are now writing to the new primary.
4. However, some writes keep coming into the old primary before it realizes that it cannot see the rest of the set and steps down.
5. The data written to the old primary in step 4 is the data that gets rolled back, since for a period of time it was acting as a "false" primary (i.e., the "real" primary is supposed to be the secondary elected in step 3).
The fix
MongoDB will not perform a rollback if there are more than 300 MB of data to be rolled back. The message you are seeing (replSet too much data to roll back) means that you are hitting this limit, and you would either have to salvage the data manually from the rollback directory under the node's dbpath, or perform an initial sync of the member.
Preventing rollbacks
Configuring your application to use w: majority (see Write Concern and Write Concern for Replica Sets) would prevent your acknowledged writes from being rolled back. See Avoid Replica Set Rollbacks for more details.
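As an illustration only, a majority-acknowledged insert from the mongo shell (3.2+) could look like this; the collection name and wtimeout value are placeholders:

    // Acknowledged only after a majority of data-bearing members have the write,
    // so a single failover cannot roll it back.
    db.orders.insertOne(
      { item: "abc", qty: 1 },
      { writeConcern: { w: "majority", wtimeout: 5000 } }
    )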

Can rollback still occur on a MongoDB replica set with J=1 and W=Majority?

I have been reading the docs and from my understanding I could see a scenario whereby a rollback could still occur:
1. A write goes to the primary, which confirms that the journal has been written to disk.
2. A majority of the secondaries confirm the write but do not write it to disk.
3. Power fails on the entire cluster.
4. The primary for some reason does not start back up when power is restored.
5. A secondary takes over the primary role.
6. The original primary finally starts, rejoins the set as a secondary, and rolls back.
Is this scenario plausible?
Yes, this could be a plausible case for a rollback, if the power fails between the other members receiving the write and writing it to disk.
In this case, as you point out, the primary could not start back up and so, once it is back up, it would contain operations that the rest of the set could not validate, causing a rollback.
It is also worth noting, as a curveball, that if the primary were not to go down, it would return a successful write and the application would be none the wiser that the rest of the set had gone down and its {w: majority} write was never written to disk. This is, of course, an edge case.
I don't think this will happen in MongoDB 3.2+; as documented here, you can see:
Changed in version 3.2: With j: true, MongoDB returns only after the requested number of members, including the primary, have written to the journal. Previously j: true write concern in a replica set only requires the primary to write to the journal, regardless of the w: write concern.
Based on the docs, my understanding is that if you set j: 1 then w > 1 doesn't matter: your app will have the write acknowledged only once (and as soon as) the primary has committed the write to its own journal. Writes to the replicas will happen but don't factor into your write concern.
In light of this, the scenario of "can the primary commit to its journal, ack the write, and have the cluster go down before the secondaries commit to their journals, and then roll back the primary when a secondary comes back up as primary" is more likely (but still of very low likelihood) than the original question implies.
From the docs:
Requiring journaled write concern in a replica set only requires a journal commit of the write operation to the primary of the set regardless of the level of replica acknowledged write concern.
http://docs.mongodb.org/manual/core/write-concern/
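To make the j/w combination concrete, a hedged shell sketch (the collection name is a placeholder):

    // Request majority acknowledgement AND journal durability.
    // Pre-3.2 (per the quotes above): j: true only guaranteed the primary's journal.
    // 3.2+: every member counted toward w must have journaled the write first.
    db.events.insertOne(
      { e: 1 },
      { writeConcern: { w: "majority", j: true } }
    )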

Replication Behavior in MongoDB

I have a single primary mongod instance with two replicas (secondary mongod instances), and Java code which inserts 1 million simple objects with WriteConcern = JOURNAL_SAFE.
While the Java code was executing, we killed the primary instance, and the Java code threw a "server not available" exception. Then I shut down both secondary nodes, started each node separately as a standalone instance, and checked the record counts. We observed that the record count in both secondary mongod instances is the same, while on the primary one record is missing, and the missing record is the one on which the job failed (when the mongod instance was killed).
Can anyone please explain this behavior? If the record is not present on the primary, how can it exist on the secondaries?
Regards,
Bhagwant Bhobe
This is not at all unexpected. In fact, I would expect this to be the case, because replication in MongoDB is asynchronous and "as fast as possible": as soon as the primary records the write in memory, it is visible via the oplog to the secondaries, which apply it to themselves.
Because you killed the primary server before it had a chance to flush the write from memory to disk, that node does not have the inserted record when you examine it, but the secondaries have it because it was replicated, and normally replication takes less time than flushing data to disk (though it depends on the speed of your disk, the speed of your network, and the relative timing of events).
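If you want to verify this yourself, one hedged diagnostic is to start each member as a standalone mongod and compare the newest insert entry in its oplog (local.oplog.rs is the standard oplog collection):

    // Run against each member started as a standalone instance
    db.getSiblingDB("local").oplog.rs
      .find({ op: "i" })       // insert operations only
      .sort({ $natural: -1 })  // newest first
      .limit(1)
      .forEach(printjson)

The secondaries should show the last replicated insert, while the killed primary may be missing the final entry that never made it from memory to disk.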