MongoDB's configuration has a parameter called "autoresync".
Its description says:
Automatically resync if slave data is stale
autoresync
So if we enable this parameter, when one of the secondaries goes into the RECOVERING state, can it auto-heal non-primary members that have stale data and are unable to replicate? Sometimes we see that the data is too stale. So if we enable this parameter, can it automatically heal the member and bring it back to a good state?
That parameter is legacy and has not been supported for a long time. It dates from the era when MongoDB used the master-slave replication paradigm.
With current versions of MongoDB, secondaries always recover (auto-heal), provided the primary's oplog is big enough to cover the "missing" data.
So, if your secondary cannot replicate/recover, check that your PRIMARY node's oplog is big enough. Since MongoDB 3.6 you can resize the oplog without a restart, using the replSetResizeOplog admin command.
To see how much time your oplog can cover, use db.getReplicationInfo().timeDiff.
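To see what that window means in practice, here is a minimal Python sketch (the helper names are hypothetical, not MongoDB's implementation): timeDiff is just the span between the oldest and newest oplog entries, and a secondary that falls further behind than that span can no longer recover on its own.

```python
from datetime import datetime

def oplog_window_seconds(first_op_ts, last_op_ts):
    """Rough equivalent of db.getReplicationInfo().timeDiff: the span of
    time between the oldest and newest entries still in the oplog."""
    return (last_op_ts - first_op_ts).total_seconds()

def secondary_is_stale(lag_seconds, window_seconds):
    # A secondary that has been behind longer than the oplog window can
    # no longer find the entries it needs and cannot catch up on its own.
    return lag_seconds > window_seconds

window = oplog_window_seconds(datetime(2023, 1, 1, 0, 0),
                              datetime(2023, 1, 1, 12, 0))
print(window)                          # 43200.0 -> the oplog covers ~12 hours
print(secondary_is_stale(3600, window))   # False: 1 hour behind, can catch up
print(secondary_is_stale(86400, window))  # True: 24 hours behind, too stale
```

If the window is shorter than your longest expected secondary downtime (maintenance, backups, network outages), the oplog is too small.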
Related
In MongoDB 4.4.1 there is a mirroredRead configuration which allows the primary to forward read/update requests to the secondaries of a replica set.
How is it different from the secondaryPreferred read preference when its sampling rate is set to 1.0?
What is the use-case of mirroredRead?
reference - https://docs.mongodb.com/manual/replication/#mirrored-reads-supported-operations
What is the use-case of mirroredRead?
This is described in the documentation you linked:
MongoDB provides mirrored reads to pre-warm the cache of electable secondary members
If you are not familiar with cache warming, there are many resources describing it, e.g. https://www.section.io/blog/what-is-cache-warming/.
A secondary read:
Is sent to the secondary, thus reducing the load on the primary
Can return stale data
A mirrored read:
Is sent to the primary
Always returns the most recent data
mirroredRead configuration which allows the primary to forward read/update requests to the secondaries of a replica set.
This is incorrect:
A mirrored read is not applicable to updates.
The read is not "forwarded". The primary responds to the read using its local data. Additionally, the primary sends a read request to one or more secondaries, but does not receive a result of this read at all (and does not "forward" the secondary read result back to the application).
Let's suppose you always use primary read preference and you have 2 members that are electable for being primary.
Since all of your reads take place on the primary instance, its cache is heavily populated; and since your other electable member doesn't receive any reads, its cache can be considered empty.
Using mirrored reads, the primary will send a portion (in your question, 100%) of read requests to that secondary as well, to make it familiar with the pattern of read queries and to populate its cache.
Then a disaster occurs and the current primary goes down. Your new primary now has a pre-warmed cache that can respond to queries as fast as the previous primary did, without shocking the system while its cache is populated.
Regarding the impact of the sampling rate, MongoDB folks in their blog post introducing this feature stated that increasing the sampling rate would increase the load on the replica set. My understanding is that you may already have queries with a read preference other than primary that keep your secondary instances busy. In that case, mirrored reads can impact the performance of your secondary instances. Hence, you may not want to repeat all primary reads on these secondaries (the repetition of the terms secondary and primary is mind blowing!).
The story with secondaryPreferred reads is different: there you actually query secondaries for data, unless no secondary is available.
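To make the samplingRate concrete, here is a small Python simulation (the function and names are hypothetical sketches, not the server's implementation): the primary answers every read itself and additionally mirrors a sampled fraction to electable secondaries, whose results are discarded.

```python
import random

def mirror_sample(reads, sampling_rate, rng):
    """The primary answers every read itself; additionally, a sampled
    fraction is sent to electable secondaries purely to warm their
    caches. The secondaries' results are discarded, never returned."""
    return [r for r in reads if rng.random() < sampling_rate]

reads = [f"find_{i}" for i in range(1000)]
rng = random.Random(42)
print(len(mirror_sample(reads, 1.0, rng)))  # 1000: every read is mirrored
print(len(mirror_sample(reads, 0.0, rng)))  # 0: mirroring disabled
# With the default samplingRate of 0.01, roughly 1% of reads are mirrored.
```

Note the contrast with secondaryPreferred: there the *client* routes the read to a secondary and consumes its (possibly stale) result; here the mirrored read is extra work the primary generates purely for cache warming.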
I am dealing with MongoDB's rollback procedure. The problem is that a rollback of a large amount of data may be 300 MB or more.
Is there any solution for this problem? Error log is
replSet syncThread: replSet too much data to roll back
In official MongoDB document, I could not see a solution.
Thanks for the answers.
The cause
The page Rollbacks During Replica Set Failover states:
A rollback is necessary only if the primary had accepted write operations that the secondaries had not successfully replicated before the primary stepped down. When the primary rejoins the set as a secondary, it reverts, or “rolls back,” its write operations to maintain database consistency with the other members.
and:
When a rollback does occur, it is often the result of a network partition.
In other words, a rollback scenario typically occurs like this:
1. You have a 3-node replica set setup of primary-secondary-secondary.
2. There is a network partition, separating the current primary from the secondaries.
3. The two secondaries cannot see the former primary, and elect one of them to be the new primary. Applications that are replica-set aware are now writing to the new primary.
4. However, some writes keep coming into the old primary before it realizes that it cannot see the rest of the set and steps down.
The data written to the old primary in step 4 above is the data that gets rolled back, since for a period of time it was acting as a "false" primary (i.e., the "real" primary is the secondary elected in step 3).
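The partition scenario above can be sketched in a few lines of Python (a hypothetical helper, not MongoDB's actual rollback code): find the last oplog point the two nodes have in common; everything the old primary wrote after that point must be rolled back when it rejoins.

```python
def ops_to_roll_back(old_primary_oplog, new_primary_oplog):
    """Find the last common point between the two oplogs; everything the
    old primary wrote after that point must be rolled back."""
    common = 0
    for a, b in zip(old_primary_oplog, new_primary_oplog):
        if a != b:
            break
        common += 1
    return old_primary_oplog[common:]

shared = ["insert_1", "insert_2", "update_3"]    # replicated before the partition
old_primary = shared + ["insert_4", "insert_5"]  # accepted during the partition (step 4)
new_primary = shared + ["insert_6"]              # written to the newly elected primary
print(ops_to_roll_back(old_primary, new_primary))
# ['insert_4', 'insert_5']: reverted when the old primary rejoins as a secondary
```

The 300 MB limit from the error message applies to the total size of those post-common-point operations.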
The fix
MongoDB will not perform a rollback if there are more than 300 MB of data to be rolled back. The message you are seeing (replSet too much data to roll back) means that you are hitting this situation, and would either have to save the rollback directory under the node's dbpath, or perform an initial sync.
Preventing rollbacks
Configuring your application to use w: majority (see Write Concern and Write Concern for Replica Sets) would prevent rollbacks. See Avoid Replica Set Rollbacks for more details.
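Why majority acknowledgment prevents rollbacks can be sketched in Python (an illustrative model, not driver code): a write is only acknowledged once enough members hold it, and a majority-acknowledged write is guaranteed to be present on at least one member of any majority that can elect a new primary.

```python
def acknowledged(replicated_on, set_size, w):
    """A write is acknowledged once `w` members (or a majority) hold it.
    A majority-acknowledged write exists on at least one member of any
    electable majority, so it can never be rolled back."""
    needed = set_size // 2 + 1 if w == "majority" else w
    return len(replicated_on) >= needed

# 3-member set; a network partition isolates the old primary:
print(acknowledged({"old_primary"}, 3, w=1))           # True: w=1 acks, rollback possible
print(acknowledged({"old_primary"}, 3, w="majority"))  # False: the app never sees an ack
print(acknowledged({"old_primary", "secondary1"}, 3, w="majority"))  # True: safe
```

With w=1, the application may treat as committed exactly the writes that step 4 above later rolls back; with w: majority, those writes are never acknowledged in the first place.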
I have a 3 node MongoDB (2.6.3) replica set and am testing various failure scenarios.
It was my understanding that if a majority of the replica nodes are not available then the replica set becomes read only. But what I am experiencing is if I shut down my 2 secondary nodes, the last remaining node (which was previously primary) becomes a secondary and I cannot even read from it.
From the docs:
Users may configure read preference on a per-connection basis to prefer that the read operations return on the secondary members. If clients configure the read preference to permit secondary reads, read operations can return from secondary members that have not replicated more recent updates or operations. When reading from a secondary, a query may return data that reflects a previous state.
It sounds like I can configure my client to allow reads from secondaries, but since it was the primary node that I left up, it should be up to date with all of the data. Does MongoDB make the last node secondary even if it is fully caught up with data?
As you've noted, once you shut down the two secondaries, your primary steps down and becomes a secondary (this is the normal behavior once a primary loses connection to the majority of members).
The default read preference of a replica set is to read from the primary, but since your former primary is no longer primary, you encounter exactly what you described: "I cannot even read from it."
You can change the read preference on a per-driver, per-database, per-collection, and even per-operation basis.
since it was the primary node that I left up, it should be up to date with all of the data. Does MongoDB make the last node secondary even if it is fully caught up with data?
As said, the primary becomes a secondary when it steps down; this has nothing to do with whether it is up to date. It won't serve reads because of the default read preference: if you change your driver's read preference to secondary, nearest, or similar, you'll be able to continue reading even if only a single node (the former primary) remains.
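The driver-side routing logic can be sketched as follows (a simplified Python model with hypothetical names, not an actual driver implementation):

```python
def route_read(members, read_preference="primary"):
    """Simplified sketch of driver-side read routing by read preference."""
    primaries = [m for m, s in sorted(members.items()) if s == "PRIMARY"]
    secondaries = [m for m, s in sorted(members.items()) if s == "SECONDARY"]
    if read_preference == "primary":
        if not primaries:
            raise RuntimeError("no primary available")
        return primaries[0]
    if read_preference == "primaryPreferred":
        return (primaries or secondaries)[0]
    # secondary / secondaryPreferred / nearest (simplified: pick a
    # secondary, falling back to the primary where the mode allows it)
    return (secondaries or primaries)[0]

# Two members down; the former primary has stepped down to SECONDARY:
members = {"node1": "SECONDARY", "node2": "DOWN", "node3": "DOWN"}
print(route_read(members, "secondaryPreferred"))  # node1: reads still possible
# route_read(members, "primary") raises RuntimeError: no primary available
```

This is exactly the situation in the question: the surviving node's data is fully up to date, but the default "primary" mode refuses to use it because of its state, not its data.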
In my test environment:
node1: shard1 primary, shard2 primary
node2: shard1 secondary, shard2 secondary
node3: shard1 arbiter, shard2 arbiter
I wrote a multi-threaded program to write concurrently to the sharded replica sets. After 1 hour (the primary had 6 GB of data),
I found the secondary's status was RECOVERING.
I checked the secondary's log, which said: stale data from primary oplog.
So was the reason that my write requests were too frequent, so that the secondary could not replicate in time?
Or is there another reason?
I'm puzzled...
Thanks in advance
This situation can happen if the size of the oplog is not sufficient to keep a record of all the operations occurring on the primary, or if the secondary just can't keep up with the primary. What happens in that case is that the secondary's position in the oplog gets overwritten by new inserts from the primary. At this point the secondary will report its status as RECOVERING and you will see an RS102 message in the log, indicating that it is too stale to catch up.
To fix the issue you would need to follow the steps outlined in the documentation.
In order to prevent the problem from happening in the future, you would need to tune the size of the OpLog, and make sure that the secondaries are of equivalent hardware configurations.
To help tune the OpLog you can look at the output of db.printReplicationInfo() which will tell you how much time you have in your OpLog. The documentation outlines how to resize the OpLog if it is too small.
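The overwrite mechanism described above can be modeled as a fixed-size ring buffer (an illustrative Python sketch; the class and method names are hypothetical): the oplog is a capped collection, so under heavy write load the oldest entries are evicted, and a slow secondary's sync position can fall off the end.

```python
from collections import deque

class Oplog:
    """Sketch of the oplog as a capped collection: once full, the oldest
    entries are evicted to make room for new writes."""
    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)
        self.next_id = 0

    def append(self, op):
        self.entries.append((self.next_id, op))
        self.next_id += 1

    def can_sync_from(self, secondary_position):
        # A secondary can only catch up if the entry *after* its last
        # applied op is still present; otherwise it is too stale (RS102).
        oldest = self.entries[0][0]
        return secondary_position + 1 >= oldest

oplog = Oplog(capacity=100)
for i in range(150):             # heavy write load on the primary
    oplog.append(f"insert_{i}")
print(oplog.can_sync_from(60))   # True: op 61 is still in the oplog
print(oplog.can_sync_from(20))   # False: overwritten, secondary goes RECOVERING
```

Increasing the oplog size corresponds to increasing `capacity` here: the secondary gets a larger budget of operations (and therefore time) by which it may lag before becoming unrecoverable.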
I have recently started with MongoDB and I'm trying to explore replica sets and crash recovery.
I have read that journal files are write-ahead redo log files,
and that every write operation is recorded in the oplog.
What is the difference between these two?
Do we have oplogs on both the primary and the secondaries?
Please post any web links that shed some light on this area.
The oplog stores high-level operations that modify the database (queries, for example, are not stored): insert this document, update that one, etc. The oplog is kept on the primary, and the secondaries periodically poll the primary to fetch newly performed operations (since their last poll). Operations are sometimes transformed before being stored in the oplog so that they are idempotent (and can be safely applied many times).
The journal, on the other hand, can be switched on/off on any node (primary or secondary), and is a low-level log of operations for the purpose of crash recovery and durability of a single mongod instance. It contains low-level operations like 'write these bytes to this file at this position'.
NOTE:
Starting in MongoDB 4.0, you cannot turn journaling off for replica set members that use the WiredTiger storage engine.
Source: https://docs.mongodb.com/manual/tutorial/manage-journaling/
The oplog is just a capped collection where MongoDB tracks all changes to its collections (insert, update, delete). It doesn't track read operations. MongoDB uses the oplog to spread all changes to all nodes in a replica set; secondary nodes copy and apply these changes.
The journal is a feature of the underlying storage engine. Since MongoDB 3.2 the default storage engine is WiredTiger, and since MongoDB 4.0 you can't disable journaling for WiredTiger. All operations are tracked in the journal files. WiredTiger uses checkpoints to recover data in case of a crash. Checkpoints are created every 60 seconds. If a crash happens between checkpoints, some data could be lost; to prevent this, WiredTiger uses the journal files to replay all the changes made after the last checkpoint.
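The checkpoint-plus-journal recovery idea can be sketched in Python (an illustrative model with hypothetical names, not WiredTiger's implementation): start from the state captured at the last checkpoint and replay only the journal records written after it.

```python
def recover(checkpoint_data, journal, last_checkpoint_seq):
    """Sketch of crash recovery: start from the last checkpoint (taken
    every ~60s) and replay journal entries written after it."""
    data = dict(checkpoint_data)
    for seq, key, value in journal:
        if seq > last_checkpoint_seq:   # older records are already in the checkpoint
            data[key] = value
    return data

checkpoint = {"a": 1}    # state captured at the last checkpoint (seq 10)
journal = [(9, "a", 1),  # before the checkpoint: already reflected in it
           (11, "b", 2), (12, "a", 3)]  # durable records written after it
print(recover(checkpoint, journal, last_checkpoint_seq=10))
# {'a': 3, 'b': 2}: the post-checkpoint writes are recovered from the journal
```

This is why journaling cannot be disabled for WiredTiger replica set members since 4.0: without the journal, everything after the last checkpoint would simply be lost on a crash.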
In general, write flow in MongoDB looks like that:
High-level - when a client writes/updates/removes data, MongoDB applies the change to the proper collection, updates the indexes, and inserts the change into the oplog. If any of these operations fails, the other related operations must be rolled back to prevent inconsistency. For this, MongoDB uses WiredTiger transactions:
begin transaction
apply change to collection
update index
add the change to the oplog
commit the transaction
Low-level - WiredTiger runs the transaction and adds the changes to journal file.
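The all-or-nothing nature of those steps can be sketched in Python (a toy model with hypothetical names, not MongoDB's code): the collection change, the index update, and the oplog entry either all commit or are all rolled back together.

```python
class WriteFailed(Exception):
    pass

def apply_write(collection, index, oplog, doc, fail_index=False):
    """The three changes succeed or fail together, like the WiredTiger
    transaction described above. `fail_index` injects a failure."""
    snapshot = (dict(collection), dict(index), list(oplog))  # begin transaction
    try:
        collection[doc["_id"]] = doc             # apply change to collection
        if fail_index:
            raise WriteFailed("index update failed")
        index[doc["name"]] = doc["_id"]          # update index
        oplog.append(("i", doc))                 # add the change to the oplog
    except WriteFailed:                          # abort: restore the snapshot
        collection.clear(); collection.update(snapshot[0])
        index.clear(); index.update(snapshot[1])
        oplog[:] = snapshot[2]
        raise
    # reaching the end of the try block = commit the transaction

coll, idx, oplog = {}, {}, []
apply_write(coll, idx, oplog, {"_id": 1, "name": "a"})
print(oplog)       # [('i', {'_id': 1, 'name': 'a'})]
try:
    apply_write(coll, idx, oplog, {"_id": 2, "name": "b"}, fail_index=True)
except WriteFailed:
    pass
print(2 in coll)   # False: the failed write left nothing half-applied
```

The "i" tag mimics the op type field in real oplog entries (insert); the snapshot/restore stands in for the storage engine's transactional rollback.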
There must be a relationship between the journal and the oplog: when w=1, the write is committed to the primary's journal, and an oplog entry is also created in the replica set's oplog collection. I think that, at least on the primary of a replica set, they both contain the same update/delete/insert, just in different formats.