MongoDB Write Concern in the context of replication

In a sharded MongoDB deployment, replication within a replica set works as a pull mechanism. My question is: if a write operation arrives at the primary with majority write concern, will it wait for all the operations in the oplog to be replicated to all nodes, or only for this particular write operation to be replicated?

Since the oplog is replicated asynchronously, the wait is only for that particular write operation, not for the whole oplog.
Visit http://docs.mongodb.org/manual/core/distributed-write-operations/ for more details.
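For illustration, a minimal sketch in Python with pymongo (host, database, and collection names are placeholders): the insert below returns as soon as this one write has been replicated to a majority of members, regardless of how much other oplog traffic is in flight.

from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://db1.example.net,db2.example.net/?replicaSet=rs0")
coll = client.my_db.get_collection(
    "my_collection",
    write_concern=WriteConcern(w="majority"),
)
# Blocks only the calling thread, and only until THIS write has been
# applied by a majority of the replica set members.
coll.insert_one({"status": "ok"})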

Related

MongoDB WriteConcern impact on replication

In general, MongoDB replicates from a primary to secondaries asynchronously by shipping the oplog from the primary to the secondaries, batched based on the number of write operations, time, and other factors.
When describing WriteConcern options, MongoDB documentation states "...primary waits until the required number of secondaries acknowledge the write before returning write concern acknowledgment". This seems to suggest that a WriteConcern other than "w:1" would replicate to at least some of the members of the replica set in a blocking manner, potentially avoiding log shipping.
The basic question I'm trying to answer is this: if every write is using a WriteConcern of "majority", would MongoDB ever have to use log shipment? In other words, does using a WriteConcern of "majority" also control replication timing?
I would like to better understand how MongoDB handles WriteConcern of "majority". A few obvious options:
Primary sends write requests to every Secondary, and blocks the thread until a majority respond with acknowledgment
or
Primary pre-selects Secondaries first and sends requests to only those secondaries, blocking the thread until all chosen secondaries respond with acknowledgment
or
Something much smarter than either of these options
If Option 1 is used, in most cases (assuming equidistant placement of secondaries) all secondaries will have received the write operation by the time the write completes, and there's a high probability (although not a guarantee) that all secondaries will have applied it. If true, this behavior enables use cases where writes need to be reflected on Secondaries quicker than the typical asynchronous replication process allows.
Obviously WriteConcern of "majority" will incur performance penalty, but this may be acceptable for specific use cases where read operations may target Secondaries (e.g. ReadPreference of "nearest") and desire more recent data.
if every write is using WriteConcern of "majority", would MongoDB ever have to use log shipment?
Replication in MongoDB uses what is termed the oplog. This is a record of all operations on the primary (the only node that accepts writes).
Instead of the primary pushing the oplog to the secondaries, the secondaries tail (long-pull on) the oplog of the primary. If replication chaining is allowed (the default), a secondary can also pull the oplog from another secondary. So scenarios 1 and 2 you posted do not reflect the reality of MongoDB replication as of MongoDB 4.0.
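You can observe this pull mechanism yourself: the oplog is a capped collection (local.oplog.rs), and secondaries effectively run a tailable query against it. A hedged sketch in Python with pymongo (placeholder host; assumes you can connect directly to a member):

from pymongo import CursorType, MongoClient

client = MongoClient("mongodb://db1.example.net/?directConnection=true")
oplog = client.local["oplog.rs"]
# TAILABLE_AWAIT mimics how a secondary follows the primary's oplog:
# the cursor stays open and yields new entries as they are written.
for entry in oplog.find({}, cursor_type=CursorType.TAILABLE_AWAIT):
    print(entry["ts"], entry.get("op"), entry.get("ns"))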
The details of the replication process are described in the MongoDB GitHub wiki page: Replication Internals.
To quote the relevant parts regarding your question:
If a command includes a write concern, the command will just block in its own thread until the oplog entries it generates have been replicated to the requested number of nodes. The primary keeps track of how up-to-date the secondaries are to know when to return. A write concern can specify a number of nodes to wait for, or majority.
In other words, the secondaries continually report back to the primary how far along they have applied the oplog to their own datasets. Since the primary knows the timestamp at which the write took place, once a secondary has applied that timestamp, the primary can tell that the write has propagated to that secondary. To satisfy the write concern, the primary simply waits until the required number of secondaries have applied the write's timestamp.
Note that only the thread specifying the write concern waits for this acknowledgment; no other threads are blocked by the wait.
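As a sketch of that per-thread wait (Python/pymongo, illustrative names): the write concern can name a number of nodes or "majority", and wtimeout bounds how long the calling thread blocks.

from pymongo import MongoClient, WriteConcern
from pymongo.errors import WTimeoutError

client = MongoClient("mongodb://db1.example.net/?replicaSet=rs0")
coll = client.my_db.get_collection(
    "events",
    write_concern=WriteConcern(w="majority", wtimeout=1000),  # or w=2, w=3, ...
)
try:
    coll.insert_one({"event": "signup"})
except WTimeoutError:
    # Only the waiting was abandoned; the write itself is not rolled back
    # by the timeout and may still replicate afterwards.
    print("majority did not acknowledge within 1s")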
Regarding your other question:
Obviously WriteConcern of "majority" will incur performance penalty, but this may be acceptable for specific use cases where read operations may target Secondaries (e.g. ReadPreference of "nearest") and desire more recent data.
To achieve what you described, you need a combination of read and write concerns. See Causal Consistency and Read and Write Concerns for more details on this subject.
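A hedged sketch of such a combination in Python with pymongo (host and collection names are made up): majority read and write concerns inside a causally consistent session let a read that targets a secondary still observe the session's own earlier write.

from pymongo import MongoClient, ReadPreference, WriteConcern
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://db1.example.net/?replicaSet=rs0")
coll = client.my_db.get_collection(
    "orders",
    write_concern=WriteConcern(w="majority"),
    read_concern=ReadConcern("majority"),
)
with client.start_session(causal_consistency=True) as session:
    coll.insert_one({"order": 1}, session=session)
    # May be served by a secondary, yet is guaranteed to see the insert
    # above because both operations are in the same causal session.
    doc = coll.with_options(read_preference=ReadPreference.NEAREST).find_one(
        {"order": 1}, session=session
    )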
Write majority is typically used for:
Ensuring that the write will not be rolled back in the event of a primary failure.
Ensuring that the application is not writing so fast that the provisioned hardware of the replica set cannot cope with the traffic; i.e. it can act as a backpressure mechanism.
In combination with read concern, providing the client with differing levels of consistency guarantees.
These points assume that the write majority was acknowledged and the acknowledgment was received by the client. There are multiple different failure scenarios that are possible (as expected with a distributed system that needs to cope with an unreliable network), but those are beyond the scope of this discussion.

MongoDB replica out of sync when performing a lot of inserts

I have a three-member replica set using MongoDB v3.2.4. Each member is a VM with 8 cores and 8GB RAM, and under normal operation these nodes run very low on CPU and memory consumption.
I have a 60GB database (30 million docs) that once a month is completely reloaded by a Map/Reduce job written in Pig. During this job the cluster receives 30k inserts/s, and within a few minutes the secondaries fall out of sync.
The current oplog size is 20GB (already modified from the default) but this does not resolve the replication sync issue.
I don't know if modifying the oplog size again will help. My concern is that replication only seems to catch up when there is no load on the primary. Since my insert job lasts an hour, does that mean I need an oplog the size of my database?
Is there a way to tell MongoDB to put more effort on replication and have a more balanced workload between accepting inserts and replication?
Is there a way to tell mongo to put more effort on replication to have a more balanced workload between accepting inserts and replicating these inserts?
To ensure data has replicated to the secondaries (and to throttle your inserts) you should increase your write concern to w:majority. The default write concern (w:1) only confirms that a write operation has been accepted by the primary, so if your secondaries cannot keep up during an extended period of inserts, they will eventually fall out of sync (as you have experienced).
You can include the majority write concern as an option in your MongoDB connection string URI, e.g.:
STORE data INTO
'mongodb://user:pass@db1.example.net,db2.example.net/my_db.my_collection?replicaSet=replicaSetName&w=majority'
USING com.mongodb.hadoop.pig.MongoInsertStorage('', '');

MongoDB failover with majority write concern: Can it get into an inconsistent state?

I have a question regarding replication and write concerns. Suppose I have a write concern of journaled + majority acknowledged: is it ever possible that, in a span of two or more writes, the first write is acknowledged by secondary 1 but not secondary 2, AND the second write is acknowledged by secondary 2 but not secondary 1?
And if this can happen, what will happen if a new primary has to be elected in this state?
Thank you!
No, I do not believe this is possible. The write operations translate into some number of entries in the oplog of the primary. The secondaries apply the operations by tailing the oplog. One of the two write operations has its last oplog entry before the other's, so that operation will always be completed first by any secondary tailing the primary's oplog. Thus, the situation you describe is not possible: one of the writes must be completely applied on a secondary before the other can be.
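For concreteness, a sketch of the write concern the question describes (journaled + majority) in Python with pymongo; host and collection names are illustrative:

from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://db1.example.net/?replicaSet=rs0")
coll = client.my_db.get_collection(
    "ledger",
    write_concern=WriteConcern(w="majority", j=True),
)
# Secondaries apply the oplog in order, so write A is acknowledged (and
# applied on any given secondary) before write B can be.
coll.insert_one({"op": "A"})
coll.insert_one({"op": "B"})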

MongoDB sharding and read replicas

I am in the process of setting up a sharded cluster.
I also want to configure read replicas in the cluster.
Suppose a shard is a replica set of 3 members: 1 primary and 2 secondaries. Writes will go to the primary member of a shard, but can I send all reads to the secondaries?
It is not recommended to read from secondaries in a sharded cluster. This is because, unlike the primaries, the secondaries have no idea what data the shard is supposed to contain. So when chunks are migrated from one shard to another, reading from the secondaries might return duplicate or missing results.
In general, reading from secondaries also means the loss of consistency guarantees, since as long as writes have not propagated from the primaries, the secondaries will see, and return, stale data.
Finally, you should keep in mind that secondaries receive essentially the same write load as the primaries, so reading from the secondaries is not likely to provide much of a performance advantage.
The recommended way to get more read throughput is usually to add more shards, not reading from secondaries.
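If you still decide to read from secondaries despite the caveats above, it is done with a read preference rather than any special cluster setup. A sketch in Python with pymongo (placeholder hosts; connect through mongos in a sharded cluster):

from pymongo import MongoClient, ReadPreference

client = MongoClient("mongodb://mongos1.example.net/?readPreference=secondaryPreferred")
# Or per collection:
coll = client.my_db.get_collection(
    "my_collection",
    read_preference=ReadPreference.SECONDARY_PREFERRED,
)
doc = coll.find_one({})  # may return stale (or, during chunk migrations, orphaned) results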

MongoDB write lock blocks reads on secondaries?

The MongoDB docs about concurrency state that the DB is 'write greedy'. That much I understand. However, they do not say what locks do to secondaries in a replica set.
Take my use case, which sees about 40 writes per 100 queries and where I do not need the most recent document at all times. A lag of 5-10 seconds is okay with me, which is roughly how far the secondaries in a replica set would be behind the master. Now, if the write lock locks down the master as well as the replicas, then I am locked out of reads on the secondaries too.
I wanted to know whether writes will lock read operations on the secondaries as well.
In a replica set, SECONDARY servers are not affected by the write lock on the PRIMARY (master).
You can see the status of your servers by using mongotop or mongostat.
The locks are per mongod instance. That means the read/write locks only cover operations on the primary itself. The secondaries pull the oplog from the primary and replay its operations locally.
You can read much more detail in the manual's section about concurrency.
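To check how far behind the secondaries actually are (the 5-10 second budget mentioned in the question), the replSetGetStatus command reports each member's last applied optime. A hedged sketch in Python with pymongo, placeholder host:

from pymongo import MongoClient

client = MongoClient("mongodb://db1.example.net/?replicaSet=rs0")
status = client.admin.command("replSetGetStatus")

# Assumes a PRIMARY currently exists in the member list.
primary_optime = next(
    m["optimeDate"] for m in status["members"] if m["stateStr"] == "PRIMARY"
)
for m in status["members"]:
    if m["stateStr"] == "SECONDARY":
        lag = (primary_optime - m["optimeDate"]).total_seconds()
        print(f"{m['name']} is {lag:.0f}s behind the primary")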