Guarantee consistency of data across microservices accessing a sharded cluster in MongoDB

My application is essentially a bunch of microservices deployed across Node.js instances. One service might write some data while a different service reads those updates. (A specific example: I'm processing inbound data using a processing pipeline. Stage 1 does something, stage 2 does something else to the same data, etc. It's a fairly common pattern.)
So, I have a large data set (~250GB now, and I've read that once a DB grows much beyond this size, it is nearly impossible to introduce sharding without some major hoop-jumping). I want a highly available DB, so I'm planning on a replica set with at least one secondary and an arbiter.
I am still researching my sharding options, but I think I can shard my data by the 'client' it belongs to, so it makes sense for me to have 3 shards.
First question: if I have 3 shards and my replica set is Primary/Secondary/Arbiter (with the Arbiter running on the Primary), I will have 6 instances of MongoDB running. There will be three primaries and three secondaries (with the Arbiter running on each Primary). Is this correct?
Second question. I've read conflicting info about what 'majority' means... If I have a Primary and a Secondary and I'm writing with the 'majority' write acknowledgement, what happens when either the Primary or the Secondary goes down? If the Arbiter is still there, an election can happen and I'll still have a Primary. But does 'majority' refer to members of the replica set, or to Secondaries? So if I only have a Primary and I try to write with the 'majority' option, will I ever get an acknowledgement? If there is only a Primary, then 'majority' would mean a write to the Primary alone triggers the acknowledgement. Or would this just block until my timeout is reached, and then I'd get an error?
Third question... I'm assuming that as long as I do writes with 'majority' acknowledgement and do reads from the Primaries, I don't need to worry about causally consistent data? I've read that doing reads from Secondary nodes is not worth the effort: if you read from a Secondary you have to worry about eventual consistency, and since writes are replicated, the Secondaries see essentially the same write traffic as the Primaries, so there isn't much benefit to reading from them. If that is the case, I can do all reads from the Primaries (using 'majority' read concern), be sure that I'm always getting consistent data, and still get the benefit of distributing load across the shards. Is this correct?
Fourth (and last) question... When are causally consistent sessions worthwhile? If I understand correctly, and I'm not sure that I do, it is for a case like a typical web app (not a distributed application like my current one), where there are just one or two nodes doing the reading and writing. In that case I would use causally consistent sessions, do my writes to the Primary, and read from the Secondary. But then what would the benefit of reading from the Secondaries be, anyway? What am I missing? What is the use case for causally consistent sessions?

if I have 3 shards and my replica set is Primary/Secondary/Arbiter (with the Arbiter running on the Primary), I will have 6 instances of MongoDB running. There will be three primaries and three secondaries (with the Arbiter running on each Primary). Is this correct?
A replica set Arbiter is still an instance of mongod. It's just that an Arbiter does not have a copy of the data and cannot become a Primary. You should have 3 instances per shard, which means 9 instances in total.
Since you mentioned that you would like to have a highly available database deployment, please note that the minimum recommended replica set members for production deployment would be a Primary with two Secondaries.
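For reference, here is a minimal sketch of initiating such a PSS (Primary plus two Secondaries) set through the Node.js driver; the host and replica set names are placeholders, and you could equally run rs.initiate() from the shell:

    import { MongoClient } from 'mongodb';

    // Connect directly to one member before the replica set exists
    // (directConnection skips replica set discovery).
    const client = new MongoClient('mongodb://node1.example.net:27017/?directConnection=true');
    await client.connect();
    await client.db('admin').command({
      replSetInitiate: {
        _id: 'rs0',
        members: [
          { _id: 0, host: 'node1.example.net:27017' },
          { _id: 1, host: 'node2.example.net:27017' },
          { _id: 2, host: 'node3.example.net:27017' },
        ],
      },
    });
    await client.close();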
If I have a Primary and a Secondary and I'm writing with the 'majority' write acknowledgement, what happens when either the Primary or the Secondary goes down?
When either the Primary or the Secondary becomes unavailable, a w:majority write will either:
wait indefinitely,
wait until the unavailable node is restored, or
fail with a timeout (if wtimeout was set).
This is because an Arbiter carries no data and cannot acknowledge writes, yet is still counted as a voting member. See also Write Concern for Replica Sets.
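To make the timeout case concrete, here is a minimal sketch with the Node.js driver (connection string and names are placeholders): setting wtimeout makes a w:majority write fail fast instead of waiting indefinitely when a majority cannot acknowledge it.

    import { MongoClient } from 'mongodb';

    const client = new MongoClient('mongodb://host1,host2/?replicaSet=rs0');
    await client.connect();
    const events = client.db('app').collection('events');
    try {
      await events.insertOne(
        { stage: 1, payload: 'example' },
        // Without wtimeout this call blocks while the majority is unreachable;
        // with it, the driver surfaces a write concern timeout error instead.
        { writeConcern: { w: 'majority', wtimeout: 5000 } }
      );
    } catch (err) {
      console.error('majority write not acknowledged within 5s:', err);
    }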
I can do all reads from the Primaries (using 'majority' read concern), be sure that I'm always getting consistent data, and still get the benefit of distributing load across the shards
Correct. MongoDB sharding scales horizontally by distributing load across shards, while MongoDB replication provides high availability.
If you read only from the Primary and also specify readConcern: majority, the application will read data that has been acknowledged by a majority of the replica set members. This data is durable in the event of a partition (i.e. it will not be rolled back). See also Read Concern 'majority'.
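In the Node.js driver, that combination might look like this sketch (placeholder names, assuming an already-connected MongoClient):

    // Reads pinned to the Primary with majority read concern: queries on this
    // handle only return data acknowledged by a majority of the members.
    const events = client.db('app').collection('events', {
      readPreference: 'primary',
      readConcern: { level: 'majority' },
    });
    const doc = await events.findOne({ stage: 2 });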
What is the use case for causally consistent sessions?
Causal Consistency is used if the application requires an operation to be logically dependent on a preceding operation (causal). For example, a write operation that deletes all documents based on a specified condition and a subsequent read operation that verifies the delete operation have a causal relationship. This is especially important in a sharded cluster environment, where write operations may go to different replica sets.
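As a sketch of the delete-then-verify example above with the Node.js driver (placeholder names, assuming an already-connected MongoClient; for the full guarantees, pair the session with majority read and write concerns):

    const session = client.startSession({ causalConsistency: true });
    try {
      const events = client.db('app').collection('events');
      await events.deleteMany({ processed: true }, { session });
      // Because both operations run in the same causally consistent session,
      // this read observes the delete above, even if it is served by a
      // different member or shard.
      const remaining = await events.countDocuments({ processed: true }, { session });
      console.log('documents the delete missed:', remaining); // expect 0
    } finally {
      await session.endSession();
    }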

Related

How does MongoDB detect majority in a PSA architecture?

Consider a replica set with 3 nodes: 2 data nodes and one arbiter (PSA). When one of my data nodes goes down for some reason and I bring it back, it sits in the STARTUP2 state while it syncs with the primary. During this time I lose my change stream, because although my replica set has 2 data nodes, I don't have a majority of data-bearing nodes available to read from.
How can I handle this issue?
I also read this MongoDB doc. Is it possible to set the primary node's priority higher than the secondary's (while it is syncing with the primary)? Can I get a majority this way even when my secondary node is in the STARTUP2 state?
There are technically two types of majority; call them "election majority" and "data majority".
Arbiters help with "election majority": they help maintain primary availability in a PSA architecture should the S go down. However, they are not part of the "data majority".
The "data majority", in contrast, consists of the data-bearing voting members, which both vote and acknowledge majority reads and majority writes.
Change streams by design only return documents that are committed to the "data majority" of voting nodes, because a write that has propagated to that majority will not be rolled back. It would be confusing if a change stream declared that a document was written, the write then rolled back, and the stream had to issue a "no wait, scratch that, the write didn't happen".
Thus, by their nature, arbiters are not compatible with majority-read and majority-write scenarios such as change streams or transactions. Arbiters still have their place in a replica set, provided you know what to expect from them.
See What is the default mongod write concern in which version? for a more complete explanation of write concerns and the effect of having arbiters.
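As an illustration (placeholder names, assuming an already-connected MongoClient), opening a change stream with the Node.js driver looks like this; each event it emits corresponds to a majority-committed write:

    const changeStream = client.db('app').collection('events').watch();
    changeStream.on('change', (change) => {
      // Fires only for writes committed to the "data majority"; with a PSA
      // set and its secondary down, no events arrive and the stream stalls.
      console.log('majority-committed change:', change.operationType);
    });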
A secondary in STARTUP2 is not a secondary yet. It may vote in elections, but it won't acknowledge majority writes since it's still starting up.
In terms of change streams, since in a PSA architecture the "data majority" is effectively only the P and S of PSA, neither data-bearing node can be offline if majority reads and writes are to be maintained.
The best solution is to replace the arbiter with an actual data-bearing node. That way you get majority writes and majority reads, and can lose one node while still maintaining a majority.

Can someone give me detailed technical reasons why writing to a secondary in a MongoDB replica set is not allowed

I know we can't write to a secondary in MongoDB, but I can't find any technical reason why. In my case I don't really care if there is a slight delay, and writing to a secondary might be faster. Please provide some references if you can. Thanks!!
The reason you cannot write to a secondary is the way replication works:
Secondaries connect to a special collection on the primary, called the oplog, which contains the operations that have been run through the query optimizer. The oplog is a capped collection, and the secondaries use a tailable cursor to access its entries, processing them from oldest to newest.
When an election takes place because the primary goes down or steps down, the secondary with the most recent oplog entry is elected primary. The remaining secondaries connect to the new primary, query for the oplog entries they haven't processed yet, and the cluster is back in sync.
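You can approximate what a secondary does with a tailable, awaitData cursor on local.oplog.rs. A rough sketch with the Node.js driver (placeholder connection string; this requires read access to the local database):

    import { MongoClient, Timestamp } from 'mongodb';

    const client = new MongoClient('mongodb://host1/?replicaSet=rs0');
    await client.connect();
    const oplog = client.db('local').collection('oplog.rs');
    // Tail entries newer than "now"; a real secondary resumes from the
    // timestamp of the last entry it has applied.
    const cursor = oplog.find(
      { ts: { $gt: new Timestamp({ t: Math.floor(Date.now() / 1000), i: 0 }) } },
      { tailable: true, awaitData: true }
    );
    for await (const entry of cursor) {
      console.log(entry.op, entry.ns); // operation type and namespace
    }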
This procedure is pretty straightforward. Now imagine one could write to a secondary. Every node in the cluster would have to maintain a tailable cursor on every other node, and keeping a consistent state when one machine fails would become very complicated, even race-condition dependent. Effectively, there could be no guarantee of even eventual consistency any more. It would be more or less a gamble.
That being said: a replica set is not for load balancing. A replica set's purpose is to enhance the availability and durability of the data. Because reading from a secondary is relatively low-risk, MongoDB made it possible, in keeping with its approach of offering as many features as possible without compromising scalability (which would be severely hampered if one could write to secondaries).
But MongoDB does provide a load-balancing feature: sharding. By choosing the right shard key, you can distribute read and write load over (almost) as many shards as you want. Not to mention that sharding lets you provision a lot more of that precious RAM at a reasonable price.
There is a one-liner answer:
Multi-master replication is a hairball.
If you were allowed to write to secondaries, MongoDB would have to use multi-master replication to get this working: http://en.wikipedia.org/wiki/Multi-master_replication. Essentially, every node would copy to every other node the ops (operations) it has received, and somehow do so without losing data.
This form of replication has many obstacles to overcome.
One would be throughput: remember that ops need to transfer across the entire network, so it is possible you would actually lose throughput while adding consistency problems. It is much like taking a secondary that already handles all of the primary's ops plus its own outbound replication, and asking it to do yet another job.
Ensuring consistency over a distributed set like this would also be hazardous. One main question that dogs MongoDB when deciding whether a member is down is: "Is it really down, or just unreachable?". It is at the very least tricky, if not almost impossible, to ensure true consistency in a distributed set like this.
Those are just two immediate problems.
Essentially, to sum up: MongoDB does not yet have multi-master replication. It might in the future, but I would not be jumping for joy if it did; I would most likely ignore such a feature. Normal replication and sharding, in both ACID and non-ACID databases, cause enough high blood pressure as it is.

mongo - read preference design strategy

I have an application for which I am tasked with designing a mongo backed data storage.
The application's goals are to provide the latest data (no stale data) with the fastest load times.
The data size is on the order of a few million documents, and the application is write-heavy.
In choosing a read strategy for a 3-node replica set (1 primary, 1 secondary, 1 arbiter), I came across two different approaches to determining where to source reads from:
Read from the secondary to reduce load on the primary. Use writeConcern = REPLICA_SAFE, thus ensuring writes are done on both the primary and the secondary, and set the read preference to secondaryPreferred.
Always read from the primary, but ensure the data is on the primary before reading, so set writeConcern = SAFE. The read preference is the default, primaryPreferred.
What are the things to consider before choosing one of these options?
According to the documentation, REPLICA_SAFE is a deprecated term and should be replaced with REPLICA_ACKNOWLEDGED. The other problem here is that the w value implied by this constant appears to be 2.
This is a problem for your configuration, as you have a Primary and only one Secondary, combined with an arbiter. If a node goes down or is otherwise unreachable, this level requires every write to be acknowledged by 2 data-bearing nodes when 2 such nodes will not be available. You can leave write operations hanging this way.
The better choice for your configuration would be MAJORITY, as regardless of the number of nodes it ensures writes are acknowledged by the Primary and a "majority" of the data-bearing members. But in your case, any write concern involving more than the PRIMARY will block on all writes if one of your nodes is down or unavailable, since you would need at least two more secondary nodes so that a "majority" would still be available to acknowledge the write. Or drop the ARBITER and have two SECONDARY nodes.
So you will have to stick to the default w=1, where all writes are acknowledged by the PRIMARY only, unless you can deal with writes failing when your one SECONDARY goes down.
You can set the read preference to secondaryPreferred as long as you accept that you may "possibly" be reading stale data rather than the latest representation, as the only real guarantee is of the write to the Primary node. The general replication considerations remain: the nodes should be somewhat equal in processing capability, or lag and general performance degradation can result from your query operations.
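For a query where slightly stale data is acceptable, opting in might look like this sketch (placeholder names, assuming an already-connected MongoClient):

    // Reads from this handle may be served by the secondary and can lag
    // behind the primary.
    const events = client.db('app').collection('events', {
      readPreference: 'secondaryPreferred',
    });
    const recent = await events.find({ stage: 1 }).limit(10).toArray();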
Remember that replication is implemented for redundancy and is not a system for improving performance. If you are looking for performance then perhaps look into scaling up your system hardware or implement sharding to distribute the load.

mongodb replication + sharding consistency

I have a doubt (well, a couple). I think I grasp the answer, but I'm looking for confirmation.
Let's say I implement a sharded MongoDB cluster. Is it necessary to have a replica set backing each shard?
I know that if I use only a replica set and decide to distribute read operations to the secondary nodes, I get eventual consistency, right?
And on the other hand, if I don't enable reads on the secondary nodes, the "only" advantage I get is protecting the database in case a node goes down.
But what about consistency in a sharded replica set? Will it still be eventually consistent, or will it be fully consistent?
is it necessary to have a replica set backing each shard?
You don't have to, but if you care about availability you will.
and decide to distribute read operations to the secondary nodes, I get eventual consistency, right?
Yes, and since secondaries receive as many ops as the primaries, and most drivers will only read from one active secondary, reading from a secondary is quite useless.
the "only" advantage I get is protecting the database in case a node goes down
The "only"? That is the whole point of replica sets: to provide automatic failover. That is their fundamental reason for existing, and it is a big one.
Will it still be eventually consistent, or will it be fully consistent?
It depends on where you do your reads. If you read from secondaries in a sharded setup, you not only get eventual consistency; due to chunk movement you might also get duplicate documents.
If you are reading from primaries then you will get strong consistency in a sharded replica set setup.

MongoDB sharding and read replicas

I am in the process of setting up a sharded cluster.
I also want to configure read replicas in the cluster.
Suppose a shard is a replica set of 3: 1 Primary and 2 Secondaries. Writes will go to the Primary member of the shard, but can I send all reads to the Secondaries?
It is not recommended to read from secondaries in a sharded cluster. This is because unlike the primaries, the secondaries have no idea what data the shard is supposed to contain. So when chunks are migrated from one shard to another, reading from the secondaries might result in duplicate or missing results.
In general, reading from secondaries also means the loss of consistency guarantees, since as long as writes have not propagated from the primaries, the secondaries will see, and return, stale data.
Finally, you should keep in mind that secondaries receive essentially the same write load as the primaries, so reading from the secondaries is not likely to provide much of a performance advantage.
The recommended way to get more read throughput is usually to add more shards, not reading from secondaries.
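For completeness, shards are added through a mongos router; a sketch with the Node.js driver (host and replica set names are placeholders):

    import { MongoClient } from 'mongodb';

    // Connect to a mongos, not to a shard member.
    const mongos = new MongoClient('mongodb://mongos.example.net:27017');
    await mongos.connect();
    await mongos.db('admin').command({
      addShard: 'shardRS3/shard3a.example.net:27017,shard3b.example.net:27017',
    });
    await mongos.close();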