mongodb replica sets read locking - mongodb

We have a Mongo Replica Set with three nodes in three datacenters. Two of them with data and the other one is an arbitrer
We are doing stressful writes in the primary with almost 100% of locking so we are doing the reads in the replica node (secondary). Our problem is that the reads are slow too in the secondary due to those writes.
Are we missing anything?

We are doing stressful writes in the primary with almost 100% of locking so we are doing the reads in the replica node (secondary). Our problem is that the reads are slow too in the secondary due to those writes.
When you perform a write to the primary, that write also has to be performed on the secondary. So the secondary is doing the same work as the primary.
So if you have 100% locking on the primary, you have 100% locking on the secondary.
Moving reads to the secondary probably won't help because your IO on the primary is probably completely locked so it can't keep up.
Run iostat and top and figure out where the bottleneck is. It's likely that you'll need power, but it may just be an indexing problem.

Related

Guarantee consistency of data across microservices access a sharded cluster in MongoDB

My application is essentially a bunch of microservices deployed across Node.js instances. One service might write some data while a different service will read those updates. (specific example, I'm processing data that is inbound to my solution using a processing pipeline. Stage 1 does something, stage 2 does something else to the same data, etc. It's a fairly common pattern)
So, I have a large data set (~250GB now, and I've read that once a DB gets much larger than this size, it is impossible to introduce sharding to a database, at least, not without some major hoop jumping). I want to have a highly available DB, so I'm planning on a replica set with at least one secondary and an arbiter.
I am still researching my 'sharding' options, but I think that I can shard my data by the 'client' that it belongs to and so I think it makes sense for me to have 3 shards.
First question, if I am correct, if I have 3 shards and my replica set is Primary/Secondary/Arbiter (with Arbiter running on the Primary), I will have 6 instances of MongoDB running. There will be three primaries and three secondaries (with the Arbiter running on each Primary). Is this correct?
Second question. I've read conflicting info about what 'majority' means... If I have a Primary and Secondary and I'm writing using the 'majority' write acknowledgement, what happens when either the Primary or Secondary goes down? If the Arbiter is still there, the election can happen and I'll still have a Primary. But, does Majority refer to members of the replication set? Or to Secondaries? So, if I only have a Primary and I try to write with 'majority' option, will I ever get an acknowledgement? If there is only a Primary, then 'majority' would mean a write to the Primary alone triggers the acknowledgement. Or, would this just block until my timeout was reached and then I would get an error?
Third question... I'm assuming that as long as I do writes with 'majority' acknowledgement and do reads from all the Primaries, I don't need to worry about causally consistent data? I've read that doing reads from 'Secondary' nodes is not worth the effort. If reading from a Secondary, you have to worry about 'eventual consistency' and since writes are getting synchronized, the Secondaries are essentially seeing the same amount of traffic that the Primaries are. So there isn't any benefit to reading from the Secondaries. If that is the case, I can do all reads from the Primaries (using 'majority' read concern) and be sure that I'm always getting consistent data and the sharding I'm doing is giving me some benefits from distributing the load across the shards. Is this correct?
Fourth (and last) question... When are causally consistent sessions worthwhile? If I understand correctly, and I'm not sure that I do, then I think it is when I have a case like a typical web app (not some distributed application, like my current one), where there is just one (or two) nodes doing the reading and writing. In that case, I would use causally consistent sessions and do my writes to the Primary and reads from the Secondary. But, in that case, what would the benefit of reading from the Secondaries be, anyway? What am I missing? What is the use case for causally consistent sessions?
if I have 3 shards and my replica set is Primary/Secondary/Arbiter (with Arbiter running on the Primary), I will have 6 instances of MongoDB running. There will be three primaries and three secondaries (with the Arbiter running on each Primary). Is this correct?
A replica set Arbiter is still an instance of mongod. It's just that an Arbiter does not have a copy of the data and cannot become a Primary. You should have 3 instances per shard, which means 9 instances in total.
Since you mentioned that you would like to have a highly available database deployment, please note that the minimum recommended replica set members for production deployment would be a Primary with two Secondaries.
If I have a Primary and Secondary and I'm writing using the 'majority' write acknowledgement, what happens when either the Primary or Secondary goes down?
When either the Primary or Secondary becomes unavailable, a w:majority writes will either:
Wait indefinitely,
Wait until either nodes is restored, or
Failed with timeout.
This is because an Arbiter carries no data and unable to acknowledge writes but still counted as a voting member. See also Write Concern for Replica sets.
I can do all reads from the Primaries (using 'majority' read concern) and be sure that I'm always getting consistent data and the sharding I'm doing is giving me some benefits from distributing the load across the shards
Correct, MongoDB Sharding is to scale horizontally to distribute load across shards. While MongoDB Replication is to provide high availability.
If you read only from the Primary and also specifies readConcern:majority, the application will read data that has been acknowledged by the majority of the replica set members. This data is durable in the event of partition (i.e. not rolled back). See also Read Concern 'majority'.
What is the use case for causally consistent sessions?
Causal Consistency is used if the application requires an operation to be logically dependent on a preceding operation (causal). For example, a write operation that deletes all documents based on a specified condition and a subsequent read operation that verifies the delete operation have a causal relationship. This is especially important in a sharded cluster environment, where write operations may go to different replica sets.

MongoDB Replica Set CPU load

I am running a fairly standard MongoDB (3.0.5) replica set with 1 primary and 2 secondaries. My PHP application's read preference is primary, so no reads take place on the secondaries - they are only for failover. I am running a load test on my application, which creates around 600 queries / updates per second. The operations are all being run against a collection that has ~500,000 documents. However, the queries are optimized and supported by indexes. Any query will not take longer than 40ms max.
My problem is that I am getting a quite high CPU load on all 3 nodes (200% - 300%) - sometimes the load on the secondaries is even higher than on the primary. Disk IO and RAM usage seem to be okay - at least they are not hitting any limits.
The primary's log file contains a huge amount of getmore oplog queries - I would guess that any operation on the primary creates an oplog query. It appears to me that this is too much replication overhead but I don't have any prior experience with MongoDB under load and I don't have any key figures.
As the setup will have to tolerate even more load in production, my question is whether the replication overhead is to be expected and whether it's normal that the CPU load goes up that high, even on the secondaries or is there something I'm missing?
Think about it this way. Whatever data-changing operation happens on the primary, it also needs to happen on every secondary. If there are many such operations and they create high CPU load on the primary, well, then the same situation will repeat itself on the secondaries.
Of course, in your case you'd expect the primary's CPU to be more stressed, because in addition to the writes it also handles all the reads. Probably, in your scenario, reads are relatively light and there aren't many of them when compared to the amount of writes. This would explain why the load on the primary is roughly the same as on the secondaries.
my question is whether the replication overhead is to be expected
What you call replication overhead I see as the nature of replication. A primary stressed by writes results in all secondaries being stressed by writes as well.
and whether it's normal that the CPU load goes up that high, even on the secondaries
You have 600 write queries per second and your RAM and disk are not stressed, to me this signifies that you've set up your indexes properly. High CPU load is expected with this amount of write operations per second, because the indexes are being used intensively.
Please keep in mind that once you have gathered more data, the indexes and the memory-mapped data may not fit into memory anymore, and then both the RAM and the disk will be stressed, while CPU is unlikely to be under high load anymore. In this situation, you will probably want to either add more RAM or look into sharding.

Mongodb write performance/load primary vs secondary

In a replicated Mongodb environment, is the write performance/load on the secondaries the same as the primaries? If so, or not, why?
Edit: By writes to the secondary I am referring to the automatic propagation of the writes from the primary to the secondary.
Edit2: To help guide the conversation http://docs.mongodb.org/manual/core/replica-set-sync/#multithreaded-replication might suggest that write performance from the primaries to the secondaries might be better since they are performed in batch.
If by load you mean only writes in an isolated or off-peak system then it must be an uninteresting and similar performance/write. and almost who cares. However, in a working system where you have concurrent reads and writes then no. because one can alter that performance/readwrite if you use a 'read preference' of "secondary" or "secondary preferred" (heck, anything but "primary"). Under this scenario a replica set with 11 secondaries and 1 primary one can clearly see that any single secondary has a fraction of cpu/memory/disk/etc competition not only disk contention than the single primary. Recall that the default mode is brutal on the primary. here the secondaries exist only for redundancy vs high-availability.
primary Default mode. All operations read from the current replica set
primary.
One can think of a RAID system whereby mirroring increases redundancy while striping increases performance. (True, not exactly the same mechanics but from the user's pov its a similar result in terms of reads. Sharding is closer to a RAID with striping) Using the default read preference of 'primary' one taps only into mirroring; using a read preference of 'secondary' you tap into the greater throughput.

Can someone give me detailed technical reasons why writing to a secondary in MongoDB replica set is not allowed

I know we can't write to a secondary in MongoDB. But I can't find any technical reason why. In my case, I don't really care if there is a slight delay but write to a secondary might be faster. Please provide some reference if you can. Thanks!!
The reason why you can not write to a secondary is the way replication works:
Secondaries connect to a special collection on the primary, called oplog. This oplog contains operations which were run through the query optimizer. Basically, the oplog is a capped collection, and the secondaries use a tailable cursor to access it's entries and processes it from the oldest to the newest.
When a election takes place because the primary goes down / steps down, the secondary with the most recent oplog entry is elected primary. The secondaries connect to the new primary, query for the oplog entries they haven't processed yet and the cluster is in sync.
This procedure is pretty straight forward. Now imagine one could write to a secondary. All nodes in the cluster would have to have a tailable cursor on all other nodes of the cluster, and maintaining a consistent state in case of one machine failing becomes a very complicated and in case of a failure even race condition dependent thing. Effectively, there could be no guarantee even for eventual consistency any more. It would be a more or less a gamble.
That being said: A replica set is not for load balancing. A replica sets purpose is to enhance the availability and durability of the data. Because reading from a secondary is a non-risky thing, MongoDB made it possible, according to their dogma of offering the maximum of possible features without compromising scalability (which would be severely hampered if one could write to secondaries).
But MongoDB does provide a load balancing feature: sharding. Choosing the right shard key, you can distribute read and write load over (almost) as many shards as you want. Not to mention that you can provide a lot more of the precious RAM for a reasonable price when sharding.
There is a one liner answer:
Multi-master replication is a hairball.
If you was allowed to write to secondaries MongoDB would have to use milti-master replication to ge this working: http://en.wikipedia.org/wiki/Multi-master_replication where essentially evey node copies to each other the OPs (operations) they have received and somehow do it without losing data.
This form of replication has many obsticles to overcome.
One would be throughput; remember that OPs need to transfer across the entire network so it is possible you might actually lose throughput while adding consistentcy problems. So getting better throughput would be a problem. It is much having a secondary, taking all of the primaries OPs and then its own for replication outbound and then asking it to do yet another job.
Adding consistentcy over a distributed set like this would also be hazardous, one main question that bugs MongoDB when asking if a member is down or is: "Is it really down or just unavailable?". It is almost impossible to ensure true consistentcy in a distributed set like this, at the very least tricky.
Those are just two problems immediately.
Essentially, to sum up, MongoDB does not yet possess mlti-master replication. It could in the future but I would not be jumping for joy if it does, I will most likely ignore such a feature, normal replication and sharding in both ACID and non-ACID databases causes enough blood pressure.

MongoDB write lock blocks reads on secondaries?

MongoDB docs about concurrency state that the DB is 'write greedy'. That is something I understand. However it does not tell about what locks do to secondaries in a replica set.
Taking my use-case which would get about 40 writes per 100 queries wherein I am not in need of having the most recent document at all times. A lag of 5-10 seconds is okay with me which is how much the secondaries in a replica set would be behind the master. Now if the write lock locks down master as well as the replicas, then I am locked out of reads on secondaries as well.
I wanted to know if writers will lock read operations on secondaries as well.
Into a replica set SECONDARY servers are not affected by the write lock on MASTERS.
You can see the status of your servers by using mongotop or montostat.
The locks are per mongod instance. That means that the read/write locks are locking operations only on the primary. The secondaries are reading oplog from primary and replicating actions from the primary.
You can read much more details on their manual about concurrency.