I have a replica set consisting of a single primary mongod instance and two secondary mongod instances, and Java code that inserts 1 million simple objects with WriteConcern = JOURNAL_SAFE.
While the Java code is executing we kill the primary instance, and the Java code throws a "server not available" exception. Then I shut down both secondary nodes, started each node separately as a standalone instance, and checked the record count. We observe that the record count in both secondary mongod instances is the same, while the primary is missing one record, and the missing record is the one on which the job failed (when the mongod instance was killed).
Can anyone please explain this behavior? If the record is not present on the primary, how can it exist on the secondaries?
Regards,
Bhagwant Bhobe
This is not at all unexpected; in fact, I would expect this to be the case, because replication in MongoDB is asynchronous and "as fast as possible": as soon as the primary records the write in memory, it is visible via the oplog to the secondaries, which apply it to themselves.
Because you killed the primary server before it had a chance to flush the write from memory to disk, that node does not have the inserted record when you examine it, but the secondaries have it because it was replicated, and replication normally takes less time than flushing data to disk (though this depends on the speed of your disk, the speed of your network, and the relative timing of events).
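To make the scenario concrete, here is a minimal sketch of such an insert loop using the current Java sync driver, where WriteConcern.JOURNALED is the modern equivalent of the legacy JOURNAL_SAFE constant; the connection string, database, and collection names below are placeholders, not taken from the original question:

    import com.mongodb.WriteConcern;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class JournaledInsert {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017/?replicaSet=rs0")) {
                MongoCollection<Document> coll = client.getDatabase("test")
                        .getCollection("objects")
                        // JOURNALED (the old JOURNAL_SAFE) only waits for the primary's journal;
                        // it says nothing about whether the secondaries have replicated the write yet.
                        .withWriteConcern(WriteConcern.JOURNALED);

                for (int i = 0; i < 1_000_000; i++) {
                    coll.insertOne(new Document("seq", i));
                }
            }
        }
    }

Because the acknowledgement here involves only the primary, the race described above (secondaries replicating a write the killed primary never persisted) is entirely possible.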
Related
I'm new to MongoDB, so correct me if I'm wrong.
In MongoDB, read and write operations are performed on the primary node. Doesn't it make more sense to allow reads on both the primary and the secondaries, while writes go only to the primary node? After all, the primary node will eventually update the secondary nodes.
If both read and write operations have to go through the primary node, then why maintain more than one secondary node? It will not reduce the traffic hitting a single database (ignoring the data-safety aspect for the time being).
By default, the Primary handles both reads and writes, but you can direct your reads to Secondary nodes, and MongoDB supports that. The question is: are you OK with reading stale data? Because the Secondary nodes replicate by tailing the Primary's oplog, they usually lag behind the Primary, and you may end up reading old data at times. If your requirement isn't real-time read/write, it is totally fine to read the data from Secondary nodes.
The main purpose of maintaining more than one secondary node is also high availability (no downtime). For instance, if you have a 3-node replica set and one node goes down due to a network issue, you still have two nodes (a majority of the members) online that can serve read and write requests without any impact on your application.
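If it helps, here is a minimal sketch with the Java sync driver of directing reads to secondaries via a read preference; the hosts, replica set name, database, and collection are hypothetical:

    import com.mongodb.ReadPreference;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class SecondaryReads {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create(
                    "mongodb://db1.example.net,db2.example.net,db3.example.net/?replicaSet=rs0")) {
                // Route reads to a secondary when one is available; fall back to the primary otherwise.
                // Be prepared for slightly stale data, since secondaries apply the oplog asynchronously.
                MongoCollection<Document> coll = client.getDatabase("test")
                        .getCollection("events")
                        .withReadPreference(ReadPreference.secondaryPreferred());

                Document doc = coll.find().first();
                System.out.println(doc);
            }
        }
    }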
My application is essentially a bunch of microservices deployed across Node.js instances. One service might write some data while a different service reads those updates. (As a specific example, I'm processing data that is inbound to my solution using a processing pipeline: stage 1 does something, stage 2 does something else to the same data, and so on. It's a fairly common pattern.)
So, I have a large data set (~250 GB now, and I've read that once a DB gets much larger than this, it is impossible to introduce sharding, at least not without some major hoop-jumping). I want to have a highly available DB, so I'm planning on a replica set with at least one secondary and an arbiter.
I am still researching my 'sharding' options, but I think that I can shard my data by the 'client' that it belongs to and so I think it makes sense for me to have 3 shards.
First question, if I am correct, if I have 3 shards and my replica set is Primary/Secondary/Arbiter (with Arbiter running on the Primary), I will have 6 instances of MongoDB running. There will be three primaries and three secondaries (with the Arbiter running on each Primary). Is this correct?
Second question: I've read conflicting info about what 'majority' means. If I have a Primary and a Secondary and I'm writing with the 'majority' write acknowledgement, what happens when either the Primary or the Secondary goes down? If the Arbiter is still there, the election can happen and I'll still have a Primary. But does 'majority' refer to members of the replica set, or to Secondaries? So, if I only have a Primary and I try to write with the 'majority' option, will I ever get an acknowledgement? If there is only a Primary, then 'majority' would mean a write to the Primary alone triggers the acknowledgement. Or would this just block until my timeout was reached, and then I would get an error?
Third question... I'm assuming that as long as I do writes with 'majority' acknowledgement and do reads from all the Primaries, I don't need to worry about causally consistent data? I've read that doing reads from 'Secondary' nodes is not worth the effort. If reading from a Secondary, you have to worry about 'eventual consistency' and since writes are getting synchronized, the Secondaries are essentially seeing the same amount of traffic that the Primaries are. So there isn't any benefit to reading from the Secondaries. If that is the case, I can do all reads from the Primaries (using 'majority' read concern) and be sure that I'm always getting consistent data and the sharding I'm doing is giving me some benefits from distributing the load across the shards. Is this correct?
Fourth (and last) question... When are causally consistent sessions worthwhile? If I understand correctly, and I'm not sure that I do, then I think it is when I have a case like a typical web app (not some distributed application, like my current one), where there is just one (or two) nodes doing the reading and writing. In that case, I would use causally consistent sessions and do my writes to the Primary and reads from the Secondary. But, in that case, what would the benefit of reading from the Secondaries be, anyway? What am I missing? What is the use case for causally consistent sessions?
if I have 3 shards and my replica set is Primary/Secondary/Arbiter (with Arbiter running on the Primary), I will have 6 instances of MongoDB running. There will be three primaries and three secondaries (with the Arbiter running on each Primary). Is this correct?
A replica set Arbiter is still an instance of mongod. It's just that an Arbiter does not have a copy of the data and cannot become a Primary. You should have 3 instances per shard, which means 9 instances in total.
Since you mentioned that you would like to have a highly available database deployment, please note that the minimum recommended replica set members for production deployment would be a Primary with two Secondaries.
If I have a Primary and Secondary and I'm writing using the 'majority' write acknowledgement, what happens when either the Primary or Secondary goes down?
When either the Primary or the Secondary becomes unavailable, a w:majority write will either:
Wait indefinitely,
Wait until the unavailable node is restored, or
Fail with a timeout (if a wtimeout is set).
This is because an Arbiter carries no data and cannot acknowledge writes, but it is still counted as a voting member. See also Write Concern for Replica Sets.
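As an illustration, here is a hedged sketch (Java sync driver, hypothetical hosts and names) of a majority write with a wtimeout, which turns the indefinite wait into an error the application can handle:

    import com.mongodb.MongoWriteConcernException;
    import com.mongodb.WriteConcern;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    import java.util.concurrent.TimeUnit;

    public class MajorityWrite {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create(
                    "mongodb://db1.example.net,db2.example.net/?replicaSet=rs0")) {
                MongoCollection<Document> coll = client.getDatabase("test")
                        .getCollection("orders")
                        // In a Primary/Secondary/Arbiter set with the secondary down, "majority"
                        // cannot be satisfied (the arbiter holds no data), so the wtimeout below
                        // turns an indefinite wait into an error the application can handle.
                        .withWriteConcern(WriteConcern.MAJORITY.withWTimeout(5, TimeUnit.SECONDS));

                try {
                    coll.insertOne(new Document("status", "new"));
                } catch (MongoWriteConcernException e) {
                    // The write may still be applied on the primary; only the acknowledgement timed out.
                    System.err.println("Write not acknowledged by a majority: " + e.getWriteConcernError());
                }
            }
        }
    }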
I can do all reads from the Primaries (using 'majority' read concern) and be sure that I'm always getting consistent data and the sharding I'm doing is giving me some benefits from distributing the load across the shards
Correct. MongoDB sharding scales horizontally by distributing load across shards, while MongoDB replication provides high availability.
If you read only from the Primary and also specify readConcern: majority, the application will read data that has been acknowledged by the majority of the replica set members. This data is durable in the event of a partition (i.e., it will not be rolled back). See also Read Concern 'majority'.
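For example, a sketch of such a majority read with the Java sync driver (hosts, database, and collection are placeholders) might look like this:

    import com.mongodb.ReadConcern;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class MajorityRead {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create(
                    "mongodb://db1.example.net,db2.example.net/?replicaSet=rs0")) {
                // readConcern: majority returns only data acknowledged by a majority of members,
                // i.e. data that will not be rolled back after a failover.
                MongoCollection<Document> coll = client.getDatabase("test")
                        .getCollection("orders")
                        .withReadConcern(ReadConcern.MAJORITY);

                System.out.println(coll.find(new Document("status", "new")).first());
            }
        }
    }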
What is the use case for causally consistent sessions?
Causal consistency is used if the application requires an operation to be logically dependent on a preceding operation (causal). For example, a write operation that deletes all documents matching a specified condition and a subsequent read operation that verifies the delete have a causal relationship. This is especially important in a sharded cluster environment, where write operations may go to different replica sets.
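As a rough illustration, a causally consistent session in the Java sync driver could look like the following sketch; the collection and filter are hypothetical:

    import com.mongodb.ClientSessionOptions;
    import com.mongodb.client.ClientSession;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class CausalSession {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create(
                    "mongodb://db1.example.net,db2.example.net/?replicaSet=rs0")) {
                MongoCollection<Document> coll = client.getDatabase("test").getCollection("items");

                ClientSessionOptions options = ClientSessionOptions.builder()
                        .causallyConsistent(true)
                        .build();

                try (ClientSession session = client.startSession(options)) {
                    // The delete and the subsequent read are causally ordered: the read observes
                    // the effects of the delete even if it is served by a different node or shard.
                    // (For the full guarantee, pair this with majority read and write concerns.)
                    coll.deleteMany(session, new Document("expired", true));
                    long remaining = coll.countDocuments(session, new Document("expired", true));
                    System.out.println("Expired documents remaining: " + remaining);
                }
            }
        }
    }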
I am dealing with MongoDB rollback procedures. The problem is that the data to roll back may be large, 300 MB or more.
Is there any solution to this problem? The error log shows:
replSet syncThread: replSet too much data to roll back
I could not find a solution in the official MongoDB documentation.
Thanks for the answers.
The cause
The page Rollbacks During Replica Set Failover states:
A rollback is necessary only if the primary had accepted write operations that the secondaries had not successfully replicated before the primary stepped down. When the primary rejoins the set as a secondary, it reverts, or “rolls back,” its write operations to maintain database consistency with the other members.
and:
When a rollback does occur, it is often the result of a network partition.
In other words, a rollback scenario typically occurs like this:
1. You have a 3-node replica set with a primary-secondary-secondary setup.
2. There is a network partition, separating the current primary from the secondaries.
3. The two secondaries cannot see the former primary, and elect one of them to be the new primary. Applications that are replica-set aware are now writing to the new primary.
4. However, some writes keep coming into the old primary before it realizes that it cannot see the rest of the set and steps down.
The data written to the old primary in step 4 above is the data that gets rolled back, since for a period of time that node was acting as a "false" primary (i.e., the "real" primary is supposed to be the secondary elected in step 3).
The fix
MongoDB will not perform a rollback if there are more than 300 MB of data to be rolled back. The message you are seeing (replSet too much data to roll back) means that you are hitting this situation, and would either have to save the rollback directory under the node's dbpath, or perform an initial sync.
Preventing rollbacks
Configuring your application using w: majority (see Write Concern and Write Concern for Replica Sets) would prevent rollbacks. See Avoid Replica Set Rollbacks for more details.
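For instance, one way to apply this (a sketch with the Java sync driver, hypothetical hosts and names) is to set w: majority as the client-wide default so every write goes through it:

    import com.mongodb.ConnectionString;
    import com.mongodb.MongoClientSettings;
    import com.mongodb.WriteConcern;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import org.bson.Document;

    public class MajorityByDefault {
        public static void main(String[] args) {
            // Writes acknowledged under w:majority have been replicated to a majority of
            // data-bearing members, so they will not be rolled back after a failover.
            MongoClientSettings settings = MongoClientSettings.builder()
                    .applyConnectionString(new ConnectionString(
                            "mongodb://db1.example.net,db2.example.net,db3.example.net/?replicaSet=rs0"))
                    .writeConcern(WriteConcern.MAJORITY)
                    .build();

            try (MongoClient client = MongoClients.create(settings)) {
                client.getDatabase("test").getCollection("orders")
                        .insertOne(new Document("status", "new"));
            }
        }
    }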
I have a three member replica set using MongoDB v3.2.4. Each member is a VM with 8 cores and 8GB RAM, and in normal operations these nodes are running very low in CPU and memory consumption.
I have a 60 GB database (30 million docs) that is totally reloaded once a month by a Map/Reduce job written in Pig. During this job the cluster receives 30k inserts/s, and within a few minutes the secondaries become out of sync.
The current oplog size is 20GB (already modified from the default) but this does not resolve the replication sync issue.
I don't know if modifying the oplog size again will help. My concern is that replication only seems to catch up when there is no load on the primary. Since my insert job lasts 1 hour, does that mean I need an oplog the size of my db?
Is there a way to tell MongoDB to put more effort on replication and have a more balanced workload between accepting inserts and replication?
Is there a way to tell mongo to put more effort on replication to have a more balanced workload between accepting inserts and replicating these inserts?
To ensure data has replicated to secondaries (and throttle your inserts) you should increase your write concern to w:majority. The default write concern (w:1) only confirms that a write operation has been accepted by the primary, so if your secondaries cannot keep up for an extended period of inserts they will eventually fall out of sync (as you have experienced).
You can include w=majority as an option in your MongoDB connection string URI, e.g.:
STORE data INTO
'mongodb://user:pass@db1.example.net,db2.example.net/my_db.my_collection?replicaSet=replicaSetName&w=majority'
USING com.mongodb.hadoop.pig.MongoInsertStorage('', '');
Recently, I found that an insert op caused a slow query to show up in mongod.
It always happens when the secondary mongod instance is syncing data from another node.
The replica set has three members, and I set the client driver write concern to w:2.
Does the oplog sync block the insert op?
What happens when a document is inserted while a node is syncing?
The writeConcern setting w:2 means that the write will be acknowledged when exactly two members of the replica set have acknowledged that the write happened (see https://docs.mongodb.com/v3.2/reference/write-concern/#w-option). In other words, it will wait until the write has replicated (via the oplog) to one other node, since the Primary is counted as one node.
This means that the "speed" of the insert/update query will be subject to your network speed. If the network is slow or congested, then the insert will appear to be "slow". This is not due to replication blocking anything, it is simply the effect of specifying w:2 in a congested network.
There may be network congestion that triggers both the sync source change and the slow insert, but the replication process by itself does not block any insert operation.
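To illustrate, here is a minimal sketch (Java sync driver, hypothetical hosts and names) of a w:2 write with a wtimeout, whose observed latency includes the replication round trip described above:

    import com.mongodb.WriteConcern;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    import java.util.concurrent.TimeUnit;

    public class WriteConcernW2 {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create(
                    "mongodb://db1.example.net,db2.example.net,db3.example.net/?replicaSet=rs0")) {
                MongoCollection<Document> coll = client.getDatabase("test")
                        .getCollection("events")
                        // w:2 waits for the primary plus one secondary, so the observed latency of
                        // each insert includes a replication round trip over the network. A wtimeout
                        // keeps a congested network from stalling the application indefinitely.
                        .withWriteConcern(new WriteConcern(2).withWTimeout(2, TimeUnit.SECONDS));

                long start = System.nanoTime();
                coll.insertOne(new Document("payload", "example"));
                System.out.printf("insert acknowledged after %d ms%n",
                        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
            }
        }
    }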