Let's say I have one primary (A) and two secondaries (B, C), and I write using write concern "majority". Can someone please clarify the doubts below:
Let's say a write was done using majority and it updated the data on A and B, but the write has not yet propagated to C. If a read for the same data comes in at this time using secondary or secondaryPreferred, will the query be served from B, which has the latest data, or can Mongo not guarantee this, so the read may return stale data from C?
Let's say a write was done using majority again; the write has completed on A and is still in progress on secondary B. If a read comes in at that time, will the read be blocked, or will it serve stale data from C?
Let's say I have taken secondary C out and the same case as above is in progress. Will the read from secondary B be blocked until the write completes on B, or will the read not be blocked and stale data be served from B?
Environment
Mongo Version - 3.0.9
Storage Engine - MMAPv1
MongoDB replication to secondaries is asynchronous. If the read concern is set to "majority", you may read stale data; in effect, you have opted into eventual consistency.
If the read concern is set to "local", you will get the latest data from the primary.
Please note that a readConcern level of "majority" can be used with the WiredTiger storage engine only. WiredTiger is an append-only storage engine that does not use in-place updates; it does not take collection-wide locks and offers document-level concurrency.
Read concern = "majority"
The query returns the instance’s most recent copy of data confirmed as
written to a majority of members in the replica set.
To use a read concern level of "majority", you must use the WiredTiger
storage engine and start the mongod instances with the
--enableMajorityReadConcern command line option (or the replication.enableMajorityReadConcern setting if using a configuration
file).
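For illustration, here is a minimal PyMongo sketch of issuing a majority read; the connection string and collection name are hypothetical, and note that readConcern requires MongoDB 3.2+, so it is not available on the 3.0.9 / MMAPv1 deployment described above:

```python
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

# Hypothetical replica set connection string.
client = MongoClient("mongodb://hostA,hostB,hostC/?replicaSet=rs0")

# Only return data acknowledged by a majority of replica set members.
coll = client.test.orders.with_options(read_concern=ReadConcern("majority"))
doc = coll.find_one({"_id": 42})
```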
Question 1: Does MongoDB guarantee that the read will be served from the secondary on which the data has already been written?
Answer 1: MongoDB doesn't guarantee this. The selection of the secondary depends on the following:
When you select a non-primary read preference, the driver will determine which member to target based on various factors. Refer to this link:
Read preference mechanics member selection
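As a sketch of how a non-primary read preference is expressed in PyMongo (connection string and names are again hypothetical); which member actually serves the read is decided by the driver's selection logic, not by which member already holds the write:

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient("mongodb://hostA,hostB,hostC/?replicaSet=rs0")

# Prefer a secondary; the driver picks the concrete member based on
# latency windows and tag sets, so B is not guaranteed to be chosen.
coll = client.test.orders.with_options(
    read_preference=ReadPreference.SECONDARY_PREFERRED
)
doc = coll.find_one({"_id": 42})
```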
Question 2: Will the read never be blocked, even if a write is in progress on the same data?
Answer 2: Reading will not be blocked. However, you may read some stale data.
Reads may miss matching documents that are updated during the course
of the read operation.
Concurrency locking what isolation guarantees does MongoDB provide
Related
In MongoDB 4.4.1 there is a mirroredReads configuration which allows the primary to forward read/update requests to the secondary replica set members.
How is it different from the secondaryPreferred readPreference when its sampling rate is set to 1.0?
What is the use-case of mirroredRead?
reference - https://docs.mongodb.com/manual/replication/#mirrored-reads-supported-operations
What is the use-case of mirroredRead?
This is described in the documentation you linked:
MongoDB provides mirrored reads to pre-warm the cache of electable secondary members
If you are not familiar with cache warming, there are many resources describing it, e.g. https://www.section.io/blog/what-is-cache-warming/.
A secondary read:
Is sent to the secondary, thus reducing the load on the primary
Can return stale data
A mirrored read:
Is sent to the primary
Always returns most recent data
mirroredReads configuration which allows the primary to forward read/update requests to the secondary replica set members
This is incorrect:
A mirrored read is not applicable to updates.
The read is not "forwarded". The primary responds to the read using its local data. Additionally, the primary sends a read request to one or more secondaries, but does not receive a result of this read at all (and does not "forward" the secondary read result back to the application).
Let's suppose you always use primary read preference and you have 2 members that are electable to be primary.
Since all of your reads take place on the primary instance, its cache is heavily populated, and since your other electable member doesn't receive any reads, its cache can be considered empty.
Using mirrored reads, the primary will send a portion (in your question, 100%) of the read requests to that secondary as well, to make it familiar with the pattern of read queries and to populate its cache.
Then suppose a disaster occurs and the current primary goes down. Your new primary now has a pre-warmed cache and can respond to queries as fast as the previous primary, without shocking the system while it populates its cache.
Regarding the impact of the sampling rate, MongoDB folks in their blog post introducing this feature stated that increasing the sampling rate would increase the load on the replica set. My understanding is that you may already have queries with a read preference other than primary that keep your secondary instances busy. In that case, mirrored reads can impact the performance of those secondary instances. Hence, you may not want to repeat all primary reads on these secondaries (the repetition of the terms secondary and primary is mind blowing!).
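For reference, the sampling rate is the mirrorReads server parameter (MongoDB 4.4+); a minimal sketch of setting it from PyMongo, with hypothetical hostnames:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://hostA,hostB,hostC/?replicaSet=rs0")

# Mirror 100% of supported reads from the primary to electable secondaries.
client.admin.command({"setParameter": 1, "mirrorReads": {"samplingRate": 1.0}})
```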
The story with secondaryPreferred reads is different: there you are querying secondaries for data, unless no secondary is available.
Mongo
From this resource I understand why Mongo is not A (Highly Available), based on the statement below:
MongoDB supports a “single master” model. This means you have a master
node and a number of slave nodes. In case the master goes down, one of
the slaves is elected as master. This process happens automatically
but it takes time, usually 10-40 seconds. During this time of new
leader election, your replica set is down and cannot take writes
Is it for the same reason that Mongo is said to be Consistent (since the write did not happen, the system returns the latest data it has) but not Available (not available for writes)?
Until re-election happens and the write operation is pending, can a slave serve read operations? Also, does the user have to re-initiate the write operation once a new master is selected?
But I do not understand, from another angle, why Mongo is highly consistent.
As said in "Where does mongodb stand in the CAP theorem?":
Mongo is consistent when all reads go to the primary by default.
But that is not true. If, under the master/slave model, all reads go to the primary, what is the use of the slaves? It further says that if you optionally enable reading from the secondaries, then MongoDB becomes eventually consistent, where it is possible to read out-of-date results. That means Mongo may not be consistent with master/slaves (provided I do not configure writes to all nodes before returning). It does not make sense to me to say Mongo is consistent if all reads and writes go to the primary. In that case every other DB (like Cassandra) would also be consistent, wouldn't it?
Cassandra
From this resource I understand why Cassandra is A (Highly Available), based on the statement below:
Cassandra supports a “multiple master” model. The loss of a single
node does not affect the ability of the cluster to take writes – so
you can achieve 100% uptime for writes
But I do not understand why Cassandra is not Consistent. Is it because a node that is unavailable for writes (because the coordinator node cannot reach it) may still serve reads, which can return stale data?
Go through MongoDB, Cassandra, and RDBMS in CAP for a better understanding of the topic.
A brief definition of consistency and availability:
Consistency simply means that when you write a piece of data to a distributed system, you should get that same data back when you read it from any node of the system.
Availability means the system should always be available for read/write operations.
Note: most systems are not purely available or purely consistent; they always offer a bit of both.
With the above definitions, let's see where MongoDB and Cassandra fall in CAP.
MongoDB
As you said, MongoDB is highly consistent when reads and writes go to the same node (the default case). Further, you can choose in MongoDB to read from other, secondary nodes instead of reading only from the leader/primary.
Now, when you try to read data from a secondary, your consistency will completely depend on how you want to read the data:
You could ask for data that is at most, say, 5 seconds stale, or
You could just say: return data from a majority of nodes for your select statement.
In the same way, when you write from your client to the Mongo leader, you can say a write is successful only if the data is replicated to, or stored on, a majority of the servers.
Clearly, from the above, we can say MongoDB can be highly consistent or eventually consistent based on how you read/write your data.
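As a sketch of these tunables in PyMongo (connection string and names are hypothetical; note that the driver's bounded-staleness option, maxStalenessSeconds, must be at least 90 seconds, so the "5 seconds" above is illustrative only):

```python
from pymongo import MongoClient, WriteConcern
from pymongo.read_preferences import Secondary

client = MongoClient("mongodb://hostA,hostB,hostC/?replicaSet=rs0")

# The write succeeds only once a majority of members have the data.
writes = client.shop.orders.with_options(
    write_concern=WriteConcern(w="majority", wtimeout=5000)
)
writes.insert_one({"_id": 1, "status": "paid"})

# Read from a secondary, but reject members more than 90 seconds stale.
reads = client.shop.orders.with_options(
    read_preference=Secondary(max_staleness=90)
)
doc = reads.find_one({"_id": 1})
```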
Now, what about availability? MongoDB is mostly always available; the only time it cannot accept writes is when the leader is down, until a new leader is elected. Hence, it is not highly available.
So, MongoDB is categorized under CP.
What about Cassandra?
In Cassandra there is no leader and any node can accept writes, so the Cassandra cluster is always available for writes and reads, even if some nodes go down.
What about consistency in Cassandra?
As with MongoDB, Cassandra can be eventually consistent or highly consistent based on how you read/write data.
You can set consistency levels on your read/write operations, for example:
read/write data from one node
read/write data from a majority/quorum of nodes, and more
Let's say you use a consistency level of ONE in your read/write operations. Your write is then successful as soon as the data is written to one replica. Now, if your read request happens to go to another replica where the data is not updated yet (due to high network latency or any other reason), you will end up reading the old data.
So, Cassandra is highly available, but with configurable consistency levels, and hence not always consistent.
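A minimal sketch of per-query consistency levels with the Python cassandra-driver (keyspace, table, and host are hypothetical):

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("shop")

# Write acknowledged by a quorum of replicas.
insert = SimpleStatement(
    "INSERT INTO orders (id, status) VALUES (1, 'paid')",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert)

# Read at ONE: fast, but may return stale data from a lagging replica.
select = SimpleStatement(
    "SELECT status FROM orders WHERE id = 1",
    consistency_level=ConsistencyLevel.ONE,
)
row = session.execute(select).one()
```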
In conclusion, in their default behavior, MongoDB falls under CP and Cassandra under AP.
Consistency in the CAP paradigm also includes "eventual consistency", which MongoDB supports. In contrast to ACID systems, a read in CAP systems does not guarantee returning the latest value.
In simple words, this means that your master could have an updated value, but if you read from a slave, it does not necessarily return the updated value; and it is okay, by design, not to have that updated value.
The concept of eventual consistency is explained in an excellent answer here.
By architecture, Cassandra is supposed to be consistent; it offers a special implementation of eventual consistency called "tunable consistency", which means that the client application may choose how to handle this. It even offers multi-data-centre consistency support at low levels!
Most row-wise inconsistency issues in Cassandra come from the fact that Cassandra uses client timestamps, not server-side ones, to determine which value is the most recent, which may be a tad confusing to understand at first.
I hope this helps!
You only have to understand the "point in time": as you only write to the MongoDB master, even if a slave is not updated yet, the system is consistent in that the slave has all the data generated up until the sync moment.
That is not true for Cassandra. As Cassandra uses a master-less model, there is no guarantee that the other nodes have all the data. At a certain time, a node can have certain recent data while missing older data from nodes it has not yet synced with. Cassandra will only be fully consistent if you stop writing to all nodes and let them sync. As soon as the sync finishes, you have consistent data.
I have read a lot of the MongoDB documentation, but I couldn't understand the difference between the readConcern and readPreference options.
For example: what is the result if I set 'majority' as my read concern option and 'primary' as my read preference option? These two options seem contradictory.
I know that at the query level I can only set readConcern, but at the client level I can also set readPreference.
In a replica set the primary MongoDB instance is the master instance that receives all write operations.
The primary read preference is the default mode and concerns MongoDB clients; it's a driver/client option. It means you read data from the master instance, where it is written first (before being replicated to the other replica set members).
If you use modes other than the primary read preference, you risk reading stale data.
Read concern is a query option for replica sets. By default the read concern is "local". This returns the most recent data available at the time of the query execution; that data may not have been persisted to the majority of the replica set members and may be rolled back. The option can be set to "majority", which makes the query read the most recent data that has been persisted to a majority of the replica set members and will not be rolled back. However, you have to set that up properly (it works only with the WiredTiger engine and has some other requirements...), and you might miss more recent data that has been written but not yet persisted to a majority of the replica set members.
Let's assume you use the default options for read preference and read concern. Your MongoDB driver will then route the read request to the primary replica set member (master instance), and that instance will return the most recent data available at that moment. That data might not have been persisted to the majority of the replica set members and might be rolled back.
Similarly you can think of use cases where you use a different combination of the read concern and read preference options.
local / primaryPreferred
local / secondary
local / secondaryPreferred
local / nearest
majority / primaryPreferred
majority / secondary
majority / secondaryPreferred
majority / nearest
The options are described in the MongoDB docs. Some combinations might make sense in some situations, and other combinations in others; I simply listed them here for completeness. I'd interpret them as follows:
first, the request is routed according to the read preference option (driver option);
second, the request is executed according to the read concern option (query option).
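A short PyMongo sketch of combining the two (names hypothetical): the read preference picks the member, the read concern constrains what that member may return:

```python
from pymongo import MongoClient, ReadPreference
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://hostA,hostB,hostC/?replicaSet=rs0")

# Route to a secondary if one is available (driver-level routing),
# and only return majority-committed data (query-level guarantee).
coll = client.test.orders.with_options(
    read_preference=ReadPreference.SECONDARY_PREFERRED,
    read_concern=ReadConcern("majority"),
)
doc = coll.find_one({"_id": 42})
```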
readConcern is about how we want to read data from Mongo: with a replica set, readConcern "majority" returns only data saved (persisted) to a majority of the replica set members, so we can be assured that the document will not be rolled back if there is an issue with replication.
readPreference can help with balancing load: for example, a report-generator process can always read data from a secondary and leave the primary to serve data for online users.
The current configuration of MongoDB is:
one primary(A), two secondaries(B and C), all part of one replica set
inserts to the primary are done with write-concern: majority
read preference is set to "nearest" when reading from the replica set
Scenario:
an insert is triggered, which means it will be successful only after it propagates to a majority of the replica set members
the application cannot read from the primary until the write operation returns (reference)
since the write concern is set to "majority", the write operation will return only after it propagates to at least one secondary (B), since a majority of our 3-member set is 2
this means that secondary (B) is also locked for reading while it is replicating (according to this)
The question is: since the application is set to read from the nearest instance, and let's say the nearest instance is the other secondary (C), if a read request comes through while the write operation is still in progress on the other 2 instances, will the read be allowed or blocked? If it is allowed, how can I prevent it?
Write concern doesn't really work that way. B and C will both process the write, and take the same db-level write lock while they do it, regardless of whether you send a getLastError with any write concern. While the lock is held on C, reads on C will block.
Write concern is really just for the client, it makes the client wait until a condition (in your case, a majority of the replicas have applied the write) is satisfied. It doesn't change how the secondaries prioritize the replication.
if a read request comes through while the write operation is still in progress on the other 2 instances, would the read be allowed or blocked
Well, you sort of figured it out yourself. You can read (stale) data from C if it is in the nearest group.
how can I prevent it?
Read preference can be applied globally by your driver, or at the database, collection, or operation level (the same applies, more or less, to write concern). If a certain operation cannot suffer stale data, you can override the read preference for that specific query to primary after you have issued the insert (note that in that scenario the insert operation can use a write concern of {w:1}), as sketched below.
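A minimal sketch of that per-operation override in PyMongo (database and collection names are hypothetical):

```python
from pymongo import MongoClient, ReadPreference, WriteConcern

client = MongoClient(
    "mongodb://hostA,hostB,hostC/?replicaSet=rs0&readPreference=nearest"
)
coll = client.app.events

# The insert can use a relaxed write concern here, per the note above.
coll.with_options(write_concern=WriteConcern(w=1)).insert_one({"_id": 7})

# This one read cannot tolerate staleness: force it to the primary.
fresh = coll.with_options(
    read_preference=ReadPreference.PRIMARY
).find_one({"_id": 7})
```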
I am working on a project which has some important data in it. This means we cannot afford to lose any of it if the power or the server goes down. We are using MongoDB for the database. I'd like to be sure that my data is in the database after the insert, and to roll back the whole batch if one element was not inserted. I know the philosophy behind Mongo is that transactions are not needed, but how can I make sure that my data is really safely stored after an insert rather than sent to some "black hole"?
Should I run a search after each insert?
Should I use some specific MongoDB commands?
Should I use sharding, even though one server is enough to satisfy the speed requirement (and, by the way, it doesn't guarantee anything if the power goes down)?
What is the best solution?
Your best bet is to use Write Concerns; these allow you to tell MongoDB how important a piece of data is. The quickest write concern is also the least safe: the data is not flushed to disk until the next scheduled flush. The safest will confirm that the data has been written to disk on a number of machines before returning.
The write concern you are looking for is FSYNC_SAFE (at least that is what it is called from the point of view of the Java driver) or REPLICAS_SAFE, which confirms that your data has been replicated.
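In a modern driver those constants map onto the j (journal) and w options of the write concern; a PyMongo sketch under that assumption, with hypothetical names:

```python
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://hostA,hostB,hostC/?replicaSet=rs0")

# j=True: don't return until the write is in the on-disk journal
# (the FSYNC_SAFE idea); w="majority": and it has replicated to a
# majority of members (the REPLICAS_SAFE idea).
safe = client.bank.ledger.with_options(
    write_concern=WriteConcern(w="majority", j=True)
)
safe.insert_one({"_id": 1, "amount": 100})
```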
Bear in mind that MongoDB does not have transactions in the traditional sense: your rollback will have to be performed by hand, as you can't tell the Mongo database to do it for you.
The other thing you need to do is either use the relatively new --journal option (which uses a write-ahead log), or use replica sets to share your data across many machines, in order to maximise data integrity in the event of a crash/power loss.
Sharding is not so much a protection against hardware failure as a method for sharing the load when dealing with particularly large datasets; sharding shouldn't be confused with replica sets, which are a way of writing data to more than one disk on more than one machine.
Therefore, if your data is valuable enough, you should definitely be using replica sets, perhaps even siting slaves in other data centres/availability zones/racks/etc in order to provide the resilience you require.
There is/will be (I can't remember offhand whether this has been implemented yet) a way to specify the priority of individual nodes in a replica set, such that if the master goes down, the new master elected is one in the same data centre if such a machine is available (i.e. to stop a slave on the other side of the country from becoming master unless it really is the only other option).
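Member priorities did end up shipping in replica sets; a sketch of raising one member's election priority via a reconfig (hostname and member index are hypothetical):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://hostA:27017/")  # connect to the primary

# Fetch the current replica set config, bump one member's priority,
# and apply the new config (the version field must be incremented).
cfg = client.admin.command("replSetGetConfig")["config"]
cfg["members"][1]["priority"] = 2  # prefer the member in our data centre
cfg["version"] += 1
client.admin.command("replSetReconfig", cfg)
```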
I received a really nice answer from a person called GVP on Google Groups. I will quote it (basically it adds to Rich's answer):
I'd like to be sure that my data is in the database after the insert, and to roll back the whole batch if one element was not inserted.
This is a complex topic and there are several trade-offs you have to
consider here.
Should I use sharding?
Sharding is for scaling writes. For data safety, you want to look at replica sets.
Should I use some specific MongoDB commands?
The first thing to consider is "safe" mode or "getLastError()", as indicated by Andreas. If you issue a "safe" write, you know that the database has received the insert and applied the write. However, MongoDB only flushes to disk every 60 seconds, so the server can fail without the data being on disk.
The second thing to consider is "journaling" (v1.8+). With journaling turned on, data is flushed to the journal every 100 ms, so you have a smaller window of time before failure. The drivers have an "fsync" option (check that name) that goes one step further than "safe": it waits for acknowledgement that the data has been flushed to disk (i.e. the journal file). However, this only covers one server. What happens if the hard drive on the server just dies? Well, you need a second copy.
The third thing to consider is replication. The drivers support a "W" parameter that says "replicate this data to N nodes" before returning. If the write does not reach "N" nodes before a certain timeout, the write fails (an exception is thrown). However, you have to configure "W" correctly based on the number of nodes in your replica set. Again, because a hard drive could fail even with journaling, you'll want to look at replication.
Then there's replication across data centers, which is too long to get into here. The last thing to consider is your requirement to "roll back". From my understanding, MongoDB does not have this "roll back" capability. If you're doing a batch insert, the best you'll get is an indication of which elements failed.
Here's a link to the PHP driver on this one: http://it.php.net/manual/en/mongocollection.batchinsert.php. You'll have to check the details on replication and the W parameter; I believe the same limitations apply there.
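To illustrate the "W" parameter with a timeout (the third point in the quote), here is a hedged PyMongo sketch; the hosts and the WTimeoutError handling follow the current driver's conventions and are assumptions, not part of the quoted answer:

```python
from pymongo import MongoClient, WriteConcern
from pymongo.errors import WTimeoutError

client = MongoClient("mongodb://hostA,hostB,hostC/?replicaSet=rs0")

# Require the write to reach 2 nodes within 5 seconds, else fail loudly.
coll = client.app.items.with_options(
    write_concern=WriteConcern(w=2, wtimeout=5000)
)
try:
    coll.insert_many([{"_id": i} for i in range(3)])
except WTimeoutError:
    # The write may still complete later; "w" only controls acknowledgement.
    print("replication to 2 nodes did not finish in time")
```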