I am reading WriteConcern at mongoDB wiki but it's not clear for me. I have a question! what is it and when we must use of WithWriteConcern(WriteConcern.Acknowledged)?
what is difference between:
WithWriteConcern(WriteConcern.Acknowledged).InsertOne()
and InsertOne() and which one is better that we use?
please explain simple.
sayres, the write concern is an specification of MongoDB for write operations that determines the acknowledgement you want after a write operation has taken place. MongoDB has a default write concern of always acknowledging all writes, which means that after every write, MongoDB has to always return an acknowledgement (in a form of a document), meaning that it was successful. When asking for write acknowledgement, if none isn't returned (in case of failover, crashes), the write isn't successful. This behavior is very useful specially on replica set usage, since you will have more than one mongod instance, and depending on your needs, maybe you don't want all instances to acknowledge the write, just a few, to speed up writes. Also, when to specify a write concern, you can specify journal writing, so you can guarantee that operation result and any rollbacks required if a failover happens. More information, here.
In your case, it depends on how many mongod (if you have replica sets or just a single server) instances you have. Since "always acknowledge" is the default, you may want to change it if you have to manage replica sets operations and speed things up or just doesn't care about write acknowledgement in a single instance (which is not so good, since it's a single server only).
Related
In general, MongoDB will replicate from a Primary to Secondaries asynchronously, based on number of write operations, time and other factors by shipping oplog from primary to secondaries.
When describing WriteConcern options, MongoDB documentation states "...primary waits until the required number of secondaries acknowledge the write before returning write concern acknowledgment". This seems to suggest that a WriteConcern other than "w:1" would replicate to at least some of the members of the replica set in a blocking manner, potentially avoiding log shipping.
The basic question I'm trying to answer is this: if every write is using WriteCocnern of "majority", would MongoDB ever have to use log shipment? In other words, is using WriteCocnern of "majority" also controls replication timing?
I would like to better understand how MongoDB handles WriteConcern of "majority". A few obvious options:
Primary sends write requests to every Secondary, and blocks the thread until majority respond with acknowledgment
or
Primary pre-selects Secondaries first and sends requests to only those secondaries, blocking the thread until all chosen secondaries respond with acknowledgment
or
Something much smarter than either of these options
If Option 1 is used, in most cases (assuming equidistant placement of secondaries) all secondaries will have received the write operation by the time Write completes, and there's high probability (although not a guarantee) all secondaries will have applied it. If true, this behavior enables use cases where writes need to be reflected on Secondaries quicker than typical asynchronous replication process.
Obviously WriteConcern of "majority" will incur performance penalty, but this may be acceptable for specific use cases where read operations may target Secondaries (e.g. ReadPreference of "nearest") and desire more recent data.
if every write is using WriteConcern of "majority", would MongoDB ever have to use log shipment?
Replication in MongoDB uses what is termed as the oplog. This is a record of all operations on the primary (the only node that accept writes).
Instead of pushing the oplog into the secondaries, the secondaries long-pull on the oplog of the primary. If replication chaining is allowed (the default), then a secondary can also pull the oplog from another secondary. So scenario 1 & 2 you posted are not the reality with MongoDB replication as of MongoDB 4.0.
The details of the replication process is described in MongoDB Github wiki page: Replication Internals.
To quote the relevant parts regarding your question:
If a command includes a write concern, the command will just block in its own thread until the oplog entries it generates have been replicated to the requested number of nodes. The primary keeps track of how up-to-date the secondaries are to know when to return. A write concern can specify a number of nodes to wait for, or majority.
In other words, the secondaries continually report back to the primary how far along it has applied the oplog into its own dataset. Since the primary knows the timestamp that the write took place, once a secondary has applied that timestamp, it can tell that the write has propagated to that secondary. To satisfy the write concern, the primary simply waits until a determined number of secondaries have applied the write timestamp.
Note that only the thread specifying the write concern is waiting for this acknowledgment. All other threads are not blocked due to this waiting at all.
Regarding to you other question:
Obviously WriteConcern of "majority" will incur performance penalty, but this may be acceptable for specific use cases where read operations may target Secondaries (e.g. ReadPreference of "nearest") and desire more recent data.
To achieve what you described, you need a combination of read and write concerns. See
Causal Consistency and Read and Write Concerns for more details on this subject.
Write majority is typically used for:
Ensuring that the write will not be rolled back in the event of the primary failure.
Ensuring that the application is not writing so fast that the provisioned hardware of the replica set cannot cope with the traffic; i.e. it can act as a backpressure mechanism.
In combination with read concern, provide the client with differing levels of consistency guarantees.
These points assume that the write majority was acknowledged and the acknowledgment was received by the client. There are multiple different failure scenario that are possible (as expected with a distributed system that needs to cope with unreliable network), but those are beyond the scope of this discussion.
Mongo
From this resource I understand why mongo is not A(Highly Available) based on below statement
MongoDB supports a “single master” model. This means you have a master
node and a number of slave nodes. In case the master goes down, one of
the slaves is elected as master. This process happens automatically
but it takes time, usually 10-40 seconds. During this time of new
leader election, your replica set is down and cannot take writes
Is it for the same reason Mongo is said to be Consistent(as write did not happen so returning the latest data in system ) but not Available(not available for writes) ?
Till re-election happens and write operation is in pending, can slave return perform the read operation ? Also does user re-initiate the write operation again once master is selected ?
But i do not understand from another angle why Mongo is highly consistent
As said on Where does mongodb stand in the CAP theorem?,
Mongo is consistent when all reads go to the primary by default.
But that is not true. If under Master/slave model , all reads will go to primary what is the use of slaves then ? It further says If you optionally enable reading from the secondaries then MongoDB becomes eventually consistent where it's possible to read out-of-date results. It means mongo may not be be
consistent with master/slaves(provided i do not configure write to all nodes before return). It does not makes sense to me to say mongo is consistent if all
read and writes go to primary. In that case every other DB also(like cassandra) will be consistent . Is n't it ?
Cassandra
From this resource I understand why Cassandra is A(Highly Available ) based on below statement
Cassandra supports a “multiple master” model. The loss of a single
node does not affect the ability of the cluster to take writes – so
you can achieve 100% uptime for writes
But I do not understand why cassandra is not Consistent ? Is it because node not available for write(as coordinated node is not able to connect) is available for read which can return stale data ?
Go through: MongoDB, Cassandra, and RDBMS in CAP, for better understanding of the topic.
A brief definition of Consistency and availability.
Consistency simply means, when you write a piece of data in a system/distributed system, the same data you should get when you read it from any node of the system.
Availability means, the system should always be available for read/write operation.
Note: Most systems are not, only available or only consistent, they always offer a bit of both
With the above definition let's see where MongoDB and Cassandra fall in CAP.
MongoDB
As you said MongoDB is highly consistent when reads and write go to the same node(the default case). Further, you can choose in MongoDB to read from other secondary nodes instead of reading from only leader/primary.
Now, when you try to read data from secondary, your consistency will completely depend on, how you want to read data:
You could ask data which is up to maximum, say 5 seconds stale or,
You could just say, return data from majority of nodes for your select statement.
Same way when you write from your client into Mongo leader, you can say, a write is successful if the data is replicated to or stored on majority of servers.
Clearly, from above, we can say MongoDb can be highly consistent or eventually consistent based on how you read/write your data.
Now, what about availability? MongoDB is mostly always available, but, the only time when the leader is down, MongoDB can't accept writes, until it figures out the new leader. Hence, not highly available
So, MongoDB is categorized under CP.
What about Cassandra?
In Cassandra, there is no leader and any nodes can accept write, so the Cassandra cluster is always available for writes and reads even if some nodes go down.
What about consistency in Cassandra?
Same as MongoDB Cassandra can be eventually consistent or highly consistent based on how you read/write data.
You can give consistency levels in your read/write operations, For example:
read/write data from one node
read/write data from majority/quorum of nodes and more
Let's say you give a consistency level of one in your read/write operation. So, your write is successful as soon as data is written to one replica. Now, if your read request happens to go to the other replica where the data is not updated yet(could be due to high network latency or any other reason), you will end up reading the old data.
So, Cassandra is highly available but has configurable consistency levels and hence not always consistent.
In conclusion, in their default behavior, MongoDB falls under CP and Cassandra in AP.
Consistency in the CAP paradigm also includes "eventual consistency" which MongoDB supports. In a contrast to ACID systems, the read in CAP systems does not guarantee a safe return.
In simple words, this means that your Master could have an updated value, but if you do read from Slave, it does not necessarily return the updated value, and that it's okay to no have this updated value by design.
The concept of eventual consistency is explained in an excellent answer here.
By architecture, Cassandra is supposed to be consistent; it offers a special implementation of eventual consistency called the 'tunable consistency' which would meant that the client application may choose the method of handling this- it even offers multi data centre consistency support at low levels!
Most issues from row wise inconsistency in Cassandra comes from the fact that Cassandra uses client timestamps to determine which value is the most recent, and not the server side ones, which may be tad bit confusing to understand at first.
I hope this helps!
You have only to understand the "point-in-time": As you only write to mongodb master, even if slave is not updated, it is consistent, as it has all the data generated util the sync moment.
That is not true for cassandra. As cassandra uses a master-less model, there's no garantee that other nodes has all the data. At a certain time, a node can have certain recent data, and not having older data from nodes not yet synced. Cassandra will only be consistent if you stop write to all nodes and put them online. As soon the sync finished you have a consistent data.
I have a large collection with more that half a million of docs, which I need to updated continuously. To achieve this, my first approach was to use w=1 to ensure write result, which causes a lot of delay.
collection.update(
{'_id': _id},
{'$set': data},
w=1
)
So I decided to use w=0 in my update method, now the performance got significantly faster.
Since my past bitter experience with mongodb, I'm not sure if all the update are guaranteed when w=0. My question is, is it guaranteed to update using w=0?
Edit: Also, I would like to know how does it work? Does it create an internal queue and perform update asynchronously one by one? I saw using mongostat, that some update is being processed even after the python script quits. Or the update is instant?
Edit 2: According to the answer of Sammaye, link, any error can cause silent failure. But what happens if a heavy load of updates are given? Does some updates fail then?
No, w=0 can fail, it is only:
http://docs.mongodb.org/manual/core/write-concern/#unacknowledged
Unacknowledged is similar to errors ignored; however, drivers will attempt to receive and handle network errors when possible.
Which means that the write can fail silently within MongoDB itself.
It is not reliable if you wish to specifically guarantee. At the end of the day if you wish to touch the database and get an acknowledgment from it then you must wait, laws of physics.
Does w:0 guarantee an update?
As Sammaye has written: No, since there might be a time where the data is only applied to the in memory data and is not written to the journal yet. So if there is an outage during this time, which, depending on the configuration, is somewhere between 10 (with j:1 and the journal and the datafiles living on separate block devices) and 100ms by default, your update may be lost.
Please keep in mind that illegal updates (such as changing the _id of a document) will silently fail.
How does the update work with w:0?
Assuming there are no network errors, the driver will return as soon it has send the operation to the mongod/mongos instance with w:0. But let's look a bit further to give you an idea on what happens under the hood.
Next, the update will be processed by the query optimizer and applied to the in memory data set. After sucessful application of the operation a write with write concern w:1 would return now. The operations applied will be synced to the journal every commitIntervalMs, which is divided by 3 with write concern j:1. If you have a write concern of {j:1}, the driver will return after the operations are stored in the journal successfully. Note that there are still edge cases in which data which made it to the journal won't be applied to replica set members in case a very "well" timed outage occurs now.
By default, every syncPeriodSecs, the data from the journal is applied to the actual data files.
Regarding what you saw in mongostat: It's granularity isn't very high, you might well we operations which took place in the past. As discussed, the update to the in memory data isn't instant, as the update first has to pass the query optimizer.
Will heavy load make updates silently fail with w:0?
In general, it is safe to say "No." And here is why:
For each connection, there is a certain amount of RAM allocated. If the load is so high that mongo can't allocate any further RAM, there would be a connection error – which is dealt with, regardless of the write concern, except for unacknowledged writes.
Furthermore, the application of updates to the in memory data is extremely fast - most likely still faster than they come in in case we are talking of load peaks. If mongod is totally overloaded (e.g. 150k updates a second on a standalone mongod with spinning disks), problems might occur, of course, though even that usually is leveraged from a durability point of view by the underlying OS.
However, updates still may silently disappear in case of an outage when the write concern is w:0,j:0 and the outage happens in the time the update is not synced to the journal.
Notes:
The optimal balance between maximum performance and minimal guaranteed durability is a write concern of j:1. With a proper setup, you can reduce the latency to slightly over 10ms.
To further reduce the latency/update, it might be worth having a look at bulk write operations, if those apply to your use case. In my experience, they do more often than not. Please read and try before dismissing the idea.
Doing write operations with w:0,j:0 is highly discouraged in case you expect any guarantee on data durability. Use a t your own risk. This write concern is only meant for "cheap" data, which is easy to reobtain or where speed concern exceeds the need for durability. Collecting real time weather data in a large scale would be an example – the system still works, even if one or two data points are missing here and there. For most applications, durability is a concern. Conclusion: use w:1,j:1 at least for durable writes.
I know we can't write to a secondary in MongoDB. But I can't find any technical reason why. In my case, I don't really care if there is a slight delay but write to a secondary might be faster. Please provide some reference if you can. Thanks!!
The reason why you can not write to a secondary is the way replication works:
Secondaries connect to a special collection on the primary, called oplog. This oplog contains operations which were run through the query optimizer. Basically, the oplog is a capped collection, and the secondaries use a tailable cursor to access it's entries and processes it from the oldest to the newest.
When a election takes place because the primary goes down / steps down, the secondary with the most recent oplog entry is elected primary. The secondaries connect to the new primary, query for the oplog entries they haven't processed yet and the cluster is in sync.
This procedure is pretty straight forward. Now imagine one could write to a secondary. All nodes in the cluster would have to have a tailable cursor on all other nodes of the cluster, and maintaining a consistent state in case of one machine failing becomes a very complicated and in case of a failure even race condition dependent thing. Effectively, there could be no guarantee even for eventual consistency any more. It would be a more or less a gamble.
That being said: A replica set is not for load balancing. A replica sets purpose is to enhance the availability and durability of the data. Because reading from a secondary is a non-risky thing, MongoDB made it possible, according to their dogma of offering the maximum of possible features without compromising scalability (which would be severely hampered if one could write to secondaries).
But MongoDB does provide a load balancing feature: sharding. Choosing the right shard key, you can distribute read and write load over (almost) as many shards as you want. Not to mention that you can provide a lot more of the precious RAM for a reasonable price when sharding.
There is a one liner answer:
Multi-master replication is a hairball.
If you was allowed to write to secondaries MongoDB would have to use milti-master replication to ge this working: http://en.wikipedia.org/wiki/Multi-master_replication where essentially evey node copies to each other the OPs (operations) they have received and somehow do it without losing data.
This form of replication has many obsticles to overcome.
One would be throughput; remember that OPs need to transfer across the entire network so it is possible you might actually lose throughput while adding consistentcy problems. So getting better throughput would be a problem. It is much having a secondary, taking all of the primaries OPs and then its own for replication outbound and then asking it to do yet another job.
Adding consistentcy over a distributed set like this would also be hazardous, one main question that bugs MongoDB when asking if a member is down or is: "Is it really down or just unavailable?". It is almost impossible to ensure true consistentcy in a distributed set like this, at the very least tricky.
Those are just two problems immediately.
Essentially, to sum up, MongoDB does not yet possess mlti-master replication. It could in the future but I would not be jumping for joy if it does, I will most likely ignore such a feature, normal replication and sharding in both ACID and non-ACID databases causes enough blood pressure.
I am working on a project which has some important data in it. This means we cannot to lose any of it if the light or server goes down. We are using MongoDB for the database. I'd like to be sure that my data is in the database after the insert and rollback the whole batch if one element was not inserted. I know it is the philosophy behind Mongo that we do not need transactions but how can I make sure that my data is really safely stored after insert rather than sent to some "black hole".
Should I make a search?
Should I use some specific mongoDB commands?
Should I use sharding even if one server is enough for satisfying
the speed and by the way it doesn't guarantee anything if the light
goes down?
What is the best solution?
Your best bet is to use Write Concerns - these allow you to tell MongoDB how important a piece of data is. The quickest Write Concern is also the least safe - the data is not flushed to disk until the next scheduled flush. The safest will confirm that the data has been written to disk on a number of machines before returning.
The write concern you are looking for is FSYNC_SAFE (at least that is what it is called from the point of view of the Java driver) or REPLICAS_SAFE which confirms that your data has been replicated.
Bear in mind that MongoDB does not have transactions in the traditional sense - your rollback will have to be rolled by hand as you can't tell the Mongo database to do this for you.
The other thing you need to do is either use the relatively new --journal option (which uses a Write Ahead Log), or use replica sets to share your data across many machines in order to maximise data integrity in the event of a crash/power loss.
Sharding is not so much a protection against hardware failure as a method for sharing the load when dealing with particularly large datasets - sharding shouldn't be confused with replica sets which is a way of writing data to more than one disk on more than one machine.
Therefore, if your data is valuable enough, you should definitely be using replica sets, perhaps even siting slaves in other data centres/availability zones/racks/etc in order to provide the resilience you require.
There is/will be (can't remember offhand whether this has been implemented yet) a way to specify the priority of individual nodes in a replica set such that if the master goes down the new master that is elected is one in the same data centre if such a machine is available (ie to stop a slave on the other side of the country from becoming master unless it really is the only other option).
I received a really nice answer from a person called GVP on google groups. I will quote it(basically it adds up to Rich's answer):
I'd like to be sure that my data is in the database after the
insert and rollback the whole batch if one element was not inserted.
This is a complex topic and there are several trade-offs you have to
consider here.
Should I use sharding?
Sharding is for scaling writes. For data safety, you want to look a
replica sets.
Should I use some specific mongoDB commands?
First thing to consider is "safe" mode or "getLastError()" as
indicated by Andreas. If you issue a "safe" write, you know that the
database has received the insert and applied the write. However,
MongoDB only flushes to disk every 60 seconds, so the server can fail
without the data on disk.
Second thing to consider is "journaling"
(v1.8+). With journaling turned on, data is flushed to the journal
every 100ms. So you have a smaller window of time before failure. The
drivers have an "fsync" option (check that name) that goes one step
further than "safe", it waits for acknowledgement that the data has
be flushed to the disk (i.e. the journal file). However, this only
covers one server. What happens if the hard drive on the server just
dies? Well you need a second copy.
Third thing to consider is
replication. The drivers support a "W" parameter that says "replicate
this data to N nodes" before returning. If the write does not reach
"N" nodes before a certain timeout, then the write fails (exception
is thrown). However, you have to configure "W" correctly based on the
number of nodes in your replica set. Again, because a hard drive
could fail, even with journaling, you'll want to look at replication.
Then there's replication across data centers which is too long to get
into here. The last thing to consider is your requirement to "roll
back". From my understanding, MongoDB does not have this "roll back"
capacity. If you're doing a batch insert the best you'll get is an
indication of which elements failed.
Here's a link to the PHP driver on this one: http://it.php.net/manual/en/mongocollection.batchinsert.php You'll have to check the details on replication and the W parameter. I believe the same limitations apply here.