Atomicity of writes in MongoDB

Mongo documentation says single-document writes are atomic, but in another place it mentions that interleaved transactions may read uncommitted data, even before the writer thread has returned.
I understand that other transactions can read uncommitted data because the write may not yet be committed to the journal.
But how can threads read data while the writer thread has not yet returned? Is this only the case when the write concern is not the default?
Thanks
Ankur

OK, with the reference I now have the context and can tell you what this is about.
Mongo documentation says single-document writes are atomic
Yep
it mentions interleaved transactions may read uncommitted
In fact, any read may get uncommitted data. This is because MongoDB writes to the fsync queue BEFORE it writes to disk.
MongoDB can read from this fsync queue before it goes to disk, and quoting the page:
Other database systems refer to these isolation semantics as read uncommitted.
Mainly ACID databases do.
But how can threads read data while the writer thread has not returned.
Thanks to MongoDB's concurrency rules: http://docs.mongodb.org/manual/faq/concurrency/#does-a-read-or-write-operation-ever-yield-the-lock
In short: the write does not hold an exclusive lock for its whole duration; instead it can yield (under various rules) to a read, allowing data to be returned halfway through a write.
This is also why you must sometimes be careful with multi-document updates while other threads of your application are reading data: a reader may get one half of the data up to date and the other half not, as in the sketch below.
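Here is a minimal sketch of that effect, assuming a local mongod and the pymongo driver; the database, collection, and field names are made up for illustration:

    import threading
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # assumed local mongod
    coll = client.test.items                            # made-up db/collection names

    coll.delete_many({})
    coll.insert_many([{"_id": i, "version": 1} for i in range(100000)])

    def bump_all():
        # Multi-document update: atomic per document, NOT atomic as a whole.
        coll.update_many({}, {"$set": {"version": 2}})

    writer = threading.Thread(target=bump_all)
    writer.start()

    # A read running while the update is in flight may see a mix of both halves.
    seen = {doc["version"] for doc in coll.find({}, {"version": 1})}
    print("versions observed mid-update:", seen)   # may print {1, 2}

    writer.join()

If the read runs while update_many is still in flight, it may well report both versions: some documents already updated, some not.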

How does Mongo's eventual consistency work with a large number of data writes?

I have a flow like this:
I have a Worker that's processing a "large" batch (say, 1M records) and storing the results in Mongo.
Once the batch is complete, a notification message is sent to Publish, which then pulls all the records from Mongo for final publication.
Let's say the Worker write process is done, i.e. it has sent all 1M records to Mongo through a driver. Mongo is "eventually consistent" so I'm not 100% guaranteed all records are written to physical storage at the time the Notify Publish happens.
When Publish does a 'find' and gets a cursor on the collection holding the batch records, is the cursor smart enough to handle the eventual consistency?
So in practical terms let's imagine 750,000 records are actually physically written by Mongo when Notify Publish happens and Publish does its find. Will the cursor traverse 750,000 records and stop or will it block or otherwise handle the remaining 250,000 as they're eventually written to disk (which presumably is very likely to happen while publishing of the first 750K)?
As @BlakesSeven already noted in the comments, "eventual consistency" refers to the fact that in a replicated environment, when a write is finished on the primary, it will only be written to the secondaries eventually. You can modify this behavior, at the cost of reduced write performance, by setting the write concern to > 1. Setting it to "majority" basically guarantees that a write operation is durable even in case of a failover, though at (in some cases) drastically reduced performance.
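For illustration, this is a sketch of raising the write concern with the pymongo driver (the collection name is an assumption); w="majority" trades write latency for durability across a failover:

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")   # assumed replica-set member
    # Ask the server to acknowledge the write only once a majority of
    # replica-set members have it; slower, but survives a failover.
    coll = client.test.get_collection(
        "batch_records",                                 # hypothetical collection name
        write_concern=WriteConcern(w="majority"),
    )
    coll.insert_one({"record": 1})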
In general here is what happens when you do a write (simplified) with journaling enabled:
The operation is checked for being syntactically correct.
The query optimizer kicks in and does its stuff. (Irrelevant to this question, so I'll spare the details.)
The write operation is applied to the in memory representation of the data set called "private view".
Every commitIntervalMs, the private view is synced to the journal, with a median of 15 or 50ms, depending on the write concern.
On sync, the operation is applied to the shared view. IIRC, this is the point where a new connection would be provided with the new data.
So in order to ensure that the data will be readable by the new connection, simply delay the publish notification by commitIntervalMs + 1, which, given your batch size, is hardly noticeable.
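A minimal sketch of that delay, assuming the default 100 ms commit interval; the notify function and batch id are placeholders:

    import time

    COMMIT_INTERVAL_MS = 100   # assumed journal commit interval; check your mongod config

    def notify_publish(batch_id):
        print("publish", batch_id)   # placeholder for the real notification mechanism

    def finish_batch(batch_id):
        # Worker has sent every record; wait one commit interval plus a margin
        # so the data has reached the shared view before Publish starts reading.
        time.sleep((COMMIT_INTERVAL_MS + 1) / 1000.0)
        notify_publish(batch_id)

    finish_batch("batch-42")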

MongoDB Write and lock processes

I've been reading a lot about MongoDB recently, but one topic I can't find any clear material on is how data is written to the journal and oplog.
So this is what I understand of the process so far; please correct me where I'm wrong:
A client connects to mongod and performs a write. The write is stored in the socket buffer.
When Mongo is available (not sure what available means at this point), data is written to the journal?
The MongoDB docs then say that writes are flushed from the journal onto disk every 60 seconds. By this I can only assume this means written to the primary and the oplog. If this is the case, how do writes appear earlier than the 60-second sync interval?
Some time later, secondaries suck data from the primary or their sync source and update their oplog and databases. It seems very vague about when exactly this happens and what delays it.
I'm also wondering: if journaling were disabled (I understand that's a really bad idea), at what point do the oplog and database get updated?
Lastly, I'm a bit stumped about the points in this process at which the write locks get created. Is it just when the database and oplog are updated, or at other times too?
Thanks to anyone who can shed some light on this or point me to some reading material.
Simon
Here is what happens as far as I understand it. I simplified a bit, but it should make clear how it works.
A client connects to mongo. No writes are done so far, and no connection is torn down, because what happens now really depends on the write concern. Let's assume that we go with the (at the time of this writing) default, "acknowledged".
The client sends its write operation. Here is where I am really not sure. Either after this step or the next one, the acknowledgement is sent to the driver.
The write operation is run through the query optimizer. It is here that the acknowledgment is sent, because with an acknowledged write concern you may be returned a duplicate key error. It is possible that this was checked in the previous step. If I had to bet, I'd say it happens after this one.
The output of the query optimizer is then applied to the data in memory: actually to the data of the memory-mapped data files, to the memory-mapped oplog, and to the journal's memory-mapped files. Queries are answered from these memory-mapped parts, or the corresponding data is mapped into memory to answer the query. The oplog is read from memory if present, too.
In general, the journal is synced to disk every 100 ms. The precise value is determined by a number of factors, one of them being the journalCommitInterval configuration parameter. If you have a write concern of journaled, the driver will be notified now.
Every syncDelay seconds, the current state of the memory-mapped files is synced to disk. I think the journal is then truncated to the entries which haven't been applied to the data yet, but I am not too sure of that, since it should basically never happen that data in the journal hasn't been applied to the current data.
If you have read carefully, you noticed that the data is ready for the oplog as soon as it has been run through the query optimizer and applied to the files mapped into memory. When the oplog entry is pulled by one of the secondaries, it is immediately applied to that secondary's memory-mapped files and synced to disk the same way as on the primary.
Some things to note: as soon as the relatively small data is written to the journal, it is quite safe. If a node goes down between two syncs of the data files, both the data files and the oplog can be restored from their last state in the data files and the journal. In general, the maximum data loss you can have is the operations recorded into the journal after the last commit, 50 ms in the median case.
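If you want the driver to wait for exactly that journal commit before returning, you can ask for a journaled write concern. A sketch with pymongo, using made-up database and collection names:

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")   # assumed local mongod
    coll = client.test.get_collection(
        "events",                                        # hypothetical collection
        write_concern=WriteConcern(j=True),              # acknowledge only after the journal sync
    )
    coll.insert_one({"msg": "hello"})                    # returns once the op has been journaled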
As for the locks: if you have read carefully, there aren't locks imposed at a database level when the data is synced to disk. Write locks may be taken in order to ensure that only one thread at any given point in time modifies a given document. There are other write locks possible, but in general they should be rather rare.
Write locks on the filesystem layer are created once, though only implicitly, IIRC. During application startup, a lock file is created in the root directory of the dbpath. Any other mongod instance will refuse to do any operation on those data files while a valid lock exists. And you shouldn't either ;)
Hope this helps.

To what level does MongoDB lock on writes? (or: what does it mean by "per connection"?)

In the mongodb documentation, it says:
Beginning with version 2.2, MongoDB implements locks on a per-database basis for most read and write operations. Some global operations, typically short lived operations involving multiple databases, still require a global “instance” wide lock. Before 2.2, there is only one “global” lock per mongod instance.
Does this mean that in the situation that I have, say, 3 connections to mongodb://localhost/test from different apps running on the network - only one could be writing at a time? Or is it just per connection?
IOW: Is it per connection, or is the whole /test database locked while it writes?
MongoDB Locking is Different
Locking in MongoDB does not work like locking in an RDBMS, so a bit of explanation is in order. In earlier versions of MongoDB, there was a single global reader/writer latch. Starting with MongoDB 2.2, there is a reader/writer latch for each database.
The readers-writer latch
The latch is multiple-reader, single-writer, and is writer-greedy. This means that:
There can be an unlimited number of simultaneous readers on a database
There can only be one writer at a time on any collection in any one database (more on this in a bit)
Writers block out readers
By "writer-greedy", I mean that once a write request comes in, all readers are blocked until the write completes (more on this later)
Note that I call this a "latch" rather than a "lock". This is because it's lightweight, and in a properly designed schema the write lock is held on the order of a dozen or so microseconds. See here for more on readers-writer locking.
In MongoDB you can run as many simultaneous queries as you like: as long as the relevant data is in RAM they will all be satisfied without locking conflicts.
Atomic Document Updates
Recall that in MongoDB the level of transaction is a single document. All updates to a single document are Atomic. MongoDB achieves this by holding the write latch for only as long as it takes to update a single document in RAM. If there is any slow-running operation (in particular, if a document or an index entry needs to be paged in from disk), then that operation will yield the write latch. When the operation yields the latch, then the next queued operation can proceed.
This does mean that the writes to all documents within a single database get serialized. This can be a problem if you have a poor schema design, and your writes take a long time, but in a properly-designed schema, locking isn't a problem.
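A short pymongo sketch of that distinction (the collection and field names are illustrative): a single update_one carrying several operators is applied atomically, while two updates to two different documents have no joint guarantee:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # assumed local mongod
    accounts = client.test.accounts                     # made-up collection

    # Atomic: one document, so both operators are applied together or not at all.
    accounts.update_one(
        {"_id": "alice"},
        {"$inc": {"balance": -50}, "$push": {"history": {"amount": -50}}},
    )

    # NOT atomic as a pair: a concurrent reader may see the first update
    # applied and the second one not yet applied.
    accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -50}})
    accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 50}})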
Writer-Greedy
A few more words on being writer-greedy:
Only one writer can hold the latch at one time; multiple readers can hold the latch at a time. In a naive implementation, writers could starve indefinitely if there was a single reader in operation. To avoid this, in the MongoDB implementation, once any single thread makes a write request for a particular latch:
All subsequent readers needing that latch will block
That writer will wait until all current readers are finished
The writer will acquire the write latch, do its work, and then release the write latch
All the queued readers will now proceed
The actual behavior is complex, since this writer-greedy behavior interacts with yielding in ways that can be non-obvious. Recall that, starting with release 2.2, there is a separate latch for each database, so writes to any collection in database 'A' will acquire a separate latch than writes to any collection in database 'B'.
Specific questions
Regarding the specific questions:
Locks (actually latches) are held by the MongoDB kernel for only as long as it takes to update a single document
If you have multiple connections coming in to MongoDB, and each one of them is performing a series of writes, the latch will be held on a per-database basis for only as long as it takes for that write to complete
Multiple connections coming in performing writes (update/insert/delete) will all be interleaved
While this sounds like it would be a big performance concern, in practice it doesn't slow things down. With a properly designed schema and a typical workload, MongoDB will saturate the disk I/O capacity -- even for an SSD -- before lock percentage on any database goes above 50%.
The highest capacity MongoDB cluster that I am aware of is currently performing 2 million writes per second.
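One way to observe that contention yourself is the serverStatus command; treat the exact field layout as an assumption to verify against your MongoDB version:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # assumed local mongod
    status = client.admin.command("serverStatus")

    # globalLock.currentQueue shows how many operations are waiting on the latch;
    # sustained non-zero writer queues indicate lock contention.
    queue = status["globalLock"]["currentQueue"]
    print("queued readers:", queue["readers"], "queued writers:", queue["writers"])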
It is not per connection, it is per mongod. In other words, the lock will exist across all connections to the test database on that server.
It is also a read/write lock, so if a write is occurring then a read must wait; otherwise, how could MongoDB know it is a consistent read?
However, I should mention that MongoDB locks are very different from the SQL/normal transactional locks you get, and normally a lock will be held for only about a microsecond for an average update.
Mongo 3.0 now supports collection-level locking.
In addition to this, Mongo now provides an API that allows pluggable storage engines to be created. Mongo 3.0 comes with 2 storage engines:
MMAPv1: the default storage engine and the one used in previous versions. Comes with collection-level locking.
WiredTiger: the new storage engine, comes with document-level locking and compression. (Only available for the 64-bit version)
MongoDB 3.0 release notes
WiredTiger
I know the question is pretty old, but some people are still confused....
Starting in MongoDB 3.0, the WiredTiger storage engine (which uses document-level concurrency) is available in the 64-bit builds.
WiredTiger uses document-level concurrency control for write operations. As a result, multiple clients can modify different documents of a collection at the same time.
For most read and write operations, WiredTiger uses optimistic concurrency control. WiredTiger uses only intent locks at the global, database and collection levels. When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation.
Some global operations, typically short lived operations involving multiple databases, still require a global “instance-wide” lock. Some other operations, such as dropping a collection, still require an exclusive database lock.
Document Level Concurrency
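To check which storage engine a 3.0+ server is actually running, you can look at the storageEngine section of serverStatus; a small pymongo sketch, assuming a local server:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # assumed local mongod
    status = client.admin.command("serverStatus")
    # Prints "wiredTiger" or "mmapv1" depending on how mongod was started.
    print("storage engine:", status["storageEngine"]["name"])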

Does MongoDB completely lock writes to the database while doing a createIndex operation or mongodump?

I just want to know: if my database has about 50 queries per second in production, will it still be able to run normally while doing these operations?
My server info is:
A normal 2-machine replica set with no sharding.
I just use a capped collection, for logging purposes only. No reads. It's write-heavy.
8GB of RAM
The documentation [1] about concurrency is quite clear:
"When a read lock exists, many read operations may use this lock. However, when a write lock exists, a single write operation holds the lock exclusively, and no other read or write operations may share the lock."
Insert, update, and delete operations use a write lock.
Basically that means all your inserts will happen sequentially, so it's a matter of how fast Mongo writes the data.
[1] http://docs.mongodb.org/manual/faq/concurrency/#which-administrative-commands-lock-the-database
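For the index part specifically, the MMAPv1-era servers this question is about support background index builds, which avoid holding the write lock for the entire build. A pymongo sketch with assumed names:

    from pymongo import ASCENDING, MongoClient

    client = MongoClient("mongodb://localhost:27017")   # assumed local mongod
    logs = client.test.logs                             # hypothetical logging collection

    # A foreground index build holds the write lock for the whole build;
    # background=True lets reads and writes continue at the cost of a
    # slower build and a slightly larger index.
    logs.create_index([("ts", ASCENDING)], background=True)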