I'm a little confused about MongoDB's read locking. How many concurrent read operations can a single collection support?
As written in the link given by tk : http://www.mongodb.org/pages/viewpage.action?pageId=7209296
Read/Write Lock
mongod uses a read/write lock for many
operations. Any number of concurrent
read operations are allowed, but
typically only one write operation
(although some write operations yield
and in the future more concurrency
will be added). The write lock
acquisition is greedy: a pending write
lock acquisition will prevent further
read lock acquisitions until
fulfilled.
Typical read/write lock
You can check here: http://www.mongodb.org/pages/viewpage.action?pageId=7209296
Related
In the mongodb documentation, it says:
Beginning with version 2.2, MongoDB implements locks on a per-database basis for most read and write operations. Some global operations, typically short lived operations involving multiple databases, still require a global “instance” wide lock. Before 2.2, there is only one “global” lock per mongod instance.
Does this mean that in the situation that I Have, say, 3 connections to mongodb://localhost/test from different apps running on the network - only one could be writing at a time? Or is it just per connection?
IOW: Is it per connection, or is the whole /test database locked while it writes?
MongoDB Locking is Different
Locking in MongoDB does not work like locking in an RDBMS, so a bit of explanation is in order. In earlier versions of MongoDB, there was a single global reader/writer latch. Starting with MongoDB 2.2, there is a reader/writer latch for each database.
The readers-writer latch
The latch is multiple-reader, single-writer, and is writer-greedy. This means that:
There can be an unlimited number of simultaneous readers on a database
There can only be one writer at a time on any collection in any one database (more on this in a bit)
Writers block out readers
By "writer-greedy", I mean that once a write request comes in, all readers are blocked until the write completes (more on this later)
Note that I call this a "latch" rather than a "lock". This is because it's lightweight, and in a properly designed schema the write lock is held on the order of a dozen or so microseconds. See here for more on readers-writer locking.
In MongoDB you can run as many simultaneous queries as you like: as long as the relevant data is in RAM they will all be satisfied without locking conflicts.
Atomic Document Updates
Recall that in MongoDB the level of transaction is a single document. All updates to a single document are Atomic. MongoDB achieves this by holding the write latch for only as long as it takes to update a single document in RAM. If there is any slow-running operation (in particular, if a document or an index entry needs to be paged in from disk), then that operation will yield the write latch. When the operation yields the latch, then the next queued operation can proceed.
This does mean that the writes to all documents within a single database get serialized. This can be a problem if you have a poor schema design, and your writes take a long time, but in a properly-designed schema, locking isn't a problem.
Writer-Greedy
A few more words on being writer-greedy:
Only one writer can hold the latch at one time; multiple readers can hold the latch at a time. In a naive implementation, writers could starve indefinitely if there was a single reader in operation. To avoid this, in the MongoDB implementation, once any single thread makes a write request for a particular latch
All subsequent readers needing that latch will block
That writer will wait until all current readers are finished
The writer will acquire the write latch, do its work, and then release the write latch
All the queued readers will now proceed
The actual behavior is complex, since this writer-greedy behavior interacts with yielding in ways that can be non-obvious. Recall that, starting with release 2.2, there is a separate latch for each database, so writes to any collection in database 'A' will acquire a separate latch than writes to any collection in database 'B'.
Specific questions
Regarding the specific questions:
Locks (actually latches) are held by the MongoDB kernel for only as long as it takes to update a single document
If you have multiple connections coming in to MongoDB, and each one of them is performing a series of writes, the latch will be held on a per-database basis for only as long as it takes for that write to complete
Multiple connections coming in performing writes (update/insert/delete) will all be interleaved
While this sounds like it would be a big performance concern, in practice it doesn't slow things down. With a properly designed schema and a typical workload, MongoDB will saturate the disk I/O capacity -- even for an SSD -- before lock percentage on any database goes above 50%.
The highest capacity MongoDB cluster that I am aware of is currently performing 2 million writes per second.
It is not per connection, it is per mongod. In other words the lock will exist across all connections to the test database on that server.
It is also a read/write lock so if a write is occuring then a read must wait, otherwise how can MongoDB know it is a consistent read?
However I should mention that MongoDB locks are very different to SQL/normal transactional locks you get and normally a lock will be held for about a microsecond between average updates.
Mongo 3.0 now supports collection-level locking.
In addition to this, now Mongo created an API that allows to create a storage engine. Mongo 3.0 comes with 2 storage engines:
MMAPv1: the default storage engine and the one use in the previous versions. Comes with collection-level locking.
WiredTiger: the new storage engine, comes with document-level locking and compression. (Only available for the 64-bit version)
MongoDB 3.0 release notes
WiredTiger
I know the question is pretty old but still some people are confused....
Starting in MongoDB 3.0, the WiredTiger storage engine (which uses document-level concurrency) is available in the 64-bit builds.
WiredTiger uses document-level concurrency control for write operations. As a result, multiple clients can modify different documents of a collection at the same time.
For most read and write operations, WiredTiger uses optimistic concurrency control. WiredTiger uses only intent locks at the global, database and collection levels. When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation.
Some global operations, typically short lived operations involving multiple databases, still require a global “instance-wide” lock. Some other operations, such as dropping a collection, still require an exclusive database lock.
Document Level Concurrency
When mongoDB write is occurring then a read must wait or not wait.
when mongoDB going to write some doc in mongodb at that time write lock happen and
other thread try to read other docs then it should wait until write lock release or not.
is there any dependency between all read and write locks
From the docs.
MongoDB uses a readers-writer lock that allows concurrent reads access
to a database but gives exclusive access to a single write operation.
When a read lock exists, many read operations may use this lock.
However, when a write lock exists, a single write operation holds the
lock exclusively, and no other read or write operations may share the
lock.
Locks are “writer greedy,” which means write locks have preference
over reads. When both a read and write are waiting for a lock, MongoDB
grants the lock to the write.
To answer your question:
So 'n' number of reads can happen in parallel, and all reads/writes are blocked when a write happens.
Mongo documentation says single document write are atomic but at another place it mentions interleaved transactions may read uncommitted data and before the writer thread has returned.
I understand that other transactions can read uncommitted data because the write may not be still committed to the journal.
But how can threads read data while the writer thread has not returned. Is it for cases when the write concern is not default?
Thanks
Ankur
Ok with the reference I can now get a context and tell you what this is about.
Mongo documentation says single document write are atomic
Yep
it mentions interleaved transactions may read uncommitted
Infact any read may get uncommited data. This is because MongoDB will write to the fsync queue BEFORE it writes to disk.
MongoDB can read from this fsync queue before it goes to disk, and quoting the page:
Other database systems refer to these isolation semantics as read uncommitted.
Mainly ACID databases do.
But how can threads read data while the writer thread has not returned.
Thanks to MongoDBs con-currency rules: http://docs.mongodb.org/manual/faq/concurrency/#does-a-read-or-write-operation-ever-yield-the-lock
In short, to sum up: The write will not take exclusive lock for the duration of its running, instead it can subside (due to various rules) to a read allowing you to return data half way through a write.
This is also why you must sometimes be careful about multi-document updates and other threads of your application reading data, it may actually get one half of data that is upto date and the other half which is not.
In the mongodb documentation, it says:
Beginning with version 2.2, MongoDB implements locks on a per-database basis for most read and write operations. Some global operations, typically short lived operations involving multiple databases, still require a global “instance” wide lock. Before 2.2, there is only one “global” lock per mongod instance.
Does this mean that in the situation that I Have, say, 3 connections to mongodb://localhost/test from different apps running on the network - only one could be writing at a time? Or is it just per connection?
IOW: Is it per connection, or is the whole /test database locked while it writes?
MongoDB Locking is Different
Locking in MongoDB does not work like locking in an RDBMS, so a bit of explanation is in order. In earlier versions of MongoDB, there was a single global reader/writer latch. Starting with MongoDB 2.2, there is a reader/writer latch for each database.
The readers-writer latch
The latch is multiple-reader, single-writer, and is writer-greedy. This means that:
There can be an unlimited number of simultaneous readers on a database
There can only be one writer at a time on any collection in any one database (more on this in a bit)
Writers block out readers
By "writer-greedy", I mean that once a write request comes in, all readers are blocked until the write completes (more on this later)
Note that I call this a "latch" rather than a "lock". This is because it's lightweight, and in a properly designed schema the write lock is held on the order of a dozen or so microseconds. See here for more on readers-writer locking.
In MongoDB you can run as many simultaneous queries as you like: as long as the relevant data is in RAM they will all be satisfied without locking conflicts.
Atomic Document Updates
Recall that in MongoDB the level of transaction is a single document. All updates to a single document are Atomic. MongoDB achieves this by holding the write latch for only as long as it takes to update a single document in RAM. If there is any slow-running operation (in particular, if a document or an index entry needs to be paged in from disk), then that operation will yield the write latch. When the operation yields the latch, then the next queued operation can proceed.
This does mean that the writes to all documents within a single database get serialized. This can be a problem if you have a poor schema design, and your writes take a long time, but in a properly-designed schema, locking isn't a problem.
Writer-Greedy
A few more words on being writer-greedy:
Only one writer can hold the latch at one time; multiple readers can hold the latch at a time. In a naive implementation, writers could starve indefinitely if there was a single reader in operation. To avoid this, in the MongoDB implementation, once any single thread makes a write request for a particular latch
All subsequent readers needing that latch will block
That writer will wait until all current readers are finished
The writer will acquire the write latch, do its work, and then release the write latch
All the queued readers will now proceed
The actual behavior is complex, since this writer-greedy behavior interacts with yielding in ways that can be non-obvious. Recall that, starting with release 2.2, there is a separate latch for each database, so writes to any collection in database 'A' will acquire a separate latch than writes to any collection in database 'B'.
Specific questions
Regarding the specific questions:
Locks (actually latches) are held by the MongoDB kernel for only as long as it takes to update a single document
If you have multiple connections coming in to MongoDB, and each one of them is performing a series of writes, the latch will be held on a per-database basis for only as long as it takes for that write to complete
Multiple connections coming in performing writes (update/insert/delete) will all be interleaved
While this sounds like it would be a big performance concern, in practice it doesn't slow things down. With a properly designed schema and a typical workload, MongoDB will saturate the disk I/O capacity -- even for an SSD -- before lock percentage on any database goes above 50%.
The highest capacity MongoDB cluster that I am aware of is currently performing 2 million writes per second.
It is not per connection, it is per mongod. In other words the lock will exist across all connections to the test database on that server.
It is also a read/write lock so if a write is occuring then a read must wait, otherwise how can MongoDB know it is a consistent read?
However I should mention that MongoDB locks are very different to SQL/normal transactional locks you get and normally a lock will be held for about a microsecond between average updates.
Mongo 3.0 now supports collection-level locking.
In addition to this, now Mongo created an API that allows to create a storage engine. Mongo 3.0 comes with 2 storage engines:
MMAPv1: the default storage engine and the one use in the previous versions. Comes with collection-level locking.
WiredTiger: the new storage engine, comes with document-level locking and compression. (Only available for the 64-bit version)
MongoDB 3.0 release notes
WiredTiger
I know the question is pretty old but still some people are confused....
Starting in MongoDB 3.0, the WiredTiger storage engine (which uses document-level concurrency) is available in the 64-bit builds.
WiredTiger uses document-level concurrency control for write operations. As a result, multiple clients can modify different documents of a collection at the same time.
For most read and write operations, WiredTiger uses optimistic concurrency control. WiredTiger uses only intent locks at the global, database and collection levels. When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation.
Some global operations, typically short lived operations involving multiple databases, still require a global “instance-wide” lock. Some other operations, such as dropping a collection, still require an exclusive database lock.
Document Level Concurrency
Just want to know if my database has about 50 queries per sec in production, will it still be able to run normally while do these operation?
My server info is:
Normal replica 2 machine with no sharding.
Just use capped collection for logging purpose only. No read. It's write heavy.
8GB of RAM
The documentation [1] about concurrency is quite clear:
"When a read lock exists, many read operations may use this lock. However, when a write lock exists, a single write operation holds the lock exclusively, and no other read or write operations may share the lock."
Insert, update, and delete operations use a write lock.
Basically that means all you inserts will happen sequentially, so it's a matter of how fast Mongo writes the data.
[1] http://docs.mongodb.org/manual/faq/concurrency/#which-administrative-commands-lock-the-database