I have recently started with MongoDB and I'm trying to explore replica sets and crash recovery.
I have read that journal files are write-ahead redo log files.
Oplog files are where every write activity is recorded.
What is the difference between these two?
Do we have oplogs on both the master and the slave?
Please post any web links that shed some light on this area.
The oplog stores high-level operations that modify the database (queries are not stored, for example), such as insert this document, update that one, etc. The oplog is written on the primary, and secondaries periodically poll it to fetch the operations performed since their last poll; each secondary also keeps its own copy of the oplog. Operations are sometimes transformed before being stored in the oplog so that they are idempotent (and can safely be applied many times).
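To illustrate why idempotence matters, here is a minimal Python sketch that uses plain dicts instead of a real MongoDB connection. The entry format and the helper names (apply_inc, to_idempotent, apply_set) are hypothetical; the only point is that a relative update such as an increment can be recorded as the absolute value it produces, so that replaying the entry is harmless.

```python
# Toy model: a "document" is a dict; nothing here talks to a real MongoDB.

def apply_inc(doc, field, amount):
    """Apply a relative update (not idempotent: each replay adds again)."""
    doc[field] = doc.get(field, 0) + amount
    return doc

def to_idempotent(doc, field, amount):
    """Record the increment as the absolute value it will produce."""
    return {"op": "set", "field": field, "value": doc.get(field, 0) + amount}

def apply_set(doc, entry):
    """Apply an absolute update (idempotent: replays give the same result)."""
    doc[entry["field"]] = entry["value"]
    return doc

primary = {"_id": "foo", "views": 10}
entry = to_idempotent(primary, "views", 5)   # computed before the write is applied
apply_inc(primary, "views", 5)

secondary = {"_id": "foo", "views": 10}
apply_set(secondary, entry)
apply_set(secondary, entry)                  # replayed twice, same result
print(secondary["views"] == primary["views"])
```

Replaying apply_inc twice on the secondary would instead have counted the increment twice.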
The journal, on the other hand, can be switched on or off on any node (primary or secondary), and is a low-level log of operations for the purpose of crash recovery and durability of a single mongod instance. Think of a low-level operation like 'write these bytes to this file at this position'.
NOTE:
Starting in MongoDB 4.0, you cannot turn journaling off for replica set members that use the WiredTiger storage engine.
Source: https://docs.mongodb.com/manual/tutorial/manage-journaling/
The oplog is just a capped collection where MongoDB tracks all changes to its collections (insert, update, delete). It doesn't track read operations. MongoDB uses the oplog to spread all changes to all nodes in a replica set. Secondary nodes copy and apply these changes.
The journal is a feature of the underlying storage engine. Since MongoDB 3.2 the default storage engine is WiredTiger, and since MongoDB 4.0 you can't disable journaling for WiredTiger on replica set members. All operations are tracked in the journal files. WiredTiger uses checkpoints to recover data after a crash. Checkpoints are created every 60 seconds. If a crash happens between checkpoints, the writes made since the last checkpoint could be lost. To prevent this, WiredTiger replays the journal files to reapply all the changes made after the last checkpoint.
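The interplay of checkpoints and journal replay can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: the ToyEngine class and its methods are hypothetical and do not reflect WiredTiger's actual file formats or APIs.

```python
# Toy model of checkpoint + journal recovery; not WiredTiger's real format.

class ToyEngine:
    def __init__(self):
        self.memory = {}        # current in-memory state
        self.checkpoint = {}    # last durable snapshot "on disk"
        self.journal = []       # durable log of writes since that snapshot

    def write(self, key, value):
        self.journal.append((key, value))   # log first (write-ahead)
        self.memory[key] = value

    def take_checkpoint(self):
        self.checkpoint = dict(self.memory)
        self.journal.clear()    # entries older than the checkpoint are not needed

    def crash_and_recover(self):
        self.memory = dict(self.checkpoint)  # in-memory state is lost in the crash
        for key, value in self.journal:      # replay writes made after the checkpoint
            self.memory[key] = value

engine = ToyEngine()
engine.write("a", 1)
engine.take_checkpoint()
engine.write("b", 2)          # happens after the checkpoint
engine.crash_and_recover()
print(engine.memory)
```

Without the replay loop, the write of "b" would be lost, because it was made after the last checkpoint.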
In general, the write flow in MongoDB looks like this:
High-level - when a client inserts, updates or removes data, MongoDB applies the change to the proper collection, updates the indexes and inserts the change into the oplog. If any of these operations fails, the other related operations must be rolled back to prevent inconsistency. For this MongoDB uses WiredTiger transactions:
begin transaction
apply change to collection
update index
add the change to the oplog
commit the transaction
Low-level - WiredTiger runs the transaction and adds the changes to the journal file.
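The all-or-nothing steps listed above can be sketched in Python as follows. This is a toy model, not MongoDB's implementation: ToyStore, the fail_at parameter and the shape of the oplog entry are assumptions made up for illustration, and the rollback is done with a simple snapshot.

```python
import copy

class ToyStore:
    """Toy store where collection, index and oplog change together or not at all."""

    def __init__(self):
        self.collection = {}   # _id -> document
        self.index = {}        # value of the indexed "name" field -> _id
        self.oplog = []        # ordered list of change entries

    def insert(self, doc, fail_at=None):
        # "begin transaction": remember the state so we can roll back
        snapshot = copy.deepcopy((self.collection, self.index, self.oplog))
        try:
            self.collection[doc["_id"]] = doc            # apply change to collection
            if fail_at == "index":
                raise RuntimeError("simulated index failure")
            self.index[doc["name"]] = doc["_id"]         # update index
            if fail_at == "oplog":
                raise RuntimeError("simulated oplog failure")
            self.oplog.append({"op": "insert", "doc": doc})  # add the change to the oplog
            return True                                  # "commit"
        except RuntimeError:
            # "rollback": restore all three structures together
            self.collection, self.index, self.oplog = snapshot
            return False

store = ToyStore()
print(store.insert({"_id": 1, "name": "alice"}))
print(store.insert({"_id": 2, "name": "bob"}, fail_at="oplog"))
print(sorted(store.collection), len(store.oplog))
```

The failed insert leaves no trace in the collection, the index or the oplog, which is the property the transaction is there to provide.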
There must be a relationship between the journal and the oplog. With w=1 you commit to the primary's journal and also create an oplog entry for the replica set. I think that, at least on the primary of a replica set, they both contain the same update/delete/insert, just in a different format.
MongoDB configuration has a param called "autoresync".
This is what it says:
Automatically resync if slave data is stale
autoresync
So if we enable this parameter, when one of the secondaries goes into the RECOVERING state, can it auto-heal non-primary members that have stale data and are unable to replicate? Sometimes we see that the data is too stale. If we enable this parameter, can it automatically heal such a member and bring it back to a good state?
That parameter is "legacy" and has not been supported for a long time. It dates from when MongoDB used the master-slave paradigm.
With current versions of MongoDB, secondaries always recover (auto-heal), provided the primary's oplog is big enough to cover the "missing" data.
So, if your secondary cannot replicate or recover, check that the oplog on your PRIMARY node is big enough. See the guide on resizing the oplog without a reboot.
To see how much time your oplog covers, use db.getReplicationInfo().timeDiff
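Conceptually, the oplog window is just the time between the oldest and the newest entry still held in the oplog. The Python sketch below shows that arithmetic and the resulting catch-up check; the function names and the timestamps are hypothetical, and it does not query a real server.

```python
from datetime import datetime, timedelta

def oplog_window_seconds(first_entry_time, last_entry_time):
    """Time span covered by the oplog, oldest entry to newest, in seconds."""
    return (last_entry_time - first_entry_time).total_seconds()

def can_catch_up(secondary_last_applied, first_entry_time):
    """True if the oldest oplog entry is not newer than the secondary's position."""
    return secondary_last_applied >= first_entry_time

newest = datetime(2020, 1, 2, 12, 0, 0)      # made-up timestamps
oldest = newest - timedelta(hours=6)
window = oplog_window_seconds(oldest, newest)
print(window / 3600)

# A secondary that is 2 hours behind can resume; one that is 9 hours behind cannot.
print(can_catch_up(newest - timedelta(hours=2), oldest))
print(can_catch_up(newest - timedelta(hours=9), oldest))
```

A secondary that has fallen behind by more than the window has to be resynced from scratch, which is why a larger oplog helps.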
In the MongoDB documentation, it says:
Beginning with version 2.2, MongoDB implements locks on a per-database basis for most read and write operations. Some global operations, typically short lived operations involving multiple databases, still require a global “instance” wide lock. Before 2.2, there is only one “global” lock per mongod instance.
Does this mean that if I have, say, 3 connections to mongodb://localhost/test from different apps running on the network, only one could be writing at a time? Or is it just per connection?
IOW: Is it per connection, or is the whole /test database locked while it writes?
MongoDB Locking is Different
Locking in MongoDB does not work like locking in an RDBMS, so a bit of explanation is in order. In earlier versions of MongoDB, there was a single global reader/writer latch. Starting with MongoDB 2.2, there is a reader/writer latch for each database.
The readers-writer latch
The latch is multiple-reader, single-writer, and is writer-greedy. This means that:
There can be an unlimited number of simultaneous readers on a database
There can only be one writer at a time on any collection in any one database (more on this in a bit)
Writers block out readers
By "writer-greedy", I mean that once a write request comes in, all readers are blocked until the write completes (more on this later)
Note that I call this a "latch" rather than a "lock". This is because it's lightweight, and in a properly designed schema the write lock is held on the order of a dozen or so microseconds. See here for more on readers-writer locking.
In MongoDB you can run as many simultaneous queries as you like: as long as the relevant data is in RAM they will all be satisfied without locking conflicts.
Atomic Document Updates
Recall that in MongoDB the level of transaction is a single document. All updates to a single document are atomic. MongoDB achieves this by holding the write latch for only as long as it takes to update a single document in RAM. If there is any slow-running operation (in particular, if a document or an index entry needs to be paged in from disk), then that operation will yield the write latch. When the operation yields the latch, the next queued operation can proceed.
This does mean that the writes to all documents within a single database get serialized. This can be a problem if you have a poor schema design, and your writes take a long time, but in a properly-designed schema, locking isn't a problem.
Writer-Greedy
A few more words on being writer-greedy:
Only one writer can hold the latch at one time; multiple readers can hold the latch at a time. In a naive implementation, writers could starve indefinitely if there was a single reader in operation. To avoid this, in the MongoDB implementation, once any single thread makes a write request for a particular latch:
All subsequent readers needing that latch will block
That writer will wait until all current readers are finished
The writer will acquire the write latch, do its work, and then release the write latch
All the queued readers will now proceed
The actual behavior is complex, since this writer-greedy behavior interacts with yielding in ways that can be non-obvious. Recall that, starting with release 2.2, there is a separate latch for each database, so writes to any collection in database 'A' acquire a different latch from writes to any collection in database 'B'.
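The writer-greedy policy described in the steps above can be modelled with a small state machine in Python. This is a single-threaded sketch of the policy only: the WriterGreedyLatch class and its method names are hypothetical, it tracks state instead of blocking threads, and it ignores yielding entirely.

```python
class WriterGreedyLatch:
    """Single-threaded model of the policy; it tracks state, it does not block."""

    def __init__(self):
        self.active_readers = 0
        self.writer_active = False
        self.writers_waiting = 0

    def try_read(self):
        # New readers are refused while a writer holds or is waiting for the latch.
        if self.writer_active or self.writers_waiting > 0:
            return False
        self.active_readers += 1
        return True

    def release_read(self):
        self.active_readers -= 1

    def request_write(self):
        # Announce intent; from now on new readers are held back.
        self.writers_waiting += 1

    def try_write(self):
        # A waiting writer gets the latch once the current readers have drained.
        if self.writer_active or self.active_readers > 0 or self.writers_waiting == 0:
            return False
        self.writers_waiting -= 1
        self.writer_active = True
        return True

    def release_write(self):
        self.writer_active = False

latch = WriterGreedyLatch()
steps = []
steps.append(latch.try_read())    # a reader gets in
latch.request_write()             # a write request arrives
steps.append(latch.try_read())    # subsequent readers are held back
steps.append(latch.try_write())   # the writer waits for the current reader
latch.release_read()
steps.append(latch.try_write())   # the writer acquires the latch
latch.release_write()
steps.append(latch.try_read())    # queued readers can proceed
print(steps)
```

A real implementation would block the callers instead of returning False, but the order in which requests are admitted is the same.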
Specific questions
Regarding the specific questions:
Locks (actually latches) are held by the MongoDB kernel for only as long as it takes to update a single document
If you have multiple connections coming in to MongoDB, and each one of them is performing a series of writes, the latch will be held on a per-database basis for only as long as it takes for that write to complete
Multiple connections coming in performing writes (update/insert/delete) will all be interleaved
While this sounds like it would be a big performance concern, in practice it doesn't slow things down. With a properly designed schema and a typical workload, MongoDB will saturate the disk I/O capacity -- even for an SSD -- before lock percentage on any database goes above 50%.
The highest capacity MongoDB cluster that I am aware of is currently performing 2 million writes per second.
It is not per connection, it is per mongod. In other words the lock will exist across all connections to the test database on that server.
It is also a read/write lock, so if a write is occurring then a read must wait; otherwise how could MongoDB know it is a consistent read?
However, I should mention that MongoDB locks are very different from the SQL/normal transactional locks you may be used to, and normally a lock is held for only about a microsecond per average update.
Mongo 3.0 now supports collection-level locking.
In addition to this, Mongo now provides an API for creating storage engines. Mongo 3.0 comes with 2 storage engines:
MMAPv1: the default storage engine and the one used in previous versions. Comes with collection-level locking.
WiredTiger: the new storage engine, comes with document-level locking and compression. (Only available for the 64-bit version)
MongoDB 3.0 release notes
WiredTiger
I know the question is pretty old but still some people are confused....
Starting in MongoDB 3.0, the WiredTiger storage engine (which uses document-level concurrency) is available in the 64-bit builds.
WiredTiger uses document-level concurrency control for write operations. As a result, multiple clients can modify different documents of a collection at the same time.
For most read and write operations, WiredTiger uses optimistic concurrency control. WiredTiger uses only intent locks at the global, database and collection levels. When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation.
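The retry on write conflict can be illustrated with a version-check sketch in Python. This is a generic optimistic concurrency pattern, not WiredTiger's actual mechanism: ToyDocument, WriteConflict, update_with_retry and the before_commit hook are hypothetical names introduced only for this example.

```python
class WriteConflict(Exception):
    pass

class ToyDocument:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, expected_version):
        # The commit only succeeds if nobody changed the document in between.
        if self.version != expected_version:
            raise WriteConflict()
        self.value = new_value
        self.version += 1

def update_with_retry(doc, fn, before_commit=None, max_attempts=5):
    """Read, compute, try to commit; on conflict, start over from a fresh read."""
    for attempt in range(1, max_attempts + 1):
        value, version = doc.read()
        if before_commit is not None:
            before_commit(attempt)   # hook used below to simulate a competing writer
        try:
            doc.commit(fn(value), version)
            return attempt
        except WriteConflict:
            continue
    raise WriteConflict("gave up after %d attempts" % max_attempts)

doc = ToyDocument(10)

def competing_writer(attempt):
    if attempt == 1:                 # interfere only on the first attempt
        doc.commit(doc.value + 100, doc.version)

attempts = update_with_retry(doc, lambda v: v + 1, before_commit=competing_writer)
print(attempts, doc.value)
```

The first attempt loses to the competing writer and is retried on top of the new value, so neither update is lost and no lock is held while the new value is computed.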
Some global operations, typically short lived operations involving multiple databases, still require a global “instance-wide” lock. Some other operations, such as dropping a collection, still require an exclusive database lock.
Document Level Concurrency
Let's say I have one primary (A) and two secondaries (B, C), and I am writing with write concern majority. Can someone please explain the doubts below:
Let's say a write was done using majority and it updated the data on A and B, but has not yet propagated to C. If a read for the same data arrives at this time using secondary or secondary preferred, will the query be served from B, which has the latest data, or can Mongo not guarantee this, so that the read may return stale data from C?
Let's say a write was done using majority again, and the write has been applied on A and is still in progress on one of the secondaries, B. If a read comes in at that time, will the read be blocked, or will it be served stale data from C?
Let's say I have taken out the secondary C and the same situation as in the case above is in progress. Will the read from secondary B be blocked until the write is complete on B, or will the read not be blocked and stale data be served from B?
Environment
Mongo Version - 3.0.9
Storage Engine - MMAPv1
MongoDB replication to secondaries is asynchronous. If the read concern is set to "majority", you may read stale data, because the majority-committed view can lag behind the most recent writes. Basically, this means you are accepting eventual consistency.
If the read concern is set to "local", the query returns the most recent data of the instance it runs on, so a read from the primary gets the latest data.
Please note that the readConcern level "majority" can be used with the WiredTiger storage engine only. The WiredTiger storage engine is an append-only storage engine and doesn't use in-place updates. It does not take coarse-grained locks for ordinary reads and writes, and it offers document-level concurrency.
Read concern = "majority"
The query returns the instance’s most recent copy of data confirmed as
written to a majority of members in the replica set.
To use a read concern level of "majority", you must use the WiredTiger
storage engine and start the mongod instances with the
--enableMajorityReadConcern command line option (or the replication.enableMajorityReadConcern setting if using a configuration
file).
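What "confirmed as written to a majority of members" means can be sketched with a few lines of Python. This is a simplified model under stated assumptions: the function name is hypothetical, and operation times are represented as plain integers rather than real optimes.

```python
def majority_commit_point(last_applied_by_member):
    """Highest operation time that a majority of members have already applied."""
    optimes = sorted(last_applied_by_member.values(), reverse=True)
    majority = len(optimes) // 2 + 1
    # The member at position `majority - 1` and all members before it in the
    # sorted list have applied at least this operation time.
    return optimes[majority - 1]

# The write at time 5 has reached A and B but not C: it is majority-committed.
print(majority_commit_point({"A": 5, "B": 5, "C": 3}))

# The write at time 5 is only on A: a "majority" read still sees time 4.
print(majority_commit_point({"A": 5, "B": 4, "C": 3}))
```

In the second case a "majority" read returns older data than a "local" read on A, which is the staleness trade-off discussed in this answer.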
Question 1: So Mongo does not guarantee that the read will be served from the secondary to which the data has been written?
Answer 1: MongoDB doesn't guarantee this. The selection of the secondary depends on the following:
When you select a non-primary read preference, the driver determines which member to target based on various factors. Refer to this link:
Read preference mechanics member selection
Question 2: Reads will never be blocked even if a write is in progress on the same data?
Answer 2: Reading will not be blocked. However, you may read some stale data.
Reads may miss matching documents that are updated during the course
of the read operation.
Concurrency locking what isolation guarantees does MongoDB provide
In replica set mode, each write operation to any collection in any DB also writes to the oplog collection.
Now, when writing to multiple DBs in parallel, all these write operations also write to the oplog.
My question: do these write operations require locking the oplog? (I'm using w:1 write concern.) If they do, this is similar to having a global lock between all the write operations to all the different DBs, isn't it?
I'd be happy to get any hints on this.
According to the documentation, in replication, when MongoDB writes to a collection on the primary, MongoDB also writes to the primary’s oplog, which is a special collection in the local database. Therefore, MongoDB must lock both the collection’s database and the local database. The mongod must lock both databases at the same time to keep the database consistent and ensure that write operations, even with replication, are “all-or-nothing” operations.
This means that writing to multiple databases in parallel on the primary can effectively result in global locking between all the write operations. This does not apply to the secondaries, as MongoDB does not apply writes serially on secondaries, but instead collects oplog entries in batches and then applies those batches in parallel.
Disclaimer: This is all off the top of my head, so please do not crucify me if I have made a mistake. However, please correct me.
Why should they?
Premise: Databases, by definition, are not interconnected
oplog entries are always idempotent
The Oplog is a capped collection, with a guarantee of preserving the insert order
Let's assume true parallelism of queries being applied. So, we have two queries arriving at the very same time and we'd need to decide which one to insert into the oplog first. The first one taking the lock will write first, right? Except, there is a problem. Let's assume the first query is a simple one, db.collection.update({_id:"foo"},{$set:{"bar":"baz"}}), while the other query is more complicated and therefore takes longer to evaluate for correctness. If both were evaluated in parallel, the simple one could reach the oplog first regardless of arrival order. So in order to prevent that, a lock would have to be taken on arrival and released after the idempotent oplog entry was written.
Here is where I have to rely on my memory
However, queries aren't applied in parallel. Queries are queued and evaluated in order of arrival. The database gets locked when the queries are applied, after they have run through the query optimizer. During that lock the idempotent oplog entries are written to the oplog. Since databases are not interconnected and only one query can be applied to a database at any given time, the lock on the database is sufficient. No two data-changing queries can be applied to the same database concurrently anyway, so why should a lock be set on the oplog?
Apparently, a lock is taken on the local database. However, since a lock is already taken on the data, I do not see the reason why. *scratchingMyHead*