What is MongoDB (WiredTiger) update query's default lock wait time? - mongodb

I have an embedded sub-document array in a MongoDB document, and multiple users may try to add sub-documents to that array. I use an update ($push) query to add a document to the array, but if multiple users try to add an entry from the UI, how do I make sure the second $push doesn't fail because of a lock held by the first? At most a couple of users would add an entry to a single document at the same time, so I'm not worried about the case of hundreds of concurrent users. What is the default wait time of an update in WiredTiger, so that the second $push doesn't abort immediately and can take up to 1 second, as long as it completes successfully?
I tried to find the default wait time in the MongoDB and WiredTiger docs; I could find the default wait times for transactions, but not for an update query.
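For illustration, the update in question looks roughly like this (a pymongo sketch; database, collection, and field names are placeholders, not the actual ones):

```python
# Rough sketch of the $push in question (pymongo; names are placeholders).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["app"]["orders"]

# Two users may run this concurrently against the same document:
orders.update_one(
    {"_id": "order-1"},  # the single shared document
    {"$push": {"comments": {"user": "alice", "text": "first entry"}}},
)
```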

Internally, WiredTiger uses an optimistic locking method. This means that when two threads are trying to update the same document, one of them will succeed, and the other will back off. This will manifest as a "write conflict" (see metrics.operation.writeConflicts).
This conflict will be retried transparently, so from a client's point of view, the write will just take longer than usual.
The back-off algorithm waits longer the more conflicts it encounters, starting from 1 ms and capped at 100 ms per wait. So after enough conflicts, it will wait 100 ms per retry.
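If you want to confirm that retried write conflicts are what you're seeing, the counter mentioned above can be read from serverStatus; a minimal pymongo sketch (connection string is a placeholder):

```python
# Minimal sketch: read the write-conflict counter from serverStatus.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
status = client.admin.command("serverStatus")
print("write conflicts so far:", status["metrics"]["operation"]["writeConflicts"])
```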
Having said that, a design that updates a single document from multiple sources will have trouble scaling, for two reasons:
MongoDB has a 16 MB document size limit, so the array cannot grow indefinitely.
Locking issues, as you have readily identified.
For #2, in a pathological case a write can encounter conflict after conflict, waiting 100 ms between retries. There is no cap on the number of retries, so it can potentially wait for minutes. This is a sign that the workload is bottlenecked on a single document, and the app essentially operates on a single-threaded model.
Typically the solution is to not create artificial bottlenecks, but to spread out the work across many different documents, perhaps in a separate collection. This way, concurrency can be maintained.
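As a sketch of that approach, each entry becomes its own document in a separate collection instead of an element $push-ed into one shared array (names here are illustrative, not from the question):

```python
# Sketch: one document per entry in a separate collection, so concurrent
# writers never contend on the same document. Names are illustrative.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
entries = client["app"]["order_entries"]

# Instead of $push-ing into orders.comments, insert a standalone document
# that references the parent order.
entries.insert_one({
    "orderId": "order-1",
    "user": "alice",
    "text": "first entry",
    "createdAt": datetime.now(timezone.utc),
})

# Reading the "array" back is a simple query on the reference field:
for entry in entries.find({"orderId": "order-1"}).sort("createdAt", 1):
    print(entry["text"])
```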

Related

MongoDB TTL doesn't delete documents if under load

Use case
I am using MongoDB to persist messages from a message queue system (e.g. RabbitMQ / Kafka). Each message has a timestamp, and based on that timestamp I want to expire the documents 1 hour later. Therefore I have a deleteAt field which is indexed with expireAfterSeconds: 0. Everything works fine, except when MongoDB is under heavy load.
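The setup described above would be created roughly like this (a pymongo sketch; database and collection names are placeholders):

```python
# Sketch of the TTL setup described above: each document expires when the
# wall clock passes its deleteAt value (expireAfterSeconds=0).
from datetime import datetime, timedelta, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
messages = client["queue"]["messages"]  # placeholder names

messages.create_index("deleteAt", expireAfterSeconds=0)

# Each message carries its own expiry: timestamp + 1 hour.
now = datetime.now(timezone.utc)
messages.insert_one({
    "payload": "...",
    "timestamp": now,
    "deleteAt": now + timedelta(hours=1),
})
```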
We are inserting roughly 5-7k messages / second into a single replica set. The TTL thread seems to be much slower than the rate at which messages come in, and thus storage is growing quickly (which is exactly what we wanted to avoid by using TTLs).
To describe the behaviour more precisely: when I sort the messages by deleteAt ascending (oldest date first), I can see that it sometimes does not delete any of those messages for hours. Because of this observation, I believe that the TTL thread is sometimes stuck or not active at all.
My question
What could I do to ensure that the TTL thread is not negatively impacted by the rate of incoming messages? According to our metrics, our only bottleneck seems to be CPU, even though the SSD disk I/O is pretty high too.
Do I need to tune something (e.g. give MongoDB more threads for document deletion) so that the TTL thread can keep up with the write rate?
I believe I am facing a known bug as described in MongoDB's Jira Dashboard: https://jira.mongodb.org/browse/SERVER-19334
From https://docs.mongodb.com/manual/core/index-ttl/:
The background task that removes expired documents runs every 60 seconds. As a result, documents may remain in a collection during the period between the expiration of the document and the running of the background task.
Because the duration of the removal operation depends on the workload of your mongod instance, expired data may exist for some time beyond the 60 second period between runs of the background task.
I'm not aware of any way to tune that TTL thread, and I suspect you'll need to run your own cron to do batched deletes.
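A rough sketch of such a batched cleanup job, run from your own scheduler (batch size, names, and connection details are placeholders):

```python
# Sketch of a self-managed batched delete, run periodically instead of
# relying on the TTL thread alone. Batch size and names are arbitrary.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
messages = client["queue"]["messages"]

def delete_expired(batch_size=1000):
    """Delete expired messages in small batches to limit lock/IO pressure."""
    now = datetime.now(timezone.utc)
    while True:
        # Pick a batch of expired _ids, oldest first, then delete them.
        ids = [d["_id"] for d in messages.find(
            {"deleteAt": {"$lte": now}}, {"_id": 1}
        ).sort("deleteAt", 1).limit(batch_size)]
        if not ids:
            break
        messages.delete_many({"_id": {"$in": ids}})

delete_expired()
```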
The other thing to look at might be what's taking up CPU and IO and see if there's any way of reducing that load.
You can create the index as "sparse"; this should perform the cleanup on a separate thread in the background.

How does Mongo's eventual consistency work with a large number of data writes?

I have a flow like this:
I have a Worker that's processing a "large" batch (say, 1M records) and storing the results in Mongo.
Once the batch is complete, a notification message is sent to Publish, which then pulls all the records from Mongo for final publication.
Let's say the Worker write process is done, i.e. it has sent all 1M records to Mongo through a driver. Mongo is "eventually consistent", so I'm not 100% guaranteed that all records are written to physical storage at the time Notify Publish happens.
When Publish does a 'find' and gets a cursor on the collection holding the batch records, is the cursor smart enough to handle the eventual consistency?
So in practical terms, let's imagine 750,000 records have actually been physically written by Mongo when Notify Publish happens and Publish does its find. Will the cursor traverse 750,000 records and stop, or will it block or otherwise handle the remaining 250,000 as they're eventually written to disk (which presumably is very likely to happen while the first 750K are being published)?
As @BlakesSeven already noted in the comments, "eventual consistency" refers to the fact that in a replicated environment, when a write is finished on the primary, it will only be written to the secondaries eventually. You can modify this behavior, at the cost of reduced write performance, by setting the write concern to > 1. Setting it to "majority" basically guarantees that a write operation is durable even in case of a failover, though at (in some cases) drastically reduced performance.
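For example, with pymongo a write concern can be attached to the collection handle so every write waits for a majority acknowledgement (connection string and names are placeholders):

```python
# Sketch: require acknowledgement from a majority of replica set members
# (plus journaling) before each write returns. Names are placeholders.
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
records = client["app"].get_collection(
    "batch_records",
    write_concern=WriteConcern(w="majority", j=True),
)

records.insert_one({"batchId": "batch-42", "value": 1})
```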
In general here is what happens when you do a write (simplified) with journaling enabled:
The operation is checked for being syntactically correct.
The query optimizer kicks in and does its work. (Irrelevant for this question, so I'll spare the details.)
The write operation is applied to the in-memory representation of the data set, called the "private view".
Every commitIntervalMs, the private view is synced to the journal, with a median of 15 or 50 ms, depending on the write concern.
On sync, the operation is applied to the shared view. IIRC, this is the point where a new connection would be provided with the new data.
So in order to ensure that the data will be readable by the new connection, simply delay the publish notification by commitIntervalMs + 1, which, given your batch size, is hardly noticeable.
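A trivial sketch of that delay (the 100 ms value here is only a placeholder; use whatever commitIntervalMs your deployment is actually configured with):

```python
# Sketch: delay the "publish" notification by commitIntervalMs + 1 ms so the
# last journal sync has happened before readers are notified.
import time

COMMIT_INTERVAL_MS = 100  # placeholder; match your server configuration

def notify_publish(send_notification):
    time.sleep((COMMIT_INTERVAL_MS + 1) / 1000.0)
    send_notification()
```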

When does mongodb update the index after a write operation is fired?

I want to know when an index gets updated after a write operation (insert/update/remove) is fired. Is it updated after the db file on disk is updated, or before that?
My understanding is that when the JOURNALED write concern is used, the data is written to the journal file (after ~33 ms) and then the ack is sent to the client. How do indexes add overhead to write operations here? When are they updated in this scenario?
Thanks!
The write includes all of the parts - modifying the data and modifying all the indexes.
The journal keeps track of "commit groups" so that it is able to replay operations completely and consistently. If your client received acknowledgement of a write, it means that all of the parts of that write, data and index (and oplog in the case of a replica node) have been completed.
For this reason, your write speed will be affected by the number of indexes that have to be updated when the document is written: the more indexes, the longer each write will take. In that way, MongoDB is quite similar to traditional RDBMSs.
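A quick, informal way to observe that overhead is to time the same bulk insert into a collection with no secondary indexes versus one with several (a rough benchmark sketch; all names and sizes are made up):

```python
# Rough sketch: compare insert time with and without secondary indexes.
# Collection and field names are made up for illustration.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["benchmark"]

def timed_insert(coll, n=10_000):
    docs = [{"a": i, "b": i % 100, "c": str(i)} for i in range(n)]
    start = time.perf_counter()
    coll.insert_many(docs)
    return time.perf_counter() - start

plain = db["no_indexes"]
indexed = db["three_indexes"]
for field in ("a", "b", "c"):
    indexed.create_index(field)

print("no secondary indexes:", timed_insert(plain))
print("three secondary indexes:", timed_insert(indexed))
```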

Does a MongoDB write concern make assurances about previous writes?

MongoDB has configurable durability: When doing an update operation, you can specify a "write concern" to tell the system that you want the update to only be considered complete when the data has (for example) been flushed to disk and replicated to X slaves.
Are there any assurances not just about the current update, but also about the writes that preceded it? If I want to update three documents, do I have to attach the expensive write concern to all of them, or is it sufficient to issue it with just the last operation?
Also, is this reasoning affected by using connection pools (i.e. the three updates being done over three different connections) and sharding (i.e. the three updates affecting more than one shard)?
If you use the same connection for multiple writes, a getLastError call with the j, safe, or fsync flag that returns successfully will indicate that the operations before the last one have already been saved/replicated to the other servers.
However, if any of those previous operations fail for some reason, you won't know about it unless you call getLastError for each operation. And if your writes are sent over different connections, there is no guarantee that all operations have been saved to disk/replicated to other servers.
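With modern drivers the same principle applies per operation: acknowledged writes report their own outcome instead of requiring a separate getLastError call, so an earlier failure surfaces immediately. A minimal pymongo sketch (names are placeholders):

```python
# Sketch: with acknowledged writes, each operation reports its own outcome,
# so a failure in an earlier write surfaces right away rather than being
# discovered later. Names are placeholders.
from pymongo import MongoClient
from pymongo.errors import PyMongoError

client = MongoClient("mongodb://localhost:27017")
docs = client["app"]["documents"]

for doc_id in ("a", "b", "c"):
    try:
        docs.update_one({"_id": doc_id}, {"$set": {"published": True}})
    except PyMongoError as exc:
        print(f"write for {doc_id} failed: {exc}")
        break
```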

MongoDB: are reads/writes to the database concurrent?

What happens when a million threads try to read from and write to MongoDB at the same time? Does locking happen at the db level, table level, or row level?
It happens at the db level; however, with Mongo 2.0 there are a few methods for concurrency, such as inserting/updating by the _id field.
You might run into concurrency problems, especially if you're working with a single MongoDB instance rather than a sharded cluster. The threads would likely start blocking each other as they wait for writes and other operations to complete and locks to be released.
Locking in MongoDB happens at the global level of the instance, but some operations since v2.0 will yield their locks (update by _id, remove, long cursor iteration). Collection-level locking will probably be added sometime soon.
If you need to have a large number of threads accessing MongoDB, consider placing a queue in front to absorb the impact of the concurrency contention, then execute the queued operations sequentially from a single thread.
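A minimal in-process sketch of that pattern (a real deployment would more likely use an external queue; names are illustrative):

```python
# Sketch: funnel concurrent write requests through a queue and apply them
# sequentially from one worker thread. In practice this would more likely be
# an external queue (e.g. RabbitMQ/Kafka); names are illustrative.
import queue
import threading
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["app"]["events"]

work: "queue.Queue" = queue.Queue()

def writer():
    while True:
        doc = work.get()
        if doc is None:          # sentinel to stop the worker
            break
        coll.insert_one(doc)     # only this thread ever touches MongoDB
        work.task_done()

threading.Thread(target=writer, daemon=True).start()

# Any number of producer threads just enqueue:
work.put({"event": "click", "user": "alice"})
```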