I want to know when does an index gets updated after write operation(insert/update/remove) is fired. Is it updated after the db-file on disk is updated or before that?
My understanding is that when JOURNALED write concern is used, the data is written to journal file (after ~33 ms) and then the ack is be sent to client. How does indexes add overhead to write operations here? When are they updated in this scenario?
Thanks!
The write includes all of the parts - modifying the data and modifying all the indexes.
The journal keeps track of "commit groups" so that it is able to replay operations completely and consistently. If your client received acknowledgement of a write, it means that all of the parts of that write, data and index (and oplog in the case of a replica node) have been completed.
For this reason, your write speed will be affected by the number of indexes that have to be updated when the document is written: the more indexes, the longer each write will take, in that way, MongoDB is quite similar to traditional RDBMSs.
Related
In replica mode each write operation to any collection in any DB, also writes to the oplog collection.
Now, when writing to multiple DBs in parallel, all these write operations also write to the oplog.
My question: do these write operations require locking the oplog ? (I'm using w:1 write concern). If they do, this is kind of similar to having a global lock between all the write operations to all the different DBs, isn't it ?
I'd be happy to get any hints on this.
According to the documentation, in replication, when MongoDB writes to a collection on the primary, MongoDB also writes to the primary’s oplog, which is a special collection in the local database. Therefore, MongoDB must lock both the collection’s database and the local database. The mongod must lock both databases at the same time to keep the database consistent and ensure that write operations, even with replication, are “all-or-nothing” operations.
This means that concurrent writing to multiple database in parallel on the primary can result in global locks between all the write operations. This is not applicable to the secondary, as MongoDB does not apply writes serially to secondaries, but instead collects oplog entries in batches and then apply those batches in parallel.
Disclaimer This is all of the top off my head, so please do not crucify me if I have a mistake. However, please correct me.
Why should they?
Premise: Databases, by definition, are not interconnected
oplog entries are always idempotent
The Oplog is a capped collection, with a guarantee of preserving the insert order
Let's assume true parallelism of queries being applied. So, we have two queries arriving at the very same time and we'd need to decide which one to insert to the oplog first. The first one taking the lock will write first, right? Except, there is a problem. Let's assume the first query is a simple one db.collection.update({_id:"foo"},{$set:{"bar":"baz"}}) while the other query is more complicated and therefor takes longer to evaluate for correctness. So in order to prevent that, a lock had to be taken on arrival and released after the idempotent oplog entry was written.
Here is where I have to rely on my memory
However, queries aren't applied in parallel. Queries are queued and evaluated in order of arrival. The database get's locked upon the application of the queries after they ran through the query optimizer. During that lock the idempotent oplog queries are written to the oplog. Since databases are not interconnected and only one query can be applied to a database at any given time, the lock on the database is sufficient. No two data changing queries can be applied to the same database concurrently anyway, so why should a lock be set on the oplog?
Apparently, a lock is take on the local database. However, since a lock is already taken on the data, I do not see the reason why. *scratchingMyHead*
I have a flow like this:
I have a Worker that's processing a "large" batch (say, 1M records) and storing the results in Mongo.
Once the batch is complete, a notification message is sent to Publish, which then pulls all the records from Mongo for final publication.
Let's say the Worker write process is done, i.e. it has sent all 1M records to Mongo through a driver. Mongo is "eventually consistent" so I'm not 100% guaranteed all records are written to physical storage at the time the Notify Publish happens.
When Publish does a 'find' and gets a cursor on the collection holding the batch records, is the cursor smart enough to handle the eventual consistency?
So in practical terms let's imagine 750,000 records are actually physically written by Mongo when Notify Publish happens and Publish does its find. Will the cursor traverse 750,000 records and stop or will it block or otherwise handle the remaining 250,000 as they're eventually written to disk (which presumably is very likely to happen while publishing of the first 750K)?
As #BlakesSeven already noted in the comments, "eventual consistency" refers to the fact that in a replicated environment, when a write is finished on the primary, it will only be written to the secondaries eventually. You can modify this behavior at the cost of reduced write performance by setting the write concern to > 1. Setting it to "majority" basically guarantees that a write operation is durable even in case of a failover – though at a (in some cases) drastically reduced performance.
In general here is what happens when you do a write (simplified) with journaling enabled:
The operation is checked for being syntactically correct.
The query optimizer kicks in and does his stuff. (Irrelevant for this question, so I spare the details).
The write operation is applied to the in memory representation of the data set called "private view".
Every commitIntervalMs, the private view is synced to the journal, with a median of 15 or 50ms, depending on the write concern.
On sync, the operation is applied to the shared view. Iirc, this is the point where a new connection would be provided with the new data.
So in order to ensure that the data will be readable by the new connection, simply delay the publish notification by commitIntervalMs + 1, which, given your batch size, is hardly noticeable.
I have a large collection with more that half a million of docs, which I need to updated continuously. To achieve this, my first approach was to use w=1 to ensure write result, which causes a lot of delay.
collection.update(
{'_id': _id},
{'$set': data},
w=1
)
So I decided to use w=0 in my update method, now the performance got significantly faster.
Since my past bitter experience with mongodb, I'm not sure if all the update are guaranteed when w=0. My question is, is it guaranteed to update using w=0?
Edit: Also, I would like to know how does it work? Does it create an internal queue and perform update asynchronously one by one? I saw using mongostat, that some update is being processed even after the python script quits. Or the update is instant?
Edit 2: According to the answer of Sammaye, link, any error can cause silent failure. But what happens if a heavy load of updates are given? Does some updates fail then?
No, w=0 can fail, it is only:
http://docs.mongodb.org/manual/core/write-concern/#unacknowledged
Unacknowledged is similar to errors ignored; however, drivers will attempt to receive and handle network errors when possible.
Which means that the write can fail silently within MongoDB itself.
It is not reliable if you wish to specifically guarantee. At the end of the day if you wish to touch the database and get an acknowledgment from it then you must wait, laws of physics.
Does w:0 guarantee an update?
As Sammaye has written: No, since there might be a time where the data is only applied to the in memory data and is not written to the journal yet. So if there is an outage during this time, which, depending on the configuration, is somewhere between 10 (with j:1 and the journal and the datafiles living on separate block devices) and 100ms by default, your update may be lost.
Please keep in mind that illegal updates (such as changing the _id of a document) will silently fail.
How does the update work with w:0?
Assuming there are no network errors, the driver will return as soon it has send the operation to the mongod/mongos instance with w:0. But let's look a bit further to give you an idea on what happens under the hood.
Next, the update will be processed by the query optimizer and applied to the in memory data set. After sucessful application of the operation a write with write concern w:1 would return now. The operations applied will be synced to the journal every commitIntervalMs, which is divided by 3 with write concern j:1. If you have a write concern of {j:1}, the driver will return after the operations are stored in the journal successfully. Note that there are still edge cases in which data which made it to the journal won't be applied to replica set members in case a very "well" timed outage occurs now.
By default, every syncPeriodSecs, the data from the journal is applied to the actual data files.
Regarding what you saw in mongostat: It's granularity isn't very high, you might well we operations which took place in the past. As discussed, the update to the in memory data isn't instant, as the update first has to pass the query optimizer.
Will heavy load make updates silently fail with w:0?
In general, it is safe to say "No." And here is why:
For each connection, there is a certain amount of RAM allocated. If the load is so high that mongo can't allocate any further RAM, there would be a connection error – which is dealt with, regardless of the write concern, except for unacknowledged writes.
Furthermore, the application of updates to the in memory data is extremely fast - most likely still faster than they come in in case we are talking of load peaks. If mongod is totally overloaded (e.g. 150k updates a second on a standalone mongod with spinning disks), problems might occur, of course, though even that usually is leveraged from a durability point of view by the underlying OS.
However, updates still may silently disappear in case of an outage when the write concern is w:0,j:0 and the outage happens in the time the update is not synced to the journal.
Notes:
The optimal balance between maximum performance and minimal guaranteed durability is a write concern of j:1. With a proper setup, you can reduce the latency to slightly over 10ms.
To further reduce the latency/update, it might be worth having a look at bulk write operations, if those apply to your use case. In my experience, they do more often than not. Please read and try before dismissing the idea.
Doing write operations with w:0,j:0 is highly discouraged in case you expect any guarantee on data durability. Use a t your own risk. This write concern is only meant for "cheap" data, which is easy to reobtain or where speed concern exceeds the need for durability. Collecting real time weather data in a large scale would be an example – the system still works, even if one or two data points are missing here and there. For most applications, durability is a concern. Conclusion: use w:1,j:1 at least for durable writes.
I've been read a lot about MongoDB recently, but one topic I can't find any clear material on, is how data is written to the journal and oplog.
So this is what I understand of the process so far, please correct me where I'm wrong
A client connect to mongod and performs a write. The write is stored in the socket buffer
When Mongo is available (not sure what available means at this point), data is written to the journal?
The mongoDB docs then say that writes every 60 seconds are flushed from the journal onto disk. By this I can only assume this mean written to the primary and the oplog. If this is the case, how to writes appear earlier than the 60 seconds sync interval?
Some time later, secondaries suck data from the primary or their sync source and update their oplog and databases. It seems very vague about when exactly this happens and what delays it.
I'm also wondering if journaling was disabled (I understand that's a really bad idea), at what point does the oplog and database get updated?
Lastly I'm a bit stumpted at which points in this process, the write locks get created. Is this just when the database and oplog are updated or at other times too?
Thanks to anyone who can shed some light on this or point me to some reading material.
Simon
Here is what happens as far as I understand it. I simplified a bit, but it should make clear how it works.
A client connects to mongo. No writes done so far, and no connection torn down, because it really depends on the write concern what happens now.Let's assume that we go with the (by the time of this writing) default "acknowledged".
The client sends it's write operation. Here is where I am really not sure. Either after this step or the next one the acknowledgement is sent to the driver.
The write operation is run through the query optimizer. It is here where the acknowledgment is sent because with in an acknowledged write concern, you may be returned a duplicate key error. It is possible that this was checked in the last step. If I should bet, I'd say it is after this one.
The output of the query optimizer is then applied to the data in memory Actually to the data of the memory mapped datafiles, to the memory mapped oplog and to the journal's memory mapped files. Queries are answered from this memory mapped parts or the according data is mapped to memory for answering the query. The oplog is read from memory if present, too.
Every 100ms in general the journal is synced to disk. The precise value is determined by a number of factors, one of them being the journalCommitInterval configuration parameter. If you have a write concern of journaled, the driver will be notified now.
Every syncDelay seconds, the current state of the memory mapped files is synced to disk I think the journal is truncated to the entries which weren't applied to the data yet, but I am not too sure of that since that it should basically never happen that data in the journal isn't yet applied to the current data.
If you have read carefully, you noticed that the data is ready for the oplog as early as it has been run through the query optimizer and was applied to the files mapped into memory. When the oplog entry is pulled by one of the secondaries, it is immediately applied to it's data of the memory mapped files and synced in the disk the same way as on the primary.
Some things to note: As soon as the relatively small data is written to the journal, it is quite safe. If a node goes down between two syncs to the datafiles, both the datafiles and the oplog can be restored from their last state in the datafiles and the journal. In general, the maximum data loss you can have is the operations recorded into the log after the last commit, 50ms in median.
As for the locks. If you have written carefully, there aren't locks imposed on a database level when the data is synced to disk. Write locks may be created in order to assure that only one thread at any given point in time modifies a given document. There are other write locks possible , but in general, they should be rather rare.
Write locks on the filesystem layer are created once, though only implicitly, iirc. During application startup, a lock file is created in the root directory of the dbpath. Any other mongod instance will refuse to do any operation on those datafiles while a valid lock exists. And you shouldn't either ;)
Hope this helps.
MongoDB has configurable durability: When doing an update operation, you can specify a "write concern" to tell the system that you want the update to only be considered complete when the data has (for example) been flushed to disk and replicated to X slaves.
Are there any kind of assurances about not the current update, but the writes that preceded it? If I want to update three documents, do I have to tag the expensive write concern on all of them, or is it sufficient to issue it with just the last operation?
Also, is this reasoning affected by using connection pools (i.e. the three updates being done over three different connections) and sharding (i.e. the three updates affecting more than one shard)?
If you use the same connection for multiple writes, getLastError with j, safe, fsync flag that returns successfully will indicate that the previous operations before the last operation are getting saved/replicated to other servers already.
However, if those previous operations fail for any reasons, you wouldn't know about it if you don't call getLastError for each operation. However, if your writes are sent from different connections, there is no guaranteed that all operations are saved to disk/replicated to other servers.