I have a question about MongoDB locks. Basically I have to perform some write operations on a collection (insert/delete/update). When I read this link, Locking in MongoDB, it says: "Locks are “writer greedy,” and when a write lock exists, a single write operation holds the lock exclusively, and no other read or write operations may share the lock."
My question is: is the lock based on memory blocks, or is there a single lock on the entire DB? What I was thinking is to concurrently run 2 scripts, each scanning a different memory block of MongoDB (planning to scan 2 million documents in one query), and perform write operations side by side, thereby increasing performance and saving time.
I searched the net about this but didn't find anything satisfactory.
Any help will be deeply appreciated.
The write lock has nothing to do with memory. MongoDB is not an in-memory database; the OS merely caches the working set of the mongod process in RAM. MongoDB has no memory hooks in its program.
The write lock is also at the database level, so your plan is not feasible.
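For what it's worth, you can confirm the granularity yourself by asking the server for its lock statistics. A rough PyMongo sketch (the connection string is a placeholder, and the exact sections returned vary by server version):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
status = client.admin.command("serverStatus")
# On 2.x servers the "locks" section has one entry per database plus a global entry.
print(status.get("globalLock"))
print(status.get("locks"))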
Related
While researching how to check the size of a MongoDB, I found this comment:
Be warned that dbstats blocks your database while it runs, so it's not suitable in production. https://jira.mongodb.org/browse/SERVER-5714
Looking at the linked bug report (which is still open), it quotes the Mongo docs as saying:
Command takes some time to run, typically a few seconds unless the .ns file is very large (via use of --nssize). While running other operations may be blocked.
However, when I check the current Mongo docs, I don't find that text. Instead, they say:
The time required to run the command depends on the total size of the database. Because the command must touch all data files, the command may take several seconds to run.
For MongoDB instances using the WiredTiger storage engine, after an unclean shutdown, statistics on size and count may be off by up to 1000 documents as reported by collStats, dbStats, count. To restore the correct statistics for the collection, run validate on the collection.
Does this mean the WiredTiger storage engine changed this to a non-blocking call by keeping ongoing stats?
A bit late to the game, but I found this question while looking for the answer, and the answer is: yes. Until 3.6.12 / 4.0.5 the command acquired a "shared" lock ("R"), which blocked all write requests during its execution. After that it acquires an "intent shared" lock ("r"), which doesn't block write requests. Read requests were never impacted.
Source: https://jira.mongodb.org/browse/SERVER-36437
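For reference, a minimal way to run the command from PyMongo (connection details and database name are placeholders):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["mydb"]                                # placeholder database name
# Per SERVER-36437, on 3.6.12+/4.0.5+ this takes an intent-shared ("r") lock
# and no longer blocks writes; on older versions it takes a shared ("R") lock.
stats = db.command("dbStats")
print(stats["dataSize"], stats["storageSize"])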
I use MongoDB 2.4 with a single DB.
I find all items in a collection (50,000+) and, for each one, I insert it into another collection.
it = coll1.find()
while (it.hasNext()) {
    coll2.save(it.next())
}
Is it a performance issue to make intensive writes while a cursor is open on the same database?
This essentially comes down to a question about concurrency ( http://docs.mongodb.org/manual/faq/concurrency/ ): whether reads against a single database-level, writer-greedy lock can perform well while a write-intensive load is running.
MongoDB should be able to juggle your read lock with the write lock quite well here, interleaving operations and yielding the current operation under certain conditions it sees fit, to keep performance up (see the link supplied above).
This is, of course, in contrast to SQL, where read and write operations are isolated; as such, MongoDB's concurrency rules actually break the I in ACID. Of course, in SQL the lock is much more granular, so you would normally get comparable performance.
If you do see a performance hit, mainly due to IO (reading requires IO as well remember) then you might find it prudent to batch your writes into groups of maybe 1000, taking about a 5 second break after each batch to let the IO subside.
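A rough sketch of that batching idea with PyMongo (written against a modern driver purely for illustration; the names, batch size, and pause length are placeholders to tune):

import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["mydb"]                                # placeholder database name
coll1, coll2 = db["coll1"], db["coll2"]

BATCH_SIZE = 1000
batch = []
for doc in coll1.find():
    batch.append(doc)
    if len(batch) >= BATCH_SIZE:
        coll2.insert_many(batch)  # one bulk write per 1000 documents
        batch = []
        time.sleep(5)             # let the IO subside, as suggested above
if batch:
    coll2.insert_many(batch)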
No, as cursors are not atomic. Each read is its own atomic transaction. This means that MongoDB is not subject to the issue of ensuring that the cursor represents a single snapshot in time.
I must be wrong about this. I'm considering using mongodb in my project, but I read this:
http://docs.mongodb.org/manual/faq/concurrency/#what-type-of-locking-does-mongodb-use
It says that mongodb uses a database level reader-writer lock.
MySQL InnoDB uses row-level locking. Well, doesn't that mean that, theoretically, MongoDB is two levels slower than MySQL for concurrent access?
If you look up readers-writer lock, you will find that it is a completely different animal from the database lock MySQL is referring to when it uses the phrase "row-level locking".
A readers-writer lock protects shared memory access and is therefore extremely short-lived (on the order of microseconds). Since MongoDB operations are only atomic at the document level, these locks (in traditional databases they are sometimes referred to as latches and are used to guard index access) are only held for as long as it takes to update a single document in memory.
A regular "database lock" will usually exist until the transaction in progress has either been committed or rolled back. Because RDBMS transactions can span multiple operations across many tables, these locks are normally much longer-lived and must therefore be much more granular to allow other work to happen concurrently.
doesn't that mean that, theoretically, MongoDB is two levels slower than MySQL for concurrent access?
No, it really does not, and depending on your exact workload it could be a lot faster, a little faster, or slower - it all depends on the types of operations you are doing, your available physical resources, the structure of your data, and the needs of your application.
Applications that write a lot of data to the database in MongoDB tend to be limited primarily by the available disk IO throughput. Only when the available disk bandwidth exceeds the amount of writes the application makes to the database would you see concurrency become a factor with MongoDB. With relational databases, because locks live much longer, concurrency can become a factor much earlier, even with a relatively small amount of total data being written.
MongoDB is from the NoSQL era, and isn't a lock something related to RDBMSs? From Wikipedia:
Optimistic concurrency control (OCC) is a concurrency control method for relational database management systems...
So why do I find is_locked in PyMongo, and why does a lock still exist even in a driver that makes non-blocking calls? Motor has is_locked too.
NoSQL does not automatically mean no locks.
There are always some operations that require a lock - building an index, for example.
And the official MongoDB documentation is a more reliable source than Wikipedia (no offense meant to Wikipedia :) )
http://docs.mongodb.org/manual/faq/concurrency/
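An index build is a good illustration: on older servers you could request a background build to avoid holding the database write lock for the duration (a PyMongo sketch with placeholder names; since MongoDB 4.2 the background option is ignored because all index builds use a less blocking process):

from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["mydb"]["users"]  # placeholder names
# A foreground build held the database write lock for the whole build;
# background=True asked the server to build the index without that long lock.
coll.create_index("email", background=True)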
Mongo does in-place updates, so it needs to lock in order to modify the database. There are other things that need locks too, so read the link @Tigra provided for more info.
This is pretty standard as far as databases go, and it isn't an RDBMS-specific thing (Redis also does this, but on a per-key basis).
There are plans to implement collection-level (instead of database-level) locking: https://jira.mongodb.org/browse/SERVER-1240
Some databases, like CouchDB, get around the locking problem by only appending new documents. They create a new, unique revision id and once the document is finished writing, the database points to the new revision. I'm sure there's some kind of concurrency control when changing which revision is used, but it doesn't need to block the database to do that. There are certain downsides to this, such as compaction needing to be run regularly.
MongoDB implements database-level locking. This means that operations which are not atomic will lock at the database level, unlike SQL, where most technologies lock at the table level for basic operations.
In-place updates only occur with certain operators - $set being one of them. The MongoDB documentation used to have a page that listed all of them, but I can't find it now.
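For example, an update like the following touches only one field and so can be applied in place when the document does not grow (modern PyMongo syntax shown purely for illustration; the names are made up):

from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["mydb"]["items"]  # placeholder names
# $set changes just the named field rather than replacing the whole document.
coll.update_one({"_id": 42}, {"$set": {"status": "active"}})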
MongoDB currently implements a read/write lock whereby each is separate but they can block each other.
Locks are utterly vital to any database. For example, how can you ensure a consistent read of a document if it is currently being written to? And if you write to a document, how do you ensure that you apply only that single update at once, and not multiple updates at the same time?
I am unsure how version control can prevent this in CouchDB; locks are really quite vital for a consistent read and are separate from version control. For example, what if you wish to apply a read lock to the same version, or read a document that is currently being written to a new revision? You will obviously see a lock queue appear. Even though version control might help a little with write-lock saturation, there will still be a write lock and it will still need to operate at some level.
As for concurrency features: MongoDB has the ability, for one, to yield an operation if the data is not in RAM, so locks will not just sit there waiting for data to be paged in; other operations will run in the meantime.
As a side note, MongoDB actually has more locks than this: it also has a JavaScript lock, which is global and blocking and does not have the normal concurrency features of regular locks.
and even in a driver that makes non-blocking calls
Hmm, I think you might be confused by what is meant by a "non-blocking" application or server: http://en.wikipedia.org/wiki/Non-blocking_algorithm
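To illustrate the difference: with Motor, the driver call is non-blocking in the sense that your event loop is free while the query is in flight, which says nothing about what locks the server takes internally. A rough sketch with made-up names:

import asyncio
from motor.motor_asyncio import AsyncIOMotorClient

async def main():
    client = AsyncIOMotorClient("mongodb://localhost:27017")  # placeholder connection string
    # The await suspends this coroutine, not the whole process, while the
    # server does its work (including taking whatever locks it needs).
    doc = await client["mydb"]["items"].find_one({"status": "active"})
    print(doc)

asyncio.run(main())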
I want to avoid doing two operations to achieve the following:
Find a document and update it with modifier-1.
If the document does not exist, populate the default fields with modifier-2, then update with modifier-1.
It's a common pattern, so it should be possible. At the moment I am having to do two upserts.
(Feel free to adjust the pseudocode; I am new to the query language.)
update( {...}, modifier-1, true)
if (upserted)
{
    // check for a race condition: detect whether another query from another
    // thread has already populated the default values
    update( {..., if_a_default_value_does_not_exist}, modifier-2, true)
}
I assume that two operations would result in two disk writes; I understand MongoDB does asynchronous disk writes. If I can't do this with one operation, is there some sort of mechanism in place that would merge the writes into a single write before writing to the journal / disk? And yes, this would make a significant difference in loading my 300 GB data set :D
Hassan,
The asynchronous writes to disk you mentioned are accomplished by writing the changes to memory and then fsyncing them to disk periodically in the background, so merging the two operations would likely not impact performance here as much as you would think.
The journal is another matter entirely - it is written separately to disk in an idempotent manner for safety to allow for easier recovery/restoration in case of failure or other similar issues. You can always start the DB with journaling off, do the import, and then restart with journaling enabled once the bulk update is done if the journal writes are causing you significant issues.
Finally, be careful with the "not exists" logic in your second modifier - from an indexing perspective a positive operator such as exists is preferred; otherwise indexes may not be used, and that will certainly slow down your inserts.
Away from bulk inserts: for single atomic updates you can also explore the use of findAndModify (http://www.mongodb.org/display/DOCS/findAndModify+Command) to do the check and subsequent change for you. It's hard to tell from the description whether that would be a good fit, because it has its own drawbacks.
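For illustration only (not part of the original answer): one way to collapse this into a single round trip is findAndModify combined with the $setOnInsert operator, which was added in MongoDB 2.4 and applies its fields only when the upsert actually creates the document. A rough modern-PyMongo sketch with stand-in names and modifiers:

from pymongo import MongoClient, ReturnDocument

coll = MongoClient("mongodb://localhost:27017")["mydb"]["stats"]  # placeholder names
doc = coll.find_one_and_update(
    {"_id": "counter-1"},                     # placeholder filter
    {"$inc": {"hits": 1},                     # stand-in for modifier-1
     "$setOnInsert": {"created": True}},      # stand-in for the default fields
    upsert=True,
    return_document=ReturnDocument.AFTER,
)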