What level does mongodb write lock take place? - mongodb

I am beginning to research technology for a project, that can have frequent large writes. I am wondering at what level does mongo write lock take place? Is it at the server level or database level? I have read http://www.mongodb.org/display/DOCS/How+does+concurrency+work but official documentation says. a write operation can block all other operations.
To me this means write locks are server level but I am hoping they are db level. Could someone please confirm or deny this?

At the moment, MongoDB does indeed have a global server lock. However, there is some additional code that will release the lock in case memory blocks have to be loaded from disk. It uses lock-yielding for that. Although this does not solve all concurrency issues, it addresses quite a few of the generally associated problems. This post describes it well: http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html
From MongoDB 2.2, there will be a database-level lock, and also more work on yielding is done.

Related

Transactional guarantee in mongodb

So, I am doing research on MongoDB, in line with upper management decision to embrace open source and to migrate existing product database from SQL Server to MongoDB and revamp the entire thing. Do note that our database should focus on data consistency and transactional guarantee.
And i discover this post: Click here. A summary of the post is as follow:
MongoDB claims to be strongly consistent, but a lot of evidence
recently has shown this not to be the case in certain scenarios (when
network partitioning occurs, which can happen under heavy load). This
means that you can potentially lose records that MongoDB has
acknowledged as "successfully written".
In terms of your application, if you have a need to have transactional
guarantees (meaning if you can't make a durable write, you need the
transaction to fail), you should avoid MongoDB. Example scenarios
where strong consistency and durability are essential include "making
a deposit into a bank account" or "creating a record of birth". To put
it another way, these are scenarios where you would get punched in the
face by your customer if you indicated an operation succeeded and it
didn't.
So, my question is as follow:
1) To what extend does "lost data" still valid in current version of MongoDB?
2) What approach can be take to ensure transactional guarantee in MongoDB?
I am pretty sure that if company like PayPal do use MongoDB, there is certainly a way of overcoming these issue.
The references in that post have been discussed here before (for example, here is one: MongoDB: Does Write Concern guarantee the write on primary plus atleast one secondary ). No need to duplicate your question.
The blog "Aphyr" mostly uses these articles to tout it's own tech (if you read the entire blog you will realise they have their own database which they market). Every database they show loses data except their own.
2) What approach can be take to ensure transactional guarantee in MongoDB?
I agree you should be handling database problems in client code, if not then how is your client side ever going to remain consistent in the event of partitions?
Since you are not Harry Potter (are you?) I will say that you need to check for exceptions thrown in your client code and react to them as needed.
1) To what extend does "lost data" still valid in current version of MongoDB?
As for the bug he mentions in 2.4.3: he fails (as I mention in the linked post) to state the bug reference so again, no comment still.
Besides 2 writes in 6,000? That's less data loss than I have seen in MySQL on a partition! So not too shabby.
I have not noticed such behaviour in my own app and, from small to extremely large sites, I have not noticed anyone reproduce benchmark type scenarios as displayed in that article, I doubt very much you will.
I am pretty sure that if company like PayPal do use MongoDB, there is certainly a way of overcoming these issue.
They would have tight coding to ensure consistency in distributed environments.
Of course, they would start by choosing the right tech for the situation...
Write Concern Reference
Write concern describes the level of acknowledgement requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters. In sharded clusters, mongos instances will pass the write concern on to the shards.
https://docs.mongodb.org/v3.0/reference/write-concern/

Why using Locking in MongoDB?

MoongoDB is from the NoSql era, and Lock is something related to RDBMS? from Wikipedia:
Optimistic concurrency control (OCC) is a concurrency control method for relational database management systems...
So why do i find in PyMongo is_locked , and even in driver that makes non-blocking calls, Lock still exists, Motor has is_locked.
NoSQL does not mean automatically no locks.
There always some operations that do require a lock.
For example building of index
And official MongoDB documentation is a more reliable source than wikipedia(none offense meant to wikipedia :) )
http://docs.mongodb.org/manual/faq/concurrency/
Mongo does in-place updates, so it needs to lock in order to modify the database. There are other things that need locks, so read the link #Tigra provided for more info.
This is pretty standard as far as databases and it isn't an RDBMS-specific thing (Redis also does this, but on a per-key basis).
There are plans to implement collection-level (instead of database-level) locking: https://jira.mongodb.org/browse/SERVER-1240
Some databases, like CouchDB, get around the locking problem by only appending new documents. They create a new, unique revision id and once the document is finished writing, the database points to the new revision. I'm sure there's some kind of concurrency control when changing which revision is used, but it doesn't need to block the database to do that. There are certain downsides to this, such as compaction needing to be run regularly.
MongoDB implements a Database level locking system. This means that operations which are not atomic will lock on a per database level, unlike SQL whereby most techs lock on a table level for basic operations.
In-place updates only occur on certain operators - $set being one of them, MongoDB documentation did used to have a page that displayed all of them but I can't find it now.
MongoDB currently implements a read/write lock whereby each is separate but they can block each other.
Locks are utterly vital to any database, for example, how can you ensure a consistent read of a document if it is currently being written to? And if you write to the document how do you ensure that you only apply that single update at once and not multiple updates at the same time?
I am unsure how version control can stop this in CouchDB, locks are really quite vital for a consistent read and are separate to version control, i.e. what if you wish to apply a read lock to the same version or read a document that is currently being written to a new revision? You will obviously see a lock queue appear. Even though version control might help a little with write lock saturation there will still be a write lock and it will still need to work on a level.
As for concurrency features; MongoDB has the ability (for one), if the data is not in RAM, to subside a operation for other operations. This means that locks will not just sit there waiting for data to be paged in and other operations will run in the mean time.
As a side note, MongoDB actually has more locks than this, it also has a JavaScript lock which is global and blocking, it does not have the normal concurrency features of regular locks.
and even in driver that makes non-blocking calls
Hmm I think you might be confused by what is meant as a "non-blocking" application or server: http://en.wikipedia.org/wiki/Non-blocking_algorithm

Is it possible to default all MongoDB writes to safe? What is the performance hit from doing this?

For MongoDB 2.2.2, is it possible to default all writes to safe, or do you have to include the right flags for each write operation individually?
If you use safe writes, what is the performance hit?
We're using MongoMapper on Rails.
If you are using the latest version of 10gen official drivers, then the default actually is safe, not fire-and-forget (which used to be the default).
You can read this blog post by 10gen co-founder and CTO which explains some history and announces that all MongoDB drivers as of late November use "safe" mode by default rather than "fire-and-forget".
MongoMapper is built on top of 10gen supported Ruby Driver, they also updated their code to be consistent with the new defaults. You can see the check-in and comments here for the master branch. Since I'm not certain what their release schedule is, I would recommend you ask on MongoMapper mailing list.
Even prior to this change you could set "safe" value on connection level in MongoMapper which is as good as global. Starting with 0.11, you can do it in mongo.yml file. You can see how in the release notes.
The bottom line is that you don't have to specify safe mode for each write, but you can still specify higher or lower than default durability for each individual write operation if you wish, but when you switch to the latest versions of everything, then you will be using "safe writes" globally by default.
I do not use mongomapper so I can only answer a little.
In terms of the database, depends. A safe write is basically (basically being the keyword) waiting for the database to do what it would normally do after you got a default "I am done" response from a fire and forget.
There is more work depending on how safe you want the write to be. A good example is a write to a single node and one to many nodes. If you write to that single node you will get a quicker response from the database than if you wish to replicate the command (safely) to other nodes in the network.
Any amount of safe write does, of course, cause a performance hit in the amount of writes you can send to the server since there is more work required before a response is given which means less writes able to be thrown at the database. The key thing is getting the balance just right for your application, between speed and durability of your data.
Many drivers now (including MongoDB PHP 1.3, using a write concern of 1: http://php.net/manual/en/mongo.writeconcerns.php ) are starting to use normal safe writes by default and normal fire and forget queries are starting to abolished by default.
By looking at the mongomapper documentation: http://mongomapper.com/documentation/plugins/safe.html it seems you must still add the flags everywhere.

mongodb: atomically rename two collections?

I have two existing collections "A" and "B". I need to rename "B" to "C", and rename "A" to "B", without permitting any writes to B during that time. The rename itself activates the global lock, but I need to prevent writes from occurring in between renames. Is this possible?
Here's my code:
db.B.renameCollection('C')
<-- prevent writes from occurring to B in between commands
db.A.renameCollection('B')
Edit: I'm using mongodb version 1.8.1, and changing versions is not currently an option.
Mongodb itself cannot handle this, the only way you could do this is with some custom code.
If this will only occur one time in your app ( I guess renaming collections is not something that is done often ) you could have a more 'aggressive' approach, where you search for a flag in your database that will mean 'collection db.B has been renamed but db.A not yet'. If all your writes check for this before submitting the write to the server and just return if the flag is set, it can protect the app from writing to db.A after db.B is renamed.
I consider this the 'aggressive' approach since it clearly affects performance ( still, reads are so fast, you probably won't feel it ).
If your app runs on a single web server (and not a web farm) you can have the synchronization mechanism on the web app itself, using thread synchronization tools like semaphores, etc or even some thread safe variable that will be used as the flag I suggested above. (depends on the server side technology you are using )
You can create a function named "renameCollection" and put a lock on it :
db.runCommand({eval:renameCollection,args:["Collection1","Collection2"],nolock:false});
The lock allows to do this kind of operations safely and make wait the requests
As you could guess: this is not possible. No transaction support, only atomic operations.
MongoDB has no sense of transactional renames, in fact I am not sure if SQL does in this case either, however you could accomplish this with a bit of server-side programming and a lock collection.
From your server side language you can fire off the commands while writing a row to a lock table, each query against B will check for lock, if not found will write otherwise will bail out.
This is a simple method however most likely a bit tedious, especially if you have a very segmented code base that does not house a standardised query layer between the server-side code and the database.
I should also note that renameCollection will not work on sharded collections, you most likely already knew that but I thought I would just say it anyway. In the case of sharded collection it would be better to "move" the collection instead via copy OPs.
I work for Tokutek on TokuMX with multi-statement transactions.
As other answers have said, MongoDB cannot do this (to the best of my knowledge), but TokuMX can. TokuMX has multi-statement transactions on non-sharded clusters. To perform this operation, you can do:
db.beginTransaction()
db.B.renameCollection('C')
db.A.renameCollection('B')
db.commitTransaction()

Is there any method to guarantee transaction from the user end

Since MongoDB does not support transactions, is there any way to guarantee transaction?
What do you mean by "guarantee transaction"?
There are two conepts in MongoDB that are similar;
Atomic operations
Using safe mode / getlasterror ...
http://www.mongodb.org/display/DOCS/Last+Error+Commands
If you simply need to know if there was an error when you run an update for example you can use the getlasterror command, from the docs ...
getlasterror is primarily useful for
write operations (although it is set
after a command or query too). Write
operations by default do not have a
return code: this saves the client
from waiting for client/server
turnarounds during write operations.
One can always call getLastError if
one wants a return code.
If you're writing data to MongoDB on
multiple connections, then it can
sometimes be important to call
getlasterror on one connection to be
certain that the data has been
committed to the database. For
instance, if you're writing to
connection # 1 and want those writes to
be reflected in reads from connection #2, you can assure this by calling getlasterror after writing to
connection # 1.
Alternatively, you can use atomic operations for cases where you need to increment a value for example (like an upvote, etc.) more about that here:
http://www.mongodb.org/display/DOCS/Atomic+Operations
As a side note, MySQL's default storage engine doesn't have transaction either! :)
http://dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html
MongoDB only supports atomic operations. There is no ways implement transaction in the sense of ACID on top of MongoDB. Such a transaction support must be implemented in the core. But you will never see full transaction support due to the CARP theorem. You can not have speed, durability and consistency at the same time.
I think ti's one of the things you choose to forego when you choose a NoSQL solution.
If transactions are required, perhaps NoSQL is not for you. Time to go back to ACID relational databases.
Unfortunately MongoDB does't support transaction out of the box, but actually you can implement ACID optimistic transactions on top on it. I wrote an example and some explanation on a GitHub page.