I have a question about mongoDb's mapReduce funcion. Say there is currently a mapReduce running that will take a long time. What will happen when a user tries to acces the same collection the mapReduce is writing to?
Does the map reduce write all the data after it is finished or does it write it while running?
Long running read and write operations, such as queries, updates, and deletes, yield under many conditions. MongoDB operations can also yield locks between individual document modifications in write operations that affect multiple documents like update() with the multi parameter.
In Map reduce mongoDB is doing Read and write lock, unless operations
are specified as non-atomic. Portions of map-reduce jobs can run
concurrently.
See the concurrency page for details on mongodb locking. For your case, the map-reduce command takes a read and write lock for the relevant collections while it is running. Portions of the map-reduce command can be concurrent, but in the general case it is locked while running.
Related
I understand you cannot do transactions in MongoDB and the thinking is that its not needed because everything locks the whole database or collection, I am not sure which. However how then do you perform the following?
How do I chain together multiple insert, update, delete or select queries in mongodb so that other queries that might operate on the same data wait until these queries finish? An analogy would be serialization transaction isolation in ms sql server.
more..
I want to insert/update record into collection A and update a record in collection B and then read Collection A and B but I don't want anyone (process or thread) to read or write to collection A or B until BOTH A and B have been updated or inserted by the first queries.
Yes, that's absolutely possible.
It is called ordered bulk operations on planet Mongo and works like this in the mongo shell:
bulk = db.emptyCollection.initializeOrderedBulkOp()
bulk.insert({name:"First document"})
bulk.find({name:"First document"})
.update({$set:{name:"First document, updated"}})
bulk.execute()
bulk.findOne()
> {_id: <someObjectId>, name:"First document, updated"}
Please read the manual regarding Bulk Write Operations for details.
Edit: Somehow is misread your question. It isn't possible for two collections. Remember though, that you can have different documents in one collection. Some ODMs even allow to have different models saved to the same collection. Exploiting this, you should be able to achieve what you want using the above bulk operations. You may want to combine this with locking to prevent writing. But preventing reading and writing would be the same as an transaction in terms of global and possibly distributed locks.
I have a collection in my database
1.I want to lock my collection when the User Updating the Document
2.No operations are Done Expect Reads while Updating the collection for another Users
please give suggestions how to Lock the collection in MongoDB
Best Regards
GSY
MongoDB implements a writer greedy database level lock already.
This means that when a specific document is being written to:
The User collection would be locked
No reads will be available until the data is written
The reason that no reads are available is because MongoDB cannot do a consistent read while writing (darn you physics, you win again).
It is good to note that if you wish for a more complex lock, spanning multiple rows, then this will not be available in MongoDB and there is no real way of implementing such a thing.
MongoDB locking already does that for you. See what operations acquire which lock and what does each lock mean.
See the MongoDB documentation on write operations paying special attention to this section:
Isolation of Write Operations
The modification of a single document is always atomic, even if the write operation modifies >multiple sub-documents within that document. For write operations that modify multiple >documents, the operation as a whole is not atomic, and other operations may interleave.
No other operations are atomic. You can, however, attempt to isolate a write operation that >affects multiple documents using the isolation operator.
To isolate a sequence of write operations from other read and write operations, see Perform >Two Phase Commits.
I have been using Mongo's map/reduce for some time and this blocks other operations since the JS engine in 2.0 took out a lock (or so I believe). I am just experimenting with the new aggregation framework in 2.2 and had hoped that since it's just reading it would not need to lock, but according to db.currentOps() it is locking.
So in relation to the new aggregation framework (and indeed with any MongoDB operation) I would like to know if it's possible to indicate a priority of a certain operation so that MongoDB can intelligently yield low priority operations (such as some background updates) to a time-sensitive high-priority operation?
In this doc you can see it says Map Reduce "Allows substantial concurrent operation but exclusive to other javascript execution."
So this means that it already yields operation on the database. Plus mostly this is bounded to the map/reduce being a single threaded JavaScript operation.
But if you want make sure there is no locking you can write the output of Map Reduce to another database and then move the collection to the original database.
>use admin
>db.runCommand( {renameCollection: "mapreducedb.mycol", to: "appdb.mycol"} )
Same for the Aggregation Framework
EDIT: can't be used for the Aggregation framework (as of 2.2), as it does not have an {{$out}} operator to write to another database. But Aggregation operations are still safe to execute on the production/main database as Yielding of those operations will still occur.
Does MongoDB map reduce lock a collection when performing an operation on it?
I have some collections that are widely and intensively used by an application. A Map/Reduce runs in the background every 10 minutes via a cron job, on that widely and intensively used collection.
I want to know if there is a high probability that Map/Reduce won't perform well because other operations are in progress (inserts, updates, and mostly reads) on that collection. In particular, I want know if Map/Reduce interferes with normal operations performed on the collection by users.
MapReduce, if outputting to a collection will take multiple write locks out as it writes (as any operation which is creating/updating a collection would). If you are doing an in-line MR, then you avoid that locking (but have limitations on result sizes). Even so, there are still read-locks and the Javascript lock (single threaded for server side JS on mongoDB right now).
This is all explained (and will be updated if it changes) here:
http://www.mongodb.org/display/DOCS/How+does+concurrency+work#Howdoesconcurrencywork-MapReduce
Note: the SpiderMonkey to V8 JS engine migration issues are ones to watch if multi-threading is something you are concerned about.
Is it possible to run MongoDB commands like a query to grab additional data or to do an update from with in MongoDB's MapReduce command. Either in the Map or the Reduce function?
Is this completely ludicrous to do anyways? Currently I have some documents that refer to separate collections using the MongoDB DBReference command.
Thanks for the help!
Is it possible to run MongoDB commands... from within MongoDB's MapReduce command.
In theory, this is possible. In practice there are lots of problems with this.
Problem #1: exponential work. M/R is already pretty intense and poorly logged. Adding queries can easily make M/R run out of control.
Problem #2: context. Imagine that you're running a sharded M/R and you are querying into an unsharded collection. Does the current context even have that connection?
You're basically trying to implement JOIN logic and MongoDB has no joins. Instead, you may need to build the final data in a couple of phases by running a few loops on a few sets of data.