My MongoDB collection is used as a job queue, and there are 3 C++ machines that read from this collection. The problem is that those three must never perform the same job; every job needs to be done exactly once.
I fetch all pending jobs by querying the collection for records with isDone: false, and then update each document to isDone: true. But if two machines find the same document at the same time, they will both do the same job. How can I avoid this?
Edit: My question is - does findAndModify really solve that problem?
(After reading A way to ensure exclusive reads in MongoDb's findAndModify?)
Yes, findAndModify solves it.
Ref: MongoDB findAndModify from multiple clients
"...
Note: This command obtains a write lock on the affected database and will block other operations until it has completed; however, typically the write lock is short lived and equivalent to other similar update() operations.
..."
Ref: http://docs.mongodb.org/manual/reference/method/db.collection.update/#db.collection.update
"...
For unsharded collections, you can override this behavior with the $isolated isolation operator, which isolates the update operation and blocks other write operations during the update. See the isolation operator.
..."
Ref: http://docs.mongodb.org/manual/reference/operator/isolated/
Regards,
Moacy
Yes, find-and-modify will solve your problem:
db.collection.findAndModify( {
    query: { isDone: false },
    update: { $set: { isDone: true } },
    new: true,
    upsert: false // never create new docs
} );
This will return a single document that it just updated from false to true.
But you have a serious problem if your C++ clients ever have a hiccup (the box dies, they are killed, the code has an error, etc.). Imagine your TCP connection dropping just after the update succeeds on the server, but before the C++ code receives the job. It's generally better to have a multi-phase approach:
change "isDone" to "isInProgress", then when it's done, delete the document. (Now, you can see the stack of "todo" and "being done". If something is "being done" for a long time, the client probably died.
change "isDone" to "phase" and atomically set it from "new" to "started" (and later set it to "finished"). Now you can see if something is "started" for a long time, the client may have died.
If you're really sophisticated, you can make a partial index, e.g. "only index documents with phase: { $ne: 'finished' }". Then you don't waste space indexing the millions of finished documents; the index holds only the handful of new/in-progress documents, so it's smaller and faster (see the sketch below).
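Here is a minimal sketch of the phase-based claim plus such an index. The collection name jobs and the startedAt field are illustrative, and note one caveat: partialFilterExpression does not accept $ne, so the filter below uses $in over the open phases instead (allowed in partial-index filters as of MongoDB 6.0).

// Atomically claim one job: only a "new" document can move to "started".
db.jobs.findAndModify( {
    query: { phase: "new" },
    update: { $set: { phase: "started", startedAt: new Date() } },
    new: true
} );

// Index only the unfinished jobs ($ne is not allowed in
// partialFilterExpression, so $in over the open phases is used instead).
db.jobs.createIndex(
    { phase: 1 },
    { partialFilterExpression: { phase: { $in: [ "new", "started" ] } } }
);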
Related
I have a REST API (distributed across multiple hosts/containers) that uses MongoDB as a database. Among the collections in my database, I want to focus on the Users and Games collections in this example.
Let's say I have an endpoint (called by the user/client) called /join_game. This endpoint will step through the following logic:
Check if game is open (query the Games model)
If the game is open, allow the user to join (continue with below logic)
Add player to the participants field in the Games model and update that document
Update some fields in the Users document (stats, etc.)
And let there be another endpoint (called by a cron job on the server) called /close_game, which steps through the following logic:
Close the game (update the Games Model)
Determine the winner & update their stats (update in the Users model)
Now I know that the following race condition is possible between two concurrent requests handled by each of the endpoints:
request A to /join_game called by a client - /join_game controller checks if game is open (it is so it proceeds with the rest of the endpoint logic)
request B to /close_game called internally by the server - /close_game controller sets the game as closed within the game's document
If these requests are concurrent and request A is handled before request B, the remaining /join_game logic might still execute even though the game has just been closed. This is obviously behavior I don't want, and it can introduce many errors/unexpected outcomes.
To prevent this, I looked into the transactions API, since it makes all database operations within a transaction atomic. However, I'm not sure transactions actually cover my case, because I don't know whether they place a complete lock on the documents being queried and modified (I read that MongoDB uses shared locks for reads and exclusive locks for writes). And if they do place a complete lock, would other database calls to those documents simply wait for the transaction to complete? I also read that transactions abort if they wait beyond a certain period, which can also lead to unwanted behavior.
If transactions are not the way to go about preventing race conditions across multiple different endpoints, I'd like to know of any good alternative methods.
Originally, I was using an in-memory queue to handle these race conditions, which seemed to work while the REST API ran on a single node. But as I scale up, managing that queue across distributed servers becomes an issue, so I'd like to handle these race conditions directly within Mongo if possible.
Thanks!
From my understanding, you don't need transactions in MongoDB here; you can use MongoDB's atomic update operations.
Operations against a single document execute one at a time, so if a close-game update runs before your join-game update, the join simply won't match and you won't be able to join.
This is why schema design is important: you need to model your documents so that atomic updates solve the concurrency issues. For updates to other collections that are effectively views (e.g. how many games a player has won/lost/joined), I'd make those eventually consistent, driven by events.
You can also use the findOneAndUpdate/findAndModify family of operations, which can return the new state of the document once the update has completed. That becomes really useful when dealing with concurrency.
db.test.insert({ _id: 1, name: "Game 1", players: [], isClosed: false })
db.test.insert({ _id: 2, name: "Game 2", players: [], isClosed: false })
db.test.insert({ _id: 3, name: "Game 3", players: [], isClosed: false })

// Join: succeeds only while the game is still open.
db.test.update({ _id: 1, isClosed: false }, { $push: { players: 20 } })

// Close: $set keeps the rest of the document intact instead of replacing it.
db.test.update({ _id: 1, isClosed: false }, { $set: { isClosed: true, players: [] } })
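And a sketch of the findOneAndUpdate variant mentioned above, assuming the mongo shell, where the returnNewDocument: true option hands back the post-update document (or null if the game was already closed), so the caller knows immediately whether the join succeeded:

// Try to join game 1; res is the updated document, or null if the game was closed.
const res = db.test.findOneAndUpdate(
    { _id: 1, isClosed: false },
    { $push: { players: 21 } },
    { returnNewDocument: true }
);
if (res === null) {
    // The game closed before we could join.
}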
I've got a document that needs to be read and updated. Meanwhile, it's quite likely that another process is doing the same, which would break the document update.
For example, if Process A reads document d, adds field a, and writes the document back, while Process B reads d before Process A's write, adds field b, and writes it back, then whichever process writes last clobbers the change written first.
I've read this article and some other very complicated transaction articles around Mongo. Can someone describe a simple solution to this? I have not come across anything that makes me comfortable with it yet.
https://www.mongodb.com/blog/post/how-to-select--for-update-inside-mongodb-transactions
[UPDATE]: In addition, I'm trying to augment a document that might not yet exist, so I need to create the document if it doesn't exist, and I also need to read it to analyze it. One key is relatedIds (an array); I push to that array if an id is not already in it. Another method that needs to create the document if it doesn't exist adds to a separate collection of objects.
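For that specific shape of update, a single upsert may already be enough. A minimal sketch, assuming the document's _id (docId) and the incoming id (newRelatedId) are known; $addToSet appends only when the value is absent, and $setOnInsert fires only when the upsert creates the document:

db.docs.updateOne(
    { _id: docId },
    {
        $addToSet: { relatedIds: newRelatedId },  // push only if not already present
        $setOnInsert: { createdAt: new Date() }   // set only when first created
    },
    { upsert: true }
);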
[ANOTHER UPDATE x2] --> From what I've been reading from various sources, the only way to properly create a transaction for this is to findOneAndUpdate the document to mark it as dirty, using some field that is guaranteed to change, such as a "lock" field set to a fresh ObjectId (that can never be a no-op, i.e., it definitely causes a change).
If another operation tries to write to it, Mongo can now detect that this record is already part of a transaction.
Thus anything else that writes to it will get a writeError on its operation. My transaction can then work on that record at its own pace while holding a lock on it. When it writes the record out and commits, the record has definitely not been touched by anything else. If there's no way to do this without a transaction, am I at least creating the transaction in the easiest way here?
Using Mongo's transactions is the "proper" way to go, but I'll offer a simple solution that is sufficient (with some caveats).
The simplest solution is to use findOneAndUpdate to read the document and set a new field, let's call it status; since the operation is atomic, this works as a lock.
The query would look like this:
const doc = await db.collection.findOneAndUpdate(
    {
        _id: docId,
        status: { $ne: 'processing' }
    },
    {
        $set: {
            status: 'processing'
        }
    }
);
So if doc.value is null, it means (assuming the document exists) that another process is processing it. When you finish processing, you just need to reset status to any other value.
Now, because you are effectively locking this document until the process finishes, you have to make sure you handle failure cases: an error thrown mid-process, an update failure, DB connection issues, etc., as sketched below.
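A hedged sketch of that cleanup, keeping the answer's Node.js-driver style (doc.value) and a hypothetical processDocument work function; the lock is released in a finally block, but only if this process actually acquired it:

let doc = null;
try {
    doc = await db.collection.findOneAndUpdate(
        { _id: docId, status: { $ne: 'processing' } },
        { $set: { status: 'processing' } }
    );
    if (doc.value) {
        await processDocument(doc.value); // hypothetical work function
    }
} finally {
    // Release the lock only if we were the ones who acquired it.
    if (doc && doc.value) {
        await db.collection.updateOne(
            { _id: docId },
            { $set: { status: 'idle' } }
        );
    }
}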
Overall, I would be cautious with this approach: it only "locks" the document against cooperating queries (every single process needs to be updated to respect the status field), which is a little problematic, depending on your use case.
TLDR: Is there a way I can conditionally update a Meteor Mongo record inside a collection, so that with the id as the selector, the update happens only when the incoming revision number is greater than what already exists, and an upsert is performed when there is no id match?
I am having an issue with updates to server-side Meteor Mongo collections, whereby it seems the added() callback in the observers is being triggered on an upsert.
Here is what I am trying to do in a nutshell.
My Meteor.js app boots and then connects to an endpoint, fetching data and then upserting it into the collection.
collection.update({'sys.id': item.sys.id}, item, {upsert: true});
The 'sys.id' selector checks to see if the item exists, and then updates if it does or adds if it does not.
I have an observer monitoring the above collection, which then acts when an item has been added/updated to the collection.
collection.find({}).observeChanges({
    added: this.itemAdded.bind(this),
    changed: this.itemChanged.bind(this),
    removed: this.itemRemoved.bind(this)
});
The first thing that puzzles me is that when the app is closed and then booted again, the 'added()' callback is fired when the collection is observed. What I would hope to happen is that the changed() callback is fired.
Going back to my original update - is it possible in Mongo to conditionally update something, so you have the selector, then the item, but only perform the update when another condition is met?
// Incoming item
var item = {
    sys: {
        id: 1,
        revision: 5
    }
};
collection.update({'sys.id': item.sys.id, 'sys.revision': {$gt: item.sys.revision}}, item, {upsert: true});
If you look at the above code, it will try to match the sys.id, which is fine, but then the revisions will of course be different, which means the update sees it as a different document and performs a new insert, thus creating duplicate data.
How do I fix this?
To your main question:
What you want is called findAndModify. First, look for the document meeting the spec, and then update accordingly. This is a really powerful idea, because if you did it in 2 queries, the document you found could be deleted/updated before you got to update it. Luckily for you, someone made a package (I really wish this existed a year ago!): https://github.com/fongandrew/meteor-find-and-modify
If you were to do this without findAndModify, you'd have to use JavaScript to find the doc, check whether it matches your criteria, and then update it. In your use case this would probably work, but there will always be that "what if" in the back of your mind.
Regarding observeChanges: added is called each time the local minimongo receives a document (it's just reading what DDP tells it). Since a refresh deletes your local collection, those docs have to be added one by one. What you could do is wait until all added callbacks have fired and then run your server method; you'll get a ton of adds, and then a couple more changes will trickle in afterwards.
As Matt K said, you want findAndModify. There are some gotchas to be aware of:
findAndModify is about 100x slower than a find followed by an update. The find-then-update pair is, obviously, not atomic and so won't do what you need, but be aware of the speed hit. (This is based on experience with MongoDB v2.4, so run some benchmarks to confirm under your own version.)
If your query matches multiple items, findAndModify will only act on the first one. In this case, you're querying on a unique id, but be aware of the issue for future use.
findAndModify will return the document after doing its thing, but by default it returns the pre-modification version. If you want the modified one, you need to pass 'new: true' in your query, as in the sketch below.
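To tie this back to the question, a hedged sketch of the revision-guarded update (the $lt direction assumes the incoming item carries the newer revision, and upsert is deliberately left false to avoid the duplicate-insert problem described in the question):

db.collection.findAndModify({
    query: { 'sys.id': item.sys.id, 'sys.revision': { $lt: item.sys.revision } },
    update: { $set: item },
    new: true,      // return the post-update document
    upsert: false   // never insert a duplicate when the revision guard fails
});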
If I've got an environment with multiple instances of the same client connecting to a MongoDB server and I want a simple locking mechanism to ensure single client access for a short time, can I safely use my own lock object?
Say I have one object with a lockState that can be "locked" or "unlocked", and the plan is that everyone checks that it is "unlocked" before doing "stuff". To lock the system I say:
db.collection.update( { "lockState": "unlocked" }, { "lockState": "locked" })
(aka UPDATE lockObj SET lockState = 'locked' WHERE lockState = 'unlocked')
If two clients try to lock the system at the same time, is it possible that both clients can end up thinking they "have the lock"?
Both clients find the record by the query parameter of the update
Client 1 updates the record (which is an atomic operation)
update returns success
Client 2 updates the document (it's already found it before client 1 modified it)
update returns success
I realize this is probably a very contrived case that would be very hard to reproduce, but is it possible or does mongo somehow make client 2's update fail?
Alternative approach
Use insert instead of update: an insert is atomic and will fail with a duplicate-key error if a document with the same unique key already exists.
To lock the system: db.locks.insert({someId: 27, state: "locked"}) (with a unique index on someId, or by using _id directly, so the duplicate insert actually fails).
If the insert succeeds, I've got the lock, and since the insert was atomic, no one else can have it.
If the insert fails, someone else must have the lock.
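A minimal sketch of the whole acquire/release cycle in mongosh-style calls, where insertOne throws on a duplicate key (the _id 27 stands in for whatever identifies the resource):

let acquired = false;
try {
    db.locks.insertOne({ _id: 27, state: "locked" }); // throws E11000 if the lock exists
    acquired = true;
    // ... exclusive work goes here ...
} catch (e) {
    // Duplicate key: someone else holds the lock.
} finally {
    if (acquired) {
        db.locks.deleteOne({ _id: 27 }); // release the lock
    }
}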
If two clients try to lock the system at the same time, is it possible that both clients can end up thinking they "have the lock"?
No. Only one client at a time writes within the lock's scope (global, database, collection, or document, depending on your version and configuration), and operations within that scope are sequential and exclusive per document (read or write, not both), so other connections will not pick up a document in an in-between state and think it is not locked by another client.
All operations on a single document are atomic, whether update or insert.
The Mongo documentation on atomicity and isolation is a tad vague and slightly confusing. I have this document structure, and I want to ensure Mongo will handle updates from different users in isolation.
Assume a collection of product discount offers. When a user redeems an offer, the offers_redeemed field is incremented. The max_allowed_redemptions field controls how many times an offer can be redeemed.
{
    offer_id: 99,
    description: "Save 25% on Overstock.com…",
    max_allowed_redemptions: 10,
    offers_redeemed: 3
}
I tested this using findAndModify, and it appears to work by updating the offer only if another redemption would stay at or below the max allowed. I want to know whether this is the correct way to do it, and whether it would work in a multi-user, sharded environment. Is there a scenario where an update to offers_redeemed could force it to exceed max_allowed_redemptions? Obviously that would corrupt the record, so we need to avoid it.
db.offer.findAndModify({
    query: { offer_id: 99, $where: "this.offers_redeemed + 1 <= this.max_allowed_redemptions" },
    update: { $inc: { offers_redeemed: 1 } },
    new: true
})
First, as the documentation very clearly says:
If you don't need to return the document, you can use Update (which can affect multiple documents, as well).
Second, look at the "Update if Current" strategy on the atomicity page. It clearly shows that if the condition applies, the update happens, and nothing can come between the two.
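As an aside, the same guard can be written without $where (which runs JavaScript per document and is comparatively slow). On MongoDB 3.6+ the $expr operator lets the update's filter compare the two fields directly, keeping the precondition and the increment in one atomic operation; modifiedCount then tells you whether the redemption happened:

// Increment only while offers_redeemed is still below the cap.
const result = db.offer.updateOne(
    { offer_id: 99, $expr: { $lt: [ "$offers_redeemed", "$max_allowed_redemptions" ] } },
    { $inc: { offers_redeemed: 1 } }
);
if (result.modifiedCount === 0) {
    // Offer fully redeemed (or no such offer).
}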