Multi-collection, multi-document 'transactions' in MongoDB - mongodb

I realise that MongoDB, by its very nature, doesn't and probably never will support these kinds of transactions. However, I have found that I do need to use them in a somewhat limited fashion, so I've come up with the following solution, and I'm wondering: is this the best way of doing it, and can it be improved upon? (before I go and implement it in my app!)
Obviously the transaction is controlled via the application (in my case, a Python web app). For each document in this transaction (in any collection), the following fields are added:
'lock_status': bool (true = locked, false = unlocked),
'data_old': dict (of any old values - current values really - that are being changed),
'data_new': dict (of values replacing the old (current) values - its keys should mirror those in data_old),
'change_complete': bool (true = the update to this specific document has occurred and was successful),
'transaction_id': ObjectId of the parent transaction
In addition, there is a transaction collection which stores documents detailing each transaction in progress. They look like:
{
'_id': ObjectId,
'date_added': datetime,
'status': bool (true = all changes successful, false = in progress),
'collections': array of collection names involved in the transaction
}
And here's the logic of the process. Hopefully it works in such a way that if it's interrupted, or fails in some other way, it can be rolled back properly.
1: Set up a transaction document
2: For each document that is affected by this transaction:
Set lock_status to true (to 'lock' the document from being modified)
Set data_old and data_new to their old and new values
Set change_complete to false
Set transaction_id to the ObjectId of the transaction document we just made
3: Perform the update. For each document affected:
Replace any affected fields in that document with the data_new values
Set change_complete to true
4: Set the transaction document's status to true (as all data has been modified successfully)
5: For each document affected by the transaction, do some clean up:
Remove the data_old and data_new fields, as they're no longer needed
Set lock_status to false (to unlock the document)
6: Remove the transaction document set up in step 1 (or as suggested, mark it as complete)
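To make this concrete, here's roughly how I picture the flow in pymongo (the collection name, field values and the single affected document are just placeholders, and error handling is left out):

from datetime import datetime
from pymongo import MongoClient

db = MongoClient().mydb  # placeholder database

# Placeholder document that the 'transaction' will modify
doc_id = db.accounts.insert_one({'balance': 10}).inserted_id

# 1: set up a transaction document
txn_id = db.transactions.insert_one({
    'date_added': datetime.utcnow(),
    'status': False,
    'collections': ['accounts'],
}).inserted_id

# 2: lock the affected document and stash the old/new values
db.accounts.update_one(
    {'_id': doc_id, 'lock_status': {'$ne': True}},
    {'$set': {'lock_status': True,
              'data_old': {'balance': 10},
              'data_new': {'balance': 20},
              'change_complete': False,
              'transaction_id': txn_id}})

# 3: apply the new values and mark the change complete
db.accounts.update_one(
    {'_id': doc_id, 'transaction_id': txn_id},
    {'$set': {'balance': 20, 'change_complete': True}})

# 4: mark the whole transaction as successful
db.transactions.update_one({'_id': txn_id}, {'$set': {'status': True}})

# 5 and 6: clean up the bookkeeping fields, then the transaction document
db.accounts.update_one(
    {'_id': doc_id},
    {'$set': {'lock_status': False},
     '$unset': {'data_old': '', 'data_new': '',
                'change_complete': '', 'transaction_id': ''}})
db.transactions.delete_one({'_id': txn_id})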
I think that logically works in such a way that if it fails at any point, all data can be either rolled back or the transaction can be continued (depending on what you want to do). Obviously all rollback/recovery/etc. is performed by the application and not the database, by using the transaction documents and the documents in the other collections with that transaction_id.
Is there any glaring error in this logic that I've missed or overlooked? Is there a more efficient way of going about it (e.g. less writing/reading from the database)?

As a generic response: multi-document commits on MongoDB can be performed as two-phase commits, which are documented fairly extensively in the manual (See: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/).
The pattern suggested by the manual is, briefly, the following:
Set up a separate transactions collection that includes the target document, source document, value and state (of the transaction)
Create a new transaction object with initial as the state
Start applying the transaction and update its state to pending
Apply the transaction to both documents (target, source)
Update transaction state to committed
Use find to determine whether documents reflect the transaction state, if ok, update transaction state to done
In addition:
You need to manually handle failure scenarios (i.e. when a step didn't happen as described)
You need to manually implement a rollback, basically by introducing a new state value, canceling
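Sketched in pymongo, the happy path of that pattern looks roughly like this (collection, field and state names follow the tutorial; the failure and canceling paths are omitted):

from pymongo import MongoClient

db = MongoClient().test

# Start from a clean slate and seed placeholder data so the sketch runs end to end
db.accounts.drop()
db.transactions.drop()
db.accounts.insert_many([{'_id': 'A', 'balance': 100, 'pendingTransactions': []},
                         {'_id': 'B', 'balance': 0, 'pendingTransactions': []}])
db.transactions.insert_one({'source': 'A', 'destination': 'B',
                            'value': 10, 'state': 'initial'})

# Pick up a transaction in state 'initial' and move it to 'pending'
txn = db.transactions.find_one_and_update(
    {'state': 'initial'}, {'$set': {'state': 'pending'}})

# Apply it to both documents, tagging each so the step is not applied twice
db.accounts.update_one(
    {'_id': txn['source'], 'pendingTransactions': {'$ne': txn['_id']}},
    {'$inc': {'balance': -txn['value']},
     '$push': {'pendingTransactions': txn['_id']}})
db.accounts.update_one(
    {'_id': txn['destination'], 'pendingTransactions': {'$ne': txn['_id']}},
    {'$inc': {'balance': txn['value']},
     '$push': {'pendingTransactions': txn['_id']}})

# Mark it committed, remove the pending markers, then mark it done
db.transactions.update_one({'_id': txn['_id']}, {'$set': {'state': 'committed'}})
db.accounts.update_one({'_id': txn['source']},
                       {'$pull': {'pendingTransactions': txn['_id']}})
db.accounts.update_one({'_id': txn['destination']},
                       {'$pull': {'pendingTransactions': txn['_id']}})
db.transactions.update_one({'_id': txn['_id']}, {'$set': {'state': 'done'}})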
Some specific notes for your implementation:
I would discourage you from adding fields like lock_status, data_old, data_new into source/target documents. These should be properties of the transactions, not the documents themselves.
To generalize the concept of target/source documents, I think you could use DBrefs: http://www.mongodb.org/display/DOCS/Database+References
I don't like the idea of deleting transaction documents when they are done. Setting state to done seems like a better idea since this allows you to later debug and find out what kind of transactions have been performed. I'm pretty sure you won't run out of disk space either (and for this there are solutions as well).
In your model how do you guarantee that everything has been changed as expected? Do you inspect the changes somehow?

MongoDB 4.0 adds support for multi-document ACID transactions.
Java Example:
try (ClientSession clientSession = client.startSession()) {
    clientSession.startTransaction();
    collection.insertOne(clientSession, docOne);
    collection.insertOne(clientSession, docTwo);
    clientSession.commitTransaction();
}
Note that it only works on a replica set. You can still run a single-node replica set on your local machine.
https://stackoverflow.com/a/51396785/4587961
https://docs.mongodb.com/manual/tutorial/deploy-replica-set-for-testing/
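For reference, the equivalent in Python looks roughly like this with pymongo 3.7+ (again, only against a replica set; the collection and documents are placeholders):

from pymongo import MongoClient

client = MongoClient()  # must point at a replica set for transactions
collection = client.test.collection
doc_one, doc_two = {'_id': 1}, {'_id': 2}

with client.start_session() as session:
    with session.start_transaction():
        collection.insert_one(doc_one, session=session)
        collection.insert_one(doc_two, session=session)
    # commits when the with-block exits cleanly, aborts if an exception is raised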

MongoDB 4.0 is adding (multi-collection) multi-document transactions: link

Related

MongoDB - select document for update - without another operation modifying the same document after the select

I've got a document that needs to be read and updated. Meanwhile, it's quite likely that another process is doing the same which would break the document update.
For example, if Process A reads document d, adds field 'a' to it, and writes the document back, and Process B reads document d before Process A writes it, adds field 'b', and writes the document back, then whichever process writes last wins, because it clobbers the change made by the one that wrote first.
I've read this article and some other very complicated transaction articles around mongo. Can someone describe a simple solution to this - I have not come across something that makes me comfortable with this yet.
https://www.mongodb.com/blog/post/how-to-select--for-update-inside-mongodb-transactions
[UPDATE]- In addition, I'm trying to augment a document that might not yet exist. I need to create the document if it doesn't exist. I also need to read it to analyze it. One key is "relatedIds" (an array). I push to that array if the id is not found in it. Another method I have that needs to create the document if it doesn't exist adds to a separate collection of objects.
[ANOTHER UPDATE x2] --> From what I've been reading and getting from various sources, the only way to properly create a transaction for this is to "findOneAndModify" the document to mark it as dirty, with some field that will definitely update, such as "lock" with an ObjectId (since that will never result in a no-op, i.e. it definitely causes a change).
If another operation tries to write to it, Mongo can now detect that this record is already part of a transaction.
Thus anything that writes to it will cause a writeError on that other operation. My transaction can then slowly work on that record and have a lock on it. When it writes it out and commits, that record is definitely not touched by anything else. If there's no way to do this without a transaction for some reason, then am I creating the transaction in the easiest way here?
Using Mongo's transactions is the "proper" way to go but i'll offer a simple solution that is sufficient ( with some caveats ).
The simplest solution would be to use findOneAndUpdate to read the document and set a new field, let's call it status; since findOneAndUpdate is atomic, this is possible.
The query would look like so:
const doc = await db.collection.findOneAndUpdate(
  {
    _id: docId,
    status: { $ne: 'processing' }
  },
  {
    $set: {
      status: 'processing'
    }
  }
);
So if doc.value is null then it means (assuming the document exists) that another process is processing it. When you finish processing you just need to reset status to any other value.
Now because you are inherently locking this document from being picked up until the process finishes, you have to make sure that you handle cases like an error thrown throughout the process, update failures, db connection issues, etc.
Overall I would be cautious about using this approach as it will only "lock" the document for the "proper" queries (every single process needs to be updated to use the status field), which is a little problematic, depending on your use case.
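For example (sketched in pymongo here purely for illustration), the release belongs in a finally block so that an error thrown mid-process does not leave the document stuck in 'processing'; the collection, the 'idle' value and process() are all placeholders:

from pymongo import MongoClient

db = MongoClient().test
doc_id = db.collection.insert_one({'status': 'idle'}).inserted_id

def process(doc):
    pass  # placeholder for the actual work the lock protects

# Atomically claim the document, exactly as in the query above
doc = db.collection.find_one_and_update(
    {'_id': doc_id, 'status': {'$ne': 'processing'}},
    {'$set': {'status': 'processing'}})

if doc is not None:
    try:
        process(doc)
    finally:
        # Always release the lock, even if process() raised
        db.collection.update_one({'_id': doc_id},
                                 {'$set': {'status': 'idle'}})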

Tracking who done what; mongo Atomicity when deleting by filter

I've been implementing an auditing system for mongo that tracks call and user information for each mongo transaction.
IE user bill
made a call to x endpoint
at y time
and changed z field from foo to bar
inserts and updates are easy because I tie a stored call info object to any objects updated in that call. (through a set property or updating the property directly on a replace or upsert call.)
all that works great.
Deletes are a hairy beast though.
When I delete by id I can easily track that information. BUT when I delete by filter,
e.g. delete from users where username like bill,
mongo doesn't return the deleted ids back in the response. If I query to get those objects before I delete them, who knows what could happen between the time I get those objects and when I actually delete them.
(Knock Knock, race condition. who's there?)
any ideas on how to keep the atomicity of that delete and have a reliable way to tie that delete call to the delete transaction?

Conditional update for MongoDB (Meteor)

TLDR: Is there a way I can conditionally update a Meteor Mongo record inside a collection, so that if I use the id as a selector, the update happens only if the id matches and the incoming revision number is greater than the one already stored, and an upsert is performed if there is no id match?
I am having an issue with updates to server side Meteor Mongo collections, whereby it seems the added() function callback in the Observers is being triggered on an upsert.
Here is what I am trying to do in a nutshell.
My meteor js app boots and then connects to an endpoint, fetching data and then upserting it into the collection.
collection.update({'sys.id': item.sys.id}, item, {upsert: true});
The 'sys.id' selector checks to see if the item exists, and then updates if it does or adds if it does not.
I have an observer monitoring the above collection, which then acts when an item has been added/updated to the collection.
collection.find({}).observeChanges({
  added: this.itemAdded.bind(this),
  changed: this.itemChanged.bind(this),
  removed: this.itemRemoved.bind(this)
});
The first thing that puzzles me is that when the app is closed and then booted again, the 'added()' callback is fired when the collection is observed. What I would hope to happen is that the changed() callback is fired.
Going back to my original update - is it possible in Mongo to conditionally update something, so you have the selector, then the item, but only perform the update when another condition is met?
// Incoming item
var item = {
  sys: {
    id: 1,
    revision: 5
  }
};
collection.update({'sys.id': item.sys.id, 'sys.revision': {$gt: item.sys.revision}}, item, {upsert: true});
If you look at the above code, what this is going to do is try to match the sys.id, which is fine, but then the revisions will of course be different, which means the update function will see it as a different document and perform a new insert, thus creating duplicate data.
How do I fix this?
To your main question:
What you want is called findAndModify. First, look for the document meeting the specs, and then update accordingly. This is a really powerful idea because if you did it in 2 queries, the document you found could be deleted/updated before you got to update it. Luckily for you, someone made a package (I really wish this existed a year ago!) https://github.com/fongandrew/meteor-find-and-modify
If you were to do this without using findAndModify you'd have to use javascript to find the doc, see if it matches your criteria, and then update it. In your use case, this would probably work, but there will always be that "what if" in the back of your mind.
Regarding observeChanges, the added is called each time the local minimongo receives a document (it's just reading what the DDP is telling it). Since a refresh will delete your local collection, you have to add those docs one by one. What you could do is wait until all added callbacks have fired, and then run your server method. In doing so, you get a ton of adds, and then a couple more changes will trickle in afterwards.
As Matt K said, you want findAndModify. There are some gotchas to be aware of:
findAndModify is about 100x slower than a find followed by an update. Find+modify is, obviously, not atomic and so won't do what you need, but be aware of the speed hit. (This is based off experience with MongoDB v2.4, so run some benchmarks to confirm under your own version.)
If your query matches multiple items, findAndModify will only act on the first one. In this case, you're querying on a unique id, but be aware of the issue for future use.
findAndModify will return the document after doing its thing, but by default it returns the pre-modification version. If you want the modified one, you need to pass 'new: true' in your query.
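For what it's worth, outside Meteor the same option shows up under a different name; for example in pymongo (just as an illustration, with placeholder data) it is return_document:

from pymongo import MongoClient, ReturnDocument

coll = MongoClient().test.items
coll.insert_one({'sys': {'id': 1, 'revision': 4}})

# Default behaviour returns the pre-modification document;
# ReturnDocument.AFTER is the equivalent of passing new: true
doc = coll.find_one_and_update(
    {'sys.id': 1, 'sys.revision': {'$lt': 5}},
    {'$set': {'sys.revision': 5}},
    return_document=ReturnDocument.AFTER)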

Mongo transactions and updates

If I've got an environment with multiple instances of the same client connecting to a MongoDB server and I want a simple locking mechanism to ensure single client access for a short time, can I safely use my own lock object?
Say I have one object with a lockState that can be "locked" or "unlocked" and the plan is everyone checks that it is "unlocked" before doing "stuff". To lock the system I say:
db.collection.update( { "lockState": "unlocked" }, { "lockState": "locked" })
(aka UPDATE lockObj SET lockState = 'locked' WHERE lockState = 'unlocked')
If two clients try to lock the system at the same time, is it possible that both clients can end up thinking they "have the lock"?
Both clients find the record by the query parameter of the update
Client 1 updates the record (which is an atomic operation)
update returns success
Client 2 updates the document (it's already found it before client 1 modified it)
update returns success
I realize this is probably a very contrived case that would be very hard to reproduce, but is it possible or does mongo somehow make client 2's update fail?
Alternative approach
Use insert instead of update. insert is atomic and will fail if the document already exists.
To lock the system: db.locks.insert({someId: 27, state: "locked"}).
If the insert succeeds - I've got the lock, and since the insert is atomic, no one else can have it.
If the insert fails - someone else must have the lock.
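A minimal sketch of that approach (in pymongo here, but the idea is driver-independent) uses _id as the lock key, so no extra unique index is needed:

from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

locks = MongoClient().test.locks

def acquire(lock_id):
    try:
        # _id is always unique, so only one client can ever insert this document
        locks.insert_one({'_id': lock_id, 'state': 'locked'})
        return True
    except DuplicateKeyError:
        return False  # someone else already holds the lock

def release(lock_id):
    locks.delete_one({'_id': lock_id})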
If two clients try to lock the system at the same time, is it possible that both clients can end up thinking they "have the lock"?
No. Only one client at a time writes to the lock space (global, database, collection or document, depending on your version and configuration), and the operations on that lock space are sequential and exclusive (read or write, not both) per document, so another connection will not mistakenly pick up a document in an in-between state and think that it is not locked by another client.
All operations on a single document are atomic, whether update or insert.

MongoDB critical section

I need to perform a few operations (reads and writes) on my mongodb without having another process interrupt. It's for an online game: when a user sends resources to another, the following steps are performed:
Check his resource value
Abort if it's not enough
Insert a resource transaction
Decrement his resource value
Increment the other ones resource value
I'm concerned that while checking whether it's enough, or while inserting the resource transaction, some other transaction has already been inserted and the values have become invalid. How can I make sure that this part is executed in this order?
I can see two ways:
Use client side transactions to hold a "lock": http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
Or use versioning here, whereby you hold a field with a $inc'd version number which gets incremented every time you save and must be included in the query whenever you go to save. A good example is within Vermongo: https://github.com/thiloplanz/v7files/wiki/Vermongo
Those seem to be the two most plausible ways I see of getting this done.
Transaction is an almost forbidden word when talking about mongo. But you can perform steps 1, 2 and 4 using an atomic update with $inc, using the resource value as a condition, and then perform steps 3 and 5. You will not have support for rolling back a step if the next steps fail.
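A rough pymongo sketch of that (player ids and the amount are placeholders): steps 1, 2 and 4 collapse into a single conditional update, and if it matched nothing the sender did not have enough, so you stop there.

from pymongo import MongoClient

db = MongoClient().game
sender_id, receiver_id, amount = 'alice', 'bob', 50  # placeholders

# Steps 1, 2 and 4: decrement only if the sender has enough resources
result = db.players.update_one(
    {'_id': sender_id, 'resources': {'$gte': amount}},
    {'$inc': {'resources': -amount}})

if result.modified_count == 1:
    # Steps 3 and 5: record the transfer and credit the receiver.
    # Note there is no automatic rollback if either of these fails.
    db.transfers.insert_one({'from': sender_id, 'to': receiver_id,
                             'amount': amount})
    db.players.update_one({'_id': receiver_id},
                          {'$inc': {'resources': amount}})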
I am an engineer at Tokutek
TokuMX is a MongoDB replacement server that uses the same protocol and drivers and supports native multi-statement transactions on non-sharded setups. What you want can be accomplished with a serializable transaction, which will take document-level locks on documents you touch. This would be done something like
> db.beginTransaction("serializable");
> if (resourcesInsufficient()) { db.rollbackTransaction(); }
> // insert and update
> db.commitTransaction()
Again, this is not supported in sharding but may be useful for your application. More details, features and limitations are discussed here.