Tracking who done what; mongo Atomicity when deleting by filter - mongodb

I've been implementing an auditing system for mongo that tracks call and user information for each mongo transaction.
IE user bill
made a call to x endpoint
at y time
and changed z field from foo to bar
inserts and updates are easy because I tie a stored call info object to any objects updated in that call. (through a set property or updating the property directly on a replace or upsert call.)
all that works great.
Deletes are a hairy beast though.
when I delete by id I can easily track that information. BUT when I delete by filter
IE delete from users where username like bill.
mongo doesn't return the deleted ids back in the response. if I query to get those objects before I delete them who knows what could happen between the time I get those objects and when I actually delete them.
(Knock Knock, race condition. who's there?)
any ideas on how to keep the atomicity of that delete and have a reliable way to tie that delete call to the delete transaction?

Related

MongoDB - select document for update - without another operation modifying the same document after the select

I've got a document that needs to be read and updated. Meanwhile, it's quite likely that another process is doing the same which would break the document update.
For example, if Process A reads document d and adds field 'a' to it and writes the document, and Process B reads document d before Process A writes it, and adds field b and writes the document, then whichever process writes the changes out will get their change because it clobbers the change by the one that wrote first.
I've read this article and some other very complicated transaction articles around mongo. Can someone describe a simple solution to this - I have not come across something that makes me comfortable with this yet.
https://www.mongodb.com/blog/post/how-to-select--for-update-inside-mongodb-transactions
[UPDATE]- In addition, I'm trying to augment a document that might not yet exist. I need to create the document if it doesn't exist. I also need to read it to analyze it. One key is "relatedIds" (an array). I push to that array if the id is not found in it. Another method I have that needs to create the document if it doesn't exist adds to a separate collection of objects.
[ANOTHER UPDATE x2] --> From what I've been reading and getting from various sources - is that the only way to properly create a transaction for this - is to "findOneAndModify" the document to mark it as dirty with some field that will definitely update, such as "lock" with an objectId (since that will never result in a NO-OP - ie, it definitely causes a change).
If another operation tries to write to it, Mongo can now detect that this record is already part of a transaction.
Thus anything that writes to it will cause a writeError on that other operation. My transaction can then slowly work on that record and have a lock on it. When it writes it out and commits, that record is definitely not touched by anything else. If there's no way to do this without a transaction for some reason, then am I creating the transaction in the easiest way here?
Using Mongo's transactions is the "proper" way to go but i'll offer a simple solution that is sufficient ( with some caveats ).
The simplest solution would be to use findOneAndUpdate to read the document and update a new field, let's call it status, since it is atomic this is possible.
the query would look like so:
const doc = await db.collection.findOneAndUpdate(
{
_id: docId,
status: { $ne: 'processing' }
},
{
$set: {
status: 'processing'
}
}
);
so if dov.value is null then it means (assuming the document exists) that another process is processing it. When you finish processing you just need to reset status to be any other value.
Now because you are inherently locking this document from being read until the process finishes you have to make sure that you handle cases like an error thrown throughout the process, update failure, db connection issue's, etc .
Overall I would be cautious about using this approach as it will only "lock" the document for the "proper" queries ( every single process needs to be updated to use the status field ), which is a little problematic, depending on your usecase.

The correct HTTP method for resetting data

I want to "reset" certain data in my database tables using a RESTful method but I'm not sure which one I should be using. UPDATE because I'm deleting records and updating the record where the ID is referenced (but never removing the record where ID lives itself), or DELETE because my main action is to delete associated records and updating is tracking those changes?
I suppose this action can be ran multiple times but the result will be the same, however the ID will always be found when "resetting".
I think you want the DELETE method

Update associated field in Waterline/Sails without affecting other fields

I notice that multiple requests to a record causes writes to be possibly overwritten. I am using Mongo btw.
I have a schema like:
Trip { id, status, tagged_friends }
where tagged_friends is an association to Users collection
When I make 2 calls to update trips in close succession (in this case I am making 2 API calls from client - actually automated tests), its possible for them to interfere. Since they all call trip.save().
Update 1: update the tagged_friends association
Update 2: update the status field
So I am thinking these 2 updates should only save the "dirty" fields. I think I can do that with Trips.update() rather than trip.save()? But problem is I cannot use update to update an association? That does not appear to work?
Or perhaps there's a better way to do this?

Multi-collection, multi-document 'transactions' in MongoDB

I realise that MongoDB, by it's very nature, doesn't and probably never will support these kinds of transactions. However, I have found that I do need to use them in a somewhat limited fashion, so I've come up with the following solution, and I'm wondering: is this the best way of doing it, and can it be improved upon? (before I go and implement it in my app!)
Obviously the transaction is controlled via the application (in my case, a Python web app). For each document in this transaction (in any collection), the following fields are added:
'lock_status': bool (true = locked, false = unlocked),
'data_old': dict (of any old values - current values really - that are being changed),
'data_new': dict (of values replacing the old (current) values - should be an identical list to data_old),
'change_complete': bool (true = the update to this specific document has occurred and was successful),
'transaction_id': ObjectId of the parent transaction
In addition, there is a transaction collection which stores documents detailing each transaction in progress. They look like:
{
'_id': ObjectId,
'date_added': datetime,
'status': bool (true = all changes successful, false = in progress),
'collections': array of collection names involved in the transaction
}
And here's the logic of the process. Hopefully it works in such a way that if it's interupted, or fails in some other way, it can be rolled back properly.
1: Set up a transaction document
2: For each document that is affected by this transaction:
Set lock_status to true (to 'lock' the document from being modified)
Set data_old and data_new to their old and new values
Set change_complete to false
Set transaction_id to the ObjectId of the transaction document we just made
3: Perform the update. For each document affected:
Replace any affected fields in that document with the data_new values
Set change_complete to true
4: Set the transaction document's status to true (as all data has been modified successfully)
5: For each document affected by the transaction, do some clean up:
remove the data_old and data_new, as they're no longer needed
set lock_status to false (to unlock the document)
6: Remove the transaction document set up in step 1 (or as suggested, mark it as complete)
I think that logically works in such a way that if it fails at any point, all data can be either rolled back or the transaction can be continued (depending on what you want to do). Obviously all rollback/recovery/etc. is performed by the application and not the database, by using the transaction documents and the documents in the other collections with that transaction_id.
Is there any glaring error in this logic that I've missed or overlooked? Is there a more efficient way of going about it (e.g. less writing/reading from the database)?
As a generic response multi-document commits on MongoDB can be performed as two phase commits, which have been somewhat extensively documented in the manual (See: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/).
The pattern suggested by the manual is briefly to following:
Set up a separate transactions collection, that includes target document, source document, value and state (of the transaction)
Create new transaction object with initial as the state
Start making a transaction and update state to pending
Apply transactions to both documents (target, source)
Update transaction state to committed
Use find to determine whether documents reflect the transaction state, if ok, update transaction state to done
In addition:
You need to manually handle failure scenarios (something didn't happen as described below)
You need to manually implement a rollback, basically by introducing a name state value canceling
Some specific notes for your implementation:
I would discourage you from adding fields like lock_status, data_old, data_new into source/target documents. These should be properties of the transactions, not the documents themselves.
To generalize the concept of target/source documents, I think you could use DBrefs: http://www.mongodb.org/display/DOCS/Database+References
I don't like the idea of deleting transaction documents when they are done. Setting state to done seems like a better idea since this allows you to later debug and find out what kind of transactions have been performed. I'm pretty sure you won't run out of disk space either (and for this there are solutions as well).
In your model how do you guarantee that everything has been changed as expected? Do you inspect the changes somehow?
MongoDB 4.0 adds support for multi-document ACID transactions.
Java Example:
try (ClientSession clientSession = client.startSession()) {
clientSession.startTransaction();
collection.insertOne(clientSession, docOne);
collection.insertOne(clientSession, docTwo);
clientSession.commitTransaction();
}
Note, it works for replica set. You can still have a replica set with one node and run it on local machine.
https://stackoverflow.com/a/51396785/4587961
https://docs.mongodb.com/manual/tutorial/deploy-replica-set-for-testing/
MongoDB 4.0 is adding (multi-collection) multi-document transactions: link

best practice when updating records using openJPA

I am wondering what the best practice would be for updating a record using JPA? I currently have devised my own pattern, but I suspect it is by no means the best practice. What I do is essentially look to see if the record is in the db, if I don't find it, I call the enityManager.persist(object<T>) method. if it does exist I call the entityManager.Merge(Object<T>) method.
The reason that I ask, is that I found out that the the merge method looks to see if the record is in the database allready, and if it is not in the db, then it proceeds to add it, if it is, it makes the changes necessary. Also, do you need to nestle the merge call in getTransaction().begin() and getTransaction.commit()? Here is what I have so far...
try{
launchRet = emf.find(QuickLaunch.class, launch.getQuickLaunchId());
if(launchRet!=null){
launchRet = emf.merge(launch);
}
else{
emf.getTransaction().begin();
emf.persist(launch);
emf.getTransaction().commit();
}
}
If the entity you're trying to save already has an ID, then it must exist in the database. If it doesn't exist, you probably don't want to blindly recreate it, because it means that someone else has deleted the entity, and updating it doesn't make much sense.
The merge() method persists an entity that is not persistent yet (doesn't have an ID or version), and updates the entity if it is persistent. You thus don't need to do anything other than calling merge() (and returning the value returned by this call to merge()).
A transaction is a functional atomic unit of work. It should be demarcated at a higher level (in the service layer). For example, transfering money from an account to another needs both account updates to be done in the same transaction, to make sure both changes either succeed or fail. Removing money from one account and failing to add it to the other would be a major bug.