How does Trello store generated actions from updates to other documents (boards, cards) in MongoDB without atomic transactions?

I'm developing a single page web app that will use a NoSQL Document Database (like MongoDB) and I want to generate events when I make a change to my entities.
Since most of these databases support transactions only at the document level (MongoDB has only recently added multi-document ACID support), there is no good way to store changes in one document and then store events from those changes in other documents.
Let's say for example that I have a collection 'Events' and a collection 'Cards' like Trello does. When I make a change to the description of a card from the 'Cards' collection, an event 'CardDescriptionChanged' should be generated.
The problem is that if there is a crash or some error between saving the changes to the 'Cards' collection and adding the event to the 'Events' collection, this event will not be persisted, and I don't want that.
I've done some research on this issue, and most people suggest one of several approaches:
1. Do not use MongoDB; use a SQL database instead (I don't want that).
2. Use Event Sourcing. (This introduces complexity, and I want to clear older events at some point, so I don't want to keep all events stored. I know that I can use snapshots and delete events older than the snapshot point, but there is complexity in this solution.)
3. Since errors of this nature probably won't happen too often, just ignore them and risk having events that won't be saved (I don't want that either).
4. Use an event/command/action processor. Store commands/actions like 'ChangeCardDescription' and use a processor that will process them and update the entities.
I have considered option 4, but a couple of questions arise:
How do I manage concurrency?
I can queue all commands for the same entity (like a card or a board) and make sure they are processed sequentially, while commands for different entities (different cards) can be processed in parallel. Then I can use processed commands as events. One problem here is that changes to an entity may generate several events that may not correspond to a single command. I will have to break all user actions down into very fine-grained commands so I can then translate them into events.
Error reporting and error handling.
If this process is asynchronous, I have to manage error reporting to the client, and I also have to remove or mark commands that failed.
I still have the problem of marking the commands as processed, since there are no transactions. I know I have to make the processing of commands idempotent to resolve this problem.
Since Trello uses MongoDB and generates actions ('DeleteCardAction', 'CreateCardAction') along with changes to entities (Cards, Boards, ...), I was wondering how they solve this problem.

Create a new collection called FutureUpdates. Write planned updates to the FutureUpdates collection with a single document defining the changes you plan to make to cards and the events you plan to generate. This insert will be atomic.
Now open a change stream on the FutureUpdates collection; this will give you the stream of updates you need to make. Take each doc from the change stream and apply the updates. Finally, update the doc in FutureUpdates to mark it as complete. Again, this update is atomic.
When you apply the updates to Events and Cards, make sure to include the ObjectId of the FutureUpdates doc the update came from.
Now, if the program crashes after inserting the update in FutureUpdates, you can check the Events and Cards collections for records containing that ObjectId. If they are not present, you can reapply the missing updates.
If the updates have been applied but the FutureUpdates doc is not marked as complete, you can mark it complete during recovery to finish the process.
Effectively, you are atomically updating one doc per change in FutureUpdates to track progress. Once an update is complete, you can archive the old docs or just delete them.
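
To make the recovery idea concrete, here is a minimal sketch of the pattern with the Node.js driver. The collection name futureUpdates, the status field, and the document shape are assumptions for illustration, not Trello's actual schema, and change streams require a replica set:

const { MongoClient, ObjectId } = require('mongodb');

async function processFutureUpdates() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('app');
  const futureUpdates = db.collection('futureUpdates');

  // 1. Record the planned card change and the event it should produce in ONE
  //    document. This single insert is atomic.
  const cardId = new ObjectId(); // placeholder; in practice the _id of the card being edited
  await futureUpdates.insertOne({
    status: 'pending',
    cardId,
    cardChanges: { description: 'New description' },
    event: { type: 'CardDescriptionChanged' }
  });

  // 2. Tail the change stream and apply each planned update.
  const stream = futureUpdates.watch([{ $match: { operationType: 'insert' } }]);
  stream.on('change', async change => {
    const planned = change.fullDocument;

    // Tag both writes with the planned update's _id so a recovery pass can tell
    // whether they already happened; this keeps reprocessing idempotent.
    await db.collection('cards').updateOne(
      { _id: planned.cardId },
      { $set: { ...planned.cardChanges, lastFutureUpdateId: planned._id } }
    );
    await db.collection('events').updateOne(
      { futureUpdateId: planned._id },
      { $setOnInsert: { ...planned.event, cardId: planned.cardId } },
      { upsert: true }
    );

    // 3. Mark the planned update as complete; again, a single atomic update.
    await futureUpdates.updateOne({ _id: planned._id }, { $set: { status: 'complete' } });
  });
}

processFutureUpdates().catch(console.error);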

Related

Data syncing with pouchdb-based systems client-side: is there a workaround to the 'deleted' flag?

I'm planning on using rxdb + hasura/postgresql in the backend. I'm reading this rxdb page for example, which off the bat requires sync-able entities to have a deleted flag.
Q1 (main question)
Is there ANY point at which I can finally hard-delete these entities? What conditions would have to be met - e.g. could I simply use "older than X months" and then force my app to only ever display data from the last X months?
Is such a hard-delete, if possible, best carried out directly in the central db, since it will be the source of truth? Would there be any repercussions client-side that I'm not foreseeing/understanding?
I foresee the number of deleted records growing rapidly in my app, and I don't want to have to store all this extra data forever.
Q2 (bonus / just curious)
What is the (algorithmic) basis for needing a 'deleted' flag? Is it that it's just faster to check a flag rather than to check for the omission of an object from, say, a very large list? I apologize if it's kind of a stupid question :(
Ultimately it comes down to a decision that's informed by your particular business/product with regards to how long you want to keep deleted entities in your system. For some applications it's important to always keep a history of deleted things or even individual revisions to records stored as a kind of ledger or history. You'll have to make a judgement call as to how long you want to keep your deleted entities.
I'd recommend that you also add a deleted_at column if you haven't already and then you could easily leverage something like Hasura's new Scheduled Triggers functionality to run a recurring job that fully deletes records older than whatever your threshold is.
You could also leverage Hasura's permissions system to ensure that rows that have been soft-deleted aren't returned to the client. There are docs and examples for ways to work with soft deletes in Hasura.
For your second question it is definitely much faster to check for the deleted flag on records than to have to try and diff the entire dataset looking for things that are now missing.
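
As a rough illustration of the cleanup job mentioned above, a Hasura cron/scheduled trigger could call a small webhook like this; the table and column names (items, deleted, deleted_at) and the 3-month threshold are assumptions:

const express = require('express');
const { Pool } = require('pg');

const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const app = express();

// Endpoint for the scheduled trigger to hit, e.g. once a day.
app.post('/cleanup-soft-deleted', async (req, res) => {
  // Hard-delete rows that were soft-deleted more than 3 months ago.
  const result = await pool.query(
    `DELETE FROM items
     WHERE deleted = true
       AND deleted_at < now() - interval '3 months'`
  );
  res.json({ removed: result.rowCount });
});

app.listen(3000);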

Should I update the read model on each event?

I have a Customer read model that needs to be updated after a NewOrderEvent.
One thing I want to understand: should I update my read model on every event, or do I need to replay all events and rebuild the read model?
What I'm doing now is:
1. Saving the NewOrderEvent
2. Getting or creating the Customer read model
3. Invoking Customer.ApplyEvent(NewOrderEvent), which changes the Customer state
4. Saving the Customer read model
Am I missing something?
Usually yes, you want to update the read model every time you have an event. It's just a simple CRUD operation, a db update. Replaying events is done when you want to (re)generate a new read model, because you could have millions of events and replaying them could be a very long-running operation.
By the way, the 'apply' terminology should be reserved for the command model only, in order to avoid confusion. You apply events to a domain aggregate root (entity), but you use an event as the source of data for read-model updates.
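For example, the per-event read-model update can be a plain upsert driven by the event data; this is just a sketch, and the collection and field names (customerReadModel, orderTotal, occurredAt) are assumptions:

// One event in, one CRUD update out; no domain logic in the projection.
async function onNewOrderEvent(db, event) {
  await db.collection('customerReadModel').updateOne(
    { _id: event.customerId },
    {
      $inc: { orderCount: 1, totalSpent: event.orderTotal },
      $set: { lastOrderAt: event.occurredAt }
    },
    { upsert: true } // creates the read-model document if it does not exist yet
  );
}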
Looks good to me. You may decide to replay the stream of events in order to recreate the read model only if you are introducing something new to it.
Some people rebuild read models whenever the schema changes, but in many cases you can use migrations for that. Really depends on your application.

Could I save a Postgres transaction and continue working with the db within it later?

I know about prepared transactions in Postgres, but it seems you can only commit or roll them back later. You cannot even view the transaction's db state before you've committed it. Is there any way to save a transaction for later use?
What I actually want to achieve is a preview (and correction) of some changes to the db (the changes are imports from a CSV file, so the user needs to see a preview before applying them). I want to make changes, add some more changes later, see the full state of the db, and apply it (that is, commit the transaction).
I cannot find a definitive reference in the docs, but I have a very strong feeling that the answer is: no, you cannot do that.
It would mean that when you "save" the transaction, the database would basically have to keep all of its locks in place for an indefinite amount of time. Even if that were possible, it would mean horrible failure modes and trouble on all fronts.
For the pattern that you are describing, I would use two separate transactions. Import into a staging table and show that to the user (or import into the main table but mark the rows as "unapproved"). If the user approves, move or update those rows in another transaction.
You can always end up in a situation where the user simply leaves or crashes without clicking "OK" or "Cancel". If what you're describing were possible, you would end up with a hung transaction holding all these resources. In my proposed solution you end up with leftover rows in the "staging" table that you can still show to the user later or remove.
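
A rough sketch of that two-transaction flow with node-postgres; the table and column names (staging_items, items, name, price) are assumptions:

const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Transaction 1: load the CSV rows into a staging table so the user can preview them.
async function importToStaging(rows, batchId) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    for (const row of rows) {
      await client.query(
        'INSERT INTO staging_items (batch_id, name, price) VALUES ($1, $2, $3)',
        [batchId, row.name, row.price]
      );
    }
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

// Transaction 2: once the user approves the preview, promote the batch and clean up.
async function approveBatch(batchId) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(
      `INSERT INTO items (name, price)
       SELECT name, price FROM staging_items WHERE batch_id = $1`,
      [batchId]
    );
    await client.query('DELETE FROM staging_items WHERE batch_id = $1', [batchId]);
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}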
You may want to read up on the saga pattern. This is actually a very simple instance of a well-known and well-researched problem.
To make the long story short, this pattern breaks down a long-running process like yours into smaller operations that are applied and persisted in some way in separate transactions. If any of them happens to fail (or does not occur as expected), you have compensating actions that usually undo what the steps executed so far have done (e.g. by throwing away stale/irrelevant data).
Here's a decent introduction:
https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part/#:~:text=The%20SAGA%20Pattern,completion%20of%20the%20previous%20one.
http://vasters.com/clemensv/2012/09/01/Sagas.aspx
This concept was formally introduced in the 80s, but it is alive and well and still relevant today.
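To make the idea concrete, here is a toy sketch of a saga-style runner where each step carries a compensating action; all the step names are hypothetical:

// Runs steps in order; if one fails, undoes the completed steps in reverse order.
async function runSaga(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.execute();
      completed.push(step);
    }
  } catch (err) {
    for (const step of completed.reverse()) {
      await step.compensate();
    }
    throw err;
  }
}

// Example with dummy steps standing in for the CSV-import flow:
runSaga([
  {
    execute: async () => console.log('import CSV rows into staging'),
    compensate: async () => console.log('delete staging rows')
  },
  {
    execute: async () => console.log('promote approved rows'),
    compensate: async () => console.log('remove promoted rows')
  }
]).catch(console.error);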

What is the proper way to keep track of updates in progress using MongoDB?

I have a collection with a bunch of documents representing various items. Once in a while, I need to update item properties, but the update takes some time. When properties are updated, the item gets a new timestamp for when it was modified. If I run updates one at a time, then there is no problem. However, if I want to run multiple update processes simultaneously, it's possible that one process starts updating the item, but the next process still sees the item as needing an update and starts updating it as well.
One solution is to mark the item as soon as it is retrieved for update (findAndModify), but it seems wasteful to add a whole extra field to every document just to keep track of items currently being updated.
This should be a very common issue. Maybe there are some built-in functions that exist to address it? If not, is there a standard established method to deal with it?
I apologize if this has been addressed before, but I am having a hard time finding this information. I may just be using the wrong terms.
You could use db.currentOp() to check if an update is already in flight.
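If you do go with the claim-before-updating approach described in the question, a minimal sketch with an atomic findOneAndUpdate might look like this (Node driver 4.x/5.x; the needsUpdate and updating field names are assumptions):

// Atomically claim one unclaimed item so that no other process picks it up.
async function claimNextItem(db) {
  const result = await db.collection('items').findOneAndUpdate(
    { needsUpdate: true, updating: { $ne: true } },
    { $set: { updating: true, claimedAt: new Date() } },
    { returnDocument: 'after' }
  );
  return result.value; // null when nothing is left to claim
}

// Clear the claim and record the new modification timestamp in one update.
async function finishItem(db, itemId) {
  await db.collection('items').updateOne(
    { _id: itemId },
    { $set: { updating: false, needsUpdate: false, modifiedAt: new Date() } }
  );
}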

Meteor batch update

I'm using Meteor. I'm wondering if there's a shorthand way to do batch updates before the DOM is updated.
For instance, I want to update some records, more than one (all at once):
Collection.update(id1,{..})
Collection.update(id2,{..})
Collection.update(id3,{..})
The problem is that the 3 items are being updated separately, so in my case the DOM was being redrawn 3 times instead of once (with all 3 updated records).
Is there a way to hold off the UI update until all of them are updated?
Mongo's update can modify more than one document at a time. Just give it a selector that matches more than one document, and set the multi option. In your case, that's just a list of IDs, but you can use any selector.
Collection.update({_id: {$in: [id1, id2, id3]}}, {...}, {multi:true});
This will run a single DB update and a single redraw.
Execute them on the server instead; that way they can be done synchronously, so they are less likely to cause multiple DOM updates on the client.
See the first two and last interesting code bits, which explain how to protect your clients from messing with the database as well as how to define methods on the server and call them from the client.
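A rough sketch of that server-method approach; the method name, collection, and fields below are placeholders:

// server/methods.js
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

const Cards = new Mongo.Collection('cards');

Meteor.methods({
  'cards.batchUpdate'(ids, fields) {
    // One multi-document update on the server -> one reactive change on the client.
    return Cards.update(
      { _id: { $in: ids } },
      { $set: fields },
      { multi: true }
    );
  }
});

// On the client, call the method instead of issuing three separate updates:
// Meteor.call('cards.batchUpdate', [id1, id2, id3], { done: true });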