I am looking at the Couchbase Dev Guide and trying to understand how two-phase commits work. I feel like the code example they give differs from the diagram.
Code Example
They link to a Ruby gist that shows how you would do this when transferring points from one account to another.
My understanding is that they go down the following route:
1. Change the transaction state to pending
2. Update the first account by altering the points and adding a reference to the transaction
3. Update the second account by altering the points and adding a reference to the transaction
4. Change the transaction state to committed
5. Remove the reference to the transaction from the first account
6. Remove the reference to the transaction from the second account
7. Change the transaction state to done
In this example, if there was a failure between steps 2 and 3, we could roll back by reversing the change to points on any account that has a reference to the transaction.
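To check that I'm reading it right, here is my understanding of that sequence as a minimal Python sketch (plain dicts stand in for the Couchbase documents; the function and field names are mine, not the gist's actual Ruby code):

```python
accounts = {
    "account_a": {"points": 100, "txns": []},
    "account_b": {"points": 50, "txns": []},
}
transactions = {}


def transfer_points(txn_id, source, destination, amount):
    # 1. Create the transaction document in the "pending" state.
    transactions[txn_id] = {"source": source, "destination": destination,
                            "amount": amount, "state": "pending"}

    # 2./3. Alter the points on each account AND add a reference to the
    #       transaction in the same update.
    accounts[source]["points"] -= amount
    accounts[source]["txns"].append(txn_id)
    accounts[destination]["points"] += amount
    accounts[destination]["txns"].append(txn_id)

    # 4. Change the transaction state to committed.
    transactions[txn_id]["state"] = "committed"

    # 5./6. Remove the transaction reference from both accounts.
    accounts[source]["txns"].remove(txn_id)
    accounts[destination]["txns"].remove(txn_id)

    # 7. Change the transaction state to done.
    transactions[txn_id]["state"] = "done"


def rollback(txn_id):
    # Recovery: reverse the points change on any account that still holds a
    # reference to this transaction, then cancel the transaction.
    txn = transactions[txn_id]
    if txn_id in accounts[txn["source"]]["txns"]:
        accounts[txn["source"]]["points"] += txn["amount"]
        accounts[txn["source"]]["txns"].remove(txn_id)
    if txn_id in accounts[txn["destination"]]["txns"]:
        accounts[txn["destination"]]["points"] -= txn["amount"]
        accounts[txn["destination"]]["txns"].remove(txn_id)
    txn["state"] = "cancelled"
```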
Diagram Example
This is the diagram given to explain the two-phase commit; I think it disagrees with the code example...
The diagram seems to say that you add a reference to the transaction to both accounts, and only then add/remove points on both accounts.
In this example, if there was a failure between steps 3 and 4, how would you know what to roll back? How would you know whether you had applied the change to points or not?
Is the diagram wrong?
Yes, you are right. Step 3 of the diagram should also apply the balance changes at the same time as adding the transaction to the list. Basically, adding the transaction to the list is just a marker that some changes were applied to the balance.
By the way, the original gist linked above contains a more complete, executable solution with rollback code: https://gist.github.com/avsej/3136027
Related
I am new to pre- and post-deployment scripts.
To understand them, I came across this:
"When databases are created or upgraded, data may need to be added, changed, or deleted. Moreover, certain actions may have to occur on the database before and/or after the process completes. Deployment scripts can be used to accomplish this."
I want to understand how exactly this works, with an example:
https://www.mssqltips.com/sqlservertutorial/3006/working-with-pre-and-post-deployment-scripts/
As pointed out on the site, a good example of a post-deployment step is the insertion of seed data.
For instance, you create a new currency table as part of the schema migration step. Then you insert the most commonly used currencies (say USD, EUR, etc.) so that they don't have to be inserted with a manual step.
Another example of a post-deployment step is populating data for a newly added column. For example, you add a new column called IsPremium to the Customers table and want to set it to true for all customers with a start date more than 5 years ago. A post-deployment script is a good place to do that.
Similarly, scripts that need to run before the migration go into pre-deployment scripts. One example is locking certain tables to ensure the migration script is run only once, or setting a flag to indicate that a migration is in progress.
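To make that concrete, here is a rough sketch of both ideas in Python with SQLite, purely for illustration (in SSDT these would normally be plain .sql pre-/post-deployment scripts, and the table and column names here - Currency, Customers.IsPremium, DeploymentFlag - are assumptions based on the examples above):

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")

# --- pre-deployment: set a flag so the migration is marked as in progress ---
conn.execute("CREATE TABLE IF NOT EXISTS DeploymentFlag (InProgress INTEGER)")
conn.execute("INSERT INTO DeploymentFlag (InProgress) VALUES (1)")

# --- the schema migration itself (tables created here only so the example runs) ---
conn.execute("CREATE TABLE Currency (Code TEXT PRIMARY KEY, Name TEXT)")
conn.execute("""CREATE TABLE Customers (
                    Id INTEGER PRIMARY KEY,
                    StartDate TEXT,
                    IsPremium INTEGER DEFAULT 0)""")
conn.execute("INSERT INTO Customers (StartDate) VALUES ('2012-01-01'), ('2021-06-01')")

# --- post-deployment: seed reference data (idempotent, so re-runs are safe) ---
seed = [("USD", "US Dollar"), ("EUR", "Euro"), ("GBP", "Pound Sterling")]
conn.executemany("INSERT OR IGNORE INTO Currency (Code, Name) VALUES (?, ?)", seed)

# --- post-deployment: backfill the newly added column ---
cutoff = (datetime.now() - timedelta(days=5 * 365)).strftime("%Y-%m-%d")
conn.execute("UPDATE Customers SET IsPremium = 1 WHERE StartDate < ?", (cutoff,))

# --- clear the pre-deployment flag once everything has run ---
conn.execute("UPDATE DeploymentFlag SET InProgress = 0")
conn.commit()
```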
I'm planning on using rxdb + hasura/postgresql in the backend. I'm reading this rxdb page for example, which off the bat requires sync-able entities to have a deleted flag.
Q1 (main question)
Is there ANY point at which I can finally hard-delete these entities? What conditions would have to be met - e.g. could I simply use "older than X months" and then force my app to only ever display data less than X months old?
Is such a hard-delete, if possible, best carried out directly in the central db, since it will be the source of truth? Would there be any repercussions client-side that I'm not foreseeing/understanding?
I foresee the number of deleted rows growing rapidly in my app and I don't want to have to store all this extra data forever.
Q2 (bonus / just curious)
What is the (algorithmic) basis for needing a 'deleted' flag? Is it that it's just faster to check a flag rather than to check for the omission of an object from, say, a very large list? I apologize if it's kind of a stupid question :(
Ultimately it comes down to a decision that's informed by your particular business/product with regards to how long you want to keep deleted entities in your system. For some applications it's important to always keep a history of deleted things or even individual revisions to records stored as a kind of ledger or history. You'll have to make a judgement call as to how long you want to keep your deleted entities.
I'd recommend that you also add a deleted_at column if you haven't already and then you could easily leverage something like Hasura's new Scheduled Triggers functionality to run a recurring job that fully deletes records older than whatever your threshold is.
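As a minimal sketch of what that recurring job would run (the table/column names entities, deleted, and deleted_at are assumptions, and SQLite stands in for Postgres just so the example is self-contained):

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE entities (
                    id INTEGER PRIMARY KEY,
                    payload TEXT,
                    deleted INTEGER DEFAULT 0,
                    deleted_at TEXT)""")


def purge_old_soft_deletes(months=6):
    # Hard-delete rows that were soft-deleted longer than `months` ago.
    cutoff = (datetime.utcnow() - timedelta(days=30 * months)).isoformat()
    cur = conn.execute(
        "DELETE FROM entities WHERE deleted = 1 AND deleted_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount  # number of rows hard-deleted
```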
You could also leverage Hasura's permissions system to ensure that rows that have been deleted aren't returned to the client. There is documentation with examples of ways to work with soft deletes in Hasura.
For your second question, it is definitely much faster to check the deleted flag on records than to have to diff the entire dataset looking for things that are now missing.
The following is a use case in a workflow system.
A work order enters the system. A work order has a target that goes through different workflow states before the work order is complete.
Say a work order for a target vehicle comes into the system; the workflow for this work order involves two tasks, say:
a) wash vehicle
b) inspect vehicle
Say the "wash vehicle" workflow task changes a vehicle attribute from "not washed" to "washed", and the "inspect vehicle" workflow task changes a vehicle attribute from "not inspected" to "inspection done".
If a user pulls the work order data, the user will always see the latest vehicle data (in this example, assuming both workflow tasks are completed, the user will see "washed" and "inspection done"). However, when the user pulls ONLY the "wash vehicle" task data, the user will see "washed": even though the second task was done, workflow task 1 only sees what it modified. Pulling data for workflow task 2 will show both "washed" and "inspection done".
This involves milestoning (an audit trail) of data. One approach is shown in the image below: when a workflow task modifies data, it updates the version number and modified_ts, and keeps that version number in its own data row (via a JOIN table as depicted below). Basically this is nothing but maintaining a reference to a history record for the workflow task's data, so when pulling workflow task data it knows which history record to pull back. Please ignore parent_id and the other notes/noise in the picture below; they are not relevant to this question.
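To make the approach concrete, here is a small Python sketch of the idea (the names vehicle, vehicle_history, and task_vehicle_version are placeholders, not our actual schema): every change bumps the version, the previous row is copied into history, and each workflow task records the version it produced so it can read back "its" view:

```python
import copy
from datetime import datetime

vehicle = {"id": 1, "version": 0, "washed": False, "inspected": False,
           "modified_ts": None}
vehicle_history = []        # milestoned (historical) versions of the row
task_vehicle_version = {}   # task_id -> vehicle version it produced (the JOIN table)


def apply_task_change(task_id, changes):
    # Milestone the current row, then apply the change under a new version.
    vehicle_history.append(copy.deepcopy(vehicle))
    vehicle.update(changes)
    vehicle["version"] += 1
    vehicle["modified_ts"] = datetime.utcnow().isoformat()
    task_vehicle_version[task_id] = vehicle["version"]


def vehicle_as_seen_by(task_id):
    # Return the vehicle exactly as it looked when this task finished.
    version = task_vehicle_version[task_id]
    if vehicle["version"] == version:
        return vehicle
    return next(v for v in vehicle_history if v["version"] == version)


apply_task_change("wash_vehicle", {"washed": True})
apply_task_change("inspect_vehicle", {"inspected": True})

print(vehicle_as_seen_by("wash_vehicle"))     # washed=True, inspected=False
print(vehicle_as_seen_by("inspect_vehicle"))  # washed=True, inspected=True
```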
I am thinking event sourcing could be another alternative design; however, I don't want to apply event sourcing (or any similar solution) as a wholesale solution, but only for this particular use case (affecting only 3 or so tables where the audit trail matters). I am trying to evaluate whether CQRS/event sourcing is the right fit as a partial solution (again, limited to the 3-4 tables that need to preserve history/audit-trail data), or whether ES/CQRS would be overkill. Any other thoughts?
P.S. Though this isn't specific to Scala, Scala is the platform we are using, hence the tag, in case there are language-specific solutions that can help. Tagging Akka to find out whether ES/CQRS via Akka Persistence is an option or not. PostgreSQL is the database, and DB triggers are not a solution I am looking for.
I know about prepared transactions in Postgres, but it seems you can only commit or roll them back later. You cannot even view the transaction's database state before you've committed it. Is there any way to save a transaction for later use?
What I actually want to achieve is a preview (and correction) of some changes in the database (the changes are imports from a CSV file, so the user needs to see a preview before applying them). I want to make changes, add some more changes later, see the full state of the database, and then apply it (that is, commit the transaction).
I cannot find a definitive reference in the docs, but I have a very strong feeling that the answer is: no, you cannot do that.
It would mean that when you "save" the transaction, the database would basically have to keep all of its locks in place for an indefinite amount of time. Even if it were possible, it would mean horrible failure modes and trouble on all fronts.
For the pattern you are describing, I would use two separate transactions. Import into a staging table and show that to the user (or import into the main table but mark the rows as "unapproved"). If the user approves, move or update those rows in another transaction.
You can always end up in a situation where the user simply leaves or crashes without clicking "OK" or "Cancel". If what you're describing were possible, you would end up with a hung transaction holding all these resources. In my proposed solution you end up, at worst, with wasteful rows in the staging table that you can still show to the user later or remove.
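Here is a minimal sketch of that two-transaction approach, using SQLite so it runs standalone (in your case this would be Postgres; the table and column names are assumptions):

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL)")
conn.execute("CREATE TABLE items_staging (batch_id TEXT, name TEXT, price REAL)")


def import_to_staging(batch_id, csv_text):
    # Transaction 1: load the CSV into the staging table so the user can preview it.
    rows = [(batch_id, r["name"], float(r["price"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    with conn:
        conn.executemany("INSERT INTO items_staging VALUES (?, ?, ?)", rows)


def preview(batch_id):
    return conn.execute(
        "SELECT name, price FROM items_staging WHERE batch_id = ?",
        (batch_id,)).fetchall()


def approve(batch_id):
    # Transaction 2: only when the user clicks "OK" are the rows moved for real.
    with conn:
        conn.execute("""INSERT INTO items (name, price)
                        SELECT name, price FROM items_staging WHERE batch_id = ?""",
                     (batch_id,))
        conn.execute("DELETE FROM items_staging WHERE batch_id = ?", (batch_id,))


import_to_staging("batch-1", "name,price\nwidget,9.99\ngadget,19.50\n")
print(preview("batch-1"))   # the user reviews (and could correct) these rows
approve("batch-1")          # the user clicked OK
```

If the user never comes back, the staged rows for that batch can simply be deleted later; no transaction is left open.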
You may want to read up on the saga pattern. This is actually a very simple example of a well-known and well-researched problem.
To make a long story short, this pattern breaks down a long-running process like yours into smaller operations that are applied and persisted in separate transactions. If any of them fails (or does not occur as expected), you have compensating actions that undo what the steps executed so far have done (e.g. by throwing away stale/irrelevant data).
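A tiny sketch of the idea, with purely illustrative step names:

```python
def run_saga(steps):
    # Each step is a pair (action, compensate). If a later action fails,
    # the compensations for the completed steps run in reverse order.
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        raise


run_saga([
    (lambda: print("import rows into staging"),
     lambda: print("delete staged rows")),
    (lambda: print("apply rows to the main table"),
     lambda: print("remove the applied rows")),
])
```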
Here's a decent introduction:
https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part/#:~:text=The%20SAGA%20Pattern,completion%20of%20the%20previous%20one.
http://vasters.com/clemensv/2012/09/01/Sagas.aspx
This concept was formally introduced in the 80s, but it is still very much alive and relevant today.
I keep seeing questions floating through that make reference to a column in a database table named something like DateLastUpdated. I don't get it.
The only companion field I've ever seen is LastUpdateUserId or similar. There's never an indicator of why the update took place, or even what the update was.
On top of that, this field is sometimes written from within a trigger, where even less context is available.
It certainly doesn't come close to being an audit trail, so that can't be the justification. And if there is an audit trail somewhere in a log or whatever, this field would be redundant.
What am I missing? Why is this pattern so popular?
Such a field can be used to detect whether there are conflicting edits made by different processes. When you retrieve a record from the database, you get the previous DateLastUpdated field. After making changes to other fields, you submit the record back to the database layer. The database layer checks that the DateLastUpdated you submit matches the one still in the database. If it matches, then the update is performed (and DateLastUpdated is updated to the current time). However, if it does not match, then some other process has changed the record in the meantime and the current update can be aborted.
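A minimal sketch of that check (using SQLite and assumed table/column names just so the example is self-contained): the UPDATE only succeeds if DateLastUpdated still matches the value the caller originally read:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
                    id INTEGER PRIMARY KEY,
                    name TEXT,
                    date_last_updated TEXT)""")
conn.execute("INSERT INTO customer VALUES (1, 'Alice', '2024-01-01T10:00:00')")


def update_name(customer_id, new_name, date_read, now):
    cur = conn.execute(
        """UPDATE customer
           SET name = ?, date_last_updated = ?
           WHERE id = ? AND date_last_updated = ?""",
        (new_name, now, customer_id, date_read))
    conn.commit()
    if cur.rowcount == 0:
        raise RuntimeError("Row was changed by someone else since it was read")


# Succeeds: nobody has touched the row since it was read.
update_name(1, "Alice B.", "2024-01-01T10:00:00", "2024-01-02T09:00:00")

# Fails: a second caller still holds the old timestamp.
try:
    update_name(1, "Alice C.", "2024-01-01T10:00:00", "2024-01-02T09:05:00")
except RuntimeError as e:
    print(e)
```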
It depends on the exact circumstance, but a timestamp like that can be very useful for autogenerated data - you can figure out whether something needs to be recalculated if a dependency has changed later on (this is how build systems decide which files need to be recompiled).
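For example, a generic sketch of that kind of check applied to file modification times:

```python
import os


def needs_rebuild(target, dependencies):
    # Rebuild if the target is missing or any dependency changed after it.
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```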
Also, many websites will have data marking "Last changed" on a page, particularly news sites that may edit content. The exact reason isn't necessary (and there likely exist backups in case an audit trail is really necessary), but this data needs to be visible to the end user.
These sorts of things are typically used for business applications where user action is required to initiate the update. Typically, there will be some kind of business app (eg a CRM desktop application) and for most updates there tends to be only one way of making the update.
If you're looking at address data, that was done through the "Maintain Address" screen, etc.
Such database auditing is there to augment business-level auditing, not to replace it. Call centres will sometimes (or always in the case of financial services providers in Australia, as one example) record phone calls. That's part of the audit trail too but doesn't tend to be part of the IT solution as far as the desktop application (and related infrastructure) goes, although that is by no means a hard and fast rule.
Call centre staff will also typically have some sort of "Notes" or "Log" functionality where they can type freeform text as to why the customer called and what action was taken so the next operator can pick up where they left off when the customer rings back.
Triggers will often be used to record exactly what was changed (eg writing the old record to an audit table). The purpose of all this is that with all the information (the notes, recorded call, database audit trail and logs) the previous state of the data can be reconstructed as can the resulting action. This may be to find/resolve bugs in the system or simply as a conflict resolution process with the customer.
It is certainly popular - Rails, for example, has a shorthand for it, as well as a creation timestamp (:timestamps).
At the application level it's very useful, as the same pattern is very common in views - look at the questions here for example (answered 56 secs ago, etc).
It can also be used retrospectively in reporting to generate stats (e.g. what is the growth curve of the number of records in the DB).
There are a couple of scenarios.
Let's say you have an address table for your customers.
You have your CRM app; the customer calls to say his address changed a month ago, but with the LastUpdate column you can see that this row for this customer hasn't been touched in 4 months.
Usually you use triggers to populate a history table so that you can see all the other history; if you see that the creation date and updated date are the same, there is no point hitting the history table since you won't find anything.
If you calculate indexes (stock market), you can easily see when one was recalculated just by looking at this column.
If there are 2 DB servers, by comparing the date column you can find out whether all the changes have been replicated or not, etc.
This is also very useful if you have to send delta feeds out to clients, that is, feeds where only the records that have been changed or inserted since the date of the last feed are sent.
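A minimal sketch of such a delta feed (table and column names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
                    id INTEGER PRIMARY KEY,
                    status TEXT,
                    date_last_updated TEXT)""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "shipped",  "2024-03-01T08:00:00"),
    (2, "pending",  "2024-03-05T12:30:00"),
    (3, "returned", "2024-03-06T09:15:00"),
])


def delta_feed(last_feed_date):
    # Everything inserted or changed since the previous feed was sent.
    return conn.execute(
        "SELECT * FROM orders WHERE date_last_updated > ?",
        (last_feed_date,)).fetchall()


print(delta_feed("2024-03-04T00:00:00"))  # only rows 2 and 3
```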