building audit trail functionality - postgresql

The following is a use case in a workflow system.
A work order enters the system. Each work order has a target, which passes through several workflow states before the work order is complete.
Say a work order for a target vehicle comes into the system, and the workflow for this work order involves two tasks:
a) wash vehicle
b) inspect vehicle
Say the "wash vehicle" workflow task changes a vehicle attribute from "not washed" to "washed", and the "inspect vehicle" workflow task changes a vehicle attribute from "not inspected" to "inspection done".
If a user pulls work order data, the user always sees the latest vehicle data (in this example, assuming both workflow tasks are completed, the user sees "washed" and "inspection done"). However, when the user pulls ONLY the data for the "wash vehicle" workflow task, the user sees "washed": although the second task was done, workflow task 1 only sees what it modified. Getting data for workflow task 2 shows both "washed" and "inspection done".
This involves milestoning (an audit trail) of data. One approach is shown in the image below: when a workflow task modifies data, it updates the version number and modified_ts and keeps that version number in its own data row (via a JOIN table, as depicted below). Basically, this is nothing more than maintaining a reference to a history record for the workflow task data, so that when pulling workflow task data the system knows which history record to pull back. Please ignore parent_id and the other notes and noise in the picture below; they are not relevant to this question.
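To make the version-reference idea concrete, here is a minimal in-memory sketch. It is not the schema from the picture: the class `VersionedTarget`, the attribute names `wash` and `inspection`, and the `task_versions` mapping (standing in for the JOIN table) are all hypothetical, and a real implementation would keep the history in PostgreSQL tables.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTarget:
    """Append-only history of a target's attributes (hypothetical names)."""
    history: list = field(default_factory=list)  # index + 1 == version number

    def apply(self, changes: dict) -> int:
        """Store a new full snapshot and return its version number."""
        latest = dict(self.history[-1]) if self.history else {}
        latest.update(changes)
        self.history.append(latest)
        return len(self.history)

    def at(self, version: int) -> dict:
        """Return the snapshot as it looked at the given version."""
        return dict(self.history[version - 1])

    def latest(self) -> dict:
        """Return the most recent snapshot."""
        return dict(self.history[-1])


vehicle = VersionedTarget()
vehicle.apply({"wash": "not washed", "inspection": "not inspected"})

# Each workflow task keeps the version it produced (the JOIN-table reference).
task_versions = {
    "wash vehicle": vehicle.apply({"wash": "washed"}),
    "inspect vehicle": vehicle.apply({"inspection": "inspection done"}),
}

wash_view = vehicle.at(task_versions["wash vehicle"])
inspect_view = vehicle.at(task_versions["inspect vehicle"])
```

Here `wash_view` still reports the inspection as "not inspected", while `inspect_view` and `vehicle.latest()` show both changes.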
I think event sourcing would be another alternative design. However, I don't want to apply event sourcing (or any similar solution) as a wholesale solution, only for this particular use case (affecting only three or so tables where the audit trail matters). I am trying to evaluate whether CQRS/event sourcing is the right fit as a partial solution (again, limited to the 3-4 tables that need to preserve history/audit trail data), or whether ES/CQRS would be overkill. Any other thoughts?
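For comparison, a rough sketch of what the event-sourced variant could look like for just this one target. The event type `AttributeChanged` and the helper names are assumptions made for illustration; nothing here uses Akka Persistence or any real event store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeChanged:
    seq: int          # position in the target's event stream
    task: str         # workflow task that caused the change
    attribute: str
    value: str


def replay(events, up_to_seq=None):
    """Fold events into a state dict, optionally stopping at a sequence number."""
    state = {}
    for event in sorted(events, key=lambda e: e.seq):
        if up_to_seq is not None and event.seq > up_to_seq:
            break
        state[event.attribute] = event.value
    return state


def state_as_seen_by(events, task):
    """State right after the last event emitted by the given task."""
    last_seq = max(e.seq for e in events if e.task == task)
    return replay(events, up_to_seq=last_seq)


stream = [
    AttributeChanged(1, "wash vehicle", "wash", "washed"),
    AttributeChanged(2, "inspect vehicle", "inspection", "inspection done"),
]

wash_state = state_as_seen_by(stream, "wash vehicle")
current_state = replay(stream)
```

The per-task view falls out of replaying the stream up to that task's last event, so no separate version column is needed; the cost is that every read of a historical view is a replay (or needs a projection).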
P.S. Though this isn't related to Scala, Scala is the platform we are using, hence I'm tagging it to see if there are language-specific solutions that can help. I'm tagging Akka to find out whether ES/CQRS via Akka Persistence is an option. PostgreSQL is the DB, and DB triggers are not a solution I am looking for.

Related

What is the purpose of pre and post deployment?

I am new to pre- and post-deployment scripts.
While trying to understand them, I came across this:
“When databases are created or upgraded, data may need to be added, changed, or deleted. Moreover, certain actions may have to occur on the database before and/or after the process completes. Deployment scripts can be used to accomplish this.”
I want to understand exactly how this works, with an example.
https://www.mssqltips.com/sqlservertutorial/3006/working-with-pre-and-post-deployment-scripts/
As pointed out in the site, a good example of a post deployment step is insertion of seed data.
For instance, you create a new currency table as part of the schema migration step. Then you insert the most commonly used currencies (say USD, EUR, etc.) so that they don't have to be inserted with a manual step.
Another example of a post-deployment step is populating data for a newly added column. For example, you add a new column called IsPremium to the Customers table and want to set it to true for all customers whose start date is more than 5 years ago. A post-deployment script is a good place to do that.
Similarly, scripts that run before the migration go into pre-deployment scripts. One example is locking a certain table to ensure that the migration script is run only once, or setting a flag to indicate that a migration is in progress.
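The ordering described above can be sketched as three functions run in sequence. This uses Python's built-in SQLite as a stand-in for SQL Server; the `deploy_state` flag table and the function names are made up for the example.

```python
import sqlite3


def pre_deploy(conn):
    # Record that a migration is in progress (hypothetical flag table).
    conn.execute("CREATE TABLE IF NOT EXISTS deploy_state (in_progress INTEGER)")
    conn.execute("DELETE FROM deploy_state")
    conn.execute("INSERT INTO deploy_state VALUES (1)")


def migrate(conn):
    # Schema change: the new currency table.
    conn.execute("CREATE TABLE IF NOT EXISTS currency (code TEXT PRIMARY KEY)")


def post_deploy(conn):
    # Seed data; INSERT OR IGNORE keeps the script safe to re-run.
    conn.executemany(
        "INSERT OR IGNORE INTO currency (code) VALUES (?)",
        [("USD",), ("EUR",)],
    )
    conn.execute("UPDATE deploy_state SET in_progress = 0")


def deploy(conn):
    pre_deploy(conn)
    migrate(conn)
    post_deploy(conn)
    conn.commit()


conn = sqlite3.connect(":memory:")
deploy(conn)
deploy(conn)  # running twice does not duplicate the seed rows
codes = [row[0] for row in conn.execute("SELECT code FROM currency ORDER BY code")]
in_progress = conn.execute("SELECT in_progress FROM deploy_state").fetchone()[0]
```

Writing the seed step so that it can be re-run without side effects matters because post-deployment scripts typically execute on every deployment, not only the first.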

DDD: How to solve this using Domain-Driven design?

I'm new to DDD and cutting my teeth on the following exercise. The use case is real, but my attempt to solve it with DDD is purely for learning.
We have multiple Git repos, each containing a file that we call
product spec. The system needs to respond to a HTTP POST by cloning all
the repos, and then update the product spec in those that match some
information in the POST body. System also needs to log the POST request as the cause for updating the product spec.
I'd like to use Aggregates and event sourcing for solving this problem because they seem like a good fit. Event sourcing comes with automatic persistence of the commands, so if I convert the POST body to a command, I get auditing for free.
The problem is that the POST may match multiple product specs, and I'm not sure how to deal with that. Should I create a domain service, let it find all the matching product specs, and then issue an update command to each? Or should I have the aggregate root do so? If I use an aggregate root to update multiple entities, it itself needs to be an entity, so what would it be in my problem domain?
The first comment to your question is right (the one of #VoiceOfUnreason): this 'is mostly side effect coordination'.
But I will try to answer your question: How to solve this using DDD / Event Sourcing:
The first aggregate root could just be named: 'MultipleRepoOperations'. This aggregate root has only one stream of events.
The command that fires the whole process could be: 'CloneAndUpdateProdSpecRepos' which carries a list of all the repos to be cloned and updated.
When the aggregate root processes the command it will simply spit a bunch of events of type 'UserRequestedToCloneAndUpdateProdSpec'
The second bounded context manages all the repos. It is subscribed to all the events from 'MultipleRepoOperations' and will receive each event emitted by it. This bounded context's aggregate root can be called 'GitRepoManagement', and it has one stream per repo, e.g. GitRepoManagement-Repo1, GitRepoManagement-Repo215, GitRepoManagement-20158, etc.
'GitRepoManagement' receives each event of type 'UserRequestedToCloneAndUpdateProdSpec', replays the corresponding repo stream in order to rehydrate the current state, and then tries to clone the repo and update its product spec. If that fails, it emits a failure event; otherwise it emits a success event.
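A rough sketch of that flow, kept in memory and without any event-store library. Only 'MultipleRepoOperations', 'GitRepoManagement', 'CloneAndUpdateProdSpecRepos' and 'UserRequestedToCloneAndUpdateProdSpec' come from the description above; the success and failure event names, the injected `clone_and_update` callable and the fake Git function are assumptions, and the rehydration step is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRequestedToCloneAndUpdateProdSpec:
    repo: str
    spec_update: str


@dataclass(frozen=True)
class ProdSpecUpdated:  # hypothetical success event
    repo: str
    spec_update: str


@dataclass(frozen=True)
class ProdSpecUpdateFailed:  # hypothetical failure event
    repo: str
    reason: str


def multiple_repo_operations(repos, spec_update):
    """Handle CloneAndUpdateProdSpecRepos: emit one event per repo."""
    return [UserRequestedToCloneAndUpdateProdSpec(r, spec_update) for r in repos]


class GitRepoManagement:
    """Keeps one event stream per repo, e.g. 'GitRepoManagement-Repo1'."""

    def __init__(self, clone_and_update):
        self.streams = defaultdict(list)
        self.clone_and_update = clone_and_update  # side effect, injected

    def handle(self, event):
        stream = self.streams[f"GitRepoManagement-{event.repo}"]
        try:
            self.clone_and_update(event.repo, event.spec_update)
            stream.append(ProdSpecUpdated(event.repo, event.spec_update))
        except Exception as exc:
            stream.append(ProdSpecUpdateFailed(event.repo, str(exc)))


def fake_clone_and_update(repo, spec_update):
    # Stand-in for the real Git work; fails for one repo to show both outcomes.
    if repo == "Repo215":
        raise RuntimeError("clone failed")


manager = GitRepoManagement(fake_clone_and_update)
for event in multiple_repo_operations(["Repo1", "Repo215"], "bump version"):
    manager.handle(event)
```

Injecting the Git operation keeps the side effect out of the aggregate itself, which also makes the per-repo outcome easy to test.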
For learning purposes, try to choose a problem domain that has more complex rules and logic, where many actions are needed: for example, a small game (a card game, a multiplayer quiz game, or whatever), or a simulation of some real-world process such as school management or some business process.

How can I use a real-time workflow in CRM 2015?

I have a real-time workflow for creating unique numbers. This workflow gets a numeric field from my custom entity, increases it by 1, and updates it for the next use.
I want to run this workflow on multiple records.
In on-demand mode it works fine, and I get correct, unique numbers, but in "Record is Created" mode it does not work properly and I get repeated numbers.
What do I have to do?
This approach won't work. When the workflow is triggered by record creation, instances can run concurrently: e.g. two users create two records, and two instances of the workflow start. As there is no locking mechanism, you end up with duplicated numbers.
I'm guessing this isn't happening when running on demand because you are running as a single user.
You will need to implement a custom auto number approach, such as Auto Number for DynamicsCRM.
Disclaimer: I work for Gap Consulting who produce the tool linked above.
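As a language-neutral illustration of why a lock fixes the duplicates, here is a small threaded sketch. It is only an analogy: the `AutoNumber` class is hypothetical, and in CRM the equivalent protection would have to come from a database-level lock or transaction rather than an in-process one.

```python
import threading


class AutoNumber:
    """In-memory counter; the lock makes read-increment-write atomic."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            self._value += 1
            return self._value


counter = AutoNumber()
issued = []
issued_lock = threading.Lock()


def create_record():
    # Simulates one "record created" event asking for a number.
    number = counter.next()
    with issued_lock:
        issued.append(number)


threads = [threading.Thread(target=create_record) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place, every simulated record receives a distinct number even though the requests arrive concurrently.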

How do I listen for, load and run user-defined workflows at runtime that have been persisted using SqlWorkflowInstanceStore?

The result of SqlWorkflowInstanceStore.WaitForEvents does not tell me what type of workflow is runnable. The constructor of WorkflowApplication takes a workflow definition, and at a minimum, I need to be able to store a workflow ID in the store and query it, so that I can determine which workflow definition to load for the WorkflowApplication.
I also don't want to create a SqlWorkflowInstanceStore for each custom workflow type, since there may be thousands of different workflows.
I thought about trying to use WorkflowServiceHost, but not every workflow has a Receive activity and I don't think it is feasible to have thousands of WorkflowServiceHosts running, each supporting a different workflow type.
Ideally, I just want to query the database for a runnable workflow, determine its workflow definition ID, load the appropriate XAML from a workflow definition table, instantiate WorkflowApplication with the workflow definition, and call LoadRunnableInstance().
I would like to have a way to correlate which workflow is related to a given HasRunnableWorkflowEvent raised by the SqlWorkflowInstanceStore (along with the custom workflow definition ID), or have an alternate way of supporting potentially thousands of different custom workflow types created at runtime. I must also load balance the execution of workflows across multiple application servers.
There's a free product from Microsoft that does pretty much everything you say there, and then some. Oh, and it's excellent too.
Windows Server AppFabric. No, not Azure.
http://www.microsoft.com/windowsserver2008/en/us/app-main.aspx
-Oisin

Last Updated Date: Antipattern?

I keep seeing questions floating through that make reference to a column in a database table named something like DateLastUpdated. I don't get it.
The only companion field I've ever seen is LastUpdateUserId or such. There's never an indicator about why the update took place; or even what the update was.
On top of that, this field is sometimes written from within a trigger, where even less context is available.
It certainly doesn't even come close to being an audit trail, so that can't be the justification. And if there is an audit trail somewhere in a log or whatever, this field would be redundant.
What am I missing? Why is this pattern so popular?
Such a field can be used to detect whether there are conflicting edits made by different processes. When you retrieve a record from the database, you get the previous DateLastUpdated field. After making changes to other fields, you submit the record back to the database layer. The database layer checks that the DateLastUpdated you submit matches the one still in the database. If it matches, then the update is performed (and DateLastUpdated is updated to the current time). However, if it does not match, then some other process has changed the record in the meantime and the current update can be aborted.
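The check described above (optimistic concurrency) can be sketched with a conditional UPDATE. The `customer` table, its columns and the timestamps are invented for the example, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, date_last_updated TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ann', '2024-01-01T00:00:00')")


def update_name(conn, customer_id, new_name, seen_timestamp, now):
    """Update only if the row still carries the timestamp the caller read."""
    cursor = conn.execute(
        "UPDATE customer SET name = ?, date_last_updated = ? "
        "WHERE id = ? AND date_last_updated = ?",
        (new_name, now, customer_id, seen_timestamp),
    )
    conn.commit()
    return cursor.rowcount == 1  # False means someone else changed the row


# Two processes read the same version of the row.
seen = conn.execute(
    "SELECT date_last_updated FROM customer WHERE id = 1"
).fetchone()[0]

first = update_name(conn, 1, "Anna", seen, "2024-01-02T00:00:00")
second = update_name(conn, 1, "Anne", seen, "2024-01-03T00:00:00")
final_name = conn.execute("SELECT name FROM customer WHERE id = 1").fetchone()[0]
```

The first update succeeds and moves the timestamp forward; the second, still holding the old timestamp, matches no row and can be reported to the caller as a conflict.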
It depends on the exact circumstances, but a timestamp like that can be very useful for autogenerated data: you can figure out whether something needs to be recalculated because a dependency has changed later on (this is how build systems work out which files need to be recompiled).
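That staleness test amounts to a single timestamp comparison. A tiny sketch, with a made-up function name and made-up dates:

```python
from datetime import datetime


def needs_recalculation(output_updated, dependency_updates):
    """True if any dependency changed after the output was last produced."""
    return any(dep > output_updated for dep in dependency_updates)


report_built = datetime(2024, 1, 10)
stale = needs_recalculation(
    report_built, [datetime(2024, 1, 5), datetime(2024, 1, 12)]
)
fresh = needs_recalculation(
    report_built, [datetime(2024, 1, 5), datetime(2024, 1, 9)]
)
```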
Also, many websites will have data marking "Last changed" on a page, particularly news sites that may edit content. The exact reason isn't necessary (and there likely exist backups in case an audit trail is really necessary), but this data needs to be visible to the end user.
These sorts of things are typically used for business applications where user action is required to initiate the update. Typically, there will be some kind of business app (eg a CRM desktop application) and for most updates there tends to be only one way of making the update.
If you're looking at address data, that was done through the "Maintain Address" screen, etc.
Such database auditing is there to augment business-level auditing, not to replace it. Call centres will sometimes (or always in the case of financial services providers in Australia, as one example) record phone calls. That's part of the audit trail too but doesn't tend to be part of the IT solution as far as the desktop application (and related infrastructure) goes, although that is by no means a hard and fast rule.
Call centre staff will also typically have some sort of "Notes" or "Log" functionality where they can type freeform text as to why the customer called and what action was taken so the next operator can pick up where they left off when the customer rings back.
Triggers will often be used to record exactly what was changed (eg writing the old record to an audit table). The purpose of all this is that with all the information (the notes, recorded call, database audit trail and logs) the previous state of the data can be reconstructed as can the resulting action. This may be to find/resolve bugs in the system or simply as a conflict resolution process with the customer.
It is certainly popular: Rails, for example, has a shorthand for it, as well as for a creation timestamp (:timestamps).
At the application level it's very useful, as the same pattern is very common in views - look at the questions here for example (answered 56 secs ago, etc).
It can also be used retrospectively in reporting to generate stats (e.g. what is the growth curve of the number of records in the DB).
There are a couple of scenarios.
Let's say you have an address table for your customers.
You have your CRM app, and a customer calls to say that his address changed a month ago. With the LastUpdate column you can see that the row for this customer hasn't been touched in 4 months.
Usually you use triggers to populate a history table so that you can see all the other history. If you see that the creation date and the updated date are the same, there is no point in hitting the history table, since you won't find anything.
You calculate indexes (stock market): you can easily see whether an index was recalculated just by looking at this column.
There are 2 DB servers: by comparing the date column you can find out whether all the changes have been replicated or not, etc.
This is also very useful if you have to send delta feeds out to clients, that is, feeds in which only the records that have been changed or inserted since the date of the last feed are sent.
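A delta feed of that kind reduces to filtering on the last-updated column. A minimal sketch with invented records and an invented `delta_feed` helper:

```python
from datetime import datetime

records = [
    {"id": 1, "last_updated": datetime(2024, 1, 1)},
    {"id": 2, "last_updated": datetime(2024, 2, 15)},
    {"id": 3, "last_updated": datetime(2024, 3, 1)},
]


def delta_feed(records, last_feed_at):
    """Records inserted or changed after the previous feed was sent."""
    return [r for r in records if r["last_updated"] > last_feed_at]


changed_ids = [r["id"] for r in delta_feed(records, datetime(2024, 2, 1))]
```

In a database the same filter would be a `WHERE last_updated > :last_feed_at` clause, which is cheap when the column is indexed.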