How to implement a content driven workflow in jBPM/activiti/YAWL? - workflow

I need a workflow with human tasks, where each node can be revisited at any point of execution.
/A1 -- B1 \
/ \
Start - AND AND - End
\ /
\ A1 --- B2/
So even if the current execution is at B2, the user can go to A1 (assuming A1 is assigned to the same user and is already done).
How can I model this behavior in jBPM/Activiti - since the task once completed is deleted from the execution chain?
Is there any other workflow engine which allows me to do this?

I'm afraid this is difficult to achieve because BPMN is intentionally stateless.
Let's suppose you're at End and want to return to A1. It means that the process token somehow teleports to A1, flows to B1, to the second AND and... we're stuck here. As far as I understand, AND means a BPMN parallel gateway. And it does not allow us to go further because it needs two input tokens at once to produce an output token.
Probably you can adapt another approach, called a Finite State Machine. Imagine that your document (piece of content, token, et cetera) which flows through the workflow has a state attribute, which can be one of several values: Start, A1, etc. Besides this, you have a transition table of the following format:
from | to
----------
Start | AND
AND | A1
AND | A2
Thus, your system knows that AND follows Start. This table does not actually specify a classical finite state machine because it defines a non-functional relation; in other words, any state may be followed from two or more states as it is shown on your diagram. It's up to you how to implement this. Maybe you can create two copies of the instance, or use a combined state attribute which is a list of values.
But anyway, in this approach, 1) the system knows how to swith states automatically; 2) you can build a UI to switch them manually.
The choice of workflow/state machine libraries if you choose to use them depends on your platform, programming langauge, and requirements.

Related

Best practices for Informatica Webservice workflow

I have created a Informatica webservice workflow which takes 1 parameter as input. A Webservice provider source definition is used for this and mapping is a one-way type.
Workflow works fine when parameter is being passed. But when the same workflow is triggered from Informatica Power center directly (in which case no parameters are passed), mapping that contains webservice provider source definition takes 3 minutes to complete (Gives Timeout based commit point in the log).
Is it a good practice to run the webservice workflow from power center directly? And is there a way to improve its performance when triggered from power center directly?
Note: I am trying to use 1 workflow for both - 1) Pass the parameter from web 2) Schedule the workflow in Informatica
Answers to your questions below.
Is it a good practice to run the webservice workflow from power center directly?
Of course it depends on requirement - whether you need to extract data automatically from WS or not. If you pass parameter using some session then i dont see much issue here and your session is completing within time.
So, you can create a new session/command task/shell script to create a param file and then use it in original session so it is passed on to WS.
In a complex scenario, you may have to pass multiple values, in such case, i would recommend to use a parent workflow to call original workflow multiple times and change param every time before call.
Is there a way to improve its performance when triggered from power center directly?
It is really depends on few factors.
The web service - Make sure you are using correct input and output columns. Most of the time WS are sensitive to outside call and you need to choose optimized column to extract data for better performance. You can work with WS admin to know correct column.
If informatica flow is complex then depending on bottle neck transformation/s (source, target, expression, lookup, aggregator, sorter), we can check and take actions.
For lookup, you can add new filter to exclude unwanted data, remove unwanted columns etc.
For aggregator, you can use sorter before to improve perf.
... like this

DDD: How to solve this using Domain-Driven design?

I'm new to DDD and cutting my teeth on the following exercise. The use case is real, but my attempt to solve it with DDD is purely for learning.
We have multiple Git repos, each containing a file that we call
product spec. The system needs to respond to a HTTP POST by cloning all
the repos, and then update the product spec in those that match some
information in the POST body. System also needs to log the POST request as the cause for updating the product spec.
I'd like to use Aggregates and event sourcing for solving this problem because they seem like a good fit. Event sourcing comes with automatic persistence of the commands, so if I convert the POST body to a command, I get auditing for free.
Problem is, the POST may match multiple product spec. I'm not sure how to deal with that. Should I create a domain service, let it find all the matching product spec and then issue an update command to each? Or should I have the aggregate root do so? If using aggregate root to update multiple entities, it itself needs to be an entity, so what would it be in my problem domain?
The first comment to your question is right (the one of #VoiceOfUnreason): this 'is mostly side effect coordination'.
But I will try to answer your question: How to solve this using DDD / Event Sourcing:
The first aggregate root could just be named: 'MultipleRepoOperations'. This aggregate root has only one stream of events.
The command that fires the whole process could be: 'CloneAndUpdateProdSpecRepos' which carries a list of all the repos to be cloned and updated.
When the aggregate root processes the command it will simply spit a bunch of events of type 'UserRequestedToCloneAndUpdateProdSpec'
The second bounded context manages all the repos, and it its subscribed to all the events from 'MultipleRepoOperations' and will receive each event emitted by it. This bounded context aggregate root can be called: 'GitRepoManagement', and has a stream per repo. Eg: GitRepoManagement-Repo1, GitRepoManagement-Repo215, GitRepoManagement-20158, etc.
'GitRepoManagement' receives each event of type 'UserRequestedToCloneAndUpdateProdSpec', replays its corresponding repo stream in order to rehydrate the current state, and then tries to clone and update the product spec for the repo. When fails emits a failed event or a suceed if appropiate.
for learning purposes try to choose problem domain that has more complex rules and logic, where many actions is needed. for example small game (card game,multiplayer quiz game or whatever). or simulate some real world process like school management or some business process.

CQRS and Passing Data

Suppose I have an aggregate containing some data and when it reaches a certain state, I'd like to take all that state and pass it to some outside service. For argument and simplicity's sake, lets just say it is an aggregate that has a list and when all items in that list are checked off, I'd like to send the entire state to some outside service. Now when I'm handling the command for checking off the last item in the list, I'll know that I'm at the end but it doesn't seem correct to send it to the outside system from the processing of the command. So given this scenario what is the recommended approach if the outside system requires all of the state of the aggregate. Should the outside system build its own copy of the data based on the aggregate events or is there some better approach?
Should the outside system build its own copy of the data based on the aggregate events.
Probably not -- it's almost never a good idea to share the responsibility of rehydrating an aggregate from its history. The service that owns the object should be responsible for rehydration.
First key idea to understand is when in the flow the call to the outside service should happen.
First, the domain model processes the command arguments, computing the update to the event history, including the ChecklistCompleted event.
The application takes that history, and saves it to the book of record
The transaction completes successfully.
At this point, the application knows that the operation was successful, but the caller doesn't. So the usual answer is to be thinking of an asynchronous operation that will do the rest of the work.
Possibility one: the application takes the history that it just saved, and uses that history to create schedule a task to rehydrate a read-only copy of the aggregate state, and then send that state to the external service.
Possibility two: you ditch the copy of the history that you have now, and fire off an asynchronous task that has enough information to load its own copy of the history from the book of record.
There are at least three ways that you might do this. First, you could have the command schedule the task as before.
Second, you could have a event handler listening for ChecklistCompleted events in the book of record, and have that handler schedule the task.
Third, you could read the ChecklistCompleted event from the book of record, and publish a representation of that event to a shared bus, and let the handler in the external service call you back for a copy of the state.
I was under the impression that one bounded context should not reach out to get state from another bounded context but rather keep local copies of the data it needed.
From my experience, the key idea is that the services shouldn't block each other -- or more specifically, a call to service B should not block when service A is unavailable. Responding to events is fundamentally non blocking; does it really matter that we respond to an asynchronously delivered event by making an asynchronous blocking call?
What this buys you, however, is independent evolution of the two services - A broadcasts an event, B reacts to the event by calling A and asking for a representation of the aggregate that B understands, A -- being backwards compatible -- delivers the requested representation.
Compare this with requiring a new release of B every time the rehydration logic in A changes.
Udi Dahan raised a challenging idea - the notion that each piece of data belongs to a singe technical authority. "Raw business data" should not be replicated between services.
A service is the technical authority for a specific business capability.
Any piece of data or rule must be owned by only one service.
So in Udi's approach, you'd start to investigate why B has any responsibility for data owned by A, and from there determine how to align that responsibility and the data into a single service. (Part of the trick: the physical view of a service can span process boundaries; in other words, a process may be composed from components that belong to more than one service).
Jeppe Cramon series on microservices is nicely sourced, and touches on many of the points above.
You should never externalise your state. Reporting on that state is a function of the read side, as it produces reports and you'll need that data to call the service. The structure of your state is plastic, and you shouldn't have an external service that relies up that structure otherwise you'll have to update both in lockstep which is a bad thing.
There is a blog that puts forward a strong argument that the process manager is the correct place to put this type of feature (calling an external service), because that's the appropriate place for orchestrating events.

building audit trail functionality

Following is a use case in a workflow system
Work order enters into a system. Work order will have a target which goes through different workflow states before completing a work order.
Say Work order for a target Vehicle came into a system - workflow for this work oder involves 2 tasks say
a)wash vehicle
b)inspect vehicle
Say wash vehicle workflow task changes vehicle attribute from "not washed" to "washed". And say "inspect vehicle" workflow task changes vehicle attribute "not inspected" to "inspection done"
If user is pulling work order data user will always see latest vehicle data (in this example assuming both workflow tasks are completed user will see value "washed" and "inspection done". However when user pulls ONLY workflow Task Wash Vehicle data -> user will see "washed" -Though second task was done, workflow Task 1 will only see that that it modified. Getting data for Workflow Task 2 will see both "washed" and "inspection done"
This involves milstoning (audit trail) of data; One approach is as shown below image - where when workflow task modifies data it'll update version number, modified_ts and maintain that version number in it's own data row (via a JOIN table as depicted below). Basically this is nothing but maintaining a reference to a history record for workflow task data so when pulling workflow task data it knows which history record to pull back. please ignore parent_id and other notes, noise in a below picture. it's not relevant for this question.
I am thinking event sourcing will also be another alternative design - however don't want to apply event sourcing(or any other similar solution) as a whole sale solution but only for this particular use case (affecting only 3 or so tables where audit trail matters). I am trying to evaluate if CQRS/Event sourcing is a right fit as a partial solution (again only limited to 3-4 tables which need to preserve history/audit trail data) or ES/CQRS will be an overkill? any other thoughts?
P.S. Though this isn't related to Scala - Scala is a platform we are using hence tagging it to see if there are language specific solutions that can help. tagging Akka for finding out if ES/CQRS via Akka persistence is an option or not. Postgresql is a db - And DB triggers is not a solution I am looking for.

jBPM5 Task Instances Sequence for Display

Please assume the following scenario,
In a process flow there are three humanTask1, humanTask2, humanTask3; All three are assigned to same userA;
Now, assume there are two process instances (p1, p2) live. Each process instance can be in different tasklevel.
That is, for p1, task status are as, humanTask3-inprogress and for p2, humanTask1-inprogress
To display the tasks for this userA in a web page, I want them to be ordered as they appear in workflow design like,
p2-humanTask1
p1-humanTask3
taskService.getTasksOwned() may not return the tasks list in this order.
How do I ensure the tasks are displayed in this sequence?
I am using jBPM 5.3; LocalTaskService;
AFAIK, there is no out-of-the-box solution for this. You will have to decide your own order and then apply it whether by ordering the result of taskService.getTasksOwned() or by creating and executing a customized version of the query that taskService.getTasksOwned() is using.
Hope it helps,