How is the state of a BPMN process defined?

Assume a BPMN process describing activities, gateways, and start and end events.
Each step is managed by a BPMN engine. At any given point, how can we tell what state the process is in? Activities seem to define some state embodied as actions (e.g. "evaluating request"). Am I correct?
Also, if we assume that an activity represents the state, how do we get a list of the next possible states if we want to navigate through them in a dedicated follow-up application?
Should the process be modeled in a more workflow-oriented way to express those state/action possibilities? I have the intuition that events could also be used to manage states and their possible related actions.

Since I am not sure exactly what you mean by the state of the process, I will try to define that first. I assume you are aware of the token concept; see this discussion in the Camunda forum:
A token is a BPMN concept that represents a state within a process instance. It does not have any variables or any message.
You can then define the state of the process as a statistic: how many tokens exist at a given time, and how many are currently in each activity or event.
These statistics can be extracted from your favorite BPMN engine (and are shown, e.g., in Camunda's Cockpit as small colored bubbles). With these statistics in hand, you could in principle forecast the next possible states, i.e. estimate how many tokens are likely to be in each activity at the next point in time.
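As a minimal sketch of this token-count view, assume the engine can report which node each active token currently sits on; the node ids and the `outgoing` graph below are hypothetical, not any engine's API. The state is then a count per node, and the candidate next states are the successors of every occupied node:

```python
from collections import Counter

# Hypothetical process graph: node id -> node ids reachable via one sequence flow.
outgoing = {
    "start": ["evaluate_request"],
    "evaluate_request": ["approved_gateway"],
    "approved_gateway": ["grant", "reject"],
    "grant": ["end"],
    "reject": ["end"],
    "end": [],
}

def token_statistics(token_positions):
    """Count how many tokens currently sit in each activity or event."""
    return Counter(token_positions)

def next_possible_nodes(stats, graph):
    """Collect the nodes that the current tokens could move to next."""
    candidates = set()
    for node, count in stats.items():
        if count > 0:
            candidates.update(graph[node])
    return candidates

# Hypothetical snapshot: three running instances, two waiting in the same task.
positions = ["evaluate_request", "evaluate_request", "grant"]
stats = token_statistics(positions)
print(dict(stats))                                   # {'evaluate_request': 2, 'grant': 1}
print(sorted(next_possible_nodes(stats, outgoing)))  # ['approved_gateway', 'end']
```

A forecast would additionally weight each outgoing flow (e.g. by historical gateway decisions); the sketch only lists which nodes are reachable in one step.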

State can have different meanings in BPMN. It could refer to:
1 - Where the token is in the flow.
2 - Whether the process flow is running correctly or not.
3 - A specific variable (field) in the forms.
If you mean the third case, which is common in practice, you have to define a field in your data model as an enum (how you do this depends on the engine) and change its value manually or automatically in the forms.
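A minimal sketch of the third option, assuming a Python data model; the `RequestStatus` values, the `RequestForm` class and the `on_review_task_opened` hook are hypothetical, and a real engine would declare the enum in its own form or data-model format:

```python
from dataclasses import dataclass
from enum import Enum

class RequestStatus(Enum):
    """Hypothetical business status stored as an enum field in the data model."""
    SUBMITTED = "submitted"
    UNDER_REVIEW = "under_review"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class RequestForm:
    """Form data carried by a process instance; `status` is the state field."""
    requester: str
    status: RequestStatus = RequestStatus.SUBMITTED

def on_review_task_opened(form: RequestForm) -> None:
    # Automatic update, e.g. triggered when a reviewer opens the task.
    form.status = RequestStatus.UNDER_REVIEW

form = RequestForm(requester="alice")
on_review_task_opened(form)
form.status = RequestStatus.APPROVED   # manual update made by the reviewer in the form
print(form.status.value)               # approved
```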

Obviously, the rather abstract Petri-net-style token flow semantics of BPMN does not capture the real semantics of business processes. It has simply been imposed on BPMN artificially, under pressure from academic groups. A really meaningful semantics must refer to the information context of a process in the business system that owns it.
Of course, a business system that owns a process (type) is, at any point during a running process, in a complex dynamic information state, part of which forms the context of the process and can therefore be considered its state.
In fact, the (information) state of a process is essentially given by all the property-value slots of objects that are used or affected by (events/activities of) the process. In addition to these "global variables", the state of a process also includes
the values of (auxiliary) process variables,
the information about which activities have been started (and are still ongoing).
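To make this definition concrete, here is a sketch that simply collects the three ingredients in one record; the class name `ProcessState` and all field contents are illustrative assumptions, not part of any BPMN engine:

```python
from dataclasses import dataclass, field

@dataclass
class ProcessState:
    # Property-value slots of business objects used or affected by the process
    # ("global variables"), keyed as (object id, property name).
    object_slots: dict = field(default_factory=dict)
    # Values of auxiliary process variables.
    process_variables: dict = field(default_factory=dict)
    # Activities that have been started and are still ongoing.
    ongoing_activities: set = field(default_factory=set)

    def start(self, activity):
        self.ongoing_activities.add(activity)

    def complete(self, activity, slot_updates):
        # Completing an activity may change slots of the affected business objects.
        self.ongoing_activities.discard(activity)
        self.object_slots.update(slot_updates)

state = ProcessState(process_variables={"retry_count": 0})
state.start("evaluate_request")
state.complete("evaluate_request", {("order-42", "approved"): True})
print(state.ongoing_activities)   # set()
print(state.object_slots)         # {('order-42', 'approved'): True}
```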

Take a look at the Imixs-Workflow project. It is an event-oriented workflow engine, in contrast to the task-oriented design often seen in BPM engines.
Each task in this kind of workflow engine defines a state in your process model. The workflow engine holds this state until an event is fired. An event defines the transition from one state to another.
You can find examples of how to model different scenarios in an event-driven workflow model here.
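The same idea as a sketch, assuming a model where each task is a state and each event is a transition; the task and event names are invented for illustration, and this is not the Imixs-Workflow API. A side benefit is that the events leaving the current task directly answer the question of which next states are possible:

```python
# Hypothetical event-driven model: (current task, event) -> next task.
MODEL = {
    ("Evaluate request", "approve"): "Approved",
    ("Evaluate request", "reject"): "Rejected",
    ("Evaluate request", "ask_for_info"): "Waiting for info",
    ("Waiting for info", "info_received"): "Evaluate request",
}

class WorkItem:
    """Holds its task (state) until an event is fired."""

    def __init__(self, task):
        self.task = task

    def available_events(self):
        # Events that may be fired from the current task, i.e. the possible next states.
        return sorted(event for (task, event) in MODEL if task == self.task)

    def fire(self, event):
        key = (self.task, event)
        if key not in MODEL:
            raise ValueError(f"event {event!r} is not allowed in task {self.task!r}")
        self.task = MODEL[key]

item = WorkItem("Evaluate request")
print(item.available_events())  # ['approve', 'ask_for_info', 'reject']
item.fire("ask_for_info")
print(item.task)                # Waiting for info
```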

Related

How to store sagas’ data?

From what I have read, aggregates should only contain properties that are used to protect their invariants.
I have also read that sagas can be aggregates, which makes sense to me.
I have modeled a registration process using a saga: on a RegistrationStarted event it sends a ReserveEmail command, which triggers either an EmailReserved or an EmailReservationFailed event depending on whether the email address is free. A listener then sends either a validation link or a message saying that an account already exists.
I would like to use data from the RegistrationStarted event in this listener (say, the IP address and user agent). How should I do that?
Storing these data in the saga? But they’re not used to protect invariants.
Pushing them through ReserveEmail command and the resulting event? Sounds tedious.
Project the saga to the read model? What about eventual consistency?
Another way?
Rinat Abdullin wrote a good overview of sagas / process managers.
The usual answer is that the saga has copies of the events that it cares about, and uses the information in those events to compute the command messages to send.
List[Command] processManager(List[Event] events)
Pushing them through ReserveEmail command and the resulting event?
Yes, that's the usual approach; we get a list [RegistrationStarted], and we use that to calculate the result [ReserveEmail]. Later on, we'll get [RegistrationStarted, EmailReserved], and we can use that to compute the next set of commands (if any).
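A sketch of the `processManager` signature above in Python, assuming simple dataclass messages; the field names (`email`, `ip`, `user_agent`) and the `SendValidationLink`/`SendAccountExistsMessage` commands are hypothetical. The function is pure: each time, it recomputes the commands from the full event history it has cached:

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class RegistrationStarted:
    registration_id: str
    email: str
    ip: str
    user_agent: str

@dataclass(frozen=True)
class EmailReserved:
    registration_id: str

@dataclass(frozen=True)
class EmailReservationFailed:
    registration_id: str

@dataclass(frozen=True)
class ReserveEmail:
    registration_id: str
    email: str

@dataclass(frozen=True)
class SendValidationLink:
    email: str
    ip: str
    user_agent: str

@dataclass(frozen=True)
class SendAccountExistsMessage:
    email: str
    ip: str
    user_agent: str

Event = Union[RegistrationStarted, EmailReserved, EmailReservationFailed]
Command = Union[ReserveEmail, SendValidationLink, SendAccountExistsMessage]

def process_manager(events: List[Event]) -> List[Command]:
    """Compute the commands to send, given the events seen so far.

    Assumes the history contains a RegistrationStarted event.
    """
    started = next(e for e in events if isinstance(e, RegistrationStarted))
    last = events[-1]
    if last is started:
        return [ReserveEmail(started.registration_id, started.email)]
    if isinstance(last, EmailReserved):
        # IP and user agent come from the cached RegistrationStarted event.
        return [SendValidationLink(started.email, started.ip, started.user_agent)]
    if isinstance(last, EmailReservationFailed):
        return [SendAccountExistsMessage(started.email, started.ip, started.user_agent)]
    return []

started = RegistrationStarted("r1", "a@example.com", "203.0.113.7", "Mozilla/5.0")
print(process_manager([started]))
# [ReserveEmail(registration_id='r1', email='a@example.com')]
print(process_manager([started, EmailReserved("r1")]))
# [SendValidationLink(email='a@example.com', ip='203.0.113.7', user_agent='Mozilla/5.0')]
```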
Sounds tedious.
The data has to travel between the two capabilities somehow. So you are either copying the data from one message to another, or you are copying a correlation identifier from one message to another and then allowing the consumer to decide how to use the correlation identifier to fetch a copy of the data.
Storing these data in the saga? But they’re not used to protect invariants.
You are typically going to be storing events in the sagas (to keep track of what has happened). That gives you a copy of the data provided in the event. You don't have an invariant to protect because you are just caching a copy of a decision made somewhere else. You won't usually have the process manager running queries to collect additional data.
What about eventual consistency?
By their nature, sagas are always going to be "eventually consistent"; the "state" of an instance of a saga is just cached copies of data controlled elsewhere. The data is probably nanoseconds old by the time the saga sees it, there's no point in pretending that the data is "now".
If I understand correctly, I could model my saga as a Registration aggregate that stores all the events whose correlation identifier is its own identifier?
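A sketch of that idea, assuming an in-memory store that keeps one event stream per correlation id; `InMemoryEventStore`, `Registration.load` and `client_info` are hypothetical names, and events are plain dicts for brevity. Loading the saga means reading back exactly the events correlated with its id, which also gives the listener access to the IP address and user agent:

```python
from collections import defaultdict

class InMemoryEventStore:
    """Hypothetical store: appends events and reads them back by correlation id."""

    def __init__(self):
        self._streams = defaultdict(list)

    def append(self, event):
        self._streams[event["correlation_id"]].append(event)

    def load(self, correlation_id):
        return list(self._streams[correlation_id])

class Registration:
    """Saga modeled as an aggregate whose stream id is the registration id."""

    def __init__(self, registration_id, history):
        self.registration_id = registration_id
        self.history = history  # cached copies of decisions made elsewhere

    @classmethod
    def load(cls, store, registration_id):
        return cls(registration_id, store.load(registration_id))

    def client_info(self):
        # The listener reads IP and user agent from the cached start event.
        started = next(e for e in self.history if e["type"] == "RegistrationStarted")
        return started["ip"], started["user_agent"]

store = InMemoryEventStore()
store.append({"correlation_id": "r1", "type": "RegistrationStarted",
              "ip": "203.0.113.7", "user_agent": "Mozilla/5.0"})
store.append({"correlation_id": "r2", "type": "RegistrationStarted",
              "ip": "198.51.100.9", "user_agent": "curl/8.0"})
store.append({"correlation_id": "r1", "type": "EmailReserved"})

reg = Registration.load(store, "r1")
print(len(reg.history), reg.client_info())  # 2 ('203.0.113.7', 'Mozilla/5.0')
```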
Udi Dahan, writing about CQRS:
Here’s the strongest indication I can give you to know that you’re doing CQRS correctly: Your aggregate roots are sagas.

Multi-instance and Loop in BPMN

I am trying to model a behaviour where a couple of activities in different swimlanes are supposed to be processed in a loop. BPMN uses tokens to illustrate the flow and the paths taken. I wonder how tokens work in the case of loops. Does every iteration of an activity create a token, which then travels through the connected activities?
For example, let's say Activity1 is performed in a loop 10 times. Will that create 10 tokens, each of which travels through the remaining activities of the process? Such behaviour would be undesirable; however, if I am not mistaken, multi-instance activities work that way.
The only solution I can think of that would comply with the BPMN specification would be to create a Call Activity for the whole block of activities and then run the Call Activity in a loop.
Can anyone clarify for me the use of loops and multi-instances in BPMN from the view of tokens?
Thank you in advance!
Based on my reading of the specification (https://www.omg.org/spec/BPMN/2.0/PDF), the answer from @qwerty_so does not seem to conform to the standard, although this is partly because the question itself is imprecise, or at least underspecified.
A token (see the glossary) is simply an imaginary object that represents the flow unit in the process diagram. There are at least three different types of loops specified in the standard, and they have different implications for the flow unit.
Sections 13.2.6 and 13.2.7 describe the Loop Activity and the Multiple Instance Activity respectively. While the latter, on its face, might not seem like a loop, the standard defines attributes of the activity that suggest otherwise, including MultiInstanceLoopCharacteristics and loopCardinality (an Expression).
In the former case, the operational semantics seem to suggest a single flow unit that repeats multiple times according to some policy, possibly without bound.
In the latter case, the activity has "multiple instances spawned," including a parallel variant.
The fact that multiple instances can proceed in parallel suggests, on its face, that the system must at least allow for spawning multiple tokens (or conceptually splitting the original token) so that multiple threads can proceed simultaneously along different paths.
That said, the Loop Activity (13.2.6) appears to support the OP's desired semantics.
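A small simulation of the Loop Activity reading, assuming loop semantics modeled on the `loopCondition`, `testBefore` and `loopMaximum` attributes of BPMN's `StandardLoopCharacteristics`; the function and parameter names are illustrative. A single token re-enters the activity on each iteration, and only that one token continues afterwards:

```python
from typing import Callable, Optional

def run_loop_activity(body: Callable[[int], None],
                      loop_condition: Callable[[int], bool],
                      test_before: bool = False,
                      loop_maximum: Optional[int] = None) -> int:
    """Run one token through a looping activity; return the number of iterations.

    The same token repeats the activity while loop_condition(iterations) holds.
    With test_before=False the condition is checked after each iteration
    (do-while); with test_before=True it is checked before each one (while).
    """
    iterations = 0
    while True:
        if loop_maximum is not None and iterations >= loop_maximum:
            break
        if test_before and not loop_condition(iterations):
            break
        body(iterations)
        iterations += 1
        if not test_before and not loop_condition(iterations):
            break
    return iterations  # one token continues along the outgoing sequence flow

log = []
n = run_loop_activity(log.append, lambda i: i < 10)  # "Activity1" 10 times
print(n, log)  # 10 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```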

How to represent part of BPMN workflow that is automated by system?

I am documenting a user workflow where part of the flow is automated by a system (e.g. if the order quantity is less than 10, the order is approved immediately rather than being sent to a staff member for review).
I have swim lanes that go from person to person, but I am not sure where to fit this system task/decision path. What's the best practice? Possibly a dumb idea, but I'm inclined to create a new swim lane and call it "System".
Any thoughts?
Moving system tasks into a separate lane is quite possible, since the BPMN 2.0 specification does not explicitly define the meaning of lanes; it says:
Lanes are used to organize and categorize Activities within a Pool. The meaning of the Lanes is up to the modeler. BPMN does not specify the usage of Lanes. Lanes are often used for such things as internal roles (e.g., Manager, Associate), systems (e.g., an enterprise application), an internal department (e.g., shipping, finance), etc.
So you are free to fill them with whatever you want. However, your case is straightforward and doesn't require such a separation at all. According to your description, this is a typical conditional activity, which can be expressed via a Service Task or a Sub-Process. These are two different approaches, and they have different semantics.
According to the BPMN specification, a Service Task is a Task that uses some sort of service, which could be a Web service or an automated application. That is, it is usually used when the modeller does not want to decompose some process and intends to delegate it to an external tool or agent.
A Sub-Process is a different cup of tea: it is typically used when you want to wrap a complex piece of workflow for reuse, or when that piece of workflow can be decomposed into sub-elements.
In your use case, a Sub-Process is the better choice: it is highly adjustable, transparent, and maintainable. For example, inside the Sub-Process you can use a Business Rules Engine for your condition parameter (Order Quantity) and adjust its value on the fly.
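A sketch of the decision inside such a Sub-Process, assuming the threshold is held as a rule parameter that can be changed at run time rather than hard-coded in the diagram; `ApprovalRule` and the route names are hypothetical, and a real Business Rules Engine would express this in its own rule language:

```python
from dataclasses import dataclass

@dataclass
class ApprovalRule:
    """Hypothetical business rule: small orders are approved automatically."""
    threshold: int = 10  # orders below this quantity skip staff review

    def route(self, order_quantity: int) -> str:
        if order_quantity < self.threshold:
            return "auto_approve"   # system path, no human involved
        return "staff_review"       # hand the order to a staff member

rule = ApprovalRule()
print(rule.route(5))    # auto_approve
print(rule.route(12))   # staff_review
rule.threshold = 20     # adjusted on the fly, without changing the process model
print(rule.route(12))   # auto_approve
```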
You can learn about the differences between these approaches in more detail from this blog.
One technique is to express system tasks/decisions via a dedicated participant or lane, so that all system tasks are collocated in a system lane.
System tasks (service tasks in BPMN) are usually done on behalf of an actor, so in my opinion it is useful to position them in the lane for that actor.
Such a design also usually helps keep the diagram easy to read, by limiting the number of transitions between the "user" lanes and the "system" lane.

Using aggregates and Domain events with nosql storage

I'm currently exploring DDD and NoSQL. I have a question: I need to produce events from the aggregate, and I would like to use NoSQL storage. But without transactions, how can I be sure that both the events and the changes to the aggregate root are saved to the storage?
Does this make sense? Is there a way to do this without being forced to use event sourcing or a transactional DB?
I was actually looking at implementing a two-phase commit algorithm, but it seems pretty heavy from a performance point of view...
Am I approaching the problem the wrong way?
Stuffed with questions...
Thanks for every suggestion
Enrico
PS
I'm a newbie on Stack Overflow, so any suggestion/criticism/... is more than welcome
Enrico
Edit 1
Well, I need events to notify aggregates that something happened, so that they can react to the change. The problem arises when such events are important for the business logic. As far as I understand, after a night of thinking, I can't use NoSQL storage for such things. Let me explain (thinking out loud :P):
With ES (1st scenario): I save the "diff" of the data. Then I produce an event associated with it. 2 operations.
With ES (2nd scenario): I save the "diff" of the data. A process watches the ES and produces the event. But I'm tied to having only one watcher process to ensure the correct ordering of events.
With ES (3rd scenario): Idempotent events. The events can be inferred from the state, and every reapplication of an event changes the consumer only once, so there can be multiple "dequeue" processes and duplicates can't possibly happen. 1 operation, but it introduces heavy limitations on the consumers.
In general: I save the aggregate's data. Then I produce an event associated with it. 2 operations.
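A minimal sketch of the idempotent consumer from the 3rd scenario, assuming every event carries a unique id; the `event_id` field and the `BalanceProjection` consumer are hypothetical. Redelivering the same event, whether from a retry or from a second dequeue process, changes the consumer only once:

```python
class BalanceProjection:
    """Hypothetical consumer that applies each event at most once."""

    def __init__(self):
        self.balance = 0
        # In a real system this set must be stored atomically with `balance`.
        self._applied_ids = set()

    def apply(self, event: dict) -> bool:
        if event["event_id"] in self._applied_ids:
            return False  # duplicate delivery: ignore
        self.balance += event["amount"]
        self._applied_ids.add(event["event_id"])
        return True

projection = BalanceProjection()
deposit = {"event_id": "e-1", "amount": 50}
projection.apply(deposit)
projection.apply(deposit)   # redelivered
print(projection.balance)   # 50
```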
Now the question becomes broader, IMHO: is it possible to work with domain events and NoSQL when such domain events are a fundamental part of the business process?
I think going relational could be a better option... even if I would need to add quite a lot of machines to get the same performance.
Edit 2
For the sake of completeness, searching for "domain events nosql idempotent" on Google turns up: http://svendvanderveken.wordpress.com/2011/08/26/transactional-event-based-nosql-storage/
If you need Event Sourcing, you should store events only.
This should be the sequence:
the aggregate root receives a command
it fires proper events
events are stored
Each aggregate should be rehydrated only by replaying its events. You can create snapshots of aggregates if you measure performance problems on their initialization, but this doesn't require two-phase commits, since you can build snapshots asynchronously via a batch job.
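The sequence above can be sketched as follows, assuming an in-memory append-only list stands in for the event store; `Account`, its events, `rehydrate` and the `(balance, version)` snapshot tuple are illustrative names, not a particular framework. State is never saved directly: it is rebuilt by replaying events, optionally starting from a snapshot built asynchronously:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

class Account:
    def __init__(self, balance: int = 0, version: int = 0):
        self.balance = balance
        self.version = version  # number of events applied so far

    def apply(self, event) -> None:
        # Rehydration and live changes go through the same method.
        if isinstance(event, Deposited):
            self.balance += event.amount
        elif isinstance(event, Withdrawn):
            self.balance -= event.amount
        self.version += 1

    def withdraw(self, amount: int) -> List[object]:
        # 1. The aggregate root receives a command ...
        if amount > self.balance:
            raise ValueError("insufficient funds")
        # 2. ... and fires the proper events (the caller stores them).
        events = [Withdrawn(amount)]
        for e in events:
            self.apply(e)
        return events

def rehydrate(store: List[object], snapshot: Tuple[int, int] = (0, 0)) -> Account:
    """Replay the events recorded after an optional (balance, version) snapshot."""
    balance, version = snapshot
    account = Account(balance, version)
    for event in store[version:]:
        account.apply(event)
    return account

store: List[object] = [Deposited(100)]
account = rehydrate(store)
store.extend(account.withdraw(30))                 # 3. events are stored (one append)
snapshot = (rehydrate(store).balance, len(store))  # e.g. built later by a batch job
print(rehydrate(store).balance, rehydrate(store, snapshot).balance)  # 70 70
```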
Note, however, that you need CQRS and/or Event Sourcing only if your application is heavily concurrent and you need to cope with partition tolerance and compensating actions.
edit
Event Sourcing is an alternative to persisting object state. You either store the events or the state of the object model. You can save snapshots, but they're just performance tools: your application must be able to work without them. You can consider such snapshots a caching technique. As an alternative, you can persist object state (the classical model), but in that case you don't need to store events.
In my own DDD application, I use observable entities to decouple aggregates from their persistence (the repository subscribes directly to their events). For example, your repository can subscribe to each domain event and execute the actions required by the application (persist to the store, dispatch to a queue, and so on). But as a persistence technique, Event Sourcing is an alternative to classical persistence of the observable object state. In most scenarios you don't need both.
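A sketch of the observable-entity approach, assuming a plain in-process observer list; `Order`, `OrderRepository`, `documents` and `outbox` are hypothetical names. The repository subscribes to the entity's domain events and decides what to do with them (here: persist the current state and queue the event for dispatch), so the aggregate itself knows nothing about persistence:

```python
from typing import Callable, Dict, List

class Order:
    """Observable entity: raises domain events but knows nothing about storage."""

    def __init__(self, order_id: str):
        self.order_id = order_id
        self.status = "new"
        self._subscribers: List[Callable[["Order", dict], None]] = []

    def subscribe(self, handler: Callable[["Order", dict], None]) -> None:
        self._subscribers.append(handler)

    def confirm(self) -> None:
        self.status = "confirmed"
        self._raise({"type": "OrderConfirmed", "order_id": self.order_id})

    def _raise(self, event: dict) -> None:
        for handler in self._subscribers:
            handler(self, event)

class OrderRepository:
    """Subscribes to each entity it hands out and reacts to its events."""

    def __init__(self):
        self.documents: Dict[str, dict] = {}  # stand-in for a NoSQL collection
        self.outbox: List[dict] = []          # stand-in for a message queue

    def add(self, order: Order) -> Order:
        order.subscribe(self._on_event)
        self.documents[order.order_id] = {"status": order.status}
        return order

    def _on_event(self, order: Order, event: dict) -> None:
        # Classical persistence of the observable state, plus dispatch.
        self.documents[order.order_id] = {"status": order.status}
        self.outbox.append(event)

repo = OrderRepository()
order = repo.add(Order("o-1"))
order.confirm()
print(repo.documents["o-1"], len(repo.outbox))  # {'status': 'confirmed'} 1
```

The two writes in `_on_event` are still separate operations, so this sketch decouples the aggregate from persistence but does not by itself make them atomic.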
edit 2
A final note: if you choose ES, one of the event subscribers can build a relational read model too.

CQRS sagas - did I understand them right?

I'm trying to understand sagas, and I have developed a specific way of thinking about them, but I am not sure whether I got the idea right. Hence I'd like to lay it out and have others tell me whether it's right or wrong.
In my understanding, sagas are a solution to the question of how to model long-running processes. Long-running means: Involving multiple commands, multiple events and possibly multiple aggregates. The process is not modeled inside one of the participating aggregates to avoid dependencies between them.
Basically, a saga is nothing more than a command/event handler that reacts to internal and external commands/events. It does not contain its own logic; it's just a (finite) state machine, and therefore provides tasks such as "When event X happens, send command Y".
Like aggregates, sagas are persisted to the event store; they are correlated to a specific aggregate instance, and hence are reloaded when this specific aggregate (or set of aggregates) is used.
Is this right?
There are different ways of implementing sagas, ranging from stateless event handlers that publish commands all the way to sagas that carry all the state and are basically the domain's aggregates themselves. Udi Dahan once wrote an article about sagas being the only aggregates in a correctly modeled system (in his specific case). I'll look it up and update this answer.
There's also the concept of document-based sagas.
Your definition of sagas sounds right to me, and I would define them the same way.
The only change I would make to your description is that a saga is only an event handler (not a command handler): based on the received event and its internal state, it constructs a command and sends it to the CommandBus for execution.
Normally, a saga has only a single event that starts it (StartByEvent), multiple events that transition it to the next state (TransitionByEvent), and multiple events that end it (EndByEvent).
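A sketch of that shape as a small finite state machine, assuming events and commands are plain strings and the command bus is a list; `OrderFulfillmentSaga`, its states and its message names are hypothetical. Only events are handled, and each handled event, together with the current internal state, yields a command:

```python
from typing import List, Optional

class OrderFulfillmentSaga:
    START_EVENT = "OrderPlaced"
    # (current state, event) -> (next state, command to send)
    TRANSITIONS = {
        ("started", "PaymentReceived"): ("paid", "ShipOrder"),
        ("started", "PaymentFailed"): ("ended", "CancelOrder"),
        ("paid", "OrderShipped"): ("ended", "CloseOrder"),
    }

    def __init__(self, command_bus: List[str]):
        self.state: Optional[str] = None  # None = not started yet
        self.command_bus = command_bus

    def handle(self, event: str) -> None:
        if self.state is None:
            if event == self.START_EVENT:  # StartByEvent
                self.state = "started"
                self.command_bus.append("RequestPayment")
            return
        key = (self.state, event)
        if key in self.TRANSITIONS:  # TransitionByEvent / EndByEvent
            self.state, command = self.TRANSITIONS[key]
            self.command_bus.append(command)
        # Any other event is ignored in the current state.

    @property
    def ended(self) -> bool:
        return self.state == "ended"

bus: List[str] = []
saga = OrderFulfillmentSaga(bus)
for e in ["OrderPlaced", "PaymentReceived", "OrderShipped"]:
    saga.handle(e)
print(bus, saga.ended)  # ['RequestPayment', 'ShipOrder', 'CloseOrder'] True
```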
On MSDN, sagas are referred to as process managers:
The term saga is commonly used in discussions of CQRS to refer to a piece of code that coordinates and routes messages between bounded contexts and aggregates. However, for the purposes of this guidance we prefer to use the term process manager to refer to this type of code artifact. There are two reasons for this: There is a well-known, pre-existing definition of the term saga that has a different meaning from the one generally understood in relation to CQRS. The term process manager is a better description of the role performed by this type of code artifact.
Although the term saga is often used in the context of the CQRS pattern, it has a pre-existing definition. We have chosen to use the term process manager in this guidance to avoid confusion with this pre-existing definition.
The term saga, in relation to distributed systems, was originally defined in the paper "Sagas" by Hector Garcia-Molina and Kenneth Salem. This paper proposes a mechanism that it calls a saga as an alternative to using a distributed transaction for managing a long-running business process. The paper recognizes that business processes are often comprised of multiple steps, each of which involves a transaction, and that overall consistency can be achieved by grouping these individual transactions into a distributed transaction. However, in long-running business processes, using distributed transactions can impact on the performance and concurrency of the system because of the locks that must be held for the duration of the distributed transaction.
reference: http://msdn.microsoft.com/en-us/library/jj591569.aspx