How to represent a part of a BPMN workflow that is automated by a system?

I am documenting a user workflow where part of the flow is automated by a system (e.g. if the order quantity is less than 10, the order is approved immediately rather than being sent to a staff member for review).
I have swim lanes that go from person to person, but I'm not sure where I can fit this system task/decision path. What's the best practice? Possibly a dumb idea, but I'm inclined to create a new swim lane and call it "system".
Any thoughts?

The approach of detaching the system task into a separate lane is quite possible, as the BPMN 2.0 specification does not explicitly define the meaning of lanes; it says something like this:
Lanes are used to organize and categorize Activities within a Pool.
The meaning of the Lanes is up to the modeler. BPMN does not specify
the usage of Lanes. Lanes are often used for such things as internal
roles (e.g., Manager, Associate), systems (e.g., an enterprise
application), an internal department (e.g., shipping, finance), etc.
So you are completely free to fill them with whatever you want. However, your case is quite evident and doesn't require such a separation at all. According to your description, we have a typical conditional activity, which can be expressed via a Service Task or a Sub-Process. These are two different approaches, and they carry different semantics.
According to the BPMN specification, a Service Task is a task that uses some sort of service, which could be a Web service or an automated application. That is, it is usually used when the modeller doesn't want to decompose some piece of the process and intends to outsource it to an external tool or agent.
A Sub-Process is another cup of tea: it is typically used when you want to wrap some complex piece of workflow for reuse, or when that piece of workflow can be decomposed into sub-elements.
In your use case a Sub-Process is the tool of choice. It is highly adjustable, transparent and maintainable. For example, inside the sub-process you can use a business rules engine for your condition parameter (order quantity) and flexibly adjust its value on the fly.
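If it helps to make this concrete, here is a minimal sketch of what the automated branch could look like behind such a task, assuming a Camunda-style Java delegate. The class name and the process variables orderQuantity and approved are made-up illustrations, and you could just as well move this decision into a business rules task instead:

    import org.camunda.bpm.engine.delegate.DelegateExecution;
    import org.camunda.bpm.engine.delegate.JavaDelegate;

    // Hypothetical delegate wired to the automated approval task in the BPMN model.
    public class AutoApproveSmallOrders implements JavaDelegate {

        @Override
        public void execute(DelegateExecution execution) {
            // "orderQuantity" is an assumed process variable set earlier in the flow.
            Integer quantity = (Integer) execution.getVariable("orderQuantity");

            // Orders below the threshold are approved without human review;
            // larger orders fall through to the manual review lane.
            boolean autoApproved = quantity != null && quantity < 10;
            execution.setVariable("approved", autoApproved);
        }
    }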
You can learn the difference between these approaches in greater detail from this blog.

There is a technique of expressing system tasks/decisions via a dedicated participant/lane. All system tasks are then collocated in a single system lane.

System tasks (service tasks in BPMN) are usually done on behalf of an actor, so in my opinion it is useful to position them in the lane for that actor.
Usually such a design also helps keep the diagram easy to read by limiting the number of transitions between the "user" lanes and the "system" lane.

Release resources when a resource is destroyed in AnyLogic

The model I am currently working on involves equipment units seizing resources (materials, operators) to perform tasks within the equipment. During the model run, the number of equipment units (also a resource) is varied using a slider. When the resource pool quantity is reduced, excess equipment units are destroyed. How do I release all the resources still seized by such an equipment unit after it is destroyed, so that those resources can continue to be used by the rest of the model?
As far as I know, when you reduce the number of resource units in a resource pool, they will only be destroyed once the agents currently seizing them have released them. So you don't need to worry about releasing the resources that are still seized by this equipment.
This looks like a design problem, but you don't really explain how your flows are linked (or when your equipment agents enter the top flow).
Your information suggests that your equipment agents go through the top flow (seizing materials and operators); your bottom flow is presumably triggered after an 'EQ run' to simulate operators doing something at the machine before unload can start/complete(?) by releasing the Hold in the top flow.
The problem is that you should never have resources (i.e., agents that are part of a resource pool) flowing through processes; agents are either process-flow agents or resource agents. Changing resource pool sizes will do nothing to affect those agents if they're in a process flow, because AnyLogic only cares about the tasks those resources are doing as resources.
In your case, it depends on what 'removing the machine' represents in real life (given the machine is potentially processing things at the time). That will determine what 'sub-flow' you need to enact in that case to free resources, etc. In your case this is likely to mean:
Have some trigger (e.g., event, schedule action, button action, some specific state elsewhere in your model) which means an equipment is now 'removed'.
This then needs code to know what block that equipment is currently in (agents have a currentBlock function for this purpose), remove it from that block (typically using the remove function provided by most blocks, but this is not consistent across all blocks) and then probably send it into a 'clean-up' process flow (using an Enter block, with a Sink block at the end to delete it) to do whatever needs doing (like releasing all resources).
Some of the code needed here is slightly complex (removing agents from within a process flow is not something 'cleanly' supported by AnyLogic except in the Pedestrian library) and will involve casting (since currentBlock returns the block as a generic FlowchartBlock supertype); see the sketch below.
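A rough sketch of that clean-up code, assuming the AnyLogic 8 Process Modeling Library. The agent type Equipment, the Enter block enterCleanup and the function itself are placeholder names, and the exact remove/cast calls may differ per block type:

    // Called from the 'removal' trigger (event, slider/button action, etc.).
    void removeEquipment(Equipment eq) {
        // Find out which process block the agent currently sits in.
        FlowchartBlock block = eq.currentBlock();

        // Most blocks offer remove(agent), but you must cast to the concrete block
        // type first, and not every block supports removal in the same way.
        if (block instanceof Queue) {
            ((Queue) block).remove(eq);
        } else if (block instanceof Delay) {
            ((Delay) block).remove(eq);
        }

        // Hand the agent to a dedicated clean-up flow:
        // Enter -> release/unseize everything it still holds -> Sink.
        enterCleanup.take(eq);
    }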
P.S. A more 'radical' design rework (but probably a better one for what you want) is to keep equipment units as resource agents but redesign your top flow so that the agent represents, say, a 'request for machine processing (of something)'. Your top flow will then seize/release equipment resources along with everything else, and you can use dynamic resource pool capacity changes as you were trying to. (It is quite common in process modelling to need to 'reframe' the processes via agents which represent some abstract 'request' rather than a physical thing.)

Chaos engineering best practice

I studied the Principles of Chaos and looked for some open-source projects, such as ChaosBlade, which is open-sourced by Alibaba, and Mangle, by VMware.
Both of these tools are fault-injection tools and do no analysis of the system under test.
According to the principles of chaos, we should
1. Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
2. Hypothesize that this steady state will continue in both the control group and the experimental group.
3. Introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
4. Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
So how do we do step 4? Should we use a monitoring system to watch some major metrics and check the status of the system after fault injection?
Are there any good suggestions or best practices?
So how do we do step 4? Should we use a monitoring system to watch some major metrics and check the status of the system after fault injection?
As always, the answer is: it depends... It depends on how you want to measure your hypothesis, on the hypothesis itself, and on the system. But normally it makes total sense to introduce metrics to improve observability.
If your hypothesis is something like "Our service can process 120 requests per second, even if one node fails", then yes, you could measure that via metrics, but you could also measure it via the requests you send and the responses you receive. It is up to you.
But if your hypothesis is "I get a response for a request that was sent before a node goes down", then it makes more sense to verify this directly with the requests and responses.
On our project we use, for example, chaostoolkit, which lets you specify the hypothesis in JSON or YAML along with the related actions to prove it.
So you can say: I have a steady state X, and if I do Y, then steady state X should still be valid. The toolkit is also able to verify metrics if you want.
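As a concrete illustration of step 4, here is a minimal Java sketch that checks a steady-state hypothesis ("error rate stays below 1%") by reading a metric from a monitoring endpoint before and during fault injection. The endpoint URL, metric name and threshold are assumptions; in practice you would query whatever your monitoring system (Prometheus, Datadog, ...) exposes:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class SteadyStateCheck {

        // Assumed endpoint that returns the current error rate as a plain number.
        private static final String METRIC_URL = "http://monitoring.example.com/metrics/error-rate";
        private static final double THRESHOLD = 0.01; // steady state: error rate < 1%

        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();

            double before = readErrorRate(client); // baseline measurement
            // ... trigger the fault injection here (chaosblade, mangle, ...) ...
            double during = readErrorRate(client); // measurement under injection

            boolean steadyStateHeld = before < THRESHOLD && during < THRESHOLD;
            System.out.printf("before=%.4f during=%.4f steady state held: %b%n",
                    before, during, steadyStateHeld);
        }

        private static double readErrorRate(HttpClient client) throws Exception {
            HttpRequest request = HttpRequest.newBuilder(URI.create(METRIC_URL)).GET().build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return Double.parseDouble(response.body().trim());
        }
    }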
The Principles of Chaos sit a bit above the actual testing: they reflect the philosophy of the designed vs. the actual system, and of the system under injection vs. a baseline, but they are a bit too abstract to apply in everyday testing. They are a way of reasoning, not a work-process methodology.
I think the "control group vs. experimental group" wording is an especially doubtful part: you stage a test (injection) in a controlled environment and try to catch whether there is a user-facing incident, an SLA breach of any kind, or a degradation. I do not see where the control group is if you test on a stand or dedicated environment.
We use a very linear variant of the chaos methodology, which is:
find failure points in the system (based on architecture, critical user scenarios and history of incidents)
design chaos test scenarios (may be a single attack or a more elaborate sequence)
run the tests, register the results and reuse the green (passing) ones for new releases
start tasks to fix red tests, and verify the solutions when they become available
One may say we are actually using the Principles of Chaos in steps 1 and 2, but we tend to think of chaos testing as a quite linear and simple process.
Mangle 3.0 was released with an option for analysis using a resiliency score. Detailed documentation is available at https://github.com/vmware/mangle/blob/master/docs/sre-developers-and-users/resiliency-score.md

How is the state of a BPMN process defined?

Assume a BPMN process describing activities, gateways, and start and end events, as follows:
Each step is managed by a BPMN engine. At any point, how can we tell what the state of the process is? Activities seem to define some state embodied as actions (e.g., evaluating a request). Am I correct?
Also, if we assume an activity represents the state, how do we get a list of the next possible states if we were to navigate through a dedicated follow-up application?
Should the process be modeled in a more workflow-oriented way to express those state/action possibilities? My intuition is that events could also be used to manage states and possible related actions.
Since I am not sure what exactly you mean by the state of the process, I will try to define that first. I guess you are aware of the token concept; see a discussion in the Camunda forum:
A token is a BPMN concept that represents a state within a process instance. It does not have any variables or any message.
You may now define the state of the process as a statistic of how many tokens exist at a given time, and how many are currently in a given activity or event.
These statistics can be extracted from your favorite BPMN engine (and seen, e.g., in Camunda's Cockpit as little colorful bubbles). With those statistics in hand, you could in principle generate a forecast of the next possible states, i.e., determine scenarios of how many tokens will probably be in each activity at the next time instant.
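For example (a sketch assuming Camunda's Java API; the process definition key and activity id are made up), you could count the tokens currently sitting in a given activity like this:

    import org.camunda.bpm.engine.ProcessEngine;
    import org.camunda.bpm.engine.RuntimeService;

    public class TokenStatistics {

        // Counts how many executions (tokens) currently sit in one activity.
        // "orderProcess" and "evaluateRequest" are placeholder ids of a hypothetical model.
        public long tokensInActivity(ProcessEngine engine) {
            RuntimeService runtimeService = engine.getRuntimeService();
            return runtimeService.createExecutionQuery()
                    .processDefinitionKey("orderProcess")
                    .activityId("evaluateRequest")
                    .active()
                    .count();
        }
    }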
State can have different meanings in BPMN; it could mean:
1 - Where is the token in the flow?
2 - Is the process flow running correctly or not?
3 - Or, by a specific variable (field) in the forms.
If you mean the third case, which is common in processes, you have to define a field in your data model as an enum (depending on the engine) and change its value manually or automatically in the forms.
Obviously, the rather abstract Petri-Net-style token flow semantics of BPMN does not capture the real semantics of business processes. It has just been artificially imposed on BPMN due to academic pressure groups. A really meaningful semantics must refer to the information context of a process in the business system that owns it.
Of course, a business system that is the owner of a process (type), is, at any point during a running process, in a certain complex dynamic information state, some part of which forms the context of the process and can therefore be considered its state.
In fact, the (information) state of a process is essentially given by all the property-value slots of objects that are used or affected by (events/activities of) the process. In addition to these "global variables", the state of a process also includes
the values of (auxiliary) process variables,
the information about which activities have been started (and are ongoing).
Take a look at the Imixs-Workflow project. It is an event-oriented workflow engine, in contrast to the task-oriented design often seen in BPM engines.
Each task in this kind of workflow engine defines a state in your process model. The workflow engine holds this state until an event is fired. An event defines the transition from one state to another.
You can find examples of how to model different scenarios in an event-driven workflow model here.
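To make the task-as-state / event-as-transition idea concrete, here is a tiny generic Java sketch (deliberately not the Imixs-Workflow API): the engine simply holds the current task until an event fires a transition to the next one. All task and event names are invented.

    import java.util.Map;

    public class EventDrivenWorkflow {

        // Tasks are the states the engine holds on to.
        enum Task { SUBMITTED, IN_REVIEW, APPROVED, REJECTED }

        // Transition table: (current task, event) -> next task.
        private static final Map<String, Task> TRANSITIONS = Map.of(
                "SUBMITTED:review",  Task.IN_REVIEW,
                "IN_REVIEW:approve", Task.APPROVED,
                "IN_REVIEW:reject",  Task.REJECTED);

        private Task current = Task.SUBMITTED;

        // The engine keeps the current task until an event defines the transition.
        public void fire(String event) {
            Task next = TRANSITIONS.get(current + ":" + event);
            if (next == null) {
                throw new IllegalStateException("Event '" + event + "' is not allowed in task " + current);
            }
            current = next;
        }

        public Task currentTask() {
            return current;
        }
    }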

SOA Service Boundary and production support

Our development group is working towards building up a service catalog.
Right now, we have two groups: one to sell a product, another to service that product.
We have one particular service that calculates if the price of the product is profitable. When a sale occurs, the sale can be overridden by a manager. This sale must also be represented in another system to track various sales and the numbers must match. The parameters of profitability also change, and are different from month to month, but a sale may be based on the previous set of parameters.
Right now the sale profitability service only calculates the profit; it also provides a RESTful URI.
One group of developers has suggested that the profitability service also support these "manager overrides" and support a date parameter to calculate based on a previous date. Of course, the sales group of developers disagrees. If the service won't support this, the servicing developers will have to do an ETL between the two systems for each product, instead of just using the profitability service. Right now, since we do not have a set of services to handle this, production support gets the request and then has to update the one or more systems associated with that given product.
So, if a service works for a narrow slice, but an exception-based business process breaks it, does that mean the boundaries of the service are incorrect and need to account for the change in business process?
Two, does adding a date parameter extend the boundary of the service too much, or should it be expected that if the service already has the parameters, it would also keep a history of those parameters? At the moment, we do not have a service that only stores the parameters, as no one has required it.
If there is any clarification needed before an answer can be given, please let me know.
I think the key here is: how much pain would be incurred by both teams if an ETL were introduced between the two?
Not that I think you're over-analysing this, but if I may, you probably have an opinion that adding a date parameter into the service contract is bad, but also dislike the idea of the ETL.
Well, strategy aside, I find these days my overriding instinct is to focus less on the technical ins and outs and more on the purely practical.
At the end of the day, ETL is simple, reliable, and relatively pain-free to implement; however, it comes with side effects. The main one is that you'll be coupling changes to your service's DB schema to an outside party, which will limit your options for evolving the service in the future.
On the other hand, allowing consumer demand to dictate service evolution is easy and low-friction, but it is also a rocky road, as you may become beholden to that consumer at the expense of others.
Another possibility is to allow the extra service parameters to be delivered to the consumer via a message, rather than across the same service. This would allow you to keep your service boundary intact and let the consumer hold the necessary parameters locally.
Apologies if this does not address your question directly.

CQRS sagas - did I understand them right?

I'm trying to understand sagas, and meanwhile I have a specific way of thinking of them - but I am not sure whether I got the idea right. Hence I'd like to elaborate and have others tell me whether it's right or wrong.
In my understanding, sagas are a solution to the question of how to model long-running processes. Long-running means: Involving multiple commands, multiple events and possibly multiple aggregates. The process is not modeled inside one of the participating aggregates to avoid dependencies between them.
Basically, a saga is nothing more than a command/event handler that reacts to internal and external commands/events. It does not contain its own domain logic; it's just a (finite) state machine, and therefore performs tasks such as: when event X happens, send command Y.
Sagas are persisted to the event store just like aggregates, are correlated to a specific aggregate instance, and hence are reloaded when this specific aggregate (or set of aggregates) is used.
Is this right?
There are different means of implementing sagas, ranging from stateless event handlers that publish commands all the way to carrying all the state and basically being the domain's aggregates themselves. Udi Dahan once wrote an article about sagas being the only aggregates in a (in his specific case) correctly modeled system. I'll look it up and update this answer.
There's also the concept of document-based sagas.
Your definition of sagas sounds right to me; I would define them the same way.
The only change I would make to your description is that a saga is only an event handler (not a command handler); based on the received event and its internal state, it constructs a command and sends it to the command bus for execution.
Normally a saga has only a single event it can be started by (StartByEvent), multiple events to transition (TransitionByEvent) to the next state, and multiple events it can be ended by (EndByEvent).
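A minimal, framework-agnostic Java sketch of that shape (the event/command types and the command bus are invented for illustration, not taken from a specific CQRS library): the saga is started by one event, keeps a little state, and only ever emits commands.

    // Hypothetical events, commands and bus; not tied to a particular CQRS framework.
    interface CommandBus { void send(Object command); }

    record OrderPlaced(String orderId) {}
    record PaymentReceived(String orderId) {}
    record ShipOrder(String orderId) {}

    class OrderFulfillmentSaga {

        enum State { STARTED, AWAITING_PAYMENT, COMPLETED }

        private final CommandBus commandBus;
        private State state = State.STARTED;

        OrderFulfillmentSaga(CommandBus commandBus) {
            this.commandBus = commandBus;
        }

        // StartByEvent: the single event that starts the saga.
        void on(OrderPlaced event) {
            state = State.AWAITING_PAYMENT;
        }

        // TransitionByEvent / EndByEvent: when payment arrives, send a command and finish.
        void on(PaymentReceived event) {
            if (state == State.AWAITING_PAYMENT) {
                commandBus.send(new ShipOrder(event.orderId())); // "when event X happens, send command Y"
                state = State.COMPLETED;
            }
        }
    }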
On MSDN, sagas are defined as process managers:
The term saga is commonly used in discussions of CQRS to refer to a
piece of code that coordinates and routes messages between bounded
contexts and aggregates. However, for the purposes of this guidance we
prefer to use the term process manager to refer to this type of code
artifact. There are two reasons for this: There is a well-known,
pre-existing definition of the term saga that has a different meaning
from the one generally understood in relation to CQRS. The term
process manager is a better description of the role performed by this
type of code artifact. Although the term saga is often used in the
context of the CQRS pattern, it has a pre-existing definition. We have
chosen to use the term process manager in this guidance to avoid
confusion with this pre-existing definition. The term saga, in
relation to distributed systems, was originally defined in the paper
"Sagas" by Hector Garcia-Molina and Kenneth Salem. This paper proposes
a mechanism that it calls a saga as an alternative to using a
distributed transaction for managing a long-running business process.
The paper recognizes that business processes are often comprised of
multiple steps, each of which involves a transaction, and that overall
consistency can be achieved by grouping these individual transactions
into a distributed transaction. However, in long-running business
processes, using distributed transactions can impact on the
performance and concurrency of the system because of the locks that
must be held for the duration of the distributed transaction.
reference: http://msdn.microsoft.com/en-us/library/jj591569.aspx