Kogito - wait until data from multiple endpoints is received - drools

I am using Kogito with Quarkus. I have set up one DRL rule and am using a BPMN configuration. As can be seen below, one endpoint is currently exposed, and it starts the process. All needed data is received in the initial request; it is then evaluated and the process goes on.
I would like to extend the workflow to have two separate endpoints. One to provide the age of the person and another to provide the name. The process must wait until all needed data is gathered before it proceeds with evaluation.
Has anybody come across a similar solution?

Technically, you could use a signal or message to add more data to a process instance before you execute the rules over the entire data set; see https://docs.kogito.kie.org/latest/html_single/#ref-bpmn-intermediate-events_kogito-developing-process-services.
To do that, you need some sort of correlation between these events; otherwise, how would you know that name event 1 should be matched with age event 1? If you can keep the process instance id, then the second event can either call a REST endpoint on that specific process instance or send it a message via a message broker.
You could also write your own custom logic to aggregate the events and only start a new process instance once your criteria for complete data are met. There are also plans in Kogito to extend how correlation is done, for instance by allowing process variables to be used as the identifier. For example, with person.id as the correlation key, name and age events carrying the same id would signal the same process instance. Hope this info helps.
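The "custom logic to aggregate the events" option could look roughly like the sketch below. It is plain Python rather than Kogito code, and every name in it (PersonDataAggregator, REQUIRED_FIELDS, the on_complete callback) is hypothetical; it only illustrates holding partial data per person id until both name and age have arrived.

```python
# Assumption: the rule needs exactly these two pieces of data.
REQUIRED_FIELDS = {"name", "age"}


class PersonDataAggregator:
    """Collects partial person data per correlation id (here: a person id)."""

    def __init__(self, on_complete):
        # on_complete stands in for "start the process instance".
        self._on_complete = on_complete
        self._pending = {}

    def receive(self, person_id, field_name, value):
        """Store one field; return True if the data set became complete."""
        if field_name not in REQUIRED_FIELDS:
            raise ValueError(f"unexpected field: {field_name}")
        entry = self._pending.setdefault(person_id, {})
        entry[field_name] = value
        if REQUIRED_FIELDS.issubset(entry):
            # All data is present: hand it over and forget the partial state.
            del self._pending[person_id]
            self._on_complete(person_id, dict(entry))
            return True
        return False
```

Each endpoint would call receive with the same person id; the callback fires once, on whichever call completes the set. In a real deployment the pending state would need to live somewhere durable.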

Related

Implementing a pub-sub pattern with Axon

We have a multi-step process we'd like to implement using a pub-sub pattern, and we're considering Axon for a big part of the solution.
Simply, the goal is to generate risk scores for insurance companies. These steps would apply generally to a pub-sub application:
A client begins the process by putting a StartRiskScore message on a bus, specifying the customer ID. The client subscribes to RiskScorePart3 messages for the customer ID.
Actor A, who subscribes to StartRiskScore messages, receives the message, generates part 1 of the risk score, and puts it on the bus as a RiskScorePart1 message, including the customer ID.
Actor B, who subscribes to RiskScorePart1 messages, receives the message, generates part 2 of the risk score, and puts it on the bus as a RiskScorePart2 message, including the customer ID.
Actor C, who subscribes to RiskScorePart2 messages, receives the message, generates part 3 of the risk score, and puts it on the bus as a RiskScorePart3 message, including the customer ID.
The original client, who already subscribed to RiskScorePart3 messages for the customer ID, receives the message and the process is complete.
I considered the following Axon implementation:
A. Make an aggregate called RiskScore
B. StartRiskScore becomes a command associated with the RiskScore aggregate.
C. The command handler for StartRiskScore becomes Actor A. It processes some data and puts a RiskScorePart1 event on the bus.
Now, here's the part I'm concerned about...
D. I'd create a RiskScorePart1 event handler in a separate PubSub object, which would do nothing but put a CreateRiskScorePart2 command on the command bus using the data from the event.
E. In the RiskScore aggregate, a command handler for CreateRiskScorePart2 (Actor B) would do some processing, then put a RiskScorePart2 event on the bus.
F. Similar to step D, a PubSub event handler for RiskScorePart2 would put a CreateRiskScorePart3 command on the command bus.
G. Similar to step E, a RiskScore aggregate command handler for CreateRiskScorePart3 (Actor C) would do some processing, then put a RiskScorePart3 event on the bus.
H. In the aggregate and the RiskScoreProjection query module, a RiskScorePart3 event handler would update the aggregate and projection, respectively.
I. The client is updated by a subscribed query to the projection.
I understand that replay occurs when a service is restarted. That's bad for old events because I don't want to re-fire commands from the PubSub handlers. It's good news for new events that occurred while the PubSub service was down.
EDIT #1:
I've considered using an Axon saga, which would be great. However, the same questions still exist even if PubSub is a saga:
How to ensure PubSub event handlers process each event exactly once, even after a restart?
Is there a different approach I should be taking to implement a pub-sub pattern in Axon?
Thanks for your help!
I think I can give some guidance in this area.
In your update you've pointed out that you are envisioning the use of a Saga for this setup.
I would, however, like to point out that a Saga is meant to 'Orchestrate a Complex Business Transaction between Bounded Contexts/Aggregates'. The scenario you're describing is not a transaction between other contexts and/or aggregates; it's all contained in a single Aggregate Root, the RiskScore.
I'd thus advise against using a Saga for this situation, as the tool (read: Saga) is relatively heavyweight for what you're describing.
Secondly, from steps A to I as you describe them, it looks as if the components in steps D and F are there purely to react to an event with a command. Under that assumption, they perform no business functionality.
Given my initial point that the transaction is contained in a single Aggregate Root, and the fact that no business functionality occurs when dispatching the command back into the aggregate, why not contain the entire operation within the RiskScore aggregate?
You can very easily handle the events an Aggregate publishes with an @EventSourcingHandler and apply another event from that method. Or, if you would like to be 'pure' about separating state updates from applying events, you could just apply more events for the separate risk-score steps thereafter.
Anyhow, I don't see why you would need to hold so tightly to the pub-sub pattern. I'd take the solution that meets the business needs as well as possible. That might be an existing pattern, but could just as well be any other approach you can think of.
This is my two cents to the situation, hope they help!
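To make the "keep everything inside the RiskScore aggregate" idea concrete, here is a minimal sketch in plain Python. It is not Axon code: the class layout, the compute_part functions and the replay flag are all assumptions made for illustration. The point it shows is that follow-up events are applied from the event handler while handling live events, but are not re-applied when the aggregate is rebuilt from history.

```python
# Hypothetical scoring steps; real logic would replace these placeholders.
def compute_part1(customer_id):
    return len(customer_id)

def compute_part2(part1):
    return part1 + 10

def compute_part3(part2):
    return part2 * 2


class RiskScore:
    def __init__(self, customer_id):
        self.customer_id = customer_id
        self.parts = {}
        self.uncommitted = []      # events produced by this instance
        self._replaying = False

    @classmethod
    def from_history(cls, customer_id, events):
        """Rebuild state from stored events without producing new ones."""
        aggregate = cls(customer_id)
        aggregate._replaying = True
        for event in events:
            aggregate._handle(event)
        aggregate._replaying = False
        return aggregate

    def start(self):
        """Command handler for StartRiskScore (Actor A)."""
        self._apply(("RiskScorePart1", compute_part1(self.customer_id)))

    def _apply(self, event):
        if self._replaying:
            return  # follow-up events are already part of the history
        self.uncommitted.append(event)
        self._handle(event)

    def _handle(self, event):
        name, value = event
        self.parts[name] = value
        if name == "RiskScorePart1":      # Actor B
            self._apply(("RiskScorePart2", compute_part2(value)))
        elif name == "RiskScorePart2":    # Actor C
            self._apply(("RiskScorePart3", compute_part3(value)))
```

One start command yields all three events in order, and rebuilding from those events restores the same state without emitting anything new.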

Sorting Service Bus Queue Messages

I was wondering if there is a way to attach metadata, or even multiple pieces of metadata, to a Service Bus queue message so that an application can later sort on it, while still maintaining FIFO in the queue.
So in short, what I want to do is:
Maintain FIFO, that is, a First In, First Out structure in the queue. But as the messages come from different sources, I want to be able to sort them by the source they came from, for example using metadata.
I know this is possible with Topics, where you can add a property to the message, but I am unsure whether it is possible to add multiple properties to a topic message.
I hope I have made clear what I am asking.
I assume you use the .NET API. In this case you can use the Properties dictionary to write and read your custom metadata:
BrokeredMessage message = new BrokeredMessage(body);
message.Properties.Add("Source", mySource);
You are free to add multiple properties too. This is the same for both Queues and Topics/Subscriptions.
"I was wondering if there is a way to implement metadata or even multiple metadata to a service bus queue message to be used later on in an application to sort on but still maintaining FIFO in the queue."
To maintain FIFO in the queue, you'd have to use Message Sessions. Without message sessions you would not be able to maintain FIFO in the queue itself. You would be able to set a custom property, use it in your application, and sort the messages after they are received out of order, but you won't receive messages in FIFO order as you were asking in your original question.
If you drop the requirement of having order preserved on the queue, then the answer @Mikhail has provided will be suitable for in-process sorting based on custom properties. Just be aware that in-process sorting will not be a trivial task.
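As a rough illustration of in-process sorting on a custom property, the sketch below groups already-received messages by their "Source" property and restores arrival order inside each group. It is plain Python, not the Service Bus SDK; the dictionary layout and the sequence_number field are assumptions standing in for whatever the receiving code extracts from each message.

```python
from collections import defaultdict


def group_by_source(messages):
    """Group message bodies by their 'Source' property, oldest first."""
    groups = defaultdict(list)
    # Sorting by sequence number first restores arrival order in each group.
    for message in sorted(messages, key=lambda m: m["sequence_number"]):
        groups[message["properties"]["Source"]].append(message["body"])
    return dict(groups)
```

This only works on a batch that has already been received, which is part of why in-process sorting is not trivial: the code has to decide when a batch is complete.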

Can event sourcing be used to resolve late arriving events

We are developing an application that will receive events from various systems via a message queue (Azure), but it is possible that some events (messages) will not arrive in the order they were sent. These events will be received and processed by a central CQRS/ES based system, but my worry is that if the events are placed in the event store in the wrong order we will get garbage out (for example "order create" after "add order item").
Are typical ES systems meant to resolve this issue or are we meant to ensure that such messages are put in the right order before being pushed into the event store? If you have links to articles that back up either view it would help.
Edit: I think my description is clearly far too vague so the responses, while helpful in understanding CQRS/ES, do not quite answer my problem so I'll add a little more detail and hopefully someone will recognise the problem.
Firstly the players.
the front end web site (not actually relevant to this problem) delivers orders to the management system.
our management system which takes orders from the web site and passes them to the warehouse and is hosted on site.
the warehouse which accepts orders, fulfils them if possible and notifies us when an order is fulfilled or cannot be partially or completely fulfilled.
Linking the warehouse to the management system is a fairly thin Azure cloud based coupling. Messages from the warehouse are sent to a WCF/Soap layer in the cloud, parsed, and sent over the message bus. Messages to the warehouse are sent over the message bus and then, again in the cloud, converted into Soap calls to a server at the warehouse.
The warehouse is very careful to ensure that messages it sends have identifiers that increment without a gap so we can know when a message is missed. However when we take those messages and forward them to the management system they are transported over the message bus and could, in theory, arrive in the wrong order.
Now, given that we have a sequence number in the messages, we could ensure the messages are put back in the right order before they are sent to the CQRS/ES system. But my question is: is that necessary, or can the ES itself be used to reorder the events into the logical order that was intended?
Each message that arrives in Service Bus is tagged with a SequenceNumber. The SequenceNumber is a monotonically increasing, gapless 64-bit integer sequence, scoped to the Queue (or Topic), that provides an absolute order criterion by arrival in the Queue. That order may differ from the delivery order due to errors/aborts, and it exists so you can reconstitute the order of arrival.
Two features in Service Bus specific to management of order inside a Queue are:
Sessions. A sessionful queue puts locks on all messages with the same SessionId property, meaning that FIFO is guaranteed for that sequence, since no messages later in the sequence are delivered until the "current" message is either processed or abandoned.
Deferral. The Defer method puts a message aside if the message cannot be processed at this time. The message can later be retrieved by its SequenceNumber, which pulls from the hidden deferral queue. If you need a place to keep track of which messages have been deferred for a session, you can put a data structure holding that information right into the message session, if you use a sessionful queue. You can then pick up that state again elsewhere on an accepted session if you, for instance, fail over processing onto a different machine.
These features have been built specifically for document workflows in Office 365 where order obviously matters quite a bit.
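The deferral idea (set early messages aside, pick them up again by sequence number) can be sketched independently of Service Bus. The code below is plain Python with hypothetical names; it assumes each message carries a gapless sequence number, such as the warehouse identifiers described in the question, and releases messages only once every earlier number has been seen.

```python
class Resequencer:
    """Buffers out-of-order messages and releases them in sequence order."""

    def __init__(self, first_expected=1):
        self._next = first_expected
        self._deferred = {}  # sequence number -> payload set aside for later

    def accept(self, sequence_number, payload):
        """Return the payloads that are now ready, in order (possibly none)."""
        if sequence_number < self._next or sequence_number in self._deferred:
            return []  # duplicate delivery, already handled or already held
        self._deferred[sequence_number] = payload
        ready = []
        # Drain every consecutive message starting at the expected number.
        while self._next in self._deferred:
            ready.append(self._deferred.pop(self._next))
            self._next += 1
        return ready
```

With this in front of the event store, "add order item" arriving before "order create" is simply held back until the missing message shows up.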
I would have commented on KarlM's answer but stackoverflow won't allow it, so here goes...
It sounds like you want the transport mechanism to provide transactional locking on your aggregate. To me this sounds inherently wrong.
It sounds as though the design being proposed is flawed. Having had this exact problem in the past, I would look at your constraints. Either you want to provide transactional guarantees to the website, or you want to provide them to the warehouse. You can't do both, one always wins.
To be fully distributed: If you want to provide them to the website, then the warehouse must ask if it can begin to fulfil the order. If you want to provide them to the warehouse, then the website must ask if it can cancel the order.
Hope that is useful.
For events generated from a single command handler/aggregate in an "optimistic locking" scenario, I would assume you would include the aggregate version in the event, and thus those events are implicitly ordered.
Events from multiple aggregates should not care about order, because of the transactional guarantees of an aggregate.
Check out http://cqrs.nu/Faq/aggregates , http://cqrs.nu/Faq/command-handlers and related FAQs
For an intro to ES and optimistic locking, look at http://www.jayway.com/2013/03/08/aggregates-event-sourcing-distilled/
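A minimal sketch of the optimistic-locking idea mentioned above, in plain Python with hypothetical names (InMemoryEventStore, ConcurrencyError): the writer states which version of the aggregate it based its decision on, the store rejects the append if the stream has moved on, and each stored event carries its version so ordering within one aggregate is implicit.

```python
class ConcurrencyError(Exception):
    """Raised when the stream is not at the version the writer expected."""


class InMemoryEventStore:
    def __init__(self):
        self._streams = {}  # aggregate id -> list of (version, event)

    def append(self, aggregate_id, expected_version, events):
        """Append events if the stream is still at expected_version."""
        stream = self._streams.setdefault(aggregate_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}"
            )
        for offset, event in enumerate(events, start=1):
            stream.append((expected_version + offset, event))
        return len(stream)  # the new version of the aggregate

    def load(self, aggregate_id):
        return list(self._streams.get(aggregate_id, []))
```

A writer that loses the race reloads the stream, re-evaluates its command against the newer state, and tries again.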
You say:
"These events will be received and processed by a central CQRS/ES based system but my worry is that if the events are placed in the event store in the wrong order we will get garbage out (for example "order create" after "add order item")."
There seems to be a misunderstanding about what CQRS pattern with Event Sourcing is.
Simply put Event Sourcing means that you change Aggregates (as per DDD terminology) via internally generated events, the Aggregate persistence is represented by events and the Aggregate can be restored by replaying events. This means that the scope is quite small, the Aggregate itself.
Now, CQRS with Event Sourcing means that these events from the Aggregates are published and used to create Read projections, or other domain models that have different purposes.
So I don't really get your question given the explanations above.
Related to Ordering:
there is already an answer mentioning optimistic locking, so events generated inside a single Aggregate must be ordered and optimistic locking is a solution
Read projections processing events in order. A solution I used in the past was to publish events on RabbitMQ and process them with Storm.
RabbitMQ has some guarantees about ordering and Storm has some processing affinity features. Storm (as far as I remember) allows you to specify that for a given ID (for example an Aggregate ID) the same handler is used, hence the events are processed in the same order as received from RabbitMQ.
The article on MSDN https://msdn.microsoft.com/en-us/library/jj591559.aspx states "Stored events should be immutable and are always read in the order in which they were saved" under "Performance, Scalability, and consistency". This clearly means that appending events out of order is not tolerated. The same article also states multiple times that while events cannot be altered, corrective events can be made. This again implies that events are processed in the order they are received to determine the current truth (the state of the aggregate). My conclusion is that we should fix the message ordering problem before posting events to the event store.

How to resequence after filtering for aggregation /Spring Integration/

I'm doing a project in Spring Integration and I have a big problem.
There are some filtering components in the flow and later in the flow I have an aggregation element.
The problem is that the filtering component does not support the "apply-sequence" property. It filters out some records without modifying the original sequence numbers, even though the number of messages is reduced.
Later in the flow I need an aggregation which fails releasing elements since some messages are filtered out.
I don't want to use any special routing elements which have apply-sequence property.
Can you suggest a common solution for this type of filtering problem?
Thanks,
I'd say you misunderstand the behaviour of the filter and aggregator.
I guess you have some apply-sequence-aware component upstream. So all messages in that group carry several headers: correlationId, to group messages in the default aggregator; sequenceNumber, the index of the message; and sequenceSize, the number of messages in the group.
The filter just checks messages against some condition and either sends them to the output-channel or applies the discard logic. It doesn't modify messages. Even if it could, that wouldn't be a good idea anyway.
Assume we have only two messages in the group. The first one passes the filter, so it is sent to the aggregator. But the second is discarded, and, yes, it won't be sent to the aggregator. So the group is never released, because the sequenceSize isn't reached.
To meet your requirement you need a custom ReleaseStrategy on the aggregator (by default it is SequenceSizeReleaseStrategy). For example, it could check some state in your system indicating that all messages in the group have been sent, regardless of whether the filter result was true or false. Or you could send a fake message for the same purpose and check for its presence in the group.
In this case you only need to take care of the correlationId to group messages in the aggregator.
UPDATE
What is the suggested release strategy for such a scenario? Would it be a good strategy to use timeout as release strategy?
What I can say is that sometimes it is really difficult to find a good solution for some integration scenarios. Messaging is stateless by nature, so correlating and grouping an undetermined number of messages may be a problem.
You need to look at the requirements and the environment.
For example, when all your messages are processed in a single thread, you can safely send a fake marker message at the end directly to the aggregator and check for it in the ReleaseStrategy. That will work even when all the messages in the group are discarded.
If you process those messages in parallel, or they are received from different threads, you really won't be able to determine the order of the messages or the time each one takes to process.
In this case the TimeoutCountSequenceSizeReleaseStrategy can really help. Of course, you will need to find a good timeframe compromise according to the requirements of your system.
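The two release ideas discussed above (an end-of-group marker message and a timeout) can be sketched outside Spring Integration as follows. This is plain Python, not the framework's ReleaseStrategy interface; MarkerAggregator, is_marker and the injectable clock are hypothetical names chosen for the illustration.

```python
import time


class MarkerAggregator:
    """Groups payloads by correlation id; releases on marker or timeout."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._groups = {}  # correlation id -> {"items": [...], "started": t}

    def add(self, correlation_id, payload, is_marker=False):
        """Add a payload; return the released group on a marker, else None."""
        self._groups.setdefault(
            correlation_id, {"items": [], "started": self._clock()}
        )
        if is_marker:
            # The marker bypasses the filter, so it arrives even when every
            # real message in the group was discarded.
            return self._release(correlation_id)
        self._groups[correlation_id]["items"].append(payload)
        return None

    def release_expired(self):
        """Release every group that has been open for the full timeout."""
        now = self._clock()
        expired = [
            cid for cid, group in self._groups.items()
            if now - group["started"] >= self._timeout
        ]
        return {cid: self._release(cid) for cid in expired}

    def _release(self, correlation_id):
        return self._groups.pop(correlation_id)["items"]
```

The marker path suits the single-threaded case; release_expired would be called periodically to cover the parallel case, where no reliable "last message" exists.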

RESTful Job Assignment

I have a collection of jobs that need processing, http://example.com/jobs. Each job has a status of "new", "assigned" or "finished".
I want slave processes to pick off one "new" job, set its status to "assigned", and then process it. I want to ensure each job is only processed by a single slave.
I considered having each slave do the following:
GET http://example.com/jobs
Pick one that's "new" and do an http PUT to http://example.com/jobs/123 {"status=assigned"}.
Repeat
The problem is that another slave may have assigned the job to itself between the GET and PUT. I could have the second PUT return a 409 (conflict), which would signal the second slave to try a different job.
Am I on the right track, or should I do this differently?
I would have one process that picks "new" jobs and assigns them. Other processes would independently go in and look to see if they've been assigned a job. You'd have to have some way to identify which process a job is assigned to, so some kind of slave process id would be called for.
(You could use POST too, as what you're trying to do shouldn't be idempotent anyway).
You could give each of your clients a unique ID (possibly a UUID) and have an "assignee/worker" field in your job resource.
GET http://example.com/jobs/
POST { "worker"=$myID } to http://example.com/jobs/123
GET http://example.com/jobs/123 and check that the worker ID is that of the client
You could combine this with conditional requests too.
On top of this, you could have a time out feature if the job queue doesn't hear back from a given client, it puts it back in the queue.
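The worker-id field combined with a timeout might behave like the sketch below on the server side. It is plain Python with hypothetical names (JobBoard, lease_seconds), the 200 and 409 return values are an assumed mapping onto HTTP status codes, and it is single-threaded: a real server would have to make the check-and-set atomic.

```python
import time


class JobBoard:
    """Tracks which worker holds each job, with an expiring claim."""

    def __init__(self, job_ids, lease_seconds, clock=time.monotonic):
        self._jobs = {
            job_id: {"worker": None, "claimed_at": None} for job_id in job_ids
        }
        self._lease = lease_seconds
        self._clock = clock

    def claim(self, job_id, worker_id):
        """Handle POST {"worker": worker_id}; return an HTTP-like status."""
        job = self._jobs[job_id]
        now = self._clock()
        held = (
            job["worker"] is not None
            and now - job["claimed_at"] < self._lease
        )
        if held and job["worker"] != worker_id:
            return 409  # someone else holds a live claim
        # Free, expired, or a renewal by the same worker.
        job["worker"] = worker_id
        job["claimed_at"] = now
        return 200

    def worker_of(self, job_id):
        """Handle GET on the job: which worker currently owns it?"""
        return self._jobs[job_id]["worker"]
```

A client that receives 409, or that reads back a different worker id, moves on to another job.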
It looks that the statuses are an essential part of your job-domain model. So I would expose this as dedicated sub-resources
# 'idle' is what you called 'new'
GET /jobs/idle
GET /jobs/assigned
# start job
PUT /jobs/assigned/123
A slave is only allowed to gather jobs via GET /jobs/idle. This never includes jobs which are running. There could still be race conditions (two slaves fetch the set before either of them has started a job). I think 400 Bad Request or the 409 Conflict you mentioned are fine for that.
I prefer above resource-structure instead of working with payloads (which often looks more "procedural" to me).
I was a little too specific: I don't actually care that the slave gets to pick the job, just that it gets a unique one.
With that in mind, I think @manuel aldana was on the right track, but I've made a few modifications.
I'll keep the /jobs resource, but also expose a /jobs/assigned resource. A single job may exist in both collections.
The slave can POST to /jobs/assigned with no parameters. The server will choose one "new" job, move it to "assigned", and return the url (/jobs/assigned/{jobid} or /jobs/{jobid}) in the Location header with a 201 status.
When the slave finishes the job, it will PUT to /jobs/{jobid} (status=finished).
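The server-side behaviour of this final design can be sketched as below. It is plain Python rather than a web framework; JobStore and its method names are hypothetical, and the 204 (no job available), 200 and 409 return values are assumptions that the description above does not specify. The lock is what guarantees that two concurrent POSTs never receive the same job.

```python
import threading


class JobStore:
    """In-memory stand-in for the /jobs and /jobs/assigned resources."""

    def __init__(self, job_ids):
        self._status = {job_id: "new" for job_id in job_ids}
        self._lock = threading.Lock()

    def claim_next(self):
        """Handle POST /jobs/assigned: pick one "new" job atomically."""
        with self._lock:
            for job_id, status in self._status.items():
                if status == "new":
                    self._status[job_id] = "assigned"
                    return 201, {"Location": f"/jobs/assigned/{job_id}"}
        return 204, {}  # assumption: nothing left to hand out

    def finish(self, job_id):
        """Handle PUT /jobs/{jobid} with status=finished."""
        with self._lock:
            if self._status.get(job_id) != "assigned":
                return 409  # unknown job, never assigned, or already done
            self._status[job_id] = "finished"
            return 200
```

Because the server chooses the job, the GET-then-PUT race from the original question disappears; the client never has to retry on a conflict.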