Implementing a pub-sub pattern with Axon - publish-subscribe

We have a multi-step process we'd like to implement using a pub-sub pattern, and we're considering Axon for a big part of the solution.
Simply, the goal is to generate risk scores for insurance companies. These steps would apply generally to a pub-sub application:
A client begins the process by putting a StartRiskScore message on a bus, specifying the customer ID. The client subscribes to RiskScorePart3 messages for the customer ID.
Actor A, who subscribes to StartRiskScore messages, receives the message, generates part 1 of the risk score, and puts it on the bus as a RiskScorePart1 message, including the customer ID.
Actor B, who subscribes to RiskScorePart1 messages, receives the message, generates part 2 of the risk score, and puts it on the bus as a RiskScorePart2 message, including the customer ID.
Actor C, who subscribes to RiskScorePart2 messages, receives the message, generates part 3 of the risk score, and puts it on the bus as a RiskScorePart3 message, including the customer ID.
The original client, who already subscribed to RiskScorePart3 messages for the customer ID, receives the message and the process is complete.
I considered the following Axon implementation:
A. Make an aggregate called RiskScore
B. StartRiskScore becomes a command associated with the RiskScore aggregate.
C. The command handler for StartRiskScore becomes Actor A. It processes some data and puts a RiskScorePart1 event on the bus.
Now, here's the part I'm concerned about...
D. I'd create a RiskScorePart1 event handler in a separate PubSub object, which would do nothing but put a CreateRiskScorePart2 command on the command bus using the data from the event (see the sketch just after these steps).
E. In the RiskScore aggregate, a command handler for CreateRiskScorePart2 (Actor B) would do some processing, then put a RiskScorePart2 event on the bus.
F. Similar to step D, a PubSub event handler for RiskScorePart2 would put a CreateRiskScorePart3 command on the command bus.
G. Similar to step E, a RiskScore aggregate command handler for CreateRiskScorePart3 (Actor C) would do some processing, then put a RiskScorePart3 event on the bus.
H. In the aggregate and the RiskScoreProjection query module, a RiskScorePart3 event handler would update the aggregate and projection, respectively.
I. The client is updated by a subscribed query to the projection.
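For concreteness, here's roughly what I have in mind for the PubSub component in steps D and F. This is only a sketch; the class names and payload accessors are illustrative, assuming Axon's CommandGateway and @EventHandler:

```java
import org.axonframework.commandhandling.gateway.CommandGateway;
import org.axonframework.eventhandling.EventHandler;

// Purely reactive glue: listens for RiskScorePartN events and dispatches the
// command that triggers the next step. No business logic lives here.
public class RiskScorePubSub {

    private final CommandGateway commandGateway;

    public RiskScorePubSub(CommandGateway commandGateway) {
        this.commandGateway = commandGateway;
    }

    @EventHandler
    public void on(RiskScorePart1Event event) {
        // Step D: forward part 1 data as a command for Actor B
        commandGateway.send(new CreateRiskScorePart2Command(event.getCustomerId(), event.getPart1()));
    }

    @EventHandler
    public void on(RiskScorePart2Event event) {
        // Step F: forward part 2 data as a command for Actor C
        commandGateway.send(new CreateRiskScorePart3Command(event.getCustomerId(), event.getPart2()));
    }
}
```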
I understand that replay occurs when a service is restarted. That's bad for old events because I don't want to re-fire commands from the PubSub handlers. It's good news for new events that occurred while the PubSub service was down.
EDIT #1:
I've considered using an Axon saga, which would be great. However, the same questions still exist even if PubSub is a saga:
How to ensure PubSub event handlers process each event exactly once, even after a restart?
Is there a different approach I should be taking to implement a pub-sub pattern in Axon?
Thanks for your help!

I think I can give some guidance in this area.
In your update you've pointed out that you envision using a Saga to perform this set-up.
I would, however, like to point out that a Saga is meant to 'orchestrate a complex business transaction between Bounded Contexts/Aggregates'. The scenario you're describing is not a transaction between other contexts and/or aggregates; it's all contained in a single Aggregate Root, the RiskScore.
I'd thus suggest against using a Saga for this situation, as the tool (read: Saga) is relatively heavyweight for what you're describing.
Secondly, from the steps you describe (A through I), it looks as if the components described in steps D and F are there purely to react to an event by dispatching a command. Under that assumption, they perform zero business functionality.
Taking my initial point, that the transaction is contained in a single Aggregate Root, together with the fact that no business functionality occurs when dispatching the command back into the aggregate, why not contain the entirety of the operation within the RiskScore aggregate?
You can very easily handle the events an Aggregate publishes with an @EventSourcingHandler method and apply another event from there. Or, if you would like to be 'pure' about segregating state updates from applying events, you could simply apply the events for the separate risk-score steps one after the other.
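To make that concrete, here is a minimal sketch of what keeping the whole flow inside the aggregate could look like. It assumes Axon 4 package names; the command/event classes and the computePartN methods are placeholders, not your actual implementation:

```java
import static org.axonframework.modelling.command.AggregateLifecycle.apply;

import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;

public class RiskScore {

    @AggregateIdentifier
    private String customerId;

    protected RiskScore() {
        // Required by Axon to reconstruct the aggregate from its events
    }

    @CommandHandler
    public RiskScore(StartRiskScoreCommand cmd) {
        // Actor A: compute part 1 and publish it
        apply(new RiskScorePart1Event(cmd.getCustomerId(), computePart1(cmd)));
    }

    @EventSourcingHandler
    public void on(RiskScorePart1Event event) {
        this.customerId = event.getCustomerId();
        // Actor B: react to part 1 by applying part 2 directly, no command round-trip.
        // Axon does not re-publish events applied from an event sourcing handler while
        // the aggregate is being sourced from its history, so replay stays safe.
        apply(new RiskScorePart2Event(event.getCustomerId(), computePart2(event)));
    }

    @EventSourcingHandler
    public void on(RiskScorePart2Event event) {
        // Actor C: same idea for part 3
        apply(new RiskScorePart3Event(event.getCustomerId(), computePart3(event)));
    }

    @EventSourcingHandler
    public void on(RiskScorePart3Event event) {
        // Only state updates here; the projection and subscription query notify the client
    }

    private double computePart1(StartRiskScoreCommand cmd) { return 0.0; }  // placeholder
    private double computePart2(RiskScorePart1Event event) { return 0.0; }  // placeholder
    private double computePart3(RiskScorePart2Event event) { return 0.0; }  // placeholder
}
```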
Anyhow, I don't see why you would need to hold tightly to the pub-sub pattern. I'd take the solution which resolves the business need as well as possible. That might be an existing pattern, but it could just as well be any other approach you can think of.
This is my two cents on the situation, hope it helps!

Related

How can an event-sourced entity subscribe to state changes in another entity?

I have an event-sourced entity (C) that needs to change its state in response to state changes in another entity of a different type (P). The logic for whether the state of C should actually change is quite complex, and the data to compute it lives in C; moreover, many instances of C should listen to one instance of P, and the set of instances increases over time, so I'd rather have them pull from a stream, knowing the ID of P, than have P keep track of the IDs of all the Cs and push to them.
I am thinking of doing something such as:
1. Tag a projection of P's events
2. Have a Subscribe(P.id) command that gets sent to C
3. If C is not already subscribing to a P (it can only subscribe to one, and it shouldn't change), fire an event Subscribed(P.id)
4. In response to the event, use Akka-persistence-query to materialize the stream of events tagged in step 1, map them to commands, and run asynchronously with a Sink that sends them to my ES entity reference
It seems a bit like an anti-pattern to have a stream run in the event handler. I am wondering if there's a better/more supported way to do this without the upstream having to know about the downstream. I decided against Akka pub-sub because it does at-most-once delivery, and I'd like to avoid using Kafka if possible.
You definitely don't want to run the stream in the event handler: the event handler should never side effect.
Assuming that you would like a C to get events from times when that C was not running (including before that C had ever run), this suggests that a stream should be run for each C. Since the subscription will be to one particular P, I'd seriously consider not tagging, but instead using the eventsByPersistenceId stream to get all the events of a P and ignore the ones that aren't of interest. In the stream, you translate those to commands in C's API, including the offset in P's event stream with the command, and send it to C (for at-least-once delivery, a mapAsync with an ask is useful; C will persist an event recording that it processed the offset: this allows the command to be idempotent, as C can acknowledge the command if the offset is less-than-or-equal-to the high water offset in its state).
This stream gets kicked off by the command handler after successfully persisting a Subscribed(P.id) event (in this case starting from offset 0), and gets kicked off again after the persistent actor is rehydrated if the state shows it's subscribed (in this case starting from one plus the high-water offset).
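For illustration, a rough sketch of that per-C stream, assuming Akka Typed plus a read journal exposing eventsByPersistenceId; the ApplyPEvent/InterestingPEvent types, the plugin id, and C's actor protocol are all placeholders:

```java
import java.time.Duration;

import akka.Done;
import akka.actor.typed.ActorRef;
import akka.actor.typed.ActorSystem;
import akka.actor.typed.javadsl.AskPattern;
import akka.persistence.query.PersistenceQuery;
import akka.persistence.query.javadsl.EventsByPersistenceIdQuery;
import akka.stream.javadsl.Sink;

public class SubscriptionStream {

    public static void run(
            ActorSystem<?> system,
            ActorRef<ApplyPEvent> cRef,   // C's entity/actor ref (assumed protocol)
            String pPersistenceId,
            long fromSequenceNr) {        // 0 on first subscribe, high-water offset + 1 after rehydration

        EventsByPersistenceIdQuery journal = PersistenceQuery.get(system.classicSystem())
            .getReadJournalFor(EventsByPersistenceIdQuery.class, "your-read-journal-plugin-id");

        journal.eventsByPersistenceId(pPersistenceId, fromSequenceNr, Long.MAX_VALUE)
            .filter(env -> env.event() instanceof InterestingPEvent)  // ignore P events C doesn't care about
            .mapAsync(1, env ->                                       // mapAsync(1) + ask keeps ordering, at-least-once
                AskPattern.ask(
                    cRef,
                    (ActorRef<Done> replyTo) ->
                        new ApplyPEvent(env.sequenceNr(), (InterestingPEvent) env.event(), replyTo),
                    Duration.ofSeconds(5),
                    system.scheduler()))
            .runWith(Sink.ignore(), system);   // C dedupes by comparing the offset with its high-water mark
    }
}
```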
The rationale for not using tagging here arises from an assumption that the number of events C isn't interested in is smaller than the number of events with the tag from Ps that C isn't subscribed to (note that for most of the persistence plugins, the more tags there are, the more overhead there is: a tag which is only used by one particular instance of an entity is often not a good idea). If the tag in question is rarely seen, this assumption might not hold and eventsByTag and filtering by id could be useful.
This does of course have the downside of running discrete streams for every C: depending on how many Cs are subscribed to a given P, the overhead of this may be substantial, and the streams for subscribers which are caught up will be especially wasteful. In this scenario, responsibility for delivering commands to subscribed Cs for a given P can be moved to an actor. The only real change in that scenario is that where C would run the stream, it instead confirms that it is subscribed to the event stream by asking that actor feeding events from the P. Because this approach is a marked step-up in complexity (especially around managing when Cs join and drop out of the shared "caught-up" stream), I'd tend to recommend starting with the stream-per-C approach and then going to the shared stream (it's also worth noting that there can be multiple shared streams: in fact I'd tend to have shared streams be per-ActorSystem (e.g. a "node singleton" per P of interest) so as not to involve remoting), since it's not difficult to make the transition (from C's perspective, there's not really a difference whether the adapted commands are coming from a stream it started or from a stream being run by some other actor).

How to replay Event Sourcing events reliably?

One of the great promises of Event Sourcing is the ability to replay events. When there's no relationship between entities (e.g. blob storage, user profiles) it works great, but how do we replay quickly when there are important relationships to check?
For example: Product(id, name, quantity) and Order(id, list of productIds). If we have a CreateProduct event followed by a CreateOrder event, the order will succeed (the product is available in the warehouse), and it's easy to implement, e.g. with Kafka (one topic with n1 partitions for products, another with n2 partitions for orders).
During replay everything happens more quickly, and Kafka may reorder the events (e.g. CreateOrder and then CreateProduct), which will give us different behavior than the original run (CreateOrder will now fail because the product doesn't exist yet). That's because Kafka guarantees ordering only within one partition of one topic. The easy solution would be putting everything into one huge topic with one partition, but this would be completely unscalable, as single-threaded replay of bigger databases could take days at least.
Is there any existing, better solution for quick replaying of related entities? Or should we forget about event sourcing and replaying of events when we need to check relationships in our databases, and replaying is good only for unrelated data?
As a practical necessity when event sourcing, you need the ability to conjure up a stream of events for a particular entity so that you can apply your event handler to build up the state. For Kafka, outside of the case where you have so few entities that you can assign an entire topic partition to just the events for a single entity, this entails a linear scan and filter through a partition. So while Kafka is very likely to be a critical part of any event-driven/event-based system, relaying events published by a service for consumption by other services (at which point, if we consider the event vs. command dichotomy, we're talking about commands from the perspective of the consuming service), it's not well suited to the role of an event store. Event stores are defined by their ability to quickly give you an ordered stream of the events for a particular entity.
The most popular purpose-built event store is, probably, the imaginatively named Event Store (at least partly due to the involvement of a few prominent advocates of event sourcing in its design and implementation). Alternatively, there are libraries/frameworks like Akka Persistence (JVM with a .Net port) which use existing DBs (e.g. relational SQL DBs, Cassandra, Mongo, Azure Cosmos, etc.) in a way which facilitates their use as an event store.
Event sourcing also, as a practical necessity, tends to lead to CQRS (they go together very well: event sourcing is arguably the simplest possible persistence model capable of being a write model, while it's nearly useless as a read model). The typical pattern is that the command-processing component of the system enforces constraints like "product exists before being added to the cart" (how those constraints are enforced is generally a question of whatever concurrency model is in use: the actor model has a high level of mechanical sympathy with this approach, but other models are possible) before writing events to the event store, and the events read back from the event store can then be assumed to have been valid as of the time they were written (it's possible to later decide a compensating event needs to be recorded). The events from within the event store can be projected to a Kafka topic for communication to another service (the command-processing component is the single source of truth for events).
From the perspective of that other service, as noted, the projected events in the topic are commands (the implicit command for an event is "update your model to account for this event"). Semantically, their provenance as events means that they've been validated and are undeniable (they can be ignored, however). If there's some model validation that needs to occur, that generally entails either a conscious decision to ignore that command or to wait until another command is received which allows that command to be accepted.
Ok, you are still thinking about how we developed applications over the last 20 years instead of how we should develop applications in the future. There are frameworks that fit the paradigms of the future very well. One of those, already mentioned above, is Akka, and more specifically a sub-component of it, Akka FSM (Finite State Machine), a concept we have ignored in software development for years; but the future seems to be more and more event-based, and we can't ignore it anymore.
So how will these help you? Akka is a framework based on the Actor concept: every Actor is a unique entity with a mailbox. So let's say you have an Order Actor with id 123456789; every event for Order id 123456789 will be processed by this Actor, and its messages are ordered in its mailbox on a first-in-first-out basis, so you don't need any synchronisation logic anymore. But you could have millions of Order Actors in your system, and they can work in parallel: while Order Actor 123456789 is processing its events, Order Actor 987654321 can process its own, so there is your parallelism and scalability. As long as Kafka guarantees the order of every message for keys 123456789 and 987654321, everything is green.
Now you may ask where the Finite State Machine comes into play. As you mentioned, the problem arises when an addProduct event arrives before the createOrder event (while being on different Kafka topics). At that point the state machine behaves differently depending on whether the Order Actor is in the CREATED state or the INITIALISING state: in CREATED state it will just add the product; in INITIALISING state it will probably stash it until the createOrder event arrives.
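As a rough sketch of that "stash until createOrder arrives" idea; note this uses Akka Typed's StashBuffer rather than classic Akka FSM, but the state-dependent behaviour is the same, and the message types are just illustrative:

```java
import akka.actor.typed.Behavior;
import akka.actor.typed.javadsl.Behaviors;
import akka.actor.typed.javadsl.StashBuffer;

public class OrderActor {

    interface Command {}
    public static final class CreateOrder implements Command { final String orderId; CreateOrder(String orderId) { this.orderId = orderId; } }
    public static final class AddProduct implements Command { final String productId; AddProduct(String productId) { this.productId = productId; } }

    public static Behavior<Command> create() {
        return Behaviors.withStash(100, stash -> initialising(stash));
    }

    // INITIALISING: an addProduct that arrives early is stashed, not lost
    private static Behavior<Command> initialising(StashBuffer<Command> stash) {
        return Behaviors.receive(Command.class)
            .onMessage(CreateOrder.class, create -> stash.unstashAll(created()))
            .onMessage(AddProduct.class, add -> {
                stash.stash(add);
                return Behaviors.same();
            })
            .build();
    }

    // CREATED: products are applied immediately
    private static Behavior<Command> created() {
        return Behaviors.receive(Command.class)
            .onMessage(AddProduct.class, add -> {
                // apply the product to the order state here
                return Behaviors.same();
            })
            .build();
    }
}
```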
These concepts are explained really well in this video, and if you want to see a practical example I have a blog post for it, and this one for a more direct dive.
I think I found the solution for scalable (multi-partition) event sourcing:
create in Kafka (or in a similar system) topic named messages
assign users to partitions (e.g. by murmurHash(login) % partitionCount, as sketched below)
if a piece of data is mutable (e.g. Product, Order), every partition should contain its own copy of the data
if we have e.g. 256 pieces of a product in our warehouse and 64 partitions, we can initially 'give' every partition 4 pieces, so most CreateOrder events will be processed quickly without leaving the user's partition
if a user (a partition) sometimes needs to mutate data in another partition, it should send a message there:
for example for Product / Order domain, partitions could work similarly to Walmart/Tesco stores around a country, and the messages sent between partitions ('stores') could be like CreateProduct, UpdateProduct, CreateOrder, SendProductToMyPartition, ProductSentToYourPartition
the message will become an 'event', as if it had been generated by a user
the message shouldn't be sent during replay (already sent, no need to do it twice)
This way even when Kafka (or any other event sourcing system) chooses to reorder messages between partitions, we'll still be ok, because we don't ever read any data outside our single-threaded 'island'.
EDIT: As @LeviRamsey noted, this 'single-threaded island' is basically the actor model, and frameworks like Akka can make it a bit easier.
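As a small illustration of the user-to-partition assignment mentioned above (it mirrors what Kafka's default partitioner does with a record key, i.e. a murmur2 hash modulo the partition count; the class name is just for the sketch):

```java
import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.utils.Utils;

public final class UserPartitioner {

    private final int partitionCount;

    public UserPartitioner(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    // All events for a given user land on one partition, i.e. one single-threaded 'island'
    public int partitionFor(String login) {
        byte[] key = login.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(key)) % partitionCount;
    }
}
```

In practice you get the same effect by simply producing every event for a user with the login as the record key, since the default partitioner applies the same hash.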

Event sourcing with Kafka streams

I'm trying to implement a simple CQRS/event sourcing proof of concept on top of Kafka streams (as described in https://www.confluent.io/blog/event-sourcing-using-apache-kafka/)
I have 4 basic parts:
commands topic, which uses the aggregate ID as the key for sequential processing of commands per aggregate
events topic, to which every change in aggregate state is published (again, the key is the aggregate ID). This topic has a retention policy of "never delete"
A KTable to reduce aggregate state and save it to a state store
events topic stream ->
group to a KTable by aggregate ID ->
reduce aggregate events to current state ->
materialize as a state store
commands processor - commands stream, left joined with aggregate state KTable. For each entry in the resulting stream, use a function (command, state) => events to produce resulting events and publish them to the events topic
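In code, the topology looks roughly like this (just a sketch: Command, Event, AggregateState, their serdes and the decide/apply functions are assumed/simplified):

```java
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class CommandProcessorTopology {

    static Topology build(Serde<Command> commandSerde, Serde<Event> eventSerde, Serde<AggregateState> stateSerde) {
        StreamsBuilder builder = new StreamsBuilder();

        // (3) Fold the events topic into current aggregate state, backed by a state store.
        // aggregate() rather than reduce(), since the state type differs from the event type.
        KTable<String, AggregateState> stateTable = builder
            .stream("events", Consumed.with(Serdes.String(), eventSerde))
            .groupByKey(Grouped.with(Serdes.String(), eventSerde))
            .aggregate(
                AggregateState::empty,
                (aggregateId, event, state) -> state.apply(event),
                Materialized.<String, AggregateState, KeyValueStore<Bytes, byte[]>>as("aggregate-state-store")
                    .withKeySerde(Serdes.String())
                    .withValueSerde(stateSerde));

        // (4) Commands processor: left join each command with the current state,
        // produce zero or more events, and publish them back to the events topic.
        builder.stream("commands", Consumed.with(Serdes.String(), commandSerde))
            .leftJoin(stateTable, CommandProcessorTopology::decide)
            .flatMapValues(events -> events)
            .to("events", Produced.with(Serdes.String(), eventSerde));

        return builder.build();
    }

    // (command, state) => events; business rules go here
    private static java.util.List<Event> decide(Command command, AggregateState state) {
        return java.util.List.of();   // placeholder
    }
}
```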
The question is - is there a way to make sure I have the latest version of the aggregate in the state store?
I want to reject a command if it violates business rules (for example, a command to modify the entity is not valid if the entity was marked as deleted). But if a DeleteCommand is published and a ModifyCommand follows right after it, the delete command will produce the DeletedEvent, but when the ModifyCommand is processed, the state loaded from the state store might not reflect that yet, and conflicting events will be published.
I don't mind sacrificing command-processing throughput; I'd rather get the consistency guarantees (since everything is grouped by the same key and should end up in the same partition).
Hope that was clear :) Any suggestions?
I don't think Kafka is good for CQRS and event sourcing yet, the way you described it, because it lacks a (simple) way of ensuring protection from concurrent writes. This article talks about this in detail.
What I mean by "the way you described it" is the fact that you expect a command to generate zero or more events or to fail with an exception; this is the classical CQRS with event sourcing. Most people expect this kind of architecture.
You could, however, do event sourcing in a different style. Your command handlers could yield events for every command that is received (e.g. DeleteWasAccepted). Then an event handler could eventually handle that event in an event-sourced way (by rebuilding the Aggregate's state from its event stream) and emit other events (e.g. ItemDeleted or ItemDeletionWasRejected). So commands are fire-and-forget, sent async; the client does not wait for an immediate response. It waits instead for an event describing the outcome of its command's execution.
An important aspect is that the Event handler must process events from the same Aggregate in a serial way (exactly once and in order). This can be implemented using a single Kafka Consumer Group. You can see about this architecture in this video.
Please read this article by my colleague Jesper. Kafka is a great product but actually not a good fit at all for event sourcing:
https://medium.com/serialized-io/apache-kafka-is-not-for-event-sourcing-81735c3cf5c
A possible solution I came up with is to implement a sort of optimistic locking mechanism:
Add an expectedVersion field on the commands
Use the KTable Aggregator to increase the version of the aggregate snapshot for each handled event
Reject commands if the expectedVersion doesn't match the snapshot's aggregate version
This seems to provide the semantics I'm looking for.
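Roughly, the check ends up in the (command, state) => events function; building on the hypothetical names from the topology sketch above, with CommandRejected likewise just an illustrative event:

```java
java.util.List<Event> decide(Command command, AggregateState state) {
    long currentVersion = (state == null) ? 0L : state.version();
    if (command.expectedVersion() != currentVersion) {
        // Either the state store hasn't caught up yet, or the client read stale state.
        return java.util.List.of(new CommandRejected(command.aggregateId(), currentVersion));
    }
    // Normal business rules; the KTable aggregator bumps the version for every event it folds in.
    return handle(command, state);
}
```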

Can event sourcing be used to resolve late arriving events

We are developing an application that will receive events from various systems via a message queue (Azure), but it is just possible that some events (messages) will not arrive in the order they were sent. These events will be received and processed by a central CQRS/ES-based system, but my worry is that if the events are placed in the event store in the wrong order we will get garbage out (for example "order create" after "add order item").
Are typical ES systems meant to resolve this issue or are we meant to ensure that such messages are put in the right order before being pushed into the event store? If you have links to articles that back up either view it would help.
Edit: I think my description is clearly far too vague, so the responses, while helpful in understanding CQRS/ES, do not quite answer my problem. I'll add a little more detail and hopefully someone will recognise the problem.
Firstly the players.
the front end web site (not actually relevant to this problem) delivers orders to the management system.
our management system which takes orders from the web site and passes them to the warehouse and is hosted on site.
the warehouse which accepts orders, fulfils them if possible and notifies us when an order is fulfilled or cannot be partially or completely fulfilled.
Linking the warehouse to the management system is a fairly thin Azure cloud-based coupling. Messages from the warehouse are sent to a WCF/Soap layer in the cloud, parsed, and sent over the message bus. Messages to the warehouse are sent over the message bus and then, again in the cloud, converted into Soap calls to a server at the warehouse.
The warehouse is very careful to ensure that messages it sends have identifiers that increment without a gap so we can know when a message is missed. However when we take those messages and forward them to the management system they are transported over the message bus and could, in theory, arrive in the wrong order.
Now, given that we have a sequence number in the messages, we could ensure the messages are put back in the right order before they are sent to the CQRS/ES system, but my question is: is that necessary, or can the ES actually be used to reorder the events into the logical order they were intended?
Each message that arrives in Service Bus is tagged with a SequenceNumber. The SequenceNumber is a monotonically increasing, gapless 64-bit integer sequence, scoped to the Queue (or Topic), that provides an absolute order criterion by arrival in the Queue. That order may differ from the delivery order due to errors/aborts, and exists so you can reconstitute the order of arrival.
Two features in Service Bus specific to management of order inside a Queue are:
Sessions. A sessionful queue puts locks on all messages with the same SessionId property, meaning that FIFO is guaranteed for that sequence, since no messages later in the sequence are delivered until the "current" message is either processed or abandoned.
Deferral. The Defer method puts a message aside if the message cannot be processed at this time. The message can later be retrieved by its SequenceNumber, which pulls from the hidden deferral queue. If you need a place to keep track of which messages have been deferred for a session, you can put a data structure holding that information right into the message session, if you use a sessionful queue. You can then pick up that state again elsewhere on an accepted session if you, for instance, fail over processing onto a different machine.
These features have been built specifically for document workflows in Office 365 where order obviously matters quite a bit.
I would have commented on KarlM's answer but stackoverflow won't allow it, so here goes...
It sounds like you want the transport mechanism to provide transactional locking on your aggregate. To me this sounds inherently wrong.
It sounds as though the design being proposed is flawed. Having had this exact problem in the past, I would look at your constraints. Either you want to provide transactional guarantees to the website, or you want to provide them to the warehouse. You can't do both, one always wins.
To be fully distributed: If you want to provide them to the website, then the warehouse must ask if it can begin to fulfil the order. If you want to provide them to the warehouse, then the website must ask if it can cancel the order.
Hope that is useful.
For events generated from a single command handler/aggregate in an "optimistic locking" scenario, I would assume you would include the aggregate version in the event, and thus those events are implicitly ordered.
Events from multiple aggregates should not care about order, because of the transactional guarantees of an aggregate.
Check out http://cqrs.nu/Faq/aggregates , http://cqrs.nu/Faq/command-handlers and related FAQs
For an intro to ES and optimistic locking, look at http://www.jayway.com/2013/03/08/aggregates-event-sourcing-distilled/
You say:
"These events will be received and processed by a central CQRS/ES based system but my worry is that if the events are placed in the event store in the wrong order we will get garbage out (for example "order create" after "add order item")."
There seems to be a misunderstanding about what CQRS pattern with Event Sourcing is.
Simply put, Event Sourcing means that you change Aggregates (as per DDD terminology) via internally generated events; the Aggregate's persistence is represented by events, and the Aggregate can be restored by replaying events. This means that the scope is quite small: the Aggregate itself.
Now, CQRS with Event Sourcing means that these events from the Aggregates are published and used to create Read projections, or other domain models that have different purposes.
So I don't really get your question given the explanations above.
Related to Ordering:
there is already an answer mentioning optimistic locking, so events generated inside a single Aggregate must be ordered and optimistic locking is a solution
Read projections processing events in order: a solution I used in the past was to publish events on RabbitMQ and process them with Storm.
RabbitMQ has some guarantees about ordering, and Storm has some processing-affinity features. Storm (as far as I remember) allows you to specify that for a given ID (for example an Aggregate ID) the same handler will be used, hence the events are processed in the same order as they were received from RabbitMQ.
The article on MSDN https://msdn.microsoft.com/en-us/library/jj591559.aspx states "Stored events should be immutable and are always read in the order in which they were saved" under "Performance, Scalability, and consistency". This clearly means that appending events out of order is not tolerated. The same article also states multiple times that while events cannot be altered, corrective events can be made. This would again imply that events are processed in the order they are received to determine the current truth (state of the aggregate). My conclusion is that we should fix the messaging order problem before posting events to the event store.
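To illustrate that conclusion, here is a minimal, framework-agnostic sketch of re-ordering by the warehouse's gapless sequence numbers before anything is appended to the event store (names are illustrative):

```java
import java.util.TreeMap;
import java.util.function.Consumer;

// Buffers out-of-order messages and releases them in gapless sequence-number
// order, so the event store only ever sees events in their intended order.
public final class Resequencer<M> {

    private final Consumer<M> appendToEventStore;
    private final TreeMap<Long, M> pending = new TreeMap<>();
    private long nextExpected;

    public Resequencer(long firstSequenceNumber, Consumer<M> appendToEventStore) {
        this.nextExpected = firstSequenceNumber;
        this.appendToEventStore = appendToEventStore;
    }

    public synchronized void accept(long sequenceNumber, M message) {
        if (sequenceNumber < nextExpected) {
            return;                        // duplicate delivery; already processed
        }
        pending.put(sequenceNumber, message);
        // Release the contiguous prefix; a gap means we wait for the missing message
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            appendToEventStore.accept(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
    }
}
```

For example, new Resequencer<>(1L, store::append) would hold message 3 until message 2 has arrived.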

Message bus integration and resync of Bounded Contexts after downtime - Service Bus 1.0

I have just downloaded joliver eventstore and am looking to wire up a service bus with Windows Service Bus 1.0 for an application separated across more than one Bounded Context process.
If a bounded context has been offline whilst events in other bounded contexts have been created (or it may even be a new context that has just been deployed), I can see the following sequence of events.
For example, with ContextA, ContextB and ContextC all connected using Service Bus 1.0 and each context with its own event store, they all share the same bus messaging backplane.
ContextC goes offline.
When ContextC comes back up, the other bounded contexts need to be notified of the events that need to be resent to the context that has just come back online. These events are replayed from each of the event stores.
My questions are:
The above scenario would apply to any event sourcing libraries, so is there any infrastructure code on top of this I can use, or do I have to roll my own?
With Windows Service Bus 1.0, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
The above scenario would apply to any event sourcing libraries, so is there any infrastructure code on top of this I can use, or do I have to roll my own?
The notion of a Projection mechanism tied to the events is certainly common. Unfortunately, there are many, many ways of handling how that might be done, depending on your stack, performance requirements, scale and many other factors.
As a result I'm not aware of a commoditized facility of this nature.
The GetEventStore store has an integrated Projection facility which looks extremely powerful and takes the need to build all this off the table. Before its existence, I'd have argued that one shouldn't even consider looking past the SRPness of the JOES.
You haven't said much about your actual stack other than mentioning Azure.
With Windows Service Bus, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
You can use the stream id + the commit sequence number as the MessageId (and use that to ensure duplicates are removed by the bus). You will probably also include properties in the Message metadata.
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
If you're on Azure and considering Service Bus, then Topics can be used to ensure at-least-once delivery (and you'll use the sessioning facility). Go watch the two-hour ClemensV deep-dive Subscribe video, plus a few other episodes, or you'll spend the same amount of time making mistakes.
To keep broadcast traffic down, if ContextC requests replays from ContextA and ContextB, is there any way for these replay messages to be sent only to ContextC? Or should I not worry about this?
Mu. You started off asking whether this stuff was a good idea but now seem to have baked in an assumption that it's the way to go.
Firstly, this infrastructure is a massive wheel to reinvent. Have you considered simply setting up a topic per BC and having anyone that needs to listen listen?
A key thing here is to bear in mind that just because you can think of cases where BCs need to consume each other's events, it doesn't follow that a central magic bus that's everywhere should deliver everything everywhere.
EDIT: Answers to your edited versions of questions 2+
With Windows Service Bus 1.0, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
Your event store doesn't have a sequence number; it has a commit sequence number per aggregate. You'd typically use a sessioned topic and subscription. Then you need to choose whether you want a global ordering (use a single session id) or per-aggregate ordering (use the stream id as the session id).
Once events are on a topic, they have a MessageSequenceNumber, and the subscription (when sessioned) delivers them (actually, the subscriber receives them) in sequence.
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
This is built into the Service Bus (or any queueing mechanism). You don't mark the Message completed until it has been successfully processed. Any failure leads to Abandonment (which puts it back on the queue for reprocessing).
The subscriber taking a break, becoming disconnected or work backing up is naturally dealt with by the Topic.