How to use pub/sub pattern in Event Sourcing & CQRS - publish-subscribe

I am developing microservices using Event Sourcing with the CQRS pattern. In my case, if a user is deleted or updated in one service, I want that service to publish an event and another service to subscribe to it and delete the entries for that user from its own database as well.
I wanted to ask: how can I use the pub/sub pattern with Event Sourcing, and which event store can be used for it? I have seen some people use Azure Tables, but how can that be used for pub/sub?

Which Event store can be used for it ...?
If you have the luxury of choosing the technology to use, then I would suggest you start by looking into Greg Young's Event Store.
Yes, that's the same guy that introduced CQRS to the world.
(You may also want to review his talk on polyglot data, which includes discussion of pull vs push based models).

how can I use pub/sub pattern in Event Sourcing
This use case maps naturally onto event sourcing, and if you implement it properly, the question about notifications disappears by itself.
The best approach is to realize the interaction through a common bus. Each microservice that implements your aggregates or projections is connected to a single logical bus, subscribes to all events, and can also publish any events to it (a minimal sketch follows below).
Of course, if the system is under heavy load, some optimization becomes necessary, for example introducing namespaces for events and telling the bus's broker which events need to be delivered to which microservice. Also, if some information is private to a microservice, it makes sense to create a private channel on the bus; however, that is not covered by the theory of event sourcing, just like validation between aggregates.
Thanks to the common bus, you also get reactivity for the system's clients, for example browsers, "as a gift". However, you should not subscribe to projections or aggregate state, only to events. If server-side events are not the same as client-side events, you can introduce an intermediate component to translate and broadcast them, but that is no longer the responsibility of the event store.
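To make the idea concrete, here is a minimal sketch of the pattern in Java. The EventBus, UserDeleted, and the reporting-service handler are hypothetical names; in a real system the bus would be a broker (RabbitMQ, Kafka, ...) or the event store's own subscription mechanism rather than this in-memory stand-in.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for the "common bus"; in production this would be
// a broker or the event store's subscription API.
class EventBus {
    private final Map<Class<?>, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();

    public <T> void subscribe(Class<T> eventType, Consumer<T> handler) {
        subscribers.computeIfAbsent(eventType, t -> new CopyOnWriteArrayList<>())
                   .add(event -> handler.accept(eventType.cast(event)));
    }

    public void publish(Object event) {
        subscribers.getOrDefault(event.getClass(), List.of())
                   .forEach(handler -> handler.accept(event));
    }
}

// Event published by the user service when a user is removed.
record UserDeleted(String userId) {}

public class PubSubSketch {
    public static void main(String[] args) {
        EventBus bus = new EventBus();

        // The other service subscribes and cleans up its own data for that user.
        bus.subscribe(UserDeleted.class,
                event -> System.out.println("reporting-service: deleting rows for user " + event.userId()));

        // The user service publishes the event after persisting it to its event store.
        bus.publish(new UserDeleted("user-42"));
    }
}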

Related

Distributing events across different JVMs with Axon Server to Subscribing Event Processors (without Event Sourcing)

I'm using Axon Framework (4.1) with aggregates in one module (JVM, container) and projections/Sagas in another module. What I want to do is to have a distributed application taking advantage of CQRS but without Event Sourcing.
It is rather trivial to set up, and everything works as expected in a single application. The problem arises when there are several independent modules (across separate JVMs) involved. Out of the box, the Axon starter uses tracking processors connected to AxonServerEventStore, which allows for "location transparency" when it comes to listening to events across different JVMs.
In my case, I don't want any infrastructure for persisting or tracking the events. I just want to distribute the events to any subscribing processors (SEPs) from my aggregates in a fire-and-forget style, just like AxonServerQueryBus is doing to distribute scatter-gather queries, for example.
If I just declare all processors as subscribing as follows:
@Autowired
public void configureEventSubscribers(EventProcessingConfigurer configurer) {
    configurer.usingSubscribingEventProcessors();
}
events are reaching all @EventHandler methods in the same JVM, but events are no longer reaching any handlers in other JVMs. If my understanding is correct, Axon Server will distribute events across JVMs for tracking event processors (TEPs) only.
Obviously, what I can do is use an external message broker (RabbitMQ, Kafka) in combination with SpringAMQPMessageSource (as in the docs) to distribute events to all subscribers through something like a fanout exchange in RabbitMQ. This works, but it requires me to maintain the broker myself.
What would be nice is to have Axon Server taking care of this just like it takes care of distributing commands and queries (this would give me one less infrastructure piece to care about).
As a side note, I've actually managed to distribute the events to projections using the QueryBus and passing events as payloads to GenericQueryMessage sent as scatter-gather queries. Needless to say, this is not a robust solution. But it goes to demonstrate that there is nothing inherently impossible about Axon Server distributing events (just another type of message, after all) to SEPs or TEPs indifferently.
Finally, the questions:
1) What is the community's recommendation for pure CQRS (without Event Sourcing) using Axon when it comes to location transparency and distributing the events?
2) Is it possible to make Axon Server to distribute events to SEPs across JVMs (eliminating the need for an external message broker)?
Note on Event Sourcing
From Axon Framework's perspective, Event Sourcing is solely a concern of your Command Model. This stance is taken, as Event Sourcing defines the recreation of a model through the events it has published. A Query Model, however, does not react to commands by publishing events that change its state; it simply listens to (distributed) events to update its state so it can be queried by others.
As such, the framework only thinks about Event Sourcing when it recreates your Aggregates, by providing the EventSourcingRepository.
The Event Processor's job is to be the "mechanical aspect of providing events to your Event Handlers". This relates to the Q part in CQRS, to recreating the Query Model.
Thus, the Framework does not regard Event Processors to be part of the notion of Event Sourcing.
Answer to your scenario
I do want to emphasize that if you are distributing your application by running several instances of a given app, you will very likely need to have a way to ensure a given event is only handled once.
This is one of the concerns a Tracking Event Processor (TEP) addresses, and it does so by using a Tracking Token.
The Tracking Token essentially acts as a marker defining which events have been processed. Additionally, a given TEP's thread must claim a token to be able to work, which thus ensures a given event is not handled twice.
Concluding, you will need to define infrastructure to store Tracking Tokens to be able to distribute the event load, essentially opting against the use of the SubscribingEventProcessor entirely.
However, whether the above is an issue does depend on your application landscape.
Maybe you aren't duplicating a given application at all, thus effectively not duplicating a given Tracking Event Processor.
In this case, you can fulfill your request to "not track events", whilst still using Tracking Event Processors.
All you have to do is ensure you are not storing them. The interface used for storing tokens is the TokenStore, for which an in-memory version exists.
Using the InMemoryTokenStore in a default Axon set-up will, however, mean you'll technically be replaying your events every time. This occurs due to the default "initial Tracking Token" process. This is, of course, also configurable, for which I'd suggest the following approach:
// Creating the configuration for a TEP
TrackingEventProcessorConfiguration tepConfig =
        TrackingEventProcessorConfiguration
                .forSingleThreadedProcessing() // Note: could also be multi-threaded
                .andInitialTrackingToken(StreamableMessageSource::createHeadToken);

// Registering as the default TEP config on the EventProcessingConfigurer instance
configurer.registerTrackingEventProcessorConfiguration(config -> tepConfig);
This should set you up to use a TEP without the need to set up infrastructure to store Tokens. Note, however, that this requires you not to duplicate the given application.
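As a side note, the in-memory token store mentioned above can be registered on the same configurer. This is a minimal sketch, assuming configurer is the EventProcessingConfigurer from the snippet above; check the exact registration method against the Axon version you use.

// Registers an in-memory TokenStore, so no token-storage infrastructure is needed.
// Tokens are lost on restart, which is why the head-token configuration above matters.
configurer.registerTokenStore(config -> new InMemoryTokenStore());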
I'd like to end with the following question you've posted:
Is it possible to make Axon Server to distribute events to SEPs across JVMs (eliminating the need for an external message broker)?
As you have correctly noted, SEPs are (currently) only usable for subscribing to events which have been published within a given JVM. Axon Server does not (yet) have a mechanism to bridge events from one JVM to another for the purpose of allowing distributed Subscribing Event Processing. I am (as part of AxonIQ) however relatively sure we will look into this in the future. If such a feature is of importance to the successful conclusion of your project, I suggest contacting AxonIQ directly.
If you are considering Apache Kafka for this use case, you might want to look into kalium.alkal.io.
It will make your code much simpler:

MyObject myObject = ...;

// Producer side: sends POJOs/protobufs using the Kafka Producer API
kalium.post(myObject);

// Consumer side: uses a deserializer with the Kafka Consumer API
kalium.on(MyObject.class, received -> {
    // do something with the object
}, "consumer_group");

API vs Events in DDD across bounded contexts

When integrating across Bounded Contexts in DDD which of the following is considered better practice?
1) Publish events when an entity changes within a source BC, listen to those events in a consuming BC, shape that data into the entity required and store it within the consuming BC.
or
2) Make an API call synchronously to the BC that owns an entity when that information is required by another BC.
or is there another option that's considered better practice than the above?
If you are interested in autonomy, then you don't want to have services that require other services to be available.
So you should probably be thinking the other way around -- how the consumer works when the remote data provider is unavailable is your primary use case; then consider whether there are any enhancements to add when the data provider is live.
This typically means that each service caches a copy of the data it will need.
Having the consumers pull data that they need is commonly simpler than trying to push the data to them -- see Greg Young's talk on Polyglot Data.
I think the question shouldn't be «API vs events», but «sync vs async», and it isn't really a matter of best or worst practices. It depends on your requirements for how you can integrate your BCs; it depends on your domain.
You can implement async integration with an API instead of events, by calling the remote API periodically (polling), as in the sketch below.
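As a rough illustration of that polling approach, here is a minimal sketch in Java using Spring's scheduling support. The endpoint URL, the UserSnapshot type, and the in-memory cache are hypothetical placeholders for the consuming BC's own local copy of the data; assume @EnableScheduling is present on the application.

import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

// Hypothetical consumer-side poller: periodically pulls the data this BC depends on
// from the owning BC's API and caches it locally, so reads keep working even when
// the remote provider is unavailable.
@Component
public class UserDirectoryPoller {

    // Hypothetical local snapshot of the remote entity.
    public record UserSnapshot(String id, String name, boolean deleted) {}

    private final RestTemplate http = new RestTemplate();
    private volatile List<UserSnapshot> cache = List.of();

    @Scheduled(fixedDelay = 300_000) // poll every five minutes; tune to your domain
    public void refresh() {
        UserSnapshot[] users =
                http.getForObject("https://users.example.com/api/users", UserSnapshot[].class);
        if (users != null) {
            cache = List.of(users); // atomically replace the local copy
        }
    }

    public List<UserSnapshot> currentUsers() {
        return cache;
    }
}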

How should Event Sourcing event handlers be hosted to construct a read model?

There are various example applications and frameworks that implement a CQRS + Event Sourcing architecture and most describe use of an event handler to create a denormalized view from domain events stored in an event store.
One example of hosting this architecture is as a web api that accepts commands to the write side and supports querying the denormalized views. This web api is likely scaled out to many machines in a load balanced farm.
My question is where are the read model event handlers hosted?
Possible scenarios:
Hosted in a single windows service on a separate host.
If so, wouldn't that create a single point of failure? This probably complicates deployment too but it does guarantee a single thread of execution. Downside is that the read model could exhibit increased latency.
Hosted as part of the web api itself.
If I'm using EventStore, for example, for event storage and event subscription handling, will multiple handlers (one in each web farm process) be fired for every single event and thereby cause contention in the handlers as they try to read/write to their read store? Or are we guaranteed that, for a given aggregate instance, all its events will be processed one at a time in event-version order?
I'm leaning towards scenario 2 as it simplifies deployment and also supports process managers that also need to listen to events. It's the same situation though, as only one event handler should be handling a given event.
Can EventStore handle this scenario? How are others handling processing of events in eventually consistent architectures?
EDIT:
To clarify, I'm talking about the process of extracting event data into the denormalized tables rather than the reading of those tables for the "Q" in CQRS.
I guess what I'm looking for are options for how we "should" implement and deploy the event processing for read models/sagas/etc that can support redundancy and scale, assuming of course the processing of events is handled in an idempotent way.
I've read of two possible solutions for processing data saved as events in an event store but I don't understand which one should be used over another.
Event bus
An event bus/queue is used to publish messages after an event is saved, usually by the repository implementation. Interested parties (subscribers), such as read models, or sagas/process managers, use the bus/queue "in some way" to process it in an idempotent way.
If the queue is pub/sub, this implies that each downstream dependency (read model, sagas, etc.) can only support one process subscribing to the queue. More than one process would mean each processes the same event and then competes to make the changes downstream. Idempotent handling should take care of consistency/concurrency issues.
If the queue uses competing consumers, we at least have the possibility of hosting subscribers in each web farm node for redundancy. However, this requires a queue for each downstream dependency (one for sagas/process managers, one for each read model, etc.), so the repository would have to publish to each of them for eventual consistency.
Subscription/feed
A subscription/feed where interested parties (subscriber) read an event stream on demand and get events from a known checkpoint for processing into a read model.
This looks great for recreating read models if necessary. However, as per the usual pub/sub pattern, it would seem only one subscriber process per downstream dependency should be used. If we register multiple subscribers for the same event stream, one in each web farm node for example, they will all attempt to process and update the same respective read model.
In our project we use subscription-based projections. The reasons for this are:
Committing to the write side must be transactional, and if you use two pieces of infrastructure (event store and message bus) you have to start using DTC; otherwise you risk your events being saved to the store but not published to the bus, or the other way around, depending on your implementation. DTC and two-phase commits are nasty things and you do not want to go this way.
Events are usually published on the message bus anyway (we do it via subscriptions too) for event-driven communication between different bounded contexts. If you use message subscribers to update your read model, then when you decide to rebuild the read model, your other subscribers will get these messages too and this will bring the system to an invalid state. I think you had already thought about this when saying you must only have one subscriber for each published message type.
Message bus consumers have no guarantee of message order, and this can make a mess of your read model.
Message consumers usually handle retries by sending the message back to the queue, usually to the end of the queue, for retrying. This means that your events can become heavily out of order. In addition, after some number of retries the message consumer usually gives up on the poison message and puts it in some DLQ. If this were your projection, it would mean that one update is ignored whilst others are processed, leaving your read model in an inconsistent (invalid) state.
For these reasons, we have single-threaded subscription-based projections that can do whatever they need to. You can have different types of projections with their own checkpoints, subscribing to the event store using catch-up subscriptions. We host them in the same process as many other things for the sake of simplicity, but this process only runs on one machine. Should we want to scale out this process, we would have to take the subscriptions/projections out. That can easily be done since this part has virtually no dependencies on other modules, except the read model DTOs themselves, which can be shared as an assembly anyway.
By using subscriptions you always project events that have already been committed. If something goes wrong with the projections, the write side is definitely the source of truth and remains so; you just need to fix the projection and run it again.
We have two separate subscriptions - one for projecting to the read model and another for publishing events to the message bus. This construct has proven to work very well.
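As a rough sketch of what such a single-threaded, checkpoint-based catch-up projection can look like in Java: the EventStoreReader, CheckpointStore, and ReadModelUpdater interfaces are hypothetical stand-ins for the event store client, the checkpoint persistence, and the read model database, respectively.

import java.util.List;

// Hypothetical abstractions; a real implementation would wrap the event store client,
// a checkpoint table/document, and the read model database respectively.
interface EventStoreReader { List<RecordedEvent> readAfter(long checkpoint, int batchSize); }
interface CheckpointStore { long load(); void save(long checkpoint); }
interface ReadModelUpdater { void apply(RecordedEvent event); }

record RecordedEvent(long position, String type, byte[] data) {}

// Single-threaded catch-up projection: reads committed events from the last checkpoint,
// applies them to the read model in order, then advances the checkpoint.
public class CatchUpProjection implements Runnable {
    private final EventStoreReader store;
    private final CheckpointStore checkpoints;
    private final ReadModelUpdater readModel;

    public CatchUpProjection(EventStoreReader store, CheckpointStore checkpoints, ReadModelUpdater readModel) {
        this.store = store;
        this.checkpoints = checkpoints;
        this.readModel = readModel;
    }

    @Override
    public void run() {
        long checkpoint = checkpoints.load();
        while (!Thread.currentThread().isInterrupted()) {
            List<RecordedEvent> batch = store.readAfter(checkpoint, 100);
            for (RecordedEvent event : batch) {
                readModel.apply(event);   // idempotent update of the read model
                checkpoint = event.position();
            }
            checkpoints.save(checkpoint); // persist progress after each batch
            if (batch.isEmpty()) {
                try {
                    Thread.sleep(500);    // nothing new yet; poll again shortly
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }
}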
Specifically for EventStore, they now have competing consumers, which are server-based subscriptions where many clients can subscribe to a subscription group but only one client gets each message.
It sounds like that is what you are after: each node in the farm can subscribe to the subscription group, and the node that receives the message does the projection.

Handling multiple event dependency in event-driven architecture

What would be best practice if you have an event-driven architecture and a service subscribing to events has to wait for multiple events (of the same kind) before proceeding with creating the next event in the chain?
An example would be a book order handling service that has to wait for each book in the order to have been handled by the warehouse before creating the event that the order has been picked so that the shipping service (or something similar) picks up the order and starts preparing for shipping.
Another useful pattern besides the Aggregator that Tom mentioned above is the Saga pattern (a mini workflow).
I've used it before with a messaging library called NServiceBus to coordinate multiple messages that are correlated with each other.
The pattern is very useful and fits long-running processes nicely, even if your correlated messages are of different types, like OrderStarted, OrderLineProcessed, OrderCompleted.
You can use the Aggregator pattern, also called Parallel Convoy.
Essentially you need to have some way of identifying which messages need to be aggregated, and when the aggregated set as a whole has been received, so that processing can start (a rough sketch follows below).
Without going out and buying the book*, the Apache Camel integration platform website has some nice resources on implementing the aggregator pattern. While this is obviously specific to Camel, you can see what kind of things are involved.
* disclaimer, I am not affiliated in any way with Addison-Wesley, or any of the authors of the book...
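To illustrate the Aggregator idea on the book-order example, here is a minimal sketch in Java. The event types and the in-memory tracking map are hypothetical; a real implementation would persist the aggregation state (for example as a saga/process manager) and publish the resulting event to the bus.

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical events: one per book handled by the warehouse, plus the completion event.
record BookHandled(String orderId, String bookId) {}
record OrderPicked(String orderId) {}

// Aggregator: waits until every expected BookHandled event for an order has arrived,
// then emits OrderPicked. State is kept in memory here; persist it in production.
public class OrderPickingAggregator {

    private final Map<String, Set<String>> pendingBooksByOrder = new ConcurrentHashMap<>();

    // Called when the order is placed, so we know which book events to wait for.
    public void expect(String orderId, Set<String> bookIds) {
        Set<String> pending = ConcurrentHashMap.newKeySet();
        pending.addAll(bookIds);
        pendingBooksByOrder.put(orderId, pending);
    }

    // Called for every BookHandled event; returns the OrderPicked event once complete.
    public OrderPicked on(BookHandled event) {
        Set<String> pending = pendingBooksByOrder.get(event.orderId());
        if (pending == null) {
            return null; // unknown or already completed order
        }
        pending.remove(event.bookId());
        if (pending.isEmpty()) {
            pendingBooksByOrder.remove(event.orderId());
            return new OrderPicked(event.orderId()); // publish this to the bus
        }
        return null; // still waiting for the remaining books
    }
}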

XMPP: adding bidirectionality to pubsub?

I am not sure if pubsub or multiuserchat is the way to go?
What I think I need is pubsub, but with the added ability for subscribers to broadcast messages to the feed as well. Bidirectional information flow, if you will.
The use case is such that subscribers will be subscribed to, on average, 1000 different feeds, but each individual feed only broadcasts information about once per week. So, lots of feeds, but low activity in each one. However, because there are 1000 different active subscriptions, a subscriber might still be notified of 100 messages per day, and they should be able to "reply", i.e. post content, to any one of those feeds.
It seems like what I need is a pubsub/multiuserchat hybrid. But that doesn't exist, or does it? Any ideas or pointers?
Thanks a bunch!
If a subscriber is publishing data then they are not just a subscriber, they are a publisher. And there is no reason the same entity can't be a publisher and a subscriber at the same time.
As for your more general question about pubsub vs. MUC, that's a question that I find comes up a lot nowadays.
Obviously at first glance MUC and pubsub are very similar, they are both about broadcasting to a group. Many applications could easily use one or the other with no trouble.
To help decide which fits best with your applications, let's go through some of the differences between the two protocols.
MUC:
Is absolutely good for standard chatrooms of online users communicating with each other. If this is what you're doing, use it.
Includes presence, i.e. notifying other occupants about joining, leaving and changing status.
Allows for anonymous private communication between occupants.
Works out of the box with practically any standard XMPP client (for standard chat messages).
Automatic leaving of the room when the user goes offline or disconnects.
Messages with custom payloads are not the norm, meaning in practice you are limited to routing standard chat messages.
Pubsub:
One or a few publishers transmitting to many read-only subscribers is core pubsub territory. In contrast to MUC the subscribers are not publishing, and are not receiving information about other subscribers.
Server implementations tend to have much more flexible access control for pubsub.
Custom payloads only, no standard chat messages.
Optionally has full item persistence.
A node can be managed as a list of items (ie. add/remove with notification) rather than just simple broadcast.
Subscriptions can persist through being offline.
The points above are just a guide. A lot can typically be achieved through server configuration. As an example, the MUC specification allows rooms to withhold presence broadcasts from certain classes of occupants based on configuration. The catch here is in the implementations... since this is an uncommon usage of MUC, you will find it may not be supported in many MUC implementations. The point being that, as MUC was designed for chatting and not generic pubsub, you will largely find the implementations and tooling around MUC focus on that kind of usage.
Not sure what the problem is. The subscriber simply needs to be a publisher as well. There is nothing stopping them from publishing as well as subscribing (unless the nodes are configured to disallow it).
This appears to be a very typical pubsub case.