WSO2 Complex Event Processor - queue

I have a little problem with WSO2 CEP. I'm using version 3.1.0 and I have a JMS queue.
So, in my queue I have a few different type of events and in CEP I have the same number of different execution plans. My question is about, how can I distinguish incomming events, because now only one execution plan works fine and it get all type of events (so I get a lot of erros, because only one event works with this plan and other doesn't). Is this possible to do, what i'm doing?
Maybe someone had this problem before and could answer me.
Kacu

I am afraid whether your usecase can be achieved because in CEP each event builder is tightly coupled with event stream (event stream contains a strict format). Since event builder gets the events from a specific queue/topic it is not possible to handle different type of event formats.
I can propose two solutions here,
1) Without using a queue, use a topic then create event builders for each event type, but here you may need to write siddhi query to avoid duplicated events.
2) Or sending different event formats to different queue if possible..

Facing the same problem.
I'm trying to avoid the problem with a huge json-mapping (event builder) and filtering in siddhi query (event processor).
from <StreamName>(<eventType> contains 'eventTypeName')
Select <event>, <event> ...
Regards,
Artur

Related

Designing event-based architecture for the customer service

Being a developer with solid experience, i am only entering the world of microservices and event-driven architecture. Things like loose coupling, independent scalability and proper implementation of asynchronous business processes is something that i feel should get simplified as compared with traditional monolith approach. So giving it a try, making a simple PoC for myself.
I am considering making a simple application where user can register, login and change the customer details. However, i want to react on certain events asynchronously:
customer logs in - we send them an email, if the IP address used is new to the system.
customer changes their name, we send them an email notifying of the change.
The idea is to make a separate application that reacts on "CustomerLoggedIn", "CustomerChangeName" events.
Here i can think of three approaches, how to implement this simple functionality, with each of them having some drawbacks. So, when a customer submits their name change:
Store change name Changed name is stored in the DB + an event is sent to Kafkas when the DB transaction is completed. One of the big problems that arise here is that if a customer had 2 tabs open and almost simultaneously submits a change from initial name "Bob" to "Alice" in one tab and from "Bob" to "Jim" in another one, on a database level one of the updates overwrites the other, which is ok, however we cannot guarantee the order of the events to be the same. We can use some checks to ensure that DB update is only done when "the last version" has been seen, thus preventing the second update at all, so only one event will be emitted. But in general case, this pattern will not allow us to preserve the same order of events in the DB as in Kafka, unless we do DB change + Kafka event sending in one distributed transaction, which is anti-pattern afaik.
Change the name in the DB, and use Debezium or similar DB CDC to capture the event and stream it. Here we get a single event source, so ordering problem is solved, however what bothers me is that i lose the ability to enrich the events with business information. Another related drawback is that CDC will stream all the updates in the "customer" table regardless of the business meaning of the event. So, in this case, i will probably need to build a Kafka Streams application to convert the DB CDC events to business events and decouple the DB structure from event structure. The potential benefit of this approach is that i will be able to capture "direct" DB changes in the same manner as those originated in the application.
Emit event from the application, without storing it in the DB. One of the subscribers might to the DB persistence, another will do email sending, etc. The biggest problem i see here is - what do i return to the client? I cannot say "Ok, your name is changed", it's more like "Ok, you request has been recorded and will be processed". In case if the customer quickly hits refresh - he expects to see his new name, as we don't want to explain to the customers what's eventual consistency, do we? Also the order of processing the same event by "email sender" and "db updater" is not guaranteed, so i can send an email before the change is persisted.
I am looking for advices regarding any of these three approaches (and maybe some others i am missing), maybe the usecases when one can be preferrable over others?
It sounds to me like you want event sourcing. In event sourcing, all you need to store is the event: the current state of a customer is derived from replaying the events (either from the beginning of time, or since a snapshot: the snapshot is just an optional optimization). Some other process (there are a few ways to go about this) can then project the events to Kafka for consumption by interested parties. Since every event has a sequence number, you can use the sequence number to prevent concurrent modification (alternatively, the more actor modely event-sourcing implementations can use techniques like cluster sharding in Akka to achieve the same ends).
Doing this, you can have a "write-side" which processes the updates in a strongly consistent manner and can respond to queries which only involve a single customer having seen every update to that point (the consistency boundary basically makes customer in this case an aggregate in domain-driven-design terms). "Read-sides" consuming events are eventually consistent: the latencies are typically fairly short: in this case your services sending emails are read-sides (as would be a hypothetical panel showing names of all customers), but the customer's view of their own data could be served by the write-side.
(The separation into read-sides and write-side (the pluralization is significant) is Command Query Responsibility Segregation, which sometimes gets interpreted as "reads can only be served by a read-side". This is not totally accurate: for one thing a write-side's model needs to be read in order for the write-side to perform its task of validating commands and synchronizing updates, so nearly any CQRS-using project violates that interpretation. CQRS should instead be interpreted as "serve reads from the model that makes the most sense and avoid overcomplicating a model (including that model in the write-side) to support a new read".)
I think I qualify to answer this, having extensively used debezium for simplifying the architecture.
I would prefer Option 2:
Every transaction always results in an event emitted in correct order
Option 1/3 has a corner case, what if transaction succeeds, but application fails to emit the event?
To your point:
Another related drawback is that CDC will stream all the updates in
the "customer" table regardless of the business meaning of the event.
So, in this case, i will probably need to build a Kafka Streams
application to convert the DB CDC events to business events and
decouple the DB structure from event structure.
I really dont think that is a roadblock. The benefit you get is potentially other usecases may crop up where another consumer to this topic may want to read other columns of the table.
Option 1 and 3 are only going to tie this to your core application logic, and that is not doing any favor from simplifying PoV. With option 2, with zero code changes to core application APIs, a developer can independently work on the events, with no need to understand that core logic.

Axon or Kafka to support CQRS/ES

Consider the simple use case in which I want to store product ratings as events in an event store.
I could use two different approaches:
Using Axon: A Rating aggregate is responsible for handling the CreateRatingCommand and sending the RatingCreatedEvent. Sending the event would case the Rating to be stored in the event store. Other event handlers have the possibility to replay the event stream when connecting to the Axon server instance and doing whatever needed with the ratings. In this case, the event handler will be used as a stream processor.
Using Kafka: A KafkaProducer will be used to store a Rating POJO (after proper serialization) in a Kafka topic. Setting the topic's retention time to indefinite would cause no events to get lost in time. Kafka Streams would in this case be used to do the actual rating processing logic.
Some architectural questions appear to me for both approaches:
When using Axon:
Is there any added value to use Axon (or similar solutions) if there is no real state to be maintained or altered within the aggregate? The aggregate just serves as a "dumb" placeholder for the data, but does not provide any state changing logic.
How does Axon handle multiple event handlers of the same event type? Will they all handle the same event (same aggregate id) in parallel, or is the same event only handled once by one of the handlers?
Are events stored in the Axon event store kept until the end of time?
When using Kafka:
Kafka stores events/messages with the same key in the same partition. How does one select the best value for a key in the use case of user-product ratings? UserId, ProductId or a separate topic for both and publish each event in both topics.
Would it be wise to use a separate topic for each user and each product resulting in a massive amount of topics on the cluster? (Approximately <5k products and >10k users).
I don't know if SO is the preferred forum for this kind of questions... I was just wondering what you (would) recommend in this particular use case as the best practise. Looking forward to your feedback and feel free to point out other points of thought I missed in the previous questions.
EDIT#12/11/2020 : I just found a related discussion containing useful information related to my question.
As Jan Galinski already puts it, this hasn't got a fool proof answer to it really. This is worth a broader discussion on for example indeed AxonIQ's Discuss forum. Regardless, there are some questions in here I can definitely give an answer to, so let's get to it:
Axon Question 1 - Axon Framework is as you've noticed used a lot for DDD centric applications. Nothing however forces you to base yourself on that notion at all. You can strip the framework from Event Sourcing specifics, as well as modelling specifics entirely and purely go for the messaging idea of distinct commands, events and queries. It has been a conscious decision to segregate Axon Framework version 3 into these sub-part when version 4 (current) was released actually. Next to that, I think there is great value in not just basing yourself on event messages. Using distinct commands and queries only further decouples your components, making for a far richer and easier to extend application landscape.
Axon Question 2 - This depends on where the #EventHandler annotated methods are located actually. If they're in the same class only one will be invoked. If they're positioned into distinct classes, then both will receive the same event. Furthermore if they're segregated between distinct classes, it is important to note Axon uses an Event Processor as the technical solution to invoking your event handlers. If distinct classes are grouped under the same Event Processor, you can impose a certain ordering which handler is invoked first. Next to this if the event handling should occur in parallel, you will have to configure a so called TrackingEventProcessor (the default in Axon Framework), as it allows configuration of several threads to handle events concurrently. Well, to conclude this section, everything you're asking in question two is an option, neither a necessity. Just a matter of configuration really. Might be worth checking up on this documentation page of Axon Framework on the matter.
Axon Question 3 - As Axon Server serves the purpose of an Event Store, there is no retention period at all. So yes, they're by default kept until the end of time. There is nothing stopping your from dropping the events though, if you feel there's no value in storing the events to for example base all your models on (as you'd do when using Event Sourcing).
It's the Kafka question I'm personally less familiar with (figures as a contributor to Axon Framework I guess). I can give you my two cents on the matter here too though, although I'd recommend a second opinion here:
Kafka Question 1 - From my personal feeling of what such an application would require, I'd assume you'd want to be able to retrieve all data for a given product as efficient as possible. I'd wager it's important that all events are in the same partition to make this process as efficient as possible, is it wouldn't require any merging afterwards. With this in mind, I'd think using the ProductId will make most sense.
Kafka Question 2 - If you are anticipating only 5_000 products and 10_000 users, I'd guess it should be doable to have separate topics for these. Opinion incoming - It is here though were I personally feel that Kafka's intent to provide you direct power to decide on when to use topics over complicates from what you'd actually try to achieve, which business functionality. Giving the power to segregate streams feels more like an after thought from the perspective of application development. As soon as you'd require an enterprise grade/efficient message bus, that's when this option really shines I think, as then you can optimize for bulk.
Hoping all this helps you further #KDW!

Using aggregates and Domain events with nosql storage

I'm wandering on DDD and NoSql field actually. I have a doubt now: i need to produce events from the aggregate and i would like to use a NoSql storage. But how can i be sure that events are saved on the storage AND the changes on the aggregate root not having transactions?
Does it makes sense? Is there a way to do this without being forced to use event sourcing or a transactional db?
Actually i was lookin at implementing a 2 phase commit algorithm but it seems pretty heavy from a performance point of view...
Am i approaching the problem the wrong way?
Stuffed with questions...
Thanks for every suggestion
Enrico
PS
I'm a newbie on stackoverflow so any suggestion/critic/... is more than welcome
Enrico
Edit 1
Well i would need events to notify aggregates that something happened and i they should react to the change. The problem arise when such events are important for the business logic. As far as i understood, after a night of thinking, i can't use a nosql storage to do such things. Let me explain (thinking with loud voice :P):
With ES (1st scenery): I save the "diff" of the data. Then i produce an event associated with it. 2 operations.
With ES (2nd scenery): I save the "diff" of the data. A process, watch the ES and produce the event. But i'm tied to having only one watcher process to ensure the correct ordering of events.
With ES (3d scenery): Idempotent events. The events can be inferred by the state and every reapplication of the event can cause a change on the consumer only once, can have multiple "dequeue" processes, duplicates can't possibly happen. 1 operation, but it introduce heavy limitations on the consumers.
In general: I save the aggregate's data. Then i produce an event associated with it. 2 operations.
Now the question becomes wider imho, is it possible to work with domain events and nosql when such domain events are fundamental part of the business process?
I think that could be a better option to go relational... even if i would need to add quite a lot of machines to get the same performances.
Edit 2
For the sake of completness, searching for "domain events nosql idempotent" on google: http://svendvanderveken.wordpress.com/2011/08/26/transactional-event-based-nosql-storage/
If you need Event Sourcing, you should store events only.
This should be the sequence:
the aggregate root recieves a command
it fires proper events
events are stored
Each aggregate's re-hydratation should be done only by executing events over them. You can create aggregates' snapshots if you measure performance problems on their initialization, but this doesn't require two-phase commits, since you can build snapshots asynchronously via batch.
Note however that you need CQRS and/or Event Sourcing only if your application is heavily concurrent and you need to cope with partition tolerance and compensating actions.
edit
Event Sourcing is alternative to the persistence of object state. You either store the events or the state of the object model. You can save snapshot, but they're just performance tools: your application must be able to work without them. You can consider such snapshots as a caching technique. As an alternative you can persist object state (the classical model), but in that case you don't need to store events.
In my own DDD application, I use observable entities to decouple (via direct events' subscription from the repository) aggregates and their persistence. For example your repository can subscribe each domain events, and execute the actions required by the application (persist to the store, dispatch to a queue and so on...). But as a persistence technique, Event Sourcing is alternative to classical persistence of the observable object state. In most scenarios you don't need both.
edit 2
A final note: if you choose ES, one of the events subscriber can build a relational read-model too.

EventStore: learning how to use

I'm trying to learn EventStore, I like the concept but when I try to apply in practice I'm getting stuck in same point.
Let's see the code:
foreach (var k in stream.CommittedEvents)
{
//handling events
}
Two question about that:
When an app start ups after some maintenance, how do we bookmark in a
safe way what events start to read? Is there a pattern to use?
as soon the events are all consumed, the cycle ends... what about the message arriving run time? I would expect the call blocking until some new message arrive ( of course need to be handled in a thread ) or having something like BeginRead EndRead.
Do I have to bind an ESB to handle run time event or does the EventSore provides some facility to do this?
I try to better explain with an example
Suppose the aggregate is a financial portfolio, and the application is an application showing that portfolio to a trader. Suppose the trader connect to the web app and he looks at his own portfolio. The current state will be the whole history, so I have to read potentially a lot of records to reproduce the status. I guess this could be done by a so called snapshot, but who's responsible for creating it? When one should choose to create an aggregate? How can one guess a snapshot for an aggregate exists ?
For the runtime part: as soon the user look at the reconstructed portfolio state, the real time part begin to run. The user can place an order and a new position can be created by succesfully execute that order in the market. How is the portfolio updated by the infrastructure? I would expect, but maybe I'm completely wrong, having the same event stream being the source of that new event new long position, otherwise I have two path handling the state of the same aggregate. I would like to know if this is how the strategy is supposed to work, even if I feel a little tricky having the two state agents, that can possibly overlap.
Just to clarify how I fear the overlapping:
I know events has to be idempotent, so I know it must not be a
problem anyway,
But let's consider the following:
I subscribe an event bus before streaming the event to update the state of the portfolio. some "open position event" appears on the bus: I must handle them, but maybe the portfolio is not in the correct state to handle it since is not yet actualized. Even if I'm able to handle such events I will find them again when I read the stream.
More insidious: I open the stream and I read all events and I create a state. Then I subscribe to the bus: some message on the bus happen in the middle between the end of the steram reading and the beggining of the subscription: those events are missing and the aggregate is not in the correct state.
Please be patient all, my English is poor and the argument is tricky, hope I managed to share my doubt :)
The current state will be the whole history, so I have to read
potentially a lot of records to reproduce the status. I guess this
could be done by a so called snapshot, but who's responsible for
creating it?
In CQRS and event sourcing, queries are served by projections which are generated from events emitted by aggregates. You don't use the aggregate instance as reconstituted from the event store to display information.
The term snapshot refers specifically to an optimization of the event store which allows rebuilding the aggregate without replaying all of the events.
Projections are essentially event handlers which maintain a denormalized view of aggregates. Events emitted from aggregates are published, possibly out of band, and the projection subscribes to and handles those events. A projection can combine multiple aggregates if a requirement exists to display summary information, for instance. In case of a trading application, each view will typically contain data from various aggregates. Projections are designed in a consumer-driven way - application requirements determine the different views of the underlying data that are needed.
With this type of workflow you have to embrace eventual consistency throughout your application. For instance, if an end user is viewing their portfolio and initiating new trades, the UI has to subscribe to updates to reflect updated projections in an asynchronous manner.
Take a look at here for an overview of CQRS and event sourcing.

CQRS + Event Sourcing: (is it correct that) Commands are generally communicated point-to-point, while Domain Events are communicated through pub/sub?

Didn't know how to shorten that title.
I'm basically trying to wrap my head around the concept of CQRS (http://en.wikipedia.org/wiki/Command-query_separation) and related concepts.
Although CQRS doesn't necessarily incorporate Messaging and Event Sourcing it seems to be a good combination (as can be seen with a lot of examples / blogposts combining these concepts )
Given a use-case for a state change for something (say to update a Question on SO), would you consider the following flow to be correct (as in best practice) ?
The system issues an aggregate UpdateQuestionCommand which might be separated into a couple of smaller commands: UpdateQuestion which is targeted at the Question Aggregate Root, and UpdateUserAction(to count points, etc) targeted at the User Aggregate Root. These are send asynchronously using point-to-point messaging.
The aggregate roots do their thing and if all goes well fire events QuestionUpdated and UserActionUpdated respectively, which contain state that is outsourced to an Event Store.. to be persisted yadayada, just to be complete, not really the point here.
These events are also put on a pub/sub queue for broadcasting. Any subscriber (among which likely one or multiple Projectors which create the Read Views) are free to subscribe to these events.
The general question: Is it indeed best practice, that Commands are communicated Point-to-Point (i.e: The receiver is known) whereas events are broadcasted (I.e: the receiver(s) are unknown) ?
Assuming the above, what would be the advantage/ disadvantage of allowing Commands to be broadcasted through pub/sub instead of point-to-point?
For example: When broadcasting Commands while using Saga's (http://blog.jonathanoliver.com/2010/09/cqrs-sagas-with-event-sourcing-part-i-of-ii/) could be a problem, since the mediation role a Saga needs to play in case of failure of one of the aggregate roots is hindered, because the saga doesn't know which aggregate roots participate to begin with.
On the other hand, I see advantages (flexibility) when broadcasting commands would be allowed.
Any help in clearing my head is highly appreciated.
Yes, for Command or Query there is only one and exactly one receiver (thus you can still load balance), but for Events there could be zero or more receivers (subscribers)