Real-time streaming CEP system with delayed reaction [closed] - apache-kafka

I need help with an architecture issue.
I am developing a CEP system based on Kafka, in Java.
The CEP system should have the following characteristics:
distributed (cluster)
scalable
fault-tolerant
The CEP system should perform the following actions:
create events from different sources, which are actually multi-partitioned Kafka topics (ETL part)
analyze sequences of those events and, if they match special patterns (scenarios), put a reaction record into some store (analyze part)
every X period, query this store and communicate with a client if it is time (schedule part)
If a cancel event appears during that X period, I remove the reaction record from the store.
I built that system using the Kafka Streams library, but the resulting architecture is not so good.
Kafka Streams uses RocksDB as a backend to store state. There are many problems with managing the stores in cluster mode and keeping the data consistent. Also, I can't run SQL queries against them, so to check whether the time for a reaction has come I have to iterate over every record in the store.
I'm not an architect and I'm the only one working on this task. I was advised to look at Kafka Streams and Flink to build a CEP program. But do these technologies really fit?
There is no question about the ETL part.
But how can I build the analyze part and (more interestingly) the query part? What tools can I use?
I'm grateful for any help and advice.
[UPD]
About queries and stores:
We need to check whether it is time to send a communication. If it is, we communicate with the person: push message, email or any other channel.
select
...where event_time + wait_time < now
After that we need to update that record in the store to the next message of the scenario (and repeat this algorithm until the person reaches the last message of the scenario or performs the cancel action).
Sequence of scenario A:
ev A -> ev B -> ev C -> ev D -----> start scenario -----> ev E or msg c was sent -----> cancel scenario
Messages for scenario A:
msg a (send after wait_time: 10 minutes)
msg b (send after wait_time: 1 day)
msg c (send after wait_time: 7 days) - last
update
... where user_id = xxx and scenario_id = A
If the action in the previous step was performed, we also need to update the userStore (it contains some information about users, including special counters; these help us avoid spamming the client or sending messages at night).
update
... where user_id = xxx
I wrote an engine for CEP with some rules, which I save in a special store - scenarioStore.
Thus, there are several stores:
initialStore (keeps the last event of a scenario sequence together with message parameters, waiting for the time to send) - ev D
scenarioStore (sequences of events by scenarios) - CEP rules
messageStore (texts and other properties of messages) - msg rules
userStore (information about users)
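To make the schedule part concrete, here is a rough sketch in Java of what should happen on every X-period tick; all names (ReactionRecord, its fields, the send method) are my own illustration, not an existing API:

import java.time.Duration;
import java.time.Instant;

public class ScheduleStep {

    // Illustrative shape of one entry of initialStore (hypothetical names).
    record ReactionRecord(String userId, Instant eventTime, Duration waitTime, String currentMessage) {}

    // Called every X period for each record in initialStore.
    void process(ReactionRecord r, Instant now) {
        // the "event_time + wait_time < now" check from the query above
        if (r.eventTime().plus(r.waitTime()).isBefore(now)) {
            send(r.userId(), r.currentMessage());  // push message, email or another channel
            // then: update the record to the next message of the scenario (msg a -> msg b -> msg c)
            // and update the userStore counters so the client is not spammed or messaged at night
        }
    }

    private void send(String userId, String message) {
        System.out.println("send " + message + " to user " + userId);
    }
}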

You can definitely do complex event processing (CEP) with Kafka Streams. There are even open-source libraries for that, such as kafkastreams-cep.
The Kafka Streams framework supports interactive queries, which let you query your state stores to retrieve the required data. You can add a REST layer on top to make them queryable via a REST API. See the WordCountInteractiveQueriesExample for a code example.
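As a minimal sketch of the interactive-queries part (the store name "reaction-store" and the Long value type are assumptions for illustration; the REST layer from the linked example would sit on top of this):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class ReactionStoreQuery {

    // 'streams' is an already running KafkaStreams instance hosting "reaction-store".
    public static void printDueReactions(KafkaStreams streams, long nowMillis) {
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType("reaction-store",
                        QueryableStoreTypes.keyValueStore()));

        // Iterate the local store and report entries whose reaction time has passed.
        try (KeyValueIterator<String, Long> it = store.all()) {
            while (it.hasNext()) {
                KeyValue<String, Long> entry = it.next();
                if (entry.value != null && entry.value <= nowMillis) {
                    System.out.println("reaction due for key " + entry.key);
                }
            }
        }
    }
}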

Related

Tracking an expected set of Kafka events

Say I have N cities and each will report its temperature for the hour (H) by producing Kafka events. I have a complex model I want to run, but I want to ensure it doesn't attempt to kick off before all N are read.
Say they are being produced in batches; I understand that to ensure at-least-once consumption, if a consumer fails mid-batch it will pick up again at the front of the batch. I have built this into my model by counting unique cities (and if a city is sent multiple times it will overwrite existing records).
My current plan is to set it up as follows:
An application creates an initial event which says "Expect these N cities to report for H o'clock".
The events are persisted (in db, Redis, etc) by another application. After writing, it produces an event which states how many unique cities have been reported in total so far for H.
Some process matches the initial "Expect N" events with "N Written" events. It alerts the rest of the system that the data set for H is ready for creating the model when they are equal.
Does this problem have a name and are there common patterns or libraries available to manage it?
Does the solution as outlined have glaring holes or overcomplicate the issue?
What you're describing sounds like an Aggregator, described in Gregor Hohpe and Bobby Woolf's "Enterprise Integration Patterns" as:
a special Filter that receives a stream of messages and identifies messages that are correlated. Once a complete set of messages has been received [...], the Aggregator collects information from each correlated message and publishes a single, aggregated message to the output channel for further processing.
This could be done on top of Kafka Streams, using its built-in aggregation, or with a stateful service like you suggested.
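For instance, here is a hedged sketch of how the counting might look with Kafka Streams' built-in aggregation (topic names, the key layout and the use of default String serdes are assumptions, not part of the question):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class HourlyReadiness {

    public static void build(StreamsBuilder builder, int expectedCities) {
        // input records: key = hour H, value = city name
        KStream<String, String> reports = builder.stream("city-reports");

        KTable<String, Long> uniqueCitiesPerHour = reports
                // one record per (hour, city); repeated reports from a city overwrite each other
                .map((hour, city) -> KeyValue.pair(hour + "|" + city, city))
                .toTable()
                // regroup by hour and count the distinct cities seen so far
                .groupBy((hourCity, city) -> KeyValue.pair(hourCity.split("\\|")[0], city))
                .count();

        // emit a "ready" signal once all N cities have reported for that hour
        uniqueCitiesPerHour.toStream()
                .filter((hour, count) -> count != null && count == expectedCities)
                .mapValues(count -> "ready")
                .to("hours-ready", Produced.with(Serdes.String(), Serdes.String()));
    }
}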
One other suggestion -- designing processes like this with event-driven choreography can be tricky. I have seen strong engineering teams fail to deliver similar solutions due to diving into the deep end without first learning to swim. If your scale demands it and your organization is already primed for event-driven distributed architecture, then go for it, but if not, consider an orchestration-based alternative (for example, AWS Step Functions, Airflow, or another workflow orchestration tool). These are much easier to reason about and debug.

What's the max of topics I can have on ZeroMQ?

I'm new to ZeroMQ (I've been using SQS so far).
I would like to build a system where every time a user logs in, they subscribe to a queue. All the users subscribed to this queue are interested only in messages directed to them.
I read about topic matching. It seems that I could create a pattern like this:
development.player.234345345
development.player.453423423
integration.player.345354664
And each worker (user) can subscribe to the queue and listen only to the topic it matches, i.e. player 234345345 in the development environment will only subscribe to messages with the topic development.player.234345345.
Is this true?
And if so, what are the consequences in ZeroMQ?
Is there a limit on how many topics I can match against?
ZeroMQ has a very detailed page on how the internals of topic matching work. It looks like you can have as many topics as you want, but topic matching incurs a runtime cost. It's supposed to be extremely fast:
We believe that application of the above algorithms can give a system that will be able to match or filter a single message in the range of nanoseconds or a couple of microseconds even in the case of a large amount of different topics and subscriptions.
However, there are some caveats you need to be aware of:
The inverted bitmap technique thus works by pre-indexing a set of
searchable items so that a search request can be resolved with a
minimal number of operations.
It is efficient if and only if the set of searchable items is
relatively stable with respect to the number of search requests.
Otherwise the cost of re-indexing is excessive.
In short, as long as you don't change your subscriptions too often, you should be able to do on the order of thousands of topics at least.
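For reference, here is a minimal prefix-subscription sketch using the JeroMQ Java binding (the endpoint, the player id, and the topic/payload two-frame layout are assumptions for illustration):

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class PlayerSubscriber {
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket sub = ctx.createSocket(SocketType.SUB);
            sub.connect("tcp://broker:5556");

            // Subscribe only to this player's topic; SUB-side matching is a simple prefix match.
            sub.subscribe("development.player.234345345".getBytes(ZMQ.CHARSET));

            while (!Thread.currentThread().isInterrupted()) {
                String topic = sub.recvStr();    // first frame: topic
                String payload = sub.recvStr();  // second frame: message body
                System.out.println(topic + " -> " + payload);
            }
        }
    }
}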
A: Yes, you can
The Max. Number? A harder part...
You may want to read Martin SUSTRIK's post on this:
While ZeroMQ evolves on its own, Martin, a ZeroMQ co-father, has posted a few interesting facts on this subject here, with some further details and a design-view discussion continued here:
Efficient Subscription Matching
In ZeroMQ, simple tries are used to store and match PUB/SUB subscriptions. The subscription mechanism was intended for up to 10,000 subscriptions where simple trie works well. However, there are users who use as much as 150,000,000 subscriptions. In such cases there's a need for a more efficient data structure.
Worth reading to get some estimate of where the safe zones are.
Also worth knowing: not all ZeroMQ versions behave the same way.
Recent API versions use PUB-side topic filtering, which was not automatic in all previous versions, where SUB-side filtering was used. Translate that into network transport: all messages, irrespective of their final destination, are broadcast to all SUBs, only for a single one (a user, in your use case) to match while all the rest discard the messages due to topic-filter mismatches.
Thus all your use cases ought to take into account which ZeroMQ versions (incl. different non-native language bindings and wrappers) may meet and cooperate on the same playground.
Anyway, ZeroMQ is a great tool, and nanomsg has in recent years also become worth monitoring and evaluating.

Implementing sagas with Kafka

I am using Kafka for Event Sourcing and I am interested in implementing sagas using Kafka.
Any best practices on how to do this? The Commander pattern mentioned here seems close to the architecture I am trying to build but sagas are not mentioned anywhere in the presentation.
This talk from this year's DDD eXchange is the best resource I came across regarding the Process Manager/Saga pattern in event-driven/CQRS systems:
https://skillsmatter.com/skillscasts/9853-long-running-processes-in-ddd
(requires registering for a free account to view)
The demo shown there lives on github: https://github.com/flowing/flowing-retail
I've given it a spin and I quite like it. I do recommend watching the video first to set the stage.
Although the approach shown is message-bus agnostic, the demo uses Kafka for the Process Manager to send commands to and listen to events from other bounded contexts. It does not use Kafka Streams but I don't see why it couldn't be plugged into a Kafka Streams topology and become part of the broader architecture like the one depicted in the Commander presentation you referenced.
I hope to investigate this further for our own needs, so please feel free to start a thread on the Kafka users mailing list, that's a good place to collaborate on such patterns.
Hope that helps :-)
I would like to add something here about sagas and Kafka.
In general
In general, Kafka is a tad different from a normal queue. It's especially good at scaling. And this can actually cause some complications.
One of the means Kafka uses to accomplish scaling is partitioning of the data stream. Data is placed in partitions, each of which can be consumed at its own rate, independently of the other partitions of the same topic. Here is some info on it: how-choose-number-topics-partitions-kafka-cluster. I'll come back to why this is important.
The most common ways to ensure the order within Kafka are:
Use 1 partition for the topic
Use a partition message key to "assign" the message to a partition
In both scenarios your chronologically dependent messages need to stream through the same topic (and end up in the same partition).
Also, as #pranjal thakur points out, make sure the delivery method is set to "exactly once", which has a performance impact but ensures you will not receive the messages multiple times.
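A minimal producer sketch of the key-based option (topic name, saga id and serializer choices are assumptions for illustration): using the saga/correlation id as the record key sends every message of one saga to the same partition and preserves its order.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SagaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // idempotent producer, part of the "exactly once" story mentioned above
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String sagaId = "booking-42";  // same key => same partition => ordered
            producer.send(new ProducerRecord<>("saga-events", sagaId, "RoomReserved"));
            producer.send(new ProducerRecord<>("saga-events", sagaId, "RoomPaid"));
        }
    }
}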
The caveat
Now, here's the caveat: when changing the number of partitions, the message distribution over the partitions (when using a key) will change as well.
Under normal conditions this can be handled easily. But in a high-traffic situation, the migration toward a different number of partitions can result in a period during which a saga "flow" is handled over multiple partitions and the order is not guaranteed at that point.
It's up to you whether this will be an issue in your scenario.
Here are some questions you can ask to determine if this applies to your system:
What will happen if you somehow need to migrate/copy data to a new system, using Kafka? (high-traffic scenario)
Can you send your data to 1 topic?
What will happen after a temporary outage of your saga service? (low-availability scenario / high-traffic scenario)
What will happen when you need to replay a bunch of messages? (high-traffic scenario)
What will happen if we need to increase the partitions? (high-traffic scenario / outage & recovery scenario)
The alternative
If you're thinking of setting up a saga based on steps, like a state machine, I would challenge you to rethink your design a bit.
I'll give an example:
Let's consider a booking-a-hotel-room process:
Simplified, it might consist of the following steps:
Handle room reserved (incoming event)
Handle room paid (incoming event)
Send acknowledgement of the booking (after payment and some processing)
Now, if your saga is not able to handle the payment if the reservation hasn't come in yet, then you are relying on the order of events.
In this case you should ask yourself: when will this break?
If you conclude you want to avoid the chronological dependency, consider a system without a saga, or a saga which does not depend on the order of events - i.e. one accepting all messages, even when it's not their turn yet in the process (see the sketch after the examples below).
Some examples:
aggregators
Modeled as a business process: parallel gateways (parallel process flows)
Do note that in such a setup it is even more crucial that every action has an implemented compensating action (rollback action).
I know this is often hard to accomplish; but if you start small, you might start to like it :-)
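As a tiny illustration of an order-independent saga for the hotel example above (class and method names are mine, not from any framework): both events are accepted whenever they arrive, and the acknowledgement goes out once both have been seen.

public class BookingSaga {
    private boolean roomReserved;
    private boolean roomPaid;

    public void onRoomReserved() {
        roomReserved = true;
        maybeAcknowledge();
    }

    public void onRoomPaid() {
        roomPaid = true;       // accepted even if the reservation event is still on its way
        maybeAcknowledge();
    }

    private void maybeAcknowledge() {
        if (roomReserved && roomPaid) {
            sendAcknowledgement();
        }
    }

    private void sendAcknowledgement() {
        System.out.println("Booking confirmed");
    }
}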

Design Storm topology to process and persist usage metrics for a web page [closed]

We are working on a web application that has a feature to generate metrics based on how users use the app. We are exploring using Storm to process the user events and generate the metrics.
The high-level approach we are planning:
On the client side (browser), a JavaScript component captures user events and posts them to the server, and the event message is posted to RabbitMQ.
A Storm spout consumes messages from RabbitMQ.
A Storm bolt processes the message and computes metrics.
Finally, metrics get saved to MongoDB.
Question :
The bolt has to accumulate event metrics before saving to MongoDB for two reasons: we need to avoid I/O load on MongoDB, and the metrics logic depends on multiple events. So we need intermittent persistence for the bolt, without impacting performance.
How can we add temporary persistence within the Storm topology while we calculate statistics on the data pulled from RabbitMQ, and then save the metrics to permanent persistence (MongoDB) only at some interval or on some other logical trigger?
Please clarify if I don't fully answer your question, but the general gist of your query seems to echo the theme: how can we persist within our Storm topology while we calculate statistics on the data pulled from RabbitMQ?
Luckily for you, Storm has already considered this question and developed Storm Trident, which performs real-time aggregation on incoming tuples and allows the topology to persist the aggregated state for DRPC queries and for situations requiring high availability and persistence.
For example, in your particular scenario, you would have this kind of TridentTopology:
TridentTopology topology = new TridentTopology();
TridentState metricsState = topology
        .newStream("rabbitmq-spout", new RabbitMQConsumer())                 // spout emitting a "rawData" field
        .each(new Fields("rawData"), new ComputeMetricsFunction(), new Fields("output"))
        .groupBy(new Fields("output"))
        .persistentAggregate(new MemoryMapState.Factory(), new AggregatorOfYourChoice(), new Fields("aggregationResult"));
Note: the code isn't 100% accurate but should be considered more as pseudo-code. See Nathan's word count example for code specific implementation (https://github.com/nathanmarz/storm/wiki/Trident-tutorial).

Message bus integration and resync of Bounded Contexts after downtime - Service Bus 1.0

I have just downloaded the JOliver EventStore and am looking to wire up a service bus with Windows Service Bus 1.0 for an application separated across more than one bounded-context process.
If a bounded context has been offline whilst events in other bounded contexts have been created (or it may even be a new context that has just been deployed), I can see the following sequence of events.
For example, with ContextA, ContextB and ContextC all connected using Service Bus 1.0, and each context with its own event store, they all share the same bus messaging backplane.
ContextC goes offline.
When ContextC comes back up, the other bounded contexts need to be notified of the events that need to be resent to the context that has just come back online. These events are replayed from each of the event stores.
My questions are:
The above scenario would apply to any event sourcing libraries, so is there any infrastructure code on top of this I can use, or do I have to roll my own?
With Windows Service Bus 1.0, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
The above scenario would apply to any event sourcing libraries, so is there any infrastructure code on top of this I can use, or do I have to roll my own?
The notion of a Projection mechanism tied to the events is certainly common. Unfortunately, there are many many ways of handling how that might be done, depending on your stack, performance requirements and scale and many other factors.
As a result I'm not aware of a commoditized facility of this nature.
The GetEventStore store has an integrated Projection facility which looks extremely powerful and takes the need to build all this off the table. Before its existence, I'd have argued that one shouldn't even consider looking past the SRPness of the JOES.
You haven't said much about your actual stack other than mentioning Azure.
With Windows Service Bus, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
You can use stream id + the commit sequence number as the MessageId (and use that to ensure duplicates are removed by the bus). You will probably also want to include properties in the Message metadata.
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
If you're on Azure and considering Service Bus, then Topics can be used to ensure at-least-once delivery (and you'll use the sessioning facility). Go watch the two-hour deep-dive ClemensV Subscribe video, plus a few other episodes, or you'll spend the same amount of time making mistakes.
To keep broadcast traffic down, if ContextC requests replays from ContextA and ContextB, is there any way for these replay messages to be sent only to ContextC? Or should I not worry about this?
Mu. You started off asking whether this stuff was a good idea, but now seem to have baked in the assumption that it's the way to go.
Firstly, this infrastructure is a massive wheel to reinvent. Have you considered simply setting up a topic per BC and having anyone that needs to listen, listen?
A key thing to bear in mind is that, just because you can think of cases where BCs need to consume each other's events, it does not follow that a central magic bus that's everywhere should deliver everything everywhere.
EDIT: Answers to your edited versions of questions 2+
With Windows Service Bus 1.0, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
Your event store doesn't have a single sequence number; it has a commit sequence number per aggregate. You'd typically use a sessioned topic and subscription. Then you need to choose whether you want global ordering (use a single session id) or per-aggregate ordering (use the stream id as the session id).
Once events are on a topic, they have a MessageSequenceNumber, and the subscription (when sessioned) delivers them (actually, the subscriber receives them) in sequence.
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
This is built into Service Bus (or any queueing mechanism). You don't mark the message completed until it has been successfully processed. Any failure leads to abandonment (which puts it back on the queue for reprocessing).
A subscriber taking a break, becoming disconnected, or work backing up is naturally dealt with by the Topic.
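The original question targets Windows Service Bus 1.0 and its .NET API, but as an illustration of the same complete/abandon pattern, here is a hedged sketch using the modern Azure Service Bus Java SDK (connection string, topic and subscription names are placeholders, not from the question):

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusReceivedMessage;
import com.azure.messaging.servicebus.ServiceBusReceiverClient;

public class CompleteOrAbandon {
    public static void main(String[] args) {
        ServiceBusReceiverClient receiver = new ServiceBusClientBuilder()
                .connectionString("<connection-string>")
                .receiver()                       // peek-lock receive mode by default
                .topicName("contextA-events")
                .subscriptionName("contextC")
                .buildClient();

        for (ServiceBusReceivedMessage message : receiver.receiveMessages(10)) {
            try {
                handle(message);                  // your (idempotent) handler
                receiver.complete(message);       // only mark done after successful processing
            } catch (Exception e) {
                receiver.abandon(message);        // back on the subscription for reprocessing
            }
        }
        receiver.close();
    }

    private static void handle(ServiceBusReceivedMessage message) {
        System.out.println("handling " + message.getMessageId());
    }
}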