Can a Kafka Streams state store be used for debouncing events? - apache-kafka

I have a particular use case wherein I want to wait for a few seconds to capture similar messages and send the latest one.
I'm going through this book - Kafka Streams in Action by William P. Bejeck Jr.
The book shows that we can keep state in a local state store. Can I use this state store to hold a message and wait for a 5-second window to see similar messages, sending only the latest one?
If yes, will other messages (ones that are not similar to the current one) have to wait until the current message's 5-second window is over? That is, will the processing of other messages coming into the stream be stopped? A sketch of one approach follows.
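One possible shape for this, sketched with the Processor API against a recent Kafka Streams release (the store name "debounce-store" and the String key/value types are assumptions for illustration): keep only the latest value per key in the store, and flush the store from a wall-clock punctuator every 5 seconds. Note that process() keeps running for every incoming record while the punctuator fires on its own schedule, so records for other keys are never blocked.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class DebounceProcessor extends AbstractProcessor<String, String> {

    private KeyValueStore<String, String> latest;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        super.init(context);
        // "debounce-store" is a hypothetical store name; the store must be added
        // to the topology and connected to this processor.
        latest = (KeyValueStore<String, String>) context.getStateStore("debounce-store");

        // Every 5 seconds of wall-clock time, emit the latest value seen per key
        // and clear the store. This runs between records; it does not block them.
        context.schedule(Duration.ofSeconds(5), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            List<KeyValue<String, String>> batch = new ArrayList<>();
            try (KeyValueIterator<String, String> it = latest.all()) {
                while (it.hasNext()) {
                    batch.add(it.next());
                }
            }
            for (KeyValue<String, String> entry : batch) {
                context().forward(entry.key, entry.value);
                latest.delete(entry.key);
            }
        });
    }

    @Override
    public void process(String key, String value) {
        // Overwrite any previous value: only the most recent message per key
        // survives until the next punctuation.
        latest.put(key, value);
    }
}
```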

Related

Kafka Streams: Send one message out of a set of identical messages in a given period of time

I'm currently working with a system that sends many messages to a broker. Some of the messages are identical in content. I need to find a way to prevent these duplicate messages from being consumed, and instead consume only one of them within a given period of time.
After investigating, I learnt that Kafka Streams is capable of handling such a use case, particularly using the tumbling-window approach. I'm fairly new to Kafka Streams, however, so I was wondering whether there is a way to identify identical messages and filter them so that only one of them is published within that window. Any help/advice is appreciated.
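A sketch of how this could look in the Kafka Streams DSL against a recent release (the topic names and String serdes are assumptions): group by the message content so identical payloads land in the same 5-second tumbling window, keep one representative per window, and use suppression so exactly one record is emitted when the window closes. One caveat: suppress() is driven by event time, so the final result for a window is only emitted once newer records advance stream time.

```java
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class DedupTopology {

    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
                // Key by the message content so identical payloads collide in one window.
                .groupBy((key, value) -> value, Grouped.with(Serdes.String(), Serdes.String()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofSeconds(5)))
                // Keep a single representative of the identical messages.
                .reduce((first, second) -> second)
                // Emit exactly one record per window, when the window closes.
                .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                .toStream()
                .map((windowedKey, value) -> KeyValue.pair(windowedKey.key(), value))
                .to("deduped-topic", Produced.with(Serdes.String(), Serdes.String()));
        return builder;
    }
}
```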

How can I ensure a live stream is stopped after the broadcast is done?

Customers... Have to love them :)
I built out a web process that starts a live stream in Azure Media Services, but in testing I've seen a couple of times where the end user just closes the browser instead of clicking the end-broadcast button I've so nicely set up for them.
The problem then is obvious: the stream keeps on running. Multiply this a few times and I've got numerous live streams broadcasting nothing, but I'm still incurring costs.
Is there anything in the configuration in the portal (or even in the stream configuration: client.LiveEvents.CreateAsync(....)) that can stop these services even if the user closes the browser?
A few ways to approach this.
Your web application should prompt the user to confirm ending the broadcast when they close the browser. This is a browser event that your web application can handle.
From the server side, you can monitor live events by subscribing to Event Grid events. There are two ways to do this as well; see the documentation on the Event Grid event schema to learn more about them.
You can subscribe to the stream-level "Microsoft.Media.LiveEventEncoderDisconnected" event and, if no reconnection comes in for a while, stop and delete your live event (a sketch of this approach follows below).
Or you can subscribe to the track-level heartbeat events. If the incoming bitrate of all tracks drops to 0, or the last timestamp is no longer increasing, you can also safely shut down the live event. The heartbeat events arrive every 20 seconds for every track, so they can be a little verbose.
To learn more about how to subscribe to Event Grid events, you can read the documentation here.
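As an illustration of the first server-side option, here is one possible watchdog shape in Java. The grace period, the webhook wiring, and the stopAndDeleteLiveEvent helper are all assumptions; the actual stop/delete calls would go through your Azure Media Services client. It also assumes the corresponding "Microsoft.Media.LiveEventEncoderConnected" event type from the same Event Grid schema for detecting reconnects.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class LiveEventWatchdog {

    private static final String DISCONNECTED = "Microsoft.Media.LiveEventEncoderDisconnected";
    private static final String CONNECTED = "Microsoft.Media.LiveEventEncoderConnected";

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> pendingShutdowns = new ConcurrentHashMap<>();

    // Call this from your Event Grid webhook endpoint after deserializing the
    // event envelope; eventType and liveEventName come from the event payload.
    public void onEvent(String eventType, String liveEventName) {
        if (DISCONNECTED.equals(eventType)) {
            // Give the encoder a grace period (5 minutes here, an arbitrary
            // choice) to reconnect before tearing the live event down.
            pendingShutdowns.put(liveEventName, scheduler.schedule(
                    () -> stopAndDeleteLiveEvent(liveEventName), 5, TimeUnit.MINUTES));
        } else if (CONNECTED.equals(eventType)) {
            // The encoder came back: cancel any pending shutdown.
            ScheduledFuture<?> pending = pendingShutdowns.remove(liveEventName);
            if (pending != null) {
                pending.cancel(false);
            }
        }
    }

    private void stopAndDeleteLiveEvent(String name) {
        // Hypothetical: invoke your Azure Media Services client here to stop
        // the live event and delete it, so it stops incurring charges.
    }
}
```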

Event Replay using TrackingEventProcessor - Axon 3

I'm following the axon-springboot example shared by Allard (https://github.com/abuijze/bootiful-axon).
My understanding so far is: (please correct me if I have misunderstood some of the concepts)
Events are raised and stored in the event store/event bus (MySQL) (using EmbeddedEventStore). Now, event processors (TrackingProcessors, in my case) will pull events from the source (MySQL, right?) and event handlers will execute the business logic, update the query storage, and publish a message to RabbitMQ.
First question: where, when, and by whom is this message published to RabbitMQ (which is used by the statistics application that has the message listener configured)?
I have configured the TrackingProcessor to try the replay functionality. To execute the replay, I stop my processor, delete the token entry for the processor, and start the processor again; events are replayed and my query storage is up to date, as expected.
Second question: when the replay is triggered and the query storage is updated, I don't see any messages being published to RabbitMQ... so my statistics application is out of sync. Am I doing something wrong?
Can you please advise?
Thanks
Singh
First of all, a correction: it is not the Tracking Processor or the updater of the view model that sends the messages to RabbitMQ. The Events are forwarded to Rabbit as they are published to the Event Bus.
The answer to your first question: messages are published by the SpringAmqpPublisher, which connects directly to the Event Bus, and forwards any published message to RabbitMQ as they are published.
To answer your second question, let's clarify how replays work, first. While it's called a "replay", essentially it's more a "reset". The Tracking Processor uses a TrackingToken to remember its progress of processing the Event Store. When the token is deleted (or just not yet available), the Tracking Processor starts processing from the beginning of the Event Store.
You never replay an entire application, just a single (Tracking) Processor. Just imagine: you re-publish all messages to RabbitMQ, other components are triggered again, unaware that these are "old" messages, and user-confirmation emails are sent again, orders are placed again, etc.
If your statistics are out of date, it's because they aren't handled by that same (replayable) processor and so aren't rebuilt together with the query model. RabbitMQ doesn't support "replaying", since it doesn't remember messages after delivering them.
Any model that you want to be able to rebuild should be managed by a Tracking Processor, as sketched below.
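For illustration, a minimal sketch of what that could look like in Axon 3 with Spring, assuming your statistics handlers are assigned to a processing group named "statistics" (the group name is a placeholder):

```java
import org.axonframework.config.EventHandlingConfiguration;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ProcessorConfig {

    // Register the "statistics" processing group as a tracking processor, so it
    // reads from the event store directly and can be replayed by resetting
    // (deleting) its tracking token.
    @Autowired
    public void configure(EventHandlingConfiguration config) {
        config.registerTrackingProcessor("statistics");
    }
}
```

With this in place, the same stop-processor / delete-token / restart procedure you already use would rebuild the statistics model directly from the event store, without going through RabbitMQ.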
Check out the Axon Reference guide for more information: https://docs.axonframework.org/part3/event-processing.html#event-processors

Building and querying state in Apache Kafka: Kafka Stream?

I am building an Apache Kafka cluster for my needs, and most of my processing is stateless. But there is one situation for which I really need state.
To explain: let's say I am tracking every pharmacy store that opens and the transactions that happen at each store. A store opens with an initial stock of medicines; as medicines are sold and restocked, the state continually changes.
While Kafka serves my need of keeping up with live transactions in real time, I also need to be able to build up per-store state and query it, to find out at any given point the count of a given medicine in a store. Is that possible? Is that what Kafka Streams is used for?
Yes, you can use Kafka Streams to build an application that consumes a Kafka topic and maintains a queryable store that is continuously updated to hold, as in your example, the current drug inventory.
Check out the documentation to get started: http://docs.confluent.io/current/streams/index.html
Also check out these examples using Kafka Streams' "Interactive Queries" feature:
https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/interactivequeries/WordCountInteractiveQueriesExample.java
https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/interactivequeries/kafkamusic/KafkaMusicExample.java
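For a rough idea of the shape this takes, here is a sketch against a recent Kafka Streams API. The topic name, the "storeId:medicine" key format, and the signed-quantity encoding are all assumptions for illustration: transactions are aggregated into a materialized store, which is then read via Interactive Queries.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class PharmacyInventoryApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pharmacy-inventory");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        // Each transaction is keyed "storeId:medicine" with a signed quantity:
        // positive for restocking, negative for a sale.
        builder.stream("pharmacy-transactions", Consumed.with(Serdes.String(), Serdes.Long()))
                .groupByKey()
                .aggregate(
                        () -> 0L,
                        (key, delta, total) -> total + delta,
                        Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("medicine-stock")
                                .withKeySerde(Serdes.String())
                                .withValueSerde(Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Query the store in-process. (In real code, wait until the instance
        // reaches the RUNNING state before querying.)
        ReadOnlyKeyValueStore<String, Long> stock = streams.store(
                StoreQueryParameters.fromNameAndType("medicine-stock",
                        QueryableStoreTypes.keyValueStore()));
        System.out.println("store42:aspirin -> " + stock.get("store42:aspirin"));
    }
}
```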

How to correctly use Akka's Event Stream?

I've been using Akka's event stream in a Play app as an event bus where I can publish events and subscribe listeners, and I wanted to know what gotchas I should take into account. Specifically, there are two things:
Each listener is implemented as an actor which receives the published events and processes them. What if the actor's message queue starts to get big? How can I implement back-pressure safely, guaranteeing that each event is eventually processed?
Related to the previous one: how can I persist the unprocessed events so that, in the case of a failure, the application can start again and process them? I'm aware of the existence of akka-persistence, but I'm not sure it would be the right thing to do in this case: the listener actors aren't stateful and don't need to replay past events; I only want to store unprocessed events and delete them once they have been processed.
Considering these constraints, I would not use Akka's event bus for this purpose.
The main reasons are:
Delivery - you have no guarantee that event listeners are in fact listening (no ACK); it's possible to lose some events on the way.
Persistence - there is no built-in way of preserving event bus state.
Scaling - Akka's event bus is a local facility, meaning it's not suitable if in the future you would like to create a cluster.
The easiest way to deal with this would be to use a message queue such as RabbitMQ.
A while back I was using sstone/amqp-client. An MQ can provide you with persistent queues (a queue per listener/listener type), as sketched below.
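As a sketch of how a RabbitMQ-based listener addresses both concerns (the queue and host names are placeholders; this uses the standard com.rabbitmq.client Java API): a durable queue plus manual acknowledgements give you at-least-once delivery and persistence of unprocessed events, while basicQos caps how many unacknowledged messages are in flight, which is your back-pressure.

```java
import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class DurableEventListener {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Durable queue: survives broker restarts (messages must also be
        // published with the persistent delivery mode to survive restarts).
        channel.queueDeclare("events", true, false, false, null);

        // Back-pressure: at most 50 unacknowledged messages in flight.
        channel.basicQos(50);

        // autoAck = false: a message stays in the queue until we ack it,
        // so unprocessed events are redelivered after a crash.
        channel.basicConsume("events", false, (consumerTag, delivery) -> {
            try {
                handle(new String(delivery.getBody(), StandardCharsets.UTF_8));
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            } catch (RuntimeException e) {
                // Processing failed: requeue the message for another attempt.
                channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, true);
            }
        }, consumerTag -> { });
    }

    private static void handle(String event) {
        // Process the event; "deleting" it happens via the ack above.
    }
}
```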