How to list all consumers that are listening to an address on a Vert.x event bus?

I have a Vert.x program. I create a message consumer and attach it to listen on an address on the Vert.x event bus. Later in the program I unregister that consumer. How do I know whether the consumer was unregistered successfully?
The following code snippet shows how I register a consumer on an address on the Vert.x event bus:
MessageConsumer<JsonObject> consumer = vertx.eventBus().consumer("my_channel", eventHandler);
Later, after some time, I unregister the consumer:
consumer.unregister(res -> {
    if (res.succeeded()) { System.out.println("consumer deregistered"); }
});
So my question is: given a reference to a Vert.x event bus object (vertx.eventBus()), how can I verify whether there are any consumers on it?

As for the first question, you can safely assume the consumer was unregistered successfully when res.succeeded() returns true inside the unregister handler (as in your example).
For the second part, as far as I know the event bus does not maintain a list of the consumers registered on it; you have to maintain it yourself. The idea is to keep a map (or some other collection) where you store a reference to each consumer when you register it with the .consumer(...) method, and remove it after the unregister handler reports success.
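For illustration, here is a minimal sketch of that bookkeeping (the ConsumerRegistry class and its method names are my own, not Vert.x API):

import io.vertx.core.Vertx;
import io.vertx.core.eventbus.MessageConsumer;
import io.vertx.core.json.JsonObject;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConsumerRegistry {

    // Tracks the consumers this application has registered, keyed by address
    private final Map<String, MessageConsumer<JsonObject>> consumers = new ConcurrentHashMap<>();
    private final Vertx vertx;

    public ConsumerRegistry(Vertx vertx) {
        this.vertx = vertx;
    }

    public void register(String address) {
        MessageConsumer<JsonObject> consumer =
                vertx.eventBus().consumer(address, msg -> { /* handle message */ });
        consumers.put(address, consumer);
    }

    public void unregister(String address) {
        MessageConsumer<JsonObject> consumer = consumers.get(address);
        if (consumer != null) {
            // Remove from the map only once unregistration has completed
            consumer.unregister(res -> {
                if (res.succeeded()) {
                    consumers.remove(address);
                }
            });
        }
    }

    public boolean hasConsumer(String address) {
        return consumers.containsKey(address);
    }
}

Note that such a registry only sees consumers registered through it; it cannot list consumers registered elsewhere on the bus.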
I think this is the issue you are referring to (with advice from the lead architect of Vert.x): https://groups.google.com/g/vertx/c/d70YlHLL7KU?pli=1

Related

If a service produces an event to one topic, only that service should consume the processed event from another topic (Kafka)

I have to implement event-driven services with Kafka (Java tech stack).
Here is an example:
Imagine that I have 3 external producers (Ms1, Ms2, Ms3) that send events to one topic, which my service reads. After receiving an event, my service runs some business logic and then pushes an event to another topic. Ms1, Ms2, and Ms3 subscribe to this second topic and listen for what comes in. My goal: if Ms1 sent an event to topic-1, only Ms1 must receive the response event from topic-2 (even though other consumers are listening to this topic too, they must not receive events belonging to Ms1). If Ms2 sent an event to topic-1, then only Ms2 must receive the event from topic-2.
And I don't know how many consumers/producers there will be; the number floats. Today it may be 3 external producers/consumers, tomorrow maybe 30, and so on. They can subscribe and unsubscribe.
Kafka records shouldn't "belong" to particular services, IMO; this is mostly metadata about data lineage, and that information may be valuable for some other consumer use case you haven't considered yet.
If you have multiple consumers of one topic, there's no logic beyond filtering and explicit partition assignment that would route "all Ms1 producer events to all Ms1 consumers" (filtering is sketched below).
If you want to lock down access to topics to particular clients, use ACLs and certificates. Otherwise, nothing stops new consumer groups from subscribing to whatever topics they want.
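As a hedged illustration of the filtering approach: the middle service could copy an origin header from each request onto the response, and every service skips responses not addressed to it. The header name, topic, group id, and service id below are all assumptions:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.header.Header;

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class FilteringConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Each service uses its own group so all of them see every record
        props.put("group.id", "ms1-responses");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        String myServiceId = "Ms1"; // identity of this service

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("topic-2"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    Header origin = record.headers().lastHeader("origin-service");
                    // Skip responses that were not produced for this service
                    if (origin == null
                            || !myServiceId.equals(new String(origin.value(), StandardCharsets.UTF_8))) {
                        continue;
                    }
                    System.out.println("Response for " + myServiceId + ": " + record.value());
                }
            }
        }
    }
}

Keep in mind every service still reads every record; the filter only hides them. It does not reduce traffic or provide access control (that is what ACLs are for).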

Question on implementing CQRS/Saga with Kafka

I am studying CQRS and would like to implement the Saga pattern to handle distributed transactions using Kafka.
A saga would subscribe to domain events from another aggregate in order to send a command. My problem is that the domain event is also handled by the aggregate's own event handler.
If the aggregate event handler handles the event successfully, the offset is committed, so the job in the broker is gone.
Say the aggregate event handler handles the event successfully, but the saga is not triggered because of some unexpected issue. Since the job is gone, the event will never be picked up by the saga if they both live in the same consumer group...
So does it make sense to have one consumer group for the aggregate and another consumer group for the saga?
"so the job in the broker is gone"
I think you're misunderstanding: consuming messages from a Kafka topic does not remove them.
Part of your saga flow should be an action that commits the full offset trace once each service has completed its action.
"does it make sense to have one consumer group for the aggregate and another consumer group for the saga?"
It makes sense to have intermediate topics, sure. You could use Kafka Streams processors to implement that easily.
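For illustration, a minimal Kafka Streams sketch of the intermediate-topic idea (the topic names and the filter step are assumptions):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class SagaBridge {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "saga-bridge");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Re-publish aggregate events to a dedicated topic for the saga, so the
        // saga's consumption is decoupled from the aggregate handler's offsets
        builder.stream("domain-events")
               .filter((key, value) -> value != null) // place saga-relevant filtering here
               .to("saga-commands");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}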

How to configure Kafka RPC caller topic and group

I'm trying to implement an RPC architecture using Kafka as a message broker. The decision to use Kafka instead of another message broker solution is dictated by the current context.
The actual implementation consists of two different types of service:
The receiver: this service consumes messages from a Kafka topic, processes them, and then publishes the response message to a response topic;
The caller: this service receives HTTP requests, publishes messages to the receiver topic, consumes the receiver service's response topic for the response message, and then returns it as an HTTP response.
The request/response messages published to the topics are correlated by the message key.
The receiver implementation was fairly simple: at startup, it creates the "request" and "response" topics, then starts consuming the request topic with the service group id (many instances of the receiver share the same group id in order to balance requests properly). When a request arrives, the service processes it and then publishes the response to the response topic (a sketch of this loop follows).
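For illustration, a minimal sketch of such a receiver loop (the topic names, serializers, and the placeholder business logic are assumptions):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class Receiver {

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "receiver-service"); // shared by all receiver instances
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("requests"));
            while (true) {
                for (ConsumerRecord<String, String> request : consumer.poll(Duration.ofMillis(500))) {
                    String response = request.value().toUpperCase(); // placeholder business logic
                    // Key the response with the request key so the caller can correlate it
                    producer.send(new ProducerRecord<>("responses", request.key(), response));
                }
            }
        }
    }
}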
My problem is with the caller implementation, in particular while consuming the response from the response topic.
With the following assumptions:
The HTTP requests must be managed concurrently;
There could be more than one instance of this caller service;
Every single thread/service must receive all the messages in the response topic, in order to find the message with the corresponding request key.
As an example, imagine that two caller services produce two messages with keys 1 and 2 respectively. These messages will be published to the receiver topic and processed, and the responses will then be published to the topic receiver-responses. If the two caller services share the same group id, response 1 could arrive at the service that published message 2 and vice versa, resulting in an HTTP timeout.
To avoid this problem, I've thought of these possible solutions:
Creating a new group for every request (EDIT: but a group cannot be deleted via code, hence another service would be needed to clean these groups out of ZooKeeper);
Creating a new topic for every request, then deleting it afterwards.
Hoping that I've made myself sufficiently clear (I must admit I am a beginner with Kafka), my question is:
Which solution is less costly? Or is there another topic/group configuration that could satisfy assumption 3?
Thanks.
I think I've found a possible solution. A consumer group is automatically deleted by the broker when its offsets have not been updated for a period of time, determined by the offsets.retention.minutes configuration.
How often this check runs can be set with the offsets.retention.check.interval.ms configuration.
This way, when a consumer connects to the response topic searching for the reply message, the group it created can simply be abandoned, and the broker will delete it later.
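A sketch of that ephemeral-group approach (the topic name, timeout handling, and the helper class are my own assumptions):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.UUID;

public class ResponseWaiter {

    // Blocks until a response with the given key arrives, or the timeout expires
    public static String awaitResponse(String requestKey, Duration timeout) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Throwaway group id: never reused, so there are no offsets to manage;
        // the broker purges the group's offsets after the retention period
        props.put("group.id", "caller-" + UUID.randomUUID());
        props.put("auto.offset.reset", "latest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        long deadline = System.currentTimeMillis() + timeout.toMillis();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("responses"));
            while (System.currentTimeMillis() < deadline) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(200))) {
                    if (requestKey.equals(record.key())) {
                        return record.value();
                    }
                }
            }
        }
        return null; // timed out
    }
}

One caveat: with auto.offset.reset=latest, the consumer should be subscribed (and have completed its first poll) before the request is published, otherwise the response can slip past it.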

How to Gracefully Turn Off Running Kafka Consumer

I need to turn a Kafka consumer on and off based on some database-driven property. How can this be achieved?
One way I have thought of is throwing an exception from the consumer when the consumer flag is turned off, with the container factory configured as:
factory.setErrorHandler(new SeekToCurrentErrorHandler());
But then it actively re-seeks the same message.
Is there any way to turn the heartbeat off and then back on again on demand?
You can stop() and start() the listener container.
It appears you are using @KafkaListener, since you are using a container factory.
In that case
@KafkaListener(id = "foo" ...)
and then use the KafkaListenerEndpointRegistry bean ...
registry.getListenerContainer("foo").stop();
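Putting it together, a minimal sketch with Spring for Apache Kafka (the listener id, topic, and the source of the enabled flag are assumptions):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.stereotype.Component;

@Component
public class ToggleableConsumer {

    @Autowired
    private KafkaListenerEndpointRegistry registry;

    @KafkaListener(id = "foo", topics = "my-topic")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }

    // Call this when the database-driven flag changes
    public void setEnabled(boolean enabled) {
        if (enabled) {
            registry.getListenerContainer("foo").start();
        } else {
            registry.getListenerContainer("foo").stop();
        }
    }
}

Note that stop() shuts down the underlying consumer, so the group rebalances; start() creates a new consumer that rejoins the group.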
The Consumer has a few APIs to control its state:
pause()/resume(): allow you to stop/resume consuming from a set of partitions. The consumer stays subscribed (so no rebalance) but just does not fetch any new records until resumed.
unsubscribe(): changes the consumer's subscription; if it is no longer subscribed to anything, it just stays connected to the cluster.
If you are "done" with the consumer, you can also call close() and start a new one when needed.
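A sketch of the pause()/resume() approach with the plain consumer API (the topic and the flag supplier are assumptions):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.function.BooleanSupplier;

public class PausableConsumer {

    public static void run(KafkaConsumer<String, String> consumer, BooleanSupplier enabled) {
        consumer.subscribe(List.of("my-topic"));
        while (true) {
            // Toggle based on the externally driven flag (e.g. a database property)
            if (enabled.getAsBoolean()) {
                consumer.resume(consumer.paused()); // no-op when nothing is paused
            } else {
                consumer.pause(consumer.assignment()); // stays in the group, fetches nothing
            }
            // poll() must still be called while paused so the consumer is not
            // evicted from the group for exceeding max.poll.interval.ms
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                System.out.println("Received: " + record.value());
            }
        }
    }
}

The key difference from close()/restart: pause() keeps the partition assignment, so there is no rebalance when consumption resumes.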

custom Flume interceptor: intercept() method called multiple times for the same Event

TL;DR
When a Flume source fails to push a transaction to the next channel in the pipeline, does it always keep the same event instances for the next try?
In general, is it safe to have a stateful Flume interceptor, where the processing of events depends on previously processed events?
Full problem description:
I am considering leveraging the guarantees Apache Kafka offers about how topic partitions are distributed among the consumers in a consumer group to perform streaming deduplication in an existing Flume-based log consolidation architecture.
Using the Kafka Source for Flume and custom routing to Kafka topic partitions, I can ensure that every event that should go to the same logical "deduplication queue" will be processed by a single Flume agent in the cluster (for as long as there are no agent stops/starts within the cluster). I have the following setup using a custom-made Flume interceptor:
[KafkaSource with deduplication interceptor] --> (MemoryChannel) --> [HDFSSink]
It seems that when the Flume Kafka source runner is unable to push a batch of events to the memory channel, the event instances that are part of the batch are passed again to my interceptor's intercept() method. In this case, it was easy to add a tag (in the form of a Flume event header) to processed events, to distinguish actual duplicates from events in a failed batch that got re-processed.
However, I would like to know whether there is any explicit guarantee that Event instances in failed transactions are kept for the next try, or whether events might be read again from the actual source (in this case, Kafka) and rebuilt from scratch. In that case, my interceptor would consider those events duplicates and discard them, even though they were never delivered to the channel.
EDIT
This is how my interceptor distinguishes an Event instance that was already processed from a non-processed event:
public Event intercept(Event event) {
    Map<String, String> headers = event.getHeaders();
    // tagHeaderName is the name of the header used to tag events, never null
    if (!tagHeaderName.isEmpty()) {
        // Don't look further if the event was already processed...
        if (headers.get(tagHeaderName) != null) {
            return event;
        }
        // Mark it as processed otherwise...
        headers.put(tagHeaderName, "");
    }
    // Continue processing of event...
    return event;
}
I encountered a similar issue:
When a sink write fails, the Kafka source still holds the data that has already been processed by the interceptors. On the next attempt, that data is sent to the interceptors and processed again and again. From reading KafkaSource's code, I believe this is a bug.
My interceptor strips some information from the original message and modifies it. Due to this bug, the retry mechanism will never work as expected.
So far, there is no easy solution.