How to shut down a KafkaListener when an error occurs - apache-kafka

I wrote a listener like this:
@Autowired
private KafkaListenerEndpointRegistry kafkaListenerEndpointRegistry;
@KafkaListener(containerFactory = "cdcKafkaListenerContainerFactory", errorHandler = "errorHandler")
public void consume(@Payload String message) throws Exception {
    ...
}
@Bean
public KafkaListenerErrorHandler errorHandler() {
    return ((message, e) -> {
        kafkaListenerEndpointRegistry.stop();
        return null;
    });
}
In the @KafkaListener annotation I specified my error handler, which simply stops the consumer.
It seems to work, but I have some questions.
Is there a built-in error handler for this purpose? I've read that ContainerStoppingErrorHandler can be used, but I cannot set it because @KafkaListener's errorHandler attribute accepts only beans of type KafkaListenerErrorHandler.
I see that kafkaListenerEndpointRegistry.stop() does a graceful stop, so the offset of the consumed message is committed before stopping.
What I would like to know is: what happens if another message arrives on the topic after kafkaListenerEndpointRegistry.stop() is called but before the listener has fully shut down?
Is this message consumed?
I imagine this scenario:
time0: kafkaListenerEndpointRegistry.stop() is called
time1: a message is pushed to the listened topic
time2: kafkaListenerEndpointRegistry.stop() completes the graceful stop
I'm worried about a message arriving at time1. What would happen in this scenario?

Do not stop the container within the listener.
ContainerStoppingErrorHandler is set on the container factory, not the annotation.
If you are using Spring Boot, just declare the error handler as a bean and Boot will wire it in.
Otherwise add the error handler to the container factory bean.
With this error handler, throwing an exception will immediately stop the container.

Related

Artemis message routing

I'm using ActiveMQ Artemis 2.17.0 and I'm facing routing issues.
I've implemented a plugin that logs in beforeMessageRoute, and I see that some messages are routed from topic.private.abc.task.V1 to topic.abc.rawmessage.V1.
There is no divert set up, and topics and queues are created dynamically by the producers and consumers. There is a setup that maps the destination clustered.*.> to virtual topics:
private TransportConfiguration getServerTransportConfiguration() {
Map<String, Object> extraProps = new HashMap<>();
extraProps.put("virtualTopicConsumerWildcards", "clustered.*.>;2");
Map<String, Object> params = new HashMap<>();
params.put("scheme", "tcp");
params.put("port", port);
params.put("host", hostname);
return new TransportConfiguration("org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptorFactory", params, "netty-acceptor", extraProps);
}
Both topic.private.abc.task.V1 and topic.abc.rawmessage.V1 are valid topics but they are not supposed to be linked.
What could explain that behavior?
Here is the plugin code:
@Override
public void beforeMessageRoute(Message message, RoutingContext context, boolean direct, boolean rejectDuplicates) throws ActiveMQException {
    Map<String, Object> map = new HashMap<>();
    map.put("RoutingContext", new RoutingContextLogView(context));
    logger.info(mapper.writeValueAsString(map));
    ActiveMQServerPlugin.super.beforeMessageRoute(message, context, direct, rejectDuplicates);
}
public class RoutingContextLogView {
private RoutingContext routingContext;
public RoutingContextLogView(RoutingContext routingContext) {
this.routingContext = routingContext;
}
public String getAddress() {
return routingContext.getAddress() != null ? routingContext.getAddress().toString() : null;
}
public String getPreviousAddress() {
return routingContext.getPreviousAddress() != null ? routingContext.getPreviousAddress().toString() : null;
}
public String getRoutingType() {
return routingContext.getRoutingType() != null ? routingContext.getRoutingType().name() : null;
}
public String getPreviousRoutingType() {
return routingContext.getPreviousRoutingType() != null ? routingContext.getPreviousRoutingType().name() : null;
}
}
Despite the odd logging, the flow followed by the message seems to be OK (i.e. the message is produced to topic.abc.rawmessage.V1 and consumed from topic.abc.rawmessage.V1). I'm just wondering why there is message routing and why the previousAddress in the RoutingContext is wrong.
The RoutingContext object, which is used internally by the broker, is reusable. This is done for performance reasons to prevent having to re-create the RoutingContext for every routing operation no matter what. As one might guess, routing messages is a very common operation in the broker so it pays to optimize it as much as possible. Reusing the RoutingContext means fewer objects are created and thrown away which means less garbage needs to be cleaned up which means fewer pauses and better overall performance by the broker.
The fact that the previousAddress is different here from the address where the current message is going to be routed is not a problem. It just means that the context won't be re-used for this routing operation and therefore will be cleared. As the name suggests, the beforeMessageRoute method is invoked before any routing logic is performed (e.g. clearing the RoutingContext). If you inspect the RoutingContext using afterMessageRoute then you should see that it was cleared and populated with the proper details.
Message "sending" and message "routing" (both of which have plugin hooks) are related but distinct operations. A message is "sent" in response to a client operation. Sends always result in a route. However, not all routes are the results of sends. A message can be routed due to internal broker operations which do not involve a send (e.g. moving messages around a cluster, expiring a message, cancelling an undeliverable message to a dead-letter address, using a divert, etc.).
I would caution you against inspecting internal broker state (which can be subtle and nuanced) and assuming a problem exists when everything else indicates that the broker is functioning normally. In this case you said that you were "facing routing issues" and that "some message are routed from topic.private.abc.task.V1 to topic.abc.rawmessage.V1" when, in fact, there was no routing issue and messages were not actually being routed from topic.private.abc.task.V1 to topic.abc.rawmessage.V1. From what I can see everything is in fact functioning normally.

Vertx not able to handle internal code error to client automatically

I have a verticle which accepts a REST request, gets data from another verticle through the event bus, and responds to the client.
vertx.exceptionHandler(event -> logger.error("Vertx exception ", event));
router.get("/api/v1/:param").handler(this::routerHandler);
public void routerHandler(RoutingContext rc) {
    String param = rc.pathParam("param");
    vertx.eventBus().request("data", param,
        result -> {
            if (result.succeeded()) {
                logger.info("Request handled successfully");
                // intentionally causing an exception: body() actually returns a String
                JsonObject jsonObject = (JsonObject) result.result().body();
                rc.response().end(jsonObject.encode());
            } else {
                logger.error("Request failed");
            }
        });
}
When an exception is raised, it is logged by the exception handler that I set up on the Vertx instance, but Vert.x does not report the exception back to the client immediately; instead the client waits for the timeout (30 seconds) to occur.
I tried attaching an error handler to the router and a failure handler to the route, but neither reports the exception to the client immediately. I know I can use a try/catch and report the error in the catch block, but I want to know whether there is another way to handle this, the way Servlet or Spring MVC reports back to the client even when the exception is not handled in code.
router.errorHandler(500,routingContext -> {
System.out.println(routingContext.failed());
routingContext.response().end("Exception ");
});
router.route().handler(BodyHandler.create()).failureHandler(routingContext -> {
Uncaught exceptions are reported to the context exceptionHandler. By default it prints the exception to the console.
You can configure it but you will not get a reference to the corresponding HTTP request anyway (the exception may come from different things).
Most problems like this are usually found during unit/integration/acceptance testing.
And for the remainders you could set a timeout handler on your router definition to make sure the request is ended before the default 30 seconds.
If you don't want to miss any uncaught exception, you should switch to the Vert.x Rxified API. When using RxJava, any exception thrown will be reported to the subscriber.

Processor topology with error handling and state store rollback

I have the following topology with a source topic, a processor, and a sink to another topic:
StoreBuilder storeBuilder = Stores.keyValueStoreBuilder(
Stores.persistentKeyValueStore("store"),
Serdes.String(),
Serdes.String());
Topology topology = new Topology();
topology.addSource("incoming", Serdes.String().deserializer(), Serdes.String().deserializer(), "topic");
topology.addProcessor("incoming_first", () -> new MyProcessor(), "incoming");
topology.addStateStore(storeBuilder, "incoming_first");
topology.addSink("sink", "sink", "incoming_first");
public class MyProcessor implements Processor<String, String> {
    private ProcessorContext context;
    private KeyValueStore<String, String> stateStore;
    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        this.stateStore = (KeyValueStore<String, String>) context.getStateStore("store");
    }
    @Override
    public void process(String key, String value) {
        stateStore.put(key, value);
        ....
        throw new RuntimeException();
        ....
        context.forward(key, value); // forward to sink
    }
    @Override
    public void close() {
    }
}
My question is how to handle situations where an exception occurs in the processor after a write to the state store. Does Kafka have an error handling mechanism with state store rollback, so that the message can be reprocessed or forwarded to an error topic?
Currently, without any handling, my application dies entirely and I need to restart it.
Also, if I add a try-catch, the message is treated as OK, my state store is updated, and the update is sent to the changelog topic.
Do I need some rollback mechanism for the state store?
https://issues.apache.org/jira/browse/KAFKA-7192 KIP says that if exceptions occurred the state store should not be processed with EOS, but this is valid only for the case when my entire application dies.
Thanks in advance!
For any exception that is thrown from a Processor, the corresponding thread will always die. The only way to prevent this is to catch all exceptions and handle them accordingly (whatever the right handling is for your application).
If a thread dies and you restart your application to recover the thread, whether the store is rolled back depends on your configuration. By default, the store is not rolled back. Only if you enable exactly-once semantics by setting the configuration parameter processing.guarantee="exactly_once" is the store rolled back on restart.
If you catch an exception in your Processor code and your business logic requires rolling back the store, you need to implement this yourself: first get the old values from the store, then update the store, and in case of an exception put the old values back into the store to overwrite/undo all your writes.
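The manual rollback described above can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain HashMap stands in for the Kafka Streams KeyValueStore, and riskyStep is a hypothetical placeholder for whatever processing may throw after the write. In a real Processor the same get/put/delete calls would go to the store obtained in init().

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in: the HashMap plays the role of a KeyValueStore<String, String>.
class StoreRollbackSketch {
    private final Map<String, String> store = new HashMap<>();

    // Hypothetical business step that may fail after the store was written.
    private void riskyStep(String value) {
        if (value.startsWith("bad")) {
            throw new IllegalStateException("processing failed for " + value);
        }
    }

    // Returns true if the record was processed, false if the write was undone.
    boolean process(String key, String value) {
        String old = store.get(key);   // 1. remember the previous value (may be null)
        store.put(key, value);         // 2. write the new value
        try {
            riskyStep(value);          // 3. work that may throw
            return true;
        } catch (RuntimeException e) {
            // 4. undo the write: restore the old value, or remove the key if there was none
            if (old == null) {
                store.remove(key);
            } else {
                store.put(key, old);
            }
            return false;
        }
    }

    String get(String key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        StoreRollbackSketch s = new StoreRollbackSketch();
        s.process("k", "v1");
        s.process("k", "bad-v2");      // fails, so "v1" is put back
        System.out.println(s.get("k"));
    }
}
```

Whether to drop the failed record, retry it, or forward it to an error topic after the undo is an application decision that this sketch leaves open.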

Kafka Producer : Handle Exception in Async Send with Callback

I need to catch exceptions when sending asynchronously to Kafka. The Kafka producer API comes with a function send(ProducerRecord record, Callback callback). But when I tested this against the following two scenarios:
Kafka Broker Down
Topic not pre created
The callbacks are not getting called. Rather I am getting warning in the code for unsuccessful send (as shown below).
Questions :
So are the callbacks called only for specific exceptions?
When does the Kafka client try to connect to the Kafka broker during an async send: on every batch send, or periodically?
[Image: Kafka warning]
Note: I am also using a linger.ms setting of 25 sec to batch-send my records.
import java.io.IOException;
import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class ProducerDemo {
    static KafkaProducer<String, String> producer;
    public static void main(String[] args) throws IOException {
        final Logger logger = LoggerFactory.getLogger(ProducerDemo.class);
        Properties properties = new Properties();
        properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.setProperty(ProducerConfig.ACKS_CONFIG, "1");
        properties.setProperty(ProducerConfig.LINGER_MS_CONFIG, "30000");
        producer = new KafkaProducer<String, String>(properties);
        String topic = "first_topic";
        for (int i = 0; i < 5; i++) {
            String value = "hello world " + Integer.toString(i);
            String key = "id_" + Integer.toString(i);
            ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, key, value);
            producer.send(record, new Callback() {
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    // executed every time a record is successfully sent or an exception is thrown
                    if (e == null) {
                        // No exception
                    } else {
                        // Exception handling
                    }
                }
            });
        }
        producer.close();
    }
}
You will get those warnings for a non-existing topic as a resilience mechanism provided by KafkaProducer. If you wait a bit longer (it should be 60 seconds by default), the callback will eventually be called:
Here's my snippet:
So, when something goes wrong and the async send is not successful, it will eventually fail with a failed future and/or a callback with an exception.
If you are not running it transactionally, it can still mean that some messages from the batch have found their way to the broker while others haven't.
It will most certainly be a problem if you need a blocking-style acknowledgement to the upstream system (such as an HTTP ingestion interface) for every message that is sent to Kafka. The only way to do that is by blocking on every message with the future's get, as described in the documentation:
In general, I've noticed a lot of questions related to KafkaProducer delivery semantics and guarantees. It can definitely be documented better.
One more thing, since you mentioned linger.ms:
Note that records that arrive close together in time will generally
batch together even with linger.ms=0 so under heavy load batching will
occur regardless of the linger configuration
For the first question, here is the answer.
As per the Apache Kafka documentation, you can capture the exceptions listed at the link below in the onCompletion method when you implement the Callback interface:
https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/producer/Callback.html
For the second question, the combination of the properties below controls when the records are sent and, as far as I understand, it is the same for synchronous and asynchronous calls.
linger.ms
max.block.ms
https://kafka.apache.org/documentation/#linger.ms
So are the callbacks called only for specific exceptions?
Yes, that's how it works. From the documentation (2.5.0):
* Fully non-blocking usage can make use of the {@link Callback} parameter to provide a callback that
* will be invoked when the request is complete.
Notice the important part: "when the request is complete", which means that the producer must have accepted the record and sent the ProduceRequest to the Kafka broker. Without digging too deep into the internals, this means that broker metadata must be present and the partition must exist.
For the formal specification, take a good look at send()'s Javadoc and possibly at KafkaProducer's doSend method. There you will see that several exceptions can be thrown from the submitting call itself (instead of returning a future and invoking the callback), e.g.:
if broker metadata is not available in timeout given,
if data could not be serialized,
if serialized form was too large, etc.
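Because a failure can surface either from the submitting call or later through the callback, a defensive caller usually covers both paths. The sketch below only illustrates that shape in plain Java: the send method here is a hypothetical stand-in and is not the Kafka API. Which failures take which path in the real client depends on the client version, so check send()'s Javadoc for the version you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Plain-Java stand-in; send() below is hypothetical and is NOT KafkaProducer.send().
class SendErrorPathsSketch {

    // Rejects some records in the calling thread and reports other failures
    // through the callback, mimicking the two error paths described above.
    static void send(String value, BiConsumer<String, Exception> callback) {
        if (value == null) {
            // Path 1: thrown directly from send(); the callback is never invoked.
            throw new IllegalArgumentException("record rejected before submission");
        }
        if (value.isEmpty()) {
            // Path 2: the record was accepted; the failure arrives in the callback.
            callback.accept(null, new IllegalStateException("delivery failed"));
            return;
        }
        callback.accept("ack:" + value, null); // success: no exception
    }

    // Sends every value and records how each one ended, covering both paths.
    static List<String> sendAll(List<String> values) {
        List<String> outcomes = new ArrayList<>();
        for (String v : values) {
            try {
                send(v, (ack, e) ->
                        outcomes.add(e == null ? ack : "callback-error: " + e.getMessage()));
            } catch (RuntimeException e) {
                outcomes.add("sync-error: " + e.getMessage());
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(sendAll(Arrays.asList("a", null, "")));
    }
}
```

With the real producer the same structure applies: wrap the send call in a try/catch and also check the exception argument inside onCompletion.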

Grails: Event listener never stop (with websocket plugin)

I have a problem with my listener (platform-core plugin). It never stops after the afterInsert event is triggered (standalone Mongo).
My service catches the event and uses the SimpMessagingTemplate (websocket plugin) to send the data (a new user registration) to a topic. The data is saved, but the listener is triggered again and again.
P.S. The data is not really saved: the subsequent exception rolls back the transaction. But in debug mode I can see the assigned id.
@Transactional
class MessageEventService {
    SimpMessagingTemplate brokerMessagingTemplate
    @Listener(topic = 'afterInsert',namespace = 'gorm')
    void afterInsert(User user) {
        log.info("DIOBO")
        brokerMessagingTemplate.convertAndSend("/topic/myEventTopic", "myEvent: User Add ${user.name}")
    }
}
And at the end...
.[grails] Servlet.service() for servlet grails threw exception
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
Why does this happen?
Thanks!