What kind of exceptions during deserialization trigger Kafka Streams' default.deserialization.exception.handler? - apache-kafka

Kafka StreamsConfig:
Properties properties = new Properties();
properties.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueExceptionHandler.class);
...
For example, I have a custom deserializer implementation:
public class KeyDeserializer implements Deserializer<Key>
Is my assumption right that any runtime exception thrown from the deserialize method during deserialization will be caught by the default deserialization exception handler, or is it only certain Kafka-specific ones?
@Override
public Key deserialize(String s, byte[] bytes)
I have not found any explanation in the docs. I must be sure that, whatever happens during deserialization, the stream is going to log the error and continue streaming.

Yes, you are correct. When Kafka Streams attempts to deserialize a record, any exception that occurs is handed over to the DeserializationExceptionHandler, and it is up to that handler how to deal with it.
The default config uses the LogAndFailExceptionHandler, which does just what the name implies. Kafka Streams also provides a LogAndContinueExceptionHandler, or you could provide your own implementation.
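If logging and continuing is not enough, you can plug in your own handler. A minimal sketch, assuming the ProcessorContext-based DeserializationExceptionHandler interface from org.apache.kafka.streams.errors; the class name and the plain stderr logging are just placeholders:
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.errors.DeserializationExceptionHandler;
import org.apache.kafka.streams.processor.ProcessorContext;

public class LogAndContinueWithDetailsHandler implements DeserializationExceptionHandler {

    @Override
    public DeserializationHandlerResponse handle(ProcessorContext context,
                                                 ConsumerRecord<byte[], byte[]> record,
                                                 Exception exception) {
        // Any exception thrown by the deserializer lands here, Kafka-specific or not.
        System.err.printf("Failed to deserialize record from %s-%d at offset %d: %s%n",
                record.topic(), record.partition(), record.offset(), exception);
        // CONTINUE skips the bad record; FAIL would stop the stream thread instead.
        return DeserializationHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // No additional configuration needed for this sketch.
    }
}
Register it via the same config key as above, e.g. properties.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueWithDetailsHandler.class);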

Related

How to Handle Deserialization Exception & Converting to New Schema with Spring Cloud Stream?

I am having trouble understanding how to properly handle a deserialization exception within Spring Cloud Stream, primarily because the framework as implemented does not support headers and the DLQ is supposed to use a different schema than the original message. So the process flow needs to be: consume message -> deserialization error -> DlqHandler -> serialize with NEW schema -> send to DLQ
The documentation linked below doesn't make clear whether that is even possible. I have seen quite a few SeekToCurrentErrorHandler examples for Spring Kafka, but to my knowledge those are different implementations and do not show how I could catch the deserialization error and then run custom code to serialize into a new format and go from there.
My main question is: Is capturing the deserialization exception and reserializing possible with spring cloud streams (kafka)?
Spring Cloud Documentation for DLQ
Yes, but not using the binding retry or DLQ properties.
Instead, add a ListenerContainerCustomizer bean and customize the binding's listener container with a SeekToCurrentErrorHandler configured for the retries you need, together with (most likely) a subclass of DeadLetterPublishingRecoverer that uses an appropriately configured KafkaTemplate and possibly overrides the createProducerRecord method.
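A rough sketch of that wiring, assuming spring-kafka 2.3+ (where SeekToCurrentErrorHandler takes a recoverer plus a BackOff) and an auto-configured KafkaTemplate/KafkaOperations bean; the class and bean names here are illustrative:
import org.springframework.cloud.stream.config.ListenerContainerCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaOperations;
import org.springframework.kafka.listener.AbstractMessageListenerContainer;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class DlqCustomizerConfig {

    @Bean
    public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> customizer(
            KafkaOperations<Object, Object> dlqTemplate) {

        return (container, destinationName, group) -> {
            // Publishes the failed record to <topic>.DLT by default; subclass it and override
            // createProducerRecord to re-serialize the payload into the new DLQ schema.
            DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(dlqTemplate);
            // Two retries, one second apart, then the record is handed to the recoverer.
            container.setErrorHandler(new SeekToCurrentErrorHandler(recoverer, new FixedBackOff(1000L, 2L)));
        };
    }
}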

Publish messages that could not be de-serialized to DLT topic

I do not understand how messages that could not be de-serialized can be written to a DLT topic with spring kafka.
I configured the consumer according to the spring kafka docs and this works well for exceptions that occur after de-serialization of the message.
But when the message is not deserializable, an org.apache.kafka.common.errors.SerializationException is thrown while polling for messages.
Subsequently, SeekToCurrentErrorHandler.handle(Exception thrownException, List<ConsumerRecord<?, ?>> records, ...) is called with this exception but with an empty list of records and is therefore unable to write something to DLT topic.
How can I write those messages to DLT topic also?
The problem is that the exception is thrown by the Kafka client itself so Spring doesn't get to see the actual record that failed.
That's why we added the ErrorHandlingDeserializer2 which can be used to wrap the actual deserializer; the failure is passed to the listener container and re-thrown as a DeserializationException.
See the documentation.
When a deserializer fails to deserialize a message, Spring has no way to handle the problem, because it occurs before the poll() returns. To solve this problem, version 2.2 introduced the ErrorHandlingDeserializer2. This deserializer delegates to a real deserializer (key or value). If the delegate fails to deserialize the record content, the ErrorHandlingDeserializer2 returns a null value and a DeserializationException in a header that contains the cause and the raw bytes. When you use a record-level MessageListener, if the ConsumerRecord contains a DeserializationException header for either the key or value, the container’s ErrorHandler is called with the failed ConsumerRecord. The record is not passed to the listener.
The DeadLetterPublishingRecoverer has logic to detect the exception and publish the failed record.
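For reference, a minimal sketch of the consumer property side, assuming spring-kafka 2.2+ with String keys and JSON values (the delegate deserializers are just examples):
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.support.serializer.ErrorHandlingDeserializer2;
import org.springframework.kafka.support.serializer.JsonDeserializer;

public class DltConsumerProps {

    static Map<String, Object> consumerProps() {
        Map<String, Object> props = new HashMap<>();
        // The wrappers are what the Kafka client actually invokes during poll().
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer2.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer2.class);
        // Delegates that do the real work; if one throws, the container's error handler sees
        // the record with a DeserializationException header instead of poll() blowing up.
        props.put(ErrorHandlingDeserializer2.KEY_DESERIALIZER_CLASS, StringDeserializer.class);
        props.put(ErrorHandlingDeserializer2.VALUE_DESERIALIZER_CLASS, JsonDeserializer.class);
        return props;
    }
}
Combined with a SeekToCurrentErrorHandler whose recoverer is a DeadLetterPublishingRecoverer, the failed raw record then ends up on the dead-letter topic.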

Handle Deserialization Error (Dead Letter Queue) in a kafka consumer

After some research I found a few configurations I can use to handle this:
default.deserialization.exception.handler - from the StreamsConfig
errors.deadletterqueue.topic.name - from the SinkConnector config
I can't seem to find an equivalent configuration for a plain consumer.
I want to start a simple consumer with DLQ handling, whether that's by just naming the DLQ topic and letting Kafka produce to it (like in the sink connector) or by providing my own class that produces to it (like in the Streams config).
How can I achieve a DLQ with a simple consumer?
EDIT: Another option I figured out is simply handling it in my Deserializer class: just catch the exception there and produce the record to my DLQ.
But that means I'll need to create a producer inside my deserializer class...
Is this the best practice for handling a DLQ from a consumer?
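A rough sketch of the approach described in the edit, i.e. a wrapper deserializer that owns its own producer; all class names here are made up, and whether this is good practice is exactly what the question is asking:
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.Deserializer;

public class DlqForwardingDeserializer<T> implements Deserializer<T> {

    private final Deserializer<T> delegate;
    private final KafkaProducer<byte[], byte[]> dlqProducer;
    private final String dlqTopic;

    public DlqForwardingDeserializer(Deserializer<T> delegate, Properties producerProps, String dlqTopic) {
        this.delegate = delegate;
        // The producer the edit worries about: one extra producer living inside the deserializer.
        this.dlqProducer = new KafkaProducer<>(producerProps, new ByteArraySerializer(), new ByteArraySerializer());
        this.dlqTopic = dlqTopic;
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        delegate.configure(configs, isKey);
    }

    @Override
    public T deserialize(String topic, byte[] data) {
        try {
            return delegate.deserialize(topic, data);
        } catch (RuntimeException e) {
            // Ship the raw bytes to the DLQ and return null so the consumer keeps polling.
            dlqProducer.send(new ProducerRecord<>(dlqTopic, data));
            return null;
        }
    }

    @Override
    public void close() {
        delegate.close();
        dlqProducer.close();
    }
}
Because the wrapper needs constructor arguments, you would pass an instance of it directly to the KafkaConsumer constructor rather than configuring it by class name, and the consuming code has to tolerate null values for skipped records.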

Spring Kafka Consume JsonNode objects

I have a service that is producing Kafka messages with a payload of type com.fasterxml.jackson.databind.JsonNode. When I consume this message I want it to be deserialized into a POJO, but I'm getting the following message:
IllegalArgumentException: Incorrect type specified for header
'kafka_receivedMessageKey'. Expected [class com.example.Person] but
actual type is [class com.fasterxml.jackson.databind.node.ObjectNode]
How do I configure either the producer or the consumer parts to work as I intend it to?
I think these are the approaches I can take:
Kafka Consumer should ignore __Key_TypeId__ if it is a subclass of JsonNode
OR
Kafka Producer should produce a message without __Key_TypeId__ header if it is a subclass of JsonNode
But how do I implement either of these approaches? Or is there another option?
See the reference manual.
You can either set JsonSerializer.ADD_TYPE_INFO_HEADERS to false on the producer or JsonDeserializer.USE_TYPE_INFO_HEADERS to false on the consumer.
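For illustration, both options as plain property maps built in Java (only one of them is needed); this is a sketch assuming the returned entries are merged into your existing producer or consumer configuration:
import java.util.HashMap;
import java.util.Map;
import org.springframework.kafka.support.serializer.JsonDeserializer;
import org.springframework.kafka.support.serializer.JsonSerializer;

public class TypeHeaderProps {

    // Producer side: do not write __TypeId__/__Key_TypeId__ headers at all.
    static Map<String, Object> producerOverrides() {
        Map<String, Object> props = new HashMap<>();
        props.put(JsonSerializer.ADD_TYPE_INFO_HEADERS, false);
        return props;
    }

    // Consumer side: ignore any type headers and deserialize into the configured target type.
    static Map<String, Object> consumerOverrides() {
        Map<String, Object> props = new HashMap<>();
        props.put(JsonDeserializer.USE_TYPE_INFO_HEADERS, false);
        return props;
    }
}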

How to implement HTTP endpoint in Micronaut to pause consumption of messages from Kafka?

Let me describe the rationale behind my question:
We have a Micronaut-based application consuming messages from Kafka broker.
The consumed messages are processed and fed to another remote "downstream" application.
If this downstream application is restarted on purpose, it will take a while before it is ready to accept further messages from our Micronaut-based application.
So our idea is to send our Micronaut application a request (e.g. via HTTP to an appropriate endpoint) to SUSPEND/PAUSE the consumption of messages from Kafka.
The KafkaConsumer interface seems to have suitable methods to achieve this goal, such as
public void pause(java.util.Collection<TopicPartition> partitions)
public void resume(java.util.Collection<TopicPartition> partitions)
But how do we get a reference to the appropriate KafkaConsumer instance into our HTTP endpoint?
We tried to get it injected into the constructor of the HTTP endpoint/controller class, but this yields
Error instantiating bean of type [HttpController]
Message: Missing bean arguments for type: org.apache.kafka.clients.consumer.KafkaConsumer. Requires arguments: AbstractKafkaConsumerConfiguration consumerConfiguration
It's possible to get a reference to the KafkaConsumer instance as a method parameter of @Topic-annotated receive methods, as described in the Micronaut Kafka documentation,
but this would mean storing that reference in an instance variable, having it accessed by the HTTP endpoint, and so on ... which does not sound very convincing:
you get a reference to the KafkaConsumer ONLY when receiving the next message! This might be appropriate for SUSPENDING/PAUSING, but not for RESUMING!
By the way, calling KafkaConsumer.resume(...) on a reference saved as an instance variable yields
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2201)
at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2185)
at org.apache.kafka.clients.consumer.KafkaConsumer.resume(KafkaConsumer.java:1842)
[...]
I think the same holds true when implementing the KafkaConsumerAware interface to store a reference to the freshly created KafkaConsumer instance.
So, are there any ideas on how to handle this in an appropriate way?
Thanks
Christian
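Not an authoritative answer, but one direction worth checking: recent micronaut-kafka versions expose a ConsumerRegistry bean that can pause and resume a listener by its client id without touching the KafkaConsumer from a foreign thread. A sketch under that assumption (verify the exact API against your micronaut-kafka version; the client id and paths are hypothetical):
import io.micronaut.configuration.kafka.ConsumerRegistry;
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Post;

@Controller("/kafka/my-listener")
public class ConsumerPauseController {

    private final ConsumerRegistry consumerRegistry;

    public ConsumerPauseController(ConsumerRegistry consumerRegistry) {
        this.consumerRegistry = consumerRegistry;
    }

    @Post("/pause")
    public void pause() {
        // Assumes the listener was declared with a known client id,
        // e.g. @KafkaListener(clientId = "my-listener").
        consumerRegistry.pause("my-listener");
    }

    @Post("/resume")
    public void resume() {
        consumerRegistry.resume("my-listener");
    }
}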