Kafka Avro serializer, Ignoring serialization exception - apache-kafka

I am using kafka-avro-serializer-6.0.0.jar. When I hit exceptions deserializing events, my consumer stops and does not move on to the next event. These are usually caused by errors at the producer, and have happened because of issues using a new Avro schema registry server.
Example:
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 58
Caused by: java.lang.ClassCastException:
I can fix the exceptions; that's not the issue. But to fix the consumers I need to reset each offset manually to latest. This is a lot of hassle in my scenario, and I have a lot of consumer groups.
Is there a way for me to ignore these exceptions and advance the offset at the consumer? I guess I have this issue because I am using manual offset commits. Does anyone know a way to configure kafka-avro-serializer-6.0.0.jar to do what I want?
Thanks.

You mainly have two options:
Override the deserializer's deserialize method and reimplement it by catching the ClassCastException and returning a null object instead of the deserialized record. These null objects then have to be dealt with in your consumer code.
Catch the SerializationException in your consumer code and seek your consumer past the bad record's offset (see the sketch below).
Both solutions are very well explained in this article by Jon Boulineau.
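
For illustration, here is a minimal sketch of the second option. It assumes the older kafka-clients behavior where poll() throws a SerializationException whose message contains the partition and offset ("...partition <topic>-<p> at offset <o>..."); parsing that message is brittle, and newer clients (2.8+) expose the position directly via RecordDeserializationException. Names such as SkippingPollLoop and process() are placeholders.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.errors.SerializationException;

    public class SkippingPollLoop {

        // "partition <topic>-<partition> at offset <offset>" inside the exception message
        private static final Pattern POSITION =
                Pattern.compile("partition (.+)-(\\d+) at offset (\\d+)");

        public static void pollLoop(KafkaConsumer<String, Object> consumer, String topic) {
            consumer.subscribe(Collections.singletonList(topic));
            while (true) {
                try {
                    ConsumerRecords<String, Object> records = consumer.poll(Duration.ofSeconds(1));
                    records.forEach(r -> process(r.value()));   // your normal handling
                    consumer.commitSync();                      // manual commit, as in the question
                } catch (SerializationException e) {
                    Matcher m = POSITION.matcher(e.getMessage());
                    if (m.find()) {
                        TopicPartition tp =
                                new TopicPartition(m.group(1), Integer.parseInt(m.group(2)));
                        consumer.seek(tp, Long.parseLong(m.group(3)) + 1);  // skip the poison record
                    } else {
                        throw e;  // cannot locate the bad record: re-throw
                    }
                }
            }
        }

        private static void process(Object value) { /* placeholder */ }
    }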

Related

Kafka Streams Retry PAPI and dead letter

I am trying to implement retry logic within a Kafka Streams processor topology in the event there was an exception producing to a sink topic.
I am using a custom ProductionExceptionHandler to be able to catch exceptions that happen on producer.send to the sink topic upon context.forward.
What criteria should I use to be able to resend the message to an alternate sink topic if there was an exception sending to the original sink topic? Could this be deduced from the type of exception in the producer exception handler, without compromising the transactional nature of the internal producer in Kafka Streams?
If we decide to produce to a dead letter queue from the production exception handler on some unrecoverable errors, could this be done within the context of the EOS guarantee, or does it have to be a custom producer not known to the topology?
Kafka Streams has no built-in support for a dead-letter queue. Hence, you are "on your own" to implement it.
What criteria should I use to be able to resend the message to an alternate sink topic if there was an exception sending to the original sink topic?
Not sure what you mean by this? Can you elaborate?
Could this be deduced from the type of exception in the producer exception handler
Also not sure about this part.
without compromising the transactional nature of the internal producer in Kafka Streams.
That is not possible. You have no access to the internal producer.
If we decide to produce to a dead letter queue from the production exception handler on some unrecoverable errors, could this be done within the context of the EOS guarantee, or does it have to be a custom producer not known to the topology?
You would need to maintain your own producer, and thus it's out of scope for EOS.
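
For illustration, a rough sketch of that "own producer" approach: a ProductionExceptionHandler that copies the failed record to a dead-letter topic with a separate, plain KafkaProducer. Because this producer is not the internal transactional one, the DLQ write is outside EOS (it can be duplicated or lost on failure). The config key "dead.letter.topic" and the default topic name are assumptions.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;
    import org.apache.kafka.streams.errors.ProductionExceptionHandler;

    public class DeadLetterProductionExceptionHandler implements ProductionExceptionHandler {

        private KafkaProducer<byte[], byte[]> dlqProducer;
        private String dlqTopic;

        @Override
        public void configure(Map<String, ?> configs) {
            // both values are expected to be passed through the Streams config
            Object configured = configs.get("dead.letter.topic");
            this.dlqTopic = configured != null ? configured.toString() : "my-app-dlq";
            Map<String, Object> producerConfig = new HashMap<>();
            producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, configs.get("bootstrap.servers"));
            producerConfig.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
            producerConfig.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
            this.dlqProducer = new KafkaProducer<>(producerConfig);
        }

        @Override
        public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record,
                                                         Exception exception) {
            // inspect the exception type here to decide what counts as unrecoverable
            dlqProducer.send(new ProducerRecord<>(dlqTopic, record.key(), record.value()));
            return ProductionExceptionHandlerResponse.CONTINUE;  // or FAIL for fatal errors
        }
    }

The handler would then be registered via default.production.exception.handler in the Streams configuration.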

Publish messages that could not be de-serialized to DLT topic

I do not understand how messages that could not be deserialized can be written to a DLT topic with Spring Kafka.
I configured the consumer according to the Spring Kafka docs, and this works well for exceptions that occur after deserialization of the message.
But when the message is not deserializable, an org.apache.kafka.common.errors.SerializationException is thrown while polling for messages.
Subsequently, SeekToCurrentErrorHandler.handle(Exception thrownException, List<ConsumerRecord<?, ?>> records, ...) is called with this exception but with an empty list of records, and is therefore unable to write anything to the DLT topic.
How can I write those messages to the DLT topic as well?
The problem is that the exception is thrown by the Kafka client itself so Spring doesn't get to see the actual record that failed.
That's why we added the ErrorHandlingDeserializer2, which can be used to wrap the actual deserializer; the failure is passed to the listener container and re-thrown as a DeserializationException.
See the documentation.
When a deserializer fails to deserialize a message, Spring has no way to handle the problem, because it occurs before the poll() returns. To solve this problem, version 2.2 introduced the ErrorHandlingDeserializer2. This deserializer delegates to a real deserializer (key or value). If the delegate fails to deserialize the record content, the ErrorHandlingDeserializer2 returns a null value and a DeserializationException in a header that contains the cause and the raw bytes. When you use a record-level MessageListener, if the ConsumerRecord contains a DeserializationException header for either the key or value, the container’s ErrorHandler is called with the failed ConsumerRecord. The record is not passed to the listener.
The DeadLetterPublishingRecoverer has logic to detect the exception and publish the failed record.
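
Here is a sketch of one way to wire this together (spring-kafka 2.3-era API assumed; the delegate deserializer and bean names are illustrative). Records that cannot be deserialized, as well as records whose listener fails, end up on "<original topic>.DLT":

    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.springframework.boot.autoconfigure.kafka.KafkaProperties;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
    import org.springframework.kafka.core.ConsumerFactory;
    import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
    import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
    import org.springframework.kafka.support.serializer.ErrorHandlingDeserializer2;

    @Configuration
    public class DltConfig {

        @Bean
        public ConsumerFactory<String, String> consumerFactory(KafkaProperties properties) {
            Map<String, Object> props = properties.buildConsumerProperties();
            // ErrorHandlingDeserializer2 wraps the real value deserializer
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer2.class);
            props.put(ErrorHandlingDeserializer2.VALUE_DESERIALIZER_CLASS, StringDeserializer.class);
            return new DefaultKafkaConsumerFactory<>(props);
        }

        @Bean
        public SeekToCurrentErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
            // publishes failed records (including DeserializationExceptions) to "<topic>.DLT"
            return new SeekToCurrentErrorHandler(new DeadLetterPublishingRecoverer(template));
        }

        @Bean
        public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
                ConsumerFactory<String, String> consumerFactory,
                SeekToCurrentErrorHandler errorHandler) {
            ConcurrentKafkaListenerContainerFactory<String, String> factory =
                    new ConcurrentKafkaListenerContainerFactory<>();
            factory.setConsumerFactory(consumerFactory);
            factory.setErrorHandler(errorHandler);
            return factory;
        }
    }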

Not committing a message from Apache Kafka if exception occurs while operations in Apache Flink

Sometimes, while processing and during the sink, I get a few exceptions because of the network. I want these messages not to be committed, so that they can be reprocessed at some point in the future.
Is there any way to achieve such functionality where I could exclude that individual message from the offset commit, so that it can be processed at some point in the future?
One solution I am currently following is to sink those messages to another topic, which we process later.
If any exception occurs during processing of a message, the task will be restarted until this message has eventually been processed. Offsets are only committed for messages that have been fully processed.
So if you don't change anything about the error handling in source and sink, you will get exactly your desired behavior (also known as the at-least-once guarantee).
Btw, I recommend fixing your tags.
Your approach of writing messages with errors during processing to a "dead letter queue" is a common and useful pattern. It's also pretty simple and straightforward. Don't change anything.
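
For reference, a sketch of a side-output variant of that pattern in Flink (the Event type, parse() and the topic wiring are placeholders): records that fail inside a ProcessFunction are routed to a side output and can be sunk to a separate error topic, while the main stream keeps flowing.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class DeadLetterRouting {

        // side output carrying the raw inputs that failed processing
        static final OutputTag<String> DEAD_LETTERS = new OutputTag<String>("dead-letters") {};

        public static DataStream<String> wire(DataStream<String> input) {
            SingleOutputStreamOperator<Event> parsed = input.process(
                    new ProcessFunction<String, Event>() {
                        @Override
                        public void processElement(String value, Context ctx, Collector<Event> out) {
                            try {
                                out.collect(parse(value));        // normal path
                            } catch (Exception e) {
                                ctx.output(DEAD_LETTERS, value);  // failed path
                            }
                        }
                    });

            // sink the returned stream to your error topic, e.g. with a Kafka producer sink
            return parsed.getSideOutput(DEAD_LETTERS);
        }

        static Event parse(String value) { return new Event(); }  // placeholder
        static class Event { }
    }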

Alpakka Kafka: Serialization exception with schema registry breaks the stream

I am trying to figure out how to handle an exception caused by a faulty Avro message. I am currently getting
{"#timestamp":"2019-06-13T20:20:38.636+00:00","#version":1,"message":"[Incoming aggregation] Upstream failed.","logger_name":"akka.stream.Materializer","thread_name":"system-akka.actor.default-dispatcher-5","level":"ERROR","level_value":40000,"stack_trace":"org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition conversation-7 at offset 1737997. If needed, please seek past the record to continue consumption.\nCaused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 42\nCaused by: java.lang.ArrayIndexOutOfBoundsException: 51
As can be seen, this breaks the stream. I am not able to handle this in the decider, as it is part of the Consumer Source.
The docs say that I should read the stream as raw bytes and do the parsing manually in a Flow stage further down the processing chain. However, I don't think that is possible if I'm using the Schema Registry.
Can someone give me a hint on what the proper way to handle this is?
Thank you
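
For context, a rough, unverified sketch of the raw-bytes approach mentioned above, using the Alpakka Kafka Java DSL and running the Confluent KafkaAvroDeserializer manually inside a map stage (the bootstrap servers, schema registry URL, topic and group names are placeholders):

    import java.util.Collections;
    import java.util.Optional;

    import org.apache.kafka.common.errors.SerializationException;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import akka.actor.ActorSystem;
    import akka.kafka.ConsumerSettings;
    import akka.kafka.Subscriptions;
    import akka.kafka.javadsl.Consumer;
    import akka.stream.javadsl.Source;
    import io.confluent.kafka.serializers.KafkaAvroDeserializer;

    public class RawBytesConsumer {

        public static Source<Optional<Object>, Consumer.Control> source(ActorSystem system) {
            // configure the Avro deserializer by hand instead of letting the consumer own it
            KafkaAvroDeserializer avro = new KafkaAvroDeserializer();
            avro.configure(Collections.singletonMap("schema.registry.url", "http://localhost:8081"), false);

            ConsumerSettings<String, byte[]> settings =
                    ConsumerSettings.create(system, new StringDeserializer(), new ByteArrayDeserializer())
                            .withBootstrapServers("localhost:9092")
                            .withGroupId("my-group");

            return Consumer.plainSource(settings, Subscriptions.topics("conversation"))
                    .map(record -> {
                        try {
                            return Optional.of(avro.deserialize(record.topic(), record.value()));
                        } catch (SerializationException e) {
                            return Optional.empty();  // log or dead-letter the raw bytes here
                        }
                    });
        }
    }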

Kafka ktable corrupt message handling

We are using KTable for aggregation in Kafka. It's a very basic use case, taken from the Kafka docs.
I am just trying to investigate how, if consuming some message fails while aggregating, we can move such a message to an error topic or DLQ.
I found something similar for KStream but was not able to find it for KTable, and I was not able to simply extend the KStream solution to KTable.
Reference for KStream
Handling bad messages using Kafka's Streams API
My use case is very simple: for any kind of exception, just move the message to an error topic and move on to a different message.
There is no built-in support for what you ask at the moment (Kafka 2.2); you need to make sure that your application code does not throw any exceptions. All handlers that can be configured are for exceptions thrown by the Kafka Streams runtime. Those handlers are provided because otherwise the user would have no chance at all to react to those exceptions.
Feel free to create a feature-request Jira.
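
To illustrate the "don't throw from your own code" part, here is a sketch of guarding the records before the aggregation and branching the unparsable ones to an error topic (topic names, isParsable() and the count aggregation are placeholders for your own logic):

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    public class GuardedAggregation {

        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> input = builder.stream("input-topic");

            // branch records before the KTable aggregation: good ones first, the rest to DLQ
            @SuppressWarnings("unchecked")
            KStream<String, String>[] branches = input.branch(
                    (key, value) -> isParsable(value),   // records the aggregation can handle
                    (key, value) -> true);               // everything else

            branches[1].to("error-topic");               // bad records keep their original payload

            KTable<String, Long> counts = branches[0]
                    .groupByKey()
                    .count();
            counts.toStream().to("counts-topic", Produced.with(Serdes.String(), Serdes.Long()));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "guarded-aggregation");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            new KafkaStreams(builder.build(), props).start();
        }

        // hypothetical guard: wrap your real parsing/validation in a try/catch
        private static boolean isParsable(String value) {
            try {
                return value != null && !value.isEmpty();
            } catch (Exception e) {
                return false;
            }
        }
    }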