Flink deserialize Kafka message error doesn't ignore message - apache-kafka

I have a Flink (v1.15) pipeline implementing a custom AbstractDeserializationSchema.
When a bad message causes an exception in the deserialize(byte[] message) method, I catch the exception and simply return null.
According to the Flink docs (see below), returning null should cause Flink to ignore this message and move on to the next. My example doesn't; it reprocesses the message continuously.
Here is a sample of my deserializer. I did some debugging and it does indeed step into the if block and it returns null. The message is then simply reprocessed over and over.
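A minimal sketch of a deserializer along those lines (the Event type and the XML parsing step are placeholders, not the poster's actual code):

import java.io.IOException;
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;

// Sketch only: "Event" and parseXml(...) stand in for the real types and parsing logic.
public class EventDeserializationSchema extends AbstractDeserializationSchema<EventDeserializationSchema.Event> {

    public static class Event {
        public String payload;
    }

    @Override
    public Event deserialize(byte[] message) throws IOException {
        try {
            return parseXml(message);          // hypothetical XML parsing step
        } catch (Exception e) {
            // Bad record: return null and expect Flink to skip it (the behavior in question)
            return null;
        }
    }

    private Event parseXml(byte[] bytes) {
        // placeholder for the real XML parsing logic
        Event event = new Event();
        event.payload = new String(bytes);
        return event;
    }
}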
According to the official Flink docs (v1.13 and below) returning null should cause Flink to ignore this message.
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/
This excerpt below is from the above link (v1.13), and I noticed that from v1.14 / v1.15 onwards it has been removed (https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/).
This is the repeated flow of the task after the XML parsing error
Thank you

Related

Publish messages that could not be de-serialized to DLT topic

I do not understand how messages that could not be de-serialized can be written to a DLT topic with Spring Kafka.
I configured the consumer according to the Spring Kafka docs, and this works well for exceptions that occur after de-serialization of the message.
But when the message is not de-serializable, an org.apache.kafka.common.errors.SerializationException is thrown while polling for messages.
Subsequently, SeekToCurrentErrorHandler.handle(Exception thrownException, List<ConsumerRecord<?, ?>> records, ...) is called with this exception but with an empty list of records, and is therefore unable to write anything to the DLT topic.
How can I write those messages to the DLT topic as well?
The problem is that the exception is thrown by the Kafka client itself so Spring doesn't get to see the actual record that failed.
That's why we added the ErrorHandlingDeserializer2 which can be used to wrap the actual deserializer; the failure is passed to the listener container and re-thrown as a DeserializationException.
See the documentation.
When a deserializer fails to deserialize a message, Spring has no way to handle the problem, because it occurs before the poll() returns. To solve this problem, version 2.2 introduced the ErrorHandlingDeserializer2. This deserializer delegates to a real deserializer (key or value). If the delegate fails to deserialize the record content, the ErrorHandlingDeserializer2 returns a null value and a DeserializationException in a header that contains the cause and the raw bytes. When you use a record-level MessageListener, if the ConsumerRecord contains a DeserializationException header for either the key or value, the container’s ErrorHandler is called with the failed ConsumerRecord. The record is not passed to the listener.
The DeadLetterPublishingRecoverer has logic to detect the exception and publish the failed record.
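A rough configuration sketch of that setup, assuming a JsonDeserializer delegate and a KafkaTemplate bean (in recent Spring Kafka versions the class is named ErrorHandlingDeserializer):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.kafka.support.serializer.ErrorHandlingDeserializer2;
import org.springframework.kafka.support.serializer.JsonDeserializer;

public class DltConsumerConfig {

    // Wrap the real value deserializer so a failure surfaces as a DeserializationException
    // header instead of breaking poll(). The JsonDeserializer delegate is an assumption and
    // would still need its own config (e.g. a default target type).
    Map<String, Object> consumerProps(String bootstrapServers) {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer2.class);
        props.put(ErrorHandlingDeserializer2.VALUE_DESERIALIZER_CLASS, JsonDeserializer.class.getName());
        return props;
    }

    // Error handler that publishes failed records (including undeserializable ones) to a DLT.
    SeekToCurrentErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
        return new SeekToCurrentErrorHandler(new DeadLetterPublishingRecoverer(template));
    }
}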

Kafka Java API fails to produce a message but doesn't throw an exception

I'm using the Java KafkaProducer API to produce a Kafka message.
I'm calling the send method, which returns a Future, and then get to block and wait for the result.
I was expecting that when the producer failed to send a message it would throw an exception, but that's not happening.
I have just a log saying:
Error while fetching metadata with correlation id 218444 : {TOPIC=LEADER_NOT_AVAILABLE}
The message wasn't produced, but the API didn't return an error, so I didn't have the opportunity to take action and the message was lost.
How can I handle situations like this one?
Just to clarify, I'm not worried about the specific error because it was a temporary one. I'm worried because, reading the API documentation, it says that .send(message).get() will throw an exception when it fails to send a message, but that didn't happen.
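For reference, a sketch of the pattern being described, blocking on get() and also passing a callback so failures surface in both places (topic name and bootstrap address are placeholders):

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerErrorHandlingSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key", "value");
            try {
                // get() blocks and re-throws any failure that remains after the producer's retries
                RecordMetadata metadata = producer.send(record, (meta, exception) -> {
                    if (exception != null) {
                        // callback path: also invoked when the send ultimately fails
                        System.err.println("send failed (callback): " + exception);
                    }
                }).get();
                System.out.println("written to offset " + metadata.offset());
            } catch (ExecutionException e) {
                // a non-retriable error or exhausted retries (e.g. a timeout) ends up here
                System.err.println("send failed: " + e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}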

Not committing a message from Apache Kafka if an exception occurs during operations in Apache Flink

Sometimes, while processing and during the sink, I get a few exceptions because of the network. I want these messages not to be committed so that they can be reprocessed sometime in the future.
Is there any way to achieve such functionality, where I could exclude that individual message from the offset commit so that it can be processed sometime in the future?
One solution I am currently following is to sink those messages to another topic, which we process later.
If any exception occurs during processing of a message, the task will be restarted until this message has eventually been processed. Offsets are only committed for messages that have been fully processed.
So if you don't change anything about the error handling in the source and sink, you will get exactly your desired behavior (also known as the at-least-once guarantee).
Btw, I recommend you fix your tags.
Your approach of writing messages with errors during processing to a "dead letter queue" is a common and useful pattern. It's also pretty simple and straightforward. Don't change anything.
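A minimal sketch of that dead letter queue pattern in Flink, using a side output for records that fail processing (the transform step and the downstream topic wiring are assumptions):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class DeadLetterSideOutput {

    // Tag for the side output carrying raw records that could not be processed.
    static final OutputTag<String> DLQ_TAG = new OutputTag<String>("dlq") {};

    static SingleOutputStreamOperator<String> process(DataStream<String> input) {
        return input.process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                try {
                    out.collect(transform(value));   // transform(...) is a placeholder
                } catch (Exception e) {
                    ctx.output(DLQ_TAG, value);      // route the raw record to the DLQ side output
                }
            }
        });
    }

    static String transform(String value) {
        // placeholder for the real processing logic
        return value.toUpperCase();
    }

    // The caller can then do: process(input).getSideOutput(DLQ_TAG).sinkTo(dlqKafkaSink);
}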

Kafka KTable corrupt message handling

We are using KTable for aggregation in Kafka; it's a very basic use case and we have referred to the Kafka docs.
I am just trying to investigate how, if consumption of some message fails while aggregating, we can move such a message to an error topic or DLQ.
I found something similar for KStream but was not able to find anything for KTable, and I was not able to simply extend the KStream solution to KTable.
Reference for KStream
Handling bad messages using Kafka's Streams API
My use case is very simple: for any kind of exception, just move the message to an error topic and move on to the next message.
There is no built-in support for what you ask at the moment (Kafka 2.2), so you need to make sure that your application code does not throw any exceptions. All the handlers that can be configured are for exceptions thrown by the Kafka Streams runtime. Those handlers are provided because otherwise the user would have no chance at all to react to those exceptions.
Feel free to create a feature request Jira.
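One way to keep the application code from throwing, sketched here with assumed topic names and an assumed isValid check: validate records as a KStream first, route the bad ones to an error topic, and aggregate only the rest into the KTable.

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class KTableErrorRouting {

    static KTable<String, Long> build(StreamsBuilder builder) {
        KStream<String, String> raw = builder.stream("input-topic");   // topic name is an assumption

        // Records that would blow up the aggregation are diverted to an error topic instead.
        raw.filterNot((key, value) -> isValid(value)).to("error-topic");

        // Only valid records reach the aggregation, so the aggregator itself never throws.
        return raw.filter((key, value) -> isValid(value))
                  .groupByKey()
                  .count();
    }

    static boolean isValid(String value) {
        // placeholder validation; real code would check whatever makes the aggregation fail
        return value != null && !value.isEmpty();
    }
}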

kafka use-case for error topics

I'm trying to put a pipeline in place, and I just realized I don't really know why there will be an error and why there will be an error topic. There is some metadata that I will be counting on to be certain values, but other than that, is there anything that is a "typical" Kafka error? I'm not sure what the "typical" Kafka error topic is used for. This is specifically for a streams application. Thanks for any help.
One example of an error topic in a streaming environment would be that it contains messages that failed to abide by their contract. For example: if your incoming events are meant to be in a certain JSON format, your Spark application will first try to parse the event into a class that fits the event's JSON contract.
If it is in the right format, it is parsed and the app continues.
If it is in the incorrect format, the parsing fails and the JSON string is sent to the error topic.
Other use cases could be to send the event to an error topic to be processed at a later time, e.g. network issues connecting to other services.
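A compact sketch of that contract check in plain Java (the OrderEvent shape, topic name, and producer wiring are assumptions):

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ContractCheck {

    // Hypothetical contract: incoming events must parse into this shape.
    public static class OrderEvent {
        public String orderId;
        public long amount;
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Parse the raw event; on success continue processing, on failure forward it to the error topic.
    static void handle(String rawJson, KafkaProducer<String, String> producer) {
        try {
            OrderEvent event = MAPPER.readValue(rawJson, OrderEvent.class);
            System.out.println("parsed event for order " + event.orderId);   // stand-in for downstream processing
        } catch (Exception e) {
            producer.send(new ProducerRecord<>("error-topic", rawJson));     // topic name is an assumption
        }
    }
}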