Kafka Streams Retry PAPI and dead letter - apache-kafka

I am trying to implement retry logic within a Kafka Streams Processor API topology for the case where an exception occurs while producing to a sink topic.
I am using a custom ProductionExceptionHandler so that I can catch exceptions that happen on the producer's send to the sink topic upon context.forward.
What criteria should I use to be able to resend the message to an alternate sink topic if there was an exception sending to the original sink topic? Could this be deduced from the type of exception in the production exception handler without compromising the transactional nature of the internal producer in Kafka Streams?
If we decide to produce to a dead letter queue from the production exception handler on certain unrecoverable errors, could this be done within the context of the "EOS" guarantee, or does it have to be a custom producer not known to the topology?

Kafka Streams has no built-in support for a dead-letter queue. Hence, you are "on your own" to implement it.
What criteria should I use to be able to resend the message to an alternate sink topic if there was an exception sending to the original sink topic?
Not sure what you mean by this? Can you elaborate?
Could this be deduced from the type of exception in the production exception handler
Also not sure about this part.
without compromising the transactional nature of the internal producer in Kafka streams.
That is not possible. You have no access to the internal producer.
If we decide to produce to a dead letter queue from the production exception handler on certain unrecoverable errors, could this be done within the context of the "EOS" guarantee, or does it have to be a custom producer not known to the topology?
You would need to maintain your own producer and thus it's out-of-scope for EOS.
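For illustration, here is a rough sketch of that approach: a ProductionExceptionHandler that keeps its own plain producer for the DLQ. The class name, the "dlq.topic.name" config key, and the choice of RecordTooLargeException as the "unrecoverable" case are assumptions, not part of the question or answer. Because the DLQ producer is separate from the internal transactional producer, its writes are best-effort and outside EOS.

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

// Hypothetical handler: it owns a second, non-transactional producer and uses it to
// park unrecoverable records on a DLQ topic, then tells Streams to CONTINUE.
public class DlqProductionExceptionHandler implements ProductionExceptionHandler {

    private Producer<byte[], byte[]> dlqProducer;
    private String dlqTopic;

    @Override
    public void configure(final Map<String, ?> configs) {
        // "dlq.topic.name" is a made-up config key; pass it in via the Streams Properties.
        final Object topic = configs.get("dlq.topic.name");
        dlqTopic = topic != null ? topic.toString() : "my-app-dlq";

        final Properties props = new Properties();
        // Assumes the Streams config (including bootstrap.servers) is passed to configure().
        props.put("bootstrap.servers", configs.get("bootstrap.servers"));
        props.put("key.serializer", ByteArraySerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        dlqProducer = new KafkaProducer<>(props);
    }

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        // Route on exception type: treat non-retriable errors as "unrecoverable" and DLQ them,
        // let everything else FAIL so Streams shuts down instead of silently dropping data.
        if (exception instanceof RecordTooLargeException) {
            dlqProducer.send(new ProducerRecord<>(dlqTopic, record.key(), record.value()));
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
        return ProductionExceptionHandlerResponse.FAIL;
    }
}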

Related

Exception handling using Kafka rider in MassTransit

In MassTransit, when using a transport like RabbitMQ, a message that throws an exception goes into the queue_name_error queue. But with Kafka there is no topic with an _error suffix, nor a similar queue on the supporting transport. How should exceptions be handled properly when using Kafka with MassTransit, and where can erroneous messages be found?
Since Kafka (and Azure Event Hubs) are essentially log files with a fancy API, there is no need for an _error queue, as there are no queues at all. There are no dead letters either. So MassTransit's built-in error handling that moves faulted messages to the _error queue doesn't apply (nor does it make sense).
You can use the retry middleware (UseMessageRetry, etc.) with topic endpoints to handle transient exceptions. You can also log the offsets of poison messages to deal with them later; the offset doesn't change, and the messages remain in the topic until the retention period expires.

Kafka streams shuts down if sink node/topic not reachable?

I want to test the scenario where Kafka Streams, using the Processor API, reads from a source and writes to a list of topics, and one or two of those topics are not reachable (failure test: trying to simulate it by adding one or two topics that do not exist in the cluster):
topology.addSource("mysource", "source_topic");
topology.addProcessor("STREAM_PROCESSOR", () -> new SourceProcessor(), "mysource");
topology.addSink("SINK_1", "TOPIC_1", "STREAM_PROCESSOR");
topology.addSink("SINK_2", "TOPIC_2", "STREAM_PROCESSOR");
topology.addSink("SINK_3", "TOPIC_3", "STREAM_PROCESSOR"); // this topic is not present in the cluster

sourceContext.forward(eventName, eventMessage, To.child("SINK_1" /* or SINK_2 or SINK_3 */));
My understanding is that Kafka Streams should report an error for the topic that is not present and continue forwarding records to TOPIC_1 and TOPIC_2, which do exist.
But the behavior I see is that it throws the following error:
Exception in thread "StreamProcessor-56da56e4-4ab3-4ca3-bf48-b059558b689f-StreamThread-1"
org.apache.kafka.streams.errors.StreamsException:
task [0_0] Abort sending since an error caught with a previous record (timestamp 1592940025090) to topic "TOPIC_X" due to
org.apache.kafka.common.errors.TimeoutException: Topic "TOPIC_X" not present in metadata after 60000 ms.
Timeout exception caught when sending record to topic "TOPIC_X". This might happen if the producer cannot send data to the
Kafka cluster and thus, its internal buffer fills up. This can also happen if the broker is slow to respond, if the network connection to the
broker was interrupted, or if similar circumstances arise. You can increase producer parameter `max.block.ms` to increase this timeout.
Is this the correct way of simulating non-reachable topics or topic-not-present issues? Also, why does Kafka Streams shut down with the above error even though we are handling streams and topology exceptions?
Kafka Streams should not shut down if one of the sink topics is not available or reachable for some reason, right?
Kindly suggest.
On the above error, I want to catch the StreamsException and forward the failed record to an error topic; however, KafkaStreams stops prematurely:
catch (StreamsException e) {
    context.forward("", "", To.child(Error_Topic));
}
Is this expected behavior?
Refer to: https://docs.confluent.io/current/streams/developer-guide/manage-topics.html#user-topics
Does this mean that a non-existent topic is not allowed as a sink node in a Kafka Streams topology? Please confirm.
It's by design that Kafka Streams shuts down if it cannot write into a sink topic. The reason is that, by default, Kafka Streams guarantees at-least-once processing semantics; if it could not write data to one sink topic but continued anyway, at-least-once processing would be violated because data would be lost in that sink topic.
There is a default.production.exception.handler configuration that might help. It allows you to swallow certain exceptions when writing data into an output topic. However, note that this implies data loss on the corresponding topic.
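As a sketch, registering a custom handler looks roughly like this (app id, broker address, and the DlqProductionExceptionHandler class are placeholders; a handler returning CONTINUE swallows the write failure and accepts data loss on that sink topic, while FAIL keeps the default shutdown behavior):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class DlqStreamsConfig {
    // Builds a Streams config with a custom production exception handler registered.
    public static Properties build() {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Hypothetical handler class (see the DLQ handler sketch in the first answer above).
        props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
                  DlqProductionExceptionHandler.class.getName());
        return props;
    }
}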

Handle Deserialization Error (Dead Letter Queue) in a kafka consumer

After some research I found a few configurations I can use to handle this:
default.deserialization.exception.handler - from the StreamsConfig
errors.deadletterqueue.topic.name - from the SinkConnector config
I can't seem to find an equivalent configuration for a plain consumer.
I want to start a simple consumer and have DLQ handling, whether by just naming the DLQ topic and letting Kafka produce to it (like the sink connector) or by providing my own class that produces to it (like the Streams config).
How can I achieve a DLQ with a simple consumer?
EDIT: Another option I figured out is simply handling it in my Deserializer class: just catch the exception there and produce the record to my DLQ.
But that means I'll need to create a producer inside my deserializer class...
Is this the best practice for handling a DLQ from a consumer?
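One minimal sketch of a DLQ around a plain consumer, assuming you consume raw bytes and do the deserialization in application code (topic names, group id, and the parse()/process() helpers are invented): anything that fails to parse is sent to a DLQ topic with a separate producer, so no producer is needed inside the Deserializer itself.

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ConsumerWithDlq {

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "my-consumer-group");
        consumerProps.put("enable.auto.commit", "false");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<byte[], byte[]> dlqProducer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(Collections.singletonList("input-topic"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    try {
                        process(parse(record.value()));
                    } catch (Exception e) {
                        // Poison record: keep the raw bytes and park them on the DLQ topic.
                        dlqProducer.send(new ProducerRecord<>("input-topic.dlq", record.key(), record.value()));
                    }
                }
                consumer.commitSync(); // commit only after the batch was processed or parked
            }
        }
    }

    // Stand-ins for the application's real deserialization and business logic.
    private static String parse(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    private static void process(String event) {
        System.out.println("processed: " + event);
    }
}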

Is consumer offset commited even when failing to post to output topic in Kafka Streams?

If I have a Kafka Streams application that fails to post to a topic (because the topic does not exist), does it commit the consumer offset and continue, or will it loop on the same message until it can resolve the output topic? The application merely prints an error and otherwise runs fine, from what I can observe.
An example of the error when trying to post to topic:
Error while fetching metadata with correlation id 80 : {super.cool.test.topic=UNKNOWN_TOPIC_OR_PARTITION}
In my mind it would just spin on the same message until the issue is resolved, in order not to lose data. But I could not find a clear answer on what the default behavior is. We haven't turned auto-commit off or anything like that; most of the settings are at their defaults.
I am asking because we don't want to end up in a situation where the health check is fine (the application is running while printing errors to the log) and we are just throwing away tons of Kafka messages.
Kafka Streams will not commit the offsets in this case, as it provides at-least-once processing guarantees (in fact, it's not even possible to configure Kafka Streams with weaker guarantees; only the stronger exactly-once guarantee is available). Also, Kafka Streams always disables auto-commit on the consumer (and does not allow you to enable it), because Kafka Streams manages committing offsets itself.
If you run with default settings, the producer should actually throw an exception and the corresponding thread should die; you can get a callback when a thread dies by registering a handler via KafkaStreams#setUncaughtExceptionHandler().
You can also observe KafkaStreams#state() (or register a callback via KafkaStreams#setStateListener()). The state will go to DEAD if all threads are dead (note, there was a bug in older versions for which the state was still RUNNING in this case: https://issues.apache.org/jira/browse/KAFKA-5372).
Hence, the application should not be in a healthy state, and Kafka Streams will not retry the input message but stop processing; you would need to restart the client. On restart, it would re-read the failed input message and retry writing to the output topic.
If you want Kafka Streams to retry, you need to increase the producer config retries so that the producer retries the write internally instead of throwing an exception. This may eventually "block" further processing if the producer's write buffer fills up.
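A small sketch of wiring those callbacks and the producer retries (app id, topic names, and broker address are placeholders; the trivial topology is only there to make the snippet self-contained):

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class StreamsFailureMonitoring {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Let the internal producer retry writes instead of throwing right away.
        props.put(StreamsConfig.producerPrefix("retries"), Integer.MAX_VALUE);

        Topology topology = new Topology();
        topology.addSource("source", "input-topic");
        topology.addSink("sink", "output-topic", "source");

        KafkaStreams streams = new KafkaStreams(topology, props);

        // Called when a stream thread dies, e.g. after the producer gave up on a sink topic.
        streams.setUncaughtExceptionHandler((thread, throwable) ->
                System.err.println("Stream thread " + thread.getName() + " died: " + throwable));

        // Hook for a health check: react when the instance leaves RUNNING/REBALANCING.
        streams.setStateListener((newState, oldState) ->
                System.err.println("State change: " + oldState + " -> " + newState));

        streams.start();
    }
}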

Kafka ktable corrupt message handling

We are using a KTable for aggregation in Kafka; it's a very basic use case, and we followed the Kafka documentation.
I am just trying to investigate how, if consuming some message fails while aggregating, we can move such a message to an error topic or DLQ.
I found something similar for KStream, but not for KTable, and I was not able to simply extend the KStream solution to KTable.
Reference for KStream
Handling bad messages using Kafka's Streams API
My use case is very simple: for any kind of exception, just move the record to an error topic and move on to the next message.
There is no built-in support for what you ask at the moment (Kafka 2.2), so you need to make sure that your application code does not throw any exceptions. All handlers that can be configured are for exceptions thrown by the Kafka Streams runtime; those handlers are provided because otherwise the user would have no chance at all to react to those exceptions.
Feel free to create a feature-request Jira.
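A minimal sketch of that pattern, following the KStream approach referenced above (topic names and the isParsable() check are invented): validate records in the upstream KStream, branch the bad ones to an error topic, and only feed the clean ones into the KTable aggregation, so your own code never throws into the Streams runtime.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class SafeAggregation {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");

        // Branch 0: records that parse cleanly; branch 1: everything else.
        @SuppressWarnings("unchecked")
        KStream<String, String>[] branches = input.branch(
                (key, value) -> isParsable(value),
                (key, value) -> true);

        branches[1].to("error-topic"); // corrupt records go to the error topic / DLQ

        KTable<String, Long> counts = branches[0]
                .groupByKey()
                .count();
        counts.toStream().to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ktable-dlq-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }

    // Wrap whatever can throw in a try/catch so the exception never reaches the Streams runtime.
    private static boolean isParsable(String value) {
        try {
            Long.parseLong(value); // stand-in for the real deserialization/validation
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }
}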