Exception handling using Kafka rider in MassTransit - apache-kafka

In MassTransit while using transport like RabbitMQ when an exception is thrown, the message goes into queue queue_name_error. But using Kafka, there is no topic with _error suffix, nor similar queue on supporting transport. How to handle exceptions properly using Kafka with MassTransit, and where erroneous messages can be found?

Since Kafka (and Azure Event Hub) are essentially log files with a fancy API, there is no need for an _error queue, as there are no queues anyway. There are no dead letters either. So the built-in error handling of MassTransit that moves faulted messages to the _error doesn't apply (nor does it make sense).
You can use the retry middleware (UseMessageRetry, etc.) with topic endpoints, to handle transient exceptions. You can also log the offset of poison messages to deal with them. The offset doesn't change, the messages remain in the topic until the expiration is reached.

Related

Handling errors in MassTransit Kafka consumers

I came across this response from Chris regarding kafka exception handling in MT, and I was wondering - if I'd like to use Kafka DLQs for poison messages (as mentioned here or suggested there under Error Recovery), does it mean I will need to implement it on my own via the MT middleware (send to DLQ in an exception handling middleware)?

Kafka Streams Retry PAPI and dead letter

I am trying to implement a retry logic within kafka streams processor topology in the event there was an exception producing to a sink topic.
I am using a custom ProductionExceptionHandler to be able to catch exception that happen on "producer.send" to the sink topic upon context.forward
What criteria should I use to be able resend the message to an alternate sink topic if there was an exception sending to original sink topic. Could this be deduced from type of exception in producer exception handler without compromising the transactional nature of the internal producer in Kafka streams.
If we decide to produce to a dead letter queue from production exception handler in some unrecoverable errors, could this be done within the context of "EOS" guarantee or it has to be a custom producer not known to the topology.
Kafka Streams has not built-in support for dead-letter-queue. Hence, you are "on your own" to implement it.
What criteria should I use to be able resend the message to an alternate sink topic if there was an exception sending to original sink topic.
Not sure what you mean by this? Can you elaborate?
Could this be deduced from type of exception in producer exception handler
Also not sure about this part.
without compromising the transactional nature of the internal producer in Kafka streams.
That is not possible. You have no access to the internal producer.
If we decide to produce to a dead letter queue from production exception handler in some unrecoverable errors, could this be done within the context of "EOS" guarantee or it has to be a custom producer not known to the topology.
You would need to maintain your own producer and thus it's out-of-scope for EOS.

Spring Kafka Template send with retry based on different cases

I am using Spring Kafka's KafkaTemplate for sending message using the async way and doing proper error handing using callback.
Also, I have configured the Kafka producer to have maximum of retries (MAX_INTEGER).
However there maybe some errors which is related with avro serialization, but for those retry wouldn't help. So how can I escape those error without retries but for other broker related issues I want to do retry?
The serialization exception will occur before the message is sent, so the retries property is irrelevant in that case; it only applies when the message is actually sent.

Is consumer offset commited even when failing to post to output topic in Kafka Streams?

If I have a Kafka stream application that fails to post to a topic (because the topic does not exist) does it commit the consumer offset and continue, or will it loop on the same message until it can resolve the output topic? The application merely prints an error and runs fine otherwise from what I can observe.
An example of the error when trying to post to topic:
Error while fetching metadata with correlation id 80 : {super.cool.test.topic=UNKNOWN_TOPIC_OR_PARTITION}
In my mind it would just spin on the same message until the issue is resolved in order to not lose data? I could not find a clear answer on what the default behavior is. We haven't set autocommit to off or anything like that, most of the settings are set to the default.
I am asking as we don't want to end up in a situation where the health check is fine (application is running while printing errors to log) and we are just throwing away tons of Kafka messages.
Kafka Streams will not commit the offsets for this case, as it provides at-least-once processing guarantees (in fact, it's not even possible to reconfigure Kafka Streams differently -- only stronger exactly-once guarantees are possible). Also, Kafka Streams disables auto-commit on the consumer always (and does not allow you to enable it), as Kafka Streams manages committing offset itself.
If you run with default setting, the producer should actually throw an exception and the corresponding thread should die -- you can get a callback if a thread dies, by registering KafkaStreams#uncaughtExceptionHandler().
You can also observe KafkaStreams#state() (or register a callback KafkaStreams#setStateListener()). The state will go to DEAD if all threads are dead (note, there was a bug in older version for which the state was still RUNNING for this case: https://issues.apache.org/jira/browse/KAFKA-5372)
Hence, the application should not be in a healthy state and Kafka Streams will not retry the input message but stop processing and you would need to restart the client. On restart, it would re-read the failed input message an re-try to write to the output topic.
If you want Kafka Streams to retry, you need to increase the producer config reties to avoid that the producer throws an exception and retries writing internally. This may "block" further processing eventually if producer write buffer becomes full.

Kafka Consumes unprocessable messages - How to reprocess broken messages later?

We are implementing a Kafka Consumer using Spring Kafka. As I understand correctly if processing of a single message fails, there is the option to
Don't care and just ACK
Do some retry handling using a RetryTemplate
If even this doesn't work do some custom failure handling using a RecoveryCallback
I am wondering what your best practices are for that. I think of simple application exceptions, such as DeserializationException (for JSON formatted messages) or longer local storage downtime, etc. Meaning there is needed some extra work, like a hotfix deployment, to fix the broken application to be able to re-process the faulty messages.
Since losing messages (i. e. not processing them) is not an option for us, the only option left is IMO to store the faulty messages in some persistence store, e. g. another "faulty messages" Kafka topic for example, so that those events can be processed again at a later time and there is no need to stop event processing totally.
How do you handle these scenarios?
One example is Spring Cloud Stream, which can be configured to publish failed messages to another topic errors.foo; users can then copy them back to the original topic to try again later.
This logic is done in the recovery callback.
We have a use case where we can't drop any messages at all, even for faulty messages. So when we encounter a faulty message, we will send a default message in place of that faulty record and at the same time send the message to a failed-topic for retry later.