Kafka Streams shuts down if sink node/topic not reachable? - apache-kafka

I want to test the scenario where a Kafka Streams application using the Processor API reads from a source topic and writes to a list of topics, and one or two of those topics are not reachable (failure test: I am trying to simulate it by adding one or two sink topics that do not exist in the cluster).
topology.addSource("mysource", "source_topic");
topology.addProcessor("STREAM_PROCESSOR", () -> new SourceProcessor(), "mysource");
topology.addSink("SINK_1", "TOPIC_1", "STREAM_PROCESSOR");
topology.addSink("SINK_2", "TOPIC_2", "STREAM_PROCESSOR");
topology.addSink("SINK_3", "TOPIC_3", "STREAM_PROCESSOR"); // TOPIC_3 is not present in the cluster

// inside the processor, each record is forwarded to one of the sinks:
sourceContext.forward(eventName, eventMessage, To.child("SINK_1" /* or "SINK_2" or "SINK_3" */));
My understanding is that Kafka Streams should raise an error for the topic that is not present and continue forwarding records to TOPIC_1 and TOPIC_2, which do exist. But instead it fails with the following error:
Exception in thread "StreamProcessor-56da56e4-4ab3-4ca3-bf48-b059558b689f-StreamThread-1"
org.apache.kafka.streams.errors.StreamsException:
task [0_0] Abort sending since an error caught with a previous record (timestamp 1592940025090) to topic "TOPIC_X" due to
org.apache.kafka.common.errors.TimeoutException: Topic "TOPIC_X" not present in metadata after 60000 ms.
Timeout exception caught when sending record to topic "TOPIC_X". This might happen if the producer cannot send data to the
Kafka cluster and thus, its internal buffer fills up. This can also happen if the broker is slow to respond, if the network connection to the
broker was interrupted, or if similar circumstances arise. You can increase producer parameter `max.block.ms` to increase this timeout.
Is this the correct way of simulating non-reachable topics or topic-not-present issues? Also, why does Kafka Streams shut down with the above error even though we are handling Streams and topology exceptions?
Kafka Streams should not shut down if one of the sink topics is not available or reachable for some reason, right?
Kindly suggest.
When the above error occurs, I want to catch the StreamsException and forward the failed record to an error topic; however, Kafka Streams stops prematurely.
catch (StreamsException e) {
    // intended: forward the failed record to an error sink instead of crashing
    context.forward("", "", To.child("ERROR_SINK"));
}
Is this the expected behavior?
Refer: https://docs.confluent.io/current/streams/developer-guide/manage-topics.html#user-topics
Does this mean that a non-existent topic is not allowed as a sink node in a Kafka Streams topology? Please confirm.

It's by design that Kafka Streams shuts down if it cannot write into a sink topic. The reason is that, by default, Kafka Streams guarantees at-least-once processing semantics; if it could not write data to one sink topic but continued anyway, at-least-once processing would be violated because there would be data loss in that sink topic.
There is a production.exception.handler configuration that might help. It allows you to swallow certain exceptions when writing data into an output topic. However, note that this implies data loss on the corresponding topic.
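A minimal sketch of such a handler, assuming the StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG key and a hypothetical handler class that simply logs and skips failed writes (accepting data loss for those records):

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

public class LogAndContinueProductionHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record, Exception exception) {
        // log and skip the record so the StreamThread keeps running (data loss for this record)
        System.err.println("Failed to write to " + record.topic() + ": " + exception.getMessage());
        return ProductionExceptionHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(Map<String, ?> configs) { }
}

// register it in the Streams configuration:
props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueProductionHandler.class);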

Related

Kafka Streams Retry PAPI and dead letter

I am trying to implement retry logic within a Kafka Streams processor topology for the case where an exception occurs while producing to a sink topic.
I am using a custom ProductionExceptionHandler to catch exceptions that happen on "producer.send" to the sink topic upon context.forward.
What criteria should I use to be able to resend the message to an alternate sink topic if there was an exception sending to the original sink topic? Could this be deduced from the type of exception in the production exception handler, without compromising the transactional nature of the internal producer in Kafka Streams?
If we decide to produce to a dead-letter queue from the production exception handler on some unrecoverable errors, can this be done within the context of the "EOS" guarantee, or does it have to be a custom producer not known to the topology?
Kafka Streams has no built-in support for a dead-letter queue. Hence, you are "on your own" to implement it.
What criteria should I use to be able to resend the message to an alternate sink topic if there was an exception sending to the original sink topic?
Not sure what you mean by this? Can you elaborate?
Could this be deduced from the type of exception in the production exception handler
Also not sure about this part.
without compromising the transactional nature of the internal producer in Kafka Streams?
That is not possible. You have no access to the internal producer.
If we decide to produce to a dead-letter queue from the production exception handler on some unrecoverable errors, can this be done within the context of the "EOS" guarantee, or does it have to be a custom producer not known to the topology?
You would need to maintain your own producer and thus it's out-of-scope for EOS.
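A rough sketch under those constraints might look like the following. The handler class, the dead-letter-topic name, and the assumption that the config map passed to configure() contains bootstrap.servers are all hypothetical; the separate KafkaProducer used here is independent of the Streams-internal producer, so its writes are not covered by EOS:

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

public class DeadLetterProductionHandler implements ProductionExceptionHandler {

    private KafkaProducer<byte[], byte[]> dlqProducer;

    @Override
    public void configure(Map<String, ?> configs) {
        // build a standalone producer (not the Streams-internal one) from the Streams config
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, configs.get("bootstrap.servers"));
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        dlqProducer = new KafkaProducer<>(props);
    }

    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record, Exception exception) {
        if (exception instanceof RetriableException) {
            // transient error: fail the task so it can be retried after a restart
            return ProductionExceptionHandlerResponse.FAIL;
        }
        // unrecoverable error: park the raw bytes on a dead-letter topic and keep processing
        dlqProducer.send(new ProducerRecord<>("dead-letter-topic", record.key(), record.value()));
        return ProductionExceptionHandlerResponse.CONTINUE;
    }
}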

How to handle the exception when the DB is down while reading the message from a Kafka topic

In my Spring Boot application I am reading messages from a Kafka topic and saving them into HBase.
In case the DB is down and the message has already been consumed from the topic, how should I ensure that the message is not lost? Can someone share a code sample?
If your code encounters an error during the processing of a record, you, as the developer, are responsible for handling retries or error catching; spring-kafka can't capture errors outside of the Kafka API for you.
That being said, Kafka will not remove the record just because it's consumed until it fully expires off the topic. You should definitely set enable.auto.commit to false and commit your own offsets after a successful database action, at the expense of potentially duplicated records in HBase.
I would also like to point out that you should probably be using Kafka Connect, which is meant to integrate external systems to Kafka, not a plain consumer.
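A rough sketch of that commit-after-write pattern with the plain Kafka consumer API (the same idea applies to spring-kafka's manual ack mode); the topic name, group id, and saveToHBase() are placeholders:

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "hbase-writer");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");            // commit manually
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("input-topic"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
            saveToHBase(record.value());   // may throw while the DB is down
        }
        consumer.commitSync();             // only reached if all writes succeeded
    }
}

If saveToHBase() throws, the offsets are never committed, so the records are re-delivered after a restart or rebalance, at the cost of possible duplicates in HBase.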

When message processing fails, can the consumer put the message back on the same topic?

Suppose one of my programs is consuming messages from a Kafka topic. During processing of a message, the consumer accesses some DB. If the DB access fails for some reason, we do not want to abandon the message; we need to park it for later processing. In JMS, when message processing fails, the application container puts the message back on the queue, so it is not lost. In Kafka, once a message is received the offset increases and the next message comes. How do I handle this?
There are two approaches to achieve this.
Set the Kafka acknowledge mode to manual and, in case of error, terminate the consumer thread without committing the offset (if group management is enabled, a new consumer will be assigned after rebalancing is triggered and will poll the same batch).
The second approach is simpler: have an error topic and publish messages to it in case of any error, so you can consume or keep track of them later.
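A rough sketch of the second approach, assuming a plain Kafka consumer/producer pair; the topic names, process(), and errorProducer are placeholders:

void consumeWithErrorTopic(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> errorProducer) {
    consumer.subscribe(Collections.singletonList("input-topic"));
    while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
            try {
                process(record);   // e.g. the DB call that may fail
            } catch (Exception e) {
                // park the failed message on an error topic for later reprocessing
                errorProducer.send(new ProducerRecord<>("error-topic", record.key(), record.value()));
            }
        }
    }
}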

Is the consumer offset committed even when failing to post to the output topic in Kafka Streams?

If I have a Kafka Streams application that fails to post to a topic (because the topic does not exist), does it commit the consumer offset and continue, or will it loop on the same message until it can resolve the output topic? The application merely prints an error and otherwise runs fine, from what I can observe.
An example of the error when trying to post to topic:
Error while fetching metadata with correlation id 80 : {super.cool.test.topic=UNKNOWN_TOPIC_OR_PARTITION}
In my mind it would just spin on the same message until the issue is resolved, in order not to lose data? I could not find a clear answer on what the default behavior is. We haven't set auto-commit to off or anything like that; most of the settings are left at their defaults.
I am asking as we don't want to end up in a situation where the health check is fine (application is running while printing errors to log) and we are just throwing away tons of Kafka messages.
Kafka Streams will not commit the offsets in this case, as it provides at-least-once processing guarantees (in fact, it's not even possible to reconfigure Kafka Streams differently -- only stronger exactly-once guarantees are possible). Also, Kafka Streams always disables auto-commit on the consumer (and does not allow you to enable it), as it manages offset commits itself.
If you run with default settings, the producer should actually throw an exception and the corresponding thread should die -- you can get a callback when a thread dies by registering KafkaStreams#setUncaughtExceptionHandler().
You can also observe KafkaStreams#state() (or register a callback via KafkaStreams#setStateListener()). The state will go to ERROR if all threads are dead (note, there was a bug in older versions for which the state was still RUNNING in this case: https://issues.apache.org/jira/browse/KAFKA-5372).
Hence, the application should not be in a healthy state, and Kafka Streams will not retry the input message but stop processing; you would need to restart the client. On restart, it would re-read the failed input message and retry writing to the output topic.
If you want Kafka Streams to retry, you need to increase the producer config retries so that the producer does not throw an exception but retries writing internally. This may eventually "block" further processing if the producer's write buffer becomes full.
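A small sketch of the monitoring hooks mentioned above, using the older Thread.UncaughtExceptionHandler-style overload; the health-check hook is a hypothetical placeholder:

KafkaStreams streams = new KafkaStreams(topology, props);

// callback when a StreamThread dies
streams.setUncaughtExceptionHandler((thread, throwable) ->
        System.err.println("StreamThread " + thread.getName() + " died: " + throwable));

// flip the health check when the whole client is no longer running
streams.setStateListener((newState, oldState) -> {
    if (newState == KafkaStreams.State.ERROR || newState == KafkaStreams.State.NOT_RUNNING) {
        healthCheck.markUnhealthy();   // hypothetical health-check hook
    }
});

streams.start();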

Kafka-streams corrupt message-handling semantics

I notice that Kafka records have a CRC field. If a record in a log file becomes corrupted (e.g. a single bit in the middle of the message gets flipped), what would I expect to see in the Streams application in the case of:
The topic is replicated
The topic is not replicated
Since we are using Avro, I can imagine one of the following occurs:
Underlying infrastructure detects CRC error and sources it from another broker
The DeserializationExceptionHandler kicks in
Some other error occurs and the topology falls over or the message is skipped, according to policy
For CRC errors, an exception should be thrown in your Streams application when it tries to deserialize the records, and thus the DeserializationExceptionHandler kicks in.
In Kafka, all reads and writes are handled by the partition leader; follower brokers only replicate data passively in the background and don't serve any reads or writes from clients.
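A minimal sketch of how that handler policy is configured: LogAndContinueExceptionHandler skips records that fail deserialization, while the default LogAndFailExceptionHandler stops the application (the application id and bootstrap servers are placeholders):

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "corrupt-record-demo");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// skip records that fail deserialization instead of failing the application
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
        LogAndContinueExceptionHandler.class);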