Kakfa retries Concept - What Basis retries will be stopped in Kafka? - apache-kafka

As am new to Kafka , trying to understand the retries concept in Kafka . What basis retries process will be completed ?
Example Retries parameter we set as 7 . Now questions here ,
Kafka will be retried in all 7 times ?
Will be tried until successful process ? If so , How Kafka will come to know about successful ?
If that would be depends upon any parameter what Is that parameter and how ?

In distributed systems, retries are inevitable. From network errors to replication issues and even outages in downstream dependencies, services operating at a massive scale must be prepared to encounter, identify, and handle failure as gracefully as possible.
Kafka will retry until the initiated process is successfully completed or retry count is zero.
Kafka maintains the status of each API call ( producer , consumer, and Streams ), and if the error condition meets then retry count is decreased.
Please go through the completeBatch function of the Sender.java in the following URL to get more information.
https://github.com/apache/kafka/blob/68ac551966e2be5b13adb2f703a01211e6f7a34b/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java

I guess you are talking about producer retrying to send failed messages.
From kafka producer retries property documentation -
"Setting a value greater than zero will cause the client to resend any
record whose send fails with a potentially transient error."
This means that kafka producer will retry if the error it encountered is considered "Retriable". not all errors are retriable - for example, if the target kafka topic does not exist, theres no point in trying to send the message again.
but if for example the connection was interrupted, it makes sense to try again.
Important to note - retries are only relevant if you have set broker ack != 0.
So, in your example you have 7 retries configured.
I assume that ack is set to a value different than 0 because then no retries will be attempted.
If your message failed with a non-retriable error, Kafka producer will not try to send the message again (it will actually 'give-up' on that message and move on to next messages).
If your message failed with a retriable error, Kafka producer will retry sending until message is successfully sent, or until retries are exhausted (when 7 retries were attempted and none of them succeeded).
Kafka client producer knows when your message was successfully sent to broker because when ack is set to 1\all, the kafka broker is "Acknowledging" any message received and informs the producer (in a kind of handshake between the producer and broker).
see acks & retries # https://kafka.apache.org/documentation/#producerconfigs

Kafka reties happens for transient exceptions such as NotEnoughReplicaException.
In Kafka version <=2.0 default retry is 0.
In Kafka version > 2.0 default retry is Integer.MAX
From kafka 2.1 retries are bounded to timeouts, there are couple of producer configuration such as.
delivery.timeout.ms=120000ms - by default producer will retry for 2 mins, if retry is not successful after 2 mins the request will not send to broker and we have to handle manually.
retry.backoff.ms=100ms - by default every 100ms producer will retry till delivery.timeout reaches.

Related

Kafka Producer Retry and Failed record handling

My requirement as follows -
apart from broker metadata related error -I try to simulate a RecordTooLargeException while sending the message to the Kafka Topic.
For the producer configuration I add acks: all and retries: 5
Also I use addCallback method to send the message.
I received org.apache.kafka.common.errors.RecordTooLargeException: The message is 2000103 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
but I did not notice any retry ( 5 times ) in the log.
My requirement is retry 5 times , then marked the record as permanent failure and send back to the call back handler - for further reprocess the failed record( ex. send to DLT or DB)
How can I achieve this kind of retry and handling?
It's simple. As per theory KAFKA Producer API doesn't retry on RecordTooLargeException, that means it is a non-retriable exception. If you still want to break this and retry irrespectively, then you can catch that Exception string through the Search String when error returned from the broker and retry from the catch block as many as times you want.
KafkaProducer has two types of errors. Retriable errors are those that can be resolved by sending the message again. For example, a connection error can be resolved because the connection may get reestablished. A “not leader for partition” error can be resolved when a new leader is elected for the partition and the client metadata is refreshed. KafkaProducer can be configured to retry those errors automatically, so the application code will get retriable exceptions only when the number of retries was exhausted and the error was not resolved. Some errors will not be resolved by retrying — for example, “Message size too large.” In those cases, KafkaProducer will not attempt a retry and will return the exception immediately.
-- Kafka: The Definitive Guide 2nd Edition, Chapter 3
RecordTooLargeException is a non-retriable exception, retrying makes no sense if the max.request.size configuration does not change. Therefore, Kafka producer will not attempt a retry and will return the exception immediately. The callback handler will be triggered for further reprocess.

How does the retry logic works in kafka producers?

How does the retry logic works in producers ?
I checked the producer config documentation related to retry but could not understand clearly?
Please simplify and help me understand. Thanks
The Producer config property retries defaults to 0 and is the retry
count if Producer does not get an ack from Kafka Broker. The Producer
will only retry if record send fail is deemed a transient error (API).
The Producer will act as if your producer code resent the record on a
failed attempt. Note that timeouts are re-tried, but retry.backoff.ms
(default to 100 ms) is used to wait after failure before retrying the
request again. If you set retry > 0, then you should also set
max.in.flight.requests.per.connection to 1, or there is the
possibility that a re-tried message could be delivered out of order.
You have to decide if out of order message delivery matters for your
application.
For more details, refer here

Artemis - How to avoid TransactionRolledBackException for Non-Transactional session

I use live/backup with shared-storage, and I use a non-transacted JMS session. I always send one message, and I always receive one message then acknowledge and receive second message only after successful first acknowledge.
I got this exception in my non-transacted session:
Execution of JMS message listener failed. Caused by: [javax.jms.TransactionRolledBackException - AMQ219030: The transaction was rolled back on failover to a backup server]
javax.jms.TransactionRolledBackException: AMQ219030: The transaction was rolled back on failover to a backup server
at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollbackOnFailover(ClientSessionImpl.java:904)
at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:927)
at org.apache.activemq.artemis.jms.client.ActiveMQMessage.acknowledge(ActiveMQMessage.java:719)
It happens because the session was marked as "rollbackOnly". I got this state after the following steps:
I use Spring-JMS. Consumer session works 24/7 (infinite loop session.receive())
The Master Node crashed, then the Master node was restarted
After recovery (After a couple of hours), I sent a message to the queue. The consumer read the message and throw Exception on acknowledge(because was marked as rollback-only)
I read message again (this is not very bad for my task) but Redelivery Count has not been increased
My consumer code:
onMessage(Message message) {
if (redeliveryCount(message) > 0){
processAsDublicate(message); // It's not invoked - it is error in my business logic.
}
}
I migrated from another broker and and I thought not to change the client logic
Question:
How to avoid TransactionRolledBackException for Non-Transactional session? If this is not possible i should change consumer code?
Thank you in Advance
UPDATE AFTER ANSWER:
https://github.com/apache/activemq-artemis/tree/2.14.0/examples/features/ha/replicated-failback
This example is not suitable for my case - I don't have non-acknowledged messages. I got this state after the following steps: 1) Restart server 2) consume message 3) acknoledge message
We use a broker for ~30 applications (24/7) ~ 200 consumers in total
For example, on the weekend we restart the JMS Broker
Will all consumers start getting this exception after consume new messages
(They don't have non-acknowledged messages)
The TransactionRolledBackException is expected as you can see in the replicated-failback example.
To prevent a consumer from receiving the same message more times, an idempotent consumer must be implemented, ie Apache Camel provides an Idempotent consumer component that would work with any JMS provider, see: http://camel.apache.org/idempotent-consumer.html

Spring Kafka Stream - Unacknowledged message with no error

I am using #StreamListener to consume the Kafka message.
I have set autoCommitOffset to false and autoCommitOnError to false.
I am sending all failed message to DLQ topic as well for maxAttempt for failure. I have a question while testing the changes.
What will happen if I am not acknowledging the consumed message and also not throwing any error ? Will Kafka send the message automatically after sometime ?
when i throw error, replay kicks in and it does retry till my maxAttempt configuration and the failed message goes to DLQ topic.
Let me know if Kafka support retry if the consumer not throwing any error and not acknowledging the message.
What will happen if I am not acknowledging the consumed message and also not throwing any error ? Will Kafka send the message automatically after sometime ?
No; not unless you process no further messages, and even then, you will only get a redelivery after you restart the application.
Kafka doesn't "acknowledge" discrete messages; it just stores the last processed offset within a partition.

Kafka Producer is not retring when Broker is Down

I have setup up Kafka using version 0.9 with the basic configuration as
1 Broker 1 Topic and 1 Partition.
Below are Producer Configurations that I have added to enable the retry from Producer.
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.RETRIES_CONFIG, 5);
props.put(ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG, 500);
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 500);
props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, 50);
I understand from the documents that
Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error.
Both my Broker & Zookeeper are down and the retry operation is not working.
ERROR o.s.k.s.LoggingProducerListener - Exception thrown when sending a message to topic TestTopic1|
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 500 ms.
I need to know if I am missing anything here for the retry to work.
Resend (retry) works only if you have connection to the Broker and something happened during sending a message.
So, if your Broker is dead, there is no any reason to send message at all - no connection. And that is an exception about.
I think retries should work anyway, even if the broker is down. This is the whole reason to have retries in the first place. Could be a temporary network issue after all.
There is a bug in the Kafka 0.9.0.1 producer which causes retries not to work. See here.
Fixed in 0.9.0.2 (which is not released yet) and 0.10. I'd upgrade the broker to 0.10 and try again.
As #artem answered Kafka producer config is not designed to retry when broker is down. It only retries during transient errors which is pretty much useless to be honest. It beats me why spring-Kafka did not take care of it.
Anyways to solve the situation I handled this with #Retry config with springboot. Checkin this SO answer for details : https://stackoverflow.com/a/65248428/6621377