PhpAmqpLib Getting 'PRECONDITION_FAILED - consumer ack timed out on channel 1', (0, 0), - celery

After adding 'delivery_mode' => 2, I started getting this error: Unrecoverable error: PreconditionFailed(406, 'PRECONDITION_FAILED - consumer ack timed out on channel 1', (0, 0), '') Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
The error is thrown intermittently, and each time the pod gets restarted.

Related

Kafka Producer Retry and Failed record handling

My requirement is as follows.
Apart from broker-metadata-related errors, I tried to simulate a RecordTooLargeException while sending a message to a Kafka topic.
In the producer configuration I set acks: all and retries: 5.
I also use the addCallback method when sending the message.
I received org.apache.kafka.common.errors.RecordTooLargeException: The message is 2000103 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
However, I did not see any retries (5 attempts) in the log.
My requirement is to retry 5 times, then mark the record as a permanent failure and pass it to the callback handler so the failed record can be processed further (e.g. sent to a DLT or a DB).
How can I achieve this kind of retry and failure handling?
It's simple. The Kafka producer API does not retry on RecordTooLargeException, because it is a non-retriable exception. If you still want to retry regardless, you can detect that exception when the send fails (for example, by matching on the exception message) and retry from the catch block as many times as you want.
KafkaProducer has two types of errors. Retriable errors are those that can be resolved by sending the message again. For example, a connection error can be resolved because the connection may get reestablished. A “not leader for partition” error can be resolved when a new leader is elected for the partition and the client metadata is refreshed. KafkaProducer can be configured to retry those errors automatically, so the application code will get retriable exceptions only when the number of retries was exhausted and the error was not resolved. Some errors will not be resolved by retrying — for example, “Message size too large.” In those cases, KafkaProducer will not attempt a retry and will return the exception immediately.
-- Kafka: The Definitive Guide 2nd Edition, Chapter 3
RecordTooLargeException is a non-retriable exception; retrying makes no sense if the max.request.size configuration does not change. Therefore, the Kafka producer will not attempt a retry and will return the exception immediately. The callback handler is invoked with that exception, and the record can be processed further from there.
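As a rough illustration of that callback-side handling, here is a minimal sketch that routes failed records to a dead-letter list. It is deliberately library-free so it compiles on its own: `RetriableError` and `RecordTooLargeError` are hypothetical stand-ins for Kafka's retriable exceptions and `RecordTooLargeException`, and `FailedRecordRouter` with its in-memory lists is an assumed name and storage, not part of any Kafka API. In a real application the same branching would sit inside the producer's send callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a retriable Kafka error (e.g. a timeout).
class RetriableError extends RuntimeException {
    RetriableError(String message) { super(message); }
}

// Hypothetical stand-in for Kafka's non-retriable RecordTooLargeException.
class RecordTooLargeError extends RuntimeException {
    RecordTooLargeError(String message) { super(message); }
}

// Assumed helper: collects outcomes the way a send callback might.
class FailedRecordRouter {
    final List<String> delivered = new ArrayList<>();
    final List<String> deadLetters = new ArrayList<>();

    // Same shape as a producer callback: a null exception means success.
    void onCompletion(String record, Exception exception) {
        if (exception == null) {
            delivered.add(record);
        } else if (exception instanceof RetriableError) {
            // The producer only reports a retriable error after its own retries ran out.
            deadLetters.add(record + " [retries exhausted: " + exception.getMessage() + "]");
        } else {
            // Non-retriable errors are reported immediately, with no retry attempts.
            deadLetters.add(record + " [permanent: " + exception.getMessage() + "]");
        }
    }
}
```

The entries in `deadLetters` are what would be forwarded to a DLT or written to a DB; retrying a too-large record unchanged would simply fail again.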

Kafka retries concept - On what basis are retries stopped in Kafka?

As I am new to Kafka, I am trying to understand its retries concept. On what basis does the retry process complete?
For example, suppose we set the retries parameter to 7. My questions are:
Will Kafka retry all 7 times?
Will it retry until the send succeeds? If so, how does Kafka know that it succeeded?
If that depends on some parameter, which parameter is it, and how does it work?
In distributed systems, retries are inevitable. From network errors to replication issues and even outages in downstream dependencies, services operating at a massive scale must be prepared to encounter, identify, and handle failure as gracefully as possible.
Kafka will retry until the operation completes successfully or the retry count reaches zero.
Kafka maintains the status of each API call (producer, consumer, and Streams), and each time the error condition is met, the retry count is decreased.
Please go through the completeBatch function of Sender.java at the following URL for more information.
https://github.com/apache/kafka/blob/68ac551966e2be5b13adb2f703a01211e6f7a34b/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java
I guess you are talking about the producer retrying to send failed messages.
From the Kafka producer retries property documentation:
"Setting a value greater than zero will cause the client to resend any
record whose send fails with a potentially transient error."
This means that the Kafka producer will retry only if the error it encountered is considered "retriable". Not all errors are retriable: for example, if the target Kafka topic does not exist, there's no point in trying to send the message again.
But if, for example, the connection was interrupted, it makes sense to try again.
Important to note: retries are only relevant if you have set acks != 0.
So, in your example you have 7 retries configured.
I assume that acks is set to a value other than 0, because with acks=0 no retries will be attempted.
If your message failed with a non-retriable error, the Kafka producer will not try to send the message again (it will 'give up' on that message and move on to the next messages).
If your message failed with a retriable error, the Kafka producer will retry sending until the message is successfully sent or until the retries are exhausted (7 retries were attempted and none of them succeeded).
The Kafka producer knows when your message was successfully sent to the broker because, when acks is set to 1 or all, the Kafka broker acknowledges each message it receives and informs the producer (a kind of handshake between the producer and the broker).
See acks and retries: https://kafka.apache.org/documentation/#producerconfigs
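To make the stopping rules above concrete, here is a small model of the retry decision. It is only a sketch and not the producer's actual implementation: `RetryModel`, `Outcome`, and `attemptsMade` are names made up for illustration, the broker is simulated by a `Supplier`, and time-based limits are ignored entirely.

```java
import java.util.function.Supplier;

// Illustrative model only; not Kafka code.
class RetryModel {
    enum Outcome { SUCCESS, RETRIABLE_ERROR, NON_RETRIABLE_ERROR }

    // Returns the total number of send attempts (the first attempt plus any retries).
    static int attemptsMade(Supplier<Outcome> broker, int retries) {
        int attempts = 0;
        while (true) {
            attempts++;
            Outcome outcome = broker.get();
            if (outcome == Outcome.SUCCESS) {
                return attempts;              // acknowledged by the broker: stop
            }
            if (outcome == Outcome.NON_RETRIABLE_ERROR) {
                return attempts;              // give up immediately, no retry
            }
            if (attempts > retries) {
                return attempts;              // retriable error, but retries are exhausted
            }
        }
    }
}
```

With retries set to 7, a send that keeps failing with a retriable error is attempted 8 times in this model (1 initial attempt plus 7 retries), while a non-retriable error stops after the first attempt.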
Kafka retries happen for transient exceptions such as NotEnoughReplicasException.
In Kafka versions <= 2.0 the default value of retries is 0.
In Kafka versions > 2.0 the default value of retries is Integer.MAX_VALUE.
From Kafka 2.1, retries are bounded by timeouts. The relevant producer configurations are:
delivery.timeout.ms=120000 - by default the producer keeps retrying for up to 2 minutes; if the send has not succeeded after 2 minutes, the record is failed, no further attempts are made, and the application has to handle the failure itself.
retry.backoff.ms=100 - by default the producer waits 100 ms between retries, until delivery.timeout.ms is reached.
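A sketch of how these settings might be put together, using plain string keys in `java.util.Properties` so it needs no Kafka dependency. The class and method names (`ProducerRetryConfig`, `build`, `deliveryTimeoutIsValid`), the `localhost:9092` address, and the specific values are assumptions chosen for illustration. The check reflects the documented rule that delivery.timeout.ms should be greater than or equal to the sum of request.timeout.ms and linger.ms.

```java
import java.util.Properties;

// Assumed helper class; values are illustrative, not recommendations.
class ProducerRetryConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address (assumption)
        props.put("acks", "all");                   // wait for acknowledgement so retries apply
        props.put("retries", "5");                  // upper bound on retry attempts
        props.put("retry.backoff.ms", "100");       // pause between attempts
        props.put("request.timeout.ms", "30000");   // wait per request before it counts as failed
        props.put("linger.ms", "0");                // batching delay before a send
        props.put("delivery.timeout.ms", "120000"); // overall time budget for one record
        return props;
    }

    // delivery.timeout.ms should be >= request.timeout.ms + linger.ms.
    static boolean deliveryTimeoutIsValid(Properties props) {
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms"));
        long linger = Long.parseLong(props.getProperty("linger.ms"));
        return delivery >= request + linger;
    }
}
```

Whichever limit is hit first ends the retrying: the retries count or the delivery.timeout.ms budget.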

Kafka producer timeout exception occurs randomly

I am using the Kafka config below for one of my producers, and the functionality works fine.
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hostaddress:9092");
props.put(ProducerConfig.CLIENT_ID_CONFIG,"usertest");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, "3");
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1600);
But I get a timeout exception randomly: everything works for one to two hours, and then I suddenly get the following timeout exception for a few records.
In my test run, the producer sent around 20k messages and the consumer received 18978.
2019-09-24 13:45:43,106 ERROR c.j.b.p.UserProducer$1 [http-nio-8185-exec-13] Send failed for record ProducerRecord(topic=user_test_topic, partition=null, headers=RecordHeaders(headers = [], isReadOnly = false), key=UPDATE_USER, value=CreatePartnerSite [userid=3, name=user123, email=testuser#gmail.com, phone=1234567890]], timestamp=null)
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2019-09-24 13:45:43,107 ERROR c.j.b.s.UserServiceImpl [http-nio-8185-exec-13] failed to puplish
Try increasing the "max.block.ms" producer config to more than 60000 ms.
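For reference, max.block.ms bounds how long send() may block while waiting for metadata or buffer space, which is where the 60000 ms in the error message comes from. A minimal sketch of raising it on top of an existing configuration follows; `MaxBlockConfig` and `withMaxBlock` are hypothetical names, and the 120000 ms value in the usage note is only an example.

```java
import java.util.Properties;

// Hypothetical helper; copies the base config and overrides max.block.ms.
class MaxBlockConfig {
    static Properties withMaxBlock(Properties base, long maxBlockMs) {
        Properties props = new Properties();
        props.putAll(base);                                   // keep the existing settings
        props.put("max.block.ms", Long.toString(maxBlockMs)); // e.g. 120000 instead of 60000
        return props;
    }
}
```

Calling `MaxBlockConfig.withMaxBlock(props, 120000L)` returns a copy with the longer limit and leaves the original `props` untouched. If the broker is genuinely unreachable, a larger value only delays the exception, so it is worth checking connectivity as well.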

Kafka Streams: TimeoutException: Failed to update metadata after 60000 ms

I'm running a Kafka Streams application that consumes one topic with 20 partitions at a traffic rate of 15K records/sec. The application does 1-hour windowing and uses suppress to emit only the latest result per window. After running for some time, it starts getting TimeoutException and then the instance is marked down by Kafka Streams.
Error trace:
Caused by: org.apache.kafka.streams.errors.StreamsException:
task [2_7] Abort sending since an error caught with a previous record
(key 372716656751 value InterimMessage [xxxxx..]
timestamp 1566433307547) to topic XXXXX due to org.apache.kafka.common.errors.TimeoutException:
Failed to update metadata after 60000 ms.
You can increase producer parameter `retries` and
`retry.backoff.ms` to avoid this error.
I have already increased the values of those two configs:
retries = 5
retry.backoff.ms = 80000
Should I increase them again, as the error message suggests? What would be good values for these two settings?

Kafka Streams: Retries

Kafka version - 1.0.1
I am getting the following exception at random intervals. I tried increasing request.timeout.ms to 5 minutes, but it still times out at random intervals (every few hours). It's not clear why the exception arises. A restart recovers from where the application left off, but that requires a manual step. So I tried enabling retries, but that seems to have no effect, because I don't see any retries in the logs (meaning it fails, retries a first time, fails again, retries a second time, and so on until the max retries is reached). Can you please shed some light on the exception below and advise how we can let the Kafka Streams application continue to run when this exception occurs, perhaps by retrying? If we need to increase request.timeout.ms to the max value, what downsides should we be aware of? We do not want the thread to hang indefinitely when the broker fails.
props.put(ProducerConfig.RETRIES_CONFIG, 3);
2018-07-05 06:04:25 ERROR Housestream:91 - Unknown Exception occurred
org.apache.kafka.streams.errors.StreamsException: task [1_1] Abort sending since an error caught with a previous record (key GCB21K1X value [L#5e86f18a timestamp 1530783812110)
to topic housestream-digitstore-changelog due to org.apache.kafka.common.errors.TimeoutException: Expiring 201 record(s) for housestream-digitstore-changelog: 30144 ms has passed since last append.
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:118)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627)
at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:287)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:238)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 201 record(s) for housestream-digitstore-changelog: 30144 ms has passed since last append
I tried increasing the request timeout to the max integer value but ran into another timeout exception.
2018-07-05 12:22:15 ERROR Housestream:179 - Unknown Exception occurred
org.apache.kafka.streams.errors.StreamsException: task [1_0] Exception caught while punctuating processor 'validatequote'
at org.apache.kafka.streams.processor.internals.StreamTask.punctuate(StreamTask.java:267)
at org.apache.kafka.streams.processor.internals.PunctuationQueue.mayPunctuate(PunctuationQueue.java:54)
at org.apache.kafka.streams.processor.internals.StreamTask.maybePunctuateSystemTime(StreamTask.java:619)
at org.apache.kafka.streams.processor.internals.AssignedTasks.punctuate(AssignedTasks.java:430)
at org.apache.kafka.streams.processor.internals.TaskManager.punctuate(TaskManager.java:324)
at org.apache.kafka.streams.processor.internals.StreamThread.punctuate(StreamThread.java:969)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
Caused by: org.apache.kafka.streams.errors.StreamsException: task [1_1] Abort sending since an error caught with a previous record (key 32342 value com.int.digital.QUOTE#2c73fa63 timestamp 153083237883) to topic digital_quote due to org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms..
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:118)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:819)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:760)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:100)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:78)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:87)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:113)
at org.cox.processor.CheckQuote.handleTasks(CheckQuote.java:122)
at org.cox.processor.CheckQuote$1.punctuate(CheckQuote.java:145)
at org.apache.kafka.streams.processor.internals.ProcessorNode$4.run(ProcessorNode.java:131)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208)
at org.apache.kafka.streams.processor.internals.ProcessorNode.punctuate(ProcessorNode.java:134)
at org.apache.kafka.streams.processor.internals.StreamTask.punctuate(StreamTask.java:263)
... 8 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms.