I have read When does the Apache Kafka client throw a "Batch Expired" exception?, but I feel like I'm missing something.
For my program I don't need high throughput, so I've created a "blocking" write:
kafkaProducer.send(record).get();
Why would this give me a batch expired error sometimes?
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test2-0: 42084 ms has passed since batch creation plus linger time
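For reference, here is a fuller sketch of the blocking write (bootstrap servers, topic, and serializers are placeholders, not my exact setup):

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

try (KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(props)) {
    ProducerRecord<String, String> record = new ProducerRecord<>("test2", "key", "value");
    kafkaProducer.send(record).get(); // block until the broker acks (or the send fails)
} catch (ExecutionException e) {
    e.getCause().printStackTrace(); // this is where the TimeoutException above surfaces
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}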
My requirement is as follows:
Apart from broker-metadata-related errors, I am trying to simulate a RecordTooLargeException while sending a message to a Kafka topic.
For the producer configuration I set acks: all and retries: 5.
I also use the addCallback method to send the message.
I received org.apache.kafka.common.errors.RecordTooLargeException: The message is 2000103 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
However, I did not notice any retries (5 attempts) in the log.
My requirement is to retry 5 times, then mark the record as a permanent failure and hand it back to the callback handler for further reprocessing of the failed record (e.g. send it to a DLT or a DB).
How can I achieve this kind of retry and handling?
It's simple: the Kafka Producer API doesn't retry on RecordTooLargeException, which means it is a non-retriable exception. If you still want to override this behaviour and retry regardless, you can catch that exception (or match on its message) when the error is returned from the broker and resend from the catch block as many times as you want.
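A minimal sketch of that application-level retry loop (the method name and retry limit are illustrative; note that without raising max.request.size the resend will keep failing for the same record):

import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;

static void sendWithRetry(KafkaProducer<String, String> producer,
                          ProducerRecord<String, String> record,
                          int maxRetries) throws InterruptedException {
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            producer.send(record).get(); // block so the failure surfaces here
            return;                      // success
        } catch (ExecutionException e) {
            if (e.getCause() instanceof RecordTooLargeException && attempt < maxRetries) {
                continue; // retry from the catch block, as described above
            }
            throw new RuntimeException("Permanent failure after " + attempt + " attempt(s)", e.getCause());
        }
    }
}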
KafkaProducer has two types of errors. Retriable errors are those that can be resolved by sending the message again. For example, a connection error can be resolved because the connection may get reestablished. A “not leader for partition” error can be resolved when a new leader is elected for the partition and the client metadata is refreshed. KafkaProducer can be configured to retry those errors automatically, so the application code will get retriable exceptions only when the number of retries was exhausted and the error was not resolved. Some errors will not be resolved by retrying — for example, “Message size too large.” In those cases, KafkaProducer will not attempt a retry and will return the exception immediately.
-- Kafka: The Definitive Guide 2nd Edition, Chapter 3
RecordTooLargeException is a non-retriable exception; retrying makes no sense if the max.request.size configuration does not change. Therefore, the Kafka producer will not attempt a retry and will return the exception immediately. The callback handler will still be triggered, so you can do your reprocessing there.
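If the end goal is simply "hand permanent failures to my own reprocessing path", here is a hedged sketch of such a callback (the dead-letter topic name and second producer are hypothetical):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

static void sendWithDeadLetter(KafkaProducer<String, String> producer,
                               KafkaProducer<String, String> deadLetterProducer, // hypothetical
                               ProducerRecord<String, String> record) {
    producer.send(record, (metadata, exception) -> {
        if (exception == null) {
            return; // delivered successfully
        }
        // By the time any exception reaches this callback, the producer has either
        // exhausted its own retries (retriable errors) or given up immediately
        // (non-retriable errors such as RecordTooLargeException).
        deadLetterProducer.send(
                new ProducerRecord<>("my-topic.DLT", record.key(), record.value())); // hypothetical DLT
    });
}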
We have a source connector that reads from an RDBMS and writes to Kafka. It uses Schema Registry with an Avro schema.
I am finding the following exceptions in the Kafka Connect log and the Schema Registry log, respectively.
1.
Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:426)
WorkerSourceTask{id=A-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:443)
Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:186)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
...
Caused by: org.apache.kafka.connect.errors.DataException: Failed to serialize Avro data from topic A :
at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:91)
at org.apache.kafka.connect.storage.Converter.fromConnectData(Converter.java:63)
...
Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema:
...
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Register operation timed out; error code: 50002
...
Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:187)
Stopping JDBC source task (io.confluent.connect.jdbc.source.JdbcSourceTask:314)
Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer:1182)
2.
Wait to catch up until the offset at 1 (io.confluent.kafka.schemaregistry.storage.KafkaStore:304)
Request Failed with exception (io.confluent.rest.exceptions.DebuggableExceptionMapper:62)
io.confluent.kafka.schemaregistry.rest.exceptions.RestSchemaRegistryTimeoutException: Register operation timed out
at io.confluent.kafka.schemaregistry.rest.exceptions.Errors.operationTimeoutException(Errors.java:132)
...
Caused by: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Write to the Kafka store timed out while
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.register(KafkaSchemaRegistry.java:508)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.registerOrForward(KafkaSchemaRegistry.java:553)
...
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreTimeoutException: KafkaStoreReaderThread failed to reach target offset within the timeout interval. targetOffset: 3, offsetReached: 1, timeout(ms): 500
So basically, before registering a schema, Schema Registry catches up to the latest offset in its backing topic, and that is where it times out after 500 ms.
My questions are these:
How can I find out why it is not able to read from Kafka?
Does the source connector restart or poll data again for the failed task? I ask because in a later section of the log I see this:
Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:426)
WorkerSourceTask{id=A-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:443)
So earlier it failed right after this point, but now it is not printing the exception, which means it passed.
The key thing to note is that when the read failed earlier, only the task for connector A failed and the others passed. Later I did not find the exception for connector A again.
If the task is not restarting or the connector is not polling again, I need to restart the task using the REST API.
Any help will be greatly appreciated.
Thanks in advance.
Regarding your question title, read the error.
task will not recover until manually restarted
If you have more than one task, you would still expect to see logs from other tasks.
As far as offset commits go, source task offsets are not committed until the task succeeds, and none of the logs shown indicate anything "moving to latest".
The error has nothing to do with reading from Kafka. It is a timeout in the Schema Registry client used by the AvroConverter, which isn't required for Kafka Connect.
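Since the fix for a dead task is a manual restart, here is a sketch of that call against the Connect REST API (host and port are assumptions; the connector name A and task id 0 come from the WorkerSourceTask{id=A-0} log line). This assumes Java 11+ for java.net.http:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

HttpClient client = HttpClient.newHttpClient();
HttpRequest restart = HttpRequest.newBuilder()
        .uri(URI.create("http://connect-host:8083/connectors/A/tasks/0/restart")) // assumed host:port
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();
HttpResponse<String> response = client.send(restart, HttpResponse.BodyHandlers.ofString());
System.out.println(response.statusCode()); // expect a 2xx status on success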
I'm running a Kafka Streams application that consumes one topic with 20 partitions at a traffic rate of about 15K records/sec. The application does 1-hour windowing and suppresses the latest result per window. After running for some time, it starts getting a TimeoutException, and then the instance is marked down by Kafka Streams.
Error trace:
Caused by: org.apache.kafka.streams.errors.StreamsException:
task [2_7] Abort sending since an error caught with a previous record
(key 372716656751 value InterimMessage [xxxxx..]
timestamp 1566433307547) to topic XXXXX due to org.apache.kafka.common.errors.TimeoutException:
Failed to update metadata after 60000 ms.
You can increase producer parameter `retries` and
`retry.backoff.ms` to avoid this error.
I have already increased the values of those two configs:
retries = 5
retry.backoff.ms = 80000
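For reference, this is roughly how I pass them to the Streams app (a sketch; StreamsConfig.producerPrefix routes a config to the internal producer, and the app id / bootstrap servers are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowing-app");  // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), 5);
props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRY_BACKOFF_MS_CONFIG), 80000);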
Should I increase them again, as the error message suggests? What would be good values for these two configs?
Kafka version - 1.0.1
I am getting the following exception at random intervals. I tried increasing request.timeout.ms to 5 minutes, but it still continues to time out at random intervals (every few hours). It's not clear why the exception arises. Restarting recovers from where the application left off, but it is a manual task. I also tried enabling retries, but that seems to have no effect: I don't see any retries in the logs (that is, a failure, a first attempt, another failure, a second attempt, and so on up to the max retries).

Can you please shed some light on the exception below and advise on how we can let the Kafka Streams application keep running when it occurs, perhaps by retrying? And if we need to increase request.timeout.ms towards its maximum value, what downside should we be aware of? Presumably we should not let the thread hang indefinitely when the broker fails.
props.put(ProducerConfig.RETRIES_CONFIG, 3);
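For context, the relevant settings I have tried look roughly like this (a sketch; the values are the ones mentioned above):

Properties props = new Properties();
// ... application id, bootstrap servers, serdes ...
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 300000); // the 5 minutes mentioned above
props.put(ProducerConfig.RETRIES_CONFIG, 3);                 // seems to have no visible effect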
2018-07-05 06:04:25 ERROR Housestream:91 - Unknown Exception occurred
org.apache.kafka.streams.errors.StreamsException: task [1_1] Abort sending since an error caught with a previous record (key GCB21K1X value [L#5e86f18a timestamp 1530783812110)
to topic housestream-digitstore-changelog due to org.apache.kafka.common.errors.TimeoutException: Expiring 201 record(s) for housestream-digitstore-changelog: 30144 ms has passed since last append.
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:118)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627)
at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:287)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:238)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 201 record(s) for housestream-digitstore-changelog: 30144 ms has passed since last append
I tried increasing the request timeout to the maximum integer value, but then ran into another TimeoutException:
2018-07-05 12:22:15 ERROR Housestream:179 - Unknown Exception occurred
org.apache.kafka.streams.errors.StreamsException: task [1_0] Exception caught while punctuating processor 'validatequote'
at org.apache.kafka.streams.processor.internals.StreamTask.punctuate(StreamTask.java:267)
at org.apache.kafka.streams.processor.internals.PunctuationQueue.mayPunctuate(PunctuationQueue.java:54)
at org.apache.kafka.streams.processor.internals.StreamTask.maybePunctuateSystemTime(StreamTask.java:619)
at org.apache.kafka.streams.processor.internals.AssignedTasks.punctuate(AssignedTasks.java:430)
at org.apache.kafka.streams.processor.internals.TaskManager.punctuate(TaskManager.java:324)
at org.apache.kafka.streams.processor.internals.StreamThread.punctuate(StreamThread.java:969)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
Caused by: org.apache.kafka.streams.errors.StreamsException: task [1_1] Abort sending since an error caught with a previous record (key 32342 value com.int.digital.QUOTE#2c73fa63 timestamp 153083237883) to topic digital_quote due to org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms..
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:118)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:819)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:760)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:100)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:78)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:87)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:113)
at org.cox.processor.CheckQuote.handleTasks(CheckQuote.java:122)
at org.cox.processor.CheckQuote$1.punctuate(CheckQuote.java:145)
at org.apache.kafka.streams.processor.internals.ProcessorNode$4.run(ProcessorNode.java:131)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208)
at org.apache.kafka.streams.processor.internals.ProcessorNode.punctuate(ProcessorNode.java:134)
at org.apache.kafka.streams.processor.internals.StreamTask.punctuate(StreamTask.java:263)
... 8 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms.
I have a two-broker Kafka 1.0.0 cluster and I am running a Kafka Streams 1.0.0 application against it. I increased the producer request.timeout.ms to 5 minutes to fix a producer TimeoutException.
Currently I am getting the two types of exceptions below after running for some time. I tried fixing them as suggested in "Apache Kafka: TimeoutException and then nothing works", but the solution there was incomplete. Is that solution (decreasing the producer batch.size) recommendable? Please help.
Exception 1
2017-12-08 13:11:55,129 ERROR o.a.k.s.p.i.RecordCollectorImpl [sample-app-0.0.1-156ec0d4-6d7c-40b0-a493-370f8d9a092c-StreamThread-1] task [2_0] Error sending record (key 5a12c529e532af0b84f5d937 value com.kafka.streams.SampleEvent#54a6900d timestamp 1512536799387) to topic abc due to org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms.; No more records will be sent and no more offsets will be recorded for this task.
2017-12-08 13:11:55,131 ERROR o.a.k.s.p.i.AssignedTasks [sample-app-0.0.1-156ec0d4-6d7c-40b0-a493-370f8d9a092c-StreamThread-1] stream-thread [sample-app-0.0.1-156ec0d4-6d7c-40b0-a493-370f8d9a092c-StreamThread-1] Failed to process stream task 2_0 due to the following error: org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=2_0, processor=KSTREAM-SOURCE-0000000004, topic=Sample-Event, partition=0, offset=508417
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:232)
at org.apache.kafka.streams.processor.internals.AssignedTasks.process(AssignedTasks.java:403)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:317)
at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:942)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:822)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
Caused by: org.apache.kafka.streams.errors.StreamsException: task [2_0] Abort sending since an error caught with a previous record (key 5a12c529e532af0b84f5d937 value com.kafka.streams.SampleEvent#54a6900d timestamp 1512536799387) to topic abc due to org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms..
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:118)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:819)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:760)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:100)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:78)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:87)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:85)
at org.apache.kafka.streams.kstream.internals.KStreamTransform$KStreamTransformProcessor.process(KStreamTransform.java:56)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:46)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:124)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:85)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:80)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:216)
... 6 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms.
Exception 2
2017-12-11 11:08:35,257 ERROR o.a.k.s.p.i.RecordCollectorImpl [kafka-producer-network-thread | sample-app-0.0.1-030b5133-df00-4abd-a3de-8bfab114f626-StreamThread-1-producer] task [2_0] Error sending record (key 5a12c529e532af0b84f5d937 value com.kafka.streams.SampleEvent#1758de61 timestamp 1512795449471) to topic abc due to org.apache.kafka.common.errors.TimeoutException: Expiring 14 record(s) for abc-0: 122597 ms has passed since last append; No more records will be sent and no more offsets will be recorded for this task.
2017-12-11 11:08:56,001 ERROR o.a.k.s.p.i.AssignedTasks [sample-app-0.0.1-030b5133-df00-4abd-a3de-8bfab114f626-StreamThread-1] stream-thread [sample-app-0.0.1-030b5133-df00-4abd-a3de-8bfab114f626-StreamThread-1] Failed to commit stream task 2_0 due to the following error: org.apache.kafka.streams.errors.StreamsException: task [2_0] Abort sending since an error caught with a previous record (key 5a12c529e532af0b84f5d937 value com.kafka.streams.SampleEvent#1758de61 timestamp 1512795449471) to topic abc due to org.apache.kafka.common.errors.TimeoutException: Expiring 14 record(s) for abc-0: 122597 ms has passed since last append.
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:118)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627)
at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:287)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:238)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 14 record(s) for abc-0: 122597 ms has passed since last append
We faced similar issues, which we resolved as follows.
The first issue: by setting max.block.ms to something higher than the currently configured value.
The second issue: by increasing batch.size and decreasing linger.ms (which might increase latency) on the Kafka producer side. Increasing batch.size lets the producer pack more messages into each batch, so it sends fewer, larger batches. A sketch of both adjustments follows.
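Here is that sketch (the concrete numbers are illustrative starting points, not recommendations):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
// First issue: allow the producer to block longer for metadata / buffer space.
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 120000); // default is 60000
// Second issue: larger batches, flushed sooner.
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32768);    // default is 16384
props.put(ProducerConfig.LINGER_MS_CONFIG, 0);         // lower than whatever is currently configured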
This looks like something that often happens when an expected topic hasn't been created. Try looking further back in the log files.
You can also explicitly use the admin client to check which topics exist.
The first issue is due to the following: the producer waits up to 60,000 ms (the default value) for metadata. If the metadata isn't available within that time, it throws the Streams timeout exception. To fix this, set the Kafka producer config ProducerConfig.MAX_BLOCK_MS_CONFIG to some value greater than 60000 ms. This will resolve the issue.
If you are using NiFi and Kafka with SASL_SSL without Kerberos, and you are providing a Kafka client JAAS config, then increase the Metadata Wait Time to 100 sec and the Acknowledgment Wait Time to 100 sec; this should work for you.