Kafka producer timeout exception occurs randomly - apache-kafka

I am using the Kafka config below for one of my producers, and the functionality works fine.
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hostaddress:9092");
props.put(ProducerConfig.CLIENT_ID_CONFIG,"usertest");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, "3");
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1600);
But I get a timeout exception randomly: everything works for an hour or two, and then suddenly the following timeout exception appears for a few records.
In my test run, the producer sent around 20k messages and the consumer received 18,978.
2019-09-24 13:45:43,106 ERROR c.j.b.p.UserProducer$1 [http-nio-8185-exec-13] Send failed for record ProducerRecord(topic=user_test_topic, partition=null, headers=RecordHeaders(headers = [], isReadOnly = false), key=UPDATE_USER, value=CreatePartnerSite [userid=3, name=user123, email=testuser#gmail.com, phone=1234567890]], timestamp=null)
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2019-09-24 13:45:43,107 ERROR c.j.b.s.UserServiceImpl [http-nio-8185-exec-13] failed to puplish

Try increasing the "max.block.ms" producer config to more than 60000 ms.
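For example, a minimal sketch of that change (the 120000 ms value and the broker address are illustrative, not recommendations):

```java
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "hostaddress:9092"); // placeholder address
props.put("acks", "all");
// max.block.ms bounds how long send() may block waiting for metadata or
// buffer space; the default is 60000 ms, matching the error message above.
props.put("max.block.ms", "120000");
```

Raising this only buys the producer more time to fetch metadata; if the metadata never becomes available, the underlying broker or connectivity issue still needs to be found.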

Related

Kafka Streams: TimeoutException: Failed to update metadata after 60000 ms

I'm running a Kafka Streams application consuming one topic with 20 partitions at a traffic of 15K records/sec. The application does a 1-hour windowing and suppresses the latest result per window. After running for some time, it starts getting a TimeoutException, and then the instance is marked down by Kafka Streams.
Error trace:
Caused by: org.apache.kafka.streams.errors.StreamsException:
task [2_7] Abort sending since an error caught with a previous record
(key 372716656751 value InterimMessage [xxxxx..]
timestamp 1566433307547) to topic XXXXX due to org.apache.kafka.common.errors.TimeoutException:
Failed to update metadata after 60000 ms.
You can increase producer parameter `retries` and
`retry.backoff.ms` to avoid this error.
I have already increased the values of those two configs:
retries = 5
retry.backoff.ms = 80000
Should I increase them again, as the error message suggests? What would be good values for these two configs?
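For reference, producer-level settings like the two above are passed to a Kafka Streams application with the "producer." prefix (equivalent to StreamsConfig.producerPrefix). A sketch, reusing the values from the question rather than recommending them:

```java
import java.util.Properties;

Properties props = new Properties();
props.put("application.id", "windowing-app");       // placeholder app id
props.put("bootstrap.servers", "hostaddress:9092"); // placeholder address
// Producer-level overrides carry the "producer." prefix,
// i.e. StreamsConfig.producerPrefix("retries") etc.
props.put("producer.retries", "5");
props.put("producer.retry.backoff.ms", "80000");
```

Note that in newer clients the retries default is already Integer.MAX_VALUE and delivery.timeout.ms bounds the total retry time, so tuning retry.backoff.ms down (not up to 80 s) is often the more useful lever.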

Kafka - org.apache.kafka.common.errors.NetworkException

I have Kafka client code which connects to Kafka brokers (server 0.10.1, client 0.10.2). There are 2 topics with 2 different consumer groups in the code, and there is also a producer. I get a NetworkException from the producer code once in a while (once in 2 days, once in 5 days, ...). We see consumer-group (re)joining info in the logs for both consumer groups, followed by the NetworkException from the producer's future.get() call. I am not sure why we are getting this error.
Code :-
final Future<RecordMetadata> futureResponse =
producer.send(new ProducerRecord<>("ping_topic", "ping"));
futureResponse.get();
Exception :-
org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:70)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:57)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)
Kafka API definition for NetworkException,
"A misc. network-related IOException occurred when making a request.
This could be because the client's metadata is out of date and it is
making a request to a node that is now dead."
Thanks
I had the same error while testing the Kafka consumer. I was using a sender template for it.
In the consumer configuration I additionally set the following properties:
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 15000);
After sending the message I added a thread sleep:
ListenableFuture<SendResult<String, String>> future =
senderTemplate.send(MyConsumer.TOPIC_NAME, jsonPayload);
Thread.sleep(10000);
It was necessary to make the test work, but it may not be suitable for your case.
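Rather than a fixed sleep, a more deterministic option is to block on the returned future with a timeout, so the test waits only as long as needed and surfaces the exception promptly. A sketch of the pattern using a plain java.util.concurrent.Future (in real code the future would come from producer.send or the sender template):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Stand-in for the send call: any Future behaves the same way.
ExecutorService pool = Executors.newSingleThreadExecutor();
Future<String> future = pool.submit(() -> "ack");

// get() with a timeout blocks only until the result arrives, and throws
// TimeoutException if the "broker" (here, the task) never responds.
String result = future.get(10, TimeUnit.SECONDS);
pool.shutdown();
```

This turns a flaky sleep-based test into one that fails fast with a clear exception when the send does not complete.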

Kafka Producer: Got error produce response with correlation NETWORK_EXCEPTION

We are running Kafka in distributed mode across 2 servers.
I'm sending messages to Kafka through the Java SDK to a topic which has replication factor 2 and 1 partition.
We are running in async mode.
I don't find anything abnormal in the Kafka logs.
Can anyone help in finding out what the cause could be?
Properties props = new Properties();
props.put("bootstrap.servers", serverAdress);
props.put("acks", "all");
props.put("retries", "1");
props.put("linger.ms",0);
props.put("buffer.memory",10240000);
props.put("max.request.size", 1024000);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, Object> producer = new org.apache.kafka.clients.producer.KafkaProducer<>(props);
Exception trace:
-2017-08-15T02:36:29,148 [kafka-producer-network-thread | producer-1] WARN producer.internals.Sender - Got error produce response with
correlation id 353736 on topic-partition BPA_BinLogQ-0, retrying (0
attempts left). Error: NETWORK_EXCEPTION
You are getting a NETWORK_EXCEPTION, so something is wrong with the network connection to the Kafka broker you were producing to: either the broker shut down or the TCP connection was closed for some reason.
A quick code dive shows the most probable cause: a lost connection to the upstream broker, which causes the delivery to fail internally inside the Sender (link) - you might want to enable trace logging in Sender to confirm that:
if (response.wasDisconnected()) {
    log.trace("Cancelled request with header {} due to node {} being disconnected",
            requestHeader, response.destination());
    for (ProducerBatch batch : batches.values())
        completeBatch(batch,
                new ProduceResponse.PartitionResponse(Errors.NETWORK_EXCEPTION,
                        String.format("Disconnected from node %s", response.destination())),
                correlationId, now);
}
Now, with the batch completed unsuccessfully, it gets retried; but from the logs you attached it looks like you ran out of retries (0 attempts left), so the error propagates up to your level (link):
if (canRetry(batch, response, now)) {
    log.warn(
        "Got error produce response with correlation id {} on topic-partition {}, retrying ({} attempts left). Error: {}",
        ....
    reenqueueBatch(batch, now);
}
So the ideas are:
investigate your network connectivity - unfortunately this might mean enabling trace logging, at least on the client side (especially in NetworkClient, which does all the upstream broker management), to see if there is any connection loss;
increase the producer's retries value (though newer versions of Kafka default it to Integer.MAX_VALUE).
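A sketch of those producer-side changes (the values and address are illustrative; in clients 2.1+, delivery.timeout.ms is the upper bound on the total time spent retrying):

```java
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "serverAddress:9092"); // placeholder
props.put("acks", "all");
// More retries than the question's "1", so a transient NETWORK_EXCEPTION
// is absorbed by an internal retry instead of reaching the caller.
props.put("retries", "10");
props.put("retry.backoff.ms", "500");
// Newer clients bound total retry time with delivery.timeout.ms instead.
props.put("delivery.timeout.ms", "120000");
```

With retries set to 1, as in the question, a single disconnect during a leader change is enough to exhaust the retry budget and surface the error.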

Apache KafkaProducer throwing TimeoutException when sending a message

I have a KafkaProducer that has suddenly started throwing a TimeoutException when I try to send a message. Even though I have set the max.block.ms property to 60000 ms, and the test blocks for 60 s, the error message I get always shows a time of less than 200 ms. The only time it actually shows 60000 ms is if I run it in debug mode and step through the waitOnMetadata method manually.
error example:
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 101 ms.
Does anyone know why it would suddenly be unable to update the metadata? I know it's not my implementation of the producer that is faulty: not only have I not changed it since it was working, but if I run my tests on another server they all pass. What server-side reasons could there be for this? Should I restart my brokers? And why would the timeout message show an incorrect time if I just let it run?
Producer setup:
val props = new Properties()
props.put("bootstrap.servers", getBootstrapServersFor(datacenter.mesosLocal))
props.put("batch.size","0")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("max.block.ms","60000")
new KafkaProducer[String,String](props)
I tried to use the console producer to see if I could send messages, and I got a lot of WARN Error while fetching metadata with correlation id 0 : {metadata-1=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient) messages back. After stopping and restarting the broker I was able to send and consume messages again.

kafka producer throw EOFException during running

I'm using Kafka 0.8.2.1 and the new Producer API.
The server is set up as a single node on the local network.
The problem is that the producer throws an EOFException after running for a while (15 minutes, last time I checked), but it doesn't seem to matter because my producer continues to work afterwards.
The way I initialize the producer:
Map<String, Object> configs = new HashMap<>();
configs.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.1.101:9092");
configs.put(ProducerConfig.ACKS_CONFIG, "1");
configs.put(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG, "false");
configs.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
configs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
this.producer = new KafkaProducer<>(configs);
And the exception I got:
WARN [2015-06-17 02:07:28,896] org.apache.kafka.common.network.Selector: Error in I/O with /192.168.1.101
! java.io.EOFException: null
! at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62) ~[kafka-clients-0.8.2.1.jar:na]
! at org.apache.kafka.common.network.Selector.poll(Selector.java:248) ~[kafka-clients-0.8.2.1.jar:na]
! at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192) [kafka-clients-0.8.2.1.jar:na]
! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191) [kafka-clients-0.8.2.1.jar:na]
! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122) [kafka-clients-0.8.2.1.jar:na]
! at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Even though my code still works with this, I'd like to know why it happens and how to prevent it.
Did you try changing "connections.max.idle.ms"? Its default is 15 minutes. If you see this error every 15 minutes, try reducing this timeout.
https://issues.apache.org/jira/browse/KAFKA-3205
This JIRA suggests increasing connections.max.idle.ms.
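On newer client versions this setting can be placed directly in the producer config; a sketch (the 540000 ms value is just the modern client default, shown for illustration - pick a value that avoids the idle close on your network):

```java
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "192.168.1.101:9092");
// Close idle connections on the client before the broker (or a firewall)
// silently drops them, avoiding the EOFException on the next poll.
props.put("connections.max.idle.ms", "540000");
```

Whether to raise or lower it depends on which side is closing the connection first: the client should time out idle connections before the broker or any middlebox does.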