I have a KafkaProducer that has suddenly started throwing TimeoutExceptions when I try to send a message. Even though I have set the max.block.ms property to 60000 ms, and the test does block for 60 s, the error message I get always reports a time of less than 200 ms. The only time it actually shows 60000 ms is if I run it in debug mode and step through the waitOnMetadata method manually.
error example:
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 101 ms.
Does anyone know why it would suddenly be unable to update the metadata? I know my producer implementation is not at fault: not only have I not changed it since it was working, but the same tests all pass when I run them against another server. What server-side reasons could there be for this? Should I restart my brokers? And why would the timeout message show an incorrect time if I just let it run?
Producer setup:
val props = new Properties()
props.put("bootstrap.servers", getBootstrapServersFor(datacenter.mesosLocal))
props.put("batch.size","0")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("max.block.ms","60000")
new KafkaProducer[String,String](props)
I tried using the console producer to see if I could send messages, and got a lot of WARN Error while fetching metadata with correlation id 0 : {metadata-1=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient) messages back. After stopping and restarting the broker I was able to send and consume messages again.
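If this happens again, one way to check from the client side whether the broker is the problem (rather than the producer code) is to ask the producer for the topic's metadata directly. A minimal sketch, assuming a placeholder bootstrap address and the topic name from the WARN line above:
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.PartitionInfo;

public class MetadataProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder, not the original mesos address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("max.block.ms", "10000"); // fail fast if metadata cannot be fetched

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // partitionsFor() blocks up to max.block.ms and throws the same
            // TimeoutException as send() when metadata cannot be refreshed.
            List<PartitionInfo> partitions = producer.partitionsFor("metadata-1");
            for (PartitionInfo p : partitions) {
                // leader() may be null (or a placeholder node) while no leader is
                // elected, which matches LEADER_NOT_AVAILABLE on the console producer.
                System.out.printf("partition %d leader: %s%n", p.partition(), p.leader());
            }
        }
    }
}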
Related
I am using the Kafka config below for one of my producers, and the functionality works fine.
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hostaddress:9092");
props.put(ProducerConfig.CLIENT_ID_CONFIG,"usertest");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, "3");
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1600);
But I get a timeout exception at random: everything works for an hour or two, and then suddenly the following timeout exception is thrown for a few records.
In my test run, the producer sent around 20k messages and the consumer received 18,978.
2019-09-24 13:45:43,106 ERROR c.j.b.p.UserProducer$1 [http-nio-8185-exec-13] Send failed for record ProducerRecord(topic=user_test_topic, partition=null, headers=RecordHeaders(headers = [], isReadOnly = false), key=UPDATE_USER, value=CreatePartnerSite [userid=3, name=user123, email=testuser#gmail.com, phone=1234567890]], timestamp=null)
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2019-09-24 13:45:43,107 ERROR c.j.b.s.UserServiceImpl [http-nio-8185-exec-13] failed to puplish
Try increasing the "max.block.ms" producer config to more than 60000 ms.
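A minimal sketch of what that suggestion could look like; the values are illustrative and the topic name is taken from the question:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserTestProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hostaddress:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "120000"); // more than the default 60000 ms
        props.put(ProducerConfig.RETRIES_CONFIG, "3");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("user_test_topic", "UPDATE_USER", "payload"),
                    (metadata, exception) -> {
                        // The metadata TimeoutException is delivered here once
                        // max.block.ms elapses without a successful metadata refresh.
                        if (exception != null) {
                            System.err.println("Send failed: " + exception);
                        }
                    });
        }
    }
}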
I have Kafka client code that connects to Kafka brokers (server 0.10.1, client 0.10.2). The code uses two topics with two different consumer groups, and there is also a producer. I get a NetworkException from the producer code once in a while (once in 2 days, once in 5 days, ...). We see consumer group (re)joining info in the logs for both consumer groups, followed by the NetworkException from the producer's future.get() call. I am not sure why we are getting this error.
Code:
final Future<RecordMetadata> futureResponse =
producer.send(new ProducerRecord<>("ping_topic", "ping"));
futureResponse.get();
Exception:
org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:70)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:57)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)
Kafka API definition of NetworkException:
"A misc. network-related IOException occurred when making a request.
This could be because the client's metadata is out of date and it is
making a request to a node that is now dead."
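For reference, a hedged sketch (illustrative values, not the code above) of how such an intermittent disconnect can be absorbed: let the producer retry the batch itself and unwrap the NetworkException from the ExecutionException that future.get() throws.
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.NetworkException;

public class PingSender {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.RETRIES_CONFIG, "5");            // let the client retry the disconnected batch
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "500"); // pause between retry attempts

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            try {
                RecordMetadata metadata =
                        producer.send(new ProducerRecord<>("ping_topic", "ping")).get();
                System.out.println("acked at offset " + metadata.offset());
            } catch (ExecutionException e) {
                if (e.getCause() instanceof NetworkException) {
                    // The broker dropped the connection and all retries were exhausted;
                    // application-level handling (log, re-send, alert) goes here.
                    System.err.println("send failed after retries: " + e.getCause());
                }
            }
        }
    }
}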
Thanks
I had the same error while testing a Kafka consumer. I was using a sender template for it.
In the consumer configuration I additionally set the following properties:
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 15000);
After sending the message I added a thread sleep:
ListenableFuture<SendResult<String, String>> future =
        senderTemplate.send(MyConsumer.TOPIC_NAME, jsonPayload);
Thread.sleep(10000);
It was necessary to make the test work, but maybe not suitable for your case.
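A small variation on the same idea (not the original test code): instead of a fixed sleep, block on the future returned by the template with a bounded timeout, so the test only waits as long as the send actually takes. senderTemplate, MyConsumer.TOPIC_NAME and jsonPayload are the names used above; java.util.concurrent.TimeUnit needs to be imported.
ListenableFuture<SendResult<String, String>> future =
        senderTemplate.send(MyConsumer.TOPIC_NAME, jsonPayload);
// ListenableFuture extends java.util.concurrent.Future, so a bounded get() works;
// it throws an exception if the send does not complete in time.
SendResult<String, String> result = future.get(10, TimeUnit.SECONDS);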
We are running Kafka in distributed mode across 2 servers.
I'm sending messages to Kafka through the Java SDK to a topic (queue) with a replication factor of 2 and 1 partition.
We are running in async mode.
I don't see anything abnormal in the Kafka logs.
Can anyone help in finding out what the cause could be?
Properties props = new Properties();
props.put("bootstrap.servers", serverAdress);
props.put("acks", "all");
props.put("retries", "1");
props.put("linger.ms",0);
props.put("buffer.memory",10240000);
props.put("max.request.size", 1024000);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, Object> producer = new org.apache.kafka.clients.producer.KafkaProducer<>(props);
Exception trace:
-2017-08-15T02:36:29,148 [kafka-producer-network-thread | producer-1] WARN producer.internals.Sender - Got error produce response with
correlation id 353736 on topic-partition BPA_BinLogQ-0, retrying (0
attempts left). Error: NETWORK_EXCEPTION
You are getting a NETWORK_EXCEPTION, which tells you that something is wrong with the network connection to the Kafka broker you were producing to: either the broker shut down or the TCP connection was closed for some reason.
A quick code dive shows the most probable cause: a lost connection to the upstream broker, which causes the delivery to fail internally inside the Sender (link) - you might want to enable trace logging in Sender to confirm that:
if (response.wasDisconnected()) {
    log.trace("Cancelled request with header {} due to node {} being disconnected",
            requestHeader, response.destination());
    for (ProducerBatch batch : batches.values())
        completeBatch(batch,
                new ProduceResponse.PartitionResponse(Errors.NETWORK_EXCEPTION,
                        String.format("Disconnected from node %s", response.destination())),
                correlationId, now);
}
Now, with the batch completed unsuccessfully, it gets retried; but from the logs you attached it looks like you ran out of retries (0 attempts left), so it propagates up to your level (link):
if (canRetry(batch, response, now)) {
    log.warn(
        "Got error produce response with correlation id {} on topic-partition {}, retrying ({} attempts left). Error: {}",
        ....
    reenqueueBatch(batch, now);
}
So the ideas are:
investigate your network connectivity - unfortunately this might mean enabling trace logging at least on the client side (especially NetworkClient, which does all the upstream broker management) to see if there is any connection loss;
increase the producer's retries value (though newer versions of Kafka set it to MAX_INT or so); see the sketch below.
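For the second idea, a sketch of the extra properties on top of the config from the question; the values are illustrative, not tuned recommendations:
props.put("retries", "10");              // instead of "1" as in the question
props.put("retry.backoff.ms", "500");    // pause between retry attempts
// On newer clients (2.1+), delivery.timeout.ms bounds the total time spent
// retrying a record before the error is handed back to the application.
props.put("delivery.timeout.ms", "120000");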
I have set up Kafka version 0.9 with a basic configuration:
1 broker, 1 topic, and 1 partition.
Below are the producer configurations that I added to enable retries from the producer.
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.RETRIES_CONFIG, 5);
props.put(ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG, 500);
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 500);
props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, 50);
I understand from the documentation that
Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error.
Both my broker and ZooKeeper are down, and the retry operation is not working.
ERROR o.s.k.s.LoggingProducerListener - Exception thrown when sending a message to topic TestTopic1|
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 500 ms.
I need to know if I am missing anything here for the retry to work.
Resend (retry) works only if you have a connection to the broker and something went wrong while sending a message.
So if your broker is dead, there is no way to send a message at all - there is no connection. And that is what the exception is about.
I think retries should work anyway, even if the broker is down. This is the whole reason to have retries in the first place. Could be a temporary network issue after all.
There is a bug in the Kafka 0.9.0.1 producer which causes retries not to work. See here.
Fixed in 0.9.0.2 (which is not released yet) and 0.10. I'd upgrade the broker to 0.10 and try again.
As @artem answered, the Kafka producer config is not designed to retry when the broker is down. It only retries on transient errors, which is pretty much useless to be honest. It beats me why Spring Kafka did not take care of it.
Anyway, to solve the situation I handled this with a @Retryable configuration in Spring Boot. Check this SO answer for details: https://stackoverflow.com/a/65248428/6621377
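A sketch of that application-level retry with Spring Retry and Spring Boot; the class, topic and template names are placeholders, not code from the linked answer (it assumes spring-retry is on the classpath and @EnableRetry is declared on a configuration class):
import java.util.concurrent.ExecutionException;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

@Service
public class RetryingPublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public RetryingPublisher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Retries the whole publish call, which also covers the "broker is down" case
    // that the producer's own retries setting does not. The Kafka TimeoutException
    // usually arrives wrapped in an ExecutionException from get(), so retry broadly.
    @Retryable(value = Exception.class, maxAttempts = 5, backoff = @Backoff(delay = 2000))
    public void publish(String key, String payload) throws ExecutionException, InterruptedException {
        kafkaTemplate.send("TestTopic1", key, payload).get(); // block so failures surface here
    }

    @Recover
    public void giveUp(Exception e, String key, String payload) {
        // Invoked after maxAttempts are exhausted; log or park the message here.
    }
}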
I'm using Kafka 0.8.2.1 and the New Producer API.
The server is setup as single node in local network.
The problem is that the producer throws an EOFException after running for a while (15 minutes last time I checked), but it doesn't seem to matter because my producer continues to work after this.
The way I initialize the producer:
Map<String, Object> configs = new HashMap<>();
configs.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.1.101:9092");
configs.put(ProducerConfig.ACKS_CONFIG, "1");
configs.put(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG, "false");
configs.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
configs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
this.producer = new KafkaProducer<>(configs);
And the exception I got:
WARN [2015-06-17 02:07:28,896] org.apache.kafka.common.network.Selector: Error in I/O with /192.168.1.101
! java.io.EOFException: null
! at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62) ~[kafka-clients-0.8.2.1.jar:na]
! at org.apache.kafka.common.network.Selector.poll(Selector.java:248) ~[kafka-clients-0.8.2.1.jar:na]
! at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192) [kafka-clients-0.8.2.1.jar:na]
! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191) [kafka-clients-0.8.2.1.jar:na]
! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122) [kafka-clients-0.8.2.1.jar:na]
! at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Even though my code still works with this, I'd like to know why and how to prevent it.
Did you try changing "connections.max.idle.ms"? Its default is 15 minutes. If you see this error every 15 minutes, try reducing this timeout.
https://issues.apache.org/jira/browse/KAFKA-3205
That issue suggests increasing connections.max.idle.ms.
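A sketch of what both suggestions boil down to on the producer from the question; the value is illustrative, and the property may not be recognized by very old 0.8.2 clients, so treat that as an assumption:
Map<String, Object> configs = new HashMap<>();
configs.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.1.101:9092");
configs.put(ProducerConfig.ACKS_CONFIG, "1");
configs.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
configs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// Keep the client's idle timeout aligned with (or below) whatever closes the
// connection on the other side, so the client drops idle connections itself
// instead of hitting an EOF on a half-closed socket.
configs.put("connections.max.idle.ms", 300000);
KafkaProducer<String, String> producer = new KafkaProducer<>(configs);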