Kafka fails to publish messages when any node in a 3-node cluster is down

I have Kafka 0.10.2.1 deployed in a 3-node cluster with mostly default configuration. The producer config is as follows (bootstrap.servers originally listed the three brokers; the addresses below are placeholders):
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
props.put("retry.backoff.ms", "1000");
props.put("reconnect.backoff.ms", "1000");
props.put("max.request.size", "5242880");
props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
What I am seeing is that when one node in the cluster is down, I can no longer publish messages to Kafka. This is the exception I get:
05-Apr-2018 22:29:33,362 PDT ERROR [vm18] [KafkaMessageBroker] (default task-43) |default| Failed to publish message for topic deviceConfigRequest: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for deviceConfigRequest-1: 30967 ms has passed since batch creation plus linger time
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:70) [kafka-clients-0.10.2.1.jar:]
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:65) [kafka-clients-0.10.2.1.jar:]
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25) [kafka-clients-0.10.2.1.jar:]
at x.y.x.KafkaMessageBroker.publishMessage(KafkaMessageBroker.java:151) [classes:]
What am I missing?
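For context, the FutureRecordMetadata.get frames in the trace indicate a synchronous send, i.e. blocking on the Future returned by the producer. A minimal sketch of that pattern, assuming props is the configuration above and using a placeholder payload:

import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

Producer<byte[], byte[]> producer = new KafkaProducer<>(props);
try {
    // send(...).get() blocks until the broker acks or the batch expires;
    // an expired batch surfaces as a TimeoutException wrapped in an ExecutionException
    RecordMetadata metadata = producer.send(
            new ProducerRecord<>("deviceConfigRequest", "payload".getBytes())).get();
} catch (InterruptedException | ExecutionException e) {
    // for expired batches the cause is org.apache.kafka.common.errors.TimeoutException
}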

Related

Kafka producer resetting the last seen epoch of partition, resulting in timeout

Recently my Kafka producer, running as a cron job on a Kubernetes cluster, has started logging the following when pushing new messages to the queue:
{"#version":1,"source_host":"<job pod name>","message":"[Producer clientId=producer-1] Resetting the last seen epoch of partition <topic name> to 4 since the associated topicId changed from null to JkTOJi-OSzavDEomRvAIOQ","thread_name":"kafka-producer-network-thread | producer-1","#timestamp":"2022-02-11T08:45:40.212+0000","level":"INFO","logger_name":"org.apache.kafka.clients.Metadata"}
This results in the producer running into a timeout:
"exception":{"exception_class":"java.util.concurrent.ExecutionException","exception_message":"org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for <topic name>:120000 ms has passed since batch creation", stacktrace....}
The logs of the Kafka consumer pod and the Kafka cluster pods don't show any out-of-the-ordinary changes.
Has anyone seen this behavior before, and if so, how do I prevent it?
Reason:
The Java producer client cannot connect to Kafka.
Solution:
On each broker, add the following line to the properties file:
host.name=<server IP>
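For reference, a sketch of that change (the IP below is a placeholder; host.name is a legacy broker property, and on brokers newer than 0.10 the equivalent setting is advertised.listeners):

# server.properties on each broker
host.name=192.168.0.10
# on newer brokers, prefer:
# advertised.listeners=PLAINTEXT://192.168.0.10:9092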

Kafka Admin client unregistered causing metadata issues

After migrating our microservice functionality to Spring Cloud Function, we have been facing issues with one of the producer topics.
Event of type: abc and key: xxx_yyy could not be sent to kafka org.springframework.messaging.MessageHandlingException: error occurred in message handler [org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder$ProducerConfigurationMessageHandler#2333d598]; nested exception is org.springframework.kafka.KafkaException: Send failed; nested exception is org.apache.kafka.common.errors.TimeoutException: Topic pc-abc not present in metadata after 60000 ms.
o.s.kafka.support.LoggingProducerListener - Exception thrown when sending a message with key='byte[15]' and payload='byte[256]' to topic pc-abc and partition 6: org.apache.kafka.common.errors.TimeoutException: Topic pc-abc not present in metadata after 60000 ms.
FYI: the topics are already created in our staging/prod environments and are not meant to be created at application startup.
My producer config:
spring.cloud.stream.bindings.pc-abc-out-0.content-type=application/json
spring.cloud.stream.bindings.pc-abc-out-0.destination=pc-abc
spring.cloud.stream.bindings.pc-abc-out-0.producer.header-mode=headers
spring.cloud.stream.bindings.pc-abc-out-0.producer.partition-count=5
spring.cloud.stream.bindings.pc-abc-out-0.producer.partitionKeyExpression=payload.key
spring.cloud.stream.kafka.bindings.pc-abc-out-0.producer.sync=true
I am kind of stuck at this point and exhausted. Has anyone else faced this issue?
Spring Cloud version: 2.5.5
Kafka: 2.7.1
The issue is: the producer is configured with partition-count=5, yet Kafka is looking for partition number 6, which obviously does not exist. I have commented out the auto-add-partitions property, but the issue still turns up. Is it stale configuration? How do I force Kafka to pick up the new configuration? See the sketch below.
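For what it's worth, with partition-count=5 the binder's default partition selection can only ever yield indexes 0 through 4. A rough sketch of that default logic (an approximation of Spring Cloud Stream's behavior, not the actual binder code):

// approximate default partition selection: hash of the partition key modulo the count
static int selectPartition(Object partitionKey, int partitionCount) {
    return Math.abs(partitionKey.hashCode()) % partitionCount;
}

Seen that way, a record addressed to partition 6 means the effective partition count was larger than the configured 5, possibly taken from the existing topic's own metadata, which fits the stale-configuration suspicion.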

Kafka Streams: TimeoutException: Failed to update metadata after 60000 ms

I'm running a Kafka Streams application that consumes one topic with 20 partitions at a traffic of 15K records/sec. The application does 1-hour windowing and suppresses the latest result per window. After running for some time, it starts getting TimeoutException, and then the instance is marked down by Kafka Streams.
Error trace:
Caused by: org.apache.kafka.streams.errors.StreamsException:
task [2_7] Abort sending since an error caught with a previous record
(key 372716656751 value InterimMessage [xxxxx..]
timestamp 1566433307547) to topic XXXXX due to org.apache.kafka.common.errors.TimeoutException:
Failed to update metadata after 60000 ms.
You can increase producer parameter `retries` and
`retry.backoff.ms` to avoid this error.
I already increased the values of those two configs:
retries = 5
retry.backoff.ms = 80000
Should I increase them further, as the error message suggests? What would be good values for these two settings?
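For reference, in a Streams app these are producer-level settings and are typically forwarded with the producer prefix; a minimal sketch (application id and bootstrap servers are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowing-app");   // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
// forward producer-level settings through the Streams config
props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), 5);
props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRY_BACKOFF_MS_CONFIG), 80000);

One thing worth noting: retry.backoff.ms = 80000 is longer than the 60000 ms metadata wait in the error, so a blocked send can expire before a single retry even fires.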

Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for test-0

Today a message popped up when I tried to send a message from the producer console to the consumer console:
[2016-11-02 15:12:58,168] ERROR Error when sending message to topic test with
key: null, value: 5 bytes with error:
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Batch containing 1 record(s)
expired due to timeout while requesting metadata from brokers for test-0
Why did this happen? Is this a Kafka problem or a ZooKeeper problem?
It seems the client failed to retrieve metadata for test-0 from the Kafka brokers.
Either make sure you are able to connect to the Kafka brokers, or check that 'advertised.listeners' is set if you are running Kafka on IaaS machines.
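A minimal sketch of that broker setting (hostname and port are placeholders):

# server.properties
listeners=PLAINTEXT://0.0.0.0:9092
# the advertised address must be reachable by clients outside the machine
advertised.listeners=PLAINTEXT://public-hostname.example.com:9092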
Well, after I rebooted the whole server, the problem was gone.

Kafka unrecoverable if broker dies

We have a Kafka cluster with three brokers (node ids 0, 1, 2) and a ZooKeeper setup with three nodes.
We created a topic "test" on this cluster with 20 partitions and replication factor 2. We are using the Java producer API to send messages to this topic. One of the Kafka brokers intermittently goes down, after which it does not recover. To simulate the case, we killed one of the brokers manually. Per the Kafka architecture, the cluster is supposed to recover on its own, but that is not happening. When I describe the topic on the console, I see the ISR count reduced to one for a few of the partitions, since one of the brokers was killed. Now, whenever we try to push messages via the producer API (either the Java client or the console producer), we encounter a SocketTimeoutException. A quick look into the logs shows "Unable to fetch the metadata".
WARN [2015-07-01 22:55:07,590] [ReplicaFetcherThread-0-3][] kafka.server.ReplicaFetcherThread - [ReplicaFetcherThread-0-3],
Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 23711; ClientId: ReplicaFetcherThread-0-3;
ReplicaId: 0; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [zuluDelta,2] -> PartitionFetchInfo(11409,1048576),[zuluDelta,14] -> PartitionFetchInfo(11483,1048576).
Possible cause: java.nio.channels.ClosedChannelException
[2015-07-01 23:37:40,426] WARN Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [id:1,host:abc-0042.yy.xxx.com,port:9092] failed (kafka.client.ClientUtils$)
java.net.SocketTimeoutException
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:221)
at kafka.utils.Utils$.read(Utils.scala:380)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:75)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
Any leads will be appreciated...
From your error "Unable to fetch metadata", the most likely cause is that bootstrap.servers in the producer is set to only the broker that died.
Ideally, you should have more than one broker in the bootstrap.servers list, because if one broker fails (or is unreachable), another can still serve the metadata.
FYI: metadata is the information about a particular topic: how many partitions it has, their leader brokers, follower brokers, etc.
So, when a record is produced to a partition, it is sent to that partition's leader broker.
From your question, your ISR set has only one broker. You could try pointing bootstrap.servers at that broker.
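As a sketch, listing all three brokers (addresses are placeholders) so that metadata can still be fetched when any one of them dies:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
// any live broker in this list can serve the initial metadata
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
        "broker0:9092,broker1:9092,broker2:9092");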