I set up a Kafka S3 connector, but it fails to consume data from Kafka due to this error:
The coordinator is not available
Kafka is running as a single-node cluster and seems to work fine with other consumers; for example, Offset Explorer can read data from the topic.
I did check similar questions on Stack Overflow, and all the answers say that offsets.topic.replication.factor should be manually set to 1 instead of the default 3 in a single-node cluster.
In my case, I checked the topic and it is set to 1.
./kafka-topics.sh --describe --zookeeper broker:2181 --topic 202208.topic.test
.v1
Topic: 202208.topic.test PartitionCount: 1 ReplicationFactor: 1 Configs:
Topic: 202208.topic.test Partition: 0 Leader: 1 Replicas: 1 Isr: 1
The detailed message is as follows:
[2022-08-31 15:46:52,843] DEBUG [Consumer clientId=connector-consumer-s3-sink-0, groupId=connect-s3-sink] Updating last seen epoch from 0 to 0 for partition
prod.master.pxv.trade.eod.v1-0 (org.apache.kafka.clients.Metadata:178)
[2022-08-31 15:46:52,844] DEBUG [Consumer clientId=connector-consumer-s3-sink-0, groupId=connect-s3-sink] Updated cluster metadata updateVersion 106 to MetadataCache{clusterId='hWeMZOpIQ_iC5-iev3lZMQ', nodes=[broker:9092 (id: 1 rack: null)], partitions=[PartitionInfoAndEpoch{partitionInfo=Partition
(topic = 202208.topic.test, partition = 0, leader = 1, replicas = [1], isr = [1], offlineReplicas = []), epoch=0}], controller=broker:9092 (id: 1 rack: null)} (org.apache.kafka.clients.Metadata:263)
[2022-08-31 15:46:52,844] DEBUG [Consumer clientId=connector-consumer-s3-sink-0, groupId=connect-s3-sink] Sending FindCoordinator request to broker broker:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:727)
[2022-08-31 15:46:52,864] DEBUG [Consumer clientId=connector-consumer-s3-sink-0, groupId=connect-s3-sink] Received FindCoordinator response ClientResponse(receivedTimeMs=1661960812864, latencyMs=20, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=3, clientId=connector-consumer-s3-sink-0, correlationId=211), responseBody=FindCoordinatorResponseData(throttleTimeMs=0, errorCode=15, errorMessage='The coordinator is not available.', nodeId=-1, host='', port=-1)) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:741)
did check similar questions asked in stackoverflow and all q/a points to offsets.topic.replication.factor should be manually set to 1 instead of default 3 in a single node cluster.
Correct
More specifically, if you are running a single broker and any consumer tries to use a consumer group for offset management (as Kafka Connect does), then the offsets topic needs to exist.
I do not think that is related to your logs about the coordinator.
In my case, I checked the topic and it is set to 1... --topic 202208.topic.test
You're not checking the correct topic there. This is how you verify the offsets.topic.replication.factor:
kafka-topics.sh --describe --bootstrap-server broker:9092 --topic __consumer_offsets | grep 'ReplicationFactor'
Note: I replaced the --zookeeper flag with --bootstrap-server, since --zookeeper is deprecated.
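If you would rather check the value from a script than with grep, here is a minimal Python sketch. It assumes you have already captured the text printed by kafka-topics.sh --describe; the helper name replication_factor is made up for illustration, and the sample line is the summary line from the question.

```python
import re


def replication_factor(describe_output):
    """Return the ReplicationFactor found in `kafka-topics.sh --describe`
    output, or None if the summary line is missing."""
    # \s* tolerates both "ReplicationFactor: 1" and the older "ReplicationFactor:3".
    match = re.search(r"ReplicationFactor:\s*(\d+)", describe_output)
    return int(match.group(1)) if match else None


# Sample summary line taken from the question.
sample = "Topic: 202208.topic.test PartitionCount: 1 ReplicationFactor: 1 Configs:"
print(replication_factor(sample))
```

For the check discussed here, you would feed it the describe output for __consumer_offsets rather than for the data topic.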
I've been using Kafka 3.1.0 in a production environment. One of the VMs had to be live migrated, but due to some issues the live migration failed and the node was forcefully migrated, which involved a full VM restart.
After that VM booted up, Kafka stopped working "completely": clients were not able to connect and produce/consume anything. JMX metrics were still showing up, but that node reported many partitions as "Offline partitions".
Looking into the logs, that particular node kept showing A LOT of INCONSISTENT_TOPIC_ID errors. Example:
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-2. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-2. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
However, the other Kafka brokers were showing slightly different errors (I don't have a log sample): UNKNOWN_TOPIC_ID...
Another interesting issue: I described the Kafka topic and this is what I got:
Topic: my-topic TopicId: XXXXXXXXXXXXXXXXXXXXXX PartitionCount: 4 ReplicationFactor: 4 Configs: segment.bytes=214748364,unclean.leader.election.enable=true,retention.bytes=214748364
Topic: my-topic Partition: 0 Leader: 2 Replicas: 5,2,3,0 Isr: 2
Topic: my-topic Partition: 1 Leader: 0 Replicas: 0,1,2,3 Isr: 0
Topic: my-topic Partition: 2 Leader: 2 Replicas: 1,2,3,4 Isr: 2
Topic: my-topic Partition: 3 Leader: 2 Replicas: 2,3,4,5 Isr: 2
Why does it show only 1 in-sync replica when there should be 4 per partition? Why did it happen in the first place?
I added an additional partition and this is what it shows now:
Topic: my-topic TopicId: XXXXXXXXXXXXXXXXXXXXXX PartitionCount: 5 ReplicationFactor: 4 Configs: segment.bytes=214748364,unclean.leader.election.enable=true,retention.bytes=214748364
Topic: my-topic Partition: 0 Leader: 2 Replicas: 5,2,3,0 Isr: 2
Topic: my-topic Partition: 1 Leader: 0 Replicas: 0,1,2,3 Isr: 0
Topic: my-topic Partition: 2 Leader: 2 Replicas: 1,2,3,4 Isr: 2
Topic: my-topic Partition: 3 Leader: 2 Replicas: 2,3,4,5 Isr: 2
Topic: my-topic Partition: 4 Leader: 3 Replicas: 3,4,5,0 Isr: 3,4,5,0
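To make the gap between Replicas and Isr easier to see, here is a small Python sketch that lists, per partition, the replicas missing from the ISR. It only parses the text of a describe output like the one above; the function name under_replicated is made up for illustration, and it assumes the Isr value is the last field on the line when it is empty.

```python
def under_replicated(describe_output):
    """Map partition number -> replica ids that are assigned but not in the ISR."""
    result = {}
    for line in describe_output.splitlines():
        fields = line.split()
        # Skip the summary line and anything that is not a per-partition line.
        if not all(label in fields for label in ("Partition:", "Replicas:", "Isr:")):
            continue
        partition = int(fields[fields.index("Partition:") + 1])
        replicas = fields[fields.index("Replicas:") + 1].split(",")
        isr_pos = fields.index("Isr:") + 1
        isr = fields[isr_pos].split(",") if isr_pos < len(fields) else []
        missing = [int(r) for r in replicas if r not in isr]
        if missing:
            result[partition] = missing
    return result


sample = """\
Topic: my-topic Partition: 3 Leader: 2 Replicas: 2,3,4,5 Isr: 2
Topic: my-topic Partition: 4 Leader: 3 Replicas: 3,4,5,0 Isr: 3,4,5,0
"""
print(under_replicated(sample))
```

Partition 4 does not appear in the result because all of its replicas are in sync.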
I know there is the kafka-reassign-partitions.sh script, and it fixed a similar issue in our preproduction environment, but I am more interested in why it happened in the first place.
Could this be related? I've set the parameter replica.lag.time.max.ms=5000 (over default 500) and even after restarting all nodes it didn't help.
This normally happens when the topic ID in the session does not match the topic ID in the log. To fix this issue you will have to make sure that the topic ID remains consistent across your cluster.
If you are using ZooKeeper, run this command in zkCli.sh on one of your nodes that is still in sync, and note the topic_id:
[zk: localhost:2181(CONNECTED) 10] get /brokers/topics/my-topic
{"partitions":{"0":[5,1,2],"1":[5,1,2],"2":[5,1,2],"3":[5,1,2],"4":
[5,1,2],"5":[5,1,2],"6":[5,1,2],"7":[5,1,2],"8":[5,1,2],"9":
[5,1,2]},"topic_id":"s3zoLdMp-T3CIotKlkBpMgL","adding_replicas":
{},"removing_replicas":{},"version":3}
Next, for each node, check the partition.metadata file for every partition of the topic my-topic. This file can be found under log.dirs (see server.properties).
For example, if log.dirs is set to /media/kafka-data, you can find it at:
/media/kafka-data/my-topic-1/partition.metadata for partition 1,
/media/kafka-data/my-topic-2/partition.metadata for partition 2, and so on.
The contents of the file may look like this, (you see it matches the topic_id that zookeeper has) -
version: 0
topic_id: s3zoLdMp-T3CIotKlkBpMgL
You'll need to make sure that the value of topic_id is the same in all the partition.metadata files for my-topic across your cluster. If you come across a different topic ID in any of the partitions, you can edit it with any text editor (or write a script to do this for you).
Once done, you may need to restart your brokers one at a time for this change to take effect.
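As a rough illustration of the "write a script" option, here is a minimal Python sketch that only reports mismatches and does not edit anything. It assumes the file layout described above (one <topic>-<partition> directory per partition, each containing a partition.metadata file with a topic_id line). The function names, and the idea of passing in the expected ID read from ZooKeeper, are my own assumptions for the example.

```python
from pathlib import Path


def read_topic_id(metadata_file):
    """Return the topic_id value from a partition.metadata file, or None."""
    for line in Path(metadata_file).read_text().splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "topic_id":
            return value.strip()
    return None


def find_mismatches(log_dir, topic, expected_topic_id):
    """Map partition directory name -> topic_id found, for every partition of
    `topic` under `log_dir` whose topic_id differs from the expected one."""
    mismatches = {}
    for meta in sorted(Path(log_dir).glob(f"{topic}-*/partition.metadata")):
        suffix = meta.parent.name[len(topic) + 1:]
        if not suffix.isdigit():
            # Directory belongs to another topic that shares the name prefix.
            continue
        found = read_topic_id(meta)
        if found != expected_topic_id:
            mismatches[meta.parent.name] = found
    return mismatches
```

You would run something like find_mismatches("/media/kafka-data", "my-topic", "s3zoLdMp-T3CIotKlkBpMgL") on each broker, using the topic_id that ZooKeeper returned; an empty result means that broker's files already agree with ZooKeeper.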
I will try to answer why the topic ID in ZooKeeper may differ from the topic ID stored in partition.metadata:
In certain situations, the TopicId stored locally on a broker for a topic can differ from the TopicId stored for that topic in ZooKeeper. Currently, this situation arises when users use a <2.8 client to alterPartitions for a topic on >=2.8 brokers (including the latest 3.4) AND they use the --zookeeper flag from the client. Note that --zookeeper has been marked deprecated for a long time and has been replaced by --bootstrap-server, which doesn't have this problem.
The topic ID discrepancy leads to availability loss for the topic until the user performs the mitigation steps listed in KAFKA-14190.
The exact sequence of steps is:
1. The user uses a pre-2.8 client to create a new topic directly in ZooKeeper.
2. No TopicId is generated in ZooKeeper.
3. The KafkaController listens to the ZNode, and a TopicChange event is created. While handling this event, the controller notices that there is no TopicId, generates a new one, and updates ZooKeeper.
4. At this stage, ZooKeeper has a TopicId.
5. The user uses a pre-2.8 client to increase the number of partitions for this topic.
6. The client replaces (overwrites) the entire existing ZNode with the new placement information. This deletes the existing TopicId in ZooKeeper (the one created by the controller in step 3).
7. The next time the KafkaController interacts with this ZNode, it generates a new TopicId.
8. Note that we now have two different TopicIds for this topic name.
9. Brokers may still have the older TopicId in their metadata file and will complain about the mismatch when they encounter the new TopicId.
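The sequence above can be mimicked with a toy Python model. This is not Kafka code: the dict standing in for ZooKeeper, the function names, and the use of uuid4 for topic IDs are all assumptions made for the illustration. It only shows how overwriting the whole znode drops the ID and forces the controller to mint a second one.

```python
import uuid


def old_client_write(zk, topic, partitions):
    """Model a pre-2.8 client using --zookeeper: it replaces the whole znode
    and knows nothing about topic IDs, so any existing topic_id is lost."""
    zk[topic] = {"partitions": partitions}


def controller_handle_topic_change(zk, topic):
    """Model the controller: if the znode has no topic_id, generate one."""
    znode = zk[topic]
    if "topic_id" not in znode:
        znode["topic_id"] = uuid.uuid4().hex
    return znode["topic_id"]


def simulate():
    zk = {}
    # Old client creates the topic; the znode has no topic ID yet.
    old_client_write(zk, "my-topic", {"0": [1, 2, 3]})
    # Controller notices the missing ID and writes one back.
    first_id = controller_handle_topic_change(zk, "my-topic")
    # A broker persists that ID locally (partition.metadata in real Kafka).
    broker_local_id = first_id
    # Old client adds a partition by overwriting the znode: the ID is gone.
    old_client_write(zk, "my-topic", {"0": [1, 2, 3], "1": [2, 3, 1]})
    # Controller generates a fresh ID the next time it handles the znode.
    second_id = controller_handle_topic_change(zk, "my-topic")
    return broker_local_id, second_id


local_id, zk_id = simulate()
print("broker:", local_id, "zookeeper:", zk_id, "match:", local_id == zk_id)
```

The final line reports that the two IDs do not match, which is the state in which brokers start logging topic ID mismatches.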
I started seeing the following error
[2020-06-12 20:09:01,324] ERROR [ReplicaManager broker=3] Error processing append operation on partition __consumer_offsets-10 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the current ISR Set(3) is insufficient to satisfy the min.isr requirement of 2 for partition __consumer_offsets-10
My setup has three brokers, and all brokers are up. A couple of things I did before this error popped up:
I configured min.insync.replicas to be 2 on all the brokers. I created a topic with replication factor 3 and started producing messages from a producer with acks=1 while two brokers were down. I then brought up all the brokers and started a consumer.
How do I go about troubleshooting this error?
The consumer is also NOT able to see this message (not sure why; the message is supposed to be treated as "committed", as one broker was up when the producer was running).
A couple of facts:
It is interesting to see that rebalancing hasn't happened yet with respect to the preferred leader strategy:
$ kafka-topics --zookeeper 127.0.0.1:2181 --topic stock-prices --describe
Topic: stock-prices PartitionCount: 3 ReplicationFactor: 3 Configs: min.insync.replicas=2
Topic: stock-prices Partition: 0 Leader: 1 Replicas: 1,3,2 Isr: 1,2,3
Topic: stock-prices Partition: 1 Leader: 1 Replicas: 2,1,3 Isr: 1,2,3
Topic: stock-prices Partition: 2 Leader: 1 Replicas: 3,2,1 Isr: 1,2,3
Here's your problem:
You have set min.insync.replicas=2, which means you need at least two brokers up and running to publish a message to a topic. If you take down 2 brokers, then you have only one left, which means your min.insync.replicas requirement is not fulfilled.
This has nothing to do with the consumers, since this is about the brokers. When you set acks=1, your producer gets the acknowledgement as soon as the message is published to one broker. (It does not wait for the replicas to be created.)
So the problem is that your producer gets an acknowledgement that the message was received as soon as a single broker (the leader) has the message. But the leader cannot replicate it, since there aren't any other brokers up to sync with.
One way to handle this is to set acks=all, so your producer won't get an acknowledgement until all the in-sync replicas have the message. It will retry until at least 2 in-sync replicas are online.
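To make the interaction between acks and min.insync.replicas concrete, here is a toy Python model of the broker-side decision. It is a simplification resting on one assumption that comes from general Kafka behavior rather than from this thread: the ISR size is checked against min.insync.replicas only for acks=all writes, while acks=1 only needs the leader. The function name append_allowed is made up.

```python
def append_allowed(acks, isr_size, min_insync_replicas):
    """Toy model: would the leader accept a write under these settings?"""
    if acks == "all":
        # The leader rejects the write (NotEnoughReplicas) when the ISR is too small.
        return isr_size >= min_insync_replicas
    # acks=1 (or 0): only the leader itself has to be available.
    return isr_size >= 1


# Two of three brokers down, min.insync.replicas=2:
print(append_allowed("1", isr_size=1, min_insync_replicas=2))
print(append_allowed("all", isr_size=1, min_insync_replicas=2))
```

Under this model the acks=1 write goes through with a single broker up, while an acks=all write to the same partition is refused until a second replica rejoins the ISR.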
We have Kafka Connect running a Postgres connector; it pulls changes from a DB and puts them into a topic. We are getting this error:
ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to flush, timed out while waiting for producer to flush outstanding 1 messages
followed by
ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
followed by
[2020-01-30 09:51:52,219] WARN [Producer clientId=producer-8] Error while fetching metadata with correlation id 606994 : {topicname=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
We have verified that the topic does exist; describing the topic gives us this:
zookeeper-1 [root#XX /bin]# ./kafka-topics --describe --zookeeper <zookeeper>:2181 --topic topicname
Topic:topicname PartitionCount:1 ReplicationFactor:3 Configs:
Topic: topicname Partition: 0 Leader: 2 Replicas: 2,3,1 Isr: 3,2,1
ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to flush, timed out while waiting for producer to flush outstanding 1 messages
and
[2020-01-30 09:51:52,219] WARN [Producer clientId=producer-8] Error while fetching metadata with correlation id 606994 : {topicname=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
are caused by the same issue. Have you checked that your Kafka cluster is reachable, or tried upgrading Kafka? The latest kafka-topics script uses broker hosts instead of ZooKeeper hosts to describe topics.
As for
ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
Make sure that offset.storage.topic of your Kafka Connect has already been created in Kafka. Workers flush offsets into this topic.
I have a two-broker Kafka 0.10.2.0 cluster. The replication factor is 2. I am running a 1.0.0 Kafka Streams application against this cluster. In my Kafka Streams application, the producer config has retries = 10 and retry.backoff.ms = 100.
After it had been running for a few minutes, I observed the following logs in the Kafka server.log. Because of this, the Kafka Streams application is throwing a 'NOT_LEADER_FOR_PARTITION' exception.
What might be the reason? Please help.
[2017-12-12 10:26:02,583] ERROR [ReplicaFetcherThread-0-1], Error for partition [__consumer_offsets,22] to broker 1:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
Each partition of a topic is served by one or more brokers: one is the leader and the remaining brokers are followers.
A producer needs to send new messages to the leader broker, which internally replicates the data to all followers.
I assume that your producer client is not connecting to the correct broker: it connects to a follower instead of the leader, and this follower rejects your send request.
Try to run ./kafka-topics.sh --zookeeper localhost:2181 --topic your_topic --describe
Topic:your_topic PartitionCount:3 ReplicationFactor:1 Configs:retention.ms=14400000
Topic: your_topic Partition: 0 Leader: 2 Replicas: 2 Isr: 2
Topic: your_topic Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: your_topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
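In this example you can see that your_topic has 3 partitions, and each of the 3 brokers is the leader of a different partition: broker 2 leads partition 0, broker 0 leads partition 1, and broker 1 leads partition 2. With a replication factor above 1, the other brokers listed under Replicas would be the followers for that partition; here the replication factor is 1, so each partition has only its leader.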
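If you want to read the leader and followers of each partition out of such output programmatically, a small Python sketch could look like this. It works purely on the describe text; the function name leaders_and_followers and the result layout are made up for the example.

```python
import re

# Matches per-partition lines such as:
#   Topic: your_topic Partition: 0 Leader: 2 Replicas: 2 Isr: 2
PARTITION_LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]+)"
)


def leaders_and_followers(describe_output):
    """Map partition -> {"leader": broker id, "followers": other replica ids}."""
    result = {}
    for match in PARTITION_LINE.finditer(describe_output):
        partition = int(match.group(1))
        leader = int(match.group(2))
        replicas = [int(r) for r in match.group(3).split(",")]
        result[partition] = {
            "leader": leader,
            "followers": [r for r in replicas if r != leader],
        }
    return result


sample = """\
Topic:your_topic PartitionCount:3 ReplicationFactor:1 Configs:retention.ms=14400000
Topic: your_topic Partition: 0 Leader: 2 Replicas: 2 Isr: 2
Topic: your_topic Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: your_topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
"""
for partition, info in leaders_and_followers(sample).items():
    print(partition, info)
```

With the replication factor of 1 shown here, every followers list comes back empty.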
Try setting these properties and see if it helps resolve the issue:
props.put(ProducerConfig.RETRIES_CONFIG, 10); //increase to 10 from default of 0
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG,
Integer.toString(Integer.MAX_VALUE)); // increase to infinity from default of 300 s