Killing the node hosting __consumer_offsets leads to no message consumption at consumers - apache-kafka

I have a 3-node (node0, node1, node2) Kafka cluster (broker0, broker1, broker2) with replication factor 2, and ZooKeeper (the one packaged with the Kafka tar) running on a different node (node4).
I started broker 0 after starting ZooKeeper, and then the remaining nodes. The broker 0 logs show that it is reading __consumer_offsets, so it seems those partitions are stored on broker 0. Below are sample logs:
Kafka Version: kafka_2.10-0.10.2.0
[2017-06-30 10:50:47,381] INFO [GroupCoordinator 0]: Loading group metadata for console-consumer-85124 with generation 2 (kafka.coordinator.GroupCoordinator)
[2017-06-30 10:50:47,382] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from __consumer_offsets-41 in 23 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-06-30 10:50:47,382] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from __consumer_offsets-44 (kafka.coordinator.GroupMetadataManager)
[2017-06-30 10:50:47,387] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from __consumer_offsets-44 in 5 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-06-30 10:50:47,387] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from __consumer_offsets-47 (kafka.coordinator.GroupMetadataManager)
[2017-06-30 10:50:47,398] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from __consumer_offsets-47 in 11 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-06-30 10:50:47,398] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from __consumer_offsets-1 (kafka.coordinator.GroupMetadataManager)
Also, I can see GroupCoordinator messages in the same broker 0 logs.
[2017-06-30 14:35:22,874] INFO [GroupCoordinator 0]: Preparing to restabilize group console-consumer-34472 with old generation 1 (kafka.coordinator.GroupCoordinator)
[2017-06-30 14:35:22,877] INFO [GroupCoordinator 0]: Group console-consumer-34472 with generation 2 is now empty (kafka.coordinator.GroupCoordinator)
[2017-06-30 14:35:25,946] INFO [GroupCoordinator 0]: Preparing to restabilize group console-consumer-6612 with old generation 1 (kafka.coordinator.GroupCoordinator)
[2017-06-30 14:35:25,946] INFO [GroupCoordinator 0]: Group console-consumer-6612 with generation 2 is now empty (kafka.coordinator.GroupCoordinator)
[2017-06-30 14:35:38,326] INFO [GroupCoordinator 0]: Preparing to restabilize group console-consumer-30165 with old generation 1 (kafka.coordinator.GroupCoordinator)
[2017-06-30 14:35:38,326] INFO [GroupCoordinator 0]: Group console-consumer-30165 with generation 2 is now empty (kafka.coordinator.GroupCoordinator)
[2017-06-30 14:43:15,656] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 3 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-06-30 14:53:15,653] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
While testing fault tolerance for the cluster using kafka-console-consumer.sh and kafka-console-producer.sh, I see that when broker 1 or broker 2 is killed, the consumer still receives new messages from the producer, and rebalancing happens correctly.
However, killing broker 0 means that no consumer, no matter how many are running, can consume any new or old messages.
Below is the state of topic before and after broker 0 is killed.
Before
Topic:test-topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: test-topic Partition: 0 Leader: 2 Replicas: 2,0 Isr: 0,2
Topic: test-topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: test-topic Partition: 2 Leader: 1 Replicas: 1,2 Isr: 1,2
After
Topic:test-topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: test-topic Partition: 0 Leader: 2 Replicas: 2,0 Isr: 2
Topic: test-topic Partition: 1 Leader: 1 Replicas: 0,1 Isr: 1
Topic: test-topic Partition: 2 Leader: 1 Replicas: 1,2 Isr: 1,2
Following are the WARN messages seen in the consumer logs after broker 0 is killed:
[2017-06-30 14:19:17,155] WARN Auto-commit of offsets {test-topic-2=OffsetAndMetadata{offset=4, metadata=''}, test-topic-0=OffsetAndMetadata{offset=5, metadata=''}, test-topic-1=OffsetAndMetadata{offset=4, metadata=''}} failed for group console-consumer-34472: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-06-30 14:19:10,542] WARN Auto-commit of offsets {test-topic-2=OffsetAndMetadata{offset=4, metadata=''}, test-topic-0=OffsetAndMetadata{offset=5, metadata=''}, test-topic-1=OffsetAndMetadata{offset=4, metadata=''}} failed for group console-consumer-30165: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
Broker Properties. The remaining default properties are unchanged.
broker.id=0
delete.topic.enable=true
auto.create.topics.enable=false
listeners=PLAINTEXT://XXX:9092
advertised.listeners=PLAINTEXT://XXX:9092
log.dirs=/tmp/kafka-logs-test1
num.partitions=3
zookeeper.connect=XXX:2181
Producer properties. The remaining default properties are unchanged.
bootstrap.servers=XXX,XXX,XXX
compression.type=snappy
Consumer properties. The remaining default properties are unchanged.
zookeeper.connect=XXX:2181
zookeeper.connection.timeout.ms=6000
group.id=test-consumer-group
As far as I understand, if the node acting as GroupCoordinator and holding __consumer_offsets dies, then consumers cannot resume normal operation, despite new leaders being elected for the topic partitions.
I saw something similar in another post, which suggests restarting the dead broker node. However, in a production environment that means message consumption is delayed until broker 0 is restarted, despite having more nodes available.
Q1: How can the above situation be mitigated?
Q2: Is there a way to move the GroupCoordinator and __consumer_offsets to another node?
Any suggestions/help is appreciated.

Check the replication factor on the __consumer_offsets topic. If it's not 3, then that's your problem.
Run kafka-topics --zookeeper localhost:2181 --describe --topic __consumer_offsets and see whether the first line of output says "ReplicationFactor:1" or "ReplicationFactor:3".
It's a common problem during trials: you first set up one node, and this topic gets created with a replication factor of 1. Later, when you expand to 3 nodes, you forget to change the topic-level settings on this existing topic, so even though the topics you produce to and consume from are fault tolerant, the offsets topic is still stuck on broker 0 only.
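If that turns out to be the case, one way to fix it without waiting for broker 0 to come back is to raise the replication factor of __consumer_offsets with kafka-reassign-partitions. The following is only a sketch, assuming brokers 0, 1 and 2 and the ZooKeeper-based tooling of 0.10.x; the JSON must list every partition of the topic (50 by default), only the first three are shown here.

# Build a reassignment that spreads each __consumer_offsets partition across brokers 0, 1 and 2
cat > increase-offsets-rf.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"__consumer_offsets","partition":0,"replicas":[0,1,2]},
  {"topic":"__consumer_offsets","partition":1,"replicas":[1,2,0]},
  {"topic":"__consumer_offsets","partition":2,"replicas":[2,0,1]}
]}
EOF
# ...add the remaining partitions in the same way...

kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file increase-offsets-rf.json --execute

# Verify once the reassignment completes
kafka-topics.sh --zookeeper localhost:2181 --describe --topic __consumer_offsets

Once every __consumer_offsets partition has replicas on all three brokers, the group coordinator can fail over to another broker, so killing broker 0 should no longer stall consumption.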

Related

Kafka S3 connector fails to consume data due to "The coordinator is not available"

I set up a Kafka S3 connector, but it fails to consume data from Kafka due to the error:
The coordinator is not available
Kafka is a single-node cluster and seems to work fine with other consumers, e.g. Offset Explorer can read data from the topic.
I checked similar questions asked on Stack Overflow, and all of them point out that offsets.topic.replication.factor should be manually set to 1 instead of the default 3 in a single-node cluster.
In my case, I checked the topic and it is set to 1.
./kafka-topics.sh --describe --zookeeper broker:2181 --topic 202208.topic.test.v1
Topic: 202208.topic.test PartitionCount: 1 ReplicationFactor: 1 Configs:
Topic: 202208.topic.test Partition: 0 Leader: 1 Replicas: 1 Isr: 1
The detailed message is as follows:
[2022-08-31 15:46:52,843] DEBUG [Consumer clientId=connector-consumer-s3-sink-0, groupId=connect-s3-sink] Updating last seen epoch from 0 to 0 for partition prod.master.pxv.trade.eod.v1-0 (org.apache.kafka.clients.Metadata:178)
[2022-08-31 15:46:52,844] DEBUG [Consumer clientId=connector-consumer-s3-sink-0, groupId=connect-s3-sink] Updated cluster metadata updateVersion 106 to MetadataCache{clusterId='hWeMZOpIQ_iC5-iev3lZMQ', nodes=[broker:9092 (id: 1 rack: null)], partitions=[PartitionInfoAndEpoch{partitionInfo=Partition(topic = 202208.topic.test, partition = 0, leader = 1, replicas = [1], isr = [1], offlineReplicas = []), epoch=0}], controller=broker:9092 (id: 1 rack: null)} (org.apache.kafka.clients.Metadata:263)
[2022-08-31 15:46:52,844] DEBUG [Consumer clientId=connector-consumer-s3-sink-0, groupId=connect-s3-sink] Sending FindCoordinator request to broker broker:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:727)
[2022-08-31 15:46:52,864] DEBUG [Consumer clientId=connector-consumer-s3-sink-0, groupId=connect-s3-sink] Received FindCoordinator response ClientResponse(receivedTimeMs=1661960812864, latencyMs=20, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=3, clientId=connector-consumer-s3-sink-0, correlationId=211), responseBody=FindCoordinatorResponseData(throttleTimeMs=0, errorCode=15, errorMessage='The coordinator is not available.', nodeId=-1, host='', port=-1)) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:741)
I checked similar questions asked on Stack Overflow, and all of them point out that offsets.topic.replication.factor should be manually set to 1 instead of the default 3 in a single-node cluster.
Correct
More specifically, if you are running a single broker and any consumer tries to use a consumer group for offset management (as Kafka Connect does), the offsets topic needs to exist.
However, I do not think that setting is related to your logs about the coordinator.
In my case, I checked the topic and it is set to 1... --topic 202208.topic.test
You're not checking the correct topic there. This is how you verify the offsets.topic.replication.factor:
kafka-topics.sh --describe --bootstrap-server broker:9092 --topic __consumer_offsets | grep 'ReplicationFactor'
Note: I changed the --zookeeper flag to --bootstrap-server since --zookeeper is deprecated.
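If that describe shows that __consumer_offsets does not exist yet, it is also worth confirming what replication factor the broker would use when it auto-creates it. A minimal sketch, assuming the broker id is 1 (as in your metadata output) and a kafka-configs.sh new enough to support --describe --all:

# Shows the effective offsets.topic.replication.factor on broker 1
kafka-configs.sh --bootstrap-server broker:9092 --entity-type brokers --entity-name 1 \
  --describe --all | grep offsets.topic.replication.factor

On a single-node cluster this should report 1; otherwise the internal topic cannot be created and FindCoordinator keeps returning COORDINATOR_NOT_AVAILABLE (error code 15, as in your log).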

Kafka 3.1.0 cluster stopped working with errors INCONSISTENT_TOPIC_ID and UNKNOWN_TOPIC_ID

So I've been using Kafka 3.1.0 in a production environment. One of the VMs had to be live-migrated, but due to some issues the live migration failed, and the node was forcefully migrated, involving a full VM restart.
After that VM booted up, Kafka stopped working "completely": clients were not able to connect and produce/consume anything. JMX metrics were still showing up, but that node reported many partitions as "Offline partitions".
Looking into the logs, that particular node kept showing A LOT of INCONSISTENT_TOPIC_ID errors. Example:
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-2. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-2. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
However, the other Kafka brokers were showing slightly different errors (I don't have a log sample): UNKNOWN_TOPIC_ID.
Another interesting issue: I described the Kafka topic and this is what I got:
Topic: my-topic TopicId: XXXXXXXXXXXXXXXXXXXXXX PartitionCount: 4 ReplicationFactor: 4 Configs: segment.bytes=214748364,unclean.leader.election.enable=true,retention.bytes=214748364
Topic: my-topic Partition: 0 Leader: 2 Replicas: 5,2,3,0 Isr: 2
Topic: my-topic Partition: 1 Leader: 0 Replicas: 0,1,2,3 Isr: 0
Topic: my-topic Partition: 2 Leader: 2 Replicas: 1,2,3,4 Isr: 2
Topic: my-topic Partition: 3 Leader: 2 Replicas: 2,3,4,5 Isr: 2
Why does it show only 1 ISR when there should be 4 per partition? Why did it happen in the first place?
I added an additional partition and this is what it shows now:
Topic: my-topic TopicId: XXXXXXXXXXXXXXXXXXXXXX PartitionCount: 5 ReplicationFactor: 4 Configs: segment.bytes=214748364,unclean.leader.election.enable=true,retention.bytes=214748364
Topic: my-topic Partition: 0 Leader: 2 Replicas: 5,2,3,0 Isr: 2
Topic: my-topic Partition: 1 Leader: 0 Replicas: 0,1,2,3 Isr: 0
Topic: my-topic Partition: 2 Leader: 2 Replicas: 1,2,3,4 Isr: 2
Topic: my-topic Partition: 3 Leader: 2 Replicas: 2,3,4,5 Isr: 2
Topic: my-topic Partition: 4 Leader: 3 Replicas: 3,4,5,0 Isr: 3,4,5,0
I know there is the kafka-reassign-partitions.sh script, and it fixed a similar issue in the preproduction environment, but I am more interested in why this happened in the first place.
Could this be related? I've set the parameter replica.lag.time.max.ms=5000 (over default 500) and even after restarting all nodes it didn't help.
This normally happens when the topic ID in the session does not match the topic ID in the log. To fix this issue you will have to make sure that the topic ID remains consistent across your cluster.
If you are using ZooKeeper, run this command in zkCli.sh on one of your nodes that is still in sync, and note the topic_id:
[zk: localhost:2181(CONNECTED) 10] get /brokers/topics/my-topic
{"partitions":{"0":[5,1,2],"1":[5,1,2],"2":[5,1,2],"3":[5,1,2],"4":[5,1,2],"5":[5,1,2],"6":[5,1,2],"7":[5,1,2],"8":[5,1,2],"9":[5,1,2]},"topic_id":"s3zoLdMp-T3CIotKlkBpMgL","adding_replicas":{},"removing_replicas":{},"version":3}
Next, for each node, check the partition.metadata file for all the partitions of the topic my-topic. This file can be found under each partition directory in log.dirs (see server.properties).
For example, if log.dirs is set to /media/kafka-data, you can find it at:
/media/kafka-data/my-topic-1/partition.metadata for partition 1,
/media/kafka-data/my-topic-2/partition.metadata for partition 2, and so on.
The contents of the file may look like this (note that it matches the topic_id that ZooKeeper has):
version: 0
topic_id: s3zoLdMp-T3CIotKlkBpMgL
You'll need to make sure that the value of topic_id in all the partition.metadata files for my-topic across your cluster is the same. If you come across a different topic ID in any of the partitions, you can edit it with any text editor (or write a script to do this for you).
Once done, you may need to restart your brokers one at a time for this change to take effect.
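For example, a quick way to spot mismatching files on one broker could look like this. This is only a sketch; the expected topic_id, topic name and log directory are taken from the example above and are assumptions to adjust for your cluster:

# Compare every partition.metadata of my-topic against the topic_id reported by ZooKeeper
EXPECTED_ID="s3zoLdMp-T3CIotKlkBpMgL"
for f in /media/kafka-data/my-topic-*/partition.metadata; do
  grep -q "topic_id: ${EXPECTED_ID}" "$f" || echo "mismatch in $f: $(grep topic_id "$f")"
done

# To rewrite a mismatching file in place (stop the broker first and keep a backup):
# sed -i "s/^topic_id: .*/topic_id: ${EXPECTED_ID}/" /media/kafka-data/my-topic-3/partition.metadata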
I will try to answer why the topic ID in ZooKeeper may differ from the topic ID stored in partition.metadata.
In certain situations it is possible that the TopicId stored locally on a broker for a topic differs from the TopicId stored for that topic in ZK. Currently, such a situation arises when users use a <2.8 client to alter partitions for a topic on >=2.8 brokers (including the latest 3.4) AND they use the --zookeeper flag from the client. Note that --zookeeper has been marked deprecated for a long time and has been replaced by --bootstrap-server, which does not have this problem.
The topic ID discrepancy leads to availability loss for the topic until the user performs the mitigation steps listed in KAFKA-14190.
The exact sequence of steps is:
1. A user uses a pre-2.8 client to create a new topic in ZooKeeper directly.
2. No TopicId is generated in ZooKeeper.
3. KafkaController listens to the ZNode and a TopicChange event is created. While handling this event, the controller notices that there is no TopicId, generates a new one, and updates ZK.
4. At this stage, ZK has a TopicId.
5. The user uses the pre-2.8 client to increase the number of partitions for this topic.
6. The client replaces/overwrites the entire existing ZNode with the new placement information. This deletes the existing TopicId in ZK (the one created by the controller in step 3).
7. The next time KafkaController interacts with this ZNode, it generates a new TopicId.
8. We now have two different TopicIds for this topic name.
9. Brokers may still have the older TopicId in their metadata files and will complain about the mismatch when they encounter the new TopicId.

Kafka - Troubleshooting NotEnoughReplicasException

I started seeing the following error:
[2020-06-12 20:09:01,324] ERROR [ReplicaManager broker=3] Error processing append operation on partition __consumer_offsets-10 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the current ISR Set(3) is insufficient to satisfy the min.isr requirement of 2 for partition __consumer_offsets-10
My setup has three brokers and all brokers are up. A couple of things I did before this error started popping up:
I configured min.insync.replicas to be 2 on all the brokers. I created a topic with replication factor 3 and started producing messages from a producer with acks=1 while two brokers were down. I then brought all the brokers back up and started the consumer.
How do I go about troubleshooting this error?
The consumer is also NOT able to see this message (not sure why; the message is supposed to be treated as "committed" since one broker was up when the producer was running).
A couple of facts:
It is interesting to see that rebalancing hasn't happened yet with respect to the preferred leader strategy.
$ kafka-topics --zookeeper 127.0.0.1:2181 --topic stock-prices --describe
Topic: stock-prices PartitionCount: 3 ReplicationFactor: 3 Configs: min.insync.replicas=2
Topic: stock-prices Partition: 0 Leader: 1 Replicas: 1,3,2 Isr: 1,2,3
Topic: stock-prices Partition: 1 Leader: 1 Replicas: 2,1,3 Isr: 1,2,3
Topic: stock-prices Partition: 2 Leader: 1 Replicas: 3,2,1 Isr: 1,2,3
Here's your problem:
You have set min.insync.replicas=2, which means you need at least two brokers up, running, and in sync to publish a message to the topic. If you take down 2 brokers, you have only one left, which means your in-sync replica requirement is not fulfilled.
This has nothing to do with the consumers; it is about the brokers. When you set acks=1, your producer gets an acknowledgement as soon as the message is published to one broker (it does not wait for the replicas to be created).
So the problem is that your producer is acknowledged as soon as a single broker (the leader) receives the message, but the leader cannot replicate it because there aren't any other brokers up to sync to.
One way to handle this is to set acks=all, so your producer won't be acknowledged until all in-sync replicas have the message. It will keep retrying until at least 2 in-sync replicas are online.
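As a quick check with the console tools, you can reproduce the suggested behaviour by forcing acks=all on the console producer. This is just a sketch; the broker address is an assumption:

# With acks=all and min.insync.replicas=2, sends fail with NotEnoughReplicasException
# instead of being silently acknowledged by a lone leader
kafka-console-producer --broker-list localhost:9092 --topic stock-prices \
  --producer-property acks=all

Once at least two replicas are back in sync for the partition, the retried sends go through.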

Kafka Connect - Error while fetching metadata with correlation id UNKNOWN_TOPIC_OR_PARTITION

We have Kafka Connect running a Postgres connector; it is pulling changes from a DB and putting them into a topic. We are getting an error:
ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to flush, timed out while waiting for producer to flush outstanding 1 messages
followed by
ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
followed by
[2020-01-30 09:51:52,219] WARN [Producer clientId=producer-8] Error while fetching metadata with correlation id 606994 : {topicname=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
We have verified that the topic does exist; doing a describe on the topic gives us this:
zookeeper-1 [root@XX /bin]# ./kafka-topics --describe --zookeeper <zookeeper>:2181 --topic topicname
Topic:topicname PartitionCount:1 ReplicationFactor:3 Configs:
Topic: topicname Partition: 0 Leader: 2 Replicas: 2,3,1 Isr: 3,2,1
ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to flush, timed out while waiting for producer to flush outstanding 1 messages
and
[2020-01-30 09:51:52,219] WARN [Producer clientId=producer-8] Error while fetching metadata with correlation id 606994 : {topicname=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
are symptoms of the same issue. Have you checked whether the Connect worker can actually reach your Kafka cluster, or tried upgrading your Kafka? Note that the latest kafka-topics script uses broker hosts instead of ZooKeeper hosts to describe topics.
As for
ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
Make sure that the topic named by offset.storage.topic in your Kafka Connect worker configuration has already been created in Kafka. Workers flush offsets into this topic.
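If it is missing, you can create it up front. The topic name, partition count and replication factor below are only an example, based on the defaults shipped in connect-distributed.properties (offset.storage.topic=connect-offsets); the Connect internal topics must be compacted, and on older tool versions you would use --zookeeper <zookeeper>:2181 instead of --bootstrap-server:

./kafka-topics --bootstrap-server <broker>:9092 --create --topic connect-offsets \
  --partitions 25 --replication-factor 3 --config cleanup.policy=compact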

Kafka - This server is not the leader for that topic-partition

I have a two-broker Kafka 0.10.2.0 cluster. The replication factor is 2. I am running a 1.0.0 Kafka Streams application against this cluster. In my Kafka Streams application, the producer config has retries=10 and retry.backoff.ms=100.
After running for a few minutes, I observed the following logs in the Kafka server.log. Because of this, the Kafka Streams application is throwing a 'NOT_LEADER_FOR_PARTITION' exception.
What may be the possible reason? Please help me.
[2017-12-12 10:26:02,583] ERROR [ReplicaFetcherThread-0-1], Error for partition [__consumer_offsets,22] to broker 1:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
Each topic partition is served by one or multiple brokers: one is the leader and the remaining brokers are followers.
A producer needs to send new messages to the leader broker, which internally replicates the data to all followers.
I assume that your producer client is not connecting to the correct broker; it connects to a follower instead of the leader, and this follower rejects your send request.
Try to run ./kafka-topics.sh --zookeeper localhost:2181 --topic your_topic --describe
Topic:your_topic PartitionCount:3 ReplicationFactor:1 Configs:retention.ms=14400000
Topic: your_topic Partition: 0 Leader: 2 Replicas: 2 Isr: 2
Topic: your_topic Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: your_topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
In this example you can see that your_topic has 3 partitions and each of the 3 brokers is the leader of a different partition: broker 2 leads partition 0, broker 0 leads partition 1, and broker 1 leads partition 2. Since the replication factor here is 1, no partition has followers; with a higher replication factor, the other brokers in a partition's replica list would act as its followers.
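To tie this back to the error above, you can also check which broker currently leads the __consumer_offsets partition mentioned in the log. A sketch, assuming ZooKeeper on localhost as in the command above:

# Shows the current leader of __consumer_offsets partition 22 from the error log
./kafka-topics.sh --zookeeper localhost:2181 --describe --topic __consumer_offsets | grep 'Partition: 22'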
Try setting these properties and see if it helps resolve the issue:
props.put(ProducerConfig.RETRIES_CONFIG, 10); // increase to 10 from the default of 0
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.toString(Integer.MAX_VALUE)); // increase to effectively infinite from the default of 300 s