Kafka topic partitions with leader -1 - apache-kafka

I noticed that a few of my Kafka topics are behaving in a manner I cannot explain clearly.
For example:
./kafka-topics.sh --describe --zookeeper ${ip}:2181 --topic test
Topic:test PartitionCount:3 ReplicationFactor:1 Configs:retention.ms=1209600000
Topic: test Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: test Partition: 1 Leader: -1 Replicas: 2 Isr: 2
Topic: test Partition: 2 Leader: 3 Replicas: 3 Isr: 3
I am particularly concerned about Partition: 1 which shows Leader '-1'.
I also notice that roughly 1/3 of the messages produced to this topic fail due to a 'Timeout'. This I believe is a consequence of one partition not having a leader.
I was wondering if anyone has insights into why this issue occurs and how to recover from this in a Production scenario without losing data?
EDIT:
I am using the librdkafka-based Python producer, and the error message I see is Message failed delivery: KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}
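For context, the producer is essentially doing something like the following. This is a minimal sketch assuming the confluent-kafka Python package (which wraps librdkafka); the broker addresses, topic name, and payloads are placeholders. The delivery callback reports which partition each failed message was headed for, which lets me check whether the timeouts line up with the leaderless partition:
from confluent_kafka import Producer

# Placeholder broker list and topic name; substitute your own.
producer = Producer({
    'bootstrap.servers': 'broker1:9092,broker2:9092,broker3:9092',
    'message.timeout.ms': 30000,  # librdkafka's per-message delivery timeout
})

def delivery_report(err, msg):
    # Invoked from poll()/flush(); err is set when delivery failed.
    if err is not None:
        print('Delivery failed (partition %s): %s' % (msg.partition(), err))
    else:
        print('Delivered to partition %d at offset %d' % (msg.partition(), msg.offset()))

for i in range(30):
    producer.produce('test', value=('message-%d' % i).encode(), callback=delivery_report)
    producer.poll(0)  # serve queued delivery callbacks

producer.flush()  # block until all outstanding callbacks have fired
If the failed deliveries all point at partition 1, that matches the missing-leader theory.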

Most probably your second Kafka broker (broker 2) is down.
To check the active Kafka brokers, run:
./zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"
And the output should be similar to the one below:
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /brokers/ids
[0, 1, 2]
[zk: localhost:2181(CONNECTED) 1]
If the second broker is not listed among the active brokers, then you need to figure out why it is not up and running (the logs should tell you if something went wrong). I would also suggest increasing the replication factor, since you have a multi-broker configuration.
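If you prefer to check from the client side rather than from ZooKeeper, here is a small sketch (assuming the confluent-kafka Python package; the bootstrap address is a placeholder) that prints the broker IDs currently present in the cluster metadata:
from confluent_kafka.admin import AdminClient

# Placeholder bootstrap address; point this at any reachable broker.
admin = AdminClient({'bootstrap.servers': 'broker1:9092'})

# The cluster metadata lists every broker the cluster currently knows about.
metadata = admin.list_topics(timeout=10)
print('Brokers in metadata:', sorted(metadata.brokers.keys()))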

This often indicates that the broker leading that partition is offline. I would check the offline-partitions metric to confirm this, and also check whether broker 2 is currently functional.
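To confirm which partitions are affected from a client, a short sketch along the same lines (again assuming the confluent-kafka Python package; the topic name and bootstrap address are placeholders) that flags partitions whose reported leader is -1:
from confluent_kafka.admin import AdminClient

admin = AdminClient({'bootstrap.servers': 'broker1:9092'})  # placeholder address
topic_meta = admin.list_topics(topic='test', timeout=10).topics['test']

for pid, pmeta in sorted(topic_meta.partitions.items()):
    status = 'NO LEADER' if pmeta.leader == -1 else 'leader=%d' % pmeta.leader
    print('partition %d: %s (replicas=%s, isr=%s)' % (pid, status, pmeta.replicas, pmeta.isrs))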

Related

Kafka 3.1.0 cluster stopped working with errors INCONSISTENT_TOPIC_ID and UNKNOWN_TOPIC_ID

So I've been using Kafka 3.1.0 in a production environment. One of the VMs had to be live migrated, but due to some issues the live migration failed and the node was forcefully migrated, involving a full VM restart.
After that VM booted up, Kafka stopped working "completely": clients were not able to connect and produce/consume anything. JMX metrics were still showing up, but that node reported many partitions as "Offline partitions".
Looking into the logs, that particular node kept showing A LOT of INCONSISTENT_TOPIC_ID errors. Example:
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-2. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-2. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
However, if you take a look at the other Kafka brokers, they were showing slightly different errors (I don't have a log sample): UNKNOWN_TOPIC_ID...
Another interesting issue: I described the Kafka topic and this is what I got:
Topic: my-topic TopicId: XXXXXXXXXXXXXXXXXXXXXX PartitionCount: 4 ReplicationFactor: 4 Configs: segment.bytes=214748364,unclean.leader.election.enable=true,retention.bytes=214748364
Topic: my-topic Partition: 0 Leader: 2 Replicas: 5,2,3,0 Isr: 2
Topic: my-topic Partition: 1 Leader: 0 Replicas: 0,1,2,3 Isr: 0
Topic: my-topic Partition: 2 Leader: 2 Replicas: 1,2,3,4 Isr: 2
Topic: my-topic Partition: 3 Leader: 2 Replicas: 2,3,4,5 Isr: 2
Why does it show only 1 ISR when there should be 4 per partition? Why did it happen in the first place?
I've added an additional partition and this is what it shows now:
Topic: my-topic TopicId: XXXXXXXXXXXXXXXXXXXXXX PartitionCount: 5 ReplicationFactor: 4 Configs: segment.bytes=214748364,unclean.leader.election.enable=true,retention.bytes=214748364
Topic: my-topic Partition: 0 Leader: 2 Replicas: 5,2,3,0 Isr: 2
Topic: my-topic Partition: 1 Leader: 0 Replicas: 0,1,2,3 Isr: 0
Topic: my-topic Partition: 2 Leader: 2 Replicas: 1,2,3,4 Isr: 2
Topic: my-topic Partition: 3 Leader: 2 Replicas: 2,3,4,5 Isr: 2
Topic: my-topic Partition: 4 Leader: 3 Replicas: 3,4,5,0 Isr: 3,4,5,0
I know there is the kafka-reassign-partitions.sh script, and it fixed a similar issue in a preproduction environment, but I am more interested in why this happened in the first place.
Could this be related? I've set the parameter replica.lag.time.max.ms=5000 (over default 500) and even after restarting all nodes it didn't help.
This normally happens when the topic ID in the session does not match the topic ID in the log. To fix this issue you will have to make sure that the topic ID remains consistent across your cluster.
If you are using ZooKeeper, run this command in zkCli.sh on one of your nodes that is still in sync, and note the topic_id:
[zk: localhost:2181(CONNECTED) 10] get /brokers/topics/my-topic
{"partitions":{"0":[5,1,2],"1":[5,1,2],"2":[5,1,2],"3":[5,1,2],"4":
[5,1,2],"5":[5,1,2],"6":[5,1,2],"7":[5,1,2],"8":[5,1,2],"9":
[5,1,2]},"topic_id":"s3zoLdMp-T3CIotKlkBpMgL","adding_replicas":
{},"removing_replicas":{},"version":3}
Next, for each node, check the file partition.metadata for all the partitions of the topic my-topic. This file can be found under log.dirs (see server.properties).
For example, if log.dirs is set to /media/kafka-data, you can find it at:
/media/kafka-data/my-topic-1/partition.metadata for partition 1,
/media/kafka-data/my-topic-2/partition.metadata for partition 2, and so on.
The contents of the file may look like this (note that it matches the topic_id that ZooKeeper has):
version: 0
topic_id: s3zoLdMp-T3CIotKlkBpMgL
You'll need to make sure that the value of topic_id in all the partition.metadata files for my-topic is the same across your cluster. If you come across a different topic ID in any of the partitions, you can edit it with any text editor (or write a script to do this for you; a sketch of such a script follows below).
Once done, you may need to restart your brokers one at a time for this change to take effect.
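As an example of such a script, here is a minimal sketch. The log directory and topic name are placeholders, and the expected topic_id is whatever you noted from ZooKeeper; it scans the partition directories and reports any partition.metadata file whose topic_id differs:
import glob
import os

LOG_DIR = '/media/kafka-data'            # placeholder: your log.dirs value
TOPIC = 'my-topic'                       # placeholder topic name
EXPECTED_ID = 's3zoLdMp-T3CIotKlkBpMgL'  # the topic_id recorded in ZooKeeper

for meta_path in glob.glob(os.path.join(LOG_DIR, TOPIC + '-*', 'partition.metadata')):
    with open(meta_path) as f:
        # The file format is simple "key: value" lines, e.g. "topic_id: <id>".
        fields = dict(line.strip().split(': ', 1) for line in f if ': ' in line)
    topic_id = fields.get('topic_id')
    if topic_id != EXPECTED_ID:
        print('MISMATCH %s: found %s, expected %s' % (meta_path, topic_id, EXPECTED_ID))
    else:
        print('OK       %s' % meta_path)
Running it on every broker before and after the edit is a cheap way to confirm the cluster is consistent.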
I will try to answer why the topic ID in ZooKeeper may differ from the topic ID stored in partition.metadata:
In certain situations, it is possible that the topic ID stored locally on a broker differs from the topic ID stored for that topic in ZooKeeper. Currently, such a situation arises when users use a <2.8 client to alter partitions for a topic on >=2.8 brokers (including the latest 3.4) AND they use the --zookeeper flag from the client. Note that --zookeeper has been marked deprecated for a long time and has been replaced by --bootstrap-server, which does not have this problem.
The topic ID discrepancy leads to availability loss for the topic until the user performs the mitigation steps listed in KAFKA-14190.
The exact sequence of steps is:
1. The user uses a pre-2.8 client to create a new topic in ZooKeeper directly.
2. No topic ID is generated in ZooKeeper.
3. KafkaController listens on the ZNode and a TopicChange event is created. While handling this event, the controller notices that there is no topic ID, generates a new one, and updates ZooKeeper.
4. At this stage, ZooKeeper has a topic ID.
5. The user uses the pre-2.8 client to increase the number of partitions for this topic.
6. The client replaces/overwrites the entire existing ZNode with the new placement information. This deletes the existing topic ID in ZooKeeper (the one created by the controller in step 3).
7. The next time KafkaController interacts with this ZNode, it generates a new topic ID.
Note that we now have two different topic IDs for this topic name. A broker may still have the older topic ID in its metadata file and will complain about the mismatch when it encounters the new topic ID.

Kafka messages are stored in cluster/ensemble but aren't retrieved by consumer

I have a 2-server ZooKeeper + Kafka setup with the following config duplicated over the two servers:
Kafka config
broker.id=1 #2 for the second server
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://server1_ip:9092
zookeeper.connect=server1_ip:2181,server2_ip:2181
num.partitions=3
offsets.topic.replication.factor=2 #how many servers should every topic be replicated to
zookeeper config
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=200
admin.enableServer=false
server.1=server1_ip:2888:3888
server.2=server2_ip:2888:3888
initLimit=20
syncLimit=10
Successfully created a topic using:
/usr/local/kafka/bin/kafka-topics.sh --create --zookeeper server1_ip:2181,server2_ip:2181 --replication-factor 2 --partitions 3 --topic replicatedtest
Doing a Describe on topic using:
/usr/local/kafka/bin/kafka-topics.sh --bootstrap-server=server1_ip:2181,server2_ip:2181 --describe --topic replicatedtest
shows the following:
Topic: replicatedtest PartitionCount: 3 ReplicationFactor: 2 Configs: segment.bytes=1073741824
Topic: replicatedtest Partition: 0 Leader: 2 Replicas: 2,1 Isr: 2,1
Topic: replicatedtest Partition: 1 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: replicatedtest Partition: 2 Leader: 2 Replicas: 2,1 Isr: 2,1
At this point everything looks good. However when I push messages using the following:
/usr/local/kafka/bin/kafka-console-producer.sh --broker-list server1_ip:9092,server2_ip:9092 --topic replicatedtest
>Hello
>Hi
and call the consumer script using:
/usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server server1_ip:9092,server2_ip:9092 --topic replicatedtest --from-beginning
The consumer just stalls.
When I check whether these messages exist via an admin UI (KafkaMagic), they do come up. So it looks like the messages are stored successfully, but for some reason the consumer script can't get to them.
Any ideas?
Many thanks in advance!
==Edit==
Added a 3rd server. Changed the log level to TRACE in tools-log4j.properties, and this is what the consumer script outputs:
https://docs.google.com/document/d/12nfML7M2a5QyXQswIZ_QVGuNqkc2DTRLPKqvRfDWKDY/edit?usp=sharing
Some corrections:
offsets.topic.replication.factor does not set the default replication factor for created topics; it applies to the internal __consumer_offsets topic that stores your consumer group offsets.
I have never heard of or seen a setup with 2 ZooKeepers. The recommendation is an odd number of nodes, 1, 3 or at most 5, where 3 is usually more than enough.
For the brokers as well, the recommended setup is at least 3 replicas with a minimum in-sync replica count of 2.
For further assistance, please provide the server logs, set the consumer log level to debug, run a consumer-groups describe, and run the console consumer with --group test1 for easier investigation.
Update: according to the docs you provided, the error is:
The group coordinator is not available
"I faced a similar issue. The problem was that when you start your Kafka broker there is a property associated with the offsets topic replication factor, which defaults to 3."
Can you do a topic --list and make sure the __consumer_offsets topic exists?
If not, please create it, restart the brokers/ZooKeepers, and try to consume again:
kafka-topics --bootstrap-server node1:9092 --partitions 50 --replication-factor 3 --create --topic __consumer_offsets
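If you want to check programmatically rather than with the CLI, here is a small sketch (assuming the confluent-kafka Python package; the bootstrap address is a placeholder, following the question's naming) that verifies the __consumer_offsets topic exists:
from confluent_kafka.admin import AdminClient

admin = AdminClient({'bootstrap.servers': 'server1_ip:9092'})  # placeholder address
topics = admin.list_topics(timeout=10).topics

if '__consumer_offsets' in topics:
    parts = topics['__consumer_offsets'].partitions
    print('__consumer_offsets exists with %d partitions' % len(parts))
else:
    print('__consumer_offsets is missing; consumer groups cannot find a coordinator')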
It looks like the issue happened because I started off with one node and then decided to move to a cluster/ensemble setup that includes the original node. __consumer_offsets apparently needed to be reset in this case. This is what I ended up doing to solve the issue:
stop zookeeper and kafka on all 3 servers
systemctl stop kafka
systemctl stop zookeeper
delete existing topic data
rm -rf /tmp/kafka-logs
rm -rf /tmp/zookeeper/version-2
delete __consumer_offsets (calling the same delete command on each zookeeper instance might not be necessary)
/usr/local/kafka/bin/zookeeper-shell.sh server1_ip:2181 <<< "deleteall /brokers/topics/__consumer_offsets"
/usr/local/kafka/bin/zookeeper-shell.sh server2_ip:2181 <<< "deleteall /brokers/topics/__consumer_offsets"
/usr/local/kafka/bin/zookeeper-shell.sh server3_ip:2181 <<< "deleteall /brokers/topics/__consumer_offsets"
restart servers
systemctl start zookeeper
systemctl start kafka
recreate __consumer_offsets
/usr/local/kafka/bin/kafka-topics.sh --zookeeper server1_ip:2181,server2_ip:2181,server3_ip:2181 --create --topic __consumer_offsets --partitions 50 --replication-factor 3
Solution was based off: https://community.microstrategy.com/s/article/Kafka-cluster-health-check-fails-with-the-error-Group-coordinator-lookup-failed-The-coordinator-is-not-available?language=en_US

Kafka - Troubleshooting NotEnoughReplicasException

I started seeing the following error
[2020-06-12 20:09:01,324] ERROR [ReplicaManager broker=3] Error processing append operation on partition __consumer_offsets-10 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the current ISR Set(3) is insufficient to satisfy the min.isr requirement of 2 for partition __consumer_offsets-10
My setup has three brokers and all brokers are up. A couple of things I did before this error started to pop up:
I configured min.insync.replicas to be 2 on all the brokers. I created a topic with replication factor 3 and started producing messages from a producer with acks = 1 while two brokers were down. I then brought all the brokers back up and started the consumer.
How do I go about troubleshooting this error?
The consumer is also NOT able to see this message (not sure why; the message is supposed to be treated as "committed", since one broker was up while the producer was running).
A couple of facts: it is interesting to see that rebalancing hasn't happened yet with respect to the preferred leader strategy.
$ kafka-topics --zookeeper 127.0.0.1:2181 --topic stock-prices --describe
Topic: stock-prices PartitionCount: 3 ReplicationFactor: 3 Configs: min.insync.replicas=2
Topic: stock-prices Partition: 0 Leader: 1 Replicas: 1,3,2 Isr: 1,2,3
Topic: stock-prices Partition: 1 Leader: 1 Replicas: 2,1,3 Isr: 1,2,3
Topic: stock-prices Partition: 2 Leader: 1 Replicas: 3,2,1 Isr: 1,2,3
Here's your problem:
You have set min.insync.replicas=2, which means at least two replicas must be in sync for the partition to accept a write. If you take down 2 brokers, you have only one left, so the in-sync replica requirement is not fulfilled.
This has nothing to do with the consumers, since this is about the brokers. When you set acks=1, your producer gets an acknowledgement as soon as the message is written to one broker (the leader); it does not wait for the replicas to be created.
So the problem is that your producer is acknowledged once a single broker (the leader) gets the message, but the leader cannot replicate it because there aren't enough brokers up to sync.
One way to handle this is to set acks=all, so your producer won't be acknowledged until the in-sync replicas have the message; it will retry until at least 2 in-sync replicas are online.
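As an illustration, here is a minimal sketch (assuming a librdkafka-based Python producer via confluent-kafka; the broker addresses are placeholders) of producing with acks=all and surfacing the NOT_ENOUGH_REPLICAS error, or the eventual timeout once retries are exhausted, in the delivery callback:
from confluent_kafka import Producer, KafkaError

producer = Producer({
    'bootstrap.servers': 'broker1:9092,broker2:9092,broker3:9092',  # placeholders
    'acks': 'all',           # require acknowledgement from all in-sync replicas
    'retries': 10,
    'retry.backoff.ms': 100,
})

def on_delivery(err, msg):
    if err is None:
        print('committed at partition %d, offset %d' % (msg.partition(), msg.offset()))
    elif err.code() == KafkaError.NOT_ENOUGH_REPLICAS:
        print('rejected: fewer in-sync replicas than min.insync.replicas')
    else:
        print('failed: %s' % err)

producer.produce('stock-prices', b'some price update', callback=on_delivery)
producer.flush()
With only one broker in the ISR and min.insync.replicas=2, the callback should keep reporting the rejection until a second replica rejoins the ISR.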

Kafka - This server is not the leader for that topic-partition

I have a two-broker Kafka 0.10.2.0 cluster, with replication factor 2. I am running a 1.0.0 Kafka Streams application against this cluster. In my Kafka Streams application, the producer config has retries = 10 and retry.backoff.ms = 100.
After running for a few minutes, I observed the following logs in the Kafka server.log. Because of this, the Kafka Streams application is throwing a 'NOT_LEADER_FOR_PARTITION' exception.
What could be the possible reason? Please help me.
[2017-12-12 10:26:02,583] ERROR [ReplicaFetcherThread-0-1], Error for partition [__consumer_offsets,22] to broker 1:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
Each topic partition is served by one or more brokers: one is the leader and the remaining brokers are followers.
A producer needs to send new messages to the leader broker, which internally replicates the data to all followers.
I assume that your producer client does not connect to the correct broker; it connects to a follower instead of the leader, and this follower rejects your send request.
Try to run ./kafka-topics.sh --zookeeper localhost:2181 --topic your_topic --describe
Topic:your_topic PartitionCount:3 ReplicationFactor:1 Configs:retention.ms=14400000
Topic: your_topic Partition: 0 Leader: 2 Replicas: 2 Isr: 2
Topic: your_topic Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: your_topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
In this example you can see that your_topic has 3 partitions and that leadership is spread across all 3 brokers, each on a different partition: broker 2 is the leader of partition 0, broker 0 is the leader of partition 1, and broker 1 is the leader of partition 2 (with a replication factor of 1 there are no followers).
Try setting these properties and see if it helps resolve the issue:
props.put(ProducerConfig.RETRIES_CONFIG, 10); //increase to 10 from default of 0
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG,
Integer.toString(Integer.MAX_VALUE)); // increase to infinity from default of 300 s
(Source)
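For comparison, and keeping with the librdkafka-based Python producer used elsewhere on this page, here is a rough sketch of the analogous retry-related settings (the values and broker addresses are illustrative, not prescriptive):
from confluent_kafka import Producer

# Illustrative values only. NOT_LEADER_FOR_PARTITION is a retriable error, so
# retries plus the metadata refresh that follows usually ride out a leader change.
producer = Producer({
    'bootstrap.servers': 'broker1:9092,broker2:9092',  # placeholders
    'retries': 10,                 # retry retriable send errors
    'retry.backoff.ms': 100,       # back off between retries
    'message.timeout.ms': 300000,  # overall per-message delivery timeout
})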

Kafka uncommitted messages

Let's say the partition has 4 replicas (1 leader, 3 followers) and all are currently in sync. min.insync.replicas is set to 3 and request.required.acks is set to all (-1).
The producer sends a message to the leader, and the leader appends it to its log. After that, two of the replicas crash before they can fetch this message. One remaining replica successfully fetches the message and appends it to its own log.
The leader, after a certain timeout, will send an error (NotEnoughReplicas, I think) to the producer, since the min.insync.replicas condition is not met.
My question is: what will happen to the message that was appended to the leader's and one replica's log?
Will it be delivered to consumers when the crashed replicas come back online and the broker starts accepting and committing new messages (i.e. the high watermark is advanced in the log)?
If there are fewer in-sync replicas than min.insync.replicas and the producer uses acks=all, then the message is not committed and consumers will not receive that message, even after the crashed replicas come back and are added to the ISR list again. You can test this in the following way.
Start two brokers with min.insync.replicas = 2
$ ./bin/kafka-server-start.sh ./config/server-1.properties
$ ./bin/kafka-server-start.sh ./config/server-2.properties
Create a topic with 1 partition and RF=2. Make sure both brokers are in the ISR list.
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --create --topic topic1 --partitions 1 --replication-factor 2
Created topic "topic1".
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --describe --topic topic1
Topic:topic1 PartitionCount:1 ReplicationFactor:2 Configs:
Topic: topic1 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Run a console consumer and a console producer. Make sure the producer uses acks=-1.
$ ./bin/kafka-console-consumer.sh --new-consumer --bootstrap-server kafka-1:9092,kafka-2:9092 --topic topic1
$ ./bin/kafka-console-producer.sh --broker-list kafka-1:9092,kafka-2:9092 --topic topic1 --request-required-acks -1
Produce some messages. The consumer should receive them.
Kill one of the brokers (I killed the broker with id=2). Check that the ISR list is reduced to one broker.
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --describe --topic topic1
Topic:topic1 PartitionCount:1 ReplicationFactor:2 Configs:
Topic: topic1 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1
Try to produce again. In the producer you should get some
Error: NOT_ENOUGH_REPLICAS
errors (one per retry) and finally:
Messages are rejected since there are fewer in-sync replicas than required.
The consumer will not receive these messages.
Restart the killed broker and try to produce again.
The consumer will receive these messages, but not those that you sent while one of the replicas was down.
From my understanding, the high watermark will not advance until both failed follower brokers have recovered and caught up.
See this blog post for more details: http://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/
Error observed:
Messages are rejected since there are fewer in-sync replicas than required.
To resolve this I had to increase the replication factor, and it worked.