Kafka long coordinator load time and small ISRs - apache-kafka

I'm using Kafka 0.8.2.1, running a topic with 200 partitions and RF=3, with log retention set to about 1GB.
An unknown event caused the cluster to enter the "coordinator load" or "group load" state. A few signals made this apparent: the pykafka-based consumers began to fail during FetchOffsetRequests with error code 14 COORDINATOR_LOAD_IN_PROGRESS for some subset of partitions. These errors were triggered when consuming with a consumer group that had existed since before the coordinator load. In broker logs, messages like this appeared:
[2018-05...] ERROR Controller 17 epoch 20 initiated state change for partition [my.cool.topic,144] from OnlinePartition to OnlinePartition failed (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while electing leader for partition [my.cool.topic,144] due to: Preferred replica 11 for partition [my.cool.topic,144] is either not alive or not in the isr. Current leader and ISR: [{"leader":12,"leader_epoch":7,"isr":[12,13]}].
For some reason, Kafka decided that replica 11 was the "preferred" replica despite the fact that it was not in the ISR. To my knowledge, consumption could continue uninterrupted from either replica 12 or 13 while 11 resynchronized - it's not clear why Kafka chose a non-synced replica as the preferred leader.
The above-described behavior lasted for about 6 hours, during which the pykafka fetch_offsets error made message consumption impossible. While the coordinator load was still in progress, other consumer groups were able to consume the topic without error. In fact, the eventual fix was to restart the broken consumers with a new consumer_group name.
Questions
Is it normal or expected for the coordinator load state to last for 6 hours? Is this load time affected by log retention settings, message production rate, or other parameters?
Do non-pykafka clients handle COORDINATOR_LOAD_IN_PROGRESS by consuming only from the non-erroring partitions? Pykafka's insistence that all partitions return successful OffsetFetchResponses can be a source of consumption downtime.
Why does Kafka sometimes select a non-synced replica as the preferred replica during coordinator loads? How can I reassign partition leaders to replicas in the ISR?
Are all of these questions moot because I should just be using a newer version of Kafka?
Broker config options:
broker.id=10
port=9092
zookeeper.connect=****/kafka5
log.dirs=*****
delete.topic.enable=true
replica.fetch.max.bytes=1048576
replica.fetch.wait.max.ms=500
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
replica.lag.max.messages=4000
controller.socket.timeout.ms=30000
message.max.bytes=1000000
auto.create.topics.enable=false
log.index.interval.bytes=4096
log.index.size.max.bytes=10485760
log.retention.hours=96
log.roll.hours=168
log.retention.check.interval.ms=300000
log.segment.bytes=1073741824
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
socket.request.max.bytes=104857600
num.replica.fetchers=4
controller.message.queue.size=10
num.partitions=8
log.flush.interval.ms=60000
log.flush.interval.messages=60000
log.flush.scheduler.interval.ms=2000
num.network.threads=8
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
queued.max.requests=500
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
controlled.shutdown.enable=true

Related

kafka + This server is not the leader for that topic-partition

I have 5 broker kafka version 0.10 cluster.Replication factor is 3. and this is production kafka
brokers IDS are
101
102
103
104
105
after couple months that cluster was ok , we observed following logs in Kakfa server.log.
from the log we can see many lines of 'This server is not the leader for that topic-partitionb' exception.
the topic - kopa.thrn.bvff have 100 partitions
and we can see that all 100 partitions are balanced and no need to run kafka kafka-reassign-partitions
What may be the possible reason?
Please help me.
[2023-01-19 11:53:37,434] ERROR [ReplicaFetcherThread-0-101], Error for partition [kopa.thrn.bvff,78] to broker 101:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2023-01-19 11:53:37,434] ERROR [ReplicaFetcherThread-0-101], Error for partition [kopa.thrn.bvff,23] to broker 101:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2023-01-19 11:53:37,434] ERROR [ReplicaFetcherThread-0-101], Error for partition [kopa.thrn.bvff,63] to broker 101:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2023-01-19 11:53:37,434] ERROR [ReplicaFetcherThread-0-101], Error for partition [kopa.thrn.bvff,98] to broker 101:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2023-01-19 11:53:37,434] ERROR [ReplicaFetcherThread-0-101], Error for partition [kopa.thrn.bvff,3] to broker 101:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
The leader broker and the follower brokers manage each partition in Kafka. Since you have replication factor 3, each partition will have one leader broker and 2 follower brokers.
When the Kafka producer produces data, it connects to the leader and puts the data there, the followers will copy the data from the leader.
Now, the Kafka leader broker can be reassigned based on the leader's availability, if the leader was unavailable for some time for any reason in a distributed environment (busy CPU, network partition etc), Kafka will run the leader election for the partition to elect a leader for the partition.
You can see who is the leader and who is the follower by topic describe command.
In your case, the partition leader has been changed due to some unavailability of the leader. If you have Kafka metrics, you could see those leader election events for the partition. It is hard in a distributed environment to ensure one broker will remain the leader forever.

Why kafka cluster did error "Number of alive brokers '0' does not meet the required replication factor"?

I have 2 kafka brokers and 1 zookeeper. Brokers config: server.properties file:
1 broker:
auto.create.topics.enable=true
broker.id=1
delete.topic.enable=true
group.initial.rebalance.delay.ms=0
listeners=PLAINTEXT://5.1.2.3:9092
log.dirs=/opt/kafka_2.12-2.1.0/logs
log.retention.check.interval.ms=300000
log.retention.hours=168
log.segment.bytes=1073741824
max.message.bytes=105906176
message.max.bytes=105906176
num.io.threads=8
num.network.threads=3
num.partitions=10
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
replica.fetch.max.bytes=105906176
socket.receive.buffer.bytes=102400
socket.request.max.bytes=105906176
socket.send.buffer.bytes=102400
transaction.state.log.min.isr=1
transaction.state.log.replication.factor=1
zookeeper.connect=5.1.3.6:2181
zookeeper.connection.timeout.ms=6000
2 broker:
auto.create.topics.enable=true
broker.id=2
delete.topic.enable=true
group.initial.rebalance.delay.ms=0
listeners=PLAINTEXT://18.4.6.6:9092
log.dirs=/opt/kafka_2.12-2.1.0/logs
log.retention.check.interval.ms=300000
log.retention.hours=168
log.segment.bytes=1073741824
max.message.bytes=105906176
message.max.bytes=105906176
num.io.threads=8
num.network.threads=3
num.partitions=10
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
replica.fetch.max.bytes=105906176
socket.receive.buffer.bytes=102400
socket.request.max.bytes=105906176
socket.send.buffer.bytes=102400
transaction.state.log.min.isr=1
transaction.state.log.replication.factor=1
zookeeper.connect=5.1.3.6:2181
zookeeper.connection.timeout.ms=6000
if i ask zookeeper like this:
echo dump | nc zook_IP 2181
i got:
SessionTracker dump:
Session Sets (3):
0 expire at Sun Jan 04 03:40:27 MSK 1970:
1 expire at Sun Jan 04 03:40:30 MSK 1970:
0x1000bef9152000b
1 expire at Sun Jan 04 03:40:33 MSK 1970:
0x1000147d4b40003
ephemeral nodes dump:
Sessions with Ephemerals (2):
0x1000147d4b40003:
/controller
/brokers/ids/2
0x1000bef9152000b:
/brokers/ids/1
looke fine, but not works :(. Zookeeper see 2 brokers, but in first kafka broker we have error:
ERROR [KafkaApi-1] Number of alive brokers '0' does not meet the required replication factor '1' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
also we use kafka_exporter for prometheus, and he log this error:
Cannot get oldest offset of topic Some.TOPIC partition 9: kafka server: Request was for a topic or partition that does not exist on this broker." source="kafka_exporter.go:296
pls help ! were i mistake in config ?
Are your clocks working? Zookeeper thinks it's 1970
Sun Jan 04 03:40:27 MSK 1970
You may want to look at the rest of the logs or see if Kafka and Zookeeper are actively running and ports are open.
In your first message, after starting a fresh cluster you see this, so it's not a true error
This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
The properties you show, though, have listeners on entirely different subnets and you're not using advertised.listeners
Kafka broker.id changes maybe cause this problem. Clean up the kafka metadata under zk, note: kafka data will be lost
I got this error message in this situation :
Cluster talking in SSL
Every broker is a container
Updated the certificate with new password inside ALL brokers
Rolling update
After the first broker reboot, it spammed this error message and the broker controller talked about "a new broker connected but password verification failed".
Solutions :
Set the new certificate password with the old password
Down then Up your entire cluster at once
(not tested yet) Change the certificate on one broker, reboot it then move to the next broker until you reboot all of them (ending with the controller)

Kafka broker constantly ISR shrinking and expanding?

We have a cluster of 4 nodes in production. We observed that one of the
nodes ran into a situation where it constantly shrunk and expanded ISR for
more than 1 hours and unable to recover until the broker was bounced.
[2017-02-21 14:52:16,518] INFO Partition [skynet-large-stage,5] on broker 0: Shrinking ISR for partition [skynet-large-stage,5] from 2,0 to 0 (kafka.cluster.Partition)
[2017-02-21 14:52:16,543] INFO Partition [skynet-large-stage,37] on broker 0: Shrinking ISR for partition [skynet-large-stage,37] from 1,0 to 0 (kafka.cluster.Partition)
[2017-02-21 14:52:16,544] INFO Partition [skynet-large-stage,13] on broker 0: Shrinking ISR for partition [skynet-large-stage,13] from 1,0 to 0 (kafka.cluster.Partition)
[2017-02-21 14:52:16,545] INFO Partition [__consumer_offsets,46] on broker 0: Shrinking ISR for partition [__consumer_offsets,46] from 3,2,0 to 3,0 (kafka.cluster.Partition)
.
.
I'd like to know what would cause this issue and why the broken broker was not kicked out of ISR.
Kafka version is 0.10.1.0
There was that bug in KAFKA-4477 that got fixed, but in general, I've seen this same problem when Kafka brokers time out when talking to a zookeeper node (default is 6000ms timeout), for some transient network blip, at which point they get kicked out of the cluster, partition leadership changes, clients have to rebalance, etc. For high volume clusters, it's a pain.
Simply increasing this timeout has helped me several times before:
zookeeper.session.timeout.ms
The default value according to the official docs is 6000ms. I found simply increasing it to 15000ms caused the cluster to be rock solid.
Documentation for 0.11.0 Kafka version: https://kafka.apache.org/0110/documentation.html

When does kafka change leader?

I was running my services that work with kafka already for a year and no spontaneous changes of leader happens.
But for the last 2 weeks that started happens quite often.
Kafka log on that:
[2015-09-27 15:35:14,826] INFO [ReplicaFetcherManager on broker 2]
Removed fetcher for partitions [myTopic] (kafka.server.ReplicaFetcherManager)
[2015-09-27 15:35:14,830] INFO Truncating log myTopic-0 to offset 11520979. (kafka.log.Log)
[2015-09-27 15:35:14,845] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 713276 from client ReplicaFetcherThread-0-2 on partition [myTopic,0] failed due to Leader not local for partition [myTopic,0] on broker 2 (kafka.server.ReplicaManager)
[2015-09-27 15:35:14,857] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 256685 from client mirrormaker-1 on partition [myTopic,0] failed due to Leader not local for partition [myTopic,0] on broker 2 (kafka.server.ReplicaManager)
[2015-09-27 15:35:20,171] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [myTopic,0] (kafka.server.ReplicaFetcherManager)
What can cause switching leader? If there is info in some kafka documentation - please - just point the link. I've failed to find.
System configuration
kafka version: kafka_2.10-0.8.2.1
os: Red Hat Enterprise Linux Server release 6.5 (Santiago)
server.properties (differs from default):
broker.id=001
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.bytes=-1
controlled.shutdown.enable=true
auto.create.topics.enable=false
It appears like lead broker is down for that partition. It might be that data directroy(log.dirs) configured in server.properties is out of space and broker is not able to accommodate.
Also, what is replication factor of topic and cluster size of brokers?
I am assuming you have one topic and one partition with a replication factor of 2. Which is not a good configuration for optimal Kafka performance and consumers.
Your Logs are not clear enough for leader switch. Major issue in your topic may be having the only one leader due to the only partition. Now the single file in your logs is getting bigger in size day by day. Kafka internally does rebalancing at some level(details are not confirmed). That can be the reason for your leader switch. But i am not sure.
Also in your 2nd log line its says some of the logs are truncated. Can you please go though the logs in details and check is this happening only after truncation?
As you already mentioned you already checked your Kafka log directory files and their size. Please run the describe when you got this issue. The leader switch will reflect here as well. Or if you can setup some dashboard that will display the leader for past time. Then it will be easy for you to find the root cause.
bin/kafka-topics.sh --describe --zookeeper Zookeeperhost:Port --topic TopicName
Suggestion: i will suggest you to create a new topic with more partitions(read Kafka documentation to get a good idea about optimum number of partitions) and start writing to it. Or you can check, how to change partitions for current topic.
Last Thing: Is leader switch causing some issues in your Clients or you are worried only about warnings?

Why isn't kafka continuing to work on fail of one of the brokers?

I am under the impression that with two brokers with sync turned on my kafka setup should keep on working even on fail of one of the broker.
To test it I made a new topic named topicname. Its description is as follows:
Topic:topicname PartitionCount:1 ReplicationFactor:1 Configs:
Topic: topicname Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Then I ran producer.sh and consumer.sh in the following way:
bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9095 sync --topic topicname
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topicname --from-beginning
Till both the brokers were working I saw that messages were being received properly by the consumer, but when I killed one of the instance of the brokers through kill command then the consumer stopped showing me any new messages. Instead it showed me the following error message:
WARN [ConsumerFetcherThread-console-consumer-57116_ip-<internalipvalue>-1438604886831-603de65b-0-0], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 865; ClientId: console-consumer-57116; ReplicaId: -1; MaxWait: 100 ms; MinBytes: 1 bytes; RequestInfo: [topicname,0] -> PartitionFetchInfo(9,1048576). Possible cause: java.nio.channels.ClosedChannelException (kafka.consumer.ConsumerFetcherThread)
[2015-08-03 12:29:36,341] WARN Fetching topic metadata with correlation id 1 for topics [Set(topicname)] from broker [id:0,host:<hostname>,port:9092] failed (kafka.client.ClientUtils$)
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
I had this similar problem, setting the producer config "topic.metadata.refresh.interval.ms" to -1 (or whatever value is suitable for you) solved the issue for me.
So in my case , I had 3 broker (multi broker set up on my local machine) and created the topic with 3 partitions and replication factor 2.
Test set up:
Before the producer config:
Tried 3 brokers running , killed one of the brokers after producer started, the local Zookeeper updated the ISR and topic metadata info (removed down broker as leader) but the producer did not pick it up (may be due to default 10 mins refresh time).So messages end up failing. I get send exceptions.
After the producer config (-1 in my case):
Tried 3 brokers running , killed one of the brokers after producer started, the local Zookeeper updated the ISR info (removed down broker as leader), the producer refreshed the new ISR/topic metadata info and messages send did not fail.
-1 makes it refresh topic metadata on each failed attempt so may be you want to reduce the refresh time to something reasonable instead.
I think there are two things can make your consumer not work after a broker down for kafka HA cluster:
--replication-factor should bigger than 1 for your topic. so every topic partition can have at least one backup.
replication factor for internal topics for kafka configuration should also bigger than 1:
offsets.topic.replication.factor = 3
transaction.state.log.replication.factor = 3
transaction.state.log.min.isr = 2
This two modification make my producer and consumer still work after broker shutdown (5 broker and every broker goes down once) .
You can see in the topic description that you posted that your topic has only a single replica.
With a single replica there is no fault tolerance and if broker 0 (the broker that contains the replica) goes away, the topic will be unavailable.
Create a topic with more replicas (with --replication-factor 3) to have fault tolerance in case of crashes.
I had run into into the same problem even when using a topic with replication factor of 2.
Setting the following property on the producer worked for me.
"metadata.max.age.ms". (Kafka-0.8.2.1)
Else, my Producer was waiting for 1 minute by default to fetch the new leader and start contacting it
For a topic with replication factor N, Kafka tolerate up to N-1 server failures. E.g. having a replication factor 3 will allow you to handle upto 2 server failure.