Kafka broker constantly shrinking and expanding ISR? - apache-kafka

We have a cluster of 4 nodes in production. We observed that one of the
nodes ran into a situation where it constantly shrank and expanded the ISR for
more than an hour and was unable to recover until the broker was bounced.
[2017-02-21 14:52:16,518] INFO Partition [skynet-large-stage,5] on broker 0: Shrinking ISR for partition [skynet-large-stage,5] from 2,0 to 0 (kafka.cluster.Partition)
[2017-02-21 14:52:16,543] INFO Partition [skynet-large-stage,37] on broker 0: Shrinking ISR for partition [skynet-large-stage,37] from 1,0 to 0 (kafka.cluster.Partition)
[2017-02-21 14:52:16,544] INFO Partition [skynet-large-stage,13] on broker 0: Shrinking ISR for partition [skynet-large-stage,13] from 1,0 to 0 (kafka.cluster.Partition)
[2017-02-21 14:52:16,545] INFO Partition [__consumer_offsets,46] on broker 0: Shrinking ISR for partition [__consumer_offsets,46] from 3,2,0 to 3,0 (kafka.cluster.Partition)
.
.
I'd like to know what would cause this issue and why the misbehaving broker was not kicked out of the ISR.
Kafka version is 0.10.1.0.

There was a bug along these lines, KAFKA-4477, that has since been fixed. More generally, though, I've seen the same problem when Kafka brokers time out talking to a ZooKeeper node (the default session timeout is 6000 ms), for example during a transient network blip. The broker then gets kicked out of the cluster, partition leadership changes, clients have to rebalance, and so on. For high-volume clusters it's a pain.
Simply increasing this timeout has helped me several times before:
zookeeper.session.timeout.ms
The default value according to the official docs is 6000 ms. I found that simply increasing it to 15000 ms made the cluster rock solid.
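For example, a broker-side override along these lines worked for me (a sketch; 15000 ms is just the value that happened to work in my environment, and it must stay within the min/max session timeouts your ZooKeeper ensemble allows):
# server.properties (broker side)
zookeeper.session.timeout.ms=15000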
Documentation for Kafka version 0.11.0: https://kafka.apache.org/0110/documentation.html

Related

Offset for consumer group reset for one partition

During the last Kafka maintenance, which required a rolling restart of the brokers, we witnessed a reset of consumer group offsets for certain partitions.
At 11:14 am, everything is fine for the consumer group and we don't see any consumer lag:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 0 105130857 105130857 0 st-...
...
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 6 78591770 78591770 0 st-...
However, 5 minutes later, during the rolling restart of the brokers, we see a reset for one partition and a consumer lag of millions of events.
$ bin/kafka-consumer-groups --bootstrap-server XXX:9093,XXX... --command-config secrets.config --group st-xx --describe
Note: This will not show information about old Zookeeper-based consumers.
[2019-08-26 12:44:13,539] WARN Connection to node -5 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2019-08-26 12:44:13,707] WARN [Consumer clientId=consumer-1, groupId=st-xx] Connection to node -5 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Consumer group 'st-xx' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 0 105132096 105132275 179
...
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 6 15239401 78593165 63353764 ...
In the last two hours, the offset for the partition hasn't recovered and we now need to patch it manually. We had similar issues during the previous rolling restart of the brokers.
Has anyone seen something like this before? The only clue we could find is this ticket; however, we run Kafka version 1.0.1-kafka3.1.0.
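For reference, the manual patch boils down to the offset reset feature of kafka-consumer-groups; a sketch with a placeholder topic name, using partition 6 and its pre-restart offset from the output above:
bin/kafka-consumer-groups --bootstrap-server XXX:9093 --command-config secrets.config --group st-xx --topic the_topic:6 --reset-offsets --to-offset 78591770 --execute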

Subset of stream's changelog and repartition partitions not available as broker is down - how should the stream behave?

My setup consists of 3 Kafka brokers (2.11-1.1.1), a single ZooKeeper node and a Java service that uses the Streams API.
The Java service consumes from topic A, performs a persistent stream operation (backed by a changelog and a repartition Streams topic) and writes to topic B. EOS semantics are enabled.
Given that the changelog and repartition topics have a replication factor of 1, how should the Streams Java app behave if one of my brokers is down (e.g. in my DEV env the disk is full on only one broker)? Will the stream continue to consume even if 1/3 of the changelog and repartition partitions are not reachable?
EDIT: Also note that topics A, B and __consumer_offsets have RF=3.
In my java service logs I see:
2019-01-04 09:14:38,787 UTC WARN kafka-producer-network-thread | trsb-app-
nonprod.snapshot-14fa12b2-ac15-4ecc-8729-8f6c4a0034a7-StreamThread-2-0_4-
producer org.apache.kafka.clients.NetworkClient warn | [Producer
clientId=trsb-app-nonprod.snapshot-14fa12b2-ac15-4ecc-8729-8f6c4a0034a7-
StreamThread-2-0_4-producer, transactionalId=trsb-app-nonprod.snapshot-0_4]
Connection to node 1 could not be established. Broker may not be available.
2019-01-04 09:14:38,797 UTC WARN kafka-producer-network-thread | trsb-app-
nonprod.snapshot-14fa12b2-ac15-4ecc-8729-8f6c4a0034a7-StreamThread-2-1_10-
producer org.apache.kafka.clients.NetworkClient warn | [Producer
clientId=trsb-app-nonprod.snapshot-14fa12b2-ac15-4ecc-8729-8f6c4a0034a7-
StreamThread-2-1_10-producer, transactionalId=trsb-app-nonprod.snapshot-
1_10] Connection to node 1 could not be established. Broker may not be
available.
And nothing is consumed.
In the logs of both working brokers I see:
[2019-01-04 13:56:56,449] WARN Resetting first dirty offset of trsb-app-
nonprod.snapshot-store.invoices-changelog-43 to log start offset 99 since
the checkpointed offset 95 is invalid. (kafka.log.LogCleanerManager$)
[2019-01-04 13:56:56,449] WARN Resetting first dirty offset of trsb-app-
nonprod.snapshot-store.invoices-changelog-40 to log start offset 103 since
the checkpointed offset 100 is invalid. (kafka.log.LogCleanerManager$)
Since you are using exactly-once semantics, a minimum of 3 brokers is needed to continue processing, so your app will not continue to process if one of the brokers goes down. See the processing.guarantee section here for more on this:
https://kafka.apache.org/10/documentation/streams/developer-guide/config-streams.html#id25
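The 3-broker minimum comes from the broker-side defaults for the transaction log that EOS relies on; a sketch of the settings involved (broker defaults shown, plus the Streams-side properties assumed from the question; replication.factor controls the changelog/repartition RF and defaults to 1):
# broker side (defaults)
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
# Streams app side (set via java.util.Properties / StreamsConfig)
processing.guarantee=exactly_once
replication.factor=3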
The stream continues to consume, but since the state store, depending on the message key, may not be pushable to its corresponding changelog partition, some keys may fail and those transactions will fail and be rolled back. As a result, the first key consumed from topic A whose state store push fails will block its partition until the broker is up again, because the state store push is part of the EOS transaction.

Initiated state change for partition from OfflinePartition to OnlinePartition failed

I reconfigured my kafka cluster, changing:
default replication factor from 1 to 3 and also
changing the location of the kafka data dir on disk
So after restarting all the nodes, the cluster seemed OK, but then I noticed that all the topics were failing to come online. The logs contain messages like this for each topic:
state-change.log: [2018-02-01 12:41:42,176] ERROR Controller 826437096 epoch 19 initiated state change for partition [filedrop,0] from OfflinePartition to OnlinePartition failed (state.change.logger)
So none of the topics are usable; listing topics with kafkacat -L -b shows leaders as not available:
Metadata for all topics (from broker -1: lol-045:9092/bootstrap):
7 brokers:
broker 826437096 at lol-044:9092
broker 746155422 at lol-047:9092
broker 651737161 at lol-046:9092
broker 728512596 at lol-048:9092
broker 213763378 at lol-045:9092
broker 622553932 at lol-049:9092
broker 746727274 at lol-050:9092
14 topics:
topic "lol.stripped" with 3 partitions:
partition 2, leader -1, replicas:, isrs:, Broker: Leader not available
partition 1, leader -1, replicas:, isrs:, Broker: Leader not available
partition 0, leader -1, replicas:, isrs:, Broker: Leader not available
However, newly created topics are correctly replicated and healthy:
topic "lol-kafka-health" with 3 partitions:
partition 2, leader 622553932, replicas: 622553932,213763378,651737161, isrs: 622553932,213763378,651737161
partition 1, leader 213763378, replicas: 622553932,213763378,826437096, isrs: 213763378,826437096,622553932
partition 0, leader 826437096, replicas: 213763378,746727274,826437096, isrs: 826437096,746727274,213763378
So I think some kind of metadata corruption happened during the reconfiguration.
My question is:
Is there any way I can get these topic partitions online again?
Given that:
the broker IDs were changed during the reconfiguration
the ZooKeeper cluster for Kafka went down temporarily during the reconfiguration
In addition, are there some procedures I can use to investigate how recoverable these topics are?
Many thanks in advance!
The procedure described here allowed me to reassign the leaderless partitions to leaders with the new broker IDs:
https://community.cloudera.com/t5/Data-Ingestion-Integration/Move-partitions-from-invalid-leader/td-p/43334
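The gist of that procedure is a manual partition reassignment onto the live broker IDs; a rough sketch using the topic and broker IDs from the question (the ZooKeeper host/port and the chosen replica lists are placeholders):
# reassign.json
{"version":1,"partitions":[
  {"topic":"lol.stripped","partition":0,"replicas":[826437096,746155422,651737161]},
  {"topic":"lol.stripped","partition":1,"replicas":[746155422,651737161,728512596]},
  {"topic":"lol.stripped","partition":2,"replicas":[651737161,728512596,213763378]}
]}
bin/kafka-reassign-partitions.sh --zookeeper lol-045:2181 --reassignment-json-file reassign.json --execute
bin/kafka-reassign-partitions.sh --zookeeper lol-045:2181 --reassignment-json-file reassign.json --verify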

When does Kafka change the leader?

I had been running my services that work with Kafka for a year already, and no spontaneous leader changes happened.
But for the last 2 weeks they have started happening quite often.
The Kafka log shows:
[2015-09-27 15:35:14,826] INFO [ReplicaFetcherManager on broker 2]
Removed fetcher for partitions [myTopic] (kafka.server.ReplicaFetcherManager)
[2015-09-27 15:35:14,830] INFO Truncating log myTopic-0 to offset 11520979. (kafka.log.Log)
[2015-09-27 15:35:14,845] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 713276 from client ReplicaFetcherThread-0-2 on partition [myTopic,0] failed due to Leader not local for partition [myTopic,0] on broker 2 (kafka.server.ReplicaManager)
[2015-09-27 15:35:14,857] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 256685 from client mirrormaker-1 on partition [myTopic,0] failed due to Leader not local for partition [myTopic,0] on broker 2 (kafka.server.ReplicaManager)
[2015-09-27 15:35:20,171] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [myTopic,0] (kafka.server.ReplicaFetcherManager)
What can cause the leader to switch? If there is info on this in the Kafka documentation, please just point me to the link; I've failed to find it.
System configuration
kafka version: kafka_2.10-0.8.2.1
os: Red Hat Enterprise Linux Server release 6.5 (Santiago)
server.properties (differs from default):
broker.id=001
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.bytes=-1
controlled.shutdown.enable=true
auto.create.topics.enable=false
It appears that the lead broker is down for that partition. It might be that the data directory (log.dirs) configured in server.properties is out of space and the broker cannot accommodate any more data.
Also, what is the replication factor of the topic and the size of the broker cluster?
I am assuming you have one topic and one partition with a replication factor of 2, which is not a good configuration for optimal Kafka performance and consumers.
Your logs are not clear enough to explain the leader switch. A major issue may be that your topic has only one leader because it has only one partition. The single log file for that partition is getting bigger day by day, and Kafka does some internal rebalancing (the details are not confirmed); that could be the reason for your leader switch, but I am not sure.
Also, your second log line says some of the log was truncated. Can you please go through the logs in detail and check whether this happens only after a truncation?
As you already mentioned, you checked your Kafka log directory files and their sizes. Please run the describe command when you hit this issue; the leader switch will show up there as well. Or, if you can set up a dashboard that displays the leader over time, it will be easier to find the root cause.
bin/kafka-topics.sh --describe --zookeeper Zookeeperhost:Port --topic TopicName
Suggestion: I suggest creating a new topic with more partitions (read the Kafka documentation to get a good idea of the optimum number of partitions) and start writing to it. Or you can look into how to change the number of partitions for the current topic, as sketched below.
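For reference, increasing the partition count on the existing topic is done with the alter command; a sketch with the same placeholders as above (note that the partition count can only be increased, never decreased):
bin/kafka-topics.sh --alter --zookeeper Zookeeperhost:Port --topic TopicName --partitions 3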
Last thing: is the leader switch causing issues for your clients, or are you only worried about the warnings?

Why isn't Kafka continuing to work after one of the brokers fails?

I am under the impression that with two brokers and sync turned on, my Kafka setup should keep working even if one of the brokers fails.
To test this I made a new topic named topicname. Its description is as follows:
Topic:topicname PartitionCount:1 ReplicationFactor:1 Configs:
Topic: topicname Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Then I ran producer.sh and consumer.sh in the following way:
bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9095 sync --topic topicname
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topicname --from-beginning
While both brokers were running I saw that messages were being received properly by the consumer, but when I killed one of the broker instances with the kill command, the consumer stopped showing me any new messages. Instead it showed me the following error message:
WARN [ConsumerFetcherThread-console-consumer-57116_ip-<internalipvalue>-1438604886831-603de65b-0-0], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 865; ClientId: console-consumer-57116; ReplicaId: -1; MaxWait: 100 ms; MinBytes: 1 bytes; RequestInfo: [topicname,0] -> PartitionFetchInfo(9,1048576). Possible cause: java.nio.channels.ClosedChannelException (kafka.consumer.ConsumerFetcherThread)
[2015-08-03 12:29:36,341] WARN Fetching topic metadata with correlation id 1 for topics [Set(topicname)] from broker [id:0,host:<hostname>,port:9092] failed (kafka.client.ClientUtils$)
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
I had a similar problem; setting the producer config topic.metadata.refresh.interval.ms to -1 (or whatever value is suitable for you) solved the issue for me.
In my case I had 3 brokers (a multi-broker setup on my local machine) and created the topic with 3 partitions and a replication factor of 2.
Test set up:
Before the producer config change:
With 3 brokers running, I killed one of the brokers after the producer started. The local ZooKeeper updated the ISR and topic metadata (it removed the downed broker as leader), but the producer did not pick it up (probably due to the default 10-minute refresh interval), so message sends ended up failing with send exceptions.
After the producer config change (-1 in my case):
With 3 brokers running, I killed one of the brokers after the producer started. The local ZooKeeper updated the ISR (it removed the downed broker as leader), the producer refreshed the new ISR/topic metadata, and message sends did not fail.
-1 makes the producer refresh the topic metadata on each failed attempt, so you may want to set the refresh time to something reasonable instead.
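For completeness, this is just an old (Scala) producer property; a sketch of how it looks wherever you build your producer's config:
# old Scala producer config (Kafka 0.8.x); -1 refreshes metadata on every failed send
topic.metadata.refresh.interval.ms=-1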
I think there are two things that can make your consumer stop working after a broker goes down in a Kafka HA cluster:
--replication-factor should be bigger than 1 for your topic, so every topic partition has at least one backup.
The replication factor for Kafka's internal topics should also be bigger than 1:
offsets.topic.replication.factor = 3
transaction.state.log.replication.factor = 3
transaction.state.log.min.isr = 2
These two modifications kept my producer and consumer working after a broker shutdown (5 brokers, and every broker went down once).
You can see in the topic description that you posted that your topic has only a single replica.
With a single replica there is no fault tolerance and if broker 0 (the broker that contains the replica) goes away, the topic will be unavailable.
Create a topic with more replicas (with --replication-factor 3) to have fault tolerance in case of crashes.
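A sketch of creating such a topic (the replication factor cannot exceed the number of brokers, so with the two-broker setup from the question it would be 2):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic topicname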
I ran into the same problem even when using a topic with a replication factor of 2.
Setting the following property on the producer worked for me:
metadata.max.age.ms (Kafka 0.8.2.1)
Otherwise, my producer was waiting 1 minute by default to fetch the new leader and start contacting it.
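A sketch of that producer setting (the 5000 ms value is illustrative; lower values make the producer refresh metadata, and thus discover the new leader, sooner):
# new Java producer config
metadata.max.age.ms=5000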
For a topic with replication factor N, Kafka tolerates up to N-1 server failures. E.g. a replication factor of 3 will allow you to handle up to 2 server failures.