What should be the expected behavior of Consumer when Kafka Broker or Zookeeper goes down? - apache-kafka

I have a local test setup for Kafka and Zookeeper.
Here are the details.
OS: Windows
Kafka: 2.0.1 (Single Node Single Broker)
According to these Stack Overflow threads, consumers should go down if Zookeeper or the broker goes down:
What happens if Zookeeper fails completely?
Kafka cluster zookeeper failure handling
Even when Zookeeper is down, the producer (Conduktor) and the consumer (Node.js) keep working fine. The consumer is still registered.
What should be the expected behavior of consumers?

Related

kafka + what could be the reasons for kafka broker isn't the leader for topic partition

We have an HDP cluster (2.6.4) with Ambari 2.6.1.
We have 3 Kafka brokers (version 0.10.1) and 3 Zookeeper servers.
We saw many error messages in /var/log/kafka/server.log; in this example there are 6,601 error lines about:
This server is not the leader for that topic-partition
Example:
[2019-01-06 14:56:53,312] ERROR [ReplicaFetcherThread-0-1011], Error for partition [topic1-example,34] to broker 1011:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
We checked connectivity between the Kafka brokers, and it seems to be fine (we verified /var/log/messages and dmesg on the Linux Kafka machines).
We also suspect the connections between the Zookeeper client on the Kafka brokers and the Zookeeper servers,
but we do not know how to check that relationship between the Kafka brokers and the Zookeeper servers (one way to check basic reachability is sketched below).
We also know that Kafka sends heartbeats to the Zookeeper servers (I think the heartbeat interval is 2 seconds), but we are not sure whether this is the right direction to search for what causes the leader to disappear.
Any ideas what could cause "kafka broker isn't the leader for topic partition"?
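One low-level way to check whether each broker host can reach the Zookeeper servers at all is to send Zookeeper's built-in ruok four-letter command over TCP from the broker machine. The sketch below is only an assumption-laden illustration: it uses the default client port 2181 and hypothetical host names zk1/zk2/zk3, and it only proves network reachability and basic server health (a healthy server answers "imok"); on newer Zookeeper versions the four-letter commands must also be whitelisted via 4lw.commands.whitelist.

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class ZkReachabilityCheck {
        public static void main(String[] args) throws Exception {
            String[] zkHosts = {"zk1", "zk2", "zk3"}; // hypothetical; use your Zookeeper hosts
            for (String host : zkHosts) {
                try (Socket socket = new Socket()) {
                    socket.connect(new InetSocketAddress(host, 2181), 3000);
                    OutputStream out = socket.getOutputStream();
                    out.write("ruok".getBytes(StandardCharsets.US_ASCII)); // four-letter health check
                    out.flush();
                    socket.shutdownOutput();
                    InputStream in = socket.getInputStream();
                    byte[] buf = new byte[16];
                    int n = in.read(buf);
                    String reply = n > 0 ? new String(buf, 0, n, StandardCharsets.US_ASCII) : "<no reply>";
                    System.out.printf("%s:2181 -> %s%n", host, reply); // a healthy server replies "imok"
                } catch (Exception e) {
                    System.out.printf("%s:2181 -> unreachable (%s)%n", host, e.getMessage());
                }
            }
        }
    }

Session-level problems (for example broker sessions expiring because of long GC pauses) would not show up in such a check and are better diagnosed from the broker and Zookeeper logs.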
Other related links:
https://jar-download.com/artifacts/org.apache.kafka/kafka-clients/0.10.2.0/source-code/org/apache/kafka/common/protocol/Errors.java
kafka : one broker keeping print INFO log : "NOT_LEADER_FOR_PARTITION"
https://github.com/SOHU-Co/kafka-node/issues/297

Does kafka client connect to zookeeper or is it behind the scene

The Kafka client code refers directly to the broker IP and port. If that broker goes down, will Zookeeper direct the client to another broker? Is Zookeeper always working behind the scenes?
If you provide only one broker address in the client code and that broker goes down (and your client then restarts), your client will also be down. Zookeeper will not be used here because the broker will not be reachable.
If you give more than one broker address in the client, it's more resilient: the Kafka Controller process periodically fetches a list of all alive brokers in the cluster from Zookeeper and is responsible for sending that information back to the clients via the leaders of the partitions they get assigned. Zookeeper is indirectly used here, but does not communicate with any external clients.
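As a small client-side illustration of that metadata flow (host names box1/box2/box3 are placeholders): any one reachable address from the bootstrap list is enough, and the client then learns the full set of alive brokers from the cluster itself, not from Zookeeper.

    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.Node;

    public class BootstrapDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Several addresses are only used to bootstrap the first metadata request;
            // one reachable broker is enough to discover the rest of the cluster.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "box1:9092,box2:9092,box3:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                for (Node node : admin.describeCluster().nodes().get()) {
                    System.out.printf("alive broker %d at %s:%d%n", node.id(), node.host(), node.port());
                }
            }
        }
    }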
If I understood the question correctly, the answer is no.
Kafka clients need a connection only to the Kafka brokers; Zookeeper isn't involved at all. Clients need to write to and read from the leader partitions on the brokers.
If the Kafka brokers listed in the bootstrap list aren't available, the clients cannot connect and cannot start to send/receive messages.
Only in the old 0.8.0 versions was Zookeeper involved on the consumer side, where consumers saved their offsets in Zookeeper. Starting from 0.9.0, consumers save offsets in a Kafka topic, so Zookeeper isn't needed by clients anymore.
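To make that concrete, here is a minimal sketch of a modern (0.9.0+) consumer; note there is no zookeeper.connect property anywhere, and committed offsets end up in the internal __consumer_offsets topic on the brokers (broker address, group id, and topic name are placeholders):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class NoZookeeperConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // brokers only, no Zookeeper
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                consumer.commitSync(); // offsets go to the __consumer_offsets topic, not to Zookeeper
            }
        }
    }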

Kafka - consumers / producers work with all Zookeeper instances down

I've configured a cluster of Kafka brokers and a cluster of Zookeeper instances using the kafka_2.11-1.1.0 distribution archive.
For the Kafka brokers I've configured config/server.properties:
broker.id=1,2,3
zookeeper.connect=box1:2181,box2:2181,box3:2181
For Zk instances I've configured config/zookeeper.properties:
server.1=box1:2888:3888
server.2=box2:2888:3888
server.3=box3:2888:3888
I've created a basic producer and a basic consumer, and I don't know why I am able to write and read messages even after shutting down all the Zookeeper
instances while keeping all the Kafka brokers up and running.
Even booting up new consumers and producers works without any issue.
I thought having a quorum of Zk instances is a vital point for a Kafka cluster.
For both consumer and producer, I've used following configuration:
bootstrapServers=box1:9092,box2:9092,box3:9092
Thanks
I thought having a quorum of Zk instances is a vital point for a Kafka cluster.
A Zookeeper quorum is vital for managing partition lists, leader elections, etc. In general, ZK is necessary for the management work done by the cluster controller.
Basically, right now (with ZK down) you cannot modify topics (as the partition metadata is stored in ZK), start up or shut down brokers (as they use ZK for discovery), or perform other similar operations.
Even booting up new consumers, producers works without any issue.
Producer and consumer operations reach out to the brokers only. A broker can still append to its log and can still communicate with the other brokers for replication. So it is possible to send a message, have it received by the broker and saved to disk, and have the other brokers replicate it (they are continuously sending fetch requests to the leader, and they know who the partition's leader is because they saved that information while ZK was still running).
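As a hedged illustration of that point (host and topic names are placeholders): the produce path below only ever talks to the partition leader, and with acks=all the send() future completes once the in-sync replicas have the record; nothing in this path requires Zookeeper, which is why it keeps working while ZK is down as long as the brokers stay up.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProduceWhileZkDownDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "box1:9092,box2:9092,box3:9092");
            props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait until the in-sync replicas have the record
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                RecordMetadata meta = producer.send(new ProducerRecord<>("my-topic", "key", "value")).get();
                // Succeeds as long as the leader and its followers are up, even with Zookeeper down.
                System.out.printf("written to %s-%d at offset %d%n", meta.topic(), meta.partition(), meta.offset());
            }
        }
    }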

Purpose of Zookeeper in Kafka

As of the latest consumer versions of Kafka, the consumers aren't dependent on ZooKeeper. But https://kafka.apache.org/ says Kafka requires Zookeeper, so you must start a Zookeeper server. Why is that? Once a topic has been created, it keeps working even if I terminate Zookeeper. So is the purpose of Zookeeper only to create a topic? If so, why not make topic creation independent of Zookeeper as well?
Kafka topics (still) require Zookeeper for electing a leader, communicating broker failures, and storing the list of topics, plus some extra metadata such as replica locations and topic configurations.
Kafka Wiki - How does Kafka depend on Zookeeper
Confluent and the Kafka community are trying to move away from the Zookeeper dependency. For example, the Confluent Schema Registry can now use Kafka for leader election. Related blog from Confluent - https://www.confluent.io/blog/how-to-prepare-for-kip-500-kafka-zookeeper-removal-guide/
And in Confluent Cloud, Amazon MSK, and other hosted Kafka offerings, you generally have no access to Zookeeper at all.
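On the topic-creation point from the question: the client side of topic creation also talks only to a broker (via the AdminClient, available since Kafka 0.11); on Zookeeper-based clusters it is the broker/controller side that then persists the topic metadata in Zookeeper. A minimal sketch, with broker address, topic name, and partition/replication counts as placeholders:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // a broker, not Zookeeper
            try (AdminClient admin = AdminClient.create(props)) {
                // The broker handling this request stores the topic metadata in Zookeeper
                // (on Zookeeper-based clusters); the client itself never opens a ZK connection.
                NewTopic topic = new NewTopic("demo-topic", 3, (short) 1);
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }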
The consumers are not dependent on Zookeeper as they are client-side. Likewise with the producers.
Zookeeper is required for the Kafka brokers themselves. Kafka brokers use Zookeeper to co-ordinate and synchronise themselves.

What happens if Zookeeper fails completely?

We have set up a Kafka/Zookeeper cluster consisting of 3 brokers. We have one producer sending messages to one specific Kafka topic and a few consumer groups reading from said topic. Those consumers perform a leader election among themselves via Zookeeper (independent of Kafka).
The versions used are:
Kafka: 0.9.0.1
Zookeeper: 3.4.6 (included in the Kafka-Package)
All processes are managed by Supervisor. So far, everything works just fine. What we tried now (for testing purposes) was to simply kill off all Zookeeper processes and see what happens.
As we expected, our consumer processes couldn't connect to Zookeeper anymore. But unexpectedly, the Kafka Brokers still worked. Our producer didn't complain at all and was still able to write into the topic. While I couldn't use kafka/bin/kafka-topics.sh or similar, since they all require a zookeeper-parameter, I could still see the actual size of the topic-log grow. After restarting the zookeeper processes, everything again worked just like before.
What we couldn't figure out is what actually happened there.
We thought Kafka would require a working Zookeeper connection, and we couldn't find any explanation for this behaviour online.
When you have a single Zookeeper node and it goes down, the broker will not be able to contact Zookeeper; after the broker discovers that Zookeeper is not reachable, the broker will also become unreachable, and hence so will the producer and consumer.
In the case of the producer, it starts dropping (rejecting) records. In the case of the consumer, a record that was read but not yet acknowledged may end up being processed again when the broker is back up and ready.
In the case of a 3-node Zookeeper ensemble, one node failure is acceptable because the quorum is still satisfied, but it cannot afford two node failures, which would lead to the consequences above.
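To illustrate the reprocessing point: with the usual at-least-once pattern sketched below (broker address, group id, and topic name are placeholders), any record that was processed but whose offset was not yet committed before the failure is delivered and processed again once the broker and consumer are back.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class AtLeastOnceConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit only after processing
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("processing offset %d: %s%n", record.offset(), record.value());
                    }
                    // If the consumer or broker dies before this commit, the batch above
                    // is re-delivered and processed again: at-least-once semantics.
                    consumer.commitSync();
                }
            }
        }
    }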