Purpose of Zookeeper in Kafka - apache-kafka

In recent versions of Kafka, the consumers no longer depend on ZooKeeper. But https://kafka.apache.org/ says that Kafka requires ZooKeeper and that a ZooKeeper server must be started first. Why is that? Once a topic has been created, everything keeps working even if I terminate ZooKeeper. So is ZooKeeper only needed for creating a topic? If so, why not make topic creation independent of ZooKeeper as well?

Kafka topics (still) require Zookeeper for electing a leader, communicating server failure, and storing the list of topics, plus some extra metadata such as replica location and topic configurations.
Kafka Wiki - How does Kafka depend on Zookeeper
Confluent and the Kafka community are trying to move away from the Zookeeper dependency. For example, the Confluent Schema Registry can now use Kafka for leader election. Related blog from Confluent - https://www.confluent.io/blog/how-to-prepare-for-kip-500-kafka-zookeeper-removal-guide/
And in Confluent Cloud, Amazon MSK, and other hosted Kafka offerings, you generally have no access to Zookeeper at all.

The consumers are not dependent on Zookeeper as they are client-side. Likewise with the producers.
Zookeeper is required for the Kafka brokers themselves. Kafka brokers use Zookeeper to co-ordinate and synchronise themselves.

Related

Does a kafka consumer machine need to run zookeeper?

So my question is this: if I have a server running Kafka (and ZooKeeper), and another machine that only consumes messages, does the consumer machine need to run ZooKeeper too? Or does the server take care of everything?
No.
The role of ZooKeeper in Kafka is:
Broker registration: cluster membership, with a heartbeat mechanism to keep the list current
Storing topic configuration: which topics exist, how many partitions each has, where the replicas are, who the preferred leader is, and the in-sync replica (ISR) list for each partition
Electing the controller: the controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions
So ZooKeeper is required only for the Kafka brokers. There is no need to have ZooKeeper on the producer or consumer side.
The consumer machine does not need to run ZooKeeper.
You have not mentioned which version of Kafka or of the clients you're using.
Kafka consumers using 0.8 store their offsets in ZooKeeper, so they need access to it. Even so, you would not need to run ZooKeeper on the consumer machine; the consumers only connect to it.
From 0.9 onward, clients no longer need it (unless you choose to manage your own connections to ZooKeeper for storing data).
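To make the version difference concrete, here is a minimal sketch of the two styles of consumer configuration, written as plain Python dictionaries. The keys are the Kafka client property names; the host names, the group name, and the needs_zookeeper helper are made up for illustration and are not part of any Kafka API.

```python
# Old (0.8-era) high-level consumer: configured with a ZooKeeper address.
# Host and group names below are placeholders.
old_consumer_config = {
    "zookeeper.connect": "zk-host:2181",
    "group.id": "example-group",
}

# New consumer (0.9 and later): configured with broker addresses only.
new_consumer_config = {
    "bootstrap.servers": "kafka-host:9092",
    "group.id": "example-group",
    "key.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
    "value.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
}


def needs_zookeeper(config):
    """Hypothetical helper: True if a client config refers to ZooKeeper directly."""
    return any(key.startswith("zookeeper.") for key in config)


print(needs_zookeeper(old_consumer_config))  # True
print(needs_zookeeper(new_consumer_config))  # False
```

In neither case does ZooKeeper have to run on the consumer machine; the old consumer merely needed a network path to it.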

kafka machines in the cluster and kafka communications

We have a Kafka cluster with 3 Kafka broker nodes and 3 ZooKeeper servers.
kafka version - 10.1 ( hortonworks )
From my understanding, all metadata is stored on the ZooKeeper servers, and the Kafka brokers use this data (Kafka talks to the ZooKeeper servers via port 2181).
I am just wondering whether each Kafka machine talks to the other Kafka machines in the cluster, or whether the brokers only get and put data on the ZooKeeper servers.
So does the Kafka service need to communicate with the other Kafka brokers in the cluster?
Or do the Kafka machines get everything they need from the ZooKeeper servers alone?
Kafka brokers certainly need to communicate with each other, most importantly to replicate data. Data produced to Kafka is replicated across brokers for fault tolerance and durability. Partition followers send FetchRequests to partition leaders in order to replicate the data.
Additionally, the Controller broker sends a LeaderAndIsr request to brokers whenever a partition leader/follower is changed - that's how it informs brokers to start leading a partition or replicating it.
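The follower-pulls-from-leader idea can be sketched with a toy model. This is only an illustration of the fetch pattern described above, not Kafka's actual implementation; the Replica class and its method names are hypothetical.

```python
class Replica:
    """Toy replica of a single partition, holding an in-memory log."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []

    def handle_fetch(self, fetch_offset):
        """Leader side: return every record at or after fetch_offset."""
        return self.log[fetch_offset:]

    def replicate_from(self, leader):
        """Follower side: fetch from our own log end offset and append the result."""
        records = leader.handle_fetch(len(self.log))
        self.log.extend(records)
        return len(records)


leader = Replica(broker_id=1)
follower = Replica(broker_id=2)

leader.log.extend(["m0", "m1", "m2"])   # records produced to the leader
follower.replicate_from(leader)          # follower catches up

leader.log.append("m3")                  # one more record arrives
copied = follower.replicate_from(leader) # only the new record is fetched

print(copied)                      # 1
print(follower.log == leader.log)  # True
```

Note that nothing in this exchange involves ZooKeeper: it is direct broker-to-broker traffic.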
I would recommend these two introductory articles of mine in order to help you get more context:
https://hackernoon.com/thorough-introduction-to-apache-kafka-6fbf2989bbc1
https://hackernoon.com/apache-kafkas-distributed-system-firefighter-the-controller-broker-1afca1eae302

What is the actual role of ZooKeeper in Kafka 2.1?

I have seen some similar questions as follows:
www.quora.com/What-is-the-actual-role-of-Zookeeper-in-Kafka-What-benefits-will-I-miss-out-on-if-I-don%E2%80%99t-use-Zookeeper-and-Kafka-together
Is Zookeeper a must for Kafka?
But I want to know the latest information about this question.
What is the actual role of ZooKeeper in Kafka 2.1?
Zookeeper is required to run a Kafka Cluster.
It is used by Kafka brokers to perform elections (controller and topic leaders), to store topic metadata and various other things (ACLs, dynamic broker configs, quotas, Producer Ids)
Since Kafka 0.9, clients don't require access to Zookeeper, only brokers rely on it.

Kafka - consumers / producers work with all ZooKeeper instances down

I've configured a cluster of Kafka brokers and a cluster of Zk instances using kafka_2.11-1.1.0 distribution archive.
For the Kafka brokers I've configured config/server.properties (with broker.id set to 1, 2, and 3 on the three brokers respectively):
broker.id=1
zookeeper.connect=box1:2181,box2:2181,box3:2181
For Zk instances I've configured config/zookeeper.properties:
server.1=box1:2888:3888
server.2=box2:2888:3888
server.3=box3:2888:3888
I've created a basic producer and a basic consumer, and I don't know why I am able to write and read messages even if I shut down all the ZooKeeper instances and leave only the Kafka brokers up and running.
Even booting up new consumers, producers works without any issue.
I thought having a quorum of Zk instances is a vital point for a Kafka cluster.
For both consumer and producer, I've used following configuration:
bootstrapServers=box1:9092,box2:9092,box3:9092
"I thought having a quorum of Zk instances is a vital point for a Kafka cluster."
A ZooKeeper quorum is vital for managing partition lists, leaders, and so on. In general, ZK is necessary for the management work done by the controller broker in the cluster.
Basically, right now (with ZK down), you cannot modify topics (as the partition metadata is stored in ZK), start up or shut down brokers (as they use ZK for discovery), or perform other similar operations.
"Even booting up new consumers, producers works without any issue."
Producer and consumer operations reach out to the brokers only. A broker can still append to its log and can still communicate with the other brokers for replication. So a message can be sent, received by a broker, saved to disk, and replicated by the other brokers: the followers continuously send fetch requests to the leader, and they know who each partition's leader is because they cached that data while ZK was still running.
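The split between what keeps working and what fails can be pictured with a small toy model. It is a sketch under simplifying assumptions (a single broker, an in-memory log, a flag standing in for ZooKeeper availability); ToyBroker and its methods are hypothetical and do not mirror Kafka's real code.

```python
class ToyBroker:
    """Toy stand-in for a broker; not Kafka's real implementation."""

    def __init__(self):
        self.zk_available = True
        self.known_topics = set()   # metadata cached while ZooKeeper was up
        self.log = {}               # topic -> list of appended messages

    def create_topic(self, topic):
        # Topic metadata is stored in ZooKeeper, so this step needs it.
        if not self.zk_available:
            raise RuntimeError("ZooKeeper unavailable: cannot create topic")
        self.known_topics.add(topic)
        self.log[topic] = []

    def produce(self, topic, message):
        # Appending relies only on metadata the broker already holds.
        if topic not in self.known_topics:
            raise KeyError(f"unknown topic: {topic}")
        self.log[topic].append(message)
        return len(self.log[topic]) - 1  # offset of the appended message


broker = ToyBroker()
broker.create_topic("orders")
broker.zk_available = False   # simulate every ZooKeeper instance going down

# Producing to an existing topic still succeeds from cached metadata.
offset = broker.produce("orders", "still works")

# Changing metadata does not.
try:
    broker.create_topic("payments")
    create_failed = False
except RuntimeError:
    create_failed = True

print(offset, create_failed)
```

This matches the observation in the question: reads and writes on existing topics continue, while anything that changes cluster metadata is blocked until ZooKeeper returns.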

Kafka consumer api (no zookeeper configuration)

I am using the Kafka client library that comes with Kafka 0.11.0.1. I noticed that KafkaConsumer no longer needs any ZooKeeper configuration. Does that mean the ZooKeeper server will automatically be located through the Kafka bootstrap server?
No, the consumer does not locate or contact ZooKeeper at all. Since Kafka 0.9, the KafkaConsumer implementation stores committed offsets and consumer group information in the Kafka brokers themselves. This eliminates the consumer's ZooKeeper dependency and improves its scalability.