Kafka and ZooKeeper dependencies - apache-kafka

My company is about to introduce Kafka, but I have not been able to comprehend why neither the ZooKeeper configuration nor the Kafka configuration seems to require specifying the other's existence.
For example, I could find neither a Kafka IP in the ZooKeeper config nor a ZooKeeper IP in the Kafka config.
Can someone explain?

For the Kafka server you should have a server.properties file. It contains the property zookeeper.connect.
Official documentation: https://kafka.apache.org/documentation/#brokerconfigs
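As a sketch, a broker's server.properties typically looks something like this (the hostnames below are placeholders, not values from the question):

```properties
# Unique id of this broker within the cluster
broker.id=0
# ZooKeeper ensemble this broker registers with
zookeeper.connect=zk-1:2181,zk-2:2181,zk-3:2181
```

So the dependency is one-directional: each broker is told where ZooKeeper is, while ZooKeeper itself needs no knowledge of the brokers up front.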

Related

How to retrieve ZooKeeper host details from a Kafka broker list

I have a list of brokers for my Kafka cluster. How can I get the ZooKeeper host using the broker list?
If I understand your question right, you want to register your brokers with a ZooKeeper cluster. This actually works the other way round: you have to tell each broker where your ZooKeeper server (or cluster) can be found. Have a look at the broker configuration setting zookeeper.connect. Together with broker.id, it registers each broker with the ZooKeeper cluster.
Example:
broker.id=1
zookeeper.connect=zk-host-1:2181,zk-host-2:2181,zk-host-3:2181
Hope that answers your question.
You cannot.
ZooKeeper is intended to be abstracted away. There is no API or method to get the ZooKeepers connected to a broker.
You'll need to SSH to a broker in that list (which you could do from Java).
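Since zookeeper.connect lives in each broker's server.properties, one pragmatic (hypothetical) approach is to read it off the broker's filesystem. The sketch below greps a local copy of the file; over SSH you would wrap the same grep in `ssh user@broker '…'` (the path, user, and hostnames here are assumptions):

```shell
# Write a sample server.properties to grep against; this stands in for
# the real file, typically config/server.properties on the broker host.
cat > /tmp/server.properties <<'EOF'
broker.id=1
zookeeper.connect=zk-host-1:2181,zk-host-2:2181,zk-host-3:2181
EOF

# Extract the ZooKeeper connection string from the broker config
grep '^zookeeper.connect=' /tmp/server.properties | cut -d= -f2
```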

What is the actual role of ZooKeeper in Kafka 2.1?

I have seen some similar questions as follows:
www.quora.com/What-is-the-actual-role-of-Zookeeper-in-Kafka-What-benefits-will-I-miss-out-on-if-I-don%E2%80%99t-use-Zookeeper-and-Kafka-together
Is Zookeeper a must for Kafka?
But I want to know the latest information about this question.
What is the actual role of ZooKeeper in Kafka 2.1?
ZooKeeper is required to run a Kafka cluster.
It is used by Kafka brokers to perform elections (controller and partition leaders), to store topic metadata, and for various other things (ACLs, dynamic broker configs, quotas, producer IDs).
Since Kafka 0.9, clients don't require access to ZooKeeper; only brokers rely on it.
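You can see this division of labour by browsing the znodes Kafka maintains, e.g. with the zookeeper-shell.sh tool that ships with Kafka (the localhost address is an assumption for a local setup):

```shell
# List registered broker ids (each live broker holds an ephemeral znode here)
bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
# Topic metadata lives under /brokers/topics
bin/zookeeper-shell.sh localhost:2181 ls /brokers/topics
```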

Can you use Consul instead of Zookeeper for Kafka

I am looking at using Kafka but the documentation states that I need to set up Zookeeper. I already have a service discovery set up, I am using Consul. I don't want to have to look after Zookeeper as well.
Can you use Consul instead of Zookeeper to run Kafka? If so is there any documentation on how to do this anywhere?
Apache Kafka currently supports only Apache ZooKeeper. It doesn't have any out-of-the-box support for Consul.
This long-standing Kafka issue tracks adding Zookeeper alternatives. It may never happen, so in the meantime, there's the option of running a Zookeeper proxy like zetcd or parkeeper in front of etcd or Consul.
Kafka only uses ZooKeeper for broker and topic discovery. Short of adding Consul support to the code base yourself, there are no other alternatives.

Why do we need to mention Zookeeper details even though Apache Kafka configuration file already has it?

I have been using Apache Kafka in a (plain vanilla) Hadoop cluster for the past few months, and out of curiosity I am asking this question, just to gain additional knowledge about it.
Kafka server.properties file already has the below parameter :
zookeeper.connect=localhost:2181
And I am starting Kafka Server/Broker with the following command :
bin/kafka-server-start.sh config/server.properties
So I assume that Kafka automatically infers the ZooKeeper details when we start the Kafka server itself. If that's the case, then why do we need to explicitly mention the ZooKeeper properties while creating Kafka topics, the syntax for which is given below for your reference:
bin/kafka-topics.sh --create --zookeeper localhost:2181
--replication-factor 1 --partitions 1 --topic test
As per the Kafka documentation, we need to start ZooKeeper before starting the Kafka server, so I don't think Kafka can be started by commenting out the ZooKeeper details in Kafka's server.properties file.
But at least, can we use Kafka to create topics and to start a Kafka producer/consumer without explicitly mentioning ZooKeeper in their respective commands?
The zookeeper.connect parameter in the Kafka properties file is needed for having each Kafka broker in the cluster connect to the ZooKeeper ensemble.
ZooKeeper keeps information about connected brokers and handles the controller election. Other than that, it keeps information about topics, quotas and ACLs, for example.
When you use the kafka-topics.sh tool, the topic creation happens at the ZooKeeper level first; from there the information is propagated to the Kafka brokers, and topic partitions are created and assigned to them (thanks to the elected controller). This connection to ZooKeeper will not be needed in the future, thanks to the new Admin Client API, which provides some admin operations executed against Kafka brokers directly. For example, there is an open JIRA (https://issues.apache.org/jira/browse/KAFKA-5561), and I'm working on it, for having the tool use this API for topic admin operations.
Regarding producer and consumer: the producer doesn't need to connect to ZooKeeper, and only the "old" consumer (before the 0.9.0 version) needs a ZooKeeper connection, because it saves topic offsets there; from version 0.9.0, the "new" consumer saves topic offsets in a real topic (__consumer_offsets). To use it, you have to use the bootstrap-server option on the command line instead of the zookeeper one.
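For example, the console consumer that ships with Kafka can be started either way; the bootstrap-server form is the one that selects the new consumer (the localhost addresses are assumptions for a single-node local setup):

```shell
# Old consumer: connects to ZooKeeper to find brokers and store offsets
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test
# New consumer (0.9.0+): talks to brokers directly,
# offsets are stored in the __consumer_offsets topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
```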

Kafka bootstrap.servers config value as subset of brokers

For the Kafka bootstrap.servers config, should one provide all the broker details?
If not, how will the Kafka producer/consumer resolve all available brokers?
If yes, does the producer/consumer need a restart after adding a node?
The bootstrap.servers config is mandatory for the producer and consumer, and it must contain at least one broker host:port, in order to allow the producer/consumer to connect to the cluster and then get all the other metadata (where topics are, who the leaders are, and so on).
If you add a node, you don't need to restart the producer/consumer, but of course the current configuration will be stale. I mean ...
If you have a cluster with broker1 and broker2 and you put them into the bootstrap.servers config, then the producer/consumer connects to the cluster.
Let's imagine that you then add broker3, and some new topics/partitions are deployed there. Your producer/consumer will be able to write/read to/from these new topics/partitions, because information about them (i.e. that they are on broker3) will be available through metadata requests to broker1 or broker2.
Of course, if you shut down broker1 and broker2 so that your cluster is just made of broker3, you'll have to restart the producer/consumer with broker3 in the bootstrap.servers config.
The bootstrap.servers config is just (as the name implies) a configuration used at startup; each client will then be able to connect directly to a broker (to read/write topic partitions there) even if that broker isn't in the bootstrap.servers config.
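A minimal client configuration reflecting this might look as follows (the hostnames are placeholders): listing two or three brokers rather than one simply gives the client a fallback if the first entry happens to be down at startup.

```properties
# Any subset of the cluster works; more entries = more startup redundancy
bootstrap.servers=broker1:9092,broker2:9092
```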