Differences between zookeeper-server-start.sh and kafka-server-start.sh - apache-kafka

Is one of them more recommended/preferred to use than the other?

Kafka uses Zookeeper, so you must start a Zookeeper server before starting the Kafka broker. Zookeeper and the Kafka broker are two distinct things, and both are required to run a Kafka cluster. Kafka is a distributed system built to use Zookeeper, which is responsible for controller election, topic configuration, clustering, etc.
In order to run Zookeeper you need to set the parameters in the configuration file config/zookeeper.properties and then start the ZK server using
bin/zookeeper-server-start.sh config/zookeeper.properties
Then you need to run at least one Kafka broker which can be configured in config/server.properties file and then start it using
bin/kafka-server-start.sh config/server.properties
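Because the broker needs Zookeeper to be reachable first, a startup script can wait for the Zookeeper port before launching the broker. The following is a minimal sketch under stated assumptions, not part of Kafka itself: wait_for_port is a hypothetical helper name, and an open port only shows that something is listening, not that Zookeeper is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves that something is listening;
            # it is not a health check of the Zookeeper server.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A script could call wait_for_port("localhost", 2181) after running zookeeper-server-start.sh and only run kafka-server-start.sh when it returns True (2181 being the default Zookeeper client port).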

zookeeper-server-start.sh starts your Zookeeper server, which by default listens on port 2181.
To use Kafka brokers, topics and partitions, you need to have your Zookeeper server running; Zookeeper works as a manager for the Kafka brokers.
kafka-server-start.sh starts your Kafka broker.
zookeeper-server-start.sh takes a zookeeper.properties file for its configuration.
kafka-server-start.sh takes a Kafka server.properties file for its configuration.

Related

How to connect kafka producer and consumer to a machine that is running kafka and zookeeper

I have an Ubuntu machine with Kafka and Zookeeper installed on it. I am using Spring Boot to build the consumer and producer. Locally the process works; however, when I deploy the producer and consumer jars to another machine, it doesn't work.
Kafka defaults to only listen locally.
You need to set these in Kafka's server.properties
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://<external-ip>:9092
https://www.confluent.io/blog/kafka-listeners-explained/
Then, obviously, don't use localhost:9092 in your remote client code.
You should never need Zookeeper connection details. Besides, as of Kafka 3.3.1, Zookeeper isn't required at all.
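To make the listener advice above concrete, here is a small sketch that reads server.properties text and flags advertised listeners that point at a loopback address, which remote clients cannot reach. The function names parse_properties and loopback_advertised are hypothetical, and the parser only handles simple key=value lines, not the full Java properties syntax.

```python
from urllib.parse import urlsplit

# Hosts that only resolve to the broker machine itself.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def loopback_advertised(props: dict) -> list:
    """Return advertised.listeners entries whose host is a loopback name."""
    entries = props.get("advertised.listeners", "").split(",")
    flagged = []
    for entry in (e.strip() for e in entries):
        if not entry:
            continue
        # Entries look like PLAINTEXT://host:9092, so URL parsing applies.
        host = urlsplit(entry).hostname or ""
        if host in LOOPBACK_HOSTS:
            flagged.append(entry)
    return flagged
```

If loopback_advertised returns a non-empty list for the broker's configuration, clients on other machines will be told to connect back to their own localhost, which matches the symptom described in the question.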

How does Zookeeper talk to Kafka to know that Kafka is up?

We have 3 Kafka machines and 3 Zookeeper servers.
The Kafka machines are not co-hosted with the Zookeeper servers (Kafka runs on different machines; the OS is Red Hat 7.x).
To get the broker ids, we do the following on the Zookeeper servers:
cd /usr/hdp/current/zookeeper-server/bin
./zkCli.sh
ls /brokers/ids
The results should be the three broker ids:
1011 1012 1013
My question is: how does Zookeeper know that a broker is up?
Or, to be more specific:
which CLI command does Zookeeper execute in order to identify that a Kafka broker is up?
Zookeeper is basically a distributed key-value store. Upon startup, a Kafka broker connects to Zookeeper (using the zookeeper.connect setting) and creates a znode (a key-value pair) named after its own broker.id under /brokers/ids. Kafka brokers then stay connected to Zookeeper while they are running.
The znode is created as "Ephemeral" (this is a feature of Zookeeper). It means that Zookeeper will delete it if the broker disconnects.
This way, Zookeeper knows at any time which brokers are alive (it does not necessarily mean the broker is healthy!). This is used by brokers to discover the other brokers in a cluster.
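The ephemeral-node behaviour described above can be illustrated with a toy in-memory model. This is only a sketch: ToyZookeeper and its methods are invented for illustration and are not the real Zookeeper client API, and in real Zookeeper the node disappears when the session ends or times out rather than the instant a connection drops.

```python
class ToyZookeeper:
    """In-memory toy model of ephemeral znodes; NOT the real Zookeeper API."""

    def __init__(self):
        self._nodes = {}        # path -> (owning session id, data)
        self._next_session = 1

    def connect(self) -> int:
        """Open a session and return its id."""
        session_id = self._next_session
        self._next_session += 1
        return session_id

    def create_ephemeral(self, session_id: int, path: str, data: str = "") -> None:
        """Create a node that lives only as long as its owning session."""
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = (session_id, data)

    def close(self, session_id: int) -> None:
        """End a session; every ephemeral node it owns is removed."""
        self._nodes = {path: node for path, node in self._nodes.items()
                       if node[0] != session_id}

    def children(self, parent: str) -> list:
        """List direct child names under parent, similar to `ls` in zkCli.sh."""
        prefix = parent.rstrip("/") + "/"
        names = [path[len(prefix):] for path in self._nodes
                 if path.startswith(prefix)]
        return sorted(name for name in names if "/" not in name)
```

In this model, three brokers that each open a session and create /brokers/ids/1011, /brokers/ids/1012 and /brokers/ids/1013 all show up in children("/brokers/ids"); closing one session removes that broker's entry without any command being run against the broker.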

Kafka Offsets.storage

What is the "offsets.storage" for kafka 0.10.1.1?
As per the documentation it shows up under Old Consumer Configs as "zookeeper".
offsets.storage (default: zookeeper): Select where offsets should be stored (zookeeper or kafka).
My consumer is spring-boot-1.5.13 RELEASE app which uses kafka-clients-0.10.1.1 internally. As per the source code ConsumerConfig.scala, offsetStorage is "zookeeper", but when I run the consumer, I see the "__consumer_offsets" are getting created under /tmp/kafka-logs directory which is defined in server.properties [i.e. broker];
Moreover it doesn't show up under zookeeper ephemeral nodes, when I check with zookeeper-shell.sh.
ls /consumers
[]
If offsets.storage is zookeeper, then why does __consumer_offsets show up under /tmp/kafka-logs and not in the Zookeeper ephemeral nodes?
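Spring Kafka uses the "new" (Java) consumer, not the old Scala consumer. The offsets.storage setting is listed under the old consumer configs, so it does not apply here; the new consumer stores its offsets in the __consumer_offsets topic on the brokers.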

Why do we need to mention Zookeeper details even though Apache Kafka configuration file already has it?

I am using Apache Kafka in (Plain Vanilla) Hadoop Cluster for the past few months and out of curiosity I am asking this question. Just to gain additional knowledge about it.
Kafka server.properties file already has the below parameter :
zookeeper.connect=localhost:2181
And I am starting Kafka Server/Broker with the following command :
bin/kafka-server-start.sh config/server.properties
So I assume that Kafka automatically infers the Zookeeper details by the time we start the Kafka server itself. If that's the case, then why do we need to explicitly mention the zookeeper properties while we create Kafka topics the syntax for which is given below for your reference :
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
As per the Kafka documentation, we need to start Zookeeper before starting the Kafka server. So I don't think Kafka can be started with the Zookeeper details commented out in Kafka's server.properties file.
But at least, can we use Kafka to create topics and to start a Kafka producer/consumer without explicitly mentioning Zookeeper in their respective commands?
The zookeeper.connect parameter in the Kafka properties file is needed so that each Kafka broker in the cluster can connect to the Zookeeper ensemble.
Zookeeper keeps information about connected brokers and handles the controller election. Other than that, it keeps information about topics, quotas and ACLs, for example.
When you use the kafka-topics.sh tool, the topic creation happens at the Zookeeper level first; from there the information is propagated to the Kafka brokers, and the topic partitions are created and assigned to them by the elected controller. This connection to Zookeeper will not be needed in the future thanks to the new Admin Client API, which provides some admin operations executed against Kafka brokers directly. For example, there is an open JIRA (https://issues.apache.org/jira/browse/KAFKA-5561), and I'm working on it so that the tool uses this API for topic admin operations.
Regarding producer and consumer: the producer doesn't need to connect to Zookeeper, and only the "old" consumer (before version 0.9.0) needs a Zookeeper connection, because it saves topic offsets there. From version 0.9.0, the "new" consumer saves topic offsets in a real topic (__consumer_offsets). To use it, you have to use the bootstrap-server option on the command line instead of the zookeeper one.
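The difference between the two connection styles can be sketched as a helper that builds the kafka-topics.sh argument list. This is an illustration only: topics_create_command is a hypothetical name, and whether kafka-topics.sh accepts --bootstrap-server depends on the Kafka version in use, so check the tool's help output for your release.

```python
def topics_create_command(topic, partitions=1, replication_factor=1,
                          bootstrap_server=None, zookeeper=None):
    """Build the argument list for creating a topic with kafka-topics.sh."""
    # Exactly one connection target must be given: brokers or Zookeeper.
    if (bootstrap_server is None) == (zookeeper is None):
        raise ValueError("pass exactly one of bootstrap_server or zookeeper")
    if bootstrap_server is not None:
        # Assumption: the installed tool version supports --bootstrap-server.
        connect = ["--bootstrap-server", bootstrap_server]
    else:
        connect = ["--zookeeper", zookeeper]
    return ["bin/kafka-topics.sh", "--create", *connect,
            "--replication-factor", str(replication_factor),
            "--partitions", str(partitions),
            "--topic", topic]
```

Calling topics_create_command("test", zookeeper="localhost:2181") reproduces the command from the question, while passing bootstrap_server="localhost:9092" instead yields the broker-based variant; the resulting list can be handed to subprocess.run.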

Kafka Server daemon to be launched on every broker?

I have 3 node Zookeeper cluster and a 10 node Kafka cluster. After launching Zookeeper daemon on 3 nodes, I then proceed to launch Kafka daemon by using the command
./kafka-server-start.sh "config/server.properties"
And my server.properties is correctly configured containing the proper Zookeeper connection string eg:
zookeeper.connect=192.168.140.23:2181,192.168.140.24:2181,192.168.140.25:2181
My question is: do I need to start the Kafka daemon on all 10 broker nodes using ./kafka-server-start.sh "config/server.properties", or will starting it on just one of them suffice?
For reference:
producers.properties
metadata.broker.list=192.168.140.23:9092,192.168.140.24:9092,192.168.140.25:9092,192.168.140.26:9092,192.168.140.27:9092,192.168.140.11:9092,192.168.140.12:9092,192.168.140.13:9092,192.168.140.14:9092
consumer.properties
zookeeper.connect=192.168.140.23:2181,192.168.140.24:2181,192.168.140.25:2181
You have to start the Kafka server on each of those 10 nodes by issuing "./kafka-server-start.sh ..." on every one of them. An automation tool might be a good way to do this.
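As one possible way to automate this, the sketch below builds an ssh command per broker host that starts the Kafka daemon remotely. It rests on several assumptions: broker_start_commands is a hypothetical helper, /opt/kafka is a made-up install path, passwordless ssh to each host is already set up, and the -daemon flag of kafka-server-start.sh is used to run the broker in the background.

```python
import shlex


def broker_start_commands(hosts, kafka_home="/opt/kafka", user=None):
    """Return one ssh argument list per host that starts the Kafka broker."""
    commands = []
    for host in hosts:
        target = f"{user}@{host}" if user else host
        # kafka_home is a hypothetical install path; adjust to your layout.
        remote = (f"cd {shlex.quote(kafka_home)} && "
                  "bin/kafka-server-start.sh -daemon config/server.properties")
        commands.append(["ssh", target, remote])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True). The function only builds the commands, so they can be printed and reviewed before anything is executed on the brokers.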