Why do we need to mention Zookeeper details even though the Apache Kafka configuration file already has them?

I have been using Apache Kafka in a (plain vanilla) Hadoop cluster for the past few months, and I am asking this question out of curiosity, just to gain additional knowledge about it.
Kafka's server.properties file already has the following parameter:
zookeeper.connect=localhost:2181
And I am starting the Kafka server/broker with the following command:
bin/kafka-server-start.sh config/server.properties
So I assume that Kafka automatically picks up the Zookeeper details when we start the Kafka server itself. If that's the case, then why do we need to explicitly mention the Zookeeper properties while creating Kafka topics? The syntax is given below for reference:
bin/kafka-topics.sh --create --zookeeper localhost:2181
--replication-factor 1 --partitions 1 --topic test
As per the Kafka documentation, we need to start Zookeeper before starting the Kafka server, so I don't think Kafka can be started with the Zookeeper details commented out in server.properties.
But can we at least use Kafka to create topics and to start a Kafka producer/consumer without explicitly mentioning Zookeeper in their respective commands?

The zookeeper.connect parameter in the Kafka properties file is needed so that each Kafka broker in the cluster can connect to the Zookeeper ensemble.
Zookeeper keeps information about connected brokers and handles the controller election. Beyond that, it stores information about topics, quotas, and ACLs, for example.
When you use the kafka-topics.sh tool, the topic creation happens at the Zookeeper level first; from there the information is propagated to the Kafka brokers, and topic partitions are created and assigned to them (thanks to the elected controller). This connection to Zookeeper will not be needed in the future thanks to the new Admin Client API, which provides admin operations executed directly against the Kafka brokers. For example, there is an open JIRA (https://issues.apache.org/jira/browse/KAFKA-5561), which I am working on, to have the tool use this API for topic admin operations.
Regarding the producer and consumer: the producer doesn't need to connect to Zookeeper, while only the "old" consumer (before version 0.9.0) needs a Zookeeper connection, because it saves topic offsets there; from version 0.9.0 on, the "new" consumer saves topic offsets in an internal Kafka topic (__consumer_offsets). To use it, you have to use the bootstrap-server option on the command line instead of the zookeeper one.
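For example, a console consumer using the new API connects straight to a broker (the host and port below are the usual local defaults; adjust them to your setup):
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
And as of Kafka 2.2, kafka-topics.sh itself accepts --bootstrap-server, so topic creation no longer needs to go through Zookeeper either:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test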

Related

Can I have the Kafka Connect service as a standalone service

I am using Apache Kafka on a first server and Apache Zookeeper on a second server.
I want to run the Kafka Connect service on another server. Is it possible to run it as a standalone service? Do I need Apache Kafka Connect or Confluent Kafka Connect?
There is no such thing as "Confluent Kafka (Connect)"
Kafka Connect is Apache 2.0 Licensed and released as part of Apache Kafka. Confluent and other vendors write plugins (free, or enterprise-licensed) for Kafka Connect.
Yes, it is recommended to run Connect on a separate set of servers from either the brokers or the Zookeeper nodes. In order to run it, you will need to download all of Kafka and then use bin/connect-distributed, or you can run it via Docker containers.
You can easily run a Kafka Connect standalone (single-process) service from any server, provided you have configured both the connector and worker properties correctly.
In standalone mode all work is performed in a single process. It is easy to set up and get started with, but it does not benefit from some of the features of Kafka Connect, such as fault tolerance.
You can start a standalone process with the following command:
bin/connect-standalone.sh config/connect-standalone.properties connector1.properties connector2.properties
The first parameter is the configuration for the worker; it includes connection parameters, the serialization format, and how frequently to commit offsets.
All workers (both standalone and distributed) require a few configs, listed below (a minimal example file follows the list):
bootstrap.servers - List of Kafka servers used to bootstrap connections to Kafka
key.converter / value.converter - Converter classes used to convert between the Kafka Connect format and the serialized form that is written to Kafka.
offset.storage.file.filename - File to store offset data
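Putting those together, a minimal connect-standalone.properties sketch could look like this (the JsonConverter choice and the offsets path are illustrative assumptions, not requirements):
# Brokers to bootstrap from
bootstrap.servers=localhost:9092
# How records are (de)serialized on their way to/from Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Where the standalone worker persists source offsets
offset.storage.file.filename=/tmp/connect.offsets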
A simple standalone connector (importing data from a file into Kafka)
First, create a topic called connect-test with 10 partitions (up to us) and a replication factor of 2:
kafka-topics --create --zookeeper localhost:2181 --replication-factor 2 --partitions 10 --topic connect-test
To start a standalone Kafka connector, we need the following three configuration files, located under C:\kafka_2.11-1.1.0\config. Update these configuration files as follows:
connect-standalone.properties
offset.storage.file.filename=C:/kafka_2.11-1.1.0/connectConfig/connect.offsets
connect-file-source.properties
file=C:/kafka/Workspace/kafka.txt
connect-file-sink.properties
file=test.sink.txt
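For reference, the stock connect-file-source.properties that ships with Kafka looks roughly like the sketch below; the connector name and topic are the shipped defaults, and only the file path needs to match your setup:
# Unique name for this connector instance
name=local-file-source
# FileStreamSource reads lines from a file and publishes them to a topic
connector.class=FileStreamSource
tasks.max=1
file=C:/kafka/Workspace/kafka.txt
topic=connect-test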
Execute the following command:
connect-standalone.bat C:\kafka_2.11-1.1.0\config\connect-standalone.properties C:\kafka_2.11-1.1.0\config\connect-file-source.properties C:\kafka_2.11-1.1.0\config\connect-file-sink.properties
Once the connector is started, the data already in kafka.txt is synced to test.sink.txt and published to the Kafka topic named connect-test. From then on, any changes to the kafka.txt file are synced to test.sink.txt and published to the connect-test topic.
Create a Consumer
kafka-console-consumer.bat --bootstrap-server localhost:9096 --topic connect-test --from-beginning
CLI Output
kafka-console-consumer --bootstrap-server localhost:9096 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"This is the stream data for the KAFKA Connector"}
Add a new line, “Consuming the events now”, to kafka.txt
CLI Output
kafka-console-consumer --bootstrap-server localhost:9096 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"This is the stream data for the KAFKA Connector"}
{"schema":{"type":"string","optional":false},"payload":"Consuming the events now"}

Kafka topic creation with --bootstrap-server gives timeout exception (Kafka version 2.5)

When trying to create a topic using --bootstrap-server,
I am getting the exception "Error while executing Kafka topic command: Timed out waiting for a node":
kafka-topics --bootstrap-server localhost:9092 --topic boottopic --replication-factor 3 --partitions
However, the following works fine, using --zookeeper:
kafka-topics --zookeeper localhost:2181 --topic boottopic --replication-factor 3 --partitions
I am using Kafka version 2.5, and as per my knowledge, since version >2.2 all the offsets and metadata are stored on the broker itself, so while creating a topic there should be no need to connect to Zookeeper.
Please help me understand this behaviour.
Note - I have set up a Zookeeper quorum and a Kafka broker cluster, each containing 3 instances, on a single machine (for dev purposes)
Old question, but I'll answer anyway for the sake of internet wisdom.
You probably have auth set; when using --bootstrap-server you also need to specify your credentials with --command-config
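For example, something along these lines; the SASL settings below are an illustrative assumption, so use whatever mechanism your cluster actually runs:
# client.properties (hypothetical credentials)
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="alice" password="alice-secret";
Then pass it to the tool:
kafka-topics --bootstrap-server localhost:9092 --command-config client.properties --list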
since version >2.2, all the ... metadata are stored on the broker itself
False. Topic metadata is still stored on Zookeeper until KIP-500 is completed.
The AdminClient.createTopics() method that is used internally will, however, delegate to Zookeeper from the controller broker node in the cluster.
Hard to say what the error is, but the most common issues are that Kafka is not running, that you have SSL enabled and the certs are wrong, or that the listeners are misconfigured.
For example, in the listeners, the default broker port on a Cloudera Kafka installation would be 6667, not 9092.
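For reference, the relevant server.properties entries look like this (host and port values are illustrative):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://your.host.name:9092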
each containing 3 instances on a single machine
Running 3 instances on one machine does not improve resiliency or performance unless you have 3 CPUs and 3 separate HDDs on that one motherboard.
"Error while executing Kafka topic command: Timed out waiting for a
node"
This seems like your broker is down, or is inaccessible from where you are running those commands, or it hasn't started yet (perhaps it is still starting).
Broker startup sometimes takes long because it performs some cleaning operations. You may want to check your Kafka broker's startup logs, see whether it is ready, and then try creating the topics by passing in the bootstrap servers.
There could also be errors during your Kafka broker's startup, like "Too many open files", a wrong Zookeeper URL, or Zookeeper not being reachable by your broker, to name a few.
Being able to create topics by passing in your Zookeeper URL means that Zookeeper is up, but it does not necessarily mean that your Kafka broker(s) are also up and running, since Zookeeper can run without a broker but not vice versa.
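A quick way to check whether a broker is actually reachable on a given port is the API-versions tool (host and port assumed to match your listener):
kafka-broker-api-versions --bootstrap-server localhost:9092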

Kafka topics not created empty

I have a Kafka cluster consisting of 3 servers, all connected through Zookeeper. But when I delete a topic that has some data and create a topic with the same name again, the offset does not start from zero.
I tried restarting both Kafka and Zookeeper and deleting the topics directly from Zookeeper.
What I expect is to have a clean topic when I create it again.
I found the problem. A consumer was still consuming from the topic, so the topic was never actually deleted. I used this tool, which provides a GUI that allowed me to inspect the topics easily: https://github.com/tchiotludo/kafkahq. In any case, the consumers can be seen by running this:
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
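Once you know the group name, you can see which topics and partitions it is attached to, and its lag (the group name below is a hypothetical placeholder):
bin/kafka-consumer-groups.sh --describe --group my-group --bootstrap-server localhost:9092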

How to delete a Kafka topic from a cluster (version: 0.10.2.1)

I am not able to delete a Kafka topic; it is marked for deletion but never gets deleted. I am running a Kafka cluster with a Zookeeper cluster.
Kafka version: 0.10.2.1
Can anyone help me with the list of steps that one needs to follow in order to delete a topic in a Kafka cluster?
I went through various questions on Stack Overflow but could not find a valid, workable answer.
You need to enable the corresponding property in the config before starting the Kafka server; it is disabled by default. To enable topic deletion, first stop the Kafka server, then open server.properties in the config directory
and uncomment #delete.topic.enable=true or add
delete.topic.enable=true
at the end of the file.
Now you can start the Kafka server, and then you can delete any topic you want via:
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic YOUR_TOPIC_NAME
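Afterwards you can verify that the topic is gone by listing the topics (same Zookeeper address assumed):
bin/kafka-topics.sh --list --zookeeper localhost:2181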
You could also use Kafka Tool, a GUI client.
Connect it to your Kafka server; after that you can see the available topics on that server, and from there you can select and delete topics.

Kafka bootstrap-servers vs zookeeper in kafka-console-consumer

I'm trying to test-run Kafka with 3 brokers and Zookeeper on a single node. I wish to test using the console tools. I run the producer as such:
kafka-console-producer --broker-list localhost:9092,localhost:9093,localhost:9094 --topic testTopic
Then I run the consumer as such:
kafka-console-consumer --zookeeper localhost:2181 --topic testTopic --from-beginning
And I can enter messages in the producer and see them in the consumer, as expected. However, when I run the updated version of the consumer using bootstrap-server, I get nothing. E.g.:
kafka-console-consumer --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --topic testTopic --from-beginning
This worked fine when I had one broker running on port 9092, so I'm thoroughly confused. Is there a way I can see what Zookeeper is providing as the bootstrap server? Is the bootstrap server different from the broker list? Kafka was compiled with Scala 2.11.
I have no idea what was wrong; likely I put Kafka or Zookeeper into a weird state. After deleting the topics in the log.dir of each broker AND the Zookeeper topics under /brokers/topics, then recreating the topic, the Kafka consumer behaved as expected.
Bootstrap servers are the same as the Kafka brokers. And if you want to see the list of bootstrap servers Zookeeper is providing, you can query the znode information via any ZK client. All active brokers are registered under /brokers/ids/[brokerId]. All you need is the zkQuorum address. The command below will give you the list of active bootstrap servers:
./zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"
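To see the host and port a particular broker registered, read its znode; broker id 0 below is just an example:
./zookeeper-shell.sh localhost:2181 <<< "get /brokers/ids/0"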
I experienced the same problem when using mismatched versions of:
Kafka client libraries
Kafka scripts
Kafka brokers
In my exact scenario I was using Confluent Kafka client libraries version 0.10.2.1 with Confluent Platform 3.3.0 (Kafka broker 0.11.0.0). When I downgraded my Confluent Platform to 3.2.2, which matched my client libraries, the consumer worked as expected.
My theory is that the latest kafka-console-consumer, using the new consumer API, was only retrieving messages in the latest format. There were a number of message format changes introduced in Kafka 0.11.0.0.