Can I have Kafka Connect as a standalone service? - apache-kafka

I am using Apache Kafka on a 1st server and Apache ZooKeeper on a 2nd server. I want to have the Kafka Connect service on another server. Is it possible to run it as a standalone service? Do I need Apache Kafka Connect or Confluent Kafka Connect?

There is no such thing as "Confluent Kafka (Connect)"
Kafka Connect is Apache 2.0 Licensed and released as part of Apache Kafka. Confluent and other vendors write plugins (free, or enterprise-licensed) for Kafka Connect.
Yes, it is recommended to run Connect on a separate set of servers from both the brokers and the ZooKeeper nodes. To run it, you will need to download all of Kafka and then use bin/connect-distributed, or you can run it via Docker containers.
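For reference, a distributed Connect worker from a stock Kafka download is started roughly like this (the sample worker config ships in Kafka's config/ directory):
bin/connect-distributed.sh config/connect-distributed.properties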

You can easily run a Kafka Connect standalone (single-process) service from any server, provided you have configured both the connector and worker properties correctly.
A gist of it here, if you are interested.
In standalone mode all work is performed in a single process. It is easy to set up and get started with, but it does not benefit from some Kafka Connect features such as fault tolerance.
You can start a standalone process with the following command:
bin/connect-standalone.sh config/connect-standalone.properties connector1.properties connector2.properties
The first parameter is the configuration for the worker; it includes connection parameters, the serialization format, and how frequently to commit offsets.
All workers (both standalone and distributed) require a few configs (a minimal worker file sketch follows the list):
bootstrap.servers - List of Kafka servers used to bootstrap connections to Kafka
key.converter / value.converter - Converter classes used to convert between the Kafka Connect format and the serialized form that is written to Kafka.
offset.storage.file.filename - File to store offset data
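A minimal connect-standalone.properties worker file might look like this (the hostname, converters, and offsets path are illustrative placeholders):
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.file.filename=/tmp/connect.offsets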
A simple standalone connector (importing data from a file into Kafka)
kafka-topics --create --zookeeper localhost:2181 --replication-factor 2 --partitions 10 --topic connect-test creates a topic called connect-test with 10 partitions (up to us) and a replication factor of 2.
To start a standalone Kafka connector, we need the following three configuration files, located under C:\kafka_2.11-1.1.0\config. Update them as follows:
connect-standalone.properties
offset.storage.file.filename=C:/kafka_2.11-1.1.0/connectConfig/connect.offsets
connect-file-source.properties
file=C:/kafka/Workspace/kafka.txt
connect-file-sink.properties
file=test.sink.txt
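For context, the full connector files from the Kafka quickstart look roughly like this (the connector names are illustrative; the file paths should match the ones above):
connect-file-source.properties:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=C:/kafka/Workspace/kafka.txt
topic=connect-test
connect-file-sink.properties:
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test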
Execute the following command
connect-standalone.bat C:\kafka_2.11-1.1.0\config\connect-standalone.properties C:\kafka_2.11-1.1.0\config\connect-file-source.properties C:\kafka_2.11-1.1.0\config\connect-file-sink.properties
Once the connector is started, the existing data in kafka.txt is synced to test.sink.txt and published to the Kafka topic named connect-test. After that, any change to the kafka.txt file is synced to test.sink.txt and published to the connect-test topic.
Create a Consumer
kafka-console-consumer.bat --bootstrap-server localhost:9096 --topic connect-test --from-beginning
CLI Output
kafka-console-consumer --bootstrap-server localhost:9096 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"This is the stream data for the KAFKA Connector"}
Add a new line, "Consuming the events now", into kafka.txt.
CLI Output
kafka-console-consumer --bootstrap-server localhost:9096 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"This is the stream data for the KAFKA Connector"}
{"schema":{"type":"string","optional":false},"payload":"Consuming the events now"}

Related

Kafka Topic Creation with --bootstrap-server gives timeout Exception (kafka version 2.5)

When trying to create a topic using --bootstrap-server, I am getting the exception "Error while executing Kafka topic command: Timed out waiting for a node":
kafka-topics --bootstrap-server localhost:9092 --topic boottopic --replication-factor 3 --partitions
However, the following works fine, using --zookeeper:
kafka-topics --zookeeper localhost:2181 --topic boottopic --replication-factor 3 --partitions
I am using Kafka version 2.5, and to my knowledge, since version >2.2, all the offsets and metadata are stored on the broker itself. So, while creating a topic, there should be no need to connect to Zookeeper.
Please help me understand this behaviour.
Note - I have set up a Zookeeper quorum and a Kafka broker cluster, each containing 3 instances, on a single machine (for dev purposes).
Old question, but I'll answer anyway for the sake of internet wisdom.
You probably have auth set; when using --bootstrap-server you need to also specify your credentials with --command-config.
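A sketch of what that can look like, assuming SASL/PLAIN auth (the mechanism and credentials below are placeholders, not values from the question):
# client.properties
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="alice" password="alice-secret";
kafka-topics --bootstrap-server localhost:9092 --command-config client.properties --create --topic boottopic --replication-factor 3 --partitions 3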
since version >2.2, all the ... metadata are stored on the broker itself
False. Topic metadata is still stored on Zookeeper until KIP-500 is completed.
The AdminClient.createTopics() method that is used internally will, however, delegate to Zookeeper from the Controller broker node in the cluster.
Hard to say what the error is, but most common issue is that Kafka is not running, you have SSL enabled and the certs are wrong, or the listeners are misconfigured.
For example, in the listeners config, the default broker port on a Cloudera Kafka installation would be 6667, not 9092.
each containing 3 instances on a single machine
Running 3 instances on one machine does not improve resiliency or performance unless you have 3 CPUs and 3 separate HDDs on that one motherboard.
"Error while executing Kafka topic command: Timed out waiting for a
node"
This seems like your broker is down, is inaccessible from where you are running those commands, or hasn't started yet (perhaps it is still starting).
Sometimes broker startup takes long because it performs some cleaning operations. You may want to check your Kafka broker's startup logs, see whether it is ready, and then try creating the topics by passing in the bootstrap servers.
There could also be errors during your Kafka broker's startup, such as "Too many open files", a wrong Zookeeper URL, or Zookeeper not being accessible by your broker, to name a few.
If you are able to create topics by passing in your Zookeeper URL, it means that Zookeeper is up, but it does not necessarily mean that your Kafka broker(s) are also up and running, since Zookeeper can start without a broker but not vice versa.
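A quick way to verify both points (the host, port, and log path below are placeholders for your layout):
# is the broker port reachable from where you run the CLI?
nc -vz localhost 9092
# has the broker actually finished starting?
tail -n 100 /path/to/kafka/logs/server.log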

Why do we need to mention Zookeeper details even though Apache Kafka configuration file already has it?

I have been using Apache Kafka in a (plain vanilla) Hadoop cluster for the past few months, and out of curiosity I am asking this question, just to gain additional knowledge about it.
Kafka's server.properties file already has the below parameter:
zookeeper.connect=localhost:2181
And I am starting the Kafka server/broker with the following command:
bin/kafka-server-start.sh config/server.properties
So I assume that Kafka automatically picks up the Zookeeper details when we start the Kafka server itself. If that's the case, then why do we need to explicitly mention the Zookeeper properties while creating Kafka topics? The syntax is given below for your reference:
bin/kafka-topics.sh --create --zookeeper localhost:2181
--replication-factor 1 --partitions 1 --topic test
As per the Kafka documentation, we need to start Zookeeper before starting the Kafka server, so I don't think Kafka can be started with the Zookeeper details commented out in Kafka's server.properties file.
But at least, can we use Kafka to create topics and to start a Kafka producer/consumer without explicitly mentioning Zookeeper in their respective commands?
The zookeeper.connect parameter in the Kafka properties file is needed for having each Kafka broker in the cluster connecting to the Zookeeper ensemble.
Zookeeper will keep information about connected brokers and handle the controller election. Other than that, it keeps information about topics, quotas, and ACLs, for example.
When you use the kafka-topics.sh tool, the topic creation happens at the Zookeeper level first; thanks to that, the information is propagated to the Kafka brokers, and topic partitions are created and assigned to them (by the elected controller). This connection to Zookeeper will not be needed in the future thanks to the new Admin Client API, which provides some admin operations executed against the Kafka brokers directly. For example, there is an open JIRA (https://issues.apache.org/jira/browse/KAFKA-5561), and I'm working on it, for having the tool use such an API for topic admin operations.
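For what it's worth, once that work is in place topic creation can go straight to the brokers; in later Kafka releases (2.2+) the same tool accepts --bootstrap-server, along the lines of:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test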
Regarding producers and consumers: the producer doesn't need to connect to Zookeeper, while only the "old" consumer (before the 0.9.0 release) needs a Zookeeper connection, because it saves topic offsets there; from 0.9.0 onwards, the "new" consumer saves topic offsets in a real topic (__consumer_offsets). To use it, you have to use the bootstrap-server option on the command line instead of the zookeeper one.
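In practice that means both console clients talk only to the brokers; for example (host and topic here mirror the question's setup):
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning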

Consuming and Producing Kafka messages on different servers

How can I produce and consume messages from different servers?
I tried the Quickstart tutorial, but there are no instructions on how to set up multi-server clusters.
My Steps
Server A
1)bin/zookeeper-server-start.sh config/zookeeper.properties
2)bin/kafka-server-start.sh config/server.properties
3)bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
4)bin/kafka-console-producer.sh --broker-list SERVER-a.IP:9092 --topic test
Server B
1A)bin/kafka-console-consumer.sh --bootstrap-server SERVER-a.IP:9092 --topic test --from-beginning
1B)bin/kafka-console-consumer.sh --bootstrap-server SERVER-a.IP:2181 --topic test --from-beginning
When I run the 1A) consumer and enter messages into the producer, no messages appear in the consumer. It's just blank.
When I run the 1B) consumer instead, I get a huge and very fast stream of error logs on Server A until I Ctrl+C the consumer. See below.
Error log on Server A, streaming at hundreds per second:
WARN Exception causing close of session 0x0 due to java.io.EOFException (org.apache.zookeeper.server.NIOServerCnxn)
INFO Closed socket connection for client /188.166.178.40:51168 (no session established for client) (org.apache.zookeeper.server.NIOServerCnxn)
Thanks
Yes, if you want to have your producer on Server A and your consumer on server B, you are in the right direction.
You need to run a Broker on server A to make it work.
bin/kafka-server-start.sh config/server.properties
The other commands are correct.
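One broker setting worth double-checking here (an assumption on my part, since server.properties isn't shown): the broker on Server A must advertise an address that Server B can actually reach, e.g. in config/server.properties:
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://SERVER-a.IP:9092
Otherwise a remote consumer can bootstrap but then fails when it tries to fetch from an unreachable advertised address.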
If anyone is looking at a similar topic for a Kafka Streams application, it appears that multiple Kafka clusters are not supported yet:
Here is the documentation from Kafka: https://kafka.apache.org/10/documentation/streams/developer-guide/config-streams.html#bootstrap-servers
bootstrap.servers
(Required) The Kafka bootstrap servers. This is the same setting that is used by the underlying producer and consumer clients to connect to the Kafka cluster. Example: "kafka-broker1:9092,kafka-broker2:9092".
Tip:
Kafka Streams applications can only communicate with a single Kafka cluster specified by this config value. Future versions of Kafka Streams will support connecting to different Kafka clusters for reading input streams and writing output streams.
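For completeness, in a Streams application this is just one entry in its configuration; a minimal sketch of the relevant entries (application id and broker addresses are placeholders):
application.id=my-streams-app
bootstrap.servers=kafka-broker1:9092,kafka-broker2:9092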

Kafka bootstrap-servers vs zookeeper in kafka-console-consumer

I'm trying to test run a single Kafka node with 3 brokers & zookeeper. I wish to test using the console tools. I run the producer as such:
kafka-console-producer --broker-list localhost:9092,localhost:9093,localhost:9094 --topic testTopic
Then I run the consumer as such:
kafka-console-consumer --zookeeper localhost:2181 --topic testTopic --from-beginning
And I can enter messages in the producer and see them in the consumer, as expected. However, when I run the updated version of the consumer using bootstrap-server, I get nothing. E.g
kafka-console-consumer --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --topic testTopic --from-beginning
This worked fine when I had one broker running on port 9092, so I'm thoroughly confused. Is there a way I can see what Zookeeper is providing as the bootstrap server? Is the bootstrap server different from the broker list? Kafka was compiled with Scala 2.11.
I have no idea what was wrong; likely I had put Kafka or Zookeeper into a weird state. After deleting the topics in each broker's log.dir AND the Zookeeper topics under /brokers/topics, then recreating the topic, the Kafka consumer behaved as expected.
Bootstrap servers are the same as the Kafka brokers. And if you want to see the list of bootstrap servers Zookeeper is providing, you can query the ZNode information via any ZK client. All active brokers are registered under /brokers/ids/[brokerId]. All you need is the zkQuorum address. The command below will give you the list of active bootstrap servers:
./zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"
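To see the host and port a particular broker registered, you can then fetch its znode (broker id 0 here is just an example):
./zookeeper-shell.sh localhost:2181 <<< "get /brokers/ids/0"
The returned JSON includes an endpoints field listing that broker's advertised listeners.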
I experienced the same problem when using mismatched versions of:
Kafka client libraries
Kafka scripts
Kafka brokers
In my exact scenario I was using Confluent Kafka client libraries version 0.10.2.1 with Confluent Platform 3.3.0 (Kafka broker 0.11.0.0). When I downgraded my Confluent Platform to 3.2.2, which matched my client libraries, the consumer worked as expected.
My theory is that the latest kafka-console-consumer, using the new Consumer API, was only retrieving messages written in the latest format. There were a number of message format changes introduced in Kafka 0.11.0.0.

Kafka: org.apache.zookeeper.KeeperException$NoNodeException while creating topic on multi server setup

I am trying to setup multi node Kafka-0.8.2.2 cluster with 1 Producer, 1 consumer and 3 brokers all on different machines.
While creating topic on producer, I am getting error as org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids. Complete console output is available here. There is no error in Kafka Producer's log.
Command I am using to run Kafka is:
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic edwintest
Note: The Zookeeper service is running on all the servers, and all three brokers have Kafka servers running on them (only brokers need the Kafka server, right?).
Configuration of my producer.properties is as:
metadata.broker.list=<IP.OF.BROKER.1>:9092,<IP.OF.BROKER.2>:9092,<IP.OF.BROKER.3>:9092
producer.type=sync
compression.codec=none
serializer.class=kafka.serializer.DefaultEncoder
Below are some of the many articles I was using as reference:
Zookeeper & Kafka Install : A single node and a multiple broker cluster - 2016
Step by Step of Installing Apache Kafka and Communicating with Spark
At first glance it seems like you're calling create topic against a local Zookeeper which is not aware of any of your Kafka brokers. You should call ./bin/kafka-topics.sh --create --zookeeper <IP.OF.BROKER.1>:2181
The issue was that I was trying to connect to the Zookeeper on localhost. My understanding was that Zookeeper needed to be running on the producer, the consumer, and the Kafka brokers, and that the communication between producer -> broker and broker -> consumer was done via Zookeeper. But that was incorrect. Actually:
Zookeeper and Kafka servers should be running only on the broker servers. While creating the topic or publishing content to the topic, the public DNS of any of the Kafka brokers should be passed with the --zookeeper option. There is no need to run a Kafka server on the producer or consumer instance.
Correct command will be:
./bin/kafka-topics.sh --create --zookeeper <Public-DNS>:<PORT> --replication-factor 1 --partitions 3 --topic edwintest
where Public-DNS is the DNS of any of the Kafka brokers and PORT is the port of the Zookeeper service.