How to connect kafka producer and consumer to a machine that is running kafka and zookeeper - apache-kafka

I have an Ubuntu machine that has Kafka and ZooKeeper installed on it. I am using Spring Boot to build the consumer and producer. Locally, the process works; however, when I deploy the producer and consumer JARs to another machine, it doesn't work.

By default, Kafka only listens locally.
You need to set these in Kafka's server.properties:
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://<external-ip>:9092
https://www.confluent.io/blog/kafka-listeners-explained/
Then, obviously, don't use localhost:9092 in your remote client code.
You should never need ZooKeeper connection details in your clients. Besides, as of Kafka 3.3.1, ZooKeeper isn't required at all.
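On the client side, the only required change is the bootstrap address. A minimal sketch using only java.util.Properties (the host 203.0.113.10 is a placeholder for the broker's external IP; the commented-out KafkaProducer call assumes the kafka-clients dependency):

```java
import java.util.Properties;

public class RemoteProducerConfig {
    // Builds producer properties pointing at a remote broker.
    // The string keys match the ProducerConfig constants from kafka-clients,
    // written out literally here so this sketch needs only the JDK.
    public static Properties build(String brokerHost) {
        Properties props = new Properties();
        // Must match the broker's advertised.listeners address, NOT localhost
        props.put("bootstrap.servers", brokerHost + ":9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
        // then: KafkaProducer<String, String> p = new KafkaProducer<>(props);
    }
}
```

In a Spring Boot application the same value would go into `spring.kafka.bootstrap-servers` instead.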

Related

Kafka broker setup

To connect to a Kafka cluster, I've been provided with a set of bootstrap servers, each with a host name and port:
s1:9092
s2:9092
s3:9092
Kafka and Zookeeper are running on the instance s4. From reading https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-properties-bootstrap-servers.html, it states:
bootstrap server is a comma-separated list of host and port pairs that
are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster
that a Kafka client connects to initially to bootstrap itself.
I reference the above bootstrap server definition because I'm trying to understand the relationship between the Kafka brokers s1, s2, s3 and the Kafka/ZooKeeper instance running on s4.
To connect to the Kafka cluster, I set the broker list to a CSV list of 's1,s2,s3'. After sending messages to that list of brokers, to verify the messages are added to the topic, I ssh onto the s4 box and view the messages on the topic.
What is the link between the Kafka brokers s1, s2, s3 and s4? I cannot ssh onto any of the brokers s1, s2, s3, as they do not seem accessible via SSH. Should s1, s2, s3 be accessible?
The individual responsible for the setup of the Kafka box is no longer available, and I'm confused as to how this configuration works. I've searched for config references of the brokers s1,s2,s3 on s4 but there does not appear to be any configuration.
When Kafka is being set up and configured what allows the linking between the brokers (in this case s1,s2,s3) and s4?
I start Kafka and Zookeeper on the same server, s4.
Should Kafka and Zookeeper also be running on s1,s2,s3?
What is the link between the Kafka brokers s1,s2,s3 and s4?
As per the Kafka documentation about adding nodes to a cluster, each server must share the same zookeeper.connect string and have a unique broker.id to be part of the cluster.
You may check which nodes are in the cluster via zookeeper-shell with ls /brokers/ids, via the Kafka AdminClient API, or with kafkacat -L.
should s1,s2,s3 be accessible?
Via SSH? They don't have to be.
They should, however, respond to TCP connections from your Kafka client machines on their Kafka listener ports.
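You can verify that reachability from a client machine without any SSH access. A small sketch using only the JDK (the hosts and port are the ones from the question; timeouts are arbitrary):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BrokerReachability {
    // Returns true if host:port accepts a TCP connection within timeoutMs.
    // An unresolvable host or refused/timed-out connection returns false.
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String host : new String[] {"s1", "s2", "s3"}) {
            System.out.println(host + ":9092 reachable? "
                    + canConnect(host, 9092, 2000));
        }
    }
}
```

This only proves the TCP port is open; it does not validate the broker's advertised listener configuration.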
Should Kafka and Zookeeper also be running on s1,s2,s3?
You should not have 4 ZooKeeper servers in a cluster (odd numbers only).
Otherwise, you've at least been given Kafka ports for those machines, so Kafka should be running on them.

Kafka Producer From Remote Server

I'm developing a streaming API with Apache Kafka (version 2.1.0). I have a Kafka cluster and an external server.
The external server will be producing data to be consumed on the Kafka cluster.
Let's denote the external server as E and the cluster as C. E doesn't have Kafka installed; I run a JAR file on it to produce messages. Here is the snippet for the producer properties:
properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "bootstrapIp:9092");
properties.put(ProducerConfig.CLIENT_ID_CONFIG, "producer");
I set bootstrapIp to the Kafka broker IP.
In the cluster side, I start the consumer console using this command:
kafka-console-consumer --bootstrap-server bootstrapIp:9092 --topic T1 --from-beginning
I set bootstrapIp to the cluster bootstrap server IP.
When I run both the producer and the consumer on the cluster, it works fine, but when I run the producer on the external server (E) and the consumer on the cluster (C), the data is not consumed.
Everything works on localhost, and everything works when both the producer and the consumer run on the cluster (C); only when the producer runs externally can I not consume the data on the cluster.
Pinging the external server (E) from the cluster (C) works, but I can't see where exactly the problem is.
I cannot figure out how to consume messages produced from an external server.
EDIT
From the external server (E), I telnet to the bootstrapIp:
telnet bootstrapIp 9092 and it works, so I don't understand the problem.
This works for me:
In server.properties, uncomment
listeners=PLAINTEXT://:9092
And
advertised.listeners=PLAINTEXT://<HOST IP>:9092
Replace <HOST IP> with the actual IP. In my case:
advertised.listeners=PLAINTEXT://192.168.75.132:9092
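This also explains why the telnet test in the question's EDIT succeeds while producing still fails: telnet only proves the TCP port is open. After connecting, the client fetches cluster metadata and then reconnects to whatever address advertised.listeners returns; if that is still localhost, the remote producer stalls. A sketch of the two relevant server.properties lines together (192.0.2.10 is a placeholder for the broker's external IP):

```properties
# server.properties (sketch; replace 192.0.2.10 with the broker's real external IP)
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://192.0.2.10:9092
```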

Kafka and Kafka Connect deployment environment

If I already have Kafka running on premises, is Kafka Connect just a configuration on top of my existing Kafka, or does Kafka Connect require its own server/environment separate from that of my existing Kafka?
Kafka Connect is part of Apache Kafka, but it runs as a separate process, called a Kafka Connect Worker. Except in a sandbox environment, you would usually deploy it on a separate machine/node from your Kafka brokers.
Conceptually, it runs separately from your brokers.
You can run Kafka Connect on a single node, or as part of a cluster (for throughput and redundancy).
You can read more here about installation and configuration and architecture of Kafka Connect.
Kafka Connect is its own configuration on top of your bootstrap-server's configuration.
For Kafka Connect you can choose between a standalone server or distributed connect servers and you'll have to update the corresponding properties file to point to your currently running Kafka server(s).
Look under {kafka-root}/config and you'll see connect-standalone.properties and connect-distributed.properties.
You'll basically update connect-standalone.properties or connect-distributed.properties based on your need.
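For the standalone case, the key lines to point at your existing Kafka look roughly like this (a sketch; localhost:9092 assumes a broker on the same host, and the converter choice is an example):

```properties
# connect-standalone.properties (sketch)
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Standalone mode stores source offsets in a local file
offset.storage.file.filename=/tmp/connect.offsets
```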

Differences between zookeeper-server-start.sh and kafka-server-start.sh

Is one of them more recommended/preferred to use than the other?
Kafka uses ZooKeeper, so you must start a ZooKeeper server before starting the Kafka broker. ZooKeeper and the Kafka broker are two distinct things, and both are required to run a Kafka cluster. Kafka is a distributed system built to use ZooKeeper, which is responsible for controller election, topic configuration, clustering, etc.
In order to run Zookeeper you need to set the parameters in the configuration file config/zookeeper.properties and then start the ZK server using
bin/zookeeper-server-start.sh config/zookeeper.properties
Then you need to run at least one Kafka broker which can be configured in config/server.properties file and then start it using
bin/kafka-server-start.sh config/server.properties
zookeeper-server-start.sh starts your ZooKeeper server, which by default runs on port 2181.
To use Kafka brokers, topics, and partitions, you need your ZooKeeper server running; ZooKeeper acts as the manager of the Kafka brokers.
kafka-server-start.sh starts your Kafka broker.
zookeeper-server-start.sh takes a zookeeper.properties file for its configuration; kafka-server-start.sh takes a server.properties file for its configuration.
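Both files ship with Kafka under config/. The key lines look roughly like this (a sketch of the shipped defaults; paths and values vary by version and deployment):

```properties
# config/zookeeper.properties
dataDir=/tmp/zookeeper
clientPort=2181

# config/server.properties
broker.id=0
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
```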

Setup kafka-connect to fetch data from remote brokers

I'm trying to set up a Kafka Connect sink connector. Kafka Connect is part of the Kafka Connect worker (Confluent 3.2.0). I have a Kafka broker (Confluent 3.2.0) up and running on machine A. I want to set up a Kafka Connect sink connector on another machine B to consume messages, using a custom sink connector JAR. Assume that the Kafka broker and ZooKeeper ports on machine A are open to machine B.
So should I install/set up Confluent 3.2.0 on machine B (since Kafka Connect is part of the Kafka package), set the classpath to the sink connector JAR, and run the following command?
./bin/connect-distributed.sh worker.properties
Yes. What you describe will work and is the easiest way to set up this system, even though on machine B you really only need the start script, the configuration properties file, the JARs for Kafka Connect, and the JAR for the custom connector.
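A sketch of what the worker.properties on machine B might contain (machine-a:9092 and the topic names are placeholders; also note that on Connect versions of this vintage the custom connector JAR is typically added via the CLASSPATH environment variable, since the plugin.path setting arrived in later releases):

```properties
# worker.properties on machine B (sketch; distributed mode)
bootstrap.servers=machine-a:9092
group.id=sink-connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Topics the distributed worker uses to store its own state
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
```

Then export CLASSPATH to include the custom sink connector JAR before running connect-distributed.sh, and submit the connector configuration via the Connect REST API.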