kafka-python basic producing and consuming not working - apache-kafka

I am new to Kafka and I'm trying to run a basic example.
My kafka is running with this config: https://developer.confluent.io/quickstart/kafka-docker/
Python 3.7; the Kafka client was installed with pip install kafka-python (version 2.0.2).
I followed this doc, then ran two consoles (one for the consumer and one for the producer).
consumer:
from kafka import KafkaConsumer
for m in KafkaConsumer('my-topic', bootstrap_servers='broker'):
    print(m)
producer:
from kafka import KafkaProducer
p = KafkaProducer(bootstrap_servers='broker')
p.send('my-topic', b'my message!')
After p.send() I expect the consumer to get the message, but nothing happens.
What is wrong with my setup?
Edit: the consoles are run as containers within the same docker-compose.

broker only resolves inside the docker-compose network; if you are running the scripts on the host, you should use localhost:9092.
And if you are running the scripts as containers in the same docker-compose, you should use broker:29092, since that is where Kafka is listening for connections from within the docker-compose network.
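A minimal sketch of the corrected clients, assuming the quickstart compose file linked in the question (the hostnames and ports below come from that setup):
from kafka import KafkaConsumer, KafkaProducer
# Inside a container on the same docker-compose network, use the internal listener:
consumer = KafkaConsumer('my-topic', bootstrap_servers='broker:29092')
# From the host machine instead, use the listener published on localhost:
producer = KafkaProducer(bootstrap_servers='localhost:9092')
producer.send('my-topic', b'my message!')
producer.flush()  # send() is asynchronous; flush() forces the message out before the script exits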

Related

How to connect kafka producer and consumer to a machine that is running kafka and zookeeper

I have an Ubuntu machine with Kafka and ZooKeeper installed on it. I am using Spring Boot to build the consumer and producer. Locally the process works; however, when I deploy the producer and consumer JARs to another machine, it doesn't work.
By default, Kafka only listens locally.
You need to set these in Kafka's server.properties
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://<external-ip>:9092
https://www.confluent.io/blog/kafka-listeners-explained/
Then, obviously, don't use localhost:9092 in your remote client code.
You should never need Zookeeper connection details. Besides, as of Kafka 3.3.1, Zookeeper isn't required at all.
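If the client is a Spring Boot application, as in this question, the remote address typically goes into the standard bootstrap-servers property; a minimal sketch using the same <external-ip> placeholder as above:
# application.properties on the remote client machine
spring.kafka.bootstrap-servers=<external-ip>:9092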

Kafka Producer From Remote Server

I'm developing a streaming API with Apache Kafka (version 2.1.0). I have a Kafka cluster and an external server.
The external server will be producing data to be consumed on the Kafka cluster.
Let's denote the external server as E and the cluster as C. E doesn't have Kafka installed; I run a JAR file on it to produce messages. Here is the snippet for the producer properties:
properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "bootstrapIp:9092");
properties.put(ProducerConfig.CLIENT_ID_CONFIG, "producer");
I set bootstrapIp to the Kafka broker IP.
In the cluster side, I start the consumer console using this command:
kafka-console-consumer --bootstrap-server bootstrapIp:9092 --topic T1 --from-beginning
I set bootstrapIp to the cluster bootstrap server IP.
When I run both the producer and the consumer on the cluster, it works fine, but when I run the producer on the external server (E) and the consumer on the cluster (C), the data is not consumed.
On localhost everything works, and when I run both the producer and the consumer on the cluster (C) everything also works; only when I run the producer externally can I not consume the data in the cluster.
Ping from the cluster (C) to the external server (E) works, but I can't see where exactly the problem is.
I am not able to figure out how to consume messages from an external server.
EDIT
From the external server (E) I can telnet to the bootstrap IP:
telnet bootstrapIp 9092
and it works; I don't understand the problem.
This works for me:
In server.properties, uncomment
listeners=PLAINTEXT://:9092
And
advertised.listeners=PLAINTEXT://<HOST IP>:9092
Replace <HOST IP> with the actual IP. In my case:
advertised.listeners=PLAINTEXT://192.168.75.132:9092
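As a quick sanity check from the external server (assuming the Kafka CLI tools are available there; --broker-list is the flag for this Kafka version), a console producer pointed at the advertised listener should now reach the broker:
kafka-console-producer --broker-list <HOST IP>:9092 --topic T1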

Kafka connect cluster setup or launching connect workers

I am going through Kafka Connect and trying to understand the concepts.
Let's say I have a Kafka cluster (nodes k1, k2 and k3) set up and running, and now I want to run Kafka Connect workers on different nodes, say c1 and c2, in distributed mode.
A few questions.
1) To launch Kafka Connect in distributed mode I need to use the command ../bin/connect-distributed.sh, which is available on the Kafka cluster nodes. So do I need to launch Kafka Connect from one of the Kafka cluster nodes, or does any node from which I launch Kafka Connect just need the Kafka binaries so that I can use ../bin/connect-distributed.sh?
2) Do I need to copy my connector plugins to the Kafka cluster node (or to all cluster nodes?) from where I do step 1?
3) How does Kafka copy these connector plugins to the worker node before starting the JVM process on the worker node? The plugin is what contains my task code, and it needs to be copied to the worker in order to start the process there.
4) Do I need to install anything on the Connect cluster nodes c1 and c2, such as Java or anything Kafka Connect related?
5) In some places it says to use Confluent Platform, but I would like to start with plain Apache Kafka Connect first.
Can someone please shed some light on this? Even a pointer to some resources would help.
Thank you.
1) In order to have a highly available kafka-connect service you need to run at least two instances of connect-distributed.sh on two distinct machines that have the same group.id (a minimal worker properties sketch follows this answer). You can find more details regarding the configuration of each worker here. For improved performance, Connect should be run independently of the broker and Zookeeper machines.
2) Yes, you need to place all your connectors under plugin.path (normally under /usr/share/java/) on every machine that you are planning to run kafka-connect.
3) kafka-connect will load the connectors on startup. You don't need to handle this. Note that if your kafka-connect instance is running and a new connector is added, you need to restart the service.
4) You need to have Java installed on all your machines. For Confluent Platform particularly:
Java 1.7 and 1.8 are supported in this version of Confluent Platform (Java 1.9 is currently not supported). You should run with the Garbage-First (G1) garbage collector. For more information, see the Supported Versions and Interoperability.
5) It depends. Confluent was founded by the original creators of Apache Kafka and it comes as a more complete distribution adding schema management, connectors and clients. It also comes with KSQL which is quite useful if you need to act on certain events. Confluent simply adds on top of the Apache Kafka distribution, it's not a modified version.
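As referenced in point 1, a minimal connect-distributed.properties sketch; the values are placeholders (the broker names reuse k1/k2/k3 from the question), not required settings:
bootstrap.servers=k1:9092,k2:9092,k3:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
plugin.path=/usr/share/java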
The answer given by Giorgos is correct. I ran a few connectors and now I understand it better.
I am just trying to put it differently.
In Kafka Connect there are two things involved: one is the worker and the second is the connector. Below are details about running distributed Kafka Connect.
The Kafka Connect worker is a Java process in which the connector/connect tasks run. So the first thing is to launch the worker. To run a worker we need Java installed on that machine, plus the Kafka Connect sh/bat launch scripts and the Kafka libraries the worker uses; for this we simply copy/install Kafka on the worker machine.
We also need to copy all the connector and connect-task related jars/dependencies into the "plugin.path" defined in the worker properties file. Now the worker machine is ready: to start the worker we invoke ./bin/connect-distributed.sh ./config/connect-distributed.properties, where connect-distributed.properties holds the worker configuration.
The same steps have to be repeated on each machine where we want to run Kafka Connect.
Now the worker Java process is running on all machines. The worker config has a group.id property; the workers that share the same value for this property form a group/cluster of workers.
Each worker process exposes a REST endpoint (default http://localhost:8083/connectors). To launch/start a connector on the running workers, we HTTP POST a connector config JSON; based on the given config, the workers start the connector and its tasks across the group/cluster of workers.
Example connector POST:
curl -X POST -H "Content-Type: application/json" --data '{"name": "local-file-sink", "config": {"connector.class":"FileStreamSinkConnector", "tasks.max":"3", "file":"test.sink.txt", "topics":"connect-test" }}' http://localhost:8083/connectors
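Once posted, the same REST interface can be used to verify that the connector and its tasks are actually running, for example:
curl http://localhost:8083/connectors/local-file-sink/status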

Setup kafka-connect to fetch data from remote brokers

I'm trying to set up a Kafka Connect sink connector. Kafka Connect ships as part of the Kafka Connect worker package (confluent-3.2.0). I have a Kafka broker (confluent-3.2.0) up and running on machine A. I want to set up a Kafka Connect sink connector on another machine B to consume messages, using a custom Kafka Connect sink connector jar. Assume that the Kafka broker and ZooKeeper ports on machine A are open to machine B.
So should I install/set up confluent-3.2.0 on machine B (since Kafka Connect is part of the Kafka package), set the classpath to the Kafka Connect sink connector jar, and run the following command?
./bin/connect-distributed.sh worker.properties
Yes. What you describe will work and is the easiest way to set up this system, even though on machine B you really only need the start script, the configuration properties file, the jars for Kafka Connect, and the jars for the custom connector.
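A rough sketch of that minimal setup on machine B, assuming the custom connector jar is made visible via the CLASSPATH environment variable (the usual approach for Connect versions from the confluent-3.2.0 era, before plugin.path) and that worker.properties points bootstrap.servers at machine A; the path and hostname are placeholders:
export CLASSPATH=/path/to/custom-sink-connector.jar
# worker.properties should contain bootstrap.servers=<machine-A-host>:9092 plus group.id, converters and the storage topics
./bin/connect-distributed.sh worker.properties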

Configure zookeeper for kafka

I want to install Kafka on my CentOS 6.5 machine. From the Kafka installation tutorial, I learned that it needs ZooKeeper to run. I have already installed HBase, which also uses the ZooKeeper service internally, and the ZooKeeper service only starts when I start the HBase service.
So in order to install Kafka, do I need to install ZooKeeper separately? Please suggest.
Kafka is designed to use ZooKeeper by default. If you have already installed ZooKeeper on your system, you can create a bash script to start ZooKeeper whenever you start Kafka, as sketched below. In your ZooKeeper installation directory there should be a zkServer.sh script (zkServer.sh start starts ZooKeeper), and in the Kafka installation directory a kafka-server-start.sh script (to start Kafka).
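A minimal sketch of such a script, assuming hypothetical installation directories (adjust the paths to your setup):
#!/bin/bash
# Hypothetical install locations; change to where ZooKeeper and Kafka actually live.
ZK_HOME=/opt/zookeeper
KAFKA_HOME=/opt/kafka
"$ZK_HOME/bin/zkServer.sh" start
sleep 5   # give ZooKeeper a moment to come up before the broker connects to it
"$KAFKA_HOME/bin/kafka-server-start.sh" -daemon "$KAFKA_HOME/config/server.properties"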
Kafka's architecture works best on a distributed platform; if you are only experimenting with a pseudo-cluster, you can look at alternative message brokers like HiveMQ or RabbitMQ.
You can look further discussions at: Kafka: Is Zookeeper a must?
Installing a separate ZooKeeper cluster is the best practice. You can use it for both HBase and Kafka (just define a different root dir in ZK for each).
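As an illustration of the "different root dir" idea, Kafka can be pointed at its own chroot in the shared ensemble (hostnames below are placeholders), while HBase keeps its own parent znode:
# Kafka server.properties
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka
# HBase (hbase-site.xml) stays under its own parent znode, e.g. zookeeper.znode.parent=/hbase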