We are trying to launch multiple standalone kafka hdfs connectors on a given node.
For each connector, we are setting the rest.port and offset.storage.file.filename to different ports and path respectively.
Also kafka broker JMX port is # 9999.
When I start the kafka standalone connector, I get the error
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 9999; nested exception is:
java.net.BindException: Address already in use (Bind failed)
Though the rest.port is set to 9100
kafka version: 2.12-0.10.2.1
kafka-connect-hdfs version: 3.2.1
Please help.
We are trying to launch multiple standalone kafka hdfs connectors on a given node.
Have you considered running these multiple connectors within a single instance of Kafka Connect? This might make things easier.
Kafka Connect itself can handle running multiple connectors within a single worker process. Kafka Connect in distributed mode can run on a single node, or across multiple ones.
For those who trying to use rest.port flag and still getting Address already in use error. That flag has been marked as deprecated in KIP-208 and finally removed in PR.
From that point listeners can be used to change default REST port.
Examples from Javadoc
listeners=HTTP://myhost:8083
listeners=HTTP://:8083
Configuring and Running Workers - Standalone mode
You may have open Kafka Connect connections that you don't know about. You can check this with:
ps -ef | grep connect
If you find any, kill those processes.
Related
I am deploying a confluent single-node so I am only manipulating connect-standalone.properties and server.properties.
I am trying to connect a remote producer to my local set-up so I have the following overrides in server.properties
listeners=PLAINTEXT://10.20.23.105:9092,EXTERNAL://10.20.23.105:29092
advertised.listeners=PLAINTEXT://10.20.23.105:9092,EXTERNAL://localhost:29092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,EXTERNAL:PLAINTEXT
After checking using Offset Explorer, I can see that Kafka is still working and I am successfully getting the remote stream. However, Connect fails upon trying to start the service.
[2023-02-13 10:26:02,992] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectDistributed:85)
org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:79)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:60)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:96)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:79)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:73)
What are the possible fixes for this problem?
I have checked out this question Kafka-connect, Bootstrap broker disconnected, but since I am still using PLAINTEXT for my external listener, there shouldn't need to be any changes to the workers right?
Kafka Connect isn't the problem. Start debugging with kafka-console-producer, for example.
listeners should not be hard-coded to any one IP.
Use bind addresses to allow connection from all interfaces.
listeners=PLAINTEXT://0.0.0.0:9092,EXTERNAL://0.0.0.0:29092
For advertised listeners, "external" addresses should use a LAN IP. Not clear what your "localhost" listener is needed for here, since any connection that IP from that same machine would route back to itself, by default. More importantly, you don't need two ports opened for the same protocol-connection.
You've not shown your connect worker properties, but if it is running on an external machine, make sure there is no firewall interfering with the connection, and that you are using the correct IP/hostname and ports.
Steps
I have used two kafka brokers and I have started zookeeper,kafka server and kafka connect services.
I have one source type kafka connector which can be used for getting data from Database.
If i start the connector[connector 1] by using the rest API, then it will hit any one kafka server [Server 1] using load balancer.After that server 1 will store and running the connector.But server 2 does not know the connector [connector 1] which is running in the server 1.
Expectation
So if the kafka server 1 is down, then the another kafka server 2 should be able to run the connector in the failed kafka server 1.
While starting the connector, kafka server should know how many connectors are in running, so that if any one broker failed to do the job then another server will be able to continue the job.
Reality
Another Kafka server 2 which is not doing the job as per the requirement.
is there any thing to make it by configuration setup with kafka?.
Kindly suggest me some ideas.
Kafka Server 1
Kafka Server 2
It appears that you have started all processes in single pods.
You should run Kafka, Zookeeper, and Connect all as separate services in different pods.
I suggest you refer the Confluent or Strimzi sites to find Kafka Kubernetes Helm Charts / Operators
But to answer the question - You could give one or more broker to connect-distributed.properties bootstrap.server value. Then each broker is connected to as part of the Kafka cluster, and will reconnect in the event that one broker is unavailable
"Kakfa servers" (brokers) do not run Connectors
If you want to run a cluster of connect workers, you also need to setup their rest.advertised.listener address so that they can communicate with each other.
Able to connect successfully to local kafka broker/cluster running locally (dockerized) using Conduktor, but when trying to connect to Kafka cluster running on Unix VM, getting below error.
Error:
"The broker [...] is reachable but Kafka can't connect. Ensure you have access to the advertised listeners of the the brokers and the proper authorization"
Appreciate any assistance.
running locally (dockerized)
When running in docker, you need to ensure that the ports are accessible from outside of your container. To verify this, try doing a telnet <ip> <port> and check if you are able to connect.
Since the error message says, the broker is reachable, I suppose you would be able to successfully telnet to the broker.
Next, check your broker config called advertised.listeners. Here you need to mention your IP:Port combination where IP is what you will be giving in your client program i.e. Conduktor.
An example for that would be
advertised.listeners=PLAINTEXT://1.2.3.4:9092
and then restart your broker and reconnect. If you are using ssl then you need to provide some extra configuration. See Configuring Kafka brokers for more.
Try to add in /etc/hosts (Unix-like) or C:\Windows\System32\drivers\etc\hosts (windows-like) the Kafka server in such manner kafka_server_ip kafka_server_name_in_dns (e.g. 10.10.0.1 kafka).
I am currently working with a simple project to query the messages from Apache Kafka topic using Apache Drill. And now I am encountering an error when running the Apache Drill cluster when running this command.
sqlline.bat -u "jdbc:drill:zk=localhost:2181"
And the error that I encountered is:
No active Drillbit endpoint found from ZooKeeper. Check connection parameters
I am using the single cluster instance of ZooKeeper that came from Apache Kafka.
Can anyone help me with this problem? Is it ok to use the Zookeeper from Apache Kafka installation with Drill?
sqlline.bat -u "jdbc:drill:zk=localhost:2181" command only connects to running DrillBit. If you have Drill running in distributed mode, please replace localhost with the correct IP address of the node, where Zookeeper is running and update port if needed.
If you want to start Drill in embedded mode, you may try running drill-embedded.bat or sqlline.bat -u "jdbc:drill:zk=local" command.
For more details please refer to https://drill.apache.org/docs/starting-drill-on-windows/.
I am going through kafka connect, and i am trying to get the concepts.
Let us say I have kafka cluster (nodes k1, k2 and k3) setup and it is running, now i want to run kafka connect workers in different nodes say c1 and c2 in distributed mode.
Few questions.
1) To run or launch kafka connect in distributed mode I need to use command ../bin/connect-distributed.sh, which is available in kakfa cluster nodes, so I need to launch kafka connect from any one of the kafka cluster nodes? or any node from where I launch kafka connect needs to have kafka binaries so that i will be able to use ../bin/connect-distributed.sh
2) I need to copy the my connector plugins to any kafka cluster node( or to all cluster nodes?) from where I do the step 1?
3) how does kafka copies these connector plugins to worker node before starting jvm process on the worker node? because the plugin is the one which has my task code and it needs to be copied to worker in order to start the process in worker.
4) Do i need to install anything in connect cluster nodes c1 and c2, like need to install java or any kafka connect related?
5) In some places it says use confluent platform but i would like to start it with apache kafka connect alone first.
can some one please through some light or even pointer to some resources would also help.
Thank you.
1) In order to have a highly available kafka-connect service you need to run at least two instances of connect-distributed.sh on two distinct machines that have the same group.id. You can find more details regarding the configuration of each worker here. For improved performance, Connect should be ran independently of the broker and Zookeeper machines.
2) Yes, you need to place all your connectors under plugin.path (normally under /usr/share/java/) on every machine that you are planning to run kafka-connect.
3) kafka-connect will load the connectors on startup. You don't need to handle this. Note that if your kafka-connect instance is running and a new connector is added, you need to restart the service.
4) You need to have Java installed on all your machines. For Confluent Platform particularly:
Java 1.7 and 1.8 are supported in this version of Confluent Platform
(Java 1.9 is currently not supported). You should run with the
Garbage-First (G1) garbage collector. For more information, see the
Supported Versions and Interoperability.
5) It depends. Confluent was founded by the original creators of Apache Kafka and it comes as a more complete distribution adding schema management, connectors and clients. It also comes with KSQL which is quite useful if you need to act on certain events. Confluent simply adds on top of the Apache Kafka distribution, it's not a modified version.
Answer given by Giorgos is correct. I ran few connectors and now I understand it better.
I am just trying to put it differently.
In Kafka connect there are two things involved one is Worker and second is connector.Below is on details about running distributed Kafka connect.
Kafka connect Worker is a Java process on which the connector/connect task will run. So first thing is we need to launch worker, to run/launch a worker we need java installed on that machine then we need Kafka connect related sh/bat files to launch worker and kafka libs which will be used by kafka connect worker, for this we will just simply copy/install Kafka in the worker machine, also we need to copy all the connector and connect-task related jars/dependencies in "plugin.path" as defined in the below worker properties file, now worker machine is ready, to start worker we need to invoke ./bin/connect-distributed.sh ./config/connect-distributed.properties, here connect-distributed.properties will have configuration for worker. The same thing has to be repeated in each machine where we need to run Kafka connect.
Now the worker java process is running in all machines, the woker config will have group.id property, the workers which have this same property value will be forming a group/cluster of workers.
Each worker process will expose rest endpoint (default http://localhost:8083/connectors), to launch/start a connector on the running workers, we need do http-post a connector config json, based on the given config the worker will start the connector and the number of tasks in the above group/cluster workers.
Example: Connect post,
curl -X POST -H "Content-Type: application/json" --data '{"name": "local-file-sink", "config": {"connector.class":"FileStreamSinkConnector", "tasks.max":"3", "file":"test.sink.txt", "topics":"connect-test" }}' http://localhost:8083/connectors