Good day,
Based on https://docs.confluent.io/kafka-connect-jdbc/current/index.html#installing-jdbc-drivers ,
I used the following command to install the JDBC connector:
confluent-hub install confluentinc/kafka-connect-jdbc:latest
The command ran successfully, and I can see that the confluentinc-kafka-connect-jdbc folder was created under <confluent-platform>/share/confluent-hub-components.
Here is the screenshot showing the result of my install command:
After that, I followed the next instruction and uploaded the JDBC driver JAR file to share/java/kafka-connect-jdbc.
Then I went to https://docs.confluent.io/kafka-connect-jdbc/current/source-connector/index.html to load the DB connector. As the first step, I listed the connectors I have using the following command:
confluent local services connect connector list
The output is shown as follows:
[meow#localhost confluent-7.0.1]$ confluent local services connect connector list
The local commands are intended for a single-node development environment only,
NOT for production usage. https://docs.confluent.io/current/cli/index.html
Bundled Connectors:
file-sink
file-source
replicator
There is no connector named jdbc-source in the list, so I can't proceed to the next step.
May I know what mistake I made in my steps?
After running confluent-hub install you must restart the Kafka Connect worker for it to pick up the new connector.
Since you're using the Confluent CLI, the commands are:
confluent local services connect stop
confluent local services connect start
Edit: your screenshot shows that you told the Confluent Hub client not to update any of the Kafka Connect worker configurations. Therefore the worker will not pick up the connector that you've installed.
You should run the Confluent Hub client again and tell it to update the Kafka Connect worker configurations when prompted, and then restart the Kafka Connect worker. After that it will pick up the new connector.
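As a quick sanity check after the restart, you can ask the Connect worker's REST API which plugins it has discovered; a minimal sketch, assuming the worker is listening on the default port 8083:
# list the connector plugins the running worker has loaded
curl -s http://localhost:8083/connector-plugins
# the JDBC classes (io.confluent.connect.jdbc.JdbcSourceConnector / JdbcSinkConnector)
# should appear in the output once the worker restarts with the updated configuration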
Related
How can I connect Kafka events to a MongoDB sink?
The resources I found on the net use Confluent and spin up a cluster for you; I couldn't find how to connect to my already existing cluster.
You need to install the Mongo Connector into the plugin.path of your connect properties file, then start Kafka Connect using one of the bin/connect- scripts in your Kafka installation.
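A minimal sketch of what that looks like, assuming the MongoDB connector JARs were unpacked under /usr/local/share/kafka/plugins (an illustrative path) and the broker address is a placeholder:
# connect-distributed.properties (relevant lines only)
bootstrap.servers=<your-existing-broker>:9092
plugin.path=/usr/local/share/kafka/plugins
# then start the worker from your Kafka installation
bin/connect-distributed.sh config/connect-distributed.properties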
I am currently working on a simple project to query messages from an Apache Kafka topic using Apache Drill. I am now encountering an error when starting the Apache Drill cluster with this command:
sqlline.bat -u "jdbc:drill:zk=localhost:2181"
And the error that I encountered is:
No active Drillbit endpoint found from ZooKeeper. Check connection parameters
I am using the single-node ZooKeeper instance that came with Apache Kafka.
Can anyone help me with this problem? Is it OK to use the ZooKeeper from the Apache Kafka installation with Drill?
The sqlline.bat -u "jdbc:drill:zk=localhost:2181" command only connects to a running Drillbit. If you have Drill running in distributed mode, please replace localhost with the correct IP address of the node where ZooKeeper is running, and update the port if needed.
If you want to start Drill in embedded mode, you may try running drill-embedded.bat or sqlline.bat -u "jdbc:drill:zk=local" command.
For more details please refer to https://drill.apache.org/docs/starting-drill-on-windows/.
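For example, assuming ZooKeeper (with a Drillbit registered in it) is reachable at 192.168.1.10, a hypothetical address:
sqlline.bat -u "jdbc:drill:zk=192.168.1.10:2181"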
I am going through Kafka Connect, and I am trying to get the concepts.
Let us say I have a Kafka cluster (nodes k1, k2 and k3) set up and running, and now I want to run Kafka Connect workers on different nodes, say c1 and c2, in distributed mode.
A few questions:
1) To run or launch Kafka Connect in distributed mode I need to use the command ../bin/connect-distributed.sh, which is available on the Kafka cluster nodes. So do I need to launch Kafka Connect from one of the Kafka cluster nodes, or does any node from which I launch Kafka Connect need to have the Kafka binaries so that I can use ../bin/connect-distributed.sh?
2) Do I need to copy my connector plugins to any Kafka cluster node (or to all cluster nodes?) from where I do step 1?
3) How does Kafka copy these connector plugins to the worker node before starting the JVM process on the worker node? Because the plugin is the one that has my task code, and it needs to be copied to the worker in order to start the process on the worker.
4) Do I need to install anything on the Connect cluster nodes c1 and c2, like Java or anything Kafka Connect related?
5) In some places it says to use Confluent Platform, but I would like to start with Apache Kafka Connect alone first.
Can someone please throw some light on this? Even a pointer to some resources would help.
Thank you.
1) In order to have a highly available kafka-connect service you need to run at least two instances of connect-distributed.sh on two distinct machines that have the same group.id (see the sketch after this list). You can find more details regarding the configuration of each worker here. For improved performance, Connect should be run independently of the broker and ZooKeeper machines.
2) Yes, you need to place all your connectors under plugin.path (normally under /usr/share/java/) on every machine that you are planning to run kafka-connect.
3) kafka-connect will load the connectors on startup. You don't need to handle this. Note that if your kafka-connect instance is running and a new connector is added, you need to restart the service.
4) You need to have Java installed on all your machines. For Confluent Platform particularly:
Java 1.7 and 1.8 are supported in this version of Confluent Platform
(Java 1.9 is currently not supported). You should run with the
Garbage-First (G1) garbage collector. For more information, see the
Supported Versions and Interoperability.
5) It depends. Confluent was founded by the original creators of Apache Kafka and it comes as a more complete distribution adding schema management, connectors and clients. It also comes with KSQL which is quite useful if you need to act on certain events. Confluent simply adds on top of the Apache Kafka distribution, it's not a modified version.
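For illustration, a minimal sketch of what item 1) looks like in practice; the host roles and the group.id value are assumptions, not prescriptions:
# on worker machine c1
bin/connect-distributed.sh config/connect-distributed.properties
# on worker machine c2 (same command, its own copy of Kafka and of the properties file)
bin/connect-distributed.sh config/connect-distributed.properties
# the two workers form one Connect cluster because connect-distributed.properties
# on both machines contains the same group.id, e.g.:
#   group.id=connect-cluster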
The answer given by Giorgos is correct. I ran a few connectors and now I understand it better.
I am just trying to put it differently.
In Kafka Connect there are two things involved: one is the worker and the second is the connector. Below are details about running distributed Kafka Connect.
A Kafka Connect worker is a Java process on which the connector/connect task will run. So the first thing is to launch a worker. To run a worker we need Java installed on that machine, the Kafka Connect sh/bat files used to launch the worker, and the Kafka libraries the worker uses; for this we simply copy/install Kafka on the worker machine. We also need to copy all the connector and connect-task related JARs/dependencies into the "plugin.path" defined in the worker properties file. Now the worker machine is ready. To start the worker we invoke ./bin/connect-distributed.sh ./config/connect-distributed.properties, where connect-distributed.properties holds the worker configuration. The same thing has to be repeated on each machine where we need to run Kafka Connect.
Now the worker Java process is running on all machines. The worker config has a group.id property, and the workers that share the same value for this property form a group/cluster of workers.
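A minimal sketch of the relevant connect-distributed.properties lines; all values here are illustrative assumptions:
# Kafka brokers the worker connects to
bootstrap.servers=k1:9092,k2:9092,k3:9092
# workers sharing this id form one Connect cluster
group.id=connect-cluster
# where the worker looks for connector plugin jars
plugin.path=/usr/share/java
# internal topics distributed mode uses to store connector configs, offsets and status
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status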
Each worker process exposes a REST endpoint (default http://localhost:8083/connectors). To launch/start a connector on the running workers, we HTTP POST a connector config JSON; based on the given config, the worker will start the connector and the requested number of tasks across the above group/cluster of workers.
Example Connect POST:
curl -X POST -H "Content-Type: application/json" --data '{"name": "local-file-sink", "config": {"connector.class":"FileStreamSinkConnector", "tasks.max":"3", "file":"test.sink.txt", "topics":"connect-test" }}' http://localhost:8083/connectors
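The same REST API can then be used to inspect what is running; a couple of hedged examples, assuming the same default host and port:
# list all connectors currently deployed on the cluster
curl http://localhost:8083/connectors
# check the state of the connector and its tasks
curl http://localhost:8083/connectors/local-file-sink/status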
I am trying to upgrade from Apache Kafka to Confluent Kafka.
As the storage of the temp folder is quite limited, I have changed log.dirs in server.properties to a custom folder:
log.dirs=<custom location>
Then I try to start the Kafka server via the Confluent CLI (version 4.0) using the command below:
bin/confluent start kafka
However, when I check the Kafka data folder, the data is still persisted under the temp folder instead of the customized one.
I have tried starting the Kafka server directly, without using the Confluent CLI:
bin/kafka-server-start etc/kafka/server.properties
and then the config is picked up properly.
Is this a bug in the Confluent CLI, or is it supposed to work this way?
I am trying to upgrade from Apache Kafka to Confluent Kafka.
There is no such thing as "Confluent Kafka".
You can refer to the Apache or Confluent Upgrade documentation steps for switching Kafka versions, but at the end of the day, both are Apache Kafka.
On a related note: You don't need Kafka from the Confluent site to run other parts of the Confluent Platform.
The confluent command, though, will read its own embedded config files for running on localhost only, and is not intended to integrate with external brokers/ZooKeepers.
Therefore, kafka-server-start is the production way to run Apache Kafka.
Confluent CLI is meant to be used during development with Confluent Platform. Therefore, it currently gathers all the data and logs under a common location in order for a developer to be able to easily inspect (with confluent log or manually) and delete (with confluent destroy or manually) such data.
You are able to change this common location by setting
export CONFLUENT_CURRENT=<top-level-logs-and-data-directory>
and get which location is used any time with:
confluent current
The rest of the properties are used as set in the various .properties files for each service.
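Putting that together, a small sketch; the directory is an illustrative choice, not a requirement:
# point the CLI's data and log location at a larger disk
export CONFLUENT_CURRENT=/data/confluent
confluent start kafka
# confirm which directory the CLI is actually using
confluent current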
I'm trying to set up a Kafka Connect sink connector. Kafka Connect is part of the Kafka Connect worker (confluent-3.2.0). I have a Kafka broker (confluent-3.2.0) up and running on machine A. I want to set up a Kafka Connect sink connector on another machine B to consume messages, using a custom Kafka Connect sink connector JAR. Assume that the Kafka broker and ZooKeeper ports on machine A are open to machine B.
So should I install/set up confluent-3.2.0 on machine B (since Kafka Connect is part of the Kafka package), set the classpath to the Kafka Connect sink connector JAR, and run the following command?
./bin/connect-distributed.sh worker.properties
Yes. What you describe will work and is the easiest way to set up this system, even though on machine B you really only need the start script, the configuration properties file, the JARs for Kafka Connect, and the JARs for the custom connector.
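For illustration, the worker.properties on machine B might contain something like the lines below; the host name, topic names, and JAR location are assumptions for this sketch, not requirements:
# brokers on machine A that the worker on machine B will talk to
bootstrap.servers=machineA:9092
# distributed-mode workers sharing this id form one Connect cluster
group.id=sink-connect-cluster
# converters for message keys and values
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# internal topics required by distributed mode
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status

# make the custom sink connector JAR visible to the worker (hypothetical location),
# then start the worker as in the question
export CLASSPATH=/opt/connectors/my-custom-sink.jar
./bin/connect-distributed.sh worker.properties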