Confluent Cloud with MongoDB sink connector is not working

I'm trying to connect Confluent Kafka Connect to a MongoDB sink, but it's not working as expected: it throws a NullPointerException.
My Confluent development environment is running on a GCP VM instance,
and I installed the connector with "confluent-hub install mongodb/kafka-connect-mongodb:latest".
Below is my sink configuration.
{
  "name": "today-menu-sink",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "tasks.max": "1",
    "topics": "newuser",
    "connection.uri": "mongodb+srv://*********************.mongodb.net",
    "database": "BigBoxStore",
    "collection": "users",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": false,
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://*******:8081",
    "value.converter.schemas.enable": true
  }
}
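For reference, a MongoDB Atlas connection.uri normally embeds the credentials and can name a default database; a sketch of how that single setting might look, with placeholder values that are not from the original post:

"connection.uri": "mongodb+srv://<user>:<password>@<cluster-host>.mongodb.net/BigBoxStore"

Whether that is what triggers the NullPointerException here is not certain, but a URI without credentials is worth ruling out first.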

Related

Kafka Connect could not find JdbcSinkConnector even though it's installed

I have installed the JDBC connector by running confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.2.5 inside my Kafka Connect container, but when I try to create a new sink using it I get the following error: Failed to find any class that implements Connector and which name matches io.confluent.connect.jdbc.JdbcSinkConnector
The sink I'm trying to use:
{
  "name": "jdbc-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "my_topic",
    "connection.url": "jdbc:postgresql://ip:port/postgres",
    "connection.user": "postgres",
    "connection.password": "PASSWORD",
    "auto.create": "true"
  }
}
I'm using the confluentinc/cp-kafka-connect:6.1.0 image.
If I build an image with confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.2.5 and use that image, it works.
So it looks like we need to restart Kafka Connect after installing?
Yes, the JVM doesn't pick up new plugins until it is (re)started.
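A minimal sketch of the image build described above, using the tag and connector version from the question, so the plugin is already on the plugin path when the worker JVM starts:

FROM confluentinc/cp-kafka-connect:6.1.0
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.2.5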

NoClassDefFoundError while running Kafka Connect for MongoDB in standalone mode

I am trying to set up the Debezium connector with MongoDB in standalone mode on my local machine by following this.
I have set up a MongoDB replica set with 3 nodes, 1 primary and 2 secondaries (host = localhost, ports = 27017, 27018, 27019). I have started Kafka and ZooKeeper on the local machine as per the quickstart guide.
After this, I downloaded the MongoDB connector plugin jar from here.
Set the plugin.path variable to the MongoDB plugin jars:
plugin.path=dbz_connector_mongodb_jar_path,KAFKA_HOME/libs
Created the connector config for standalone mode, KAFKA_HOME/config/my_mongo_connector.properties:
{
  "name": "my-mongo-connector",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.hosts": "rc0/127.0.0.1:27017",
    "mongodb.name": "myMongoConnceter",
    "collection.include.list": "dbname.collectionName"
  }
}
Now when I run Kafka Connect with the command below:
cd KAFKA_HOME
bin/connect-standalone.sh config/connect-standalone.properties config/my_mongo_connector.properties
I see the error below:
java.lang.NoClassDefFoundError: com/mongodb/MongoException
at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3137)
at java.base/java.lang.Class.getConstructor0(Class.java:3342)
at java.base/java.lang.Class.newInstance(Class.java:556)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:392)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:362)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:334)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:268)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:260)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initPluginLoader(DelegatingClassLoader.java:229)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:206)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:61)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:78)
Caused by: java.lang.ClassNotFoundException: com.mongodb.MongoException
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 13 more
Then I added the mongo-java-driver-3.12.10.jar file, which contains the com.mongodb.MongoException class, to the Kafka Connect plugins folder. But I still face the same error.
I have also found that this jar is loaded while running the Kafka Connect command:
INFO Loading plugin from: debezium-connector-mongodb_path/mongo-java-driver-3.12.10.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:246)
INFO Registered loader: PluginClassLoader{pluginLocation=file:debezium-connector-mongodb_path/mongo-java-driver-3.12.10.jar} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:269)
EDIT:
My NoClassDefFoundError problem was solved by correcting plugin.path.
Earlier it was /opt/debezium-connector-mongodb and I changed it to /opt, as suggested in the comment (see the sketch below).
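For reference, a minimal sketch of the corrected worker setting, assuming the layout described above where /opt is the directory that contains the debezium-connector-mongodb folder:

# connect-standalone.properties
plugin.path=/opt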
My other mistake: I had written the connector properties file in JSON format, so I changed it to properties file format:
name=my-mongo-connector
connector.class=io.debezium.connector.mongodb.MongoDbConnector
mongodb.hosts=rc0/127.0.0.1:27017
mongodb.name=myMongoConnector
collection.include.list=dbname.collectionName
topic.creation.default.replication.factor=1
topic.creation.default.partitions=2
mongodb.members.auto.discover=false
Now Kafka Connect starts, but when a new entry is inserted into the Mongo collection, the Kafka topic is not created.
So the problem now is that no Kafka topic is created even after data is inserted into the DB.
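A quick way to check whether Debezium created any topic at all is to list the topics on the broker; Debezium normally names them <mongodb.name>.<database>.<collection>. The broker address below is an assumption for the local quickstart setup:

bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
# expect something like: myMongoConnector.dbname.collectionName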

Debezium cannot capture changes from MongoDB

I am using the Debezium MongoDB connector 1.4.2 on Kafka Connect 2.2, and it seems the 'collection.include.list' configuration is preventing Debezium from capturing the collection data changes. If I delete the collection.include.list config, the capture starts to work, but it then applies to all the collections, which I don't want.
Can anyone send me an example of how collection.include.list can be configured? I tried to input '<db_name>[.]<collection_name>', however I keep getting this warning and no data is captured.
[2021-04-03 07:58:21,971] WARN After applying the include/exclude list filters, no changes will be captured. Please check your configuration! (io.debezium.connector.mongodb.MongoDbSchema:96)
My config is like below:
{
  "name": "pipeline-mongo-connector",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.hosts": "xxxx_host:3717",
    "mongodb.name": "pipeline_mongo",
    "mongodb.user": "xxxxxxx",
    "mongodb.password": "xxxxxx",
    "collection.include.list": "prod-datapipeline[.]*"
  }
}
Thanks!
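For reference, collection.include.list takes comma-separated regular expressions that are matched against the fully-qualified databaseName.collectionName namespace, so a pattern meant to cover a whole database generally has to match the collection part after the dot as well; a sketch (the collection names below are hypothetical):

"collection.include.list": "prod-datapipeline[.].*"

or, listing specific collections explicitly:

"collection.include.list": "prod-datapipeline.orders,prod-datapipeline.users"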

Exception while deserializing Avro data using the Confluent Schema Registry?

I am new to Flink and Kafka. I am trying to deserialize Avro data using the Confluent Schema Registry. I have already installed Flink and Kafka on an EC2 machine. Also, the "test" topic was created before running the code.
Code Path: https://gist.github.com/mandar2174/5dc13350b296abf127b92d0697c320f2
The code does the following operations as part of the implementation:
1) Create a Flink DataStream object from a list of User elements (User is an Avro-generated class).
2) Write the DataStream source to Kafka using AvroSerializationSchema.
3) Read the data from Kafka using ConfluentRegistryAvroDeserializationSchema, which reads the schema from the Confluent Schema Registry.
Command to run the Flink executable jar:
./bin/flink run -c com.streaming.example.ConfluentSchemaRegistryExample /opt/flink-1.7.2/kafka-flink-stream-processing-assembly-0.1.jar
Exception while running code:
java.io.IOException: Unknown data format. Magic number does not match
at org.apache.flink.formats.avro.registry.confluent.ConfluentSchemaRegistryCoder.readSchema(ConfluentSchemaRegistryCoder.java:55)
at org.apache.flink.formats.avro.RegistryAvroDeserializationSchema.deserialize(RegistryAvroDeserializationSchema.java:66)
at org.apache.flink.streaming.util.serialization.KeyedDeserializationSchemaWrapper.deserialize(KeyedDeserializationSchemaWrapper.java:44)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:140)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:665)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:94)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
at java.lang.Thread.run(Thread.java:748)
The Avro schema I am using for the User class is below:
{
  "type": "record",
  "name": "User",
  "namespace": "com.streaming.example",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "favorite_number",
      "type": ["int", "null"]
    },
    {
      "name": "favorite_color",
      "type": ["string", "null"]
    }
  ]
}
Can someone point out what steps I am missing in deserializing Avro data with the Confluent Schema Registry?
How you write the Avro data needs to use the Registry as well, in order for the deserializer that depends on it to work.
But adding a ConfluentRegistryAvroSerializationSchema class is still an open PR in Flink.
The workaround, I believe, would be to use AvroDeserializationSchema, which does not depend on the Registry.
If you do want to use the Registry in the producer code, then you'd have to do so outside of Flink until that PR is merged.
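A minimal sketch of that workaround on the consumer side, assuming the Avro-generated User class from the schema above, Flink's universal Kafka consumer, and a local broker address (these class names and addresses are assumptions tied to your versions and setup, not taken from the post):

import java.util.Properties;
import org.apache.flink.formats.avro.AvroDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import com.streaming.example.User;

public class AvroWithoutRegistry {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-avro-demo");

        // Registry-free deserializer: the reader schema comes from the generated User class,
        // so the producer must write plain Avro without the Schema Registry wire format
        // (the magic byte mentioned in the exception above).
        FlinkKafkaConsumer<User> consumer = new FlinkKafkaConsumer<>(
                "test",
                AvroDeserializationSchema.forSpecific(User.class),
                props);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(consumer).print();
        env.execute("avro-without-registry");
    }
}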

How do I create the json for creating a distributed Kafka Connect Instance with a transformation?

Using standalone mode I create a connector and my customized transformation like that:
name=rabbitmq-source
connector.class=com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector
tasks.max=1
rabbitmq.host=rabbitmq-server
rabbitmq.queue=answers
kafka.topic=net.gutefrage.answers
transforms=extractFields
transforms.extractFields.type=net.gutefrage.connector.transforms.ExtractFields$Value
transforms.extractFields.fields=body,envelope.routingKey
transforms.extractFields.structName=net.gutefrage.events
But for a distributed connector what is the syntax for the PUT request to the Connect REST API? I cannot find any example in the docs.
Already tried a couple of things like:
cat <<EOF >/tmp/connector
{
  "name": "rabbitmq-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max": "1",
    "rabbitmq.host": "rabbitmq-server",
    "rabbitmq.queue": "answers",
    "kafka.topic": "net.gutefrage.answers",
    "transforms": "extractFields",
    "transforms.extractFields": {
      "type": "net.gutefrage.connector.transforms.ExtractFields$Value",
      "fields": "body,envelope.routingKey",
      "structName": "net.gutefrage.events"
    }
  }
}
EOF
curl -vs --stderr - -X POST -H "Content-Type: application/json" --data @/tmp/connector "http://localhost:8083/connectors"
rm /tmp/connector
or also this did not work:
{
  "name": "rabbitmq-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max": "1",
    "rabbitmq.host": "rabbitmq-server",
    "rabbitmq.queue": "answers",
    "kafka.topic": "net.gutefrage.answers",
    "transforms": "extractFields",
    "transforms.extractFields.type": "net.gutefrage.connector.transforms.ExtractFields$Value",
    "transforms.extractFields.fields": "body,envelope.routingKey",
    "transforms.extractFields.structName": "net.gutefrage.events"
  }
}
For the last variant I get the following error:
{"error_code":400,"message":"Connector configuration is invalid and contains the following 1 error(s):\nInvalid value class net.gutefrage.connector.transforms.ExtractFields for configuration transforms.extractFields.type: Error getting config definition from Transformation: null\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"}
Please note that with the properties format it works nicely (using Landoop's Create New Connector UI in fast-data-dev). Interestingly, Landoop's UI feature 'translate to curl' produces the very same JSON as my second example.
Update
To be sure that it's not a problem with Landoop, Docker, or my custom transformation, I've started ZooKeeper, a broker, the Schema Registry and Kafka Connect in distributed mode with the standard distributed properties from COP 3.3.0:
bin/connect-distributed etc/schema-registry/connect-avro-distributed.properties
which logs
[2017-09-13 14:07:52,930] INFO Loading plugin from: /opt/connectors/confluent-oss-gf-assembly-1.0.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:176)
[2017-09-13 14:07:53,711] INFO Registered loader: PluginClassLoader{pluginLocation=file:/opt/connectors/confluent-oss-gf-assembly-1.0.jar} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:199)
[2017-09-13 14:07:53,711] INFO Added plugin 'com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-09-13 14:07:53,712] INFO Added plugin 'net.gutefrage.connector.transforms.ExtractFields$Key' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-09-13 14:07:53,712] INFO Added plugin 'net.gutefrage.connector.transforms.ExtractFields$Value' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
All good so far. Then I created a connector config:
cat <<EOF >/tmp/connector
{
  "name": "rabbitmq-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max": "1",
    "rabbitmq.host": "rabbitmq-server",
    "rabbitmq.queue": "answers",
    "kafka.topic": "net.gutefrage.answers",
    "transforms": "extractFields",
    "transforms.extractFields.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
    "transforms.extractFields.field": "body"
  }
}
EOF
Please note that here I now use the standard (bundled) ExtractField transform.
When I post that with curl -vs --stderr - -X POST -H "Content-Type: application/json" --data @/tmp/connector "http://localhost:8083/connectors"
I get the same error:
{"error_code":400,"message":"Connector configuration is invalid and contains the following 1 error(s):\nInvalid value class org.apache.kafka.connect.transforms.ExtractField for configuration transforms.extractFields.type: Error getting config definition from Transformation: null\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"}
Make sure the $Value in transforms.extractFields.type=net.gutefrage.connector.transforms.ExtractFields$Value is not expanded as a shell variable inside the unquoted heredoc (cat <<EOF); note that both error messages show the class name without the $Value suffix. It worked for me.
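A quick shell sketch of the difference; quoting the heredoc delimiter (or escaping the dollar sign) keeps $Value intact:

# unquoted delimiter: the shell expands $Value to an empty string
cat <<EOF
net.gutefrage.connector.transforms.ExtractFields$Value
EOF
# prints: net.gutefrage.connector.transforms.ExtractFields

# quoted delimiter: no expansion takes place
cat <<'EOF'
net.gutefrage.connector.transforms.ExtractFields$Value
EOF
# prints: net.gutefrage.connector.transforms.ExtractFields$Value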
If you want to run the Kafka Connect worker in standalone mode, then you must start the worker and supply the worker configuration file and one or more connector configuration files. All of those configuration files are in Java properties format, so the first configuration sample you provided is the correct format:
name=rabbitmq-source
connector.class=com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector
tasks.max=1
rabbitmq.host=rabbitmq-server
rabbitmq.queue=answers
kafka.topic=net.gutefrage.answers
transforms=extractFields
transforms.extractFields.type=net.gutefrage.connector.transforms.ExtractFields$Value
transforms.extractFields.fields=body,envelope.routingKey
transforms.extractFields.structName=net.gutefrage.events
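With those files in place, the standalone worker is started with the worker properties file followed by the connector properties file; a sketch with placeholder file names (the Apache Kafka distribution adds a .sh suffix to the script):

bin/connect-standalone worker.properties rabbitmq-source.properties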
If you want to run the Kafka Connect worker in distributed mode, then you first have to start the distributed worker and then create the connector as a second step using the REST API: a POST request with a JSON document to the /connectors endpoint, or a PUT request with just the "config" object to /connectors/<name>/config. That JSON document would match the format of your second JSON document:
{
  "name": "rabbitmq-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max": "1",
    "rabbitmq.host": "rabbitmq-server",
    "rabbitmq.queue": "answers",
    "kafka.topic": "net.gutefrage.answers",
    "transforms": "extractFields",
    "transforms.extractFields.type": "net.gutefrage.connector.transforms.ExtractFields$Value",
    "transforms.extractFields.fields": "body,envelope.routingKey",
    "transforms.extractFields.structName": "net.gutefrage.events"
  }
}
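For example, reusing the host and the /tmp/connector file from the attempts above (the /tmp/connector-config file in the PUT variant is hypothetical and would contain only the inner "config" object):

# create the connector from the wrapped {"name": ..., "config": {...}} document
curl -X POST -H "Content-Type: application/json" --data @/tmp/connector "http://localhost:8083/connectors"

# create or update, addressing the connector by name and sending only the "config" object
curl -X PUT -H "Content-Type: application/json" --data @/tmp/connector-config "http://localhost:8083/connectors/rabbitmq-source/config"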
The Confluent CLI, included in Confluent's Open Source Platform (which includes Kafka), is a developer tool to help you quickly get started by running a ZooKeeper instance, a Kafka broker, the Confluent Schema Registry, the REST proxy, and a Connect worker in distributed mode. When you load a connector, you specify the connector configuration as either a JSON file or a properties file; the latter is converted to JSON using jq.
However, the error you reported is:
{
  "error_code": 400,
  "message": "Connector configuration is invalid and contains the following 1 error(s):\nInvalid value class net.gutefrage.connector.transforms.ExtractFields for configuration transforms.extractFields.type: Error getting config definition from Transformation: null\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"
}
The important part of this error message is "Error getting config definition from Transformation: null". Although this is a bit too cryptic, it means that the config() method of the net.gutefrage.connector.transforms.ExtractFields Java class is returning null.
Make sure that the net.gutefrage.connector.transforms.ExtractFields$Value string you specified is the correct fully qualified name for the nested static class Value, and that the Value class fully and correctly implements the org.apache.kafka.connect.transforms.Transformation<R extends ConnectRecord<R>> interface. Note that the config() method must return a non-null ConfigDef object.
Take a look at this example of a Single Message Transform (SMT) that ships with Apache Kafka, or Robin's blog post for other examples.
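To illustrate the point about config(), here is a bare-bones sketch of the shape such a transform needs; the package and option names mirror the configuration above, while the implementation body is left as a stub:

package net.gutefrage.connector.transforms;

import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

public abstract class ExtractFields<R extends ConnectRecord<R>> implements Transformation<R> {

    // config() must return a non-null ConfigDef, otherwise validation fails with
    // "Error getting config definition from Transformation: null"
    private static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define("fields", ConfigDef.Type.LIST, ConfigDef.Importance.HIGH, "Fields to extract")
            .define("structName", ConfigDef.Type.STRING, ConfigDef.Importance.MEDIUM, "Name of the output struct");

    @Override
    public void configure(Map<String, ?> props) {
        // read "fields" and "structName" from props here
    }

    @Override
    public ConfigDef config() {
        return CONFIG_DEF;
    }

    @Override
    public void close() {
    }

    // Referenced as ...ExtractFields$Value in the connector configuration
    public static class Value<R extends ConnectRecord<R>> extends ExtractFields<R> {
        @Override
        public R apply(R record) {
            // build and return the transformed record here
            return record;
        }
    }
}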
To use the JSON format of the connector config with the CP Connect CLI, the jq tool has to be installed on the machine where the Kafka Connect cluster is running.
E.g. for Landoop's fast-data-dev environment you'll have to run
docker exec rabbitmqconnect_fast-data-dev_1 apk add --no-cache jq
Then this will work:
docker exec rabbitmqconnect_fast-data-dev_1 /opt/confluent-3.3.0/bin/confluent config rabbitmq-source -d /tmp/connector-config.json
This doesn't solve the issue when using the connector REST endpoint though.
With fast-data-dev you can build a JAR file for any connector and then just add it to the classpath with the instructions at
https://github.com/Landoop/fast-data-dev#enable-additional-connectors
The UI will auto-detect the new connector and provide you with instructions when you hit NEW for the new connector at:
http://localhost:3030/kafka-connect-ui
What would also be worth trying, since fast-data-dev already comes with a generic MQTT sink connector, is trying that out. See the instructions at http://docs.datamountaineer.com/en/latest/mqtt-sink.html
You would effectively need to do
connect.mqtt.kcql=INSERT INTO /answers SELECT body FROM net.gutefrage.answers
As this is a generic MQTT connector, you will possibly need to add the RabbitMQ client library using the enable-additional-connectors instructions.