Kafka Connect Protobuf Configuration

I'm trying to create a Kafka sink connector that uses a protobuf value converter. I've got a version of this configuration working with JSON; however, I now need to change it to use protobuf messages.
I'm trying to create a connector with the following request:
curl -X POST localhost:8083/connectors -H "Content-Type: application/json" -d '
{
  "name": "jdbc-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "TEST_PROTO",
    "connection.url": "${DB_URL}",
    "value.converter": "io.confluent.connect.protobuf.ProtobufConverter",
    "key.converter": "io.confluent.connect.protobuf.ProtobufConverter",
    "auto.create": true,
    "auto.evolve": true,
    "type": "sink",
    "connection.user": "${DB_USER}",
    "connection.password": "${DB_PASS}"
  }
}'
This gives the following 400 error message:
Invalid value io.confluent.connect.protobuf.ProtobufConverter for configuration value.converter: Class io.confluent.connect.protobuf.ProtobufConverter could not be found
I don't quite understand why I'm not able to use this converter here. From what I can see, the documentation suggests this is an appropriate value: https://docs.confluent.io/current/connect/userguide.html
Can anyone please help?

I guess that in this case, you are missing these configurations:
value.converter.schema.registry.url
key.converter.schema.registry.url
key.converter.schemas.enable
value.converter.schemas.enable
In addition to these, also try using the latest JDBC connector jars and the latest version of the Confluent Platform. If this doesn't work, please let me know.

To add to the above answer for clarity, you need to set the values for the keys below when configuring Kafka Connect.
value.converter = "io.confluent.connect.protobuf.ProtobufConverter"
key.converter = "io.confluent.connect.protobuf.ProtobufConverter"
value.converter.schema.registry.url = URL (You should have the Schema Registry service installed, and all producers should register their schemas with Schema Registry before writing to the broker)
key.converter.schema.registry.url = Can be the same URL as above
key.converter.schemas.enable = true (if using Protobuf)
value.converter.schemas.enable = true (if using Protobuf)
To verify if the converter is loaded successfully, you can check the INFO logs.
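Putting both answers together, a minimal sketch of the original sink request with the converter settings filled in might look like the following (the Schema Registry URL http://localhost:8081 is an assumption for a local setup; adjust it to your environment):

curl -X POST localhost:8083/connectors -H "Content-Type: application/json" -d '
{
  "name": "jdbc-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "TEST_PROTO",
    "connection.url": "${DB_URL}",
    "connection.user": "${DB_USER}",
    "connection.password": "${DB_PASS}",
    "auto.create": true,
    "auto.evolve": true,
    "key.converter": "io.confluent.connect.protobuf.ProtobufConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.protobuf.ProtobufConverter",
    "value.converter.schema.registry.url": "http://localhost:8081"
  }
}'

Note that the original "could not be found" error also suggests the converter jar is not on the worker's plugin path; ProtobufConverter ships with Confluent Platform 5.5 and later, which matches the advice above about upgrading.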

Related

Unable to trigger Custom Producer Interceptor via JDBC Source Connector

I have created a custom producer interceptor (AuditProducerInterceptor) which accepts some custom configs (application_id, type, etc.). I have generated a jar from the AuditProducerInterceptor project and placed the jar inside Kafka Connect at /usr/share/java/monitoring-interceptors. When I try to post a JDBC source connector with the configuration below, my audit interceptor is not triggered.
{
  "name": "jdbc-source-xx-xxxx-xxx-xxx",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlserver://{{ip}}:1433;databaseName=XX;useNTLMv2=true",
    "connection.user": "SA",
    "connection.password": "Admin1234",
    "producer.interceptor.classes": "com.optum.payer.common.kafka.audit.interceptor.AuditProducerInterceptor",
    "topic.prefix": "MyTestTopic",
    "query": "SELECT ID, chart_id, request_id, UpdatedDate FROM xxx.xxx WITH (NOLOCK)",
    "mode": "timestamp",
    "timestamp.column.name": "UpdatedDate",
    "producer.audit.application.id": "HelloApplication",
    "producer.audit.type": "test type",
    "poll.interval.ms": "10",
    "tasks.max": "1",
    "batch.max.rows": "100",
    "validate.non.null": "false",
    "numeric.mapping": "best_fit",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://{{ip}}:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://{{ip}}:8081"
  }
}
As you can see in the configuration, I have added the props below to the connector config to trigger the custom interceptor. But I don't see any logs in Kafka Connect related to the AuditProducerInterceptor.
"producer.interceptor.classes": "com.optum.payer.common.kafka.audit.interceptor.AuditProducerInterceptor"
"producer.audit.application.id": "HelloApplication",
"producer.audit.type": "test type"
I tried adding these three configs to the Kafka Connect worker config, and I am able to trigger the interceptor. But I want to trigger the interceptor via the JDBC source connector so that I can pass the custom props (application_id, type, etc.) via the connector.
Please help me solve this issue.
If you have allowed client overrides in the Connect worker (enabled by default), you'll want to use the producer.override. prefix.
From the docs:
Starting with 2.3.0, client configuration overrides can be configured individually per connector by using the prefixes producer.override. and consumer.override. for Kafka sources or Kafka sinks respectively.
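Applied to this connector, that means prefixing the producer-level properties instead of setting them bare. A sketch, assuming the worker's override policy lets these properties through and that the interceptor reads audit.application.id and audit.type from the producer config (as the working worker-level variant in the question implies):

"producer.override.interceptor.classes": "com.optum.payer.common.kafka.audit.interceptor.AuditProducerInterceptor",
"producer.override.audit.application.id": "HelloApplication",
"producer.override.audit.type": "test type"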

kafka FileStreamSourceConnector write an avro file to topic with key field

I want to use the Kafka FileStreamSourceConnector to write a local Avro file into a topic.
My connector config looks like this:
curl -i -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/file_source_connector/config \
  -d '{
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "topic": "my_topic",
    "file": "/data/log.avsc",
    "format.include.keys": "true",
    "source.auto.offset.reset": "earliest",
    "tasks.max": "1",
    "value.converter.schemas.enable": "true",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter"
  }'
Then when I print out the topic, the key fields are null.
Updated on 2021-03-29:
After watching this video 🎄Twelve Days of SMT 🎄 - Day 2: ValueToKey and ExtractField from Robin,
I applied SMT to my connector config:
curl -i -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/file_source_connector_02/config \
  -d '{
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "topic": "my_topic",
    "file": "/data/log.avsc",
    "tasks.max": "1",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "transforms": "ValueToKey, ExtractField",
    "transforms.ValueToKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.ValueToKey.fields": "id",
    "transforms.ExtractField.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.ExtractField.field": "id"
  }'
However, the connector failed:
Caused by: org.apache.kafka.connect.errors.DataException: Only Struct objects supported for [copying fields from value to key], found: java.lang.String
I would use the ValueToKey transformer.
In the worst case, you could ignore the values and set a random key.
For details, look at: ValueToKey
FileStreamSource assumes UTF-8-encoded, line-delimited files are your input, not binary files such as Avro. Last I checked, format.include.keys is not a valid config for the connector either.
Therefore each consumed event will be a string, and consequently transforms that require Structs with field names will not work.
You can use the Hoist transform to create a Struct from each "line", but this still will not parse your data to make the ID field accessible to move to the key.
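For illustration, a HoistField transform wrapping each line in a single-field Struct could be configured as below (the field name line is just a placeholder of mine); as noted, the value is still only the raw text of the line:

"transforms": "HoistLine",
"transforms.HoistLine.type": "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.HoistLine.field": "line"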
Also, your file is AVSC, which is JSON-formatted, not Avro, so I'm not sure what the goal is in using the AvroConverter or setting "schemas.enable": "true". Still, the lines read by the connector are not parsed by converters in a way that makes fields accessible; they are only serialized when sent to Kafka.
My suggestion would be to write some other CLI script using plain producer libraries to parse the file, extract the schema, register it with the Schema Registry, build a producer record for each entity in the file, and send them.
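A lighter-weight variant of that suggestion is the kafka-avro-console-producer that ships with the Confluent Platform. A rough sketch, assuming the .avsc file holds the value schema and the actual records live as JSON lines in a separate file (/data/records.json is a made-up path), with each input line formatted as key|value:

kafka-avro-console-producer \
  --bootstrap-server localhost:9092 \
  --topic my_topic \
  --property schema.registry.url=http://schema-registry:8081 \
  --property value.schema="$(cat /data/log.avsc)" \
  --property parse.key=true \
  --property key.separator='|' \
  --property key.schema='{"type":"string"}' \
  < /data/records.json

Older releases use --broker-list instead of --bootstrap-server, and both the key and the value on each line are parsed as Avro JSON against their schemas.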

Kafka connect MongoDB Kafka connector using JSONConverter not working

I'm trying to configure the Kafka connector to use MongoDB as the source and send the records into Kafka topics.
I've successfully done so, but I'm trying to do it with the JSONConverter in order to also save the schema with the payload.
My problem is that the connector is saving the data as follows:
{ "schema": { "type": "string" } , "payload": "{....}" }
In other words, it's automatically assuming the actual JSON is a string and it's saving the schema as String.
This is how I'm setting up the connector:
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "newtopic",
  "config": {
    "tasks.max": 1,
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enabled": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enabled": "true",
    "connection.uri": "[MONGOURL]",
    "database": "dbname",
    "collection": "collname",
    "pipeline": "[]",
    "topic.prefix": "",
    "publish.full.document.only": "true"
  }}'
Am I missing something for the configuration? Is it simply not able to guess the schema of the document stored in MongoDB, so it goes with String?
There's nothing to guess. Mongo is schemaless, last I checked, so the schema is a string or bytes. I would suggest using the AvroConverter or setting schemas.enable to false.
You may also want to try using Debezium to see if you get different results.
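For the second option, a sketch of the converter settings with the schema envelope disabled (note the JsonConverter property is schemas.enable, not schemas.enabled as in the question):

"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false"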
The latest version 1.3 of the MongoDB connector should solve this issue.
It even provides an option for inferring the source schema.
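If I recall the 1.3 options correctly, the relevant source-connector settings are output.format.value and output.schema.infer.value; a hedged sketch of how they could sit alongside the JsonConverter with schemas enabled:

"output.format.value": "schema",
"output.schema.infer.value": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"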

how to override key.serializer in kafka connect jdbc

I am doing a MySQL-to-Kafka connection using the Kafka JDBC source connector. Everything is working fine. Now I need to pass key.serializer and value.serializer to encrypt data as shown at macronova, but I didn't find any changes in the output.
POST request to start the source connector:
curl -X POST -H "Content-Type: application/json" --data '{
  "name": "jdbc-source-connector-2",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
    "value.serializer": "org.apache.kafka.common.serialization.StringSerializer",
    "tasks.max": 10,
    "connection.url": "jdbc:mysql://localhost:3306/connect_test?user=roo&password=roo",
    "mode": "incrementing",
    "table.whitelist": "test",
    "incrementing.column.name": "id",
    "timestamp.column.name": "modified",
    "topic.prefix": "table-",
    "poll.interval.ms": 1000
  }
}' http://localhost:8083/connectors
Connectors take converters only, not serializers, via the key and value converter properties.
If you want to encrypt a whole string, you'd need to implement your own converter, or edit the code that writes into the database so that it writes into Kafka instead, then consume and write to the database as well as to the other downstream systems.
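For reference, the two serializer lines in the request above would simply be replaced by converter settings; a minimal sketch (the JsonConverter choice here is just an example, not something from the question):

"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false"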

How do I create the json for creating a distributed Kafka Connect Instance with a transformation?

Using standalone mode, I create a connector and my customized transformation like this:
name=rabbitmq-source
connector.class=com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector
tasks.max=1
rabbitmq.host=rabbitmq-server
rabbitmq.queue=answers
kafka.topic=net.gutefrage.answers
transforms=extractFields
transforms.extractFields.type=net.gutefrage.connector.transforms.ExtractFields$Value
transforms.extractFields.fields=body,envelope.routingKey
transforms.extractFields.structName=net.gutefrage.events
But for a distributed connector what is the syntax for the PUT request to the Connect REST API? I cannot find any example in the docs.
Already tried a couple of things like:
cat <<EOF >/tmp/connector
{
  "name": "rabbitmq-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max": "1",
    "rabbitmq.host": "rabbitmq-server",
    "rabbitmq.queue": "answers",
    "kafka.topic": "net.gutefrage.answers",
    "transforms": "extractFields",
    "transforms.extractFields": {
      "type": "net.gutefrage.connector.transforms.ExtractFields$Value",
      "fields": "body,envelope.routingKey",
      "structName": "net.gutefrage.events"
    }
  }
}
EOF
curl -vs --stderr - -X POST -H "Content-Type: application/json" --data @/tmp/connector "http://localhost:8083/connectors"
rm /tmp/connector
or also this did not work:
{
  "name": "rabbitmq-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max": "1",
    "rabbitmq.host": "rabbitmq-server",
    "rabbitmq.queue": "answers",
    "kafka.topic": "net.gutefrage.answers",
    "transforms": "extractFields",
    "transforms.extractFields.type": "net.gutefrage.connector.transforms.ExtractFields$Value",
    "transforms.extractFields.fields": "body,envelope.routingKey",
    "transforms.extractFields.structName": "net.gutefrage.events"
  }
}
For the last variant I get the following error:
{"error_code":400,"message":"Connector configuration is invalid and contains the following 1 error(s):\nInvalid value class net.gutefrage.connector.transforms.ExtractFields for configuration transforms.extractFields.type: Error getting config definition from Transformation: null\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"}
Please note that with the properties format it works nicely (using Landoop's Create New Connector UI in fast-data-dev; interestingly, the UI's 'translate to curl' feature produces the very same JSON as my second example).
Update
To be sure that it's not a problem with Landoop, Docker, or my custom transformation, I've started Zookeeper, the broker, the Schema Registry and Kafka Connect in distributed mode with the standard distributed properties from COP 3.3.0:
bin/connect-distributed etc/schema-registry/connect-avro-distributed.properties
which logs
[2017-09-13 14:07:52,930] INFO Loading plugin from: /opt/connectors/confluent-oss-gf-assembly-1.0.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:176)
[2017-09-13 14:07:53,711] INFO Registered loader: PluginClassLoader{pluginLocation=file:/opt/connectors/confluent-oss-gf-assembly-1.0.jar} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:199)
[2017-09-13 14:07:53,711] INFO Added plugin 'com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-09-13 14:07:53,712] INFO Added plugin 'net.gutefrage.connector.transforms.ExtractFields$Key' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-09-13 14:07:53,712] INFO Added plugin 'net.gutefrage.connector.transforms.ExtractFields$Value' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
All good so far. Then I created a connector config:
cat <<EOF >/tmp/connector
{
  "name": "rabbitmq-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max": "1",
    "rabbitmq.host": "rabbitmq-server",
    "rabbitmq.queue": "answers",
    "kafka.topic": "net.gutefrage.answers",
    "transforms": "extractFields",
    "transforms.extractFields.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
    "transforms.extractFields.field": "body"
  }
}
EOF
Please note that here I now use the standard (bundled) ExtractField transform.
When I post that with curl -vs --stderr - -X POST -H "Content-Type: application/json" --data @/tmp/connector "http://localhost:8083/connectors"
I get the same
{"error_code":400,"message":"Connector configuration is invalid and contains the following 1 error(s):\nInvalid value class org.apache.kafka.connect.transforms.ExtractField for configuration transforms.extractFields.type: Error getting config definition from Transformation: null\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"}*
Make sure the $Value in transforms.extractFields.type=net.gutefrage.connector.transforms.ExtractFields$Value is not expanded as a shell variable when you build the file with cat <<EOF. It worked for me.
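This matches the quoted error, which names the class as net.gutefrage.connector.transforms.ExtractFields with no $Value suffix. Quoting the heredoc delimiter keeps the text literal, for example:

# Unquoted EOF lets the shell expand $Value (usually to nothing);
# quoting the delimiter ('EOF') prevents any expansion.
cat <<'EOF' >/tmp/connector
{
  "name": "rabbitmq-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max": "1",
    "rabbitmq.host": "rabbitmq-server",
    "rabbitmq.queue": "answers",
    "kafka.topic": "net.gutefrage.answers",
    "transforms": "extractFields",
    "transforms.extractFields.type": "net.gutefrage.connector.transforms.ExtractFields$Value",
    "transforms.extractFields.fields": "body,envelope.routingKey",
    "transforms.extractFields.structName": "net.gutefrage.events"
  }
}
EOF
curl -s -X POST -H "Content-Type: application/json" --data @/tmp/connector http://localhost:8083/connectors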
If you want to run the Kafka Connect worker in standalone mode, then you must start the worker and supply the worker configuration file and one or more connector configuration files. All of those configuration files are in Java properties format, so the first configuration sample you provided is the correct format:
name=rabbitmq-source
connector.class=com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector
tasks.max=1
rabbitmq.host=rabbitmq-server
rabbitmq.queue=answers
kafka.topic=net.gutefrage.answers
transforms=extractFields
transforms.extractFields.type=net.gutefrage.connector.transforms.ExtractFields$Value
transforms.extractFields.fields=body,envelope.routingKey
transforms.extractFields.structName=net.gutefrage.events
If you want to run the Kafka Connect worker in distributed mode, then you will have to first start the distributed worker and then create the connector as a second step using the REST API, with a POST request containing a JSON document to the /connectors endpoint (or a PUT to /connectors/{name}/config). That JSON document would match the format of your second JSON document:
{
  "name": "rabbitmq-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max": "1",
    "rabbitmq.host": "rabbitmq-server",
    "rabbitmq.queue": "answers",
    "kafka.topic": "net.gutefrage.answers",
    "transforms": "extractFields",
    "transforms.extractFields.type": "net.gutefrage.connector.transforms.ExtractFields$Value",
    "transforms.extractFields.fields": "body,envelope.routingKey",
    "transforms.extractFields.structName": "net.gutefrage.events"
  }
}
The Confluent CLI, included in Confluent's Open Source Platform (which includes Kafka), is a developer tool to help you quickly get started by running a Zookeeper instance, a Kafka broker, the Confluent Schema Registry, the REST proxy, and a Connect worker in distributed mode. When you load a connector, you specify the connector configuration as either a JSON file or a properties file; the latter is converted to JSON using jq.
However, the error you reported is:
{
"error_code":400,
"message":"Connector configuration is invalid and contains the following 1 error(s):\nInvalid value class net.gutefrage.connector.transforms.ExtractFields for configuration transforms.extractFields.type: Error getting config definition from Transformation: null\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"
}
The important part of this error message is "Error getting config definition from Transformation: null". Although this is a bit too cryptic, it means that the config() method of the net.gutefrage.connector.transforms.ExtractFields Java class is returning null.
Make sure that the net.gutefrage.connector.transforms.ExtractFields$Value string you specified is the correct fully qualified name for the nested static class Value, and that the Value class fully and correctly implements the org.apache.kafka.connect.transforms.Transformation<R extends ConnectRecord<R>> interface. Note that the config() method must return a non-null ConfigDef object.
Take a look at this example of a Single Message Transform (SMT) that ships with Apache Kafka, or Robin's blog post for other examples.
To use the JSON format of the connector config with the CP Connect CLI, the jq tool has to be installed on the machine where the Kafka Connect cluster is running.
E.g. for Landoop's fast-data-dev environment, you'll have to run:
docker exec rabbitmqconnect_fast-data-dev_1 apk add --no-cache jq
Then this will work:
docker exec rabbitmqconnect_fast-data-dev_1 /opt/confluent-3.3.0/bin/confluent config rabbitmq-source -d /tmp/connector-config.json
This doesn't solve the issue when using the connector REST endpoint though.
With fast-data-dev you can build a JAR file for any connector and then just add it to the classpath, following the instructions at
https://github.com/Landoop/fast-data-dev#enable-additional-connectors
The UI will auto-detect the new connector and provide you with instructions when you hit NEW for the new connector at:
http://localhost:3030/kafka-connect-ui
Also worth trying, since fast-data-dev already comes with a generic MQTT sink connector: see the instructions at http://docs.datamountaineer.com/en/latest/mqtt-sink.html
You would effectively need to do
connect.mqtt.kcql=INSERT INTO /answers SELECT body FROM net.gutefrage.answers
As this is a generic MQTT connector, you will possibly need to add the RabbitMQ client library using the enable-additional-connectors instructions.