how to override key.serializer in kafka connect jdbc - apache-kafka

I am streaming data from MySQL to Kafka using the Kafka Connect JDBC source connector, and everything is working fine. Now I need to pass key.serializer and value.serializer to encrypt the data, as shown at macronova, but I didn't see any change in the output.
POST request to start the source connector:
curl -X POST -H "Content-Type: application/json" --data '{
"name": "jdbc-source-connector-2",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"tasks.max": 10,
"connection.url": "jdbc:mysql://localhost:3306/connect_test?user=roo&password=roo",
"mode": "incrementing",
"table.whitelist" : "test",
"incrementing.column.name": "id",
"timestamp.column.name": "modified",
"topic.prefix": "table-",
"poll.interval.ms": 1000
}
}' http://localhost:8083/connectors

Connectors only accept converters (key.converter and value.converter), not serializers, which is why setting key.serializer and value.serializer had no effect.
If you want to encrypt a whole string, you'd need to implement your own converter, or edit the code that writes to the database so that it writes to Kafka instead, and then consume from Kafka to write to the database as well as to other downstream systems.
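As a rough sketch of the point above, the snippet below rewrites a connector config so that it uses converter properties in place of the serializer ones. The helper name use_converters is hypothetical, and StringConverter is only an illustrative choice: a converter that actually encrypts data would be a custom class you write yourself (not shown here).

```python
import json

def use_converters(config, key_converter, value_converter):
    """Return a copy of a connector config that sets converters and
    drops the serializer properties used in the question."""
    fixed = {k: v for k, v in config.items()
             if k not in ("key.serializer", "value.serializer")}
    fixed["key.converter"] = key_converter
    fixed["value.converter"] = value_converter
    return fixed

# Trimmed-down version of the config from the question.
original = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
    "value.serializer": "org.apache.kafka.common.serialization.StringSerializer",
    "topic.prefix": "table-",
}

# Illustrative choice only; swap in your own converter class name.
STRING_CONVERTER = "org.apache.kafka.connect.storage.StringConverter"

fixed = use_converters(original, STRING_CONVERTER, STRING_CONVERTER)

# JSON body that could be POSTed to http://localhost:8083/connectors
print(json.dumps({"name": "jdbc-source-connector-2", "config": fixed}, indent=2))
```

The remaining properties from the question (connection.url, mode, and so on) would pass through unchanged.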

Related

Sink data from DLQ topic to another table as CLOB

I'm using a connector to sink records from a topic to a DB. Some records are also redirected to a DLQ (Dead Letter Queue). The records in the DLQ may contain wrong types, wrong sizes, non-Avro values, etc. What I want to do is sink all of these records to an Oracle DB table. This table will have only 2 columns: a CLOB for the entire message and a record date.
To sink from Kafka, we need a schema. Since this topic will contain many types of records, I can't create a proper schema (or can I?). I just want to sink the messages as a whole, how can I achieve this?
I've tried it with this schema and connector:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"DLQ_TEST\",\"namespace\":\"DLQ_TEST\",\"fields\":[
{\"name\":\"VALUE\",\"type\":[\"null\",\"string\"]},
{\"name\":\"RECORDDATE\",\"type\":[\"null\",\"long\"]}]}"}' http://server:8071/subjects/DLQ_INSERT-value/versions
curl -i -X PUT -H "Content-Type:application/json" http://server:8083/connectors/sink_DLQ_INSERT/config -d '{
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"connection.url": "jdbc:oracle:thin:@oracleserver:1357/VITORCT",
"table.name.format": "GLOBAL.DLQ_TEST_DLQ",
"connection.password": "${file:/kafka/vty/pass.properties:vitweb_pwd}",
"connection.user": "${file:/kafka/vty/pass.properties:vitweb_user}",
"tasks.max": "1",
"topics": "DLQ_TEST_dlq",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "true",
"auto.create": "false",
"insert.mode": "insert",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter"
}'
I don't exactly understand how to make the connector use this schema.

SMT's to create kafka connector string partition key through connector config

I've been implementing a kafka connector for PostgreSQL (I'm using the debezium kafka connector and running all the pieces through docker). I need a custom partition key, and so I've been using the SMT to achieve this. However, the approach that I'm using creates a Struct, and I need it to be a string. This article runs through how to set up the partition key as an int, but I can't access the config file to set up the appropriate transforms. Currently my kafka connector looks like this
curl -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
"name": "connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "postgres",
"database.port": "5432",
"database.user": "postgres",
"database.password": "password",
"database.dbname" : "postgres",
"database.server.name": "postgres",
"table.include.list": "public.table",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.table",
"transforms": "routeRecords,unwrap,createkey",
"transforms.routeRecords.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.routeRecords.regex": "(.*)",
"transforms.routeRecords.replacement": "table",
"transforms.unwrap.type":"io.debezium.transforms.ExtractNewRecordState",
"transforms.createkey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createkey.fields": "id",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
}
}'
I know that I have to extract the value of the column but I'm just not sure how.
ValueToKey creates a Struct from a list of fields, as documented.
You need one more transform to extract a specific field from that Struct, as shown in the linked post:
org.apache.kafka.connect.transforms.ExtractField$Key
Note: This does not "set" the partition of the actual Kafka record, only the key, which is then hashed by the Producer to get the partition.
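A minimal sketch of what that extra step might look like in the connector config, built here as a Python dict so it can be dumped to JSON. The alias extractkey is a hypothetical name; the transform class and its field property are the ones named above and used elsewhere on this page.

```python
import json

# Transform settings from the question, plus one extra step ("extractkey",
# a hypothetical alias) that pulls the "id" field out of the Struct key
# produced by ValueToKey, leaving a plain value as the record key.
transforms = {
    "transforms": "routeRecords,unwrap,createkey,extractkey",
    "transforms.routeRecords.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.routeRecords.regex": "(.*)",
    "transforms.routeRecords.replacement": "table",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.createkey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createkey.fields": "id",
    "transforms.extractkey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractkey.field": "id",
}

# These entries would be merged into the "config" object of the connector.
print(json.dumps(transforms, indent=2))
```

Order matters: the extraction step has to come after createkey in the transforms list, because it operates on the key that ValueToKey produces.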

kafka FileStreamSourceConnector write an avro file to topic with key field

I want to use kafka FileStreamSourceConnector to write a local avro file into a topic.
My connector config looks like this:
curl -i -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/file_source_connector/config \
-d '{
"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"topic": "my_topic",
"file": "/data/log.avsc",
"format.include.keys": "true",
"source.auto.offset.reset": "earliest",
"tasks.max": "1",
"value.converter.schemas.enable": "true",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
}'
Then when I print out the topic, the key fields are null.
Updated on 2021-03-29:
After watching this video 🎄Twelve Days of SMT 🎄 - Day 2: ValueToKey and ExtractField from Robin,
I applied SMT to my connector config:
curl -i -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/file_source_connector_02/config \
-d '{
"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"topic": "my_topic",
"file": "/data/log.avsc",
"tasks.max": "1",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"transforms": "ValueToKey, ExtractField",
"transforms.ValueToKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.ValueToKey.fields":"id",
"transforms.ExtractField.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.ExtractField.field":"id"
}'
However, the connector failed:
Caused by: org.apache.kafka.connect.errors.DataException: Only Struct objects supported for [copying fields from value to key], found: java.lang.String
I would use the ValueToKey transformer.
In the worst case, ignore the values and set a random key.
For details, see: ValueToKey
FileStreamSource assumes your input is UTF-8 encoded, line-delimited text, not binary files such as Avro. Last I checked, format.include.keys is not a valid config for the connector either.
Therefore, each consumed event will be a string, and transforms that require Structs with field names will not work.
You can use the HoistField transform to create a Struct from each "line", but this still will not parse your data to make the ID field accessible to move to the key.
Also, your file is AVSC, which is JSON-formatted, not Avro, so I'm not sure what the goal is of using the AvroConverter or setting "schemas.enable": "true". Still, the lines read by the connector are not parsed by converters such that fields are accessible; converters only serialize them when they are sent to Kafka.
My suggestion would be to write a separate CLI script using plain producer libraries to parse the file, extract the schema, register it with Schema Registry, build a producer record for each entity in the file, and send them.
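A rough outline of the parsing half of such a script, using only the Python standard library. It assumes the data file holds one JSON object per line and that each object has an id field; both are assumptions, as are the function names and the sample input. Actually sending the records is left to whichever producer client you use.

```python
import json

def schema_registry_body(avsc_text):
    """Build the JSON body for POST /subjects/<subject>/versions.
    Schema Registry expects the Avro schema as an escaped string."""
    schema = json.loads(avsc_text)  # fail early if the .avsc is not valid JSON
    return json.dumps({"schema": json.dumps(schema)})

def keyed_records(lines, key_field="id"):
    """Yield (key, value) pairs, taking the key from each parsed line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = json.loads(line)
        yield str(value[key_field]), value

# Hypothetical sample input, for illustration only.
avsc = '{"type": "record", "name": "Log", "fields": [{"name": "id", "type": "int"}]}'
lines = ['{"id": 1}', '', '{"id": 2}']

body = schema_registry_body(avsc)
records = list(keyed_records(lines))
for key, value in records:
    # Hand each pair to your producer client's send call here.
    print(key, value)
```

Because the key is set explicitly on each record, no ValueToKey or ExtractField transform is needed with this approach.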

Retry Attempt without data loss when sink side solr is down during runtime

curl -X POST -H "Content-Type: application/json" --data '{
"name": "t1",
"config": {
"tasks.max": "1",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"key.converter.schemas.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"connector.class": "com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector",
"topics": "TRAN",
"solr.queue.size": "100",
"solr.commit.within": "10",
"solr.url": "http://192.168.2.221:27052/solr/TRAN",
"errors.retry.delay.max.ms":"5000",
"errors.retry.timeout":"600000",
"errors.tolerance":"all",
"errors.log.enable":"true",
"errors.log.include.messages":"false",
"errors.deadletterqueue.topic.name":"DEAD_TRAN",
"errors.deadletterqueue.topic.replication.factor":"1",
"retry.backoff.ms":"1000",
"reconnect.backoff.ms":"5000",
"reconnect.backoff.max.ms":"600000"
}
}' http://localhost:8083/connectors
I need the connector to retry (without any data loss), based on a count from the connector config, if the Solr server goes down at runtime.
In my case, it works perfectly when both the connector and Solr are running [Active].
But when only the Solr server is down, there is no retry, and the data sent to Solr in the meantime is lost.
Error Information shown below
Connector Config from the Kafka Connect Log
I've just checked the SinkTask implementation of that specific connector and it does throw a RetriableException in the put() method.
In theory, and according to your connector configuration, it should block for 10 minutes ("errors.retry.timeout" : "600000"). If your Solr instance recovers within those 10 minutes, there shouldn't be any problem in terms of data loss.
If you want to fully block your connector until Solr is back on its feet, have you tried setting "errors.retry.timeout" : "-1"?
As per the documentation of errors.retry.timeout:
The maximum duration in milliseconds that a failed operation will be
reattempted. The default is 0, which means no retries will be
attempted. Use -1 for infinite retries.
PS: IMHO this might lead to a deadlock situation if, for some reason, a single message permanently fails its sink operation (e.g. if the sink keeps rejecting the operation).
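As an illustration of that suggestion, the snippet below applies the override to a trimmed copy of the error-handling settings from the question. The helper name apply_overrides is hypothetical, and the result is only a config fragment to merge into the connector's "config" object.

```python
import json

def apply_overrides(config, overrides):
    """Return a new config dict with the overrides applied on top."""
    return {**config, **overrides}

# Subset of the error-handling settings from the question.
current = {
    "errors.retry.timeout": "600000",
    "errors.retry.delay.max.ms": "5000",
    "errors.tolerance": "all",
}

# "-1" means infinite retries, per the documentation quoted above.
retry_overrides = {"errors.retry.timeout": "-1"}

updated = apply_overrides(current, retry_overrides)
print(json.dumps(updated, indent=2))
```

Keep the caveat above in mind: with infinite retries, one permanently failing message can hold up the task indefinitely.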

How to make Instaclustr Kafka Sink Connector work with Avro serialized value to postgres?

I have a Kafka topic with Avro-serialized values.
I am trying to set up a JDBC (Postgres) sink connector to dump these messages into a Postgres table.
But I am getting the error below.
"org.apache.kafka.common.config.ConfigException: Invalid value io.confluent.connect.avro.AvroConverter for configuration value.converter: Class io.confluent.connect.avro.AvroConverter could not be found."
My Sink.json is
{"name": "postgres-sink",
"config": {
"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max":"1",
"topics": "<topic_name>",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "instaclustr_schema_registry_host:8085",
"connection.url": "jdbc:postgresql://postgres:5432/postgres?currentSchema=local",
"connection.user": "postgres",
"connection.password": "postgres",
"auto.create": "true",
"auto.evolve":"true",
"pk.mode":"none",
"table.name.format": "<table_name>"
}
}
Also, I have made changes in the connect-distributed.properties(bootstrap servers).
The command I am running is -
curl -X POST -H "Content-Type: application/json" --data @postgres-sink.json https://<instaclustr_schema_registry_host>:8083/connectors
io.confluent.connect.avro.AvroConverter is not part of the Apache Kafka distribution. You can either just run Apache Kafka as part of Confluent Platform (which ships with the converter and is easier) or you can download it separately and install it yourself.