Kafka Connect - JDBC Source Connector - Setting Avro Schema

How can I make the Kafka Connect JDBC source connector use a predefined Avro schema? It registers a new schema version when the connector is created. I am reading from DB2 and writing into a Kafka topic.
I set the schema name and version during creation, but it does not work. Here are my connector settings:
{
  "name": "kafka-connect-jdbc-db2-tst-2",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:db2://mydb2:50000/testdb",
    "connection.user": "DB2INST1",
    "connection.password": "12345678",
    "query": "SELECT CORRELATION_ID FROM TEST.MYVIEW4",
    "mode": "incrementing",
    "incrementing.column.name": "CORRELATION_ID",
    "validate.non.null": "false",
    "topic.prefix": "tst-4",
    "auto.register.schemas": "false",
    "use.latest.version": "true",
    "transforms": "RenameField,SetSchemaMetadata",
    "transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.RenameField.renames": "CORRELATION_ID:id",
    "transforms.SetSchemaMetadata.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
    "transforms.SetSchemaMetadata.schema.name": "foo.bar.MyMessage",
    "transforms.SetSchemaMetadata.schema.version": "1"
  }
}
And here are the schemas: v1 is mine, and v2 was created by the JDBC source connector:
$ curl localhost:8081/subjects/tst-4-value/versions/1 | jq .
{
  "subject": "tst-4-value",
  "version": 1,
  "id": 387,
  "schema": "{\"type\":\"record\",\"name\":\"MyMessage\",\"namespace\":\"foo.bar\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}"
}
$ curl localhost:8081/subjects/tst-4-value/versions/2 | jq .
{
  "subject": "tst-4-value",
  "version": 2,
  "id": 386,
  "schema": "{\"type\":\"record\",\"name\":\"MyMessage\",\"namespace\":\"foo.bar\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}],\"connect.version\":1,\"connect.name\":\"foo.bar.MyMessage\"}"
}
Any idea how I can force the Kafka connector to use my schema?
Thanks in advance.
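
One thing worth checking, as a hedged sketch rather than a confirmed fix: auto.register.schemas and use.latest.version are properties of the Confluent Avro converter, not of the connector itself, so placed at the top level of the connector config they are silently ignored. Assuming values go through io.confluent.connect.avro.AvroConverter and the registry runs at localhost:8081 (both assumptions), the prefixed form would look like:
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "value.converter.auto.register.schemas": "false",
    "value.converter.use.latest.version": "true"
With that in place, the converter looks up the latest registered schema for tst-4-value instead of registering its own variant.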

Related

Task becomes UNASSIGNED for Debezium MySQL source connector

I am using Debezium 1.9. I created a connector using the config below:
{
  "name": "user_management_db-connector-5",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "XXXX",
    "database.port": "3306",
    "database.user": "XXX",
    "database.password": "XXX",
    "database.server.id": "12345",
    "database.server.name": "ula-stg-db",
    "database.include.list": "user_management_db",
    "database.history.kafka.bootstrap.servers": "kafka.ulastg.xyz:9094,kafka.ulastg.xyz:9092",
    "database.history.kafka.topic": "dbhistory.user_management_db",
    "snapshot.mode": "schema_only",
    "snapshot.locking.mode": "none",
    "table.include.list": "user_management_db.user,user_management_db.store,user_management_db.store_type,user_management_db.user_segment,user_management_db.user_segment_mapping",
    "transforms": "Reroute",
    "transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
    "transforms.Reroute.topic.regex": "(.*)user_management_db(.+)",
    "transforms.Reroute.topic.replacement": "$1cdc",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "topic.creation.default.include": "ula-stg-db.+",
    "topic.creation.default.partitions": 20,
    "topic.creation.default.replication.factor": 2,
    "topic.creation.default.cleanup.policy": "delete",
    "topic.creation.default.delete.retention.ms": 300000,
    "errors.log.enable": true,
    "errors.log.include.messages": true
  }
}
The connector gets created and I can see events in the topic ula-stg-db.cdc.
The problem is that after some time (approximately a day) events stop getting populated. I do not see any errors in the connector logs.
It only logs a generic INFO message at regular intervals:
2022-07-12 09:24:25,654 INFO || WorkerSourceTask{id=promo_management_db-connector-5-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. [org.apache.kafka.connect.runtime.WorkerSourceTask]
The connector status is now shown below:
{
  "name": "user_management_db-connector-5",
  "connector": {
    "state": "RUNNING",
    "worker_id": "172.31.65.156:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "UNASSIGNED",
      "worker_id": "172.31.71.28:8083"
    }
  ],
  "type": "source"
}
How can I debug this further?
P.S.: I am connecting to AWS RDS MySQL, and Kafka is hosted on an EC2 instance.
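
A possible first step, sketched with the standard Kafka Connect REST API (the worker address localhost:8083 is an assumption): inspect the status and restart just the UNASSIGNED task, which asks the cluster to reschedule it without bouncing the whole connector. It is also worth correlating the timing with rebalances in both workers' logs, since a task left UNASSIGNED after a rebalance is a common cause of this symptom.
    # Inspect connector and task state (same output as the status shown above)
    curl -s localhost:8083/connectors/user_management_db-connector-5/status | jq .
    # Restart only task 0; this does not restart the connector itself
    curl -X POST localhost:8083/connectors/user_management_db-connector-5/tasks/0/restart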

Ignore updating missing fields with Confluent JDBC Connector

Say I have a Postgres database table with the fields "id", "flavor" and "item". I also have a Kafka topic with two messages (let's ignore the Kafka key and assume the ID is in the value for now; the schema definition is also omitted):
{"id": 1, "flavor": "chocolate"}
{"id": 1, "item": "cookie"}
Now I'd like to use the Confluent JDBC (sink) connector for persisting the Kafka messages in upsert mode, hoping to get the following end result in the database:
id | flavor | item
----------------------
1 | chocolate | cookie
What I did get, however, was this:
id | flavor | item
----------------------
1 | null | cookie
I assume that's because the second message results in an UPDATE statement that fills in "null" for the fields that weren't provided, and writes those null values over my actual data.
Is there a way to get to my desired result by changing the configuration of either the Confluent JDBC connector or PostgreSQL 12? Failing that, is there another reasonably well-supported PostgreSQL-compatible connector out there that can do this?
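For reference, in upsert mode the Postgres dialect generates a statement of roughly this shape (an assumed sketch, not captured from the connector's logs). Because every non-key column appears in the SET list, a message that omits "flavor" writes NULL over the stored value:
    -- assumed shape of the generated statement for the second message
    INSERT INTO "upsert-test" ("id", "flavor", "item")
    VALUES (1, NULL, 'cookie')
    ON CONFLICT ("id") DO UPDATE SET
      "flavor" = EXCLUDED."flavor",
      "item" = EXCLUDED."item";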
Here's my connector configuration (connection details obviously redacted):
{
  "name": "sink-jdbc-upsertstest",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "TEST-upserts",
    "connection.url": "jdbc:postgresql://host:port/database",
    "connection.user": "user",
    "connection.password": "password",
    "dialect.name": "ExtendedPostgreSqlDatabaseDialect",
    "table.name.format": "upsert-test",
    "batch.size": "100",
    "insert.mode": "upsert",
    "auto.evolve": "true",
    "auto.create": "true",
    "pk.mode": "record_value",
    "pk.fields": "id",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "errors.deadletterqueue.topic.name": "dlq_upserts",
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.replication.factor": "1",
    "errors.deadletterqueue.context.headers.enable": "true"
  }
}

kafka connect sql server incremental changes from cdc

I am very new to Kafka (started reading and setting it up in my sandbox environment just a week ago) and am trying to set up the SQL Server JDBC connector.
I have set up the Confluent Community platform as per the Confluent guide and installed io.debezium.connector.sqlserver.SqlServerConnector using confluent-hub.
I enabled CDC on the SQL Server database and the required table, and it is working fine.
I have tried the following connectors (one at a time):
io.debezium.connector.sqlserver.SqlServerConnector
io.confluent.connect.jdbc.JdbcSourceConnector
Both load fine, with the connector and task status RUNNING and no errors.
Here is my io.confluent.connect.jdbc.JdbcSourceConnector configuration:
{
  "name": "mssql-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "mode": "timestamp",
    "timestamp.column.name": "CreatedDateTime",
    "query": "select * from dbo.sampletable",
    "tasks.max": "1",
    "table.types": "TABLE",
    "key.converter.schemas.enable": "false",
    "topic.prefix": "data_",
    "value.converter.schemas.enable": "false",
    "connection.url": "jdbc:sqlserver://SQL2016:1433;databaseName=sampledb",
    "connection.user": "kafka",
    "connection.password": "kafkaPassword#789",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "poll.interval.ms": "5000",
    "table.poll.interval.ms": "120000"
  }
}
And here is my io.debezium.connector.sqlserver.SqlServerConnector configuration:
{
  "name": "mssql-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "tasks.max": "1",
    "database.server.name": "SQL2016",
    "database.hostname": "SQL2016",
    "database.port": "1433",
    "database.user": "kafka",
    "database.password": "kafkaPassword#789",
    "database.dbname": "sampleDb",
    "database.history.kafka.bootstrap.servers": "kafkanode1:9092",
    "database.history.kafka.topic": "schema-changes.sampleDb"
  }
}
Both connectors create a snapshot of the table in a topic (meaning they pull all the rows initially), but when I make changes to the table "sampletable" (insert/update/delete), those changes are not pulled into Kafka.
Can someone please help me understand how to make CDC work with Kafka?
Thanks
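
A hedged note on why the JDBC source config above misses changes: in timestamp mode the connector only re-reads rows whose timestamp column has grown since the last poll, so with a creation-time column like CreatedDateTime, UPDATEs are invisible and DELETEs are never captured at all (row deletion needs log-based CDC such as Debezium). A last-modified column, combined with timestamp+incrementing mode, picks up inserts and updates; LastUpdatedDateTime below is a hypothetical column name:
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "LastUpdatedDateTime",
    "incrementing.column.name": "Id"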
This seems to have worked 100%. I am posting the answer just in case someone like me is stuck on the JDBC source connector.
{
  "name": "piilog-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "mode": "incrementing",
    "value.converter.schemas.enable": "false",
    "connection.url": "jdbc:sqlserver://SQL2016:1433;databaseName=SIAudit",
    "connection.user": "kafka",
    "connection.password": "kafkaPassword#789",
    "query": "select * from dbo.sampletable",
    "incrementing.column.name": "Id",
    "validate.non.null": false,
    "topic.prefix": "data_",
    "tasks.max": "1",
    "poll.interval.ms": "5000",
    "table.poll.interval.ms": "5000"
  }
}
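
To verify that new rows keep flowing, consume the topic (a sketch; with a custom query the JDBC source writes to a topic named exactly like topic.prefix, here data_, and the broker address is an assumption):
    kafka-console-consumer --bootstrap-server localhost:9092 \
      --topic data_ --from-beginning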

Kafka Connect JDBC failed on JsonConverter

I am working on a design: MySQL -> Debezium -> Kafka -> Flink -> Kafka -> Kafka Connect JDBC -> MySQL. The following is a sample message I write from Flink (I also tried using the Kafka console producer):
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int64",
        "optional": false,
        "field": "id"
      },
      {
        "type": "string",
        "optional": true,
        "field": "name"
      }
    ],
    "optional": true,
    "name": "user"
  },
  "payload": {
    "id": 1,
    "name": "Smith"
  }
}
but Connect failed in JsonConverter:
DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:338)
I have debugged it, and in the method public SchemaAndValue toConnectData(String topic, byte[] value) the value is null. My sink configuration is:
{
  "name": "user-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "user",
    "connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
    "connection.user": "root",
    "connection.password": "root",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}
Can someone please help me with this issue?
I think the issue is not related to the value serialization (of the Kafka message). It is rather a problem with the key of the message.
What is your key.converter? I suspect it is the same as your value.converter (org.apache.kafka.connect.json.JsonConverter). Your key might be a plain String that doesn't contain schema and payload fields.
Try changing key.converter to org.apache.kafka.connect.storage.StringConverter.
For Kafka Connect you set default converters, but you can also set a specific one for a particular connector configuration (it will overwrite the default). For that you have to modify your config request:
{
  "name": "user-sink",
  "config": {
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "user",
    "connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
    "connection.user": "root",
    "connection.password": "root",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}
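
To apply this to the already-created connector, the Connect REST API takes a PUT with just the inner config object (a sketch; the worker address localhost:8083 is an assumption):
    # update the existing connector's config in place
    curl -X PUT -H "Content-Type: application/json" \
      localhost:8083/connectors/user-sink/config \
      -d '{
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "user",
        "connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
        "connection.user": "root",
        "connection.password": "root",
        "auto.create": "true",
        "insert.mode": "upsert",
        "pk.fields": "id",
        "pk.mode": "record_value"
      }'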

Build a pipeline with kafka connect using jdbc connectors for sqlite

I am new to Kafka Connect and trying to build a pipeline to get data from SQLite into a Kafka topic.
Assuming your SQLite DB is at /tmp/test.db, use this config:
{
  "name": "jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:sqlite:/tmp/test.db",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "test-sqlite-jdbc-",
    "name": "jdbc-source"
  }
}
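
To create the connector and watch rows arrive (a sketch: the worker address, the schema registry URL, the Avro converter on the worker, and a table named accounts are all assumptions; the topic is topic.prefix plus the table name):
    # register the connector from a file holding the JSON above
    curl -X POST -H "Content-Type: application/json" \
      --data @jdbc-source.json localhost:8083/connectors
    # consume the resulting topic, assuming Avro-serialized values
    kafka-avro-console-consumer --bootstrap-server localhost:9092 \
      --topic test-sqlite-jdbc-accounts --from-beginning \
      --property schema.registry.url=http://localhost:8081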
For more details, see:
https://docs.confluent.io/current/connect/connect-jdbc/docs/source_connector.html#quick-start
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/