Is there a way to transform key field values to lower case in the Debezium SQL Server source connector? [closed] - apache-kafka

I want to transform SQL Server column names to lower case while storing them in a Kafka topic. I am using Debezium as my source connector.

It can be done using Kafka Connect Common Transformations by Jeremy Custenborder
SQL Server table:
Id  | Name | Description | Weight | Pro_Id
----+------+-------------+--------+----------------------------
101 | aaa  | Sample_Test | 3.14   | 2020-02-21 13:32:06.5900000
102 | eee  | testdata1   | 3.14   | 2020-02-21 13:32:06.5900000
Step 1: Download the Kafka Connect Common Transformations jar file by Jeremy Custenborder from Confluent Hub using this link
Step 2: Place the jar file in /usr/share/java or /kafka/libs, depending on your Kafka environment
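Note: Kafka Connect only picks up plugins when the worker starts, so restart the Connect worker after adding the jar (and, if you use a dedicated plugin directory, make sure it is listed in the worker's plugin.path). A minimal sketch of that worker property, assuming a standard distributed worker properties file:
# connect-distributed.properties (file name and location depend on your installation)
plugin.path=/usr/share/java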
Step 3: Create the Debezium SQL Server source connector
{
"name": "sqlserver_src_connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"database.server.name": "sqlserver",
"database.hostname": "*.*.*.*",
"database.port": "1433",
"database.user": "username",
"database.password": "password",
"database.dbname": "db_name",
"table.whitelist": "dbo.tablename",
"transforms": "unwrap,changeCase",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.changeCase.type" : "com.github.jcustenborder.kafka.connect.transform.common.ChangeCase$Value",
"transforms.changeCase.from" : "UPPER_UNDERSCORE",
"transforms.changeCase.to" : "LOWER_UNDERSCORE",
"database.history.kafka.bootstrap.servers": "*.*.*.*",
"database.history.kafka.topic": "schema-changes-tablename"
}
}
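The connector JSON can then be registered through the Kafka Connect REST API; a minimal sketch, assuming the worker's REST endpoint is at localhost:8083 and the JSON above is saved as sqlserver_src_connector.json (a hypothetical file name):
# register the connector
curl -X POST -H "Content-Type: application/json" --data @sqlserver_src_connector.json http://localhost:8083/connectors
# check that the connector and its task are RUNNING
curl http://localhost:8083/connectors/sqlserver_src_connector/status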
Step 4: Kafka topic data
{
"schema": {
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "string",
"optional": false,
"field": "name"
},
{
"type": "string",
"optional": true,
"field": "description"
},
{
"type": "double",
"optional": true,
"field": "weight"
},
{
"type": "int64",
"optional": false,
"name": "io.debezium.time.NanoTimestamp",
"version": 1,
"field": "pro_id"
}
],
"optional": true,
"name": "sqlserver.dbo.tablename"
},
"payload": {
"id": 101,
"name": "aaa",
"description": "Sample_Test",
"weight": 3.14,
"pro_id": 1582291926590000000
}
}
{
"schema": {
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "string",
"optional": false,
"field": "name"
},
{
"type": "string",
"optional": true,
"field": "description"
},
{
"type": "double",
"optional": true,
"field": "weight"
},
{
"type": "int64",
"optional": false,
"name": "io.debezium.time.NanoTimestamp",
"version": 1,
"field": "pro_id"
}
],
"optional": true,
"name": "sqlserver.dbo.tablename"
},
"payload": {
"id": 102,
"name": "eee",
"description": "testdata1",
"weight": 3.14,
"pro_id": 1582291926590000000
}
}
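Note that the same library also provides a Key variant of the transform, so if the record key fields should be lower-cased as well (as the question title suggests), the configuration can be extended along these lines (a sketch, mirroring the from/to settings used above):
"transforms": "unwrap,changeCase,changeCaseKey",
"transforms.changeCaseKey.type": "com.github.jcustenborder.kafka.connect.transform.common.ChangeCase$Key",
"transforms.changeCaseKey.from": "UPPER_UNDERSCORE",
"transforms.changeCaseKey.to": "LOWER_UNDERSCORE"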
Thanks for the help, Jiri Pechanec and Chris Cranford (#Naros) from the Debezium community.

Related

Debezium MS SQL Server produces wrong JSON format not recognized by Flink

I have the following setting (verified using curl connector/connector-name):
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"database.user": "admin",
"database.dbname": "test",
"database.hostname": "mssql-host",
"database.password": "xxxxxxx",
"database.history.kafka.bootstrap.servers": "server:9092", "database.history.kafka.topic": "dbhistory.test", "value.converter.schemas.enable": "false",
"name": "mssql-cdc",
"database.server.name": "test",
"database.port": "1433",
"include.schema.changes": "false"
}
I was able to pull CDC events into a Kafka topic. They are in the following format:
{
"schema": {
"type": "struct",
"fields": [
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "string",
"optional": true,
"field": "value"
}
],
"optional": true,
"name": "test.dbo.tryme2.Value",
"field": "before"
},
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "string",
"optional": true,
"field": "value"
}
],
"optional": true,
"name": "test.dbo.tryme2.Value",
"field": "after"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "version"
},
{
"type": "string",
"optional": false,
"field": "connector"
},
{
"type": "string",
"optional": false,
"field": "name"
},
{
"type": "int64",
"optional": false,
"field": "ts_ms"
},
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Enum",
"version": 1,
"parameters": {
"allowed": "true,last,false,incremental"
},
"default": "false",
"field": "snapshot"
},
{
"type": "string",
"optional": false,
"field": "db"
},
{
"type": "string",
"optional": true,
"field": "sequence"
},
{
"type": "string",
"optional": false,
"field": "schema"
},
{
"type": "string",
"optional": false,
"field": "table"
},
{
"type": "string",
"optional": true,
"field": "change_lsn"
},
{
"type": "string",
"optional": true,
"field": "commit_lsn"
},
{
"type": "int64",
"optional": true,
"field": "event_serial_no"
}
],
"optional": false,
"name": "io.debezium.connector.sqlserver.Source",
"field": "source"
},
{
"type": "string",
"optional": false,
"field": "op"
},
{
"type": "int64",
"optional": true,
"field": "ts_ms"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "id"
},
{
"type": "int64",
"optional": false,
"field": "total_order"
},
{
"type": "int64",
"optional": false,
"field": "data_collection_order"
}
],
"optional": true,
"field": "transaction"
}
],
"optional": false,
"name": "test.dbo.tryme2.Envelope"
},
"payload": {
"before": null,
"after": {
"id": 777,
"value": "xxxxxx"
},
"source": {
"version": "1.8.1.Final",
"connector": "sqlserver",
"name": "test",
"ts_ms": 1647169350996,
"snapshot": "true",
"db": "test",
"sequence": null,
"schema": "dbo",
"table": "tryme2",
"change_lsn": null,
"commit_lsn": "00000043:00000774:000a",
"event_serial_no": null
},
"op": "r",
"ts_ms": 1647169350997,
"transaction": null
}
}
In Flink, when I created a source table using the topic, I get:
Caused by: java.io.IOException: Corrupt Debezium JSON message
I already have "value.converter.schemas.enable": "false"; why doesn't this work?
Just found out that the configuration is hierarchical, meaning you have to supply both value.converter and value.converter.schemas.enable at the connector level to override the Kafka Connect worker configuration.
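In other words, both settings have to appear in the connector config itself; a minimal sketch, assuming the JSON converter is the one in use:
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false"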
I sincerely wish there were some sort of validation so I did not have to wonder for hours.
Also, if the schema is desired, there is a Flink configuration option hidden in the docs:
In order to interpret such messages, you need to add the option 'debezium-json.schema-include' = 'true' into above DDL WITH clause (false by default). Usually, this is not recommended to include schema because this makes the messages very verbose and reduces parsing performance.
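A rough sketch of such a DDL, with the column list taken from the id/value fields shown above and the topic and bootstrap servers from this setup; everything else is a placeholder to adjust:
-- sketch only: adjust table name, columns, and connection properties to your environment
CREATE TABLE tryme2 (
  id INT,
  `value` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'test.dbo.tryme2',
  'properties.bootstrap.servers' = 'server:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'debezium-json',
  'debezium-json.schema-include' = 'true'
);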
I have to say this is really bad developer experience.

org.apache.kafka.connect.transforms.ReplaceField does not work

The documentation I used: https://docs.confluent.io/platform/current/connect/transforms/replacefield.html
I use this connector to rename the PersonId column to Id, using org.apache.kafka.connect.transforms.ReplaceField and setting renames to PersonId:Id:
{
"name": "SQL_Connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"database.hostname": "hostname",
"database.port": "1433",
"database.user": "user",
"database.password": "password",
"database.dbname": "sqlserver",
"database.server.name": "server",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.test",
"transforms": "RenameField,addStaticField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "PersonId:Id",
"transforms.addStaticField.type":"org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addStaticField.static.field":"table",
"transforms.addStaticField.static.value":"changedtablename",
}
}
But when I get the value in the topic, the field PersonId is not changed:
{
"schema": {
"type": "struct",
"fields": [
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "PersonId"
}
],
"optional": true,
"name": "test.Value",
"field": "before"
},
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "PersonId"
}
],
"optional": true,
"name": "test.Value",
"field": "after"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "version"
},
{
"type": "string",
"optional": false,
"field": "connector"
},
{
"type": "string",
"optional": false,
"field": "name"
},
{
"type": "int64",
"optional": false,
"field": "ts_ms"
},
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Enum",
"version": 1,
"parameters": {
"allowed": "true,last,false"
},
"default": "false",
"field": "snapshot"
},
{
"type": "string",
"optional": false,
"field": "db"
},
{
"type": "string",
"optional": false,
"field": "schema"
},
{
"type": "string",
"optional": false,
"field": "table"
},
{
"type": "string",
"optional": true,
"field": "change_lsn"
},
{
"type": "string",
"optional": true,
"field": "commit_lsn"
},
{
"type": "int64",
"optional": true,
"field": "event_serial_no"
}
],
"optional": false,
"name": "io.debezium.connector.sqlserver.Source",
"field": "source"
},
{
"type": "string",
"optional": false,
"field": "op"
},
{
"type": "int64",
"optional": true,
"field": "ts_ms"
},
{
"type": "string",
"optional": true,
"field": "table"
}
],
"optional": false,
"name": "test.Envelope"
},
"payload": {
"before": null,
"after": {
"PersonId": 1,
},
"source": {
"version": "1.0.3.Final",
"connector": "sqlserver",
"name": "test",
"ts_ms": 1627628793596,
"snapshot": "true",
"db": "test",
"schema": "dbo",
"table": "TestTable",
"change_lsn": null,
"commit_lsn": "00023472:00000100:0001",
"event_serial_no": null
},
"op": "r",
"ts_ms": 1627628793596,
"table": "changedtablename"
}
}
How do I change the Field?
You can only replace fields that are at the top level of the Kafka record, as the example in the doc shows.
That being said, you will need to extract the after field first, for example with Debezium's ExtractNewRecordState transform, as sketched below.
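A minimal sketch of that combination, reusing the unwrap transform from the first question above so that ReplaceField sees PersonId at the top level (the other connector properties stay unchanged):
"transforms": "unwrap,RenameField,addStaticField",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "PersonId:Id",
"transforms.addStaticField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addStaticField.static.field": "table",
"transforms.addStaticField.static.value": "changedtablename"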

Need primary key information in Debezium connector for postgres insert events

I am using the Debezium connector for Postgres with Kafka Connect.
For an insert row event written to Kafka by the connector, I need information about which columns are primary keys and which are not. Is there a way to achieve this?
Pasting a sample insert event generated in Kafka:
"schema": {
"type": "struct",
"fields": [
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "int32",
"optional": false,
"field": "bucket_type"
}
],
"optional": true,
"name": "postgresconfigdb.config.alert_configs.Value",
"field": "before"
},
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "int32",
"optional": false,
"field": "bucket_type"
}
],
"optional": true,
"name": "postgresconfigdb.config.alert_configs.Value",
"field": "after"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "version"
},
{
"type": "string",
"optional": false,
"field": "connector"
},
{
"type": "string",
"optional": false,
"field": "name"
},
{
"type": "int64",
"optional": false,
"field": "ts_ms"
},
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Enum",
"version": 1,
"parameters": {
"allowed": "true,last,false"
},
"default": "false",
"field": "snapshot"
},
{
"type": "string",
"optional": false,
"field": "db"
},
{
"type": "string",
"optional": false,
"field": "schema"
},
{
"type": "string",
"optional": false,
"field": "table"
},
{
"type": "int64",
"optional": true,
"field": "txId"
},
{
"type": "int64",
"optional": true,
"field": "lsn"
},
{
"type": "int64",
"optional": true,
"field": "xmin"
}
],
"optional": false,
"name": "io.debezium.connector.postgresql.Source",
"field": "source"
},
{
"type": "string",
"optional": false,
"field": "op"
},
{
"type": "int64",
"optional": true,
"field": "ts_ms"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "id"
},
{
"type": "int64",
"optional": false,
"field": "total_order"
},
{
"type": "int64",
"optional": false,
"field": "data_collection_order"
}
],
"optional": true,
"field": "transaction"
}
],
"optional": false,
"name": "postgresconfigdb.config.alert_configs.Envelope"
},
"payload": {
"before": null,
"after": {
"id": 1100,
"bucket_type": 10
},
"source": {
"version": "1.2.0.Final",
"connector": "postgresql",
"name": "postgresconfigdb",
"ts_ms": 1599830887858,
"snapshot": "true",
"db": "configdb",
"schema": "config",
"table": "alert_configs",
"txId": 2139888,
"lsn": 379356048,
"xmin": null
},
"op": "r",
"ts_ms": 1599830887859,
"transaction": null
}
}
Here the columns in the table are 'id' and 'bucket_type', whose values are reported at the JSON path payload->after.
There is information about which columns are not nullable in the column-specific 'optional' boolean field, but no information about which columns are primary keys (id in this case).
You will find the information about which fields are the PK columns in the Kafka record key, as sketched below.
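A rough sketch of the key record the connector writes for this table, assuming id is the primary key and the same JSON converter with schemas enabled as for the value:
{
"schema": {
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
}
],
"optional": false,
"name": "postgresconfigdb.config.alert_configs.Key"
},
"payload": {
"id": 1100
}
}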

Kafka Connect - JDBC Avro connector: how to define a custom schema registry

I was following a tutorial on Kafka Connect, and I am wondering if there is a possibility to define a custom schema registry for a topic whose data comes from a MySQL table.
I can't find where to define it in my JSON connect config, and I don't want to create a new version of that schema after creating it.
My MySQL table called stations has this schema:
Field | Type
---------------+-------------
code | varchar(4)
date_measuring | timestamp
attributes | varchar(256)
where attributes contains JSON data rather than a plain String (I have to use that type because the JSON fields inside attributes are variable).
My connector is:
{
"value.converter.schema.registry.url": "http://localhost:8081",
"_comment": "The Kafka topic will be made up of this prefix, plus the table name ",
"key.converter.schema.registry.url": "http://localhost:8081",
"name": "jdbc_source_mysql_stations",
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"transforms": [
"ValueToKey"
],
"transforms.ValueToKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.ValueToKey.fields": [
"code",
"date_measuring"
],
"connection.url": "jdbc:mysql://localhost:3306/db_name?useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=UTC",
"connection.user": "confluent",
"connection.password": "**************",
"table.whitelist": [
"stations"
],
"mode": "timestamp",
"timestamp.column.name": [
"date_measuring"
],
"validate.non.null": "false",
"topic.prefix": "mysql-"
}
and creates this schema
{
"subject": "mysql-stations-value",
"version": 1,
"id": 23,
"schema": "{\"type\":\"record\",\"name\":\"stations\",\"fields\":[{\"name\":\"code\",\"type\":\"string\"},{\"name\":\"date_measuring\",\"type\":{\"type\":\"long\",\"connect.version\":1,\"connect.name\":\"org.apache.kafka.connect.data.Timestamp\",\"logicalType\":\"timestamp-millis\"}},{\"name\":\"attributes\",\"type\":\"string\"}],\"connect.name\":\"stations\"}"
}
Where "attributes" field is of course a String.
Unlike I would apply it this other schema.
{
"fields": [
{
"name": "code",
"type": "string"
},
{
"name": "date_measuring",
"type": {
"connect.name": "org.apache.kafka.connect.data.Timestamp",
"connect.version": 1,
"logicalType": "timestamp-millis",
"type": "long"
}
},
{
"name": "attributes",
"type": {
"type": "record",
"name": "AttributesRecord",
"fields": [
{
"name": "H1",
"type": "long",
"default": 0
},
{
"name": "H2",
"type": "long",
"default": 0
},
{
"name": "H3",
"type": "long",
"default": 0
},
{
"name": "H",
"type": "long",
"default": 0
},
{
"name": "Q",
"type": "long",
"default": 0
},
{
"name": "P1",
"type": "long",
"default": 0
},
{
"name": "P2",
"type": "long",
"default": 0
},
{
"name": "P3",
"type": "long",
"default": 0
},
{
"name": "P",
"type": "long",
"default": 0
},
{
"name": "T",
"type": "long",
"default": 0
},
{
"name": "Hr",
"type": "long",
"default": 0
},
{
"name": "pH",
"type": "long",
"default": 0
},
{
"name": "RX",
"type": "long",
"default": 0
},
{
"name": "Ta",
"type": "long",
"default": 0
},
{
"name": "C",
"type": "long",
"default": 0
},
{
"name": "OD",
"type": "long",
"default": 0
},
{
"name": "TU",
"type": "long",
"default": 0
},
{
"name": "MO",
"type": "long",
"default": 0
},
{
"name": "AM",
"type": "long",
"default": 0
},
{
"name": "N03",
"type": "long",
"default": 0
},
{
"name": "P04",
"type": "long",
"default": 0
},
{
"name": "SS",
"type": "long",
"default": 0
},
{
"name": "PT",
"type": "long",
"default": 0
}
]
}
}
],
"name": "stations",
"namespace": "com.mycorp.mynamespace",
"type": "record"
}
Any suggestions, please?
If it's not possible, I suppose I'll have to create a Kafka Streams application that produces another topic, even though I would rather avoid it.
Thanks in advance!
I don't think you're asking anything about using a "custom" registry (which you'd do with the two lines that say which registry you're using), but rather how you can parse the data / apply a schema after the record is pulled from the database.
You can write your own Transform, or you can use Kafka Streams, which are really the main options here. There is a SetSchemaMetadata transform, but I'm not sure that will do what you want (parse a string into an Avro record).
Or, if you must shove JSON data into a single database column, maybe you shouldn't use MySQL but rather a document database, which has more flexible data constraints.
Otherwise, you can use a BLOB rather than varchar and put binary Avro data into that column, but then you'd still need a custom deserializer to read the data.

Cannot sink Kafka stream to JDBC: unrecoverable exception

I'm trying to sink a stream of data from an MQTT source with schemas enabled to a Microsoft SQL Server database.
I've followed many posts on the matter; regardless, I receive the following error:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:484)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:265)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:182)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:150)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The MQTT Source configuration is:
connector.class=com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector
key.converter.schemas.enable=true
value.converter.schemas.enable=true
name=schema-mqtt-source
connect.mqtt.kcql=INSERT INTO schema_IoT SELECT * FROM machine/sensor/mytopic/test WITHCONVERTER=`com.datamountaineer.streamreactor.connect.converters.source.JsonSimpleConverter`
value.converter=org.apache.kafka.connect.json.JsonConverter
connect.mqtt.service.quality=0
key.converter=org.apache.kafka.connect.json.JsonConverter
connect.mqtt.hosts=tcp://host:1884
A data sample that is ingested into Kafka is the following:
{
"timestamp": 1526912884265,
"partition": 0,
"key": {
"schema": {
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "topic"
},
{
"type": "string",
"optional": false,
"field": "id"
}
],
"optional": false,
"name": "com.datamountaineer.streamreactor.connect.converters.MsgKey"
},
"payload": {
"topic": "machine/sensor/mytopic/test",
"id": "0"
}
},
"offset": 0,
"topic": "schema_IoT",
"value": {
"schema": {
"type": "struct",
"fields": [
{
"type": "int64",
"optional": false,
"field": "sentOn"
},
{
"type": "struct",
"fields": [
{
"type": "int64",
"optional": false,
"field": "fan"
},
{
"type": "int64",
"optional": false,
"field": "buzzer"
},
{
"type": "int64",
"optional": false,
"field": "light"
},
{
"type": "double",
"optional": false,
"field": "temperature"
},
{
"type": "string",
"optional": false,
"field": "assetName"
},
{
"type": "int64",
"optional": false,
"field": "led"
},
{
"type": "boolean",
"optional": false,
"field": "water"
}
],
"optional": false,
"name": "metrics",
"field": "metrics"
}
],
"optional": false,
"name": "machine_sensor_mytopic_test"
},
"payload": {
"sentOn": 1526913070679,
"metrics": {
"fan": 1,
"buzzer": 0,
"light": 255,
"temperature": 22.296352538102642,
"assetName": "SIMopcua",
"led": 0,
"water": false
}
}
}
}
Finally, the properties of the JDBC sink connector config file are:
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.password=password
topics=schema_IoT
batch.size=10
key.converter.schemas.enable=true
auto.evolve=true
connection.user=username
name=sink-mssql
value.converter.schemas.enable=true
auto.create=true
connection.url=jdbc:sqlserver://hostname:port;databaseName=mydb;user=username;password=mypsd;
value.converter=org.apache.kafka.connect.json.JsonConverter
insert.mode=insert
key.converter=org.apache.kafka.connect.json.JsonConverter
What am I doing wrong? Any help is appreciated.
Fabio