kafka connect sql server incremental changes from cdc - apache-kafka

I am very new to Kafka (started reading and setting up in my Sandbox environment from just a week) and trying to setup SQL Server JDBC connector.
I have setup Confluent community as per confluent guide and installed io.debezium.connector.sqlserver.SqlServerConnector using confluent-hub
I enabled CDC on SQL Server Database and required table and it is working fine.
I have tried following connectors (one at-a-time):
io.debezium.connector.sqlserver.SqlServerConnector
io.confluent.connect.jdbc.JdbcSourceConnector
both are loading fine with status of connector and task running fine with no errors as seen below:
Here is my io.confluent.connect.jdbc.JdbcSourceConnector confuguration
{
"name": "mssql-connector",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"mode": "timestamp",
"timestamp.column.name": "CreatedDateTime",
"query": "select * from dbo.sampletable",
"tasks.max": "1",
"table.types": "TABLE",
"key.converter.schemas.enable": "false",
"topic.prefix": "data_",
"value.converter.schemas.enable": "false",
"connection.url": "jdbc:sqlserver://SQL2016:1433;databaseName=sampledb",
"connection.user": "kafka",
"connection.password": "kafkaPassword#789",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"poll.interval.ms": "5000",
"table.poll.interval.ms": "120000"
}
}
Here is my io.confluent.connect.jdbc.JdbcSourceConnector connector
{
"name": "mssql-connector",
"config": {
"connector.class" : "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max" : "1",
"database.server.name" : "SQL2016",
"database.hostname" : "SQL2016",
"database.port" : "1433",
"database.user" : "kafka",
"database.password" : "kafkaPassword#789",
"database.dbname" : "sampleDb",
"database.history.kafka.bootstrap.servers" : "kafkanode1:9092",
"database.history.kafka.topic": "schema-changes.sampleDb"
}
}
Both connectors are creating snapshot of a table in a topic (means it pulls all the rows initially)
but when I make changes to a table "sampletable" (insert/update/delete), those changes are not being pulled to kafka.
can someone please help me understand how to make CDC working with Kafka?
Thanks

This seem to have worked 100%. I am posting answer just in case someone like me stuck on jdbc source connector.
{
"name": "piilog-connector",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"mode": "incrementing",
"value.converter.schemas.enable": "false",
"connection.url": "jdbc:sqlserver://SQL2016:1433;databaseName=SIAudit",
"connection.user": "kafka",
"connection.password": "kafkaPassword#789",
"query": "select * from dbo.sampletable",
"incrementing.column.name": "Id",
"validate.non.null": false,
"topic.prefix": "data_",
"tasks.max": "1",
"poll.interval.ms": "5000",
"table.poll.interval.ms": "5000"
}
}

Related

Task becomes UNASSIGNED for Debezium MySQL source connector

I am using debezium 1.9. I created a connector using below config
{
"name": "user_management_db-connector-5",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "XXXX",
"database.port": "3306",
"database.user": "XXX",
"database.password": "XXX",
"database.server.id": "12345",
"database.server.name": "ula-stg-db",
"database.include.list": "user_management_db",
"database.history.kafka.bootstrap.servers": "kafka.ulastg.xyz:9094,kafka.ulastg.xyz:9092",
"database.history.kafka.topic": "dbhistory.user_management_db",
"snapshot.mode" : "schema_only",
"snapshot.locking.mode" : "none",
"table.include.list": "user_management_db.user,user_management_db.store,user_management_db.store_type,user_management_db.user_segment,user_management_db.user_segment_mapping",
"transforms":"Reroute",
"transforms.Reroute.type":"io.debezium.transforms.ByLogicalTableRouter",
"transforms.Reroute.topic.regex":"(.*)user_management_db(.+)",
"transforms.Reroute.topic.replacement":"$1cdc",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"topic.creation.default.include": "ula-stg-db.+",
"topic.creation.default.partitions": 20,
"topic.creation.default.replication.factor": 2,
"topic.creation.default.cleanup.policy": "delete",
"topic.creation.default.delete.retention.ms": 300000,
"errors.log.enable": true,
"errors.log.include.messages" :true
}
}
The connector gets created and I can see events in the topic ula-stge-db.cdc
The problem is after some time ( approximately a day ) I see events stop getting populated. I do not see any error in connector logs.
It only throws a generic info in regular interval
2022-07-12 09:24:25,654 INFO || WorkerSourceTask{id=promo_management_db-connector-5-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. [org.apache.kafka.connect.runtime.WorkerSourceTask]
The connector status is now shown as below
{
"name": "user_management_db-connector-5",
"connector": {
"state": "RUNNING",
"worker_id": "172.31.65.156:8083"
},
"tasks": [
{
"id": 0,
"state": "UNASSIGNED",
"worker_id": "172.31.71.28:8083"
}
],
"type": "source"
}
How to debug further ?
P.S: I am connecting to AWS RDS MySql. And Kafka is hosted in an EC2.

Kafka MongoDB sink Connector exception handling

I have created a connector from Kafka to MongoDB to sink the data. In some cases, there is a case in which I got the wrong data on my topic. So that topic sink with the DB at that time it will give me a duplicate key issue due to the index which I created.
But in this case, I want to move that data in dlq. But it is not moving the record.
this is my connector can anyone please help me with this.
{
"name": "test_1",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "test",
"connection.uri": "xxx",
"database": "test",
"collection": "test_record",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "false",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://xxx:8081",
"document.id.strategy.overwrite.existing": "true",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.ProvidedInKeyStrategy",
"transforms": "hk",
"transforms.hk.type": "org.apache.kafka.connect.transforms.HoistField$Key",
"transforms.hk.field": "_id",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneTimestampsStrategy",
"write.method": "upsert",
"errors.tolerance":"all",
"errors.deadletterqueue.topic.name":"dlq_sink",
"errors.deadletterqueue.context.headers.enable":true,
"errors.retry.delay.max.ms": 60000,
"errors.retry.timeout": 300000
}
}
Thanks,

What kind of data got routed to a dead letter queue topic?

I Have implemented Dead Letter Queues error handling in Kafka. It works and the data are sent to DLQ topics. I am not understanding what types of data got routed in DLQ topics.
1st picture is the data that got routed into DLQ Topics and the second one is the normal data that got sunk into databases.
Does anyone have any idea how does that key got changed as I have used id as a key?
Here is my source and sink properties:
"name": "jdbc_source_postgresql_analytics",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:postgresql://192.168.5.40:5432/abc",
"connection.user": "abc",
"connection.password": "********",
"topic.prefix": "test_",
"mode": "timestamp+incrementing",
"incrementing.column.name": "id",
"timestamp.column.name": "updatedAt",
"validate.non.null": true,
"table.whitelist": "test",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": false,
"value.converter.schemas.enable": false,
"catalog.pattern": "public",
"transforms": "createKey,extractInt",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "id",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field": "id",
"errors.tolerance": "all"
}
}
sink properties:
{
"name": "es_sink_analytics",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"type.name": "_doc",
"key.converter.schemas.enable": "false",
"topics": "TEST",
"topic.index.map": "TEST:te_test",
"value.converter.schemas.enable": "false",
"connection.url": "http://192.168.10.40:9200",
"connection.username": "******",
"connection.password": "********",
"key.ignore": "false",
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "dlq-error-es",
"errors.deadletterqueue.topic.replication.factor": "1",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"schema.ignore": "true",
"error.tolerance":"all"
}
}

Delete document with MongoSinkConnector

I'm able to insert/update documents in mongo but i'm struggling to delete documents.
This is how the data is recorded in kafka topic by a debezium connect from a SQL Server source(the last row is how DELETE operation looks like):
{"user_code":1001} {"user_code":1001,"first_name":"Sally","last_name":"Thomas","email":"sally.thomas#acme.com"}
{"user_code":1002} {"user_code":1002,"first_name":"George","last_name":"Bailey","email":"gbailey#foobar.com"}
{"user_code":1003} {"user_code":1003,"first_name":"Edward","last_name":"Walker","email":"ed#walker.com"}
{"user_code":1003} null
In this example even if after the NULL value, the document with user_code 1003 still in MongoDB.
Below is how I'm configuring my MongoSinkConnector (I've already tried both mongodb.delete.on.null.values and delete.on.null.values but none of them worked):
{
"name": "inventory-connector-sink-2",
"config": {
"connector.class" : "com.mongodb.kafka.connect.MongoSinkConnector",
"tasks.max" : "1",
"topics": "server1.dbo.customers",
"connection.uri": "mongodb://root:root#mongo:27017/",
"database": "testDB",
"collection": "customers_2",
"database.history.kafka.bootstrap.servers" : "kafka:9092",
"database.history.kafka.topic": "schema-changes.inventory",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": false,
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": false,
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.FullKeyStrategy",
"writemodel.strategy":"com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy",
"mongodb.delete.on.null.values": true
}
}
I've also tried using PartialValueStrategy but no luck.
PS: I'm working with confluentinc/cp-kafka-connect docker image for my sink connector.

Kafka Connect - JDBC Source Connector - Setting Avro Schema

How can I make Kafka Connect JDBC connector to predefined Avro schema ? It creates a new version when the connecter is created. I am reading from DB2 and putting into Kafka topic.
I am setting schema name and version during creation but it does not work!!! Here is my connector settings :
{
"name": "kafka-connect-jdbc-db2-tst-2",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": "1",
"connection.url": "jdbc:db2://mydb2:50000/testdb",
"connection.user": "DB2INST1",
"connection.password": "12345678",
"query":"SELECT CORRELATION_ID FROM TEST.MYVIEW4 ",
"mode": "incrementing",
"incrementing.column.name": "CORRELATION_ID",
"validate.non.null": "false",
"topic.prefix": "tst-4" ,
"auto.register.schemas": "false",
"use.latest.version": "true",
"transforms": "RenameField,SetSchemaMetadata",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "CORRELATION_ID:id",
"transforms.SetSchemaMetadata.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"transforms.SetSchemaMetadata.schema.name": "foo.bar.MyMessage",
"transforms.SetSchemaMetadata.schema.version": "1"
}
}
And here are the schemas : V.1 is mine, and V.2 is created by JDBC source connector:
$ curl localhost:8081/subjects/tst-4-value/versions/1 | jq .
{
"subject": "tst-4-value",
"version": 1,
"id": 387,
"schema": "{"type":"record","name":"MyMessage",
"namespace":"foo.bar","fields":[{"name":"id","type":"int"}]}"
}
$ curl localhost:8081/subjects/tst-4-value/versions/2 | jq .
{
"subject": "tst-4-value",
"version": 2,
"id": 386,
"schema": "{"type":"record","name":"MyMessage","namespace":"foo.bar",
"fields":[{"name":"id","type":"int"}],
"connect.version":1,
"connect.name":"foo.bar.MyMessage"
}"
}
Any idea how can I force Kafka connector to use my schema ?
Thanks in advance,