Debezium tableChanges empty

I am running Debezium and registering the connector with curl using the following configuration:
```
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ \
  -d '{
    "name": "transactions-connector",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "tasks.max": "1",
      "database.hostname": "localhost",
      "database.port": "3306",
      "database.user": "root",
      "database.password": "*****",
      "database.server.id": "1",
      "database.server.name": "*****",
      "database.include.list": "*****",
      "database.history.kafka.bootstrap.servers": "localhost:9092",
      "database.history.kafka.topic": "dbhistory.transactions",
      "table.include.list": "transactions,customers",
      "database.dbname": "******",
      "snapshot.mode": "initial",
      "snapshot.locking.mode": "none",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "key.converter.schemas.enable": "false",
      "value.converter.schemas.enable": "false",
      "internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "internal.key.converter.schemas.enable": "false",
      "internal.value.converter.schemas.enable": "false",
      "transforms": "unwrap",
      "transforms.unwrap.add.source.fields": "ts_ms",
      "tombstones.on.delete": "false",
      "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
    }
  }'
```
In the Kafka consumer output, all schema-level changes come through, but the row-level data is missing:
```
{
"source" : {
"server" : "C02F5AZSMD6M"
},
"position" : {
"ts_sec" : 1628625011,
"file" : "binlog.000010",
"pos" : 1473,
"snapshot" : true
},
"databaseName" : "meesho_test",
"ddl" : "CREATE TABLE `transactions` (\n `id` bigint NOT NULL AUTO_INCREMENT,\n `amount` double NOT NULL,\n `callback_url_id` int DEFAULT NULL,\n `client_id` int DEFAULT NULL,\n `client_transaction_id` varchar(63) NOT NULL,\n `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,\n `error_code` varchar(63) DEFAULT NULL,\n `error_message` varchar(255) DEFAULT NULL,\n `iso_country_code` varchar(3) DEFAULT 'IN',\n `md` text,\n `parent_pg_transaction_id` varchar(63) DEFAULT NULL,\n `parent_transaction_id` varchar(63) DEFAULT NULL,\n `mode` varchar(16) NOT NULL,\n `pg_transaction_id` varchar(63) DEFAULT NULL,\n `profile` varchar(63) DEFAULT NULL,\n `status` varchar(63) NOT NULL,\n `transaction_id` varchar(63) NOT NULL,\n `type` varchar(16) NOT NULL,\n `updated_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\n PRIMARY KEY (`id`),\n UNIQUE KEY `UK_transaction_id` (`transaction_id`),\n UNIQUE KEY `UK_client_transaction_id` (`client_transaction_id`)\n) ENGINE=InnoDB AUTO_INCREMENT=10003 DEFAULT CHARSET=utf8mb3",
"tableChanges" : [ ]
}
```
No row-level insert, update, or delete operations are being captured; the tableChanges field is empty in the response object.

Add the database prefix to your table include list:
"table.include.list": "DBNAME.transactions,DBNAME.customers",

Related

CDC PostgreSQL with Debezium while ignoring columns

I have a PostgreSQL table with 10 columns, and we need to enable CDC on this table to capture changes in only ONE of the columns, ignoring the other nine. Our Debezium connector has the following configuration:
{
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.history.file.filename": "/data/postgresql-d-connection-teste3-history.dat",
"database.user": "postgres",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"database.dbname": "postgres",
"max.queue.size": "81290",
"tasks.max": "1",
"transforms": "Reroute",
"database.server.name": "xxxx",
"offset.flush.timeout.ms": "60000",
"transforms.Reroute.topic.regex": "(.*)",
"buffer.memory": "2048",
"database.port": "5432",
"plugin.name": "wal2json",
"offset.flush.interval.ms": "10000",
"tombstones.on.delete": "false",
"transforms.Reroute.topic.replacement": "teste3",
"decimal.handling.mode": "string",
"database.hostname": "xxxx",
"database.password": "xxxx",
"name": "postgresql-d-connection-teste3",
"table.include.list": "public.test",
"max.batch.size": "20480",
"database.history": "io.debezium.relational.history.FileDatabaseHistory"
}
We also already tried setting these parameters in the JSON, without success:
"column.include.list": "public.test.{id|name}"
"column.exclude.list": "public.test.{id|name}"
I have the same scenario working fine with MS SQL Server, where I only had to execute this query:
EXEC sys.sp_cdc_enable_table
@source_schema = N'xxxx',
@source_name = N'xxxx',
@captured_column_list = N'col1, col2, col3',
@supports_net_changes = 0
Is it possible to achieve the same goal with PostgreSQL?
Thanks in advance
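For reference, Debezium's column.include.list (which must not be combined with column.exclude.list) takes a comma-separated list of regular expressions matched against fully-qualified column names of the form schemaName.tableName.columnName. A sketch, assuming id is the one column to keep:
"column.include.list": "public.test.id"
The {id|name} brace syntax tried above is not valid regex; alternation would be written as public.test.(id|name).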

kafka sink connector. How insert record_key to table if record_value = null?

I have a topic where the data consists only of the record_key, and the record_value is null. I need to write the fields from the record_key to the table through the sink connector, but I get an error. How do I insert values from the record_key, ignoring the null record_value?
{
"name": "ObjectForDelete.sink",
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"transforms": [
"uuid",
"name"
],
"topics": [
"ObjectForDeletePG"
],
"transforms.uuid.type": "org.apache.kafka.connect.transforms.InsertField$Key",
"transforms.uuid.topic.field": "key.uuid",
"transforms.name.type": "org.apache.kafka.connect.transforms.InsertField$Key",
"transforms.name.topic.field": "key.name",
"connection.url": "jdbc:postgresql://127.0.0.1:5432/ulk",
"connection.user": "root",
"connection.password": "****",
"dialect.name": "PostgreSqlDatabaseDialect",
"insert.mode": "insert",
"table.name.format": "ObjectForDelete",
"pk.mode": "none",
"pk.fields": [],
"db.timezone": "Europe/Kiev"
}
ERROR:
Caused by: org.apache.kafka.connect.errors.ConnectException: Sink connector 'ObjectForDelete.sink' is configured with 'delete.enabled=false' and 'pk.mode=kafka' and therefore requires records with a non-null Struct value and non-null Struct schema, but found record at (topic='ObjectForDeletePG',partition=0,offset=0,timestamp=1623758581060) with a null value and null value schema.

Change the topic name in a HDFS2 SINK CONNECTOR integrated with HIVE

Good morning
When I work with an HDFS2 sink connector integrated with Hive, the database table gets the name of the topic. Is there a way to choose the name of the table?
This is the config of my connector:
"connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
"hive.integration": "true",
"hive.database": "databaseEze",
"hive.metastore.uris": "thrift://server1.dc.es.arioto:9083",
"transforms.InsertField.timestamp.field": "carga",
"flush.size": "100000",
"tasks.max": "2",
"timezone": "Europe/Paris",
"transforms": "RenameField,InsertField,carga_format",
"rotate.interval.ms": "900000",
"locale": "en-GB",
"logs.dir": "/logs",
"format.class": "io.confluent.connect.hdfs.avro.AvroFormat",
"transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"transforms.RenameField.renames": "var1:Test1,var2:Test2,var3:test3",
"transforms.carga_format.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.carga_format.target.type": "string",
"transforms.carga_format.format": "yyyyMMdd",
"hadoop.conf.dir": "/etc/hadoop/",
"schema.compatibility": "BACKWARD",
"topics": "Skiel-Tracking-Replicator",
"hdfs.url": "hdfs://database/user/datavaseEze/",
"transforms.InsertField.topic.field": "ds_topic",
"partition.field.name": "carga",
"transforms.InsertField.partition.field": "test_partition",
"value.converter.schema.registry.url": "http://schema-registry-eze-dev.ocjc.serv.dc.es.arioto",
"partitioner.class": "io.confluent.connect.storage.partitioner.FieldPartitioner",
"name": "KAFKA-HDFS-HIVE-TEST",
"transforms.fx_carga_format.field": "carga",
"transforms.InsertField.offset.field": "test_offset"
}
With that config, the table will be named **Skiel-Tracking-Replicator**, and I want the table name to be d9nvtest.
You can use the RegexRouter Single Message Transform to modify the topic name.
{
"transforms" : "renameTopic",
"transforms.renameTopic.type" : "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.renameTopic.regex" : "Skiel-Tracking-Replicator",
"transforms.renameTopic.replacement": "d9nvtest"
}
See https://rmoff.net/2020/12/11/twelve-days-of-smt-day-4-regexrouter/
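Note that the connector above already defines "transforms": "RenameField,InsertField,carga_format", so the renameTopic alias would need to be appended to that list rather than configured on its own. A sketch of the relevant lines, reusing the aliases from the question:
"transforms": "RenameField,InsertField,carga_format,renameTopic",
"transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.renameTopic.regex": "Skiel-Tracking-Replicator",
"transforms.renameTopic.replacement": "d9nvtest"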
Note, however, that using RegexRouter together with kafka-connect-hdfs can run into this issue: https://github.com/confluentinc/kafka-connect-hdfs/issues/236
The last comment there states that the two are conceptually incompatible.

Multiple replication slot for debezium connector

I want to create multiple Debezium connectors, each with its own replication slot, but I am unable to create multiple replication slots for the Postgres Debezium connector.
I am using Docker containers for Postgres and Kafka. I tried setting max_replication_slots = 2 in postgresql.conf and gave each connector a different slot.name, but it still did not create 2 replication slots for me.
{
"config": {
"batch.size": "49152",
"buffer.memory": "100663296",
"compression.type": "lz4",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "Db1",
"database.hostname": "DBhost",
"database.password": "dbpwd",
"database.port": "5432",
"database.server.name": "serve_name",
"database.user": "usename",
"decimal.handling.mode": "double",
"hstore.handling.mode": "json",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "debezium-702771",
"plugin.name": "wal2json",
"schema.refresh.mode": "columns_diff_exclude_unchanged_toast",
"slot.drop_on_stop": "true",
"slot.name": "debezium1",
"table.whitelist": "tabel1",
"time.precision.mode": "adaptive_time_microseconds",
"transforms": "Reroute",
"transforms.Reroute.topic.regex": "(.*).public.(.*)",
"transforms.Reroute.topic.replacement": "$1.$2",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
},
"name": "debezium-702771",
"tasks": [],
"type": "source"
}
{
"config": {
"batch.size": "49152",
"buffer.memory": "100663296",
"compression.type": "lz4",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "Db1",
"database.hostname": "DBhost",
"database.password": "dbpwd",
"database.port": "5432",
"database.server.name": "serve_name",
"database.user": "usename",
"decimal.handling.mode": "double",
"hstore.handling.mode": "json",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "debezium-702772",
"plugin.name": "wal2json",
"schema.refresh.mode": "columns_diff_exclude_unchanged_toast",
"slot.drop_on_stop": "true",
"slot.name": "debezium2",
"table.whitelist": "tabel1",
"time.precision.mode": "adaptive_time_microseconds",
"transforms": "Reroute",
"transforms.Reroute.topic.regex": "(.*).public.(.*)",
"transforms.Reroute.topic.replacement": "$1.$2",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
},
"name": "debezium-702772",
"tasks": [],
"type": "source"
}
It creates multiple connectors but not multiple replication slots, even after giving different slot names. Do I need to do anything else here?

debezium - change of topic name gives the error cross-database references

I am using this debezium-examples
source.json
{
"name": "inventory-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": "184054",
"database.server.name": "dbserver1",
"database.whitelist": "inventory",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.inventory",
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$3"
}
}
jdbc-sink.json
{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "customers",
"connection.url": "jdbc:postgresql://postgres:5432/inventory?user=postgresuser&password=postgrespw",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"auto.create": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value"
}
}
I have run this example and it works fine. But when I made the changes discussed in the following scenario, it gives me a 'cross-database references' error.
Scenario
I have removed these properties from the source:
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$3"
Now it creates topics in Kafka as follows:
dbserver1.inventory.products
dbserver1.inventory.products_on_hand
dbserver1.inventory.customers
dbserver1.inventory.orders
When I specify topics=dbserver1.inventory.customers in the jdbc-sink, it gives me the following exception:
ERROR: cross-database references are not implemented:
"dbserver1.inventory.customers" at character 14
postgres_1 | STATEMENT: CREATE TABLE "dbserver1"."inventory"."customers" (
postgres_1 | "last_name" TEXT NOT NULL,
postgres_1 | "id" INT NOT NULL,
postgres_1 | "first_name" TEXT NOT NULL,
postgres_1 | "email" TEXT NOT NULL,
postgres_1 | PRIMARY KEY("id"))
connect_1 | 2019-01-29 09:39:18,931 WARN || Create failed, will attempt amend if table already exists [io.confluent.connect.jdbc.sink.DbStructure]
connect_1 | org.postgresql.util.PSQLException: ERROR: cross-database references are not implemented: "dbserver1.inventory.customers"
connect_1 | Position: 14
Note: this is not a duplicate; the other question was also posted by me and covers a different scenario.
Change inventory -> dbserver1 in the connection URL, because the topic name is interpreted as (databasename).(schemaname).(tablename):
{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "customers",
"connection.url": "jdbc:postgresql://postgres:5432/dbserver1?user=postgresuser&password=postgrespw",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"auto.create": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value"
}
}
The table.name.format sink property solved this for me. It allows you to override what the destination table name will be. See https://docs.confluent.io/3.1.1/connect/connect-jdbc/docs/sink_config_options.html
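A sketch of that approach applied to the un-routed topic (all other values copied from the jdbc-sink.json above), so the connector consumes dbserver1.inventory.customers but writes to a table named customers:
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.inventory.customers",
    "table.name.format": "customers",
    "connection.url": "jdbc:postgresql://postgres:5432/inventory?user=postgresuser&password=postgrespw",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}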