How to replicate all changes from source to destination db using debezium and confluent-sink-connector running on docker - apache-kafka

The below code is my Dockerfile for Kafka-connect-JDBC and MySQL-driver
FROM debezium/connect:1.3
ENV KAFKA_CONNECT_JDBC_DIR=$KAFKA_CONNECT_PLUGINS_DIR/kafka-connect-jdbc
ENV MYSQL_DRIVER_VERSION 8.0.20
ARG KAFKA_JDBC_VERSION=5.5.0
RUN curl -k -SL "https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-${MYSQL_DRIVER_VERSION}.tar.gz" \
| tar -xzf - -C /kafka/libs --strip-components=1 mysql-connector-java-8.0.20/mysql-connector-java-${MYSQL_DRIVER_VERSION}.jar
RUN mkdir $KAFKA_CONNECT_JDBC_DIR && cd $KAFKA_CONNECT_JDBC_DIR &&\
curl -sO https://packages.confluent.io/maven/io/confluent/kafka-connect-jdbc/$KAFKA_JDBC_VERSION/kafka-connect-jdbc-$KAFKA_JDBC_VERSION.jar
docker build . --tag kafka kafka-connect-sink
Below is my source db json
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": "184054",
"database.server.name": "dbserver1",
"database.include.list": "inventory",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.inventory"
}
}'
Below is my destination db sink json
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://192.168.0.104:3306/pk?useSSL=false",
"connection.user": "pavan",
"connection.password": "root",
"topics": "dbserver1.inventory.customers",
"table.name.format": "pk.customers",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_key",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite"
}
}'
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector-sink-addresses",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://192.168.0.104:3306/pk?useSSL=false",
"connection.user": "pavan",
"connection.password": "root",
"topics": "dbserver1.inventory.addresses",
"table.name.format": "pk.addresses",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_key",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite"
}
}'
with this configuration i need to subscribe to each topic but problem is i had 100+ tables to get replicate in destination db is there anyway i can do it in single json configuration so that i can subscribe to all topics.

You can use topics (or topics.regex) property to define the list of topics to consume and table.name.format property of JBDC Sink connector or RegexRouter SMT (or combine them) to override destination table names:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector-sink-addresses",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://192.168.0.104:3306/pk?useSSL=false",
"connection.user": "pavan",
"connection.password": "root",
"topics": "dbserver1.inventory.addresses,dbserver1.inventory.customers",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "",
"pk.mode": "record_key",
"transforms": "route,unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "pk.$3"
}
}'

Related

Kafka connect path format not working properly

I create connector with the script below, but in S3, I see partition format of /year=2015/month=12/day=07/hour=15/ . Is there a way to implement partition of 'dt'=YYYY-MM-dd/'hour'=HH/ format ?
curl -X POST \
-H "Content-Type: application/json" \
--data '{
"name": "content.logging.test",
"config": {
"topics": "content.logging",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"s3.region": "ap-northeast-1",
"s3.bucket.name": "kafka-connect-test",
"locale": "en-US",
"timezone": "UTC",
"tasks.max": 1,
"flush.size": 10,
"partitioner.class": "io.confluent.connect.storage.partitioner.HourlyPartitioner",
"partition.duration.ms": 3600000,
"path.format": "'dt'=YYYY-MM-dd/'hour'=HH/"
}
}' http://$CONNECT_REST_ADVERTISED_HOST_NAME:8083/connectors
You should use the TimeBasedPartitioner if you want to use a format
https://docs.confluent.io/kafka-connect-s3-sink/current/index.html#partitioning-records-into-s3-objects

Debezium should only read new changes

Even though I'm using snapshot.mode:schema_only, I'm getting complete records of the database whereas I only want the new ones. Are there any other modifications that I should do?
This is the config of my source connector:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" \
localhost:8083/connectors/ \
-d '{ "name": "inventory-connector",
"config": { "connector.class":"io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"snapshot.mode":"schema_only",
"database.hostname": "----",
"database.port": "3306",
"database.user": "---",
"database.password": "----",
"database.server.id": "1",
"database.server.name": "data",
"database.whitelist": "---",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.---"
}
}'

Debezium Connector Outbox Transform

I am trying to use a MySql Source Connector with the Outbox SMT supported by debezium with the following config. I am using the latest jars of debezium-core and debezium-mysql-connector (1.1)
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost: 8083/connectors/ -d '{
"name": "debezium-mysql-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "MySql",
"database.port": "3306",
"database.user": "**",
"database.password": "**",
"database.server.id": "1033113244",
"database.server.name": "anomaly-changelog",
"database.whitelist": "anomaly",
"database.history.kafka.bootstrap.servers": "Kafka:9092",
"database.history.kafka.topic": "anomaly.schema.history",
"transforms": "outbox,reroute",
"transforms.reroute.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.reroute.regex": "(.*)",
"transforms.reroute.replacement": "$1-SMT",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter"
}
}'
But I am still getting the following error:
{"error_code":400,"message":"Connector configuration is invalid and contains the following 2 error(s):\nInvalid value io.debezium.transforms.outbox.EventRouter for configuration transforms.outbox.type: Class io.debezium.transforms.outbox.EventRouter could not be found.\nInvalid value null for configuration transforms.outbox.type: Not a Transformation}
I don't see why it is not being recognized.
You can try something like this:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" demo:8083/connectors/ -d '{ "name": "order-connector", "config": { "connector.class": "io.debezium.connector.mysql.MySqlConnector", "tasks.max": "1", "database.hostname": "mariadb_order", "database.port": "3306", "database.user": "root", "database.password": "***", "database.server.id": "223344", "database.server.name": "orderdbserver","table.whitelist": "orderdb.outbox", "transforms": "outbox", "transforms.outbox.type" :"io.debezium.transforms.outbox.EventRouter", "database.history.kafka.bootstrap.servers": "kafka:9092", "database.history.kafka.topic":"dbhistory.orderdb", "transforms.outbox.table.fields.additional.placement" : "aggregateid:envelope:id" } }'

Multiple replication slot for debezium connector

I want to create multiple debezium connector with different replication slot. But I am Unable to create multiple replication slot for postgres debezium connector.
I am using docker container for Postgres & kafka. I tried setting up max_replication_slots = 2 in postgressql.conf file & also different slot.name. but still it did not create 2 replication slot for me.
{
"config": {
"batch.size": "49152",
"buffer.memory": "100663296",
"compression.type": "lz4",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "Db1",
"database.hostname": "DBhost",
"database.password": "dbpwd",
"database.port": "5432",
"database.server.name": "serve_name",
"database.user": "usename",
"decimal.handling.mode": "double",
"hstore.handling.mode": "json",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "debezium-702771",
"plugin.name": "wal2json",
"schema.refresh.mode": "columns_diff_exclude_unchanged_toast",
"slot.drop_on_stop": "true",
"slot.name": "debezium1",
"table.whitelist": "tabel1",
"time.precision.mode": "adaptive_time_microseconds",
"transforms": "Reroute",
"transforms.Reroute.topic.regex": "(.*).public.(.*)",
"transforms.Reroute.topic.replacement": "$1.$2",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
},
"name": "debezium-702771",
"tasks": [],
"type": "source"
}
{
"config": {
"batch.size": "49152",
"buffer.memory": "100663296",
"compression.type": "lz4",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "Db1",
"database.hostname": "DBhost",
"database.password": "dbpwd",
"database.port": "5432",
"database.server.name": "serve_name",
"database.user": "usename",
"decimal.handling.mode": "double",
"hstore.handling.mode": "json",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "debezium-702772",
"plugin.name": "wal2json",
"schema.refresh.mode": "columns_diff_exclude_unchanged_toast",
"slot.drop_on_stop": "true",
"slot.name": "debezium2",
"table.whitelist": "tabel1",
"time.precision.mode": "adaptive_time_microseconds",
"transforms": "Reroute",
"transforms.Reroute.topic.regex": "(.*).public.(.*)",
"transforms.Reroute.topic.replacement": "$1.$2",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
},
"name": "debezium-702772",
"tasks": [],
"type": "source"
}
It creates multiple connector but not multiple replication slot even after giving different slot name. Do I need to do anything over here.

Kafka connect jdbc source mssql server loading millions record throwing out of memory error

I have tried to load 77 millions of record from MSSQL server to Kafka topic through Kafka connect JDBC source.
Tried batch approach given batch.max.rows as 1000. In this case, after 1000 records, it's throughout of memory. Please share suggestions on how to make it works
Below are connector approach i tried
curl -X POST http://test.com:8083/connectors -H "Content-Type: application/json" -d '{
"name": "mssql_jdbc_rsitem_pollx",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:sqlserver://test:1433;databaseName=xxx",
"connection.user": "xxxx",
"connection.password": "xxxx",
"topic.prefix": "mssql-rsitem_pollx-",
"mode":"incrementing",
"table.whitelist" : "test",
"timestamp.column.name": "itemid",
"max.poll.records" :"100",
"max.poll.interval.ms":"3000",
"validate.non.null": false
}
}'
curl -X POST http://test.com:8083/connectors -H "Content-Type: application/json" -d '{
"name": "mssql_jdbc_test_polly",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": "10",
"connection.url": "jdbc:sqlserver://test:1433;databaseName=xxx;defaultFetchSize=10000;useCursorFetch=true",
"connection.user": "xxxx",
"connection.password": "xxxx",
"topic.prefix": "mssql-rsitem_polly-",
"mode":"incrementing",
"table.whitelist" : "test",
"timestamp.column.name": "itemid",
"poll.interval.ms":"86400000",
"validate.non.null": false
}
}'
try to increase Java heap size, write in command line:
export KAFKA_HEAP_OPTS="-Xms1g -Xmx2g"
you can change the "Xmx2g" part to match your capacity.