Debezium Connector Outbox Transform - apache-kafka

I am trying to use a MySQL source connector with the Outbox SMT supported by Debezium, using the following config. I am using the latest jars of debezium-core and the Debezium MySQL connector (1.1).
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
"name": "debezium-mysql-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "MySql",
"database.port": "3306",
"database.user": "**",
"database.password": "**",
"database.server.id": "1033113244",
"database.server.name": "anomaly-changelog",
"database.whitelist": "anomaly",
"database.history.kafka.bootstrap.servers": "Kafka:9092",
"database.history.kafka.topic": "anomaly.schema.history",
"transforms": "outbox,reroute",
"transforms.reroute.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.reroute.regex": "(.*)",
"transforms.reroute.replacement": "$1-SMT",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter"
}
}'
But I am still getting the following error:
{"error_code":400,"message":"Connector configuration is invalid and contains the following 2 error(s):\nInvalid value io.debezium.transforms.outbox.EventRouter for configuration transforms.outbox.type: Class io.debezium.transforms.outbox.EventRouter could not be found.\nInvalid value null for configuration transforms.outbox.type: Not a Transformation}
I don't see why it is not being recognized.

You can try something like this:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" demo:8083/connectors/ -d '{
"name": "order-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "mariadb_order",
"database.port": "3306",
"database.user": "root",
"database.password": "***",
"database.server.id": "223344",
"database.server.name": "orderdbserver",
"table.whitelist": "orderdb.outbox",
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.orderdb",
"transforms.outbox.table.fields.additional.placement": "aggregateid:envelope:id"
}
}'
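If the EventRouter class still cannot be found, the config is usually not the problem; the SMT ships in debezium-core, so that jar has to sit in the same plugin directory as the MySQL connector on the Connect worker. A quick sanity check, assuming the default plugin path of the debezium/connect image (adjust the path if your plugin.path differs):
ls /kafka/connect/debezium-connector-mysql/ | grep debezium-core
If the jar is missing there, copy it next to the connector jar and restart the Connect worker so the plugin loader picks it up.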

Related

Debezium NPE while creating oracle source connector

I am creating a Debezium Oracle source connector with the following curl POST command, but I'm getting a NullPointerException and can't figure out where the issue is.
curl command
curl -X POST -H "Content-Type: application/json" --data '{
"name": "debez_ora_cdc",
"config": {
"connector.class": "io.debezium.connector.oracle.OracleConnector",
"tasks.max": "1",
"connection.url": "jdbc:oracle:thin:@host_ip:port:test",
"connection.user": "logminer_user",
"connection.password": "logminer_user",
"table.include.list": "logminer_user.users",
"database.server.name": "server1",
"database.tablename.case.insensitive": "true",
"database.hostname": "host_ip",
"database.port": "1522",
"database.user": "logminer_user",
"database.password": "logminer_user",
"database.dbname": "test",
"database.history.kafka.bootstrap.servers": "localhost:9092",
"database.history.kafka.topic": "server1.oracle.history",
"database.history.skip.unparseable.ddl": "true",
"include.schema.changes": "true",
"snapshot.mode": "initial",
"errors.log.enable": "true"
}
}' http://localhost:8083/connectors
Exception
{"name":"debez_ora_cdc","connector":{"state":"RUNNING","worker_id":"null:-1"},"tasks":[{"id":0,"state":"FAILED","worker_id":"null:-1","trace":"org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.\n\tat io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:50)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:116)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: io.debezium.DebeziumException: java.lang.NullPointerException\n\tat io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:85)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:155)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:137)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:109)\n\t... 5 more\nCaused by: java.lang.NullPointerException\n\tat io.debezium.relational.TableEditorImpl.columnWithName(TableEditorImpl.java:46)\n\tat io.debezium.relational.TableEditorImpl.hasColumnWithName(TableEditorImpl.java:50)\n\tat io.debezium.relational.TableEditorImpl.lambda$updatePrimaryKeys$0(TableEditorImpl.java:103)\n\tat java.util.ArrayList.removeIf(ArrayList.java:1413)\n\tat io.debezium.relational.TableEditorImpl.updatePrimaryKeys(TableEditorImpl.java:102)\n\tat io.debezium.relational.TableEditorImpl.create(TableEditorImpl.java:267)\n\tat io.debezium.relational.Tables.lambda$overwriteTable$2(Tables.java:192)\n\tat io.debezium.util.FunctionalReadWriteLock.write(FunctionalReadWriteLock.java:84)\n\tat io.debezium.relational.Tables.overwriteTable(Tables.java:186)\n\tat io.debezium.jdbc.JdbcConnection.readSchema(JdbcConnection.java:1209)\n\tat io.debezium.connector.oracle.OracleSnapshotChangeEventSource.readTableStructure(OracleSnapshotChangeEventSource.java:181)\n\tat io.debezium.connector.oracle.OracleSnapshotChangeEventSource.readTableStructure(OracleSnapshotChangeEventSource.java:35)\n\tat io.debezium.relational.RelationalSnapshotChangeEventSource.doExecute(RelationalSnapshotChangeEventSource.java:114)\n\tat io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:76)\n\t... 8 more\n"}],"type":"source"}

Debezium Connector - read from beginning and stop working connector

I am trying to use Debezium to connect to my Postgres database. I would like to copy data from a specific table. With this configuration I only copy the newest data. Should I just change snapshot.mode?
"name": "prod-contact-connect",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.user": "user",
"database.dbname": "db_name",
"slot.name": "debezium_contact",
"tasks.max": "1",
"database.history.kafka.bootstrap.servers": "localhost:9092",
"publication.name": "dbz_publication",
"transforms": "unwrap",
"database.server.name": "connect.prod.contact",
"database.port": "5432",
"plugin.name": "pgoutput",
"table.whitelist": "specific_table_name",
"database.sslmode": "disable",
"database.hostname": "localhost",
"database.password": "pass",
"name": "prod-contact-connect",
"transforms.unwrap.add.fields": "op,table,schema,name",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"snapshot.mode": "never"
}
}
By the way, how can I stop the Debezium connector for a moment? Is there some enable flag?
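On the pausing part: there is no enable flag in the connector config, but the Kafka Connect REST API can pause and resume a running connector. A minimal sketch, assuming the worker is reachable on localhost:8083 and the connector name above:
curl -X PUT localhost:8083/connectors/prod-contact-connect/pause
curl -X PUT localhost:8083/connectors/prod-contact-connect/resume
A paused connector stops producing records but keeps its offsets, so resuming continues from where it left off.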

Kafka connect path format not working properly

I create a connector with the script below, but in S3 I see a partition format of /year=2015/month=12/day=07/hour=15/. Is there a way to get partitions in the 'dt'=YYYY-MM-dd/'hour'=HH/ format?
curl -X POST \
-H "Content-Type: application/json" \
--data '{
"name": "content.logging.test",
"config": {
"topics": "content.logging",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"s3.region": "ap-northeast-1",
"s3.bucket.name": "kafka-connect-test",
"locale": "en-US",
"timezone": "UTC",
"tasks.max": 1,
"flush.size": 10,
"partitioner.class": "io.confluent.connect.storage.partitioner.HourlyPartitioner",
"partition.duration.ms": 3600000,
"path.format": "'dt'=YYYY-MM-dd/'hour'=HH/"
}
}' http://$CONNECT_REST_ADVERTISED_HOST_NAME:8083/connectors
You should use the TimeBasedPartitioner if you want to supply your own path.format; the HourlyPartitioner always writes its fixed year=/month=/day=/hour= layout and ignores that setting.
https://docs.confluent.io/kafka-connect-s3-sink/current/index.html#partitioning-records-into-s3-objects
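A sketch of the relevant changes, assuming partitioning should follow the record timestamp rather than wall-clock time (timestamp.extractor defaults to Wallclock); the rest of the sink config stays as in your command:
"partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"partition.duration.ms": 3600000,
"path.format": "'dt'=YYYY-MM-dd/'hour'=HH/",
"locale": "en-US",
"timezone": "UTC",
"timestamp.extractor": "Record"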

Debezium should only read new changes

Even though I'm using snapshot.mode: schema_only, I'm getting the complete records of the database, whereas I only want the new ones. Are there any other modifications I should make?
This is the config of my source connector:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" \
localhost:8083/connectors/ \
-d '{ "name": "inventory-connector",
"config": { "connector.class":"io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"snapshot.mode":"schema_only",
"database.hostname": "----",
"database.port": "3306",
"database.user": "---",
"database.password": "----",
"database.server.id": "1",
"database.server.name": "data",
"database.whitelist": "---",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.---"
}
}'

How to replicate all changes from source to destination db using debezium and confluent-sink-connector running on docker

Below is my Dockerfile for the Kafka Connect JDBC connector and the MySQL driver:
FROM debezium/connect:1.3
ENV KAFKA_CONNECT_JDBC_DIR=$KAFKA_CONNECT_PLUGINS_DIR/kafka-connect-jdbc
ENV MYSQL_DRIVER_VERSION 8.0.20
ARG KAFKA_JDBC_VERSION=5.5.0
RUN curl -k -SL "https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-${MYSQL_DRIVER_VERSION}.tar.gz" \
| tar -xzf - -C /kafka/libs --strip-components=1 mysql-connector-java-${MYSQL_DRIVER_VERSION}/mysql-connector-java-${MYSQL_DRIVER_VERSION}.jar
RUN mkdir $KAFKA_CONNECT_JDBC_DIR && cd $KAFKA_CONNECT_JDBC_DIR &&\
curl -sO https://packages.confluent.io/maven/io/confluent/kafka-connect-jdbc/$KAFKA_JDBC_VERSION/kafka-connect-jdbc-$KAFKA_JDBC_VERSION.jar
docker build . --tag kafka-connect-sink
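After building and starting the image, it is worth confirming that Connect actually picked up both plugins before registering the connectors; a quick check, assuming the worker's REST port is 8083:
curl -s 192.168.99.102:8083/connector-plugins
Both io.debezium.connector.mysql.MySqlConnector and io.confluent.connect.jdbc.JdbcSinkConnector should appear in the list; if the sink is missing, the JDBC jar did not land under the plugin path.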
Below is my source db json
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": "184054",
"database.server.name": "dbserver1",
"database.include.list": "inventory",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.inventory"
}
}'
Below is my destination db sink json
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://192.168.0.104:3306/pk?useSSL=false",
"connection.user": "pavan",
"connection.password": "root",
"topics": "dbserver1.inventory.customers",
"table.name.format": "pk.customers",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_key",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite"
}
}'
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector-sink-addresses",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://192.168.0.104:3306/pk?useSSL=false",
"connection.user": "pavan",
"connection.password": "root",
"topics": "dbserver1.inventory.addresses",
"table.name.format": "pk.addresses",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_key",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite"
}
}'
With this configuration I need to subscribe to each topic separately, but the problem is I have 100+ tables to replicate into the destination DB. Is there any way I can do it in a single JSON configuration so that I can subscribe to all the topics?
You can use the topics (or topics.regex) property to define the list of topics to consume, and the table.name.format property of the JDBC sink connector or the RegexRouter SMT (or a combination of them) to override the destination table names:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector-sink-addresses",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://192.168.0.104:3306/pk?useSSL=false",
"connection.user": "pavan",
"connection.password": "root",
"topics": "dbserver1.inventory.addresses,dbserver1.inventory.customers",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "",
"pk.mode": "record_key",
"transforms": "route,unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "pk.$3"
}
}'
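If listing every topic is impractical with 100+ tables, the same sink can use topics.regex instead of topics (the two are mutually exclusive, so the topics line is replaced); a sketch, assuming all captured tables share the dbserver1.inventory prefix:
"topics.regex": "dbserver1\\.inventory\\..*",
The RegexRouter transform above then rewrites every matched topic to pk.<table>, so newly captured tables flow into the sink without further config changes.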