Debezium NPE while creating Oracle source connector

I am creating a Debezium Oracle source connector with the following curl POST command, but the task fails with a NullPointerException and I cannot find what is causing it.
curl command
curl -X POST -H "Content-Type: application/json" --data '{ "name":"debez_ora_cdc", "config":{ "connector.class":"io.debezium.connector.oracle.OracleConnector", "tasks.max":"1", "connection.url":"jdbc:oracle:thin:#host_ip:port:test", "connection.user":"logminer_user", "connection.password":"logminer_user", "table.include.list": "logminer_user.users", "database.server.name": "server1",
"database.tablename.case.insensitive": "true", "database.hostname": "host_ip",
"database.port": "1522", "database.user": "logminer_user", "database.password": "logminer_user", "database.dbname": "test", "database.history.kafka.bootstrap.servers": "localhost:9092",
"database.history.kafka.topic": "server1.oracle.history", "database.history.skip.unparseable.ddl": "true", "include.schema.changes": "true", "snapshot.mode": "initial", "errors.log.enable": "true"
} }' http://localhost:8083/connectors
Exception
{"name":"debez_ora_cdc","connector":{"state":"RUNNING","worker_id":"null:-1"},"tasks":[{"id":0,"state":"FAILED","worker_id":"null:-1","trace":"org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.\n\tat io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:50)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:116)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: io.debezium.DebeziumException: java.lang.NullPointerException\n\tat io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:85)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:155)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:137)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:109)\n\t... 
5 more\nCaused by: java.lang.NullPointerException\n\tat io.debezium.relational.TableEditorImpl.columnWithName(TableEditorImpl.java:46)\n\tat io.debezium.relational.TableEditorImpl.hasColumnWithName(TableEditorImpl.java:50)\n\tat io.debezium.relational.TableEditorImpl.lambda$updatePrimaryKeys$0(TableEditorImpl.java:103)\n\tat java.util.ArrayList.removeIf(ArrayList.java:1413)\n\tat io.debezium.relational.TableEditorImpl.updatePrimaryKeys(TableEditorImpl.java:102)\n\tat io.debezium.relational.TableEditorImpl.create(TableEditorImpl.java:267)\n\tat io.debezium.relational.Tables.lambda$overwriteTable$2(Tables.java:192)\n\tat io.debezium.util.FunctionalReadWriteLock.write(FunctionalReadWriteLock.java:84)\n\tat io.debezium.relational.Tables.overwriteTable(Tables.java:186)\n\tat io.debezium.jdbc.JdbcConnection.readSchema(JdbcConnection.java:1209)\n\tat io.debezium.connector.oracle.OracleSnapshotChangeEventSource.readTableStructure(OracleSnapshotChangeEventSource.java:181)\n\tat io.debezium.connector.oracle.OracleSnapshotChangeEventSource.readTableStructure(OracleSnapshotChangeEventSource.java:35)\n\tat io.debezium.relational.RelationalSnapshotChangeEventSource.doExecute(RelationalSnapshotChangeEventSource.java:114)\n\tat io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:76)\n\t... 8 more\n"}],"type":"source"}

Related

What are the extra topics created when creating a Debezium source connector

Q1) This is the config I used when creating the Kafka connector for a MySQL source.
{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"snapshot.locking.mode": "minimal",
"database.user": "cdc_user",
"tasks.max": "3",
"database.history.kafka.bootstrap.servers": "10.49.115.X:9092,10.48.X.211:9092,10.X.178.121:9092,10.53.4.X:9092",
"database.history.kafka.topic": "history.cdc.fkw.supply.mp.seller_platform",
"database.server.name": "cdc.fkw.supply.mp",
"heartbeat.interval.ms": "5000",
"database.port": "3306",
"table.whitelist": "seller_platform.Contacts, seller_platform.EmailVerificationConfigs, seller_platform.financial_account_tag, seller_platform.HolidayConfigs, seller_platform.Preferences, seller_platform.Sellers",
"database.hostname": "something.cloud.in",
"database.password": "ABCDE",
"database.history.kafka.recovery.poll.interval.ms": "5000",
"name": "cdc.fkw.supply.mp.seller_platform.connector",
"database.history.skip.unparseable.ddl": "true",
"errors.tolerance": "all",
"database.whitelist": "seller_platform",
"snapshot.mode": "when_needed"
}
curl -s --location --request GET "http://10.24.18.167:80/connectors/cdc.fkw.supply.mp.seller_platform.connector/topics" | jq '.'
{
"cdc.fkw.supply.mp.seller_platform.connector": {
"topics": [
"cdc.fkw.supply.mp.seller_platform.Sellers",
"cdc.fkw.supply.mp",
"cdc.fkw.supply.mp.seller_platform.HolidayConfigs",
"cdc.fkw.supply.mp.seller_platform.EmailVerificationConfigs",
"cdc.fkw.supply.mp.seller_platform.Contacts",
"cdc.fkw.supply.mp.seller_platform.Preferences",
"__debezium-heartbeat.cdc.fkw.supply.mp",
"cdc.fkw.supply.mp.seller_platform.financial_account_tag"
]
}
}
Why do the cdc.fkw.supply.mp and __debezium-heartbeat.cdc.fkw.supply.mp topics get created?
I see some garbage data inside these 2 topics.
Q2)
Is there a REST API that returns the Kafka Connect converter configuration of the worker?
If there is no such API, what is the path of the configuration file where all the worker properties are stored?
This is the reference for the worker properties:
https://docs.confluent.io/platform/current/connect/references/allconfigs.html
curl -s --location --request GET "http://10.24.18.167:80"
{"version":"6.1.1-ccs","commit":"c209f70c6c2e52ae","kafka_cluster_id":"snBlf-kfTdCYWEO9IIEXTA"}
A1)
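The heartbeat topic (__debezium-heartbeat.cdc.fkw.supply.mp) is created because heartbeat.interval.ms is set in the config: the connector periodically writes heartbeat messages to it, so the records you see there are heartbeats, not table data.
The topic named after the database.server.name value (cdc.fkw.supply.mp) is the schema change topic. It stores any schema changes that take place in the database.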
https://debezium.io/documentation/reference/1.7/connectors/mysql.html#mysql-schema-change-topic
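To make the naming concrete, here is a small Python sketch (not part of Debezium) that sorts topic names from Q1 into the kinds discussed above. It assumes the default heartbeat prefix __debezium-heartbeat and the <server name>.<database>.<table> naming visible in the topic listing; the function name classify_topic is made up for illustration.

```python
# Illustrative helper, not a Debezium API: classify the topics a Debezium
# MySQL connector writes to, based only on their names.
HEARTBEAT_PREFIX = "__debezium-heartbeat"  # assumed default of heartbeat.topics.prefix


def classify_topic(topic: str, server_name: str) -> str:
    """Return 'heartbeat', 'schema-change', 'table' or 'other' for a topic name."""
    if topic == f"{HEARTBEAT_PREFIX}.{server_name}":
        return "heartbeat"
    if topic == server_name:
        return "schema-change"
    if topic.startswith(server_name + "."):
        # What remains after the server name should be <database>.<table>.
        rest = topic[len(server_name) + 1:]
        if rest.count(".") == 1:
            return "table"
    return "other"


server = "cdc.fkw.supply.mp"  # database.server.name from the question
topics = [
    "cdc.fkw.supply.mp.seller_platform.Sellers",
    "cdc.fkw.supply.mp",
    "__debezium-heartbeat.cdc.fkw.supply.mp",
]
for t in topics:
    print(t, "->", classify_topic(t, server))
```

Only the topics classified as table topics carry row-level change events; the other two are bookkeeping topics of the connector.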

Debezium should only read new changes

Even though I'm using snapshot.mode: schema_only, I'm getting all the existing records of the database, whereas I only want the new changes. Is there anything else I should modify?
This is the config of my source connector:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" \
localhost:8083/connectors/ \
-d '{ "name": "inventory-connector",
"config": { "connector.class":"io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"snapshot.mode":"schema_only",
"database.hostname": "----",
"database.port": "3306",
"database.user": "---",
"database.password": "----",
"database.server.id": "1",
"database.server.name": "data",
"database.whitelist": "---",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.---"
}
}'

How to replicate all changes from source to destination db using debezium and confluent-sink-connector running on docker

Below is my Dockerfile for Kafka Connect with the JDBC connector and the MySQL driver
FROM debezium/connect:1.3
ENV KAFKA_CONNECT_JDBC_DIR=$KAFKA_CONNECT_PLUGINS_DIR/kafka-connect-jdbc
ENV MYSQL_DRIVER_VERSION 8.0.20
ARG KAFKA_JDBC_VERSION=5.5.0
RUN curl -k -SL "https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-${MYSQL_DRIVER_VERSION}.tar.gz" \
| tar -xzf - -C /kafka/libs --strip-components=1 mysql-connector-java-8.0.20/mysql-connector-java-${MYSQL_DRIVER_VERSION}.jar
RUN mkdir $KAFKA_CONNECT_JDBC_DIR && cd $KAFKA_CONNECT_JDBC_DIR &&\
curl -sO https://packages.confluent.io/maven/io/confluent/kafka-connect-jdbc/$KAFKA_JDBC_VERSION/kafka-connect-jdbc-$KAFKA_JDBC_VERSION.jar
docker build . --tag kafka kafka-connect-sink
Below is my source db json
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": "184054",
"database.server.name": "dbserver1",
"database.include.list": "inventory",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.inventory"
}
}'
Below is my destination db sink json
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://192.168.0.104:3306/pk?useSSL=false",
"connection.user": "pavan",
"connection.password": "root",
"topics": "dbserver1.inventory.customers",
"table.name.format": "pk.customers",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_key",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite"
}
}'
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector-sink-addresses",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://192.168.0.104:3306/pk?useSSL=false",
"connection.user": "pavan",
"connection.password": "root",
"topics": "dbserver1.inventory.addresses",
"table.name.format": "pk.addresses",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_key",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite"
}
}'
With this configuration I need to subscribe to each topic separately, but the problem is that I have 100+ tables to replicate into the destination db. Is there any way to do it in a single JSON configuration so that I can subscribe to all topics?
You can use the topics (or topics.regex) property to define the list of topics to consume, and either the table.name.format property of the JDBC Sink connector or the RegexRouter SMT (or both combined) to override the destination table names:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.99.102:8083/connectors/ -d '{
"name": "inventory-connector-sink-addresses",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://192.168.0.104:3306/pk?useSSL=false",
"connection.user": "pavan",
"connection.password": "root",
"topics": "dbserver1.inventory.addresses,dbserver1.inventory.customers",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "",
"pk.mode": "record_key",
"transforms": "route,unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "pk.$3"
}
}'
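To see what the route transform in this config does to topic names, the following Python sketch mimics it outside Kafka Connect. It is only an approximation: RegexRouter uses Java regular expressions, and here the same pattern is applied with Python's re module, with the Java replacement pk.$3 written as pk.\3. The function route and the topics.regex value are made up for illustration and are not taken from the answer.

```python
import re

# Same pattern as transforms.route.regex (after JSON unescaping).
ROUTE_REGEX = re.compile(r"([^.]+)\.([^.]+)\.([^.]+)")
# Java's "pk.$3" corresponds to r"pk.\3" in Python.
ROUTE_REPLACEMENT = r"pk.\3"

# Hypothetical topics.regex value that would select every table topic of
# the inventory database instead of listing the topics one by one.
TOPICS_REGEX = re.compile(r"dbserver1\.inventory\..*")


def route(topic: str) -> str:
    """Rename the topic if the whole name matches, else leave it unchanged."""
    match = ROUTE_REGEX.fullmatch(topic)
    if match is None:
        return topic
    return match.expand(ROUTE_REPLACEMENT)


for topic in ["dbserver1.inventory.customers", "dbserver1.inventory.addresses", "dbserver1"]:
    selected = TOPICS_REGEX.fullmatch(topic) is not None
    print(f"{topic}: selected={selected}, routed to {route(topic)}")
```

Each three-part topic name is rewritten to pk.<table>, which is why a single sink connector can write to many tables. Note that a sink connector accepts either topics or topics.regex, not both at once.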

Debezium Connector Outbox Transform

I am trying to use a MySQL source connector with the Outbox SMT supported by Debezium, with the following config. I am using the latest jars of debezium-core and debezium-mysql-connector (1.1).
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
"name": "debezium-mysql-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "MySql",
"database.port": "3306",
"database.user": "**",
"database.password": "**",
"database.server.id": "1033113244",
"database.server.name": "anomaly-changelog",
"database.whitelist": "anomaly",
"database.history.kafka.bootstrap.servers": "Kafka:9092",
"database.history.kafka.topic": "anomaly.schema.history",
"transforms": "outbox,reroute",
"transforms.reroute.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.reroute.regex": "(.*)",
"transforms.reroute.replacement": "$1-SMT",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter"
}
}'
But I am still getting the following error:
{"error_code":400,"message":"Connector configuration is invalid and contains the following 2 error(s):\nInvalid value io.debezium.transforms.outbox.EventRouter for configuration transforms.outbox.type: Class io.debezium.transforms.outbox.EventRouter could not be found.\nInvalid value null for configuration transforms.outbox.type: Not a Transformation}
I don't see why it is not being recognized.
You can try something like this:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" demo:8083/connectors/ -d '{ "name": "order-connector", "config": { "connector.class": "io.debezium.connector.mysql.MySqlConnector", "tasks.max": "1", "database.hostname": "mariadb_order", "database.port": "3306", "database.user": "root", "database.password": "***", "database.server.id": "223344", "database.server.name": "orderdbserver","table.whitelist": "orderdb.outbox", "transforms": "outbox", "transforms.outbox.type" :"io.debezium.transforms.outbox.EventRouter", "database.history.kafka.bootstrap.servers": "kafka:9092", "database.history.kafka.topic":"dbhistory.orderdb", "transforms.outbox.table.fields.additional.placement" : "aggregateid:envelope:id" } }'

Kafka Connect JDBC source: loading millions of records from MSSQL Server throws out of memory error

I tried to load 77 million records from MSSQL Server into a Kafka topic through the Kafka Connect JDBC source connector.
I tried a batch approach with batch.max.rows set to 1000. In this case, after 1000 records, it runs out of memory. Please share suggestions on how to make it work.
Below are the connector configurations I tried:
curl -X POST http://test.com:8083/connectors -H "Content-Type: application/json" -d '{
"name": "mssql_jdbc_rsitem_pollx",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:sqlserver://test:1433;databaseName=xxx",
"connection.user": "xxxx",
"connection.password": "xxxx",
"topic.prefix": "mssql-rsitem_pollx-",
"mode":"incrementing",
"table.whitelist" : "test",
"timestamp.column.name": "itemid",
"max.poll.records" :"100",
"max.poll.interval.ms":"3000",
"validate.non.null": false
}
}'
curl -X POST http://test.com:8083/connectors -H "Content-Type: application/json" -d '{
"name": "mssql_jdbc_test_polly",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": "10",
"connection.url": "jdbc:sqlserver://test:1433;databaseName=xxx;defaultFetchSize=10000;useCursorFetch=true",
"connection.user": "xxxx",
"connection.password": "xxxx",
"topic.prefix": "mssql-rsitem_polly-",
"mode":"incrementing",
"table.whitelist" : "test",
"timestamp.column.name": "itemid",
"poll.interval.ms":"86400000",
"validate.non.null": false
}
}'
Try increasing the Java heap size. Run this on the command line before starting the Connect worker:
export KAFKA_HEAP_OPTS="-Xms1g -Xmx2g"
You can change the -Xmx2g part to match the memory available on your machine.