What property to use in kafka mysql source connector to register a new version for any schema change - apache-kafka

This is the configuration of schema-registry.properties:
listeners=http://10.X.X.76:8081
kafkastore.bootstrap.servers=PLAINTEXT://10.XXX:9092,PLAINTEXT://10.XXX:9092,PLAINTEXT://10.XXXX.1:9092,PLAINTEXT://1XXXX.69:9092
kafkastore.topic=_schemas
debug=false
master.eligibility=true
This is the configuration of my connector:
{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"snapshot.locking.mode": "minimal",
"database.user": "cdc_user",
"tasks.max": "3",
"database.history.kafka.bootstrap.servers": "10.49.115.249:9092,10.48.130.211:9092,10.54.178.121:9092,10.53.4.69:9092",
"database.history.kafka.topic": "history.cdc.fkw.supply.mp.seller_facility",
"database.server.name": "cdc.fkw.supply.mp",
"heartbeat.interval.ms": "5000",
"database.port": "3306",
"table.whitelist": "seller_facility.addresses, seller_facility.location, seller_facility.default_location, seller_facility.location_document_mapping",
"database.hostname": "dog-rr.ffb-supply-ffb-supply-mp.prod.altair.fkcloud.in",
"database.password": "6X5DpJrVzI",
"database.history.kafka.recovery.poll.interval.ms": "5000",
"name": "cdc.fkw.supply.mp.seller_facility.connector",
"database.history.skip.unparseable.ddl": "true",
"errors.tolerance": "all",
"database.whitelist": "seller_facility",
"snapshot.mode": "when_needed"
}
How do I register a new schema version when there is any change to the table schema?
What property can I add so that a new version is registered in the Schema Registry for that particular topic and remains fully compatible?

Assuming your key.converter/value.converter use one of the Confluent converters, such as AvroConverter, any added or removed database columns will automatically be picked up by the Connect framework and registered with the Schema Registry as part of serialization in the KafkaAvroSerializer.
Changing database column types might generate errors, for example changing VARCHAR to INT.
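For example, a minimal converter setup on the connector might look like the following; the schema.registry.url value here is an assumption based on the listeners setting above:

"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://10.X.X.76:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://10.X.X.76:8081"

If you want to enforce full compatibility for a particular topic, you can set it on that topic's subject via the Schema Registry REST API (assuming the default TopicNameStrategy, where the subject is <topic>-value; the topic name below is only an illustration):

curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "FULL"}' \
  http://10.X.X.76:8081/config/cdc.fkw.supply.mp.seller_facility.addresses-value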

Related

How to add table to Debezium Postgres connector

What should be the steps for adding a new table to a Postgres connector? My connector is tracking 2 tables (table1 and table2) and I want to add another table (table3) that already exists and has data in my DB.
This is my current config:
{
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "db_name",
"database.hostname": "HOST_URL",
"database.password": "$PASSWORD",
"database.port": "5432",
"database.server.name": "db_data",
"database.sslmode": "require",
"database.user": "user_replication",
"plugin.name": "pgoutput",
"publication.autocreate.mode": "disabled",
"table.include.list": "public.table1, public.table2"
},
"name": "db-to-kafka-source"
}
I have tried to modify the connector's "table.include.list" and add "public.table3" to the list, but it doesn't seem to trigger the snapshot process for this table.
Any ideas?
No, the snapshot will not be triggered. This should be solved with ad-hoc snapshots planned for 1.6. In the meantime you can fire up a temporary connector just to execute the snapshot of the new table and then resume the original one.
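A rough sketch of such a temporary connector, reusing the connection settings from the question; the name, slot.name and snapshot.mode shown here are assumptions (initial_only takes the snapshot and then stops):

{
"name": "db-to-kafka-source-table3-snapshot",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "db_name",
"database.hostname": "HOST_URL",
"database.password": "$PASSWORD",
"database.port": "5432",
"database.server.name": "db_data",
"database.sslmode": "require",
"database.user": "user_replication",
"plugin.name": "pgoutput",
"publication.autocreate.mode": "disabled",
"slot.name": "temp_table3_snapshot_slot",
"snapshot.mode": "initial_only",
"table.include.list": "public.table3"
}
}

Once the snapshot finishes, delete this temporary connector (and drop its replication slot), then add public.table3 to the original connector's table.include.list.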

Debezium Connector - read from beginning and stop working connector

I am trying to use Debezium to connect to my Postgres database. I would like to copy the data from a specific table. With this configuration I only get the newest data. Should I just change the snapshot.mode?
{
"name": "prod-contact-connect",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.user": "user",
"database.dbname": "db_name",
"slot.name": "debezium_contact",
"tasks.max": "1",
"database.history.kafka.bootstrap.servers": "localhost:9092",
"publication.name": "dbz_publication",
"transforms": "unwrap",
"database.server.name": "connect.prod.contact",
"database.port": "5432",
"plugin.name": "pgoutput",
"table.whitelist": "specific_table_name",
"database.sslmode": "disable",
"database.hostname": "localhost",
"database.password": "pass",
"name": "prod-contact-connect",
"transforms.unwrap.add.fields": "op,table,schema,name",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"snapshot.mode": "never"
}
}
By the way, how can I stop the Debezium connector for a moment? Is there some enable flag?
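There is no enable flag in the connector configuration itself, but the Kafka Connect REST API can pause and resume a connector. A sketch, assuming the Connect worker listens on localhost:8083:

curl -X PUT http://localhost:8083/connectors/prod-contact-connect/pause
curl -X PUT http://localhost:8083/connectors/prod-contact-connect/resume

As for reading existing rows: with snapshot.mode set to never no initial snapshot is taken, so you would need something like initial; note that Debezium only snapshots when the connector has no stored offsets, so changing the mode on an already-running connector usually also means registering it under a new name or clearing its offsets.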

Kafka-confluent: How to use pk.mode=record_key for upsert and delete mode in JDBC sink connector?

In Confluent Kafka, how can we use upsert with a CSV file as the source while using pk.mode=record_key for a composite key in the MySQL table? Upsert mode works when using pk.mode=record_value. Is there any additional configuration that needs to be done?
I am getting this error if I try pk.mode=record_key: Caused by: org.apache.kafka.connect.errors.ConnectException: Need exactly one PK column defined since the key schema for records is a primitive type.
Below is my JDBC sink connector configuration:
{
"name": "<name>",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "<topic name>",
"connection.url": "<url>",
"connection.user": "<user name>",
"connection.password": "*******",
"insert.mode": "upsert",
"batch.size": "50000",
"table.name.format": "<table name>",
"pk.mode": "record_key",
"pk.fields": "field1,field2",
"auto.create": "true",
"auto.evolve": "true",
"max.retries": "10",
"retry.backoff.ms": "3000",
"mode": "bulk",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://localhost:8081"
}
}
You need to use a pk.mode of record_value.
This means: take field(s) from the value of the message and use them as the primary key in the target table and for UPSERT purposes.
If you set record_key it will try to take the key field(s) from the Kafka message key. Unless you've actually got the values in your message key, this is not the setting that you want to use.
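A minimal sketch of the relevant part of the sink configuration (field1 and field2 are assumed to exist in the message value):

"pk.mode": "record_value",
"pk.fields": "field1,field2",
"insert.mode": "upsert"

The error in the question also makes sense in this light: with key.converter set to StringConverter the record key is a primitive, so record_key can only ever map to a single primary key column.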
These might help you further:
Kafka Connect JDBC Sink deep-dive: Working with Primary Keys
📹https://rmoff.dev/kafka-jdbc-video
📹https://rmoff.dev/ksqldb-jdbc-sink-video

Adding new Debezium Connector to Apache Kafka restarts snapshot

I'm using debezium/kafka (v1.0), which uses Apache Kafka 2.4.
In addition, I have deployed a Debezium MySQL connector that monitors some tables and is configured to take a snapshot at the beginning; up to this point all is good.
After some time I needed to start monitoring other tables, so I created another connector, this time without a snapshot because it is not needed.
This causes the first connector to start taking the snapshot again.
Is this expected behavior?
What is the procedure for starting to monitor new tables without the other connector taking the snapshot again?
Thanks in advance.
EDIT: configs added:
First Connector
{
"name": "orion_connector_con_snapshot_prod_v1",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "my_host",
"database.port": "3306",
"database.user": "my_db",
"database.password": "*********************",
"database.server.name": "orion_kafka",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "history_orion",
"database.history.skip.unparseable.ddl": "true",
"database.history.store.only.monitored.tables.ddl": "true",
"table.whitelist": "my_db.my_table_1,my_db.my_table_2,my_db.my_table_3",
"snapshot.mode": "when_needed",
"snapshot.locking.mode": "none"
}
The second connector, which started the problem:
{
"name": "nexo_impactos_connector_sin_snapshot_v1",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "my_host",
"database.port": "3306",
"database.user": "my_db",
"database.password": "*********************",
"database.server.name": "nexo_kafka",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "nexo_impactos",
"database.history.skip.unparseable.ddl": "true",
"database.history.store.only.monitored.tables.ddl": "true",
"table.whitelist": "other_db.other_table",
"snapshot.mode": "schema_only",
"snapshot.locking.mode": "none"
}

kafka connector jdbc-sink syntax error at the end

I have an issue with jdbc-sink in this architecture:
postgres1 ---> kafka ---> postgres2
The producer is working fine, but the consumer has an error:
connect_1 | org.apache.kafka.connect.errors.RetriableException: java.sql.SQLException: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO "customers" ("id") VALUES (1) ON CONFLICT ("id") DO UPDATE SET was aborted: ERROR: syntax error at end of input
connect_1 |   Position: 77  Call getNextException to see other errors in the batch.
This is my source.json:
{
"name": "src-table",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "postgres1_container",
"database.port": "5432",
"database.user": "postgres",
"database.password": "postgres",
"database.dbname": "postgres",
"database.whitelist": "postgres",
"database.server.name": "postgres1",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.inventory",
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$3"
}
}
And this is my jdbc-sink.json:
{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "customers",
"connection.url": "jdbc:postgresql://postgres2_container:5432/postgres?user=postgres&password=postgres",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"auto.create": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value"
}
}
debezium/zookeeper : 0.9
debezium/kafka:0.9
debezium/postgres:9.6
debezium/connect:0.9
PostgreSQL JDBC Driver 42.2.5
Kafka Connect JDBC 5.2.1
I tried to downgrade the JDBC driver and Confluent Kafka Connect but still have the same error.
Solved: the problem was that when I created the table in postgres1, I did not set the id as a PK value.
Same issue here.
I think this is an issue in the JDBC connector: when the table has only primary key columns and no other columns there is nothing to update, so the statement syntax is wrong because it always expects a column to update after the ON CONFLICT clause.
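As an illustration (the second statement uses a hypothetical extra column, not one from the original post): with a key-only table the generated upsert has nothing after SET, whereas a non-key column gives it something to update:

-- key-only table: invalid, nothing follows DO UPDATE SET
INSERT INTO "customers" ("id") VALUES (1) ON CONFLICT ("id") DO UPDATE SET

-- table with an extra non-key column: valid
INSERT INTO "customers" ("id", "name") VALUES (1, 'a') ON CONFLICT ("id") DO UPDATE SET "name" = EXCLUDED."name";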
One solution could be to add additional columns to that table; of course this is not a real solution but a quick and dirty workaround.
Another solution is to upgrade the JDBC connector; I tested the same scenario with kafka-connect-jdbc-10.4.0 and this issue seems to no longer be present.