Schema Changes Not Replicating with Kafka Connect - apache-kafka

The goal is: MySQL -> Kafka -> MySQL, so that the sink destination stays up to date with production.
Inserting and deleting records works fine, but I am having issues with schema changes, such as a dropped column: the changes are not replicating to the sink destination.
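For concreteness, the kind of source-side change that is not reaching the sink is an ordinary DDL statement run against the production database, for example (the column name is hypothetical):
ALTER TABLE companyHub.addresses DROP COLUMN fax_number;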
My source:
{
"name": "hub-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": "42",
"database.server.name": "study",
"database.include.list": "companyHub",
"database.history.kafka.bootstrap.servers": "broker:29092",
"database.history.kafka.topic": "dbhistory.study",
"include.schema.changes": "true"
}
}
My sink:
{
"name": "companyHub-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://172.18.141.102:3306/Hub",
"connection.user": "user",
"connection.password": "passaword",
"topics": "study.companyHub.countryTaxes, study.companyHub.addresses",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_key",
"transforms": "dropPrefix, unwrap",
"transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex":"study.companyHub.(.*)$",
"transforms.dropPrefix.replacement":"$1",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite"
}
}
Any help would be greatly appreciated!

Auto-creation and Auto-evolution
Tip
Make sure the JDBC user has the appropriate permissions for DDL.
If auto.create is enabled, the connector can CREATE the destination table if it is found to be missing. The creation takes place online with records being consumed from the topic, since the connector uses the record schema as a basis for the table definition. Primary keys are specified based on the key configuration settings.
If auto.evolve is enabled, the connector can perform limited auto-evolution by issuing ALTER on the destination table when it encounters a record for which a column is found to be missing. Since data-type changes and removal of columns can be dangerous, the connector does not attempt to perform such evolutions on the table. Addition of primary key constraints is also not attempted. In contrast, if auto.evolve is disabled no evolution is performed and the connector task fails with an error stating the missing columns.
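To make that concrete: when a record arrives with a new optional field, the connector may issue a statement roughly like the one below against the destination table (using the addresses table from the question; the column name and type are illustrative and depend on the record schema). It will never issue a DROP COLUMN or a data-type change on its own.
ALTER TABLE addresses ADD COLUMN some_new_field VARCHAR(255) NULL;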
For both auto-creation and auto-evolution, the nullability of a column is based on the optionality of the corresponding field in the schema, and default values are also specified based on the default value of the corresponding field if applicable. We use the following mapping from Connect schema types to database-specific types:
[Table omitted: mapping from Connect schema types to the database-specific column types for MySQL, Oracle, PostgreSQL, SQLite, SQL Server, and Vertica.]
Auto-creation or auto-evolution is not supported for databases not mentioned here.
Important
For backwards-compatible table schema evolution, new fields in record schemas must be optional or have a default value. If you need to delete a field, the table schema should be manually altered to either drop the corresponding column, assign it a default value, or make it nullable.
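Applied to the original question: a column dropped from the production companyHub database has to be dealt with manually on the sink side, since the JDBC sink will not propagate the drop. A sketch against the sink's Hub database, reusing the hypothetical fax_number column from above:
-- either drop the column on the sink as well...
ALTER TABLE Hub.addresses DROP COLUMN fax_number;
-- ...or keep it, but make it nullable so records that no longer carry the field still upsert cleanly
ALTER TABLE Hub.addresses MODIFY COLUMN fax_number VARCHAR(255) NULL;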

Related

Some rows in the Postgres table can generate CDC while others cannot

I have a Postgres DB with CDC set up.
I deployed the Kafka Debezium connector 1.8.0.Final for a Postgres DB by
POST http://localhost:8083/connectors
with body:
{
"name": "postgres-kafkaconnector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "example.com",
"database.port": "5432",
"database.dbname": "my_db",
"database.user": "xxx",
"database.password": "xxx",
"database.server.name": "postgres_server",
"table.include.list": "public.products",
"plugin.name": "pgoutput"
}
}
I noticed something strange.
In the same table, updating some rows generates CDC events, but updating other rows does not.
The rows are very similar; only their id and identifier values differ.
-- Updating this row can generate CDC
UPDATE public.products
SET identifier = 'GET /api/accounts2'
WHERE id = '90c21719-ce41-4523-8ad1-ed6b21ecfaf1';
-- Updating this row cannot generate CDC
UPDATE public.products
SET identifier = 'GET /api/notworking/accounts2'
WHERE id = '22f5ebf3-9594-493d-8aa6-649d9fbcefd2';
I checked my Kafka Connect container log, and there is no error there either.
Any idea?
Found the issue! My Kafka connector postgres-kafkaconnector was initially pointing to one DB (stage1), and I then switched it to another DB (stage2) by updating
"database.hostname": "example.com",
"database.port": "5432",
"database.dbname": "my_db",
"database.user": "xxx",
"database.password": "xxx",
However, both deployments used the same configuration properties of the Kafka Connect worker I deployed at the very beginning:
config.storage.topic
offset.storage.topic
status.storage.topic
Since the connector with the different DB config shared the same Kafka Connect storage topics listed above, and the database table schemas are the same, things became a mess because both runs shared the same Kafka offsets.
One simple fix is to use different connector names when deploying against different DBs for testing, such as postgres-kafkaconnector-stage1 and postgres-kafkaconnector-stage2, to avoid the offset mix-up.
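A minimal sketch of that approach, reusing the connector body from the question; the stage2 hostname and the server-name suffix are hypothetical, and giving each stage its own database.server.name also keeps their topics from colliding:
POST http://localhost:8083/connectors
with body:
{
"name": "postgres-kafkaconnector-stage2",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "stage2.example.com",
"database.port": "5432",
"database.dbname": "my_db",
"database.user": "xxx",
"database.password": "xxx",
"database.server.name": "postgres_server_stage2",
"table.include.list": "public.products",
"plugin.name": "pgoutput"
}
}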

Kafka MongoDB Sink only one record from table

I am using ksqlDB, where I have created a table from a stream. When I run a SELECT query on that table it returns all the records properly. Now I want to sink that table into MongoDB. I am also able to create a sink from the Kafka table to MongoDB, but somehow only one record ends up in MongoDB, whereas the table has 100 records. Below is my MongoDB sink connector.
{
"name": "MongoSinkConnectorConnector_1",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"topics": "FEEDS",
"connection.uri": "mongodb://xxx:xxx#x.x.x.x:27017/",
"database": "xxx",
"max.num.retries": "1000000",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneTimestampsStrategy",
"value.projection.type": "allowlist",
"value.projection.list": "id",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"buffer.capacity": "20000",
"value.converter.schema.registry.url": "http://x.x.x.x:8081",
"key.converter.schemas.enable": "false",
"insert.mode": "upsert"
}
}
I am not able to understand the reason behind this. Any help is appreciated. Thank you.
You can set the "batch.size" property in the sink connector configuration, and you can also pick a better write model strategy; see the Mongo DB Source Connector official documentation at https://docs.confluent.io/cloud/current/connectors/cc-mongo-db-source.html.
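A sketch of how that might look merged into the sink config above. The "batch.size" property name is taken from the answer, and ReplaceOneBusinessKeyStrategy is just one of the alternative write model strategies shipped with the MongoDB sink connector; verify both against the documentation for your connector version:
"batch.size": "100",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy"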

Is it possible to sink kafka message generated by debezium to snowflake

I am using the debezium-ui repo to test the Debezium MySQL CDC feature. The messages stream into Kafka normally. The request body used to create the MySQL connector is as follows:
{
"name": "inventory-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "dbzui-db-mysql",
"database.port": "3306",
"database.user": "mysqluser",
"database.password": "mysql",
"database.server.id": "184054",
"database.server.name": "inventory-connector-mysql",
"database.include.list": "inventory",
"database.history.kafka.bootstrap.servers": "dbzui-kafka:9092",
"database.history.kafka.topic": "dbhistory.inventory"
}
}
Then I need to sink the Kafka messages into Snowflake, the data warehouse my team uses. I created a Snowflake sink connector to sink it; the request body is as follows:
{
"name": "kafka2-04",
"config": {
"connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
"tasks.max": 1,
"topics": "inventory-connector-mysql.inventory.orders",
"snowflake.topic2table.map": "inventory-connector-mysql.inventory.orders:tbl_orders",
"snowflake.url.name": "**.snowflakecomputing.com",
"snowflake.user.name": "kafka_connector_user_1",
"snowflake.private.key": "*******",
"snowflake.private.key.passphrase": "",
"snowflake.database.name": "kafka_db",
"snowflake.schema.name": "kafka_schema",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
"header.converter": "org.apache.kafka.connect.storage.SimpleHeaderConverter",
"value.converter.schemas.enable":"true"
}
}
But after it runs, the data that lands in Snowflake (see the "data in snowflake" screenshot) has a schema different from the MySQL table. Is my sink connector config incorrect, or is it impossible to sink Kafka data generated by Debezium with the SnowflakeSinkConnector?
This is default behavior in Snowflake and it is documented here:
Every Snowflake table loaded by the Kafka connector has a schema consisting of two VARIANT columns:
RECORD_CONTENT. This contains the Kafka message.
RECORD_METADATA. This contains metadata about the message, for example, the topic from which the message was read.
If Snowflake creates the table, then the table contains only these two columns. If the user creates the table for the Kafka Connector to add rows to, then the table can contain more than these two columns (any additional columns must allow NULL values because data from the connector does not include values for those columns).
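If relational-looking columns are needed on the Snowflake side, one common pattern is to leave the two-column landing table as-is and flatten RECORD_CONTENT with a view. A rough sketch, assuming the Debezium JSON envelope (schema/payload) lands intact in RECORD_CONTENT; the field paths and names are illustrative and should be adjusted to whatever actually arrives in the VARIANT:
CREATE OR REPLACE VIEW kafka_db.kafka_schema.tbl_orders_flat AS
SELECT
record_content:payload.after.order_number::NUMBER AS order_number,
record_content:payload.after.quantity::NUMBER AS quantity,
record_metadata:topic::STRING AS source_topic
FROM kafka_db.kafka_schema.tbl_orders;
Flattening with a view keeps the raw CDC payload available while still exposing columns shaped like the MySQL table.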

table.include.list configuration parameter not working in Debezium Postgres Connector

I am using Debezium Postgres connector to capture changes in postgres tables.
The documentation for this connector
https://debezium.io/documentation/reference/connectors/postgresql.html
mentions a configuration parameter
table.include.list
However, when I set the value of this parameter to 'config.abc', changes from both tables in the config schema (namely abc and def) are still getting streamed.
The reason I want to do this is that I want to create 2 separate connectors, one for each of the 2 tables, to split the load and get faster change data streaming.
Is this a known issue? Is there any way to overcome this?
The same problem here (Debezium Version: 1.1.0.Final and postgresql:11.13.0). Below is my configuration:
{
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"transforms.unwrap.delete.handling.mode": "rewrite",
"slot.name": "debezium_planning",
"transforms": "unwrap,extractInt",
"include.schema.changes": "false",
"decimal.handling.mode": "string",
"database.schema": "partsmaster",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.converters.LongConverter",
"database.user": "************",
"database.dbname": "smartng-db",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"database.server.name": "ASLog.m3d.smartng-db",
"database.port": "***********",
"plugin.name": "pgoutput",
"column.exclude.list": "create_date, create_user, change_date, change_user",
"transforms.extractInt.field": "planning_id",
"database.hostname": "************",
"database.password": "*************",
"name": "ASLog.m3d.source-postgres-smartng-planning",
"transforms.unwrap.add.fields": "op,table,lsn,source.ts_ms",
"table.include.list": "partsmaster.planning",
"snapshot.mode": "never"
}
A change in another table, partsmaster.basic, causes this connector to fail because the attribute planning_id is not available in partsmaster.basic.
I need a separate connector for each table. table.include.list is not working, and neither is table.exclude.list.
My last resort would be to use a separate Postgres publication for each connector. But I still hope to find an immediate configuration solution here, or to be pointed to what I am missing. In a previous version I used table.whitelist without problems.
Solution:
Create separate publications for each table:
CREATE PUBLICATION dbz_planning FOR TABLE partsmaster.planning;
Drop the previous replication slot:
select pg_drop_replication_slot('debezium_planning');
Change your connector configurations:
{
"slot.name": "dbz_planning",
"publication.name": "planning_publication"
}
et voilà!
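To confirm that each publication really covers only its one table, the Postgres catalog can be checked (using the publication name created above):
SELECT * FROM pg_publication_tables WHERE pubname = 'dbz_planning';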

How to use the Kafka Connect JDBC to source PostgreSQL with multiple schemas that contain tables with the same name?

I need to source data from a PostgreSQL database with ~2000 schemas. All schemas contain the same tables (it is a multi-tenant application).
The connector is configured as following:
{
"name": "postgres-source",
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"timestamp.column.name": "updated",
"incrementing.column.name": "id",
"connection.password": "********",
"tasks.max": "1",
"mode": "timestamp+incrementing",
"topic.prefix": "postgres-source-",
"connection.user": "*********",
"poll.interval.ms": "3600000",
"numeric.mapping": "best_fit",
"connection.url": "jdbc:postgresql://*******:5432/*******",
"table.whitelist": "table1",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"false"
}
With this configuration, I get this error:
"The connector uses the unqualified table name as the topic name and has detected duplicate unqualified table names. This could lead to mixed data types in the topic and downstream processing errors. To prevent such processing errors, the JDBC Source connector fails to start when it detects duplicate table name configurations"
Apparently the connector does not want to publish data from multiple tables with the same name to a single topic.
This does not matter to me; it could go to a single topic or to multiple topics (one for each schema).
As additional info, if I add:
"schema.pattern": "schema1"
to the config, the connector works and the data from the specified schema and table is copied.
Is there a way to copy multiple schemas that contain tables with the same name?
Thank you