CDC PostgreSQL with Debezium while ignoring columns

I have a PostgreSQL table with 10 columns. We need to enable CDC on this table to capture changes in only ONE of the columns, ignoring the other nine. Our Debezium connector has the following configuration:
{
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.history.file.filename": "/data/postgresql-d-connection-teste3-history.dat",
"database.user": "postgres",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"database.dbname": "postgres",
"max.queue.size": "81290",
"tasks.max": "1",
"transforms": "Reroute",
"database.server.name": "xxxx",
"offset.flush.timeout.ms": "60000",
"transforms.Reroute.topic.regex": "(.*)",
"buffer.memory": "2048",
"database.port": "5432",
"plugin.name": "wal2json",
"offset.flush.interval.ms": "10000",
"tombstones.on.delete": "false",
"transforms.Reroute.topic.replacement": "teste3",
"decimal.handling.mode": "string",
"database.hostname": "xxxx",
"database.password": "xxxx",
"name": "postgresql-d-connection-teste3",
"table.include.list": "public.test",
"max.batch.size": "20480",
"database.history": "io.debezium.relational.history.FileDatabaseHistory"
}
We have also already tried setting these parameters in the JSON, without success:
"column.include.list": "public.test.{id|name}"
"column.exclude.list": "public.test.{id|name}"
I have the same scenario working fine with MS SQL Server, where I only had to execute this query:
EXEC sys.sp_cdc_enable_table
@source_schema = N'xxxx',
@source_name = N'xxxx',
@captured_column_list = N'col1, col2, col3',
@supports_net_changes = 0
Is it possible to achieve the same goal with PostgreSQL?
Thanks in advance

Related

Debezium Connector - read from the beginning and stop a working connector

I am trying to use Debezium to connect to my Postgres database. I would like to copy data from a specific table. With this configuration I only capture the newest data. Should I just change the snapshot.mode?
"name": "prod-contact-connect",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.user": "user",
"database.dbname": "db_name",
"slot.name": "debezium_contact",
"tasks.max": "1",
"database.history.kafka.bootstrap.servers": "localhost:9092",
"publication.name": "dbz_publication",
"transforms": "unwrap",
"database.server.name": "connect.prod.contact",
"database.port": "5432",
"plugin.name": "pgoutput",
"table.whitelist": "specific_table_name",
"database.sslmode": "disable",
"database.hostname": "localhost",
"database.password": "pass",
"name": "prod-contact-connect",
"transforms.unwrap.add.fields": "op,table,schema,name",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"snapshot.mode": "never"
}
}
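If the goal is to also copy the rows that already exist in the table, a minimal sketch, assuming this connector has not yet committed offsets (otherwise the offsets would need to be cleared or the connector re-registered under a new name), is to change the snapshot mode:
"snapshot.mode": "initial"
Using "always" instead would re-run the snapshot on every connector restart.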
By the way, how can I stop the Debezium connector for a moment? Is there some enable flag?
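As far as I know there is no enable flag inside the connector config itself, but the Kafka Connect REST API exposes pause and resume endpoints. A sketch, assuming the worker listens on localhost:8083:
curl -X PUT http://localhost:8083/connectors/prod-contact-connect/pause
curl -X PUT http://localhost:8083/connectors/prod-contact-connect/resume
A paused connector keeps its offsets and continues where it left off after resume.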

Debezium stops after initial sync

The initial sync works as expected, but then the connector just stops and does not pick up further table changes. No errors are thrown, and the connector is still marked as active and running.
Database: Amazon Postgres v10.7
Debezium config:
"name": "postgres_cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "...",
"database.port": "5432",
"database.user": "...",
"database.password": "...",
"database.dbname": "...",
"database.server.name": "...",
"table.whitelist": "public.table1,public.table2,public.table3",
"plugin.name": "pgoutput",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"transforms": "unwrap, route, extractId",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": false,
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "[^.]+\\.[^.]+\\.(.+)",
"transforms.route.replacement": "postgres_$1",
"transforms.extractId.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractId.field": "id"
}
}
Any thoughts about what the problem could be?
Edit:
Log-Errors:
ERROR WorkerSourceTask{id=postgres_cdc-0} Failed to flush, timed out while waiting for producer to flush outstanding 75687 messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
ERROR WorkerSourceTask{id=postgres_cdc-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
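For reference, one mitigation that is sometimes suggested for this flush timeout, assuming the Connect worker is still on its defaults, is to give the worker more time to flush offsets and the embedded producer more buffer. These go into the worker properties (e.g. connect-distributed.properties), and the values below are illustrative only:
# default is 5000 ms; allow more time to flush a large backlog of records
offset.flush.timeout.ms=60000
# producer.* settings are passed through to the producers used by source tasks
producer.buffer.memory=67108864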

Adding new Debezium Connector to Apache Kafka restarts snapshot

I'm using debezium/kafka (v1.0), which uses Apache Kafka 2.4.
In addition, I have deployed a Debezium MySQL connector that monitors some tables and is configured to take a snapshot at the beginning; at this point all is good.
After a while I needed to start monitoring other tables, so I created another connector, this time without a snapshot because it is not needed.
This causes the first connector to start taking its snapshot again.
Is this expected behavior?
What is the procedure for starting to monitor new tables without the other connector taking its snapshot again?
Thanks in advance.
EDIT: configs added.
First connector:
{
"name": "orion_connector_con_snapshot_prod_v1",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "my_host",
"database.port": "3306",
"database.user": "my_db",
"database.password": "*********************",
"database.server.name": "orion_kafka",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "history_orion",
"database.history.skip.unparseable.ddl": "true",
"database.history.store.only.monitored.tables.ddl": "true",
"table.whitelist": "my_db.my_table_1,my_db.my_table_2,my_db.my_table_3",
"snapshot.mode": "when_needed",
"snapshot.locking.mode": "none"
}
The second connector, which starts the problem:
{
"name": "nexo_impactos_connector_sin_snapshot_v1",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "my_host",
"database.port": "3306",
"database.user": "my_db",
"database.password": "*********************",
"database.server.name": "nexo_kafka",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "nexo_impactos",
"database.history.skip.unparseable.ddl": "true",
"database.history.store.only.monitored.tables.ddl": "true",
"table.whitelist": "other_db.other_table",
"snapshot.mode": "schema_only",
"snapshot.locking.mode": "none"
}

Multiple replication slots for Debezium connectors

I want to create multiple Debezium connectors, each with a different replication slot, but I am unable to create multiple replication slots for the Postgres Debezium connector.
I am using Docker containers for Postgres and Kafka. I tried setting max_replication_slots = 2 in the postgresql.conf file and also a different slot.name for each connector, but it still did not create two replication slots for me.
{
"config": {
"batch.size": "49152",
"buffer.memory": "100663296",
"compression.type": "lz4",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "Db1",
"database.hostname": "DBhost",
"database.password": "dbpwd",
"database.port": "5432",
"database.server.name": "serve_name",
"database.user": "usename",
"decimal.handling.mode": "double",
"hstore.handling.mode": "json",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "debezium-702771",
"plugin.name": "wal2json",
"schema.refresh.mode": "columns_diff_exclude_unchanged_toast",
"slot.drop_on_stop": "true",
"slot.name": "debezium1",
"table.whitelist": "tabel1",
"time.precision.mode": "adaptive_time_microseconds",
"transforms": "Reroute",
"transforms.Reroute.topic.regex": "(.*).public.(.*)",
"transforms.Reroute.topic.replacement": "$1.$2",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
},
"name": "debezium-702771",
"tasks": [],
"type": "source"
}
{
"config": {
"batch.size": "49152",
"buffer.memory": "100663296",
"compression.type": "lz4",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "Db1",
"database.hostname": "DBhost",
"database.password": "dbpwd",
"database.port": "5432",
"database.server.name": "serve_name",
"database.user": "usename",
"decimal.handling.mode": "double",
"hstore.handling.mode": "json",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "debezium-702772",
"plugin.name": "wal2json",
"schema.refresh.mode": "columns_diff_exclude_unchanged_toast",
"slot.drop_on_stop": "true",
"slot.name": "debezium2",
"table.whitelist": "tabel1",
"time.precision.mode": "adaptive_time_microseconds",
"transforms": "Reroute",
"transforms.Reroute.topic.regex": "(.*).public.(.*)",
"transforms.Reroute.topic.replacement": "$1.$2",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
},
"name": "debezium-702772",
"tasks": [],
"type": "source"
}
It creates multiple connectors but not multiple replication slots, even after giving each connector a different slot name. Do I need to do anything else here?
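A sketch of how to check what Postgres actually sees, assuming you can run psql inside the database container; note that a change to max_replication_slots only takes effect after the Postgres server is restarted:
-- list existing replication slots and whether they are active
SELECT slot_name, plugin, active FROM pg_replication_slots;
-- confirm the value the running server is actually using
SHOW max_replication_slots;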

Kafka Connect jdbc-sink syntax error at end of input

I have an issue with a jdbc-sink using this architecture:
postgres1 ---> kafka ---> postgres2
The producer side works fine, but the consumer (sink) has an error:
connect_1 | org.apache.kafka.connect.errors.RetriableException: java.sql.SQLException: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO "customers" ("id") VALUES (1) ON CONFLICT ("id") DO UPDATE SET was aborted: ERROR: syntax error at end of input
connect_1 |   Position: 77  Call getNextException to see other errors in the batch.
This is my source.json:
{
"name": "src-table",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "postgres1_container",
"database.port": "5432",
"database.user": "postgres",
"database.password": "postgres",
"database.dbname": "postgres",
"database.whitelist": "postgres",
"database.server.name": "postgres1",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.inventory",
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$3"
}
}
And this is my jdbc-sink.json:
{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "customers",
"connection.url": "jdbc:postgresql://postgres2_container:5432/postgres?user=postgres&password=postgres",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"auto.create": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value"
}
}
debezium/zookeeper : 0.9
debezium/kafka:0.9
debezium/postgres:9.6
debezium/connect:0.9
PostgreSQL JDBC Driver 42.2.5
Kafka Connect JDBC 5.2.1
I tried downgrading the JDBC driver and the Confluent Kafka Connect version, but I still get the same error.
Solved: the problem was that when I created the table in postgres1, I did not set the id column as a primary key.
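For anyone hitting the same thing, a sketch of that fix on the source table in postgres1, assuming the existing id values are already unique and non-null:
-- the fix described above: define id as the primary key on the source table
ALTER TABLE customers ADD PRIMARY KEY (id);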
Same issue here.
I think this is an issue in the JDBC sink connector: when the table has only primary-key columns and no other columns, there is nothing to update, so the generated statement is syntactically wrong because it always expects a column to update after the ON CONFLICT clause.
One workaround is to add additional columns to that table; of course this is not a real solution, just a quick and dirty workaround (see the sketch after this answer).
Another option is to upgrade the JDBC connector; I tested the same setup with kafka-connect-jdbc-10.4.0 and the issue seems to no longer be present.
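A sketch of the quick-and-dirty workaround mentioned above, assuming a Postgres source table and a hypothetical column name:
-- add a non-key column so the generated ON CONFLICT ... DO UPDATE has something to set
ALTER TABLE customers ADD COLUMN last_modified timestamp;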