Kafka Connect jdbc-sink: syntax error at end of input - PostgreSQL

I have an issue with the jdbc-sink connector in this architecture:
postgres1 ---> kafka ---> postgres2
The producer (source) side works fine, but the consumer (sink) side fails with this error:
connect_1 | org.apache.kafka.connect.errors.RetriableException:
java.sql.SQLException: java.sql.BatchUpdateException: Batch entry 0
INSERT INTO "customers" ("id") VALUES (1) ON CONFLICT ("id") DO UPDATE
SET was aborted: ERROR: syntax error at end of input connect_1 |
Position: 77 Call getNextException to see other errors in the batch.
This is my source.json:
{
"name": "src-table",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "postgres1_container",
"database.port": "5432",
"database.user": "postgres",
"database.password": "postgres",
"database.dbname": "postgres",
"database.whitelist": "postgres",
"database.server.name": "postgres1",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.inventory",
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$3"
}
}
And this is my jdbc-sink.json:
{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "customers",
"connection.url": "jdbc:postgresql://postgres2_container:5432/postgres?user=postgres&password=postgres",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"auto.create": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value"
}
}
debezium/zookeeper:0.9
debezium/kafka:0.9
debezium/postgres:9.6
debezium/connect:0.9
PostgreSQL JDBC Driver 42.2.5
Kafka Connect JDBC 5.2.1
I tried downgrading the JDBC driver and the Confluent Kafka Connect JDBC plugin, but I still get the same error.

Solved: the problem was that when I created the table in postgres1, I did not set id as a primary key.
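For anyone hitting the same thing, this is roughly what the fix looks like on the source side (a sketch; the extra name column is just an illustration, only id appears in the error above):
CREATE TABLE customers (
    id   integer PRIMARY KEY, -- without a primary key the sink's upsert has no usable key
    name text                 -- a non-key column gives DO UPDATE SET something to update
);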

I had the same issue.
I think this is a bug in the JDBC connector: when the table has only primary-key columns and no other columns, there is nothing to update, so the generated statement is syntactically invalid, because ON CONFLICT ... DO UPDATE SET always expects at least one column to update.
One option is to add additional columns to the table; of course this is not a real solution, just a quick and dirty workaround.
Another solution is to upgrade the JDBC connector. I tested the same setup with kafka-connect-jdbc-10.4.0 and the issue no longer appears.
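To make the failure concrete, this is the statement the connector builds (taken from the error above) versus a hand-written equivalent that PostgreSQL accepts when there is nothing to update; newer connector versions handle this case themselves:
-- generated by kafka-connect-jdbc 5.2.1: invalid, nothing follows SET
INSERT INTO "customers" ("id") VALUES (1) ON CONFLICT ("id") DO UPDATE SET;
-- valid upsert when every column is part of the key
INSERT INTO "customers" ("id") VALUES (1) ON CONFLICT ("id") DO NOTHING;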

Related

CDC PostgreSQL with Debezium while ignoring columns

I have a PostgreSQL table with 10 columns. We need to enable CDC on this table to capture changes in only ONE of the columns, ignoring the other nine. Our Debezium connector has the following configuration:
{
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.history.file.filename": "/data/postgresql-d-connection-teste3-history.dat",
"database.user": "postgres",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"database.dbname": "postgres",
"max.queue.size": "81290",
"tasks.max": "1",
"transforms": "Reroute",
"database.server.name": "xxxx",
"offset.flush.timeout.ms": "60000",
"transforms.Reroute.topic.regex": "(.*)",
"buffer.memory": "2048",
"database.port": "5432",
"plugin.name": "wal2json",
"offset.flush.interval.ms": "10000",
"tombstones.on.delete": "false",
"transforms.Reroute.topic.replacement": "teste3",
"decimal.handling.mode": "string",
"database.hostname": "xxxx",
"database.password": "xxxx",
"name": "postgresql-d-connection-teste3",
"table.include.list": "public.test",
"max.batch.size": "20480",
"database.history": "io.debezium.relational.history.FileDatabaseHistory"
}
We also tried setting these parameters in the JSON, without success:
"column.include.list": "public.test.{id|name}"
"column.exclude.list": "public.test.{id|name}"
I have the same scenario working fine with MS SQL Server, where I only had to execute this query:
EXEC sys.sp_cdc_enable_table
@source_schema = N'xxxx',
@source_name = N'xxxx',
@captured_column_list = N'col1, col2, col3',
@supports_net_changes = 0
Is it possible to achieve the same goal with PostgreSQL?
Thanks in advance.

What property should I use in a Kafka MySQL source connector to register a new schema version for any schema change?

This is the configuration of schema-registry.properties:
listeners=http://10.X.X.76:8081
kafkastore.bootstrap.servers=PLAINTEXT://10.XXX:9092,PLAINTEXT://10.XXX:9092,PLAINTEXT://10.XXXX.1:9092,PLAINTEXT://1XXXX.69:9092
kafkastore.topic=_schemas
debug=false
master.eligibility=true
This is the configuration of my connector:
{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"snapshot.locking.mode": "minimal",
"database.user": "cdc_user",
"tasks.max": "3",
"database.history.kafka.bootstrap.servers": "10.49.115.249:9092,10.48.130.211:9092,10.54.178.121:9092,10.53.4.69:9092",
"database.history.kafka.topic": "history.cdc.fkw.supply.mp.seller_facility",
"database.server.name": "cdc.fkw.supply.mp",
"heartbeat.interval.ms": "5000",
"database.port": "3306",
"table.whitelist": "seller_facility.addresses, seller_facility.location, seller_facility.default_location, seller_facility.location_document_mapping",
"database.hostname": "dog-rr.ffb-supply-ffb-supply-mp.prod.altair.fkcloud.in",
"database.password": "6X5DpJrVzI",
"database.history.kafka.recovery.poll.interval.ms": "5000",
"name": "cdc.fkw.supply.mp.seller_facility.connector",
"database.history.skip.unparseable.ddl": "true",
"errors.tolerance": "all",
"database.whitelist": "seller_facility",
"snapshot.mode": "when_needed"
}
How do I register a new schema when the source schema changes?
What property can I add so that a new, fully compatible version is registered in the Schema Registry for that particular topic?
Assuming your key/value.converter settings use one of the Confluent converters, such as AvroConverter, any new or removed database columns will automatically be picked up by the Connect framework and registered with the Schema Registry as part of serialization in the KafkaAvroSerializer.
Changing database column types might generate errors, though, for example changing VARCHAR to INT.
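In practice that means pointing the connector's (or worker's) converters at the registry; a minimal sketch of the relevant properties, reusing the registry listener shown above:
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://10.X.X.76:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://10.X.X.76:8081"
Whether a new version is then accepted is governed by the subject's compatibility setting in the Schema Registry (BACKWARD by default), not by a connector property.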

Debezium stops after initial sync

The initial sync works as expected, but then the connector just stops and does not pick up further table changes. No errors are thrown and the connector is still marked as active and running.
Database: Amazon Postgres v10.7
Debezium config:
"name": "postgres_cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "...",
"database.port": "5432",
"database.user": "...",
"database.password": "...",
"database.dbname": "...",
"database.server.name": "...",
"table.whitelist": "public.table1,public.table2,public.table3",
"plugin.name": "pgoutput",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"transforms": "unwrap, route, extractId",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": false,
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "[^.]+\\.[^.]+\\.(.+)",
"transforms.route.replacement": "postgres_$1",
"transforms.extractId.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractId.field": "id"
}
}
Any thoughts about what the problem could be?
Edit:
Log errors:
ERROR WorkerSourceTask{id=postgres_cdc-0} Failed to flush, timed out while waiting for producer to flush outstanding 75687 messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
ERROR WorkerSourceTask{id=postgres_cdc-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
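Those two errors usually mean the Connect worker timed out waiting for the producer to flush the snapshot backlog, so the offset commit fails; the task can then appear "running" while making no progress. A sketch of the worker-level settings that are commonly raised in this situation (values are illustrative; these go in the Connect worker properties, not the connector JSON):
offset.flush.timeout.ms=60000
producer.buffer.memory=67108864
producer.batch.size=32768
producer.linger.ms=50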

Kafka-confluent: How to use pk.mode=record_key for upsert and delete mode in JDBC sink connector?

In Confluent Kafka, how can we use upsert with a CSV file as the source while using pk.mode=record_key for a composite key in the MySQL table? Upsert mode works when using pk.mode=record_value. Is there any additional configuration that needs to be done?
I get this error when trying pk.mode=record_key: Caused by: org.apache.kafka.connect.errors.ConnectException: Need exactly one PK column defined since the key schema for records is a primitive type.
Below is my JDBC sink connector configuration:
{
"name": "<name>",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "<topic name>",
"connection.url": "<url>",
"connection.user": "<user name>",
"connection.password": "*******",
"insert.mode": "upsert",
"batch.size": "50000",
"table.name.format": "<table name>",
"pk.mode": "record_key",
"pk.fields": "field1,field2",
"auto.create": "true",
"auto.evolve": "true",
"max.retries": "10",
"retry.backoff.ms": "3000",
"mode": "bulk",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://localhost:8081"
}
}
You need to use pk.mode of record_value.
This means: take field(s) from the value of the message and use them as the primary key in the target table and for UPSERT purposes.
If you set record_key it will try to take the key field(s) from the Kafka message key. Unless you've actually got the values in your message key, this is not the setting you want to use.
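Concretely, the relevant part of the sink config would change to something like this (a sketch; field1 and field2 are assumed to exist in the message value):
"pk.mode": "record_value",
"pk.fields": "field1,field2",
"insert.mode": "upsert"
If you really do want record_key with a composite key, the message key itself has to be a struct containing those fields (i.e. written with a schema-aware converter), not the plain string that StringConverter produces.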
These might help you further:
Kafka Connect JDBC Sink deep-dive: Working with Primary Keys
📹https://rmoff.dev/kafka-jdbc-video
📹https://rmoff.dev/ksqldb-jdbc-sink-video

Kafka Connect date handling of debezium generated events

I'm using the Debezium SQL Server connector to track changes on a production database.
The topic is created and CDC works like a charm, but when I try to use the JdbcSinkConnector to dump the data into another SQL Server database, I run into the following error.
com.microsoft.sqlserver.jdbc.SQLServerException: One or more values is out of range of values for the datetime2 SQL Server data type
On the source database the SQL data type is datetime2(7).
The kafka event is 1549461754650000000.
The schema type is INT64.
The schema name io.debezium.time.NanoTimestamp.
I can't find a way to tell the TimestampConverter that the value is expressed not in millis or micros but in nanoseconds (it would not work with microseconds anyway).
Here is my connector configuration:
{
"name": "cdc.swip.bi.ods.sink.contract",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "swip.swip_core.contract",
"connection.url": "jdbc:sqlserver://someip:1234;database=DB",
"connection.user": "loloolololo",
"connection.password": "muahahahahaha",
"dialect.name": "SqlServerDatabaseDialect",
"auto.create": "false",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schemas.enable": "true",
"key.converter.schema.registry.url": "http://localhost:8081",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://localhost:8081",
"transforms": "unwrap,created_date,modified_date",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"transforms.created_date.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.created_date.target.type": "Timestamp",
"transforms.created_date.field": "created_date",
"transforms.modified_date.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.modified_date.target.type": "Timestamp",
"transforms.modified_date.field": "modified_date",
"insert.mode": "insert",
"delete.enabled": "false",
"pk.fields": "id",
"pk.mode": "record_value",
"schema.registry.url": "http://localhost:8081",
"table.name.format": "ODS.swip.contract"
}
}
There is a missing feature in the SQL Server connector - DBZ-1419.
You can work around the problem by writing your own SMT that does the field conversion on the sink side, before the record is processed by the JDBC connector.
I forgot to post the answer.
The property "time.precision.mode": "connect" does the trick:
https://debezium.io/documentation/reference/connectors/sqlserver.html#sqlserver-property-time-precision-mode
{
"name":"debezium-connector-sqlserver",
"config": {
"connector.class":"io.debezium.connector.sqlserver.SqlServerConnector",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter.schemas.enable":"true",
"database.hostname":"someHost",
"database.port":"somePort",
"database.user":"someUser",
"database.password":"somePassword",
"database.dbname":"someDb",
"database.server.name":"xxx.xxx",
"database.history.kafka.topic":"xxx.xxx.history",
"time.precision.mode":"connect",
"database.history.kafka.bootstrap.servers":"example.com:9092"
}
}