Kafka sink connector: how to insert record_key into the table if record_value = null? - PostgreSQL

I have a topic where each record has only a record_key and the record_value is null. I need to write the fields from the record_key to the table through the sink connector, but I get an error. How do I insert the values from the record_key, ignoring the null record_value?
{
  "name": "ObjectForDelete.sink",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "transforms": [
    "uuid",
    "name"
  ],
  "topics": [
    "ObjectForDeletePG"
  ],
  "transforms.uuid.type": "org.apache.kafka.connect.transforms.InsertField$Key",
  "transforms.uuid.topic.field": "key.uuid",
  "transforms.name.type": "org.apache.kafka.connect.transforms.InsertField$Key",
  "transforms.name.topic.field": "key.name",
  "connection.url": "jdbc:postgresql://127.0.0.1:5432/ulk",
  "connection.user": "root",
  "connection.password": "****",
  "dialect.name": "PostgreSqlDatabaseDialect",
  "insert.mode": "insert",
  "table.name.format": "ObjectForDelete",
  "pk.mode": "none",
  "pk.fields": [],
  "db.timezone": "Europe/Kiev"
}
ERROR:
Caused by: org.apache.kafka.connect.errors.ConnectException: Sink connector 'ObjectForDelete.sink' is configured with 'delete.enabled=false' and 'pk.mode=kafka' and therefore requires records with a non-null Struct value and non-null Struct schema, but found record at (topic='ObjectForDeletePG',partition=0,offset=0,timestamp=1623758581060) with a null value and null value schema.
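As far as I understand, the error points at 'delete.enabled=false': the JDBC sink only accepts tombstone (null-value) records when delete handling is enabled, and in that mode it deletes the matching rows rather than inserting the key fields. A rough sketch of that combination (assuming the key fields are uuid and name) would be:
{
  ...
  "delete.enabled": "true",
  "pk.mode": "record_key",
  "pk.fields": "uuid,name",
  ...
}
That is not the insert I'm after, though.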

Related

Only Map objects supported in absence of schema for record conversion to BigQuery format

I'm streaming data from Postgres to Kafka to BigQuery. Most tables in PG have a primary key, so most tables/topics have an Avro key and value schema; these all go to BigQuery fine.
I do have a couple of tables that do not have a PK, and consequently have no Avro key schema.
When I create a sink connector for those tables, the connector errors with:
Caused by: com.wepay.kafka.connect.bigquery.exception.ConversionConnectException: Only Map objects supported in absence of schema for record conversion to BigQuery format.
If I remove the 'key.converter' config, I instead get a 'Top-level Kafka Connect schema must be of type 'struct'' error.
How do I handle this?
Here's the connector config for reference:
{
  "project": "staging",
  "defaultDataset": "data_lake",
  "keyfile": "<redacted>",
  "keySource": "JSON",
  "sanitizeTopics": "true",
  "kafkaKeyFieldName": "_kid",
  "autoCreateTables": "true",
  "allowNewBigQueryFields": "true",
  "upsertEnabled": "false",
  "bigQueryRetry": "5",
  "bigQueryRetryWait": "120000",
  "bigQueryPartitionDecorator": "false",
  "name": "hd-sink-bq",
  "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
  "tasks.max": "1",
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "<redacted>",
  "key.converter.basic.auth.credentials.source": "USER_INFO",
  "key.converter.schema.registry.basic.auth.user.info": "<redacted>",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "<redacted>",
  "value.converter.basic.auth.credentials.source": "USER_INFO",
  "value.converter.schema.registry.basic.auth.user.info": "<redacted>",
  "topics": "public.event_issues",
  "errors.tolerance": "all",
  "errors.log.include.messages": "true",
  "errors.deadletterqueue.topic.name": "connect.bq-sink.deadletter",
  "errors.deadletterqueue.topic.replication.factor": "1",
  "errors.deadletterqueue.context.headers.enable": "true",
  "transforms": "tombstoneHandler",
  "offset.flush.timeout.ms": "300000",
  "transforms.dropNullRecords.predicate": "isNullRecord",
  "transforms.dropNullRecords.type": "org.apache.kafka.connect.transforms.Filter",
  "transforms.tombstoneHandler.behavior": "drop_warn",
  "transforms.tombstoneHandler.type": "io.aiven.kafka.connect.transforms.TombstoneHandler"
}
In my case, I handled this by using a predicate, as follows:
{
  ...
  "predicates.isTombstone.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone",
  "predicates": "isTombstone",
  "transforms.x.predicate": "isTombstone",
  "transforms.x.negate": true
  ...
}
This is as per the docs here, and transforms.x.negate will make the transform skip such tombstone records.
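For completeness, this is roughly how I understand the full wiring of the Filter SMT together with the RecordIsTombstone predicate, reusing the dropNullRecords/isNullRecord names from the config in the question, so that tombstone records are dropped before they reach the sink:
{
  ...
  "transforms": "dropNullRecords",
  "transforms.dropNullRecords.type": "org.apache.kafka.connect.transforms.Filter",
  "transforms.dropNullRecords.predicate": "isNullRecord",
  "predicates": "isNullRecord",
  "predicates.isNullRecord.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"
  ...
}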

Ignore updating missing fields with Confluent JDBC Connector

Say I have a Postgres database table with the fields "id", "flavor" and "item". I also have a Kafka topic with two messages (let's ignore the Kafka key and assume the ID is in the value for now; the schema definition is also omitted):
{"id": 1, "flavor": "chocolate"}
{"id": 1, "item": "cookie"}
Now I'd like to use the Confluent JDBC (Sink) Connector to persist the Kafka messages in upsert mode, hoping to get the following end result in the database:
id | flavor | item
----------------------
1 | chocolate | cookie
What I did get, however, was this:
id | flavor | item
----------------------
1 | null | cookie
I assume that's because the second message results in an UPDATE statement that infers null values for the fields that weren't provided and writes those nulls over my actual data.
Is there a way to get to my desired result by changing the configuration of either the Confluent JDBC Connector or PostgreSQL 12? Failing that, is there another reasonably well-supported PostgreSQL-compatible connector out there that can do this?
Here's my connector configuration (connection details obviously redacted):
{
  "name": "sink-jdbc-upsertstest",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "TEST-upserts",
    "connection.url": "jdbc:postgresql://host:port/database",
    "connection.user": "user",
    "connection.password": "password",
    "dialect.name": "ExtendedPostgreSqlDatabaseDialect",
    "table.name.format": "upsert-test",
    "batch.size": "100",
    "insert.mode": "upsert",
    "auto.evolve": "true",
    "auto.create": "true",
    "pk.mode": "record_value",
    "pk.fields": "id",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "errors.deadletterqueue.topic.name": "dlq_upserts",
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.replication.factor": "1",
    "errors.deadletterqueue.context.headers.enable": "true"
  }
}

Kafka-confluent: How to use pk.mode=record_key for upsert and delete mode in JDBC sink connector?

In Confluent Kafka, how can we use upsert with a CSV file as the source while using pk.mode=record_key for a composite key in the MySQL table? Upsert mode works when using pk.mode=record_value. Is there any additional configuration that needs to be done?
I get this error when I try pk.mode=record_key. Error - Caused by: org.apache.kafka.connect.errors.ConnectException: Need exactly one PK column defined since the key schema for records is a primitive type.
Below is my JDBC sink connector configuration:
{
  "name": "<name>",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "<topic name>",
    "connection.url": "<url>",
    "connection.user": "<user name>",
    "connection.password": "*******",
    "insert.mode": "upsert",
    "batch.size": "50000",
    "table.name.format": "<table name>",
    "pk.mode": "record_key",
    "pk.fields": "field1,field2",
    "auto.create": "true",
    "auto.evolve": "true",
    "max.retries": "10",
    "retry.backoff.ms": "3000",
    "mode": "bulk",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schemas.enable": "true",
    "value.converter.schema.registry.url": "http://localhost:8081"
  }
}
You need to use a pk.mode of record_value.
This means taking field(s) from the value of the message and using them as the primary key in the target table and for UPSERT purposes.
If you set record_key it will try to take the key field(s) from the Kafka message key. Unless you've actually got the values in your message key, this is not the setting that you want to use.
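As a rough sketch, the relevant part of the sink config would then be (assuming field1 and field2 are fields of the message value):
{
  ...
  "pk.mode": "record_value",
  "pk.fields": "field1,field2",
  ...
}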
These might help you further:
Kafka Connect JDBC Sink deep-dive: Working with Primary Keys
📹https://rmoff.dev/kafka-jdbc-video
📹https://rmoff.dev/ksqldb-jdbc-sink-video

Kafka connector jdbc-sink: syntax error at the end

I have an issue with jdbc-sink in this architecture:
postgres1 ---> kafka ---> postgres2
The producer is working fine, but the consumer has an error:
connect_1 | org.apache.kafka.connect.errors.RetriableException:
java.sql.SQLException: java.sql.BatchUpdateException: Batch entry 0
INSERT INTO "customers" ("id") VALUES (1) ON CONFLICT ("id") DO UPDATE
SET was aborted: ERROR: syntax error at end of input connect_1 |
Position: 77 Call getNextException to see other errors in the batch.
This is my source.json:
{
  "name": "src-table",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "postgres1_container",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "postgres",
    "database.whitelist": "postgres",
    "database.server.name": "postgres1",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "$3"
  }
}
and this is my jdbc-sink.json:
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "customers",
    "connection.url": "jdbc:postgresql://postgres2_container:5432/postgres?user=postgres&password=postgres",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}
debezium/zookeeper : 0.9
debezium/kafka:0.9
debezium/postgres:9.6
debezium/connect:0.9
PostgreSQL JDBC Driver 42.2.5
Kafka Connect JDBC 5.2.1
I tried to downgrade the JDBC driver and Confluent Kafka Connect but still have the same error.
Solved: the problem was that when I created the table in postgres1, I did not set the id as a PK.
Same issue here.
I think this is an issue in the JDBC connector: when the table has only primary key columns and no other columns, there is nothing to update, and the statement syntax becomes invalid because it always expects a column to update after the ON CONFLICT clause.
One option is to add additional columns to that table; of course this is not a real solution but a quick and dirty workaround.
Another solution is to upgrade the JDBC connector; I tested the same setup with kafka-connect-jdbc-10.4.0 and the issue seems to be no longer present.

Kafka Connect JDBC failed on JsonConverter

I am working on a design: MySQL -> Debezium -> Kafka -> Flink -> Kafka -> Kafka Connect JDBC -> MySQL. The following is a sample message I write from Flink (I also tried using the Kafka console producer):
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int64",
        "optional": false,
        "field": "id"
      },
      {
        "type": "string",
        "optional": true,
        "field": "name"
      }
    ],
    "optional": true,
    "name": "user"
  },
  "payload": {
    "id": 1,
    "name": "Smith"
  }
}
but Connect fails on the JsonConverter:
DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:338)
I have debugged it, and in the method public SchemaAndValue toConnectData(String topic, byte[] value) the value is null. My sink configuration is:
{
  "name": "user-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "user",
    "connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
    "connection.user": "root",
    "connection.password": "root",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}
Can someone please help me on this issue?
I think the issue is not related to the value serialization (of the Kafka message). It is rather a problem with the key of the message.
What is your key.converter? I think it is the same as value.converter (org.apache.kafka.connect.json.JsonConverter). Your key is probably a plain String that doesn't contain schema and payload.
Try changing key.converter to org.apache.kafka.connect.storage.StringConverter.
For Kafka Connect you set default converters, but you can also set a specific one for a particular connector configuration (which overrides the default). For that you have to modify your config request:
{
  "name": "user-sink",
  "config": {
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "user",
    "connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
    "connection.user": "root",
    "connection.password": "root",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}