Kafka Connect ignores fields in message to JDBC Postgres (PostgreSQL)

I am trying to set up Kafka Connect to copy data from a topic (in Protobuf) to a PostgreSQL db.
I set it up and it works, but half of the fields are completely ignored, without any error; they are simply missing in the db.
Below is my connector config; in the db only the fields value, isSomething, physicalType and key are present.
The missing fields are of various types (string -> text, int64, and custom types which I'd like stored as strings).
{
  "schema.registry.url": "http://schema-registry:8081",
  "name": "JdbcSinkConnectorConnector_0",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "key.converter.schema.registry.url": "http://schema-registry:8081",
  "value.converter": "io.confluent.connect.protobuf.ProtobufConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081",
  "value.converter.schemas.enable": true,
  "topics": "schema.data.normalized",
  "confluent.controlcenter.schema.registry.url": "http://schema-registry:8081",
  "connection.url": "jdbc:postgresql://db:5432/db",
  "connection.user": "postgres",
  "connection.password": "a",
  "dialect.name": "PostgreSqlDatabaseDialect",
  "pk.mode": "record_key",
  "pk.fields": "key",
  "auto.create": true,
  "auto.evolve": true,
  "table.name.format": "app.TestTable",
  "fields.whitelist": "fieldA,created,unitSymbol,unitMultiplier,timeStamp,value,validForStart,validForEnd,isSomething,physicalType",
  "transforms": "RenameField",
  "transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
  "transforms.RenameField.renames": "created:created_timestamp"
}
syntax = "proto3";
package xxx;
import "identified_object.proto";
import "unit_symbol.proto";
import "unit_multiplier.proto";
option java_multiple_files = true;
message XxxMessage {
.IdentifiedObject id = 1;
.IdentifiedObject environmental_data_provider = 2;
string fieldA = 3;
int64 created = 4;
.UnitSymbol unit_symbol = 5;
.UnitMultiplier unit_multiplier = 6;
int64 time_stamp = 7;
float value = 8;
int64 valid_for_start = 9;
int64 valid_for_end = 10;
bool isSomething = 11;
string physicalType = 12;
}
Example message:
{
  "id": {
    "aliasName": "",
    "description": "",
    "mRid": "8f90016b-7f2c-4172-a52a-bd6caae95020",
    "name": ""
  },
  "environmentalDataProvider": {
    "aliasName": "",
    "description": "",
    "mRid": "408851cf-d0d5-43fe-a8cf-e234583aa7ae",
    "name": "...."
  },
  "fieldA": "rb771347-3691-4920-88af-b2b8caffdea1",
  "created": "1664469000000",
  "unitSymbol": "UNIT_SYMBOL_METER_Per_SEC",
  "unitMultiplier": "UNIT_MULTIPLIER_UNSPECIFIED",
  "timeStamp": "1664857800000",
  "value": 4.2018013,
  "validForStart": "1664857800000",
  "validForEnd": "1664858700000",
  "isSomething": true,
  "physicalType": "typeA"
}
edit: added proto schema and example message

Related

Change the topic name in a HDFS2 SINK CONNECTOR integrated with HIVE

Good morning.
When I work with an HDFS2 sink connector integrated with Hive, the database table gets the name of the topic. Is there a way to choose the name of the table?
This is the config of my connector:
"connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
"hive.integration": "true",
"hive.database": "databaseEze",
"hive.metastore.uris": "thrift://server1.dc.es.arioto:9083",
"transforms.InsertField.timestamp.field": "carga",
"flush.size": "100000",
"tasks.max": "2",
"timezone": "Europe/Paris",
"transforms": "RenameField,InsertField,carga_format",
"rotate.interval.ms": "900000",
"locale": "en-GB",
"logs.dir": "/logs",
"format.class": "io.confluent.connect.hdfs.avro.AvroFormat",
"transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"transforms.RenameField.renames": "var1:Test1,var2:Test2,var3:test3",
"transforms.carga_format.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.carga_format.target.type": "string",
"transforms.carga_format.format": "yyyyMMdd",
"hadoop.conf.dir": "/etc/hadoop/",
"schema.compatibility": "BACKWARD",
"topics": "Skiel-Tracking-Replicator",
"hdfs.url": "hdfs://database/user/datavaseEze/",
"transforms.InsertField.topic.field": "ds_topic",
"partition.field.name": "carga",
"transforms.InsertField.partition.field": "test_partition",
"value.converter.schema.registry.url": "http://schema-registry-eze-dev.ocjc.serv.dc.es.arioto",
"partitioner.class": "io.confluent.connect.storage.partitioner.FieldPartitioner",
"name": "KAFKA-HDFS-HIVE-TEST",
"transforms.fx_carga_format.field": "carga",
"transforms.InsertField.offset.field": "test_offset"
}
With that config, the table will be named **Skiel-Tracking-Replicator**, and I want the table name to be d9nvtest.
You can use the RegexRouter Single Message Transform to modify the topic name.
{
  "transforms": "renameTopic",
  "transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.renameTopic.regex": "Skiel-Tracking-Replicator",
  "transforms.renameTopic.replacement": "d9nvtest"
}
See https://rmoff.net/2020/12/11/twelve-days-of-smt-day-4-regexrouter/
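Since the connector above already defines a transforms chain, renameTopic would need to be appended to the existing list rather than replace it. A sketch, reusing the transform names from the config above and adding renameTopic at the end of the chain:
"transforms": "RenameField,InsertField,carga_format,renameTopic",
"transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.renameTopic.regex": "Skiel-Tracking-Replicator",
"transforms.renameTopic.replacement": "d9nvtest"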
While using RegexRouter with kafka-connect-hdfs, this issue occurs: https://github.com/confluentinc/kafka-connect-hdfs/issues/236
The last comment there notes that the two are conceptually incompatible.

What kind of data got routed to a dead letter queue topic?

I have implemented Dead Letter Queue error handling in Kafka Connect. It works and the data is sent to DLQ topics, but I do not understand what types of data get routed to the DLQ topics.
The first picture shows the data that got routed into the DLQ topics, and the second one shows the normal data that got sunk into the databases.
Does anyone have any idea how that key got changed, since I have used id as the key?
Here are my source and sink properties:
"name": "jdbc_source_postgresql_analytics",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:postgresql://192.168.5.40:5432/abc",
"connection.user": "abc",
"connection.password": "********",
"topic.prefix": "test_",
"mode": "timestamp+incrementing",
"incrementing.column.name": "id",
"timestamp.column.name": "updatedAt",
"validate.non.null": true,
"table.whitelist": "test",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": false,
"value.converter.schemas.enable": false,
"catalog.pattern": "public",
"transforms": "createKey,extractInt",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "id",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field": "id",
"errors.tolerance": "all"
}
}
sink properties:
{
  "name": "es_sink_analytics",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "type.name": "_doc",
    "key.converter.schemas.enable": "false",
    "topics": "TEST",
    "topic.index.map": "TEST:te_test",
    "value.converter.schemas.enable": "false",
    "connection.url": "http://192.168.10.40:9200",
    "connection.username": "******",
    "connection.password": "********",
    "key.ignore": "false",
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.name": "dlq-error-es",
    "errors.deadletterqueue.topic.replication.factor": "1",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
    "schema.ignore": "true",
    "error.tolerance": "all"
  }
}
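One way to see why a record ended up in the DLQ is to enable the error context headers on the sink, so that Connect attaches headers describing the original topic, the failing stage, and the exception to each DLQ record. A sketch of the extra sink properties, added next to the DLQ settings above:
"errors.deadletterqueue.context.headers.enable": "true",
"errors.log.enable": "true",
"errors.log.include.messages": "true"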

Debezium Postgres and ElasticSearch - Store complex Object in ElasticSearch

I have in Postgres a database with a table "product" which is connected 1 to n with "sales_Channel", so one product can have multiple sales channels. Now I want to transfer it to ES and keep it up to date, so I am using Debezium and Kafka. It is no problem to transfer the single tables to ES; I can query for SalesChannels and Products. But I need Products with all SalesChannels attached as the result. How do I get Debezium to transfer this?
mapping for Product
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "integer"
        }
      }
    }
  }
}
sink for Product
{
  "name": "es-sink-product",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "product",
    "connection.url": "http://elasticsearch:9200",
    "transforms": "unwrap,key",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.unwrap.drop.deletes": "false",
    "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.key.field": "id",
    "key.ignore": "false",
    "type.name": "_doc",
    "behavior.on.null.values": "delete"
  }
}
You either need to use the Outbox pattern, see https://debezium.io/documentation/reference/1.2/configuration/outbox-event-router.html
or you can use aggregate objects, see
https://github.com/debezium/debezium-examples/tree/master/jpa-aggregations
https://github.com/debezium/debezium-examples/tree/master/kstreams-fk-join
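With the outbox route, the application writes a pre-aggregated document (e.g. a product together with its sales channels) into an outbox table, and Debezium's EventRouter SMT publishes it to a topic. A minimal sketch of the SMT part of the source connector config; the topic name product-with-channels is only an illustrative assumption:
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
"transforms.outbox.route.topic.replacement": "product-with-channels"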

Kafka Connect JDBC failed on JsonConverter

I am working on a design MySQL -> Debezium -> Kafka -> Flink -> Kafka -> Kafka Connect JDBC -> MySQL. Below is a sample message I write from Flink (I also tried using the Kafka console producer):
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int64",
        "optional": false,
        "field": "id"
      },
      {
        "type": "string",
        "optional": true,
        "field": "name"
      }
    ],
    "optional": true,
    "name": "user"
  },
  "payload": {
    "id": 1,
    "name": "Smith"
  }
}
but Connect failed in JsonConverter:
DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:338)
I have debugged it, and in the method public SchemaAndValue toConnectData(String topic, byte[] value) the value is null. My sink configuration is:
{
  "name": "user-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "user",
    "connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
    "connection.user": "root",
    "connection.password": "root",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}
Can someone please help me on this issue?
I think the issue is not related to the value serialization of the Kafka message. It is rather a problem with the key of the message.
What is your key.converter? I suspect it is the same as the value.converter (org.apache.kafka.connect.json.JsonConverter). Your key is probably a plain String that doesn't contain schema and payload fields.
Try changing key.converter to org.apache.kafka.connect.storage.StringConverter.
For Kafka Connect you set default converters in the worker configuration, but you can also set a specific one for your particular connector configuration (which overrides the default). For that you have to modify your config request:
{
  "name": "user-sink",
  "config": {
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "user",
    "connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
    "connection.user": "root",
    "connection.password": "root",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}

debezium - change of topic name gives the error cross-database references

I am using this debezium-examples project.
source.json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.whitelist": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "$3"
  }
}
jdbc-sink.json
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "customers",
    "connection.url": "jdbc:postgresql://postgres:5432/inventory?user=postgresuser&password=postgrespw",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}
I have run this example and it works fine. But when I made some changes, as discussed in the following scenario, it gives me a 'cross-database references' error.
Scenario
I have removed these properties from the source:
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$3"
Now it creates topics in Kafka as follows:
dbServer1.inventory.products
dbserver1.inventory.products_on_hand
dbserver1.inventory.customers
dbserver1.inventory.orders
When I specify topics=dbserver1.inventory.customers in jdbc-sink, it gives me the following exception:
ERROR: cross-database references are not implemented:
"dbserver1.inventory.customers" at character 14
postgres_1 | STATEMENT: CREATE TABLE "dbserver1"."inventory"."customers" (
postgres_1 | "last_name" TEXT NOT NULL,
postgres_1 | "id" INT NOT NULL,
postgres_1 | "first_name" TEXT NOT NULL,
postgres_1 | "email" TEXT NOT NULL,
postgres_1 | PRIMARY KEY("id"))
connect_1 | 2019-01-29 09:39:18,931 WARN || Create failed, will attempt amend if table already exists [io.confluent.connect.jdbc.sink.DbStructure]
connect_1 | org.postgresql.util.PSQLException: ERROR: cross-database references are not implemented: "dbserver1.inventory.customers"
connect_1 | Position: 14
Note: this is not a duplicate; the other question, which covers a different scenario, was also posted by me.
Change inventory -> dbserver1 in the connection URL; the topic name is interpreted as (databasename).(schemaname).(tablename):
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "customers",
    "connection.url": "jdbc:postgresql://postgres:5432/dbserver1?user=postgresuser&password=postgrespw",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}
The table.name.format sink property solved this for me. It allows you to override what the destination table name will be. See https://docs.confluent.io/3.1.1/connect/connect-jdbc/docs/sink_config_options.html
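For example, keeping the full topic name from the scenario above, the sink could map it to a plain table; a sketch, where the target table name customers is an assumption:
"topics": "dbserver1.inventory.customers",
"table.name.format": "customers"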