Debezium Postgres and Elasticsearch - Store a complex object in Elasticsearch

I have a Postgres database with a table "product" that is connected 1-to-n with "sales_Channel", so one product can have multiple sales channels. Now I want to transfer this to Elasticsearch and keep it up to date, so I am using Debezium and Kafka. Transferring the individual tables to Elasticsearch is no problem; I can query for sales channels and for products. But I need products with all of their sales channels attached as a result. How do I get Debezium to transfer this?
mapping for Product
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "integer"
        }
      }
    }
  }
}
sink for Product
{
  "name": "es-sink-product",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "product",
    "connection.url": "http://elasticsearch:9200",
    "transforms": "unwrap,key",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.unwrap.drop.deletes": "false",
    "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.key.field": "id",
    "key.ignore": "false",
    "type.name": "_doc",
    "behavior.on.null.values": "delete"
  }
}

You either need to use the Outbox pattern, see https://debezium.io/documentation/reference/1.2/configuration/outbox-event-router.html
or you can use aggregate objects, see
https://github.com/debezium/debezium-examples/tree/master/jpa-aggregations
https://github.com/debezium/debezium-examples/tree/master/kstreams-fk-join
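With the aggregate-object approach, the idea is to emit one pre-joined document per product and index it as a single Elasticsearch document, so every product already carries its sales channels. A rough sketch of such a document (the field names name and salesChannels are assumptions, not taken from your schema):
{
  "id": 1,
  "name": "some product",
  "salesChannels": [
    { "id": 10, "name": "webshop" },
    { "id": 11, "name": "retail" }
  ]
}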

Related

sink json-like string to a database with kafka connect jdbc

I am producing simple plaintext JSON-like data to Kafka with a simple kafka-console-producer command, and I want to sink this data to a database table. I have tried many ways to do this, but I always get a deserializer error or an "unknown magic byte" error.
There is no serialization or schema validation on the producer side, but the data is always of the same type.
We also can't change the producer configs to add a serializer.
Schema:
{
  "type": "record",
  "name": "people",
  "namespace": "com.cena",
  "doc": "This is a sample Avro schema to get you started. Please edit",
  "fields": [
    {
      "name": "first_name",
      "type": "string",
      "default": null
    },
    {
      "name": "last_name",
      "type": "string",
      "default": null
    },
    {
      "name": "town",
      "type": "string",
      "default": null
    },
    {
      "name": "country_code",
      "type": "string",
      "default": null
    },
    {
      "name": "mobile_number",
      "type": "string",
      "default": null
    }
  ]
}
Connector:
{
  "name": "JdbcSinkConnecto",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "table.name.format": "people",
  "topics": "people",
  "tasks.max": "1",
  "transforms": "RenameField",
  "transforms.RenameField.renames": "\"town:city,mobile_number:msisdn\"",
  "transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
  "connection.url": "jdbc:postgresql://localhost:5432/postgres",
  "connection.password": "postgres",
  "connection.user": "postgres",
  "insert.mode": "insert",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "key.converter.schema.registry.url": "http://http://localhost:8081"
}
Data sample:
{"first_name": "some_name","last_name": "Family","town": "some_city","country_code": "+01","mobile_number": "some_number"}
Is there a way to use Kafka Connect for this?
with simple kafka-console-producer
That doesn't use Avro, so I'm not sure why you added an Avro schema to the question.
You also don't show your value.converter value, so it's unclear whether that is truly JSON or Avro...
You are required to add a schema to the data for the JDBC sink. If you use plain JSON and kafka-console-producer, then you need data that looks like {"schema": ... , "payload": { your data here } }, and you need value.converter.schemas.enable=true for the JsonConverter class.
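For the fields in your schema, such a record could look roughly like this (a sketch; typing every field as an optional string is an assumption):
{
  "schema": {
    "type": "struct",
    "name": "people",
    "fields": [
      { "type": "string", "optional": true, "field": "first_name" },
      { "type": "string", "optional": true, "field": "last_name" },
      { "type": "string", "optional": true, "field": "town" },
      { "type": "string", "optional": true, "field": "country_code" },
      { "type": "string", "optional": true, "field": "mobile_number" }
    ]
  },
  "payload": {
    "first_name": "some_name",
    "last_name": "Family",
    "town": "some_city",
    "country_code": "+01",
    "mobile_number": "some_number"
  }
}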
ref. Converters and Serializers Deep Dive
If you want to use Avro, then use kafka-avro-console-producer ... This still accepts JSON inputs, but serializes to Avro (and will fix your magic byte error)
Another option would be to use ksqlDB to first parse the JSON into a defined STREAM with typed and named fields, then you can run the Connector from it in embedded mode
By the way, StringConverter does not use the Schema Registry, so remove the schema.registry.url property for it... And if you want to use a registry, don't put http:// twice.
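Putting that together, if you go the plain-JSON-with-embedded-schema route, the connector config could look roughly like this (a sketch, not tested against your setup; note that the renames value should not contain embedded escaped quotes):
{
  "name": "JdbcSinkConnector",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "tasks.max": "1",
  "topics": "people",
  "table.name.format": "people",
  "connection.url": "jdbc:postgresql://localhost:5432/postgres",
  "connection.user": "postgres",
  "connection.password": "postgres",
  "insert.mode": "insert",
  "transforms": "RenameField",
  "transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
  "transforms.RenameField.renames": "town:city,mobile_number:msisdn",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "true"
}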

Kafka connect failing to create Mongo Source Connector

I'm getting an error while creating a source connector. It works fine in all environments except one. I'm using a MongoDB user with read-write permission that has both the changeStream and find actions, but I'm still getting this error. The error is also not reported by /connector-plugins/{connectorType}/config/validate, though.
Config:
{
  "name": "mongo-source",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "pipeline": "{{pipeline}}", //variable replaced appropriate value
    "transforms.dropPrefix.replacement": "{{topic}}", //variable replaced appropriate value
    "topic.prefix": "",
    "tasks.max": "1",
    "poll.await.time.ms": 5,
    "connection.uri": "${file:/secrets/secrets.properties:mongo.connection.uri}",
    "transforms": "dropPrefix",
    "change.stream.full.document": "updateLookup",
    "errors.tolerance": "none",
    "transforms.dropPrefix.regex": ".*",
    "transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter"
  }
}
ERROR
{
  "error_code": 400,
  "message": "Connector configuration is invalid and contains the following 1 error(s):\nInvalid user permissions. Missing the following action permissions: changeStream, find\nYou can also find the above list of errors at the endpoint `/connector-plugins/{connectorType}/config/validate`"
}
You have to try /connector-plugins/{connectorType}/config/validate with the config value as the request data:
{
  "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
  "pipeline": "{{pipeline}}", //variable replaced appropriate value
  "transforms.dropPrefix.replacement": "{{topic}}", //variable replaced appropriate value
  "topic.prefix": "",
  "tasks.max": "1",
  "poll.await.time.ms": 5,
  "connection.uri": "${file:/secrets/secrets.properties:mongo.connection.uri}",
  "transforms": "dropPrefix",
  "change.stream.full.document": "updateLookup",
  "errors.tolerance": "none",
  "transforms.dropPrefix.regex": ".*",
  "transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter"
}
P.S.: At a minimum, you need a "name" field inside the config value.
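That is, the body sent to the validate endpoint would be the flat config map from above with "name" included, roughly like this (a sketch):
{
  "name": "mongo-source",
  "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
  "pipeline": "{{pipeline}}",
  "connection.uri": "${file:/secrets/secrets.properties:mongo.connection.uri}",
  "topic.prefix": "",
  "tasks.max": "1",
  "poll.await.time.ms": 5,
  "change.stream.full.document": "updateLookup",
  "errors.tolerance": "none",
  "transforms": "dropPrefix",
  "transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.dropPrefix.regex": ".*",
  "transforms.dropPrefix.replacement": "{{topic}}"
}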

Task becomes UNASSIGNED for Debezium MySQL source connector

I am using Debezium 1.9. I created a connector using the config below:
{
  "name": "user_management_db-connector-5",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "XXXX",
    "database.port": "3306",
    "database.user": "XXX",
    "database.password": "XXX",
    "database.server.id": "12345",
    "database.server.name": "ula-stg-db",
    "database.include.list": "user_management_db",
    "database.history.kafka.bootstrap.servers": "kafka.ulastg.xyz:9094,kafka.ulastg.xyz:9092",
    "database.history.kafka.topic": "dbhistory.user_management_db",
    "snapshot.mode": "schema_only",
    "snapshot.locking.mode": "none",
    "table.include.list": "user_management_db.user,user_management_db.store,user_management_db.store_type,user_management_db.user_segment,user_management_db.user_segment_mapping",
    "transforms": "Reroute",
    "transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
    "transforms.Reroute.topic.regex": "(.*)user_management_db(.+)",
    "transforms.Reroute.topic.replacement": "$1cdc",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "topic.creation.default.include": "ula-stg-db.+",
    "topic.creation.default.partitions": 20,
    "topic.creation.default.replication.factor": 2,
    "topic.creation.default.cleanup.policy": "delete",
    "topic.creation.default.delete.retention.ms": 300000,
    "errors.log.enable": true,
    "errors.log.include.messages": true
  }
}
The connector gets created and I can see events in the topic ula-stg-db.cdc.
The problem is that after some time (approximately a day) events stop getting populated. I do not see any error in the connector logs.
It only logs a generic INFO message at regular intervals:
2022-07-12 09:24:25,654 INFO || WorkerSourceTask{id=promo_management_db-connector-5-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. [org.apache.kafka.connect.runtime.WorkerSourceTask]
The connector status is now shown as below
{
  "name": "user_management_db-connector-5",
  "connector": {
    "state": "RUNNING",
    "worker_id": "172.31.65.156:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "UNASSIGNED",
      "worker_id": "172.31.71.28:8083"
    }
  ],
  "type": "source"
}
How can I debug this further?
P.S.: I am connecting to AWS RDS MySQL, and Kafka is hosted on an EC2 instance.

MongoDB Kafka Sink Connector doesn't process the RenameByRegex processor

I need to listen to events from a Kafka topic and sink them to a collection in MongoDB. The message contains a nested object with an id property, as in the example below.
{
  "testId": 1,
  "foo": "bar",
  "foos": [{ "id": "aaaaqqqq-rrrrr" }]
}
I'm trying to rename this nested id to _id with a regexp:
{
  "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
  "topics": "test",
  "connection.uri": "mongodb://mongo:27017",
  "database": "test_db",
  "collection": "test",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "false",
  "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
  "value.projection.list": "testId",
  "value.projection.type": "whitelist",
  "post.processor.chain": "com.mongodb.kafka.connect.sink.processor.DocumentIdAdder, com.mongodb.kafka.connect.sink.processor.field.renaming.RenameByRegex",
  "field.renamer.regexp": "[{\"regexp\":\"\b(id)\b\", \"pattern\":\"\b(id)\b\",\"replace\":\"_id\"}]"
}
And the result of a config/validate call is a 500 Internal Server Error, with this message:
{
  "error_code": 500,
  "message": null
}
Am I missing something, or is this an issue?
I think all you want is a Kafka Connect Single Message Transform (SMT), more precisely ReplaceField:
Filter or rename fields within a Struct or Map.
The following will rename the id field to _id:
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "id:_id"
In your case, before applying the above transformation, you might also want to flatten foos:
"transforms": "flatten",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": "."
and finally apply the transformation for renaming the field:
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "foos.id:foos._id"

Kafka Connect JDBC failed on JsonConverter

I am working on a pipeline: MySQL -> Debezium -> Kafka -> Flink -> Kafka -> Kafka Connect JDBC -> MySQL. The following is a sample message I write from Flink (I also tried using the Kafka console producer):
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int64",
        "optional": false,
        "field": "id"
      },
      {
        "type": "string",
        "optional": true,
        "field": "name"
      }
    ],
    "optional": true,
    "name": "user"
  },
  "payload": {
    "id": 1,
    "name": "Smith"
  }
}
but Connect fails in JsonConverter:
DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:338)
I have debugged it, and in the method public SchemaAndValue toConnectData(String topic, byte[] value) the value is null. My sink configuration is:
{
  "name": "user-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "user",
    "connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
    "connection.user": "root",
    "connection.password": "root",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}
Can someone please help me with this issue?
I think the issue is not related to the value serialization (of the Kafka message). It is rather a problem with the key of the message.
What is your key.converter? I think it is the same as your value.converter (org.apache.kafka.connect.json.JsonConverter). Your key might be a simple String that doesn't contain schema and payload fields.
Try changing key.converter to org.apache.kafka.connect.storage.StringConverter.
For Kafka Connect you set default converters, but you can also set a specific one for your particular connector configuration (which will override the default). For that you have to modify your config request:
{
  "name": "user-sink",
  "config": {
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "user",
    "connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
    "connection.user": "root",
    "connection.password": "root",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}