How to make the Instaclustr Kafka Sink Connector work with Avro-serialized values to Postgres?

I have a Kafka topic with Avro-serialized values.
I am trying to set up a JDBC (Postgres) sink connector to dump these messages into a Postgres table.
But I am getting the error below:
"org.apache.kafka.common.config.ConfigException: Invalid value io.confluent.connect.avro.AvroConverter for configuration value.converter: Class io.confluent.connect.avro.AvroConverter could not be found."
My sink.json is:
{"name": "postgres-sink",
"config": {
"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max":"1",
"topics": "<topic_name>",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "instaclustr_schema_registry_host:8085",
"connection.url": "jdbc:postgresql://postgres:5432/postgres?currentSchema=local",
"connection.user": "postgres",
"connection.password": "postgres",
"auto.create": "true",
"auto.evolve":"true",
"pk.mode":"none",
"table.name.format": "<table_name>"
}
}
Also, I have made changes in connect-distributed.properties (bootstrap servers).
The command I am running is:
curl -X POST -H "Content-Type: application/json" --data @postgres-sink.json https://<instaclustr_schema_registry_host>:8083/connectors

io.confluent.connect.avro.AvroConverter is not part of the Apache Kafka distribution. You can either just run Apache Kafka as part of Confluent Platform (which ships with the converter and is easier) or you can download it separately and install it yourself.
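For reference, a minimal installation sketch, assuming the confluent-hub CLI is available on each Connect worker; the plugin directory below is only an example:
# Install the Avro converter on every Connect worker (directory is illustrative)
confluent-hub install confluentinc/kafka-connect-avro-converter:latest --component-dir /usr/share/kafka/plugins --no-prompt
# Then make sure connect-distributed.properties contains:
#   plugin.path=/usr/share/kafka/plugins
# and restart the workers. On Connect 3.2+ you can also list loaded converters with:
curl -s "http://localhost:8083/connector-plugins?connectorsOnly=false"
After that, resubmit the sink config and the AvroConverter class should resolve.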

Related

Sink data from DLQ topic to another table as CLOB

I'm using a connector to sink records from a topic to a DB. There are also some values redirected to a DLQ (dead letter queue) topic. The records in the DLQ may contain wrong types, sizes, non-Avro values, etc. What I want to do is sink all of those records to an Oracle DB table. This table will only have two columns: a CLOB for the entire message, and the record date.
To sink from Kafka, we need a schema. Since this topic will contain many types of records, I can't create a proper schema (or can I?). I just want to sink the messages as a whole; how can I achieve this?
I've tried it with this schema and connector:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"DLQ_TEST\",\"namespace\":\"DLQ_TEST\",\"fields\":[
{\"name\":\"VALUE\",\"type\":[\"null\",\"string\"]},
{\"name\":\"RECORDDATE\",\"type\":[\"null\",\"long\"]}]}"}' http://server:8071/subjects/DLQ_INSERT-value/versions
curl -i -X PUT -H "Content-Type:application/json" http://server:8083/connectors/sink_DLQ_INSERT/config -d
'{
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"connection.url": "jdbc:oracle:thin:#oracleserver:1357/VITORCT",
"table.name.format": "GLOBAL.DLQ_TEST_DLQ",
"connection.password": "${file:/kafka/vty/pass.properties:vitweb_pwd}",
"connection.user": "${file:/kafka/vty/pass.properties:vitweb_user}",
"tasks.max": "1",
"topics": "DLQ_TEST_dlq",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "true",
"auto.create": "false",
"insert.mode": "insert",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter"
}'
I don't exactly understand how to make the connector use this schema.

Kafka Connect: streaming changes from Postgres to topics using debezium

I'm pretty new to the Kafka and Kafka Connect world. I am trying to implement CDC using Kafka (on MSK), Kafka Connect (using the Debezium connector for PostgreSQL) and an RDS Postgres instance. Kafka Connect runs in a K8s pod in our cluster deployed in AWS.
Before diving into the details of the configuration used, I'll try to summarise the problem:
Once the connector starts, it sends messages to the topic as expected (snapshot).
Once we make any change to a table (Create, Update, Delete), no messages are sent to the topic. We would expect to see messages about the changes made to the table.
My connector config looks like:
{
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.user": "root",
"database.dbname": "insights",
"slot.name": "cdc_organization",
"tasks.max": "1",
"column.blacklist": "password, access_key, reset_token",
"database.server.name": "insights",
"database.port": "5432",
"plugin.name": "wal2json_rds_streaming",
"schema.whitelist": "public",
"table.whitelist": "public.kafka_connect_cdc_test",
"key.converter.schemas.enable": "false",
"database.hostname": "de-test-sre-12373.cbplqnioxomr.eu-west-1.rds.amazonaws.com",
"database.password": "MYSECRETPWD",
"value.converter.schemas.enable": "false",
"name": "source-postgres",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"snapshot.mode": "initial"
}
We have tried different configurations for the plugin.name property: wal2json, wal2json_streaming and wal2json_rds_streaming.
There's no connection problem between the connector and the DB, as we already see messages flowing through as soon as the connector starts.
Is there a configuration issue with the connector described above that prevents us from seeing messages about new changes in the topic?
Thanks
Your connector config looks a bit confusing. I'm pretty new to Kafka as well, so I don't really know the issue, but this is my connector config that works for me:
{
"name":"<connector_name>",
"config": {
"connector.class":"io.debezium.connector.postgresql.PostgresConnector",
"database.server.name":"<server>",
"database.port":"5432",
"database.hostname":"<host>",
"database.user":"<user>",
"database.dbname":"<password>",
"tasks.max":"1",
"database.history.kafka.boostrap.servers":"localhost:9092",
"database.history.kafka.topic":"<kafka_topic_name>",
"plugin.name":"pgoutput",
"include.schema.changes":"true"
}
}
If this configuration doesn't work either, try looking through the log console; sometimes the relevant error isn't the last thing written to the console.
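As a hedged example of where to look, the Connect REST API exposes per-connector status and the failed task's stack trace (the worker host/port and pod name below are placeholders):
# Shows RUNNING/FAILED state for the connector and each task, plus the task's trace on failure
curl -s http://localhost:8083/connectors/source-postgres/status
# Since Connect runs in Kubernetes here, the worker log is also worth grepping for replication-slot / wal2json errors
kubectl logs <kafka-connect-pod> | grep -iE "error|wal2json|replication slot" | tail -n 50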

SMTs to create a Kafka connector string partition key through connector config

I've been implementing a Kafka connector for PostgreSQL (I'm using the Debezium Kafka connector and running all the pieces through Docker). I need a custom partition key, and so I've been using an SMT to achieve this. However, the approach that I'm using creates a Struct, and I need it to be a string. This article runs through how to set up the partition key as an int, but I can't access the config file to set up the appropriate transforms. Currently my Kafka connector looks like this:
curl -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
"name": "connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "postgres",
"database.port": "5432",
"database.user": "postgres",
"database.password": "password",
"database.dbname" : "postgres",
"database.server.name": "postgres",
"table.include.list": "public.table",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.table",
"transforms": "routeRecords,unwrap,createkey",
"transforms.routeRecords.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.routeRecords.regex": "(.*)",
"transforms.routeRecords.replacement": "table",
"transforms.unwrap.type":"io.debezium.transforms.ExtractNewRecordState",
"transforms.createkey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createkey.fields": "id"
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
}
}'
I know that I have to extract the value of the column but I'm just not sure how.
ValueToKey creates a Struct from a list of fields, as documented.
You need one more transform to extract a specific field from the Struct, as shown in the linked post:
org.apache.kafka.connect.transforms.ExtractField$Key
Note: this does not "set" the partition of the actual Kafka record, only the key, which is then hashed by the producer to determine the partition.
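As a sketch of what that could look like in the config above (the extractkey alias is illustrative), the transform chain would gain one entry:
"transforms": "routeRecords,unwrap,createkey,extractkey",
"transforms.createkey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createkey.fields": "id",
"transforms.extractkey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractkey.field": "id"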

Kafka FileStreamSourceConnector: write an Avro file to a topic with a key field

I want to use the Kafka FileStreamSourceConnector to write a local Avro file into a topic.
My connector config looks like this:
curl -i -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/file_source_connector/config \
-d '{
"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"topic": "my_topic",
"file": "/data/log.avsc",
"format.include.keys": "true",
"source.auto.offset.reset": "earliest",
"tasks.max": "1",
"value.converter.schemas.enable": "true",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
}'
Then when I print out the topic, the key fields are null.
Updated on 2021-03-29:
After watching this video 🎄Twelve Days of SMT 🎄 - Day 2: ValueToKey and ExtractField from Robin,
I applied the SMTs to my connector config:
curl -i -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/file_source_connector_02/config \
-d '{
"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"topic": "my_topic",
"file": "/data/log.avsc",
"tasks.max": "1",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"transforms": "ValueToKey, ExtractField",
"transforms.ValueToKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.ValueToKey.fields":"id",
"transforms.ExtractField.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.ExtractField.field":"id"
}'
However, the connector failed:
Caused by: org.apache.kafka.connect.errors.DataException: Only Struct objects supported for [copying fields from value to key], found: java.lang.String
I would use the ValueToKey transformer.
In the worst case, you could ignore the values and set a random key.
For details, look at: ValueToKey
FileStreamSource assumes UTF-8 encoded, line-delimited files as input, not binary files such as Avro. Last I checked, format.include.keys is not a valid config for the connector either.
Therefore each consumed event will be a string, and consequently transforms that require Structs with field names will not work.
You can use the Hoist transform to create a Struct from each "line", but this still will not parse your data to make the ID field accessible to move to the key.
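If you did want to experiment with that, a Hoist configuration could look like this fragment (the field name "line" is arbitrary):
"transforms": "hoist",
"transforms.hoist.type": "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.hoist.field": "line"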
Also, your file is AVSC, which is JSON formatted, not Avro, so I'm not sure what the goal is in using the AvroConverter or having "schemas.enable": "true". Still, the lines read by the connector are not parsed by converters such that fields are accessible; they are only serialized when sent to Kafka.
My suggestion would be to write some other CLI script using plain producer libraries to parse the file, extract the schema, register it with the Schema Registry, build a producer record for each entity in the file, and send them.
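If Confluent Platform's CLI tools are an option, kafka-avro-console-producer can stand in for such a script for newline-delimited JSON records (this swaps a stock tool for the custom-script suggestion; topic, hosts, schema file and input file are illustrative):
# Reads one JSON record per line, Avro-serializes it against the given schema,
# and registers that schema with Schema Registry; use --broker-list on older versions.
kafka-avro-console-producer \
  --bootstrap-server localhost:9092 \
  --topic my_topic \
  --property schema.registry.url=http://schema-registry:8081 \
  --property value.schema="$(cat /data/log.avsc)" \
  < /data/records.jsonl
Producing keyed records would additionally need the tool's parse.key/key.schema properties, or a custom producer as suggested above.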

How to override key.serializer in Kafka Connect JDBC

I am doing a MySQL to Kafka connection using the Kafka JDBC source connector. Everything is working fine. Now I need to pass key.serializer and value.serializer to encrypt the data as shown at macronova, but I didn't find any changes in the output.
POST API call to start the source connector:
curl -X POST -H "Content-Type: application/json" --data '{
"name": "jdbc-source-connector-2",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"tasks.max": 10,
"connection.url": "jdbc:mysql://localhost:3306/connect_test?user=roo&password=roo",
"mode": "incrementing",
"table.whitelist" : "test",
"incrementing.column.name": "id",
"timestamp.column.name": "modified",
"topic.prefix": "table-",
"poll.interval.ms": 1000
}
}' http://localhost:8083/connectors
Connectors take converters only, not serializers, via the key and value properties.
If you want to encrypt a whole string, you'd need to implement your own converter, or edit the code that writes into the database to write into Kafka instead, then consume and write to the database as well as the other downstream systems.
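For illustration, the serializer lines in the config above would be replaced with converter settings; com.example.EncryptingConverter below is a hypothetical class (implementing org.apache.kafka.connect.storage.Converter) that you would have to write and place on the worker's plugin.path:
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "com.example.EncryptingConverter"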