Unable to load data to Postgres using Kafka JDBC Sink Connector - postgresql

I have brought data from RDS Postgres into a Kafka topic using the Debezium source connector. The data in the topic looks like this:
{"domain":"domain-new-34B","profile":"2423947a-asf23424","account":"aasdfadf","customer":"gaf23sdf","profileno":"324","user":"234234","updatedat":233463463456,"__deleted":"false"}
I am using the Kafka JDBC sink connector to send the data to Cloud SQL Postgres. My sink connector .json file looks like this:
{
  "name": "postgres-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": 1,
    "auto.create": true,
    "connection.url": "jdbc:postgresql://host:5432/testdb",
    "connection.user": "user1",
    "connection.password": "user123",
    "topics": "server1.public.topic1",
    "auto.create": "true",
    "auto.evolve": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "transforms": "flatten",
    "transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
    "transforms.flatten.delimiter": ".",
    "table.name.format": "${topic}",
    "transforms": "route",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "$2_$3",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
  }
}
I get the error below when posting the connector:
java.base/java.lang.Thread.run(Thread.java:834)\nCaused by: org.apache.kafka.connect.errors.ConnectException: Sink connector 'postgres-sink-connector' is configured with 'delete.enabled=false' and 'pk.mode=none' and therefore requires records with a non-null Struct value and non-null Struct schema, but found record at (topic='server1.public.topic1',partition=0,offset=0,timestamp=1212312441756) with a HashMap value and null value schema.
I have not created the destination table and I want it to be auto-created.

Matthias - what drew my attention is "transforms.flatten.delimiter": ".". In PostgreSQL, "." is not allowed in a column name: column names must begin with a letter (a-z) or an underscore (_), and the subsequent characters can be letters, digits (0-9), or underscores.
Besides, you have "auto.create" repeated twice in your configuration; once is enough.
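For reference, a sketch of the configuration with those comments applied might look like the one below. It changes the Flatten delimiter to "_" (an assumption; any character that is legal in a Postgres column name would do), drops the duplicated "auto.create", and merges the three separate "transforms" keys into one chained list, since a JSON object cannot usefully repeat a key and only the last "transforms" entry would otherwise take effect. The converter settings are copied unchanged from the question, so this alone does not address the "non-null Struct value and non-null Struct schema" requirement quoted in the error.
{
  "name": "postgres-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": 1,
    "connection.url": "jdbc:postgresql://host:5432/testdb",
    "connection.user": "user1",
    "connection.password": "user123",
    "topics": "server1.public.topic1",
    "auto.create": "true",
    "auto.evolve": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "transforms": "unwrap,flatten,route",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
    "transforms.flatten.delimiter": "_",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "$2_$3",
    "table.name.format": "${topic}"
  }
}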

Related

Kafka Connect - From JSON records to Avro files in HDFS

My current setup contains Kafka, HDFS, Kafka Connect, and a Schema Registry all in networked docker containers.
The Kafka topic contains simple JSON data without a Schema:
{
  "repo_name": "ironbee/ironbee"
}
The Schema Registry contains a JSON Schema describing the data in the Kafka Topic:
{"$schema": "https://json-schema.org/draft/2019-09/schema",
"$id": "http://example.com/example.json",
"type": "object",
"title": "Root Schema",
"required": [
"repo_name"
],
"properties": {
"repo_name": {
"type": "string",
"default": "",
"title": "The repo_name Schema",
"examples": [
"ironbee/ironbee"
]
}
}}
What I am trying to achieve is a connector that reads JSON data from the topic and dumps it into Avro (or Parquet) files in HDFS. This is the connector configuration I am using:
{
  "name": "kafka to hdfs",
  "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector",
  "topics": "repo",
  "hdfs.url": "hdfs://namenode:9000",
  "flush.size": 3,
  "confluent.topic.bootstrap.servers": "kafka-1:19092,kafka-2:29092,kafka-3:39092",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "key.converter.schemas.enable": "false",
  "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
  "value.converter.schemas.enable": "false",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}
If I try to read the raw JSON value via the StringConverter (no schema used) and dump it into Avro files, it works, resulting in
Key=null Value={my json} tuples,
so there is no usable structure at all.
When I try to use my schema via the JsonSchemaConverter I get the errors
“Converting byte[] to Kafka Connect data failed due to serialization error of topic”
“Unknown magic byte”
I think there is something wrong with the configuration of my connector, but after a week of trying everything my Google skills have reached their limits.
All the code is available here: https://github.com/SDU-minions/7-Scalable-Systems-Project/tree/dev/Kafka
raw JSON value via the StringConverter (no schema used)
The schemas.enable property only exists on the JsonConverter. Strings don't have schemas, and JSON Schema always has a schema, so the property doesn't exist on that converter either.
When I try to use my schema via the JsonSchemaConverter I get the errors
Your producer needs to use the Confluent JSON Schema serializer. Otherwise, the data doesn't get sent to Kafka with the "magic byte" referred to in your error.
I personally haven't tried converting JSON schema records to Avro directly in Connect. Usually the pattern is to either produce Avro directly, or convert within ksqlDB, for example to a new Avro topic, which is then consumed by Connect.
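A rough, untested sketch of that ksqlDB route, assuming the topic name repo from the connector config above, the repo_name field from the registered schema, and made-up stream names:
-- Declare the existing JSON topic as a stream (names assumed, not taken from a working setup).
CREATE STREAM repo_json (repo_name VARCHAR)
  WITH (KAFKA_TOPIC = 'repo', VALUE_FORMAT = 'JSON');
-- Continuously re-serialize it into a new Avro topic.
CREATE STREAM repo_avro
  WITH (KAFKA_TOPIC = 'repo_avro', VALUE_FORMAT = 'AVRO') AS
  SELECT * FROM repo_json;
The HDFS sink would then consume repo_avro with the AvroConverter instead of the JsonSchemaConverter.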

KSQLDB Push Queries Fail to Deserialize Data - Schema Lookup Performed with Wrong Schema ID

I'm not certain what I could be missing.
I have set up a Kafka broker server, with ZooKeeper and a distributed Kafka Connect.
For schema management, I have set up an Apicurio Schema Registry instance.
I also have KSQLDB set up.
The following I can confirm is working as expected: my source JDBC connector successfully pushed table data into the topic stss.market.info.public.ice_symbols.
Problem:
Inside the KSQLDB server, I have successfully created a table from the topic stss.market.info.public.ice_symbols
Here is the detail of the table created
The problem I'm facing is that when I perform a push query against this table, it returns no data. Deserialization of the data fails due to an unsuccessful lookup of the Avro schema in the Apicurio Registry.
Looking at the Apicurio Registry logs reveals that KSQLDB calls Apicurio Registry to fetch the deserialization schema using a schema ID of 0 instead of 5, which is the ID of the schema I have registered in the registry.
The KSQLDB server logs also confirm the 404 HTTP response seen in the Apicurio logs, as shown in the image below.
Expectation:
I expect KSQLDB queries against the table to perform the schema lookup with an ID of 5, not 0. I'm guessing I'm probably missing some configuration.
Here is the image of the schema registered in the Apicurio Registry.
Here is also my source connector configuration. It has the appropriate schema lookup strategy configured, although I don't believe KSQLDB requires this when deserializing its table data. This configuration should only be relevant to the capturing of the table data and its validation and storage in the topic stss.market.info.public.ice_symbols.
{
  "name": "new.connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "172.17.203.10",
    "database.port": "6000",
    "database.user": "postgres",
    "database.password": "123",
    "database.dbname": "stss_market_info",
    "database.server.name": "stss.market.info",
    "table.include.list": "public.ice_symbols",
    "message.key.columns": "public.ice_symbols:name",
    "snapshot.mode": "always",
    "transforms": "unwrap,extractKey",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractKey.field": "name",
    "value.converter": "io.apicurio.registry.utils.converter.AvroConverter",
    "value.converter.apicurio.registry.url": "http://local-server:8080/apis/registry/v2",
    "value.converter.apicurio.registry.auto-register": true,
    "value.converter.apicurio.registry.find-latest": true,
    "value.apicurio.registry.as-confluent": true,
    "name": "new.connector",
    "value.converter.schema.registry.url": "http://local-server:8080/apis/registry/v2"
  }
}
Thanks in advance for any assistance.
You can specify the "VALUE_SCHEMA_ID=5" property in the WITH clause when you create a stream/table.
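As an illustration only (the table name and key column are assumed here, since the original CREATE TABLE statement isn't shown; the key column name is taken from message.key.columns in the connector config above):
-- Sketch: pin the value schema to the registered ID in the WITH clause.
CREATE TABLE ice_symbols (
  name VARCHAR PRIMARY KEY
) WITH (
  KAFKA_TOPIC = 'stss.market.info.public.ice_symbols',
  VALUE_FORMAT = 'AVRO',
  VALUE_SCHEMA_ID = 5
);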

Kafka Sink how to map fields to db with different topic and table schema name

I am currently setting up the Kafka Sink connector with a topic name waiting-room, while my db schema is called waiting_room. So I am trying to map the topic message to the db schema but I do not see any data entering the database.
So I tried the following scenario:
Since the table schema is waiting_room, I tried to add quote.sql.identifier=ALWAYS, since it quotes the table name and lets the Kafka sink quote it so it can map to the table, but I did not see quote.sql.identifier=ALWAYS take effect in the Kafka sink. Do both the table schema and the Kafka sink need to be quoted in order to map it, or how can I map to a table schema containing an underscore and have Kafka map it?
Then, if I change table.name.format=waiting-room and have the db schema = gt.namespace."waiting-room", I do not see my Kafka sink get updated; instead my table.name.format stays waiting_room and the connector status is 404 not found.
Is there a way to map the data and have it enter the db when the topic and db names are different?
Try to use Kafka Connect SMT RegexRouter:
{
  "tasks.max": "1",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "connection.url": "'"$URL"'",
  "topics": "waiting-room",
  "transforms": "route",
  "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.route.regex": "waiting-room",
  "transforms.route.replacement": "gt.namespace.waiting_room",
  "errors.tolerance": "all",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true"
}

Kafka Connect JDBC Sink quote.sql.identifiers not working

I'm trying to use Kafka Connect to sync data from an old DB2 database to a Postgres database using the JDBC Source and Sink Connectors. It works fine, but only if I am very strict on the case I use for table names.
For example, I have a table in DB2 called ACTION and it also exists in Postgres with the same columns, etc. The only difference is in DB2 it's upper case ACTION and in Postgres it's lowercase action.
Here's a sink file that works:
{
  "name": "jdbc_sink_pg_action",
  "config": {
    "_comment": "The JDBC connector class",
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "_comment": "How to serialise the value of keys ",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "_comment": "As above, but for the value of the message. Note that these key/value serialisation settings can be set globally for Connect and thus omitted for individual connector configs to make them shorter and clearer",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "_comment": " --- JDBC-specific configuration below here --- ",
    "_comment": "JDBC connection URL.",
    "connection.url": "jdbc:postgresql://localhost:5435/postgres",
    "connection.user": "postgres",
    "connection.password": "*****",
    "topics": "ACTION",
    "table.name.format": "action",
    "_comment": "The insertion mode to use",
    "insert.mode": "upsert",
    "_comment": "The primary key mode",
    "pk.mode": "record_value",
    "_comment": "List of comma-separated primary key field names. The runtime interpretation of this config depends on the pk.mode",
    "pk.fields": "ACTION_ID",
    "quote.sql.identifiers": "never"
  }
}
This is ok, but it's not very flexible. For example, I have many other tables and I'd like to sync them too, but I don't want to create a connector file for each and every table. So I try using:
"table.name.format": "${topic}",
When I do this, I get the following error in the logs when I try to load my sink connector:
Caused by: org.apache.kafka.connect.errors.ConnectException: Table "ACTION" is missing and auto-creation is disabled
So it seems to me that "quote.sql.identifiers": "never" is not actually working; otherwise the query the sink connector issues would be unquoted, and any case would be allowed (it would be converted to lower case).
Why isn't this working? I get the same results if I just use ACTION as the table.name.format.
Your PostgreSQL table name (action) is not equal to the topic name (ACTION).
The Kafka Connect JDBC connector uses the getTables() method to check whether a table exists, and its tableNamePattern parameter is case sensitive (according to the docs, it "must match the table name as it is stored in the database").
You can use the ChangeTopicCase transformation from Kafka Connect Common Transformations.
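If I remember the transform's options correctly (worth double-checking against the project's documentation), adding it to the sink config would look roughly like this, renaming the UPPER_UNDERSCORE topic ACTION to lower case before the table lookup:
"transforms": "changeCase",
"transforms.changeCase.type": "com.github.jcustenborder.kafka.connect.transform.common.ChangeTopicCase",
"transforms.changeCase.from": "UPPER_UNDERSCORE",
"transforms.changeCase.to": "LOWER_UNDERSCORE",
"table.name.format": "${topic}"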

How to configure Kafka connect sink connector for exasol database

I am trying to set up a Kafka sink connector for writing to an Exasol database.
I have followed this article: https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
Since I could not find any similar sink connector class for Exasol, I tried to use the jar from https://github.com/exasol/kafka-connect-jdbc-exasol/tree/master/kafka-connect-exasol/jars [copied this jar into $confluent_dir/share/java/kafka-connect-jdbc] and gave the dialect class inside it as the connector class name in my config json file below.
I have created a json file for configuration as below:
{
  "name": "jdbc_sink_mysql_dev_02",
  "config": {
    "_comment": "The JDBC connector class. Don't change this if you want to use the JDBC Source.",
    "connector.class": "com.exasol.connect.jdbc.dailect.ExasolDatabaseDialect",
    "_comment": "How to serialise the value of keys - here use the Confluent Avro serialiser. Note that the JDBC Source Connector always returns null for the key ",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "_comment": "Since we're using Avro serialisation, we need to specify the Confluent schema registry at which the created schema is to be stored. NB Schema Registry and Avro serialiser are both part of Confluent Platform.",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "_comment": "As above, but for the value of the message. Note that these key/value serialisation settings can be set globally for Connect and thus omitted for individual connector configs to make them shorter and clearer",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "_comment": " --- JDBC-specific configuration below here --- ",
    "_comment": "JDBC connection URL. This will vary by RDBMS. Consult your manufacturer's handbook for more information",
    "connection.url": "jdbc:exa:<myhost>:<myport> <myuser>/<mypassword>",
    "_comment": "Which table(s) to include",
    "table.whitelist": "<my_table_name>",
    "_comment": "Pull all rows based on an timestamp column. You can also do bulk or incrementing column-based extracts. For more information, see http://docs.confluent.io/current/connect/connect-jdbc/docs/source_config_options.html#mode",
    "mode": "timestamp",
    "_comment": "Which column has the timestamp value to use? ",
    "timestamp.column.name": "update_ts",
    "_comment": "If the column is not defined as NOT NULL, tell the connector to ignore this ",
    "validate.non.null": "false",
    "_comment": "The Kafka topic will be made up of this prefix, plus the table name ",
    "topic.prefix": "mysql-"
  }
}
I am trying to load this connector with the command below:
./bin/confluent load jdbc_sink_mysql_dev_02 -d <my_configuration_json_file_path>
P.S. My Confluent version is 5.1.0.
In a similar fashion I have created a mysql-source connector for reading data from MySQL, and it's working well; my use case demands writing that data to an Exasol database using a sink connector.
Although I am not getting any exceptions, Kafka is not reading any messages.
Any pointers or help on configuring such a sink connector to write to an Exasol database would be appreciated.