Kafka Sink: how to map fields to the DB when the topic and the table schema name differ - PostgreSQL

I am currently setting up the Kafka Sink connector with a topic named waiting-room, while my DB schema is called waiting_room. I am trying to map the topic messages to the DB schema, but I do not see any data entering the database.
I tried the following scenarios:
Since the table schema is waiting_room, I tried adding quote.sql.identifier=ALWAYS, since it quotes the table name and should let the Kafka sink quote it so it can map to the table, but I did not see quote.sql.identifier=ALWAYS take effect in the Kafka sink. Do both the table schema and the Kafka sink need to be quoted in order to map, or how can I map to a table schema containing an underscore and have Kafka map to it?
Then, if I change table.name.format=waiting-room and make the DB schema gt.namespace."waiting-room", I do not see my Kafka sink get updated; instead my table.name.format stays waiting_room and the connector status is 404 Not Found.
Is there a way to map the two and have data enter the DB when the topic and DB names are different?

Try using the Kafka Connect SMT RegexRouter:
{
  "tasks.max": "1",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "connection.url": "'"$URL"'",
  "topics": "waiting-room",
  "transforms": "route",
  "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.route.regex": "waiting-room",
  "transforms.route.replacement": "gt.namespace.waiting_room",
  "errors.tolerance": "all",
  "errors.log.enable": "true",
  "errors.log.include.messages": true
}
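With this in place, table.name.format stays at its default ${topic}, so the sink resolves the rewritten topic name gt.namespace.waiting_room to the target table. If you would rather not rename the topic at all, pointing table.name.format at the table directly should also work; a minimal sketch, assuming the gt.namespace schema from the question and that your setup accepts a schema-qualified name here:

"topics": "waiting-room",
"table.name.format": "gt.namespace.waiting_room"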

Related

KSQLDB Push Queries Fail to Deserialize Data - Schema Lookup Performed with Wrong Schema ID

I'm not certain what I could be missing.
I have set up a Kafka broker, a Zookeeper instance, and a distributed Kafka Connect cluster.
For schema management, I have set up an Apicurio Schema Registry instance.
I also have KSQLDB set up.
The following I can confirm is working as expected:
My source JDBC connector successfully pushed table data into the topic stss.market.info.public.ice_symbols
Problem:
Inside the KSQLDB server, I have successfully created a table from the topic stss.market.info.public.ice_symbols
Here is the detail of the table created
The problem I'm facing is when performing a push query against this table, it returns no data. Deserialization of the data fails due to the unsuccessful lookup of the AVRO schema in the Apicurio Registry.
Looking at the Apicurio Registry logs reveals that KSQLDB calls the Apicurio Registry to fetch the deserialization schema using a schema ID of 0 instead of 5, which is the ID of the schema I have registered in the registry.
KSQLDB server logs also confirm this 404 HTTP response in the Apicurio logs as shown in the image below
Expectation:
I expect KSQLDB queries against the table to perform a schema lookup with an ID of 5, not 0. I'm guessing I'm probably missing some configuration.
Here is the image of the schema registered in the Apicurio Registry.
Here is also my source connector configuration. It has the appropriate schema lookup strategy configured, although I don't believe KSQLDB requires this when deserializing its table data. This configuration should only be relevant to capturing the table data and to its validation and storage in the topic stss.market.info.public.ice_symbols.
{
  "name": "new.connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "172.17.203.10",
    "database.port": "6000",
    "database.user": "postgres",
    "database.password": "123",
    "database.dbname": "stss_market_info",
    "database.server.name": "stss.market.info",
    "table.include.list": "public.ice_symbols",
    "message.key.columns": "public.ice_symbols:name",
    "snapshot.mode": "always",
    "transforms": "unwrap,extractKey",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractKey.field": "name",
    "value.converter": "io.apicurio.registry.utils.converter.AvroConverter",
    "value.converter.apicurio.registry.url": "http://local-server:8080/apis/registry/v2",
    "value.converter.apicurio.registry.auto-register": true,
    "value.converter.apicurio.registry.find-latest": true,
    "value.apicurio.registry.as-confluent": true,
    "value.converter.schema.registry.url": "http://local-server:8080/apis/registry/v2"
  }
}
Thanks in advance for any assistance.
You can specify the "VALUE_SCHEMA_ID=5" property in the WITH clause when you create a stream/table.
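A minimal sketch of what that could look like, assuming the topic from the question and an Avro value format; the key column name and the formats are assumptions, and the value columns are inferred from the registered schema:

CREATE TABLE ice_symbols (
  name VARCHAR PRIMARY KEY
) WITH (
  KAFKA_TOPIC = 'stss.market.info.public.ice_symbols',
  KEY_FORMAT = 'KAFKA',
  VALUE_FORMAT = 'AVRO',
  VALUE_SCHEMA_ID = 5
);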

Is it possible to sink Kafka messages generated by Debezium to Snowflake

I am using the debezium-ui repo to test the Debezium MySQL CDC feature. The messages stream into Kafka normally. The request body used to create the MySQL connector is as follows:
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "dbzui-db-mysql",
    "database.port": "3306",
    "database.user": "mysqluser",
    "database.password": "mysql",
    "database.server.id": "184054",
    "database.server.name": "inventory-connector-mysql",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "dbzui-kafka:9092",
    "database.history.kafka.topic": "dbhistory.inventory"
  }
}
I then need to sink the Kafka messages into Snowflake, the data warehouse my team uses. I created a Snowflake sink connector for this; the request body is as follows:
{
  "name": "kafka2-04",
  "config": {
    "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "tasks.max": 1,
    "topics": "inventory-connector-mysql.inventory.orders",
    "snowflake.topic2table.map": "inventory-connector-mysql.inventory.orders:tbl_orders",
    "snowflake.url.name": "**.snowflakecomputing.com",
    "snowflake.user.name": "kafka_connector_user_1",
    "snowflake.private.key": "*******",
    "snowflake.private.key.passphrase": "",
    "snowflake.database.name": "kafka_db",
    "snowflake.schema.name": "kafka_schema",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
    "header.converter": "org.apache.kafka.connect.storage.SimpleHeaderConverter",
    "value.converter.schemas.enable": "true"
  }
}
But after it runs, the data that lands in Snowflake (see the "data in snowflake" screenshot) has a schema different from the MySQL table. Is my sink connector config incorrect, or is it impossible to sink Kafka data generated by Debezium with the SnowflakeSinkConnector?
This is default behavior in Snowflake and it is documented here:
Every Snowflake table loaded by the Kafka connector has a schema consisting of two VARIANT columns:
RECORD_CONTENT. This contains the Kafka message.
RECORD_METADATA. This contains metadata about the message, for example, the topic from which the message was read.
If Snowflake creates the table, then the table contains only these two columns. If the user creates the table for the Kafka Connector to add rows to, then the table can contain more than these two columns (any additional columns must allow NULL values because data from the connector does not include values for those columns).
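If you need the Debezium fields as regular columns, one option is to project them out of the VARIANT yourself; a minimal sketch, assuming the Debezium JSON envelope lands in RECORD_CONTENT and that the orders records carry an id field (the field names and paths are assumptions and depend on your converter settings):

CREATE OR REPLACE VIEW kafka_db.kafka_schema.orders_flattened AS
SELECT
  RECORD_CONTENT:payload.after.id::NUMBER AS id,
  RECORD_METADATA:topic::STRING           AS source_topic
FROM kafka_db.kafka_schema.tbl_orders;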

Unable to load data to Postgres using Kafka JDBC Sink Connector

I have brought data from RDS Postgres to a Kafka topic using the Debezium source connector. The data in the topic looks like below:
{"domain":"domain-new-34B","profile":"2423947a-asf23424","account":"aasdfadf","customer":"gaf23sdf","profileno":"324","user":"234234","updatedat":233463463456,"__deleted":"false"}
I am using the Kafka JDBC sink connector to send data to Cloud SQL Postgres. My sink connector .json file looks like below:
{
  "name": "postgres-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": 1,
    "auto.create": true,
    "connection.url": "jdbc:postgresql://host:5432/testdb",
    "connection.user": "user1",
    "connection.password": "user123",
    "topics": "server1.public.topic1",
    "auto.create": "true",
    "auto.evolve": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "transforms": "flatten",
    "transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
    "transforms.flatten.delimiter": ".",
    "table.name.format": "${topic}",
    "transforms": "route",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "$2_$3",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
  }
}
I get the below error when posting the connector:
java.base/java.lang.Thread.run(Thread.java:834)\nCaused by: org.apache.kafka.connect.errors.ConnectException: Sink connector 'postgres-sink-connector' is configured with 'delete.enabled=false' and 'pk.mode=none' and therefore requires records with a non-null Struct value and non-null Struct schema, but found record at (topic='server1.public.topic1',partition=0,offset=0,timestamp=1212312441756) with a HashMap value and null value schema.
I have not created the destination table and I want it to be auto-created.
Matthias - what drew my attention is "transforms.flatten.delimiter": ".". In the PostgreSQL DB, "." is NOT allowed in a column name: column names in PostgreSQL must begin with a letter (a-z) or an underscore (_), and the subsequent characters in a name can be letters, digits (0-9), or underscores.
Besides, you have "auto.create" repeated twice in your configuration. One is enough.
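Note also that "transforms" appears three times in the posted config, and a JSON object keeps only one value per key, so at most one of those transform chains is actually applied. A sketch of how the pieces might be consolidated into a single chain with an underscore delimiter (the unwrap -> flatten -> route order is an assumption about the intended pipeline):

"transforms": "unwrap,flatten,route",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": "_",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$2_$3",
"auto.create": "true"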

Kafka Connect JDBC Sink quote.sql.identifiers not working

I'm trying to use Kafka Connect to sync data from an old DB2 database to a Postgres database using the JDBC Source and Sink Connectors. It works fine, but only if I am very strict on the case I use for table names.
For example, I have a table in DB2 called ACTION and it also exists in Postgres with the same columns, etc. The only difference is in DB2 it's upper case ACTION and in Postgres it's lowercase action.
Here's a sink file that works:
{
  "name": "jdbc_sink_pg_action",
  "config": {
    "_comment": "The JDBC connector class",
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "_comment": "How to serialise the value of keys ",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "_comment": "As above, but for the value of the message. Note that these key/value serialisation settings can be set globally for Connect and thus omitted for individual connector configs to make them shorter and clearer",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "_comment": " --- JDBC-specific configuration below here --- ",
    "_comment": "JDBC connection URL.",
    "connection.url": "jdbc:postgresql://localhost:5435/postgres",
    "connection.user": "postgres",
    "connection.password": "*****",
    "topics": "ACTION",
    "table.name.format": "action",
    "_comment": "The insertion mode to use",
    "insert.mode": "upsert",
    "_comment": "The primary key mode",
    "pk.mode": "record_value",
    "_comment": "List of comma-separated primary key field names. The runtime interpretation of this config depends on the pk.mode",
    "pk.fields": "ACTION_ID",
    "quote.sql.identifiers": "never"
  }
}
This is ok, but it's not very flexible. For example, I have many other tables and I'd like to sync them too, but I don't want to create a connector file for each and every table. So I try using:
"table.name.format": "${topic}",
When I do this, I get the following error in the logs when I try to load my sink connector:
Caused by: org.apache.kafka.connect.errors.ConnectException: Table "ACTION"
is missing and auto-creation is disabled
So it seems to me that "quote.sql.identifiers": "never" is not actually working; otherwise the query the sink connector runs would be unquoted and would allow any case (it would be folded to lowercase).
Why isn't this working? I get the same results if I just use ACTION as the table.name.format.
Your PostgreSQL table name (action) is not equal to the topic name (ACTION).
The Kafka Connect JDBC connector uses the JDBC getTables() method to check whether a table exists, and its tableNamePattern parameter is case sensitive (according to the docs, it must match the table name as it is stored in the database).
You can use the ChangeTopicCase transformation from Kafka Connect Common Transformations.
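A sketch of what that might look like in the sink config; the property names below (from/to with Guava CaseFormat values) are my reading of that project's documentation, so double-check them against its README:

"transforms": "lowercase",
"transforms.lowercase.type": "com.github.jcustenborder.kafka.connect.transform.common.ChangeTopicCase",
"transforms.lowercase.from": "UPPER_UNDERSCORE",
"transforms.lowercase.to": "LOWER_UNDERSCORE",
"table.name.format": "${topic}"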

How to configure a Kafka Connect sink connector for an Exasol database

I am trying to set up a Kafka sink connector for writing to an Exasol database.
I have followed this article: https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
Since I could not find any dedicated sink connector class for Exasol, I tried to use the jar from https://github.com/exasol/kafka-connect-jdbc-exasol/tree/master/kafka-connect-exasol/jars [copied this jar into $confluent_dir/share/java/kafka-connect-jdbc] and gave the dialect class inside it as the connector class name in my config JSON file below.
I have created a JSON file for the configuration as below:
I have created a json file for configuration as below:
{
  "name": "jdbc_sink_mysql_dev_02",
  "config": {
    "_comment": "The JDBC connector class. Don't change this if you want to use the JDBC Source.",
    "connector.class": "com.exasol.connect.jdbc.dailect.ExasolDatabaseDialect",
    "_comment": "How to serialise the value of keys - here use the Confluent Avro serialiser. Note that the JDBC Source Connector always returns null for the key ",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "_comment": "Since we're using Avro serialisation, we need to specify the Confluent schema registry at which the created schema is to be stored. NB Schema Registry and Avro serialiser are both part of Confluent Platform.",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "_comment": "As above, but for the value of the message. Note that these key/value serialisation settings can be set globally for Connect and thus omitted for individual connector configs to make them shorter and clearer",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "_comment": " --- JDBC-specific configuration below here --- ",
    "_comment": "JDBC connection URL. This will vary by RDBMS. Consult your manufacturer's handbook for more information",
    "connection.url": "jdbc:exa:<myhost>:<myport> <myuser>/<mypassword>",
    "_comment": "Which table(s) to include",
    "table.whitelist": "<my_table_name>",
    "_comment": "Pull all rows based on an timestamp column. You can also do bulk or incrementing column-based extracts. For more information, see http://docs.confluent.io/current/connect/connect-jdbc/docs/source_config_options.html#mode",
    "mode": "timestamp",
    "_comment": "Which column has the timestamp value to use? ",
    "timestamp.column.name": "update_ts",
    "_comment": "If the column is not defined as NOT NULL, tell the connector to ignore this ",
    "validate.non.null": "false",
    "_comment": "The Kafka topic will be made up of this prefix, plus the table name ",
    "topic.prefix": "mysql-"
  }
}
I am trying to load this connector with the below command:
./bin/confluent load jdbc_sink_mysql_dev_02 -d <my_configuration_json_file_path>
P.S. My confluent version is 5.1.0
In a similar fashion I have created a mysql-source connector for reading data from MySQL, and it is working well; my use case requires writing that data to the Exasol database using a sink connector.
Although I am not getting any exceptions, Kafka is not reading any messages.
Any pointers or help on configuring such a sink connector to write to an Exasol database would be appreciated.
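For what it's worth, a database dialect is not itself a connector, so connector.class would normally stay io.confluent.connect.jdbc.JdbcSinkConnector, with the Exasol dialect jar on the connector's classpath being selected via the jdbc:exa connection URL. A minimal sketch of a sink configuration under that assumption (host, credentials, topic, and converter settings are placeholders, and sink-specific options replace the source-style mode/timestamp settings):

{
  "name": "jdbc_sink_exasol_dev_01",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:exa:<myhost>:<myport>",
    "connection.user": "<myuser>",
    "connection.password": "<mypassword>",
    "topics": "mysql-<my_table_name>",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "insert.mode": "insert",
    "auto.create": "true"
  }
}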