I'm trying to use Kafka Connect to sync data from an old DB2 database to a Postgres database using the JDBC Source and Sink Connectors. It works fine, but only if I am very strict about the case I use for table names.
For example, I have a table in DB2 called ACTION, and it also exists in Postgres with the same columns, etc. The only difference is that in DB2 it's the uppercase ACTION and in Postgres it's the lowercase action.
Here's a sink file that works:
{
"name": "jdbc_sink_pg_action",
"config": {
"_comment": "The JDBC connector class",
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"_comment": "How to serialise the value of keys ",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"_comment": "As above, but for the value of the message. Note that these key/value serialisation settings can be set globally for Connect and thus omitted for individual connector configs to make them shorter and clearer",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"_comment": " --- JDBC-specific configuration below here --- ",
"_comment": "JDBC connection URL.",
"connection.url": "jdbc:postgresql://localhost:5435/postgres",
"connection.user": "postgres",
"connection.password": "*****",
"topics": "ACTION",
"table.name.format": "action",
"_comment": "The insertion mode to use",
"insert.mode": "upsert",
"_comment": "The primary key mode",
"pk.mode": "record_value",
"_comment": "List of comma-separated primary key field names. The runtime interpretation of this config depends on the pk.mode",
"pk.fields": "ACTION_ID",
"quote.sql.identifiers": "never"
}
}
This is ok, but it's not very flexible. For example, I have many other tables and I'd like to sync them too, but I don't want to create a connector file for each and every table. So I try using:
"table.name.format": "${topic}",
When I do this, I get the following error in the logs when I try to load my sink connector:
Caused by: org.apache.kafka.connect.errors.ConnectException: Table "ACTION"
is missing and auto-creation is disabled
So it seems to me that "quote.sql.identifiers": "never" is not actually working; otherwise the query the sink connector issues would be unquoted, and Postgres would accept any case (it folds unquoted identifiers to lowercase).
Why isn't this working? I get the same results if I just use ACTION as the table.name.format.
Your PostgreSQL table name (action) is not equal to the topic name (ACTION).
The Kafka Connect JDBC Connector uses the getTables() method to check whether a table exists, and its tableNamePattern parameter is case sensitive (according to the docs: it must match the table name as it is stored in the database).
You can use ChangeTopicCase transformation from Kafka Connect Common Transformations.
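For example, here is a minimal sketch of what adding that SMT to the sink configuration might look like, assuming the class and property names from that project's documentation (verify them against the version you install):
"_comment": "Rename the topic from upper case (ACTION) to lower case (action) before the sink resolves ${topic}",
"transforms": "changeTopicCase",
"transforms.changeTopicCase.type": "com.github.jcustenborder.kafka.connect.transform.common.ChangeTopicCase",
"transforms.changeTopicCase.from.case": "UPPER_UNDERSCORE",
"transforms.changeTopicCase.to.case": "LOWER_UNDERSCORE"
With the topic renamed to action, "table.name.format": "${topic}" then matches the lowercase PostgreSQL table.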
I'm not certain what I could be missing.
I have set up a Kafka broker, with ZooKeeper and a distributed Kafka Connect.
For schema management, I have set up an Apicurio Schema Registry instance.
I also have KSQLDB set up.
The following I can confirm is working as expected:
My source connector successfully pushed table data into the topic stss.market.info.public.ice_symbols.
Problem:
Inside the KSQLDB server, I have successfully created a table from the topic stss.market.info.public.ice_symbols
Here is the detail of the table created
The problem I'm facing is that performing a push query against this table returns no data. Deserialization of the data fails due to an unsuccessful lookup of the Avro schema in the Apicurio Registry.
Looking at the Apicurio Registry logs reveals that KSQLDB calls the Apicurio Registry to fetch the deserialization schema using a schema ID of 0 instead of 5, which is the ID of the schema I have registered in the registry.
The KSQLDB server logs also confirm the 404 HTTP response seen in the Apicurio logs, as shown in the image below.
Expectation:
I expect KSQLDB queries to the table to perform a schema lookup with an ID of 5, not 0. I'm guessing I'm probably missing some configuration.
Here is the image of the schema registered in the Apicurio Registry.
Here is also my source connector configuration. It has the appropriate schema lookup strategy configured, although I don't believe KSQLDB requires this when deserializing its table data. This configuration should only be relevant to capturing the table data and to its validation and storage in the topic stss.market.info.public.ice_symbols.
{
"name": "new.connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"plugin.name": "pgoutput",
"database.hostname": "172.17.203.10",
"database.port": "6000",
"database.user": "postgres",
"database.password": "123",
"database.dbname": "stss_market_info",
"database.server.name": "stss.market.info",
"table.include.list": "public.ice_symbols",
"message.key.columns": "public.ice_symbols:name",
"snapshot.mode": "always",
"transforms": "unwrap,extractKey",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractKey.field": "name",
"value.converter": "io.apicurio.registry.utils.converter.AvroConverter",
"value.converter.apicurio.registry.url": "http://local-server:8080/apis/registry/v2",
"value.converter.apicurio.registry.auto-register": true,
"value.converter.apicurio.registry.find-latest": true,
"value.apicurio.registry.as-confluent": true,
"name": "new.connector",
"value.converter.schema.registry.url": "http://local-server:8080/apis/registry/v2"
}
}
Thanks in advance for any assistance.
You can specify the "VALUE_SCHEMA_ID=5" property in the WITH clause when you create a stream/table.
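For example, here is a sketch of what that could look like for the table above. The key column is an assumption based on the extractKey transform in the connector config, and VALUE_SCHEMA_ID requires a ksqlDB version that supports schema IDs in the WITH clause; adjust the names to your setup:
-- Recreate the table, pinning the value schema to the registered ID 5
CREATE TABLE ice_symbols (name VARCHAR PRIMARY KEY) WITH (
    KAFKA_TOPIC = 'stss.market.info.public.ice_symbols',
    KEY_FORMAT = 'KAFKA',
    VALUE_FORMAT = 'AVRO',
    VALUE_SCHEMA_ID = 5
);
The value columns are then inferred from the registered schema with ID 5.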
The goal is: MySQL -> Kafka -> MySQL. The sink destination should be up to date with production.
Inserting and deleting records is fine, but I am having issues with schema changes, such as a dropped column. The changes are not replicating to the sink destination.
My source:
{
"name": "hub-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": "42",
"database.server.name": "study",
"database.include.list": "companyHub",
"database.history.kafka.bootstrap.servers": "broker:29092",
"database.history.kafka.topic": "dbhistory.study",
"include.schema.changes": "true"
}
}
My sink:
{
"name": "companyHub-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:mysql://172.18.141.102:3306/Hub",
"connection.user": "user",
"connection.password": "passaword",
"topics": "study.companyHub.countryTaxes, study.companyHub.addresses",
"auto.create": "true",
"auto.evolve": "true",
"delete.enabled": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_key",
"transforms": "dropPrefix, unwrap",
"transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex":"study.companyHub.(.*)$",
"transforms.dropPrefix.replacement":"$1",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite"
}
}
Any help would be greatly appreciated!
Auto-creation and Auto-evolution
Tip
Make sure the JDBC user has the appropriate permissions for DDL.
If auto.create is enabled, the connector can CREATE the destination table if it is found to be missing. The creation takes place online with records being consumed from the topic, since the connector uses the record schema as a basis for the table definition. Primary keys are specified based on the key configuration settings.
If auto.evolve is enabled, the connector can perform limited auto-evolution by issuing ALTER on the destination table when it encounters a record for which a column is found to be missing. Since data-type changes and removal of columns can be dangerous, the connector does not attempt to perform such evolutions on the table. Addition of primary key constraints is also not attempted. In contrast, if auto.evolve is disabled no evolution is performed and the connector task fails with an error stating the missing columns.
For both auto-creation and auto-evolution, the nullability of a column is based on the optionality of the corresponding field in the schema, and default values are also specified based on the default value of the corresponding field if applicable. We use the following mapping from Connect schema types to database-specific types:
(Table omitted: mapping from Connect schema types to column types for MySQL, Oracle, PostgreSQL, SQLite, SQL Server, and Vertica.)
Auto-creation or auto-evolution is not supported for databases not mentioned here.
Important
For backwards-compatible table schema evolution, new fields in record schemas must be optional or have a default value. If you need to delete a field, the table schema should be manually altered to either drop the corresponding column, assign it a default value, or make it nullable.
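So for a column dropped upstream, the sink table generally has to be adjusted by hand, along these lines (a sketch only; the table name is taken from the sink topics above and the column name is a placeholder):
-- Option 1: drop the column on the sink table as well
ALTER TABLE countryTaxes DROP COLUMN some_dropped_column;
-- Option 2: keep the column but make it nullable (or give it a default),
-- so records that no longer carry the field can still be written
ALTER TABLE countryTaxes MODIFY some_dropped_column VARCHAR(255) NULL;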
I have brought data from RDS Postgres to a Kafka topic using the Debezium source connector. The data in the topic looks like below:
{"domain":"domain-new-34B","profile":"2423947a-asf23424","account":"aasdfadf","customer":"gaf23sdf","profileno":"324","user":"234234","updatedat":233463463456,"__deleted":"false"}
I am using Kafka JDBC sink connector to send data to Cloudsql Postgres. My sink connector .json file looks like below:
{
"name":"postgres-sink-connector",
"config":{
"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max":1,
"auto.create":true,
"connection.url":"jdbc:postgresql://host:5432/testdb",
"connection.user":"user1",
"connection.password":"user123",
"topics":"server1.public.topic1",
"auto.create":"true",
"auto.evolve":"true",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"false",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"false",
"transforms": "flatten",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": ".",
"table.name.format": "${topic}",
"transforms": "route",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$2_$3",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms": "unwrap",
"transforms.unwrap.type":"io.debezium.transforms.ExtractNewRecordState"
}
}
I get the error below when posting the connector:
java.base/java.lang.Thread.run(Thread.java:834)\nCaused by: org.apache.kafka.connect.errors.ConnectException: Sink connector 'postgres-sink-connector' is configured with 'delete.enabled=false' and 'pk.mode=none' and therefore requires records with a non-null Struct value and non-null Struct schema, but found record at (topic='server1.public.topic1',partition=0,offset=0,timestamp=1212312441756) with a HashMap value and null value schema.
I have not created the destination table and I want it to be auto-created.
Matthias, it drew my attention that you set "transforms.flatten.delimiter": ".". In PostgreSQL, "." is not allowed in column names: column names must begin with a letter (a-z) or an underscore (_), and the subsequent characters can be letters, digits (0-9), or underscores.
Besides, you have "auto.create" repeated twice in your configuration. One is enough.
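For example, the flatten transform could use an underscore as the delimiter instead (only the relevant fragment is shown; the rest of the config stays as it is):
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": "_",
An underscore keeps the generated column names within PostgreSQL's identifier rules.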
I am trying to set up a Kafka sink connector for writing to an Exasol database.
I have followed this article: https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
Since I could not find any similar sink connector class for Exasol, I tried to use the jar from https://github.com/exasol/kafka-connect-jdbc-exasol/tree/master/kafka-connect-exasol/jars [copied this jar into $confluent_dir/share/java/kafka-connect-jdbc] and gave the dialect class inside it as the connector class name in my config JSON file below.
I have created a JSON configuration file as below:
{
"name": "jdbc_sink_mysql_dev_02",
"config": {
"_comment": "The JDBC connector class. Don't change this if you want to use the JDBC Source.",
"connector.class": "com.exasol.connect.jdbc.dailect.ExasolDatabaseDialect",
"_comment": "How to serialise the value of keys - here use the Confluent Avro serialiser. Note that the JDBC Source Connector always returns null for the key ",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"_comment": "Since we're using Avro serialisation, we need to specify the Confluent schema registry at which the created schema is to be stored. NB Schema Registry and Avro serialiser are both part of Confluent Platform.",
"key.converter.schema.registry.url": "http://localhost:8081",
"_comment": "As above, but for the value of the message. Note that these key/value serialisation settings can be set globally for Connect and thus omitted for individual connector configs to make them shorter and clearer",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
"_comment": " --- JDBC-specific configuration below here --- ",
"_comment": "JDBC connection URL. This will vary by RDBMS. Consult your manufacturer's handbook for more information",
"connection.url": "jdbc:exa:<myhost>:<myport> <myuser>/<mypassword>",
"_comment": "Which table(s) to include",
"table.whitelist": "<my_table_name>",
"_comment": "Pull all rows based on an timestamp column. You can also do bulk or incrementing column-based extracts. For more information, see http://docs.confluent.io/current/connect/connect-jdbc/docs/source_config_options.html#mode",
"mode": "timestamp",
"_comment": "Which column has the timestamp value to use? ",
"timestamp.column.name": "update_ts",
"_comment": "If the column is not defined as NOT NULL, tell the connector to ignore this ",
"validate.non.null": "false",
"_comment": "The Kafka topic will be made up of this prefix, plus the table name ",
"topic.prefix": "mysql-"
}
}
I am trying to load this connector with the command below:
./bin/confluent load jdbc_sink_mysql_dev_02 -d <my_configuration_json_file_path>
P.S. My confluent version is 5.1.0
In a similar fashion I have created a MySQL source connector for reading data from MySQL, and it is working well; my use case demands writing that data to an Exasol database using a sink connector.
Although I am not getting any exceptions, Kafka is not reading any messages.
Any pointers or help on configuring such a sink connector to write to an Exasol database would be appreciated.
I need to source data from a PostgreSQL database with ~2000 schemas. All schemas contain the same tables (it is a multi-tenant application).
The connector is configured as follows:
{
"name": "postgres-source",
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"timestamp.column.name": "updated",
"incrementing.column.name": "id",
"connection.password": "********",
"tasks.max": "1",
"mode": "timestamp+incrementing",
"topic.prefix": "postgres-source-",
"connection.user": "*********",
"poll.interval.ms": "3600000",
"numeric.mapping": "best_fit",
"connection.url": "jdbc:postgresql://*******:5432/*******",
"table.whitelist": "table1",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"false"
}
With this configuration, I get this error:
"The connector uses the unqualified table name as the topic name and has detected duplicate unqualified table names. This could lead to mixed data types in the topic and downstream processing errors. To prevent such processing errors, the JDBC Source connector fails to start when it detects duplicate table name configurations"
Apparently the connector does not want to publish data from multiple tables with the same name to a single topic.
This does not matter to me; it could go to a single topic or to multiple topics (one for each schema).
As additional info, if I add:
"schema.pattern": "schema1"
to the config, the connector works and the data from the specified schema and table is copied.
Is there a way to copy multiple schemas that contain tables with the same name?
Thank you