Kafka Connect JDBC source connector not working - PostgreSQL

Hello everyone,
I am using the Kafka Connect JDBC source connector with Postgres. Below is my connector configuration. Somehow it is not bringing in any data. What is wrong with this configuration?
{
  "name": "test-connection",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "mode": "timestamp",
    "timestamp.column.name": "TEST_DT",
    "topic.prefix": "test",
    "connection.password": "xxxxxx",
    "validate.non.null": "false",
    "connection.user": "xxxxxx",
    "table.whitelist": "test.test",
    "connection.url": "jdbc:postgresql://xxxx:5432/xxxx?ssl=true&stringtype=unspecified",
    "name": "test-connection"
  },
  "tasks": [],
  "type": "source"
}
Do I need to create the topic or does it get generated automatically?
I expect data to be flowing based on the example, but it is not. Below is the log I see in Kafka Connect; no data is flowing in.
Log
[2019-07-07 20:52:37,465] INFO WorkerSourceTask{id=test-connection-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2019-07-07 20:52:37,465] INFO WorkerSourceTask{id=test-connection-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)

Do I need to create the topic or does it get generated automatically?
It is generated automatically, with the "test" prefix you set in "topic.prefix": "test". The JDBC source connector names topics as the prefix followed by the table name, so with "table.whitelist": "test.test" your topic should be called something like "testtest" (or "testtest.test", depending on how the whitelist entry resolves to a table name).
It is also possible that you are using an Avro schema; if so, you have to consume the topic with an Avro consumer.
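To make the naming concrete, here is a minimal sketch (assuming the default prefix-plus-table-name mapping; the "_comment" key is only an annotation):
{
  "topic.prefix": "test",
  "table.whitelist": "test.test",
  "_comment": "resulting topic name = topic.prefix + table name, e.g. testtest"
}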

I faced exactly the same problem: there was no error in the log, and although I was adding/modifying records in Postgres, no messages were being sent. I was getting the same INFO log messages you mentioned. Here is how I resolved it; one or more of the points below may be causing your issue, so please check each of them at your end. If this solves your problem, please accept it as the answer.
"table.whitelist" : "public.abcd" <-- This property you ensure you give the schema name also explicitly e.g. I gave "public" as my "abcd" table was in this schema.
Usually the Database when we run (e.g. via Docker) the timezone is in UTC mode and if you are in a timezone more than that then while querying the data internally it would put the filter condition such a way that your data is filtered out. To overcome the best way is your timestamp column should be "timestamp with timezone" this solved my issue. Another variation I did was i inserted the data and gave the value of this column as "now() -interval '3 days'" to ensure the data is old and immediately it flow to Topic. Well, best is to give timestamp with timezone instead of this hack.
Finally another possible solution could be while giving the connector config you tell what timezone is your postgres db is. You may google for the property. As point-2 solved my issue so i didn't try this.
CREATE TABLE public.abcd (
  id SERIAL PRIMARY KEY,
  title VARCHAR(100) NOT NULL,
  update_ts TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL
);
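For point 3, a minimal sketch of the property I mean (untested on my side; the timezone value is only an example and should match your database):
{
  "db.timezone": "Asia/Kolkata",
  "_comment": "timezone the connector uses when building time-based queries in timestamp mode"
}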
My config which worked, in case it is needed:
{
  "name": "jdbc_source_connector_postgresql_004",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://192.168.99.116:30000/mydb",
    "connection.user": "sachin",
    "connection.password": "123456",
    "topic.prefix": "thesach004_",
    "poll.interval.ms": 1000,
    "mode": "timestamp",
    "table.whitelist": "public.abcd",
    "timestamp.column.name": "update_ts",
    "validate.non.null": false,
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}
-$achin

Related

Kafka Connect - From JSON records to Avro files in HDFS

My current setup contains Kafka, HDFS, Kafka Connect, and a Schema Registry all in networked docker containers.
The Kafka topic contains simple JSON data without a Schema:
{
  "repo_name": "ironbee/ironbee"
}
The Schema Registry contains a JSON Schema describing the data in the Kafka Topic:
{"$schema": "https://json-schema.org/draft/2019-09/schema",
"$id": "http://example.com/example.json",
"type": "object",
"title": "Root Schema",
"required": [
"repo_name"
],
"properties": {
"repo_name": {
"type": "string",
"default": "",
"title": "The repo_name Schema",
"examples": [
"ironbee/ironbee"
]
}
}}
What I am trying to achieve is a connector that reads JSON data from a topic and dumps it into files in HDFS (Avro or Parquet).
{
  "name": "kafka to hdfs",
  "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector",
  "topics": "repo",
  "hdfs.url": "hdfs://namenode:9000",
  "flush.size": 3,
  "confluent.topic.bootstrap.servers": "kafka-1:19092,kafka-2:29092,kafka-3:39092",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "key.converter.schemas.enable": "false",
  "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
  "value.converter.schemas.enable": "false",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}
If I try to read the raw JSON value via the StringConverter (no schema used) and dump it into Avro files, it works, but it results in Key=null Value={my json} tuples, so no usable structure at all.
When I try to use my schema via the JsonSchemaConverter I get the errors
“Converting byte[] to Kafka Connect data failed due to serialization error of topic”
“Unknown magic byte”
I think there is something wrong with the configuration of my connector, but after a week of trying everything, my googling skills have reached their limits.
All the code is available here: https://github.com/SDU-minions/7-Scalable-Systems-Project/tree/dev/Kafka
raw JSON value via the StringConverter (no schema used)
The schemas.enable property only exists on the JsonConverter. Strings don't have schemas, and JSON Schema records always have a schema, so the property doesn't exist on those converters either.
When I try to use my schema via the JsonSchemaConverter I get the errors
Your producer needs to use the Confluent JSON Schema serializer. Otherwise, the data doesn't get sent to Kafka with the "magic byte" (and schema ID) referred to in your error.
I personally haven't tried converting JSON schema records to Avro directly in Connect. Usually the pattern is to either produce Avro directly, or convert within ksqlDB, for example to a new Avro topic, which is then consumed by Connect.
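For reference, here is roughly how I would expect the converter part of your sink config to look once the schemas.enable entries are removed and the producer writes with the Confluent JSON Schema serializer. This is only a sketch of that assumption; I have not tested it end to end:
{
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081",
  "_comment": "the producer must use the Confluent JSON Schema serializer so each record carries the magic byte and schema ID"
}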

KSQLDB Push Queries Fail to Deserialize Data - Schema Lookup Performed with Wrong Schema ID

I'm not certain what I could be missing.
I have set up a Kafka broker server, with a Zookeeper and a distributed Kafka Connect.
For schema management, I have set up an Apicurio Schema Registry instance
I also have ksqlDB set up.
The following I can confirm is working as expected:
My source JDBC connector successfully pushed table data into the topic stss.market.info.public.ice_symbols
Problem:
Inside the KSQLDB server, I have successfully created a table from the topic stss.market.info.public.ice_symbols
Here is the detail of the table created
The problem I'm facing is that a push query against this table returns no data. Deserialization of the data fails due to an unsuccessful lookup of the Avro schema in the Apicurio Registry.
Looking at the Apicurio Registry logs reveals that ksqlDB calls the Apicurio Registry to fetch the deserialization schema using a schema ID of 0 instead of 5, which is the ID of the schema I have registered in the registry.
The ksqlDB server logs also confirm the 404 HTTP response seen in the Apicurio logs, as shown in the image below.
Expectation:
I expect ksqlDB queries against the table to perform a schema lookup with an ID of 5, not 0. I'm guessing I'm missing some configuration.
Here is the image of the schema registered in the Apicurio Registry.
Here is also my source connector configuration. It has the appropriate schema lookup strategy configured, although I don't believe ksqlDB requires this when deserializing its table data; this configuration should only be relevant to capturing the table data and to its validation and storage in the topic stss.market.info.public.ice_symbols.
{
  "name": "new.connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "172.17.203.10",
    "database.port": "6000",
    "database.user": "postgres",
    "database.password": "123",
    "database.dbname": "stss_market_info",
    "database.server.name": "stss.market.info",
    "table.include.list": "public.ice_symbols",
    "message.key.columns": "public.ice_symbols:name",
    "snapshot.mode": "always",
    "transforms": "unwrap,extractKey",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractKey.field": "name",
    "value.converter": "io.apicurio.registry.utils.converter.AvroConverter",
    "value.converter.apicurio.registry.url": "http://local-server:8080/apis/registry/v2",
    "value.converter.apicurio.registry.auto-register": true,
    "value.converter.apicurio.registry.find-latest": true,
    "value.apicurio.registry.as-confluent": true,
    "name": "new.connector",
    "value.converter.schema.registry.url": "http://local-server:8080/apis/registry/v2"
  }
}
Thanks in advance for any assistance.
You can specify the "VALUE_SCHEMA_ID=5" property in the WITH clause when you create a stream/table.

Debezium connector with TimescaleDB extension

I'm having trouble detecting changes on a PostgreSQL hypertable (TimescaleDB extension).
Setup:
I have PostgreSQL (ver. 11.10) installed with the TimescaleDB (ver. 1.7.1) extension.
I have 2 tables I want to monitor with the Debezium (ver. 1.3.1) connector installed on Kafka Connect, for the purpose of CDC (change data capture).
The tables are table1 and table2hyper, where table2hyper is a hypertable.
After creating the Debezium connector in Kafka Connect, I can see 2 topics created (one for each table):
(A) kconnect.public.table1
(B) kconnect.public.table2hyper
When consuming messages with kafka-console-consumer for topic A, I can see the messages after a row update in table1.
But when consuming messages from topic B (table2hyper changes), nothing is emitted after, for example, a row update in the table2hyper table.
Initially the Debezium connector does a snapshot of the rows in the table2hyper table and sends them to topic B (I can see those messages in topic B when using kafka-console-consumer), but the changes I make after the initial snapshot are not emitted.
Why am I unable to see subsequent changes (after the initial snapshot) from table2hyper?
Connector creation payload:
{
  "name": "alarm-table-connector7",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "xxx",
    "database.port": "5432",
    "database.user": "xxx",
    "database.password": "xxx",
    "database.dbname": "xxx",
    "database.server.name": "kconnect",
    "database.whitelist": "public.dev_db",
    "table.include.list": "public.table1, public.table2hyper",
    "plugin.name": "pgoutput",
    "tombstones.on.delete": "true",
    "slot.name": "slot3",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.unwrap.delete.handling.mode": "rewrite",
    "transforms.unwrap.add.fields": "table,lsn,op"
  }
}
Thx in advance!
After trying for a while, I did not succeed in streaming data from a hypertable with the Debezium connector. I was using version 1.3.1, and upgrading to the latest 1.4.1 did not help.
However, I did succeed with the Confluent JDBC connector.
As far as my research and testing go, this is my conclusion; feel free to correct me if necessary:
Debezium works on ordinary tables for INSERT, UPDATE and DELETE events.
The Confluent JDBC connector captures only INSERT events (unless you combine some columns for detecting changes) and works on both ordinary and hyper (TimescaleDB) tables; see the sketch below.
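For reference, a minimal sketch of the JDBC source config I mean. The connection details, topic prefix and column names (id, updated_at) are placeholders for your own setup; timestamp+incrementing mode is one way to "combine some columns" so that updates are detected as well, assuming the hypertable has an incrementing id and an update timestamp column:
{
  "name": "jdbc_source_table2hyper",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://xxx:5432/xxx",
    "connection.user": "xxx",
    "connection.password": "xxx",
    "table.whitelist": "public.table2hyper",
    "mode": "timestamp+incrementing",
    "incrementing.column.name": "id",
    "timestamp.column.name": "updated_at",
    "topic.prefix": "jdbc.",
    "poll.interval.ms": 1000
  }
}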
We have never tested Debezium with TimescaleDB. I recommend you check whether the TimescaleDB updates are present in the logical replication slot. If yes, it should be technically possible to have Debezium process the events; if not, then it is not possible at all.

Kafka Connect JDBC Sink quote.sql.identifiers not working

I'm trying to use Kafka Connect to sync data from an old DB2 database to a Postgres database using the JDBC source and sink connectors. It works fine, but only if I am very strict about the case I use for table names.
For example, I have a table in DB2 called ACTION, and it also exists in Postgres with the same columns, etc. The only difference is that in DB2 it's uppercase ACTION and in Postgres it's lowercase action.
Here's a sink file that works:
{
  "name": "jdbc_sink_pg_action",
  "config": {
    "_comment": "The JDBC connector class",
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "_comment": "How to serialise the value of keys ",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "_comment": "As above, but for the value of the message. Note that these key/value serialisation settings can be set globally for Connect and thus omitted for individual connector configs to make them shorter and clearer",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "_comment": " --- JDBC-specific configuration below here --- ",
    "_comment": "JDBC connection URL.",
    "connection.url": "jdbc:postgresql://localhost:5435/postgres",
    "connection.user": "postgres",
    "connection.password": "*****",
    "topics": "ACTION",
    "table.name.format": "action",
    "_comment": "The insertion mode to use",
    "insert.mode": "upsert",
    "_comment": "The primary key mode",
    "pk.mode": "record_value",
    "_comment": "List of comma-separated primary key field names. The runtime interpretation of this config depends on the pk.mode",
    "pk.fields": "ACTION_ID",
    "quote.sql.identifiers": "never"
  }
}
This is ok, but it's not very flexible. For example, I have many other tables and I'd like to sync them too, but I don't want to create a connector file for each and every table. So I try using:
"table.name.format": "${topic}",
When I do this, I get the following error in the logs when I try to load my sink connector:
Caused by: org.apache.kafka.connect.errors.ConnectException: Table "ACTION"
is missing and auto-creation is disabled
So it seems to me that "quote.sql.identifiers": "never" is not actually working; otherwise the query the sink connector issues would be unquoted and would accept any case (it would be folded to lower case).
Why isn't this working? I get the same results if I just use ACTION as the table.name.format.
Your PostgreSQL table name (action) is not equal to the topic name (ACTION).
The Kafka Connect JDBC connector uses the JDBC getTables() method to check whether a table exists, and its tableNamePattern parameter is case sensitive (according to the docs: "must match the table name as it is stored in the database").
You can use the ChangeTopicCase transformation from Kafka Connect Common Transformations, as sketched below.
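A rough sketch of what that could look like in your sink config. The transform class comes from the Kafka Connect Common Transformations project; I am writing the from/to option names from memory, so please double-check them against that project's documentation:
{
  "transforms": "changeCase",
  "transforms.changeCase.type": "com.github.jcustenborder.kafka.connect.transform.common.ChangeTopicCase",
  "transforms.changeCase.from": "UPPER_UNDERSCORE",
  "transforms.changeCase.to": "LOWER_UNDERSCORE",
  "_comment": "renames the topic ACTION to action before the table lookup, so table.name.format ${topic} resolves to the lowercase Postgres table"
}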

Kafka Connect JDBC source setup is not reading data from DB, so no data in Kafka topic

We configured the Kafka Connect JDBC source connector to read data from DB2 and publish it to a Kafka topic, using one of the columns of type timestamp as timestamp.column.name. However, Kafka Connect is not publishing any data to the topic. Even though no new data has arrived since the Kafka Connect setup was done, there is a huge amount of existing data in DB2, so at least that should be published to the Kafka topic, but that is not happening either. Below is my source connector configuration.
{
  "name": "next-error-msg",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "DB_DATA_SOURCE_URL",
    "connection.user": "DB_DATA_SOURCE_USERNAME",
    "connection.password": "DB_DATA_SOURCE_PASSWORD",
    "schema.pattern": "DB_DATA_SCHEMA_PATTERN",
    "mode": "timestamp",
    "query": "SELECT SEQ_I AS error_id, SEND_I AS scac , to_char(CREATE_TS,'YYYY-MM-DD-HH24.MI.SS.FF6') AS create_timestamp, CREATE_TS, MSG_T AS error_message FROM DB_ERROR_MEG",
    "timestamp.column.name": "CREATE_TS",
    "validate.non.null": false,
    "topic.prefix": "DB_ERROR_MSG_TOPIC_NAME"
  }
}
My questions are: why is it not reading the data, and shouldn't it at least read the existing data already present in the DB? That is not happening. Is there something I need to configure or add?