Debezium connector with TimescaleDB extension - PostgreSQL

I'm having trouble detecting changes on a PostgreSQL hypertable (TimescaleDB extension).
Setup:
I have PostgreSQL (ver. 11.10) installed with the TimescaleDB (ver. 1.7.1) extension.
I have 2 tables I want to monitor with the Debezium (ver. 1.3.1) connector installed on Kafka Connect, for the purpose of CDC (change data capture).
The tables are table1 and table2hyper; table2hyper is a hypertable.
After creating the Debezium connector in Kafka Connect, I can see 2 topics created (one for each table):
(A) kconnect.public.table1
(B) kconnect.public.table2hyper
When consuming messages with kafka-console-consumer for topic A, I can see the messages after a row update in table1.
But when consuming messages from topic B (table2hyper changes), nothing is emitted after, for example, a row update in the table2hyper table.
Initially, the Debezium connector takes a snapshot of the rows in the table2hyper table and sends them to topic B (I can see the messages in topic B when using kafka-console-consumer), but changes that I make after the initial snapshot are not emitted.
Why am I unable to see subsequent changes (after the initial snapshot) from table2hyper?
Connector creation payload:
{
  "name": "alarm-table-connector7",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "xxx",
    "database.port": "5432",
    "database.user": "xxx",
    "database.password": "xxx",
    "database.dbname": "xxx",
    "database.server.name": "kconnect",
    "database.whitelist": "public.dev_db",
    "table.include.list": "public.table1, public.table2hyper",
    "plugin.name": "pgoutput",
    "tombstones.on.delete": "true",
    "slot.name": "slot3",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.unwrap.delete.handling.mode": "rewrite",
    "transforms.unwrap.add.fields": "table,lsn,op"
  }
}
Thx in advance!

After trying for a while, I did not succeed in streaming data from the hypertable with the Debezium connector. I was using version 1.3.1, and upgrading to the latest 1.4.1 did not help.
However, I did succeed with Confluent JDBC connector.
As far as my research and testing go, this is my conclusion (feel free to correct me if necessary):
Debezium works on ordinary tables for INSERT, UPDATE and DELETE events.
The Confluent JDBC connector captures only INSERT events (unless you combine some columns for change detection; see the sketch below) and works on both ordinary and hyper (TimescaleDB) tables.
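For illustration, here is a minimal sketch of a Confluent JDBC source connector config for the hypertable (not my exact config; the updated_at and id column names are placeholders for whatever columns you combine for change detection):
{
  "name": "table2hyper-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://xxx:5432/xxx",
    "connection.user": "xxx",
    "connection.password": "xxx",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "updated_at",
    "incrementing.column.name": "id",
    "table.whitelist": "public.table2hyper",
    "topic.prefix": "jdbc.",
    "poll.interval.ms": "1000"
  }
}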

We have never tested Debezium with TimescaleDB. I recommend you check whether the TimescaleDB updates are present in the logical replication slot. If yes, it should be technically possible to have Debezium process the events; if not, then it is not possible at all.
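One way to check (a sketch, using the test_decoding output plugin that ships with standard PostgreSQL): create a throwaway logical slot, update a row in the hypertable from another session, and peek at the slot.
-- create a temporary test slot
SELECT * FROM pg_create_logical_replication_slot('tsdb_check', 'test_decoding');
-- run an UPDATE on table2hyper in another session, then:
SELECT * FROM pg_logical_slot_peek_changes('tsdb_check', NULL, NULL);
-- clean up
SELECT pg_drop_replication_slot('tsdb_check');
Note that TimescaleDB usually stores hypertable rows in chunk tables under the _timescaledb_internal schema, so the changes may show up against those chunk tables rather than against public.table2hyper.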

KSQLDB Push Queries Fail to Deserialize Data - Schema Lookup Performed with Wrong Schema ID

I'm not certain what I could be missing.
I have set up a Kafka broker server, with a Zookeeper and a distributed Kafka Connect.
For schema management, I have set up an Apicurio Schema Registry instance.
I also have KSQLDB set up.
The following I can confirm is working as expected:
My source JDBC connector successfully pushed table data into the topic stss.market.info.public.ice_symbols
Problem:
Inside the KSQLDB server, I have successfully created a table from the topic stss.market.info.public.ice_symbols.
Here is the detail of the created table.
The problem I'm facing is that when performing a push query against this table, it returns no data. Deserialization of the data fails due to an unsuccessful lookup of the Avro schema in the Apicurio Registry.
Looking at the Apicurio Registry logs reveals that KSQLDB calls the Apicurio Registry to fetch the deserialization schema using a schema ID of 0 instead of 5, which is the ID of the schema I have registered in the registry.
The KSQLDB server logs also confirm the 404 HTTP response seen in the Apicurio logs, as shown in the image below.
Expectation:
I expect KSQLDB queries to the table to perform a schema lookup with an ID of 5, not 0. I'm guessing I'm probably missing some configuration.
Here is the image of the schema registered in the Apicurio Registry.
Here is also my source connector configuration. It has the appropriate schema lookup strategy configured, although I don't believe KSQLDB requires this when deserializing its table data; this configuration should only be relevant to capturing the table data and to its validation and storage in the topic stss.market.info.public.ice_symbols.
{
  "name": "new.connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "172.17.203.10",
    "database.port": "6000",
    "database.user": "postgres",
    "database.password": "123",
    "database.dbname": "stss_market_info",
    "database.server.name": "stss.market.info",
    "table.include.list": "public.ice_symbols",
    "message.key.columns": "public.ice_symbols:name",
    "snapshot.mode": "always",
    "transforms": "unwrap,extractKey",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractKey.field": "name",
    "value.converter": "io.apicurio.registry.utils.converter.AvroConverter",
    "value.converter.apicurio.registry.url": "http://local-server:8080/apis/registry/v2",
    "value.converter.apicurio.registry.auto-register": true,
    "value.converter.apicurio.registry.find-latest": true,
    "value.apicurio.registry.as-confluent": true,
    "name": "new.connector",
    "value.converter.schema.registry.url": "http://local-server:8080/apis/registry/v2"
  }
}
Thanks in advance for any assistance.
You can specify the "VALUE_SCHEMA_ID=5" property in the WITH clause when you create a stream/table.
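For example (a sketch, assuming the Avro value schema registered under ID 5 belongs to this topic):
-- let ksqlDB infer the value columns from the schema registered under ID 5
CREATE STREAM ice_symbols WITH (
  KAFKA_TOPIC='stss.market.info.public.ice_symbols',
  VALUE_FORMAT='AVRO',
  VALUE_SCHEMA_ID=5
);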

Some rows in the Postgres table can generate CDC while others cannot

I have a Postgres DB with CDC set up.
I deployed the Debezium Kafka connector 1.8.0.Final for the Postgres DB by
POST http://localhost:8083/connectors
with body:
{
  "name": "postgres-kafkaconnector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "example.com",
    "database.port": "5432",
    "database.dbname": "my_db",
    "database.user": "xxx",
    "database.password": "xxx",
    "database.server.name": "postgres_server",
    "table.include.list": "public.products",
    "plugin.name": "pgoutput"
  }
}
I noticed something strange.
In the same table, when I update rows, some rows generate CDC events but other rows do not.
Those rows are very similar, except that their id and identifier values differ.
-- Updating this row can generate CDC
UPDATE public.products
SET identifier = 'GET /api/accounts2'
WHERE id = '90c21719-ce41-4523-8ad1-ed6b21ecfaf1';
-- Updating this row cannot generate CDC
UPDATE public.products
SET identifier = 'GET /api/notworking/accounts2'
WHERE id = '22f5ebf3-9594-493d-8aa6-649d9fbcefd2';
I checked my Kafka Connect container log; there is no error there either.
Any idea?
Found the issue! It is because my Kafka connector postgres-kafkaconnector was initially pointing to one DB (stage1), and then I switched it to another DB (stage2) by updating:
"database.hostname": "example.com",
"database.port": "5432",
"database.dbname": "my_db",
"database.user": "xxx",
"database.password": "xxx",
However, both setups were using the same configuration properties of the Kafka Connect cluster I deployed at the very beginning:
config.storage.topic
offset.storage.topic
status.storage.topic
Since the connector with a different DB config shared the same Kafka Connect storage topics above, and the database table schemas are the same,
it became a mess due to sharing the same Kafka offsets.
One simple fix, when deploying a Kafka connector to test against different DBs, is to use different connector names such as postgres-kafkaconnector-stage1 and postgres-kafkaconnector-stage2 to avoid mixing up offsets.
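For example, re-registering the same config under a stage-specific name (a sketch reusing the config from the question; the stage2 hostname is a placeholder):
{
  "name": "postgres-kafkaconnector-stage2",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "stage2.example.com",
    "database.port": "5432",
    "database.dbname": "my_db",
    "database.user": "xxx",
    "database.password": "xxx",
    "database.server.name": "postgres_server",
    "table.include.list": "public.products",
    "plugin.name": "pgoutput"
  }
}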

Kafka Connect: streaming changes from Postgres to topics using debezium

I'm pretty new to Kafka and Kafka Connect world. I am trying to implement CDC using Kafka (on MSK), Kafka Connect (using the Debezium connector for PostgreSQL) and an RDS Postgres instance. Kafka Connect runs in a K8 pod in our cluster deployed in AWS.
Before diving into the details of the configuration used, I'll try to summarise the problem:
Once the connector starts, it sends messages to the topic as expected (the snapshot).
Once we make any change to a table (Create, Update, Delete), no messages are sent to the topic. We would expect to see messages about the changes made to the table.
My connector config looks like:
{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "database.user": "root",
  "database.dbname": "insights",
  "slot.name": "cdc_organization",
  "tasks.max": "1",
  "column.blacklist": "password, access_key, reset_token",
  "database.server.name": "insights",
  "database.port": "5432",
  "plugin.name": "wal2json_rds_streaming",
  "schema.whitelist": "public",
  "table.whitelist": "public.kafka_connect_cdc_test",
  "key.converter.schemas.enable": "false",
  "database.hostname": "de-test-sre-12373.cbplqnioxomr.eu-west-1.rds.amazonaws.com",
  "database.password": "MYSECRETPWD",
  "value.converter.schemas.enable": "false",
  "name": "source-postgres",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "snapshot.mode": "initial"
}
We have tried different values for the plugin.name property: wal2json, wal2json_streaming and wal2json_rds_streaming.
There's no connection problem between the connector and the DB, as we already saw messages flowing through as soon as the connector started.
Is there a configuration issue with the connector described above that prevents us from seeing messages related to new changes appearing in the topic?
Thanks
Your connector config looks a bit confusing. I'm pretty new to Kafka as well, so I don't really know the issue, but this is the connector config that works for me.
{
  "name": "<connector_name>",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.server.name": "<server>",
    "database.port": "5432",
    "database.hostname": "<host>",
    "database.user": "<user>",
    "database.dbname": "<dbname>",
    "tasks.max": "1",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "<kafka_topic_name>",
    "plugin.name": "pgoutput",
    "include.schema.changes": "true"
  }
}
If this configuration doesn't work either, try looking through the log console; sometimes the error isn't the last thing written to the console.
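If the logs don't show anything, it can also be worth checking on the database side (a sketch, assuming you can run queries against the RDS instance directly) whether the connector's replication slot exists, is active, and is advancing:
-- the slot Debezium created (the default name is "debezium" unless slot.name is set)
SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
-- whether a walsender session is currently streaming for it
SELECT pid, application_name, state, sent_lsn, replay_lsn
FROM pg_stat_replication;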

Is it possible to sink Kafka messages generated by Debezium to Snowflake?

I use the debezium-ui repo to test the Debezium MySQL CDC feature; the messages stream into Kafka normally. The request body used to create the MySQL connector is as follows:
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "dbzui-db-mysql",
    "database.port": "3306",
    "database.user": "mysqluser",
    "database.password": "mysql",
    "database.server.id": "184054",
    "database.server.name": "inventory-connector-mysql",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "dbzui-kafka:9092",
    "database.history.kafka.topic": "dbhistory.inventory"
  }
}
Then I need to sink the Kafka messages into Snowflake, the data warehouse my team uses. I created a Snowflake sink connector to sink them; the request body is as follows:
{
  "name": "kafka2-04",
  "config": {
    "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "tasks.max": 1,
    "topics": "inventory-connector-mysql.inventory.orders",
    "snowflake.topic2table.map": "inventory-connector-mysql.inventory.orders:tbl_orders",
    "snowflake.url.name": "**.snowflakecomputing.com",
    "snowflake.user.name": "kafka_connector_user_1",
    "snowflake.private.key": "*******",
    "snowflake.private.key.passphrase": "",
    "snowflake.database.name": "kafka_db",
    "snowflake.schema.name": "kafka_schema",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
    "header.converter": "org.apache.kafka.connect.storage.SimpleHeaderConverter",
    "value.converter.schemas.enable": "true"
  }
}
But after it runs, the data that lands in my Snowflake table looks like this: data in snowflake. The schema in the Snowflake table is different from the MySQL table. Is my sink connector config incorrect, or is it impossible to sink Kafka data generated by Debezium with the SnowflakeSinkConnector?
This is the default behavior in Snowflake, and it is documented here:
Every Snowflake table loaded by the Kafka connector has a schema consisting of two VARIANT columns:
RECORD_CONTENT. This contains the Kafka message.
RECORD_METADATA. This contains metadata about the message, for example, the topic from which the message was read.
If Snowflake creates the table, then the table contains only these two columns. If the user creates the table for the Kafka Connector to add rows to, then the table can contain more than these two columns (any additional columns must allow NULL values because data from the connector does not include values for those columns).
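So the MySQL columns are not lost; they are nested inside the RECORD_CONTENT VARIANT and can be queried (or flattened into your own table) with Snowflake's path syntax. A sketch, assuming the records keep the Debezium envelope under a payload field and that the orders table has an order_number column; adjust the paths to your actual schema:
SELECT
  record_metadata:topic::string                      AS source_topic,
  record_content:payload:op::string                  AS op,
  record_content:payload:after:order_number::number  AS order_number
FROM kafka_db.kafka_schema.tbl_orders;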

Kafka connect JDBC source connector not working

Hello Everyone,
I am using the Kafka JDBC source connector for Postgres. Following is my connector configuration. Somehow it is not bringing in any data. What is wrong in this configuration?
{
  "name": "test-connection",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "mode": "timestamp",
    "timestamp.column.name": "TEST_DT",
    "topic.prefix": "test",
    "connection.password": "xxxxxx",
    "validate.non.null": "false",
    "connection.user": "xxxxxx",
    "table.whitelist": "test.test",
    "connection.url": "jdbc:postgresql://xxxx:5432/xxxx?ssl=true&stringtype=unspecified",
    "name": "test-connection"
  },
  "tasks": [],
  "type": "source"
}
Do I need to create the topic or does it get generated automatically?
I expect the data to be flowing based on the example, but it is not. Following is the log I see in Kafka Connect, yet no data is flowing in.
Log
[2019-07-07 20:52:37,465] INFO WorkerSourceTask{id=test-connection-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2019-07-07 20:52:37,465] INFO WorkerSourceTask{id=test-connection-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
Do I need to create the topic or does it get generated automatically?
It is generated automatically with the "test" prefix you set in "topic.prefix": "test",
so your topic is called "testtest-connection" or "testtest.test".
It is possible that you are using an Avro schema; if so, you have to consume the topic with an Avro consumer.
I faced exactly the same problem: there was no error in the log, yet while I was adding/modifying records in Postgres, no messages were being sent, and I was getting the same INFO log messages you mentioned. Here is how I resolved it; possibly one or all of these points is causing your issue, so please check which one applies at your end. If it solves your issue, please accept this as the answer.
"table.whitelist" : "public.abcd" <-- This property you ensure you give the schema name also explicitly e.g. I gave "public" as my "abcd" table was in this schema.
Usually the Database when we run (e.g. via Docker) the timezone is in UTC mode and if you are in a timezone more than that then while querying the data internally it would put the filter condition such a way that your data is filtered out. To overcome the best way is your timestamp column should be "timestamp with timezone" this solved my issue. Another variation I did was i inserted the data and gave the value of this column as "now() -interval '3 days'" to ensure the data is old and immediately it flow to Topic. Well, best is to give timestamp with timezone instead of this hack.
Finally another possible solution could be while giving the connector config you tell what timezone is your postgres db is. You may google for the property. As point-2 solved my issue so i didn't try this.
CREATE TABLE public.abcd (
id SERIAL PRIMARY KEY,
title VARCHAR(100) not NULL,
update_ts TIMESTAMP with time zone default now() not null
);
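And a quick sanity-check insert (a sketch) that backdates update_ts as described in point 2, so the row should appear on the topic right away:
-- backdated timestamp so the connector's timestamp filter picks the row up immediately
INSERT INTO public.abcd (title, update_ts)
VALUES ('test row', now() - interval '3 days');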
My config which worked, in case it's needed:
{
  "name": "jdbc_source_connector_postgresql_004",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://192.168.99.116:30000/mydb",
    "connection.user": "sachin",
    "connection.password": "123456",
    "topic.prefix": "thesach004_",
    "poll.interval.ms": 1000,
    "mode": "timestamp",
    "table.whitelist": "public.abcd",
    "timestamp.column.name": "update_ts",
    "validate.non.null": false,
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}
-$achin