Debezium PostgreSQL connector not creating topic - postgresql

A Kafka topic is not created when I start the Debezium connector for PostgreSQL. Here's what I have in my properties file:
name=testdb
connector.class=io.debezium.connector.postgresql.PostgresConnector
topic.prefix=test
database.hostname=localhost
database.port=5432
database.user=postgres
database.password=root
database.dbname=testdb
database.server.name=testdb
table.include.list=ipaddrs
plugin.name=pgoutput
According to the documentation, the topic should be named testdb.myschema.ipaddrs (myschema is the name of my schema). However, bin/kafka-topics.sh --list --bootstrap-server 192.168.56.1:9092 returns nothing, and no topic is created when I add a row to the ipaddrs table.
Kafka Connect starts up successfully when I run bin/connect-standalone.sh config/connect-standalone.properties config/postgres.properties, without any exceptions.
I have auto.create.topics.enable = true. http://localhost:8083/connectors/testdb/status shows this:
{"name":"testdb","connector":{"state":"RUNNING","worker_id":"10.0.0.48:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"10.0.0.48:8083"}],"type":"source"}
I am not running Zookeeper. I am running Kafka with KRaft.

table.include.list was not correct: it has to include the schema, so it should have been myschema.ipaddrs in my example. In addition, it seems like the documentation is incorrect about the topic name. On my system it is <topic.prefix>.<schema name>.<table name>, so in my example it's test.myschema.ipaddrs.
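For reference, a minimal sketch of the corrected properties file, with the same settings as in the question and only table.include.list changed to the schema-qualified name (assuming the schema is myschema):
name=testdb
connector.class=io.debezium.connector.postgresql.PostgresConnector
topic.prefix=test
database.hostname=localhost
database.port=5432
database.user=postgres
database.password=root
database.dbname=testdb
database.server.name=testdb
table.include.list=myschema.ipaddrs
plugin.name=pgoutput
With this change, inserts into the table should show up on the test.myschema.ipaddrs topic.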

Assuming the database has the correct configuration to work with the Debezium PostgreSQL connector, you can remove "database.dbname" if ipaddrs is a table name that is not repeated in another schema.

Related

Exclude PostgreSQL batch delete logs in Confluent Debezium connector

We have a requirement for the Debezium connector to exclude PostgreSQL delete logs generated as part of a batch delete query on a table. This batch runs every month and generates a lot of delete events which are not needed downstream.
We have tried the following:
Add filter conditions to exclude delete events on the table, but this excludes all other delete events on the table as well, not just the ones from the batch delete -> not a suitable option in prod.
Use txId in a Debezium filter to skip the particular transaction assigned to the batch delete (see the sketch below), but this requires a Debezium connector config change every time the txId changes -> not a suitable option in prod.
Debezium version - 1.2.1
Source - Postgresql 10.18 database
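To make the second option concrete, here is a minimal sketch of the Debezium Filter SMT (available since Debezium 1.2, and requiring the Groovy scripting dependency on the Connect worker) that drops delete events belonging to one specific transaction; the txId value 12345 is a placeholder, and because the txId changes with every batch run, this is exactly why the approach does not scale in production:
transforms=skipBatchDelete
transforms.skipBatchDelete.type=io.debezium.transforms.Filter
transforms.skipBatchDelete.language=jsr223.groovy
# keep an event unless it is a delete from the batch-delete transaction (txId is a placeholder)
transforms.skipBatchDelete.condition=value.op != 'd' || value.source.txId != 12345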

Debezium is very slow, how to improve?

First-time user of Debezium here. I get only around 1000 messages per MINUTE through Debezium (which is very slow compared to online benchmarks). There is no throttling on Kafka Connect, MySQL, or the Kafka broker, so I am not sure what I am doing wrong. I will post the config here for reference.
Config of Kafka-Connect Worker:
-e CONNECT_GROUP_ID="quickstart" \
-e CONNECT_CONFIG_STORAGE_TOPIC="quickstart-config" \
-e CONNECT_OFFSET_STORAGE_TOPIC="quickstart-offsets" \
-e CONNECT_STATUS_STORAGE_TOPIC="quickstart-status" \
-e CONNECT_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_REST_ADVERTISED_HOST_NAME="localhost" \
-e CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR=1 \
-e CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR=1 \
-e CONNECT_STATUS_STORAGE_REPLICATION_FACTOR=1
Config of Kafka Debezium MySQL Connector
I use all default config for Kafka Debezium MySQL Connector
The best option is to reconfigure the connector with Avro serialization to reduce the size of the messages and to track schema changes. This should give you a very significant improvement.
The Avro binary format is compact and efficient. Avro schemas make it possible to ensure that each record has the correct structure. Avro's schema evolution mechanism enables schemas to evolve. This is essential for Debezium connectors, which dynamically generate each record's schema to match the structure of the database table that was changed.
To see the difference that removing the schema from each message makes in your case, without Avro and without setting up a Schema Registry, set the settings below to false. Do not use it this way in production.
The default behavior is that the JSON converter includes the record's message schema, which makes each record very verbose. If you want records to be serialized with JSON, consider setting the following connector configuration properties to false:
key.converter.schemas.enable
value.converter.schemas.enable
Setting these properties to false excludes the verbose schema information from each record.
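As a rough sketch of both options in Kafka Connect worker-properties form (in the Docker image from the question these map to CONNECT_* environment variables; the Schema Registry URL is a placeholder and assumes a registry is running):
# Quick test only: JSON without the embedded schema
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
# Recommended: Avro with a Schema Registry (URL is a placeholder)
#key.converter=io.confluent.connect.avro.AvroConverter
#value.converter=io.confluent.connect.avro.AvroConverter
#key.converter.schema.registry.url=http://schema-registry:8081
#value.converter.schema.registry.url=http://schema-registry:8081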
See also: "Kafka Connect Deep Dive – Converters and Serialization Explained"

JDBC Kafka Postgres table is missing even though a table for the topic exists in the Postgres DB

I am trying to capture some produced messages into the database, but when I deploy my sink connector it says that the table is missing. I checked the connection to the database, re-granted permissions, and everything works with no errors on the database side. However, when I check the status of the sink connector, it still says that the table does not exist.
Here is my config for the sink connector:
task.max=1
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:postgresql://dev-pg-host.us-west-2.amazonaws.com:5432/devpg?currentSchema=flatten_stream_data
topics=topic_flatten
error.tolerance=all
error.log.enable=true
error.log.include.message=true
So I added auto.create=true so the connector would auto-create the table, but then I got errors along these lines:
ERROR: permission denied for schema TABLE
INFO Using PostgresSQL dialect, table 'topic_flatten' absent
WARN Create failed, will attempt amend if table already exists
ERROR: permission denied for schema <table name>
So then I tried to reroute the topic with an SMT:
transforms=route
transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.route.regex=topic_flatten
transforms.route.replacement=topic_flatten
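As an aside, RegexRouter is normally used to rewrite the topic name so it matches the table the sink should write to; with an identical regex and replacement as above, the transform leaves the topic name unchanged. A generic sketch with a hypothetical suffix:
transforms=route
transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
# route any topic ending in "_raw" (hypothetical) to the same name without the suffix
transforms.route.regex=(.*)_raw
transforms.route.replacement=$1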
DB Schema:
Schema: flatten_stream_data
Table: topic_flatten
create table topic_flatten(
  flatten_change_type varchar(15),
  flatten_id varchar(50),
  flatten_kind varchar(50)
);
grant delete, insert, select, truncate, update on topic_flatten to flatdletl;
I still receive the same error that the table does not exist, i.e. that the flatten table is missing.
The reason I am trying to capture the messages from the producer is that the producer already handles the array conversion for me, since the JDBC sink doesn't support array conversion. So the producer handles the array while the sink just consumes the messages. I was hoping to have the sink connector write the topic_flatten topic to the corresponding table, and I created the topic_flatten table (named after the topic) to keep things simple, but it still says the table does not exist. Any ideas?

kafka-connect JDBC PostgreSQL Sink Connector: explicitly define the PostgreSQL schema (namespace)

I am using the JDBC sink connector to write data to postgresql.
The connector works fine, but it seems it can only write data to the default PostgreSQL schema, public.
This is the common JDBC URL format for PostgreSQL:
jdbc:postgresql://<host>:<port>/<database>
Is it possible to explicitly define the schema name to which the PostgreSQL sink connector should write?
UPDATE:
Thanks, @Laurenz Albe, for the hint. I can define search_path in the JDBC connection URL in either of these ways:
jdbc:postgresql://<host>:<port>/<database>?options=-c%20search_path=myschema,public
jdbc:postgresql://<host>:<port>/<database>?currentSchema=myschema
Use the options connection parameter to set the search_path:
jdbc:postgresql://<host>:<port>/<database>?options=-c%20search_path=myschema,public
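For example, a minimal JDBC sink configuration writing to a non-default schema might look like this (host, database, schema, and topic names are placeholders):
name=pg-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=my_topic
# currentSchema (or options=-c search_path=...) determines where tables are created and written instead of public
connection.url=jdbc:postgresql://dbhost:5432/mydb?currentSchema=myschema
auto.create=true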

Re-add lost Clickhouse replica in Zookeeper cluster

We previously had three Clickhouse nodes perfectly synced within Zookeeper until one of them was lost.
The Clickhouse node was rebuilt exactly as it was before (with Ansible), and the same CREATE TABLE command was run, which resulted in the following error.
Command:
CREATE TABLE ontime_replica ( ... )
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/ontime_replica', '{replica}', FlightDate, (Year, FlightDate), 8192)
The error is:
Received exception from server:
Code: 253. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Replica /clickhouse/tables/01/ontime_replica/replicas/clickhouse1 already exists..
We're currently using Zookeeper version 3.4.10 and I would like to know if there's a way to remove the existing replica within Zookeeper, or simply let Zookeeper know that this is the new version of the existing replica.
Thank you in advance!
My approach to the solution was incorrect. Originally, I thought I needed to remove the replica within Zookeeper. Instead, the following commands within the Clickhouse server solve this problem.
Copy the SQL file from another, working node. The file is in /var/lib/clickhouse/metadata/default
chown clickhouse:clickhouse <database>.sql
chmod 0640 <database>.sql
sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data
service clickhouse-server start
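Putting those steps together, a rough shell sketch (the surviving node's hostname clickhouse2 is hypothetical; <database>.sql is the same placeholder used above):
# copy the table metadata SQL from a working replica (hostname is hypothetical)
scp clickhouse2:/var/lib/clickhouse/metadata/default/<database>.sql /var/lib/clickhouse/metadata/default/
chown clickhouse:clickhouse /var/lib/clickhouse/metadata/default/<database>.sql
chmod 0640 /var/lib/clickhouse/metadata/default/<database>.sql
# force ClickHouse to restore replicated table data from the other replicas on next start
sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data
service clickhouse-server start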