Kafka cannot stream database activity - PostgreSQL

I'm streaming changes from my PostgreSQL database using Confluent (./kafka-avro-console-consumer). The streaming works, but Kafka only shows INSERT activity; other operations such as DELETE, UPDATE, and CREATE TABLE never show up in my Kafka consumer.
I did:
Installed PostgreSQL, from creating the role through creating the tables
Installed the Debezium CDC connector for PostgreSQL (I didn't use the JSON/JDBC connectors at all)
Installed wal2json
Created the connector on Confluent
Topics were created automatically after the connector was deployed successfully
Consumed the topic: ./kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic debezium.public.emp_bio --from-beginning
This is my connector
{
  "name": "postgres-connector-1",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "dbuser1",
    "database.password": "password",
    "database.dbname": "testdb",
    "database.server.name": "debezium",
    "database.whitelist": "testdb",
    "plugin.name": "wal2json",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "postgres-hist-test",
    "include.schema.changes": "true"
  }
}
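For reference, a config like this is normally registered with the Kafka Connect REST API; a minimal sketch, assuming the JSON above is saved as connector.json and Connect listens on the default port 8083:

curl -X POST -H "Content-Type: application/json" \
  --data @connector.json \
  http://localhost:8083/connectors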
Expected: I can stream other activities such as DELETE, DROP, UPDATE, and CREATE from my PostgreSQL database.
Error: Sorry, there is no error message anywhere!
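One place to start debugging is the PostgreSQL side itself; a hedged sketch of checks run via psql (the table name emp_bio comes from the consumed topic, and REPLICA IDENTITY FULL is only a common recommendation for Debezium, not necessarily the fix here):

# logical decoding must be enabled for wal2json to see any changes
psql -U dbuser1 -d testdb -c "SHOW wal_level;"   # expected: logical
# the replication slot created by Debezium should exist and be active
psql -U dbuser1 -d testdb -c "SELECT slot_name, plugin, active FROM pg_replication_slots;"
# with REPLICA IDENTITY FULL, UPDATE/DELETE events carry the full old row
psql -U dbuser1 -d testdb -c "ALTER TABLE public.emp_bio REPLICA IDENTITY FULL;"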

Related

Some rows in the Postgres table can generate CDC while others cannot

I have a Postgres DB with CDC setup.
I deployed the Kafka Debezium connector 1.8.0.Final for a Postgres DB by
POST http://localhost:8083/connectors
with body:
{
  "name": "postgres-kafkaconnector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "example.com",
    "database.port": "5432",
    "database.dbname": "my_db",
    "database.user": "xxx",
    "database.password": "xxx",
    "database.server.name": "postgres_server",
    "table.include.list": "public.products",
    "plugin.name": "pgoutput"
  }
}
I noticed some strange things.
In the same table, when I update rows, some rows generate CDC but other rows do not.
Those rows are very similar; only their id and identifier values differ.
-- Updating this row can generate CDC
UPDATE public.products
SET identifier = 'GET /api/accounts2'
WHERE id = '90c21719-ce41-4523-8ad1-ed6b21ecfaf1';
-- Updating this row cannot generate CDC
UPDATE public.products
SET identifier = 'GET /api/notworking/accounts2'
WHERE id = '22f5ebf3-9594-493d-8aa6-649d9fbcefd2';
I checked my Kafka Connect container log, and there is no error there either.
Any idea?
Found the issue! It is because my Kafka connector postgres-kafkaconnector was initially pointing to one DB (stage1), and then I switched it to another DB (stage2) by updating
"database.hostname": "example.com",
"database.port": "5432",
"database.dbname": "my_db",
"database.user": "xxx",
"database.password": "xxx",
However, both were using the same configuration properties in the Kafka Connect worker I deployed at the very beginning:
config.storage.topic
offset.storage.topic
status.storage.topic
Since the connectors with different DB configs shared the same Kafka configuration properties above, and the database table schemas are the same, things became a mess because they shared the same Kafka offsets.
One simple way to fix this is, when deploying Kafka connectors to test against different DBs, to use different names such as postgres-kafkaconnector-stage1 and postgres-kafkaconnector-stage2 to avoid the Kafka topic offset mess.
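A hedged sketch of that workaround against the Connect REST API (the stage-specific name and config file are illustrative; the DELETE and POST endpoints are standard Kafka Connect):

# remove the connector that was re-pointed at a different DB
curl -X DELETE http://localhost:8083/connectors/postgres-kafkaconnector
# register it again under a stage-specific name (with "name": "postgres-kafkaconnector-stage2"
# in the JSON body) so it starts with fresh offsets
curl -X POST -H "Content-Type: application/json" \
  --data @postgres-kafkaconnector-stage2.json \
  http://localhost:8083/connectors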

No CDC generated by Kafka Debezium connector for Postgres

I succeeded in generating CDC for one Postgres DB.
Today, I used the same steps to try to set up the Kafka Debezium connector for another Postgres DB.
First I ran
POST http://localhost:8083/connectors
with body:
{
  "name": "postgres-kafkaconnector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "example.com",
    "database.port": "5432",
    "database.dbname": "my_db",
    "database.user": "xxx",
    "database.password": "xxx",
    "database.server.name": "postgres_server",
    "table.include.list": "public.products",
    "plugin.name": "pgoutput"
  }
}
which succeeded without error.
Then I ran
GET http://localhost:8083/connectors/postgres-kafkaconnector/status
to check status. It returns this result without any error:
{
  "name": "postgres-kafkaconnector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "10.xx.xx.xx:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "10.xx.xx.xx:8083"
    }
  ],
  "type": "source"
}
However, this time, when I update anything in the products table, no CDC gets generated.
Any idea? Any suggestion to help debug this further would be appreciated. Thanks!
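One quick check (a sketch, assuming Debezium's default <server.name>.<schema>.<table> topic naming and a broker reachable on localhost:9092) is to consume the change topic directly and to look at the replication slot on the DB side:

# does anything arrive on the change topic at all?
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic postgres_server.public.products --from-beginning
# is the slot created by the connector actually being consumed?
psql -h example.com -U xxx -d my_db -c "SELECT slot_name, active, confirmed_flush_lsn FROM pg_replication_slots;"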
Found the issue! It is because my Kafka connector postgres-kafkaconnector was initially pointing to one DB (stage1), and then I switched it to another DB (stage2) by updating
"database.hostname": "example.com",
"database.port": "5432",
"database.dbname": "my_db",
"database.user": "xxx",
"database.password": "xxx",
However, both were using the same configuration properties in the Kafka Connect worker I deployed at the very beginning:
config.storage.topic
offset.storage.topic
status.storage.topic
Since the connectors with different DB configs shared the same Kafka configuration properties above, and the database table schemas are the same, things became a mess because they shared the same Kafka offsets.
One simple way to fix this is, when deploying Kafka connectors to test against different DBs, to use different names such as postgres-kafkaconnector-stage1 and postgres-kafkaconnector-stage2 to avoid the Kafka topic offset mess.

Connecting a Kafka Connect source directly to a Kafka Connect sink

I want to sync two tables from two distinct database services with Kafka Connect Source and Kafka Connect Sink.
The Kafka Connect source reads data from the source database and publishes changes to a topic named TOP1; the Kafka Connect sink subscribes to TOP1 and should write the changes to the destination database.
The source and destination databases are MSSQL, and I use the Debezium connector for SQL Server.
I created the Kafka Connect source with the following configuration:
{
  "name": "sql-source",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "tasks.max": "1",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    "database.server.name": "TEST",
    "database.dbname": "test_source",
    "database.hostname": "172.1x.xx.xx",
    "database.port": "1433",
    "database.user": "sa",
    "database.password": "xxxxx",
    "database.instance": "MSSQLSERVER",
    "database.history.kafka.bootstrap.servers": "kafka01.xxxx.dev:9092",
    "database.history.kafka.topic": "schema-changes.inventory",
    "value.converter.schema.registry.url": "http://kafka01.xxxx.dev:8081"
  }
}
This works great and publishes any change (insert, update, delete) with the following schema:
{
  "before": {...},
  "after": {...},
  "source": {...}
}
But how should I create the Kafka Connect sink configuration so that the destination database ends up with exactly the same data as the source database?
When a record is inserted in the source, the same record should be inserted in the destination; when a record is deleted in the source, the same record should be deleted from the destination; and an update in the source should result in an update in the destination.
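A common way to get that behavior is to pair the Debezium source with the Confluent JDBC Sink connector and Debezium's ExtractNewRecordState SMT; a hedged sketch (the topic name, connection URL, and primary-key field below are placeholders, not taken from the question):

curl -X POST -H "Content-Type: application/json" localhost:8083/connectors -d '{
  "name": "sql-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "TEST.dbo.my_table",
    "connection.url": "jdbc:sqlserver://<destination-host>:1433;databaseName=test_destination",
    "connection.user": "sa",
    "connection.password": "xxxxx",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://kafka01.xxxx.dev:8081",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "insert.mode": "upsert",
    "delete.enabled": "true",
    "pk.mode": "record_key",
    "pk.fields": "id",
    "auto.create": "true"
  }
}'

With upsert mode, inserts and updates both become upserts keyed on the record key, and delete.enabled turns the tombstones kept by the unwrap transform into deletes in the destination table.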

Kafka Connect: streaming changes from Postgres to topics using debezium

I'm pretty new to the Kafka and Kafka Connect world. I am trying to implement CDC using Kafka (on MSK), Kafka Connect (using the Debezium connector for PostgreSQL) and an RDS Postgres instance. Kafka Connect runs in a K8s pod in our cluster deployed in AWS.
Before diving into the details of the configuration used, I'll try to summarise the problem:
Once the connector starts, it sends messages to the topic as expected (snapshot)
Once we make any change to a table (Create, Update, Delete), no messages are sent to the topic. We would expect to see messages about the changes made to the table.
My connector config looks like:
{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "database.user": "root",
  "database.dbname": "insights",
  "slot.name": "cdc_organization",
  "tasks.max": "1",
  "column.blacklist": "password, access_key, reset_token",
  "database.server.name": "insights",
  "database.port": "5432",
  "plugin.name": "wal2json_rds_streaming",
  "schema.whitelist": "public",
  "table.whitelist": "public.kafka_connect_cdc_test",
  "key.converter.schemas.enable": "false",
  "database.hostname": "de-test-sre-12373.cbplqnioxomr.eu-west-1.rds.amazonaws.com",
  "database.password": "MYSECRETPWD",
  "value.converter.schemas.enable": "false",
  "name": "source-postgres",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "snapshot.mode": "initial"
}
We have tried different values for the plugin.name property: wal2json, wal2json_streaming and wal2json_rds_streaming.
There's no connection problem between the connector and the DB, as we already saw messages flowing through as soon as the connector started.
Is there a configuration issue with the connector described above that prevents us from seeing messages for new changes in the topic?
Thanks
Your connector config looks a bit confusing. I'm pretty new to Kafka as well, so I don't really know the issue, but this is the connector config that works for me.
{
  "name": "<connector_name>",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.server.name": "<server>",
    "database.port": "5432",
    "database.hostname": "<host>",
    "database.user": "<user>",
    "database.dbname": "<dbname>",
    "tasks.max": "1",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "<kafka_topic_name>",
    "plugin.name": "pgoutput",
    "include.schema.changes": "true"
  }
}
If this configuration doesn't work either, try looking through the console log; sometimes the relevant error isn't the last thing written to the console.
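The worker's REST API is often easier to read than the raw console output; a small sketch using the connector name from the question and Connect's default port:

# a failed task includes a "trace" field with the full stack trace
curl -s http://localhost:8083/connectors/source-postgres/status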

SMTs to create a Kafka connector string partition key through the connector config

I've been implementing a Kafka connector for PostgreSQL (I'm using the Debezium Kafka connector and running all the pieces through Docker). I need a custom partition key, so I've been using an SMT to achieve this. However, the approach I'm using creates a Struct, and I need it to be a string. This article runs through how to set up the partition key as an int, but I can't access the config file to set up the appropriate transforms. Currently my Kafka connector looks like this:
curl -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
  "name": "connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "password",
    "database.dbname": "postgres",
    "database.server.name": "postgres",
    "table.include.list": "public.table",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.table",
    "transforms": "routeRecords,unwrap,createkey",
    "transforms.routeRecords.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.routeRecords.regex": "(.*)",
    "transforms.routeRecords.replacement": "table",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.createkey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createkey.fields": "id",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter"
  }
}'
I know that I have to extract the value of the column but I'm just not sure how.
ValueToKey creates a Struct from a list of fields, as documented.
You need one more transform to extract a specific field from a Struct, as shown in the linked post.
org.apache.kafka.connect.transforms.ExtractField$Key
Note: this does not "set" the partition of the actual Kafka record, only the key, which is then hashed by the producer to determine the partition.
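For illustration, a sketch of how the transform chain from the question could look with that extra step added (the property names are standard Connect SMT config, and id is the field already used in the question):

"transforms": "routeRecords,unwrap,createkey,extractkey",
"transforms.createkey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createkey.fields": "id",
"transforms.extractkey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractkey.field": "id",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"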