Configuring Kafka connect Postgress Debezium CDC plugin - postgresql

I am trying to use kafka connect to read changes in postgress DB.
I have Kafka running on my local system and i want to use the Kafka connect API in standalone mode to read the postgress server DB changes.
connect-standalone.sh connect-standalone.properties dbezium.properties
i would appreciate if someone can help me with setting up configuration properties for CDC postgress debezium connector
https://www.confluent.io/connector/debezium-postgresql-cdc-connector/
I am following the below to construct the properties
https://debezium.io/docs/connectors/postgresql/#how-the-postgresql-connector-works
The name of the Kafka topics takes by default the form
serverName.schemaName.tableName, where serverName is the logical name
of the connector as specified with the database.server.name
configuration property
and here is what i have come up with for dbezium.properties
name=cdc_demo
connector.class=io.debezium.connector.postgresql.PostgresConnector
tasks.max=1
plugin.name=wal2json
slot.name=debezium
slot.drop_on_stop=false
database.hostname=localhost
database.port=5432
database.user=postgress
database.password=postgress
database.dbname=test
time.precision.mode=adaptive
database.sslmode=disable
Lets say i create a PG schema name as demo and table name as suppliers
So i need to create a topic with name as test.demo.suppliers so that this plugin can push the data to?
Also can someone suggest a docker image which has the postgress server + with suitable replication plugin such as wal2json etc? i am having hard time configuring postgress and the CDC plugin myself.

Check out the tutorial with associated Docker Compose and sample config.
The topic you've come up with sounds correct, but if you have your Kafka broker configured to auto-create topics (which is the default behaviour IIRC) then it will get created for you and you don't need to pre-create it.

Related

Schema Registry URL for IIDR CDC Kafka subscription

I have created a cluster Amazon MSK. Also, created an EC2 instance and installed Kafka on it to create a topic in Amazon MSK. I am able to produce/consume messages on the topic using Kafka scripts.
I have also installed the IIDR Replication agent on an EC2 instance. The plan is to migrate DB2 table data into the Amazon MSK topic.
In the IDR Management console, I am able to add the IIDR replication server as the target.
Now when creating the subscription, it is asking for ZooKeeper URL and Schema Registry URL. I can get the Zookeeper endpoints from Amazon MSK.
What value to provide for the schema registry URL as there's none created?
Thanks for your help.
If you do not need to specify a schema registry because say you are using a KCOP that generate JSON, just put in a dummy value. Equally if you are specifying a list of Kafka brokers in the kafkaconsumer.propertie and the kafkaproducer.properties files in the CDC instance.conf directory you can put in dummy values for the zookeeper fields.
Hope this helps
Robert

How can I increase the tasks.max for debezium sql connnector?

I tried setting the configuration for debezium MySQL connector for the property
'tasks.max=50'
But the connector in logs shows error as below:
'java.lang.IllegalArgumentException: Only a single connector task may be started'
I am using MSK Connector with debezium custom plugin and Debezium version 1.8.
It's not possible.
The database bin log must be read sequentially by only one task.
Run multiple connectors for different tables if you want to distribute workload

Is there a way of telling a sink connector in Kafka Connect how to look for schema entries

I have successfully set up Kafka Connect in distributed mode locally with the Confluent BigQuery connector. The topics are being made available to me by another party; I am simply moving these topics into my Kafka Connect on my local machine, and then to the sink connector (and thus into BigQuery).
Because of the topics being created by someone else, the schema registry is also being managed by them. So in my config, I set "schema.registry.url":https://url-to-schema-registry, but we have multiple topics which all use the same schema entry, which is located at, let's say, https://url-to-schema-registry/subjects/generic-entry-value/versions/1.
What is happening, however, is that Connect is looking for the schema entry based on the topic name. So let's say my topic is my-topic. Connect is looking for the entry at this URL: https://url-to-schema-registry/subjects/my-topic-value/versions/1. But instead, I want to use the entry located at https://url-to-schema-registry/subjects/generic-entry-value/versions/1, and I want to do so for any and all topics.
How can I make this change? I have tried looking at this doc: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#configuration-details as well as this class: https://github.com/confluentinc/schema-registry/blob/master/schema-serializer/src/main/java/io/confluent/kafka/serializers/subject/TopicRecordNameStrategy.java
but this looks to be a config parameter for the schema registry itself (which I have no control over), not the sink connector. Unless I'm not configuring something correctly.
Is there a way for me to configure my sink connector to look for a specified schema entry like generic-entry-value/versions/..., instead of the default format topic-name-value/versions/...?
The strategy is configurable at the connector level.
e.g.
value.converter.value.subject.name.strategy=...
There are only strategies built-in, however for Topic and/or RecordName lookups. You'll need to write your own class for static lookups from "generic-entry" if you otherwise cannot copy this "generic-entry-value" schema into new subjects
e.g
# get output of this to a file
curl ... https://url-to-schema-registry/subjects/generic-entry-value/versions/1/schema
# upload it again where "new-entry" is the name of the other topic
curl -XPOST -d #schema.json https://url-to-schema-registry/subjects/new-entry-value/versions

Example connector property file to provide standalone Kafka Connect

I am starting to play with CDC and Kafka connect
After countless hours trying, I have come to understand the logic
Set Kafka Connect properties (bin/connect-standalone.sh) with your cluster information
Set Kafka Connect configuration file (config/connect-standalone.properties)
Download your Kafka connector (in this case MySQL from Debizium)
Configure connector properties in whatevername.properties
In order to run a worker with Kafka Connector, you need to
./bin/connect-standalone.sh config/connect-standalone.properties
which answers:
INFO Usage: ConnectStandalone worker.properties connector1.properties [connector2.properties ...] (org.apache.kafka.connect.cli.ConnectStandalone:62)
I know we need to run:
./bin/connect-standalone.sh config/connect-standalone.properties myconfig.properties
My issue is that I cannot find any format description, or example of that myconfig.properties field.
【Extra Info】
Debizium configuration properties list:
https://docs.confluent.io/debezium-connect-mysql-source/current/mysql_source_connector_config.html#mysql-source-connector-config
https://debezium.io/documentation/reference/1.5/connectors/mysql.html
【Question】
Where can I find an example of the connector properties?
Thanks!
I'm not sure if I understood your question, but here is an example of properties for this connector :
connector.class=io.debezium.connector.mysql.MySqlConnector
connector.name=someuniquename
database.hostname=192.168.99.100
database.port=3306
database.user=debezium-user
database.password=debezium-user-pw
database.server.id=184054
database.server.name=fullfillment
database.include.list=inventory
database.history.kafka.bootstrap.servers=kafka:9092
database.history.kafka.topic=dbhistory.fullfillment
include.schema.changes=true
The original config is the one from the documentation which I converted from json to properties : https://debezium.io/documentation/reference/1.5/connectors/mysql.html#mysql-example-configuration

How to use Kafka with Neo4j community edition

I installed Neo4j and I can access the server. I can make nodes though cypher.
Now I want to use it for data streams. But I'm not sure how to do so. I just started Neo4j and I'm struggling with installing 'Stream Plugin'.
Any help is highly appreciated.
You should copy the jar files for the Neo4j streams plugin directly into your /plugins folder and configure the connections to Kafka and Zookeeper as well as other Neo4j property values at the neo4j.conf file as described here. For example:
kafka.zookeeper.connect=zookeeper-host:2181
kafka.bootstrap.servers=kafka-host:9092
Alternatively, if you are looking only for a sink connection from Kafka (i.e. moving records from Kafka topics to into Neo4j), you can also use Kafka Connect with the the supported Kafka Connect Neo4j Sink. More at https://www.confluent.io/hub/neo4j/kafka-connect-neo4j