Can Debezium be used to commit changes directly to another DB? - debezium

I am interested in whether there's a way to use Debezium to run CDC on one DB and commit every change to another DB (instead of committing to Kafka and consuming it in the middle). I have seen there's a way to write changes to a file, but I couldn't find anything about writing to a DB. In a sense, the question is whether I can combine a source-connector and a sink-connector configuration in one Debezium setup?

Yes, you can use Debezium as a CDC tool for this; please refer to my project on GitHub, where I copy data changes from MySQL to PostgreSQL, and to another project where I copy changes from an Oracle database to another Oracle database.
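As an illustration of one common wiring (not necessarily how the project mentioned above does it): run a Debezium source connector and a JDBC sink connector side by side in the same Kafka Connect cluster. A minimal sketch, assuming Kafka Connect on localhost:8083, the Confluent JDBC sink plugin installed, and placeholder hostnames and credentials:

```python
import json
from urllib import request

# Kafka Connect REST endpoint -- placeholder, adjust to your cluster.
CONNECT_URL = "http://localhost:8083/connectors"

# Debezium MySQL source: captures changes into Kafka topics named
# <database.server.name>.<database>.<table>.
source_config = {
    "name": "inventory-source",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "secret",
        "database.server.id": "184054",
        "database.server.name": "inventory",
        "table.include.list": "inventory.customers",
    },
}

# JDBC sink: reads those topics and writes rows to PostgreSQL. The unwrap
# transform flattens Debezium's change-event envelope into plain rows.
sink_config = {
    "name": "inventory-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "connection.url": "jdbc:postgresql://postgres:5432/inventory",
        "connection.user": "postgres",
        "connection.password": "secret",
        "topics": "inventory.inventory.customers",
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        "insert.mode": "upsert",
        "pk.mode": "record_key",
        "auto.create": "true",
    },
}

def register(config):
    # POST a connector config to the Kafka Connect REST API.
    req = request.Request(
        CONNECT_URL,
        data=json.dumps(config).encode(),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)
```

Kafka still sits in the middle here, but you write no consumer code yourself; the sink connector does the committing to the target DB.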

Related

Can I initiate an ad-hoc Debezium snapshot without a signaling table?

I am running a Debezium connector to PostgreSQL. The snapshot.mode I use is initial, since I don't want to resnapshot just because the connector has been restarted. However, during development I want to restart the process, as the messages expire from Kafka before they have been read.
If I delete and recreate the connector via the Kafka Connect REST API, this doesn't do anything, as the information in the offset/status/config topics is preserved. To trigger another snapshot, I have to delete and recreate those topics and restart the whole Connect cluster.
Am I missing a more convenient way of doing this?
You will need a new name for the connector as well as a new database.server.name value in the connector config, since the stored offset information is keyed by those. It should be almost like deploying a connector for the first time again.
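In practice that amounts to re-registering the connector under fresh identifiers. A sketch, assuming the Kafka Connect REST API on localhost:8083 and hypothetical connector names:

```python
import json
from urllib import request

CONNECT_URL = "http://localhost:8083/connectors"

def fresh_snapshot_config(old_config, suffix):
    # Clone a connector config under a new connector name and a new
    # database.server.name, so Kafka Connect finds no stored offsets
    # for it and Debezium runs the initial snapshot again.
    new_config = dict(old_config)
    new_config["name"] = old_config["name"] + "-" + suffix
    cfg = dict(old_config["config"])
    cfg["database.server.name"] = cfg["database.server.name"] + "-" + suffix
    new_config["config"] = cfg
    return new_config

# Hypothetical original config:
old = {
    "name": "orders-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.server.name": "orders",
        "snapshot.mode": "initial",
    },
}
new = fresh_snapshot_config(old, "v2")
# request.urlopen(request.Request(CONNECT_URL, data=json.dumps(new).encode(),
#                                 headers={"Content-Type": "application/json"}))
```

Note the trade-off: a new database.server.name means new topic names, so downstream consumers have to be repointed.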

Custom Connector for Apache Kafka

I am looking to write a custom connector for Apache Kafka to connect to SQL database to get CDC data. I would like to write a custom connector so I can connect to multiple databases using one connector because all the marketplace connectors only offer one database per connector.
First question: Is it possible to connect to multiple databases using one custom connector? Also, in that custom connector, can I define which topics the data should go to?
Second question: Can I write a custom connector in .NET, or does it have to be Java? Is there an example I can look at of a custom CDC connector for a database in .NET?
There are no .NET examples. The Kafka Connect API is Java only, and not specific to Confluent.
Source is here - https://github.com/apache/kafka/tree/trunk/connect
Dependency here - https://search.maven.org/artifact/org.apache.kafka/connect-api
looking to write a custom connector ... to connect to SQL database to get CDC data
You could extend or contribute to Debezium, if you really wanted this feature.
connect to multiple databases using one custom connector
If you mean database servers, then not really, no. Your URL would have to be unique per connector task, and there isn't an API to map a task number to a config value. If you mean one server and multiple database schemas, then I also don't think that is really possible to properly "distribute" within a single connector with multiple tasks (which is why the database.names config in Debezium currently supports only one name).
explored Debezium but it won't work for us because we have a microservices architecture with more than 1000 databases for many clients, and Debezium creates one topic for each table, which means it is going to be a massive architecture
Kafka can handle thousands of topics fine. If you run the connector processes in Kubernetes, as an example, then they're centrally deployable, scalable, and configurable from there.
However, I would still question whether you really need to capture CDC events from all of those databases.
It was also previously suggested to use Maxwell.

Debezium: Check if snapshot is complete for postgres initial_only

I'm using Debezium postgres connector v1.4.2.Final.
I'm using snapshot.mode=initial_only, where I only want to get the table(s) snapshot and not stream the incremental changes. Once the snapshot is completed, I want to stop/kill the connector. How can I find out if the snapshotting is complete and that it's safe to kill the connector?
I'm doing this so that I can add new tables to an existing connector. For that, I'm trying the following:
kill the original connector (snapshot.mode=initial)
start a new connector with snapshot.mode=initial_only for new tables
stop the new connector once snapshotting is complete
Start original connector after adding new tables to table.whitelist
Please check the JMX metrics. Verify whether the SnapshotCompleted metric (https://debezium.io/documentation/reference/1.5/connectors/postgresql.html#connectors-snaps-metric-snapshotcompleted_postgresql) suits your needs.
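If the connector's JVM exposes JMX over HTTP through a Jolokia agent (an assumption; plain JMX via jconsole works just as well), that metric can be polled programmatically. A sketch; the host, port, and server name are placeholders:

```python
import json

# MBean published by the Debezium Postgres connector's snapshot metrics;
# {server} is the connector's database.server.name.
SNAPSHOT_MBEAN = (
    "debezium.postgres:type=connector-metrics,context=snapshot,server={server}"
)

def snapshot_completed(jolokia_response: str) -> bool:
    # A Jolokia attribute read returns {"status": 200, "value": ...};
    # SnapshotCompleted flips to true once the initial snapshot finishes.
    payload = json.loads(jolokia_response)
    return payload.get("status") == 200 and payload.get("value") is True

# Poll something like (hypothetical Jolokia endpoint):
#   GET http://connect-host:8778/jolokia/read/<mbean>/SnapshotCompleted
print(snapshot_completed('{"status": 200, "value": true}'))   # -> True
print(snapshot_completed('{"status": 200, "value": false}'))  # -> False
```

Once the function returns True, it should be safe to stop the initial_only connector.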

AWS: What is the right way of PostgreSQL integration with Kinesis?

The aim that I want to achieve:
is to be notified about DB data updates, for this reason, I want to build the following chain: PostgreSQL -> Kinesis -> Lambda.
But I am not sure how to notify Kinesis properly about DB changes.
I saw a few examples where people try to use PostgreSQL triggers to send data to Kinesis.
Some people use the wal2json approach.
So I have some doubts about which option to choose; that is why I am looking for advice.
You can leverage Debezium to do the same.
Debezium connectors can also be integrated within your code using the Debezium Engine, and you can add transformation or filtering logic (if you need it) before pushing the changes out to Kinesis.
Here's a link that explains the Debezium Postgres connector.
Debezium Server (internally, I believe, it makes use of the Debezium Engine) is another option.
As of now it supports Kinesis, Google Pub/Sub, and Apache Pulsar as sinks for CDC from the databases that Debezium supports.
Here is an article that you can refer to for a step-by-step configuration of Debezium Server:
https://xyzcoder.github.io/2021/02/19/cdc-using-debezium-server-mysql-kinesis.html
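For reference, Debezium Server is configured through an application.properties file rather than Kafka Connect. A minimal sketch for a PostgreSQL source and a Kinesis sink; all hostnames, credentials, and the region are placeholders to adapt:

```properties
# Debezium Server: PostgreSQL -> Kinesis (sketch, values are placeholders)
debezium.sink.type=kinesis
debezium.sink.kinesis.region=eu-central-1
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.offset.flush.interval.ms=0
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=postgres
debezium.source.database.password=postgres
debezium.source.database.dbname=postgres
debezium.source.database.server.name=tutorial
debezium.source.schema.include.list=inventory
```

By default each captured table's events go to a Kinesis stream named after the topic (e.g. tutorial.inventory.customers), and the Lambda can then be wired to those streams.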

Debezium SQL Server Connector Kafka Initial Snapshot

According to the Debezium SQL Server connector documentation, the initial snapshot only fires on the connector's first run.
However, if I delete the connector and create a new one with the same name, the initial snapshot does not run either.
Is this by design or known Issue?
Any help appreciated
Kafka Connect stores details about connectors such as their snapshot status and ingest progress even after they've been deleted. If you recreate it with the same name it will assume it's the same connector and thus will try to continue from where the previous connector got to.
If you want a connector to start from scratch (i.e. run snapshot etc) then you need to give the connector a new name. (Technically, you could also go into Kafka Connect and muck about with the internal data to remove the data for the connector of the same name, but that's probably a bad idea)
Give your connector a new database.server.name value (or otherwise start with fresh topics). The reason the snapshot doesn't fire again is that the offsets stored under the existing name already record the snapshot as completed, so the connector resumes from that point instead of snapshotting.
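If you really do want to reset the internal state rather than rename things (the "muck about with the internal data" route mentioned above), the record to target in the Connect offsets topic can be computed like this. A sketch, under the assumption of Debezium's default offset-partition layout; verify against your own connect-offsets topic before touching anything:

```python
import json

def offsets_key(connector_name, server_name):
    # Kafka Connect keys stored source offsets as
    # ["<connector name>", <source partition>]; for Debezium the partition
    # is {"server": "<database.server.name>"} (assumption: default layout,
    # check by consuming your offsets topic with keys printed).
    return json.dumps([connector_name, {"server": server_name}])

print(offsets_key("mssql-connector", "fulfillment"))
# -> ["mssql-connector", {"server": "fulfillment"}]
```

Producing a NULL-value (tombstone) message with exactly this key to the Connect offsets topic removes the stored offset; do that only while the connector is stopped, and treat it as a last resort compared to simply renaming.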