Confluent Sink Connector - Sink to Multiple Database Destinations

If I need to copy the same data from a topic to multiple databases with different IPs, do I need to create multiple sink connectors, or can I somehow specify multiple destinations in connection.url or some other way?
Your help is most appreciated

I'm assuming that you're talking about Kafka Connect and the JDBC Sink connector? If so, then you need to create one sink connector per target database. One connector cannot target multiple databases.

You need to spin up individual sink connectors, since the connector class or the configuration in use varies per database. Specifying multiple destination IPs for different databases in connection.url won't work.
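For illustration, here is a minimal sketch of what that looks like with the JDBC sink: two separate connector configurations that subscribe to the same topic but each carry their own connection.url (the topic name, hosts, and credentials below are placeholders):

# sink-db1.properties - first destination database
name=jdbc-sink-db1
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=orders
connection.url=jdbc:postgresql://10.0.0.1:5432/targetdb
connection.user=writer
connection.password=secret
auto.create=true

# sink-db2.properties - second destination, same topic, different URL
name=jdbc-sink-db2
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=orders
connection.url=jdbc:postgresql://10.0.0.2:5432/targetdb
connection.user=writer
connection.password=secret
auto.create=true

Both connectors consume the topic independently, so each database receives its own full copy of the data.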

Related

How to group Kafka topics into different DBs and collections with the MongoDB sink connector depending on Kafka topic name or message key/value

As the title states, I'm using the Debezium Postgres source connector and I would like the MongoDB sink connector to group Kafka topics into different collections and databases (separate databases, to isolate unrelated data) according to their names. While researching I came across the topics.regex connector property in the Mongo docs. Unfortunately, this only creates a collection in Mongo for each Kafka topic successfully matched against the specified regex, and I'm planning on using the same MongoDB server to host many databases captured from multiple Debezium source connectors. Can you help me?
Note: I read about the Mongo sink setting FieldPathNamespaceMapper, but I'm not sure whether it would fit my needs or how to configure it correctly.
topics.regex is a general sink connector property, not unique to Mongo.
If I understand the problem correctly, collections will only get created in the configured database for Kafka topics that actually exist (i.e. match the pattern) and are consumed by the sink.
If you want collection names that don't simply mirror the topic names, you'll still need to consume those topics, but you'll have to explicitly rename them via the RegexRouter transform before records are written to Mongo.
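As a rough sketch, assuming topics named like ordersdb_customers and a made-up target database, a Mongo sink could consume everything matching the prefix and strip it with RegexRouter so the derived collection name is just the table part:

name=mongo-sink-orders
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
topics.regex=ordersdb_.*
connection.uri=mongodb://mongo1:27017
database=orders_db
# RegexRouter is a stock Kafka Connect transform; it renames the topic
# (and therefore the derived collection) before the record is written
transforms=dropPrefix
transforms.dropPrefix.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.dropPrefix.regex=ordersdb_(.*)
transforms.dropPrefix.replacement=$1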
In Kafka Connect, workers are simply containers that can run multiple connectors. For each connector, the workers generate tasks according to internal rules and your configuration. So, take a look at the MongoDB sink connector configuration properties:
https://www.mongodb.com/docs/kafka-connector/current/sink-connector/configuration-properties/all-properties/
You can create different connectors with the same connection.uri, database and collection, or with different values. So you might use the topics.regex or topics parameters to group the topics for a single connector with its own connection.uri, database and collection, and run multiple connectors at the same time. Remember that if tasks.max > 1 in your connector, messages might be read out of order. If that is not a problem, set tasks.max close to the number of MongoDB shards; the worker will adjust the number of tasks automatically.
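For example, a sketch of two such connectors on one Connect cluster, pointing at the same MongoDB server but at different databases (the topic prefixes, URI, and database names are invented):

# connector 1: service A topics go to database_a
name=mongo-sink-service-a
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
topics.regex=service_a_.*
connection.uri=mongodb://mongo-central:27017
database=database_a
tasks.max=1

# connector 2: service B topics go to database_b on the same server
name=mongo-sink-service-b
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
topics.regex=service_b_.*
connection.uri=mongodb://mongo-central:27017
database=database_b
tasks.max=3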

MongoDB Atlas Source Connector Single Topic

I am using the Confluent MongoDB Atlas Source Connector to pull data from a MongoDB collection into Kafka. I have noticed that the connector is creating multiple topics in the Kafka cluster. I need the data to be available on one topic so that the consumer application can consume the data from that topic. How can I do this?
Besides, why is the Kafka connector creating so many topics? Isn't it difficult for consumer applications to retrieve the data with that approach?
Kafka Connect creates 3 internal topics for the whole cluster to manage its own workload. You should never need/want external consumers to use these.
In addition to that, connectors can create their own topics. Debezium, for example, creates a "database history topic", and again, this shouldn't be read outside of the Connect framework.
Most connectors only need to create one topic for the source to pull data into, and that is the one consumers should actually care about.
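For reference, these are the worker-level settings that name those three internal topics in distributed mode (the topic names below are just the conventional ones used in examples):

group.id=connect-cluster
# stores connector and task configurations
config.storage.topic=connect-configs
# stores source connector offsets
offset.storage.topic=connect-offsets
# stores connector and task status
status.storage.topic=connect-status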

Custom Connector for Apache Kafka

I am looking to write a custom connector for Apache Kafka to connect to a SQL database and get CDC data. I would like to write a custom connector so I can connect to multiple databases using one connector, because all the marketplace connectors only offer one database per connector.
First question: Is it possible to connect to multiple databases using one custom connector? Also, in that custom connector, can I define which topics the data should go to?
Second question: Can I write a custom connector in .NET, or does it have to be Java? Is there an example I can look at of a custom CDC connector for a database in .NET?
There are no .NET examples. The Kafka Connect API is Java only, and not specific to Confluent.
Source is here - https://github.com/apache/kafka/tree/trunk/connect
Dependency here - https://search.maven.org/artifact/org.apache.kafka/connect-api
looking to write a custom connector ... to connect to SQL database to get CDC data
You could extend or contribute to Debezium, if you really wanted this feature.
connect to multiple databases using one custom connector
If you mean database servers, then not really, no. Your URL would have to be unique per connector task, and there isn't an API to map a task number to a config value. If you mean one server, and multiple database schemas, then I also don't think that is really possible to properly "distribute" within a single connector with multiple tasks (thus why database.names config in Debezium only currently supports one name).
We explored Debezium, but it won't work for us because we have a microservices architecture with more than 1000 databases across many clients, and Debezium creates one topic for each table, which means it would be a massive architecture.
Kafka can handle thousands of topics fine. If you run the connector processes in Kubernetes, as an example, then they're centrally deployable, scalable, and configurable from there.
However, I still have concerns over you needing all databases to capture CDC events.
It was also previously suggested to use Maxwell.

Kafka sink from multiple independent brokers

I want to aggregate changes from multiple databases into one, so I thought I would run a Debezium connector and a Kafka server/broker next to each database, and use a Kafka sink connector to consume from all of those Kafka instances and write into one database.
The question is: can I use a single instance of a Kafka sink connector to consume, at the same time, from multiple Kafka brokers which are independent (not a cluster)?
Running a Kafka broker next to each database sounds very complicated. And a single Kafka Connect worker that connects to different Kafka clusters does not seem to be supported, as far as I can see.
If you go down this path, it may make more sense to use something like Kafka MirrorMaker to copy your local topics to a single main Kafka cluster, and then use a Kafka Connect Sink to read all the copied topics from one worker and write to a central DB.
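A rough MirrorMaker 2 sketch of that topology, with invented cluster aliases and addresses: each edge cluster's topics are mirrored into one central cluster, which the sink connector then reads from.

# mm2.properties
clusters = edge1, edge2, central
edge1.bootstrap.servers = edge1-broker:9092
edge2.bootstrap.servers = edge2-broker:9092
central.bootstrap.servers = central-broker:9092

# mirror everything from each edge cluster into the central cluster
edge1->central.enabled = true
edge1->central.topics = .*
edge2->central.enabled = true
edge2->central.topics = .*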
Ultimately, running a broker next to each source database is pretty complicated. From what you described, it sounds like you have some connectivity between your different databases, but it is limited and possibly prone to disconnects. Have you considered these alternative designs:
DB Replication: Use your DB vendor's native async replication to just copy the data to a single target DB. The remote region is always read-only, and replication should not slow down your source DB (this depends on the DB, of course). Async DB replication can usually handle some network disconnections and latency.
Local Debezium: Run a process with Debezium next to each DB, and save all events to a file. Copy the files to some central server or to a cloud storage service like S3. Finally, import these files into a central DB. This would basically skip Kafka completely.
You can point the Connect property files at whatever bootstrap.servers you want.
The property itself must point at a single "cluster" (even if that is just one broker), which is determined by the brokers' zookeeper.connect property.
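In other words, each Connect worker's properties file targets exactly one cluster; to consume from another cluster you run a separate worker with its own bootstrap.servers (the addresses below are placeholders):

# worker A - talks only to cluster A
bootstrap.servers=cluster-a-broker1:9092,cluster-a-broker2:9092
group.id=connect-cluster-a

# worker B - a separate worker process for cluster B
bootstrap.servers=cluster-b-broker1:9092
group.id=connect-cluster-b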

Can a Kafka sink connector consume multiple topics into multiple tables with a standalone configuration?

I have read that a Kafka Connect source can produce multiple topics from a database (one topic representing one table). I have a PostgreSQL database with many tables, and one Kafka source connector is sufficient for now. But is it possible to declare only a single JDBC Kafka sink to consume all the topics into topic-named destination tables, for example all tables from PostgreSQL into a single MS SQL Server database? It would be very time-consuming if, with 200 tables in one database, I had to create 200 sink connectors, one per table, even though I only needed to declare the source once.
Yes, you can use Debezium to snapshot one database and all of its tables, send them over Kafka, and dump them via any other sink connector (including one for MSSQL).
How many connectors you need to run or how many tables on the destination you'll create are ultimately up to your own configurations
Standalone mode doesn't matter here, but distributed mode is preferred anyway, even if you are only using one machine.
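As a sketch, assuming the source topics share a common prefix such as pg_ and using a made-up SQL Server address, a single JDBC sink can subscribe to all of them and write each topic to its own table:

name=mssql-sink-all-tables
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
# one sink connector for every topic the Postgres source produced
topics.regex=pg_.*
connection.url=jdbc:sqlserver://mssql-host:1433;databaseName=targetdb
connection.user=writer
connection.password=secret
auto.create=true
# ${topic} is the default; shown explicitly so each topic lands in its own table
table.name.format=${topic}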