Using the Debezium MongoDB connector, I am pushing data out of MongoDB to Kafka. I have 4 collections: transactions, wallets, cards and users.
Before launching the Kafka connector, I created a topic "mongo_conn.digi.users". When I launched the connector, it created the "mongo_conn.digi.cards" and "mongo_conn.digi.transactions" topics on its own and pushed data to them. However, it did not create a topic for wallets and did not push any data to "mongo_conn.digi.users".
So I deleted "mongo_conn.digi.users" and re-launched the connector. As expected, "mongo_conn.digi.users" was recreated by the connector, but again nothing was pushed to it, and again no topic was created for the wallets collection.
I am trying to find out why the connector did not create a topic for the wallets collection, and why it is not pushing any data to the users topic.
This is my connector output:
Related
I have a workflow where an upstream system generates data and a transformer module applies some business logic to it and stores the result in a table. Now the requirement is that I need to publish that result to a Kafka topic.
You can use Debezium to pull CDC logs from a few supported databases into a Kafka topic.
Otherwise, Kafka Connect offers many plugins for different data sources, and Confluent Hub is a sort of index where you can search for those
Otherwise, simply make your data generator into a Kafka producer instead of just a database client
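As a minimal sketch of that last option, assuming the kafka-python client, a broker at localhost:9092 and a made-up topic name results-topic (none of these come from the question), the transformer could publish its result directly:
from kafka import KafkaProducer  # pip install kafka-python
import json

# Publish the transformer's result to Kafka in addition to (or instead of) the table.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                        # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # serialize results as JSON
)
producer.send("results-topic", {"record_id": 1, "result": "some business value"})
producer.flush()  # block until buffered messages are actually delivered to the broker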
As the title states, I'm using the Debezium Postgres source connector, and I would like the MongoDB sink connector to group Kafka topics into different collections and databases (separate databases, to isolate unrelated data) according to their names. While looking into this I came across the topics.regex connector property in the Mongo docs. Unfortunately, this only creates a collection in Mongo for each Kafka topic that is successfully matched against the specified regex, and I'm planning on using the same MongoDB server to host many databases captured from multiple Debezium source connectors. Can you help me?
Note: I read about the Mongo sink setting FieldPathNamespaceMapper, but I'm not sure whether it would fit my needs or how to configure it correctly.
topics.regex is a general sink connector property, not unique to Mongo.
If I understand the problem correctly, collections will obviously only get created in the configured database for Kafka topics that actually exist (i.e. match the pattern) and get consumed by the sink.
If you want collections for topics that don't match the pattern, then you'll still need to consume them, but you'll need to explicitly rename the topics via the RegexRouter transform before records are written to Mongo.
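As a rough illustration (the source-topic prefix and regex below are made up, not taken from your setup), the rename can be added to the sink connector configuration like this; since the Mongo sink by default uses the topic name as the collection name when no collection is configured, the renamed topic decides the target collection:
"transforms": "Rename",
"transforms.Rename.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.Rename.regex": "pg-server1\\.public\\.(.*)",
"transforms.Rename.replacement": "$1"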
In Kafka Connect, workers are simple containers that can run multiple connectors. For each connector, workers generate tasks according to internal rules and your configuration. So, if you take a look at the MongoDB sink connector configuration properties:
https://www.mongodb.com/docs/kafka-connector/current/sink-connector/configuration-properties/all-properties/
You can create different connectors with the same connection.uri, database and collection, or with different values. So you might use the topics.regex or topics parameters to group the topics for a single connector with its own connection.uri, database and collection, and run multiple connectors at the same time. Remember that if tasks.max > 1 in your connector, messages might be read out of order. If this is not a problem, set tasks.max close to the number of MongoDB shards; the worker will adjust the number of tasks automatically.
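For example, one such connector grouping a set of topics into its own database could look roughly like this (the connector name, regex, URI and database are placeholders, not values from your environment):
{
  "name": "mongo-sink-orders",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics.regex": "pg-server1\\.orders\\..*",
    "connection.uri": "mongodb://mongo-host-1:27017",
    "database": "orders_db",
    "tasks.max": "1"
  }
}
Leaving collection unset means each matched topic is written to a collection named after the topic; a second connector with a different topics.regex, connection.uri and database would isolate the other group of topics.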
I am ingesting data into Druid from a Kafka topic. Now I want to migrate that Kafka topic to a new Kafka cluster. What are the possible ways to do this without duplicating data and without downtime?
I have considered the following possible ways to migrate the topic to the new Kafka cluster.
Manual Migration:
Create a topic with the same configuration in the new Kafka cluster.
Stop pushing data to the old Kafka cluster.
Start pushing data to the new cluster.
Stop consuming from the old cluster.
Start consuming from the new cluster.
Produce data in both Kafka clusters:
Create a topic with the same configuration in the new Kafka cluster.
Start producing messages in both Kafka clusters.
Change the Kafka topic configuration in Druid.
Reset Kafka topic offset in Druid.
Start consuming from the new cluster.
After successful migration, stop producing in the old Kafka cluster.
Use MirrorMaker 2 (MM2):
MM2 creates the Kafka topic in the new cluster.
Start replicating data in both clusters.
Move producer and consumer to the new Kafka cluster.
The problem with this approach:
Druid manages Kafka topic's offset in its metadata.
MM2 will create two topics with the same name (one with a prefix) in the new cluster.
Does Druid support topic names with a regex?
Druid Version: 0.22.1
Old Kafka Cluster Version: 2.0
Maybe a slight modification of your number 1:
Start publishing to the new cluster.
Wait for the current supervisor to catch up all the data in the old topic.
Suspend the supervisor. This will force all the tasks to write and publish the segments. Wait for all the tasks for this supervisor to succeed. This is where "downtime" starts. All of the currently ingested data is still queryable while we switch to the new cluster. New data is being accumulated in the new cluster, but not being ingested in Druid.
All the offset information for the current datasource is stored in the metadata storage. Delete those records using:
delete from druid_dataSource where datasource={name}
Terminate the current supervisor.
Submit the new spec with the new topic and new server information.
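Typically only the ioConfig part of the spec needs to change; a rough fragment with placeholder topic and broker names, submitted to the Overlord via POST /druid/indexer/v1/supervisor, might look like this (useEarliestOffset set to true so the data already accumulated in the new cluster is read from the beginning):
"ioConfig": {
  "type": "kafka",
  "topic": "my-topic",
  "consumerProperties": {
    "bootstrap.servers": "new-broker-1:9092,new-broker-2:9092"
  },
  "useEarliestOffset": true
}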
You can follow these steps:
1- On your new cluster, create your new topic (the same name or new name, doesn't matter)
2- Change your app config to send messages to the new Kafka cluster
3- Wait till Druid has consumed all messages from the old Kafka cluster; you can confirm this by checking the supervisor's lag and offset info
4- Suspend the supervisor, and wait for its tasks to publish their segments and exit successfully
5- Edit the Druid datasource spec: make sure useEarliestOffset is set to true, and change the connection info to consume from the new Kafka cluster (and the new topic name if it isn't the same)
6- Save the spec and resume the supervisor. Druid will hit a wall when checking the offsets, because it cannot find them in the new Kafka cluster, and will then start from the beginning
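For reference, the lag/offset check in step 3 and the suspend/resume in steps 4 and 6 map onto the Overlord's supervisor API (the supervisor id below is a placeholder); there is also a reset endpoint if you prefer to clear the stored offsets explicitly rather than relying on useEarliestOffset:
# reports the supervisor state, current offsets and lag
GET  /druid/indexer/v1/supervisor/<supervisorId>/status
# step 4: tasks publish their segments and stop
POST /druid/indexer/v1/supervisor/<supervisorId>/suspend
# step 6: resume ingestion with the saved spec
POST /druid/indexer/v1/supervisor/<supervisorId>/resume
# optional: drop the offsets Druid has stored for this supervisor
POST /druid/indexer/v1/supervisor/<supervisorId>/reset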
Options 1 and 2 will have downtime and you will lose all data in the existing topic.
Option 2 cannot guarantee you won't lose data or generate duplicates as you try to send messages to multiple clusters together.
There will be no way to migrate the Druid/Kafka offset data to the new cluster without at least trying MM2. You say you can reset the offset in Option 2, so why not do the same with Option 3? I haven't used Druid, but it should be able to support consuming from multiple topics, with pattern or not. With option 3, you don't need to modify any producer code until you are satisfied with the migration process.
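For what it's worth, a minimal MirrorMaker 2 configuration for replicating just that one topic could look roughly like this (cluster aliases, broker addresses and topic name are placeholders); note that with the default replication policy the topic shows up in the new cluster prefixed with the source cluster alias, e.g. old.my-druid-topic:
# connect-mirror-maker.properties (placeholder values)
clusters = old, new
old.bootstrap.servers = old-broker-1:9092
new.bootstrap.servers = new-broker-1:9092
# replicate only from old to new, and only the one topic
old->new.enabled = true
old->new.topics = my-druid-topic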
I am using the Confluent MongoDB Atlas Source Connector to pull data from a MongoDB collection into Kafka. I have noticed that the connector is creating multiple topics in the Kafka cluster. I need the data to be available on one topic so that the consumer application can consume it from there. How can I do this?
Besides, why is the Kafka connector creating so many topics? Isn't it difficult for consumer applications to retrieve the data with that approach?
Kafka Connect creates 3 internal topics for the whole cluster for managing its own workload. You should never need/want external consumers to use these
In addition to that, connectors can create their own topics. Debezium for example creates a "database history topic", and again, this shouldn't be read outside of the Connect framework.
Most connectors only need to create one topic that the source pulls data into, and that is what consumers should actually care about.
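Those three internal topics are the ones named in the distributed worker configuration; the values below are just the common sample defaults from connect-distributed.properties and may differ in your cluster:
# stores connector and task configurations
config.storage.topic=connect-configs
# stores source connector offsets
offset.storage.topic=connect-offsets
# stores connector and task status updates
status.storage.topic=connect-status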
I have developed a Kafka sink connector (using confluent-oss-3.2.0-2.11 and the Connect framework) for my data store (Amppol ADS), which stores data from Kafka topics into corresponding tables in my store.
Everything is working as expected as long as the Kafka servers and the ADS servers are up and running.
I need help/suggestions about a specific use case where events are being ingested into Kafka topics while the underlying sink component (ADS) is down.
The expectation here is that whenever the sink servers come back up, the records that were ingested earlier into the Kafka topics should be inserted into the tables.
Kindly advise how to handle such a case.
Is there any support available in the Connect framework for this? At least some references would be a great help.
Sink connector offsets are maintained in the __consumer_offsets topic in Kafka, under a consumer group derived from your connector name, and when the sink connector restarts it will pick up messages from the previous offset it had stored in the __consumer_offsets topic.
So you don't have to worry about managing offsets; it's all done by the workers in the Connect framework. In your scenario, just restart your sink connector. If the messages were pushed to Kafka by your source connector and are still available in Kafka (i.e. within the topic's retention), the sink connector can be started/restarted at any time.
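If you want to see where the sink connector will resume from, one illustrative check (the group name connect-my-ads-sink stands in for connect-<your connector name>, and the broker address is assumed) is to describe its consumer group with the standard Kafka CLI:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group connect-my-ads-sink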