How to read collection data from MongoDB and publish it into a Kafka topic periodically in Spring Boot

I need to read my MongoDB collection data periodically and publish it to a Kafka topic using Spring Boot. I have created a collection in MongoDB and inserted a few records. Now I want to read that data from MongoDB periodically and publish it to a Kafka topic using Spring Boot. I'm very new to the Spring Batch scheduler. Can you suggest an approach to achieve this?
Thanks in advance.

What you are talking about is more relevant to Spring Integration: https://spring.io/projects/spring-integration#overview
So, you configure a MongoDbMessageSource with a Poller to read the collection periodically.
And then you have a service activator based on the KafkaProducerMessageHandler to dump the data into a Kafka topic.
See more in the docs:
https://docs.spring.io/spring-integration/docs/5.3.2.RELEASE/reference/html/mongodb.html#mongodb
https://docs.spring.io/spring-integration/docs/5.4.0-M3/reference/html/kafka.html#kafka
Not sure though how to do that with Spring Batch...
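For illustration only, a minimal sketch of that wiring with annotation-based configuration might look roughly like this (the "orders" collection, "orders-topic" topic, and 30-second poll interval are placeholders, and the "{}" query simply matches every document):

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.data.mongodb.core.MongoTemplate;
    import org.springframework.expression.common.LiteralExpression;
    import org.springframework.integration.annotation.InboundChannelAdapter;
    import org.springframework.integration.annotation.Poller;
    import org.springframework.integration.annotation.ServiceActivator;
    import org.springframework.integration.core.MessageSource;
    import org.springframework.integration.kafka.outbound.KafkaProducerMessageHandler;
    import org.springframework.integration.mongodb.inbound.MongoDbMessageSource;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.messaging.MessageHandler;

    @Configuration
    public class MongoToKafkaConfig {

        // Poll the "orders" collection every 30 seconds; the payload is the list of matching documents.
        @Bean
        @InboundChannelAdapter(channel = "fromMongo", poller = @Poller(fixedDelay = "30000"))
        public MessageSource<Object> mongoSource(MongoTemplate mongoTemplate) {
            MongoDbMessageSource source =
                    new MongoDbMessageSource(mongoTemplate, new LiteralExpression("{}"));
            source.setCollectionNameExpression(new LiteralExpression("orders"));
            return source;
        }

        // Publish whatever arrives on "fromMongo" to the "orders-topic" Kafka topic.
        @Bean
        @ServiceActivator(inputChannel = "fromMongo")
        public MessageHandler kafkaHandler(KafkaTemplate<String, Object> kafkaTemplate) {
            KafkaProducerMessageHandler<String, Object> handler =
                    new KafkaProducerMessageHandler<>(kafkaTemplate);
            handler.setTopicExpression(new LiteralExpression("orders-topic"));
            return handler;
        }
    }

Depending on how you want the records keyed and serialized, you may add a splitter/transformer between the two endpoints so that each document becomes its own Kafka record.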

Related

Can we implement CRUD with Spring Boot and PostgreSQL together with Kafka?

Can Kafka be used with Spring Boot + PostgreSQL? Is there an additional process involved? For example, if we insert data into the database, how should we use Kafka? And when reading data, should I put it into Kafka first so that the GET API reads from Kafka instead of the database? Is that the correct way to use Kafka?
And when the user calls the save API, do I then have to read it back from the database at the same time so that the data in Kafka is also updated?
Kafka consumption would be separate from any database action
You can use the Kafka Connect framework, separately from any Spring app, to move data between Postgres and Kafka using Debezium and/or Confluent's JDBC connector.
Actions from the Spring app can use KafkaTemplate or JdbcTemplate depending on your needs.
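As a rough illustration of that last point (the "orders" table, "orders" topic, and column names here are made up), a Spring service might write through JdbcTemplate and publish the same record through KafkaTemplate:

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Service;

    @Service
    public class OrderService {

        private final JdbcTemplate jdbcTemplate;
        private final KafkaTemplate<String, String> kafkaTemplate;

        public OrderService(JdbcTemplate jdbcTemplate, KafkaTemplate<String, String> kafkaTemplate) {
            this.jdbcTemplate = jdbcTemplate;
            this.kafkaTemplate = kafkaTemplate;
        }

        public void saveOrder(String id, String payload) {
            // write to the Postgres table
            jdbcTemplate.update("INSERT INTO orders (id, payload) VALUES (?, ?)", id, payload);
            // publish the same event to Kafka; reads (GET APIs) can keep going straight to the database
            kafkaTemplate.send("orders", id, payload);
        }
    }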

Kafka use case to replace Control-M jobs

I have a few Control-M jobs running in production:
First job - load CSV file records and insert them into a database staging table
Second job - perform data enrichment for each record in the staging table
Third job - read data from the staging table and insert it into another table
Currently we use Apache Camel to do this.
We have bought a Confluent Kafka license, so we want to use Kafka.
Design proposal:
1. Create a CSV Kafka source connector to read data from the CSV and insert it into a Kafka input topic
2. Create a Spring Cloud Stream Kafka Streams binder application to read data from the input topic, enrich the data, and push it to an output topic
3. Use a Kafka sink connector to push data from the output topic to the database
The problem is that in step two we need a database connection to enrich the data, and a YouTube video I watched said a Spring Cloud Stream Kafka Streams binder application should not have a database connection. So how should I design my flow? What Spring technology should I use?
There's nothing preventing you from having a database connection, but if you read the database table into a Kafka stream/table then you can join and enrich data using Kafka Streams joins rather than remote database calls
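For example (a sketch only; the binding names, value types, and join logic are placeholders), with the Spring Cloud Stream Kafka Streams binder the reference data can be declared as a KTable input and joined with the incoming stream:

    import java.util.function.BiFunction;

    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class EnrichmentConfig {

        // enrich-in-0: staging records, enrich-in-1: reference data topic read as a table,
        // enrich-out-0: enriched output topic (bindings are mapped to topics in application.yml)
        @Bean
        public BiFunction<KStream<String, String>, KTable<String, String>, KStream<String, String>> enrich() {
            return (records, reference) ->
                    records.join(reference, (record, ref) -> record + "|" + ref); // combine payloads as needed
        }
    }

The reference topic itself can be kept up to date from the database with a JDBC or Debezium source connector, so the stream processor never needs to open its own database connection.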

Kafka Connector to IBM DB2

I'm currently working in a mainframe environment where we store the data in IBM DB2.
We have a new requirement to use a scalable process to migrate the data to a new messaging platform, including a new database. For that we have identified Kafka as a suitable solution, with either ksqlDB or MongoDB.
Can someone tell me, or point me to, how we can connect to IBM DB2 from Kafka to import the data and place it in either ksqlDB or MongoDB?
Any help is much appreciated.
To import the data from IBM DB2 into Kafka, you need to use a connector such as the Debezium connector for DB2.
Information about the connector and its configuration can be found here:
https://debezium.io/documentation/reference/connectors/db2.html
You can also use the JDBC source connector for the same purpose. The following link is helpful for the configuration:
https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/
[Diagram: event flow from the RDBMS into a Kafka topic]
After placing the data into Kafka, we need to transfer it to MongoDB. We can use the MongoDB sink connector to move the data from Kafka to MongoDB:
https://www.mongodb.com/blog/post/getting-started-with-the-mongodb-connector-for-apache-kafka-and-mongodb-atlas
https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
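For reference only, hedged sketches of what the two connector configurations might look like (hostnames, credentials, table and collection names are placeholders, and property names can differ between connector versions, so check the docs linked above). A Debezium DB2 source, following the 1.x docs:

    {
      "name": "db2-source",
      "config": {
        "connector.class": "io.debezium.connector.db2.Db2Connector",
        "database.hostname": "db2-host",
        "database.port": "50000",
        "database.user": "db2user",
        "database.password": "db2password",
        "database.dbname": "MYDB",
        "database.server.name": "db2server",
        "table.include.list": "MYSCHEMA.MYTABLE",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.mydb"
      }
    }

And a MongoDB sink reading the resulting topic:

    {
      "name": "mongodb-sink",
      "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "topics": "db2server.MYSCHEMA.MYTABLE",
        "connection.uri": "mongodb://mongo-host:27017",
        "database": "mydb",
        "collection": "mytable"
      }
    }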

Can we use another database like MariaDB or MongoDB for storing state in Kafka Streams instead of RocksDB, and is there a way to configure it?

I have a Spring Boot Kafka Streams application which processes all the incoming events, stores them in the state store that Kafka Streams provides internally, and queries them using the interactive query service. Internally Kafka Streams uses RocksDB for these state stores; I want to replace RocksDB with another configurable database such as MariaDB or MongoDB. Is there a way to do it? If not,
how can I configure the Kafka Streams application to use MongoDB for creating the state stores?
StateStore / KeyValueStore are open interfaces in Kafka Streams; custom implementations can be plugged in via Topology#addStateStore (TopologyBuilder in older versions) or supplied through Materialized.
Yes, you can materialize values to your own store implementation backed by a database of your choice, but it will affect processing semantics should there be any database connection issues, particularly with remote databases.
Instead, treating a topic as the log of changes and then using Kafka Connect to write it out is the usual approach for external systems.
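As a sketch of what the first option involves (MongoDbKeyValueStore-style logic is something you would have to write yourself against the KeyValueStore<Bytes, byte[]> interface; nothing like it ships with Kafka Streams):

    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.state.KeyValueBytesStoreSupplier;
    import org.apache.kafka.streams.state.KeyValueStore;

    // Supplies the custom store to the topology; all read/write/restore logic would live
    // in your own MongoDB-backed KeyValueStore implementation.
    public class MongoDbStoreSupplier implements KeyValueBytesStoreSupplier {

        private final String name;

        public MongoDbStoreSupplier(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public KeyValueStore<Bytes, byte[]> get() {
            // return your own MongoDB-backed KeyValueStore implementation here
            throw new UnsupportedOperationException("MongoDB-backed KeyValueStore not implemented");
        }

        @Override
        public String metricsScope() {
            return "mongodb";
        }
    }

    // Usage when building the topology (store name "my-store" is a placeholder):
    // builder.table("input-topic",
    //         Materialized.<String, String>as(new MongoDbStoreSupplier("my-store"))
    //                 .withKeySerde(Serdes.String())
    //                 .withValueSerde(Serdes.String()));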

Kafka Connect MongoDB Sink vs Mongo Client

I need to build an app that reads from Kafka and writes the data to MongoDB.
Most of the time the data will be written as is, but there will be cases where some processing of the data is needed.
I wonder what to do:
use the Kafka Connect MongoDB sink, or use our "old and familiar" approach of building an app with a Kafka consumer that writes the data to MongoDB using the MongoDB client (running on K8s)?
What are the advantages/disadvantages of using Kafka Connect in terms of monitoring, scaling, debugging, and pre-processing of the data?
Thanks