Kafka use case to replace Control-M jobs - PostgreSQL

I have a few Control-M jobs running in production:
First job - load CSV file records and insert them into a database staging table
Second job - perform data enrichment for each record in the staging table
Third job - read data from the staging table and insert it into another table
Currently we use Apache Camel to do this.
We have bought a Confluent Kafka license, so we want to use Kafka.
Design proposal:
Create a CSV Kafka source connector to read data from the CSV file and write it to a Kafka input topic
Create a Spring Cloud Stream Kafka binder application to read data from the input topic, enrich it, and push it to an output topic
Use a Kafka sink connector to push data from the output topic to the database
The problem is that in step two we need a database connection to enrich the data, and a video I watched on YouTube said a Spring Cloud Stream Kafka binder application should not have a database connection. So how should I design my flow? Which Spring technology should I use?

There's nothing preventing you from having a database connection. However, if you read the database table into a Kafka stream/table, you can join and enrich the data using Kafka Streams joins rather than making remote database calls.
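A minimal Kafka Streams sketch of that join-based enrichment (topic names and the plain String value types are assumptions, not from the question; in a Spring Cloud Stream binder application this topology would typically live inside a function bean):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class EnrichmentTopology {
    // Sketch only: topic names and String serdes are placeholders.
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Reference data (e.g. a Debezium CDC feed of the lookup table),
        // materialized as a continuously updated changelog table
        KTable<String, String> lookup = builder.table("enrichment-table-topic");

        // Staging records arriving from the CSV source connector
        KStream<String, String> input = builder.stream("input-topic");

        // Enrich by joining on the record key instead of querying the
        // database once per record
        input.join(lookup, (record, ref) -> record + "|" + ref)
             .to("output-topic");

        return builder.build();
    }
}
```

With this layout the database is read once into Kafka (for example via CDC) and the enrichment step stays a pure stream-to-stream operation, which is what the answer above is suggesting.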

Related

Publish rdbms table record in kafka topic

I have a workflow where an upstream system generates data, a transformer module applies some business logic to it, and the result is stored in a table. Now the requirement is that I need to publish that result to a Kafka topic.
You can use Debezium to pull CDC logs from a number of supported databases into a Kafka topic.
Otherwise, Kafka Connect offers many plugins for different data sources, and Confluent Hub is an index where you can search for them.
Or simply make your data generator a Kafka producer instead of just a database client.
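For the Debezium route, a source connector is registered with Kafka Connect as a JSON configuration. A minimal sketch for Postgres might look like the following (connector name, host, credentials, database, and table names are all placeholders; note also that property names vary between Debezium versions, e.g. 1.x used `database.server.name` where 2.x uses `topic.prefix`):

```json
{
  "name": "inventory-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "secret",
    "database.dbname": "inventory",
    "topic.prefix": "pg",
    "table.include.list": "public.results"
  }
}
```

Posting this to the Connect REST API would produce change events for `public.results` on a topic such as `pg.public.results`, with no changes to the upstream application.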

Stream both schema and data changes from MySQL to MySQL using Kafka Connect

How can we stream schema and data changes, along with some transformations, into another MySQL instance using a Kafka Connect source connector?
Is there a way to propagate schema changes as well if I use Kafka's Python library (confluent_kafka) to consume and transform messages before loading them into the target DB?
You can use Debezium to stream MySQL binlogs into Kafka. Debezium is built on the Kafka Connect framework.
From there, you can use whatever client you want, including Python, to consume and transform the data.
If you want to write to MySQL, you can use the Kafka Connect JDBC sink connector.
Here is an older post on this topic: https://debezium.io/blog/2017/09/25/streaming-to-another-database/
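To sketch the sink side of that pipeline, a JDBC sink connector configuration for the target MySQL instance could look roughly like this (connection details and the topic name are placeholders; `auto.evolve` only propagates additive schema changes such as new columns, not drops or renames):

```json
{
  "name": "mysql-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:mysql://target-host:3306/targetdb",
    "connection.user": "app",
    "connection.password": "secret",
    "topics": "pg.public.results",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "auto.create": "true",
    "auto.evolve": "true"
  }
}
```

If transformations are done in a separate Python consumer/producer step, the sink would read from the transformed topic rather than the raw Debezium topic.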

Can we implement CRUD with Spring Boot and PostgreSQL with Kafka?

Can Kafka be used with Spring Boot + PostgreSQL? Is there an additional process? For example, if we insert data into the database, how do we use Kafka? When reading data, should I put it into Kafka first so that the GET API reads from Kafka rather than from the database? Is this the correct way to use Kafka?
And when the user calls the save API, do I then have to retrieve the data from the database at the same time so that the data in Kafka is also updated?
Kafka consumption would be separate from any database action.
You can use the Kafka Connect framework, separately from any Spring app, to move data between Postgres and Kafka using a combination of Debezium and/or Confluent's JDBC connector.
Actions from the Spring app can use KafkaTemplate or JdbcTemplate, depending on your needs.
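A rough sketch of how the Spring side might look (the class, table, and topic names here are hypothetical): the save API writes to Postgres with JdbcTemplate, and either an explicit KafkaTemplate send, as below, or a separate Debezium/Connect pipeline keeps Kafka in sync:

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    private final JdbcTemplate jdbc;
    private final KafkaTemplate<String, String> kafka;

    public OrderService(JdbcTemplate jdbc, KafkaTemplate<String, String> kafka) {
        this.jdbc = jdbc;
        this.kafka = kafka;
    }

    // Save API: the database remains the system of record; Kafka carries
    // the event for downstream consumers. Table and topic names are
    // placeholders for illustration.
    public void save(String id, String payload) {
        jdbc.update("INSERT INTO orders (id, payload) VALUES (?, ?)", id, payload);
        kafka.send("orders-topic", id, payload);
    }
}
```

Note that writing to both the database and Kafka in one method is a dual write and can diverge on partial failures; the Debezium/Connect approach mentioned above avoids that by deriving the Kafka events from the database's own change log.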

Can a single Kafka sink connector consume multiple topics into multiple tables with a standalone configuration?

I have read that a Kafka Connect source can produce multiple topics from a database (one topic per table). I have a PostgreSQL database with many tables, and one Kafka source connector is sufficient for now. But is it possible to declare only a single JDBC Kafka sink to consume all the topics into topic-based destination tables, for example all tables from PostgreSQL into a single MS SQL Server database? It would be time-consuming if I have 200 tables in one database and must create 200 sink connectors, one per table, even though I only need to declare the source once.
Yes, you can use Debezium to snapshot one database and all of its tables, send them over Kafka, and dump them into any other sink connector (including MSSQL).
How many connectors you need to run, and how many tables you create on the destination, are ultimately up to your own configuration.
Standalone mode doesn't matter here, but distributed mode is preferred anyway, even if you are only using one machine.
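As an illustration of the single-sink approach, the JDBC sink connector accepts `topics.regex`, so one connector can fan all matching topics out to tables named after each topic. A sketch (the connection URL and the `pg.public.` topic prefix are placeholders; if the destination table names should not include the prefix, a `RegexRouter` transform can rename the topics first):

```json
{
  "name": "mssql-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:sqlserver://target-host:1433;databaseName=targetdb",
    "topics.regex": "pg\\.public\\..*",
    "table.name.format": "${topic}",
    "auto.create": "true",
    "tasks.max": "4"
  }
}
```

So 200 source tables do not require 200 sink connectors; one connector with enough tasks can cover them all.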

How do I read a Table In Postgresql Using Flink

I want to do some analytics using Flink on the data in PostgreSQL. How and where should I provide the port, username, and password? I was trying with the table source mentioned in this link: https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/common.html#register-tables-in-the-catalog.
final static ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
final static TableSource csvSource = new CsvTableSource("localhost", port);
I am unable to even get started, actually. I went through all the documentation but could not find a detailed guide on this.
The tables and catalog referred to the link you've shared are part of Flink's SQL support, wherein you can use SQL to express computations (queries) to be performed on data ingested into Flink. This is not about connecting Flink to a database, but rather it's about having Flink behave somewhat like a database.
To the best of my knowledge, there is no Postgres source connector for Flink. There is a JDBC table sink, but it only supports append mode (via INSERTs).
The CSVTableSource is for reading data from CSV files, which can then be processed by Flink.
If you want to operate on your data in batches, one approach you could take would be to export the data from Postgres to CSV, and then use a CSVTableSource to load it into Flink. On the other hand, if you wish to establish a streaming connection, you could connect Postgres to Kafka and then use one of Flink's Kafka connectors.
Reading a Postgres instance directly isn't supported as far as I know. However, you can get real-time streaming of Postgres changes by using a Kafka server and a Debezium instance that replicates from Postgres into Kafka.
Debezium connects using the native Postgres replication mechanism on the DB side and emits all record inserts, updates or deletes as a message on the Kafka side. You can then use the Kafka topic(s) as your input in Flink.
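A minimal sketch of the Flink side of that pipeline, assuming the broker address and the Debezium-style topic name shown are placeholders. Newer Flink releases (1.14+) provide `KafkaSource` as below; the Flink 1.4 docs linked in the question would use the older `FlinkKafkaConsumer` instead:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PostgresChangesJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Consume the change-event topic that Debezium writes for the
        // Postgres table; broker and topic name are placeholders
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("pg.public.orders")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> changes =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "postgres-changes");

        // Replace print() with the actual analytics; events arrive as
        // JSON change records, so a real job would parse them first
        changes.print();
        env.execute("postgres-changes-job");
    }
}
```

The database credentials then live only in the Debezium connector configuration; Flink itself only needs the Kafka bootstrap servers and topic.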