Debezium Embedded Engine with AWS Kinesis - PostgreSQL snapshot load and Transaction metadata stream

I'd like to use the Debezium Embedded Engine with AWS Kinesis in order to load an initial snapshot of a PostgreSQL database and then continuously perform CDC.
I know that with Kafka Connect I'll have the transaction metadata topic out of the box in order to check transaction boundaries.
How about the same setup but with the Debezium Embedded Engine and AWS Kinesis (https://debezium.io/blog/2018/08/30/streaming-mysql-data-changes-into-kinesis/)? Will I have a Kinesis transaction metadata stream in this case? Also, will the Debezium Embedded Engine perform an initial snapshot of the existing PostgreSQL data?
UPDATED
I implemented a test EmbeddedEngine application with PostgreSQL:
engine = EmbeddedEngine.create()
        .using(config)
        .using(this.getClass().getClassLoader())
        .using(Clock.SYSTEM)
        .notifying(this::sendRecord)
        .build();
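For context, the config referenced above is roughly along these lines (values are placeholders; the database.server.name of kinesis simply matches the stream prefixes shown below):

import io.debezium.config.Configuration;

// Illustrative configuration only; adjust connection details to your environment
Configuration config = Configuration.create()
        .with("name", "postgres-embedded-engine")
        .with("connector.class", "io.debezium.connector.postgresql.PostgresConnector")
        .with("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore")
        .with("offset.storage.file.filename", "/tmp/offsets.dat")
        .with("database.hostname", "localhost")
        .with("database.port", "5432")
        .with("database.user", "postgres")
        .with("database.password", "postgres")
        .with("database.dbname", "mydb")
        .with("database.server.name", "kinesis")
        .build();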
Right now, inside my 'sendRecord(SourceRecord record)' method I can see the correct topics for each database table that participates in the transaction, for example:
private void sendRecord(SourceRecord record) {
    String streamName = streamNameMapper(record.topic());
    System.out.println("streamName: " + streamName);
which results in the following output:
streamName: kinesis.public.user_states
streamName: kinesis.public.tasks
within the same txId=1510
but I still can't see a transaction metadata stream.
How do I correctly get the transaction metadata stream with the Debezium EmbeddedEngine?

If you are not set on using just the Debezium Embedded Engine, there is an option provided by Debezium itself called Debezium Server (internally, I believe it makes use of the Debezium Engine).
It is a good alternative to using Kafka, and as of now it supports Kinesis, Google Pub/Sub, and Apache Pulsar as sinks for CDC.
Here is an article that you can refer to:
https://xyzcoder.github.io/2021/02/19/cdc-using-debezium-server-mysql-kinesis.html
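For orientation, a Debezium Server setup for a PostgreSQL source and a Kinesis sink boils down to an application.properties roughly like the one below (all values are placeholders; check the Debezium Server documentation for the authoritative property names):

# Illustrative configuration only
debezium.sink.type=kinesis
debezium.sink.kinesis.region=eu-central-1
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.offset.flush.interval.ms=0
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=postgres
debezium.source.database.password=postgres
debezium.source.database.dbname=mydb
debezium.source.database.server.name=tutorial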

Related

Stream both schema and data changes from MySQL to MySQL using Kafka Connect

How can we stream schema and data changes, along with some kind of transformations, into another MySQL instance using a Kafka Connect source connector?
Is there also a way to propagate schema changes if I use Kafka's Python library (confluent_kafka) to consume and transform messages before loading them into the target DB?
You can use Debezium to stream MySQL binlogs into Kafka. Debezium is built upon the Kafka Connect framework.
From there, you can use whatever client you want, including Python, to consume and transform the data.
If you want to write to MySQL, you can use the Kafka Connect JDBC sink connector.
Here is an old post on this topic - https://debezium.io/blog/2017/09/25/streaming-to-another-database/
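As a rough sketch of that last step, the JDBC sink connector configuration could look something like this (the connector name, connection URL, and topic are hypothetical; Debezium's ExtractNewRecordState transform is one way to flatten the change events before writing):

# Illustrative configuration only
name=mysql-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:mysql://target-host:3306/targetdb?user=app&password=secret
topics=dbserver1.inventory.customers
auto.create=true
insert.mode=upsert
pk.mode=record_key
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState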

Can we implement CRUD spring boot and postgresql with kafka

Can Kafka be used with Spring Boot + PostgreSQL? Is there an additional process involved? For example, if we insert data into the database, how do we use Kafka? Then, when reading data from the database, should I put it into Kafka first so that the GET API only reads from Kafka and not from the database? Is this a correct way to use Kafka?
And when the user calls the save API, do I then have to read from the database at the same time so that the data in Kafka is also updated?
Kafka consumption would be separate from any database action.
You can use the Kafka Connect framework, separately from any Spring app, to move data between Postgres and Kafka using a combination of Debezium and/or Confluent's JDBC connector.
Actions from the Spring app can use KafkaTemplate or JdbcTemplate, depending on your needs.
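For illustration, a minimal sketch of that last point, assuming a hypothetical users table and user-events topic:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    private final JdbcTemplate jdbcTemplate;
    private final KafkaTemplate<String, String> kafkaTemplate;

    public UserService(JdbcTemplate jdbcTemplate, KafkaTemplate<String, String> kafkaTemplate) {
        this.jdbcTemplate = jdbcTemplate;
        this.kafkaTemplate = kafkaTemplate;
    }

    public void saveUser(String id, String name) {
        // Write to Postgres as the system of record
        jdbcTemplate.update("INSERT INTO users (id, name) VALUES (?, ?)", id, name);
        // Publish an event so other consumers see the change; the topic name is illustrative
        kafkaTemplate.send("user-events", id, name);
    }
}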

Kafka use case to replace control M jobs

I have a few Control-M jobs running in production:
First job - load CSV file records and insert them into a database staging table
Second job - perform data enrichment for each record in the staging table
Third job - read data from the staging table and insert it into another table
Currently we use Apache Camel to do this.
We have bought a Confluent Kafka license, so we want to use Kafka.
Design proposal
Create a CSV Kafka source connector to read data from the CSV file and write it to a Kafka input topic
Create a Spring Cloud Stream Kafka binder application to read data from the input topic, enrich the data, and push it to an output topic
Have a Kafka sink connector push data from the output topic to the database
The problem is that in step two we need a database connection to enrich the data, and a YouTube video I watched said that a Spring Cloud Stream Kafka binder application should not have a database connection. So how should I design my flow? Which Spring technology should I use?
There's nothing preventing you from having a database connection, but if you read the database table into a Kafka stream/table then you can join and enrich the data using Kafka Streams joins rather than making remote database calls.
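A rough sketch of that pattern with plain Kafka Streams (topic names and the join logic are made up; the same join can be expressed through the Spring Cloud Stream Kafka Streams binder):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

public class EnrichmentTopology {

    // Topic names are illustrative
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Reference data, e.g. published from the lookup table by a CDC or JDBC source connector
        GlobalKTable<String, String> referenceData = builder.globalTable("reference-data-topic");

        // Records coming from the CSV source connector
        KStream<String, String> input = builder.stream("input-topic");

        // Enrich each record by joining against the table instead of making a remote database call
        input.join(referenceData,
                   (key, value) -> key,                          // derive the lookup key for the reference table
                   (value, refValue) -> value + "|" + refValue)  // combine the record with its reference row
             .to("output-topic");

        return builder.build();
    }
}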

AWS: What is the right way of PostgreSQL integration with Kinesis?

The aim that I want to achieve:
is to be notified about DB data updates; for this reason, I want to build the following chain: PostgreSQL -> Kinesis -> Lambda.
But I am not sure how to notify Kinesis properly about DB changes.
I saw a few examples where people try to use PostgreSQL triggers to send data to Kinesis,
and some people use the wal2json approach.
So I have some doubts about which option to choose; that is why I am looking for advice.
You can leverage Debezium to do this.
Debezium connectors can also be integrated within your code using the Debezium Engine, and you can add transformation or filtering logic (if you need it) before pushing the changes out to Kinesis.
Here's a link that explains the Debezium Postgres connector.
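A rough sketch of that embedded approach, assuming the DebeziumEngine API with JSON change events and the AWS SDK v2 Kinesis client (connection details and stream names are placeholders, and the target streams are assumed to already exist):

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PostgresToKinesis {

    public static void main(String[] args) {
        // Connector configuration; values are placeholders
        Properties props = new Properties();
        props.setProperty("name", "pg-to-kinesis");
        props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "5432");
        props.setProperty("database.user", "postgres");
        props.setProperty("database.password", "postgres");
        props.setProperty("database.dbname", "mydb");
        props.setProperty("database.server.name", "pg");

        KinesisClient kinesis = KinesisClient.create();

        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(record -> {
                    // Transformation or filtering logic could go here before forwarding the event
                    if (record.value() == null) {
                        return;
                    }
                    kinesis.putRecord(PutRecordRequest.builder()
                            .streamName(record.destination())   // e.g. "pg.public.users"
                            .partitionKey(record.key() == null ? "null" : record.key())
                            .data(SdkBytes.fromUtf8String(record.value()))
                            .build());
                })
                .build();

        // The engine runs until it is stopped or the JVM shuts down
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine);
    }
}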
Debezium Server (internally, I believe it makes use of the Debezium Engine) is another option.
It supports Kinesis, Google Pub/Sub, and Apache Pulsar as of now for CDC from the databases that Debezium supports.
Here is an article that you can refer to for step-by-step configuration of Debezium Server:
https://xyzcoder.github.io/2021/02/19/cdc-using-debezium-server-mysql-kinesis.html

How do I read a Table In Postgresql Using Flink

I want to do some analytics using Flink on the data in PostgreSQL. How and where should I provide the host, port, username, and password? I was trying with the table source as mentioned in this link: https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/common.html#register-tables-in-the-catalog.
final static ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
final static TableSource csvSource = new CsvTableSource("localhost", port);
Actually, I am unable to even get started. I went through all the documentation but could not find a detailed explanation of this.
The tables and catalog referred to in the link you've shared are part of Flink's SQL support, wherein you can use SQL to express computations (queries) to be performed on data ingested into Flink. This is not about connecting Flink to a database, but rather about having Flink behave somewhat like a database.
To the best of my knowledge, there is no Postgres source connector for Flink. There is a JDBC table sink, but it only supports append mode (via INSERTs).
The CSVTableSource is for reading data from CSV files, which can then be processed by Flink.
If you want to operate on your data in batches, one approach you could take would be to export the data from Postgres to CSV, and then use a CSVTableSource to load it into Flink. On the other hand, if you wish to establish a streaming connection, you could connect Postgres to Kafka and then use one of Flink's Kafka connectors.
Reading a Postgres instance directly isn't supported as far as I know. However, you can get realtime streaming of Postgres changes by using a Kafka server and a Debezium instance that replicates from Postgres to Kafka.
Debezium connects using the native Postgres replication mechanism on the DB side and emits all record inserts, updates or deletes as a message on the Kafka side. You can then use the Kafka topic(s) as your input in Flink.
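For example, the Flink side of that Kafka-based approach could look roughly like this (topic, group id, and bootstrap servers are placeholders; the exact consumer class and packages depend on your Flink and Kafka versions):

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

import java.util.Properties;

public class PostgresChangesViaKafka {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-postgres-cdc");

        // Topic name follows Debezium's <server>.<schema>.<table> convention; adjust to your setup
        DataStream<String> changes = env.addSource(
                new FlinkKafkaConsumer011<>("pg.public.my_table", new SimpleStringSchema(), props));

        // Each element is a Debezium change event in JSON; parse and process it as needed
        changes.print();

        env.execute("Read Postgres changes from Kafka");
    }
}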