AWS: What is the right way to integrate PostgreSQL with Kinesis?

The aim that I want to achieve: to be notified about DB data updates. For this reason, I want to build the following chain: PostgreSQL -> Kinesis -> Lambda.
But I am not sure how to properly notify Kinesis about DB changes.
I saw a few examples where people try to use PostgreSQL triggers to send data to Kinesis, and some people use the wal2json approach.
So I have some doubts about which option to choose, which is why I am looking for advice.

You can leverage Debezium to do this.
Debezium connectors can also be integrated within your code using Debezium Engine, and you can add transformation or filtering logic (if you need it) before pushing the changes out to Kinesis.
Here's a link that explains the Debezium Postgres connector: https://debezium.io/documentation/reference/connectors/postgresql.html
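To make that concrete, here is a minimal sketch of embedding Debezium Engine and relaying each change event to Kinesis. It assumes Debezium 2.x and the AWS SDK v2; the stream name, connection settings, and file paths are placeholders, not values from the question:

```java
// Minimal sketch: embed Debezium Engine and relay each change event to Kinesis.
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class PostgresToKinesis {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("name", "pg-cdc-engine");
        props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        props.setProperty("plugin.name", "pgoutput");          // logical decoding plugin
        props.setProperty("topic.prefix", "pg");               // "database.server.name" on Debezium 1.x
        props.setProperty("database.hostname", "localhost");   // placeholder
        props.setProperty("database.port", "5432");
        props.setProperty("database.user", "postgres");        // placeholder
        props.setProperty("database.password", "secret");      // placeholder
        props.setProperty("database.dbname", "appdb");         // placeholder
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");

        KinesisClient kinesis = KinesisClient.create();

        // The engine calls this handler for every captured change; filtering or
        // transformation logic would go here, before the putRecord call.
        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(event -> {
                    if (event.value() == null) {
                        return; // skip tombstone events
                    }
                    kinesis.putRecord(PutRecordRequest.builder()
                            .streamName("postgres-cdc")        // placeholder stream name
                            .partitionKey(event.destination()) // e.g. "pg.public.orders"
                            .data(SdkBytes.fromUtf8String(event.value()))
                            .build());
                })
                .build();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine); // runs until engine.close() is called
    }
}
```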

Debezium Server (which internally, I believe, makes use of Debezium Engine) is another option.
It supports Kinesis, Google Pub/Sub, and Apache Pulsar as of now as sinks for CDC from the databases that Debezium supports.
Here is an article that you can refer to for a step-by-step configuration of Debezium Server:
https://xyzcoder.github.io/2021/02/19/cdc-using-debezium-server-mysql-kinesis.html
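The article targets MySQL, but for the PostgreSQL-to-Kinesis chain in the question, a Debezium Server application.properties would look roughly like the sketch below (host, credentials, region, and schema list are placeholder values; consult the Debezium Server documentation for the full property reference):

```
# conf/application.properties -- illustrative values only
debezium.sink.type=kinesis
debezium.sink.kinesis.region=us-east-1
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.offset.flush.interval.ms=0
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=postgres
debezium.source.database.password=secret
debezium.source.database.dbname=appdb
debezium.source.topic.prefix=tutorial
debezium.source.plugin.name=pgoutput
debezium.source.schema.include.list=public
```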

Related

Custom Connector for Apache Kafka

I am looking to write a custom connector for Apache Kafka to connect to a SQL database to get CDC data. I would like to write a custom connector so I can connect to multiple databases using one connector, because all the marketplace connectors only offer one database per connector.
First question: Is it possible to connect to multiple databases using one custom connector? Also, in that custom connector, can I define which topics the data should go to?
Second question: Can I write a custom connector in .NET, or does it have to be Java? Is there an example that I can look at for a custom CDC connector for a database in .NET?
There are no .NET examples. The Kafka Connect API is Java only, and not specific to Confluent.
Source is here - https://github.com/apache/kafka/tree/trunk/connect
Dependency here - https://search.maven.org/artifact/org.apache.kafka/connect-api
looking to write a custom connector ... to connect to SQL database to get CDC data
You could extend or contribute to Debezium, if you really wanted this feature.
connect to multiple databases using one custom connector
If you mean database servers, then not really, no. Your URL would have to be unique per connector task, and there isn't an API to map a task number to a config value. If you mean one server and multiple database schemas, then I also don't think that is really possible to properly "distribute" within a single connector with multiple tasks (which is why the database.names config in Debezium currently supports only one name). The skeleton below shows where these limits live.
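To show the shape of the (Java-only) API, here is an illustrative bare-bones source connector; the class names, topic, and partition/offset fields are all made up. taskConfigs() is the only hook for dividing work among tasks, and each SourceRecord names its own destination topic, which is how topic routing is done:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class ExampleSourceConnector extends SourceConnector {
    private Map<String, String> config;

    @Override public void start(Map<String, String> props) { this.config = props; }
    @Override public Class<? extends Task> taskClass() { return ExampleSourceTask.class; }

    // Connect calls this to fan work out across tasks. Every task gets a map
    // derived from the single connector config -- there is no built-in way to
    // hand each task its own JDBC URL.
    @Override public List<Map<String, String>> taskConfigs(int maxTasks) {
        return IntStream.range(0, maxTasks)
                .mapToObj(i -> config)
                .collect(Collectors.toList());
    }

    @Override public void stop() { }
    @Override public ConfigDef config() { return new ConfigDef(); }
    @Override public String version() { return "0.1.0"; }

    public static class ExampleSourceTask extends SourceTask {
        @Override public void start(Map<String, String> props) { }

        // Each SourceRecord carries its destination topic, so one connector
        // can route different tables to different topics.
        @Override public List<SourceRecord> poll() throws InterruptedException {
            Map<String, ?> sourcePartition = Collections.singletonMap("db", "example"); // placeholder
            Map<String, ?> sourceOffset = Collections.singletonMap("pos", 0L);          // placeholder
            return Collections.singletonList(new SourceRecord(
                    sourcePartition, sourceOffset,
                    "example-topic", Schema.STRING_SCHEMA, "hello"));
        }

        @Override public void stop() { }
        @Override public String version() { return "0.1.0"; }
    }
}
```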
I explored Debezium, but it won't work for us because we have a microservices architecture with more than 1000 databases for many clients, and Debezium creates one topic for each table, which means it is going to be a massive architecture.
Kafka can handle thousands of topics fine. If you run the connector processes in Kubernetes, as an example, then they're centrally deployable, scalable, and configurable from there.
However, I still have concerns about you needing to capture CDC events from all of those databases.
Using Maxwell was also suggested previously.

How to stream data from AWS MSK (Kafka) to Snowflake using MSK Connect

I'm trying to set up an MSK connector for Snowflake, and I could hardly find any documentation on how to do it. Unfortunately, an AWS support person also referred me to the Snowflake documentation page.
By following that, I can create an EC2 instance and spin up a connector, but I wanted to go serverless and use MSK Connect.
I'm having a hard time with the connector properties for Snowflake, and AWS doesn't provide much information about them.
As answered on the plugins page below, you'd need to upload the Snowflake ZIP/JAR plugins to S3, where they'd be downloaded prior to the connector starting:
https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect-plugins.html
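Once the plugin archive is in S3, the connector configuration you give MSK Connect is the standard Snowflake sink connector property set. A rough sketch, where every value is a placeholder and the authoritative list lives in Snowflake's Kafka connector documentation:

```
# MSK Connect connector configuration -- illustrative values only
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=1
topics=my-topic
snowflake.url.name=myaccount.snowflakecomputing.com:443
snowflake.user.name=kafka_connector_user
snowflake.private.key=<private key for key-pair auth>
snowflake.database.name=MY_DB
snowflake.schema.name=MY_SCHEMA
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.snowflake.kafka.connector.records.SnowflakeJsonConverter
```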

Kafka Connector to IBM DB2

I'm currently working in a mainframe technology environment where we store the data in IBM DB2.
We got a new requirement to use a scalable process to migrate the data to a new messaging platform, including a new database. For that, we have identified Kafka as a suitable solution, with either ksqlDB or MongoDB.
Can someone tell me, or direct me on, how we can connect to IBM DB2 from Kafka to import the data and place it in either ksqlDB or MongoDB?
Any help is much appreciated.
To import the data from IBM DB2 into Kafka, you need to use a connector, such as the Debezium connector for DB2.
Information about the connector, including its configuration, can be found here:
https://debezium.io/documentation/reference/connectors/db2.html
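For orientation, a Debezium DB2 connector configuration looks roughly like the sketch below (all values are placeholders, and note that the Debezium docs require CDC to be enabled on the captured DB2 tables first):

```
# Debezium DB2 connector -- illustrative values only
connector.class=io.debezium.connector.db2.Db2Connector
database.hostname=db2-host
database.port=50000
database.user=db2user
database.password=secret
database.dbname=TESTDB
database.server.name=db2server
table.include.list=MYSCHEMA.CUSTOMERS
```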
You can also use the JDBC source connector for the same functionality. The following link is helpful for its configuration.
https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/
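A minimal JDBC source configuration pointed at DB2 might look like this sketch (the URL, credentials, table, and column names are placeholders; the blog post above explains the different modes in depth):

```
# Confluent JDBC source connector -- illustrative values only
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:db2://db2-host:50000/TESTDB
connection.user=db2user
connection.password=secret
table.whitelist=CUSTOMERS
mode=incrementing
incrementing.column.name=ID
topic.prefix=db2-
```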
(Diagram: a simple event flow from the RDBMS into a Kafka topic.)
After the data is in Kafka, we need to transfer it to MongoDB using the MongoDB connector for Apache Kafka:
https://www.mongodb.com/blog/post/getting-started-with-the-mongodb-connector-for-apache-kafka-and-mongodb-atlas
https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
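The MongoDB side is again just connector configuration; a hedged sketch with placeholder URI, database, and collection values:

```
# MongoDB sink connector -- illustrative values only
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
topics=db2-CUSTOMERS
connection.uri=mongodb+srv://user:secret@cluster0.example.mongodb.net
database=appdb
collection=customers
```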

CouchDB changes to Apache Kafka

I want to have all of the changes of a CouchDB database available in Kafka at application run time, as they arrive. Is there any reliable existing tool for that?
You may try the Kafka Connect tool. Confluent Platform also provides a long list of different connectors for Kafka Connect.
I'm not a CouchDB user, but you may choose one of the applicable source connectors on Confluent Hub or create your own Kafka CouchDB source connector.
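If you end up writing your own, CouchDB's documented /{db}/_changes endpoint is the natural source. A minimal sketch of tailing the continuous feed, which is the loop a custom source connector's poll() would wrap (the host and database name are placeholders):

```java
// Tail CouchDB's continuous _changes feed; each line is one JSON change
// document that a connector would turn into a SourceRecord.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.stream.Stream;

public class CouchDbChangesTail {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:5984/mydb/_changes"   // placeholder host/db
                        + "?feed=continuous&since=now&include_docs=true"))
                .build();
        HttpResponse<Stream<String>> response =
                client.send(request, HttpResponse.BodyHandlers.ofLines());
        response.body()
                .filter(line -> !line.isBlank()) // continuous feeds emit blank heartbeat lines
                .forEach(System.out::println);
    }
}
```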

Best way to stream/logically replicate RDS Postgres data to Kinesis

Our primary datastore is an RDS Postgres database. It would be nice if we could stream all changes that happen in Postgres to some sink, whether that's Kinesis, Elasticsearch, or any other data store.
We use Postgres 9.5, which has support for 'logical replication'. However, all the extensions that tap into this stream are blocked on RDS. There's a tutorial for streaming the MySQL RDS flavor to Kinesis; the Postgres equivalent would be ideal. Is this possible currently?
Have a look at https://github.com/disneystreaming/pg2k4j. It takes all changes made to your database and streams them to Kinesis. See the README for an example of how to set this up with RDS. We've been using it in production and have found it very useful for solving this exact problem. Disclaimer: I wrote pg2k4j.
The AWS blog post below shows how to integrate a central Amazon Relational Database Service (Amazon RDS) for PostgreSQL database with other systems by streaming its modifications into Amazon Kinesis Data Streams. An earlier post, Streaming Changes in a Database with Amazon Kinesis, described how to integrate a central RDS for MySQL database with other systems by streaming modifications through Kinesis. The PostgreSQL post takes it a step further and explains how to use an AWS Lambda function to capture the changes in Amazon RDS for PostgreSQL and stream those changes to Kinesis Data Streams:
https://aws.amazon.com/blogs/database/stream-changes-from-amazon-rds-for-postgresql-using-amazon-kinesis-data-streams-and-aws-lambda/
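The core of that blog's approach is a function that periodically drains a wal2json logical replication slot and forwards each change to Kinesis. A hedged sketch of that loop over JDBC, with the slot name, stream name, host, and credentials all placeholders:

```java
// Poll a wal2json logical replication slot and forward each change to Kinesis.
// Assumes the PostgreSQL JDBC driver and AWS SDK v2 on the classpath, and a
// slot created with: SELECT pg_create_logical_replication_slot('my_slot', 'wal2json');
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class WalToKinesis {
    public static void main(String[] args) throws Exception {
        KinesisClient kinesis = KinesisClient.create();
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://my-rds-host:5432/appdb", "replicator", "secret"); // placeholders
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT lsn, data FROM pg_logical_slot_get_changes('my_slot', NULL, NULL)")) {
            while (rs.next()) {
                kinesis.putRecord(PutRecordRequest.builder()
                        .streamName("postgres-cdc")           // placeholder stream name
                        .partitionKey(rs.getString("lsn"))    // WAL position as partition key
                        .data(SdkBytes.fromUtf8String(rs.getString("data")))
                        .build());
            }
        }
    }
}
```

Note that pg_logical_slot_get_changes consumes the changes it returns, so each poll picks up where the previous one left off.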