Can someone show me how to connect my Kafka server to a JDBC PostgreSQL database and retrieve data from it? All the tutorials on the internet have only got me more confused!
You've not said which tutorials you've tried, or in what way you got confused…but the short answer is to use the Kafka Connect JDBC Connector.
You can find examples here and here.
Another option to explore is another Kafka Connect connector, called Debezium. This implements proper Change-Data-Capture (CDC) against Postgres.
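As a rough illustration of the JDBC route, a source connector configuration might look something like the sketch below; the connection URL, credentials, table name, and topic prefix are all placeholder values, so check the connector documentation for the options that fit your tables:
{
  "connector.class"          : "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url"           : "jdbc:postgresql://localhost:5432/mydb",
  "connection.user"          : "postgres",
  "connection.password"      : "postgres",
  "mode"                     : "incrementing",
  "incrementing.column.name" : "id",
  "table.whitelist"          : "my_table",
  "topic.prefix"             : "postgres-"
}
With mode set to incrementing, the connector polls the table and picks up new rows based on the id column; Debezium, by contrast, reads the write-ahead log, so it also captures updates and deletes.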
Related
I have some Kafka consumers and producers running through my Kafka instance on my Heroku cluster. I'm looking to create a data sink connector to connect Kafka to PostgreSQL, to put data FROM Kafka TO my Heroku PostgreSQL instance. Pretty much like the Heroku docs, but one way.
I can't figure out the steps I need to take to achieve this.
The docs say to look at the Gitlab or Confluence Ecosystem page, but I can't find any mention of Postgres in these.
Looking in the Confluent Kafka connectors library, there seems to be something from Debezium, but I'm not running Confluent.
The diagram in the Heroku docs mentions a JDBC connector. I found this Postgres JDBC driver; should I be using it?
I'm happy to create a consumer and update Postgres manually as the data comes in, if that's what's needed, but I feel that Kafka to Postgres must be a common enough interface that there should be something out there to manage this.
I'm just looking for some high level help or examples to set me on the right path.
Thanks
You're almost there :)
Bear in mind that Kafka Connect is part of Apache Kafka, and you get a variety of connectors. Some (e.g. Debezium) are community projects from Red Hat, others (e.g. JDBC Sink) are community projects from Confluent.
The JDBC Sink connector will let you stream data from Kafka to a database with a JDBC driver - such as Postgres.
Here's an example configuration:
{
"connector.class" : "io.confluent.connect.jdbc.JdbcSinkConnector",
"key.converter" : "org.apache.kafka.connect.storage.StringConverter",
"connection.url" : "jdbc:postgresql://postgres:5432/",
"connection.user" : "postgres",
"connection.password": "postgres",
"auto.create" : true,
"auto.evolve" : true,
"insert.mode" : "upsert",
"pk.mode" : "record_key",
"pk.fields" : "MESSAGE_KEY"
}
Here's a walkthrough and a couple of videos that you might find useful:
Kafka Connect in Action: JDBC Sink
ksqlDB and the Kafka Connect JDBC Sink
Do I actually need to install anything?
Kafka Connect comes with Apache Kafka. You need to install the JDBC connector.
Do I actually need to write any code?
No, just the configuration, similar to what I quoted above.
Can I just call the Connect endpoint, which comes with Kafka?
Once you've installed the connector, you run Kafka Connect (a binary that ships with Apache Kafka) and then use its REST endpoint to create the connector from the configuration.
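For example (just a sketch; the connector name, the topic, and the localhost:8083 address are assumptions based on the defaults), you wrap the configuration in a payload like this and POST it to the Connect REST API at http://localhost:8083/connectors:
{
  "name"  : "jdbc-sink-postgres",
  "config": {
    "connector.class"    : "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics"             : "my_topic",
    "key.converter"      : "org.apache.kafka.connect.storage.StringConverter",
    "connection.url"     : "jdbc:postgresql://postgres:5432/",
    "connection.user"    : "postgres",
    "connection.password": "postgres",
    "auto.create"        : true,
    "auto.evolve"        : true,
    "insert.mode"        : "upsert",
    "pk.mode"            : "record_key",
    "pk.fields"          : "MESSAGE_KEY"
  }
}
Note that a sink connector also needs a topics (or topics.regex) setting to tell it which Kafka topics to read from.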
I'm currently working with mainframe technology, where we store the data in IBM DB2.
We have a new requirement to use a scalable process to migrate the data to a new messaging platform, including a new database. For that, we have identified Kafka as a suitable solution, with either ksqlDB or MongoDB.
Can someone tell me, or point me to, how we can connect to IBM DB2 from Kafka to import the data and place it in either ksqlDB or MongoDB?
Any help is much appreciated.
To import the data from IBM DB2 into Kafka, you need to use a connector such as the Debezium connector for DB2.
Information about the connector can be found here:
https://debezium.io/documentation/reference/connectors/db2.html
Connector Configuration
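To give a feel for the shape of it, a Debezium DB2 source configuration looks roughly like the sketch below; the hostnames, credentials, and table names are placeholders, and the exact property names vary between Debezium versions, so treat the documentation for your version as authoritative:
{
  "connector.class"                           : "io.debezium.connector.db2.Db2Connector",
  "database.hostname"                         : "db2-host",
  "database.port"                             : "50000",
  "database.user"                             : "db2inst1",
  "database.password"                         : "secret",
  "database.dbname"                           : "MYDB",
  "database.server.name"                      : "db2server",
  "table.include.list"                        : "MYSCHEMA.MYTABLE",
  "database.history.kafka.bootstrap.servers"  : "kafka:9092",
  "database.history.kafka.topic"              : "schema-changes.mydb"
}
Bear in mind that the DB2 connector requires CDC to be enabled for the captured tables on the DB2 side; the documentation above covers that.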
You can also use the JDBC Source connector for the same functionality. The following link is helpful for the configuration:
https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/
A simple diagram of the event flow from an RDBMS to a Kafka topic.
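If you take the JDBC source route for a one-off migration, a DB2-flavoured sketch might look like the following; the jdbc:db2 URL, credentials, and table name are placeholders, and you'd also need the IBM DB2 JDBC driver jar available to the connector:
{
  "connector.class"     : "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url"      : "jdbc:db2://db2-host:50000/MYDB",
  "connection.user"     : "db2inst1",
  "connection.password" : "secret",
  "mode"                : "bulk",
  "table.whitelist"     : "MYSCHEMA.MYTABLE",
  "topic.prefix"        : "db2-"
}
Bulk mode simply re-reads the whole table on each poll, so for anything beyond the initial load you'd switch to incrementing or timestamp mode, or stop the connector once the migration is done.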
After placing the data into Kafka, you need to transfer it to MongoDB. Use the MongoDB sink connector to move the data from Kafka to MongoDB.
https://www.mongodb.com/blog/post/getting-started-with-the-mongodb-connector-for-apache-kafka-and-mongodb-atlas
https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
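A minimal sink configuration for the official MongoDB connector might look like this sketch; the topic, connection URI, database, and collection names are placeholders:
{
  "connector.class" : "com.mongodb.kafka.connect.MongoSinkConnector",
  "topics"          : "db2-events",
  "connection.uri"  : "mongodb://mongodb-host:27017",
  "database"        : "mydb",
  "collection"      : "mytable"
}
For the ksqlDB side there's nothing extra to move: ksqlDB reads the Kafka topics directly, so you just declare streams or tables over them.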
The aim that I want to achieve is to be notified about DB data updates; for this reason, I want to build the following chain: PostgreSQL -> Kinesis -> Lambda.
But I am not sure how to notify Kinesis properly about DB changes.
I saw a few examples where people try to use PostgreSQL triggers to send data to Kinesis,
and some people use the wal2json approach.
So I have some doubts about which option to choose; that's why I am looking for advice.
You can leverage Debezium to do the same.
Debezium connectors can also be embedded in your own code using the Debezium Engine, and you can add transformation or filtering logic (if you need it) before pushing the changes out to Kinesis.
Here's a link that explains the Debezium Postgres connector.
There is also Debezium Server (internally, I believe it makes use of the Debezium Engine).
It currently supports Kinesis, Google Pub/Sub, and Apache Pulsar as sinks, for CDC from the databases that Debezium supports.
Here is an article that you can refer to for a step-by-step configuration of Debezium Server:
https://xyzcoder.github.io/2021/02/19/cdc-using-debezium-server-mysql-kinesis.html
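As a rough sketch of the kind of thing the article walks through, Debezium Server is driven by an application.properties file along these lines; the hostnames, credentials, region, and include list are placeholders, and property names differ a little between Debezium versions:
# Sink: where the change events are sent (placeholder region)
debezium.sink.type=kinesis
debezium.sink.kinesis.region=eu-west-1
# Source: the Postgres database to capture (all values below are placeholders)
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.offset.flush.interval.ms=0
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=postgres
debezium.source.database.password=postgres
debezium.source.database.dbname=postgres
debezium.source.database.server.name=tutorial
debezium.source.schema.include.list=inventory
As I understand it, the events for each captured table go to a Kinesis stream named after the corresponding topic (the streams need to exist up front), and that's where your Lambda would hook in.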
From the following issue on the CrateDB GitHub page, it seems it is not possible, i.e., the Kafka protocol is not supported by CrateDB:
https://github.com/crate/crate/issues/7459
Is there another way to load data from Kafka into CrateDB?
Usually you'd use Kafka Connect for integrating Kafka to target (and source) systems, using the appropriate connector for the destination technology.
I can't find a Kafka Connect connector for CrateDB, but there is a JDBC sink connector for Kafka Connect, and a JDBC driver for CrateDB, so this may be worth a try.
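If you do try it, the sink configuration would look much like the Postgres one, just with CrateDB's JDBC URL; this is untested, and the URL format, user, and topic here are assumptions taken from the CrateDB JDBC driver docs:
{
  "connector.class" : "io.confluent.connect.jdbc.JdbcSinkConnector",
  "topics"          : "my_topic",
  "connection.url"  : "jdbc:crate://cratedb-host:5432/",
  "connection.user" : "crate",
  "auto.create"     : true,
  "insert.mode"     : "insert",
  "pk.mode"         : "none"
}
You'd need the CrateDB JDBC driver jar on the Connect worker's classpath alongside the JDBC sink connector; and since CrateDB also speaks the PostgreSQL wire protocol, the Postgres JDBC driver may be another avenue worth experimenting with.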
You can read more about Kafka Connect here, and see it in action in this blog series:
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
https://www.confluent.io/blog/the-simplest-useful-kafka-connect-data-pipeline-in-the-world-or-thereabouts-part-2/
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-3/
Disclaimer: I work for Confluent, and I wrote the above blog posts.
I am trying to stream changes in my Postgres database using the Kafka Connect JDBC Connector. I am running into issues on startup, as the database is quite big and the query dies every time because rows change in the meantime.
What is the best practice for starting off the JDBC Connector on really huge tables?
Assuming you can't pause the workload on the database that you're streaming the contents in from to allow the initialisation to complete, I would look at Debezium.
In fact, depending on your use case, I would look at Debezium regardless :) It lets you do true CDC against Postgres (and MySQL and MongoDB), and is a Kafka Connect plugin just like the JDBC Connector is so you retain all the benefits of that.
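For what it's worth, a Debezium Postgres source configuration is along these lines (a sketch only; the hostname, credentials, and table list are placeholders, and the available snapshot.mode values depend on your Debezium version). The initial snapshot reads the existing rows consistently and then the connector switches over to streaming changes from the WAL, which avoids the moving-target problem you're hitting with repeated queries:
{
  "connector.class"      : "io.debezium.connector.postgresql.PostgresConnector",
  "plugin.name"          : "pgoutput",
  "database.hostname"    : "postgres",
  "database.port"        : "5432",
  "database.user"        : "postgres",
  "database.password"    : "postgres",
  "database.dbname"      : "mydb",
  "database.server.name" : "mydb",
  "table.include.list"   : "public.big_table",
  "snapshot.mode"        : "initial"
}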