how to stream data from AWS MSK (kafka) to snowflake using MSK connect - apache-kafka

I'm trying to set up a MSK connector for snowflake and i could hardly see any documentation on how to do it. Unfortunately AWS support person also referred me to use snowflake documentation page.
By following this i can create an EC2 instance and spinoff connector but i wanted to go on serverless mode and use MSK connectors
I'm having hard time with connector properties for snowflake and aws doesnt provide much information about it

As answered on the plugins page, you'd need to upload the Snowflake ZIP/JAR plugins to S3, where they'd be downloaded prior to the connector starting
https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect-plugins.html

Related

Schema Registry URL for IIDR CDC Kafka subscription

I have created a cluster Amazon MSK. Also, created an EC2 instance and installed Kafka on it to create a topic in Amazon MSK. I am able to produce/consume messages on the topic using Kafka scripts.
I have also installed the IIDR Replication agent on an EC2 instance. The plan is to migrate DB2 table data into the Amazon MSK topic.
In the IDR Management console, I am able to add the IIDR replication server as the target.
Now when creating the subscription, it is asking for ZooKeeper URL and Schema Registry URL. I can get the Zookeeper endpoints from Amazon MSK.
What value to provide for the schema registry URL as there's none created?
Thanks for your help.
If you do not need to specify a schema registry because say you are using a KCOP that generate JSON, just put in a dummy value. Equally if you are specifying a list of Kafka brokers in the kafkaconsumer.propertie and the kafkaproducer.properties files in the CDC instance.conf directory you can put in dummy values for the zookeeper fields.
Hope this helps
Robert

Use Kafka connect with AWS documentDB

I'm trying to use AWS DocumentDB as a sink for storing data received from Kafka and was wondering if the MongoDB Kafka connector works with DocumentDB as its documentation mentions that it is compatible with MongoDB drivers.
https://www.mongodb.com/docs/kafka-connector/current/
https://aws.amazon.com/documentdb/
If not this connector what is the alternate way other than building a custom kafka connect?
You can use MongoDB Kafka connector with DocumentDB for source as well Sink.
Kafka Connector worker(with Mongodb Kafka connector) can be run in distributed mode using containers as well as EC2 hosts.
You can refer blog here which has step by step details
https://aws.amazon.com/blogs/database/stream-data-with-amazon-documentdb-and-amazon-msk-using-a-kafka-connector/

AWS MSK Connect with Mongo DB as the Source

We are planning to have an MSK created in AWS with Mongo DB as the Source for the MSK. We do not see any Pre-defined Connectors for connecting to the Mongo DB as source.
How to setup a connector?
Do we need to create a Connector completely or are there built in connectors available.
You can use MSK Connect to install and provision whatever connectors you want via plugins.
This includes Debezium's Mongo connector, or the source connector provided by Mongo themselves, which are "available to download" ; not sure what you mean "pre-defined".
After this, consult the official documentation for the connector of your choice for its own config properties and use HTTP requests to submit it to the cluster

AWS: What is the right way of PostgreSQL integration with Kinesis?

The aim that I want to achieve:
is to be notified about DB data updates, for this reason, I want to build the following chain: PostgreSQL -> Kinesis -> Lambda.
But I am now sure how to notify Kisesis properly about DB changes?
I saw a few examples where peoples try to use Postgresql triggers to send data to Kinesis.
some people use wal2json concept.
So I have some doubts about which option to choose, that why I am looking for advice.
You can leverage Debezium to do the same.
Debezium connectors can also be intergrated within the code, using Debezium Engine and you can add transformation or filtering logic(if you need) before pushing the changes out to Kinesis.
Here's a link that explains about Debezium Postgres Connector.
Debezium Server( Internally I believe it makes use of Debezium Engine).
It supports Kinesis, Google PubSub, Apache Pulsar as of now for CDC from Databases that Debezium Supports.
Here is an article that you can refer to for step by step configuration of Debezium Server
[https://xyzcoder.github.io/2021/02/19/cdc-using-debezium-server-mysql-kinesis.html][1]

Kafka Connect with Amazon MSK

How do I use Kafka Connect adapters with Amazon MSK?
As per the AWS documentation, it supports Kafka connect but not documented about how to setup adapters and use it.
Edit Oct 2021: MSK Connect has been launched, see https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/
AFAIK Amazon MSK does not provide managed connectors, so you have to run them yourself. This is done by running the Kafka Connect worker process (a JVM) and then providing it one or more connector configurations to run.
From the point of view of a Kafka Connect worker it just needs a Kafka cluster to connect to; it shouldn't matter whether it's MSK or on-premises, since it's ultimately 'just' a consumer/producer underneath.
You can see more, including a live demo, here: https://rmoff.dev/bbuzz19-kafka-connect
For an example of configuring Kafka Connect to use a cloud-hosted Kafka platform (in this case, Confluent Cloud), see this article.
If you are interested in managed connectors in the Cloud, check out the connectors that are provided in Confluent Cloud.
Disclaimer: I work for Confluent :)
AWS now supports MSK Connect, a new feature of MSK service based on Kafka Connect allowing you to deploy managed Kafka connectors built for Kafka connect
Check the announcement here: https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/
There are two aspects to this
Kafka Connect is a framework which should be deployed separately from kafka brokers. MSK only provides kafka brokers. If you want to use Kafka Connect with MSK you would need to use EC2 instances and deploy the kafka binaries.Kafka Connect framework is bundled along with kafka
Coming to connectors if you donot have a confluent subscription or similar - I am afraid your choices get very limited. But having said you can always write your own connectors. Writing new connectors is not that difficult rather you can apply your business specific logic and be on your way quite quickly.