Kafka Connect with Amazon MSK - apache-kafka

How do I use Kafka Connect adapters with Amazon MSK?
As per the AWS documentation, it supports Kafka connect but not documented about how to setup adapters and use it.

Edit Oct 2021: MSK Connect has been launched, see https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/
AFAIK Amazon MSK does not provide managed connectors, so you have to run them yourself. This is done by running the Kafka Connect worker process (a JVM) and then providing it one or more connector configurations to run.
From the point of view of a Kafka Connect worker it just needs a Kafka cluster to connect to; it shouldn't matter whether it's MSK or on-premises, since it's ultimately 'just' a consumer/producer underneath.
You can see more, including a live demo, here: https://rmoff.dev/bbuzz19-kafka-connect
For an example of configuring Kafka Connect to use a cloud-hosted Kafka platform (in this case, Confluent Cloud), see this article.
If you are interested in managed connectors in the Cloud, check out the connectors that are provided in Confluent Cloud.
Disclaimer: I work for Confluent :)

AWS now supports MSK Connect, a new feature of MSK service based on Kafka Connect allowing you to deploy managed Kafka connectors built for Kafka connect
Check the announcement here: https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/

There are two aspects to this
Kafka Connect is a framework which should be deployed separately from kafka brokers. MSK only provides kafka brokers. If you want to use Kafka Connect with MSK you would need to use EC2 instances and deploy the kafka binaries.Kafka Connect framework is bundled along with kafka
Coming to connectors if you donot have a confluent subscription or similar - I am afraid your choices get very limited. But having said you can always write your own connectors. Writing new connectors is not that difficult rather you can apply your business specific logic and be on your way quite quickly.

Related

Can Kafka Connect consume data from a separate kerberized Kafka instance and then route to Splunk?

My pipeline is:
Kerberized Kafka --> Logstash (hosted on a different server) --> Splunk.
Can I replace the Logstash component with Kafka Connect?
Could you point me to a resource/guide where I can use kerberized Kafka as a source for my Kafka connect (which is hosted separately)?
From the documentation, what I understood is that if Kafka Connect is hosted on the same cluster as that of Kafka, that's quite possible. But I don't have that option right now, as our Kafka cluster is multi-tenant and hence not approved for additional processes on the cluster.
Kerberos keytabs aren't commonly machine/JVM specific, so yes, Kafka Connect should be able to be configured very similarly to Logstash since both are JVM processes using native Kafka protocol.
You shouldn't run Connect on the brokers anyway
If you can't add Kafka Connect to an existing Kafka cluster, you will have to spin up a separate Kafka Connect (Cluster or standalone).
I've written about it here: enter link description here

How AWS MSK and Confluent Schema Registry and Confluent Kafka connect recommended to use together?

We are planning to use AWS MSK service for Managed Kafka and Schema Registry and Kafka Connect services from Confluent together to run our connectors (Elasticsearch Sink Connector). We have planned to run Schema Registry and Connectors in EC2.
As per the Confluent team, They could not officially support Confluent Schema Registry and Kafka Connect if we use MSK for Kafka.
So, Anyone can share their experience? like
if Anybuddy has used a combination of MSK and Confluent services together in the production environment?
Is there any risk in using this kind of combination?
Is it recommended or not to use this combination?
How is Confluent community support if we will face any issue with Connectors?
Any other suggestions, comments, or alternatives?
We already have a Confluent Corporate Platform license but We want to have managed Kafka service that's why we have chosen AWS MKS as it's very cost-effective than Confluent Cloud as per our analysis?
Kindly please share your thoughts and Thanks in advance.
Thanks
Objectively answering your question this is something doable but it depends where is your major pain.
From the licensing perspective there is nothing that forces you to have a Confluent subscription just to use Kafka Connect or Schema Registry, as they are based on the Apache License 2.0 and Confluent Community License respectively.
From the technical perspective you can run both Kafka Connect and Schema Registry on EC2 and; as long they are running in the same VPC that the MSK cluster they will work flawlessly.
From the cost perspective you will have to evaluate how much it costs to have Kafka Connect and Schema Registry being managed by you and/or your team. Think not only about the install and setup phase but the manage and evolve phase as well. The software might not have any cost but the effort to operate these components can be translated into cost.
How is Confluent community support if we will face any issue with Connectors?
The Kafka community is usually very helpful whether if you ask for help in the Apache Kafka users group or the community that Confluent owns in Slack. Of course, it is all about best effort and you can't rely on them to get support. It may take several days until some good Samaritan decide to help you. Which also translates to cost: how much costs being down and/or waiting for a resolution?
I am no longer a Confluent employee and therefore I won't even try to convince you to buy from them. But you should evaluate this component of cost and check if using Confluent Cloud wouldn't provide you a more cost effective solution since it includes a managed version of Kafka, Kafka Connect, and Schema Registry. In my experience, the managed Kafka on Confluent Cloud is not that costly and the managed Schema Registry is "free", but using a managed connector can be very costly and it can be worse depending of the number of tasks that you configure in the managed connector. This is the only gotcha that you ought to watch out.
AWS MSK now supports fully managed free schema registry service that easily integrates with Kafka and other AWS services like Kinesis, Glue etc. It's much easier to get started with it.

Is there a way to dump Amazon MSK Topic to S3 directly?

I have planned to used Amazon MSK and i want to dump consumer logs to S3 . But i don't see any options. Do i need to write my own consumer or is there a way to consume Amazon MSK consumer output to s3 directly ?
Kafka Connect is generally the best (easiest/scalable/portable/resilient) way to get data between Kafka and systems down (and up) stream such as S3. Learn more about Kafka Connect here and in this talk here.
MSK Connect can run Kafka Connect workloads for your MSK on AWS.
Another option you have is to run your own Kafka Connect worker (which connects to MSK) and use the S3 sink connector (tutorial).
There is not a direct way to do it from MSK. You can use an external consumer to do it or preferably use KafkaConnect in an EC2 within the same VPC as MSK.
Either way you need to consider for high availability and data transfer costs. For HA, use consumers in different AZs. For costs, use MSK 2.4.1 that allows consumers to fetch data from the closest replica.

kafka connect mongo on kafka MSK

I am using Kafka MSK in AWS. So we don't have native kafka connect with all required connectors like on confluent.
Actually I work with kakfa mongo connector and I want to find a way to push the kafka mongo connector jar to an on an instance of kafka MSK cluster.
The path to which the jar will be pushed is the plugins.path as defined in the properties of the used connector.
ANy way to make it please ?
MSK doesn't give you a hosted Kafka Connect worker. You'd need to provision and run this yourself, e.g. on EC2. This work would then connect to your Kafka cluster (MSK in this case)
To be clear: MSK is only the hosted Kafka brokers (and Zookeeper). It does not include Kafka Connect, which is what you need in order to run connectors.

Kafka and Kafka Connect deployment environment

if I already have Kafka running on premises, is Kafka Connect just a configuration on top of my existing Kafka, or does Kafka Connect require it's own Server/Environment separate from that of my existing Kafka?
Kafka Connect is part of Apache Kafka, but it runs as a separate process, called a Kafka Connect Worker. Except in a sandbox environment, you would usually deploy it on a separate machine/node from your Kafka brokers.
This diagram shows conceptually how it runs, separate from your brokers:
You can run Kafka Connect on a single node, or as part of a cluster (for throughput and redundancy).
You can read more here about installation and configuration and architecture of Kafka Connect.
Kafka Connect is its own configuration on top of your bootstrap-server's configuration.
For Kafka Connect you can choose between a standalone server or distributed connect servers and you'll have to update the corresponding properties file to point to your currently running Kafka server(s).
Look under {kafka-root}/config and you'll see
You'll basically update connect-standalone or connect-distributed properties based on your need.