Retrieve secrets from AWS Secrets Manager in Confluent ksqlDB - apache-kafka

I am trying to create a MongoDB sink connector in Confluent Cloud (Kafka) with ksqlDB. The problem is that the data source details and credentials are stored in AWS Secrets Manager.
Is there a way to retrieve the secrets from ksqlDB in order to set the connector properties?

Kafka Connect supports externalized configuration for secrets. I am not sure whether such an implementation exists for AWS Secrets Manager; if not, you'll need to write your own ConfigProvider for it.
Alternatively, you could run ksqlDB, or just Connect itself, in MSK Connect, ECS, EC2, or EKS, and write processes that expose the Secrets Manager data as files or environment variables, which Connect's default config providers can then read. After that, either set up ksqlDB externally to point at those Connect instances, or just process the topics that Connect outputs.
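The "expose the secret as a file" approach can be sketched roughly as follows. This is a minimal illustration, not a tested deployment: the secret key `connection_uri`, the file name, and the topic `orders` are hypothetical, and the secret is hardcoded where a real process would fetch it from Secrets Manager. It assumes the Connect worker is configured with `config.providers=file` and `config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider`, so that `${file:<path>:<key>}` references are resolved at runtime. The connector config shown is partial.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_secret_properties(secret_string: str, path: Path) -> List[str]:
    """Write a JSON secret's key/value pairs to a properties file.

    Returns the keys written, sorted. Values are written verbatim, so this
    sketch assumes they contain no newlines or backslashes.
    """
    secret = json.loads(secret_string)
    keys = sorted(secret)
    path.write_text("".join(f"{k}={secret[k]}\n" for k in keys), encoding="utf-8")
    path.chmod(0o600)  # restrict the file to its owner
    return keys


def file_placeholder(path: Path, key: str) -> str:
    """Return a FileConfigProvider reference: ${file:<path>:<key>}."""
    return "${file:" + str(path) + ":" + key + "}"


def mongo_sink_config(secrets_file: Path, topic: str) -> Dict[str, str]:
    """Partial connector properties that reference the secret by placeholder."""
    return {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "topics": topic,
        # "connection_uri" is a hypothetical key name inside the secret.
        "connection.uri": file_placeholder(secrets_file, "connection_uri"),
    }


def demo() -> Dict[str, str]:
    # Assumption: in a real setup this string would be the SecretString
    # returned by Secrets Manager; it is hardcoded here to stay runnable.
    secret_string = json.dumps({"connection_uri": "mongodb://user:pass@host:27017"})
    secrets_file = Path(tempfile.mkdtemp()) / "mongo.properties"
    write_secret_properties(secret_string, secrets_file)
    return mongo_sink_config(secrets_file, "orders")


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

The connector configuration then only ever contains the placeholder, and the actual credential lives in a file that the worker reads when it starts the connector.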

Related

Schema Registry URL for IIDR CDC Kafka subscription

I have created an Amazon MSK cluster. I also created an EC2 instance and installed Kafka on it to create a topic in Amazon MSK. I am able to produce and consume messages on the topic using the Kafka scripts.
I have also installed the IIDR Replication agent on an EC2 instance. The plan is to migrate DB2 table data into the Amazon MSK topic.
In the IIDR Management Console, I am able to add the IIDR replication server as the target.
Now, when creating the subscription, it asks for a ZooKeeper URL and a Schema Registry URL. I can get the ZooKeeper endpoints from Amazon MSK.
What value should I provide for the Schema Registry URL, given that no Schema Registry has been created?
Thanks for your help.
If you do not need to specify a schema registry, for example because you are using a KCOP that generates JSON, just put in a dummy value. Equally, if you specify a list of Kafka brokers in the kafkaconsumer.properties and kafkaproducer.properties files in the CDC instance.conf directory, you can put in dummy values for the ZooKeeper fields.
Hope this helps
Robert

Confluent Cloud Kafka - Audit Log Cluster : Sink Connector

For a Kafka cluster hosted in Confluent Cloud, an Audit Log cluster gets created. It seems to be possible to hook a sink connector up to this cluster and drain the events from the "confluent-audit-log-events" topic.
However, I run into the error below when I run the connector to do this.
org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [connect-offsets]
In my connect-distributed.properties file, I have the following settings:
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
offset.storage.partitions=3
What extra permissions need to be granted so that the connector can create the required topics in the cluster? The key/secret used in the connect-distributed.properties file is a valid key/secret associated with the service account for this cluster.
Also, when I run the console consumer with the same key as above, I am able to read the audit log events just fine.
It's confirmed that this feature (hooking up a connector to the Audit Log cluster) is not supported at the moment in Confluent Cloud. This feature may be available later this year at some point.

Using Kafka Security Manager for ACL for Schema Registry

I have a Kafka cluster running with ZooKeeper, Confluent Schema Registry, and Kafka Security Manager (KSM). KSM, https://github.com/conduktor/kafka-security-manager, is software that makes it easy to manage Kafka ACLs with a CSV file instead of the command-line tool.
Confluent Schema Registry lets us store Avro schemas for Kafka. It is currently open and I need to secure it. I want to give every user the READ or GET permission only. I am currently using Kubernetes to deploy all the tools.
How can I do that with KSM? Where can I find examples?
Thank you
Kafka ACLs don't apply to the Schema Registry itself. They would apply to the underlying _schemas topic, which you'd set up in the Registry's configuration.
The API itself can be secured using TLS and HTTP authentication:
https://docs.confluent.io/platform/current/schema-registry/security/index.html
Regarding "give every user the READ or GET permission only": I don't think you can lock down HTTP-method-level access for specific users; you'll likely need a proxy for this. Also note that without POST, there's no way to register schemas.

Is there a way to dump Amazon MSK Topic to S3 directly?

I am planning to use Amazon MSK and I want to dump consumer logs to S3, but I don't see any option for this. Do I need to write my own consumer, or is there a way to send Amazon MSK consumer output to S3 directly?
Kafka Connect is generally the best (easiest/scalable/portable/resilient) way to get data between Kafka and systems down (and up) stream such as S3. Learn more about Kafka Connect here and in this talk here.
MSK Connect can run Kafka Connect workloads for your MSK on AWS.
Another option you have is to run your own Kafka Connect worker (which connects to MSK) and use the S3 sink connector (tutorial).
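For the self-managed option, a connector is typically created by posting a JSON body to the Connect worker's REST API (`POST /connectors`). The sketch below only builds that body for the Confluent S3 sink connector; the connector name, topic, bucket, and region are hypothetical placeholders, and the settings shown are a small starting subset rather than a complete or tuned configuration.

```python
import json
from typing import Dict, List


def s3_sink_request(name: str, topics: List[str], bucket: str, region: str,
                    flush_size: int = 1000) -> Dict[str, object]:
    """Build the JSON body for POST /connectors on a Connect worker."""
    if not topics:
        raise ValueError("at least one topic is required")
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "tasks.max": "1",
            "topics": ",".join(topics),
            "s3.bucket.name": bucket,
            "s3.region": region,
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            # Write records as JSON; other format classes exist.
            "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
            # Number of records written per S3 object before committing.
            "flush.size": str(flush_size),
        },
    }


if __name__ == "__main__":
    # All names below are illustrative assumptions.
    body = s3_sink_request("msk-to-s3", ["app-logs"], "my-log-bucket", "us-east-1")
    print(json.dumps(body, indent=2))
```

The worker also needs AWS credentials with write access to the bucket, which this sketch does not cover.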
There is no direct way to do it from MSK. You can use an external consumer or, preferably, run Kafka Connect on an EC2 instance within the same VPC as MSK.
Either way, you need to consider high availability and data transfer costs. For HA, run consumers in different AZs. To reduce costs, use Kafka version 2.4.1 on MSK, which allows consumers to fetch data from the closest replica.

Kafka Connect with Amazon MSK

How do I use Kafka Connect adapters with Amazon MSK?
According to the AWS documentation, it supports Kafka Connect, but there is no documentation on how to set up and use the adapters.
Edit Oct 2021: MSK Connect has been launched, see https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/
AFAIK Amazon MSK does not provide managed connectors, so you have to run them yourself. This is done by running the Kafka Connect worker process (a JVM) and then providing it one or more connector configurations to run.
From the point of view of a Kafka Connect worker it just needs a Kafka cluster to connect to; it shouldn't matter whether it's MSK or on-premises, since it's ultimately 'just' a consumer/producer underneath.
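To illustrate that point, a distributed-mode worker mostly needs the broker list plus a few internal topic names. The sketch below renders a minimal worker properties file; the broker hostnames, the group id, the derived topic names, and the choice of `SSL` as the security protocol are assumptions for illustration and should be replaced with whatever the MSK cluster actually uses.

```python
from typing import Dict, List


def worker_properties(bootstrap_servers: List[str], group_id: str,
                      security_protocol: str = "SSL") -> Dict[str, str]:
    """Minimal distributed-mode Kafka Connect worker settings."""
    props = {
        "bootstrap.servers": ",".join(bootstrap_servers),
        "group.id": group_id,
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        # Internal topics where the worker stores its own state.
        "config.storage.topic": f"{group_id}-configs",
        "offset.storage.topic": f"{group_id}-offsets",
        "status.storage.topic": f"{group_id}-status",
    }
    # The worker's embedded producers and consumers take prefixed overrides.
    for prefix in ("", "producer.", "consumer."):
        props[prefix + "security.protocol"] = security_protocol
    return props


def render(props: Dict[str, str]) -> str:
    """Render the settings in key=value properties format."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical broker addresses; use the cluster's real bootstrap string.
    brokers = ["b-1.example.internal:9094", "b-2.example.internal:9094"]
    print(render(worker_properties(brokers, "connect-cluster")), end="")
```

The rendered text can be saved as the properties file passed to the worker's start script.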
You can see more, including a live demo, here: https://rmoff.dev/bbuzz19-kafka-connect
For an example of configuring Kafka Connect to use a cloud-hosted Kafka platform (in this case, Confluent Cloud), see this article.
If you are interested in managed connectors in the Cloud, check out the connectors that are provided in Confluent Cloud.
Disclaimer: I work for Confluent :)
AWS now offers MSK Connect, a feature of the MSK service based on Kafka Connect that lets you deploy managed connectors built for Kafka Connect.
Check the announcement here: https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/
There are two aspects to this:
Kafka Connect is a framework that should be deployed separately from the Kafka brokers, and MSK only provides the brokers. If you want to use Kafka Connect with MSK, you need to use EC2 instances and deploy the Kafka binaries on them; the Kafka Connect framework is bundled with Kafka.
As for connectors, if you do not have a Confluent subscription or similar, I am afraid your choices are very limited. That said, you can always write your own connectors. Writing a new connector is not that difficult, and it lets you apply your business-specific logic and be on your way quite quickly.