Can Kafka Connect consume data from a separate kerberized Kafka instance and then route to Splunk? - apache-kafka

My pipeline is:
Kerberized Kafka --> Logstash (hosted on a different server) --> Splunk.
Can I replace the Logstash component with Kafka Connect?
Could you point me to a resource/guide where I can use kerberized Kafka as a source for my Kafka connect (which is hosted separately)?
From the documentation, what I understood is that if Kafka Connect is hosted on the same cluster as that of Kafka, that's quite possible. But I don't have that option right now, as our Kafka cluster is multi-tenant and hence not approved for additional processes on the cluster.

Kerberos keytabs aren't commonly machine/JVM specific, so yes, Kafka Connect should be able to be configured very similarly to Logstash since both are JVM processes using native Kafka protocol.
You shouldn't run Connect on the brokers anyway

If you can't add Kafka Connect to an existing Kafka cluster, you will have to spin up a separate Kafka Connect (Cluster or standalone).
I've written about it here: enter link description here

Related

How to expand confluent cloud kafka cluster?

I have set up a confluent cloud multizone cluster and it got created with just one bootstrap server. There was no setting for choosing number of servers while creating the cluster. Even after creation, I can’t edit the number of bootstrap servers.
I want to know how to increase the number of servers in confluent cloud kafka cluster.
Under the hood, the Confluent Cloud cluster is already running multiple brokers. Depending on your cluster configuration (specifically, whether you're running Standard or Dedicated, and what region and cloud you're in), the cluster will have between six and several dozen brokers.
The way a Kafka client bootstrap server config works is that the client reaches out to the bootstrap server and requests a list of all brokers, and then uses those broker endpoints to actually produce/consume from Kafka (reference: https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-properties-bootstrap-servers.html)
In Confluent Cloud, the provided bootstrap server is actually a load balancer in front of all of the brokers; when the client connects to the bootstrap server it'll receive the actual endpoints for all of the actual brokers, and then use that for subsequent connections.
So TL;DR, in your client, you only need to specify the one bootstrap server; under the hood, the Kafka client will connect to the (many) brokers running in Confluent Cloud, and it should all just work.
Source: I work at Confluent.

How can I show different kafka to my confluent?

I install confluent and it has own kafka.
I want to change kafka from own to another?
Which .properties or whatelse file I must change to look different kafka.
thanks in advance
In your Kafka Connect worker configuration, you need to set bootstrap.servers to point to the broker(s) on your source Kafka cluster.
You can only connect to one source Kafka cluster per Kafka Connect worker. If you need to stream data from multiple Kafka clusters, you would run multiple Kafka Connect workers.
Edit If you're using Confluent CLI then the Kafka Connect worker config is taken from etc/schema-registry/connect-avro-distributed.properties.

kafka connect mongo on kafka MSK

I am using Kafka MSK in AWS. So we don't have native kafka connect with all required connectors like on confluent.
Actually I work with kakfa mongo connector and I want to find a way to push the kafka mongo connector jar to an on an instance of kafka MSK cluster.
The path to which the jar will be pushed is the plugins.path as defined in the properties of the used connector.
ANy way to make it please ?
MSK doesn't give you a hosted Kafka Connect worker. You'd need to provision and run this yourself, e.g. on EC2. This work would then connect to your Kafka cluster (MSK in this case)
To be clear: MSK is only the hosted Kafka brokers (and Zookeeper). It does not include Kafka Connect, which is what you need in order to run connectors.

Kafka Connect with Amazon MSK

How do I use Kafka Connect adapters with Amazon MSK?
As per the AWS documentation, it supports Kafka connect but not documented about how to setup adapters and use it.
Edit Oct 2021: MSK Connect has been launched, see https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/
AFAIK Amazon MSK does not provide managed connectors, so you have to run them yourself. This is done by running the Kafka Connect worker process (a JVM) and then providing it one or more connector configurations to run.
From the point of view of a Kafka Connect worker it just needs a Kafka cluster to connect to; it shouldn't matter whether it's MSK or on-premises, since it's ultimately 'just' a consumer/producer underneath.
You can see more, including a live demo, here: https://rmoff.dev/bbuzz19-kafka-connect
For an example of configuring Kafka Connect to use a cloud-hosted Kafka platform (in this case, Confluent Cloud), see this article.
If you are interested in managed connectors in the Cloud, check out the connectors that are provided in Confluent Cloud.
Disclaimer: I work for Confluent :)
AWS now supports MSK Connect, a new feature of MSK service based on Kafka Connect allowing you to deploy managed Kafka connectors built for Kafka connect
Check the announcement here: https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/
There are two aspects to this
Kafka Connect is a framework which should be deployed separately from kafka brokers. MSK only provides kafka brokers. If you want to use Kafka Connect with MSK you would need to use EC2 instances and deploy the kafka binaries.Kafka Connect framework is bundled along with kafka
Coming to connectors if you donot have a confluent subscription or similar - I am afraid your choices get very limited. But having said you can always write your own connectors. Writing new connectors is not that difficult rather you can apply your business specific logic and be on your way quite quickly.

Kafka and Kafka Connect deployment environment

if I already have Kafka running on premises, is Kafka Connect just a configuration on top of my existing Kafka, or does Kafka Connect require it's own Server/Environment separate from that of my existing Kafka?
Kafka Connect is part of Apache Kafka, but it runs as a separate process, called a Kafka Connect Worker. Except in a sandbox environment, you would usually deploy it on a separate machine/node from your Kafka brokers.
This diagram shows conceptually how it runs, separate from your brokers:
You can run Kafka Connect on a single node, or as part of a cluster (for throughput and redundancy).
You can read more here about installation and configuration and architecture of Kafka Connect.
Kafka Connect is its own configuration on top of your bootstrap-server's configuration.
For Kafka Connect you can choose between a standalone server or distributed connect servers and you'll have to update the corresponding properties file to point to your currently running Kafka server(s).
Look under {kafka-root}/config and you'll see
You'll basically update connect-standalone or connect-distributed properties based on your need.