Kafka and Kafka Connect deployment environment - apache-kafka

If I already have Kafka running on premises, is Kafka Connect just a configuration on top of my existing Kafka, or does Kafka Connect require its own server/environment separate from that of my existing Kafka?

Kafka Connect is part of Apache Kafka, but it runs as a separate process, called a Kafka Connect Worker. Except in a sandbox environment, you would usually deploy it on a separate machine/node from your Kafka brokers.
[Diagram: Kafka Connect workers running as their own processes, separate from the Kafka brokers]
You can run Kafka Connect on a single node, or as part of a cluster (for throughput and redundancy).
You can read more about the installation, configuration, and architecture of Kafka Connect in the documentation.
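For example, on a node separate from the brokers, a distributed worker is usually started with a properties file that points at your existing cluster (the script and config paths below are the defaults in the Apache Kafka download):

# run on the Connect node, not on a broker
bin/connect-distributed.sh config/connect-distributed.properties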

Kafka Connect has its own configuration, layered on top of your existing Kafka (bootstrap server) configuration.
For Kafka Connect you can choose between a standalone server or distributed Connect servers, and you'll have to update the corresponding properties file to point to your currently running Kafka server(s).
Look under {kafka-root}/config and you'll see connect-standalone.properties and connect-distributed.properties.
You'll basically update connect-standalone.properties or connect-distributed.properties based on your needs.
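As a rough sketch, the main things to change in connect-distributed.properties are the broker list and, for distributed mode, the worker group and internal topic names (the host names below are placeholders for your existing brokers):

# connect-distributed.properties (excerpt)
bootstrap.servers=broker1:9092,broker2:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status

For connect-standalone.properties you mainly need bootstrap.servers, plus an offset.storage.file.filename for storing offsets locally.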


Using Kafka Connect in distributed mode, where are internal topics supposed to exist

As a follow-up to my previous question (Attempting to run Kafka Connect in distributed mode locally, problem with internal topics), I have started to figure out what might really be going on (I'm learning Kafka as I go).
Kafka Connect, one way or another, requires three internal topics: config, offset, and status. Are these topics supposed to exist in the Kafka cluster where I am consuming data from? For context: someone else has a Kafka cluster set up that has topics (messages?) for me to consume. I spin up a Kafka Connect cluster on my local machine (to test), and this local instance (we'll call it that going forward) then connects to the remote Kafka cluster (we'll call it the remote cluster) by way of me typing in the bootstrap servers, some callback handler classes, and a connect.jaas file.
Do these three topics need to already exist on the remote cluster? Here I have been trying to create them on my own broker on my local instance, but through continued research, I'm seeing maybe these three internal topics need to be on the remote cluster (where I'm getting my data from). Does the owner of the remote Kafka cluster need to create these three topics for me? Where would they create them exactly? What if their cluster is not a Kafka Connect cluster specifically?
The topics need to be created on the cluster defined by bootstrap.servers in the Connect worker properties. This can be local or remote, depending on what data you actually want the connector tasks to send/receive. Individual Connect tasks cannot override what brokers are being used (it's not possible to use a source connector to write to multiple Kafka clusters, for example).
Latest versions of Kafka Connect will automatically create those internal topics, if it is authorized to do so. Otherwise, yes, they'll need to be created using kafka-topics --create with appropriate partition counts and replication factors.
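For example, assuming the default topic names from connect-distributed.properties (adjust the broker address, partition counts, and replication factor to your cluster; all three topics should be compacted):

kafka-topics --create --bootstrap-server broker:9092 --topic connect-configs --partitions 1 --replication-factor 3 --config cleanup.policy=compact
kafka-topics --create --bootstrap-server broker:9092 --topic connect-offsets --partitions 25 --replication-factor 3 --config cleanup.policy=compact
kafka-topics --create --bootstrap-server broker:9092 --topic connect-status --partitions 5 --replication-factor 3 --config cleanup.policy=compact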
If your data exists in a remote Kafka cluster, the only reason to run a local Kafka instance would be to mirror that data locally, for example with MirrorMaker.
What if their cluster is not a Kafka Connect cluster specifically?
Unclear what this means. Kafka Connect is a client just like a Kafka Streams app or normal producer or consumer. It doesn't store topics itself.

Configuring Kafka Connect with multiple brokers

Steps
I have two Kafka brokers, and I have started the ZooKeeper, Kafka server, and Kafka Connect services.
I have one source-type Kafka connector which can be used for getting data from a database.
If I start the connector [connector 1] using the REST API, the request hits one Kafka server [server 1] through a load balancer. Server 1 then stores and runs the connector, but server 2 does not know about the connector [connector 1] that is running on server 1.
Expectation
So if Kafka server 1 goes down, the other Kafka server 2 should be able to run the connector from the failed server 1.
When a connector is started, the Kafka servers should know which connectors are running, so that if any one of them fails to do the job, another server can continue it.
Reality
The other Kafka server 2 is not taking over the job as required.
Is there any configuration in Kafka to make this work?
Kindly suggest some ideas.
It appears that you have started all of the processes in a single pod.
You should run Kafka, Zookeeper, and Connect all as separate services in different pods.
I suggest you refer to the Confluent or Strimzi sites to find Kafka Kubernetes Helm charts / operators.
But to answer the question: you can give one or more brokers in the connect-distributed.properties bootstrap.servers value. Each listed broker is part of the same Kafka cluster, and the worker will reconnect through the others in the event that one broker is unavailable.
"Kafka servers" (brokers) do not run connectors.
If you want to run a cluster of Connect workers, you also need to set up their rest.advertised.listener address so that they can communicate with each other.
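A minimal sketch of the worker configuration for such a setup (host names are examples; rest.advertised.host.name and rest.advertised.port are the host/port form of the advertised REST address mentioned above):

# connect-distributed.properties on each Connect worker
bootstrap.servers=kafka-broker-1:9092,kafka-broker-2:9092
# identical on every worker that should join the same Connect cluster
group.id=connect-cluster
# address the other workers use to reach this worker's REST API
rest.advertised.host.name=connect-worker-1
rest.advertised.port=8083

Connectors submitted to any worker's REST API are then balanced across the Connect cluster, and if one worker dies its tasks are reassigned to the remaining workers.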

Can Kafka Connect consume data from a separate kerberized Kafka instance and then route to Splunk?

My pipeline is:
Kerberized Kafka --> Logstash (hosted on a different server) --> Splunk.
Can I replace the Logstash component with Kafka Connect?
Could you point me to a resource/guide where I can use kerberized Kafka as a source for my Kafka connect (which is hosted separately)?
From the documentation, what I understood is that this is quite possible if Kafka Connect is hosted on the same cluster as Kafka. But I don't have that option right now, as our Kafka cluster is multi-tenant and hence not approved for additional processes on the cluster.
Kerberos keytabs aren't commonly machine/JVM specific, so yes, Kafka Connect should be able to be configured very similarly to Logstash, since both are JVM processes using the native Kafka protocol.
You shouldn't run Connect on the brokers anyway.
If you can't add Kafka Connect to the existing Kafka cluster's machines, you will have to spin up a separate Kafka Connect deployment (cluster or standalone).
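As a rough sketch of the worker-side settings for a Kerberized source cluster (the broker address, keytab path, and principal are placeholders; whether it is SASL_SSL or SASL_PLAINTEXT depends on that cluster):

# connect-distributed.properties (excerpt)
bootstrap.servers=kerberized-broker1:9093
security.protocol=SASL_SSL
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
  useKeyTab=true \
  storeKey=true \
  keyTab="/etc/security/keytabs/connect.keytab" \
  principal="connect@EXAMPLE.COM";
# sink connectors read with their own consumers, so repeat the settings with a consumer. prefix
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=GSSAPI
consumer.sasl.kerberos.service.name=kafka

consumer.sasl.jaas.config needs the same JAAS entry as above, or you can supply a JAAS file to the whole JVM with -Djava.security.auth.login.config.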

How can I point my Confluent installation at a different Kafka?

I installed Confluent, and it comes with its own Kafka.
I want to switch from the bundled Kafka to another one.
Which .properties file (or other file) must I change so that it points to a different Kafka?
Thanks in advance.
In your Kafka Connect worker configuration, you need to set bootstrap.servers to point to the broker(s) on your source Kafka cluster.
You can only connect to one source Kafka cluster per Kafka Connect worker. If you need to stream data from multiple Kafka clusters, you would run multiple Kafka Connect workers.
Edit: If you're using the Confluent CLI, then the Kafka Connect worker config is taken from etc/schema-registry/connect-avro-distributed.properties.
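For example, the line to change in that file is just the broker list (the address below is a placeholder); restart the Connect worker afterwards so it picks up the new cluster:

# etc/schema-registry/connect-avro-distributed.properties (excerpt)
bootstrap.servers=your-other-kafka-broker:9092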

Set up Kafka Connect to fetch data from remote brokers

I'm trying to set up a Kafka Connect sink connector. Kafka Connect is part of the distribution I'm using (confluent-3.2.0). I have a Kafka broker (confluent-3.2.0) up and running on machine A. I want to set up a Kafka Connect sink connector on another machine B to consume messages, using a custom Kafka Connect sink connector JAR. Assume that the Kafka broker and ZooKeeper ports on machine A are open to machine B.
So should I install/set up confluent-3.2.0 on machine B (since Kafka Connect is part of the Kafka package), set the classpath to the Kafka Connect sink connector JAR, and run the following command?
./bin/connect-distributed.sh worker.properties
Yes. What you describe will work and is the easiest way to set up this system, even though on machine B you really only need the start script, the configuration properties file, the JARs for Kafka Connect, and the JARs for the custom connector.
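A sketch of what machine B would need, assuming machine A's broker is reachable as machineA:9092 and the connector JAR sits in /opt/connectors (both placeholders):

# worker.properties on machine B (excerpt)
bootstrap.servers=machineA:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# (distributed mode also needs config.storage.topic, offset.storage.topic, status.storage.topic)

Then put the custom connector JAR on the classpath and start the worker:

export CLASSPATH=/opt/connectors/my-sink-connector.jar
./bin/connect-distributed.sh worker.properties

Newer versions can set plugin.path in the worker properties instead of using the CLASSPATH variable.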