Run Kafka and Kafka-connect on different servers? - apache-kafka

I want to know if Kafka and Kafka-connect can run on different servers? So a connector would be started on server A and send data from a kafka topic on server B to HDFS or S3 etc. Thanks

Yes, and for Production deployments this is typically recommended for resource reasons. Generally you'd deploy a cluster of Kafka Brokers (3+ for HA), and then a cluster of Kafka Cluster workers (as many as needed for throughput capacity / resilience) -- all on separate nodes.
For more details, see the Confluent Enterprise Reference Architecture.

Yes, you can do it.
I have my set of kafka servers and kafka connect applications are running in different machines and writing data in hdfs. you have to mention list of brokers in bootstrap.servers under worker properties file (config/connect-distributed.properties or config/connect-standalone.properties) instead of localhost:9092

Related

In Kafka Connect, how to connect with multiple kafka clusters?

I set the kafka connect cluster in distributed mode and I wanna get connections with multiple kafka CLUSTERS, not just multiple brokers.
Target brokers can be set with bootstrap.servers in connect-distributed.properties.
So, at first, I set broker1 from kafka-cluster-A like below:
bootstrap.servers=broker1:9092
Absolutely, it worked well.
And then, I added broker2 from kafka-cluster-B like below:
bootstrap.servers=broker1:9092,broker2:9092
So, these two brokers are in the different clusters.
And this didn't work at all.
Without any error, it was just stuck and there was no answer with the request like creating connector through the REST API.
How can I connect with multiple kafka clusters?
As far as I know, you can only connect a Kafka Connect worker to one Kafka cluster.
If you have data on different clusters that you want to handle with Kafka Connect then run multiple Kafka Connect worker processes.

How can I show different kafka to my confluent?

I install confluent and it has own kafka.
I want to change kafka from own to another?
Which .properties or whatelse file I must change to look different kafka.
thanks in advance
In your Kafka Connect worker configuration, you need to set bootstrap.servers to point to the broker(s) on your source Kafka cluster.
You can only connect to one source Kafka cluster per Kafka Connect worker. If you need to stream data from multiple Kafka clusters, you would run multiple Kafka Connect workers.
Edit If you're using Confluent CLI then the Kafka Connect worker config is taken from etc/schema-registry/connect-avro-distributed.properties.

2 cluster of zookeper servers in hadoop+kafka cluster - is it posible?

We have Kafka cluster with the following details
3 kafka machines
3 zookeeper servers
We also have Hadoop cluster that includes datanode machines
And all application are using the zookeeper servers, including the kafka machines
Now
We want to do the following changes
We want to add additional 3 zookeeper servers that will be in a separate cluster
And only kafka machine will use this additional zookeeper servers
Is it possible ?
Editing the ha.zookeeper.quorum in Hadoop configurations to be separate from zookeeper.connect in Kafka configurations, such that you have two individual Zookeeper clusters, can be achieved, yes.
However, I don't think Ambari or Cloudera Manager, for example, allow you to view or configure more than one Zookeeper cluster at a time.
Yes, that's possible. Kafka uses Zookeeper to perform various distributed coordination tasks, such as deciding which Kafka broker is responsible for allocating partition leaders, and storing metadata on topics in the broker.
After closing kafka, the original zookeeper cluster data will be copied to the new cluster using tools, this is a zookeeper cluster data transfer util zkcopy
But if your Kafka cluster didn't stop work, you should think about Zookeeper data transfer to additional zookeeper servers.

Kafka and Kafka Connect deployment environment

if I already have Kafka running on premises, is Kafka Connect just a configuration on top of my existing Kafka, or does Kafka Connect require it's own Server/Environment separate from that of my existing Kafka?
Kafka Connect is part of Apache Kafka, but it runs as a separate process, called a Kafka Connect Worker. Except in a sandbox environment, you would usually deploy it on a separate machine/node from your Kafka brokers.
This diagram shows conceptually how it runs, separate from your brokers:
You can run Kafka Connect on a single node, or as part of a cluster (for throughput and redundancy).
You can read more here about installation and configuration and architecture of Kafka Connect.
Kafka Connect is its own configuration on top of your bootstrap-server's configuration.
For Kafka Connect you can choose between a standalone server or distributed connect servers and you'll have to update the corresponding properties file to point to your currently running Kafka server(s).
Look under {kafka-root}/config and you'll see
You'll basically update connect-standalone or connect-distributed properties based on your need.

Best Practices for Kafka Cluster Deployment Configuration?

I'm asking for general best practices here:
If I want a five node cluster, do all five nodes run the Confluent Platform Umbrella Packages that include Zookeeper, Kafka, schema-registry?
Is it ever recommended to run the zookeper cluster on separate servers from the Kafka cluster?
If I want to run the Kafka Connect distributed worker, do I run that on all cluster nodes? Do I ever want to run on separate servers? Is Docker recommended for this or is Docker unnecessary?
With Kafka Streaming apps, should they be run on all cluster nodes? Should they be dockerized? Should they ever run on separate nodes?
Is something like Mesos recommended?
It is a best practice to run Kafka Brokers on dedicated servers (or virtual servers). The same is true of Zookeeper.
All the other components of the Confluent Platform can run colocated on common servers or on separate machines.
You would typically run only one Schema Registry (or two if you want fault tolerance). They can run on any machine that can connect back to the Kafka Brokers.
Kafka Connect distributed workers only need to run on machines that you want to host Kafka Connectors. They just need to be able to connect back to the Kafka Brokers.
Kafka Streams apps can run anywhere you want so long as they can connect back to the Kafka Brokers.
All components can run inside docker containers or without docker.
You can use whatever microservices or data center resource management tools you want (or none at all) - it is your choice.