Should ZooKeeper be run on independent machines in a production environment? - apache-kafka

In practice, I would like to build a server running both ZooKeeper and Kafka.
However, I heard that ZooKeeper and Kafka should be deployed separately. I wonder why they should be separated.

Related

Is it possible to install Confluent's open source version of Kafka Connect to work with a non-Confluent platform (plain open source Kafka)?

I have a Kafka cluster that I work with, which is managed by my team and runs on Kubernetes. We want to install Kafka Connect via Helm into our cluster to work with our Kafka. The Kafka we are running is NOT Confluent Platform Kafka. Is there a good way to do this? I was wondering if cp-helm-charts would work. Will the confluentinc Kafka Connect container be compatible with my Kafka cluster that is on a non-Confluent platform?
Kafka Connect has never been a Confluent Platform-exclusive product.
The framework is entirely Apache 2.0-licensed and open source.
Similarly, "Confluent Platform Kafka" is just Apache Kafka.

Kafka - Confluent Hub - Using only part of it

I already saw a similar question on SO, but it doesn't clearly answer my doubts.
We have different Kafka clusters and a lot of operational habits around them. We have our own way to start/stop the clusters, lots of operational scripts that help maintain them, etc.
Now we would like to use Kafka Connect connectors for new needs, but from what I saw, Kafka Connect is extremely coupled to Confluent Hub.
It's as if I can't even use the connectors without having to install a fully operational Confluent Hub.
This makes it very difficult for us to use Kafka Connect connectors. I understand that Confluent Hub might be a framework that helps run those connectors, but it seems like we can't use it with a dissociated Kafka cluster (one not operated via Confluent Hub).
But maybe I'm missing something...
Do you know if there is any way to properly use Kafka connectors on an already existing Kafka cluster (completely independent from Confluent Hub)?
EDIT:
It's more a question regarding the highly coupled behaviour between Confluent Hub and Kafka Connect. All the features that come with Kafka Connect (distributed workers to handle different failover scenarios, etc.) seem unusable without Confluent Hub, thus a "need" to run the Kafka cluster exclusively via Confluent Hub, which is not an easy task when you already have a big existing Kafka cluster with lots of ops habits around it.
Kafka Connect is part of Apache Kafka. It's a pluggable framework for streaming integration between systems in and out of Kafka.
To use Kafka Connect you need connectors for the specific technology with which you want to integrate. For example, S3 sink, Elasticsearch sink, JDBC source or sink, and so on.
The connector API is part of Apache Kafka, and available for anyone who wants to develop a connector.
Connectors are written by various people and organisations, and are available in various different ways. How you obtain a connector depends on which connector you want, how it's licensed, and how the author has made it available for distribution. It could be that you go to GitHub, clone the repo and build the JAR. It could be that you can download the JAR directly.
All that Confluent Hub does is make lots of these connectors available for you in one place, easily searchable, and with an optional CLI tool that will install them for you.
Do you have to use Confluent Hub? No, not at all. Might it make your life easier in locating connectors that you want to use, and make it easier to install them? Hopefully :)
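For example, installing a connector by hand (the path and connector directory below are hypothetical) is just a matter of placing its JARs on the worker's plugin.path and restarting the worker, with no Confluent Hub involved:

```properties
# In the Connect worker config, point plugin.path at a directory you manage
# (the path here is hypothetical):
plugin.path=/opt/kafka/connect-plugins

# Then copy or unpack the connector's JARs into a subdirectory, e.g.
#   /opt/kafka/connect-plugins/my-jdbc-connector/
# and restart the worker; the connector class then shows up in Connect's
# GET /connector-plugins REST endpoint.
```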
Disclaimer: I work for Confluent.

Upgrading Kafka client from 0.8.2.0 to 0.11.0.0

Currently, at my company we are migrating from Kafka 0.8 to 0.11. The broker migration steps are clearly stated in the Kafka documentation here.
What I am stuck on is upgrading the Kafka clients (producers, consumers, spark-streaming). I can't find any documentation/articles clearly listing the required changes or steps to upgrade the clients; all I found is the Javadoc for the Producer Client.
What I did so far is change the Kafka client version in my Gradle build to kafka-clients-0.11.0.0, and everything from the compilation point of view went fine with no code changes at all.
What I seek help with is: are there any expected problems I should take care of, any pointers for client changes other than the kafka-clients version?
I went through lots of experiments to get this done.
For the consumers and producers, I just used the Kafka 0.11.0 consumers and producers.
The tricky part was replacing spark-streaming: the latest spark-streaming version only supports up to Kafka 0.10.x, which doesn't contain any updates related to the new broker.
What I recommend here: if you are about to write an application from scratch and your main goal is realtime streaming, go for the Kafka Streams API, it is just AWESOME! If you already have a spark-streaming app (which was my case), you should judge which is more important to you: staying stuck on Kafka broker version 0.10.x, or keeping spark-streaming, which was experimental btw.
The benefits of having the streaming inside Kafka rather than Spark are the following:
Kafka Streams is a normal jar that can be injected into any Java application, so you don't care that much about deployment and environment (see the sketch below)
Auto-scaling is so easy when using Kafka Streams with any scale set provided by any cloud service provider, unlike scaling an HDP cluster.
Monitoring with something like Prometheus would be much easier.
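To illustrate the "normal jar" point from the list above, here is a minimal Kafka Streams sketch against the 0.11-era API (the broker address and topic names are hypothetical); it is a plain Java main class whose only extra dependency is the kafka-streams artifact:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class UppercaseApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The application id doubles as the consumer group id and state-store prefix
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical broker

        KStreamBuilder builder = new KStreamBuilder(); // 0.11-era DSL entry point
        // Read "input-topic", upper-case every value, write to "output-topic" (both hypothetical)
        builder.stream(Serdes.String(), Serdes.String(), "input-topic")
               .mapValues(value -> value.toUpperCase())
               .to(Serdes.String(), Serdes.String(), "output-topic");

        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Scaling out is then just starting more instances of the same jar with the same application.id; partitions are rebalanced across them automatically.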

Zookeeper/Kafka with Tomcat - Possible At All?

I was wondering if someone has used ZooKeeper/Kafka embedded within Tomcat. I know that Kafka requires ZooKeeper, but does that mean I have to run Kafka and ZooKeeper as separate instances? So far I cannot see any use cases where everything has been bundled together. Could anyone advise?
My question is more around the concept of using ZooKeeper and Kafka as a jar within the same Tomcat web application.
Both Kafka and ZooKeeper are meant to be used in a stand-alone fashion, run as separate processes.
They should even be on different machines/VMs/containers than the Tomcat web application.
You also probably want a ZooKeeper cluster of 3-5 machines, rather than a single one, at least for production.
Both of them have Java clients, though, which the web application can use to interact with them, and those are fine to include.
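As a rough sketch of that approach (the broker addresses and topic names are hypothetical), the web application would bundle only the Kafka client jar and hold one long-lived, thread-safe producer:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventPublisher {
    // KafkaProducer is thread-safe; share one instance across the whole webapp
    private final KafkaProducer<String, String> producer;

    public EventPublisher() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092,kafka2:9092"); // hypothetical brokers
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    public void publish(String topic, String key, String value) {
        producer.send(new ProducerRecord<>(topic, key, value)); // asynchronous send
    }

    public void close() {
        producer.close(); // call from a ServletContextListener on shutdown
    }
}
```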

Connect Confluent with three already existing Kafka brokers

I'm new to the Confluent world, and I know how to start Kafka and ZooKeeper from Confluent, but that's not what I need.
I already have 3 Kafka nodes and 2 ZooKeeper nodes installed by Ambari. Afterwards I downloaded version 3.0.0 of Confluent, and now I want to connect Confluent to the already running Kafka and ZooKeeper. I don't want to spin up the new Kafka server or ZooKeeper server that Confluent provides.
Does anyone have an idea how to accomplish that: what to actually run from Confluent and what to change?
So far I have only been changing files in ./etc/kafka or ./etc/zookeeper inside the Confluent dir. Thank you!
clarify some basics about Confluent and how to manage communication between Confluent and Kafka
First things first, there is no single application called "Confluent" that can be started all on its own.
There is nothing to configure for Kafka or ZooKeeper. The Confluent Platform doesn't add anything on top of the existing Apache Kafka you have (presumably installed via Hortonworks or Cloudera).
In fact, those companies add patches to Kafka, so their builds would be slightly different from the base Apache versions you would get from Confluent.
That being said, if you read through each of the extra services that Confluent provides, you'll notice either a ZooKeeper or a bootstrap server configuration option. Fill out those fields, start the respective services, and you're good to go.
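For instance, a sketch of the Schema Registry config pointed at an existing cluster might look like this (the hostnames are hypothetical, and the exact keys vary between Confluent versions):

```properties
# etc/schema-registry/schema-registry.properties - sketch only
# Point at your existing ZooKeeper ensemble instead of Confluent's own:
kafkastore.connection.url=ambari-zk1:2181,ambari-zk2:2181
# Topic in the existing Kafka cluster where schemas are stored
kafkastore.topic=_schemas
port=8081
```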
what to actually run from Confluent
Look in the bin directory; you can find all the start scripts there. From the comments, it looks like you're trying to use Connect Distributed (which is already included in any recent Kafka installation; it's not Confluent-specific) and the Schema Registry. You'll have to be more specific about the errors that you get, but the config files are all under the etc path.
Unless you're using KSQL, the REST Proxy or Control Center, there's not much to run because, as mentioned, Kafka Connect is included with the base Apache Kafka project, and Hortonworks maintains their own Schema Registry project.
2 zookeepers installed by Ambari
This is a highly discouraged setup; ZooKeeper needs a strict majority of nodes to form a quorum, so a 2-node ensemble tolerates no failures at all. Please install an odd number of ZooKeeper nodes: 3 or 5, preferably.
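For illustration, a 3-node ensemble is declared identically in each node's zoo.cfg, roughly like this (the hostnames are hypothetical):

```properties
# zoo.cfg - same file on all three nodes
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

Each node additionally needs a myid file in dataDir containing its own server number (1, 2 or 3).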