Use Confluent CLI to run several Kafka brokers

I wasn't able to find an answer to this, and I'm facing some difficulties while trying to implement it.
Is the confluent command able to run multiple brokers plus ZooKeeper?
Is it possible to build a Kafka cluster consisting of 3 nodes using the Confluent CLI?
Thanks.

No, it does not support multi-node clusters.
It only runs single, localhost-only kafka-server-start and zookeeper-server-start commands.
Additionally, as mentioned on the documentation page, it is
meant for development purposes only and is not suitable for a production environment. The data that are produced are transient and are intended to be temporary.
By "production environment" here, multiple servers are implied.

Related

Switching Kafka from KRaft to ZooKeeper

I have a recent Kafka cluster which uses KRaft. I am facing some problems with it, possibly due to the use of KRaft. I wish to switch to ZooKeeper without losing data. Downtime is okay. How do I go about it?
I'm afraid there isn't a documented process to downgrade a cluster from KRaft to ZooKeeper.
If you've found an issue with KRaft, you should report it to the Kafka project via a Jira ticket so it can get fixed.
Assuming your KRaft cluster is somewhat functional, a way to preserve your data is to create a new cluster (running ZooKeeper) and use a tool like MirrorMaker to migrate your data.
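For example, with MirrorMaker 2 (bundled with recent Kafka releases), a minimal migration sketch could look like this; the cluster aliases and bootstrap addresses are placeholders for your environment:

# mm2.properties
clusters = kraft, zk
kraft.bootstrap.servers = kraft-broker:9092
zk.bootstrap.servers = new-zk-broker:9092
kraft->zk.enabled = true
kraft->zk.topics = .*
zk->kraft.enabled = false

Run it with bin/connect-mirror-maker.sh mm2.properties, then repoint producers and consumers once the target topics have caught up. Keep in mind that MM2's default replication policy prefixes replicated topic names with the source alias (e.g. kraft.mytopic) unless you configure a different policy.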

How are AWS MSK, Confluent Schema Registry, and Confluent Kafka Connect recommended to be used together?

We are planning to use the AWS MSK service for managed Kafka, together with the Schema Registry and Kafka Connect services from Confluent, to run our connectors (Elasticsearch Sink Connector). We plan to run Schema Registry and the connectors on EC2.
As per the Confluent team, they cannot officially support Confluent Schema Registry and Kafka Connect if we use MSK for Kafka.
So, can anyone share their experience? For example:
Has anybody used a combination of MSK and Confluent services together in a production environment?
Is there any risk in using this kind of combination?
Is it recommended or not to use this combination?
How is Confluent community support if we face any issue with the connectors?
Any other suggestions, comments, or alternatives?
We already have a Confluent Platform license, but we want a managed Kafka service, which is why we have chosen AWS MSK; per our analysis it is more cost-effective than Confluent Cloud.
Thanks in advance.
Objectively answering your question: this is doable, but it depends on where your major pain is.
From the licensing perspective, there is nothing that forces you to have a Confluent subscription just to use Kafka Connect or Schema Registry, as they are based on the Apache License 2.0 and the Confluent Community License, respectively.
From the technical perspective, you can run both Kafka Connect and Schema Registry on EC2, and as long as they are running in the same VPC as the MSK cluster, they will work flawlessly.
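For illustration, a minimal Kafka Connect worker sketch against MSK; the bootstrap and Schema Registry addresses are placeholders, and any TLS/IAM settings will depend on how the MSK cluster is configured:

# connect-worker.properties
bootstrap.servers=b-1.msk-cluster.example.amazonaws.com:9092
group.id=connect-cluster
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://ec2-schema-registry.example.com:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://ec2-schema-registry.example.com:8081
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status

Start it with bin/connect-distributed.sh connect-worker.properties on the EC2 instance.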
From the cost perspective you will have to evaluate how much it costs to have Kafka Connect and Schema Registry being managed by you and/or your team. Think not only about the install and setup phase but the manage and evolve phase as well. The software might not have any cost but the effort to operate these components can be translated into cost.
How is Confluent community support if we face any issue with the connectors?
The Kafka community is usually very helpful, whether you ask for help in the Apache Kafka users group or in the community Slack that Confluent runs. Of course, it is all best effort, and you can't rely on them for support. It may take several days until some good Samaritan decides to help you, which also translates into cost: how much does it cost to be down and/or waiting for a resolution?
I am no longer a Confluent employee, so I won't even try to convince you to buy from them. But you should evaluate this component of cost and check whether Confluent Cloud wouldn't be a more cost-effective solution, since it includes managed versions of Kafka, Kafka Connect, and Schema Registry. In my experience, managed Kafka on Confluent Cloud is not that costly and the managed Schema Registry is "free", but a managed connector can be very costly, and it gets worse depending on the number of tasks you configure in the connector. This is the only gotcha you ought to watch out for.
AWS now also offers a fully managed, free schema registry (the AWS Glue Schema Registry) that integrates with MSK and other AWS services like Kinesis. It's much easier to get started with.

Kafka 2.0 - Multiple Kerberos Principals in KafkaConnect Connectors

We are currently using HDF (Hortonworks DataFlow) 3.3.1, which bundles Kafka 2.0.0. The problem is with running multiple connectors with different configurations (Kerberos principals) on the same Kafka Connect cluster.
In this Kafka version, all connectors are supposed to use the same consumer/producer properties, which are set in the worker configuration with the consumer.* or producer.* prefix. But as I stated, we have multiple users (apps) running their own connectors, and we can't use a single Kerberos principal to allow reads on all topics.
So I just wanted to check with experts whether there is any way this security limitation can be overcome. The option I can think of is to run a different Kafka Connect cluster for each Kafka user (different principals), but what implications could that have if we run many Kafka Connect clusters on the same nodes? Will it cause any impact in terms of resources (Java heap etc.), or is this the only way (standard procedure) to handle this?
PS: In later releases (2.3+) this problem is fixed via KAFKA-8265 and these settings can be overridden, but even if we upgrade to the latest HDF we will only get Kafka 2.1, which will not solve this issue.
Thanks for your help!
I think upgrading is your best option to get the linked feature. As I commented, you can go get the latest Kafka versions on your own... Hortonworks/Cloudera doesn't offer support for Connect anyway; they'd rather you use Spark/Flink/NiFi (I think Storm is no longer around?).
what implications could that have if we run many Kafka Connect clusters on the same nodes? Will it cause any impact in terms of resources (Java heap etc.)?
Heap is the main one (for batching in sink connectors). Network and CPU load could also come into play, depending on the rate of messages.
As long as the advertised ports for each cluster process aren't colliding, though, you should be able to use the same group IDs and internal topics.
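For reference, once you are on Kafka 2.3+, the per-connector override from KAFKA-8265 works roughly like this (a sketch; the principal and keytab path are placeholders):

# in the Connect worker configuration, allow per-connector overrides:
connector.client.config.override.policy=All
# then in an individual connector's configuration:
consumer.override.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
  useKeyTab=true keyTab="/etc/security/keytabs/app1.keytab" principal="app1@EXAMPLE.COM";

Each connector can then authenticate with its own principal while sharing one worker cluster.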

two kafka versions running on same cluster

I am trying to configure two Kafka servers on a cluster of 3 nodes, while there is already one Kafka broker (version 0.8) running with the application, and there is a dependency on that Kafka 0.8 that cannot be disturbed/upgraded.
Now for a POC, I need to configure 1.0.0, since my new code is compatible with this version and above...
My task is to push data from Oracle to Hive tables. For this I am using JDBC connect to fetch data from Oracle and Hive JDBC to push data to Hive tables. It should be a fast and easy way...
I need the following help:
Can I use spark-submit to run this data push to Hive?
Can I simply copy kafka_2.12-1.0.0 onto my Linux server on one of the nodes and run my code on it? I think I need to configure my zookeeper.properties and server.properties with ports not in use and start these new ZooKeeper and Kafka services separately??? Please note I cannot disturb the existing ZooKeeper and Kafka already running.
Kindly help me achieve this.
I'm not sure running two very memory-intensive applications (Kafka and/or Kafka Connect) on the same machines is considered very safe, especially if you do not want to disturb existing applications. Realistically, a rolling restart with an upgrade would be best for performance and feature reasons. And no, two Kafka versions should not be part of the same cluster, unless you are in the middle of a rolling-upgrade scenario.
If at all possible, please use new hardware... I assume Kafka 0.8 is running on machines that could be old and out of warranty? Then there's no significant reason that I know of not to use a newer version of Kafka. But yes, extract it on any machine you'd like; perhaps use something like Ansible, or whatever config management tool you prefer, to do it for you.
You can actually share the same ZooKeeper cluster; just make sure the two Kafka clusters don't use the same settings (give each its own chroot path). For example:
Cluster 0.8
zookeeper.connect=zoo.example.com:2181/kafka08
Cluster 1.x
zookeeper.connect=zoo.example.com:2181/kafka10
Also, it is not clear where Spark fits into this architecture. Please don't use a JDBC sink for Hive; use the proper HDFS Kafka Connect sink, which has direct Hive support via the metastore. And while the JDBC source might work for Oracle, chances are you might already be able to afford a license for GoldenGate.
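For illustration, a minimal HDFS sink sketch with Hive integration enabled; the topic name and the HDFS/metastore URLs are placeholders:

# hdfs-sink.properties
name=oracle-to-hive
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=oracle-table-topic
hdfs.url=hdfs://namenode:8020
flush.size=1000
hive.integration=true
hive.metastore.uris=thrift://metastore:9083
schema.compatibility=BACKWARD

The connector writes files to HDFS and creates/updates the Hive table through the metastore, so no Hive JDBC is needed.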
I was able to get both Kafka versions, 0.8 and 1.0, running on the same server with their respective ZooKeepers.
Steps followed:
1. Copy the version's package folder to the desired location on the server.
2. Change the configuration settings in zookeeper.properties and server.properties (here you need to set ports that are not already in use on that particular server).
3. Start the services and push data to Kafka topics.
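A minimal sketch of steps 2 and 3, assuming the 1.0.0 package was extracted to its own directory and that the alternate ports (2182 for ZooKeeper, 9093 for Kafka) are free, with the defaults 2181/9092 taken by the existing 0.8 stack; the directories are placeholders:

# config/zookeeper.properties: clientPort=2182, dataDir=/var/lib/zookeeper-1.0
# config/server.properties: broker.id=0, listeners=PLAINTEXT://:9093,
#   log.dirs=/var/lib/kafka-logs-1.0, zookeeper.connect=localhost:2182
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties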
Note: this setup is only for a POC and is not an ideal production environment. As answered above, we should upgrade rather than follow what is practiced here.

Confluent - How to use an external ZooKeeper instead of the embedded ZooKeeper

I used to set up a standalone Confluent server with the embedded ZooKeeper (ZK). But now my prod server has its own ZK cluster, so I want to use it instead of the embedded ZK in Confluent.
Take KSQL, for example: although I could point its ZK settings to my own ZK cluster, move the embedded ZK to another port, and just let it be, I would then have two independent ZKs, which makes me "uncomfortable".
How can I disable the embedded ZK and have all the Confluent services use my own ZK cluster?
The Confluent CLI is not for production use. It is intended for a single-node development environment.
You can see recommendations for production deployment here, and information about configuring your services here.
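In practice, that means starting each service yourself with its own properties file instead of via the CLI, so no embedded ZK is ever launched. A sketch, assuming a Confluent Platform install and an external ensemble at zk1:2181,zk2:2181,zk3:2181 (hypothetical hostnames):

# etc/kafka/server.properties:
#   zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
bin/kafka-server-start etc/kafka/server.properties
bin/schema-registry-start etc/schema-registry/schema-registry.properties
bin/ksql-server-start etc/ksql/ksql-server.properties

Each service then points at your existing ensemble (directly, or indirectly through the brokers), and there is nothing extra to disable.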