Confluent - How to use external zookeeper instead of embedded zookeeper

Confluent - How to use external zookeeper instead of embedded zookeeper - apache-kafka

I used to setup standalone Confluent Server with embedded Zookeeper(ZK). But now, my prod server has its own ZK cluster. So I want to use it instead of the embedded ZK in Confluent.
Using ksql for example. Although I can set the ZK settings of ksql to my own ZK cluster and run the embedded ZK to another port and just let it be. But I have two independent ZK which make me "uncomfortable".
How can I make the embedded ZK "disable" and all the Confluent Servers use my own ZK cluster?

The Confluent CLI is not for production use. It is intended for use on a single node development environment.
You can see recommendations for production deployment here, and information about configuring your services here.

Related

How to expand confluent cloud kafka cluster?

I have set up a confluent cloud multizone cluster and it got created with just one bootstrap server. There was no setting for choosing number of servers while creating the cluster. Even after creation, I can’t edit the number of bootstrap servers.
I want to know how to increase the number of servers in confluent cloud kafka cluster.

Under the hood, the Confluent Cloud cluster is already running multiple brokers. Depending on your cluster configuration (specifically, whether you're running Standard or Dedicated, and what region and cloud you're in), the cluster will have between six and several dozen brokers.
The way a Kafka client bootstrap server config works is that the client reaches out to the bootstrap server and requests a list of all brokers, and then uses those broker endpoints to actually produce/consume from Kafka (reference: https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-properties-bootstrap-servers.html)
In Confluent Cloud, the provided bootstrap server is actually a load balancer in front of all of the brokers; when the client connects to the bootstrap server it'll receive the actual endpoints for all of the actual brokers, and then use that for subsequent connections.
So TL;DR, in your client, you only need to specify the one bootstrap server; under the hood, the Kafka client will connect to the (many) brokers running in Confluent Cloud, and it should all just work.
Source: I work at Confluent.

Kafka Connect with Amazon MSK

How do I use Kafka Connect adapters with Amazon MSK?
As per the AWS documentation, it supports Kafka connect but not documented about how to setup adapters and use it.

Edit Oct 2021: MSK Connect has been launched, see https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/
AFAIK Amazon MSK does not provide managed connectors, so you have to run them yourself. This is done by running the Kafka Connect worker process (a JVM) and then providing it one or more connector configurations to run.
From the point of view of a Kafka Connect worker it just needs a Kafka cluster to connect to; it shouldn't matter whether it's MSK or on-premises, since it's ultimately 'just' a consumer/producer underneath.
You can see more, including a live demo, here: https://rmoff.dev/bbuzz19-kafka-connect
For an example of configuring Kafka Connect to use a cloud-hosted Kafka platform (in this case, Confluent Cloud), see this article.
If you are interested in managed connectors in the Cloud, check out the connectors that are provided in Confluent Cloud.
Disclaimer: I work for Confluent :)

AWS now supports MSK Connect, a new feature of MSK service based on Kafka Connect allowing you to deploy managed Kafka connectors built for Kafka connect
Check the announcement here: https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/

There are two aspects to this
Kafka Connect is a framework which should be deployed separately from kafka brokers. MSK only provides kafka brokers. If you want to use Kafka Connect with MSK you would need to use EC2 instances and deploy the kafka binaries.Kafka Connect framework is bundled along with kafka
Coming to connectors if you donot have a confluent subscription or similar - I am afraid your choices get very limited. But having said you can always write your own connectors. Writing new connectors is not that difficult rather you can apply your business specific logic and be on your way quite quickly.

Is it possible to monitor kafka producer/consumer metrics in DCOS?

I am trying to do this without using Confluent Control Center, since I do not have a license.
I am able to see the Kafka Broker metrics by using dcos task metrics details <broker-id> and see that all of these are already exposed on my DCOS Prometheus instance.
However, I do not see any consumer/producer metrics available on Prometheus, despite having some producers/consumer tasks on dcos.
Is there a process I can follow to expose kafka prodcuer/consumer metrics on dcos? I tried the following https://github.com/ibm-cloud-architecture/refarch-eda/blob/master/docs/kafka/monitoring.md .
But from my understanding we cannot use JMX on a Kafka instance hosted on DCOS (yet) (soruce: https://jira.mesosphere.com/browse/DCOS_OSS-3632?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel)
Any ideas?

You would have to add Prometheus JMX exporters to each of your Kafka Java processes, and then you would need to have the Prometheus server be able to scrape those. You would do this by downloading that JAR in each of the processes (containers?), then editing the KAFKA_OPTS environment varible to include the -javaagent option
AFAIK, this does not require setting up a remotely accessible JMX port.
Note: Control Center doesn't monitor JMX values. It uses Kafka MetricsReporters and Interceptors. Use of these interfaces, if you chose to write your own, or find others, doesn't require Control Center at all.

How to use the Web Interface of Confluent Control Center?

I deployed confluent on a Single cloud on a Linux VM and I'm planning to configure interceptors for the Consumers.
But how do I configure Control Center in order to access its Web-Interface?
Deployed Confluent Control Center Enterprise Edition.

Not very clear what issues you're having.
Edit the server.properties file. Add the bootstrap servers, Zookeeper quorum, and optionally the connect cluster.
With that, you should at least be able to see the topics in the cluster
Run the control-center binary with the server property file as a parameter.
See additional setup steps required for the clients to get monitoring to work at the documentation

Best Practices for Kafka Cluster Deployment Configuration?

I'm asking for general best practices here:
If I want a five node cluster, do all five nodes run the Confluent Platform Umbrella Packages that include Zookeeper, Kafka, schema-registry?
Is it ever recommended to run the zookeper cluster on separate servers from the Kafka cluster?
If I want to run the Kafka Connect distributed worker, do I run that on all cluster nodes? Do I ever want to run on separate servers? Is Docker recommended for this or is Docker unnecessary?
With Kafka Streaming apps, should they be run on all cluster nodes? Should they be dockerized? Should they ever run on separate nodes?
Is something like Mesos recommended?

It is a best practice to run Kafka Brokers on dedicated servers (or virtual servers). The same is true of Zookeeper.
All the other components of the Confluent Platform can run colocated on common servers or on separate machines.
You would typically run only one Schema Registry (or two if you want fault tolerance). They can run on any machine that can connect back to the Kafka Brokers.
Kafka Connect distributed workers only need to run on machines that you want to host Kafka Connectors. They just need to be able to connect back to the Kafka Brokers.
Kafka Streams apps can run anywhere you want so long as they can connect back to the Kafka Brokers.
All components can run inside docker containers or without docker.
You can use whatever microservices or data center resource management tools you want (or none at all) - it is your choice.