Apache Kafka consumer groups and microservices running on Kubernetes, are they compatible?

So far, I have been using Spring Boot apps (with Spring Cloud Stream) and Kafka running without any supporting infrastructure (PaaS).
Since our corporate platform runs on Kubernetes, we need to move those Spring Boot apps into K8s to allow the apps to scale and so on. Obviously, there will be more than one instance of every application, so we will define a consumer group per application to ensure that every message is delivered to and processed by only one instance of each application.
Kafka will be running outside Kubernetes.
Now my doubt is: since the apps deployed on K8s are accessed through a K8s Service that abstracts the underlying pods, and individual application pods can't be accessed directly from outside the K8s cluster, Kafka won't know how to reach individual instances of the consumer group to deliver the messages, will it?
How can I make them work together?
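For reference, the consumer-group wiring in our apps looks roughly like this sketch (the binding name process-in-0, the group orders-app, and the handler are illustrative, using Spring Cloud Stream's functional model):

```java
import java.util.function.Consumer;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class OrdersApplication {
    public static void main(String[] args) {
        SpringApplication.run(OrdersApplication.class, args);
    }

    // Bound to a Kafka topic by Spring Cloud Stream; the consumer group is set
    // in configuration, e.g. spring.cloud.stream.bindings.process-in-0.group=orders-app,
    // so every replica of the app joins the same group and each message is
    // handled by only one replica.
    @Bean
    public Consumer<String> process() {
        return payload -> System.out.println("received: " + payload);
    }
}
```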

Kafka brokers do not push data to clients; rather, clients poll() and pull data from the brokers. As long as the consumers can connect to the bootstrap servers, and the Kafka brokers are configured to advertise an IP and port that the clients can connect to and poll(), it will all work fine.
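To make the pull model concrete, here is a minimal sketch of a plain Java consumer (the broker address kafka.example.com:9092, group id my-app, and topic orders are all hypothetical). Nothing here requires the pod to be reachable from outside the cluster; the client opens every connection outbound:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PollingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The pod only needs outbound connectivity to these addresses;
        // the broker never opens a connection back into the pod.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9092");
        // All instances of the app share one group id, so each partition is
        // consumed by exactly one instance at a time.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-app");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // The client initiates every request: it polls, the broker answers.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```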

Can Spring Cloud Data Flow solve your requirement to control the number of instances deployed?
Also, there is a community-released Spring Cloud Data Flow server for OpenShift:
https://github.com/donovanmuller/spring-cloud-dataflow-server-openshift

Related

Triggering a Kubernetes Job for a Kafka message

I have a Kubernetes service that only does something when it consumes a message from a Kafka queue. The queue does not have messages very often, and running the service as a Job triggered whenever a message is found would save resources.
I see that Kubernetes has this functionality for AMQP-type message services: https://kubernetes.io/docs/tasks/job/coarse-parallel-processing-work-queue/
Is there a way to adapt this for Kafka, given that Kafka does not support AMQP? I'd switch to a different messaging system, but I have other services that also read from this queue that require Kafka.
That Kafka consumer Service is all you really need. If you want to save resources, it could be paired with the KEDA autoscaler so that it scales up and down depending on load or consumer group lag.
Or you can use a serverless platform such as Knative to trigger workloads based on Kafka (or other messaging system) events.
"Kafka does not support AMQP"
Something like Kafka Connect should be able to bridge AMQP into Kafka; Apache Camel, for example, has connectors for both.
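As a hedged sketch of what such a bridge could look like with Camel's Java DSL (assuming Camel 3.x with camel-main, camel-amqp, and camel-kafka on the classpath; the queue, topic, and broker names are made up):

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.main.Main;

public class AmqpToKafkaBridge {
    public static void main(String[] args) throws Exception {
        Main main = new Main();
        main.configure().addRoutesBuilder(new RouteBuilder() {
            @Override
            public void configure() {
                // Consume from a (hypothetical) AMQP queue and republish each
                // message to a Kafka topic.
                from("amqp:queue:incoming-jobs")
                    .to("kafka:jobs?brokers=kafka.example.com:9092");
            }
        });
        main.run();
    }
}
```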

Kafka Post Deployment - Handling ever-growing clients

We have set up a Kafka cluster for high availability and distributed data load. The current consumers and producers specify all the broker IP addresses to connect to the cluster. In the future, we will need to continuously monitor the cluster and add new brokers based on collected metrics and overall system performance. If a broker crashes, we have to add a new broker with a different IP as soon as possible.
In these scenarios, we have to change all the client configurations, which is a time-consuming and stressful operation.
I think we could set up a config server (e.g. Spring Cloud Config Server) to specify all the broker IP addresses in a centralized manner, so we only have to change them in one place without touching every client, but I don't know if that is the best approach. Obviously, the clients must be programmed to fetch the broker list from the config server.
Is there a better approach?
Worth pointing out that the "bootstrap" process doesn't require giving every single broker address to the clients; really, only the first available address in the list is used for the initial connection, and then the advertised.listeners configured on the brokers in the cluster are what the clients actually use.
The answer to your question is to use service discovery, yes. That could be Spring Cloud Config, but the more general option would be HashiCorp Consul or another service that uses DNS (Kubernetes uses CoreDNS by default, for example, or AWS Route53).
Then you edit the /etc/resolv.conf of each machine the client runs on (assuming Linux) to include the DNS servers, and you can simply refer to kafka.your.domain:9092 rather than using IP addresses.
You could use a load balancer (with a friendly DNS name like kafka.domain.com) that points to all of your brokers. We do this in our environment. Your clients then connect to kafka.domain.com:9092.
When you add new brokers, you only change the load balancer endpoints, not the client configuration.
Additionally, note that you only need to connect to one bootstrap broker and don't have to list all of them in the client configuration.
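To illustrate, with such a setup the client config holds only the one stable name; a minimal sketch (kafka.domain.com is the hypothetical load-balanced name from above, and the events topic is made up):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SingleBootstrapClient {
    public static void main(String[] args) {
        Properties props = new Properties();
        // One stable, load-balanced name is enough for bootstrapping; the client
        // then discovers every broker from the cluster metadata returned on the
        // first connection (built from each broker's advertised.listeners), and
        // connects to those advertised addresses for the actual traffic.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.domain.com:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "hello"));
        }
    }
}
```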

How to expand a Confluent Cloud Kafka cluster?

I have set up a Confluent Cloud multi-zone cluster, and it was created with just one bootstrap server. There was no setting for choosing the number of servers while creating the cluster, and even after creation I can't edit the number of bootstrap servers.
I want to know how to increase the number of servers in a Confluent Cloud Kafka cluster.
Under the hood, the Confluent Cloud cluster is already running multiple brokers. Depending on your cluster configuration (specifically, whether you're running Standard or Dedicated, and what region and cloud you're in), the cluster will have between six and several dozen brokers.
The way the Kafka client's bootstrap.servers config works is that the client reaches out to a bootstrap server and requests the list of all brokers, then uses those broker endpoints to actually produce to and consume from Kafka (reference: https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-properties-bootstrap-servers.html).
In Confluent Cloud, the provided bootstrap server is actually a load balancer in front of all of the brokers; when the client connects to the bootstrap server it'll receive the actual endpoints for all of the actual brokers, and then use that for subsequent connections.
So TL;DR, in your client, you only need to specify the one bootstrap server; under the hood, the Kafka client will connect to the (many) brokers running in Confluent Cloud, and it should all just work.
Source: I work at Confluent.
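To illustrate, a typical Java client config against Confluent Cloud looks something like the following sketch; the bootstrap endpoint and the API key/secret are placeholders for the values Confluent Cloud gives you when you create the cluster:

```java
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;

public class ConfluentCloudClient {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // The single bootstrap endpoint and credentials below are placeholders.
        props.put("bootstrap.servers", "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username='<API_KEY>' password='<API_SECRET>';");

        try (AdminClient admin = AdminClient.create(props)) {
            // The client bootstraps via the one endpoint, then talks to the
            // individual broker endpoints it learns from the cluster metadata.
            admin.describeCluster().nodes().get()
                 .forEach(node -> System.out.println(node));
        }
    }
}
```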

Exposing a public Kafka cluster

If I were to create a public Kafka cluster that accepts messages from multiple clients, which are processed purely by a separate backend, what would be the right way to design it?
A more concrete example: let's say I have 50 Kafka brokers. How do I:
Configure clients without manually adding the IPs of all 50 Kafka brokers?
Load-balance messages across Kafka brokers based on load, if possible?
Set up additional clients with quotas in an easier/automated way?
You can use HashiCorp Consul, one of the open-source service discovery tools, to register your Kafka brokers; ultimately you will have a single endpoint and won't need to add multiple brokers in your clients. There are several other open-source tools available.
There are a few ways: use the kafka-assigner tool to balance the traffic, or the Kafka Cruise Control open-source tool to automatically balance the cluster for you.
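On the third point (quotas for new clients), one option is to script the setup with Kafka's Admin API (available since Kafka 2.6) rather than the kafka-configs CLI; a hedged sketch, where the bootstrap address, client id, and byte rates are made-up values:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;

public class ProvisionClientQuota {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka.example.com:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Throttle the (hypothetical) client id "tenant-42" to 1 MB/s
            // in each direction.
            ClientQuotaEntity entity =
                    new ClientQuotaEntity(Map.of(ClientQuotaEntity.CLIENT_ID, "tenant-42"));
            ClientQuotaAlteration alteration = new ClientQuotaAlteration(entity, List.of(
                    new ClientQuotaAlteration.Op("producer_byte_rate", 1_048_576.0),
                    new ClientQuotaAlteration.Op("consumer_byte_rate", 1_048_576.0)));
            admin.alterClientQuotas(List.of(alteration)).all().get();
        }
    }
}
```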

Deploying Kafka Consumers

We are deploying Kafka consumers based on the Java API in separate VMs, grouped by usage: probably 3-4 consumers (not in the same group) per VM, based on the throughput of these consumers.
Is it better to use this method, or to deploy the consumers in Docker containers? Any pointers would be helpful.
Though you could use the Confluent Kafka REST Proxy and other options, my question is about consumer deployment.
A VM has too much overhead for simply running one or a few JVM applications. If you have a container platform, that would be preferred, and it would start the app faster than provisioning new VMs per app.