Exposing Kafka brokers using Google click to deploy environment - kubernetes

I have used the Kafka cluster (with replication) click to deploy container from Google on kubernetes. How do I expose the brokers so I can consume from an external consumer? I'm very new to kunernetes.
I have tried exposing the broker nodes with a load balancer but the external ip given
I opened the port with a firewall rule
But when connecting from my consumer it throws an error about being disconnected
Any help would be great, I can provide more info if asked.

You cannot use a load balancer. Kafka clients must talk directly to the brokers.
Firewall is a good step, but you need to ensure each of the brokers are exposed properly via advertised.listeners within and outside of the VPC. Refer blog post.
Alternative, within GKE, you can run Strimzi Operator, which handles Kubernetes resources for you with regard to the Kafka cluster.

Related

Accessing Kafka in Kubernetes from SvelteKit

I am building a SvelteKit application with Kafka Support. I have created a kafka.js file and tested the kafka support with a local kafka setup, which is successful. Now, When I replaced the topic name with a topic that is running in the kafka of our Kubernetes cluster, am not seeing any response.
How do I test this connection that is establishing between Kafka in Kubernetes cluster and the JS web application ? Any hints could be much helpful. Doing just console logs are not helpful so far because kafka itself not getting hit.
Any two pods deployed in the same namespace can communicate using local service names. So, do the brokers have a Service resource?
For example, assuming you are in namespace: default, and you have kafka-svc, then you'd setup bootstrap.servers: kafka-svc.svc.cluster.local.
You also need to configure Kafka's advertised.listeners. Related blog - https://strimzi.io/blog/2019/04/17/accessing-kafka-part-1/
But this requires NodeJS (or other language) backend, and not SvelteKit UI (which cannot connect to backend TCP server, only use some HTTP bridge). Your HTTP options would include some custom server, or Confluent REST Proxy, Strimzi Kafka Bridge, etc. But you were already told this.

Kafka Post Deployment - Handling ever-growing clients

We have setup a Kafka Cluster for High Availability and distributed data load. The current consumers and producers specify all the broker IP addresses to connect to the cluster. In the future, there will be the need to continuosly monitor the cluster and add a new broker based on collected metrics and overall system performance. In case a broker crashes, as soon as possible we have to add a new broker with a different IP.
In these scenarios, we have to change all client configurations, a time consuming and stressful operation.
I think we can setup a Config Server (e.g. Spring Cloud Config Server) to specify all the broker IP addresses in a centralized manner, so we have to change all in one place, without touching all the clients, but I don't know if it is the best approach. Obviously, the clients must be programmed to get broker list from config server.
There's a better approach?
Worth pointing out that the "bootstrap" process doesn't require giving every single broker address to the clients, really only the first available address in the list is used for the initial connection, then the advertised.listeners on all the broker configs in the cluster, is what the clients actually use
The answer to your question is to use service discovery, yes. That could be Spring Could Config, but the more general option would be Hashicorp Consul or other service that uses DNS (Kubernetes uses CoreDNS, by default, for example, or AWS Route53).
Then you edit the /etc/resolv.conf of each machine (assuming Linux) the client is running on to include the DNS servers, and you can simply refer to kafka.your.domain:9092 rather than using IP addresses
You could use a load balancer (with a friendly dns like kafka.domain.com), which points to all of your brokers. We do this in our environment. Your clients then connect to kafka.domain.com:9092.
As soon as you add new brokers, you only change the load balancer endpoints and not the client configuration.
Additionally please note that you only need to connect to one bootstrap broker and don't have to list all of them in the client configuration.

How to expand confluent cloud kafka cluster?

I have set up a confluent cloud multizone cluster and it got created with just one bootstrap server. There was no setting for choosing number of servers while creating the cluster. Even after creation, I can’t edit the number of bootstrap servers.
I want to know how to increase the number of servers in confluent cloud kafka cluster.
Under the hood, the Confluent Cloud cluster is already running multiple brokers. Depending on your cluster configuration (specifically, whether you're running Standard or Dedicated, and what region and cloud you're in), the cluster will have between six and several dozen brokers.
The way a Kafka client bootstrap server config works is that the client reaches out to the bootstrap server and requests a list of all brokers, and then uses those broker endpoints to actually produce/consume from Kafka (reference: https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-properties-bootstrap-servers.html)
In Confluent Cloud, the provided bootstrap server is actually a load balancer in front of all of the brokers; when the client connects to the bootstrap server it'll receive the actual endpoints for all of the actual brokers, and then use that for subsequent connections.
So TL;DR, in your client, you only need to specify the one bootstrap server; under the hood, the Kafka client will connect to the (many) brokers running in Confluent Cloud, and it should all just work.
Source: I work at Confluent.

How to get Zookeeper to serve up an IP instead of a hostname for Druid

I'm not sure if this should be a Druid or Zookeeper question.
Druid uses Zookeeper to discover Druid nodes. In my setup, the Druid nodes are AWS autoscale instances and get hostnames that don't resolve via DNS. So when Druid asks ZK, it gets back:
spanky-asg-druid-master-10-0-10-218.sandbox-foo.net:8081
Is there a way to either configure ZK to return an IP instead of a hostname or for Druid to ask ZK for an IP?
If you set the druid.host=1.2.3.4 (using the right IP, of course) in the runtime properties of the Druid processes, then they will announce in ZK using IPs instead of hostnames.

Kafka, configuring a single vs multiple broker servers for clients?

I use to configure bootstrap.servers in my kafka producer/consumer/stream apps with a list of broker ips. But I’d like to move to a single url entry that will be resolved by the DNS lookup to a broker ip currently known as up (DNS actively check the brokers in the cluster and responds to lookup with an IP short TTL [10s]). This gives me more flexibility to add brokers in the future, and I can keep the same config in my apps across all the environments/stages. Is this a recommended approach, or this remove resiliency on the client side to not have a strict list of brokers? I assume this config would only be used to initially “discover” the cluster and the partition leader brokers.
If anything, I'd say this adds a single point of failure on the single address you're providing, unless it's actually a load balanced, reverse proxy.
Another possibility that's worked somewhat well internally is using Consul service discovery, with Consul agents running on each broker. This way, you can do service discovery as well as health checks and easier monitoring setup, e.g. having Prometheus jmx_exporter on the brokers, and Prometheus Server scraping those values for all kafka.service.consul addresses