Kafka communication using cloudera - apache-kafka

On-prem--->HAProxy(AWS)--->Kafka(AWS). We can allow external communication using the advertised.listeners property and use the listeners property for internal communication. When we enable both settings, communication does not work properly. We are using Kafka version 0.10.2.
I believe there is some setting we need to apply through ZooKeeper to control broker communication. How can we do this using Cloudera?

See https://rmoff.net/2018/08/02/kafka-listeners-explained/. You need to set up two listeners: one for the external IP through which your clients connect, and one for the internal AWS network through which your brokers communicate with each other.
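A minimal broker configuration along those lines might look like the sketch below (all hostnames are placeholders, and it assumes a Kafka version with named-listener support, i.e. 0.10.2+ via KIP-103):

```
# server.properties (hostnames are placeholders)
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
advertised.listeners=INTERNAL://broker1.aws.internal:9092,EXTERNAL://haproxy.example.com:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
```

Brokers then replicate over the INTERNAL listener, while clients coming in through HAProxy receive the EXTERNAL address in metadata responses.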

Related

How to connect to Kafka brokers via proxy over tcp (don't want to use kafka rest)

Please find the screenshot of what we are trying to achieve.
Our deployment servers can't connect to the internet directly without a proxy, so we need a way to send messages to a Kafka cluster outside our organization. Please note that we do not want to use Kafka REST.
Connecting to Kafka is fairly involved and doesn't support this scenario: when you first connect to the bootstrap servers, they return the actual addresses the client needs to connect to, based on the broker properties (advertised.listeners).
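To see why a plain TCP proxy breaks this, here is a toy sketch of the two-step bootstrap handshake (real clients speak the binary Kafka protocol; the hostnames and the dict-based metadata here are purely illustrative):

```python
# Toy model of Kafka's bootstrap handshake. Hostnames are hypothetical.
# What the brokers might advertise in a metadata response:
CLUSTER_METADATA = {
    "brokers": [
        {"id": 1, "advertised_listener": "broker1.internal:9092"},
        {"id": 2, "advertised_listener": "broker2.internal:9092"},
    ]
}

def bootstrap(bootstrap_server: str) -> list[str]:
    """Simulate the two-step connection: contact any bootstrap address,
    then use the advertised listeners from the metadata response for all
    further connections."""
    # Step 1: the bootstrap address is only used to fetch metadata.
    metadata = CLUSTER_METADATA  # stand-in for a real MetadataRequest
    # Step 2: the client connects to what the brokers advertise,
    # NOT to the address it bootstrapped from.
    return [b["advertised_listener"] for b in metadata["brokers"]]

# Even if the first connection goes through a proxy, all subsequent
# connections target the advertised addresses:
print(bootstrap("proxy.example.com:9092"))
# ['broker1.internal:9092', 'broker2.internal:9092']
```

This is why forwarding port 9092 through a generic proxy is not enough: the addresses in the metadata response still point past the proxy.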

Kafka Post Deployment - Handling ever-growing clients

We have set up a Kafka cluster for high availability and distributed data load. The current consumers and producers specify all the broker IP addresses to connect to the cluster. In the future, we will need to continuously monitor the cluster and add new brokers based on collected metrics and overall system performance. If a broker crashes, we have to add a new broker with a different IP as soon as possible.
In these scenarios we would have to change every client configuration, a time-consuming and stressful operation.
I think we could set up a Config Server (e.g. Spring Cloud Config Server) to specify all the broker IP addresses in a centralized manner, so we only have to change them in one place without touching all the clients, but I don't know if it is the best approach. Obviously, the clients must be programmed to fetch the broker list from the config server.
Is there a better approach?
Worth pointing out that the bootstrap process doesn't require giving every single broker address to the clients; only the first available address in the list is used for the initial connection. After that, the advertised.listeners configured on the brokers in the cluster are what the clients actually use.
The answer to your question is to use service discovery, yes. That could be Spring Cloud Config, but a more general option would be HashiCorp Consul or another service that provides DNS (Kubernetes uses CoreDNS by default, for example, or AWS Route53).
Then you edit the /etc/resolv.conf of each machine the client runs on (assuming Linux) to include those DNS servers, and you can simply refer to kafka.your.domain:9092 rather than using IP addresses.
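As a sketch, assuming a Consul-style DNS endpoint at a placeholder address:

```
# /etc/resolv.conf -- add the service-discovery DNS server
nameserver 10.0.0.53    # placeholder for e.g. Consul's DNS endpoint

# Kafka client configuration can then use the DNS name instead of broker IPs
bootstrap.servers=kafka.your.domain:9092
```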
You could use a load balancer (with a friendly dns like kafka.domain.com), which points to all of your brokers. We do this in our environment. Your clients then connect to kafka.domain.com:9092.
As soon as you add new brokers, you only change the load balancer endpoints, not the client configuration.
Additionally, note that you only need to connect to one bootstrap broker and don't have to list all of them in the client configuration.
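With the load balancer in place, every client's configuration reduces to the stable DNS name (a sketch; the name comes from the answer above, and the caveat is general Kafka behavior):

```
# client configuration (producer or consumer)
bootstrap.servers=kafka.domain.com:9092
# Caveat: after bootstrapping, clients connect to the brokers'
# advertised.listeners, so those advertised addresses must also be
# reachable from the clients (or resolve through the load balancer).
```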

How to proxy Apache Kafka producer requests on the Kafka broker, and redirect to a separate Kafka cluster?

How to proxy Apache Kafka producer requests on the Kafka broker, and redirect to a separate Kafka cluster?
In my specific case, it's not possible to update the clients that write to this cluster. This means it's not feasible to do the following:
Updating the bootstrap broker configuration in the client
Rewriting the client code to support the Confluent REST Proxy
Therefore, I'm looking for a proxy that works at the Kafka protocol level.
Here are some potential options I've discovered so far:
Kafka-proxy
EnvoyProxy
Does anyone have any experience with the tools above (or alternative tools) that would allow me to redirect a binary TCP Kafka request to a separate Kafka cluster?
Since late 2021, Envoy proxy supports this for Produce requests (via the kafka-mesh filter), although it is still a work in progress.
In general, your proxy will need to understand the Kafka protocol and maintain the necessary connections to the upstream Kafka cluster(s).
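For illustration, a heavily abbreviated Envoy listener using the kafka-mesh filter might look roughly like the fragment below. This is a sketch based on the Envoy contrib documentation; the field names, cluster names, and addresses are assumptions/placeholders and should be checked against the docs for your Envoy version:

```
# envoy.yaml fragment (requires an Envoy contrib build; all names are placeholders)
filter_chains:
- filters:
  - name: envoy.filters.network.kafka_mesh
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.kafka_mesh.v3alpha.KafkaMesh
      advertised_host: "proxy.example.com"   # what clients see in metadata
      advertised_port: 19092
      upstream_clusters:
      - cluster_name: "target_kafka"
        bootstrap_servers: "kafka1.other-cluster.example.com:9092"
        partition_count: 10
      forwarding_rules:                      # route all topics to the target cluster
      - target_cluster: "target_kafka"
        topic_prefix: ""
```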

Configuring kafka listeners

I have a question about configuring the Kafka listener properties correctly:
listeners and advertised.listeners.
In my config I am setting the properties below:
listeners=SASL_PLAINTEXT://:9092
advertised.listeners=SASL_PLAINTEXT://u-kafkatst-kafkadev-5.sd.xxx.com:9092
The clients connect using u-kafkatst-kafkadev-5.sd.xxx.com:9092. Do I need to have the same value in listeners and advertised.listeners? Here u-kafkatst-kafkadev-5.sd.xxx.com is a DNS record that points to the host where the Kafka broker is running.
In what situations would I want to keep them the same, and in which different?
Thanks!
The advertised.listeners property is important if you are doing anything other than connecting to a broker directly on the same network. If you are using Docker, Kubernetes, or IaaS (AWS, GCP, etc.), then you need to advertise the externally reachable address for the client to know where to connect.
This article explains it all in depth.

Internal and External communication in Kafka

Flow:
On-Prem ------>Proxy--->Kafka
advertised.listeners=PLAINTEXT://proxyhostname:8080 - for external communication
listeners=PLAINTEXT://:9092 - for internal communication
•When we set both properties, internal communication stops working (replication issues, and the consumer can't connect locally; we have to provide the proxy IP for consumer communication).
•How can we effectively use both properties for internal and external communication?
•Is there any alternative way to handle external and internal communication?
It's very common to define multiple listeners and Kafka supports that very well.
To define several listeners, you need to list all of them in both listeners and advertised.listeners.
If multiple listeners are going to use the same security protocol (PLAINTEXT here), you also need to set listener.security.protocol.map to map custom names to security protocols. See the broker configs in the Kafka docs.
For example:
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
advertised.listeners=INTERNAL://:9092,EXTERNAL://proxyhostname:8080
listeners=INTERNAL://:9092,EXTERNAL://:8080
This maps the two names EXTERNAL and INTERNAL (you can use any names you like; I just reused the names from your question) to the PLAINTEXT security protocol. Then for each listener, it defines the port to listen on and the hostname to advertise in metadata responses. Note that once you use custom listener names, you also need inter.broker.listener.name to tell the brokers which listener to use for replication traffic.
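On the client side, internal and external clients then simply bootstrap against different listeners (the hostnames below are placeholders):

```
# internal clients, on the same network as the brokers
bootstrap.servers=kafkahost.internal:9092

# external clients, going through the proxy
bootstrap.servers=proxyhostname:8080
```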