Flow:
On-Prem ---> Proxy ---> Kafka
advertised.listeners=PLAINTEXT://proxyhostname:8080 (for external communication)
listeners=PLAINTEXT://:9092 (for internal communication)
• When we set both properties, internal communication stops working: replication fails, and consumers cannot connect locally, so we have to give them the proxy IP.
• How can we use both properties effectively for internal and external communication?
• Is there an alternative way to handle external and internal communication?
It's very common to define multiple listeners and Kafka supports that very well.
To define several listeners, list each of them in both listeners and advertised.listeners.
If several listeners use the same security protocol (PLAINTEXT here), you also need to set listener.security.protocol.map to map custom listener names to security protocols. See the broker configs in the Kafka docs.
For example:
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
advertised.listeners=INTERNAL://:9092,EXTERNAL://proxyhostname:8080
listeners=INTERNAL://:9092,EXTERNAL://:8080
This maps two names, EXTERNAL and INTERNAL (you can use any names you like, I just reused the terms from your question), to the PLAINTEXT security protocol. Then, for each listener, it defines the port to listen on and the hostname to advertise in metadata responses.
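One setting the example above leaves implicit is which listener the brokers use to talk to each other. A minimal sketch, assuming the INTERNAL/EXTERNAL names from above, a hypothetical internal hostname broker1.internal.example, and a Kafka version that supports inter.broker.listener.name:

```properties
# Map both custom listener names to the PLAINTEXT security protocol
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
# Ports the broker listens on (empty host binds to the default interface)
listeners=INTERNAL://:9092,EXTERNAL://:8080
# Addresses returned to clients in metadata responses, one per listener
# (broker1.internal.example is a placeholder, not from the question)
advertised.listeners=INTERNAL://broker1.internal.example:9092,EXTERNAL://proxyhostname:8080
# Replication and other broker-to-broker traffic goes over INTERNAL
inter.broker.listener.name=INTERNAL
```

With a setup like this, clients inside the network bootstrap against port 9092 and are handed the INTERNAL address. Clients that come in through the proxy reach port 8080 and are handed proxyhostname:8080.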
We have set up a Kafka cluster for high availability and distributed data load. The current consumers and producers specify all the broker IP addresses to connect to the cluster. In the future, we will need to continuously monitor the cluster and add new brokers based on collected metrics and overall system performance. If a broker crashes, we have to add a new broker with a different IP as soon as possible.
In these scenarios, we have to change all client configurations, which is a time-consuming and stressful operation.
I think we could set up a config server (e.g. Spring Cloud Config Server) to specify all the broker IP addresses centrally, so that we only have to change them in one place without touching every client, but I don't know if that is the best approach. Obviously, the clients would have to be programmed to get the broker list from the config server.
Is there a better approach?
It is worth pointing out that the "bootstrap" process doesn't require giving every single broker address to the clients. Only the first available address in the list is used for the initial connection; after that, the clients use the advertised.listeners values from the broker configs across the cluster.
The answer to your question is yes, use service discovery. That could be Spring Cloud Config, but the more general option would be HashiCorp Consul or another service that uses DNS (Kubernetes uses CoreDNS by default, for example, or there is AWS Route53).
Then you edit /etc/resolv.conf on each machine the client runs on (assuming Linux) to include the DNS servers, and you can simply refer to kafka.your.domain:9092 rather than using IP addresses.
You could use a load balancer with a friendly DNS name (such as kafka.domain.com) that points to all of your brokers. We do this in our environment. Your clients then connect to kafka.domain.com:9092.
When you add new brokers, you only change the load balancer endpoints, not the client configuration.
Additionally, please note that you only need to reach one bootstrap broker; you don't have to list all of them in the client configuration.
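A minimal sketch of how the stable name and the broker settings fit together, assuming kafka.domain.com from the answer above and a hypothetical per-broker name broker1.domain.com. The load-balanced name is used only for the bootstrap request; afterwards clients connect to whatever each broker advertises, so those addresses have to be reachable from the clients as well.

```properties
# --- client configuration ---
# Stable name in front of the brokers; used only for the initial metadata request
bootstrap.servers=kafka.domain.com:9092

# --- server.properties on each broker (hostname is a placeholder) ---
# Address this broker returns to clients for all later connections
advertised.listeners=PLAINTEXT://broker1.domain.com:9092
```

Replacing a crashed broker then means updating the load balancer (or DNS record) and giving the new broker its own advertised address; the client configuration stays untouched.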
I want to configure Kafka so that clients can connect to it.
What is the difference between advertised.listeners and bootstrap.servers in the Kafka configuration?
The bootstrap.servers parameter is used only for the initial connection to the cluster. Once that connection is established, the cluster returns the advertised.listeners of its brokers: the list of host/port pairs that the client then uses to connect to the broker(s).
This image can be helpful to understand the concept:
Note: advertised.host.name is deprecated; read it as advertised.listeners.
For more information you can check Kafka docs:
bootstrap.servers: A list of host/port pairs to use for establishing
the initial connection to the Kafka cluster. The client will make use
of all servers irrespective of which servers are specified here for
bootstrapping—this list only impacts the initial hosts used to
discover the full set of servers. This list should be in the form
host1:port1,host2:port2,.... Since these servers are just used for the
initial connection to discover the full cluster membership (which may
change dynamically), this list need not contain the full set of
servers (you may want more than one, though, in case a server is
down).
advertised.listeners: Listeners to publish to ZooKeeper for clients to
use, if different than the listeners config property. In IaaS
environments, this may need to be different from the interface to
which the broker binds. If this is not set, the value for listeners
will be used. Unlike listeners it is not valid to advertise the
0.0.0.0 meta-address.
Reference for image: https://www.udemy.com/course/kafka-cluster-setup/
bootstrap.servers is the list of broker(s) that you give your client so it can connect to the Kafka cluster.
advertised.listeners is the host and port of each broker, which the client receives on its initial connection to one of the bootstrap.servers. For all subsequent connections to brokers the client uses these addresses and not bootstrap.servers, which is why it is so important to set advertised.listeners correctly for your networking setup.
For more details see https://rmoff.net/2018/08/02/kafka-listeners-explained/
I have a question about correctly configuring two Kafka listener properties: listeners and advertised.listeners.
In my config I am setting the properties below:
listeners=SASL_PLAINTEXT://:9092
advertised.listeners=SASL_PLAINTEXT://u-kafkatst-kafkadev-5.sd.xxx.com:9092
The clients connect using u-kafkatst-kafkadev-5.sd.xxx.com:9092. Do I need to have the same value in listeners and advertised.listeners? Here u-kafkatst-kafkadev-5.sd.xxx.com is a DNS record that points to the host where the Kafka broker is running.
In which situations would I want to keep them the same, and in which would I want them to differ?
Thanks!
The advertised.listeners property is important if you are doing anything other than connecting to a broker directly on the same network. If you are using Docker, Kubernetes, or IaaS (AWS, GCP, etc.), you need to advertise the external address so that the client knows where to connect.
This article explains it all in depth.
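To make the "same or different" question concrete, here is a minimal sketch of the case where the two values differ. The hostname is the one from the question; binding to 0.0.0.0 is an assumption made for illustration.

```properties
# Bind on all interfaces of the host, container, or VM (assumed setup)
listeners=SASL_PLAINTEXT://0.0.0.0:9092
# Advertise the DNS name that clients can actually resolve and reach
advertised.listeners=SASL_PLAINTEXT://u-kafkatst-kafkadev-5.sd.xxx.com:9092
```

If the address the broker binds to is also the one clients should use, advertised.listeners can be left unset and the listeners value is advertised instead. Note that 0.0.0.0 is valid only in listeners, never in advertised.listeners.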
On-prem ---> HAProxy (AWS) ---> Kafka (AWS). We can allow external communication using the advertised.listeners property and use listeners for internal communication. If we enable both settings, communication does not work properly. We are using Kafka 0.10.2.
I believe some setting has to be made through ZooKeeper to control broker communication. How can we do this using Cloudera?
See https://rmoff.net/2018/08/02/kafka-listeners-explained/. You need to set up two listeners, one for the external IP through which your clients connect, one for the internal AWS network through which your brokers will communicate with each other.
I have a Kafka cluster with 5 brokers, and I'm using Consul service discovery to put their IPs into a single DNS record:
kafka.service.domain.cc A 1.1.1.1 2.2.2.2 ... 5.5.5.5
Is it recommended to use only one domain name:
kafka.bootstrap.servers = kafka.service.domain.cc:30000
Or is it better to have multiple domain names (at least 2), each resolving to one broker:
kafka1.service.domain.cc A 1.1.1.1
kafka2.service.domain.cc A 2.2.2.2
and then use them in Kafka:
kafka.bootstrap.servers = kafka1.service.domain.cc:30000,kafka2.service.domain.cc:30000
My concern with the first approach is that the domain name will be resolved only once, to a random broker, and if that broker is down, no new DNS resolution will take place.
From the book Mastering Apache Kafka:
bootstrap.servers is a comma-separated list of host and port pairs
that are the addresses of the Kafka brokers in a "bootstrap" Kafka
cluster that a Kafka client connects to initially to bootstrap itself.
bootstrap.servers provides the initial hosts that act as the
starting point for a Kafka client to discover the full set of alive
servers in the cluster. Since these servers are just used for the
initial connection to discover the full cluster membership (which may
change dynamically), this list does not have to contain the full set
of servers (you may want more than one, though, in case a server is
down).
Clients (producers or consumers) make use of all servers irrespective
of which servers are specified in bootstrap.servers for bootstrapping.
Since bootstrap.servers only provides the initial hosts that a Kafka client uses as a starting point to discover the full set of live servers in the cluster, I think both approaches will work. But because the property is defined as a comma-separated list, I would guess the second approach is the recommended one. The problem with approach 1 is that the randomly chosen broker may be down during bootstrapping, in which case the client cannot get the cluster information it needs to continue. So it is always better to provide more than one address as a fallback.
Kafka 2.1 included support for handling multiple DNS resource records in bootstrap.servers.
If you set client.dns.lookup=use_all_dns_ips in your client configuration, the client will use all of the IP addresses returned by DNS, not just the first (or a random) one.
See KIP-235 and KIP-302 for more information.
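Applied to the single-name approach from the question, the client configuration might look like the sketch below. It assumes a client on Kafka 2.1 or later and uses the plain client property name bootstrap.servers; the kafka.bootstrap.servers key in the question looks like an application-specific wrapper.

```properties
# Single DNS name with several A records, as in the question
bootstrap.servers=kafka.service.domain.cc:30000
# Try every IP address returned for that name, not only the first
client.dns.lookup=use_all_dns_ips
```

With this setting, a broker that is down at bootstrap time no longer blocks the client as long as another address behind the same name is reachable.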