Configuring kafka listeners - apache-kafka

I have a question about configuring the Kafka listener properties correctly:
listeners and advertised.listeners.
In my config I set the following properties:
listeners=SASL_PLAINTEXT://:9092
advertised.listeners=SASL_PLAINTEXT://u-kafkatst-kafkadev-5.sd.xxx.com:9092
The clients connect using u-kafkatst-kafkadev-5.sd.xxx.com:9092. Do I need to have the same value in listeners and advertised.listeners? Here u-kafkatst-kafkadev-5.sd.xxx.com is a DNS record that points to the host where the Kafka broker is running.
In which situations would I want to keep them the same, and in which would I want them to differ?
Thanks!

The advertised.listeners property is important if you are doing anything other than connecting to a broker directly on the same network. If you are using Docker, Kubernetes, or IaaS (AWS, GCP, etc.), then you need to advertise the external address so that the client knows where to connect.
This article explains it all in depth.
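As a rough illustration of when the two values differ, here is a sketch of a broker config based on the one in the question. The comments reflect my understanding of the defaults and should be checked against the documentation for your Kafka version.

```properties
# Bind on all interfaces, port 9092 (an empty host means "all interfaces").
listeners=SASL_PLAINTEXT://:9092

# Address handed back to clients after they bootstrap. It must resolve and be
# reachable from wherever the clients run, which is why a DNS name is used here.
advertised.listeners=SASL_PLAINTEXT://u-kafkatst-kafkadev-5.sd.xxx.com:9092

# If advertised.listeners is left unset, the value of listeners is used instead.
# With an empty host in listeners, the broker should then advertise its own
# canonical hostname, which is only fine when clients can resolve that name.
```

So the two can be identical when the address the broker binds to is also the address clients use, and they need to differ when clients reach the broker through a name or address the broker itself does not bind to.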

Related

Kafka Post Deployment - Handling ever-growing clients

We have set up a Kafka cluster for high availability and distributed data load. The current consumers and producers specify all the broker IP addresses to connect to the cluster. In the future, we will need to continuously monitor the cluster and add new brokers based on collected metrics and overall system performance. If a broker crashes, we have to add a new broker with a different IP as soon as possible.
In these scenarios, we have to change all client configurations, which is a time-consuming and stressful operation.
I think we could set up a config server (e.g. Spring Cloud Config Server) to specify all the broker IP addresses centrally, so that we only have to change them in one place without touching all the clients, but I don't know if it is the best approach. Obviously, the clients must be programmed to get the broker list from the config server.
Is there a better approach?
It is worth pointing out that the "bootstrap" process doesn't require giving every single broker address to the clients. Only one available address from the list is used for the initial connection; after that, the advertised.listeners values from the broker configs across the cluster are what the clients actually use.
The answer to your question is yes, use service discovery. That could be Spring Cloud Config, but the more general option would be HashiCorp Consul or another service that uses DNS (Kubernetes uses CoreDNS by default, for example, or AWS Route 53).
Then you edit /etc/resolv.conf on each machine the client is running on (assuming Linux) to include the DNS servers, and you can simply refer to kafka.your.domain:9092 rather than using IP addresses.
You could use a load balancer (with a friendly DNS name like kafka.domain.com) that points to all of your brokers. We do this in our environment. Your clients then connect to kafka.domain.com:9092.
As soon as you add new brokers, you only change the load balancer endpoints and not the client configuration.
Additionally, please note that you only need to connect to one bootstrap broker and don't have to list all of them in the client configuration.
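To make the client side concrete, here is a minimal sketch of what the client properties might look like with the DNS-based approach. kafka.domain.com is the name used in the answer above; the broker1/broker2 hostnames are hypothetical.

```properties
# Client-side config (producer or consumer).
# The DNS name only needs to reach one live broker for the bootstrap step.
bootstrap.servers=kafka.domain.com:9092

# Alternative without a load balancer: list a few brokers for redundancy.
# broker1.example.com and broker2.example.com are hypothetical names.
# bootstrap.servers=broker1.example.com:9092,broker2.example.com:9092
```

Keep in mind that after bootstrapping, clients connect to the addresses in each broker's advertised.listeners, so those addresses still have to be reachable from the clients.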

How to connect to someone else's public Kafka Topic

Apologies if this is a very basic question.
I'm just starting to get to grips with Kafka and have been given a Kafka endpoint and topic to push messages to, but I'm not sure where, when writing the consumer, to specify the endpoint. So far I have only created a consumer for a broker and producer running locally on my machine, so I was able to do this by setting the bootstrap server to my localhost and port.
I have an inkling that it may be something to do with the advertised listeners settings but I am unsure how it works.
Again, sorry if this seems like a very basic question, but I couldn't find the answer.
Thank you!
Advertised listeners are a broker setting. If someone else set up Kafka, then all you need to do is change the bootstrap address.
If it's "public" over the internet, then chances are you might also need to configure certificates and authentication.
Connecting to a public cluster is the same as connecting to a local deployment.
I'm assuming that you have been given the FQDN of the cluster and the topic name.
You need to add the FQDN to the bootstrap.servers property of your consumer and subscribe to the topics using subscribe().
You might want to look into the client.dns.lookup property if you want to change the discovery strategy.
Additionally, you might have to configure a keystore and a truststore, depending on the security configuration of the cluster.
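As a sketch of how these settings fit together, a consumer config for a remote cluster might look something like the following. The hostname, port, group name, file paths, and passwords are all placeholders, and the TLS section only applies if the cluster owner tells you it is required.

```properties
# kafka.example.com:9093 is a hypothetical endpoint; use the one you were given.
bootstrap.servers=kafka.example.com:9093
group.id=my-consumer-group
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer

# Only needed if the cluster requires TLS; paths and passwords are placeholders.
security.protocol=SSL
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=changeit
# A keystore is only needed when the brokers require client certificates.
ssl.keystore.location=/path/to/client.keystore.jks
ssl.keystore.password=changeit

# Optional: try every IP address that a bootstrap hostname resolves to.
client.dns.lookup=use_all_dns_ips
```

The topic name is not part of this config; it is passed to subscribe() in the consumer code.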

What is the difference between advertised.listeners and bootstrap.servers?

I want to configure Kafka so that clients can connect to it.
What is the difference between advertised.listeners and bootstrap.servers in kafka configuration?
The bootstrap.servers parameter is used only for the initial connection to the cluster. After this initial connection is established, Kafka returns the advertised.listeners values, which are the host/port pairs that the client then uses to connect to the broker(s).
This image can be helpful to understand the concept:
Note: advertised.host.name is deprecated; you can read it as advertised.listeners.
For more information you can check Kafka docs:
bootstrap.servers: A list of host/port pairs to use for establishing
the initial connection to the Kafka cluster. The client will make use
of all servers irrespective of which servers are specified here for
bootstrapping—this list only impacts the initial hosts used to
discover the full set of servers. This list should be in the form
host1:port1,host2:port2,.... Since these servers are just used for the
initial connection to discover the full cluster membership (which may
change dynamically), this list need not contain the full set of
servers (you may want more than one, though, in case a server is
down).
advertised.listeners: Listeners to publish to ZooKeeper for clients to
use, if different than the listeners config property. In IaaS
environments, this may need to be different from the interface to
which the broker binds. If this is not set, the value for listeners
will be used. Unlike listeners it is not valid to advertise the
0.0.0.0 meta-address.
Reference for image: https://www.udemy.com/course/kafka-cluster-setup/
bootstrap.servers is a list of brokers that you give your client so that it can connect to the Kafka cluster.
advertised.listeners is the host and port of each broker that the client is given on its initial connection to one of the bootstrap.servers. When the client subsequently connects to brokers, it uses these addresses and not the bootstrap.servers, which is why it is so important to set advertised.listeners correctly for your networking setup.
For more details see https://rmoff.net/2018/08/02/kafka-listeners-explained/
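To show where each property lives, here is a small sketch with both sides next to each other. broker1.example.com is a hypothetical hostname, and PLAINTEXT is assumed only to keep the example short.

```properties
# --- broker: server.properties ---
# Where the broker binds.
listeners=PLAINTEXT://0.0.0.0:9092
# What the broker tells clients to connect to.
advertised.listeners=PLAINTEXT://broker1.example.com:9092

# --- client: producer/consumer properties ---
# Used only to find the cluster; later connections go to each broker's
# advertised.listeners address.
bootstrap.servers=broker1.example.com:9092
```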

Setting up kafka in aws ec2

I have set up Kafka on an EC2 instance and assigned an Elastic IP address to it. I am able to start ZooKeeper and Kafka and create topics, but I am not able to connect to the broker from my local machine. When I searched, I understood that I need to configure the listener and the advertised host name in the server.properties file. I tried entering the public Elastic IP address, but it's not working.
Where am I going wrong, and what values do I need to configure? I want a basic single-node, single-broker Kafka setup on EC2.
You need to configure your advertised.listeners per https://rmoff.net/2018/08/02/kafka-listeners-explained/
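For a single broker reached from outside AWS, a minimal sketch of the relevant server.properties lines might look like this. PLAINTEXT and port 9092 are assumptions, and the placeholder has to be replaced with your own address.

```properties
# Bind on all interfaces of the instance.
listeners=PLAINTEXT://0.0.0.0:9092
# Replace <elastic-ip> with the instance's Elastic IP or public DNS name.
# This is the address your local client will be told to connect to.
advertised.listeners=PLAINTEXT://<elastic-ip>:9092
```

The port also has to be reachable from your machine, so it is worth checking that the instance's security group allows inbound traffic on it.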

Kafka communication using cloudera

On-prem--->HAProxy(AWS)--->Kafka(AWS). We can allow external communication using the advertised.listeners property, and we can use listeners for internal communication. If we enable both settings, the communication does not work properly. We are using Kafka version 0.10.2.
I believe there is some setting we need to apply through ZooKeeper to control the broker communication. How can we do it using Cloudera?
See https://rmoff.net/2018/08/02/kafka-listeners-explained/. You need to set up two listeners, one for the external IP through which your clients connect, one for the internal AWS network through which your brokers will communicate with each other.
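A two-listener setup along those lines could be sketched as follows. The listener names INTERNAL and EXTERNAL, the hostnames, the ports, and the use of PLAINTEXT are all assumptions for illustration. As far as I know, named listeners and inter.broker.listener.name became available around 0.10.2, so verify that your distribution supports them.

```properties
# Two listeners on different ports (names and ports are illustrative).
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093

# INTERNAL is what other brokers use; EXTERNAL is the address on-prem clients
# reach through HAProxy. Both hostnames are hypothetical.
advertised.listeners=INTERNAL://broker1.internal.example:9092,EXTERNAL://haproxy.example.com:9093

# Map each listener name to a security protocol.
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT

# Brokers talk to each other over the internal listener.
inter.broker.listener.name=INTERNAL
```

How these properties are entered in Cloudera Manager depends on the version, so check its documentation for where custom broker properties are supplied.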