Do all Kafka broker IPs have to be specified when producing a message - apache-kafka

I have a 3-broker Kafka cluster with 3 ZooKeeper nodes.
My question is whether we have to give only one IP address to kafka-console-producer.sh, like below
./kafka-console-producer.sh --broker-list 192.168.7.110:9092 --topic test
or all three IP addresses
./kafka-console-producer.sh --broker-list 192.168.7.110:9092,192.168.5.110:9092,192.168.3.111:9092 --topic test
What will happen if I provide only one IP to produce messages and that broker is shut down after some time? Will I still be able to produce messages through that IP, or do I have to give all the IP addresses?

Check out the Producer config docs, which describe the purpose of bootstrap.servers (bootstrap-servers / broker-list are synonyms):
A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,....
Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
So if you only provide one IP, and that broker is then shut down, any producer that starts (or restarts) afterwards will fail when it tries to make its initial connection. A producer that has already discovered the cluster keeps working through the remaining brokers. You could, for example, supply two IPs, so that if one fails the producer can still bootstrap from the other. The broker to which the actual messages are sent is not affected by this list.
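The fallback behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: parse_bootstrap_servers and first_reachable are hypothetical helper names, the "down" set simulates a stopped broker, and a real Kafka client does not necessarily try the addresses in list order.

```python
from typing import Callable, List, Optional, Tuple

Address = Tuple[str, int]


def parse_bootstrap_servers(value: str) -> List[Address]:
    """Split a 'host1:port1,host2:port2' string into (host, port) pairs."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        servers.append((host, int(port)))
    return servers


def first_reachable(servers: List[Address],
                    can_connect: Callable[[Address], bool]) -> Optional[Address]:
    """Return the first server that accepts a connection, or None."""
    for server in servers:
        if can_connect(server):
            return server
    return None


# Simulated situation (assumption): 192.168.7.110 has been shut down.
down = {("192.168.7.110", 9092)}


def is_up(server: Address) -> bool:
    return server not in down


one = parse_bootstrap_servers("192.168.7.110:9092")
three = parse_bootstrap_servers(
    "192.168.7.110:9092,192.168.5.110:9092,192.168.3.111:9092")

print(first_reachable(one, is_up))    # None: nothing left to bootstrap from
print(first_reachable(three, is_up))  # ('192.168.5.110', 9092)
```

With a single entry there is nothing to fall back to; with three entries the bootstrap step still succeeds while one broker is down.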

Related

Create Producer when the first broker in the list of brokers is down

I have a multi-node Kafka cluster which I use for consuming and producing.
In my application, I use confluent-kafka-go (1.6.1) to create producers and consumers. Everything works great when I produce and consume messages.
This is how I configure my bootstrap server list
"bootstrap.servers":"localhost:9092,localhost:9093,localhost:9094"
But as soon as I use the brokers' IP addresses in bootstrap.servers and the first broker is down, the producer repeatedly fails to be created, reporting
Failed to initialize Producer ID: Local: Timed out
If I remove the IP of the failed node, producing and consuming messages work.
If the broker is down after I create the producer/consumer, they continue to be usable by switching over to other nodes.
How should I configure bootstrap.servers in such a way that the producer will be created using the available nodes?
You shouldn't really be running 3 brokers on the same machine anyway, but using multiple unique servers works fine for me when the first is down (and the cluster elects a different leader if it needs to), so it sounds like you either lost the leader of your topic partitions or you've lost the Controller. Enabling retries on the producer should let it recover on its own (by making a new metadata request for partition leaders).
Overall, it's just a CSV; there's no other way to configure that property itself. You could stick a reverse proxy in front of the brokers that resolves only to healthy nodes, but then you'd be conflicting with a potential DNS cache
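As a client-side variant of the "only healthy nodes" idea, one could probe each entry before building the producer configuration. The sketch below is a workaround under assumptions, not something the property itself supports: tcp_probe, reachable_servers and producer_config are hypothetical helpers, and a successful TCP connect only shows that the port is open at that moment.

```python
import socket
from typing import Callable, Dict, List


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out


def reachable_servers(bootstrap: str,
                      probe: Callable[[str, int], bool] = tcp_probe) -> List[str]:
    """Keep only the host:port entries that pass the probe."""
    alive = []
    for entry in (e.strip() for e in bootstrap.split(",")):
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        if probe(host, int(port)):
            alive.append(entry)
    return alive


def producer_config(bootstrap: str,
                    probe: Callable[[str, int], bool] = tcp_probe) -> Dict[str, str]:
    """Build a config dict listing only the currently reachable brokers."""
    alive = reachable_servers(bootstrap, probe)
    if not alive:
        raise RuntimeError("no bootstrap server is reachable")
    return {"bootstrap.servers": ",".join(alive)}
```

Calling producer_config("localhost:9092,localhost:9093,localhost:9094") would then drop whichever entries do not answer, and the resulting dict could be passed to the client library in use. The check is a snapshot, so it does not replace retries on the producer.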

What is the difference between Kafka Cluster and Kafka Broker?

Do Kafka cluster and Kafka broker mean the same thing?
I know a cluster has multiple brokers (is this wrong?).
But when I write code to produce messages, I find an awkward option.
props.put("bootstrap.servers", "kafka001:9092, kafka002:9092, kafka003:9092");
Is this a broker address or a cluster address? If it is a broker address, I think it is not good, because we have to modify the address above when the broker count changes.
(But it seems to be a broker address.)
Additionally, I saw that in Amazon MSK we can add a broker to each AZ.
That means we cannot have many brokers (three or four at most?).
And they advise writing these broker addresses in the bootstrap.servers option as a comma-separated list.
Why don't they advise using a cluster address or ARN?
A Kafka cluster is a group of Kafka brokers.
When using the Producer API it is not required to mention all brokers within the cluster in the bootstrap.servers properties. The Producer configuration documentation on bootstrap.servers gives the full details:
A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
All brokers within a cluster share metadata about the other brokers in the same cluster. Therefore, it is sufficient to mention even a single broker in the bootstrap.servers property. However, you should still mention more than one in case that one broker is unavailable for whatever reason.
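A toy model can make the discovery step concrete. The sketch below does not talk to Kafka at all; the Cluster class and discover function are hypothetical names that only mimic the idea that any reachable broker can report the full membership.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """Toy model: every live broker can report the full broker list."""
    brokers: Dict[str, bool] = field(default_factory=dict)  # address -> is up

    def metadata_from(self, address: str) -> List[str]:
        """What a client learns when it asks one broker for cluster metadata."""
        if not self.brokers.get(address, False):
            raise ConnectionError(f"{address} is not reachable")
        return sorted(a for a, up in self.brokers.items() if up)


def discover(cluster: Cluster, bootstrap: List[str]) -> List[str]:
    """Try each bootstrap address until one returns the cluster membership."""
    for address in bootstrap:
        try:
            return cluster.metadata_from(address)
        except ConnectionError:
            continue
    raise ConnectionError("no bootstrap server reachable")


cluster = Cluster({"kafka001:9092": True,
                   "kafka002:9092": True,
                   "kafka003:9092": True})

# One bootstrap address is enough to learn about all three brokers.
print(discover(cluster, ["kafka001:9092"]))

# A broker added later is discovered without changing the bootstrap list.
cluster.brokers["kafka004:9092"] = True
print(discover(cluster, ["kafka001:9092"]))
```

In this model the bootstrap list only needs to change if every address in it stops being reachable, which is why listing more than one is the safer choice.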

Clarification on the --broker-list option and --bootstrap-server option in Kafka-console producer and consumer respectively

In Kafka-console-producer, the --broker-list takes a list of servers.
Does the producer connect to all of them? (or)
Does the producer use the list of servers to connect to one of them and, if that one fails, switch to the next and so on?
Similarly, in Kafka-console-consumer the --bootstrap-server takes a list of Kafka servers. If there are two Kafka servers, do I need to specify both of them in the --bootstrap-server?
I tried myself running the consumer with one server (Kafka-server1) and when I stopped Kafka-server1, it continued to receive data for the topic.
They both act/are the same.
If you look at the Kafka source code, you'll see both options lead to the same "bootstrap.servers" configuration property
def producerProps(config: ProducerConfig): Properties = {
  val props =
    if (config.options.has(config.producerConfigOpt))
      Utils.loadProps(config.options.valueOf(config.producerConfigOpt))
    else new Properties
  props ++= config.extraProducerProps
  props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, config.brokerList) // <---- brokerList is passed as BOOTSTRAP_SERVERS_CONFIG
  // ... (rest of the method omitted)
}
Both the consumer and the producer try the provided addresses until one of them accepts an initial "bootstrap" connection. Any reachable broker (not only the Kafka Controller) can then return metadata about all brokers available in the cluster at that time. It is good practice to give at least 3 addresses for high availability.
If there are two Kafka servers, do I need to specify both of them in the --bootstrap-server?
With regard to having multiple addresses available, in a cloud environment where you might have brokers spread over availability zones, it is recommended to list at least 2 brokers per availability zone, so 6 in total for 3 zones.
The address provided to clients could be simplified, using a load balancer / reverse proxy, down to a single kafka.your.network:9092 address, but then you are introducing extra DNS and network hops to figure out the connection for the sake of having a single, well-known address.
In any case, all available addresses for the brokers will be handed to the clients and then cached locally.
However, it is important to recognize that all send/poll requests will only communicate with the single leader of a TopicPartition, regardless of how many addresses you give and how many replicas a topic has.
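The leader routing mentioned above can be pictured as a simple lookup. This is a sketch with made-up metadata: the leaders table, the server names and broker_for are assumptions for illustration, not the client's actual internals.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata a client might cache after bootstrapping:
# (topic, partition) -> address of the current leader broker.
leaders: Dict[Tuple[str, int], str] = {
    ("test", 0): "kafka-server1:9092",
    ("test", 1): "kafka-server2:9092",
    ("test", 2): "kafka-server1:9092",
}


def broker_for(topic: str, partition: int) -> str:
    """Pick the broker a produce/fetch request for this partition goes to."""
    return leaders[(topic, partition)]


bootstrap: List[str] = ["kafka-server1:9092"]  # only one address configured

# Requests for partition 1 go to its leader, which is not in the bootstrap list.
print(broker_for("test", 1))  # kafka-server2:9092
```

The bootstrap list only decides where the metadata comes from; the cached leader table decides where each request is sent.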

how to change Kafka broker list ip

I have 3 Kafka brokers running in an isolated network region. My client cannot connect to them directly, so I have to use a VIP (virtual IP) to reach the brokers.
For example:
my brokers' IPs are: 10.5.1.5, 10.5.1.6, 10.5.1.7,
my VIPs are: 200.100.1.5, 200.100.1.6, 200.100.1.7, paired one to one with the brokers.
So when I set the bootstrap list to 200.100.1.5, the cluster responds with a mix of VIPs and broker IPs, such as: 10.5.1.5, 10.5.1.6, 200.100.1.5, 200.100.1.6 ..., and then the connection fails, because my program cannot reach the brokers' IPs, only the VIPs.
My current configuration is as follows; it returns both IPs and VIPs:
listeners=INTERNAL://:9092,EXTERNAL_PLAINTEXT://:8080
advertised.listeners=EXTERNAL_PLAINTEXT://200.100.1.5:8080,INTERNAL://10.5.1.5:9092
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL_PLAINTEXT:PLAINTEXT
inter.broker.listener.name=INTERNAL
How can I make Kafka return only the VIP list, please?
I've got the answer; it could be the following:
advertised.listeners=PLAINTEXT://200.100.1.5:8080
listeners=PLAINTEXT://10.5.1.5:9092
And remove the listener.security.protocol.map and inter.broker.listener.name settings.
You can use the broker setting called advertised.listeners to tell your brokers to include a different IP/hostname in their response to clients.
advertised.listeners:
Listeners to publish to ZooKeeper for clients to use, if different
than the listeners config property. In IaaS environments, this may
need to be different from the interface to which the broker binds. If
this is not set, the value for listeners will be used. Unlike
listeners it is not valid to advertise the 0.0.0.0 meta-address.
In your example, for the first broker you can have:
advertised.listeners=PLAINTEXT://200.100.1.5:9092
listeners=PLAINTEXT://10.5.1.5:9092
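To see how the two settings relate, the sketch below parses the listeners and advertised.listeners values from the question and pairs them by listener name. It is plain string handling under assumptions (parse_listeners is a hypothetical helper) and does not read a real broker configuration.

```python
from typing import Dict, Tuple


def parse_listeners(value: str) -> Dict[str, Tuple[str, int]]:
    """Parse 'NAME://host:port,NAME2://host:port' into {name: (host, port)}."""
    result = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        result[name] = (host, int(port))  # an empty host means "all interfaces"
    return result


listeners = parse_listeners("INTERNAL://:9092,EXTERNAL_PLAINTEXT://:8080")
advertised = parse_listeners(
    "EXTERNAL_PLAINTEXT://200.100.1.5:8080,INTERNAL://10.5.1.5:9092")

# For each listener: where the broker binds, and what it tells clients to use.
for name, (host, port) in listeners.items():
    bind = host or "all interfaces"
    adv_host, adv_port = advertised[name]
    print(f"{name}: binds {bind}:{port}, advertises {adv_host}:{adv_port}")
```

Laying the values out this way makes it easier to check, broker by broker, that the address advertised to external clients is one they can actually reach.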

kafka bootstrap.servers as DNS A-Record with multiple IPs

I have a Kafka cluster with 5 brokers and I'm using Consul Service Discovery to put their IPs into a DNS record.
kafka.service.domain.cc A 1.1.1.1 2.2.2.2 ... 5.5.5.5
Is it recommended to use only one domain name:
kafka.bootstrap.servers = kafka.service.domain.cc:30000
or is it better to have multiple domain names (at least 2), each one resolves to one broker
kafka1.service.domain.cc A 1.1.1.1
kafka2.service.domain.cc A 2.2.2.2
then use them in Kafka
kafka.bootstrap.servers = kafka1.service.domain.cc:30000,kafka2.service.domain.cc:30000
My concern with the first approach is that the domain name will be resolved only once, to a random broker, and if that broker is down, no new DNS resolution will take place.
From the book Mastering Apache Kafka:
bootstrap.servers is a comma-separated list of host and port pairs
that are the addresses of the Kafka brokers in a "bootstrap" Kafka
cluster that a Kafka client connects to initially to bootstrap itself.
bootstrap.servers provides the initial hosts that act as the
starting point for a Kafka client to discover the full set of alive
servers in the cluster. Since these servers are just used for the
initial connection to discover the full cluster membership (which may
change dynamically), this list does not have to contain the full set
of servers (you may want more than one, though, in case a server is
down).
Clients (producers or consumers) make use of all servers irrespective
of which servers are specified in bootstrap.servers for bootstrapping.
So, as the property bootstrap.servers provides the initial hosts that act as the starting point for a Kafka client to discover the full set of alive servers in the cluster, I think both approaches will work. But since the value of the property is defined as a comma-separated list, I guess the second approach is the recommended one. A further problem with approach 1 is that, while bootstrapping, the randomly chosen broker may be down and the client will not get the cluster information it needs to continue. So it is always better to provide more than one address as a fallback in case one broker is down during bootstrapping.
Kafka 2.1 included support for handling multiple DNS resource records in bootstrap.servers.
If you set client.dns.lookup="use_all_dns_ips" in your client configuration, it will use all of the IP addresses returned by DNS, not just the first (or a random one).
See KIP-235 and KIP-302 for more information.
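The idea of using every address behind one name can be sketched with the standard library resolver. This only illustrates the concept and is not how the Kafka client implements it: resolve_all and expand_bootstrap are hypothetical helpers, and the resolver parameter exists so that a fake resolver can be substituted.

```python
import socket
from typing import Callable, List


def resolve_all(host: str, port: int,
                resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return every distinct IP address the name resolves to, in order."""
    seen: List[str] = []
    for info in resolver(host, port, type=socket.SOCK_STREAM):
        ip = info[4][0]  # info[4] is the sockaddr tuple; its first item is the IP
        if ip not in seen:
            seen.append(ip)
    return seen


def expand_bootstrap(host: str, port: int,
                     resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Turn one DNS name into a list of ip:port bootstrap candidates."""
    candidates = []
    for ip in resolve_all(host, port, resolver):
        # IPv6 literals need brackets when combined with a port.
        candidates.append(f"[{ip}]:{port}" if ":" in ip else f"{ip}:{port}")
    return candidates
```

For the record in the question, expand_bootstrap("kafka.service.domain.cc", 30000) would return one candidate per A record, so a client working this way could move on to the next address when the first one is down.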