Create Producer when the first broker in the list of brokers is down - apache-kafka

I have a multi-node Kafka cluster which I use for consuming and producing.
In my application, I use confluent-kafka-go (1.6.1) to create producers and consumers. Everything works great when I produce and consume messages.
This is how I configure my bootstrap server list:
"bootstrap.servers":"localhost:9092,localhost:9093,localhost:9094"
But the moment I start putting the brokers' IP addresses in bootstrap.servers, if the first broker in the list is down, the producer repeatedly fails to be created with:
Failed to initialize Producer ID: Local: Timed out
If I remove the IP of the failed node, producing and consuming messages work.
If the broker is down after I create the producer/consumer, they continue to be usable by switching over to other nodes.
How should I configure bootstrap.servers in such a way that the producer will be created using the available nodes?

You shouldn't really be running 3 brokers on the same machine anyway, but using multiple unique servers works fine for me when the first is down (the cluster elects a different leader if it needs to). So it sounds like you either lost the leader of your topic partitions or you've lost the Controller. Enabling retries on the producer should let it recover on its own (by making a new metadata request for the partition leaders).
Overall, it's just a CSV; there's no other way to configure that property itself. You could put a reverse proxy in front of the brokers that resolves only to healthy nodes, but then you'd be fighting a potential DNS cache.
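As a minimal sketch with confluent-kafka-go (the broker IPs and retry values here are placeholders, not recommendations), listing every broker and enabling retries lets the client bootstrap from whichever node answers:

package main

import (
    "fmt"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    // List every broker; the client only needs one reachable entry
    // to fetch the full cluster metadata from it.
    p, err := kafka.NewProducer(&kafka.ConfigMap{
        "bootstrap.servers": "10.0.0.1:9092,10.0.0.2:9092,10.0.0.3:9092", // placeholder IPs
        "retries":           5,   // retry instead of failing on the first error
        "retry.backoff.ms":  500, // pause between retries and metadata refreshes
    })
    if err != nil {
        // NewProducer only fails on invalid config, not on unreachable
        // brokers; connection problems surface later as delivery errors.
        panic(err)
    }
    defer p.Close()
    fmt.Println("producer created")
}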

Related

How to add two more Kafka brokers on the local machine if my currently running Kafka broker already has data

I have one broker running on my local machine (Windows) with 2-3 topics that already have messages stored. I want to scale up by adding two more broker instances. I have followed all the steps to configure 3 brokers on the same machine by creating a different properties file for each.
Broker 0 gets shut down when I start the broker 1 server, with the error below:
[2019-07-11 13:56:33,580] INFO Stopping serving logs in dir C:\kafka_2.12-2.2.1\data\kafka (kafka.log.LogManager)
[2019-07-11 13:56:33,585] ERROR Shutdown broker because all log dirs in C:\kafka_2.12-2.2.1\data\kafka have failed (kafka.log.LogManager)
Is it possible to add more brokers if my existing broker instance already has data?
Or do I need to delete the data directory and start broker 0 fresh? Is there any way to preserve the data without deleting it from the Kafka server?
Yes, you can add brokers to your cluster and migrate/spread data across all your brokers.
The Expanding your cluster section in the documentation details the steps to achieve this.
After starting the new brokers, you basically need to use the bin/kafka-reassign-partitions.sh tool (3rd-party tools also exist) to move data onto them.
Please note, however, that adding brokers on the same machine does not provide much resiliency: if the machine goes down, all brokers are affected. But if you just want to play around and learn about Kafka, that may be fine.
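Roughly, the flow looks like this (topic name and ZooKeeper address are placeholders; on Windows use the bin\windows\*.bat equivalents). First list the topics to move in a JSON file, then generate and execute a reassignment:

topics.json:
{"version": 1, "topics": [{"topic": "my-topic"}]}

bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --topics-to-move-json-file topics.json --broker-list "0,1,2" --generate

Save the proposed assignment it prints as reassignment.json, then:

bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassignment.json --execute

Running the second command with --verify instead of --execute reports when the moves have completed.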
To run multiple brokers on the same physical machine, each broker's config must specify a unique broker.id, a different log.dirs, and a different port in listeners.
For example, create config/server{1,2,3}.properties and set different values in each:
broker.id=<id>
log.dirs=/data/kafka<id>
listeners=PLAINTEXT://localhost:909<id>
When all three brokers start, new topics will be created evenly throughout the cluster, but old ones need to be rebalanced.
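Concretely, the first of those files might contain (paths and ports are only examples):

config/server1.properties:
broker.id=1
log.dirs=/data/kafka1
listeners=PLAINTEXT://localhost:9091

with server2 and server3 using 2 and 3 in place of 1. Start each broker with its own file, e.g. bin/kafka-server-start.sh config/server1.properties.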

Is it possible to produce to a Kafka topic when only 1 of the brokers is reachable?

Is it possible to produce to a Kafka topic when only 1 of the brokers is reachable from the producer, none of the zookeeper nodes are reachable from the producer, but all of the brokers are healthy and are reachable from each other?
For example, this would be required if I were to produce messages via an SSH tunnel. If this were for a temporary push I could possibly create the topic with replication factor 1 and have all partitions assigned to the broker in question, and reassign the partitions after the fact, but I'm hoping there is a more flexible setup.
This is all using the Java client.
Producers don't interact with ZooKeeper, so that's not an issue.
The only requirement for Producers is to be able to connect to the brokers that are leaders for the partitions they want to use.
If the broker you connect to is the leader for the partitions you want to use, then yes you can produce to it.
Otherwise it's not going to work. Creating a topic may not help either, as its partitions could be assigned to any brokers. In addition, in order to create a topic, a client has to connect to the controller, which may not be the broker you can reach.
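One way to check which broker leads each partition is a metadata request. Here is a sketch using confluent-kafka-go (broker address and topic name are placeholders; the Java client exposes the same information through AdminClient#describeTopics):

package main

import (
    "fmt"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    p, err := kafka.NewProducer(&kafka.ConfigMap{
        "bootstrap.servers": "reachable-broker:9092", // the one broker you can reach
    })
    if err != nil {
        panic(err)
    }
    defer p.Close()

    topic := "my-topic" // placeholder
    md, err := p.GetMetadata(&topic, false, 5000)
    if err != nil {
        panic(err)
    }
    for _, t := range md.Topics {
        for _, part := range t.Partitions {
            // Producing to a partition only works if its leader is reachable.
            fmt.Printf("topic %s partition %d: leader is broker %d\n",
                t.Topic, part.ID, part.Leader)
        }
    }
}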
If you can only connect to 1 "thing", you may want to consider using something like a REST Proxy. Your "isolated" environment could send REST requests to the proxy which is able to connect to all brokers in the cluster.
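For example, with the Confluent REST Proxy (host, port, and topic are placeholders), the isolated environment only needs HTTP access to the proxy, which handles the broker connections itself:

curl -X POST http://rest-proxy:8082/topics/my-topic \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{"records": [{"value": {"hello": "world"}}]}'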

Kafka Producer, multi DC failover support

I have two distinct Kafka clusters located in different data centers: DC1 and DC2. How do I organize Kafka producer failover between the two DCs? If the primary Kafka cluster (DC1) becomes unavailable, I want the producer to switch to the failover cluster (DC2) and continue publishing to it. The producer should also be able to switch back to the primary cluster once it is available again. Any good patterns, existing libs, approaches, code examples?
Each partition of the Kafka topic your producer is publishing to has a separate leader, often spread across multiple brokers in the cluster, so the producer is connected to many “primary” brokers simultaneously. Should any one of them fail, another in-sync replica (ISR) will be elected leader and automatically take over. You do not need to do anything in your client app for it to reconnect to the new leader(s), retry any failed requests, and continue.
If this is for Multi-Data Center (MDC) failover then things get much more complicated depending on if the client apps die as well or if they keep running and need just their cluster connections to failover. Offsets are not preserved across multiple Kafka clusters so while producers are simpler, consumers need to call GetOffsetsForTimes() upon failover.
For a great write-up of the MDC failover modes and best practices, see the MDC whitepaper here: https://www.confluent.io/white-paper/disaster-recovery-for-multi-datacenter-apache-kafka-deployments/
Since you asked only about producers, your app can detect that the primary cluster is down (say, after a certain number of retries) and then, instead of attempting to reconnect, connect to a broker list from the secondary cluster. Alternatively, you can redirect the DNS names of the broker-list hosts to point to the secondary cluster.
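Here is a rough sketch of that client-side switch with confluent-kafka-go (broker lists, topic, and the failure threshold are all placeholders, and a real implementation would also need to buffer or re-send the messages that failed during the outage):

package main

import (
    "log"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

// Bootstrap lists for the two data centers (placeholders).
var clusters = []string{
    "dc1-broker1:9092,dc1-broker2:9092", // primary, DC1
    "dc2-broker1:9092,dc2-broker2:9092", // failover, DC2
}

const maxFailures = 5 // consecutive delivery failures before switching

func newProducer(bootstrap string) (*kafka.Producer, error) {
    return kafka.NewProducer(&kafka.ConfigMap{
        "bootstrap.servers":  bootstrap,
        "message.timeout.ms": 10000, // fail deliveries quickly so outages surface
    })
}

func main() {
    topic := "events" // placeholder
    active := 0
    p, err := newProducer(clusters[active])
    if err != nil {
        log.Fatal(err)
    }

    failures := 0
    deliveries := make(chan kafka.Event)

    for _, payload := range []string{"a", "b", "c"} { // stand-in message stream
        err := p.Produce(&kafka.Message{
            TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
            Value:          []byte(payload),
        }, deliveries)
        if err == nil {
            // Wait for the delivery report (synchronous here for simplicity).
            m := (<-deliveries).(*kafka.Message)
            err = m.TopicPartition.Error
        }
        if err != nil {
            failures++
        } else {
            failures = 0
        }
        if failures >= maxFailures {
            // The active cluster looks down: rebuild against the other DC.
            p.Close()
            active = 1 - active
            if p, err = newProducer(clusters[active]); err != nil {
                log.Fatal(err)
            }
            failures = 0
            log.Printf("switched to cluster %q", clusters[active])
        }
    }
    p.Flush(15000)
    p.Close()
}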

Can a Kafka broker keep id but change advertised listener?

I have a cluster of 3 Kafka brokers. Most of the topics have replication factor of 2, while the consumer offsets all have a replication factor of 3.
I need to change where the individual brokers are listening, i.e. the IPs/hostnames on which they are listening. Is it possible to change the advertised listeners for a given broker ID? Or do I have to create a new broker with a different ID, repartition topics, and remove the old broker?
Assuming it does work, does the official Java Kafka client realize that the listener has changed and re-request the list of brokers for the topic(s)?
For the interested, I am running Kafka in Kubernetes. Originally, I needed access from both inside and outside the cluster, so I had services with nodePort (hostPort did not work with CNI prior to Kubernetes 1.7).
It worked, but was complex. I no longer need access from outside Kubernetes, so would like to keep it simple and have three brokers that advertise their hostname.
Can I bring down a broker and restart it with a different advertised listener? Or must I add a new broker, rebalance, and remove the old one?

How to permanently remove a broker from Kafka cluster?

How do I permanently remove a broker from a Kafka cluster?
Scenario:
I have a stable cluster of 3 brokers.
I temporarily added a fourth broker that successfully joined the cluster. The controller returned metadata indicating this broker was part of the cluster.
However, I never rebalanced partitions onto this broker, so this broker #4 was never actually used.
I later decided to remove this unused broker from the cluster. I shut down the broker successfully, and ZooKeeper's /brokers/ids no longer lists broker #4.
However, when our application code connects to any Kafka broker and fetches metadata, we get a broker list that includes this deleted broker.
How do I indicate to the cluster that this broker has been permanently removed from the cluster and not just a transient downtime?
Additionally, what's happening under the covers that causes this?
I'm guessing that when I connect to a broker and ask for metadata, the broker checks its local cache for the controller ID, contacts the controller, and asks it for the list of all brokers. The controller then checks its cached list of brokers and returns all brokers known to have belonged to the cluster at any point in time.
I'm guessing this happens because the cluster can't be certain whether the dead broker is permanently removed or just experiencing transient downtime. So I'm thinking we just need to tell the controller to reset its list of known cluster brokers to the live brokers registered in ZooKeeper. But I would not be surprised if something in my mental model is incorrect.
This is for Kafka 0.8.2. I am planning to upgrade to 0.10 soon, so if 0.10 handles this differently, I'd also love to know that.
It looks like this is most likely due to this bug in Kafka 0.8, which was fixed in Kafka 0.9.