So I'm trying the Kafka quickstart as per the main documentation. I got the multi-broker cluster example all set up and tested per the instructions, and it works. For example, if I bring down one broker, the producer and consumer can still send and receive.
However, as per the example, we set up 3 brokers and bring down broker 2 (with broker id = 1). Now if I bring all brokers up again but bring down broker 1 (with broker id = 0), the consumer just hangs. This only happens with broker 1 (id = 0); it does not happen with broker 2 or 3. I'm testing this on Windows 7.
Is there something special here about broker 1? Looking at the configs, they are exactly the same across all 3 brokers except for the id, port number and log file location.
I thought it was just a problem with the provided console consumer, which doesn't take a broker list, so I wrote a simple Java consumer as per their documentation using the default setup but specifying the list of brokers in the "bootstrap.servers" property. No dice, I still get the same problem.
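For reference, the consumer is roughly the following (a minimal sketch; the group id and topic name are placeholders for my setup):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // all three brokers listed, so losing one should not matter
        props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
        props.put("group.id", "test-group"); // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-replicated-topic")); // placeholder topic
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}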
The moment I start up broker 1 (broker id = 0), the consumers just resume working. This isn't highly available/fault-tolerant behavior for the consumer... any help on how to set up an HA/fault-tolerant consumer?
Producers don't seem to have an issue.
If you follow the quick-start, the created topic has only one partition with one replica, which is hosted on the first broker, namely broker 1 (broker id = 0). That's why the consumer failed when you brought down that broker.
Try creating a topic with multiple replicas (specify --replication-factor when creating the topic) and rerun your test to see whether that gives you higher availability.
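For example, the quickstart itself creates a replicated topic like this (the topic name is just an example; on Windows the equivalent .bat scripts live under bin\windows), and the --describe output shows which broker is the leader for each partition:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic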
I'm testing Kafka on Linux, but I don't know what's wrong, because the test results differ from my understanding.
Let me explain the setup.
Currently, three brokers are configured with Kafka version 2.8.1 on CentOS 7, using ports 9092, 9093 and 9094 respectively.
In the case of the producer, all three ports were included in the bootstrap server setting and then it was run:
kafka-console-producer.bat --broker-list localhost:9092,localhost:9093,localhost:9094 --topic test
In the case of consumers, three were set up so that they could be attached to each of the three ports.
1. kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test
2. kafka-console-consumer.bat --bootstrap-server localhost:9093 --topic test
3. kafka-console-consumer.bat --bootstrap-server localhost:9094 --topic test
Here is what I understood:
In the case of Kafka, the leader broker acts as a controller, and the follower brokers copy the leader broker's data.
If one of the follower brokers dies, a disconnection message simply appears on the consumer connected to that broker. The other brokers operate normally.
If the leader broker dies, one of the follower brokers is changed to the leader broker, and the new leader broker acts as the controller.
Here is the problem:
If I kill the leader broker and check with the describe option, another follower broker has changed to the leader, but both the producer and the consumers cannot find the new leader and fail.
Even if I kill the broker running on port 9092 when it is not the leader, the producer and consumers fail.
Question.
If the leader broker dies, should the producer and consumer also set up a new connection?
Am I misunderstanding the producer and consumer?
Is there anything wrong with my setup?
I'm testing Kafka on Linux
But you're using batch (.bat) files, which are for Windows, and connecting to everything over localhost...
so that they could be attached to each of the three ports.
This isn't how Kafka distributes load. Within a consumer group, you can only have one consumer thread active per topic partition. It's unclear how many partitions your topic has, but if it has only one, and the broker hosting it died (it being the only replica and therefore the leader), that explains why your clients stop working.
Besides that, Kafka brokers generally run on the same port across multiple hosts. Using one host is not truly fault-tolerant, and is a waste of resources (CPU, RAM, and disk).
Regarding producers, there is a retries property that can be configured; I'm not sure whether the console producer overrides the default, but it should connect to the next available broker upon a new request.
For consumers, the same applies; however, you'll also want to make sure your offsets.topic.replication.factor (and the transactions topic replication factor, if you use transactions) is higher than 1; otherwise, consumers will be unable to read anything (or transactions will not work) whenever the single broker holding those internal topics is down.
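For example (the values here are illustrative, not recommendations): on the producer side something like retries=5 and acks=all, and in every broker's server.properties something like:
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
Note that these internal-topic settings only take effect when the internal topics are first created.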
I have a Kafka cluster with 3 brokers and 3 ZooKeeper nodes running. We added a 4th broker recently. When we bring it up as a new cluster, a few partitions get stored on the 4th broker as expected. The replication factor for all topics is 3, and each topic has 10 partitions.
Later, whenever we bring down the whole Kafka cluster for maintenance and bring it back up, all topic partitions get stored on the first 3 brokers and no partition gets stored on the 4th broker. (Note: due to a bug, we had to use a new log directory every time Kafka is brought up, so it is pretty much like a new cluster.)
I can see that all 4 brokers are available in ZooKeeper (when I do ls /brokers/ids I can see 4 broker ids), but partitions are not distributed to the 4th broker.
But when I trigger a partition reassignment to move a few partitions to the 4th broker, it works fine and the 4th broker starts storing the given partitions. Both producer and consumer are able to send and fetch data from the 4th broker. I can't find the reason why this storage imbalance is happening among the Kafka brokers. Please share your suggestions.
When we bring it up as a new cluster, a few partitions get stored on the 4th broker as expected.
This should only be expected when you create new topics or expand the partitions of existing ones. Topics do not automatically relocate to new brokers.
had to use a new log directory every time Kafka is brought up
That might explain why data is missing. It's unclear what bug you're running into, but this step shouldn't be necessary.
when I trigger a partition reassignment to move a few partitions to the 4th broker, it works fine and the 4th broker starts storing the given partitions. Both producer and consumer are able to send and fetch data from the 4th broker
This is the correct way to expand a cluster, and it sounds like it's working as expected.
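If it helps, the usual flow for that looks roughly like this (the JSON file names and the broker list are placeholders; newer versions take --bootstrap-server instead of --zookeeper):
> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics.json --broker-list "0,1,2,3" --generate
> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file reassignment.json --execute
> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file reassignment.json --verify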
My issue is that I have a three-broker Kafka cluster and an availability requirement to be able to consume from and produce to a topic when one or two of my three brokers are down.
I also have a reliability requirement to have a replication factor of 3. These seem to be conflicting requirements to me. Here is how my problem manifests:
I create a new topic with replication factor 3
I send several messages to that topic
I kill one of my brokers to simulate a broker issue
I attempt to consume the topic I created
My consumer hangs
I review my logs and see the error:
Number of alive brokers '2' does not meet the required replication factor '3' for the offsets topic
If I set every broker's offsets.topic.replication.factor setting to 1, then I'm able to produce and consume my topics, even if I set the topic-level replication factor to 3.
Is this an okay configuration? Or can you see any pitfalls in setting things up this way?
You only need as many brokers as your replication factor when creating the topic.
I'm guessing in your case, you start with a fresh cluster and no consumers have connected yet. In this case, the __consumer_offsets internal topic does not exist as it is only created when it's first needed. So first connect a consumer for a moment and then kill one of the brokers.
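You can check whether that internal topic exists yet, and how it is replicated, with the describe command (use --bootstrap-server instead of --zookeeper on newer versions):
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic __consumer_offsets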
Apart from that, in order to consume you only need 1 broker up, the leader for the partition.
For the last 10 days I have been trying to set up Kafka on two different machines:
Server32
Server56
Below is the list of tasks I have done so far:
Configured ZooKeeper and started it on both servers with
server.1=Server32_IP:2888:3888
server.2=Server56_IP:2888:3888
I also changed server.properties and server-1.properties as below:
broker.id=0
port=9092
log.dir=/tmp/kafka0-logs
host.name=Server32
zookeeper.connect=Server32_IP:9092,Server56_IP:9062
and server-1.properties:
broker.id=1
port=9062
log.dir=/tmp/kafka1-logs
host.name=Server56
zookeeper.connect=Server32_IP:9092,Server56_IP:9062
server.properties I ran on Server32.
server-1.properties I ran on Server56.
The problem is: when I start a producer on both servers and try to consume from either one, it works, BUT
when I stop either server, the other one is not able to send the data.
Please help me understand the process.
Running 2 zookeepers is not fault tolerant. If one of the zookeepers is stopped, then the system will not work. Unlike Kafka brokers, zookeeper needs a quorum (or majority) of the configured nodes in order to work. This is why zookeeper is typically deployed with an odd number of instances (nodes). Since 1 of 2 nodes is not a majority it really is no better than running a single zookeeper. You need at least 3 zookeepers to tolerate a failure because 2 of 3 is a majority so the system will stay up.
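For example, a three-node ensemble has entries like the following in every zoo.cfg (host names are placeholders), plus a matching myid file on each node:
server.1=ZK1_IP:2888:3888
server.2=ZK2_IP:2888:3888
server.3=ZK3_IP:2888:3888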
Kafka is different: you can have any number of Kafka brokers, and if they are configured correctly and you create your topics with a replication factor of 2 or greater, then the Kafka cluster can continue if you take any one of the broker nodes down, even if it's just 1 of 2.
There's a lot of information missing here, like the Kafka version and whether you're using the new consumer APIs or the old APIs. I'm assuming you're probably using a new version of Kafka like 0.10.x along with the new client APIs. With the new client APIs, consumer offsets are stored on the Kafka brokers and not in ZooKeeper as in the older versions. I think your issue here is that you created your topics with a replication factor of 1 and, coincidentally, the Kafka broker server you shut down was hosting the only replica, so you won't be able to produce or consume messages. You can confirm the health of your topics by running the command:
kafka-topics.sh --zookeeper ZHOST:2181 --describe
You might want to increase the replication factor to 2. That way you might be able to get away with one broker failing. Ideally you would have 3 or more Kafka Broker servers with a replication factor of 2 or higher (obviously not more than the number of brokers in your cluster). Refer to the link below:
https://kafka.apache.org/documentation/#basic_ops_increase_replication_factor
"For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log."
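As a rough sketch of the procedure from that link (topic name, broker ids and file name are placeholders): put the desired assignment, listing one entry per replica, into a JSON file such as
{"version":1,"partitions":[{"topic":"test","partition":0,"replicas":[0,1]}]}
and then run
> bin/kafka-reassign-partitions.sh --zookeeper ZHOST:2181 --reassignment-json-file increase-replication.json --execute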
If I have multiple brokers, which broker should my producer use? Do I need to manually switch the broker to balance the load? Also why does the consumer only need a zookeeper endpoint instead of a broker endpoint?
Quick example from the tutorial:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
which broker should my producer use? Do I need to manually switch the broker to balance the load?
Kafka runs as a cluster, meaning a set of nodes, so when producing anything you need to give it the LIST of brokers that you've configured for your application. Below is a small note taken from their documentation:
“metadata.broker.list” defines where the Producer can find a one or more Brokers to determine the Leader for each topic. This does not need to be the full set of Brokers in your cluster but should include at least two in case the first Broker is not available. No need to worry about figuring out which Broker is the leader for the topic (and partition), the Producer knows how to connect to the Broker and ask for the meta data then connect to the correct Broker.
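For example, a producer config sketch that lists two seed brokers (host names are placeholders):
metadata.broker.list=broker1:9092,broker2:9092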
Hope this clears up some of your confusion.
Also why does the consumer only need a zookeeper endpoint instead of a broker endpoint?
This is not technically correct, as there are two types of consumer APIs available: the high-level consumer and the low-level (simple) consumer.
The high-level consumer basically takes care of most things like leader detection, threading issues, etc., but does not provide much control over messages, which is exactly the purpose of the alternative, the simple or low-level consumer, where you will see that you need to provide the brokers and partition-related details.
So the consumer needs only a ZooKeeper endpoint when you are using the high-level API; when using the simple consumer, you do need to provide the broker and partition information.
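For instance, the old high-level consumer is configured with little more than a ZooKeeper endpoint and a group (values are placeholders); leader detection and rebalancing are handled for you:
zookeeper.connect=localhost:2181
group.id=my-group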
Kafka sets a single broker as the leader for each partition of each topic. The leader is responsible for handling both reads and writes to that partition. You cannot decide to read or write from a non-Leader broker.
So, what does it mean to provide a broker or list of brokers to the kafka-console-producer? Well, the broker or brokers you provide on the command-line are just the first contact point for your producer. If the broker you list is not the leader for the topic/partition you need, your producer will get the current leader info (called "topic metadata" in kafka-speak) and reconnect to other brokers as necessary before sending writes. In fact, if your topic has multiple partitions it may even connect to several brokers in parallel (if the partition leaders are different brokers).
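To make that concrete, here is a minimal producer sketch using the (newer) Java client, where the topic name and host:port values are placeholders; the brokers you list are only the first contact points, exactly as described above:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // seed brokers only; the producer fetches topic metadata from them and
        // then connects to whichever brokers are the partition leaders
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("test", "hello")); // "test" is a placeholder topic
        producer.close();
    }
}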
Second q: why does the consumer require a zookeeper list for connections instead of a broker list? The answer to that is that kafka consumers can operate in "groups" and zookeeper is used to coordinate those groups (how groups work is a larger issue, beyond the scope of this Q). Zookeeper also stores broker lists for topics, so the consumer can pull broker lists directly from zookeeper, making an additional --broker-list a bit redundant.
Kafka Producer API does not interact directly with Zookeeper. However, the High Level Consumer API connects to Zookeeper to fetch/update the partition offset information for each consumer. So, the consumer API would fail if it cannot connect to Zookeeper.
All the above answers are correct for older versions of Kafka, but things have changed with the arrival of Kafka 0.9.
Now there is no longer any direct interaction with ZooKeeper from either the producer or the consumer. Another interesting thing is that with 0.9, Kafka has removed the distinction between the high-level and low-level APIs, since both are replaced by a unified consumer API.