how producers find kafka reader - apache-kafka

The producers send messages by setting up a list of Kafka Broker as follows.
props.put("bootstrap.servers", "127.0.0.1:9092,127.0.0.1:9092,127.0.0.1:9092");
I wonder "producers" how to know that which of the three brokers knew which one had a partition leader.
For a typical distributed server, either you have a load bearing server or have a virtual IP, but for Kafka, how is it loaded?
Does the producers program try to connect to one broker at random and look for a broker with a partition leader?

A Kafka cluster contains multiple broker instances. At any given time, exactly one broker is the leader while the remaining are the in-sync-replicas (ISR) which contain the replicated data. When the leader broker is taken down unexpectedly, one of the ISR becomes the leader.
Kafka chooses one broker’s partition’s replicas as leader using ZooKeeper. When a producer publishes a message to a partition in a topic, it is forwarded to its leader.
According to Kafka documentation:
The partitions of the log are distributed over the servers in the
Kafka cluster with each server handling data and requests for a share
of the partitions. Each partition is replicated across a configurable
number of servers for fault tolerance.
Each partition has one server which acts as the "leader" and zero or
more servers which act as "followers". The leader handles all read and
write requests for the partition while the followers passively
replicate the leader. If the leader fails, one of the followers will
automatically become the new leader. Each server acts as a leader for
some of its partitions and a follower for others so load is well
balanced within the cluster.
You can find topic and partition leader using this piece of code.
EDIT:
The producer sends a meta request with a list of topics to one of the brokers you supplied when configuring the producer.
The response from the broker contains a list of partitions in those topics and the leader for each partition. The producer caches this information and therefore, it knows where to redirect the messages.

It's quite an old question but I have the same question and after researched, I want to share the answer cuz I hope it can help others.
To determine leader of a partition, producer uses a request type called a metadata request, which includes a list of topics the producer is interested in.
The broker will response specifies which partitions exist in the topics, the replicas for each partition, and which replica is the leader.
Metadata requests can be sent to any broker because all brokers have a metadata cache that contains this information.

Related

How client will automatically detect a new leader when the primary one goes down in Kafka?

Consider the below scenario:
I have a Kakfa broker cluster(localhost:9002,localhost:9003,localhost:9004,localhost:9005).
Let's say localhost:9002 is my primary(leader) for the cluster.
Now my producer is producing data and sending it to the broker(localhost:9002).
If my primary broker(localhost:9002) goes down, with the help of Zookeeper or some other consensus algorithm new leader will be elected(consider localhost:9003 is now the new leader).
So, in the above scenario can someone please explain to me how the Kafka client(producer) will get notified about the new broker configuration(localhost:9003) and how it will connect to the new leaders and start producing data again.
Kafka clients are receiving the necessary meta information from the cluster automatically on each request when reading from or writing to a topic in case of a leadership change.
In general, the client sends a (read/write) request to one of the bootstrap server, listed in the configuration bootstrap.servers. This initial request (hence called bootstrap) returns the details on which broker the topic partition leader is located so that the client can communicate directly with that broker. Each individual broker contains all meta information for the entire cluster, meaning also having the knowledge on the partition leader of other brokers.
Now, if one of your broker goes down and the leadership of a topic partition switches, your producer will get notified about it through that mechanism.
There is a KafkaProducer configuration called metadata.max.age.ms which you can modify to update metadata on your producer even if there is no leadership change happening:
"Controls how long the producer will cache metadata for a topic that's idle. If the elapsed time since a topic was last produced to exceeds the metadata idle duration, then the topic's metadata is forgotten and the next access to it will force a metadata fetch request."
Just a few notes on your question:
The term "Kafka broker cluster" does not really exists. You have a Kafka cluster containing one or multiple Kafka brokers.
You do not have a broker as a "primary(leader) for the cluster" but you have for each TopicPartition a leader. Maybe you mean the Controller which is located on one of the brokers within your cluster.

Does zookeeper returns Kafka brokers dns? (which has the leader partition)

I'm new to Kafka and I want to ask a question.
If there are 3 Kafka brokers(kafka1, kafka2, kafka3)(they are in same Kafka Cluster)
and topic=test(replication=2)
kafka1 has leader partition and kafka2 has follower partition.
If producer sends data to kafka3, then how do data stored in kafka1 and Kafka2?
I heard that, if producer sends data to kafka3, then the zookeeper finds the broker who has the leader partition and returns the broker's dns or IP address.
And then, producer will resend to the broker with metadata.
Is it right? or if it's wrong, plz tell me how it works.
Thanks a lot!
Every kafka topic partition has its own leader. So if you have 2 partitions , kafka would assign leader for each partition. They might end up being same kafka nodes or they might be different.
When producer connects to kafka cluster, it gets to know about the the partition leaders. All writes must go through corresponding partition leader, which is responsible to keep track of in-sync replicas.
All consumers only talk to corresponding partition leaders to get data.
If partition leader goes down , one of the replicas become leader and all producers and consumers are notified about this change

Kafka Inter Broker Communication

I understand producer/consumers need to talk to brokers to know leader for partition. Brokers talk to zk to tell they joined the cluster.
Is it true that
Brokers know who is the leader for a given partition from zk
zk detects broker left/died. Then it re-elects leader and sends new leader info to all brokers
Question:
why do we need brokers to communicate with each other? Is it just
so tehy can move partitions around or do they also query metadata from each other. If so what would be example of metadata exchange
Producers/ consumers request metadata from one of the brokers ( as each one of them caches it) and that is how they know who is the leader for a partition.
Regarding "is it true that" section:
Brokers know who is the leader for the given partition thanks to zk and one of them. To be more precise, one of them decides who will be a leader. That broker is called controller. The first broker that connects to zookeeper becomes a controller and his role is to decide which broker will be a leader and which ones will be replicas and to inform them about it. Controller itself is not excluded from this process. It is a broker like any other with this special responsibilities of choosing leaders and replicas
zk indeed detects when a broker dies/ leaves but it doesn't reelect leader. It is controller responsibility. When one of the brokers leaves a cluster, controller gets information from zk and it starts reassignment
About your question - brokers do communicate with each other ( replicas are reading the messages from leaders, controller is informing other brokers about changes), but they do not exchange metadata among themselves - they write metadata to a zookeeper
A Broker is a Kafka server that runs in a Kafka Cluster
"A Kafka cluster is made up of multiple Kafka Brokers. Each Kafka Broker has a unique ID (number). Kafka Brokers contain topic log partitions. Connecting to one broker bootstraps a client to the entire Kafka cluster"
Each broker holds a number of partitions and each of these partitions can be either a leader or a replica for a topic. All writes and reads to a topic go through the leader and the leader coordinates updating replicas with new data. If a leader fails, a replica takes over as the new leader.

Apache Kafka Producer Broker Connection

I have a set of Kafka broker instances running as a cluster. I have a client that is producing data to Kafka:
props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
When we monitor using tcpdump, I can see that only the connections to broker1 and broker2 are ESTABLISHED while for the broker3, there is no connection from my producer. I have a single topic with just one partition.
My questions:
How is the relation between number of brokers and topic partitions? Should I always have number of brokers = number of partitons?
Why in my case, I'm not able to connect to broker3? or atleast my network monitoring does not show that a connection from my Producer is established with broker3?
It would be great if I could get some deeper insight into how the connection to the brokers work from a Producer stand point.
Obviously, your producer does not need to connect to broker3 :)
I'll try to explain you what happens when you are producing data to Kafka:
You spin up some brokers, let's say 3, then create some topic foo with 2 partitions, replication factor 2. Quite simple example, yet could be a real case for someone.
You create a producer with metadata.broker.list (or bootstrap.servers in new producer) configured to these brokers. Worth mentioning, you don't necessarily have to specify all the brokers in your cluster, in fact you can specify only 1 of them and it will still work. I'll explain this in a bit too.
You send a message to topic foo using your producer.
The producer looks up its local metadata cache to see what brokers are leaders for each partition of topic foo and how many partitions does your foo topic have. As this is the first send to the producer, local cache contains nothing.
Producer sends a TopicMetadataRequest to each broker in metadata.broker.list sequentially until first successful response. That's why I mentioned 1 broker in that list would work as long as it's alive.
Returned TopicMetadataResponse will contain the information about requested topics, in your case it's foo and brokers in the cluster. Basically, this response contains the following:
list of brokers in the cluster, where each broker has an ID, host and port. This list may not contain the entire list of brokers in the cluster, but should contain at least the list of brokers that are responsible for servicing the subject topic.
list of topic metadata, where each entry has topic name, number of partitions, leader broker ID for each partition and ISR broker IDs for each partition.
Based on TopicMetadataResponse your producer builds up its local cache and now knows exactly that the request for topic foo partition 0 should go to broker X.
Based on number of partitions in a topic, producer partitions your message and accumulates it with the knowledge that it should be sent as a part of batch to some broker.
When the batch is full or linger.ms timeout passes, your producer flushes the batch to the broker. By "flushes" I mean "opens a new connection to a broker or reuses an existing one, and sends the ProduceRequest".
The producer does not need to open unnecessary connections to all brokers, as the topic you are producing to may not be serviced by some brokers, and your cluster could be quite large. Imagine a 1000 broker cluster with lots of topics, but one of topics has just one partition - you only need that one connection, not 1000.
In your particular case I'm not 100% sure why you have 2 open connections to brokers, if you have just a single partition, but I assume one connection was opened during metadata discovery and was cached for reusing, and the second one is the actual broker connection to produce data. However, I might be wrong in this case.
But anyway, there is no need at all to have a connection for the third broker.
Regarding your question about "Should I always have number of brokers = number of partitons?" the answer is most likely no. If you explain what you are trying to achieve, maybe I'll be able to point you to the right direction, but this is too broad to explain in general. I recommend reading this to clarify things.
UPD to answer the question in comment:
Metadata cache is updated in 2 cases:
If producer fails to communicate with broker for any reason - this includes the case when the broker is not reachable at all and when broker responds with an error (like "I'm not leader for this partition anymore, go away")
If no failures happen, the client still refreshes metadata every metadata.max.age.ms (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/CommonClientConfigs.java#L42-L43) to discover new brokers and partitions itself.

Apache Kafka Topic Partition Message Handling

I'm a bit confused on the Topic partitioning in Apache Kafka. So I'm charting down a simple use case and I would like to know what happens in different scenarios. So here it is:
I have a Topic T that has 4 partitions TP1, TP2, TP4 and TP4.
Assume that I have 8 messages M1 to M8. Now when my producer sends these messages to the topic T, how will they be received by the Kafka broker under the following scenarios:
Scenario 1: There is only one kafka broker instance that has Topic T with the afore mentioned partitions.
Scenario 2: There are two kafka broker instances with each node having same Topic T with the afore mentioned partitions.
Now assuming that kafka broker instance 1 goes down, how will the consumers react? I'm assuming that my consumer was reading from broker instance 1.
I'll answer your questions by walking you through partition replication, because you need to learn about replication to understand the answer.
A single broker is considered the "leader" for a given partition. All produces and consumes occur with the leader. Replicas of the partition are replicated to a configurable amount of other brokers. The leader handles replicating a produce to the other replicas. Other replicas that are caught up to the leader are called "in-sync replicas." You can configure what "caught up" means.
A message is only made available to consumers when it has been committed to all in-sync replicas.
If the leader for a given partition fails, the Kafka coordinator will elect a new leader from the list of in-sync replicas and consumers will begin consuming from this new leader. Consumers will have a few milliseconds of added latency while the new leader is elected. A new coordinator will also be elected automatically if the coordinator fails (this adds more latency, too).
If the topic is configured with no replicas, then when the leader of a given partition fails, consumers can't consume from that partition until the broker that was the leader is brought back online. Or, if it is never brought back online, the data previously produced to that partition will be lost forever.
To answer your question directly:
Scenario 1: if replication is configured for the topic, and there exists an in-sync replica for each partition, a new leader will be elected, and consumers will only experience a few milliseconds of latency because of the failure.
Scenario 2: now that you understand replication, I believe you'll see that this scenario is Scenario 1 with a replication factor of 2.
You may also be interested to learn about acks in the producer.
In the producer, you can configure acks such that the produce is acknowledged when:
the message is put on the producer's socket buffer (acks=0)
the message is written to the log of the lead broker (acks=1)
the message is written to the log of the lead broker, and replicated to all other in-sync replicas (acks=all)
Further, you can configure the minimum number of in-sync replicas required to commit a produce. Then, in the event when not enough in-sync replicas exist given this configuration, the produce will fail. You can build your producer to handle this failure in different ways: buffer, retry, do nothing, block, etc.