When creating topics, can we determine which broker will be the leader for the topic? Are topics balanced across brokers in Kafka? (Considering the topics have just one partition)
Kafka does manage this internally and you don't need to worry about this in general: http://kafka.apache.org/documentation/#basic_ops_leader_balancing
If you create a new topic, Kafka will select a broker based on load. If a topic has only one partitions, it will only be hosted on a single broker (plus followers if you have multiple replicas), because a partitions cannot be split over multiple brokers in Kafka.
Nevertheless, you can get the information which broker host what topic and you can also "move" topics and partitions: http://kafka.apache.org/documentation/#basic_ops_cluster_expansion
Related
__consumer_offsets store offsets of all kafka topics except internal topics such as *-changelog topics in case of streams. Where is this data stored?
The term "internal topic" has two different meanings in Kafka:
Brokers: an internal topic is a topic that the cluster uses (like __consumer_offsets). A client cannot read/write from/to this topic.
Kafka Streams: topics that Kafka Streams creates automatically, are called internal topics, too.
However, those -changelog and -repartition topics that are "internal" topics from a Kafka Streams point of view, are regular topics from a broker point of view. Hence, offsets for both are stored in __consumer_offsets like for any other topic.
Note, that Kafka Streams will only commit offsets for -repartition topics, though. For -changelog topics no offsets are ever committed (Kafka Streams does some offset tracking on the client side though, and writes -changelog offsets into a local .checkpoint file).
The producers send messages by setting up a list of Kafka Broker as follows.
props.put("bootstrap.servers", "127.0.0.1:9092,127.0.0.1:9092,127.0.0.1:9092");
I wonder "producers" how to know that which of the three brokers knew which one had a partition leader.
For a typical distributed server, either you have a load bearing server or have a virtual IP, but for Kafka, how is it loaded?
Does the producers program try to connect to one broker at random and look for a broker with a partition leader?
A Kafka cluster contains multiple broker instances. At any given time, exactly one broker is the leader while the remaining are the in-sync-replicas (ISR) which contain the replicated data. When the leader broker is taken down unexpectedly, one of the ISR becomes the leader.
Kafka chooses one broker’s partition’s replicas as leader using ZooKeeper. When a producer publishes a message to a partition in a topic, it is forwarded to its leader.
According to Kafka documentation:
The partitions of the log are distributed over the servers in the
Kafka cluster with each server handling data and requests for a share
of the partitions. Each partition is replicated across a configurable
number of servers for fault tolerance.
Each partition has one server which acts as the "leader" and zero or
more servers which act as "followers". The leader handles all read and
write requests for the partition while the followers passively
replicate the leader. If the leader fails, one of the followers will
automatically become the new leader. Each server acts as a leader for
some of its partitions and a follower for others so load is well
balanced within the cluster.
You can find topic and partition leader using this piece of code.
EDIT:
The producer sends a meta request with a list of topics to one of the brokers you supplied when configuring the producer.
The response from the broker contains a list of partitions in those topics and the leader for each partition. The producer caches this information and therefore, it knows where to redirect the messages.
It's quite an old question but I have the same question and after researched, I want to share the answer cuz I hope it can help others.
To determine leader of a partition, producer uses a request type called a metadata request, which includes a list of topics the producer is interested in.
The broker will response specifies which partitions exist in the topics, the replicas for each partition, and which replica is the leader.
Metadata requests can be sent to any broker because all brokers have a metadata cache that contains this information.
I deploy a kafka cluster on three hosts.And deploy consumers on the same hosts.
How i to let the consumer consume the nearest broker's partitions.for example,host a's consumer just consume the partitions which belong to host a.
Kafka doesn't work that way. The clients will connect to all three brokers and produce and consume from all three brokers in parallel based on which nodes are currently the leader for each topic partition.
I'm new to Kafka, zookeeper and Storm.
I our environment we have one Kafka broker connecting to multiple zookeepers. Is there an advantage having the producer send the messages to a specific topic and partition on one broker to multiple zookeepers vs multiple brokers to multiple zookeepers?
Yes there is. Kafka allows you to scale by adding brokers. When you use a Kafka cluster with a single broker, as you have, all partitions reside on that single broker. But when you have multiple brokers, Kafka will split the partitions between them. So, broker A may be elected leader for partitions 1 and 2 of your topic, and broker B leader for partition 3. So, when you publish messages to the topic, the client will split the messages between the various partitions on the two brokers.
Note that I also mentioned leader election. Adding brokers to your Kafka cluster gives you replication. Kafka uses ZooKeeper to elect a leader for each partition as I mentioned in my example. Once a leader is elected, the client splits messages among partitions and sends each message to the leader for the appropriate partition. Depending on the topic configuration, the leader may synchronously replicate messages to a backup. So, in my example, if the replication factor for the topic is 2 then broker A will synchronously replicate messages for partitions 1 and 2 to broker B and broker B will synchronously replicate messages for partition 3 to broker A.
So, that's all to say that adding brokers gives you both scalability and fault-tolerance.
I have 3 Kafka brokers in 3 different VMs, with one additionally running a Zookeeper. I now create a topic with 8 partitions. The Producer pushes messages to these group of brokers on the created "topic".
How does the Kafka distribute a topic and its partitions among the brokers?
Does the Kafka redistribute the topic when a new Kafka Broker joins a cluster?
Can the topic partition be increased after the topic was created?
When you create a new topic, Kafka places the partitions and replicas in a way that the brokers with least number of existing partitions are used first, and replicas for same partition are on different brokers.
When you add a new broker, it is used for new partitions (since it has lowest number of existing partitions), but there is no automatic balancing of existing partitions to the new broker. You can use the replica-reassignment tool to move partitions and replicas to the new broker.
Yes, you can add partitions to an existing topic.