Are broker nodes in a Kafka cluster configured to handle the number of partitions? - apache-kafka

Kafka places partitions and replicas such that the brokers with the fewest existing partitions are used first. Does that mean brokers are pre-configured to handle the partitions?

When you create a topic, you set the number of partitions.
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
Also, there is a num.partitions broker parameter you can use. (This is used when a topic is created automatically.)
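For example, the relevant line in server.properties might look like this (a sketch; the shipped default is 1):
# default number of partitions for auto-created topics
num.partitions=3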
A broker can host as many partitions as you like, as long as it has enough disk space, memory, and network bandwidth.
If you look in the broker's log directory, you will see a folder test-0 for the single partition of the topic test. If you create a topic with three partitions, you will have two more folders, test-1 and test-2.
Each partition has an index file, a timeindex file, and a log file. The log file keeps Kafka data for that partition.
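For example, with the quickstart default of log.dirs=/tmp/kafka-logs, listing the partition directory shows the segment files (illustrative; the numeric base-offset file names will differ):
ls /tmp/kafka-logs/test-0
00000000000000000000.index  00000000000000000000.log  00000000000000000000.timeindex  leader-epoch-checkpoint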

Related

Where and how does a topic get created on a broker when the script is given the number of partitions and the replication factor

When we create a topic, where do we decide the number of partitions and the replication factor?
Does the topic get created on all the brokers?
Or is it specific to any one broker?
You are required to pass the replication factor and the number of partitions when creating a topic.
On which broker(s) each partition and its replicas are placed is decided automatically, although you can later use kafka-reassign-partitions to move replicas to other brokers.
You can create a topic with this command:
./bin/kafka-topics.sh --create --zookeeper <ZOOKEEPER_URL:PORT> --replication-factor <NO_OF_REPLICATIONS> --partitions <NO_OF_PARTITIONS> --topic <TOPIC_NAME>
After this command, metadata about the topic (number of partitions, replicas, ISR list, etc.) is stored in ZooKeeper. You can get information about a topic with this command:
./bin/kafka-topics.sh --zookeeper localhost:2181 --topic TopicName --describe
The replica lists of the partitions are created with a round-robin algorithm, and the controller broker is responsible for noticing new topic creation and triggering the partition assignment.
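You can also inspect the raw assignment the controller wrote, straight from ZooKeeper (a sketch, assuming a local ZooKeeper; the znode holds the partition-to-replica mapping as JSON):
./bin/zookeeper-shell.sh localhost:2181
get /brokers/topics/TopicName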

Getting Kafka usage details

I am trying to find ways to get current usage statistics for my kafka cluster. I am looking to collect following information:
Number of topics in kafka cluster
Number of partitions per kafka broker
Number of active consumers and producers
Number of client connections per kafka broker
Number of messages on each partition, size of disk etc.
Lagging replicas, consumer lag etc.
Active consumer groups
Any other statistics that can and should be collected; currently I am looking at collecting the above stats.
I can get 1 and 2 using ZooKeeper utilities, but I am lost on the rest. I have looked at the MBeans in JConsole but didn't find anything about the above. I also tried JmxTool to fetch these MBeans with regex-based expressions, but that didn't work either.
I am using Kafka v2.1 with the new consumer API, so ZooKeeper doesn't have any information about consumers.
Any pointers would be great help!
You might as well use https://github.com/yahoo/kafka-manager or https://github.com/linkedin/cruise-control to get this information.
There are scripts under $KAFKA_HOME/bin which can help you.
Number of topics in kafka cluster
./kafka-topics.sh --zookeeper localhost:2181 --list
Number of partitions per kafka broker
./kafka-topics.sh --zookeeper localhost:2181 --describe
Number of messages on each partition, size of disk etc.
./kafka-log-dirs.sh --describe --bootstrap-server localhost:9092
Lagging replicas, consumer lag etc.
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group $GROUP_NAME --describe
Active consumer groups
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
Number of active consumers and producers
You can't get active producers. See: Know existing producers for a kafka topic
Number of client connections per kafka broker
./
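For connection counts you can fall back to JMX. A minimal sketch with JmxTool, assuming the broker was started with JMX enabled (e.g. JMX_PORT=9999); the exact socket-server-metrics object name can vary by listener name and Kafka version:
./kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name 'kafka.server:type=socket-server-metrics,listener=PLAINTEXT,networkProcessor=0' \
  --attributes connection-count \
  --reporting-interval 5000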

Kafka Consumer does not receive data when one of the brokers is down

Kafka Quickstart
Using Kafka v2.1.0 on RHEL v6.9
Consumer fails to receive data when one of the Kafka brokers is down.
Steps performed:
1. Start zookeeper
2. Start Kafka-Server0 (localhost:9092, kafkalogs1)
3. Start Kafka-Server1 (localhost:9094, kafkalog2)
4. Create topic "test1", num of partitions = 1, replication factor = 2
5. Run producer for topic "test1"
6. Run consumer
7. Send messages from the producer
8. Receive messages on the consumer side.
All the above steps worked without any issues.
When I shutdown Kafka-Server0, the consumer stops getting data from Producer.
When I bring back up Kafka-Server0, the consumer starts to get messages from where it left off.
These are the commands used
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test1
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test1
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic test1
The behavior is the same (no message received on the consumer side) when I run the consumer with two servers specified in the --bootstrap-server option.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9094 --topic test1
Any idea why the consumer stops getting messages when server0 is down even though the replication factor for the topic test1 was set to 2?
There is a similar question already but it was not answered completely
Kafka 0.10 quickstart: consumer fails when "primary" broker is brought down
If the offsets topic is unavailable, you cannot consume.
Look at the server.properties file for these settings, read the comment, and increase the values accordingly (this only applies if the topic doesn't already exist):
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
According to your previous question, it looks like the topic only has one replica.
See how you can increase replication factor for an existing topic
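As a minimal sketch of that procedure (hypothetical broker ids 0 and 1; note that __consumer_offsets has 50 partitions by default, so a real reassignment file must list all of them):
cat > increase-offsets-rf.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"__consumer_offsets","partition":0,"replicas":[0,1]}
]}
EOF
./bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file increase-offsets-rf.json --execute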
In the initial versions of Kafka, offsets were managed in ZooKeeper, but Kafka has continuously evolved over time, introducing lots of new features. Now Kafka manages offsets in an internal topic, __consumer_offsets.
Think of a scenario where you created a topic with a replication factor of 1. If the broker goes down, the data exists only on that downed Kafka node, so you can't get it. The same analogy applies to the __consumer_offsets topic.
You need to revisit server.properties to get the behavior you are expecting. If you still want to consume the messages from the replica partition, you may need to restart the console consumer with --from-beginning.

Kafka Topic creation upon specific broker

Regarding Kafka topic creation: my understanding is that a Kafka cluster can have several brokers/nodes/servers, and each broker can host one or more topics. A topic can live on one or more brokers depending on the number of partitions provided during topic creation. Is there any way to tell on which broker(s) a topic and its partitions should be created?
Regards,
Lokesh
When creating the topic, you can either just specify the number of partitions and replicas and let Kafka distribute them, or you can directly specify the assignment - which partition and replica goes where.
If you are using the kafka-topics.sh script which is part of Kafka, you can use its --replica-assignment option. Each comma-separated entry defines one partition, and the colon-separated broker ids inside it are that partition's replicas (the first one is the preferred leader). For example:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic topic1 --replica-assignment 0:1:2,0:1:2,0:1:2
If the topic already exists, you can use the kafka-reassign-partitions.sh tool to change the assignment.
This might contain some more details about it: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-2.2CreateTopics

how to increase number of partitions for a topic and change the leader of each partition and rebalance them

If we have a topic in Kafka that has 5 partitions, can we increase the number of partitions to 30? Also, after increasing the number of partitions, can we change the leader of each partition in order of the broker ids and rebalance the cluster for that particular topic? How can we do that?
I found out how it works.
1) First find the information of the existing topic
2) Find out all the brokers in your cluster with their IDs
3) Scale number of partitions
4) Run the cluster reassignment script to rebalance
Use your actual ZooKeeper URLs or IPs where I am using localhost.
1) In the bin directory we have kafka-topics.sh
./kafka-topics.sh --zookeeper localhost:2181 --topic dummytopic --describe
This shows everything about the topic, including which brokers hold the leader and the replicas of each partition.
2) ./zookeeper-shell.sh localhost:2181
then run: ls /brokers/ids
This gives you the list of all the broker ids.
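The output is the list of live broker ids, e.g. (illustrative):
[1001, 1002, 1003]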
3)./kafka-topics.sh --alter --zookeeper localhost:2181 --topic dummytopic --partitions 30
This increases the number of partitions to 30 (note that partitions can only ever be increased, never decreased).
4)
Before running this command you need a JSON file that changes the replica assignment (and thereby the preferred leaders) for a particular topic.
For that I built a simple tool to generate the JSON for very large partition counts:
https://github.com/chandradeepak/kafka-reassignment-gen
go build && topic=dummytopic num_partitions=30 brokerid_start=1022 replica_count=3 ./kafka-reassignment-gen
This will generate JSON which we can use as expand-cluster-reassignment.json. It looks something like this:
{"version":1,"partitions":[{"topic":"dummytopic","partition":0,"replicas":[1001,1002,1003]},{"topic":"dummytopic","partition":1,"replicas":[1002,1003,1004]},{"topic":"dummytopic","partition":2,"replicas":[1003,1004,1005]},{"topic":"dummytopic","partition":3,"replicas":[1004,1005,1006]},{"topic":"dummytopic","partition":4,"replicas":[1005,1006,1007]},{"topic":"dummytopic","partition":5,"replicas":[1006,1007,1008]},{"topic":"dummytopic","partition":6,"replicas":[1007,1008,1009]},{"topic":"dummytopic","partition":7,"replicas":[1008,1009,1010]},{"topic":"dummytopic","partition":8,"replicas":[1009,1010,1011]},{"topic":"dummytopic","partition":9,"replicas":[1010,1011,1012]},{"topic":"dummytopic","partition":10,"replicas":[1011,1012,1013]},{"topic":"dummytopic","partition":11,"replicas":[1012,1013,1014]},{"topic":"dummytopic","partition":12,"replicas":[1013,1014,1015]},{"topic":"dummytopic","partition":13,"replicas":[1014,1015,1016]},{"topic":"dummytopic","partition":14,"replicas":[1015,1016,1017]},{"topic":"dummytopic","partition":15,"replicas":[1016,1017,1018]},{"topic":"dummytopic","partition":16,"replicas":[1017,1018,1019]},{"topic":"dummytopic","partition":17,"replicas":[1018,1019,1020]},{"topic":"dummytopic","partition":18,"replicas":[1019,1020,1021]},{"topic":"dummytopic","partition":19,"replicas":[1020,1021,1022]},{"topic":"dummytopic","partition":20,"replicas":[1021,1022,1023]},{"topic":"dummytopic","partition":21,"replicas":[1022,1023,1024]},{"topic":"dummytopic","partition":22,"replicas":[1023,1024,1025]},{"topic":"dummytopic","partition":23,"replicas":[1024,1025,1026]},{"topic":"dummytopic","partition":24,"replicas":[1025,1026,1027]},{"topic":"dummytopic","partition":25,"replicas":[1026,1027,1028]},{"topic":"dummytopic","partition":26,"replicas":[1027,1028,1029]},{"topic":"dummytopic","partition":27,"replicas":[1028,1029,1030]},{"topic":"dummytopic","partition":28,"replicas":[1029,1030,1001]},{"topic":"dummytopic","partition":29,"replicas":[1030,1001,1002]}]}
./kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file expand-cluster-reassignment.json --execute
This executes the cluster reassignment and changes the replica placement (and hence the partition leaders) to what you expect.
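You can check when the reassignment has completed with the --verify flag of the same tool, and then, if leadership has not yet moved to the first replica of each list, trigger a preferred replica election (the election script ships with Kafka versions of this era):
./kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file expand-cluster-reassignment.json --verify
./kafka-preferred-replica-election.sh --zookeeper localhost:2181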