Kafka: list all partitions with no leader - apache-kafka

In my Kafka cluster there are more than 2k topics, and each topic has 5 partitions. I want to list only the partitions that have no leader.
I can check each topic using the following syntax:
kafka-topics.sh --describe --topic <topic_name> --zookeeper <zookeeper_ip>:port
The problem is that there are 2k+ topics, so this can't be done manually. I could also write a script that loops over every topic and picks out the partitions with no leader, but I am interested in a more efficient way to get this information.

With kafka-topics.sh you can pass the --unavailable-partitions flag to list only the partitions that currently have no leader and therefore cannot be used by consumers or producers.
For example:
kafka-topics.sh --describe --unavailable-partitions --zookeeper <zookeeper_ip>:port
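If you need the result in a form another script can consume (for example to count leaderless partitions per topic), the describe output is easy to post-process. Below is a minimal Python sketch; the function names are hypothetical, not part of Kafka's tooling. It assumes the tab-separated "Key: value" layout that kafka-topics.sh prints for each partition, and that a missing leader appears as "Leader: -1" or "Leader: none" (which one depends on the Kafka version, so check your own output).

```python
"""Find leaderless partitions in kafka-topics.sh --describe output (sketch)."""
from __future__ import annotations

# Leader values assumed to mean "no leader"; depending on the Kafka version
# the describe output prints either -1 or "none".
NO_LEADER = {"-1", "none"}


def parse_fields(line: str) -> dict[str, str]:
    """Turn 'Topic: t<TAB>Partition: 0<TAB>Leader: 1' into a dict."""
    fields: dict[str, str] = {}
    for chunk in line.strip().split("\t"):
        key, sep, value = chunk.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leaderless_partitions(describe_output: str) -> list[tuple[str, int]]:
    """Return (topic, partition) pairs whose Leader field means 'no leader'."""
    result = []
    for line in describe_output.splitlines():
        fields = parse_fields(line)
        # Per-partition lines carry both a Partition and a Leader field;
        # topic summary lines (PartitionCount, ReplicationFactor, ...) do not.
        if "Partition" in fields and "Leader" in fields:
            if fields["Leader"] in NO_LEADER:
                result.append((fields["Topic"], int(fields["Partition"])))
    return result
```

Feed it the captured text of kafka-topics.sh --describe --unavailable-partitions (or of a plain --describe, since partitions that have a leader are filtered out anyway), for example the stdout of subprocess.run(..., capture_output=True, text=True).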

Related

delete specific messages from kafka topic __consumer_offsets

I want to delete all messages in the __consumer_offsets topic that start with a given key (to reset one particular consumer group without affecting the rest).
Is there a way to do this?
Kafka comes with the ConsumerGroupCommand tool (kafka-consumer-groups.sh). You can find more information in the Kafka documentation.
If you plan to reset a particular consumer group ("myConsumerGroup") without affecting the rest, you can use:
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group myConsumerGroup --topic topic1 --to-latest
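Depending on your requirements, you can also reset the offsets of individual partitions of the topic with this tool. By default --reset-offsets only prints the planned new offsets (a dry run); add --execute to actually apply them, and make sure the group has no active members while you do so. The --help output and the documentation explain the remaining options.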
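To reset only some partitions, the tool accepts the topic in the form topic1:0,1,2. The Python sketch below (reset_offsets_cmd is a hypothetical helper) only assembles the argument list, so the dry-run versus --execute choice stays explicit; it does not talk to Kafka itself and assumes the script lives at bin/kafka-consumer-groups.sh.

```python
"""Assemble a kafka-consumer-groups.sh --reset-offsets invocation (sketch)."""
from __future__ import annotations

import shlex


def reset_offsets_cmd(bootstrap: str, group: str, topic: str,
                      partitions: list[int] | None = None,
                      to_offset: int | None = None,
                      execute: bool = False) -> list[str]:
    """Build the argv list; with execute=False the tool only performs a dry run."""
    # "topic1:0,2" restricts the reset to partitions 0 and 2 of topic1.
    topic_arg = topic
    if partitions:
        topic_arg += ":" + ",".join(str(p) for p in partitions)
    cmd = ["bin/kafka-consumer-groups.sh", "--bootstrap-server", bootstrap,
           "--reset-offsets", "--group", group, "--topic", topic_arg]
    # Reset to an explicit offset if one is given, otherwise to the latest offset.
    if to_offset is not None:
        cmd += ["--to-offset", str(to_offset)]
    else:
        cmd.append("--to-latest")
    cmd.append("--execute" if execute else "--dry-run")
    return cmd


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run(...) to run it.
    print(shlex.join(reset_offsets_cmd("localhost:9092", "myConsumerGroup", "topic1", [0, 2])))
```

Running the dry run first and only then repeating the call with execute=True mirrors how the tool is meant to be used.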

Where and how does a topic get created on the brokers when the script is given the number of partitions and the replication factor?

When we create a topic, we decide the number of partitions and the replication factor.
Does this topic get created on all the brokers?
Is it specific to any one broker?
You are required to pass replication factor and partitions when creating topics.
Which broker(s) each partition and replica are placed on is decided randomly, although you can later use kafka-reassign-partitions to move replicas to other brokers.
You can create a topic with this command:
./bin/kafka-topics.sh --create --zookeeper <ZOOKEEPER_URL:PORT> --replication-factor <NO_OF_REPLICATIONS> --partitions <NO_OF_PARTITIONS> --topic <TOPIC_NAME>
After this command runs, metadata about the topic (number of partitions, replicas, ISR list etc.) is stored in ZooKeeper. You can get information about the topic with this command:
./bin/kafka-topics.sh --zookeeper localhost:2181 --topic TopicName --describe
The replica lists of the partitions are created with a round-robin algorithm, and the controller broker is responsible for noticing the new topic and triggering the partition assignment.
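To make the round-robin idea concrete, here is a minimal Python sketch of rack-unaware placement: leaders rotate across the broker list, and followers are placed at an offset from the leader that shifts after every full pass, so replicas of one partition never share a broker. It is loosely modeled on Kafka's own placement but simplified; Kafka's real implementation picks its starting point at random and can take racks into account, whereas here start_index and replica_shift (hypothetical parameter names) are plain arguments so the result is reproducible.

```python
"""Round-robin replica placement, simplified and rack-unaware (sketch)."""
from __future__ import annotations


def assign_replicas(brokers: list[int], num_partitions: int,
                    replication_factor: int, start_index: int = 0,
                    replica_shift: int = 0) -> dict[int, list[int]]:
    """Map partition id -> replica list; the first replica is the preferred leader."""
    n = len(brokers)
    if not 1 <= replication_factor <= n:
        raise ValueError("replication factor must be between 1 and the broker count")
    assignment: dict[int, list[int]] = {}
    shift = replica_shift
    for p in range(num_partitions):
        # After every full pass over the brokers, shift the followers so the
        # same leader/follower pairs are not repeated.
        if p > 0 and p % n == 0:
            shift += 1
        first = (p + start_index) % n  # leaders rotate round-robin
        replicas = [brokers[first]]
        for j in range(replication_factor - 1):
            # Offset is in 1..n-1, so a follower never lands on the leader's broker.
            offset = 1 + (shift + j) % (n - 1)
            replicas.append(brokers[(first + offset) % n])
        assignment[p] = replicas
    return assignment
```

For example, assign_replicas([0, 1, 2], 4, 2) gives {0: [0, 1], 1: [1, 2], 2: [2, 0], 3: [0, 2]}: each broker leads at least one partition, and partition 3 gets a different follower than partition 0.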

Getting Kafka usage details

I am trying to find ways to get current usage statistics for my kafka cluster. I am looking to collect following information:
1. Number of topics in the Kafka cluster
2. Number of partitions per Kafka broker
3. Number of active consumers and producers
4. Number of client connections per Kafka broker
5. Number of messages on each partition, size on disk etc.
6. Lagging replicas, consumer lag etc.
7. Active consumer groups
8. Any other statistics that can and should be collected; currently I am looking at collecting the above stats.
I can get 1 and 2 using ZooKeeper utilities, but I am lost on the rest. I have looked at the MBeans in JConsole but didn't find anything about the above. I also tried JmxTool to get these MBeans with a regex-based expression, but that didn't work either.
I am using Kafka v2.1 with the new consumer API, so ZooKeeper doesn't have any information about consumers.
Any pointers would be great help!
You might as well use https://github.com/yahoo/kafka-manager or https://github.com/linkedin/cruise-control to get this information.
There are scripts under $KAFKA_HOME/bin which can help you.
Number of topics in kafka cluster
./kafka-topics.sh --zookeeper localhost:2181 --list
Number of partitions per kafka broker
./kafka-topics.sh --zookeeper localhost:2181 --describe
Number of messages on each partition, size of disk etc.
./kafka-log-dirs.sh --describe --bootstrap-server localhost:9092
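Lagging replicas, consumer lag etc.
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group $GROUP_NAME --describe
(This shows consumer lag. For lagging replicas, ./kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions lists the partitions whose in-sync replica set is smaller than their replica set.)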
Active consumer groups
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
Number of active consumers and producers
You can't get the active producers; see the related question "Know existing producers for a kafka topic".
Number of client connections per kafka broker
./
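kafka-log-dirs.sh prints a couple of status lines followed by a single line of JSON, which is convenient to aggregate into disk usage per broker and topic. A minimal Python sketch (hypothetical function names); it assumes the JSON has a brokers list whose entries contain broker and logDirs, with each log dir listing partitions as objects carrying partition (in topic-partition form), size in bytes and isFuture. Verify these field names against the output of your Kafka version.

```python
"""Summarize kafka-log-dirs.sh --describe output per broker and topic (sketch)."""
from __future__ import annotations

import json
from collections import defaultdict


def load_log_dirs(output: str) -> dict:
    """Return the parsed JSON line, skipping the tool's status lines."""
    for line in output.splitlines():
        if line.lstrip().startswith("{"):
            return json.loads(line)
    raise ValueError("no JSON line found in kafka-log-dirs output")


def bytes_per_broker_and_topic(doc: dict) -> dict[int, dict[str, int]]:
    """Sum partition sizes in bytes, grouped by broker id and then by topic."""
    totals: dict = defaultdict(lambda: defaultdict(int))
    for broker in doc["brokers"]:
        for log_dir in broker["logDirs"]:
            for part in log_dir["partitions"]:
                if part.get("isFuture"):
                    # Future replicas are copies being moved between log dirs;
                    # skip them so the same partition is not counted twice.
                    continue
                # Names look like "<topic>-<partition>"; topics may contain dashes,
                # so split only on the last one.
                topic = part["partition"].rsplit("-", 1)[0]
                totals[broker["broker"]][topic] += part["size"]
    return {b: dict(t) for b, t in totals.items()}
```

Pass it the captured stdout of ./kafka-log-dirs.sh --describe --bootstrap-server localhost:9092; the result maps each broker id to a {topic: bytes} dictionary.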

How to increase the number of partitions for a topic, change the leader of each partition, and rebalance them

If a topic in Kafka has 5 partitions, can we increase the number of partitions to 30? Also, after increasing the number of partitions, we want to change the leader of each partition to follow the order of the broker IDs and rebalance the cluster for that particular topic. How can we do that?
I found out how it works.
1) First find the information of the existing topic
2) Find out all the brokers in your cluster with their IDs
3) Scale number of partitions
4) Run the cluster reassignment script to rebalance
Replace localhost with your actual ZooKeeper URLs or IPs.
1) In the bin directory there is kafka-topics.sh:
./kafka-topics.sh --zookeeper localhost:2181 --topic dummytopic --describe
This shows everything about the topic, including which broker leads each partition and where the replicas are.
2) ./zookeeper-shell.sh localhost:2181
then run ls /brokers/ids
This gives you the list of all broker IDs.
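3) ./kafka-topics.sh --alter --zookeeper localhost:2181 --topic dummytopic --partitions 30
This increases the number of partitions. (Kafka can only increase the partition count, never decrease it, and adding partitions changes which partition a given message key maps to.)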
4) Before running the reassignment command, you need a JSON file that specifies the replica list (and therefore the preferred leader) of each partition of the topic.
To generate that JSON for a very large partition count, I developed a simple tool:
https://github.com/chandradeepak/kafka-reassignment-gen
go build && topic=dummytopic num_partitions=30 brokerid_start=1022 replica_count=3 ./kafka-reassignment-gen
This generates the JSON to use as expand-cluster-reassignment.json. It looks something like this:
{"version":1,"partitions":[{"topic":"dummytopic","partition":0,"replicas":[1001,1002,1003]},{"topic":"dummytopic","partition":1,"replicas":[1002,1003,1004]},{"topic":"dummytopic","partition":2,"replicas":[1003,1004,1005]},{"topic":"dummytopic","partition":3,"replicas":[1004,1005,1006]},{"topic":"dummytopic","partition":4,"replicas":[1005,1006,1007]},{"topic":"dummytopic","partition":5,"replicas":[1006,1007,1008]},{"topic":"dummytopic","partition":6,"replicas":[1007,1008,1009]},{"topic":"dummytopic","partition":7,"replicas":[1008,1009,1010]},{"topic":"dummytopic","partition":8,"replicas":[1009,1010,1011]},{"topic":"dummytopic","partition":9,"replicas":[1010,1011,1012]},{"topic":"dummytopic","partition":10,"replicas":[1011,1012,1013]},{"topic":"dummytopic","partition":11,"replicas":[1012,1013,1014]},{"topic":"dummytopic","partition":12,"replicas":[1013,1014,1015]},{"topic":"dummytopic","partition":13,"replicas":[1014,1015,1016]},{"topic":"dummytopic","partition":14,"replicas":[1015,1016,1017]},{"topic":"dummytopic","partition":15,"replicas":[1016,1017,1018]},{"topic":"dummytopic","partition":16,"replicas":[1017,1018,1019]},{"topic":"dummytopic","partition":17,"replicas":[1018,1019,1020]},{"topic":"dummytopic","partition":18,"replicas":[1019,1020,1021]},{"topic":"dummytopic","partition":19,"replicas":[1020,1021,1022]},{"topic":"dummytopic","partition":20,"replicas":[1021,1022,1023]},{"topic":"dummytopic","partition":21,"replicas":[1022,1023,1024]},{"topic":"dummytopic","partition":22,"replicas":[1023,1024,1025]},{"topic":"dummytopic","partition":23,"replicas":[1024,1025,1026]},{"topic":"dummytopic","partition":24,"replicas":[1025,1026,1027]},{"topic":"dummytopic","partition":25,"replicas":[1026,1027,1028]},{"topic":"dummytopic","partition":26,"replicas":[1027,1028,1029]},{"topic":"dummytopic","partition":27,"replicas":[1028,1029,1030]},{"topic":"dummytopic","partition":28,"replicas":[1029,1030,1001]},{"topic":"dummytopic","partition":29,"replicas":[1030,1001,1002]}]}
./kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file expand-cluster-reassignment.json --execute
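This executes the cluster reassignment. The first broker in each replicas list becomes the preferred leader of that partition, so once the reassignment has finished and a preferred-leader election has run (automatically when auto.leader.rebalance.enable is true), the partition leaders follow the order you specified.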
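If you cannot or do not want to build the Go tool, the same rotating pattern is easy to generate. A minimal Python sketch (reassignment_json is a hypothetical function, not the tool's code) that reproduces the layout of the JSON shown earlier: partition p gets replica_count consecutive brokers starting at position p of the broker list, wrapping around, so the preferred leaders follow the broker-ID order.

```python
"""Generate a partition reassignment JSON with rotating replica lists (sketch)."""
from __future__ import annotations

import json


def reassignment_json(topic: str, num_partitions: int, broker_ids: list[int],
                      replica_count: int) -> str:
    """Partition p gets replica_count consecutive brokers starting at index p (mod n).

    The first broker of each list is the preferred leader, so leaders follow
    the order of broker_ids.
    """
    n = len(broker_ids)
    if not 1 <= replica_count <= n:
        raise ValueError("replica_count must be between 1 and the number of brokers")
    partitions = [
        {
            "topic": topic,
            "partition": p,
            "replicas": [broker_ids[(p + i) % n] for i in range(replica_count)],
        }
        for p in range(num_partitions)
    ]
    # Compact separators match the one-line style of the example file.
    return json.dumps({"version": 1, "partitions": partitions}, separators=(",", ":"))
```

Writing reassignment_json("dummytopic", 30, list(range(1001, 1031)), 3) to expand-cluster-reassignment.json yields the same assignment as the example above, with partition 28 on [1029, 1030, 1001] and partition 29 on [1030, 1001, 1002].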

Kafka consumer list

I need to find a way to ask Kafka for a list of topics. I know I can do that using the kafka-topics.sh script included in the bin\ directory. Once I have this list, I need all the consumers per topic. I could not find a script in that directory, nor a class in the kafka-consumer-api library that allows me to do it.
The reason behind this is that I need to figure out the difference between the topic's offset and the consumers' offsets.
Is there a way to achieve this? Or do I need to implement this functionality in each of my consumers?
Use kafka-consumer-groups.sh
For example
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
bin/kafka-consumer-groups.sh --describe --group mygroup --bootstrap-server localhost:9092
For Kafka version 0.9.0.0 you can use
./kafka-consumer-groups.sh --list --zookeeper hostname:portnumber
to view the groups you have created. This displays all the consumer group names.
./kafka-consumer-groups.sh --describe --zookeeper hostname:portnumber --group consumer_group_name
to view the details of one group, with the columns
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
I realize that this question is nearly 4 years old now. Much has changed in Kafka since then. This is mentioned above, but only in small print, so I write this for users who stumble over this question as late as I did.
By default, offsets are now stored in a Kafka topic (__consumer_offsets) rather than in ZooKeeper; see "Offsets stored in Zookeeper or Kafka?"
There's a kafka-consumer-groups utility which returns all the information, including the offset of the topic and partition, of the consumer, and even the lag (Remark: When you ask for the topic's offset, I assume that you mean the offsets of the partitions of the topic). In my Kafka 2.0 test cluster:
kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group console-consumer-69763
Consumer group 'console-consumer-69763' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
pytest 0 5 6 1 - - -
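The LAG column is simply LOG-END-OFFSET minus CURRENT-OFFSET, i.e. the difference between the topic's offset and the consumer's offset that the question asks about. As an illustration, a minimal Python sketch (lag_per_topic is a hypothetical name) that recomputes it from the describe output and sums it per topic; it assumes whitespace-separated columns with the header names shown above and skips partitions whose CURRENT-OFFSET is "-" (no committed offset) instead of treating them as zero.

```python
"""Recompute consumer lag from kafka-consumer-groups --describe output (sketch)."""
from __future__ import annotations

from collections import defaultdict


def lag_per_topic(describe_output: str) -> dict[str, int]:
    """Sum LOG-END-OFFSET minus CURRENT-OFFSET over all partitions of each topic."""
    rows = [line.split() for line in describe_output.splitlines() if line.strip()]
    # Locate the header row; status lines such as "... has no active members." come first.
    for i, cols in enumerate(rows):
        if "LOG-END-OFFSET" in cols:
            header, data = cols, rows[i + 1:]
            break
    else:
        raise ValueError("no header row found")
    t, cur, end = (header.index(c) for c in ("TOPIC", "CURRENT-OFFSET", "LOG-END-OFFSET"))
    needed = max(t, cur, end)
    totals: dict[str, int] = defaultdict(int)
    for cols in data:
        # Skip short rows, repeated headers and partitions without a committed offset ("-").
        if len(cols) <= needed or not (cols[cur].isdigit() and cols[end].isdigit()):
            continue
        # Lag = next offset to be written to the partition - group's committed offset.
        totals[cols[t]] += int(cols[end]) - int(cols[cur])
    return dict(totals)
```

With the output above, lag_per_topic returns {'pytest': 1}, matching the LAG column.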
All the consumers per topic
(Replace --zookeeper with --bootstrap-server to get groups stored by newer Kafka clients)
Get all consumers per topic as a table of lines in the form topic<TAB>consumer-group:
for g in $(kafka-consumer-groups.sh --zookeeper <HOST>:2181 --list 2>/dev/null); do
  # keep only rows whose first column is this group, printed as topic<TAB>group
  kafka-consumer-groups.sh --zookeeper <HOST>:2181 --describe --group "$g" 2>/dev/null \
    | awk -v g="$g" '$1 == g {print $2 "\t" $1}'
done > topic-consumer.txt
Make these pairs unique:
sort -u topic-consumer.txt > topic-consumer-u.txt
Get the desired one:
grep -i <TOPIC> topic-consumer-u.txt
I do not see it mentioned here, but a command that I use often, and that gives me a bird's-eye view of all groups, topics, partitions, offsets, lags, consumers, etc., is:
kafka-consumer-groups.bat --bootstrap-server localhost:9092 --describe --all-groups
A sample would look like this:
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
Group Topic 2 7 7 0 <SOME-ID> XXXX <SOME-ID>
:
:
The most important column is LAG: on a healthy platform it should ideally be 0 (or close to 0, or at least low, for high-throughput topics) at all times. So make sure you monitor it! ;-)
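Because the --all-groups output contains a GROUP column, it can also answer the original "all consumers per topic" question and drive a simple lag check. A minimal Python sketch (all function names are hypothetical); it assumes whitespace-separated columns, a header line starting with GROUP that may be repeated for each group, and values without embedded spaces. Rows that do not match the header width are skipped.

```python
"""Group --describe --all-groups output by topic and flag lagging groups (sketch)."""
from __future__ import annotations

from collections import defaultdict


def parse_all_groups(output: str) -> list[dict[str, str]]:
    """Return one dict per data row, keyed by the header columns."""
    rows: list[dict[str, str]] = []
    header: list[str] | None = None
    for line in output.splitlines():
        cols = line.split()
        if not cols:
            continue
        if cols[0] == "GROUP":  # header row, possibly repeated once per group
            header = cols
        elif header and len(cols) == len(header):
            rows.append(dict(zip(header, cols)))
    return rows


def groups_per_topic(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each topic to the sorted names of the consumer groups reading it."""
    topics: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        topics[row["TOPIC"]].add(row["GROUP"])
    return {t: sorted(g) for t, g in topics.items()}


def lagging_groups(rows: list[dict[str, str]], threshold: int) -> dict[str, int]:
    """Total LAG per group, keeping only groups whose lag exceeds threshold."""
    lag: dict[str, int] = defaultdict(int)
    for row in rows:
        if row["LAG"].isdigit():  # "-" means no committed offset yet
            lag[row["GROUP"]] += int(row["LAG"])
    return {g: total for g, total in lag.items() if total > threshold}
```

A monitoring job could, for example, alert whenever lagging_groups(rows, threshold=1000) is non-empty; the threshold is an arbitrary placeholder to tune for your throughput.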
P.S:
An interesting article on how you can monitor the lag can be found here.
Kafka stores all this information in ZooKeeper. You can see all the topic-related information under /brokers/topics. If you want to get all the topics programmatically, you can do that with the ZooKeeper API.
It is explained in detail in the following links:
Tutorialspoint, Zookeeper Programmer guide
High-level consumers are registered in ZooKeeper, so you can fetch a list from ZK, similar to the way kafka-topics.sh fetches the list of topics. I don't think there's a way to collect all consumers: any application that sends a few fetch requests is effectively a "consumer", and you cannot tell whether it is done already.
On the consumer side, there's a JMX metric exposed to monitor the lag. Also, there is Burrow for lag monitoring.
You can also use kafkactl for this:
# get all consumer groups (output as yaml)
kafkactl get consumer-groups -o yaml
# get only consumer groups assigned to a single topic (output as table)
kafkactl get consumer-groups --topic topic-a
Sample output (e.g. as yaml):
name: my-group
protocoltype: consumer
topics:
- topic-a
- topic-b
- topic-c
Disclaimer: I am a contributor to this project.