Getting Kafka usage details - apache-kafka

I am trying to find ways to get current usage statistics for my Kafka cluster. I am looking to collect the following information:
Number of topics in kafka cluster
Number of partitions per kafka broker
Number of active consumers and producers
Number of client connections per kafka broker
Number of messages on each partition, size of disk etc.
Lagging replicas, consumer lag etc.
Active consumer groups
Any other statistics that can and should be collected; for now I am looking at collecting the stats above.
I can get 1 and 2 using the ZooKeeper utilities, but I am lost on the rest. I have looked at the MBeans in JConsole but didn't find anything about the above. I also tried JmxTool to get these MBeans using a regex-based expression, but that didn't work either.
I am using Kafka v2.1 with the new consumer API, so ZooKeeper doesn't have any information about consumers.
Any pointers would be a great help!

Might as well use https://github.com/yahoo/kafka-manager or https://github.com/linkedin/cruise-control to get this information.
There are scripts under $KAFKA_HOME/bin which can help you.
Number of topics in kafka cluster
./kafka-topics.sh --zookeeper localhost:2181 --list
Number of partitions per kafka broker
./kafka-topics.sh --zookeeper localhost:2181 --describe
Number of messages on each partition, size of disk etc.
./kafka-log-dirs.sh --describe --bootstrap-server localhost:9092
Lagging replicas, consumer lag etc.
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group $GROUP_NAME --describe
Active consumer groups
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
Number of active consumers and producers
You can't get the active producers; see "Know existing producers for a kafka topic". Active consumers show up in the --describe output for their group.
Number of client connections per kafka broker
As far as I know there is no CLI script for this; the broker exposes connection counts over JMX instead (the connection-count attribute under kafka.server:type=socket-server-metrics).
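For example, the bundled JmxTool can dump those metrics. This is only a sketch: it assumes JMX is enabled on the broker (e.g. by starting it with JMX_PORT=9999), and you would adjust the host, port and object-name pattern to your setup:
./kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi --object-name 'kafka.server:type=socket-server-metrics,*' --reporting-interval 5000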

Related

How to retrieve Kafka Consumer Configs

I have several consumers that connect to a Kafka cluster that I do not have control over. At the same time, I would like to have visibility into how those consumers are configured.
Is there an API to list all consumers (if there is one for publishers, it is an added benefit) and then read all their configs?
I am talking about these consumer settings:
https://docs.confluent.io/current/installation/configuration/consumer-configs.html#cp-config-consumer
This is not possible, as most of those settings are configured on the consumer side only and are not pushed to the brokers or any topic.
It's possible however to get a high-level description for a given consumer group:
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group consumer-group

Kafka Consumer does not receive data when one of the brokers is down

Kafka Quickstart
Using Kafka v2.1.0 on RHEL v6.9
Consumer fails to receive data when one of the Kafka brokers is down.
Steps performed:
1. Start zookeeper
2. Start Kafka-Server0 (localhost:9092, kafkalogs1)
3. Start Kafka-Server1 (localhost:9094, kafkalog2)
4. Create topic "test1", num of partitions = 1, replication factor = 2
5. Run producer for topic "test1"
6. Run consumer
7. Send messages from the producer
8. Receive messages on the consumer side.
All the above steps worked without any issues.
When I shut down Kafka-Server0, the consumer stops getting data from the producer.
When I bring back up Kafka-Server0, the consumer starts to get messages from where it left off.
These are the commands used
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test1
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test1
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic test1
The behavior is the same (no message received on the consumer side) when I run the consumer with two servers specified in the --bootstrap-server option.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9094 --topic test1
Any idea why the consumer stops getting messages when server0 is down even though the replication factor for the topic test1 was set to 2?
There is a similar question already but it was not answered completely
Kafka 0.10 quickstart: consumer fails when "primary" broker is brought down
If the offsets topic is unavailable, you cannot consume.
Look in server.properties for these settings, read the comment above them, and increase the values accordingly (this only applies if the topic doesn't already exist):
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 (such as 3) is recommended to ensure availability.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
According to your previous question, it looks like it only has one replica.
See how you can increase the replication factor for an existing topic.
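To confirm this on your cluster, you can describe the internal offsets topic and check its ReplicationFactor and Isr columns (a sketch, assuming the quickstart ZooKeeper on localhost:2181):
bin/kafka-topics.sh --describe --topic __consumer_offsets --zookeeper localhost:2181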
In the initial versions of Kafka, offsets were managed in ZooKeeper, but Kafka has evolved considerably over time, introducing a lot of new features. Kafka now manages offsets in an internal topic, __consumer_offsets.
Think of a scenario where you create a topic with a replication factor of 1. If the broker goes down, the data exists only on that Kafka node, which is down, so you can't get at it. The same analogy applies to the __consumer_offsets topic.
You need to revisit server.properties to get the behaviour you are expecting. If you still want to consume the messages from the replica partition, you may need to restart the console consumer with --from-beginning.
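For example, with the quickstart setup from the question (the surviving broker on localhost:9094), that restart would look roughly like this; --from-beginning re-reads the partition from its first offset:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9094 --topic test1 --from-beginning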

Kafka List all partition with no leader

In my Kafka cluster there are more than 2k topics and each topic has 5 partitions. I want to list only the partitions which have no leader.
I can check each topic using the syntax below:
kafka-topics.sh --describe --topic <topic_name> --zookeeper <zookeeper_ip>:port
But the problem is that there are 2k+ topics, so this can't be done manually. I could also write a script that loops over each topic and reports the partitions with no leader, but I am interested in a more efficient way to get this information.
Using kafka-topics.sh you can specify the --unavailable-partitions flag to only list partitions that currently don't have a leader and hence cannot be used by Consumers or Producers.
For example:
kafka-topics.sh --describe --unavailable-partitions --zookeeper <zookeeper_ip>:port
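A closely related check, if you also care about partitions that still have a leader but are missing replicas, is the --under-replicated-partitions flag (same tool, same connection details as above):
kafka-topics.sh --describe --under-replicated-partitions --zookeeper <zookeeper_ip>:port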

Are broker nodes in kafka cluster configured to handle number of partition?

Kafka places partitions and replicas such that the brokers with the fewest existing partitions are used first. Does that mean brokers are pre-configured to handle a certain number of partitions?
When you create a topic, you set the number of partitions.
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
Also, there is a num.partitions parameter you can use. (This is used when a topic is created automatically.)
A broker can host as many partitions as you like, as long as it has enough disk space, memory, and network bandwidth.
If you look inside the broker's data directory (log.dirs), you will see a folder test-0 for the single partition of test. If you create a topic with three partitions, you will see two more folders, test-1 and test-2.
Each partition directory contains an index file, a timeindex file, and a log file; the log file holds the Kafka data for that partition.
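For illustration, a listing of one partition's directory might look roughly like this (the path depends on your log.dirs setting, and there may also be checkpoint/snapshot files alongside these):
ls /tmp/kafka-logs/test-0
00000000000000000000.index  00000000000000000000.log  00000000000000000000.timeindex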

Kafka consumer list

I need to find a way to ask Kafka for a list of topics. I know I can do that using the kafka-topics.sh script included in the bin/ directory. Once I have this list, I need all the consumers per topic. I could not find a script in that directory, nor a class in the kafka-consumer-api library, that allows me to do it.
The reason behind this is that I need to figure out the difference between the topic's offset and the consumers' offsets.
Is there a way to achieve this? Or do I need to implement this functionality in each of my consumers?
Use kafka-consumer-groups.sh
For example
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
bin/kafka-consumer-groups.sh --describe --group mygroup --bootstrap-server localhost:9092
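If you also want the topic's own end offsets independently of any consumer group (so you can compute the difference yourself), the GetOffsetShell tool prints them. A sketch, assuming a broker on localhost:9092 and a hypothetical topic named mytopic; --time -1 means latest offsets, -2 means earliest:
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic mytopic --time -1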
You can use this with Kafka version 0.9.0.0 to view the groups you have created; it displays all the consumer group names:
./kafka-consumer-groups.sh --list --zookeeper hostname:portnumber
To view the details of a group:
./kafka-consumer-groups.sh --describe --zookeeper hostname:portnumber --group consumer_group_name
The output shows GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER.
I realize that this question is nearly 4 years old now. Much has changed in Kafka since then. This is mentioned above, but only in small print, so I write this for users who stumble over this question as late as I did.
Offsets by default are now stored in a Kafka Topic (not in Zookeeper any more), see Offsets stored in Zookeeper or Kafka?
There's a kafka-consumer-groups utility that returns all of this information, including the topic, partition, and offsets of the consumer, and even the lag. (Remark: when you ask for the topic's offset, I assume you mean the offsets of the partitions of the topic.) In my Kafka 2.0 test cluster:
kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group console-consumer-69763
Consumer group 'console-consumer-69763' has no active members.
TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
pytest  0          5               6               1    -            -     -
All the consumers per topic
(Replace --zookeeper with --bootstrap-server to get groups stored by newer Kafka clients)
Get all consumer-per-topic pairs as a table of topic<TAB>consumer:
for t in `kafka-consumer-groups.sh --zookeeper <HOST>:2181 --list 2>/dev/null`; do
echo $t | xargs -I {} sh -c "kafka-consumer-groups.sh --zookeeper <HOST>:2181 --describe --group {} 2>/dev/null | grep ^{} | awk '{print \$2\"\t\"\$1}' "
done > topic-consumer.txt
Make these pairs unique:
cat topic-consumer.txt | sort -u > topic-consumer-u.txt
Get the desired one:
less topic-consumer-u.txt | grep -i <TOPIC>
I do not see it mentioned here, but a command that I use often, and that gives me a bird's-eye view of all groups, topics, partitions, offsets, lags, consumers, etc., is:
kafka-consumer-groups.bat --bootstrap-server localhost:9092 --describe --all-groups
A sample would look like this:
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
Group Topic 2 7 7 0 <SOME-ID> XXXX <SOME-ID>
:
:
The most important column is LAG: for a healthy platform it should ideally be 0 (or close to 0, or at least a low number for high-throughput topics) at all times. So make sure you monitor it! ;-)
P.S:
An interesting article on how you can monitor the lag can be found here.
Kafka stores topic metadata in ZooKeeper; you can see all the topic-related information under brokers -> topics. If you wish to get all the topics programmatically, you can do that using the ZooKeeper API.
It is explained in detail in the links below:
Tutorialspoint, ZooKeeper Programmer Guide
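For a quick look without writing any code, the zookeeper-shell.sh script shipped with Kafka can list the same znodes; a sketch, assuming ZooKeeper on localhost:2181:
bin/zookeeper-shell.sh localhost:2181 ls /brokers/topics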
High-level consumers are registered in ZooKeeper, so you can fetch a list from ZK, similar to the way kafka-topics.sh fetches the list of topics. I don't think there's a way to collect all consumers; any application sending in a few consume requests is actually a "consumer", and you cannot tell whether they are done already.
On the consumer side, there's a JMX metric exposed to monitor the lag. Also, there is Burrow for lag monitoring.
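For reference (exact names can vary by client version): on the Java consumer that lag metric is typically records-lag-max on the MBean kafka.consumer:type=consumer-fetch-manager-metrics,client-id=<client-id>, which you can read in JConsole or over the consumer's JMX port.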
You can also use kafkactl for this:
# get all consumer groups (output as yaml)
kafkactl get consumer-groups -o yaml
# get only consumer groups assigned to a single topic (output as table)
kafkactl get consumer-groups --topic topic-a
Sample output (e.g. as yaml):
name: my-group
protocoltype: consumer
topics:
- topic-a
- topic-b
- topic-c
Disclaimer: I am a contributor to this project.