I need to find a way to ask Kafka for a list of topics. I know I can do that using the kafka-topics.sh script included in the bin/ directory. Once I have this list, I need all the consumers per topic. I could not find a script in that directory, nor a class in the kafka-consumer-api library, that allows me to do it.
The reason behind this is that I need to figure out the difference between the topic's offset and the consumers' offsets.
Is there a way to achieve this? Or do I need to implement this functionality in each of my consumers?
Use kafka-consumer-groups.sh
For example
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
bin/kafka-consumer-groups.sh --describe --group mygroup --bootstrap-server localhost:9092
You can use this for Kafka version 0.9.0.0:
./kafka-consumer-groups.sh --list --zookeeper hostname:portnumber
to view the groups you have created. This will display all the consumer group names.
./kafka-consumer-groups.sh --describe --zookeeper hostname:portnumber --group consumer_group_name
to view the details:
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
I realize that this question is nearly 4 years old now. Much has changed in Kafka since then. This is mentioned above, but only in small print, so I am writing this for users who stumble over this question as late as I did.
By default, offsets are now stored in a Kafka topic (__consumer_offsets), not in ZooKeeper any more; see Offsets stored in Zookeeper or Kafka?
There's a kafka-consumer-groups utility which returns all this information: the offsets of the topic's partitions, the consumer's offsets, and even the lag. (Remark: when you ask for the topic's offset, I assume that you mean the offsets of the partitions of the topic.) In my Kafka 2.0 test cluster:
kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group console-consumer-69763
Consumer group 'console-consumer-69763' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
pytest 0 5 6 1 - - -
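The lag is simply LOG-END-OFFSET minus CURRENT-OFFSET for each partition (6 - 5 = 1 in the output above). A minimal Python sketch that parses this table and recomputes the lag, assuming whitespace-separated columns with the header names shown; the function names are hypothetical:

```python
from typing import Dict, List, Optional


def parse_describe(output: str) -> List[Dict[str, str]]:
    """Parse the table printed by `kafka-consumer-groups --describe`.

    Assumes a whitespace-separated header line containing CURRENT-OFFSET and
    LOG-END-OFFSET; lines with a different column count are ignored.
    """
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "CURRENT-OFFSET" in fields and "LOG-END-OFFSET" in fields:
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # e.g. "Consumer group '...' has no active members."
        rows.append(dict(zip(header, fields)))
    return rows


def lag(row: Dict[str, str]) -> Optional[int]:
    """LOG-END-OFFSET - CURRENT-OFFSET, or None when an offset is shown as "-"."""
    current, end = row["CURRENT-OFFSET"], row["LOG-END-OFFSET"]
    if current == "-" or end == "-":
        return None
    return int(end) - int(current)
```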
All the consumers per topic
(Replace --zookeeper with --bootstrap-server to get groups stored by newer Kafka clients)
Get all consumers per topic as a table of topic<TAB>consumer:
for t in `kafka-consumer-groups.sh --zookeeper <HOST>:2181 --list 2>/dev/null`; do
echo $t | xargs -I {} sh -c "kafka-consumer-groups.sh --zookeeper <HOST>:2181 --describe --group {} 2>/dev/null | grep ^{} | awk '{print \$2\"\t\"\$1}' "
done > topic-consumer.txt
Make these pairs unique:
cat topic-consumer.txt | sort -u > topic-consumer-u.txt
Get the desired one:
grep -i <TOPIC> topic-consumer-u.txt
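The same topic/consumer pairing can be done in a few lines of Python once you have the describe output per group. This is a sketch, assuming the old (ZooKeeper-era) format where each data row starts with the group name followed by the topic; the function names are hypothetical. Unlike `grep ^{}` (a prefix match) and `grep -i` (a substring match), it matches the group and topic exactly:

```python
from typing import Dict, List, Set, Tuple


def topic_group_pairs(describe_by_group: Dict[str, str]) -> List[Tuple[str, str]]:
    """Unique, sorted (topic, group) pairs, like the loop plus `sort -u`.

    describe_by_group maps a group name to the text printed by
    `kafka-consumer-groups.sh --describe --group <name>` (old format:
    GROUP TOPIC PARTITION ... per row).
    """
    pairs: Set[Tuple[str, str]] = set()
    for group, output in describe_by_group.items():
        for line in output.splitlines():
            fields = line.split()
            # Keep rows whose first column is exactly this group (skips headers).
            if len(fields) >= 2 and fields[0] == group:
                pairs.add((fields[1], group))
    return sorted(pairs)


def consumers_of(pairs: List[Tuple[str, str]], topic: str) -> List[str]:
    """Groups consuming `topic`, compared case-insensitively."""
    return [group for t, group in pairs if t.lower() == topic.lower()]
```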
I do not see it mentioned here, but a command that I use often, and that gives me a bird's-eye view of all groups, topics, partitions, offsets, lags, consumers, etc., is:
kafka-consumer-groups.bat --bootstrap-server localhost:9092 --describe --all-groups
A sample would look like this:
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
Group Topic 2 7 7 0 <SOME-ID> XXXX <SOME-ID>
:
:
The most important column is LAG: for a healthy platform it should ideally be 0 (or close to 0, or at least low for high-throughput topics) at all times. So make sure you monitor it! ;-)
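For a simple check on top of the --all-groups output, the LAG column can be summed per group and compared against a threshold. A minimal Python sketch, assuming each table starts with a header line beginning with GROUP and that rows have the same number of columns as the header; the threshold value and function names are hypothetical:

```python
from collections import defaultdict
from typing import Dict


def total_lag_per_group(output: str) -> Dict[str, int]:
    """Sum the LAG column per GROUP from `--describe --all-groups` output.

    Rows whose LAG is "-" (no committed offset yet) are skipped.
    """
    totals: Dict[str, int] = defaultdict(int)
    header = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "GROUP":
            header = fields  # a new table (one per group) starts here
            continue
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        if row["LAG"] != "-":
            totals[row["GROUP"]] += int(row["LAG"])
    return dict(totals)


def lagging_groups(output: str, threshold: int) -> Dict[str, int]:
    """Groups whose total lag is above the given alert threshold."""
    return {g: lag for g, lag in total_lag_per_group(output).items() if lag > threshold}
```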
P.S:
An interesting article on how you can monitor the lag can be found here.
Kafka stores all the information in ZooKeeper. You can see all the topic-related information under /brokers/topics. If you wish to get all the topics programmatically, you can do that using the ZooKeeper API.
It is explained in detail in the links below:
Tutorialspoint, Zookeeper Programmer guide
High level consumers are registered into Zookeeper, so you can fetch a list from ZK, similarly to the way kafka-topics.sh fetches the list of topics. I don't think there's a way to collect all consumers; any application sending in a few consume requests is actually a "consumer", and you cannot tell whether they are done already.
On the consumer side, there's a JMX metric exposed to monitor the lag. Also, there is Burrow for lag monitoring.
You can also use kafkactl for this:
# get all consumer groups (output as yaml)
kafkactl get consumer-groups -o yaml
# get only consumer groups assigned to a single topic (output as table)
kafkactl get consumer-groups --topic topic-a
Sample output (e.g. as yaml):
name: my-group
protocoltype: consumer
topics:
- topic-a
- topic-b
- topic-c
Disclaimer: I am a contributor to this project.
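The original question asks for consumers per topic, whereas this output lists topics per group. A minimal Python sketch (hypothetical function name) that inverts the mapping, assuming the YAML has already been loaded (for example with a YAML library) into dicts with the name and topics fields shown in the sample output:

```python
from typing import Dict, Iterable, List


def groups_per_topic(groups: Iterable[dict]) -> Dict[str, List[str]]:
    """Invert consumer-group records into topic -> sorted group names.

    Each record is assumed to look like
    {"name": "my-group", "topics": ["topic-a", "topic-b"]}.
    """
    result: Dict[str, List[str]] = {}
    for group in groups:
        for topic in group.get("topics") or []:  # tolerate missing/empty topics
            result.setdefault(topic, []).append(group["name"])
    return {topic: sorted(names) for topic, names in sorted(result.items())}
```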
Related
I am trying to find ways to get current usage statistics for my Kafka cluster. I am looking to collect the following information:
1. Number of topics in the Kafka cluster
2. Number of partitions per Kafka broker
3. Number of active consumers and producers
4. Number of client connections per Kafka broker
5. Number of messages on each partition, size on disk, etc.
6. Lagging replicas, consumer lag, etc.
7. Active consumer groups
I would also like any other statistics that can and should be collected; currently I am looking at collecting the above stats.
I can get 1 and 2 using ZooKeeper utilities, but I am lost on the rest. I have looked at the MBeans in JConsole but didn't find anything about the above. I also tried JmxTool to get these MBeans using a regex-based expression, but that also didn't work.
I am using Kafka v2.1 and the new consumer API, so ZooKeeper doesn't have any information about consumers.
Any pointers would be great help!
You might as well use https://github.com/yahoo/kafka-manager or https://github.com/linkedin/cruise-control to get this information.
There are scripts under $KAFKA_HOME/bin which can help you.
Number of topics in kafka cluster
./kafka-topics.sh --zookeeper localhost:2181 --list
Number of partitions per kafka broker
./kafka-topics.sh --zookeeper localhost:2181 --describe
Number of messages on each partition, size of disk etc.
./kafka-log-dirs.sh --describe --bootstrap-server localhost:9092
Lagging replicas, consumer lag etc.
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group $GROUP_NAME --describe
Active consumer groups
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
Number of active consumers and producers
You can't get the active producers; see Know existing producers for a kafka topic.
Number of client connections per kafka broker
./
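For the disk-size part, kafka-log-dirs.sh prints its result as a JSON document. A minimal Python sketch that collects partition sizes per broker, assuming the Kafka 2.x layout (brokers → logDirs → partitions, with partition and size fields) and that the JSON sits on its own line after any status lines; please verify the field names against your Kafka version. The function name is hypothetical:

```python
import json
from typing import Dict


def disk_usage(output: str) -> Dict[int, Dict[str, int]]:
    """Map broker id -> {partition name: size in bytes} from kafka-log-dirs output.

    Only the last line starting with "{" is parsed, so status lines printed
    before the JSON are ignored. Sizes of the same partition in several log
    dirs on one broker are added together.
    """
    json_line = [line for line in output.splitlines() if line.startswith("{")][-1]
    report = json.loads(json_line)
    usage: Dict[int, Dict[str, int]] = {}
    for broker in report["brokers"]:
        per_partition = usage.setdefault(broker["broker"], {})
        for log_dir in broker["logDirs"]:
            for p in log_dir["partitions"]:
                name = p["partition"]  # e.g. "test-0" (topic-partition)
                per_partition[name] = per_partition.get(name, 0) + p["size"]
    return usage
```

The number of partition replicas per broker is then `len(usage[broker_id])`, and its disk usage is `sum(usage[broker_id].values())`.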
In my Kafka cluster there are more than 2k topics and each topic has 5 partitions. I want to list only the partitions that have no leader.
I can check each topic using the syntax below:
kafka-topics.sh --describe --topic <topic_name> --zookeeper <zookeeper_ip>:port
But the problem is that there are 2k+ topics, so it can't be done manually. I could also write a script to loop over each topic and get the partitions with no leader, but I am interested in a more efficient way to get this information.
Using kafka-topics.sh you can specify the --unavailable-partitions flag to only list partitions that currently don't have a leader and hence cannot be used by Consumers or Producers.
For example:
kafka-topics.sh --describe --unavailable-partitions --zookeeper <zookeeper_ip>:port
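With 2k+ topics it can help to summarize the describe output per topic. A minimal Python sketch, assuming per-partition lines of the form "Topic: t  Partition: 0  Leader: 1  Replicas: ...  Isr: ..." and that a missing leader is printed as -1 or none (this differs between Kafka versions); the function name is hypothetical:

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches per-partition lines; topic summary lines ("PartitionCount: ...") do not match.
PARTITION_LINE = re.compile(r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)")


def leaderless_partitions(describe_output: str) -> Dict[str, List[int]]:
    """Group partitions without a leader by topic."""
    result: Dict[str, List[int]] = defaultdict(list)
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue
        topic, partition, leader = match.groups()
        if leader in ("-1", "none"):
            result[topic].append(int(partition))
    return dict(result)
```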
I've spent some hours trying to figure out what was going on but didn't manage to find the solution.
Here is my set up on a single machine:
1 zookeeper running
3 broker running (on port 9092/9093/9094)
1 topic with 3 partitions and a replication factor of 3 (each partition is properly assigned across the brokers)
I'm using the Kafka console producer to insert messages. If I check the replication offset (cat replication-offset-checkpoint), I see that my messages are properly ingested by Kafka.
Now I use the kafka console consumer (new):
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic testTopicPartitionned2
I don't see anything consumed. I tried deleting my logs folders (/tmp/kafka-logs-[1,2,3]) and creating new topics; still nothing.
However when I use the old kafka consumer:
sudo bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testTopicPartitionned2
I can see my messages.
Am I missing something big here to make this new consumer work?
Thanks in advance.
Check which value the consumer is using for the auto.offset.reset property.
This will affect what a consumer group without a previously committed offset will do in terms of setting where to start reading messages from a partition.
Check the Kafka docs for more on this.
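The decision can be modelled in a few lines. This is a simplified Python sketch of the documented semantics, not the client's actual code: a valid committed offset wins; otherwise auto.offset.reset picks the log start ("earliest"), the log end ("latest", the Java consumer's default) or fails ("none"). With "latest", a brand-new group started after the messages were produced sees nothing. The function name is hypothetical:

```python
from typing import Optional


def starting_offset(committed: Optional[int], log_start: int, log_end: int,
                    auto_offset_reset: str = "latest") -> int:
    """Where a consumer group starts reading one partition (simplified model)."""
    # A committed offset that still lies inside the log is used as-is.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No (valid) committed offset: the reset policy decides.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise ValueError("no valid committed offset and auto.offset.reset=none")
```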
Try providing all your brokers to the --bootstrap-server argument to see if you notice any difference:
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --from-beginning --topic testTopicPartitionned2
Also, your topic name is rather long. I assume you've already made sure you provide the correct topic name.
For reading all partitions in a topic:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic myTopic --from-beginning
How can I consume a particular partition of the topic? (for instance, with partition key 13)
And how do I produce a message to a partition with a particular partition key? Is it possible?
You can't do that using the console consumer and producer, but you can using higher-level clients (in any language that works for you).
For example, you may use the assign method to manually assign a specific topic-partition to consume (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L906)
You may use a custom Partitioner to override the partitioning logic where you will decide manually how to partition your messages (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java#L206-L208)
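In the Java client such logic lives in a class implementing org.apache.kafka.clients.producer.Partitioner, registered through the partitioner.class setting. The rule itself can be sketched in Python; the rule below is hypothetical (it is not Kafka's default partitioner): numeric keys such as "13" are pinned to int(key) % partitions, other keys are hashed with CRC32 so a given key always lands on the same partition:

```python
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int) -> int:
    """Hypothetical custom partitioning rule, not Kafka's built-in one."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        return 0  # deliberately simple choice for records without a key
    if key.isdigit():
        return int(key) % num_partitions  # e.g. b"13" -> partition 13 of 20
    return zlib.crc32(key) % num_partitions  # stable hash for other keys
```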
With the many clients that are available you can specify the partition number just like serejja has stated.
Also look into https://github.com/cakesolutions/scala-kafka-client which uses actors and provides multiple modes for manual partitions and offsets.
If you want to do the same on the terminal, I suggest using kafkacat. (https://github.com/edenhill/kafkacat)
My personal choice during development.
You can do things like
kafkacat -b localhost:9092 -f 'Topic %t[%p], offset::: %o, data: %s key: %k\n' -t testtopic
And for a specific partition, you just need to use the -p flag.
Console producer and consumer do not provide this flexibility. You could achieve this through Kafka APIs.
You could manually assign a partition to a consumer using the assign() operation (KafkaConsumer/Assign). This disables group rebalancing, so use it very carefully.
You could specify the partition in the record you send with KafkaProducer. If it is not specified, the partition is chosen according to the Partitioner policy.
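The precedence described above can be modelled as a small Python function: an explicit partition on the record is used as-is (after checking it exists), otherwise the configured partitioner decides. This is a sketch of the rule, not the producer's internals; the names are hypothetical:

```python
from typing import Callable, Optional

# A partitioner takes (key, number of partitions) and returns a partition index.
Partitioner = Callable[[Optional[bytes], int], int]


def target_partition(explicit: Optional[int], key: Optional[bytes],
                     num_partitions: int, partitioner: Partitioner) -> int:
    """Explicit partition wins; otherwise defer to the partitioner."""
    if explicit is not None:
        if not 0 <= explicit < num_partitions:
            raise ValueError(f"partition {explicit} does not exist")
        return explicit
    return partitioner(key, num_partitions)
```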
How can I consume particular partition of the topic? (for instance with partition key 13)
There is a flag called --partition in kafka-console-consumer:
--partition <Integer: partition>   The partition to consume from. Consumption starts from the end of the partition unless '--offset' is specified.
The command is as follows:
bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic test --partition 0 --from-beginning
How do I get, set, or reset the offset of a Kafka Connect connector/task/sink?
I can use the /usr/bin/kafka-consumer-groups tool which runs kafka.admin.ConsumerGroupCommand to see the offsets for all my regular Kafka consumer groups. However, Kafka Connect tasks and groups do not show up with this tool.
Similarly, I can use the zookeeper-shell to connect to Zookeeper and I can see zookeeper entries for regular Kafka consumer groups, but not for Kafka Connect sinks.
As of 0.10.0.0, Connect doesn't provide an API for managing offsets. It's something we want to improve in the future, but it's not there yet. The ConsumerGroupCommand would be the right tool to manage offsets for sink connectors. Note that source connector offsets are stored in a special offsets topic for Connect (they aren't like normal Kafka offsets, since they are defined by the source system; see offset.storage.topic in the worker configuration docs). Since sink connectors use the new consumer, they won't store their offsets in ZooKeeper; all modern clients use native Kafka-based offset storage. The ConsumerGroupCommand can work with these offsets; you just need to pass the --new-consumer option.
You can't set offsets directly, but you can use kafka-consumer-groups.sh together with kafka-console-consumer.sh to "scroll" the feed forward.
The consumer group of your connector is named connect-*CONNECTOR NAME*, but you can double-check:
unset JMX_PORT; ./bin/kafka-consumer-groups.sh --bootstrap-server *KAFKA HOSTS* --list
To view current offset:
unset JMX_PORT; ./bin/kafka-consumer-groups.sh --bootstrap-server *KAFKA HOSTS* --group connect-*CONNECTOR NAME* --describe
To move the offset forward:
unset JMX_PORT; ./bin/kafka-console-consumer.sh --bootstrap-server *KAFKA HOSTS* --topic *TOPIC* --max-messages 10000 --consumer-property group.id=connect-*CONNECTOR NAME* > /dev/null
I suppose you can move the offset backward as well by first deleting the consumer group with the --delete flag.
Don't forget to pause and resume your connector via Kafka Connect REST API.
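The Connect REST API exposes PUT /connectors/{name}/pause and PUT /connectors/{name}/resume for this. A minimal Python sketch that builds these requests with the standard library; the base URL (localhost on the usual default port 8083) and the function name are assumptions to adapt:

```python
import urllib.request
from urllib.parse import quote


def connector_request(action: str, connector: str,
                      base_url: str = "http://localhost:8083") -> urllib.request.Request:
    """Build (but do not send) a Connect REST call to pause or resume a connector."""
    if action not in ("pause", "resume"):
        raise ValueError("action must be 'pause' or 'resume'")
    # Percent-encode the connector name so it is safe as a single path segment.
    url = f"{base_url}/connectors/{quote(connector, safe='')}/{action}"
    return urllib.request.Request(url, method="PUT")
```

To actually send it, pass the request to `urllib.request.urlopen(...)` before moving the offsets (pause) and again afterwards (resume).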
In my case (testing reading files into a producer and consuming them in the console, all locally), I saw this in the producer output:
offset.storage.file.filename=/tmp/connect.offsets
I wanted to open it, but it is binary, with barely recognizable characters.
I deleted it (renaming it also works), and then I could write into the same file and get the file content from the consumer again. You have to restart the console producer for this to take effect: on startup it attempts to read the offset file and, if it is not there, creates a new one, so the offset is reset.
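If you want to reset it without deletion, you can use the following (without --execute the tool only prints a dry-run plan, and the group must have no active members):
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group <group-name> --reset-offsets --to-earliest --topic <topic_name> --execute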
You can check all group names by:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
and check details of each group:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group <group_name> --describe
In a production environment where the offsets are managed by ZooKeeper (the old consumer), more steps (and caution) are needed. You can refer to these pages:
https://metabroadcast.com/blog/resetting-kafka-offsets
https://community.hortonworks.com/articles/81357/manually-resetting-offset-for-a-kafka-topic.html
Steps:
kafka-topics --list --zookeeper localhost:2181
kafka-run-class kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic vital_signs --time -1   # -1 for the latest (largest) offset, -2 for the earliest (smallest)
In zookeeper-shell: set /consumers/{yourConsumerGroup}/offsets/{yourFancyTopic}/{partitionId} {newOffset}
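When a topic has many partitions, the zookeeper-shell `set` commands can be generated from the GetOffsetShell output. A minimal Python sketch, assuming one "topic:partition:offset" line per partition (the default single offset per line) and the old ZooKeeper-based consumer path shown above; the function names are hypothetical:

```python
from typing import Dict, List, Tuple


def parse_get_offset_shell(output: str) -> Dict[Tuple[str, int], int]:
    """Parse GetOffsetShell lines of the form "topic:partition:offset"."""
    offsets: Dict[Tuple[str, int], int] = {}
    for line in output.splitlines():
        parts = line.strip().split(":")
        if len(parts) != 3:
            continue  # skip anything that is not topic:partition:offset
        topic, partition, offset = parts
        offsets[(topic, int(partition))] = int(offset)
    return offsets


def zk_set_commands(group: str, offsets: Dict[Tuple[str, int], int]) -> List[str]:
    """One zookeeper-shell `set` command per partition, sorted by topic/partition."""
    return [
        f"set /consumers/{group}/offsets/{topic}/{partition} {offset}"
        for (topic, partition), offset in sorted(offsets.items())
    ]
```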