Apache Kafka installed on Mac (Intel).
Single local producer and single local consumer.
One topic with 3 partitions and a replication factor of 1 is created:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic animal --partitions 3 --replication-factor 1
Producer code:
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic animal
Producer Messages:
>alligator
>crocodile
>tiger
When producing messages (manually via producer-console), all go into the same partition. Shouldn't they get distributed across partitions?
I've tried with 3 records (as above), but they get sent to 1 partition only. Checked within tmp/kafka-logs/topic-0/00**00.log
Other logs in topic- are empty.
I've tried with tens of records, but no luck.
I even increased the default partition configuration (num.partitions=3) within 'config/server.properties', but no luck.
I've also tried with different topics, but no luck.
Starting with Kafka 2.4, the default partitioner for records without a key changed from round-robin to sticky: it sticks to the same partition (pun intended) for an entire batch.
With my Kafka version, kafka-console-producer uses a default batch size of 16384 bytes, so the partition only changes once you have produced enough messages to fill that batch.
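To make the batching effect concrete, here is a deliberately simplified Python model of a sticky partitioner. It is not Kafka's implementation: the class and function names, the random choice of the next partition, and the byte accounting are assumptions for illustration, and it ignores settings such as linger.ms. Only the 16384-byte batch size comes from the answer above.

```python
import random


class StickyPartitionerModel:
    """Toy model: stay on one partition until the current batch is full."""

    def __init__(self, num_partitions, batch_size=16384, seed=0):
        self.num_partitions = num_partitions
        self.batch_size = batch_size      # bytes per batch (value taken from the answer)
        self.rng = random.Random(seed)    # seeded so runs are repeatable
        self.current = self.rng.randrange(num_partitions)
        self.bytes_in_batch = 0

    def partition_for(self, value: bytes) -> int:
        # Move to a different partition once this record would overflow the batch.
        if self.bytes_in_batch + len(value) > self.batch_size:
            choices = [p for p in range(self.num_partitions) if p != self.current]
            if choices:
                self.current = self.rng.choice(choices)
            self.bytes_in_batch = 0
        self.bytes_in_batch += len(value)
        return self.current


def partitions_used(values, num_partitions=3, batch_size=16384):
    """Return the partition chosen for each value, in order."""
    model = StickyPartitionerModel(num_partitions, batch_size)
    return [model.partition_for(v) for v in values]


if __name__ == "__main__":
    # Three tiny records never fill a batch, so they share one partition.
    print(partitions_used([b"alligator", b"crocodile", b"tiger"]))
    # 400 records of 100 bytes exceed one batch, so more partitions appear.
    print(sorted(set(partitions_used([b"x" * 100] * 400))))
```

In this model the three animal names add up to 23 bytes, far below the batch size, which matches the behaviour described in the question.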
If a producer sends messages with the same key, they are guaranteed to land on the same partition. So in your case, if you want the messages spread across different partitions, publish them with different keys. (Keys are mapped to partitions by hash, so two different keys can still end up on the same partition.)
To do that, set the property below.
--property parse.key=true
Use the command below to produce records with a key.
kafka-console-producer --broker-list 127.0.0.1:9092 --topic first_topic --property parse.key=true --property key.separator=,
> key1,value1
> key2,value2
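The key-to-partition mapping can be sketched in a few lines of Python. This is an illustration only: Kafka's default partitioner hashes the key bytes with murmur2, whereas this sketch substitutes zlib.crc32 as a stand-in, so the partition numbers it prints will not match a real cluster. The function names are hypothetical.

```python
import zlib


def parse_keyed_line(line: str, separator: str = ","):
    """Split 'key1,value1' at the first separator (whitespace trimmed for convenience)."""
    key, value = line.split(separator, 1)
    return key.strip(), value.strip()


def partition_for_key(key: str, num_partitions: int) -> int:
    """Hash the key bytes and take the result modulo the partition count."""
    # Stand-in hash: Kafka uses murmur2 here, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    for line in ["key1,value1", "key2,value2", "key1,value3"]:
        key, value = parse_keyed_line(line)
        # Records with the same key always print the same partition.
        print(key, value, "-> partition", partition_for_key(key, 3))
```

The point of the sketch is the shape of the rule: the same key always yields the same partition, while different keys are only spread out statistically.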
I want to delete all messages in the __consumer_offsets topic that start with a given key (that is, reset one particular consumer group without affecting the rest).
Is there a way to do this?
Kafka comes with a ConsumerGroupCommand tool. You can find some information in the Kafka documentation.
To reset one particular consumer group ("myConsumerGroup") without affecting the rest, you can use
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group myConsumerGroup --topic topic1 --to-latest
Without --execute the tool only previews the new offsets; add --execute to apply them, and make sure the group has no active members while you do. Depending on your requirements, you can also reset the offsets for individual partitions of the topic. The tool's help output and the documentation explain the options.
Regarding Kafka topic creation: my understanding is that a Kafka cluster can have several brokers/nodes/servers, and each broker can have one or more topics configured. A topic can live on one or more brokers, depending on the partitions specified when the topic is created. Is there any way to tell Kafka on which broker(s) a topic and its partitions should be created?
Regards,
Lokesh
When creating the topic, you can either specify just the number of partitions and replicas and let Kafka distribute them, or specify the assignment directly: which partition and replica goes where.
If you are using the kafka-topics.sh script that ships with Kafka, you can use the --replica-assignment option. Each comma-separated entry describes one partition, and the colon-separated numbers in it are the broker IDs that hold its replicas. For example:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic topic1 --replica-assignment 0:1:2,0:1:2,0:1:2
If the topic already exists, you can use the kafka-reassign-partitions.sh tool to change the assignment.
This page might contain some more details about this: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-2.2CreateTopics
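For an existing topic, kafka-reassign-partitions.sh takes its plan as a JSON file. The following Python sketch converts a --replica-assignment style string into that kind of document. The helper names are hypothetical, and the JSON layout (a "version" field plus a "partitions" list of topic, partition and replicas) reflects my understanding of the tool's input format, so check it against the documentation for your Kafka version before using it.

```python
import json


def parse_replica_assignment(spec: str):
    """Turn '0:1:2,0:1:2' into [[0, 1, 2], [0, 1, 2]], one inner list per partition."""
    return [[int(broker) for broker in part.split(":")] for part in spec.split(",")]


def reassignment_json(topic: str, spec: str) -> str:
    """Build a reassignment document for one topic from an assignment string."""
    partitions = [
        {"topic": topic, "partition": index, "replicas": replicas}
        for index, replicas in enumerate(parse_replica_assignment(spec))
    ]
    return json.dumps({"version": 1, "partitions": partitions}, indent=2)


if __name__ == "__main__":
    # Same assignment string as in the kafka-topics.sh example above.
    print(reassignment_json("topic1", "0:1:2,0:1:2,0:1:2"))
```

The first broker ID in each replica list is the preferred leader for that partition, so varying the order (for example 0:1:2,1:2:0,2:0:1) spreads leadership across brokers.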
I want to check the lag for a consumer group that was assigned manually to a particular topic. Is this possible? I am using Kafka 0.10.0.1. I ran sh kafka-run-class.sh kafka.admin.ConsumerGroupCommand --new-consumer --describe --bootstrap-server localhost:9092 --group test, but it says no group exists. So I wonder: when partitions are assigned manually, can we still check the lag for the consumer?
In Nussknacker we use the AKHQ GUI tool, which provides monitoring such as consumer and consumer-group lag, as well as general Kafka operations such as managing topics, topic data, and the schema registry.
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group <group_name>
If you want API support or visual lag monitoring, you can use https://github.com/yahoo/kafka-manager
I need to find out a way to ask Kafka for a list of topics. I know I can do that using the kafka-topics.sh script included in the bin\ directory. Once I have this list, I need all the consumers per topic. I could not find a script in that directory, nor a class in the kafka-consumer-api library that allows me to do it.
The reason behind this is that I need to figure out the difference between the topic's offset and the consumers' offsets.
Is there a way to achieve this? Or do I need to implement this functionality in each of my consumers?
Use kafka-consumer-groups.sh
For example
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
bin/kafka-consumer-groups.sh --describe --group mygroup --bootstrap-server localhost:9092
For Kafka 0.9.0.0 you can use the following to list the consumer groups you have created. It displays all the consumer group names:
./kafka-consumer-groups.sh --list --zookeeper hostname:portnumber
To view the details of one group:
./kafka-consumer-groups.sh --describe --zookeeper hostname:portnumber --group consumer_group_name
The output columns are:
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
I realize that this question is nearly 4 years old now. Much has changed in Kafka since then. This is mentioned above, but only in small print, so I write this for users who stumble over this question as late as I did.
Offsets by default are now stored in a Kafka Topic (not in Zookeeper any more), see Offsets stored in Zookeeper or Kafka?
There's a kafka-consumer-groups utility which returns all the information, including the offset of the topic and partition, of the consumer, and even the lag (Remark: When you ask for the topic's offset, I assume that you mean the offsets of the partitions of the topic). In my Kafka 2.0 test cluster:
kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group console-consumer-69763
Consumer group 'console-consumer-69763' has no active members.
TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
pytest  0          5               6               1    -            -     -
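If you want the numbers in a program rather than on screen, the describe output can be parsed as whitespace-separated columns. The sketch below is a minimal Python example that uses the sample output from this answer as input; the function names are hypothetical, and it assumes the column headers shown above, which may differ between Kafka versions.

```python
SAMPLE = """\
Consumer group 'console-consumer-69763' has no active members.

TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
pytest  0          5               6               1    -            -     -
"""


def parse_describe(text: str):
    """Parse describe output into a list of dicts keyed by column header."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Skip status or warning lines that appear before the header row.
    start = next(i for i, line in enumerate(lines) if "CURRENT-OFFSET" in line)
    header = lines[start].split()
    return [dict(zip(header, line.split())) for line in lines[start + 1:]]


def lag_of(row):
    """Lag is the log end offset minus the group's committed offset."""
    if row["CURRENT-OFFSET"] == "-" or row["LOG-END-OFFSET"] == "-":
        return None  # no committed offset yet, so lag is undefined
    return int(row["LOG-END-OFFSET"]) - int(row["CURRENT-OFFSET"])


if __name__ == "__main__":
    for row in parse_describe(SAMPLE):
        print(row["TOPIC"], row["PARTITION"], lag_of(row))  # pytest 0 1
```

Recomputing the lag from the two offsets, rather than reading the LAG column, shows where the number comes from: 6 minus 5 gives the 1 in the sample.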
All the consumers per topic
(Replace --zookeeper with --bootstrap-server to get the groups stored by newer Kafka clients.)
Get all consumers per topic as a table of topic<TAB>consumer:
for t in `kafka-consumer-groups.sh --zookeeper <HOST>:2181 --list 2>/dev/null`; do
echo $t | xargs -I {} sh -c "kafka-consumer-groups.sh --zookeeper <HOST>:2181 --describe --group {} 2>/dev/null | grep ^{} | awk '{print \$2\"\t\"\$1}' "
done > topic-consumer.txt
Make these pairs unique:
sort -u topic-consumer.txt > topic-consumer-u.txt
Get the desired one:
grep -i <TOPIC> topic-consumer-u.txt
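The same grouping can be done in Python once the topic<TAB>consumer pairs exist. This is a small sketch assuming the tab-separated layout produced by the loop above; the function name and the sample topic and group names are made up for illustration.

```python
from collections import defaultdict


def consumers_per_topic(pairs_text: str):
    """Group 'topic<TAB>consumer-group' lines into {topic: sorted unique groups}."""
    groups = defaultdict(set)
    for line in pairs_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        topic, group = line.split("\t", 1)
        groups[topic.strip()].add(group.strip())  # the set removes duplicates
    return {topic: sorted(names) for topic, names in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical pairs; duplicates appear because describe prints one row per partition.
    sample = "orders\tbilling\norders\tbilling\norders\taudit\npayments\tbilling\n"
    for topic, names in consumers_per_topic(sample).items():
        print(topic, names)
```

Using a set per topic replaces the sort -u step, and looking up one topic is then a plain dictionary access.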
I do not see it mentioned here, but there is a command I use often that gives me a bird's-eye view of all groups, topics, partitions, offsets, lags, consumers, etc.:
kafka-consumer-groups.bat --bootstrap-server localhost:9092 --describe --all-groups
A sample would look like this:
GROUP  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
Group  Topic  2          7               7               0    <SOME-ID>    XXXX  <SOME-ID>
:
:
The most important column is LAG. On a healthy platform it should ideally be 0 at all times (or close to 0, or at least a low number on high-throughput topics). So make sure you monitor it! ;-)
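A simple way to act on that column is to total the lag per group and flag any group above a threshold. The Python sketch below does this for text in the --all-groups layout shown above. The function names, the threshold parameter, and the sample rows are assumptions for illustration, and it relies on GROUP being the first column and LAG the sixth.

```python
from collections import defaultdict


def total_lag_by_group(describe_text: str):
    """Sum the LAG column per consumer group."""
    totals = defaultdict(int)
    for line in describe_text.splitlines():
        fields = line.split()
        # Data rows have a numeric LAG in column 6; headers and '-' rows are skipped.
        if len(fields) >= 6 and fields[5].isdigit():
            totals[fields[0]] += int(fields[5])
    return dict(totals)


def lagging_groups(describe_text: str, max_lag: int = 0):
    """Return the groups whose total lag exceeds max_lag, sorted by name."""
    totals = total_lag_by_group(describe_text)
    return sorted(group for group, lag in totals.items() if lag > max_lag)


if __name__ == "__main__":
    # Hypothetical output with two groups reading the same topic.
    sample = """\
GROUP       TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST       CLIENT-ID
orders-app  orders  0          7               7               0    c1           /10.0.0.1  c1
orders-app  orders  1          5               9               4    c1           /10.0.0.1  c1
audit-app   orders  0          7               7               0    c2           /10.0.0.2  c2
"""
    print(total_lag_by_group(sample))
    print(lagging_groups(sample))
```

A check like this could run on a schedule against the captured command output; dedicated tools remain the better choice for continuous monitoring.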
P.S:
An interesting article on how you can monitor the lag can be found here.
Kafka stores all of this information in ZooKeeper. You can see all the topic-related information under brokers -> topics. If you wish to get all the topics programmatically, you can do that using the ZooKeeper API.
It is explained in detail in the links below:
Tutorialspoint, ZooKeeper Programmer's Guide
High level consumers are registered into Zookeeper, so you can fetch a list from ZK, similarly to the way kafka-topics.sh fetches the list of topics. I don't think there's a way to collect all consumers; any application sending in a few consume requests is actually a "consumer", and you cannot tell whether they are done already.
On the consumer side, there's a JMX metric exposed to monitor the lag. Also, there is Burrow for lag monitoring.
You can also use kafkactl for this:
# get all consumer groups (output as yaml)
kafkactl get consumer-groups -o yaml
# get only consumer groups assigned to a single topic (output as table)
kafkactl get consumer-groups --topic topic-a
Sample output (e.g. as yaml):
name: my-group
protocoltype: consumer
topics:
- topic-a
- topic-b
- topic-c
Disclaimer: I am a contributor to this project.