Terminate Kafka Console Consumer when all the messages have been read - apache-kafka

I know there has to be a way to do this, but I am not able to figure this out. I need to stop the kafka consumer once I have read all the messages from the queue.
Can somebody provide any info on this?

You can pass the -consumer-timeout-ms parameter with a value when starting the consumer, and it will throw an exception if no messages have been read during that time. For example, to stop the consumer if no new messages have arrived in the last 2 seconds:
kafka.consumer.ConsoleConsumer -consumer-timeout-ms 2000
You can see this and all the other input options here

As of Kafka 2.1.1 (the kafka_2.11-2.1.1 distribution, where 2.11 is the Scala version), there is a script called kafka-console-consumer.sh.
It has a new flag: --timeout-ms.
This flag sets the maximum time, in milliseconds, to wait before exiting when no new messages arrive.
You can use this option to end your console consumer after reading all the messages.
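A minimal sketch of how the flag might be wrapped in a shell function, so that a script can drain a topic and then carry on. The function name drain_topic, the KAFKA_BIN and BOOTSTRAP variables, and the 5000 ms default are assumptions made for illustration; only the console consumer options themselves come from this thread and its related answers.

```shell
#!/usr/bin/env bash
# Read a topic from the beginning and exit once no message has arrived
# for idle_ms milliseconds (hypothetical helper, not part of Kafka).
drain_topic() {
  local topic=$1
  local idle_ms=${2:-5000}                      # assumed default: 5 seconds
  local kafka_bin=${KAFKA_BIN:-bin}             # assumed location of the Kafka scripts
  local bootstrap=${BOOTSTRAP:-localhost:9092}  # assumed broker address

  "${kafka_bin}/kafka-console-consumer.sh" \
    --bootstrap-server "${bootstrap}" \
    --topic "${topic}" \
    --from-beginning \
    --timeout-ms "${idle_ms}"
}

# Usage (requires a running broker):
#   drain_topic mytopic 2000 > messages.txt
```

Only stdout is redirected in the usage line, on the assumption that any timeout notice the consumer prints on exit goes to stderr.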

You can use SimpleConsumerShell with the --no-wait-at-logend option. See SystemTools-SimpleConsumerShell
For example:
./kafka-run-class.bat kafka.tools.SimpleConsumerShell --broker-list localhost:9092 --topic kafkademo --partition 0 --no-wait-at-logend

If you are not dead set on using the Scala client, try kafkacat with the -e option telling it to exit when end of partition has been reached.
E.g. to consume all messages from mytopic partition 2 and then exit:
$ kafkacat -b mybroker -t mytopic -p 2 -o beginning -e
Or consume the last 3000 messages and then exit:
$ kafkacat -b mybroker -t mytopic -p 2 -o -3000 -e

Related

Kafka-Verifiable-Producer and Consumer Problem

I am experimenting with Kafka. I have already started the
kafka-console-producer and
kafka-console-consumer.
I send messages with the kafka-console-producer and successfully receive them in the kafka-console-consumer.
Now I want to produce and consume around 5000 messages at once. I looked into the documentation and found that there are two commands:
kafka-verifiable-producer.sh
kafka-verifiable-consumer.sh
I tried to use these commands:
kafka-verifiable-producer.sh --broker-list localhost:9092 --max-messages 5000 --topic data-sending
kafka-verifiable-consumer.sh --group-instance-id 1 --group-id data-world --topic data-sending --broker-list localhost:9092
The result is as follows:
{"timestamp":1581268289761,"name":"producer_send_success","key":null,"value":"4996","offset":44630,"topic":"try_1","partition":0}
{"timestamp":1581268289761,"name":"producer_send_success","key":null,"value":"4997","offset":44631,"topic":"try_1","partition":0}
{"timestamp":1581268289761,"name":"producer_send_success","key":null,"value":"4998","offset":44632,"topic":"try_1","partition":0}
{"timestamp":1581268289761,"name":"producer_send_success","key":null,"value":"4999","offset":44633,"topic":"try_1","partition":0}
{"timestamp":1581268289769,"name":"shutdown_complete"}
{"timestamp":1581268289771,"name":"tool_data","sent":5000,"acked":5000,"target_throughput":-1,"avg_throughput":5285.412262156448}
On the consumer console the result is as follows:
{"timestamp":1581268089357,"name":"records_consumed","count":352,"partitions":[{"topic":"try_1","partition":0,"count":352,"minOffset":32777,"maxOffset":33128}]}
{"timestamp":1581268089359,"name":"offsets_committed","offsets":[{"topic":"try_1","partition":0,"offset":33129}],"success":true}
{"timestamp":1581268089384,"name":"records_consumed","count":500,"partitions":[{"topic":"try_1","partition":0,"count":500,"minOffset":33129,"maxOffset":33628}]}
{"timestamp":1581268089391,"name":"offsets_committed","offsets":[{"topic":"try_1","partition":0,"offset":33629}],"success":true}
{"timestamp":1581268089392,"name":"records_consumed","count":270,"partitions":[{"topic":"try_1","partition":0,"count":270,"minOffset":33629,"maxOffset":33898}]}
{"timestamp":1581268089394,"name":"offsets_committed","offsets":[{"topic":"try_1","partition":0,"offset":33899}],"success":true}
{"timestamp":1581268089415,"name":"records_consumed","count":500,"partitions":[{"topic":"try_1","partition":0,"count":500,"minOffset":33899,"maxOffset":34398}]}
{"timestamp":1581268089416,"name":"offsets_committed","offsets":[{"topic":"try_1","partition":0,"offset":34399}],"success":true}
{"timestamp":1581268089417,"name":"records_consumed","count":235,"partitions":[{"topic":"try_1","partition":0,"count":235,"minOffset":34399,"maxOffset":34633}]}
{"timestamp":1581268089419,"name":"offsets_committed","offsets":[{"topic":"try_1","partition":0,"offset":34634}],"success":true}
In the above results, the key is null.
How can I send a bulk of messages with this command?
I tried to find an example of how to use these commands but didn't find any. The producer generates integer-like values, but where can I insert my own messages?
Is there any way I can use this command to produce messages in bulk? Also, is it possible to run these commands on Windows, or are they just for Linux?
Any link to examples would be greatly appreciated.
The script kafka-verifiable-producer.sh executes the class org.apache.kafka.tools.VerifiableProducer.
(https://github.com/apache/kafka/blob/trunk/tools/src/main/java/org/apache/kafka/tools/VerifiableProducer.java)
Its program arguments --throughput, --repeating-keys and --value-prefix may fulfil your needs.
For example, the following produces messages with the value prefix 100 and an incremental key that cycles every 5 messages. You can also configure the message throughput with the --throughput option. In this example, it produces an average of 5 messages per second.
./kafka-verifiable-producer.sh --broker-list localhost:9092 --max-messages 10 --repeating-keys 5 --value-prefix 100 --throughput 5 --topic test
{"timestamp":1581271492652,"name":"startup_complete"}
{"timestamp":1581271492860,"name":"producer_send_success","key":"0","value":"100.0","offset":45,"topic":"test","partition":0}
{"timestamp":1581271492862,"name":"producer_send_success","key":"1","value":"100.1","offset":46,"topic":"test","partition":0}
{"timestamp":1581271493048,"name":"producer_send_success","key":"2","value":"100.2","offset":47,"topic":"test","partition":0}
{"timestamp":1581271493254,"name":"producer_send_success","key":"3","value":"100.3","offset":48,"topic":"test","partition":0}
{"timestamp":1581271493256,"name":"producer_send_success","key":"4","value":"100.4","offset":49,"topic":"test","partition":0}
{"timestamp":1581271493457,"name":"producer_send_success","key":"0","value":"100.5","offset":50,"topic":"test","partition":0}
{"timestamp":1581271493659,"name":"producer_send_success","key":"1","value":"100.6","offset":51,"topic":"test","partition":0}
{"timestamp":1581271493860,"name":"producer_send_success","key":"2","value":"100.7","offset":52,"topic":"test","partition":0}
{"timestamp":1581271494063,"name":"producer_send_success","key":"3","value":"100.8","offset":53,"topic":"test","partition":0}
{"timestamp":1581271494268,"name":"producer_send_success","key":"4","value":"100.9","offset":54,"topic":"test","partition":0}
{"timestamp":1581271494483,"name":"shutdown_complete"}
{"timestamp":1581271494484,"name":"tool_data","sent":10,"acked":10,"target_throughput":5,"avg_throughput":5.452562704471101}
If you are looking for more customized message keys and values, the easiest approach is to modify or extend the above class.
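Because the tool prints one JSON event per line, a run can be sanity-checked with ordinary text tools. The sketch below counts the producer_send_success events and compares the count with the "acked" figure in the final tool_data line. The function name check_producer_run is hypothetical, and the patterns assume the output keeps the exact field layout shown above.

```shell
#!/usr/bin/env bash
# Compare the number of producer_send_success events with the "acked"
# value of the tool_data line. Reads VerifiableProducer output on stdin.
# Returns 0 when both numbers match, non-zero otherwise.
check_producer_run() {
  local log successes acked
  log=$(cat)
  # Count the per-message success events.
  successes=$(grep -c '"name":"producer_send_success"' <<< "${log}")
  # Pull the acked counter out of the summary line.
  acked=$(sed -n 's/.*"name":"tool_data".*"acked":\([0-9]*\).*/\1/p' <<< "${log}")
  echo "successes=${successes} acked=${acked}"
  [[ -n "${acked}" && "${successes}" -eq "${acked}" ]]
}

# Usage (requires a running broker):
#   ./kafka-verifiable-producer.sh --broker-list localhost:9092 \
#       --max-messages 10 --topic test | check_producer_run
```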

kafka-run-class throwing java.lang.OutOfMemoryError error [duplicate]

I have been using the command below to get the latest offsets from a Kafka queue that has a plaintext port open:
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list server:9092 --topic sample_topic --time -1
But, now we only have the SSL port open, so I tried passing the SSL details as a property file
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list server:9093 --topic sample_topic --time -1 --consumer-config /path/to/file
Getting the below error -
Exception in thread "main" joptsimple.UnrecognizedOptionException: consumer-config is not a recognized option
How do I pass the SSL details to this command? These are all the available arguments for kafka-run-class.sh kafka.tools.GetOffsetShell
--broker-list <String: hostname:and port,...,hostname:port>
--max-wait-ms <Integer: ms>
--offsets <Integer: count>
--partitions <String: partition ids>
--time <Long: timestamp/-1(latest)/-2
--topic <String: topic>
Unfortunately, kafka.tools.GetOffsetShell only supports PLAINTEXT connections. This tool is not used a lot and nobody has bothered updating it.
Depending on your use case, you have a few options:
Use the kafka-consumer-groups.sh tool: Assuming you have a consumer group consuming from that topic, this tool displays the log end offset of each partition.
Patch kafka.tools.GetOffsetShell: It's relatively easy to add support for secured connections by reusing logic from the other tools. If you do so, consider sending a patch to Kafka =)
Write a tiny tool that calls Consumer.endOffsets()
Use kafka.tools.DumpLogSegments: As a last resort, this tool can also be used to find the last offset.
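As a rough sketch of the first option, the SSL details can go into a client properties file that is passed to kafka-consumer-groups.sh through --command-config. The property names are standard Kafka client settings. The truststore path, the password, the function names and the KAFKA_BIN and BOOTSTRAP variables are placeholders chosen for this example. A broker that requires client authentication would need keystore settings as well.

```shell
#!/usr/bin/env bash
# Write a client properties file for an SSL listener.
# The truststore path and password are placeholders, not real values.
write_ssl_config() {
  local file=$1
  cat > "${file}" <<'EOF'
security.protocol=SSL
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=changeit
EOF
}

# Describe a consumer group over SSL; the log end offset column of the
# output holds the latest offset of each partition the group consumes.
describe_group_ssl() {
  local group=$1 config=$2
  "${KAFKA_BIN:-bin}/kafka-consumer-groups.sh" \
    --bootstrap-server "${BOOTSTRAP:-server:9093}" \
    --describe --group "${group}" \
    --command-config "${config}"
}

# Usage (requires a running broker with an SSL listener):
#   write_ssl_config /tmp/client-ssl.properties
#   describe_group_ssl my-group /tmp/client-ssl.properties
```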

How to fetch recent messages from Kafka topic

Is there an option for fetching the most recent 10 or 20 messages from a Kafka topic? I can see the --from-beginning option to fetch all messages from the topic, but what if I want to fetch only a few messages: the first, last, middle, or latest 10? Are there options for that?
First N messages
You can use --max-messages N in order to fetch the first N messages of a topic.
For example, to get the first 10 messages, run
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --max-messages 10
Next N messages
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --max-messages 10
Last N messages
To get the last N messages, you need to define a specific partition and the offset:
bin/kafka-simple-consumer-shell.sh --bootstrap-server localhost:9092 --topic test --partition testPartition --offset yourOffset
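The value for yourOffset can be derived from the latest offset of the partition. The sketch below takes one line in the topic:partition:offset format that GetOffsetShell prints with --time -1 and subtracts N from it. The function name start_offset_for_last_n is hypothetical, and clamping at 0 is a simplification: if old messages have already been deleted by retention, the earliest available offset may be higher.

```shell
#!/usr/bin/env bash
# Given one "topic:partition:offset" line (GetOffsetShell format with
# --time -1) and N, print the offset at which the last N messages start.
start_offset_for_last_n() {
  local line=$1 n=$2
  local end_offset=${line##*:}          # text after the last colon
  local start=$(( end_offset - n ))
  (( start < 0 )) && start=0            # never go below offset 0
  echo "${start}"
}

start_offset_for_last_n "games:0:47841" 10   # prints 47831
```

The printed value can then be passed as --offset together with the matching --partition.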
M to N messages
Again, for this case you'd have to define both the partition and the offset.
For example, you can run the following in order to get N messages starting from an offset of your choice:
bin/kafka-simple-consumer-shell.sh --bootstrap-server localhost:9092 --topic test --partition testPartition --offset yourOffset --max-messages 10
If you don't want to stick to the binaries, I would suggest using kt, a Kafka command-line tool with more options and functionality.
For more details refer to the article How to fetch specific messages in Apache Kafka
Without specifying an offset and partition, you'll only be able to consume next N or first N. To consume in the "middle" of the unbounded stream, you need to give the offset
Other than the console consumer, there's kafkacat.
First twenty:
kafkacat -C -b mybroker -t topic -o beginning -c 20
And the last twenty (from partition zero):
kafkacat -C -b mybroker -t topic -p 0 -o -20

Is it possible to see the data partition wise in a kafka topic?

I started working with Kafka a few days ago. I am using Kafka in a Windows environment, and I want to see the data in each partition of a Kafka topic.
I have a topic called ExampleTopic with replication.factor set to 3 and 3 partitions. I am able to see the data in the topic, but I want to see which messages are going to which partitions.
Please let me know whether this is possible and, if so, how.
You can use the --partition parameter of the Kafka console consumer to specify which partition to consume from:
bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic ExampleTopic --partition 0
You can also specify an --offset parameter that indicates which offset to start from. If absent, the consumption starts at the end of the partition.
I found a GUI-based tool named Kafka Tool for viewing the data in each partition of a topic.
http://www.kafkatool.com
It’s a tool for managing a Kafka cluster, and it provides many other features worth trying.
Use kafkacat, e.g. :
$ kafkacat -b localhost:9092 -t my_topic -C \
-f '\nKey (%K bytes): %k\t\nValue (%S bytes): %s\n\
Timestamp: %T\tPartition: %p\tOffset: %o\n--\n'
Key (1 bytes): 1
Value (79 bytes): {"uid":1,"name":"Cliff","locale":"en_US","address_city":"St Louis","elite":"P"}
Timestamp: 1520618381093 Partition: 0 Offset: 0
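The same %p format token can feed a small summary of how messages are spread across partitions. The sketch below is an illustration only: the function name count_per_partition is hypothetical, and it assumes kafkacat is run with a format that prints just the partition number on each line, plus the -e option so that it exits at the end of the topic.

```shell
#!/usr/bin/env bash
# Count messages per partition. Expects one partition number per line on
# stdin, e.g. produced by:
#   kafkacat -b localhost:9092 -t my_topic -C -e -f '%p\n'
count_per_partition() {
  awk '{ count[$1]++ }
       END { for (p in count) printf "partition %s: %d messages\n", p, count[p] }' |
    sort -t' ' -k2,2n    # order the summary by partition number
}

# Usage (requires a running broker):
#   kafkacat -b localhost:9092 -t my_topic -C -e -f '%p\n' | count_per_partition
```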

Number of commits and offset in each partition of a kafka topic

How can I find the number of commits and the current offset in each partition of a known Kafka topic? I am using Kafka v0.8.1.1.
It is not clear from your question what kind of offset you're interested in.
There are actually three types of offsets:
1. The offset of the first available message in the topic's partition. Use -2 (earliest) as the --time parameter for the GetOffsetShell tool.
2. The offset of the last available message in the topic's partition. Use -1 (latest) as the --time parameter.
3. The last read/processed message offset maintained by the Kafka consumer. The high-level consumer stores this information, for every consumer group, in an internal Kafka topic (it used to be ZooKeeper) and takes care of keeping it up to date when you call commit() or when the auto-commit setting is set to true. For the simple consumer, your code has to take care of managing offsets.
In addition to the command-line utility, the offset information for #1 and #2 is also available via SimpleConsumer.earliestOrLatestOffset().
If the number of messages is not too large, you can specify a large --offsets parameter to GetOffsetShell and then count the number of lines returned by the tool. Otherwise, you can write a simple loop in Scala/Java that iterates over all available offsets starting from the earliest.
From Kafka documentation:
Get Offset Shell
get offsets for a topic
bin/kafka-run-class.sh kafka.tools.GetOffsetShell
required argument [broker-list], [topic]
Option                                  Description
------                                  -----------
--broker-list <hostname:port,...,       REQUIRED: The list of hostname and
  hostname:port>                          port of the server to connect to.
--max-wait-ms <Integer: ms>             The max amount of time each fetch
                                          request waits. (default: 1000)
--offsets <Integer: count>              number of offsets returned (default: 1)
--partitions <partition ids>            comma separated list of partition ids.
                                          If not specified, will find offsets
                                          for all partitions (default)
--time <Long: timestamp in              timestamp; offsets will come before
  milliseconds / -1(latest) / -2          this timestamp, as in
  (earliest)>                             getOffsetsBefore
--topic <topic>                         REQUIRED: The topic to get offsets
                                          from.
To get the offsets of a topic and its partitions you can use kafka.tools.GetOffsetShell. For example, using this command (I have a topic named games):
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic games --time -1
I get games:0:47841, which means that for partition 0 of topic games the latest offset is 47841, i.e. the next offset to be written (one past the latest available message).
You can use -2 to see the first available message.
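Combining the two queries gives the number of messages currently available in each partition: the latest offset minus the earliest one. A minimal sketch, assuming both outputs use the topic:partition:offset format shown above. The function name messages_per_partition is hypothetical.

```shell
#!/usr/bin/env bash
# Subtract earliest offsets from latest offsets, partition by partition.
# Both arguments are files holding GetOffsetShell output
# ("topic:partition:offset"): the first produced with --time -2,
# the second with --time -1.
messages_per_partition() {
  local earliest_file=$1 latest_file=$2
  awk -F: '
    NR == FNR { earliest[$1 ":" $2] = $3; next }   # first file: remember earliest
    ($1 ":" $2) in earliest {                      # second file: latest - earliest
      printf "%s:%s:%d\n", $1, $2, $3 - earliest[$1 ":" $2]
    }' "${earliest_file}" "${latest_file}"
}

# Usage (requires a running broker):
#   tool="bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic games"
#   messages_per_partition <(${tool} --time -2) <(${tool} --time -1)
```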
Starting with version 0.9.0.x, you should use the kafka.admin.ConsumerGroupCommand tool. Below are the arguments that the tool takes:
List all consumer groups, describe a consumer group, or delete consumer group info.
Option                                  Description
------                                  -----------
--bootstrap-server <server to connect   REQUIRED (only when using new-
  to>                                     consumer): The server to connect to.
--command-config <command config        Property file containing configs to be
  property file>                          passed to Admin Client and Consumer.
--delete                                Pass in groups to delete topic
                                          partition offsets and ownership
                                          information over the entire consumer
                                          group. For instance --group g1 --
                                          group g2
                                        Pass in groups with a single topic to
                                          just delete the given topic's
                                          partition offsets and ownership
                                          information for the given consumer
                                          groups. For instance --group g1 --
                                          group g2 --topic t1
                                        Pass in just a topic to delete the
                                          given topic's partition offsets and
                                          ownership information for every
                                          consumer group. For instance --topic
                                          t1
                                        WARNING: Group deletion only works for
                                          old ZK-based consumer groups, and
                                          one has to use it carefully to only
                                          delete groups that are not active.
--describe                              Describe consumer group and list
                                          offset lag related to given group.
--group <consumer group>                The consumer group we wish to act on.
--list                                  List all consumer groups.
--new-consumer                          Use new consumer.
--topic <topic>                         The topic whose consumer group
                                          information should be deleted.
--zookeeper <urls>                      REQUIRED (unless new-consumer is
                                          used): The connection string for the
                                          zookeeper connection in the form
                                          host:port. Multiple URLS can be
                                          given to allow fail-over.
To get the offsets of Topic_X for consumerGroup_Y, use the command below:
bin/kafka-run-class.sh kafka.admin.ConsumerGroupCommand --zookeeper <zookeeper urls> --describe --group consumerGroup_Y
The response looks like this:
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
consumerGroup, Topic_X, 0, 3030460, 3168412, 137952, none
consumerGroup, Topic_X, 1, 3030903, 3168884, 137981, none
consumerGroup, Topic_X, 2, 801564, 939540, 137976, none
consumerGroup, Topic_X, 3, 737290, 875262, 137972, none
consumerGroup, Topic_X, 4, 737288, 875254, 137966, none
consumerGroup, Topic_X, 5, 737276, 875241, 137965, none
consumerGroup, Topic_X, 6, 737290, 875251, 137961, none
consumerGroup, Topic_X, 7, 737290, 875248, 137958, none
consumerGroup, Topic_X, 8, 737288, 875246, 137958, none
consumerGroup, Topic_X, 9, 737293, 875251, 137958, none
consumerGroup, Topic_X, 10, 737289, 875244, 137955, none
consumerGroup, Topic_X, 11, 737273, 875226, 137953, none
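The LAG column can be summed to get a single backlog figure for the group. The sketch below assumes the comma-separated layout shown above, with LAG as the sixth field. Other Kafka versions may print the table differently, so the field number would need adjusting. The function name total_lag is hypothetical.

```shell
#!/usr/bin/env bash
# Sum the LAG column (6th field) of the comma-separated describe output.
# Non-numeric values, such as the header line, are skipped.
# Reads the tool output on stdin.
total_lag() {
  awk -F', *' '$6 ~ /^[0-9]+$/ { sum += $6 } END { print sum + 0 }'
}

# Usage (requires a running cluster):
#   bin/kafka-run-class.sh kafka.admin.ConsumerGroupCommand \
#       --zookeeper <zookeeper urls> --describe --group consumerGroup_Y | total_lag
```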
This information was also helpful in creating a script to view the number of messages on a partition for a topic (from the command line). While tools like Kafka-Web-Console are nice, some of us live in a non-GUI world.
Here is the script ... use and modify it at your own risk :-)
#!/bin/bash
# Print the latest offset of every partition of a topic.
topic=$1
if [[ -z "${topic}" ]] ; then
    echo "Usage: ${0} <topic>"
    exit 1
fi
if [[ -z "${KAFKA_HOME}" ]] ; then
    # $KAFKA_HOME not set, using default /kafka
    KAFKA_HOME="/kafka"
fi
if [[ ! -d "${KAFKA_HOME}" ]] ; then
    echo "\$KAFKA_HOME does not point to a valid directory [${KAFKA_HOME}]"
    exit 1
fi
cd "${KAFKA_HOME}" || exit 1
echo
echo "Topic: ${topic}: "
#
printf "%10s %12s\n" "Partition" "Count"
printf "~~~~~~~~~~ ~~~~~~~~~~~~\n"
idx=0
# GetOffsetShell prints one "topic:partition:offset" line per partition.
# With --time -1 the offset is the latest one, reported here as the count.
while IFS=":" read -r _ partition total ; do
    printf "%10d %12d\n" "${partition}" "${total}"
    idx=$((idx + 1))
done < <(bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic "${topic}" --broker-list localhost:9092 --time -1)
if [ "${idx}" -eq 0 ] ; then
    echo "Topic name not found!"
    exit 1
fi
echo
exit 0
Suppose we have a topic named tomorrowis27.
And our requirements are:
Req 1: We want to know the partition and offset details of the topic.
Ans: We can use the GetOffsetShell command, as shown in the screenshot below.
Req 2: We want to know the number of offsets consumed by a consumer group.
Ans: We can use the ConsumerGroupCommand, as shown in the screenshot below.