How can we run multiple Kafka consumers from the command line? - apache-kafka

I am testing Kafka performance using the shell scripts already provided in the Kafka package. I have created a topic with 10 partitions and am pumping data into it as shown below:
./bin/kafka-producer-perf-test.sh --topic test-topic --num-records 9000000 --record-size 300 --throughput 250000 --producer-props bootstrap.servers=110.17.14.302:9092 acks=1 max.in.flight.requests.per.connection=1 batch.size=5000
Now I want to consume the data that I am pumping, as shown above, from multiple consumers, not just a single consumer. So I started using kafka-consumer-perf-test.sh. This is what I was doing:
./bin/kafka-consumer-perf-test.sh --zookeeper localhost:2181 --topic test-topic --group test1
Is there any way to run multiple Kafka consumers in a single consumer group from the command line, with each consumer working on different partitions, using kafka-consumer-perf-test.sh? I am working with Kafka version 0.10.1.0.
I saw this SO post, but it doesn't say where to configure how many consumers to run and which partitions they will work on.
Update:
This is the error I saw:
./bin/kafka-consumer-perf-test.sh --zookeeper 110.27.14.10:2181 --messages 50 --topic test-topic --threads 1
[2017-01-11 22:34:09,785] WARN [ConsumerFetcherThread-perf-consumer-14195_kafka-cluster-3098529006-zeidk-1484174043509-46a51434-2-0], Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest@54fb48b6 (kafka.consumer.ConsumerFetcherThread)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
at kafka.network.BlockingChannel.readCompletely(BlockingChannel.scala:129)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:120)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:99)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:83)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:132)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:132)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:132)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:131)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:131)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:131)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:130)
at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:109)
at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:29)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

Just run the same command (i.e., ./bin/kafka-consumer-perf-test.sh) multiple times in different consoles, with the same --group, so that all instances join one consumer group.
About partition assignment: Kafka will do this automatically for you if you use consumer groups.
If you want to do manual partition assignment, you cannot use consumer groups. For this, you cannot use kafka-consumer-perf-test.sh; you need to write your own consumer (see the sketch below).
Read the JavaDoc here: https://kafka.apache.org/0101/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
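For completeness, a minimal sketch of such a hand-rolled consumer with the 0.10.x Java client could look like the following, using assign() instead of subscribe() so the consumer is pinned to one partition. The broker address, topic, and partition number are placeholders; adapt them to your cluster.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ManualAssignConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // No consumer group is involved: disable auto-commit and assign the
        // partition directly instead of subscribing to the topic.
        props.put("enable.auto.commit", "false");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.assign(Collections.singletonList(new TopicPartition("test-topic", 0)));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
        }
    }
}

Run one instance per partition (changing the partition number each time) to spread the topic across processes by hand.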

Related

delete topic-messages in Apache Kafka

I'm testing how kafka-topics works, but I don't understand how the deletion works.
I have created a simple topic with retention.ms=60000, segment.ms=60000, and cleanup.policy=delete.
After this I created a producer and sent some messages.
A consumer receives the messages without problems.
But I expected that, after one minute, if I ran the consumer again, it would not show the messages because they should have been deleted. That is not what happens.
If I run a query in KSQL it's the same: the messages always appear.
I think that I don't understand how the deletion works.
Example:
1) Topic
./kafka-topics --create --zookeeper localhost:2181 --topic test \
  --replication-factor 2 --partitions 1 --config "cleanup.policy=delete" \
  --config "delete.retention.ms=60000" --config "segment.ms=60000"
2) producer
./kafka-avro-console-producer --broker-list broker:29092 --topic test \
  --property parse.key=true --property key.schema='{"type":"long"}' \
  --property "key.separator=:" \
  --property value.schema='{"type": "record","name": "ppp","namespace": "test.topic","fields": [{"name": "id","type": "long"}]}'
3) messages from producer
1:{"id": 1}
2:{"id": 2}
4:{"id": 4}
5:{"id": 5}
4) Consumer
./kafka-avro-console-consumer \
  --bootstrap-server broker:29092 \
  --property schema.registry.url=http://localhost:8081 \
  --topic test --from-beginning --property print.key=true
The consumer shows the four messages.
But I expected that if I ran the consumer again after one minute (I have also waited longer, even hours), the messages would not show, because retention.ms and segment.ms are one minute.
When are messages actually deleted?
Another important thing to know about the deletion process in Kafka is the log segment file:
Topics are divided into partitions, right? This is what allows parallelism, scale, etc.
Each partition is divided into log segment files. Why? Because Kafka writes data to disk, and we don't want it to keep an entire topic/partition in one huge file, but rather to split it into smaller files (segments).
Breaking data into smaller files has many advantages that aren't really related to the question. You can read more here.
The key thing to notice here is:
The retention policy looks at the log segment file's timestamp.
"Retention by time is performed by examining the last modified
time (mtime) on each log segment file on disk. Under normal clus‐
ter operations, this is the time that the log segment was closed, and
represents the timestamp of the last message in the file"
(From Kafka-definitive Guide, page 26)
Since version 0.10.1.0:
The log retention time is no longer based on the last modified time of the log segments. Instead, it is based on the largest timestamp of the messages in a log segment.
This means retention looks only at closed log segment files.
Make sure your segment config params are right (a sketch for checking them follows below).
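If you want to double-check what is actually set on the topic, a minimal sketch using the Java AdminClient (available from Kafka 0.11 onwards; broker address and topic name are placeholders) could look like this:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class ShowTopicConfigs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "test");
            Map<ConfigResource, Config> configs =
                    admin.describeConfigs(Collections.singletonList(topic)).all().get();
            // Print every effective config entry for the topic; look for
            // retention.ms, segment.ms, and cleanup.policy in the output.
            configs.get(topic).entries().forEach(e ->
                    System.out.println(e.name() + " = " + e.value()));
        }
    }
}

Note, by the way, that the create command in the question sets delete.retention.ms, which is a different config from retention.ms.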
Change the retention.ms as mentioned by Ajay Srivastava above, using kafka-topics --zookeeper localhost:2181 --alter --topic test --config retention.ms=60000, and test again.

Get the latest offsets in SSL Enabled Kafka via CMD

I have been using the command below to get the latest offsets from a Kafka queue that has the plaintext port open:
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list server:9092 --topic sample_topic --time -1
But now we only have the SSL port open, so I tried passing the SSL details as a property file:
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list server:9093 --topic sample_topic --time -1 --consumer-config /path/to/file
I'm getting the error below:
Exception in thread "main" joptsimple.UnrecognizedOptionException: consumer-config is not a recognized option
How do I pass the SSL details to this command? These are all the available arguments for kafka-run-class.sh kafka.tools.GetOffsetShell:
--broker-list <String: hostname:and port,...,hostname:port>
--max-wait-ms <Integer: ms>
--offsets <Integer: count>
--partitions <String: partition ids>
--time <Long: timestamp/-1(latest)/-2 (earliest)>
--topic <String: topic>
Unfortunately kafka.tools.GetOffsetShell only supports PLAINTEXT connections. This tool is not used a lot and nobody has bothered updating it.
Depending on your use case, you have a few options:
Use the kafka-consumer-groups.sh tool: Assuming you have a consumer group consuming from that topic, this tool displays the log end offset of each partition.
Patch kafka.tools.GetOffsetShell: It's relatively easy to add support for secured connections by reusing logic from the other tools. If you do so, consider sending a patch to Kafka =)
Write a tiny tool that calls Consumer.endOffsets() (a sketch follows this list)
Use kafka.tools.DumpLogSegments: As a last resort this tool can also be used to find the last offset
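To illustrate the third option, here is a minimal sketch of such a tool using the Java consumer. The broker address, truststore path and password, and topic name are placeholders; the ssl.* keys are the standard Kafka client SSL settings.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class EndOffsets {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "server:9093"); // placeholder SSL listener
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/path/to/truststore.jks"); // placeholder
        props.put("ssl.truststore.password", "changeit");                // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // Discover all partitions of the topic, then ask for their end offsets.
            List<TopicPartition> partitions = new ArrayList<>();
            for (PartitionInfo p : consumer.partitionsFor("sample_topic")) {
                partitions.add(new TopicPartition(p.topic(), p.partition()));
            }
            for (Map.Entry<TopicPartition, Long> e : consumer.endOffsets(partitions).entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }
        }
    }
}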

How to get log end offset of all partitions for a given kafka topic using kafka command line?

When I describe a Kafka topic it doesn't show the log end offset of any partition, but shows all the other metadata such as ISR, Replicas, and Leader.
How do I see a log end offset of the partition for a given topic?
Ran this: ./kafka-topics.sh --zookeeper zk-service:2181 --describe --topic "__consumer_offsets"
The output doesn't have an offset column.
Note: I need only the log end offset.
Since you're only looking for the log end offset for a topic, you can use kafka-run-class with the kafka.tools.GetOffsetShell class.
Assuming your topic is __consumer_offsets, you would get the end offset by running:
./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --time -1 --topic __consumer_offsets
Change the --broker-list localhost:9092 to your desired Kafka address. This will list all of the log end offsets for each partition in the topic.
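For reference, the output is one line per partition in topic:partition:offset form; the numbers below are made-up illustrative values (and so on for each partition):
__consumer_offsets:0:170
__consumer_offsets:1:0
__consumer_offsets:2:94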
Install kafkacat; it's an easy-to-use Kafka tool:
sudo apt-get update
sudo apt-get install kafkacat
kafkacat -C -b <kafka-broker-ip-and-port> -t <topic> -o -1
This will not consume anything, because the offset is incremented after a message is added, but it will give you the offsets for all the partitions. Note, however, that this isn't the current offset you are consuming at... The answers above will help you more when it comes to looking into partition lag.
The following is the command you need to get the offsets of all partitions of a given Kafka topic for a given consumer group:
kafka-consumer-groups --bootstrap-server <kafka-broker-list-with-ports> --describe --group <consumer-group-name>
Please note that the <consumer-group-name> at the end is important as the offsets are committed by consumers that are typically a part of a consumer group.
The output of this command may look something like:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
<topic-name> 0 62 62 0 <consumer-id> <host> <client>
In your post, however, you're trying to get this information for the internal topic __consumer_offsets, so you would need a consumer group whose consumers consume from this internal topic. You could perhaps do the following:
kafka-console-consumer --bootstrap-server <kafka-broker-list-with-ports> --topic __consumer_offsets --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --max-messages 5
Output of the above command:
[<consumer-group-name>,<topic-name>,0]::[OffsetMetadata[481690879,NO_METADATA],CommitTime 1479708539051,ExpirationTime 1480313339051]
Just take the <consumer-group-name> from the output, put it in the kafka-consumer-groups command mentioned at the beginning, and you'll get the offset details for all 50 partitions for that consumer group.
I hope this helps.

Kafka ConsumerGroup does not exist

Setting up Kafka for the first time, Kafka 0.11. Using pretty much default configurations. I produced some messages to topic ABC. Two consumers are coded to consume messages from the same topic. Each consumer belongs to a different group id, GROUP.1 and GROUP.2.
I want to look into the topic for all the messages and also the offset details.
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group GROUP.1
throws the following error:
Error: The consumer group 'GROUP.1' does not exist.
Same error for GROUP.2. I got some output without error for one of the groups yesterday, but not today. What am I missing? Do I need to configure something somewhere to persist consumer group details, or will the command work only while consumers with the given group id are currently running?
I tried kafka-consumer-groups --zookeeper localhost:2181 --describe --group GROUP.1 but got the same error.
I also tried the kafka-consumer-offset-checker command.
kafka-consumer-offset-checker --zookeeper localhost:2181 --topic ABC --group GROUP.1
[2017-12-19 19:25:01,654] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Exiting due to: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/GROUP.1/offsets/ABC/2.
As you said you saw the group details yesterday, it's probably worth noting that by default offsets are only stored for 24 hours. So if your group has not committed offsets in 24 hours, Kafka no longer has any information about it.
If this is indeed the issue, you can increase the retention by setting offsets.retention.minutes to a larger value, as illustrated below.
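For illustration, this is a broker-side setting in server.properties; the value here (one week) is an arbitrary example:
offsets.retention.minutes=10080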