kafka-run-class throwing java.lang.OutOfMemoryError error [duplicate] - apache-kafka

I have been using the command below to get the latest offsets from a Kafka topic whose broker has the plaintext port open:
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list server:9092 --topic sample_topic --time -1
But, now we only have the SSL port open, so I tried passing the SSL details as a property file
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list server:9093 --topic sample_topic --time -1 --consumer-config /path/to/file
Getting the below error -
Exception in thread "main" joptsimple.UnrecognizedOptionException: consumer-config is not a recognized option
How do I pass the SSL details to this command? These are all the available arguments for kafka-run-class.sh kafka.tools.GetOffsetShell
--broker-list <String: hostname:and port,...,hostname:port>
--max-wait-ms <Integer: ms>
--offsets <Integer: count>
--partitions <String: partition ids>
--time <Long: timestamp/-1(latest)/-2
--topic <String: topic>

Unfortunately, kafka.tools.GetOffsetShell only supports PLAINTEXT connections. This tool is not used much and nobody has bothered to update it.
Depending on your use case, you have a few options:
Use the kafka-consumer-groups.sh tool: Assuming you have a consumer group consuming from that topic, this tool displays the log end offset of each partition.
Patch kafka.tools.GetOffsetShell: It's relatively easy to add support for secured connections by reusing logic from the other tools. If you do so, consider sending a patch to Kafka =)
Write a tiny tool that calls Consumer.endOffsets().
Use kafka.tools.DumpLogSegments: As a last resort, this tool can also be used to find the last offset.
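
For the first option, here is a minimal sketch. It assumes the SSL listener is on server:9093 and that a truststore already exists. The helper name, the file name client-ssl.properties, the truststore path, the password, and the group name my_group are all placeholders, not values from the question. kafka-consumer-groups.sh reads client settings through its --command-config option.

```shell
#!/bin/sh
# Write a minimal SSL client config for Kafka command-line tools.
# Arguments (all hypothetical placeholders):
#   $1 output file, $2 truststore path, $3 truststore password
write_ssl_client_config() {
    out=$1
    truststore=$2
    password=$3
    cat > "$out" <<EOF
security.protocol=SSL
ssl.truststore.location=$truststore
ssl.truststore.password=$password
EOF
}

# Example usage (needs a reachable broker and an existing consumer group):
#   write_ssl_client_config client-ssl.properties /path/to/truststore.jks changeit
#   kafka-consumer-groups.sh --bootstrap-server server:9093 \
#       --describe --group my_group --command-config client-ssl.properties
```

If the brokers also require client authentication, the keystore settings (ssl.keystore.location and so on) would have to be added to the same file.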

Related

kafka consumer group id does not work as expected

I am new to Apache Kafka. While going through the quickstart instructions at http://kafka.apache.org/quickstart with the latest version, kafka_2.12-2.2.0, I ran into a problem that I can't figure out by myself.
On my laptop, I created 3 brokers to simulate a cluster.
Each broker has its own server properties file. I made the changes below in each file and left all other values at their defaults.
broker.id=1 (server2: broker.id=2; server3: broker.id=3)
listeners=PLAINTEXT://127.0.0.1:9092 (server2: 127.0.0.1:9093; server3: 127.0.0.1:9094)
log.dirs=/tmp/kafka-logs (server2: /tmp/kafka-logs-2; server3: /tmp/kafka-logs-3)
num.partitions=3 (for all servers)
offsets.topic.replication.factor=3 (for all servers)
After I started ZK and those 3 brokers, I (can) create a topic 'TestTopic' with 3 partitions on any broker
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3 --topic TestTopic
Then I used the commands below to start 3 consumers in the same group 'rickygroup'.
//consumer one
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --from-beginning --topic TestTopic —group.id rickygroup —group.name rickygroup
//consumer two
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9093 --from-beginning --topic TestTopic —group.id rickygroup —group.name rickygroup
//consumer three
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9094 --from-beginning --topic TestTopic —group.id rickygroup —group.name rickygroup
Now, I use another terminal to publish some messages to the topic 'TestTopic'. The issue is that all 3 consumers receive exactly the same messages. My understanding is that the 3 consumers in one group should split the messages among themselves rather than each receiving all of them. Otherwise, the consumer group does repeated consumption instead of balanced consumption.
Am I misunderstanding the consumer group concept, or did I do something wrong here?
The console consumer uses --group (with two dashes), not -group.id and/or -group.name, which are not parsed options.
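
A sketch of the corrected invocation, wrapped in a small helper so the group name is always passed with --group. The function name and the KAFKA_HOME variable are assumptions for illustration; the addresses, topic, and group come from the question. Without --group, the console consumer picks its own group id for each run, which would explain why every consumer saw every message.

```shell
#!/bin/sh
# Start a console consumer that joins the given consumer group.
# KAFKA_HOME is an assumed variable pointing at the Kafka install directory;
# it defaults to the current directory.
start_group_consumer() {
    bootstrap=$1
    topic=$2
    group=$3
    "${KAFKA_HOME:-.}/bin/kafka-console-consumer.sh" \
        --bootstrap-server "$bootstrap" \
        --topic "$topic" \
        --from-beginning \
        --group "$group"
}

# Example: run each line in its own terminal. The three consumers should then
# share the topic's three partitions instead of each receiving every message.
#   start_group_consumer 127.0.0.1:9092 TestTopic rickygroup
#   start_group_consumer 127.0.0.1:9093 TestTopic rickygroup
#   start_group_consumer 127.0.0.1:9094 TestTopic rickygroup
```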

Unable to read from file through Kafka producer

I am trying to read a file using the Kafka producer. ZooKeeper and the broker are running. I am able to read input from the command prompt using the Kafka producer and consumer with the commands below:
Kafka Producer
kafka-console-producer --topic incoming --broker localhost:9092
Kafka Consumer
kafka-console-consumer --topic incoming --zookeeper localhost:2181
To read from a file, I tried the command below:
kafka-console-producer -–broker-list localhost:9092 -–topic incoming --new-producer < C:\abc.txt
but it produced below error -
û is not a recognized option
I googled the message, and the results suggest correcting the producer command, which looks correct to me.
For Kafka 0.10 you don't need to pass the --new-producer flag. The following command works for me:
kafka-console-producer.sh --broker-list localhost:9092 --topic incoming < C:\abc.txt
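
The "û is not a recognized option" message is also consistent with the `-–` in the failing command containing an en dash rather than two ASCII hyphens, for example after copying from a web page; the working command above uses plain `--`. As a rough sketch (the helper name is hypothetical), the following flags lines that contain anything outside printable ASCII before you run them:

```shell
#!/bin/sh
# Print the lines of stdin (with line numbers) that contain bytes outside
# printable ASCII, such as an en dash pasted in place of "--".
# Exit status follows grep: 0 if such a line was found, 1 if the input is clean.
find_non_ascii() {
    LC_ALL=C grep -n '[^ -~]'
}

# Example usage:
#   echo 'kafka-console-producer -–broker-list localhost:9092' | find_non_ascii
```

Note that tabs are reported as well, since they fall outside the space-to-tilde range.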

How can we run multiple kafka consumers through command line?

I am testing kafka performance through the shell script they already provided in the kafka package. I have created a topic with 10 partitions and pumping data as shown below:
./bin/kafka-producer-perf-test.sh --topic test-topic --num-records 9000000 --record-size 300 --throughput 250000 --producer-props bootstrap.servers=110.17.14.302:9092 acks=1 max.in.flight.requests.per.connection=1 batch.size=5000
Now I want to consume the data which I am pumping as shown above from multiple consumers not just from single consumer. So I started using kafka-consumer-perf-test.sh. This is what I was doing:
./bin/kafka-consumer-perf-test.sh --zookeeper localhost:2181 --topic test-topic --group test1
Is there any way by which we can run multiple kafka consumers in a single consumer group through command line and each of those consumers working on different partitions using kafka-consumer-perf-test.sh? I am working with Kafka version 0.10.1.0
I saw this SO post, but it doesn't say where to configure how many consumers to run and which partitions they will work on.
Update:
This is the error I saw:
./bin/kafka-consumer-perf-test.sh --zookeeper 110.27.14.10:2181 --messages 50 --topic test-topic --threads 1
[2017-01-11 22:34:09,785] WARN [ConsumerFetcherThread-perf-consumer-14195_kafka-cluster-3098529006-zeidk-1484174043509-46a51434-2-0], Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest#54fb48b6 (kafka.consumer.ConsumerFetcherThread)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
at kafka.network.BlockingChannel.readCompletely(BlockingChannel.scala:129)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:120)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:99)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:83)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:132)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:132)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:132)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:131)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:131)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:131)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:130)
at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:109)
at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:29)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Just run the same command (i.e., ./bin/kafka-consumer-perf-test.sh) multiple times in different consoles.
About partition assignment: Kafka will do this automatically for you if you use consumer groups.
If you want to do manual partition assignment, you cannot use consumer groups. For this, you cannot use kafka-consumer-perf-test.sh but need to write your own.
Read JavaDoc here: https://kafka.apache.org/0101/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
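
Instead of opening several consoles, the same idea can be sketched as a loop that starts the script several times in the background. The function name and the KAFKA_HOME variable are assumptions; the options in the usage line are the ones from the question.

```shell
#!/bin/sh
# Launch COUNT copies of kafka-consumer-perf-test.sh in the background and
# wait for all of them to finish. The remaining arguments are passed through
# unchanged, so every copy uses the same --group and Kafka distributes the
# partitions among them.
# KAFKA_HOME is an assumed variable pointing at the Kafka install directory.
run_perf_consumers() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        "${KAFKA_HOME:-.}/bin/kafka-consumer-perf-test.sh" "$@" &
        i=$((i + 1))
    done
    wait
}

# Example usage (three consumers in group test1):
#   run_perf_consumers 3 --zookeeper localhost:2181 --messages 50 \
#       --topic test-topic --group test1
```

With 10 partitions, starting more than 10 consumers in one group would leave the extra ones idle.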

Consumer not receiving messages, kafka console, new consumer api, Kafka 0.9

I am doing the Kafka Quickstart for Kafka 0.9.0.0.
I have zookeeper listening at localhost:2181 because I ran
bin/zookeeper-server-start.sh config/zookeeper.properties
I have a single broker listening at localhost:9092 because I ran
bin/kafka-server-start.sh config/server.properties
I have a producer posting to topic "test" because I ran
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
yello
is this thing on?
let's try another
gimme more
When I run the old API consumer, it works by running
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
However, when I run the new API consumer, I don't get anything when I run
bin/kafka-console-consumer.sh --new-consumer --topic test --from-beginning \
--bootstrap-server localhost:9092
Is it possible to subscribe to a topic from the console consumer using the new api? How can I fix it?
On my Mac I was facing the same issue: the console consumer did not consume any messages when I used the command
kafka-console-consumer --bootstrap-server localhost:9095 --from-beginning --topic my-replicated-topic
But when I tried with
kafka-console-consumer --bootstrap-server localhost:9095 --from-beginning --topic my-replicated-topic --partition 0
It happily lists the messages sent. Is this a bug in Kafka 1.10.11?
I just ran into this issue and the solution was to delete /brokers in zookeeper and restart the kafka nodes.
bin/zookeeper-shell <zk-host>:2181
and then
rmr /brokers
Not sure why this solves it.
When I enabled debug logging, I saw this error message over and over again in the consumer:
2017-07-07 01:20:12 DEBUG AbstractCoordinator:548 - Sending GroupCoordinator request for group test to broker xx.xx.xx.xx:9092 (id: 1007 rack: null)
2017-07-07 01:20:12 DEBUG AbstractCoordinator:559 - Received GroupCoordinator response ClientResponse(receivedTimeMs=1499390412231, latencyMs=84, disconnected=false, requestHeader={api_key=10,api_version=0,correlation_id=13,client_id=consumer-1}, responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) for group test
2017-07-07 01:20:12 DEBUG AbstractCoordinator:581 - Group coordinator lookup for group test failed: The group coordinator is not available.
2017-07-07 01:20:12 DEBUG AbstractCoordinator:215 - Coordinator discovery failed for group test, refreshing metadata
For me the solution described in this thread worked - https://stackoverflow.com/a/51540528/7568227
Check if
offsets.topic.replication.factor
(or probably other config parameters related to replication)
is not higher than the number of brokers. That was the problem in my case.
There was no need to use --partition 0 anymore after this fix.
Otherwise, I recommend following the debugging procedure described in the mentioned thread.
In my case, this doesn't work
kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
and this works
kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic --partition 0
because the topic __consumer_offsets was located on the inaccessible broker. Basically, I'd forgotten to replicate it. Relocating __consumer_offsets solved my issue.
Was getting the same issue on my Mac.
I checked the logs and found the following error.
Number of alive brokers '1' does not meet the required replication factor '3' for the offsets topic (configured via 'offsets.topic.replication.factor').
This error can be ignored if the cluster is starting up and not all brokers are up yet.
This can be fixed by changing the replication factor to 1. Add the following line in server.properties and restart Kafka/Zookeeper.
offsets.topic.replication.factor=1
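
A quick way to catch this mismatch before starting the consumer, as a sketch: the function name is hypothetical, and the broker count is something you supply yourself. If the property is not set in the file, the Kafka default of 3 is assumed.

```shell
#!/bin/sh
# Compare offsets.topic.replication.factor in a broker config file with the
# number of brokers you expect to be running. Prints a one-line verdict and
# returns non-zero when the factor exceeds the broker count.
# If the property is absent from the file, the default of 3 is assumed.
check_offsets_replication() {
    config=$1
    brokers=$2
    factor=$(sed -n 's/^[[:space:]]*offsets\.topic\.replication\.factor[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$config" | tail -n 1)
    factor=${factor:-3}
    if [ "$factor" -gt "$brokers" ]; then
        echo "offsets.topic.replication.factor=$factor exceeds $brokers broker(s)"
        return 1
    fi
    echo "offsets.topic.replication.factor=$factor is fine for $brokers broker(s)"
}

# Example usage for a single-broker development setup:
#   check_offsets_replication config/server.properties 1
```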
I had the same problem and have now figured it out.
When you use --zookeeper, you must pass the ZooKeeper address as the parameter.
When you use --bootstrap-server, you must pass the broker address as the parameter.
Your localhost is the problem here.
If you replace localhost with the actual hostname, it should work.
Like this:
producer
./bin/kafka-console-producer.sh --broker-list \
sandbox-hdp.hortonworks.com:9092 --topic test
consumer:
./bin/kafka-console-consumer.sh --topic test --from-beginning \
--bootstrap-server sandbox-hdp.hortonworks.com:9092
This problem also affects ingesting data from Kafka with Flume and sinking it to HDFS.
To fix the above issue:
Stop Kafka brokers
Connect to zookeeper cluster and remove /brokers z node
Restart kafka brokers
The issue is not related to the Kafka client version or the Scala version used in the cluster. ZooKeeper might hold wrong information about the broker hosts.
To verify the action:
Create topic in kafka.
$ kafka-console-consumer --bootstrap-server slavenode01.cdh.com:9092 --topic rkkrishnaa3210 --from-beginning
Open a producer channel and feed some messages to it.
$ kafka-console-producer --broker-list slavenode03.cdh.com:9092 --topic rkkrishnaa3210
Open a consumer channel to consume the message from a specific topic.
$ kafka-console-consumer --bootstrap-server slavenode01.cdh.com:9092 --topic rkkrishnaa3210 --from-beginning
To test this from flume:
Flume agent config:
rk.sources = source1
rk.channels = channel1
rk.sinks = sink1
rk.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
rk.sources.source1.zookeeperConnect = ip-20-0-21-161.ec2.internal:2181
rk.sources.source1.topic = rkkrishnaa321
rk.sources.source1.groupId = flume1
rk.sources.source1.channels = channel1
rk.sources.source1.interceptors = i1
rk.sources.source1.interceptors.i1.type = timestamp
rk.sources.source1.kafka.consumer.timeout.ms = 100
rk.channels.channel1.type = memory
rk.channels.channel1.capacity = 10000
rk.channels.channel1.transactionCapacity = 1000
rk.sinks.sink1.type = hdfs
rk.sinks.sink1.hdfs.path = /user/ce_rk/kafka/%{topic}/%y-%m-%d
rk.sinks.sink1.hdfs.rollInterval = 5
rk.sinks.sink1.hdfs.rollSize = 0
rk.sinks.sink1.hdfs.rollCount = 0
rk.sinks.sink1.hdfs.fileType = DataStream
rk.sinks.sink1.channel = channel1
Run flume agent:
flume-ng agent --conf . -f flume.conf -Dflume.root.logger=DEBUG,console -n rk
Observe logs from the consumer that the message from the topic is written in HDFS.
18/02/16 05:21:14 INFO internals.AbstractCoordinator: Successfully joined group flume1 with generation 1
18/02/16 05:21:14 INFO internals.ConsumerCoordinator: Setting newly assigned partitions [rkkrishnaa3210-0] for group flume1
18/02/16 05:21:14 INFO kafka.SourceRebalanceListener: topic rkkrishnaa3210 - partition 0 assigned.
18/02/16 05:21:14 INFO kafka.KafkaSource: Kafka source source1 started.
18/02/16 05:21:14 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: source1: Successfully registered new MBean.
18/02/16 05:21:14 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: source1 started
18/02/16 05:21:41 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/02/16 05:21:42 INFO hdfs.BucketWriter: Creating /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920.tmp
18/02/16 05:21:48 INFO hdfs.BucketWriter: Closing /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920.tmp
18/02/16 05:21:48 INFO hdfs.BucketWriter: Renaming /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920.tmp to /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920
18/02/16 05:21:48 INFO hdfs.HDFSEventSink: Writer callback called.
Use this:
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Note: Remove --new-consumer from your command
For reference see here: https://kafka.apache.org/quickstart
Can you please try like this:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic my-replicated-topic
In my case neither approach worked, so I increased the log level to DEBUG in config/log4j.properties and started the console consumer:
./bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --from-beginning --topic MY_TOPIC
Then got the log below
[2018-03-11 12:11:25,711] DEBUG [MetadataCache brokerId=10] Error while fetching metadata for MY_TOPIC-3: leader not available (kafka.server.MetadataCache)
The point here is that I have two Kafka nodes but one is down. For some reason, by default the console consumer will not consume if some partition is unavailable because its node is down (partition 3 in this case). This doesn't happen in my application.
Possible solutions are:
Start up the brokers that are down.
Delete the topic and create it again, so that all partitions are placed on the online broker node.
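
To see which partitions currently have no leader without enabling DEBUG logging, the output of kafka-topics.sh --describe can be filtered. This is only a sketch: the helper name is hypothetical, and it assumes the usual "Partition: N ... Leader: X" line layout, with -1 or none shown when no leader is available.

```shell
#!/bin/sh
# Read `kafka-topics.sh --describe` output on stdin and print the partition
# numbers whose leader is reported as -1 or none (no live leader).
# The "Partition: N ... Leader: X" layout is assumed from typical output.
leaderless_partitions() {
    awk '{
        part = ""; leader = ""
        for (i = 1; i < NF; i++) {
            if ($i == "Partition:") part = $(i + 1)
            if ($i == "Leader:") leader = $(i + 1)
        }
        if (part != "" && (leader == "-1" || leader == "none")) print part
    }'
}

# Example usage:
#   kafka-topics.sh --describe --zookeeper localhost:2181 --topic MY_TOPIC \
#       | leaderless_partitions
```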
Run the below command from bin:
./kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
"test" is the topic name
I had this problem with kafka_2.12-2.3.0.tgz: the consumer finished executing without printing anything.
I tried debugging, but no logs were printed.
Try running with kafka_2.12-2.2.2 instead; that works fine.
Also try running ZooKeeper and Kafka as described in the quickstart guide!
In my case, broker.id=1 in server.properties was the problem.
This should be broker.id=0 when you use only one Kafka server for development.
Don't forget to remove all logs and restart ZooKeeper and Kafka:
Remove /tmp/kafka-logs (defined in the server.properties file)
Remove [your_kafka_home]/logs
Restart ZooKeeper and Kafka
In kafka_2.11-0.11.0.0 the --zookeeper option is deprecated in favor of --bootstrap-server, which takes the broker IP address and port. If you give the correct broker parameters, you will be able to consume messages.
e.g. $ bin/kafka-console-consumer.sh --bootstrap-server :9093 --topic test --from-beginning
I'm using port 9093, for you it may vary.
regards.
replication factor must be at least 3
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic test