how to get the all messages in a topic from kafka server - apache-kafka

I would like to get all the messages from beginning in a topic from server.
Ex:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testTopic --from-beginning
When using the above console command, I would like to able to get all messages in a topic from the beginning but I couldn't consume all the messages in a topic from beginning using java code.

You can get all messages using the following command:
cd Users/kv/kafka/bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic topicName --from-beginning --max-messages 100

The easiest way would be to start a consumer and drain all the messages. Now I don't know how many partitions you have in your topic and whether you already have a an existing consumer group or not, but you have a few options:
Have a look at this API: https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
1) If you already have a consumer in the same consumer group, and still want to start consuming from the beginning, you should use the seek option listed in the API doc and set the offset to 0 for each consumer in the group. This would start consuming from the beginning.
2) Otherwise, you can start a few consumers in a new consumer group & you would not have to worry about seek.
PS: Please remember to provide more details about your setup in the future if you have more questions on Kafka. A lot of things depend on how you have configured your infrastructure & how you would prefer it to be and would thus vary from case to case.

TopicPartition topicPartition = new TopicPartition(topic, 0);
List<TopicPartition> partitions = Arrays.asList(topicPartition);
consumer.assign(partitions);
consumer.seekToBeginning(partitions);

Just change the consumer group
ConsumerConfig.GROUP_ID_CONFIG - to new group id
and set
AUTO_OFFSET_RESET_CONFIG - earliest
sample code-
props.put(ConsumerConfig.GROUP_ID_CONFIG, "newID");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

Related

How to create a new consumer group in kafka

I am running kafka locally following instructions on quick start guide here,
and then I defined my consumer group configuration in config/consumer.properties so that my consumer can pick messages from the defined group.id
Running the following command,
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
results in,
test-consumer-group <-- group.id defined in conf/consumer.properties
console-consumer-67807 <-- when connecting to kafka via kafka-console-consumer.sh
I am able to connect to kafka via a python based consumer that is configured to use the provide group.id i.e test-consumer-group
First of all, I am not able to understand how/when kafka creates consumer groups. It seems it loads the conf/consumer.properties at some point of time and additionally it implicitly creates consumer-group (in my case console-consumer-67807) when connecting via kafka-console-consumer.sh.
How can I explicitly create my own consumer group, lets say my-created-consumer-group ?
You do not explicitly create consumer groups but rather build consumers which always belong to a consumer group. No matter which technology (Spark, Spring, Flink, ...) you are using, each Kafka Consumer will have a Consumer Group. The consumer group is configurable for each individual consumer.
It seems it loads the conf/consumer.properties at some point of time and additionally it implicitly creates consumer-group (in my case console-consumer-67807) when connecting via kafka-console-consumer.sh
If you do not tell your console consumer to actually make use of that file it will not be taken into consideration.
There are the following alternatives to provide the name of a consumer group:
Console Consumer with property file (--consumer.config)
This is how the file config/consumer.properties should look like
# consumer group id
group.id=my-created-consumer-group
And this is how you would then ensure that the console-consumer takes this group.id into consideration:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning --consumer.config /path/to/config/consumer.properties
Console consumer with --group
For console consumers the consumer group gets created automatically with prefix "console-consumer" and suffix something like a PID, unless you provide your own consumer group by adding --group:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning --group my-created-consumer-group
Standard code-based consumer API
When using the standard JAVA/Scala/... Consumer API you could provide the Consumer Group through the properties:
Properties settings = new Properties();
settings.put(ConsumerConfig.GROUP_ID_CONFIG, "basic-consumer");
// set more properties
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(settings)) {
consumer.subscribe(Arrays.asList("test-topic")

How to seperate a consumer from consumer group without losing offset?

I have a logstash kafka consumer group subscribing nearly 20topics which isn't performing well for a particular high priority kafka topic to process, so I've decided to remove one topic out of the consumer group and launch a separate consumer group for high priority topics, but unfortunately with that i'm losing the offset that was there in old consumer group.
Is there anyway I can start the new logstash consumer group with the initial offset from last consumer group?
Thanks
You can use kafka scripts to set offset for new group.
Sample scenario:
Stop application.
Check current offset for group. You can use following command. The output will contains information about current offset, log end offset, lag, etc for each topic.
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group groupId --describe
Set offset for new group id, that will be used by new application (suppose the offset for topic you would like to switch is 10001)
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group newGroupId --to-offset 10001 --topic topicName --reset-offsets --execute
Remove topicName from topics list for old application.
Set up new logstash configuration with newGroupId group id.
Start old and new logstash applications.

Kafka Consumer offset fetch

I am using a kafka version where the offset storage is kafka i.e._consumer_offsets
How to retrieve the consumer offset ,when the consumer is down or inactive?
Below Command work but with a flaw that it reads from beginning not the latest one
kafka-console-consumer --consumer.config /tmp/consumer.config --formatter "kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter" --zookeeper <> --topic __consumer_offsets --from-beginning
If your consumer is down and you want last committed offset of the topic for the specific group please refer to the below mentioned sources
https://github.com/yahoo/kafka-manager
KafkaConsumerOffsets

How to check the lag of a consumer in Kafka which is assigned with a particular partition of a topic?

I want to check the lag for a consumer group which was assigned manually to particular topic , is this possible ? . I am using Kafka - 0.10.0.1 .I used sh kafka-run-class.sh kafka.admin.ConsumerGroupCommand —new-consumer —describe —bootstrap-server localhost:9092 —group test but it says no group exists , so i wonder when we assign a partition manually can we check the lag for the consumer.
In Nussknacker we are using AKHQ GUI tool which provides various monitoring options as consumer and consumer groups lag and general Kafka operations as topic, topic data and schema registry management
./kafka-consumer-groups.sh --bootstrap-server localhost:9092
--describe --group
.
If You want API support or visual lag monitoring you can use https://github.com/yahoo/kafka-manager

Kafka 10.2 new consumer vs old consumer

I've spent some hours to figure out what was going on but didn't manage to find the solution.
Here is my set up on a single machine:
1 zookeeper running
3 broker running (on port 9092/9093/9094)
1 topic with 3 partitions and 3 replications (each partition are properly assigned between brokers)
I'm using kafka console producer to insert messages. If i check the replication offset (cat replication-offset-checkpoint), I see that my messages are properly ingested by Kafka.
Now I use the kafka console consumer (new):
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic testTopicPartitionned2
I dont see anything consumed. I tried to delete my logs folder (/tmp/kafka-logs-[1,2,3]), create new topics, still nothing.
However when I use the old kafka consumer:
sudo bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testTopicPartitionned2
I can see my messages.
Am I missing something big here to make this new consumer work ?
Thanks in advance.
Check to see what setting the consumer is using for auto.offset.reset property
This will affect what a consumer group without a previously committed offset will do in terms of setting where to start reading messages from a partition.
Check the Kafka docs for more on this.
Try providing all your brokers to --bootstrap-server argument to see if you notice any differnce:
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --from-beginning --topic testTopicPartitionned2
Also, your topic name is rather long. I assume you've already made sure you provide the correct topic name.