Kafka consumer not consuming from beginning - apache-kafka

I have Kafka set up on my local machine and have started ZooKeeper and a single broker.
Now I have a single topic with the following description:
~/Documents/backups/kafka_2.12-2.2.0/data/kafka$ kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic edu-topic --describe
Topic:edu-topic PartitionCount:3 ReplicationFactor:1 Configs:
Topic: edu-topic Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: edu-topic Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: edu-topic Partition: 2 Leader: 0 Replicas: 0 Isr: 0
I produced some messages before starting the consumer, as follows:
~/Documents/backups/kafka_2.12-2.2.0/data/kafka$ kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic edu-topic
>book
>pen
>pencil
>marker
>
When I then start the consumer with the --from-beginning option, it does not show all the messages produced by the producer:
~/Documents/backups/kafka_2.12-2.2.0/data/kafka$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic edu-topic --group edu-service --from-beginning
However, it does show newly added messages.
What am I doing wrong here? Any help is appreciated.

--from-beginning: If the consumer does not already have an established offset to consume from, start with the earliest message present in the log rather than the latest message.
The consumer applies --from-beginning only the very first time a consumer group reads the topic. If you ran it again with the same group, which I suspect you did, it resumes from where it left off. You can consume the messages again with any of the options below.
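Stop the consumer, then reset the consumer group offsets (the group must have no active members while you do this):
kafka-streams-application-reset.sh --application-id edu-service --input-topics edu-topic --bootstrap-servers localhost:9092 --zookeeper 127.0.0.1:2181
For a plain consumer group, kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group edu-service --topic edu-topic --reset-offsets --to-earliest --execute does the same.
Then consume from the beginning again: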
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic edu-topic --group edu-service --from-beginning
Use a new consumer group id, which will start consuming from the earliest offset:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic edu-topic --group new-edu-service --from-beginning
You can also consume a single partition starting at a given offset (add --max-messages N to read only the next N messages):
kafka-console-consumer.sh --bootstrap-server localhost:9092 --offset 0 --partition 0 --topic edu-topic
--offset <String: consume offset> : The offset id to consume from (a non-negative number), or 'earliest' which means from beginning, or 'latest' which means from end (default: latest)
--partition <Integer: partition> : The partition to consume from. Consumption starts from the end of the partition unless '--offset' is specified.
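To make the --offset semantics concrete, here is a toy Python model of reading one partition. It is a sketch, not the Kafka client API: read_partition and the partition contents are hypothetical, and it assumes nothing has been deleted by retention, so the log starts at offset 0.

```python
from typing import List, Optional, Union


def read_partition(log: List[str], offset: Union[int, str] = "latest",
                   max_messages: Optional[int] = None) -> List[str]:
    """Toy model of --partition/--offset (hypothetical helper).

    `log` holds the partition's messages, starting at offset 0.
    """
    if offset == "earliest":
        start = 0                      # first message in the partition
    elif offset == "latest":
        start = len(log)               # only messages produced from now on
    elif isinstance(offset, int) and offset >= 0:
        start = offset                 # explicit offset, like --offset 0
    else:
        raise ValueError("offset must be 'earliest', 'latest' or a non-negative int")
    if max_messages is None:
        return log[start:]
    return log[start:start + max_messages]  # like --max-messages N


# Hypothetical contents of partition 0 of edu-topic.
partition0 = ["book", "marker"]
print(read_partition(partition0, 0))         # ['book', 'marker']
print(read_partition(partition0, "latest"))  # []
print(read_partition(partition0, 1, 1))      # ['marker']
```

The default of "latest" mirrors the "(default: latest)" in the help text, which is why a plain --partition read shows nothing that was produced earlier.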

This is because you are reusing an existing consumer group. --from-beginning only takes effect for a new consumer group, i.e. one whose group name has no offsets recorded on the Kafka cluster yet.
To re-consume from the start, you can either:
Start a new consumer group (change the group name) with the --from-beginning flag
Reset the offsets of this consumer group (I haven't tried this myself)

The flag --from-beginning only affects the behaviour of your consumer group the first time it is started/created, or when its stored (last committed) offset has expired (or perhaps when you reset the stored offset).
Otherwise the consumer group simply continues from the stored (last committed) offset.
See the manual for more details.
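The rule described in these answers can be modelled in a few lines. As far as I know, --from-beginning makes the console consumer set auto.offset.reset=earliest, and that reset policy is only consulted when the group has no committed offset. The sketch below is a simplified Python model, not the broker's code; starting_offset and the in-memory committed store are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical store of committed offsets: (group, partition) -> offset.
Committed = Dict[Tuple[str, int], int]


def starting_offset(committed: Committed, group: str, partition: int,
                    log_start: int, log_end: int, from_beginning: bool) -> int:
    """Where a consumer in `group` starts reading `partition`.

    A committed offset always wins. The reset policy (earliest with
    --from-beginning, latest otherwise) is used only when the group has
    no committed offset, e.g. it is new or its offset has expired.
    """
    stored = committed.get((group, partition))
    if stored is not None:
        return stored
    return log_start if from_beginning else log_end


committed: Committed = {}
# First run of edu-service: nothing committed, so --from-beginning applies.
print(starting_offset(committed, "edu-service", 0, 0, 4, True))      # 0
committed[("edu-service", 0)] = 4  # it read offsets 0-3 and committed 4
# Second run with --from-beginning: the committed offset wins.
print(starting_offset(committed, "edu-service", 0, 0, 4, True))      # 4
# A new group has no committed offset and starts at the beginning.
print(starting_offset(committed, "new-edu-service", 0, 0, 4, True))  # 0
# Simulate the stored offset expiring: the reset policy applies again.
del committed[("edu-service", 0)]
print(starting_offset(committed, "edu-service", 0, 0, 4, True))      # 0
```

This is exactly why changing the group name or resetting the group's offsets makes --from-beginning effective again.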

Just add --from-beginning.
But be aware that messages read from the beginning will not be in the order they were produced if the topic has multiple partitions.
Ordering is only guaranteed within a single partition.
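Here is a small Python illustration of why the order breaks with three partitions, as in edu-topic. It assumes keyless messages are spread over partitions in turn (the real default partitioner differs between client versions) and that the consumer happens to drain one partition after another; both helpers are hypothetical.

```python
from typing import Dict, List


def produce_round_robin(messages: List[str], partitions: int) -> Dict[int, List[str]]:
    """Spread keyless messages over partitions in turn (an assumption)."""
    logs: Dict[int, List[str]] = {p: [] for p in range(partitions)}
    for i, msg in enumerate(messages):
        logs[i % partitions].append(msg)
    return logs


def consume_partition_by_partition(logs: Dict[int, List[str]]) -> List[str]:
    """One possible consumer order: drain partition 0, then 1, then 2."""
    consumed: List[str] = []
    for p in sorted(logs):
        consumed.extend(logs[p])
    return consumed


produced = ["book", "pen", "pencil", "marker"]
logs = produce_round_robin(produced, 3)
print(logs)  # {0: ['book', 'marker'], 1: ['pen'], 2: ['pencil']}
print(consume_partition_by_partition(logs))  # ['book', 'marker', 'pen', 'pencil']
```

Within each partition the order is preserved ("book" still precedes "marker"), but the overall order differs from the produced order.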

Related

How to purge or delete a topic in kafka 2.1.0 version

I would like to share different ways to purge or delete a Kafka topic in version 2.1.0. I found a similar question, Purge Kafka Topic; however, its accepted answer is deprecated and only works on Kafka 0.8 and below, hence this question with an answer.
This is not a duplicate question.
By default Kafka retains messages for 168 hours (7 days). If you want to force Kafka to purge a topic, there are several ways to do it. Let's look at each in detail.
1. Using the kafka-configs.sh command
Temporarily change the retention period to 1 second (retention.ms=1000).
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --add-config retention.ms=1000 --entity-name text_topic
You can check the current retention setting by running the command below.
kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --describe --entity-name text_topic
Configs for topic 'text_topic' are retention.ms=1000
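Wait for the old data to be deleted, then remove the retention.ms override so the topic goes back to the default. Note that the broker only checks for expired segments every log.retention.check.interval.ms (5 minutes by default), so the deletion can take much longer than 1 second.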
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --delete-config retention.ms --entity-name text_topic
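As I understand it, time-based retention deletes whole log segments, judged by the newest record timestamp in each segment, starting from the oldest segment. The Python sketch below models that rule under those assumptions; Segment and apply_time_retention are hypothetical, and real brokers have extra details (for example around the active segment) that it ignores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    base_offset: int       # offset of the first record in the segment
    max_timestamp_ms: int  # timestamp of the newest record in the segment


def apply_time_retention(segments: List[Segment], now_ms: int,
                         retention_ms: int) -> List[Segment]:
    """Delete expired segments from the oldest end of the log.

    A segment is expired when its newest record is older than
    retention_ms; deletion stops at the first segment that is not expired.
    Simplified model: ignores active-segment handling on a real broker.
    """
    kept = list(segments)
    while kept and now_ms - kept[0].max_timestamp_ms > retention_ms:
        kept.pop(0)
    return kept


segments = [Segment(0, 1_000), Segment(100, 5_000), Segment(200, 9_000)]
now = 10_000
week_ms = 7 * 24 * 60 * 60 * 1000  # the default 168 hours
print(len(apply_time_retention(segments, now, week_ms)))                     # 3
print([s.base_offset for s in apply_time_retention(segments, now, 1_000)])  # [200]
```

With retention.ms=1000 the two older segments go, but a segment holding very recent records survives until it, too, is older than the retention period.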
2. Delete the topic and re-create it
Before deleting an existing topic, note its partition count and replication factor, as you will need them to re-create the topic. You can get this information by describing the topic:
kafka-topics.sh --zookeeper localhost:2181 --describe --topic text_topic
Topic:text_topic PartitionCount:3 ReplicationFactor:3 Configs:
Topic: text_topic Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Delete the topic (this requires delete.topic.enable=true on the brokers, which is the default since Kafka 1.0).
kafka-topics.sh --zookeeper localhost:2181 --delete --topic text_topic
Re-create the topic with replication and partition details.
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic text_topic
3. Manually delete the data from the Kafka logs
Stop ZooKeeper and Kafka on all nodes.
Delete the topic's log directories on all nodes. Kafka stores its log files in directories such as /tmp/kafka-logs/MyTopic-0, where /tmp/kafka-logs is specified by the log.dir attribute.
Restart ZooKeeper and Kafka.
Hope this helps!

kafka consumer not showing the messages?

I created a new topic 'rahul' with the following command:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic rahul
Created topic "rahul".
I also re-checked the topics with
bin/kafka-topics.sh --list --zookeeper localhost:2181
__consumer_offsets
rahhy
rahul
Then I started the producer:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic rahul
hey
hi
hello
But when I run the consumer to show the messages, nothing is displayed.
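As of Kafka 0.9, producers and consumers don't use ZooKeeper; they connect to the brokers directly.
Try: bin/kafka-console-consumer.sh --topic rahul --bootstrap-server localhost:9092 --from-beginning
Without --from-beginning, the console consumer only shows messages produced after it starts.
You can also check whether the messages were sent to Kafka by verifying that the offsets of the topic's partitions have advanced, using GetOffsetShell, e.g. bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic rahul --time -1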
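The offset check amounts to simple arithmetic: the number of records appended is the change in each partition's latest offset. GetOffsetShell prints one topic:partition:offset line per partition; the Python sketch below parses that format and compares two snapshots. The helper names and the sample outputs are hypothetical, not real output from the asker's cluster.

```python
from typing import Dict


def parse_get_offset_shell(output: str) -> Dict[int, int]:
    """Parse lines of the form 'topic:partition:offset'."""
    offsets: Dict[int, int] = {}
    for line in output.strip().splitlines():
        _topic, partition, offset = line.rsplit(":", 2)
        offsets[int(partition)] = int(offset)
    return offsets


def messages_written(before: Dict[int, int], after: Dict[int, int]) -> int:
    """Records appended between two --time -1 (latest offset) snapshots."""
    return sum(after[p] - before.get(p, 0) for p in after)


# Hypothetical outputs before and after producing hey/hi/hello.
before = parse_get_offset_shell("rahul:0:0\n")
after = parse_get_offset_shell("rahul:0:3\n")
print(messages_written(before, after))  # 3
```

If the difference is non-zero, the producer did its job and the problem lies on the consumer side.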

Kafka topic have no duplication on messages

How can I achieve the following with messages in a Kafka topic?
I.e. changelog-like functionality: multiple messages come into the topic, but I only care about the last one that came in.
Also, what happens if the topic is partitioned?
Is it possible in Kafka?
To achieve this, you should set cleanup.policy for this topic to compact, as shown below:
CREATE TOPIC:
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1 --replication-factor 1 --config cleanup.policy=compact
UPDATE TOPIC:
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --add-config cleanup.policy=compact
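With the compact policy set, every message must have a key, and the producer partitions messages based on that key, so all messages with the same key go to the same partition and compaction works per partition. Compaction guarantees that at least the latest message for each key is kept; older messages for a key are only removed when the log cleaner runs, so consumers may still see some of them for a while.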
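Conceptually, compaction keeps the last record per key and key-based partitioning sends each key to a fixed partition. The Python sketch below models both. It is an illustration only: compact and partition_for are hypothetical, and partition_for uses CRC32 as a stand-in hash, whereas Kafka's default partitioner uses murmur2, so real partition numbers will differ.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    """Same key -> same partition. Stand-in hash (CRC32), not Kafka's murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record for each key, in log order."""
    last_index = {key: i for i, (key, _value) in enumerate(log)}
    return [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]


log = [("user1", "a"), ("user2", "b"), ("user1", "c")]
print(compact(log))  # [('user2', 'b'), ('user1', 'c')]
# Every update for user1 lands in the same partition, so compacting each
# partition on its own is enough to keep only user1's latest value.
print(partition_for("user1", 3) == partition_for("user1", 3))  # True
```

In a real topic the log cleaner does this in the background and never touches the active segment, which is why duplicates can still be visible until compaction catches up.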

Display Kafka Consumer Lag using java

I am not able to find a way to print Kafka consumer lag from Java. With the command-line tool I get:
./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group test-group --zookeeper localhost:2181 --topic test01
Group Topic Pid Offset logSize Lag Owner
test-group test01 0 7 9 2 test-group_Jitendra-E5530-1497519128391-bdea2e0c-0
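Per partition, the lag is just logSize minus the group's committed Offset (9 − 7 = 2 in the row above); total group lag is the sum over partitions. The Python sketch below shows that arithmetic and parses the row shown. In Java, the two inputs could presumably come from KafkaConsumer#endOffsets and KafkaConsumer#committed (or AdminClient#listConsumerGroupOffsets), but that wiring is not shown here; PartitionLag and parse_checker_row are hypothetical names.

```python
from typing import List, NamedTuple


class PartitionLag(NamedTuple):
    topic: str
    partition: int
    committed: int  # "Offset" column: the group's committed offset
    log_end: int    # "logSize" column: the partition's log-end offset

    @property
    def lag(self) -> int:
        # Clamp at 0 in case the two values were read at different moments.
        return max(self.log_end - self.committed, 0)


def parse_checker_row(row: str) -> PartitionLag:
    """Parse one data row of ConsumerOffsetChecker output."""
    _group, topic, pid, offset, log_size, _lag, _owner = row.split()
    return PartitionLag(topic, int(pid), int(offset), int(log_size))


row = "test-group test01 0 7 9 2 test-group_Jitendra-E5530-1497519128391-bdea2e0c-0"
p = parse_checker_row(row)
print(p.lag)  # 2
rows: List[PartitionLag] = [p]
print(sum(r.lag for r in rows))  # total lag for the group: 2
```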

Kafka uncommitted messages

Let's say the partition has 4 replicas (1 leader, 3 followers) and all are currently in sync. min.insync.replicas is set to 3 and request.required.acks is set to all (-1).
The producer sends a message to the leader, and the leader appends it to its log. After that, two of the replicas crash before they can fetch this message. The one remaining replica successfully fetches the message and appends it to its own log.
After a certain timeout, the leader will send an error (NotEnoughReplicas, I think) to the producer, since the min.insync.replicas condition is not met.
My question is: what will happen to the message that was appended to the leader's log and to one replica's log?
Will it be delivered to consumers when the crashed replicas come back online and the broker starts accepting and committing new messages (i.e. the high watermark moves forward in the log)?
If fewer than min.insync.replicas replicas are in sync and the producer uses acks=all, then the message is not committed, and consumers will not receive it, even after the crashed replicas come back and are added to the ISR list again. You can test this as follows.
Start two brokers with min.insync.replicas = 2
$ ./bin/kafka-server-start.sh ./config/server-1.properties
$ ./bin/kafka-server-start.sh ./config/server-2.properties
Create a topic with 1 partition and RF=2. Make sure both brokers are in the ISR list.
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --create --topic topic1 --partitions 1 --replication-factor 2
Created topic "topic1".
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --describe --topic topic1
Topic:topic1 PartitionCount:1 ReplicationFactor:2 Configs:
Topic: topic1 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Run a console consumer and a console producer. Make sure the producer uses acks=-1.
$ ./bin/kafka-console-consumer.sh --new-consumer --bootstrap-server kafka-1:9092,kafka-2:9092 --topic topic1
$ ./bin/kafka-console-producer.sh --broker-list kafka-1:9092,kafka-2:9092 --topic topic1 --request-required-acks -1
Produce some messages. The consumer should receive them.
Kill one of the brokers (I killed broker with id=2). Check that ISR list is reduced to one broker.
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --describe --topic topic1
Topic:topic1 PartitionCount:1 ReplicationFactor:2 Configs:
Topic: topic1 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1
Try to produce again. In the producer you should get one of these per retry:
Error: NOT_ENOUGH_REPLICAS
and finally:
Messages are rejected since there are fewer in-sync replicas than required.
The consumer will not receive these messages.
Restart the killed broker and try to produce again.
The consumer will receive these messages, but not the ones you sent while one of the replicas was down.
From my understanding, the high watermark will not advance until both failed follower brokers have recovered and caught up.
See this blog post for more details: http://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/
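The rejection seen in this experiment comes from a check the leader makes before appending an acks=all write: if the ISR is smaller than min.insync.replicas, the request fails with NOT_ENOUGH_REPLICAS. The Python sketch below is a simplified model of that check only (it does not model the high watermark); leader_accepts and NotEnoughReplicas are hypothetical names.

```python
from typing import Set, Union


class NotEnoughReplicas(Exception):
    """Stand-in for the NOT_ENOUGH_REPLICAS error returned to the producer."""


def leader_accepts(isr: Set[int], min_insync_replicas: int,
                   acks: Union[int, str]) -> bool:
    """Leader-side check made before appending a produce request.

    min.insync.replicas is only enforced for acks=all (-1); with acks=0/1
    the write is accepted even if the ISR has shrunk.
    """
    if acks in ("all", -1) and len(isr) < min_insync_replicas:
        raise NotEnoughReplicas(
            f"ISR {sorted(isr)} smaller than min.insync.replicas={min_insync_replicas}")
    return True


print(leader_accepts({1, 2}, 2, -1))  # True: both brokers in sync
try:
    leader_accepts({1}, 2, -1)        # broker 2 killed, ISR shrank to {1}
except NotEnoughReplicas as err:
    print("rejected:", err)
print(leader_accepts({1}, 2, 1))      # True: acks=1 ignores min.insync.replicas
```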
Error observed:
Messages are rejected since there are fewer in-sync replicas than required.
To resolve this, I had to increase the replication factor, and it worked.