Why does my Kafka consumer group not work?

I'm using sarama-cluster (a Kafka consumer client written in Go).
On the broker, my topic partition's offset was 11000 and my consumer group's offset for that partition was 10100.
Then I ran my cluster consumer, but it consumed nothing. (This was 1~2 days later.)
But when I produced a message to the topic's partition, it consumed! (This happened in each partition.)
The number of messages consumed was 901.
Why does my cluster consumer only seem to start consuming when a message is produced?
My consumer setting was auto.offset.reset = latest

This is because of your offset reset setting. auto.offset.reset = latest means that, when the group has no valid committed offset, the consumer starts at the end of the log and waits for new records. If you want to consume from the beginning in that case, use auto.offset.reset = earliest.
The official Kafka documentation: https://kafka.apache.org/0110/documentation.html
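To make the rule concrete, here is a minimal sketch of how a starting position could be resolved. It is a simplified model, not client code: the function name starting_offset is hypothetical, and a log start offset of 0 is an assumption, since the question only gives the end offset (11000) and the group's offset (10100).

```python
from typing import Optional


def starting_offset(committed: Optional[int], log_start: int, log_end: int,
                    reset_policy: str) -> int:
    """Return the offset a consumer would begin fetching from (simplified model)."""
    # A committed offset inside the log's current range always wins.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Otherwise the reset policy decides.
    if reset_policy == "earliest":
        return log_start
    if reset_policy == "latest":
        return log_end
    raise LookupError("no valid committed offset and reset policy is 'none'")


# Numbers from the question; log_start=0 is an assumption.
print(starting_offset(10100, 0, 11000, "latest"))    # 10100
print(starting_offset(None, 0, 11000, "latest"))     # 11000
print(starting_offset(None, 0, 11000, "earliest"))   # 0
```

In this model the reset policy only matters in the second and third calls, where no committed offset is available.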

Related

Kafka Consumer not consuming from last committed offset after restart

I have a consumer polling a subscribed topic. It consumes each message, does some processing (within seconds), pushes the result to a different topic, and commits the offset.
There are 5000 messages in total:
before restart - consumed 2900 messages and committed the offset
after restart - started consuming from offset 0.
Even though the consumer is created with the same consumer group, it started processing messages from offset 0.
kafka version (strimzi) > 2.0.0
kafka-python == 2.0.1
We don't know how many partitions your topic has, but consumers created within the same consumer group consume records from different partitions. Two consumers in one group cannot consume from the same partition, and if you add a consumer, the group coordinator triggers a rebalance to reassign partitions among the consumers.
I think the offset 0 comes from the auto.offset.reset property, which can be:
latest: Start at the latest offset in log
earliest: Start with the earliest record.
none: Throw an exception when there is no existing offset data.
But this property kicks in only if your consumer group doesn't have a valid offset committed.
N.B.: Records in a topic are subject to a retention period (the log.retention.ms property), so the oldest records could be deleted while you are still processing the log.
Question: since you consume messages from one topic, process the data, and write it to another topic, why didn't you use Kafka Streams?
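The retention note can be illustrated with a toy log. This is only a sketch under stated assumptions: PartitionLog and its methods are hypothetical names, and it drops individual records by timestamp, whereas a real broker deletes whole log segments. It shows how a committed offset can end up below the start of the log, at which point it is out of range and auto.offset.reset applies.

```python
from collections import deque


class PartitionLog:
    """Toy single-partition log with time-based retention (not real Kafka code)."""

    def __init__(self, retention_ms: int):
        self.retention_ms = retention_ms
        self.records = deque()      # (offset, timestamp_ms) pairs, oldest first
        self.next_offset = 0        # offset the next appended record will get

    def append(self, timestamp_ms: int) -> int:
        offset = self.next_offset
        self.records.append((offset, timestamp_ms))
        self.next_offset += 1
        return offset

    def enforce_retention(self, now_ms: int) -> None:
        # Drop records older than the retention window, oldest first.
        while self.records and now_ms - self.records[0][1] > self.retention_ms:
            self.records.popleft()

    @property
    def log_start(self) -> int:
        return self.records[0][0] if self.records else self.next_offset


log = PartitionLog(retention_ms=1000)
for t in range(0, 5000, 500):        # 10 records at t = 0, 500, ..., 4500
    log.append(t)

committed = 3                        # the group last committed offset 3
log.enforce_retention(now_ms=5000)   # keeps records with now - ts <= 1000

print(log.log_start)                 # 8
print(committed < log.log_start)     # True: the committed offset is out of range
```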

Kafka: Who keeps track of the offset up to which a consumer group has read?

I know that every message in a Kafka topic partition has its own offset number, which defines the sequence of messages.
But if I have a Kafka consumer group (or a single Kafka consumer) reading a particular topic partition, how does it keep track of the offset up to which messages have been read, and who maintains this offset counter?
If the consumer goes down, how will a new consumer start reading from the next unread (or unacknowledged) offset?
The information about consumer groups is stored in the internal Kafka topic __consumer_offsets. Whenever a group starts reading from a topic, it looks up its offset position in that internal topic, whose cleanup policy is set to compact. Compaction keeps this topic small.
Kafka comes with a command line tool kafka-consumer-groups.sh that helps you understand which information is stored for each consumer group.
More information is given in the Kafka Documentation on offset tracking.
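As a rough illustration of why compaction keeps the offsets topic small, the sketch below replays a list of commit records and keeps only the newest value per key. The (group, topic, partition) key mirrors how committed offsets are tracked; the function name compact and the sample group and topic names are made up for the example.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, int]          # (group, topic, partition)


def compact(commits: List[Tuple[Key, int]]) -> Dict[Key, int]:
    """Keep only the most recent committed offset per key, as compaction would."""
    latest: Dict[Key, int] = {}
    for key, offset in commits:     # later records overwrite earlier ones
        latest[key] = offset
    return latest


# Hypothetical commit history, oldest first.
commit_log = [
    (("group-a", "orders", 0), 100),
    (("group-a", "orders", 1), 40),
    (("group-a", "orders", 0), 250),
    (("group-b", "orders", 0), 10),
]
state = compact(commit_log)
print(state[("group-a", "orders", 0)])   # 250
print(len(state))                        # 3
```

However many times a group commits, only one entry per group and partition needs to survive, and that entry is what a restarted consumer looks up.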

Consume Kafka Message using poll mode

I am new to Kafka, and I am using Kafka 1.0.
I read the Kafka messages in pull mode, that is, I periodically poll() the Kafka topic for new messages, but I don't write the offset back to Kafka.
I would like to ask how Kafka knows which offsets I have consumed, or what mechanism Kafka uses to remember the progress (the Kafka offset).
Every consumer group maintains an offset per topic partition. Since v0.9, the committed offsets of every consumer group are stored in an internal topic called (by default) __consumer_offsets (prior to v0.9 this information was stored in ZooKeeper). When the offset manager receives an OffsetCommitRequest, it appends the request to this special compacted topic. The offset manager sends a successful offset commit response to the consumer only once all replicas of the offsets topic have received the offsets. If you never commit explicitly, the consumer typically still commits for you in the background, because enable.auto.commit defaults to true.
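The effect of committed offsets on a restart can be sketched with an in-memory stand-in. Nothing here is a real client API: FakeBroker, consume, and the fail_after parameter are hypothetical, and starting at offset 0 when nothing is committed is an assumption that corresponds to an earliest-style reset.

```python
class FakeBroker:
    """In-memory stand-in for one partition plus each group's committed offset."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = {}                      # group id -> next offset to read

    def poll(self, position, max_records=2):
        return self.messages[position:position + max_records]

    def commit(self, group, offset):
        self.committed[group] = offset


def consume(broker, group, fail_after=None):
    """Poll, process, then commit; optionally simulate a crash."""
    position = broker.committed.get(group, 0)    # resume from the committed offset
    processed = []
    while True:
        batch = broker.poll(position)
        if not batch:
            return processed
        for msg in batch:
            if fail_after is not None and len(processed) == fail_after:
                return processed                 # crash before committing this batch
            processed.append(msg)
        position += len(batch)
        broker.commit(group, position)           # commit only after processing


broker = FakeBroker(["m0", "m1", "m2", "m3", "m4"])
first = consume(broker, "g1", fail_after=3)
second = consume(broker, "g1")
print(first)    # ['m0', 'm1', 'm2']
print(second)   # ['m2', 'm3', 'm4']
```

The second run starts at the last committed offset, so m2 is processed twice. That is the at-least-once behaviour you get when commits follow processing.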

Kafka multiple consumer

When multiple consumers read from a topic with a single partition, is there any possibility that every consumer gets all the messages?
I have created two consumers with manual offset commit. I started the first consumer and, after 2 minutes, started the second. The second consumer reads from the point where the first consumer stopped. Is there any way for the second consumer to read all the messages from the beginning? I'm new to Kafka, please help me out.
In your consumer, you are presumably using commitSync, which commits the offsets returned by the last poll. When you start your second consumer in the same consumer group, it reads from the last committed offset.
Which messages a consumer receives depends on the consumer group it belongs to. Suppose you have 2 partitions and 2 consumers in a single consumer group; then each consumer reads from a different partition, which provides parallelism.
So, if you want your 2nd consumer to read from beginning, you can do one of 2 things:
a) Put the 2nd consumer in a different consumer group. For this new group, no offset is stored yet, so the auto.offset.reset config decides the starting offset. Set auto.offset.reset to earliest (reset to the earliest offset) or to latest (reset to the latest offset).
b) Seek to start of all partitions your consumer is assigned by using: consumer.seekToBeginning(consumer.assignment())
Documentation: https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning-java.util.Collection-
https://kafka.apache.org/documentation/#consumerconfigs
Within a single consumer group, a partition is always assigned to exactly one consumer, no matter how many consumers there are. Only that consumer reads the partition's data; the others do not consume from it until the partition is assigned to them. When the consumer goes down, a rebalance happens and the partition is assigned to another consumer. Since you commit manually, the new consumer starts reading from the committed offset.
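The one-owner-per-partition rule can be sketched with a simple round-robin assignment. This is an illustrative model only: the function assign and the consumer names are hypothetical, and the assignors shipped with real clients (range, round-robin, sticky) can distribute partitions differently.

```python
def assign(partitions, consumers):
    """Round-robin partitions over consumers; each partition gets exactly one owner."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(p)
    return assignment


# One partition, two consumers in the same group: the second consumer stays idle.
print(assign([0], ["c1", "c2"]))        # {'c1': [0], 'c2': []}

# Two partitions, two consumers: one partition each.
print(assign([0, 1], ["c1", "c2"]))     # {'c1': [0], 'c2': [1]}

# c1 goes down: after the rebalance, c2 owns everything.
print(assign([0, 1], ["c2"]))           # {'c2': [0, 1]}
```

With a single partition, the second consumer only starts receiving data once the partition is reassigned to it, and it then continues from the group's committed offset.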

Kafka MirrorMaker's consumer not fetching all messages from topics

I am trying to set up a Kafka mirroring mechanism, but it seems that the Kafka MirrorMaker's consumer only reads data that arrives in the source cluster's topics after the MirrorMaker process is started, i.e. it does not read data previously stored in the topics.
I am using Kafka MirrorMaker class for that as:
/bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config consumer.config --num.streams 2 --producer.config producer.config --whitelist=".*"
consumer.config to read from Kafka source cluster, as:
zookeeper.connect=127.0.0.1:2181
zookeeper.connection.timeout.ms=6000
group.id=kafka-mirror
and producer.config settings to produce to the new Kafka mirrored cluster:
metadata.broker.list=localhost:9093
producer.type=sync
compression.codec=none
serializer.class=kafka.serializer.DefaultEncoder
Is there a way to define the consumer of Kafka MirrorMaker to read from the beginning of the topics of my source Kafka cluster? A bit strange, because I have defined in the consumer.config settings a new consumer group (kafka-mirror), so the consumer should just read from offset 0, i.e. from beginning of topics.
Many thanks in advance!
In consumer properties, add
auto.offset.reset=earliest
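This should work with the new (Java) consumer. The old ZooKeeper-based consumer configured in the question uses smallest instead.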
Look at the auto.offset.reset parameter from Kafka consumer configuration.
From Kafka documentation:
auto.offset.reset (default: largest)
What to do when there is no initial offset in Zookeeper or if an offset is out of range:
* smallest : automatically reset the offset to the smallest offset
* largest : automatically reset the offset to the largest offset
* anything else: throw exception to the consumer.
If this is set to largest, the consumer may lose some messages when the number of partitions, for the topics it subscribes to, changes on the broker. To prevent data loss during partition addition, set auto.offset.reset to smallest.
So, using smallest for auto.offset.reset should fix your problem.
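Applied to the consumer.config from the question, the suggestion would look roughly like the fragment below. The first three lines are copied from the question; only the last line is new. The setting only takes effect when the group has no stored offset, so if the kafka-mirror group has already committed offsets, a fresh group.id may be needed as well.

```properties
# consumer.config for the old ZooKeeper-based consumer, as used in the question
zookeeper.connect=127.0.0.1:2181
zookeeper.connection.timeout.ms=6000
group.id=kafka-mirror
# start from the oldest available offset when the group has no stored offset
auto.offset.reset=smallest
```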
This is a very late answer, but it might help someone who is still looking.
As of now, Kafka MirrorMaker doesn't support this. There is an open defect: KafkaMirror