Kafka consumer retrieve data using offset - apache-kafka

While writing data into Kafka Producer can I get offset of the record
Can I use the same offset and partition to retrieve specific record
Please share example if you can

When you send a record to Kafka, in order to know the offset and the partition assigned to such a record you can use one of the overloaded versions of the send method. There is the one with a Callback parameter which expose the onCompletion method which provides you a RecordMetadata instance with the information you want.
You can take a look at the Kafka Producer API for that here :
https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
From a consumer side, if you want to recover a specific record starting a specific offset, you can use the assign method (instead of subscribe) in order to have the consumer being assigned to a specific partition and then you can use seek for specifying the offset. Pay attention that the consumer won't receive just one record but all the records starting from that offset.
For this info see the Kafka Consumer API as well.
https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html

Related

Is it possible in Kafka to read messages in reverse manner?

Can be created a new consumer group with a consumer which assigned to existing topiс, but somehow set a preference to consume backward: offset will move from the latest message for the moment to the earliest in every partition?
Kafka topics are meant to be consumed sequentually in the order of appearance within the topic partitions.
However, I see two options to solve your issue:
You can steer the consumer what data it poll from the topic partition like: Have your consumer seek to the latestet offset, then consume it and then seek to the latest offset minus one but read only one offset. Again seek to the previous offset and so on. Although I have never seen it, this should be possible with the consumer.seek and the ConsumerConfiguration max.poll.records.
You could use any kind of state store and order it descending by the offset for each partition. Then have another consumer reading the state store in the desired order.

Fetch message with specified key using Kafka Listener vs Kafka Consumer

I use a SpringBoot App to produce or consume/listen to Kafka
messages.
I produce a message in the topic and consume/listen to the
specific message by comparing the messageKey and then send the
consumed message for further processing.
I am stuck with what approach will be better suited to my requirement to get specific message i.e. Kafka Listener or Kafka Consumer what ?
KafkaListener is a Spring specific concept that wraps the Kafka Consumer API.
There is no way to get a message by a particular offset given the key. You must calculate the partition, then scan the entire offset

How does Zookeeper/Kafka retain offset for a consumer?

Is the offset a property of the topic/partition, or is it a property of a consumer?
If it's a property of a consumer, does that mean multiple consumers reading from the same partition could have different offsets?
Also what happens to a consumer if it goes down, how does Kafka know it's dealing with the same consumer when it comes back online? presumably a new client ID is generated so it wouldn't have the same ID as previously.
In most cases it is a property of a Consumer Group. When writing the consumers, you normally specify the consumer group in the group.id parameter. This group ID is used to recover / store the latest offset from / in the special topic __consumer_offsets where it is stored directly in the Kafka cluster it self. The consumer group is used not only for the offset but also to ensure that each partition will be consumed only from a single client per consumer group.
However Kafka gives you a lot of flexibility - so if you need you can store the offset somewhere else and you can do it based on whatever criteria you want. But in most cases following the consumer group concept and storing the offset inside Kafka is the best thing you can do.
Kafka identifies consumer based on group.id which is a consumer property and each consumer should have this property
A unique string that identifies the consumer group this consumer belongs to. This property is required if the consumer uses either the group management functionality by using subscribe(topic) or the Kafka-based offset management strategy
And coming to offset it is a consumer property and broker property, whenever consumer consumes messages from kafka topic it will submit offset (which means consumed this list of messages from 1 to 10) next time it will start consuming from 10, offset can be manually submitted or automatically submitted enable.auto.commit
If true the consumer's offset will be periodically committed in the background.
And each consumer group will have its offset, based on that kafka server identifies either new consumer or old consumer was restarted

Apache Kafka- Algorithm/Strategy used to pull messages from different partitions of a same topic by a Single Consumer

I have been studying Apache Kafka for a while now.
Lets consider the following example.
Consider I have a topic with 3 partitions. I have a single producer and single consumer. I am producing my messages without specifying the key attribute.
So i know on the producer side, when i publish a message, the strategy used by kafka to assign a message to either of those partitions would be Round-Robin.
Now, what i want to know is when I start a single consumer belonging to a certain consumer group listening to that same topic, what strategy will it use to pull the messages from the different partitons(as there are 3)?
Would it follow the a similar round-robin model, where it will send a fetch request to a leader of a partition 1, wait for a response, get the response, return the records to process. Then, send a fetch request to the leader of a partition 2 and so on?
If it follows some other strategy/algorithm, I would love to know what it is?
Thank you in advance.
There is no ordering guarantee outside of a partition so in a way that algorithm used is moot to the end user and subject to change.
Today, there is nothing terribly complex that happens in this instance. The protocol shows you that a fetch request includes a partition so you get a fetch per partition. That means the order depends on the consumer. A partition won't be starved because fetch requests will happen for all partitions assigned to the consumer.

kafka subscribe commit offset manually

I am using Kafka 9 and confused with the behavior of subscribe.
Why does it expects group.id with subscribe.
Do we need to commit the offset manually using commitSync. Even if don't do that I see that it always starts from the latest.
Is there a way a replay the messages from beginning.
Why does it expects group.id with subscribe?
The concept of consumer groups is used by Kafka to enable parallel consumption of topics - every message will be delivered once per consumer group, no matter how many consumers actually are in that group. This is why the group parameter is mandatory, without a group Kafka would not know how this consumer should be treated in relation to other consumers that might subscribe to the same topic.
Whenever you start a consumer it will join a consumer group, based on how many other consumers are in this consumer group it will then be assigned partitions to read from. For these partitions it then checks whether a list read offset is known, if one is found it will start reading messages from this point.
If no offset is found, the parameter auto.offset.reset controls whether reading starts at the earliest or latest message in the partition.
Do we need to commit the offset manually using commitSync? Even if
don't do that I see that it always starts from the latest.
Whether or not you need to commit the offset depends on the value you choose for the parameter enable.auto.commit. By default this is set to true, which means the consumer will automatically commit its offset regularly (how often is defined by auto.commit.interval.ms). If you set this to false, then you will need to commit the offsets yourself.
This default behavior is probably also what is causing your "problem" where your consumer always starts with the latest message. Since the offset was auto-committed it will use that offset.
Is there a way a replay the messages from beginning?
If you want to start reading from the beginning every time, you can call seekToBeginning, which will reset to the first message in all subscribed partitions if called without parameters, or just those partitions that you pass in.