Consume a record from a given offset at runtime in Kafka with Spring Boot - apache-kafka

I want to read a record from Kafka at runtime, passing the offset as a parameter.
I am using @KafkaListener, but with it I am unable to set the offset at runtime from the user's request. If no offset is passed, it should consume the latest records. Any help is appreciated.

The 2.8 release has a new feature that lets you use the KafkaTemplate to receive a specific record at a specific offset.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#kafka-template-receive
If you want to receive all records from that offset, use the seek mechanisms provided by the container.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#seek
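For example, a minimal sketch of the KafkaTemplate.receive approach (spring-kafka 2.8+); the topic name, types, and timeout here are illustrative, and the template must have a ConsumerFactory configured for receive operations:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class RecordFetcher {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public RecordFetcher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    /** Fetches the single record at the given partition/offset, or null on timeout. */
    public String fetch(int partition, long offset) {
        ConsumerRecord<String, String> record =
                kafkaTemplate.receive("my-topic", partition, offset, Duration.ofSeconds(10));
        return record == null ? null : record.value();
    }
}
```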

Related

How to reread Kafka topic from the beginning

I have a Spring Boot app that uses Kafka via the Spring Kafka module. A neighboring team periodically sends JSON data to a compacted topic that serves as a "snapshot" of their internal database at a certain moment in time. But sometimes the team updates the contract without notification, our DTOs don't reflect the recent changes, and deserialization fails. Because our listener containers are configured as batched, with the default BATCH ack mode and BatchLoggingErrorHandler, we find from time to time that our Kubernetes pod with the consumer is full of errors, and we can't simply reread the topic with fresh DTOs after redeploying the microservice, since the last offset in every partition is committed and we can't change group.id (InfoSec department policy) to use auto.offset.reset = "earliest" as a workaround.
So, is there a way to programmatically reposition every consumer in a consumer group to the initial offset of its assigned partitions? If so, I think we could write a REST endpoint which, when called, triggers a "reprocessing from scratch".
See the documentation.
If your listener extends AbstractConsumerSeekAware, you can perform all kinds of seek operations (e.g. an initial seek during initialization, or arbitrary seeks between polls), as in the sketch below.
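A sketch of that approach, assuming a batch listener on a topic named "snapshot-topic" and a REST controller that triggers the reprocessing (all names here are illustrative):

```java
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.listener.AbstractConsumerSeekAware;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@Component
class SnapshotListener extends AbstractConsumerSeekAware {

    @KafkaListener(topics = "snapshot-topic")
    void listen(List<ConsumerRecord<String, String>> records) {
        // deserialize and process the batch ...
    }
}

@RestController
class ReprocessController {

    private final SnapshotListener listener;

    ReprocessController(SnapshotListener listener) {
        this.listener = listener;
    }

    @PostMapping("/reprocess")
    void reprocess() {
        // Queues a seek-to-beginning for every partition currently assigned to
        // the listener; the seeks run on the consumer thread before the next poll.
        listener.seekToBeginning();
    }
}
```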

How to manage offsets in a KafkaItemReader used in a Spring Batch job when an exception occurs in the middle of reading messages

I am working on a Kafka-based Spring Boot application for the first time. My requirement is to create an output file with all the records using Spring Batch. I created a Spring Batch job integrated with a customized class that extends KafkaItemReader. I don't want to commit the offsets for now, as I might need to go back and read some records from already-consumed offsets. My consumer config has these properties:
enable.auto.commit: false
auto-offset-reset: latest
group.id:
There are two scenarios:
1. A happy path, where I can read all the messages from the Kafka topic, transform them, and then write them to an output file using the above configuration.
2. I get an exception while reading through the messages, and I am not sure how to manage the offsets in such a case. Even if I go back to reset the offset, how do I make sure it is the correct offset for the messages? I don't persist the payload of the message records anywhere except to the Spring Batch output file.
You need to use a persistent Job Repository for that and configure the KafkaItemReader to save its state. The state consists of the offset of each partition assigned to the reader and is saved at chunk boundaries (i.e. at each transaction).
In a restart scenario, the reader will be initialized with the last offset for each partition from the execution context and resume where it left off.
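A sketch of such a reader configuration (topic, partitions, and bootstrap server are illustrative); with saveState(true) and a database-backed JobRepository, the per-partition offsets are written to the step's ExecutionContext at each chunk commit:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.batch.item.kafka.KafkaItemReader;
import org.springframework.batch.item.kafka.builder.KafkaItemReaderBuilder;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "batch-reader");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaItemReader<String, String> reader = new KafkaItemReaderBuilder<String, String>()
        .name("kafkaItemReader")       // required key for the ExecutionContext entries
        .topic("my-topic")
        .partitions(0, 1, 2)
        .consumerProperties(props)
        .saveState(true)               // persist per-partition offsets at chunk boundaries
        .build();
```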

Kafka consumer retrieve data using offset

While writing data into Kafka with a producer, can I get the offset of the record?
Can I use the same offset and partition to retrieve that specific record?
Please share an example if you can.
When you send a record to Kafka, you can learn the offset and partition assigned to that record by using one of the overloaded versions of the send method. The version that takes a Callback parameter exposes the onCompletion method, which provides a RecordMetadata instance with the information you want.
You can take a look at the Kafka Producer API for that here:
https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
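For instance, a minimal producer sketch (the topic name is illustrative, and an already-configured producer is assumed); the Callback lambda receives the RecordMetadata once the broker acknowledges the send:

```java
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// assumes an already-configured Producer<String, String> named producer
ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key", "value");
producer.send(record, (metadata, exception) -> {
    if (exception == null) {
        // metadata carries the partition and offset assigned to this record
        System.out.printf("partition=%d offset=%d%n", metadata.partition(), metadata.offset());
    }
});
// Or block for the metadata instead:
// RecordMetadata metadata = producer.send(record).get();
```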
On the consumer side, if you want to retrieve a specific record starting from a specific offset, you can use the assign method (instead of subscribe) to have the consumer assigned to a specific partition, and then use seek to specify the offset. Note that the consumer won't receive just one record, but all the records starting from that offset.
For this, see the Kafka Consumer API as well:
https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
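A corresponding consumer sketch (topic, partition, and offset are illustrative; an already-configured consumer is assumed):

```java
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

// assumes an already-configured KafkaConsumer<String, String> named consumer
TopicPartition partition = new TopicPartition("my-topic", 0);
consumer.assign(Collections.singletonList(partition)); // manual assignment, no subscribe()
consumer.seek(partition, 42L);                         // position at the desired offset
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
// poll() returns the record at offset 42 and everything after it, not just one record
```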

How to set the offset in @KafkaListener

I'm trying to consume data from a Kafka server using the @KafkaListener annotation.
The problem is that I'm getting all the messages every time I restart my application.
How do I save the last consumed offset in my application and use it to consume the next messages?
This tutorial has the answer:
https://www.codenotfound.com/spring-kafka-boot-example.html
The kafka.consumer.auto-offset-reset property needs to be set to 'earliest', which ensures the new consumer group will get the message sent in case the container started after the send was completed.
So, per your requirement, you should set it to latest instead.
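A sketch of the relevant Spring Boot settings (property names under the spring.kafka.consumer namespace; the group id is illustrative). Note that auto-offset-reset only applies when the group has no committed offset; with a stable group id and committed offsets, a restarted consumer resumes where it left off rather than replaying everything:

```properties
spring.kafka.consumer.group-id=my-app
spring.kafka.consumer.auto-offset-reset=latest
spring.kafka.consumer.enable-auto-commit=true
```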

kafka: keep track of producer offset

I'm using Apache Kafka (not Confluent) and have a simple Kafka producer that pulls from a REST API, sends the data to Kafka, and shuts down.
I had this for testing while I was developing a consumer.
In the consumer I can keep track of the offset, but I don't seem to be able to set a custom offset in my producer.
In my REST calls I need to keep track of a date so that I don't pull the same data all the time. Am I forced to store that "last timestamp" myself, or am I missing something?
I think that in your scenario you aren't interested in a "Kafka" offset on the producer side (i.e. when you write to Kafka), but rather in an "offset" tracking the latest data you pulled from the REST API. So yes, you are right: you have to do that yourself.
On the Kafka producer side you can learn the offset assigned to the latest sent message (inside the RecordMetadata), but it has no relationship with the timestamp of when you last pulled data from the REST API.
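A sketch of that do-it-yourself approach, persisting a "last pull" watermark to a local file (the file path is hypothetical, and the REST call itself is omitted):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

public class PullState {

    private static final Path STATE_FILE = Path.of("last-pull-timestamp.txt");

    /** Returns the timestamp of the last successful pull, or EPOCH on first run. */
    static Instant loadLastPull() throws IOException {
        return Files.exists(STATE_FILE)
                ? Instant.parse(Files.readString(STATE_FILE).trim())
                : Instant.EPOCH;
    }

    /** Writes the new watermark; call this only after the records are safely in Kafka. */
    static void saveLastPull(Instant timestamp) throws IOException {
        Files.writeString(STATE_FILE, timestamp.toString());
    }
}
```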