Uncommitted event is not received in the next poll - offset

I have a consumer with max.poll.records set to 1 and enable.auto.commit set to false for manual offset control. However, even when I do not call commitSync, the subsequent poll returns the next event. Here are the details: I produced 4 events onto a topic, and in the consumer I skip commitSync for the third event. I was expecting the third event to be returned in the next poll, but the fourth event was returned instead. I am puzzled how event 3 got committed.
private static void pauseAndResume() {
int retryDelay = 5; // seconds
SimpleDateFormat sdf = new SimpleDateFormat("HH:mm:ss");
SimpleProducer.produce(4); //(produces Event1, Event2, Event3, Event4)
Properties properties = new Properties();
String topicName = "output-topic";
properties.put("bootstrap.servers", "localhost:29092");
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("group.id", "test-group");
properties.put("max.poll.records", 1);
properties.put("enable.auto.commit", false);
KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<String, String>(properties);
List<String> topics = new ArrayList<String>();
topics.add(topicName);
kafkaConsumer.subscribe(topics);
Collection<TopicPartition> topicPartitions = new ArrayList<TopicPartition>();
PartitionInfo partitionInfo = kafkaConsumer.partitionsFor(topicName).get(0);
topicPartitions.add(new TopicPartition(partitionInfo.topic(), partitionInfo.partition()));
int eventsCount = 0;
try {
Date pausedAt = new Date();
while (true) {
if (!kafkaConsumer.paused().isEmpty()) {
if ((new Date().getTime() - pausedAt.getTime()) / 1000 % 60 >= retryDelay) {
System.out.println("Resuming Consumer..." + sdf.format(new Date()));
kafkaConsumer.resume(topicPartitions);
}
}
ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
System.out.println(eventsCount + ":" + record.value());
if (record.value().equals("Event3")) {
System.out.println("consumer is pausing...... for about " + retryDelay + " seconds " + sdf.format(new Date()));
kafkaConsumer.pause(topicPartitions);
pausedAt = new Date();
break;
}else {
kafkaConsumer.commitSync();
}
}
}
} catch (Exception e) {
System.out.println(e.getMessage());
} finally {
kafkaConsumer.close();
}
}
The KafkaConsumer<K,V> documentation doesn't explain how to stop the offset from advancing.
I suspect some internal logic detected that Event3 would be polled indefinitely and returned Event4 instead.
From my research (Google and the Kafka forums), I expect Event3 to be replayed because it was not committed, but that is not happening. Could someone point me in the right direction?
Many Thanks

I figured out a workaround: explicitly seek back to the uncommitted record's offset on the topic partition.
// In this use case we consume from a single topic that has only one partition
kafkaConsumer.seek(topicPartitions.iterator().next(), record.offset());
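The seek is needed because a running consumer keeps an in-memory position that advances on every poll, whether or not anything was committed. The committed offset is only consulted when a partition is assigned and no position is known yet, for example after a restart or a rebalance. The toy model below illustrates that distinction in plain Java. `ToyConsumer` and all of its methods are hypothetical stand-ins written for this sketch; they only mimic the names of the Kafka calls and are not the Kafka API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of one partition's consumer state (not the Kafka API).
class ToyConsumer {
    private final List<String> log;  // the partition's records, list index = offset
    private long position = 0;       // next offset that poll() will return
    private long committed = 0;      // offset saved for the group, used only on restart

    ToyConsumer(List<String> log) {
        this.log = log;
    }

    // Returns one record (like max.poll.records=1) and advances the position,
    // regardless of whether the previous record was committed.
    String poll() {
        if (position >= log.size()) {
            return null;
        }
        return log.get((int) position++);
    }

    void commitSync() { committed = position; }

    void seek(long offset) { position = offset; }

    long position() { return position; }

    long committed() { return committed; }

    public static void main(String[] args) {
        ToyConsumer c = new ToyConsumer(Arrays.asList("Event1", "Event2", "Event3", "Event4"));
        c.poll(); c.commitSync();   // Event1 processed and committed
        c.poll(); c.commitSync();   // Event2 processed and committed
        String third = c.poll();    // Event3 read but deliberately not committed
        System.out.println(third + " position=" + c.position() + " committed=" + c.committed());
        c.seek(c.position() - 1);   // rewind, as the workaround does with record.offset()
        System.out.println("after seek: " + c.poll());
    }
}
```

Without the `seek` call, the next `poll()` in this model returns Event4, which matches the behaviour described in the question.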

Related

How do I set in Kafka to not consume from where it left?

I have a Kafka consumer in Golang. I don't want to consume from where I left off last time, but rather from the current message. How can I do it?
You can set enable.auto.commit to false and auto.offset.reset to latest for your consumer group id. This means Kafka will not automatically commit your offsets.
With auto commit disabled, your consumer group's progress is not saved (unless you commit manually). So whenever the consumer is restarted for whatever reason, it finds no saved progress and resets to the latest offset.
Set a new group.id for your consumer.
Then use auto.offset.reset to define the behavior of this new consumer group; in your case: latest.
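A minimal sketch of that configuration, built with `java.util.Properties` only. The `freshGroupConfig` helper and the random suffix on the group id are assumptions made for illustration; the property keys are the standard consumer settings named above.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical helper class, not part of any Kafka client.
class FreshGroupConfig {
    // Builds consumer properties with a group id that has no committed offsets,
    // so auto.offset.reset decides where consumption starts.
    static Properties freshGroupConfig(String bootstrapServers, String groupPrefix) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // A new group id on every start means no saved progress is ever found.
        props.put("group.id", groupPrefix + "-" + UUID.randomUUID());
        props.put("auto.offset.reset", "latest");
        props.put("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = freshGroupConfig("localhost:9092", "my-app");
        System.out.println(props.getProperty("group.id"));
        System.out.println(props.getProperty("auto.offset.reset"));
    }
}
```

The same keys apply to non-Java clients, although the way they are passed differs per client library.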
The Apache Kafka consumer API provides a method called kafkaConsumer.seekToEnd(), which can be used to ignore the existing messages and consume only messages published after the consumer has started, without changing the consumer's current group ID.
Below is an implementation of this. The program takes 3 arguments: topic name, group ID and starting offset (0 to start from the beginning, -1 to receive only messages published after the consumer has started, -2 to leave the offset alone and commit as usual; any other value tells the consumer to start from that offset).
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.WakeupException;
import java.util.*;
public class Consumer {
private static Scanner in;
public static void main(String[] argv)throws Exception{
if (argv.length != 3) {
System.err.printf("Usage: %s <topicName> <groupId> <startingOffset>\n",
Consumer.class.getSimpleName());
System.exit(-1);
}
in = new Scanner(System.in);
String topicName = argv[0];
String groupId = argv[1];
final long startingOffset = Long.parseLong(argv[2]);
ConsumerThread consumerThread = new ConsumerThread(topicName,groupId,startingOffset);
consumerThread.start();
String line = "";
while (!line.equals("exit")) {
line = in.next();
}
consumerThread.getKafkaConsumer().wakeup();
System.out.println("Stopping consumer .....");
consumerThread.join();
}
private static class ConsumerThread extends Thread{
private String topicName;
private String groupId;
private long startingOffset;
private KafkaConsumer<String,String> kafkaConsumer;
public ConsumerThread(String topicName, String groupId, long startingOffset){
this.topicName = topicName;
this.groupId = groupId;
this.startingOffset=startingOffset;
}
public void run() {
Properties configProperties = new Properties();
configProperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
configProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
configProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
configProperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
configProperties.put(ConsumerConfig.CLIENT_ID_CONFIG, "offset123");
configProperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,false);
configProperties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest");
//Figure out where to start processing messages from
kafkaConsumer = new KafkaConsumer<String, String>(configProperties);
kafkaConsumer.subscribe(Arrays.asList(topicName), new ConsumerRebalanceListener() {
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
System.out.printf("%s topic-partitions are revoked from this consumer\n", Arrays.toString(partitions.toArray()));
}
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
System.out.printf("%s topic-partitions are assigned to this consumer\n", Arrays.toString(partitions.toArray()));
Iterator<TopicPartition> topicPartitionIterator = partitions.iterator();
while(topicPartitionIterator.hasNext()){
TopicPartition topicPartition = topicPartitionIterator.next();
System.out.println("Current offset is " + kafkaConsumer.position(topicPartition) + " committed offset is ->" + kafkaConsumer.committed(topicPartition) );
if(startingOffset == -2) {
System.out.println("Leaving it alone");
}else if(startingOffset ==0){
System.out.println("Setting offset to beginning");
kafkaConsumer.seekToBeginning(topicPartition);
}else if(startingOffset == -1){
System.out.println("Setting it to the end ");
kafkaConsumer.seekToEnd(topicPartition);
}else {
System.out.println("Resetting offset to " + startingOffset);
kafkaConsumer.seek(topicPartition, startingOffset);
}
}
}
});
//Start processing messages
try {
while (true) {
ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
System.out.println(record.value());
}
if(startingOffset == -2)
kafkaConsumer.commitSync();
}
}catch(WakeupException ex){
System.out.println("Exception caught " + ex.getMessage());
}finally{
kafkaConsumer.close();
System.out.println("After closing KafkaConsumer");
}
}
public KafkaConsumer<String,String> getKafkaConsumer(){
return this.kafkaConsumer;
}
}
}

How to know when record is committed in Kafka?

In case of integration testing, I send a record into Kafka, and I would like to know when it will be processed and committed, and then do my assertions (instead of using a Thread.sleep)...
Here is my attempt:
public void sendRecordAndWaitUntilItsNotConsumed(ProducerRecord<String, String> record)
throws ExecutionException, InterruptedException {
RecordMetadata recordMetadata = producer.send(record).get();
TopicPartition topicPartition = new TopicPartition(recordMetadata.topic(),
recordMetadata.partition());
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerConfig)) {
consumer.assign(Collections.singletonList(topicPartition));
long recordOffset = recordMetadata.offset();
long currentOffset = getCurrentOffset(consumer, topicPartition);
while (currentOffset <= recordOffset) {
currentOffset = getCurrentOffset(consumer, topicPartition);
LOGGER.info("Waiting for message to be consumed - Current Offset = " + currentOffset
+ " - Record Offset = " + recordOffset);
}
}
}
private long getCurrentOffset(KafkaConsumer<String, String> consumer,
TopicPartition topicPartition) {
consumer.seekToEnd(Collections.emptyList());
return consumer.position(topicPartition);
}
But it doesn't work: even after I disabled commits in the consumer under test, the method never loops on "Waiting for message to be consumed...".
It works using KafkaConsumer#committed instead of KafkaConsumer#position.
private void sendRecordAndWaitUntilItsNotConsumed(ProducerRecord<String, String> record) throws InterruptedException, ExecutionException {
    RecordMetadata recordMetadata = producer.send(record).get();
    TopicPartition topicPartition = new TopicPartition(recordMetadata.topic(),
            recordMetadata.partition());
    consumer.assign(Collections.singletonList(topicPartition));
    long recordOffset = recordMetadata.offset();
    long currentOffset = getCurrentOffset(topicPartition);
    while (currentOffset < recordOffset) {
        currentOffset = getCurrentOffset(topicPartition);
        LOGGER.info("Waiting for message to be consumed - Current Offset = " + currentOffset
                + " - Record Offset = " + recordOffset);
        TimeUnit.MILLISECONDS.sleep(200);
    }
}

private long getCurrentOffset(TopicPartition topicPartition) {
    OffsetAndMetadata offsetAndMetadata = consumer.committed(topicPartition);
    return offsetAndMetadata != null ? offsetAndMetadata.offset() - 1 : -1;
}
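The `- 1` in getCurrentOffset matters because a committed offset names the next record the group will read, not the last one it processed. A small sketch of that comparison in plain Java follows. The `isConsumed` helper is hypothetical, and `committedOffset` is assumed to be null when the group has never committed for the partition.

```java
// Hypothetical helper class illustrating the committed-offset convention.
class CommitCheck {
    // committedOffset: the value that OffsetAndMetadata.offset() would report,
    // or null if nothing has been committed for the partition yet.
    // A record at recordOffset counts as consumed once the committed offset
    // points past it, i.e. committedOffset >= recordOffset + 1.
    static boolean isConsumed(Long committedOffset, long recordOffset) {
        return committedOffset != null && committedOffset > recordOffset;
    }

    public static void main(String[] args) {
        System.out.println(isConsumed(null, 5L)); // nothing committed yet
        System.out.println(isConsumed(5L, 5L));   // offset 5 is still the next record to read
        System.out.println(isConsumed(6L, 5L));   // offset 5 has been processed and committed
    }
}
```

This is the same condition as the `while (currentOffset < recordOffset)` loop above, written against the raw committed value.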

kafka fetch records by timestamp, consumer loop

I am using a Kafka 0.10.2.1 cluster. I am using Kafka's offsetsForTimes API to seek to a particular offset and would like to break out of the loop when I have reached the end timestamp.
My code is like this:
//package kafka.ex.test;
import java.util.*;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
public class ConsumerGroup {
public static OffsetAndTimestamp fetchOffsetByTime( KafkaConsumer<Long, String> consumer , TopicPartition partition , long startTime){
Map<TopicPartition, Long> query = new HashMap<>();
query.put(
partition,
startTime);
final Map<TopicPartition, OffsetAndTimestamp> offsetResult = consumer.offsetsForTimes(query);
if( offsetResult == null || offsetResult.isEmpty() ) {
System.out.println(" No Offset to Fetch ");
System.out.println(" Offset Size " + (offsetResult == null ? 0 : offsetResult.size())); // avoid a NullPointerException when the result is null
return null;
}
final OffsetAndTimestamp offsetTimestamp = offsetResult.get(partition);
if(offsetTimestamp == null ){
System.out.println("No Offset Found for partition : "+partition.partition());
}
return offsetTimestamp;
}
public static KafkaConsumer<Long, String> assignOffsetToConsumer( KafkaConsumer<Long, String> consumer, String topic , long startTime ){
final List<PartitionInfo> partitionInfoList = consumer.partitionsFor(topic);
System.out.println("Number of Partitions : "+partitionInfoList.size());
final List<TopicPartition> topicPartitions = new ArrayList<>();
for (PartitionInfo pInfo : partitionInfoList) {
TopicPartition partition = new TopicPartition(topic, pInfo.partition());
topicPartitions.add(partition);
}
consumer.assign(topicPartitions);
for(TopicPartition partition : topicPartitions ){
OffsetAndTimestamp offSetTs = fetchOffsetByTime(consumer, partition, startTime);
if( offSetTs == null ){
System.out.println("No Offset Found for partition : " + partition.partition());
consumer.seekToEnd(Arrays.asList(partition));
}else {
System.out.println(" Offset Found for partition : " +offSetTs.offset()+" " +partition.partition());
System.out.println("FETCH offset success"+
" Offset " + offSetTs.offset() +
" offSetTs " + offSetTs);
consumer.seek(partition, offSetTs.offset());
}
}
return consumer;
}
public static void main(String[] args) throws Exception {
String topic = args[0].toString();
String group = args[1].toString();
long start_time_Stamp = Long.parseLong( args[3].toString());
String bootstrapServers = args[2].toString();
long end_time_Stamp = Long.parseLong( args[4].toString());
Properties props = new Properties();
boolean reachedEnd = false;
props.put("bootstrap.servers", bootstrapServers);
props.put("group.id", group);
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<Long, String> consumer = new KafkaConsumer<Long, String>(props);
assignOffsetToConsumer(consumer, topic, start_time_Stamp);
System.out.println("Subscribed to topic " + topic);
int i = 0;
int arr[] = {0,0,0,0,0};
while (true) {
ConsumerRecords<Long, String> records = consumer.poll(6000);
int count= 0;
long lasttimestamp = 0;
long lastOffset = 0;
for (ConsumerRecord<Long, String> record : records) {
count++;
if(arr[record.partition()] == 0){
arr[record.partition()] =1;
}
if (record.timestamp() >= end_time_Stamp) {
reachedEnd = true;
break;
}
System.out.println("record=>"+" offset="
+record.offset()
+ " timestamp="+record.timestamp()
+ " :"+record);
System.out.println("recordcount = "+count+" bitmap"+Arrays.toString(arr));
}
if (reachedEnd) break;
if (records == null || records.isEmpty()) break; // dont wait for records
}
}
}
I face the following problems:
consumer.poll returns nothing even with a 1000 millisecond timeout; I had to poll a few times in a loop when using 1000 milliseconds, so I now use an extremely large value. Having already seeked to the relevant offsets within each partition, how can I reliably set the poll timeout so that data is returned immediately?
My observation is that when data is returned, it is not always from all partitions. Even when data is returned from all partitions, not all records are returned. The topic holds more than 1000 records, but the number of records actually fetched and printed in the loop is smaller (~200). Is there an issue with my current usage of the Kafka APIs?
How can I reliably break out of the loop once I have obtained all the data between the start and end timestamps, and not prematurely?
The number of records fetched per poll depends on your consumer config.
You are breaking out of the loop as soon as one of the partitions reaches the end timestamp, which is not what you want. You should check that all the partitions have reached the end timestamp before exiting the poll loop.
The poll call is asynchronous, and fetch requests and responses are per node, so you may or may not get all the responses in a single poll, depending on the broker response time.
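One way to act on the second point is to track which partitions have reached the end timestamp and stop only once every assigned partition has. The sketch below models that bookkeeping in plain Java. `EndTracker` and its method names are hypothetical; in a real consumer the inputs would come from `record.partition()` and `record.timestamp()`. It also assumes that once a partition yields a record at or past the end timestamp, later records from it can be ignored. A partition that holds no record at or past the end timestamp is never marked finished, so a real loop would need an additional exit condition for that case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical bookkeeping helper (not part of the Kafka API).
class EndTracker {
    private final Set<Integer> assigned;
    private final Set<Integer> finished = new HashSet<>();
    private final long endTimestamp;

    EndTracker(Set<Integer> assignedPartitions, long endTimestamp) {
        this.assigned = new HashSet<>(assignedPartitions);
        this.endTimestamp = endTimestamp;
    }

    // Returns true if the record lies inside the time window and should be processed.
    // The first record at or past endTimestamp marks its partition as finished.
    boolean accept(int partition, long timestamp) {
        if (finished.contains(partition)) {
            return false;
        }
        if (timestamp >= endTimestamp) {
            finished.add(partition);
            return false;
        }
        return true;
    }

    // True once every assigned partition has produced a record at or past endTimestamp.
    boolean allFinished() {
        return finished.containsAll(assigned);
    }

    public static void main(String[] args) {
        EndTracker tracker = new EndTracker(new HashSet<>(Arrays.asList(0, 1)), 100L);
        System.out.println(tracker.accept(0, 50L));   // inside the window
        System.out.println(tracker.accept(0, 100L));  // partition 0 reaches the end
        System.out.println(tracker.allFinished());    // partition 1 is still open
        System.out.println(tracker.accept(1, 120L));  // partition 1 reaches the end
        System.out.println(tracker.allFinished());    // now the poll loop may exit
    }
}
```

In the poll loop, `accept` would replace the `reachedEnd = true; break;` branch, and the loop would exit on `allFinished()` rather than on the first late record. Pausing finished partitions with `consumer.pause(...)` is one option to avoid fetching data that will be discarded.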

Unable to get number of messages in kafka topic

I am fairly new to Kafka. I have created a sample producer and consumer in Java. Using the producer, I was able to send data to a Kafka topic, but I am not able to get the number of records in the topic using the following consumer code.
public class ConsumerTests {
public static void main(String[] args) throws Exception {
BasicConfigurator.configure();
String topicName = "MobileData";
String groupId = "TestGroup";
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("group.id", groupId);
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
kafkaConsumer.subscribe(Arrays.asList(topicName));
try {
while (true) {
ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(100);
System.out.println("Record count is " + consumerRecords.count());
}
} catch (WakeupException e) {
// ignore for shutdown
} finally {
kafkaConsumer.close();
}
}
}
I don't get any exception in the console, but consumerRecords.count() always returns 0, even if there are messages in the topic. Please let me know if I am missing something needed to get the record details.
The poll(...) call should normally be in a loop. It's always possible for the initial poll(...) to return no data (depending on the timeout) while the partition assignment is in progress. Here's an example:
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        System.out.println("Record count is " + records.count());
    }
} catch (WakeupException e) {
    // ignore for shutdown
} finally {
    consumer.close();
}
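If the goal is to wait for the first records rather than loop forever, the loop can be bounded by a deadline. A minimal sketch in plain Java follows. The `Supplier` stands in for a call such as `consumer.poll(100)`, and `pollUntilData` is a hypothetical helper, not a Kafka method.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper class, not part of any Kafka client.
class PollRetry {
    // Calls poll repeatedly until it returns a non-empty batch or timeoutMs elapses.
    // poll is always called at least once. In real use the supplier would block for
    // its own poll timeout, so this loop would not spin.
    static <T> List<T> pollUntilData(Supplier<List<T>> poll, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        do {
            List<T> batch = poll.get();
            if (!batch.isEmpty()) {
                return batch;
            }
        } while (System.currentTimeMillis() < deadline);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        int[] calls = new int[1];
        // Simulates two empty polls (assignment still in progress) followed by data.
        List<String> batch = PollRetry.<String>pollUntilData(
                () -> ++calls[0] < 3 ? Collections.<String>emptyList() : Arrays.asList("record"),
                5000);
        System.out.println("polls=" + calls[0] + " records=" + batch.size());
    }
}
```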

kafka consumer group is rebalancing

I am using Kafka 0.9 and the new Java consumer. I am polling inside a loop. I am getting a CommitFailedException because of a group rebalance when the code tries to execute consumer.commitSync. Please note that I am setting session.timeout.ms to 30000 and heartbeat.interval.ms to 10000 on the consumer, and polling definitely happens within 30000 ms. Can anyone help me out? Please let me know if any more information is needed.
Here is the code:
Properties props = new Properties();
props.put("bootstrap.servers", {allthreeservers});
props.put("group.id", groupId);
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", ObjectSerializer.class.getName());
props.put("auto.offset.reset", "earliest");
props.put("enable.auto.commit", false);
props.put("session.timeout.ms", 30000);
props.put("heartbeat.interval.ms", 10000);
props.put("request.timeout.ms", 31000);
props.put("kafka.consumer.topic.name", topic);
props.put("max.partition.fetch.bytes", 1000);
while (true) {
Boolean isPassed = true;
try {
ConsumerRecords<Object, Object> records = consumer.poll(1000);
if (records.count() > 0) {
ConsumeEventInThread consumerEventInThread = new ConsumeEventInThread(records, consumerService);
FutureTask<Boolean> futureTask = new FutureTask<>(consumerEventInThread);
executorServiceForAsyncKafkaEventProcessing.execute(futureTask);
try {
isPassed = (Boolean) futureTask.get(Long.parseLong(props.getProperty("session.timeout.ms")) - Long.parseLong("5000"), TimeUnit.MILLISECONDS);
} catch (Exception Exception) {
logger.warn("Time out after waiting till session time out");
}
if (isPassed) {
consumer.commitSync();
logger.info("Successfully committed offset for topic " + Arrays.asList(props.getProperty("kafka.consumer.topic.name")));
}else{
logger.info("Failed to process consumed messages, will not Commit and consume again");
}
}
} catch (Exception e) {
logger.error("Unhandled exception in while consuming messages " + Arrays.asList(props.getProperty("kafka.consumer.topic.name")), e);
}
}
The CommitFailedException is thrown when the commit cannot be completed because the group has been rebalanced. This is the main thing we have to be careful of when using the Java client. Since all network IO (including heartbeating) and message processing is done in the foreground, it is possible for the session timeout to expire while a batch of messages is being processed. To handle this, you have two choices.
First you can adjust the session.timeout.ms setting to ensure that the handler has enough time to finish processing messages. You can then tune max.partition.fetch.bytes to limit the amount of data returned in a single batch, though you will have to consider how many partitions are in the subscribed topics.
The second option is to do message processing in a separate thread, but you will have to manage flow control to ensure that the threads can keep up.
You can set session.timeout.ms large enough that commit failures from rebalances are rare. The only drawback to this is a longer delay before partitions can be re-assigned in the event of a hard failure.
For more info please see doc
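As a rough illustration of the first option, the arithmetic below estimates how many records a handler can process sequentially before the session timeout expires. The `maxRecordsPerBatch` helper is hypothetical, and the numbers are taken from this thread (30000 ms session timeout, a 5000 ms margin, 10000 ms per record); it assumes processing happens between polls and that per-record time is roughly constant.

```java
// Hypothetical helper class for sizing a batch against the session timeout.
class BatchBudget {
    // Largest number of records that can be processed one after another before
    // the session timeout, keeping safetyMarginMs in reserve for the commit and
    // the next poll.
    static long maxRecordsPerBatch(long sessionTimeoutMs, long perRecordMs, long safetyMarginMs) {
        if (perRecordMs <= 0) {
            throw new IllegalArgumentException("perRecordMs must be positive");
        }
        long budget = sessionTimeoutMs - safetyMarginMs;
        return budget <= 0 ? 0 : budget / perRecordMs;
    }

    public static void main(String[] args) {
        // (30000 - 5000) / 10000 leaves room for 2 records per batch.
        System.out.println(maxRecordsPerBatch(30_000, 10_000, 5_000));
    }
}
```

If the result is smaller than the batches actually returned, either the timeout has to grow or the batch has to shrink, for example through max.partition.fetch.bytes as described above, or through max.poll.records on clients that support it.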
This is a working example.
----Worker code-----
import org.apache.kafka.clients.consumer.ConsumerRecord;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
public class Worker implements Callable<Boolean> {
ConsumerRecord record;
public Worker(ConsumerRecord record) {
this.record = record;
}
public Boolean call() {
Map<String, Object> data = new HashMap<>();
try {
data.put("partition", record.partition());
data.put("offset", record.offset());
data.put("value", record.value());
Thread.sleep(10000);
System.out.println("Processing Thread---" + Thread.currentThread().getName() + " data: " + data);
return Boolean.TRUE;
} catch (Exception e) {
e.printStackTrace();
return Boolean.FALSE;
}
}
}
---------Execution code------------------
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.util.*;
import java.util.concurrent.*;
public class AsyncConsumer {
public static void main(String[] args) {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
props.put("enable.auto.commit", false);
props.put("session.timeout.ms", 30000);
props.put("heartbeat.interval.ms", 10000);
props.put("request.timeout.ms", 31000);
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("Test1", "Test2"));
int poolSize=10;
ExecutorService es= Executors.newFixedThreadPool(poolSize);
CompletionService<Boolean> completionService=new ExecutorCompletionService<Boolean>(es);
try {
while (true) {
System.out.println("Polling................");
ConsumerRecords<String, String> records = consumer.poll(1000);
List<ConsumerRecord> recordList = new ArrayList();
for (ConsumerRecord<String, String> record : records) {
recordList.add(record);
if(recordList.size() ==poolSize){
int taskCount=poolSize;
//process it
recordList.forEach( recordTobeProcess -> completionService.submit(new Worker(recordTobeProcess)));
while(taskCount >0){
try {
Future<Boolean> futureResult = completionService.poll(1, TimeUnit.SECONDS);
if (futureResult != null) {
boolean result = futureResult.get().booleanValue();
taskCount = taskCount - 1;
}
}catch (Exception e) {
e.printStackTrace();
}
}
recordList.clear();
Map<TopicPartition,OffsetAndMetadata> commitOffset= Collections.singletonMap(new TopicPartition(record.topic(),record.partition()),
new OffsetAndMetadata(record.offset() + 1));
consumer.commitSync(commitOffset);
}
}
}
} finally {
consumer.close();
}
}
}
You need to follow some rules:
1) Pass a fixed number of records (for example 10) to ConsumeEventInThread.
2) Create several threads for processing instead of a single thread, and submit all tasks to a CompletionService.
3) Poll all submitted tasks and verify their results.
4) Then commit (use the parametric commitSync method instead of the non-parametric one).
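When a batch spans several partitions, the offsets passed to the parametric commitSync should cover each partition separately: for every partition, the highest processed offset plus one. The sketch below computes that map in plain Java. `Processed` and `offsetsToCommit` are hypothetical names; in a real consumer the keys would be `TopicPartition` objects and the values `OffsetAndMetadata`, passed to `commitSync(Map)`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of any Kafka client.
class CommitOffsets {
    // Stand-in for the (partition, offset) pair of a successfully processed record.
    static final class Processed {
        final int partition;
        final long offset;

        Processed(int partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    // For each partition, the offset to commit is the highest processed offset + 1,
    // because a committed offset names the next record to read.
    static Map<Integer, Long> offsetsToCommit(Iterable<Processed> batch) {
        Map<Integer, Long> next = new HashMap<>();
        for (Processed p : batch) {
            next.merge(p.partition, p.offset + 1, Long::max);
        }
        return next;
    }

    public static void main(String[] args) {
        Map<Integer, Long> toCommit = offsetsToCommit(Arrays.asList(
                new Processed(0, 5L), new Processed(1, 7L), new Processed(0, 6L)));
        System.out.println(toCommit);
    }
}
```

This assumes every record in the batch was processed successfully; if some failed, the commit for that partition should stop before the first failed offset.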