Why commitAsync fails to commit the first 2 offsets - apache-kafka

I faced a weird problem at which the consumer can not make comitAsync the first 2 offsets of the log and i don't know the reason. It is very weird because the other messages at the same asynchronous send of the producer received and commited succesfuly by the consumer .Can someone find the source of this problem.. I quote my code below and an output example
package com.panos.example;
import kafka.utils.ShutdownableThread;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
public class Consumer extends ShutdownableThread {
private final KafkaConsumer<Integer, String> consumer;
private final String topic;
public Consumer(String topic) {
super("KafkaConsumerExample", false);
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.1.75:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "DemoConsumer");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.IntegerDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
consumer = new KafkaConsumer<Integer, String>(props);
this.topic = topic;
}
#Override
public void doWork() {
consumer.subscribe(Collections.singletonList(this.topic));
try {
ConsumerRecords<Integer, String> records = consumer.poll(1000);
long startTime = System.currentTimeMillis();
if (!records.isEmpty()) {
System.out.println("C : {} Total No. of records received : {}" + records.count());
for (ConsumerRecord<Integer, String> record : records) {
System.out.println("Received message: (" + record.key() + ", " + record.value() + ") at offset " + record.offset());
consumer.commitAsync(new ConsumerCallBack(startTime,record.value(), record.offset()));
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
#Override
public String name() {
return null;
}
#Override
public boolean isInterruptible() {
return false;
}
class ConsumerCallBack implements OffsetCommitCallback {
private final long startTime;
private String message;
private final String NewLine = System.getProperty("line.separator");
private long offset;
public ConsumerCallBack(long startTime) {
this.startTime = startTime;
}
public ConsumerCallBack(long startTime, String message, long offset) {
this.startTime = startTime;
this.message=message;
this.offset = offset;
}
public void onComplete(Map<TopicPartition, OffsetAndMetadata> CurrentOffset,
Exception exception) {
long elapsedTime = System.currentTimeMillis() - startTime;
if (exception != null) {
System.out.println("Message : {" + message + "}, committed successfully at offset " + offset +
CurrentOffset + "elapsed time :" + elapsedTime);
} else {
System.out.println(exception.toString());
/* JOptionPane.showMessageDialog(new Frame(),
"Something Goes Wrong with the Server Please Try again Later.",
"Inane error",
JOptionPane.ERROR_MESSAGE);*/
}
}
}
}
As you can see all message committed successfully except the first 2 without any exception. Why this happens?
Received message: (1, Message_1) at offset 160
Received message: (2, Message_2) at offset 161
Received message: (3, Message_3) at offset 162
Received message: (4, Message_4) at offset 163
Message : {Message_3}, committed successfully at offset 162{test-0=OffsetAndMetadata{offset=164, metadata=''}}elapsed time :6
Message : {Message_4}, committed successfully at offset 163{test-0=OffsetAndMetadata{offset=164, metadata=''}}elapsed time :6

If you use commitAsync it can happen that multiple commits are squashed together into a single commit message. As offsets are committed in increasing order, a commit of offset X is an implicit commit for all offsets that are smaller than X. In your case, it seems, that the commits or the first three offsets are done my a single commit of offset 3.

Related

How do I set in Kafka to not consume from where it left?

I have a Kafka consumer in Golang. I don't want to consume from where I left last time, but rather current message. How can I do it?
You can set enable.auto.commit to false and auto.offset.reset to latest for your consumer group id. This means kafka will not be automatically committing your offsets.
With auto commit disabled, your consumer group progress would not be saved (unless you do manually). So whenever the consumer is restarted for whatever reason, it does not find its progress saved and resets to the latest offset.
set a new group.id to your consumer.
Then use auto.offset.reset to define the behavior of this new consumer group, in you case: latest
Apache kafka consumer api provides a method called kafkaConsumer.seekToEnd() which can be used to ignore the existing messages and only consume messages published after the consumer has been started without changing the current group ID of the consumer.
Below is the implementation of the same. The program takes 3 arguments : topic name, group ID and offset range (0 to start from beginning, - 1 to receive messages after consumer has started, other than 0 or - 1 will imply to to consumer to consume from that offset)
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.WakeupException;
import java.util.*;
public class Consumer {
private static Scanner in;
public static void main(String[] argv)throws Exception{
if (argv.length != 3) {
System.err.printf("Usage: %s <topicName> <groupId> <startingOffset>\n",
Consumer.class.getSimpleName());
System.exit(-1);
}
in = new Scanner(System.in);
String topicName = argv[0];
String groupId = argv[1];
final long startingOffset = Long.parseLong(argv[2]);
ConsumerThread consumerThread = new ConsumerThread(topicName,groupId,startingOffset);
consumerThread.start();
String line = "";
while (!line.equals("exit")) {
line = in.next();
}
consumerThread.getKafkaConsumer().wakeup();
System.out.println("Stopping consumer .....");
consumerThread.join();
}
private static class ConsumerThread extends Thread{
private String topicName;
private String groupId;
private long startingOffset;
private KafkaConsumer<String,String> kafkaConsumer;
public ConsumerThread(String topicName, String groupId, long startingOffset){
this.topicName = topicName;
this.groupId = groupId;
this.startingOffset=startingOffset;
}
public void run() {
Properties configProperties = new Properties();
configProperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
configProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
configProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
configProperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
configProperties.put(ConsumerConfig.CLIENT_ID_CONFIG, "offset123");
configProperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,false);
configProperties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest");
//Figure out where to start processing messages from
kafkaConsumer = new KafkaConsumer<String, String>(configProperties);
kafkaConsumer.subscribe(Arrays.asList(topicName), new ConsumerRebalanceListener() {
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
System.out.printf("%s topic-partitions are revoked from this consumer\n", Arrays.toString(partitions.toArray()));
}
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
System.out.printf("%s topic-partitions are assigned to this consumer\n", Arrays.toString(partitions.toArray()));
Iterator<TopicPartition> topicPartitionIterator = partitions.iterator();
while(topicPartitionIterator.hasNext()){
TopicPartition topicPartition = topicPartitionIterator.next();
System.out.println("Current offset is " + kafkaConsumer.position(topicPartition) + " committed offset is ->" + kafkaConsumer.committed(topicPartition) );
if(startingOffset == -2) {
System.out.println("Leaving it alone");
}else if(startingOffset ==0){
System.out.println("Setting offset to begining");
kafkaConsumer.seekToBeginning(topicPartition);
}else if(startingOffset == -1){
System.out.println("Setting it to the end ");
kafkaConsumer.seekToEnd(topicPartition);
}else {
System.out.println("Resetting offset to " + startingOffset);
kafkaConsumer.seek(topicPartition, startingOffset);
}
}
}
});
//Start processing messages
try {
while (true) {
ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
System.out.println(record.value());
}
if(startingOffset == -2)
kafkaConsumer.commitSync();
}
}catch(WakeupException ex){
System.out.println("Exception caught " + ex.getMessage());
}finally{
kafkaConsumer.close();
System.out.println("After closing KafkaConsumer");
}
}
public KafkaConsumer<String,String> getKafkaConsumer(){
return this.kafkaConsumer;
}
}
}

Kafka consumer poll with offset 0 not returning message

I am using spring-kafka to poll message, when I use the annotation for the consumer and set offset to 0 it will see all messages from the earliest. But when I try to use a injected ConsumerFactory to create consumer on my own, then poll will only return a few message or no message at all. Is there some other config I need in order to be able to pull message? The poll timeout is already set to 10 seconds.
#Component
public class GenericConsumer {
private static final Logger logger = LoggerFactory.getLogger(GenericConsumer.class);
#Autowired
ConsumerFactory<String, Record> consumerFactory;
public ConsumerRecords<String, Record> poll(String topic, String group){
logger.info("---------- Polling kafka recrods from topic " + topic + " group" + group);
Consumer<String, Record> consumer = consumerFactory.createConsumer(group, "");
consumer.subscribe(Arrays.asList(topic));
// need to make a dummy poll before we can seek
consumer.poll(1000);
consumer.seekToBeginning(consumer.assignment());
ConsumerRecords<String, Record> records;
records = consumer.poll(10000);
logger.info("------------ Total " + records.count() + " records polled");
consumer.close();
return records;
}
}
It works fine for me, this was with boot 2.0.5, Spring Kafka 2.1.10 ...
#SpringBootApplication
public class So52284259Application implements ConsumerAwareRebalanceListener {
private static final Logger logger = LoggerFactory.getLogger(So52284259Application.class);
public static void main(String[] args) {
SpringApplication.run(So52284259Application.class, args);
}
#Bean
public ApplicationRunner runner(KafkaTemplate<String, String> template, GenericConsumer consumer) {
return args -> {
// for (int i = 0; i < 1000; i++) { // load up the topic on first run
// template.send("so52284259", "foo" + i);
// }
consumer.poll("so52284259", "generic");
};
}
#KafkaListener(id = "listener", topics = "so52284259")
public void listen(String in) {
if ("foo999".equals(in)) {
logger.info("#KafkaListener: " + in);
}
}
#Override
public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
consumer.seekToBeginning(partitions);
}
#Bean
public NewTopic topic() {
return new NewTopic("so52284259", 1, (short) 1);
}
}
#Component
class GenericConsumer {
private static final Logger logger = LoggerFactory.getLogger(GenericConsumer.class);
#Autowired
ConsumerFactory<String, String> consumerFactory;
public void poll(String topic, String group) {
logger.info("---------- Polling kafka recrods from topic " + topic + " group" + group);
Consumer<String, String> consumer = consumerFactory.createConsumer(group, "");
consumer.subscribe(Arrays.asList(topic));
// need to make a dummy poll before we can seek
consumer.poll(1000);
consumer.seekToBeginning(consumer.assignment());
ConsumerRecords<String, String> records;
boolean done = false;
while (!done) {
records = consumer.poll(10000);
logger.info("------------ Total " + records.count() + " records polled");
Iterator<ConsumerRecord<String, String>> iterator = records.iterator();
while (iterator.hasNext()) {
String value = iterator.next().value();
if ("foo999".equals(value)) {
logger.info("Consumer: " + value);
done = true;
}
}
}
consumer.close();
}
}
and
2018-09-12 09:35:25.929 INFO 61390 --- [ main] com.example.GenericConsumer : ------------ Total 500 records polled
2018-09-12 09:35:25.931 INFO 61390 --- [ main] com.example.GenericConsumer : ------------ Total 500 records polled
2018-09-12 09:35:25.932 INFO 61390 --- [ main] com.example.GenericConsumer : Consumer: foo999
2018-09-12 09:35:25.942 INFO 61390 --- [ listener-0-C-1] com.example.So52284259Application : #KafkaListener: foo999

Unable to insert message into Kafka

I am facing below error when I try to insert message into Kafka via Producer API
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for metadata_notification-0: 30036 ms has passed since batch creation plus linger time
What could be the issue with this ?
More code goes below :
protected void produceMessage(String topic, String orgId, Map<String, byte[]> header, String payload) throws Exception {
System.out.println("In Produce Message Method");
final Producer<String, String> producer = createMetaProducer();
long time = System.currentTimeMillis();
try {
List<Header> messageHeaders = header.entrySet().stream().map(e -> new RecordHeader(e.getKey(), e.getValue())).collect(Collectors.toList());
System.out.println("Send header");
LOGGER.info("Sending headers:");
for(Header mh: messageHeaders) {
LOGGER.info("{}: {}",mh.key(),new String(mh.value()));
}
final ProducerRecord<String, String> record = new ProducerRecord<>(topic, null, orgId, payload, messageHeaders);
RecordMetadata metadata = producer.send(record).get();
long elapsedTime = System.currentTimeMillis() - time;
System.out.printf("sent record(key=%s value=%s) " +
"meta(partition=%d, offset=%d) time=%d\n",
record.key(), record.value(), metadata.partition(),
metadata.offset(), elapsedTime);
} finally {
producer.flush();
producer.close();
}
}
At this point as below , it fails and gives our error has above - TimeoutException
final ProducerRecord<String, String> record = new ProducerRecord<>(topic, null, orgId, payload, messageHeaders);
RecordMetadata metadata = producer.send(record).get();
In MY Case kafka is docker image
You might hit bug KAFKA-5886. For more details see KIP-91. To avoid the issue, you an increase request.timeout.ms.

How to know when record is committed in Kafka?

In case of integration testing, I send a record into Kafka, and I would like to know when it will be processed and committed, and then do my assertions (instead of using a Thread.sleep)...
Here is my try :
public void sendRecordAndWaitUntilItsNotConsumed(ProducerRecord<String, String> record)
throws ExecutionException, InterruptedException {
RecordMetadata recordMetadata = producer.send(record).get();
TopicPartition topicPartition = new TopicPartition(recordMetadata.topic(),
recordMetadata.partition());
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerConfig)) {
consumer.assign(Collections.singletonList(topicPartition));
long recordOffset = recordMetadata.offset();
long currentOffset = getCurrentOffset(consumer, topicPartition);
while (currentOffset <= recordOffset) {
currentOffset = getCurrentOffset(consumer, topicPartition);
LOGGER.info("Waiting for message to be consumed - Current Offset = " + currentOffset
+ " - Record Offset = " + recordOffset);
}
}
}
private long getCurrentOffset(KafkaConsumer<String, String> consumer,
TopicPartition topicPartition) {
consumer.seekToEnd(Collections.emptyList());
return consumer.position(topicPartition);
}
But it doesn't work. Indeed, I disabled the commit of the consumer, and it doesn't loop on Waiting for message to be consumed...
It works using KafkaConsumer#committed instead of KafkaConsumer#position.
private void sendRecordAndWaitUntilItsNotConsumed(ProducerRecord<String, String> record) throws InterruptedException, ExecutionException {
RecordMetadata recordMetadata = producer.send(record).get();
TopicPartition topicPartition = new TopicPartition(recordMetadata.topic(),
recordMetadata.partition());
consumer.assign(Collections.singletonList(topicPartition));
long recordOffset = recordMetadata.offset();
long currentOffset = getCurrentOffset(topicPartition);
while (currentOffset < recordOffset) {
currentOffset = getCurrentOffset(topicPartition);
LOGGER.info("Waiting for message to be consumed - Current Offset = " + currentOffset
+ " - Record Offset = " + recordOffset);
TimeUnit.MILLISECONDS.sleep(200);
}
}
private long getCurrentOffset(TopicPartition topicPartition) {
OffsetAndMetadata offsetAndMetadata = consumer.committed(topicPartition);
return offsetAndMetadata != null ? offsetAndMetadata.offset() - 1 : -1;
}

kafka fetch records by timestamp, consumer loop

I am using Kafka 0.10.2.1 cluster. I am using the Kafka's offsetForTimes API to seek to a particular offset and would like to breakout of the loop when i have reached the end timestamp.
My code is like this:
//package kafka.ex.test;
import java.util.*;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
public class ConsumerGroup {
public static OffsetAndTimestamp fetchOffsetByTime( KafkaConsumer<Long, String> consumer , TopicPartition partition , long startTime){
Map<TopicPartition, Long> query = new HashMap<>();
query.put(
partition,
startTime);
final Map<TopicPartition, OffsetAndTimestamp> offsetResult = consumer.offsetsForTimes(query);
if( offsetResult == null || offsetResult.isEmpty() ) {
System.out.println(" No Offset to Fetch ");
System.out.println(" Offset Size "+offsetResult.size());
return null;
}
final OffsetAndTimestamp offsetTimestamp = offsetResult.get(partition);
if(offsetTimestamp == null ){
System.out.println("No Offset Found for partition : "+partition.partition());
}
return offsetTimestamp;
}
public static KafkaConsumer<Long, String> assignOffsetToConsumer( KafkaConsumer<Long, String> consumer, String topic , long startTime ){
final List<PartitionInfo> partitionInfoList = consumer.partitionsFor(topic);
System.out.println("Number of Partitions : "+partitionInfoList.size());
final List<TopicPartition> topicPartitions = new ArrayList<>();
for (PartitionInfo pInfo : partitionInfoList) {
TopicPartition partition = new TopicPartition(topic, pInfo.partition());
topicPartitions.add(partition);
}
consumer.assign(topicPartitions);
for(TopicPartition partition : topicPartitions ){
OffsetAndTimestamp offSetTs = fetchOffsetByTime(consumer, partition, startTime);
if( offSetTs == null ){
System.out.println("No Offset Found for partition : " + partition.partition());
consumer.seekToEnd(Arrays.asList(partition));
}else {
System.out.println(" Offset Found for partition : " +offSetTs.offset()+" " +partition.partition());
System.out.println("FETCH offset success"+
" Offset " + offSetTs.offset() +
" offSetTs " + offSetTs);
consumer.seek(partition, offSetTs.offset());
}
}
return consumer;
}
public static void main(String[] args) throws Exception {
String topic = args[0].toString();
String group = args[1].toString();
long start_time_Stamp = Long.parseLong( args[3].toString());
String bootstrapServers = args[2].toString();
long end_time_Stamp = Long.parseLong( args[4].toString());
Properties props = new Properties();
boolean reachedEnd = false;
props.put("bootstrap.servers", bootstrapServers);
props.put("group.id", group);
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<Long, String> consumer = new KafkaConsumer<Long, String>(props);
assignOffsetToConsumer(consumer, topic, start_time_Stamp);
System.out.println("Subscribed to topic " + topic);
int i = 0;
int arr[] = {0,0,0,0,0};
while (true) {
ConsumerRecords<Long, String> records = consumer.poll(6000);
int count= 0;
long lasttimestamp = 0;
long lastOffset = 0;
for (ConsumerRecord<Long, String> record : records) {
count++;
if(arr[record.partition()] == 0){
arr[record.partition()] =1;
}
if (record.timestamp() >= end_time_Stamp) {
reachedEnd = true;
break;
}
System.out.println("record=>"+" offset="
+record.offset()
+ " timestamp="+record.timestamp()
+ " :"+record);
System.out.println("recordcount = "+count+" bitmap"+Arrays.toString(arr));
}
if (reachedEnd) break;
if (records == null || records.isEmpty()) break; // dont wait for records
}
}
}
I face the following problems:
consumer.poll fails even for 1000 millisecond. I had to poll a few times in loop if i use 1000 millisecond. I have an extremely large value now. But having already, seeked to the relevant offsets within a partition, how to reliably set the poll timeout so that data is returned immediately?
My observations is that when data is returned it is not always from all partitions. Even when data is returned from all partitions not all records are returned. The amount of records that are in the topic is more than 1000. But the amount of records that are actually fetched and printed in loop is less(~200). Is there any issue with the current usage of my Kafka APIs?
how to reliably break out of the loop having obtained all the data between start and end timestamp and not prematurely?
Amount of records fetched per poll depends on your consumer config
You are breaking the loop when one of the partitions reaches the endtimestamp , which is not what you want . You should check that all the partitions are seeked to end before exiting poll loop
Poll call is an async call and fetch requests and responses are per node , so you may or may not get all the responses in a poll depending on the broker response time