Kafka commitAsync Retries with Commit Order - apache-kafka

I'm reading through Kafka: The Definitive Guide, and in the chapter on Consumers there is a blurb on "Retrying Async Commits":
A simple pattern to get commit order right for asynchronous retries is to use a monotonically increasing sequence number. Increase the sequence number every time you commit and add the sequence number at the time of the commit to the commitAsync callback. When you're getting ready to send a retry, check if the commit sequence number the callback got is equal to the instance variable; if it is, there was no newer commit and it is safe to retry. If the instance sequence number is higher, don't retry because a newer commit was already sent.
A quick example by the author would have been great here for dense folks like me. I am particularly unclear about the portion I bolded above.
Can anyone shed light on what this means or even better provide a toy example demonstrating this?

Here is what I think it is, but I have to be humble, I could be wrong:
try {
    AtomicInteger atomicInteger = new AtomicInteger(0);
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(5);
        for (ConsumerRecord<String, String> record : records) {
            System.out.format("offset: %d\n", record.offset());
            System.out.format("partition: %d\n", record.partition());
            System.out.format("timestamp: %d\n", record.timestamp());
            System.out.format("timeStampType: %s\n", record.timestampType());
            System.out.format("topic: %s\n", record.topic());
            System.out.format("key: %s\n", record.key());
            System.out.format("value: %s\n", record.value());
        }
        consumer.commitAsync(new OffsetCommitCallback() {
            private int marker = atomicInteger.incrementAndGet();

            @Override
            public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets,
                                   Exception exception) {
                if (exception != null) {
                    if (marker == atomicInteger.get()) consumer.commitAsync(this);
                } else {
                    // Can't retry anymore
                }
            }
        });
    }
} catch (WakeupException e) {
    // ignore for shutdown
} finally {
    consumer.commitSync(); // block
    consumer.close();
    System.out.println("Closed consumer and we are done");
}
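For comparison, here is a minimal sketch of how that paragraph from the book reads to me (all names below are mine, and it assumes a client new enough for poll(Duration)): keep one monotonically increasing sequence number per consumer, bump it on every commitAsync(), capture the bumped value for the callback, and retry only while the captured value still equals the current one.

import java.time.Duration;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetCommitCallback;
import org.apache.kafka.common.TopicPartition;

public class SequencedCommitConsumer {

    private final KafkaConsumer<String, String> consumer;
    // The "instance variable": a monotonically increasing commit sequence number.
    private final AtomicLong currentCommitSeq = new AtomicLong(0);

    public SequencedCommitConsumer(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
    }

    private void commitWithRetry() {
        // Bump the sequence number for this commit and remember it for the callback.
        final long seqAtCommitTime = currentCommitSeq.incrementAndGet();
        consumer.commitAsync(new OffsetCommitCallback() {
            @Override
            public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
                if (exception == null) {
                    return; // commit succeeded, nothing to do
                }
                if (seqAtCommitTime == currentCommitSeq.get()) {
                    // No newer commit was sent after ours, so retrying cannot move
                    // offsets backwards. The retry bumps the sequence number again.
                    commitWithRetry();
                }
                // Otherwise a newer commit is already in flight or done; drop this retry.
            }
        });
    }

    public void pollLoop() {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            records.forEach(r -> System.out.printf("partition=%d offset=%d value=%s%n",
                    r.partition(), r.offset(), r.value()));
            commitWithRetry();
        }
    }
}

Commit callbacks run on the consumer thread during later poll() or commit calls, so calling commitAsync() again from inside onComplete() is fine. A final commitSync() in a finally block on shutdown (as in the snippet above) still makes sense, since pending async retries may never get a chance to complete.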

Related

KafkaStreams - how to handle corrupted data

I have a question about what to do in the scenario where a Kafka Streams service is consuming a message and an uncaught exception is thrown while processing it. At the moment I have implemented an UncaughtExceptionHandler which closes and cleans up the old streams instance and starts a new one, but the new instance starts consuming the same message again, which means it ends up in an infinite loop of restarting...
In the end, should I check the type of error and somehow commit this message to stop processing the error-prone message?
Is that possible from the setUncaughtExceptionHandler method?
Regards
If you like, you can handle such messages as poison pills: skip them using the following piece of code and move forward.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class PoisonPillEvent {

    private static final Logger LOGGER = LoggerFactory.getLogger(PoisonPillEvent.class);

    public static void ignoreEvents(String topic, String poisonedEventMessage, Consumer<?, ?> consumer) {
        // The exception message contains something like "... topic-3 ... offset 42 ...";
        // parse the partition number and offset out of it in order to seek past the poison pill.
        final Pattern offsetPattern = Pattern.compile("\\w*offset*\\w[ ]\\d+");
        final Pattern partitionPattern = Pattern.compile("\\w*" + topic + "*\\w[-]\\d+");
        try {
            Matcher mPart = partitionPattern.matcher(poisonedEventMessage);
            Matcher mOff = offsetPattern.matcher(poisonedEventMessage);
            mPart.find();
            int partition = Integer.parseInt(mPart.group().replace(topic + "-", ""));
            mOff.find();
            long offset = Long.parseLong(mOff.group().replace("offset ", ""));
            // Skip the bad record by seeking to the next offset on that partition.
            consumer.seek(new TopicPartition(topic, partition), offset + 1);
        } catch (Exception ex) {
            LOGGER.error("Unable to seek past bad message. {}", ex.toString());
        }
    }
}
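A hypothetical call site, just to show where this helper could fit (the topic name, the plain-consumer loop, and the use of SerializationException are my assumptions, not part of the original answer): a record that cannot be deserialized surfaces as an exception from poll() whose message contains the partition and offset that ignoreEvents() parses.

import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.SerializationException;

void pollSkippingPoisonPills(KafkaConsumer<String, String> consumer) {
    while (true) {
        try {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            records.forEach(r -> System.out.println(r.value())); // real processing would go here
            consumer.commitSync();
        } catch (SerializationException e) {
            // Skip the unreadable record and keep consuming.
            PoisonPillEvent.ignoreEvents("my-topic", e.getMessage(), consumer);
        }
    }
}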

How to avoid poll() retrieving the same message if commitSync() is not called

I'm a bit confused about the expected behavior of poll(). In my app, if processLogic() succeeds, I manually commit the offset so that the next poll() returns new messages.
The problem occurs when processLogic() throws an error. I make the consumer seek back to the offset that failed during processing, so on the next poll() it receives the same message again (correct behavior, since I explicitly reset the offset to that position). Now imagine processing works this time and doCommitSync() is called.
The unexpected behavior occurs in the following poll(). It should return new messages, but it still retrieves the last message, which leads to calling processLogic() and doCommitSync() again. doCommitSync() also throws the following error:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
My consumer config:
enable.auto.commit=false
isolation.level=read_committed
auto.offset.reset=latest
public void runConsumer() {
    Runtime.getRuntime().addShutdownHook(new Thread(this::shutdown));
    try {
        consumer.subscribe(topics);
        while (!closed.get()) {
            processedStatus.set(false);
            final ConsumerRecords<String, String> consumedRecords = consumer.poll(numRecords);
            if (!consumedRecords.isEmpty()) {
                StreamSupport.stream(consumedRecords.spliterator(), false)
                        .forEach(record -> {
                            try {
                                processLogic(); // do some logic which can fail
                                processedStatus.set(true);
                            } catch (Exception e) {
                                logger.error("Error applying action: " + record.value(), e);
                            }
                            if (processedStatus.get()) {
                                doCommitSync();
                            } else {
                                // go back to the failed record so the next poll() returns it again
                                consumer.seek(new TopicPartition(record.topic(), record.partition()), record.offset());
                            }
                        });
            }
        }
    } catch (WakeupException e) {
        logger.error("Kafka Consumer wakeup exception");
    } finally {
        consumer.close();
        shutdownLatch.countDown();
    }
}

Kafka Transactional Producer

I am using Kafka 2 and I was going through the following link.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
Below is my sample code for a transactional producer.
My code:
public void runProducer(final int sendMessageCount) throws Exception {
    final Producer<Long, String> producer = createProducer();
    producer.initTransactions();
    final long time = System.currentTimeMillis();
    try {
        producer.beginTransaction();
        for (long index = time; index < (time + sendMessageCount); index++) {
            final ProducerRecord<Long, String> record =
                    new ProducerRecord<>(TOPIC, index, "Test " + index);
            // send returns Future
            producer.send(record).get();
        }
        producer.commitTransaction();
    } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
        e.printStackTrace();
        // We can't recover from these exceptions, so our only option is to close the producer and exit.
        producer.close();
    } catch (final KafkaException e) {
        e.printStackTrace();
        // For all other exceptions, just abort the transaction and try again.
        producer.abortTransaction();
    } finally {
        producer.flush();
        producer.close();
    }
}
Questions:
Do we need to call endTransaction after commitTransaction?
Do we need to call sendOffsetsToTransaction? What will happen if I don't include it? (See the sketch after these questions.)
How does it work when we deploy the same code to multiple servers with the same transactionId? Do we need a separate transactionId for each instance? Say machine1 crashes after beginTransaction() and after sending a few records; how does machine2 with the same transactionId recover?
Machine1 is using transactionId "test" and crashed after beginTransaction() and after producing a few records. When the same instance comes back up, how does it resume the same transaction? In practice we will just start again from initTransactions() and beginTransaction().
How does it work for a topic that was not previously involved in transactions but is now? If I start a new consumer group with isolation.level=read_committed, will it read the messages that were written before transactions were used? Will a consumer with read_uncommitted see the messages that were aborted by a transaction?
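Regarding the second question: as far as I understand, sendOffsetsToTransaction() only matters in a consume-transform-produce loop, where the consumed offsets should be committed atomically with the produced records; a pure producer like the code above does not need it. A rough sketch of that pattern (the topic names and group id are assumptions, and the String group-id overload shown is the Kafka 2.x one):

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

void consumeTransformProduce(KafkaConsumer<Long, String> consumer,
                             KafkaProducer<Long, String> producer) {
    producer.initTransactions();
    while (true) {
        ConsumerRecords<Long, String> records = consumer.poll(Duration.ofMillis(100));
        if (records.isEmpty()) {
            continue;
        }
        producer.beginTransaction();
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<Long, String> record : records) {
            producer.send(new ProducerRecord<>("output-topic", record.key(), record.value().toUpperCase()));
            // Next offset to consume from this partition if the transaction commits.
            offsets.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
        }
        // Commit the consumed offsets in the same transaction as the produced records.
        producer.sendOffsetsToTransaction(offsets, "my-consumer-group");
        producer.commitTransaction();
    }
}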

Kafka Consumer with limited number of retries when processing messages

I'm having a hard time figuring out simple patterns for handling exceptions in the consumer of a Kafka topic.
Scenario is as follows: in the consumer I call an external service. If the service is unavailable I want to retry a few times and then stop consuming.
The simplest pattern seems a blocking synchronous way of dealing with it, something like this in java:
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
    boolean processed = false;
    int count = 0;
    while (!processed) {
        try {
            callService(..);
            processed = true; // success, stop retrying
        } catch (Exception e) {
            if (count++ < 3) {
                Thread.sleep(5000);
                continue;
            } else throw new RuntimeException();
        }
    }
}
However, I have the feeling there must be a simpler approach (without using third party libraries), and one that avoids blocking the thread.
Seems like a common thing we would like to have, yet I could not find a simple example for this pattern.
There is no such retry mechanism provided by Kafka out of the box. By comparison, RabbitMQ provides retry exchanges, known there as dead-letter exchanges:
https://www.rabbitmq.com/dlx.html
You can apply the same pattern in the case of Kafka.
On message processing failure we can publish a copy of the message to another topic and wait for the next message. Let’s call the new topic the ‘retry_topic’. The consumer of the ‘retry_topic’ will receive the message from Kafka and then wait some predefined time, for example one hour, before starting the message processing. This way we can postpone the next attempts at processing the message without any impact on the ‘main_topic’ consumer. If processing in the ‘retry_topic’ consumer also fails, we just have to give up and store the message in the ‘failed_topic’ for further manual handling of the problem. The ‘main_topic’ consumer code may look like this:
Pushing message to retry_topic on failure/exception
void consumeMainTopicWithPostponedRetry() {
    while (true) {
        Message message = takeNextMessage("main_topic");
        try {
            process(message);
        } catch (Exception ex) {
            publishTo("retry_topic", message);
            LOGGER.warn("Message processing failure. Will try once again in the future.", ex);
        }
    }
}
Consumer of the retry topic
void consumeRetryTopic() {
    while (true) {
        Message message = takeNextMessage("retry_topic");
        try {
            // postpone the retry: wait the predefined time before processing again
            waitSomeLongerTime();
            process(message);
        } catch (Exception ex) {
            publishTo("failed_topic", message);
            LOGGER.warn("Message processing failure. Will skip it.", ex);
        }
    }
}
The above strategy and examples are taken from the link below; full credit goes to the owner of the blog post.
https://blog.pragmatists.com/retrying-consumer-architecture-in-the-apache-kafka-939ac4cb851a
A non-blocking way of doing the above can be understood by reading the whole blog post. Hope this helps.
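The snippets above use pseudo-helpers (takeNextMessage, publishTo, process, waitSomeLongerTime) from the blog post. Purely as an illustration of one of them, and not something the post prescribes, publishTo could be backed by a plain KafkaProducer along these lines:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Illustrative sketch: forward the failed message's key and value to the given topic.
void publishTo(KafkaProducer<String, String> producer, String topic, String key, String value) {
    producer.send(new ProducerRecord<>(topic, key, value), (metadata, exception) -> {
        if (exception != null) {
            // Even the retry/failed topic was unreachable; log it and decide how to escalate.
            exception.printStackTrace();
        }
    });
}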

Consumer.poll() returns new records even without committing offsets?

If I have enable.auto.commit=false and I call consumer.poll() without calling consumer.commitAsync() afterwards, why does consumer.poll() return new records the next time it's called?
Since I did not commit my offset, I would expect poll() to start from the last committed offset, which should be the same records again.
I'm asking because I'm trying to handle failure scenarios during my processing. I was hoping that, without committing the offset, poll() would return the same records again so I could re-process the failed records.
public class MyConsumer implements Runnable {

    @Override
    public void run() {
        while (true) {
            ConsumerRecords<String, LogLine> records = consumer.poll(Long.MAX_VALUE);
            for (ConsumerRecord record : records) {
                try {
                    // process record
                    consumer.commitAsync();
                } catch (Exception e) {
                }
                /**
                 If an exception happens above, I was expecting poll to return new records so I can re-process the record that caused the exception.
                 **/
            }
        }
    }
}
The starting offset of a poll is not decided by the broker but by the consumer. The consumer tracks the last received offset and asks for the following bunch of messages during the next poll.
Offset commits come into play when a consumer stops or fails and another instance that is not aware of the last consumed offset picks up consumption of a partition.
KafkaConsumer has pretty extensive Javadoc that is well worth a read.
The consumer will only read from the last committed offset when it gets rebalanced (i.e. a consumer leaves the group or a new one is added), so handling de-duplication is not straightforward in Kafka. You either have to store the last processed offset in an external store and, on rebalance or application restart, seek to that offset before processing, or check each message against some unique key in a database to detect duplicates.
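A minimal sketch of the "store the offset externally and seek on rebalance" idea from the answer above (offsetStore and its contents are hypothetical; in practice it could be a database table):

import java.util.Collection;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// offsetStore maps each partition to the next offset to process; it is assumed to be
// kept up to date elsewhere, after every successfully processed record.
void subscribeWithSeek(KafkaConsumer<String, String> consumer,
                       Collection<String> topics,
                       Map<TopicPartition, Long> offsetStore) {
    consumer.subscribe(topics, new ConsumerRebalanceListener() {
        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            // Nothing to do in this sketch; offsets are persisted as records are processed.
        }

        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            // After a rebalance or restart, resume each partition from the stored position.
            for (TopicPartition tp : partitions) {
                Long next = offsetStore.get(tp);
                if (next != null) {
                    consumer.seek(tp, next);
                }
            }
        }
    });
}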
I would like to share some code showing how you can solve this in Java.
The approach is: poll the records, try to process them, and if an exception occurs, seek to the minimum offset per topic partition. After that, call commitAsync().
public class MyConsumer implements Runnable {

    @Override
    public void run() {
        while (true) {
            List<ConsumerRecord<String, LogLine>> records = StreamSupport
                    .stream(consumer.poll(Long.MAX_VALUE).spliterator(), true)
                    .collect(Collectors.toList());
            boolean exceptionRaised = false;
            for (ConsumerRecord<String, LogLine> record : records) {
                try {
                    // process record
                } catch (Exception e) {
                    exceptionRaised = true;
                    break;
                }
            }
            if (exceptionRaised) {
                Map<TopicPartition, Long> offsetMinimumForTopicAndPartition = records
                        .stream()
                        .collect(Collectors.toMap(r -> new TopicPartition(r.topic(), r.partition()),
                                ConsumerRecord::offset,
                                Math::min));
                for (Map.Entry<TopicPartition, Long> entry : offsetMinimumForTopicAndPartition.entrySet()) {
                    consumer.seek(entry.getKey(), entry.getValue());
                }
            }
            consumer.commitAsync();
        }
    }
}
With this setup, you poll the messages again and again until you successfully process all messages of one poll.
Please note that your code should be able to handle a poison pill; otherwise it will get stuck in an endless loop.