Pause and Resume KafkaConsumer - apache-kafka

What I've to do is pause the KafkaConsumer if during message consuming an error is thrown.
This is what I wrote
#KafkaListener(...)
public void consume(
#Header(KafkaHeaders.CONSUMER) KafkaConsumer<String,String> consumer,
#Payload String message) {
try {
//consumer message
} catch(Exception e) {
saveConsumer(consumer);
consumer.pause();
}
}
Then I wrote a REST service in order to resume the consumer
#RestController
#RequestMapping("/consumer")
class ConsumerRestController {
#PostMapping("/resume")
public void resume() {
KafkaConsumer<String,String> consumer = getConsumer();
if(consumer != null) {
consumer.resume(consumer.paused());
}
}
}
Now, I've two questions.
First question: When I call consumer.pause() from #KafkaListener annotated method what happens?
Consumer is immediately paused or I can receive other messages associated on other offset of same topic-partition.
For example, I have "message1" with offset 3 and "message2" with offset 4, "message1" cause an exception, what happens to "message2"? Is it consumed anyway?
Second question: Resuming the consumer from REST service give a ConcurrentModificationException because KafkaConsumer is not thread safe. So, how come I have to do this?

Do not pause the consumer directly; pause the container instead.
#KafkaListener(id = "foo", ...)
#Autowired KafkaListenerEndpointRegistry;
...
registry.getListenerContainer("foo").pause();
The pause will take effect before the next poll; if you want to immediately pause (and not process the remaining records from the last poll), throw an exeption after pausing (assuming you are using the, now default, SeekToCurrentErrorHandler.

Related

Spring cloud Kafka does infinite retry when it fails

Currently, I am having an issue where one of the consumer functions throws an error which makes Kafka retry the records again and again.
#Bean
public Consumer<List<RuleEngineSubject>> processCohort() {
return personDtoList -> {
for(RuleEngineSubject subject : personDtoList)
processSubject(subject);
};
}
This is the consumer the processSubject throws a custom error which causes it to fail.
processCohort-in-0:
destination: internal-process-cohort
consumer:
max-attempts: 1
batch-mode: true
concurrency: 10
group: process-cohort-group
The above is my binder for Kafka.
Currently, I am attempting to retry 2 times and then send to a dead letter queue but I have been unsuccessful and not sure which is the right approach to take.
I have tried to implement a custom handler that will handle the error when it fails but does not retry again and I am not sure how to send to a dead letter queue
#Bean
ListenerContainerCustomizer<AbstractMessageListenerContainer<?, ?>> customizer() {
return (container, dest, group) -> {
if (group.equals("process-cohort-group")) {
container.setBatchErrorHandler(new BatchErrorHandler() {
#Override
public void handle(Exception thrownException, ConsumerRecords<?, ?> data) {
System.out.println(data.records(dest).iterator().);
data.records(dest).forEach(r -> {
System.out.println(r.value());
});
System.out.println("failed payload='{}'" + thrownException.getLocalizedMessage());
}
});
}
};
}
This stops infinite retry but does not send a dead letter queue. Can I get suggestions on how to retry two times and then send a dead letter queue. From my understanding batch listener does not how to recover when there is an error, could someone help shine light on this
Retry 15 times then throw it to topicname.DLT topic
#Bean
public ConcurrentKafkaListenerContainerFactory kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setCommonErrorHandler(
new DefaultErrorHandler(
new DeadLetterPublishingRecoverer(kafkaTemplate()), kafkaBackOffPolicy()));
factory.setConsumerFactory(kafkaConsumerFactory());
return factory;
}
#Bean
public ExponentialBackOffWithMaxRetries kafkaBackOffPolicy() {
var exponentialBackOff = new ExponentialBackOffWithMaxRetries(15);
exponentialBackOff.setInitialInterval(Duration.ofMillis(500).toMillis());
exponentialBackOff.setMultiplier(2);
exponentialBackOff.setMaxInterval(Duration.ofSeconds(2).toMillis());
return exponentialBackOff;
}
You need to configure a suitable error handler in the listener container; you can disable retry and dlq in the binding and use a DeadLetterPublishingRecoverer instead. See the answer Retry max 3 times when consuming batches in Spring Cloud Stream Kafka Binder

How to avoid poll() retreiving same message if commitSync() is not called

I'm a bit confused about poll() expected behavior. In my app, if the processLogic() works, then I should commit manually the offset so when the next poll() is called I do receive the new messages.
Problem occurs when processLogic() throws an error. I set the consumer to seek to the offset failed during processing. On next poll(), it again receives the same message.(correct behavior as I ordered consumer to manually reset the offset to that position)Imagine it works fine and also doCommitSync() is called.
The unexpected behavior occurs in the following poll(). It should take the new messages, but it still retrieve the last message, which leads to call again to the processLogic() function and doCommitSync(). It also throws the following error during doCommitSync():
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
My consumer config:
enable.auto.commit=false
isolation.level=read_committed
auto.offset.reset=latest
public void runConsumer() {
Runtime.getRuntime().addShutdownHook(new Thread(this::shutdown));
try {
consumer.subscribe(topics);
while (!closed.get()) {
processedStatus.set(false);
final ConsumerRecords<String, String> consumedRecords = consumer.poll(numRecords);
if (!consumedRecords.isEmpty()) {
StreamSupport.stream(consumedRecords.spliterator(), false)
.map(ConsumerRecord::value)
.forEach(record -> {
try {
processLogic(); //do some logic which can fail
processedStatus.set(true);
} catch (Exception e) {
logger.error("Error applying action: " + record.getUuid(), e);
}
if (processedStatus.get()) {
doCommitSync();
} else {
consumer.seek(new TopicPartition(recordTopic, recordPartition), recordOffset);
}
});
}
}
} catch (WakeupException e) {
logger.error("Kafka Consumer wakeup exception");
} finally {
alertConsumer.close();
shutdownLatch.countDown();
}

Kafka Consumer with limited number of retries when processing messages

I'm having a hard time figuring simple patterns for handling exceptions in the consumer of a Kafka topic.
Scenario is as follows: in the consumer I call an external service. If the service is unavailable I want to retry a few times and then stop consuming.
The simplest pattern seems a blocking synchronous way of dealing with it, something like this in java:
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
boolean processed=false;
int count=0;
while (!processed) {
try {
callService(..);
} catch (Exception e) {
if (count++ < 3) {
Thread.sleep(5000);
continue;
} else throw new RuntimeException();
}
}
}
However, I have the feeling there must be a simpler approach (without using third party libraries), and one that avoids blocking the thread.
Seems like a common thing we would like to have, yet I could not find a simple example for this pattern.
There is no such retrial mechanism provided by Kafka out of the box. With the experience of using RabbitMQ where the MQ provides a retry exchange. These exchanges are called as Dead-Letter-Exchanges in RabbitMQ.
https://www.rabbitmq.com/dlx.html
You can apply the same pattern in the case of kafka.
On message processing failure we can publish a copy of the message to another topic and wait for the next message. Let’s call the new topic the ‘retry_topic’. The consumer of the ‘retry_topic’ will receive the message from the Kafka and then will wait some predefined time, for example one hour, before starting the message processing. This way we can postpone next attempts of the message processing without any impact on the ‘main_topic’ consumer. If processing in the ‘retry_topic’ consumer fails we just have to give up and store the message in the ‘failed_topic’ for further manual handling of this problem. The ‘main_topic’ consumer code may look like this:
Pushing message to retry_topic on failure/exception
void consumeMainTopicWithPostponedRetry() {
while (true) {
Message message = takeNextMessage("main_topic");
try {
process(message);
} catch (Exception ex) {
publishTo("retry_topic");
LOGGER.warn("Message processing failure. Will try once again in the future.", ex);
}
}
}
Consumer of the retry topic
void consumeRetryTopic() {
while (true) {
Message message = takeNextMessage("retry_topic");
try {
process(message);
waitSomeLongerTime();
} catch (Exception ex) {
publishTo("failed_topic");
LOGGER.warn("Message processing failure. Will skip it.", ex);
}
}
}
The above strategy and examples are picked from the below link. The whole credit goes to the owner of the blog post.
https://blog.pragmatists.com/retrying-consumer-architecture-in-the-apache-kafka-939ac4cb851a
For non-blocking way of doing above can be understood by reading the whole blog post. Hope this helps.

How to check if Kafka Consumer is ready

I have Kafka commit policy set to latest and missing first few messages. If I give a sleep of 20 seconds before starting to send the messages to the input topic, everything is working as desired. I am not sure if the problem is with consumer taking long time for partition rebalancing. Is there a way to know if the consumer is ready before starting to poll ?
You can use consumer.assignment(), it will return set of partitions and verify whether all of the partitions are assigned which are available for that topic.
If you are using spring-kafka project, you can include spring-kafka-test dependancy and use below method to wait for topic assignment , but you need to have container.
ContainerTestUtils.waitForAssignment(Object container, int partitions);
You can do the following:
I have a test that reads data from kafka topic.
So you can't use KafkaConsumer in multithread environment, but you can pass parameter "AtomicReference assignment", update it in consumer-thread, and read it in another thread.
For example, snipped of working code in project for testing:
private void readAvro(String readFromKafka,
AtomicBoolean needStop,
List<Event> events,
String bootstrapServers,
int readTimeout) {
// print the topic name
AtomicReference<Set<TopicPartition>> assignment = new AtomicReference<>();
new Thread(() -> readAvro(bootstrapServers, readFromKafka, needStop, events, readTimeout, assignment)).start();
long startTime = System.currentTimeMillis();
long maxWaitingTime = 30_000;
for (long time = System.currentTimeMillis(); System.currentTimeMillis() - time < maxWaitingTime;) {
Set<TopicPartition> assignments = Optional.ofNullable(assignment.get()).orElse(new HashSet<>());
System.out.println("[!kafka-consumer!] Assignments [" + assignments.size() + "]: "
+ assignments.stream().map(v -> String.valueOf(v.partition())).collect(Collectors.joining(",")));
if (assignments.size() > 0) {
break;
}
try {
Thread.sleep(1_000);
} catch (InterruptedException e) {
e.printStackTrace();
needStop.set(true);
break;
}
}
System.out.println("Subscribed! Wait summary: " + (System.currentTimeMillis() - startTime));
}
private void readAvro(String bootstrapServers,
String readFromKafka,
AtomicBoolean needStop,
List<Event> events,
int readTimeout,
AtomicReference<Set<TopicPartition>> assignment) {
KafkaConsumer<String, byte[]> consumer = (KafkaConsumer<String, byte[]>) queueKafkaConsumer(bootstrapServers, "latest");
System.out.println("Subscribed to topic: " + readFromKafka);
consumer.subscribe(Collections.singletonList(readFromKafka));
long started = System.currentTimeMillis();
while (!needStop.get()) {
assignment.set(consumer.assignment());
ConsumerRecords<String, byte[]> records = consumer.poll(1_000);
events.addAll(CommonUtils4Tst.readEvents(records));
if (readTimeout == -1) {
if (events.size() > 0) {
break;
}
} else if (System.currentTimeMillis() - started > readTimeout) {
break;
}
}
needStop.set(true);
synchronized (MainTest.class) {
MainTest.class.notifyAll();
}
consumer.close();
}
P.S.
needStop - global flag, to stop all running thread if any in case of failure of success
events - list of object, that i want to check
readTimeout - how much time we will wait until read all data, if readTimeout == -1, then stop when we read anything
Thanks to Alexey (I have also voted up), I seemed to have resolved my issue essentially following the same idea.
Just want to share my experience... in our case we using Kafka in request & response way, somewhat like RPC. Request is being sent on one topic and then waiting for response on another topic. Running into a similar issue i.e. missing out first response.
I have tried ... KafkaConsumer.assignment(); repeatedly (with Thread.sleep(100);) but doesn't seem to help. Adding a KafkaConsumer.poll(50); seems to have primed the consumer (group) and receiving the first response too. Tested few times and it consistently working now.
BTW, testing requires stopping application & deleting Kafka topics and, for a good measure, restarted Kafka too.
PS: Just calling poll(50); without assignment(); fetching logic, like Alexey mentioned, may not guarantee that consumer (group) is ready.
You can modify an AlwaysSeekToEndListener (listens only to new messages) to include a callback:
public class AlwaysSeekToEndListener<K, V> implements ConsumerRebalanceListener {
private final Consumer<K, V> consumer;
private Runnable callback;
public AlwaysSeekToEndListener(Consumer<K, V> consumer) {
this.consumer = consumer;
}
public AlwaysSeekToEndListener(Consumer<K, V> consumer, Runnable callback) {
this.consumer = consumer;
this.callback = callback;
}
#Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
}
#Override
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
consumer.seekToEnd(partitions);
if (callback != null) {
callback.run();
}
}
}
and subscribe with a latch callback:
CountDownLatch initLatch = new CountDownLatch(1);
consumer.subscribe(singletonList(topic), new AlwaysSeekToEndListener<>(consumer, () -> initLatch.countDown()));
initLatch.await(); // blocks until consumer is ready and listening
then proceed to start your producer.
If your policy is set to latest - which takes effect if there are no previously committed offsets - but you have no previously committed offsets, then you should not worry about 'missing' messages, because you're telling Kafka not to care about messages that were sent 'previously' to your consumers being ready.
If you care about 'previous' messages, you should set the policy to earliest.
In any case, whatever the policy, the behaviour you are seeing is transient, i.e. once committed offsets are saved in Kafka, on every restart the consumers will pick up where they left previoulsy
I needed to know if a kafka consumer was ready before doing some testing, so i tried with consumer.assignment(), but it only returned the set of partitions assigned, but there was a problem, with this i cannot see if this partitions assigned to the group had offset setted, so later when i tried to use the consumer it didn´t have offset setted correctly.
The solutions was to use committed(), this will give you the last commited offsets of the given partitions that you put in the arguments.
So you can do something like: consumer.committed(consumer.assignment())
If there is no partitions assigned yet it will return:
{}
If there is partitions assigned, but no offset yet:
{name.of.topic-0=null, name.of.topic-1=null}
But if there is partitions and offset:
{name.of.topic-0=OffsetAndMetadata{offset=5197881, leaderEpoch=null, metadata=''}, name.of.topic-1=OffsetAndMetadata{offset=5198832, leaderEpoch=null, metadata=''}}
With this information you can use something like:
consumer.committed(consumer.assignment()).isEmpty();
consumer.committed(consumer.assignment()).containsValue(null);
And with this information you can be sure that the kafka consumer is ready.

Consumer.poll() returns new records even without committing offsets?

If I have a enable.auto.commit=false and I call consumer.poll() without calling consumer.commitAsync() after, why does consumer.poll() return
new records the next time it's called?
Since I did not commit my offset, I would expect poll() would return the latest offset which should be the same records again.
I'm asking because I'm trying to handle failure scenarios during my processing. I was hoping without committing the offset, the poll() would return the same records again so I can re-process those failed records again.
public class MyConsumer implements Runnable {
#Override
public void run() {
while (true) {
ConsumerRecords<String, LogLine> records = consumer.poll(Long.MAX_VALUE);
for (ConsumerRecord record : records) {
try {
//process record
consumer.commitAsync();
} catch (Exception e) {
}
/**
If exception happens above, I was expecting poll to return new records so I can re-process the record that caused the exception.
**/
}
}
}
}
The starting offset of a poll is not decided by the broker but by the consumer. The consumer tracks the last received offset and asks for the following bunch of messages during the next poll.
Offset commits come into play when a consumer stops or fails and another instance that is not aware of the last consumed offset picks up consumption of a partition.
KafkaConsumer has pretty extensive Javadoc that is well worth a read.
Consumer will read from last commit offset if it get re balanced (means if any consumer leave the group or new consumer added) so handling de-duplication does not come straight forward in kafka so you have to store the last process offset in external store and when rebalance happens or app restart you should seek to that offset and start processing or you should check against some unique key in message against DB to find is dublicate
I would like to share some code how you can solve this in Java code.
The approach is that you poll the records, try to process them and if an exception occurs, you seek to the minima of the topic partitions. After that, you do the commitAsync().
public class MyConsumer implements Runnable {
#Override
public void run() {
while (true) {
List<ConsumerRecord<String, LogLine>> records = StreamSupport
.stream( consumer.poll(Long.MAX_VALUE).spliterator(), true )
.collect( Collectors.toList() );
boolean exceptionRaised = false;
for (ConsumerRecord<String, LogLine> record : records) {
try {
// process record
} catch (Exception e) {
exceptionRaised = true;
break;
}
}
if( exceptionRaised ) {
Map<TopicPartition, Long> offsetMinimumForTopicAndPartition = records
.stream()
.collect( Collectors.toMap( r -> new TopicPartition( r.topic(), r.partition() ),
ConsumerRecord::offset,
Math::min
) );
for( Map.Entry<TopicPartition, Long> entry : offsetMinimumForTopicAndPartition.entrySet() ) {
consumer.seek( entry.getKey(), entry.getValue() );
}
}
consumer.commitAsync();
}
}
}
With this setup, you poll the messages again and again until you successfully process all messages of one poll.
Please note, that your code should be able to handle a poison pill. Otherwise, your code will stuck in an endless loop.