send whole batch to dlt without retrying - apache-kafka

I'm using Spring Kafka and have a Kafka consumer written in Java with Spring Boot. The consumer reads in batches; the relevant configuration beans are given below.
@Bean
public ConsumerFactory<String, Object> consumerFactory() {
    Map<String, Object> config = new HashMap<>();
    // default configs like bootstrap servers, key and value deserializers are here
    config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "5");
    return new DefaultKafkaConsumerFactory<>(config);
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    return factory;
}
I consume messages and send them to an API endpoint. If the API is unavailable or the RestTemplate throws an error, I want to send the whole batch to a DLT without retrying.
If we throw BatchListenerFailedException, only the message at the given index in the batch is sent to the DLT. BatchListenerFailedException accepts a single integer index, not a list. What I want is to send the whole batch, as it is, to a DLT topic without retrying. Is there a way to achieve that?
My Spring Kafka version is 2.8.6.
My default error handler is shown below.
@Bean
public CommonErrorHandler commonErrorHandler() {
    ExponentialBackOffWithMaxRetries exponentialBackOffWithMaxRetries = new ExponentialBackOffWithMaxRetries(5);
    exponentialBackOffWithMaxRetries.setInitialInterval(my val);
    exponentialBackOffWithMaxRetries.setMultiplier(my val);
    exponentialBackOffWithMaxRetries.setMaxInterval(my val);
    DefaultErrorHandler errorHandler = new DefaultErrorHandler(
            new DeadLetterPublishingRecoverer(kafkaTemplate(),
                    (record, exception) -> new TopicPartition(record.topic() + "-dlt", record.partition())),
            exponentialBackOffWithMaxRetries);
    return errorHandler;
}
I used ExponentialBackOffWithMaxRetries instead of FixedBackOff. I have 3 scenarios:
1 - Retry messages and then send them to the DLT (throwing any exception other than BatchListenerFailedException)
2 - Send a couple of messages from the batch to the DLT without retrying (using BatchListenerFailedException for this)
3 - Send the whole batch to the DLT without retrying.
The 3rd one is where I'm struggling. If I throw some other exception, it is retried a couple of times (even if I use FixedBackOff instead of ExponentialBackOffWithMaxRetries).

Throw an exception other than BatchListenerFailedException; use a DefaultErrorHandler with a DeadLetterPublishingRecoverer and no retries (new FixedBackOff(0L, 0L)).
Starting with versions 3.0.0, 2.9.3 and 2.8.11, you can configure non-retryable exceptions for batch errors.
/**
 * Add exception types to the default list. By default, the following exceptions will
 * not be retried:
 * <ul>
 * <li>{@link DeserializationException}</li>
 * <li>{@link MessageConversionException}</li>
 * <li>{@link ConversionException}</li>
 * <li>{@link MethodArgumentResolutionException}</li>
 * <li>{@link NoSuchMethodException}</li>
 * <li>{@link ClassCastException}</li>
 * </ul>
 * All others will be retried, unless {@link #defaultFalse()} has been called.
 * @param exceptionTypes the exception types.
 * @see #removeClassification(Class)
 * @see #setClassifications(Map, boolean)
 */
public final void addNotRetryableExceptions(Class<? extends Exception>... exceptionTypes) {
    add(false, exceptionTypes);
}
Note that 2.8.x is now out of OSS support.
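For scenario 3, a configuration along the following lines might work once you are on 2.8.11 or later. This is only a sketch under stated assumptions: `ApiUnavailableException` and the class name `BatchDltConfig` are hypothetical, the back-off values are placeholders, and the `KafkaOperations` bean is assumed to exist already. The batch listener would wrap the RestTemplate failure in `ApiUnavailableException` and throw it.

```java
import org.apache.kafka.common.TopicPartition;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaOperations;
import org.springframework.kafka.listener.CommonErrorHandler;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.kafka.support.ExponentialBackOffWithMaxRetries;

@Configuration
public class BatchDltConfig {

    /** Hypothetical exception the batch listener throws when the API call fails. */
    public static class ApiUnavailableException extends RuntimeException {
        public ApiUnavailableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    @Bean
    public CommonErrorHandler commonErrorHandler(KafkaOperations<String, Object> template) {
        // Same DLT routing as in the question: "<topic>-dlt", same partition.
        DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(template,
                (record, ex) -> new TopicPartition(record.topic() + "-dlt", record.partition()));

        // Scenario 1: any other exception is still retried with back-off.
        ExponentialBackOffWithMaxRetries backOff = new ExponentialBackOffWithMaxRetries(5);
        backOff.setInitialInterval(1_000L); // placeholder values
        backOff.setMultiplier(2.0);
        backOff.setMaxInterval(10_000L);

        DefaultErrorHandler handler = new DefaultErrorHandler(recoverer, backOff);
        // Scenario 3: this exception type is classified as not retryable.
        // For batch errors this needs 2.8.11, 2.9.3, 3.0.0 or later.
        handler.addNotRetryableExceptions(ApiUnavailableException.class);
        return handler;
    }
}
```

On 2.8.6 the alternative from the answer is `new FixedBackOff(0L, 0L)`, which switches retries off for every exception handled by that error handler, so scenario 1 would lose its retries.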


Spring Cloud Stream Kafka Producer force producer.flush

I am creating a kafka streams/in-out kind of application.
Sample code looks like the following
private MessageChannel output;
public void process(List<String> input) {
Based on my understanding, kafka buffers the messages before sending them out. Now in case the app crashes / container crashes, there is a possibility of losing the buffered messages.
How can we ensure that messages are actually sent out (something like KafkaTemplate.flush() here)?
Based on the suggestion by Gary Russell, we should set the FLUSH header on the last message.
Follow up question -
Given the last send method call will become a blocking call because of the blocking nature of the kafkaProducer.flush() method, if there are exceptions thrown during the sending process of the kafka message (e.g. io exception/ auth exception) will they be raised in the same method context ?
e.g. in the following code will the kafka sender exception be caught in the catch block ?
public void process(List<String> input) {
try {
Message<?> message = //
message.setHeader(KafkaIntegrationHeaders.FLUSH, true);
catch(Exception e) {
The underlying KafkaProducerMessageHandler has a property:
/**
 * Specify a SpEL expression that evaluates to a {@link Boolean} to determine whether
 * the producer should be flushed after the send. Defaults to looking for a
 * {@link Boolean} value in a {@link KafkaIntegrationHeaders#FLUSH} header; false if
 * absent.
 * @param flushExpression the {@link Expression}.
 */
public void setFlushExpression(Expression flushExpression)
Currently, the binder doesn't support customizing this property.
However, if you are sending Message<?>s, you can set the KafkaIntegrationHeaders.FLUSH header to true on the last message in the batch.
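Setting the header on the last message could look roughly like this. It is a sketch, not the binder's prescribed usage: the class name `FlushingSender` is made up, and `output` is assumed to be the bound output channel from the question.

```java
import java.util.List;

import org.springframework.integration.kafka.support.KafkaIntegrationHeaders;
import org.springframework.messaging.Message;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.support.MessageBuilder;

public class FlushingSender {

    private final MessageChannel output;

    public FlushingSender(MessageChannel output) {
        this.output = output;
    }

    public void process(List<String> input) {
        for (int i = 0; i < input.size(); i++) {
            MessageBuilder<String> builder = MessageBuilder.withPayload(input.get(i));
            if (i == input.size() - 1) {
                // Only the last message of the batch asks the handler to flush the producer.
                builder.setHeader(KafkaIntegrationHeaders.FLUSH, true);
            }
            Message<String> message = builder.build();
            output.send(message);
        }
    }
}
```

Headers are set while building the message because `Message` itself is immutable and has no `setHeader` method.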

Spring Cloud Stream Kafka Commit Failed since the group is rebalanced

I get a CommitFailedException in some time-consuming Spring Cloud Stream applications. I know that to fix this issue I need to set max.poll.records to match my expectations for the time it takes to process a batch. However, I am not quite sure how to set it for consumers in Spring Cloud Stream.
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$
    at java.util.concurrent.Executors$
Moreover, how can I ensure this situation won't happen at all? Or alternatively, how can I inject some sort of roll-back in the case of this exception? The reason is I am doing some other external works and once it is finished I publish the output message accordingly. Therefore, if the message cannot get published due to any issues after the work was done on the external system, I have to revert it back (some sort of atomic transaction over Kafka publish and other external systems).
You can set arbitrary Kafka properties either at the binder level documentation here
Key/Value map of arbitrary Kafka client consumer properties. In addition to support known Kafka consumer properties, unknown consumer properties are allowed here as well. Properties here supersede any properties set in boot and in the configuration property above.
Default: Empty map.
Or at the binding level documentation here.<channelName>.consumer.configuration
Map with a key/value pair containing generic Kafka consumer properties. In addition to having Kafka consumer properties, other configuration properties can be passed here. For example some properties needed by the application such as
Default: Empty map.
You can get notified of commit failures by adding an OffsetCommitCallback to the listener container's ContainerProperties and setting syncCommits to false. To customize the container and its properties, add a ListenerContainerCustomizer bean to the application.
Async commit callback...
@SpringBootApplication
@EnableBinding(Sink.class)
public class So57970152Application {
    public static void main(String[] args) {
        SpringApplication.run(So57970152Application.class, args);
    }
    @Bean
    public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> customizer() {
        return (container, dest, group) -> {
            container.getContainerProperties().setSyncCommits(false); // async commits, so the callback is invoked
            container.getContainerProperties().setCommitCallback((map, ex) -> {
                if (ex == null) {
                    System.out.println("Successful commit for " + map);
                } else {
                    System.out.println("Commit failed for " + map + ": " + ex.getMessage());
                }
            });
        };
    }
    @StreamListener(Sink.INPUT)
    public void listen(String in) {
    }
    @Bean
    public ApplicationRunner runner(KafkaTemplate<byte[], byte[]> template) {
        return args -> template.send("input", "foo".getBytes());
    }
}
Manual commits (sync)...
@SpringBootApplication
@EnableBinding(Sink.class)
public class So57970152Application {
    public static void main(String[] args) {
        SpringApplication.run(So57970152Application.class, args);
    }
    @Bean
    public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> customizer() {
        return (container, dest, group) ->
                container.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    }
    @StreamListener(Sink.INPUT)
    public void listen(String in, @Header(KafkaHeaders.ACKNOWLEDGMENT) Acknowledgment ack) {
        try {
            ack.acknowledge(); // MUST USE MANUAL_IMMEDIATE for this to work.
            System.out.println("Commit successful");
        } catch (Exception e) {
            System.out.println("Commit failed " + e.getMessage());
        }
    }
    @Bean
    public ApplicationRunner runner(KafkaTemplate<byte[], byte[]> template) {
        return args -> template.send("input", "foo".getBytes());
    }
}
Set your heartbeat interval to less than 1/3 of your session timeout. If the broker cannot determine whether your consumer is alive, it will initiate a partition rebalance among the remaining consumers. The heartbeat thread tells the broker that the consumer is alive even when the application takes a bit longer to process. Change these in your consumer configs:
If that does not work, try increasing the session timeout. You will have to experiment with these values.
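The concrete settings did not survive in this answer. As a rough sketch, the standard Kafka consumer keys involved are `session.timeout.ms`, `heartbeat.interval.ms` and `max.poll.records`; the values below are illustrative assumptions only, not recommendations, and the class name is hypothetical.

```java
import java.util.Properties;

final class ConsumerTimeoutSettings {

    /** Builds consumer properties; every value here is a placeholder to be tuned. */
    static Properties consumerTimeouts() {
        Properties props = new Properties();
        // How long the broker waits without a heartbeat before it evicts the consumer.
        props.setProperty("session.timeout.ms", "30000");
        // Kept below one third of the session timeout, as suggested above.
        props.setProperty("heartbeat.interval.ms", "9000");
        // Fewer records per poll() shortens the time between two polls.
        props.setProperty("max.poll.records", "50");
        return props;
    }

    public static void main(String[] args) {
        consumerTimeouts().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In Spring Cloud Stream the same keys would go into the binder-level or binding-level consumer configuration maps described in the other answer.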

Spring Batch restart functionality not working when using @StepScope

I want to use the Spring Batch (v3.0.9) restart functionality so that when a JobInstance is restarted, the step resumes reading from the last failed chunk onward. Restart works fine as long as I don't put the @StepScope annotation on my myBatisPagingItemReader bean method.
I was using @StepScope so that I can do late binding to get the JobParameters in my myBatisPagingItemReader bean method (@Value("#{jobParameters['run-date']}")).
If I use the @StepScope annotation on the myBatisPagingItemReader() bean method, restart does not work because it creates a new instance (scope=step, name=scopedTarget.myBatisPagingItemReader).
If I use step scope, is it possible for my myBatisPagingItemReader to set the read.count from the last failure to get restart working?
I have explained this issue with the example below.
@Configuration
public class BatchConfig {

    @Bean
    public Step step1(StepBuilderFactory stepBuilderFactory,
            ItemReader<Model> myBatisPagingItemReader,
            ItemProcessor<Model, Model> itemProcessor,
            ItemWriter<Model> itemWriter) {
        return stepBuilderFactory.get("data-load")
                .<Model, Model>chunk(10)
                .reader(myBatisPagingItemReader)
                .processor(itemProcessor)
                .writer(itemWriter)
                .listener(new JobParameterExecutionContextCopyListener())
                .build();
    }

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory, @Qualifier("step1") Step step1) {
        return jobBuilderFactory.get("load-job")
                .incrementer(new RunIdIncrementer())
                .start(step1)
                .build();
    }

    @Bean
    @StepScope
    public ItemReader<Model> myBatisPagingItemReader(
            SqlSessionFactory sqlSessionFactory,
            @Value("#{jobParameters['run-date']}") String runDate) {
        MyBatisPagingItemReader<Model> reader = new MyBatisPagingItemReader<>();
        Map<String, Object> parameterValues = new HashMap<>();
        parameterValues.put("runDate", runDate);
        reader.setSqlSessionFactory(sqlSessionFactory);
        reader.setParameterValues(parameterValues);
        // query id and page size were not shown in the post
        return reader;
    }
}
Restart example: when I use the @StepScope annotation on myBatisPagingItemReader(), the reader fetches 5 records and the chunk size (commit-interval) is set to 3.
Job Instance - 01 - Job Parameter - 01/02/2019.
process record-1
process record-2
process record-3
writer - writes all 3 records
chunk-1 commit successful
process record-4
process record-5 - throws an exception
Job completes and is set to 'FAILED' status
Now the Job is Restarted again using same Job Parameter.
Job Instance - 01 - Job Parameter - 01/02/2019.
process record-1
process record-2
process record-3
writer - writes all 3 records
chunk-1 commit successful
process record-4
process record-5 - throws an exception
Job completes and is set to 'FAILED' status
The @StepScope annotation on the myBatisPagingItemReader() bean method creates a new instance; see the log messages below.
Creating object in scope=step, name=scopedTarget.myBatisPagingItemReader
Registered destruction callback in scope=step, name=scopedTarget.myBatisPagingItemReader
Because it is a new instance, it starts processing from the beginning instead of starting from chunk-2.
If I don't use @StepScope, it restarts from chunk-2 as the restarted job step sets -
The issue here is that you are returning an ItemReader instead of the concrete class (MyBatisPagingItemReader) or at least ItemStreamReader. When you use Spring Batch's step scope, we create a proxy to allow for late initialization. The proxy is based on the return type of the method (ItemReader in your case). Because the proxy is of type ItemReader, Spring Batch does not know that your bean also implements ItemStream, and it is that interface that enables restartability. By default, Spring Batch automatically registers all beans of type ItemStream for you (you can also register the beans explicitly yourself, but that is typically not needed).
To address your issue, the following should work (note the change in the return type):
@Bean
@StepScope
public MyBatisPagingItemReader<Model> myBatisPagingItemReader(
        SqlSessionFactory sqlSessionFactory,
        @Value("#{jobParameters['run-date']}") String runDate) {
    MyBatisPagingItemReader<Model> reader =
            new MyBatisPagingItemReader<>();
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("runDate", runDate);
    reader.setSqlSessionFactory(sqlSessionFactory);
    reader.setParameterValues(parameterValues);
    return reader;
}
This is why I recommend that, where possible, @Bean-annotated methods return the most concrete type available, so that Spring can help as much as possible.
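The effect of the declared return type can be illustrated with a plain JDK dynamic proxy. This is only an analogy, not Spring Batch's actual proxying code: the interfaces `Reader` and `Restartable` are hypothetical stand-ins for ItemReader and ItemStream, and `PagingReader` stands in for MyBatisPagingItemReader.

```java
import java.lang.reflect.Proxy;

final class ReturnTypeProxyDemo {

    interface Reader { String read(); }       // stands in for ItemReader
    interface Restartable { void open(); }    // stands in for ItemStream

    /** A target that implements both interfaces, like the real paging reader. */
    static final class PagingReader implements Reader, Restartable {
        public String read() { return "item"; }
        public void open() { }
    }

    /** Builds a proxy that exposes only the declared type and delegates to the target. */
    static Object proxyFor(Object target, Class<?> declaredType) {
        return Proxy.newProxyInstance(
                declaredType.getClassLoader(),
                new Class<?>[] { declaredType },
                (proxy, method, args) -> method.invoke(target, args));
    }

    public static void main(String[] args) {
        Object proxy = proxyFor(new PagingReader(), Reader.class);
        // The target is Restartable, but the proxy only knows about Reader.
        System.out.println("proxy is a Reader: " + (proxy instanceof Reader));
        System.out.println("proxy is Restartable: " + (proxy instanceof Restartable));
    }
}
```

A framework that looks for `Restartable` beans by type would not find this proxy, which mirrors why the step-scoped reader declared as ItemReader is not registered as an ItemStream.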

RestTemplate.postForObject() Read timed out EVEN THOUGH SUCCESSFUL

I have two Java Spring Boot web service apps on the same server calling each other via REST. Service A calls Service B, and the latter successfully acts upon the notification.
THE PROBLEM is that Service A never receives the acknowledgement from Service B, so it thinks the call has failed and, in accordance with its looping recovery logic, it tries again…and again…and again. Service B ends up doing 3 times the work for no added benefit.
The relevant code (stripped down and falsified to protect the guilty) is as follows:
Service A:
public void giveOrderToServiceB(@RequestBody CustomClass message) {
    org.springframework.web.client.RestTemplate template = new RestTemplate(clientHttpRequestFactory());
    com.mycompany.CustomReply reply = template.postForObject(serviceBUrl, message, CustomReply.class);
}
Service B REST Controller:
@PostMapping(value="ExecuteTheWork", produces=org.springframework.http.MediaType.APPLICATION_JSON_VALUE, consumes=MediaType.APPLICATION_JSON_VALUE)
public @ResponseBody CustomReply executeTheWork(@RequestBody CustomClass thing) {
    // do something with the thing...
    CustomReply reply = new CustomReply();
    reply.setReply("Successfully executed the work.");
    return reply;
}
The actual exception caught by Service A after calling RestTemplate.postForObject() is Read timed out
Please advise.
OK, I think I got it. I don't send the response back from Service B until after the method has completed all of its work, which can take several seconds to several minutes.
If I immediately answer (and skip the processing), it works consistently.
Need to spin off the actual work to a separate thread.
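Spinning the work off could be sketched with a plain executor, as below. All names (`AsyncWorkSketch`, `accept`, `doSlowWork`) and the pool size are hypothetical; in a Spring application the same idea is often expressed with `@Async` on the service method instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncWorkSketch {

    // Hypothetical pool of 4 daemon threads; size it for the real workload.
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(4, runnable -> {
        Thread thread = new Thread(runnable, "service-b-worker");
        thread.setDaemon(true);
        return thread;
    });

    /** Stand-in for the slow processing in Service B (seconds to minutes in the question). */
    static String doSlowWork(String thing) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "done: " + thing;
    }

    /** Queues the work and returns at once, so the caller is not held until its read timeout. */
    static CompletableFuture<String> accept(String thing) {
        return CompletableFuture.supplyAsync(() -> doSlowWork(thing), WORKERS);
    }

    public static void main(String[] args) {
        CompletableFuture<String> pending = accept("order-1");
        System.out.println("Replied to caller; work finished yet? " + pending.isDone());
        System.out.println(pending.join()); // wait here only so the demo prints the result
    }
}
```

The controller would call `accept(...)` and return its reply immediately. The reply then only means "accepted", so Service A needs another way to learn the final outcome if it cares about it.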
When you register the RestTemplate bean in your application, you must configure it with a timeout. The following is the Spring application config file:
package com.temp.project.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.ClientHttpRequestFactory;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class TempProjectConfig {

    /**
     * Set Timeout for HTTP requests
     * @return
     */
    @Bean
    public ClientHttpRequestFactory getClientHttpRequestFactory() {
        int timeout = 1200000; // here is the timeout property set for rest template
        HttpComponentsClientHttpRequestFactory clientHttpRequestFactory
                = new HttpComponentsClientHttpRequestFactory();
        clientHttpRequestFactory.setConnectTimeout(timeout); // milliseconds
        clientHttpRequestFactory.setReadTimeout(timeout);    // milliseconds
        return clientHttpRequestFactory;
    }

    /**
     * RestTemplate to call REST endpoints
     * @param clientHttpRequestFactory
     * @return
     */
    @Bean
    public RestTemplate getRestTemplate(ClientHttpRequestFactory clientHttpRequestFactory) {
        return new RestTemplate(clientHttpRequestFactory);
    }
}

Spring Kafka: How to discard messages already retrieved by poll() after doing a seek()?

This is a followup question to - Reading the same message several times from Kafka. If there is a better way to ask this question without posting a new question, let me know. In this post Gary mentions
"But you will still see later messages first if they have already been retrieved so you will have to discard those too."
Is there a clean way to discard messages already read by poll() after calling seek()? I started implementing logic to do this by saving the initial offset (in recordOffset), incrementing it on success. On failure, I call seek() and set the value of recordOffset to record.offset(). Then for every new message I check to see if the record.offset() is greater than recordOffset. If it is, I simply call acknowledge(), thereby "discarding" all the previously read messages. Here is the code -
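// in onMessage()...
if (record.offset() > recordOffset){
    try {
        recordOffset = record.offset()+1;
    } catch (Exception e) {
        recordOffset = record.offset();
        // seek back to the failed record (the receiver of this call was lost from the post)
        seek(record.topic(), record.partition(), record.offset());
    }
}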
The problem with this approach is that it gets complicated with multiple partitions. Is there an easier/cleaner way?
Based on Gary's suggestion below, I tried adding an errorHandler like this -
@KafkaListener(topicPartitions =
    {@org.springframework.kafka.annotation.TopicPartition(topic = "${kafka.consumer.topic}", partitions = { "1" })},
    errorHandler = "SeekToCurrentErrorHandler")
Is there something wrong with this syntax as I get "Cannot resolve method 'errorHandler'"?
After Gary explained the 2 error handlers, I removed the above errorHandler and added below to the config file -
@Bean
public ConcurrentKafkaListenerContainerFactory kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory factory = new ConcurrentKafkaListenerContainerFactory();
    factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(kafkaProps()));
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    return factory;
}
When I start the application, I get this error now...
java.lang.NoSuchMethodError: org.springframework.util.Assert.state(ZLjava/util/function/Supplier;)V
at org.springframework.kafka.listener.adapter.MessagingMessageListenerAdapter.determineInferredType(
Here is line 396 -
Assert.state(!this.isConsumerRecordList || validParametersForBatch,
() -> String.format(stateMessage, "ConsumerRecord"));
Assert.state(!this.isMessageList || validParametersForBatch,
() -> String.format(stateMessage, "Message<?>"));
Starting with version 2.0.1, if the container's ErrorHandler is a RemainingRecordsErrorHandler, such as the SeekToCurrentErrorHandler, the remaining records (including the failed one) are sent to the error handler instead of the listener.
This allows the SeekToCurrentErrorHandler to reposition every partition so the next poll will return the unprocessed record(s).
/**
 * An error handler that seeks to the current offset for each topic in the remaining
 * records. Used to rewind partitions after a message failure so that it can be
 * replayed.
 *
 * @author Gary Russell
 * @since 2.0.1
 */
public class SeekToCurrentErrorHandler implements RemainingRecordsErrorHandler
There are two types of error handler. The KafkaListenerErrorHandler (specified in the annotation) works at the listener level; it is wired into the listener adapter that invokes the @KafkaListener-annotated method and thus only has access to the current record.
The second error handler (configured on the listener container) works at the container level and thus has access to the remaining records. The SeekToCurrentErrorHandler is a container-level error handler.
It is configured on the container properties in the container factory...
@Bean
public ConcurrentKafkaListenerContainerFactory kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory factory = new ConcurrentKafkaListenerContainerFactory();
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    return factory;
}
You are on the right track, and yes, you have to deal with different partitions as well. There is a FilteringMessageListenerAdapter, but you still have to write the logic.
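If you keep the manual approach, the per-partition bookkeeping could be sketched as follows. This is an assumption-laden illustration, not part of Spring Kafka: `StaleRecordFilter` and its methods are hypothetical names, and the class only decides whether a record should be processed; the actual seek() and acknowledge() calls stay in your listener.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleRecordFilter {

    // Per topic-partition: the offset we rewound to and are still waiting to see again.
    private final Map<String, Long> rewoundTo = new HashMap<>();

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    /** Call after a failure, alongside the seek() back to the failed offset. */
    void onFailure(String topic, int partition, long offset) {
        rewoundTo.put(key(topic, partition), offset);
    }

    /** Returns true if the record should be processed, false if it was fetched before the seek. */
    boolean shouldProcess(String topic, int partition, long offset) {
        String k = key(topic, partition);
        Long target = rewoundTo.get(k);
        if (target == null) {
            return true;         // no rewind pending on this partition
        }
        if (offset > target) {
            return false;        // stale prefetched record; it will be redelivered later
        }
        rewoundTo.remove(k);     // the replayed record (or an earlier one) has arrived
        return true;
    }

    public static void main(String[] args) {
        StaleRecordFilter filter = new StaleRecordFilter();
        filter.onFailure("orders", 0, 5L);
        System.out.println("partition 0, offset 6: " + filter.shouldProcess("orders", 0, 6L));
        System.out.println("partition 1, offset 3: " + filter.shouldProcess("orders", 1, 3L));
        System.out.println("partition 0, offset 5: " + filter.shouldProcess("orders", 0, 5L));
    }
}
```

Keying the state by topic and partition keeps a rewind on one partition from discarding records of another, which is the part that gets complicated with a single recordOffset field.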