Exactly-once semantics with Spring Kafka - apache-kafka

I'm trying to test my exactly-once configuration to make sure all the configs I set are correct and the behavior is what I expect.
I seem to have run into a problem with duplicate sends.
public static void main(String[] args) {
    MessageProducer producer = new ProducerBuilder()
            .setBootstrapServers("kafka:9992")
            .setKeySerializerClass(StringSerializer.class)
            .setValueSerializerClass(StringSerializer.class)
            .setProducerEnableIdempotence(true).build();

    MessageConsumer consumer = new ConsumerBuilder()
            .setBootstrapServers("kafka:9992")
            .setIsolationLevel("read_committed")
            .setTopics("someTopic2")
            .setGroupId("bla")
            .setKeyDeserializerClass(StringDeserializer.class)
            .setValueDeserializerClass(MapDeserializer.class)
            .setConsumerMessageLogic(new ConsumerMessageLogic() {
                @Override
                public void onMessage(ConsumerRecord cr, Acknowledgment acknowledgment) {
                    producer.sendMessage(new TopicPartition("someTopic2", cr.partition()),
                            new OffsetAndMetadata(cr.offset() + 1), "something1", "im in transaction", cr.key());
                    acknowledgment.acknowledge();
                }
            }).build();

    consumer.start();
}
this is my "test", you can assume the builder puts the right configuration.
ConsumerMessageLogic is a class that handles the "process" part of the read-process-write that the exactly once semantic is supporting
inside the producer class i have a send message method like so:
public void sendMessage(TopicPartition topicPartition, OffsetAndMetadata offsetAndMetadata, String sendToTopic, V message, PK partitionKey) {
    try {
        KafkaRecord<PK, V> partitionAndMessagePair = producerMessageLogic.prepareMessage(topicPartition.topic(), partitionKey, message);
        if (kafkaTemplate.getProducerFactory().transactionCapable()) {
            kafkaTemplate.executeInTransaction(operations -> {
                sendMessage(message, partitionKey, sendToTopic, partitionAndMessagePair, operations);
                operations.sendOffsetsToTransaction(
                        Map.of(topicPartition, offsetAndMetadata), "bla");
                return true;
            });
        } else {
            sendMessage(message, partitionKey, topicPartition.topic(), partitionAndMessagePair, kafkaTemplate);
        }
    } catch (Exception e) {
        failureHandler.onFailure(partitionKey, message, e);
    }
}
I create my consumer like so:
/**
 * Start the message consumer.
 * The record event will be delegated to onMessage().
 */
public void start() {
    initConsumerMessageListenerContainer();
    container.start();
}

/**
 * Initialize the kafka message listener
 */
private void initConsumerMessageListenerContainer() {
    // start an acknowledging message listener to allow manual commits
    messageListener = consumerMessageLogic::onMessage;
    // start and initialize the consumer container
    container = initContainer(messageListener);
    // sets the number of consumers; the topic partitions will be divided among the consumers
    container.setConcurrency(springConcurrency);
    springContainerPollTimeoutOpt.ifPresent(p -> container.getContainerProperties().setPollTimeout(p));
    if (springAckMode != null) {
        container.getContainerProperties().setAckMode(springAckMode);
    }
}

private ConcurrentMessageListenerContainer<PK, V> initContainer(AcknowledgingMessageListener<PK, V> messageListener) {
    return new ConcurrentMessageListenerContainer<>(
            consumerFactory(props),
            containerProperties(messageListener));
}
When I create my producer, I create it with a UUID as the transaction prefix, like so:
public ProducerFactory<PK, V> producerFactory(boolean isTransactional) {
    ProducerFactory<PK, V> res = new DefaultKafkaProducerFactory<>(props);
    if (isTransactional) {
        ((DefaultKafkaProducerFactory<PK, V>) res).setTransactionIdPrefix(UUID.randomUUID().toString());
        ((DefaultKafkaProducerFactory<PK, V>) res).setProducerPerConsumerPartition(true);
    }
    return res;
}
Now, after everything is set up, I bring up 2 instances on a topic with 2 partitions, and each instance gets 1 partition of the consumed topic.
I send a message and wait in debug for the transaction timeout (to simulate a loss of connection) in instance A. Once the timeout passes, the other instance (instance B) automatically processes the record and sends it to the target topic, because a rebalance occurred.
So far so good.
Now, when I release the breakpoint on instance A, it says it is rebalancing and couldn't commit, but I still see another output record in my destination topic.
My expectation was that instance A would not continue its work once I released the breakpoint, since the record was already processed.
Am I doing something wrong?
Can this scenario be achieved?
Edit 2:
After Gary's remarks about executeInTransaction, I get the duplicate record if I freeze one of the instances until the timeout and release it after the other instance has processed the record; the frozen instance then processes and produces the same record to the output topic...
public static void main(String[] args) {
    MessageProducer producer = new ProducerBuilder()
            .setBootstrapServers("kafka:9992")
            .setKeySerializerClass(StringSerializer.class)
            .setValueSerializerClass(StringSerializer.class)
            .setProducerEnableIdempotence(true).build();

    MessageConsumer consumer = new ConsumerBuilder()
            .setBootstrapServers("kafka:9992")
            .setIsolationLevel("read_committed")
            .setTopics("someTopic2")
            .setGroupId("bla")
            .setKeyDeserializerClass(StringDeserializer.class)
            .setValueDeserializerClass(MapDeserializer.class)
            .setConsumerMessageLogic(new ConsumerMessageLogic() {
                @Override
                public void onMessage(ConsumerRecord cr, Acknowledgment acknowledgment) {
                    producer.sendMessage("something1", "im in transaction");
                }
            }).build();

    consumer.start(producer.getProducerFactory());
}
The new sendMessage method in the producer, without executeInTransaction:
public void sendMessage(V message, PK partitionKey, String topicName) {
    try {
        KafkaRecord<PK, V> partitionAndMessagePair = producerMessageLogic.prepareMessage(topicName, partitionKey, message);
        sendMessage(message, partitionKey, topicName, partitionAndMessagePair, kafkaTemplate);
    } catch (Exception e) {
        failureHandler.onFailure(partitionKey, message, e);
    }
}
I also changed the consumer container creation to use a transaction manager with the same ProducerFactory, as suggested:
/**
 * Initialize the kafka message listener
 */
private void initConsumerMessageListenerContainer(ProducerFactory<PK, V> producerFactory) {
    // start an acknowledging message listener to allow manual commits
    acknowledgingMessageListener = consumerMessageLogic::onMessage;
    // start and initialize the consumer container
    container = initContainer(acknowledgingMessageListener, producerFactory);
    // sets the number of consumers; the topic partitions will be divided among the consumers
    container.setConcurrency(springConcurrency);
    springContainerPollTimeoutOpt.ifPresent(p -> container.getContainerProperties().setPollTimeout(p));
    if (springAckMode != null) {
        container.getContainerProperties().setAckMode(springAckMode);
    }
}

private ConcurrentMessageListenerContainer<PK, V> initContainer(AcknowledgingMessageListener<PK, V> messageListener, ProducerFactory<PK, V> producerFactory) {
    return new ConcurrentMessageListenerContainer<>(
            consumerFactory(props),
            containerProperties(messageListener, producerFactory));
}

@NonNull
private ContainerProperties containerProperties(MessageListener<PK, V> messageListener, ProducerFactory<PK, V> producerFactory) {
    ContainerProperties containerProperties = new ContainerProperties(topics);
    containerProperties.setMessageListener(messageListener);
    containerProperties.setTransactionManager(new KafkaTransactionManager<>(producerFactory));
    return containerProperties;
}
My expectation is that the broker, once it receives the processed record from the frozen instance, will know that the record was already handled by another instance, since it contains the exact same metadata (or does it? I mean, the PID will be different, but should it be?).
Maybe the scenario I'm looking for is not even supported by the exactly-once support that Kafka and Spring currently provide...
If I have 2 instances doing read-process-write, that means I have 2 producers with 2 different PIDs.
Now, when I freeze one of the instances and the unfrozen instance takes over processing the record due to a rebalance, it will send the record with its own PID and a sequence number in the metadata.
Now, when I release the frozen instance, it sends the same record, but with its own PID, so there is no way the broker can know it's a duplicate...
Am I wrong? How can I avoid this scenario? I thought the rebalance stops the instance and doesn't let it complete its processing (where it produces the duplicate record), because it no longer has responsibility for that record.
Adding the logs:
Frozen instance: you can see the freeze starting at 10:53:34; I released it at 10:54:02 (the rebalance time is 10 seconds):
2020-06-16 10:53:34,393 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] Created new Producer: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#5c7f5906]
2020-06-16 10:53:34,394 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#5c7f5906] beginTransaction()
2020-06-16 10:53:34,395 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.doBegin:149] Created Kafka transaction on producer [CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#5c7f5906]]
2020-06-16 10:54:02,157 INFO [${sys:spring.application.name}] [kafka-coordinator-heartbeat-thread | bla] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Group coordinator X.X.X.X:9992 (id: 2147482646 rack: null) is unavailable or invalid, will attempt rediscovery
2020-06-16 10:54:02,181 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
2020-06-16 10:54:02,189 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] [i.i.k.s.p.SimpleSuccessHandler.:] Sent message=[im in transaction] with offset=[252] to topic something1
2020-06-16 10:54:02,193 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] [o.a.k.c.p.i.TransactionManager.:] [Producer clientId=producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0, transactionalId=b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] Discovered group coordinator X.X.X.X:9992 (id: 1001 rack: null)
2020-06-16 10:54:02,263 INFO [${sys:spring.application.name}] [kafka-coordinator-heartbeat-thread | bla] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Discovered group coordinator 192.168.144.1:9992 (id: 2147482646 rack: null)
2020-06-16 10:54:02,295 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.processCommit:740] Initiating transaction commit
2020-06-16 10:54:02,296 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#5c7f5906] commitTransaction()
2020-06-16 10:54:02,299 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
2020-06-16 10:54:02,301 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Attempt to heartbeat failed for since member id consumer-bla-1-b3ad1c09-ad06-4bc4-a891-47a2288a830f is not valid.
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.ConsumerCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.ConsumerCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Lost previously assigned partitions someTopic2-0
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.ConcurrentMessageListenerContainer.info:279] bla: partitions lost: [someTopic2-0]
2020-06-16 10:54:02,303 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.ConcurrentMessageListenerContainer.info:279] bla: partitions revoked: [someTopic2-0]
2020-06-16 10:54:02,303 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
The regular instance that takes over the partition and produces the record after the rebalance:
2020-06-16 10:53:46,536 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] Created new Producer: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#26c76153]
2020-06-16 10:53:46,537 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#26c76153] beginTransaction()
2020-06-16 10:53:46,539 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.doBegin:149] Created Kafka transaction on producer [CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#26c76153]]
2020-06-16 10:53:46,556 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
2020-06-16 10:53:46,563 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0] [i.i.k.s.p.SimpleSuccessHandler.:] Sent message=[im in transaction] with offset=[250] to topic something1
2020-06-16 10:53:46,566 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0] [o.a.k.c.p.i.TransactionManager.:] [Producer clientId=producer-1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0, transactionalId=1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0] Discovered group coordinator X.X.X.X:9992 (id: 1001 rack: null)
2020-06-16 10:53:46,668 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.processCommit:740] Initiating transaction commit
2020-06-16 10:53:46,669 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#26c76153] commitTransaction()
2020-06-16 10:53:46,672 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
2020-06-16 10:53:51,673 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Received: 0 records
I noticed they both note the exact same offset to commit:
Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
I thought that when they try to commit the exact same thing, the broker would abort one of the transactions...
I also noticed that if I reduce transaction.timeout.ms to just 2 seconds, it doesn't abort the transaction no matter how long I freeze the instance in debug...
Maybe the timer of transaction.timeout.ms starts only after I send the message?

You must not use executeInTransaction at all - see its Javadocs; it is used when there is no active transaction, or if you explicitly don't want an operation to participate in an existing transaction.
You need to add a KafkaTransactionManager to the listener container; it must have a reference to the same ProducerFactory as the template.
Then, the container will start the transaction and, if successful, send the offsets to the transaction.
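To make that wiring concrete, here is a minimal sketch (not the poster's builder classes; the bean names, bootstrap server and transaction-id prefix are placeholders) of a single transactional ProducerFactory shared by the KafkaTemplate and by the KafkaTransactionManager that is set on the container, as in the edited code above:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.transaction.KafkaTransactionManager;

// Minimal sketch: one transactional ProducerFactory backs both the template and the
// transaction manager, so sends made from the listener join the container-started transaction.
@Configuration
public class ExactlyOnceTxSketch {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9992");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        DefaultKafkaProducerFactory<String, String> pf = new DefaultKafkaProducerFactory<>(props);
        pf.setTransactionIdPrefix("tx-"); // makes the factory transaction capable
        return pf;
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate(ProducerFactory<String, String> pf) {
        return new KafkaTemplate<>(pf);
    }

    @Bean
    public KafkaTransactionManager<String, String> kafkaTransactionManager(ProducerFactory<String, String> pf) {
        // Same factory as the template; set this on ContainerProperties, as in the edited code above.
        return new KafkaTransactionManager<>(pf);
    }
}

With the transaction manager set on the container, the listener only calls kafkaTemplate.send(...); the container begins the transaction, the send participates in it, and the container sends the consumed offsets to the transaction before committing.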

Related

Facing an issue while consuming or producing data to a topic on a specific server VPC

I'm trying to make a producer using Kafka and Spring Boot.
I have tried creating a new application to produce messages on a topic, to be consumed by another application. When I start the server, the topic is not recognized initially. The error is shown below:
2021-09-03 15:33:20.024 WARN 1 --- [ntainer#0-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=lims-public-helper] Error while fetching metadata with correlation id 9 : { sms.requests=UNKNOWN_TOPIC_OR_PARTITION}
2021-09-03 15:33:20.026 WARN 1 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-1, groupId=lims-public-helper] The following subscribed topics are not assigned to any members: [ sms.requests]
I tried another server and it works fine with the same configuration; with this server it throws the exception.
Kafka topics
$ kaf topics
NAME                 PARTITIONS   REPLICAS
__consumer_offsets   50           3
__trace              9            1
sms.requests         3            1
sms.status           1            3
test                 1            3
Consumer code:
public ConsumerFactory<String, OtpDTO> otpConsumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, limsGroupId);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, JsonDeserializer.class);
    props.put(JsonSerializer.TYPE_MAPPINGS, "otpDTO:com.lims.helper.dto.OtpDTO");
    props.put(JsonDeserializer.VALUE_DEFAULT_TYPE, "com.lims.helper.dto.OtpDTO");
    props.put(ErrorHandlingDeserializer.KEY_DESERIALIZER_CLASS, StringDeserializer.class);
    props.put(ErrorHandlingDeserializer.VALUE_DESERIALIZER_CLASS, JsonDeserializer.class);
    return new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(), new JsonDeserializer<>(OtpDTO.class));
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, OtpDTO> otpKafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, OtpDTO> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(otpConsumerFactory());
    return factory;
}
Consumer listener details:
@KafkaListener(topics = "${spring.kafka.topic.lims.sms.otp}", containerFactory = "otpKafkaListenerContainerFactory")
public void otpTopicMessage(@Payload OtpDTO otpDTO) {
    log.info(String.format("--------##### otp topic consumer: %s", otpDTO));
}
Properties for the topics:
spring.kafka.topic.lims.sms.otp=sms.requests
spring.kafka.topic.lims.sms.status=sms.status

CommitFailedException by Spring Kafka Consumer

I sometimes get the error message below while using a Spring Kafka consumer. I have implemented at-least-once semantics, as shown in the code snippet.
1) My doubt is: do I miss consuming any message?
2) Do I need to handle this error? It was not reported by seekToCurrentErrorHandler().
org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
My Spring Kafka consumer code snippet:
public class KafkaConsumerConfig implements KafkaListenerConfigurer {

    @Bean
    public SeekToCurrentErrorHandler seekToCurrentErrorHandler() {
        SeekToCurrentErrorHandler seekToCurrentErrorHandler = new SeekToCurrentErrorHandler((record, e) -> {
            System.out.println("RECORD from topic " + record.topic() + " at partition " + record.partition()
                    + " at offset " + record.offset() + " did not process correctly due to a " + e.getCause());
        }, new FixedBackOff(500L, 3L));
        return seekToCurrentErrorHandler;
    }

    @Bean
    public ConsumerFactory<String, ValidatedConsumerClass> consumerFactory() {
        ErrorHandlingDeserializer<ValidatedConsumerClass> errorHandlingDeserializer;
        errorHandlingDeserializer = new ErrorHandlingDeserializer<>(new JsonDeserializer<>(ValidatedConsumerClass.class));
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "grpid-098");
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(), errorHandlingDeserializer);
    }

    @Bean
    KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, ValidatedConsumerClass>> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, ValidatedConsumerClass> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.getContainerProperties().setAckMode(AckMode.RECORD);
        factory.setErrorHandler(seekToCurrentErrorHandler());
        return factory;
    }
}
Consumer reading the message
@Service
public class KafKaConsumerService extends AbstractConsumerSeekAware {

    @KafkaListener(id = "foo", topics = "mytopic-5", concurrency = "5", groupId = "mytopic-1-groupid")
    public void consumeFromTopic1(@Payload @Valid ValidatedConsumerClass message, ConsumerRecordMetadata c) {
        databaseService.save(message);
        System.out.println("-- Consumer End -- " + c.partition() + " ---consumer thread-- " + Thread.currentThread().getName());
    }
}
No, you are not missing anything.
No, you do not need to handle it; the STCEH already handled it, and the record will be redelivered on the next poll.
In this case, the exception is raised outside of record processing (after processing is complete). Since the commit failed due to a rebalance, there is no need for the STCEH to re-seek (and it can't anyway, because the records are no longer available). It simply rethrows the exception.
Everything works as expected...
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.properties.max.poll.interval.ms=5000
@SpringBootApplication
public class So69016372Application {

    public static void main(String[] args) {
        SpringApplication.run(So69016372Application.class, args);
    }

    @KafkaListener(id = "so69016372", topics = "so69016372")
    public void listen(String in, @Header(KafkaHeaders.OFFSET) long offset) throws InterruptedException {
        System.out.println(in + " #" + offset);
        Thread.sleep(6000);
    }

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("so69016372").partitions(1).replicas(1).build();
    }
}
Result
2021-09-01 13:47:26.963 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
2021-09-01 13:47:31.991 INFO 13195 --- [ad | so69016372] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Member consumer-so69016372-1-f02f8d74-c2b8-47d9-92d3-bf68e5c81a8f sending LeaveGroup request to coordinator localhost:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-09-01 13:47:32.989 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Failing OffsetCommit request since the consumer is not part of an active group
2021-09-01 13:47:32.994 ERROR 13195 --- [o69016372-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:200) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:112) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1602) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1210) ~[spring-kafka-2.7.6.jar:2.7.6]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1495) ~[kafka-clients-2.7.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2710) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2705) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2691) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2489) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1235) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1161) ~[spring-kafka-2.7.6.jar:2.7.6]
... 3 common frames omitted
2021-09-01 13:47:32.994 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-09-01 13:47:32.994 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Lost previously assigned partitions so69016372-0
2021-09-01 13:47:32.995 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions lost: [so69016372-0]
2021-09-01 13:47:32.995 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions revoked: [so69016372-0]
...
2021-09-01 13:47:33.102 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
2021-09-01 13:47:38.141 INFO 13195 --- [ad | so69016372] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Member consumer-so69016372-1-e6ec685a-d9aa-43d3-b526-b04418095f09 sending LeaveGroup request to coordinator localhost:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-09-01 13:47:39.108 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Failing OffsetCommit request since the consumer is not part of an active group
2021-09-01 13:47:39.109 ERROR 13195 --- [o69016372-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:200) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:112) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1602) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1210) ~[spring-kafka-2.7.6.jar:2.7.6]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1495) ~[kafka-clients-2.7.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2710) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2705) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2691) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2489) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1235) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1161) ~[spring-kafka-2.7.6.jar:2.7.6]
... 3 common frames omitted
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Lost previously assigned partitions so69016372-0
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions lost: [so69016372-0]
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions revoked: [so69016372-0]
...
2021-09-01 13:47:39.217 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
It will retry indefinitely.
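As the broker's log message itself suggests, the two knobs are max.poll.interval.ms and max.poll.records. A sketch of the corresponding Spring Boot properties for a real application (illustrative values, not prescriptive; they only work if the per-batch processing time fits within the interval):

# give the poll loop more headroom than the worst-case batch processing time...
spring.kafka.consumer.properties.max.poll.interval.ms=300000
# ...or fetch fewer records per poll so each batch finishes sooner
spring.kafka.consumer.max-poll-records=100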

How to handle UnknownProducerIdException

We are having some trouble with Spring Cloud and Kafka: sometimes our microservice throws an UnknownProducerIdException. This happens when the transactional.id.expiration.ms parameter has expired on the broker side.
My question: could it be possible to catch that exception and retry the failed message? If yes, what could be the best option to handle it?
I have taken a look at:
- https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820
- Kafka UNKNOWN_PRODUCER_ID exception
We are using Spring Cloud Hoxton.RELEASE and Spring Kafka 2.2.4.RELEASE.
We are using the AWS Kafka solution, so we can't set a new value for the property I mentioned before.
Here is some trace of the exception:
2020-04-07 20:54:00.563 ERROR 5188 --- [ad | producer-2] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-2] The broker returned org.apache.kafka.common.errors.UnknownProducerIdException: This exception is raised by the broker if it could not locate the producer metadata associated with the producerId in question. This could happen if, for instance, the producer's records were deleted because their retention time had elapsed. Once the last records of the producerId are removed, the producer's metadata is removed from the broker, and future appends by the producer will return this exception. for topic-partition test.produce.another-2 with producerId 35000, epoch 0, and sequence number 8
2020-04-07 20:54:00.563 INFO 5188 --- [ad | producer-2] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-2] ProducerId set to -1 with epoch -1
2020-04-07 20:54:00.565 ERROR 5188 --- [ad | producer-2] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='null' and payload='{...}' to topic <some-topic>:
To reproduce this exception:
- I used the Confluent Docker images and set the environment variable KAFKA_TRANSACTIONAL_ID_EXPIRATION_MS to 10 seconds so I wouldn't have to wait too long for this exception to be thrown.
- In another process, I sent messages one by one, at an interval of 10 seconds, to the topic the Java application listens to.
Here is a code example:
File Bindings.java
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.SubscribableChannel;

public interface Bindings {

    @Input("test-input")
    SubscribableChannel testListener();

    @Output("test-output")
    MessageChannel testProducer();
}
File application.yml (don't forget to set the environment variable KAFKA_HOST):
spring:
  cloud:
    stream:
      kafka:
        binder:
          auto-create-topics: true
          brokers: ${KAFKA_HOST}
          transaction:
            producer:
              error-channel-enabled: true
          producer-properties:
            acks: all
            retry.backoff.ms: 200
            linger.ms: 100
            max.in.flight.requests.per.connection: 1
            enable.idempotence: true
            retries: 3
            compression.type: snappy
            request.timeout.ms: 5000
            key.serializer: org.apache.kafka.common.serialization.StringSerializer
          consumer-properties:
            session.timeout.ms: 20000
            max.poll.interval.ms: 350000
            enable.auto.commit: true
            allow.auto.create.topics: true
            auto.commit.interval.ms: 12000
            max.poll.records: 5
            isolation.level: read_committed
          configuration:
            auto.offset.reset: latest
      bindings:
        test-input:
          # contentType: text/plain
          destination: test.produce
          group: group-input
          consumer:
            maxAttempts: 3
            startOffset: latest
            autoCommitOnError: true
            queueBufferingMaxMessages: 100000
            autoCommitOffset: true
        test-output:
          # contentType: text/plain
          destination: test.produce.another
          group: group-output
          producer:
            acks: all
debug: true
The listener handler:
@SpringBootApplication
@EnableBinding(Bindings.class)
public class PocApplication {

    private static final Logger log = LoggerFactory.getLogger(PocApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(PocApplication.class, args);
    }

    @Autowired
    private BinderAwareChannelResolver binderAwareChannelResolver;

    @StreamListener(Topics.TESTLISTENINPUT)
    public void listen(Message<?> in, String headerKey) {
        final MessageBuilder builder;
        MessageChannel messageChannel;
        messageChannel = this.binderAwareChannelResolver.resolveDestination("test-output");
        Object payload = in.getPayload();
        builder = MessageBuilder.withPayload(payload);
        try {
            log.info("Event received: {}", in);
            if (!messageChannel.send(builder.build())) {
                log.error("Something happend trying send the message! {}", in.getPayload());
            }
            log.info("Commit success");
        } catch (UnknownProducerIdException e) {
            log.error("UnkownProducerIdException catched ", e);
        } catch (KafkaException e) {
            log.error("KafkaException catched ", e);
        } catch (Exception e) {
            System.out.println("Commit failed " + e.getMessage());
        }
    }
}
Regards
} catch (UnknownProducerIdException e) {
log.error("UnkownProducerIdException catched ", e);
To catch exceptions there, you need to set the sync kafka producer property (https://cloud.spring.io/spring-cloud-static/spring-cloud-stream-binder-kafka/3.0.3.RELEASE/reference/html/spring-cloud-stream-binder-kafka.html#kafka-producer-properties). Otherwise, the error comes back asynchronously
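For reference, a sketch of what that might look like with the binding name from the question ("test-output"); the sync producer property comes from the linked binder documentation:

spring:
  cloud:
    stream:
      kafka:
        bindings:
          test-output:
            producer:
              sync: true   # send() blocks, so send failures surface on the listener thread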
You should not "eat" the exception there; it must be thrown back to the container so the container will roll back the transaction.
Also,
}catch (Exception e) {
System.out.println("Commit failed " + e.getMessage());
}
The commit is performed by the container after the stream listener returns to the container so you will never see a commit error here; again, you must let the exception propagate back to the container.
The container will retry the delivery according to the consumer binding's retry configuration.
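As a sketch of what letting the exception propagate might look like, this reworks the poster's listen() method (it reuses the same fields and imports from PocApplication; it is not the definitive fix):

@StreamListener(Topics.TESTLISTENINPUT)
public void listen(Message<?> in) {
    MessageChannel out = this.binderAwareChannelResolver.resolveDestination("test-output");
    log.info("Event received: {}", in);
    // No try/catch around the send: if it fails (including UnknownProducerIdException, which
    // surfaces on this thread when the producer is sync), the exception propagates to the
    // binder/container, the transaction is rolled back, and delivery is retried according to
    // the consumer binding's maxAttempts/backoff settings.
    if (!out.send(MessageBuilder.withPayload(in.getPayload()).build())) {
        throw new IllegalStateException("Send returned false for payload " + in.getPayload());
    }
}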
You can probably also use the callback function to handle the exception. I'm not sure about the Spring library for Kafka, but if you are using the plain Kafka client, you can do something like this:
producer.send(record, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception e) {
        if (e != null) {
            e.printStackTrace();
            if (e.getClass().equals(UnknownProducerIdException.class)) {
                logger.info("UnknownProducerIdException caught");
                // assumes a retry counter and a send(topic, partition, msg) helper defined in the enclosing scope
                while (--retry >= 0) {
                    send(topic, partition, msg);
                }
            }
        } else {
            logger.info("The offset of the record we just sent is: " + metadata.offset());
        }
    }
});

FlinkKafkaConsumer010 doesn't work when set with setStartFromTimestamp

I'm using Flink streaming and flink-connector-kafka to process data from Kafka. When I configure FlinkKafkaConsumer010 with setStartFromTimestamp(1586852770000L), and at that moment all the data in Kafka topic A has timestamps before 1586852770000L, I then send some messages to partition-0 and partition-4 of topic A (topic A has 6 partitions; the current system time is already after 1586852770000L). But my Flink program doesn't consume any data from topic A. Is this an issue?
If I stop my Flink program and restart it, it can consume data from partition-0 and partition-4 of topic A, but it still won't consume any data from the other 4 partitions if I send data to them, unless I restart my Flink program again.
The Kafka log is as follows:
2020-04-15 11:48:46,447 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps={TopicA-4=1586836800000}, minVersion=1) to broker server1:9092 (id: 185 rack: null)
2020-04-15 11:48:46,463 TRACE org.apache.kafka.clients.NetworkClient - Sending {replica_id=-1,topics=[{topic=TopicA,partitions=[{partition=0,timestamp=1586836800000}]}]} to node 184.
2020-04-15 11:48:46,466 TRACE org.apache.kafka.clients.NetworkClient - Completed receive from node 185, for key 2, received {responses=[{topic=TopicA,partition_responses=[{partition=4,error_code=0,timestamp=1586852770000,offset=4}]}]}
2020-04-15 11:48:46,467 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Received ListOffsetResponse {responses=[{topic=TopicA,partition_responses=[{partition=4,error_code=0,timestamp=1586852770000,offset=4}]}]} from broker server1:9092 (id: 185 rack: null)
2020-04-15 11:48:46,467 DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - Handling ListOffsetResponse response for TopicA-4. Fetched offset 4, timestamp 1586852770000
2020-04-15 11:48:46,448 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps={TopicA-0=1586836800000}, minVersion=1) to broker server2:9092 (id: 184 rack: null)
2020-04-15 11:48:46,463 TRACE org.apache.kafka.clients.NetworkClient - Sending {replica_id=-1,topics=[{topic=TopicA,partitions=[{partition=0,timestamp=1586836800000}]}]} to node 184.
2020-04-15 11:48:46,467 TRACE org.apache.kafka.clients.NetworkClient - Completed receive from node 184, for key 2, received {responses=[{topic=TopicA,partition_responses=[{partition=0,error_code=0,timestamp=1586863210000,offset=47}]}]}
2020-04-15 11:48:46,467 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Received ListOffsetResponse {responses=[{topic=TopicA,partition_responses=[{partition=0,error_code=0,timestamp=1586863210000,offset=47}]}]} from broker server2:9092 (id: 184 rack: null)
2020-04-15 11:48:46,467 DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - Handling ListOffsetResponse response for TopicA-0. Fetched offset 47, timestamp 1586863210000
2020-04-15 11:48:46,448 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps={TopicA-2=1586836800000}, minVersion=1) to broker server3:9092 (id: 183 rack: null)
2020-04-15 11:48:46,465 TRACE org.apache.kafka.clients.NetworkClient - Sending {replica_id=-1,topics=[{topic=TopicA,partitions=[{partition=2,timestamp=1586836800000}]}]} to node 183.
2020-04-15 11:48:46,468 TRACE org.apache.kafka.clients.NetworkClient - Completed receive from node 183, for key 2, received {responses=[{topic=TopicA,partition_responses=[{partition=2,error_code=0,timestamp=-1,offset=-1}]}]}
2020-04-15 11:48:46,468 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Received ListOffsetResponse {responses=[{topic=TopicA,partition_responses=[{partition=2,error_code=0,timestamp=-1,offset=-1}]}]} from broker server3:9092 (id: 183 rack: null)
2020-04-15 11:48:46,468 DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - Handling ListOffsetResponse response for TopicA-2. Fetched offset -1, timestamp -1
2020-04-15 11:48:46,481 INFO org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase - Consumer subtask 0 will start reading the following 2 partitions from timestamp 1586836800000: [KafkaTopicPartition{topic='TopicA', partition=4}, KafkaTopicPartition{topic='TopicA', partition=0}]
From the log: except for partition-0 and partition-4, the other 4 partitions' offsets are -1. Why is the returned offset -1 instead of the latest offset?
In the Kafka client's code (Fetcher.java, lines 674-680):
// Handle v1 and later response
log.debug("Handling ListOffsetResponse response for {}. Fetched offset {}, timestamp {}",
        topicPartition, partitionData.offset, partitionData.timestamp);
if (partitionData.offset != ListOffsetResponse.UNKNOWN_OFFSET) {
    OffsetData offsetData = new OffsetData(partitionData.offset, partitionData.timestamp);
    timestampOffsetMap.put(topicPartition, offsetData);
}
The value of ListOffsetResponse.UNKNOWN_OFFSET is -1, so the other 4 partitions are filtered out and the Kafka consumer will not consume data from them.
My Flink version is 1.9.2, and the Flink Kafka connector is:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
    <version>1.9.2</version>
</dependency>
The documentation of the Flink Kafka connector says:
setStartFromTimestamp(long): Start from the specified timestamp. For each partition, the record whose timestamp is larger than or equal to the specified timestamp will be used as the start position. If a partition's latest record is earlier than the timestamp, the partition will simply be read from the latest record.
test program code:
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
import org.junit.Test

class TestFlinkKafka {

  @Test
  def testFlinkKafkaDemo: Unit = {
    //1. set up the streaming execution environment.
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
    // To use fault tolerant Kafka Consumers, checkpointing needs to be enabled at the execution environment
    env.enableCheckpointing(60000)

    //2. kafka source
    val topic = "message"
    val schema = new SimpleStringSchema()
    //server1:9092,server2:9092,server3:9092
    val props = getKafkaConsumerProperties("localhost:9092", "flink-streaming-client", "latest")
    val consumer = new FlinkKafkaConsumer010(topic, schema, props)
    //consume data from special timestamp's offset
    //2020/4/14 20:0:0
    //consumer.setStartFromTimestamp(1586865600000L)
    //2020/4/15 20:0:0
    consumer.setStartFromTimestamp(1586952000000L)
    consumer.setCommitOffsetsOnCheckpoints(true)

    //3. transform
    val stream = env.addSource(consumer)
      .map(x => x)

    //4. sink
    stream.print()

    //5. execute
    env.execute("testFlinkKafkaConsumer")
  }

  def getKafkaConsumerProperties(brokerList: String, groupId: String, offsetReset: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", brokerList)
    props.setProperty("group.id", groupId)
    props.setProperty("auto.offset.reset", offsetReset)
    props.setProperty("flink.partition-discovery.interval-millis", "30000")
    props
  }
}
set log level for kafka:
log4j.logger.org.apache.kafka=TRACE
create kafka topic:
kafka-topics --zookeeper localhost:2181/kafka --create --topic message --partitions 6 --replication-factor 1
send message to kafka topic
kafka-console-producer --broker-list localhost:9092 --topic message
{"name":"tom"}
{"name":"michael"}
This problem was resolved by upgrading the Flink/Kafka connector to the newer, universal connector -- FlinkKafkaConsumer -- available from flink-connector-kafka_2.11. This version of the connector is recommended for all versions of Kafka from 1.0.0 forward. With Kafka 0.10.x or 0.11.x, it is better to use the version-specific flink-connector-kafka-0.10_2.11 or flink-connector-kafka-0.11_2.11 connectors. (And in all cases, substitute 2.12 for 2.11 if you are using Scala 2.12.)
See the Flink documentation for more information on Flink's Kafka connector.
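A sketch of the swap (reusing the poster's topic, group id and timestamp; the Java API is shown for brevity, the Scala call sites are analogous), assuming the dependency is changed to flink-connector-kafka_2.11 version 1.9.2:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class UniversalConnectorSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-streaming-client");

        // FlinkKafkaConsumer (universal) replaces FlinkKafkaConsumer010; the rest is unchanged.
        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("message", new SimpleStringSchema(), props);
        consumer.setStartFromTimestamp(1586952000000L); // 2020/4/15 20:00:00
        consumer.setCommitOffsetsOnCheckpoints(true);

        env.addSource(consumer).print();
        env.execute("testUniversalFlinkKafkaConsumer");
    }
}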

app is alive but stops consuming messages after a while

The spring-kafka consumer stops consuming messages after a while. The stoppage happens every time, but never after the same duration. When the app is no longer consuming, at the end of the log I always see the statement that the consumer sent a LEAVE_GROUP signal. If I am not seeing any errors or exceptions, why is the consumer leaving the group?
org.springframework.boot:spring-boot-starter-parent:2.0.4.RELEASE
spring-kafka:2.1.8.RELEASE
org.apache.kafka:kafka-clients:1.0.2
I've set logging as
logging.level.org.apache.kafka=DEBUG
logging.level.org.springframework.kafka=INFO
other settings
spring.kafka.listener.concurrency=5
spring.kafka.listener.type=single
spring.kafka.listener.ack-mode=record
spring.kafka.listener.poll-timeout=10000
spring.kafka.consumer.heartbeat-interval=5000
spring.kafka.consumer.max-poll-records=50
spring.kafka.consumer.fetch-max-wait=10000
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.properties.security.protocol=SSL
spring.kafka.consumer.retry.maxAttempts=3
spring.kafka.consumer.retry.backoffperiod.millisecs=2000
ContainerFactory setup
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> recordsKafkaListenerContainerFactory(RetryTemplate retryTemplate) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setConcurrency(listenerCount);
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.RECORD);
    factory.getContainerProperties().setPollTimeout(pollTimeoutMillis);
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    factory.getContainerProperties().setAckOnError(false);
    factory.setRetryTemplate(retryTemplate);
    factory.setStatefulRetry(true);
    factory.getContainerProperties().setIdleEventInterval(60000L);
    return factory;
}
Listener configuration
@Component
public class RecordsEventListener implements ConsumerSeekAware {

    private static final org.slf4j.Logger LOG = org.slf4j.LoggerFactory.getLogger(RecordsEventListener.class);

    @Value("${mode.replay:false}")
    public void setModeReplay(boolean enabled) {
        this.isReplay = enabled;
    }

    @KafkaListener(topics = "${event.topic}", containerFactory = "RecordsKafkaListenerContainerFactory")
    public void handleEvent(@Payload String payload) throws RecordsEventListenerException {
        try {
            //business logic
        } catch (Exception e) {
            LOG.error("Process error for event: {}", payload, e);
            if (isRetryableException(e)) {
                LOG.warn("Retryable exception detected. Going to retry.");
                throw new RecordsEventListenerException(e);
            } else {
                LOG.warn("Dropping event because non retryable exception");
            }
        }
    }

    private Boolean isRetryableException(Exception e) {
        return binaryExceptionClassifier.classify(e);
    }

    @Override
    public void registerSeekCallback(ConsumerSeekCallback callback) {
        //do nothing
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        //do this only once per start of app
        if (isReplay && !partitonSeekToBeginningDone) {
            assignments.forEach((t, p) -> callback.seekToBeginning(t.topic(), t.partition()));
            partitonSeekToBeginningDone = true;
        }
    }

    @Override
    public void onIdleContainer(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        //do nothing
        LOG.info("Container is IDLE; no messages to pull.");
        assignments.forEach((t, p) -> LOG.info("Topic:{}, Partition:{}, Offset:{}", t.topic(), t.partition(), p));
    }

    boolean isPartitionSeekToBeginningDone() {
        return partitonSeekToBeginningDone;
    }

    void setPartitonSeekToBeginningDone(boolean partitonSeekToBeginningDone) {
        this.partitonSeekToBeginningDone = partitonSeekToBeginningDone;
    }
}
When the app is no longer consuming, at the end of the log I always see the statement that the consumer sent a LEAVE_GROUP signal:
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5347 to node 2147482638
2019-05-02 18:31:05.872 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:10.856 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:10.857 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5348 to node 2147482638
2019-05-02 18:31:10.958 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending LeaveGroup request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send LEAVE_GROUP {group_id=app,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5349 to node 2147482638
2019-05-02 18:31:11.768 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Disabling heartbeat thread
Full log
Thanks to all who replied. It turns out it was indeed the broker dropping the consumer on session timeout. The broker, a very old version (0.10.0.1), did not support the newer features outlined in KIP-62 that the spring-kafka version we used could take advantage of.
Since we could not dictate an upgrade to the broker or changes to the session timeout, we simply modified our processing logic so as to finish the task within the session timeout.