app is alive but stops consuming messages after a while - apache-kafka

The spring-kafka consumer stops consuming messages after a while. The stoppage happens every time, but never after the same duration. When the app is no longer consuming, the end of the log always shows that the consumer sent a LEAVE_GROUP signal. If I am not seeing any errors or exceptions, why is the consumer leaving the group?
org.springframework.boot:spring-boot-starter-parent:2.0.4.RELEASE
spring-kafka:2.1.8.RELEASE
org.apache.kafka:kafka-clients:1.0.2
I've set logging as
logging.level.org.apache.kafka=DEBUG
logging.level.org.springframework.kafka=INFO
other settings
spring.kafka.listener.concurrency=5
spring.kafka.listener.type=single
spring.kafka.listener.ack-mode=record
spring.kafka.listener.poll-timeout=10000
spring.kafka.consumer.heartbeat-interval=5000
spring.kafka.consumer.max-poll-records=50
spring.kafka.consumer.fetch-max-wait=10000
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.properties.security.protocol=SSL
spring.kafka.consumer.retry.maxAttempts=3
spring.kafka.consumer.retry.backoffperiod.millisecs=2000
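For context, here is roughly how the timeout-related Boot properties above surface on the underlying kafka-clients consumer; a sketch only, the values simply mirror the properties listed above:
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 5000); // spring.kafka.consumer.heartbeat-interval
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50);        // spring.kafka.consumer.max-poll-records
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 10000);    // spring.kafka.consumer.fetch-max-wait
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);   // spring.kafka.consumer.enable-auto-commit
props.put("security.protocol", "SSL");                        // spring.kafka.consumer.properties.security.protocol
// session.timeout.ms and max.poll.interval.ms are not set, so the kafka-clients defaults apply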
ContainerFactory setup
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> recordsKafkaListenerContainerFactory(RetryTemplate retryTemplate) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setConcurrency(listenerCount);
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.RECORD);
    factory.getContainerProperties().setPollTimeout(pollTimeoutMillis);
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    factory.getContainerProperties().setAckOnError(false);
    factory.setRetryTemplate(retryTemplate);
    factory.setStatefulRetry(true);
    factory.getContainerProperties().setIdleEventInterval(60000L);
    return factory;
}
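The RetryTemplate injected into the factory above is not shown in the post; a minimal sketch consistent with the retry properties listed earlier (maxAttempts=3, backoff 2000 ms) could look like this, with the bean name and wiring being assumptions:
@Bean
public RetryTemplate retryTemplate() {
    RetryTemplate template = new RetryTemplate();
    // 3 attempts, matching spring.kafka.consumer.retry.maxAttempts=3
    template.setRetryPolicy(new SimpleRetryPolicy(3));
    // fixed 2000 ms back-off, matching spring.kafka.consumer.retry.backoffperiod.millisecs=2000
    FixedBackOffPolicy backOff = new FixedBackOffPolicy();
    backOff.setBackOffPeriod(2000L);
    template.setBackOffPolicy(backOff);
    return template;
}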
Listener configuration
@Component
public class RecordsEventListener implements ConsumerSeekAware {

    private static final org.slf4j.Logger LOG = org.slf4j.LoggerFactory.getLogger(RecordsEventListener.class);

    // Fields referenced below; these declarations are assumed, they are not shown in the original post.
    private volatile boolean isReplay;
    private volatile boolean partitonSeekToBeginningDone;
    private BinaryExceptionClassifier binaryExceptionClassifier;

    @Value("${mode.replay:false}")
    public void setModeReplay(boolean enabled) {
        this.isReplay = enabled;
    }

    @KafkaListener(topics = "${event.topic}", containerFactory = "RecordsKafkaListenerContainerFactory")
    public void handleEvent(@Payload String payload) throws RecordsEventListenerException {
        try {
            //business logic
        } catch (Exception e) {
            LOG.error("Process error for event: {}", payload, e);
            if (isRetryableException(e)) {
                LOG.warn("Retryable exception detected. Going to retry.");
                throw new RecordsEventListenerException(e);
            } else {
                LOG.warn("Dropping event because of non-retryable exception");
            }
        }
    }

    private Boolean isRetryableException(Exception e) {
        return binaryExceptionClassifier.classify(e);
    }

    @Override
    public void registerSeekCallback(ConsumerSeekCallback callback) {
        //do nothing
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        //do this only once per start of app
        if (isReplay && !partitonSeekToBeginningDone) {
            assignments.forEach((t, p) -> callback.seekToBeginning(t.topic(), t.partition()));
            partitonSeekToBeginningDone = true;
        }
    }

    @Override
    public void onIdleContainer(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        LOG.info("Container is IDLE; no messages to pull.");
        assignments.forEach((t, p) -> LOG.info("Topic:{}, Partition:{}, Offset:{}", t.topic(), t.partition(), p));
    }

    boolean isPartitionSeekToBeginningDone() {
        return partitonSeekToBeginningDone;
    }

    void setPartitonSeekToBeginningDone(boolean partitonSeekToBeginningDone) {
        this.partitonSeekToBeginningDone = partitonSeekToBeginningDone;
    }
}
When the app is no longer consuming, the end of the log always shows that the consumer sent the LEAVE_GROUP signal:
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5347 to node 2147482638
2019-05-02 18:31:05.872 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:10.856 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:10.857 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5348 to node 2147482638
2019-05-02 18:31:10.958 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending LeaveGroup request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send LEAVE_GROUP {group_id=app,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5349 to node 2147482638
2019-05-02 18:31:11.768 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Disabling heartbeat thread
Full log

Thanks to all who replied. It turns out it was indeed the broker dropping the consumer on session timeout. The broker, a very old version (0.10.0.1), did not support the newer features outlined in KIP-62 that the spring-kafka version we used could take advantage of.
Since we could not dictate a broker upgrade or changes to the session timeout, we simply modified our processing logic so as to finish the work within the session timeout.
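In other words, the budget that has to hold is (records per poll) x (worst-case processing time per record) < session timeout. A rough illustration of that arithmetic; the timeout and per-record time here are assumptions, not taken from the original post:
// Illustrative only: numbers other than max.poll.records=50 are assumptions.
int maxPollRecords = 50;          // spring.kafka.consumer.max-poll-records
int sessionTimeoutMs = 10_000;    // assumed session.timeout.ms on the old broker
int perRecordBudgetMs = sessionTimeoutMs / maxPollRecords;   // = 200 ms per record
// Either keep each record's processing well under this budget,
// or lower max.poll.records so the whole batch fits within the session timeout.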

Related

Stateful Kafka Stream process is losing state when moving tasks from one pod to another during the rebalancing process

Stateful Kafka Streams process is losing state when moving tasks from one pod to another during the rebalancing process.
When killing the pod, it restarts, the task stays assigned to the same pod, and the process restarts correctly (no issues with this scenario).
When we scale down and the task is forced to move to another pod, we can see that Kafka Streams restored the changelog, but it didn't work because the process created a new state with version 1 instead of 3992 and with CounterSize 1 instead of 2964 (this can be checked in the logs).
After checking the logs we saw that the same key goes to different partitions in the changelog topic when the task is assigned to a new pod, as per the screenshots below (not sure if this is an issue).
Details of what we are using:
Application name is customer-state.
We are using AWS MSK - Apache Kafka version 2.8.0 with 6 brokers
Application deployed as statefulSets into EKS/Kubernetes with 3 replicas
Java 11
Spring Cloud 2020.0.3
Kafka Streams 2.8.0
We applied a few configuration changes, but we cannot find why it is not taking the last state from the changelog.
spring.cloud.stream.kafka.streams.binder.configuration:
  max.request.size: 5242892
  max.partition.fetch.bytes: 5242892
  max.fetch.bytes: 15728676
  acceptable.recovery.lag: 0
  num.standby.replicas: 1
  num.stream.threads: 2
spring.kafka.streams.binder:
  functions:
    process.applicationId: customer-state-process
  configuration:
    group.instance.id: ${POD_NAME} # A helm chart populates this variable with the pod name; as it is a StatefulSet the names are always customer-state-0, customer-state-1, customer-state-2
    session.timeout.ms: 30000
    acceptable.recovery.lag: 0
What are the settings for the state-store?
@Bean
public StreamsBuilderFactoryBeanConfigurer streamsBuilderFactoryBeanCustomizer() {
    return factoryBean -> {
        try {
            final StreamsBuilder streamsBuilder = factoryBean.getObject();
            if (isNull(streamsBuilder)) {
                throw new NullPointerException("streamsBuilder is null");
            }
            // Customer's State initialization
            final PrimitiveAvroSerde<String> customerKeySerde = createKeySerde(factoryBean, true);
            final SpecificAvroSerde<StateAvro> valueSerde = createValueSerde(factoryBean);
            final StoreBuilder<KeyValueStore<String, StateAvro>> storeBuilder = Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore(STATE_STORE_NAME),
                    customerKeySerde,
                    valueSerde);
            streamsBuilder.addStateStore(storeBuilder);
        } catch (Exception e) {
            throw new RuntimeException("Can't create State Store", e);
        }
    };
}
How is CounterSize implemented and saved to Kafka?
It's a Java Map serialized to Avro and then published to Kafka.
CounterSize, as printed in the logs, is just the number of entries in that Map.
"type" : "record",
"name" : "StateAvro",
"namespace" : "com.dpml.avro.state",
"fields" : [ {
"name" : "version",
"type" : "long"
}, {
"name" : "DepositCounter",
"type" : [
"null",
"MetaCounterAvro"
],
"default": null
} ...
{
"type" : "record",
"name" : "MetaCounterAvro",
"namespace" : "com.dpml.avro.state",
"fields" : [ {
"name" : "entries",
"type" : {
"type" : "array",
"items" : "PairLongString",
"java-class" : "java.util.Map"
}
}, {
"name" : "hours",
"type" : "long"
} ]
}
This is how the log line is produced:
log.debug("Successfully updated state for customerId={}, {}", getCustomerId(), createLogInfo(avro));
private String createLogInfo(StateAvro stateAvro) {
final var depositSize = Optional.ofNullable(stateAvro.getDepositCounter())
.map(MetaCounterAvro::getEntries)
.map(List::size)
.orElse(0);
return String.format("version=%d, eventTime=%d, triggerEvent=%s, updatedByEventId=%s, CounterSize=%s,
stateAvro.getVersion(),
stateAvro.getEventTime(),
stateAvro.getTriggerEvent().name(),
stateAvro.getUpdatedByEventId(),
depositSize);
}
LOGS:
=> offset from incoming event partition=2 offset=74800529
2022-02-24T10:52:58.867Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=topic-input-name, offset=74800529 task=1_2 - Received event {"header": {"timestamp": 1645699978378, "eventId": "8f10dc2e-fcf2-4fcb-a312-0bba7170e16d", "customerId": 1234567890}
2022-02-24T10:52:58.868Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - State already exists for customerId=1234567890 version=3990, eventTime=1645699976221, updatedByEventId=21844581-5fd9-41da-b05d-e1cb719349a5, CounterSize=2962
2022-02-24T10:52:58.868Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699978378 eventId=8f10dc2e-fcf2-4fcb-a312-0bba7170e16d Technical info -> partition=2 offset=74800529 task=1_2
2022-02-24T10:52:58.869Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890 version=3991, eventTime=1645699978378, triggerEvent=CustomerEvent, updatedByEventId=8f10dc2e-fcf2-4fcb-a312-0bba7170e16d, CounterSize=2963
=> scaled down from 3 pods to 2 pods to force the rebalance
2022-02-24T10:52:59.011Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Attempt to heartbeat failed since group is rebalancing
2022-02-24T10:52:59.012Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] (Re-)joining group
2022-02-24T10:52:59.090Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Successfully joined group with generation Generation{generationId=192444, memberId='customer-state-0-1-6693263a-0054-46bb-b775-697741beb01a', protocol='stream'}
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Successfully synced group in generation Generation{generationId=192444, memberId='customer-state-0-1-6693263a-0054-46bb-b775-697741beb01a', protocol='stream'}
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Updating assignment with Assigned partitions: [topic-input-name-2, topic-input-name-0] Current owned partitions: [topic-input-name-2] Added partitions (assigned - owned): [topic-input-name-0] Revoked partitions (owned - assigned): []
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Notifying assignor about the new Assignment(partitions=[topic-input-name-0, topic-input-name-2], userDataSize=98)
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.StreamsPartitionAssignor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer] No followup rebalance was requested, resetting the rebalance schedule.
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.TaskManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Handle new assignment with: New active tasks: [1_0, 1_2] New standby tasks: [1_3] Existing active tasks: [1_2] Existing standby tasks: [1_1]
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_1] Suspended running
2022-02-24T10:52:59.150Z INFO o.a.k.c.c.KafkaConsumer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): customer-state-process-customer-state-store-changelog-2
2022-02-24T10:52:59.154Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_1] Closed clean
2022-02-24T10:52:59.155Z INFO i.c.k.s.KafkaAvroSerializerConfig [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - KafkaAvroSerializerConfig values: bearer.auth.token = [hidden] schema.registry.url = [https://cp-schema-registry.cp-schema-registry.svc.cluster.local:443] basic.auth.user.info = [hidden] auto.register.schemas = true max.schemas.per.subject = 1000 basic.auth.credentials.source = URL schema.registry.basic.auth.user.info = [hidden] bearer.auth.credentials.source = STATIC_TOKEN value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicRecordNameStrategy key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
2022-02-24T10:52:59.155Z INFO i.c.k.s.KafkaAvroDeserializerConfig [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - KafkaAvroDeserializerConfig values: bearer.auth.token = [hidden] schema.registry.url = [https://cp-schema-registry.cp-schema-registry.svc.cluster.local:443] basic.auth.user.info = [hidden] auto.register.schemas = true max.schemas.per.subject = 1000 basic.auth.credentials.source = URL schema.registry.basic.auth.user.info = [hidden] bearer.auth.credentials.source = STATIC_TOKEN specific.avro.reader = true value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicRecordNameStrategy key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
2022-02-24T10:52:59.157Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Adding newly assigned partitions: topic-input-name-0
2022-02-24T10:52:59.157Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] State transition from RUNNING to PARTITIONS_ASSIGNED
2022-02-24T10:52:59.157Z INFO o.a.k.s.KafkaStreams [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-client [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf] State transition from RUNNING to REBALANCING
2022-02-24T10:52:59.158Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Setting offset for partition topic-input-name-0 to the committed offset FetchPosition{offset=73628671, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 2 rack: euc1-az3)], epoch=141}}
2022-02-24T10:52:59.199Z INFO o.a.k.s.p.i.ProcessorStateManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] State store customer-state-store did not find checkpoint offset, hence would default to the starting offset at changelog customer-state-process-customer-state-store-changelog-0
2022-02-24T10:52:59.199Z INFO o.a.k.s.p.i.StreamTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] Initialized
2022-02-24T10:52:59.240Z INFO o.a.k.s.p.i.ProcessorStateManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_3] State store customer-state-store did not find checkpoint offset, hence would default to the starting offset at changelog customer-state-process-customer-state-store-changelog-3
2022-02-24T10:52:59.240Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_3] Initialized
2022-02-24T10:52:59.286Z INFO o.a.k.c.c.KafkaConsumer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): customer-state-process-customer-state-store-changelog-0, customer-state-process-customer-state-store-changelog-3, customer-state-process-customer-state-store-changelog-2
2022-02-24T10:52:59.286Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Seeking to EARLIEST offset of partition customer-state-process-customer-state-store-changelog-0
2022-02-24T10:52:59.287Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Seeking to EARLIEST offset of partition customer-state-process-customer-state-store-changelog-3
2022-02-24T10:52:59.471Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Resetting offset for partition customer-state-process-customer-state-store-changelog-3 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 2 rack: euc1-az3)], epoch=0}}.
2022-02-24T10:52:59.471Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Resetting offset for partition customer-state-process-customer-state-store-changelog-0 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-4.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 4 rack: euc1-az3)], epoch=0}}.
2022-02-24T10:53:09.480Z INFO o.a.k.s.p.i.StoreChangelogReader [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Restoration in progress for 1 partitions. {customer-state-process-customer-state-store-changelog-0: position=19011, end=42297, totalRestored=19011}
2022-02-24T10:53:11.769Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Processed 939 total records, ran 2 punctuators, and committed 4 total tasks since the last update
2022-02-24T10:53:14.105Z INFO o.a.k.s.p.i.StoreChangelogReader [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Finished restoring changelog customer-state-process-customer-state-store-changelog-0 to store customer-state-store with a total number of 42297 records
2022-02-24T10:53:14.106Z INFO c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Fetching state store in transformer
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] Restored and ready to run
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Restoration took 14995 ms for all tasks [1_0, 1_2, 1_3]
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2022-02-24T10:53:14.152Z INFO o.a.k.s.KafkaStreams [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-client [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf] State transition from REBALANCING to RUNNING
2022-02-24T10:53:14.162Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699978060, "eventId": "96d3e5cc-f667-4735-91b8-529374c00d82", "customerId": 1234567890 }
=> At this point the framework has finished restoring the changelog, so it should find the state from 2022-02-24T10:52:58.868Z, but it didn't
=> the offset from the incoming event becomes -1
2022-02-24T10:53:14.162Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Creating new state for customerId=1234567890
2022-02-24T10:53:14.514Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699978060 eventId=96d3e5cc-f667-4735-91b8-529374c00d82 Technical info -> partition=-1 offset=-1 task=1_0
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890, version=1, eventTime=1645699978060, triggerEvent=CustomerEvent, updatedByEventId=96d3e5cc-f667-4735-91b8-529374c00d82, CounterSize=1
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699977697, "eventId": "42a9535c-ab90-41da-965c-ee4c5d871626", "customerId": 1234567890 }
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - State already exists for customerId=1234567890, version=1, eventTime=1645699978060, triggerEvent=CustomerEvent, updatedByEventId=96d3e5cc-f667-4735-91b8-529374c00d82, CounterSize=1
2022-02-24T10:53:14.515Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699977697 eventId=42a9535c-ab90-41da-965c-ee4c5d871626 Technical info -> partition=-1 offset=-1 task=1_0
2022-02-24T10:53:14.516Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890, version=2, eventTime=1645699977697, triggerEvent=CustomerEvent, updatedByEventId=42a9535c-ab90-41da-965c-ee4c5d871626, CounterSize=2
2022-02-24T10:53:14.516Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699975892, "eventId": "459dc2ec-7a69-47f1-8c3e-53a5e5f4303b", "customerId": 1234567890 }

CommitFailedException by Spring Kafka Consumer

I sometimes get the error message below while using a Spring Kafka consumer. I have implemented at-least-once semantics as shown in the code snippet.
1) My doubt is: do I miss any messages from consuming?
2) Do I need to handle this error? It was not reported by seekToCurrentErrorHandler().
org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
My Spring Kafka consumer code snippet:
public class KafkaConsumerConfig implements KafkaListenerConfigurer {

    @Bean
    public SeekToCurrentErrorHandler seekToCurrentErrorHandler() {
        SeekToCurrentErrorHandler seekToCurrentErrorHandler = new SeekToCurrentErrorHandler((record, e) -> {
            System.out.println("RECORD from topic " + record.topic() + " at partition " + record.partition()
                    + " at offset " + record.offset() + " did not process correctly due to a " + e.getCause());
        }, new FixedBackOff(500L, 3L));
        return seekToCurrentErrorHandler;
    }

    @Bean
    public ConsumerFactory<String, ValidatedConsumerClass> consumerFactory() {
        ErrorHandlingDeserializer<ValidatedConsumerClass> errorHandlingDeserializer;
        errorHandlingDeserializer = new ErrorHandlingDeserializer<>(new JsonDeserializer<>(ValidatedConsumerClass.class));
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "grpid-098");
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(), errorHandlingDeserializer);
    }

    @Bean
    KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, ValidatedConsumerClass>> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, ValidatedConsumerClass> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.getContainerProperties().setAckMode(AckMode.RECORD);
        factory.setErrorHandler(seekToCurrentErrorHandler());
        return factory;
    }
}
Consumer reading the message
@Service
public class KafKaConsumerService extends AbstractConsumerSeekAware {

    @KafkaListener(id = "foo", topics = "mytopic-5", concurrency = "5", groupId = "mytopic-1-groupid")
    public void consumeFromTopic1(@Payload @Valid ValidatedConsumerClass message, ConsumerRecordMetadata c) {
        databaseService.save(message);
        System.out.println("-- Consumer End -- " + c.partition() + " ---consumer thread-- " + Thread.currentThread().getName());
    }
}
No, you are not missing anything.
No, you do not need to handle it; the STCEH already handled it and the record will be redelivered on the next poll.
In this case, the exception is raised outside of record processing (after processing is complete). Since the commit failed due to a rebalance, there is no need for the STCEH to re-seek (and it can't anyway, because the records are no longer available). It simply rethrows the exception.
Everything works as expected...
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.properties.max.poll.interval.ms=5000
@SpringBootApplication
public class So69016372Application {

    public static void main(String[] args) {
        SpringApplication.run(So69016372Application.class, args);
    }

    @KafkaListener(id = "so69016372", topics = "so69016372")
    public void listen(String in, @Header(KafkaHeaders.OFFSET) long offset) throws InterruptedException {
        System.out.println(in + " #" + offset);
        Thread.sleep(6000);
    }

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("so69016372").partitions(1).replicas(1).build();
    }
}
Result
2021-09-01 13:47:26.963 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
2021-09-01 13:47:31.991 INFO 13195 --- [ad | so69016372] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Member consumer-so69016372-1-f02f8d74-c2b8-47d9-92d3-bf68e5c81a8f sending LeaveGroup request to coordinator localhost:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-09-01 13:47:32.989 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Failing OffsetCommit request since the consumer is not part of an active group
2021-09-01 13:47:32.994 ERROR 13195 --- [o69016372-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:200) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:112) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1602) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1210) ~[spring-kafka-2.7.6.jar:2.7.6]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1495) ~[kafka-clients-2.7.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2710) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2705) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2691) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2489) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1235) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1161) ~[spring-kafka-2.7.6.jar:2.7.6]
... 3 common frames omitted
2021-09-01 13:47:32.994 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-09-01 13:47:32.994 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Lost previously assigned partitions so69016372-0
2021-09-01 13:47:32.995 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions lost: [so69016372-0]
2021-09-01 13:47:32.995 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions revoked: [so69016372-0]
...
2021-09-01 13:47:33.102 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
2021-09-01 13:47:38.141 INFO 13195 --- [ad | so69016372] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Member consumer-so69016372-1-e6ec685a-d9aa-43d3-b526-b04418095f09 sending LeaveGroup request to coordinator localhost:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-09-01 13:47:39.108 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Failing OffsetCommit request since the consumer is not part of an active group
2021-09-01 13:47:39.109 ERROR 13195 --- [o69016372-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:200) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:112) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1602) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1210) ~[spring-kafka-2.7.6.jar:2.7.6]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1495) ~[kafka-clients-2.7.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2710) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2705) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2691) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2489) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1235) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1161) ~[spring-kafka-2.7.6.jar:2.7.6]
... 3 common frames omitted
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Lost previously assigned partitions so69016372-0
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions lost: [so69016372-0]
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions revoked: [so69016372-0]
...
2021-09-01 13:47:39.217 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
It will retry indefinitely.
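To make the reproduction above stop cycling, the usual remedy (as the broker's own log message suggests) is to allow more time between polls than the listener's worst-case processing, or to shrink the batch. A minimal sketch of such an override; names and values are illustrative, not from the answer:
@Bean
public ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "so69016372");
    // The example listener sleeps 6s while max.poll.interval.ms was 5s;
    // allow comfortably more time per poll than the worst-case processing.
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 30000);
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);
    return new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(), new StringDeserializer());
}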

Consumer fetch data returns OFFSET_OUT_OF_RANGE

I have a cluster with 3 Kafka brokers, with a topic called fallback_topic.
There is only one consumerGroup that consumes from this topic and only one consumer in this consumerGroup.
After injecting a few messages, I can see the messages have been published to Kafka. The LogSize has been moved by the new messages; however, the consumer offset stays the same and no message is ever consumed.
Below is the log from when consumer.poll(3000) ran. Partitions 4, 7, and 10 received new messages from the producer, but when the consumer tried to read them, it reported error=OFFSET_OUT_OF_RANGE.
04:20:41.311 [kafka-coordinator-heartbeat-thread | uniqueConsumerGroup] DEBUG o.a.k.clients.FetchSessionHandler - [Consumer clientId=consumer-1, groupId=uniqueConsumerGroup] Node 654000 sent a full fetch response that created a new incremental fetch session 685508830 with 7 response partition(s)
04:20:41.311 [kafka-coordinator-heartbeat-thread | uniqueConsumerGroup] DEBUG o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=uniqueConsumerGroup] Fetch READ_UNCOMMITTED at offset 1062 for partition fallback_topic-1 returned fetch data (error=NONE, highWaterMark=1062, lastStableOffset = -1, logStartOffset = 1062, abortedTransactions = null, recordsSizeInBytes=0)
04:20:41.311 [kafka-coordinator-heartbeat-thread | uniqueConsumerGroup] DEBUG o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=uniqueConsumerGroup] Fetch READ_UNCOMMITTED at offset 124094 for partition fallback_topic-4 returned fetch data (error=OFFSET_OUT_OF_RANGE, highWaterMark=-1, lastStableOffset = -1, logStartOffset = -1, abortedTransactions = null, recordsSizeInBytes=0)
04:20:41.311 [kafka-coordinator-heartbeat-thread | uniqueConsumerGroup] DEBUG o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=uniqueConsumerGroup] Fetch READ_UNCOMMITTED at offset 762 for partition fallback_topic-7 returned fetch data (error=OFFSET_OUT_OF_RANGE, highWaterMark=-1, lastStableOffset = -1, logStartOffset = -1, abortedTransactions = null, recordsSizeInBytes=0)
04:20:41.311 [kafka-coordinator-heartbeat-thread | uniqueConsumerGroup] DEBUG o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=uniqueConsumerGroup] Fetch READ_UNCOMMITTED at offset 897 for partition fallback_topic-10 returned fetch data (error=OFFSET_OUT_OF_RANGE, highWaterMark=-1, lastStableOffset = -1, logStartOffset = -1, abortedTransactions = null, recordsSizeInBytes=0)
My understanding is that this error happens when the leader of the partition has moved the offset but the follower hasn't. But there was no broker outage, so the consumer is using the same leader the whole time. Can anyone help me understand why there is an OFFSET_OUT_OF_RANGE error? Thank you very much. Below is my code; I skipped consumer.commitAsync() because my problem happens before the commit.
List<Event> events = new ArrayList<Event>();
consumer.subscribe(Arrays.asList("fallback_topic"));
ConsumerRecords<String, byte[]> records;
do {
    logger.info("Start polling messages from " + topic);
    records = consumer.poll(3000);
    logger.info("done polling.");
    records.partitions().forEach(tp -> logger.info("found records from " + tp.topic() + "-" + tp.partition()));
    for (ConsumerRecord<String, byte[]> record : records) {
        Event event = EventKafkaSerializer.serializer.deserializeEvent(new ByteArrayInputStream(record.value()));
        logger.info(event.getId() + " " + event.getData().toString());
        events.add(event);
    }
} while (records.count() > 0);
logger.info("Found total events " + events.size());
Found out why: I forgot to call consumer.close() at the end.
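For completeness, a sketch of the same poll loop with the consumer closed in a finally block; the try/finally is my addition, the rest mirrors the code above:
KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(consumerProps); // consumerProps assumed
try {
    consumer.subscribe(Arrays.asList("fallback_topic"));
    ConsumerRecords<String, byte[]> records;
    do {
        records = consumer.poll(3000);
        // ... deserialize and collect events as above ...
    } while (records.count() > 0);
} finally {
    // Closing lets the consumer leave the group cleanly and releases its network/fetch resources.
    consumer.close();
}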

Exactly once semantic with spring kafka

I'm trying to test my exactly-once configuration to make sure all the configs I set are correct and the behavior is as I expect.
I seem to encounter a problem with duplicate sends.
public static void main(String[] args) {
    MessageProducer producer = new ProducerBuilder()
            .setBootstrapServers("kafka:9992")
            .setKeySerializerClass(StringSerializer.class)
            .setValueSerializerClass(StringSerializer.class)
            .setProducerEnableIdempotence(true).build();

    MessageConsumer consumer = new ConsumerBuilder()
            .setBootstrapServers("kafka:9992")
            .setIsolationLevel("read_committed")
            .setTopics("someTopic2")
            .setGroupId("bla")
            .setKeyDeserializerClass(StringDeserializer.class)
            .setValueDeserializerClass(MapDeserializer.class)
            .setConsumerMessageLogic(new ConsumerMessageLogic() {
                @Override
                public void onMessage(ConsumerRecord cr, Acknowledgment acknowledgment) {
                    producer.sendMessage(new TopicPartition("someTopic2", cr.partition()),
                            new OffsetAndMetadata(cr.offset() + 1), "something1", "im in transaction", cr.key());
                    acknowledgment.acknowledge();
                }
            }).build();
    consumer.start();
}
this is my "test", you can assume the builder puts the right configuration.
ConsumerMessageLogic is a class that handles the "process" part of the read-process-write that the exactly once semantic is supporting
inside the producer class i have a send message method like so:
public void sendMessage(TopicPartition topicPartition, OffsetAndMetadata offsetAndMetadata,String sendToTopic, V message, PK partitionKey) {
try {
KafkaRecord<PK, V> partitionAndMessagePair = producerMessageLogic.prepareMessage(topicPartition.topic(), partitionKey, message);
if(kafkaTemplate.getProducerFactory().transactionCapable()){
kafkaTemplate.executeInTransaction(operations -> {
sendMessage(message, partitionKey, sendToTopic, partitionAndMessagePair, operations);
operations.sendOffsetsToTransaction(
Map.of(topicPartition, offsetAndMetadata),"bla");
return true;
});
}else{
sendMessage(message, partitionKey, topicPartition.topic(), partitionAndMessagePair, kafkaTemplate);
}
}catch (Exception e){
failureHandler.onFailure(partitionKey, message, e);
}
}
I create my consumer like so:
/**
* Start the message consumer
* The record event will be delegate on the onMessage()
*/
public void start() {
initConsumerMessageListenerContainer();
container.start();
}
/**
* Initialize the kafka message listener
*/
private void initConsumerMessageListenerContainer() {
// start a acknowledge message listener to allow the manual commit
messageListener = consumerMessageLogic::onMessage;
// start and initialize the consumer container
container = initContainer(messageListener);
// sets the number of consumers, the topic partitions will be divided by the consumers
container.setConcurrency(springConcurrency);
springContainerPollTimeoutOpt.ifPresent(p -> container.getContainerProperties().setPollTimeout(p));
if (springAckMode != null) {
container.getContainerProperties().setAckMode(springAckMode);
}
}
private ConcurrentMessageListenerContainer<PK, V> initContainer(AcknowledgingMessageListener<PK, V> messageListener) {
return new ConcurrentMessageListenerContainer<>(
consumerFactory(props),
containerProperties(messageListener));
}
When I create my producer, I create it with a UUID as the transaction prefix, like so:
public ProducerFactory<PK, V> producerFactory(boolean isTransactional) {
ProducerFactory<PK, V> res = new DefaultKafkaProducerFactory<>(props);
if(isTransactional){
((DefaultKafkaProducerFactory<PK, V>) res).setTransactionIdPrefix(UUID.randomUUID().toString());
((DefaultKafkaProducerFactory<PK, V>) res).setProducerPerConsumerPartition(true);
}
return res;
}
Now, after everything is set up, I bring 2 instances up on a topic with 2 partitions; each instance gets 1 partition from the consumed topic.
I send a message and wait in debug for the transaction timeout (to simulate loss of connection) in instance A. Once the timeout passes, the other instance (instance B) automatically processes the record and sends it to the target topic, because a rebalance occurred.
So far so good.
Now when I release the breakpoint on instance A, it says it is rebalancing and couldn't commit, but I still see another output record in my destination topic.
My expectation was that instance A would not continue its work once I released the breakpoint, as the record was already processed.
Am I doing something wrong?
Can this scenario be achieved?
edit 2:
After Gary's remarks about executeInTransaction: I still get the duplicate record if I freeze one of the instances until the timeout and release it after the other instance has processed the record; the frozen instance then processes and produces the same record to the output topic...
public static void main(String[] args) {
MessageProducer producer = new ProducerBuilder()
.setBootstrapServers("kafka:9992")
.setKeySerializerClass(StringSerializer.class)
.setValueSerializerClass(StringSerializer.class)
.setProducerEnableIdempotence(true).build();
MessageConsumer consumer = new ConsumerBuilder()
.setBootstrapServers("kafka:9992")
.setIsolationLevel("read_committed")
.setTopics("someTopic2")
.setGroupId("bla")
.setKeyDeserializerClass(StringDeserializer.class)
.setValueDeserializerClass(MapDeserializer.class)
.setConsumerMessageLogic(new ConsumerMessageLogic() {
@Override
public void onMessage(ConsumerRecord cr, Acknowledgment acknowledgment) {
producer.sendMessage("something1", "im in transaction");
}
}).build();
consumer.start(producer.getProducerFactory());
}
The new sendMessage method in the producer, without executeInTransaction:
public void sendMessage(V message, PK partitionKey, String topicName) {
try {
KafkaRecord<PK, V> partitionAndMessagePair = producerMessageLogic.prepareMessage(topicName, partitionKey, message);
sendMessage(message, partitionKey, topicName, partitionAndMessagePair, kafkaTemplate);
}catch (Exception e){
failureHandler.onFailure(partitionKey, message, e);
}
}
I also changed the consumer container creation to use a transaction manager with the same producer factory, as suggested:
/**
* Initialize the kafka message listener
*/
private void initConsumerMessageListenerContainer(ProducerFactory<PK,V> producerFactory) {
// start a acknowledge message listener to allow the manual commit
acknowledgingMessageListener = consumerMessageLogic::onMessage;
// start and initialize the consumer container
container = initContainer(acknowledgingMessageListener, producerFactory);
// sets the number of consumers, the topic partitions will be divided by the consumers
container.setConcurrency(springConcurrency);
springContainerPollTimeoutOpt.ifPresent(p -> container.getContainerProperties().setPollTimeout(p));
if (springAckMode != null) {
container.getContainerProperties().setAckMode(springAckMode);
}
}
private ConcurrentMessageListenerContainer<PK, V> initContainer(AcknowledgingMessageListener<PK, V> messageListener, ProducerFactory<PK,V> producerFactory) {
return new ConcurrentMessageListenerContainer<>(
consumerFactory(props),
containerProperties(messageListener, producerFactory));
}
@NonNull
private ContainerProperties containerProperties(MessageListener<PK, V> messageListener, ProducerFactory<PK,V> producerFactory) {
ContainerProperties containerProperties = new ContainerProperties(topics);
containerProperties.setMessageListener(messageListener);
containerProperties.setTransactionManager(new KafkaTransactionManager<>(producerFactory));
return containerProperties;
}
My expectation was that the broker, once it receives the processed record from the frozen instance, would know that the record was already handled by another instance, since it contains the exact same metadata (or does it? I mean, the PID will be different, but should it be?).
Maybe the scenario I'm looking for is not even supported by the exactly-once support that Kafka and Spring currently provide...
If I have 2 instances of read-process-write, that means I have 2 producers with 2 different PIDs.
Now when I freeze one of the instances, and the unfrozen instance gets responsibility for processing the record due to a rebalance, it sends the record with its own PID and a sequence number in the metadata.
Now when I release the frozen instance, it sends the same record but with its own PID, so there's no way the broker will know it's a duplicate...
Am I wrong? How can I avoid this scenario? I thought the rebalance stops the instance and doesn't let it complete its processing (where it produces the duplicate record), because it no longer has responsibility for that record.
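For what it's worth, one requirement not called out above: fencing of the "zombie" producer only works if the new owner of the partition ends up with the same transactional.id as the old one. With spring-kafka's producer-per-consumer-partition mode, that id is derived from the transactionIdPrefix plus groupId.topic.partition (visible in the logs below), so a random UUID prefix per instance defeats it. A sketch of the adjustment, offered as an assumption rather than a confirmed fix:
// Assumption (not confirmed in this thread): use the SAME transactionIdPrefix on
// every instance, so that after a rebalance the new owner of someTopic2-0 reuses
// the same transactional.id (prefix + groupId + "." + topic + "." + partition,
// as seen in the logs below) and the broker can fence the older producer.
public ProducerFactory<PK, V> producerFactory(boolean isTransactional) {
    DefaultKafkaProducerFactory<PK, V> res = new DefaultKafkaProducerFactory<>(props);
    if (isTransactional) {
        res.setTransactionIdPrefix("my-tx-");       // fixed value, identical on all instances
        res.setProducerPerConsumerPartition(true);  // one producer (and tx id) per group/topic/partition
    }
    return res;
}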
Adding the logs:
Frozen instance (you can see the freeze time at 10:53:34; I released it at 10:54:02, and the rebalance time is 10 secs):
2020-06-16 10:53:34,393 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] Created new Producer: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#5c7f5906]
2020-06-16 10:53:34,394 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#5c7f5906] beginTransaction()
2020-06-16 10:53:34,395 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.doBegin:149] Created Kafka transaction on producer [CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#5c7f5906]]
2020-06-16 10:54:02,157 INFO [${sys:spring.application.name}] [kafka-coordinator-heartbeat-thread | bla] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Group coordinator X.X.X.X:9992 (id: 2147482646 rack: null) is unavailable or invalid, will attempt rediscovery
2020-06-16 10:54:02,181 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
2020-06-16 10:54:02,189 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] [i.i.k.s.p.SimpleSuccessHandler.:] Sent message=[im in transaction] with offset=[252] to topic something1
2020-06-16 10:54:02,193 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] [o.a.k.c.p.i.TransactionManager.:] [Producer clientId=producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0, transactionalId=b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] Discovered group coordinator X.X.X.X:9992 (id: 1001 rack: null)
2020-06-16 10:54:02,263 INFO [${sys:spring.application.name}] [kafka-coordinator-heartbeat-thread | bla] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Discovered group coordinator 192.168.144.1:9992 (id: 2147482646 rack: null)
2020-06-16 10:54:02,295 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.processCommit:740] Initiating transaction commit
2020-06-16 10:54:02,296 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#5c7f5906] commitTransaction()
2020-06-16 10:54:02,299 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
2020-06-16 10:54:02,301 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Attempt to heartbeat failed for since member id consumer-bla-1-b3ad1c09-ad06-4bc4-a891-47a2288a830f is not valid.
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.ConsumerCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.ConsumerCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Lost previously assigned partitions someTopic2-0
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.ConcurrentMessageListenerContainer.info:279] bla: partitions lost: [someTopic2-0]
2020-06-16 10:54:02,303 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.ConcurrentMessageListenerContainer.info:279] bla: partitions revoked: [someTopic2-0]
2020-06-16 10:54:02,303 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
The regular instance that takes over the partition and produces the record after a rebalance:
2020-06-16 10:53:46,536 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296]
Created new Producer: CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer#26c76153]
2020-06-16 10:53:46,537 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296]
CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer#26c76153]
beginTransaction()
2020-06-16 10:53:46,539 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.doBegin:149] Created
Kafka transaction on producer [CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer#26c76153]]
2020-06-16 10:53:46,556 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1]
[o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296]
Sending offsets to transaction: {someTopic2-
0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
2020-06-16 10:53:46,563 INFO [${sys:spring.application.name}] [kafka-
producer-network-thread | producer-1d8e74d3-8986-4458-89b7-
6d3e5756e213bla.someTopic2.0] [i.i.k.s.p.SimpleSuccessHandler.:] Sent
message=[im in transaction] with offset=[250] to topic something1
2020-06-16 10:53:46,566 INFO [${sys:spring.application.name}] [kafka-
producer-network-thread | producer-1d8e74d3-8986-4458-89b7-
6d3e5756e213bla.someTopic2.0] [o.a.k.c.p.i.TransactionManager.:]
[Producer clientId=producer-1d8e74d3-8986-4458-89b7-
6d3e5756e213bla.someTopic2.0, transactionalId=1d8e74d3-8986-4458-89b7-
6d3e5756e213bla.someTopic2.0] Discovered group coordinator
X.X.X.X:9992 (id: 1001 rack: null)
2020-06-16 10:53:46,668 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.processCommit:740]
Initiating transaction commit
2020-06-16 10:53:46,669 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296]
CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer#26c76153]
commitTransaction()
2020-06-16 10:53:46,672 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1]
[o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296]
Commit list: {}
2020-06-16 10:53:51,673 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1]
[o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296]
Received: 0 records
I noticed they both note the exact same offset to commit:
Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
I thought that when they try to commit the exact same thing, the broker would abort one of the transactions...
I also noticed that if I reduce transaction.timeout.ms to just 2 seconds, it doesn't abort the transaction no matter how long I freeze the instance in the debugger...
Maybe the transaction.timeout.ms timer only starts after I send the message?
You must not use executeInTransaction at all - see its Javadocs; it is intended for when there is no active transaction, or when you explicitly don't want an operation to participate in an existing transaction.
You need to add a KafkaTransactionManager to the listener container; it must reference the same ProducerFactory as the template.
Then the container will start the transaction and, if successful, send the offsets to that transaction.
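For illustration only, here is a minimal sketch of that wiring; the KafkaTxConfig class name and the injected producerFactory/consumerFactory beans are assumptions, not taken from your project:
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.transaction.KafkaTransactionManager;

@Configuration
public class KafkaTxConfig {

    @Bean
    public KafkaTransactionManager<String, String> kafkaTransactionManager(ProducerFactory<String, String> producerFactory) {
        // Must be built from the SAME ProducerFactory that backs the KafkaTemplate
        return new KafkaTransactionManager<>(producerFactory);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory,
            KafkaTransactionManager<String, String> kafkaTransactionManager) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // With a transaction manager on the container, the container begins the transaction
        // and sends the consumed offsets to it (as in the "Sending offsets to transaction" log lines)
        factory.getContainerProperties().setTransactionManager(kafkaTransactionManager);
        return factory;
    }

}
With this in place, the records your listener publishes and the consumed offset are committed (or aborted) together, instead of the template running its own local transaction via executeInTransaction.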

Kafka Streams - Consumer memory overload

I am planning a Spring+Kafka Streams application that handles incoming messages and stores updated internal state as a result of those messages.
This state is predicted to reach ~500 MB per unique key (there are likely to be ~10k unique keys distributed across 2k partitions).
This state must generally be held in memory for my application to operate effectively, but even on disk I would still face a similar problem (albeit at a later point of scaling).
I am planning to deploy this application into a dynamically scaling environment such as AWS and will set a minimum number of instances, but I am wary of 2 situations:
On first startup (where perhaps just 1 consumer starts first) it will not be able to handle taking assignment of all the partitions, because the in-memory state will overflow the instance's available memory.
After a major outage (an AWS availability zone outage), 33% of the consumers could be taken out of the group, and the additional memory load on the remaining instances could actually take out everyone who remains.
How do people protect their consumers from taking on more partitions than they can handle, so that they do not overflow the available memory/disk?
See the Kafka documentation.
Since 0.11...
EDIT
For your second use case (and it also works for the first), perhaps you could implement a custom PartitionAssignor that limits the number of partitions assigned to each instance.
I haven't tried it; I don't know how the broker will react to the presence of unassigned partitions.
EDIT2
This seems to work ok; but YMMV...
public class NoMoreThanFiveAssignor extends RoundRobinAssignor {

    @Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
            Map<String, Subscription> subscriptions) {

        // Start from the normal round-robin assignment, then cap each member at 5 partitions;
        // anything beyond the cap simply remains unassigned for this generation.
        Map<String, List<TopicPartition>> assignments = super.assign(partitionsPerTopic, subscriptions);
        assignments.forEach((memberId, assigned) -> {
            if (assigned.size() > 5) {
                System.out.println("Reducing assignments from " + assigned.size() + " to 5 for " + memberId);
                assignments.put(memberId,
                        assigned.stream()
                                .limit(5)
                                .collect(Collectors.toList()));
            }
        });
        return assignments;
    }

}
and
@SpringBootApplication
public class So54072362Application {

    public static void main(String[] args) {
        SpringApplication.run(So54072362Application.class, args);
    }

    @Bean
    public NewTopic topic() {
        // 15 partitions, replication factor 1
        return new NewTopic("so54072362", 15, (short) 1);
    }

    @KafkaListener(id = "so54072362", topics = "so54072362")
    public void listen(ConsumerRecord<?, ?> record) {
        System.out.println(record);
    }

    @Bean
    public ApplicationRunner runner(KafkaTemplate<String, String> template) {
        return args -> {
            // one record to each of the 15 partitions
            for (int i = 0; i < 15; i++) {
                template.send("so54072362", i, "foo", "bar");
            }
        };
    }

}
and
spring.kafka.consumer.properties.partition.assignment.strategy=com.example.NoMoreThanFiveAssignor
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.auto-offset-reset=earliest
and
Reducing assignments from 15 to 5 for consumer-2-f37221f8-70bb-421d-9faf-6591cc26a76a
2019-01-07 15:24:28.288 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 7
2019-01-07 15:24:28.289 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:28.296 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Attempt to heartbeat failed since group is rebalancing
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Revoking previously assigned partitions [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.304 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] (Re-)joining group
Reducing assignments from 8 to 5 for consumer-2-c9a6928a-520c-4646-9dd9-4da14636744b
Reducing assignments from 7 to 5 for consumer-2-f37221f8-70bb-421d-9faf-6591cc26a76a
2019-01-07 15:24:46.310 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 8
2019-01-07 15:24:46.311 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:46.315 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Attempt to heartbeat failed since group is rebalancing
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Revoking previously assigned partitions [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] (Re-)joining group
2019-01-07 15:24:58.330 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 9
2019-01-07 15:24:58.332 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-14, so54072362-11, so54072362-5, so54072362-8, so54072362-2]
2019-01-07 15:24:58.336 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-14, so54072362-11, so54072362-5, so54072362-8, so54072362-2]
Of course, this leaves the unassigned partitions dangling, but it sounds like that's what you want, until the region comes back online.