Kafka Streams - Consumer memory overload

I am planning a Spring + Kafka Streams application that handles incoming messages and stores updated internal state as a result of these messages.
This state is predicted to reach ~500 MB per unique key (there are likely to be ~10k unique keys, distributed across 2k partitions).
This state must generally be held in memory for effective operation of my application, but even on disk I would still face a similar problem (albeit just at a later point of scaling).
I am planning to deploy this application into a dynamically scaling environment such as AWS and will set a minimum number of instances, but I am wary of two situations:
On first startup (where perhaps just one consumer starts first), it will not be able to handle taking assignment of all the partitions, because the in-memory state will overflow the instance's available memory.
After a major outage (e.g. an AWS availability zone outage), it could be that 33% of the consumers are taken out of the group, and the additional memory load on the remaining instances could actually take out everyone who remains.
How do people protect their consumers from taking on more partitions than they can handle, such that they do not overflow available memory/disk?

See the Kafka documentation.
Since 0.11...
EDIT
For your second use case (and it also works for the first), perhaps you could implement a custom PartitionAssignor that limits the number of partitions assigned to each instance.
I haven't tried it; I don't know how the broker will react to the presence of unassigned partitions.
EDIT2
This seems to work ok; but YMMV...
public class NoMoreThanFiveAssignor extends RoundRobinAssignor {

    @Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
            Map<String, Subscription> subscriptions) {

        Map<String, List<TopicPartition>> assignments = super.assign(partitionsPerTopic, subscriptions);
        assignments.forEach((memberId, assigned) -> {
            if (assigned.size() > 5) {
                System.out.println("Reducing assignments from " + assigned.size() + " to 5 for " + memberId);
                assignments.put(memberId,
                        assigned.stream()
                            .limit(5)
                            .collect(Collectors.toList()));
            }
        });
        return assignments;
    }

}
and
@SpringBootApplication
public class So54072362Application {

    public static void main(String[] args) {
        SpringApplication.run(So54072362Application.class, args);
    }

    @Bean
    public NewTopic topic() {
        return new NewTopic("so54072362", 15, (short) 1);
    }

    @KafkaListener(id = "so54072362", topics = "so54072362")
    public void listen(ConsumerRecord<?, ?> record) {
        System.out.println(record);
    }

    @Bean
    public ApplicationRunner runner(KafkaTemplate<String, String> template) {
        return args -> {
            for (int i = 0; i < 15; i++) {
                template.send("so54072362", i, "foo", "bar");
            }
        };
    }

}
and
spring.kafka.consumer.properties.partition.assignment.strategy=com.example.NoMoreThanFiveAssignor
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.auto-offset-reset=earliest
and
Reducing assignments from 15 to 5 for consumer-2-f37221f8-70bb-421d-9faf-6591cc26a76a
2019-01-07 15:24:28.288 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 7
2019-01-07 15:24:28.289 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:28.296 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Attempt to heartbeat failed since group is rebalancing
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Revoking previously assigned partitions [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.304 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] (Re-)joining group
Reducing assignments from 8 to 5 for consumer-2-c9a6928a-520c-4646-9dd9-4da14636744b
Reducing assignments from 7 to 5 for consumer-2-f37221f8-70bb-421d-9faf-6591cc26a76a
2019-01-07 15:24:46.310 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 8
2019-01-07 15:24:46.311 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:46.315 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Attempt to heartbeat failed since group is rebalancing
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Revoking previously assigned partitions [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] (Re-)joining group
2019-01-07 15:24:58.330 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 9
2019-01-07 15:24:58.332 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-14, so54072362-11, so54072362-5, so54072362-8, so54072362-2]
2019-01-07 15:24:58.336 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-14, so54072362-11, so54072362-5, so54072362-8, so54072362-2]
Of course, this leaves the unassigned partitions dangling, but it sounds like that's what you want, until the region comes back online.
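If the hard-coded cap of 5 needs to vary per environment, the assignor could read it from the consumer configuration instead; Kafka calls configure() on assignors that implement Configurable. A minimal sketch, assuming a made-up max.assigned.partitions property (untested, like the above; YMMV):
public class ConfigurableCapAssignor extends RoundRobinAssignor implements Configurable {

    // Hypothetical property name; pass it in the consumer configuration
    // alongside partition.assignment.strategy.
    public static final String MAX_ASSIGNED_PARTITIONS = "max.assigned.partitions";

    private int maxAssigned = 5;

    @Override
    public void configure(Map<String, ?> configs) {
        Object cap = configs.get(MAX_ASSIGNED_PARTITIONS);
        if (cap != null) {
            this.maxAssigned = Integer.parseInt(cap.toString());
        }
    }

    @Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
            Map<String, Subscription> subscriptions) {

        Map<String, List<TopicPartition>> assignments = super.assign(partitionsPerTopic, subscriptions);
        // Trim each member's assignment to the configured cap; trimmed partitions stay unassigned.
        assignments.replaceAll((memberId, assigned) -> assigned.stream()
                .limit(this.maxAssigned)
                .collect(Collectors.toList()));
        return assignments;
    }

}
The consumer will log a warning that the extra property "was supplied but isn't a known config", but it is still passed to configure().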

Related

Spring #KafkaListener with topicPattern: handle runtime topic creation

I'm using Spring #KafkaListener with a topicPattern. If during the runtime of this application I create a new topic matching the pattern and start publishing to that, the listener application simply ignores those messages. In other words, it only pulls all the topics matching the pattern at startup and listens to those.
What's the easiest way to "refresh" that? Thanks!
By default, new topics will be picked up within 5 minutes, according to the metadata.max.age.ms setting: https://kafka.apache.org/documentation/#consumerconfigs_metadata.max.age.ms
The period of time in milliseconds after which we force a refresh of metadata even if we haven't seen any partition leadership changes to proactively discover any new brokers or partitions.
You can reduce it to speed things up at the expense of increased traffic.
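If you'd rather set it globally than override it per listener (as the example below does), the consumer property can presumably be passed through Spring Boot configuration like any other arbitrary consumer property (value illustrative):
spring.kafka.consumer.properties.metadata.max.age.ms=60000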
EDIT
This shows it working as expected...
@SpringBootApplication
public class So71386069Application {

    private static final Logger log = LoggerFactory.getLogger(So71386069Application.class);

    public static void main(String[] args) {
        SpringApplication.run(So71386069Application.class, args);
    }

    @KafkaListener(id = "so71386069", topicPattern = "so71386069.*",
            properties = "metadata.max.age.ms:60000")
    void listen(String in) {
        System.out.println(in);
    }

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("so71386069").partitions(1).replicas(1).build();
    }

    @Bean
    ApplicationRunner runner(KafkaAdmin admin) {
        return args -> {
            try (AdminClient client = AdminClient.create(admin.getConfigurationProperties())) {
                IntStream.range(0, 10).forEach(i -> {
                    try {
                        Thread.sleep(30_000);
                        String topic = "so71386069-" + i;
                        log.info("Creating {}", topic);
                        client.createTopics(Collections.singleton(
                                TopicBuilder.name(topic).partitions(1).replicas(1).build())).all().get();
                    }
                    catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    catch (ExecutionException e) {
                        e.printStackTrace();
                    }
                });
            }
        };
    }

}
2022-03-07 15:41:07.131  INFO 33630 --- [o71386069-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so71386069: partitions assigned: [so71386069-0]
2022-03-07 15:41:34.007  INFO 33630 --- [           main] com.example.demo.So71386069Application : Creating so71386069-0
2022-03-07 15:42:04.193  INFO 33630 --- [           main] com.example.demo.So71386069Application : Creating so71386069-1
...
2022-03-07 15:42:07.590  INFO 33630 --- [o71386069-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so71386069: partitions revoked: [so71386069-0]
...
2022-03-07 15:42:07.599  INFO 33630 --- [o71386069-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so71386069: partitions assigned: [so71386069-0, so71386069-1-0, so71386069-0-0]
2022-03-07 15:42:34.378  INFO 33630 --- [           main] com.example.demo.So71386069Application : Creating so71386069-2
2022-03-07 15:43:04.554  INFO 33630 --- [           main] com.example.demo.So71386069Application : Creating so71386069-3
...
2022-03-07 15:43:08.403  INFO 33630 --- [o71386069-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so71386069: partitions revoked: [so71386069-0, so71386069-1-0, so71386069-0-0]
...
2022-03-07 15:43:08.411  INFO 33630 --- [o71386069-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so71386069: partitions assigned: [so71386069-0, so71386069-3-0, so71386069-2-0, so71386069-1-0, so71386069-0-0]
...
I think that's how it is by design: the Kafka client always has to subscribe to a topic before being able to get messages.
In this case, the Kafka client/consumer subscribes to the topics matching the pattern once, at startup, and that is what it carries on with.
But this is really an interesting question. The easiest and simplest answer is "restarting the client/consumer"; however, I will keep a watch on other answers to learn about any other ideas.
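If restarting is acceptable, it does not have to mean restarting the whole application; with Spring Kafka, the listener container can be stopped and started programmatically, which makes the consumer re-subscribe and pick up new topics matching the pattern. A sketch, using the listener id from the example above:
@Autowired
private KafkaListenerEndpointRegistry registry;

public void restartListener() {
    // stop/start forces a fresh subscription, so newly created matching topics are picked up
    MessageListenerContainer container = this.registry.getListenerContainer("so71386069");
    container.stop();
    container.start();
}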

Stateful Kafka Stream process is losing state when moving tasks from one pod to another during the rebalancing process

Stateful Kafka Stream process is losing state when moving tasks from one pod to another during the rebalancing process.
When killing the pod, it restarts, the task stays assigned to the same pod, and the process restarts correctly (no issues with this scenario).
When we scale down and the task is forced to move to another pod, we can see that Kafka Streams restored the changelog, but it didn't work, because the process created a new state with version 1 instead of 3992 and with CounterSize 1 instead of 2964 (this can be checked in the logs).
After checking the logs, we saw that the same key goes to different partitions in the changelog topic when the task is assigned to a new pod, as per the logs below (not sure if this is an issue).
Details of what we are using:
Application name is customer-state.
We are using AWS MSK - Apache Kafka version 2.8.0 with 6 brokers
Application deployed as statefulSets into EKS/Kubernetes with 3 replicas
Java 11
Spring Cloud 2020.0.3
Kafka Streams 2.8.0
We applied a few configuration changes, but we cannot find why it is not taking the last state from the changelog.
spring.cloud.stream.kafka.streams.binder.configuration:
  max.request.size: 5242892
  max.partition.fetch.bytes: 5242892
  max.fetch.bytes: 15728676
  acceptable.recovery.lag: 0
  num.standby.replicas: 1
  num.stream.threads: 2
spring.kafka.streams.binder:
  functions:
    process.applicationId: customer-state-process
  configuration:
    # A Helm chart populates POD_NAME with the pod name; as it is a StatefulSet,
    # the name will always be customer-state-0, customer-state-1 or customer-state-2.
    group.instance.id: ${POD_NAME}
    session.timeout.ms: 30000
    acceptable.recovery.lag: 0
What are the settings for the state-store?
@Bean
public StreamsBuilderFactoryBeanConfigurer streamsBuilderFactoryBeanCustomizer() {
    return factoryBean -> {
        try {
            final StreamsBuilder streamsBuilder = factoryBean.getObject();
            if (isNull(streamsBuilder)) {
                throw new NullPointerException("streamsBuilder is null");
            }
            // Customer's state initialization
            final PrimitiveAvroSerde<String> customerKeySerde = createKeySerde(factoryBean, true);
            final SpecificAvroSerde<StateAvro> valueSerde = createValueSerde(factoryBean);
            final StoreBuilder<KeyValueStore<String, StateAvro>> storeBuilder = Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore(STATE_STORE_NAME),
                    customerKeySerde,
                    valueSerde);
            streamsBuilder.addStateStore(storeBuilder);
        } catch (Exception e) {
            throw new RuntimeException("Can't create State Store");
        }
    };
}
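For context, a store registered this way is typically attached to the processor by name when the topology is wired, and fetched from the ProcessorContext inside the transformer. A minimal sketch (topic and type names here are assumptions based on the logs; only the Kafka Streams calls are real API):
streamsBuilder.stream("topic-input-name")
        .transform(EventToStateTransformer::new, STATE_STORE_NAME);

// ... and inside EventToStateTransformer, the store is looked up by the same name:
@Override
public void init(ProcessorContext context) {
    this.stateStore = (KeyValueStore<String, StateAvro>) context.getStateStore(STATE_STORE_NAME);
}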
How is CounterSize implemented and saved to Kafka?
It's a Java Map serialized to Avro and then published to Kafka.
The CounterSize printed in the logs is just the number of entries in that Map.
"type" : "record",
"name" : "StateAvro",
"namespace" : "com.dpml.avro.state",
"fields" : [ {
"name" : "version",
"type" : "long"
}, {
"name" : "DepositCounter",
"type" : [
"null",
"MetaCounterAvro"
],
"default": null
} ...
{
"type" : "record",
"name" : "MetaCounterAvro",
"namespace" : "com.dpml.avro.state",
"fields" : [ {
"name" : "entries",
"type" : {
"type" : "array",
"items" : "PairLongString",
"java-class" : "java.util.Map"
}
}, {
"name" : "hours",
"type" : "long"
} ]
}
This is how the log line is printed:
log.debug("Successfully updated state for customerId={}, {}", getCustomerId(), createLogInfo(avro));

private String createLogInfo(StateAvro stateAvro) {
    final var depositSize = Optional.ofNullable(stateAvro.getDepositCounter())
            .map(MetaCounterAvro::getEntries)
            .map(List::size)
            .orElse(0);
    return String.format("version=%d, eventTime=%d, triggerEvent=%s, updatedByEventId=%s, CounterSize=%s",
            stateAvro.getVersion(),
            stateAvro.getEventTime(),
            stateAvro.getTriggerEvent().name(),
            stateAvro.getUpdatedByEventId(),
            depositSize);
}
LOGS:
=> offset from incoming event partition=2 offset=74800529
2022-02-24T10:52:58.867Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=topic-input-name, offset=74800529 task=1_2 - Received event {"header": {"timestamp": 1645699978378, "eventId": "8f10dc2e-fcf2-4fcb-a312-0bba7170e16d", "customerId": 1234567890}
2022-02-24T10:52:58.868Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - State already exists for customerId=1234567890 version=3990, eventTime=1645699976221, updatedByEventId=21844581-5fd9-41da-b05d-e1cb719349a5, CounterSize=2962
2022-02-24T10:52:58.868Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699978378 eventId=8f10dc2e-fcf2-4fcb-a312-0bba7170e16d Technical info -> partition=2 offset=74800529 task=1_2
2022-02-24T10:52:58.869Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890 version=3991, eventTime=1645699978378, triggerEvent=CustomerEvent, updatedByEventId=8f10dc2e-fcf2-4fcb-a312-0bba7170e16d, CounterSize=2963
=> scaled down from 3 pods to 2 pods to force the rebalance
2022-02-24T10:52:59.011Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Attempt to heartbeat failed since group is rebalancing
2022-02-24T10:52:59.012Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] (Re-)joining group
2022-02-24T10:52:59.090Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Successfully joined group with generation Generation{generationId=192444, memberId='customer-state-0-1-6693263a-0054-46bb-b775-697741beb01a', protocol='stream'}
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Successfully synced group in generation Generation{generationId=192444, memberId='customer-state-0-1-6693263a-0054-46bb-b775-697741beb01a', protocol='stream'}
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Updating assignment with Assigned partitions: [topic-input-name-2, topic-input-name-0] Current owned partitions: [topic-input-name-2] Added partitions (assigned - owned): [topic-input-name-0] Revoked partitions (owned - assigned): []
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Notifying assignor about the new Assignment(partitions=[topic-input-name-0, topic-input-name-2], userDataSize=98)
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.StreamsPartitionAssignor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer] No followup rebalance was requested, resetting the rebalance schedule.
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.TaskManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Handle new assignment with: New active tasks: [1_0, 1_2] New standby tasks: [1_3] Existing active tasks: [1_2] Existing standby tasks: [1_1]
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_1] Suspended running
2022-02-24T10:52:59.150Z INFO o.a.k.c.c.KafkaConsumer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): customer-state-process-customer-state-store-changelog-2
2022-02-24T10:52:59.154Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_1] Closed clean
2022-02-24T10:52:59.155Z INFO i.c.k.s.KafkaAvroSerializerConfig [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - KafkaAvroSerializerConfig values: bearer.auth.token = [hidden] schema.registry.url = [https://cp-schema-registry.cp-schema-registry.svc.cluster.local:443] basic.auth.user.info = [hidden] auto.register.schemas = true max.schemas.per.subject = 1000 basic.auth.credentials.source = URL schema.registry.basic.auth.user.info = [hidden] bearer.auth.credentials.source = STATIC_TOKEN value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicRecordNameStrategy key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
2022-02-24T10:52:59.155Z INFO i.c.k.s.KafkaAvroDeserializerConfig [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - KafkaAvroDeserializerConfig values: bearer.auth.token = [hidden] schema.registry.url = [https://cp-schema-registry.cp-schema-registry.svc.cluster.local:443] basic.auth.user.info = [hidden] auto.register.schemas = true max.schemas.per.subject = 1000 basic.auth.credentials.source = URL schema.registry.basic.auth.user.info = [hidden] bearer.auth.credentials.source = STATIC_TOKEN specific.avro.reader = true value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicRecordNameStrategy key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
2022-02-24T10:52:59.157Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Adding newly assigned partitions: topic-input-name-0
2022-02-24T10:52:59.157Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] State transition from RUNNING to PARTITIONS_ASSIGNED
2022-02-24T10:52:59.157Z INFO o.a.k.s.KafkaStreams [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-client [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf] State transition from RUNNING to REBALANCING
2022-02-24T10:52:59.158Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Setting offset for partition topic-input-name-0 to the committed offset FetchPosition{offset=73628671, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 2 rack: euc1-az3)], epoch=141}}
2022-02-24T10:52:59.199Z INFO o.a.k.s.p.i.ProcessorStateManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] State store customer-state-store did not find checkpoint offset, hence would default to the starting offset at changelog customer-state-process-customer-state-store-changelog-0
2022-02-24T10:52:59.199Z INFO o.a.k.s.p.i.StreamTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] Initialized
2022-02-24T10:52:59.240Z INFO o.a.k.s.p.i.ProcessorStateManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_3] State store customer-state-store did not find checkpoint offset, hence would default to the starting offset at changelog customer-state-process-customer-state-store-changelog-3
2022-02-24T10:52:59.240Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_3] Initialized
2022-02-24T10:52:59.286Z INFO o.a.k.c.c.KafkaConsumer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): customer-state-process-customer-state-store-changelog-0, customer-state-process-customer-state-store-changelog-3, customer-state-process-customer-state-store-changelog-2
2022-02-24T10:52:59.286Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Seeking to EARLIEST offset of partition customer-state-process-customer-state-store-changelog-0
2022-02-24T10:52:59.287Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Seeking to EARLIEST offset of partition customer-state-process-customer-state-store-changelog-3
2022-02-24T10:52:59.471Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Resetting offset for partition customer-state-process-customer-state-store-changelog-3 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 2 rack: euc1-az3)], epoch=0}}.
2022-02-24T10:52:59.471Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Resetting offset for partition customer-state-process-customer-state-store-changelog-0 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-4.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 4 rack: euc1-az3)], epoch=0}}.
2022-02-24T10:53:09.480Z INFO o.a.k.s.p.i.StoreChangelogReader [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Restoration in progress for 1 partitions. {customer-state-process-customer-state-store-changelog-0: position=19011, end=42297, totalRestored=19011}
2022-02-24T10:53:11.769Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Processed 939 total records, ran 2 punctuators, and committed 4 total tasks since the last update
2022-02-24T10:53:14.105Z INFO o.a.k.s.p.i.StoreChangelogReader [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Finished restoring changelog customer-state-process-customer-state-store-changelog-0 to store customer-state-store with a total number of 42297 records
2022-02-24T10:53:14.106Z INFO c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Fetching state store in transformer
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] Restored and ready to run
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Restoration took 14995 ms for all tasks [1_0, 1_2, 1_3]
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2022-02-24T10:53:14.152Z INFO o.a.k.s.KafkaStreams [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-client [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf] State transition from REBALANCING to RUNNING
2022-02-24T10:53:14.162Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699978060, "eventId": "96d3e5cc-f667-4735-91b8-529374c00d82", "customerId": 1234567890 }
=> At this point, the framework has finished restoring the changelog, so it should find the state of 2022-02-24T10:52:58.868Z, but it didn't
=> offset from incoming event becomes -1
2022-02-24T10:53:14.162Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Creating new state for customerId=1234567890
2022-02-24T10:53:14.514Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699978060 eventId=96d3e5cc-f667-4735-91b8-529374c00d82 Technical info -> partition=-1 offset=-1 task=1_0
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890, version=1, eventTime=1645699978060, triggerEvent=CustomerEvent, updatedByEventId=96d3e5cc-f667-4735-91b8-529374c00d82, CounterSize=1
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699977697, "eventId": "42a9535c-ab90-41da-965c-ee4c5d871626", "customerId": 1234567890 }
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - State already exists for customerId=1234567890, version=1, eventTime=1645699978060, triggerEvent=CustomerEvent, updatedByEventId=96d3e5cc-f667-4735-91b8-529374c00d82, CounterSize=1
2022-02-24T10:53:14.515Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699977697 eventId=42a9535c-ab90-41da-965c-ee4c5d871626 Technical info -> partition=-1 offset=-1 task=1_0
2022-02-24T10:53:14.516Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890, version=2, eventTime=1645699977697, triggerEvent=CustomerEvent, updatedByEventId=42a9535c-ab90-41da-965c-ee4c5d871626, CounterSize=2
2022-02-24T10:53:14.516Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699975892, "eventId": "459dc2ec-7a69-47f1-8c3e-53a5e5f4303b", "customerId": 1234567890 }

CommitFailedException by Spring Kafka Consumer

I sometimes get the error message below while using a Spring Kafka consumer. I have implemented at-least-once semantics, as shown in the code snippet.
1) My doubt is: do I miss any messages from consuming?
2) Do I need to handle this error? It was not reported by the seekToCurrentErrorHandler().
org.apache.kafka.clients.consumer.CommitFailedException: Offset commit
cannot be completed since the consumer is not part of an active group
for auto partition assignment; it is likely that the consumer was
kicked out of the group.
My Spring Kafka consumer code snippet:
public class KafkaConsumerConfig implements KafkaListenerConfigurer {

    @Bean
    public SeekToCurrentErrorHandler seekToCurrentErrorHandler() {
        SeekToCurrentErrorHandler seekToCurrentErrorHandler = new SeekToCurrentErrorHandler((record, e) -> {
            System.out.println("RECORD from topic " + record.topic() + " at partition " + record.partition()
                    + " at offset " + record.offset() + " did not process correctly due to a " + e.getCause());
        }, new FixedBackOff(500L, 3L));
        return seekToCurrentErrorHandler;
    }

    @Bean
    public ConsumerFactory<String, ValidatedConsumerClass> consumerFactory() {
        ErrorHandlingDeserializer<ValidatedConsumerClass> errorHandlingDeserializer;
        errorHandlingDeserializer = new ErrorHandlingDeserializer<>(new JsonDeserializer<>(ValidatedConsumerClass.class));
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "grpid-098");
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(), errorHandlingDeserializer);
    }

    @Bean
    KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, ValidatedConsumerClass>> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, ValidatedConsumerClass> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.getContainerProperties().setAckMode(AckMode.RECORD);
        factory.setErrorHandler(seekToCurrentErrorHandler());
        return factory;
    }

}
Consumer reading the message
@Service
public class KafKaConsumerService extends AbstractConsumerSeekAware {

    @KafkaListener(id = "foo", topics = "mytopic-5", concurrency = "5", groupId = "mytopic-1-groupid")
    public void consumeFromTopic1(@Payload @Valid ValidatedConsumerClass message, ConsumerRecordMetadata c) {
        databaseService.save(message);
        System.out.println("-- Consumer End -- " + c.partition() + " ---consumer thread-- " + Thread.currentThread().getName());
    }

}
No, you are not missing anything.
No, you do not need to handle it, the STCEH already handled it and the record will be redelivered on the next poll.
In this case, the exception is caused outside of record processing (after processing is complete). Since the commit failed due to a rebalance, there is no need for the STCEH to re-seek (and it can't anyway, because the records are no longer available). It simply rethrows the exception.
Everything works as expected...
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.properties.max.poll.interval.ms=5000
@SpringBootApplication
public class So69016372Application {

    public static void main(String[] args) {
        SpringApplication.run(So69016372Application.class, args);
    }

    @KafkaListener(id = "so69016372", topics = "so69016372")
    public void listen(String in, @Header(KafkaHeaders.OFFSET) long offset) throws InterruptedException {
        System.out.println(in + " #" + offset);
        Thread.sleep(6000);
    }

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("so69016372").partitions(1).replicas(1).build();
    }

}
Result
2021-09-01 13:47:26.963 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
2021-09-01 13:47:31.991 INFO 13195 --- [ad | so69016372] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Member consumer-so69016372-1-f02f8d74-c2b8-47d9-92d3-bf68e5c81a8f sending LeaveGroup request to coordinator localhost:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-09-01 13:47:32.989 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Failing OffsetCommit request since the consumer is not part of an active group
2021-09-01 13:47:32.994 ERROR 13195 --- [o69016372-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:200) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:112) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1602) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1210) ~[spring-kafka-2.7.6.jar:2.7.6]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1495) ~[kafka-clients-2.7.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2710) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2705) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2691) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2489) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1235) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1161) ~[spring-kafka-2.7.6.jar:2.7.6]
... 3 common frames omitted
2021-09-01 13:47:32.994 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-09-01 13:47:32.994 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Lost previously assigned partitions so69016372-0
2021-09-01 13:47:32.995 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions lost: [so69016372-0]
2021-09-01 13:47:32.995 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions revoked: [so69016372-0]
...
2021-09-01 13:47:33.102 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
2021-09-01 13:47:38.141 INFO 13195 --- [ad | so69016372] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Member consumer-so69016372-1-e6ec685a-d9aa-43d3-b526-b04418095f09 sending LeaveGroup request to coordinator localhost:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-09-01 13:47:39.108 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Failing OffsetCommit request since the consumer is not part of an active group
2021-09-01 13:47:39.109 ERROR 13195 --- [o69016372-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:200) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:112) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1602) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1210) ~[spring-kafka-2.7.6.jar:2.7.6]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1495) ~[kafka-clients-2.7.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2710) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2705) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2691) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2489) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1235) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1161) ~[spring-kafka-2.7.6.jar:2.7.6]
... 3 common frames omitted
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Lost previously assigned partitions so69016372-0
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions lost: [so69016372-0]
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions revoked: [so69016372-0]
...
2021-09-01 13:47:39.217 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
It will retry indefinitely.
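The sleep in the listener above deliberately exceeds max.poll.interval.ms to provoke the failure. In a real application, the remedy is the one the broker hint in the log already names; as a sketch (values illustrative), either of these properties keeps the poll loop within bounds:
spring.kafka.consumer.properties.max.poll.interval.ms=600000
spring.kafka.consumer.max-poll-records=10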

app is alive but stops consuming messages after a while

My spring-kafka consumer stops consuming messages after a while. The stoppage happens every time, but never after the same duration. When the app is no longer consuming, at the end of the log I always see the statement that the consumer sent a LEAVE_GROUP signal. If I am not seeing any errors or exceptions, why is the consumer leaving the group?
org.springframework.boot:spring-boot-starter-parent:2.0.4.RELEASE
spring-kafka:2.1.8.RELEASE
org.apache.kafka:kafka-clients:1.0.2
I've set logging as
logging.level.org.apache.kafka=DEBUG
logging.level.org.springframework.kafka=INFO
other settings
spring.kafka.listener.concurrency=5
spring.kafka.listener.type=single
spring.kafka.listener.ack-mode=record
spring.kafka.listener.poll-timeout=10000
spring.kafka.consumer.heartbeat-interval=5000
spring.kafka.consumer.max-poll-records=50
spring.kafka.consumer.fetch-max-wait=10000
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.properties.security.protocol=SSL
spring.kafka.consumer.retry.maxAttempts=3
spring.kafka.consumer.retry.backoffperiod.millisecs=2000
ContainerFactory setup
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> recordsKafkaListenerContainerFactory(RetryTemplate retryTemplate) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setConcurrency(listenerCount);
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.RECORD);
    factory.getContainerProperties().setPollTimeout(pollTimeoutMillis);
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    factory.getContainerProperties().setAckOnError(false);
    factory.setRetryTemplate(retryTemplate);
    factory.setStatefulRetry(true);
    factory.getContainerProperties().setIdleEventInterval(60000L);
    return factory;
}
Listener configuration
@Component
public class RecordsEventListener implements ConsumerSeekAware {

    private static final org.slf4j.Logger LOG = org.slf4j.LoggerFactory.getLogger(RecordsEventListener.class);

    @Value("${mode.replay:false}")
    public void setModeReplay(boolean enabled) {
        this.isReplay = enabled;
    }

    @KafkaListener(topics = "${event.topic}", containerFactory = "RecordsKafkaListenerContainerFactory")
    public void handleEvent(@Payload String payload) throws RecordsEventListenerException {
        try {
            // business logic
        } catch (Exception e) {
            LOG.error("Process error for event: {}", payload, e);
            if (isRetryableException(e)) {
                LOG.warn("Retryable exception detected. Going to retry.");
                throw new RecordsEventListenerException(e);
            } else {
                LOG.warn("Dropping event because non retryable exception");
            }
        }
    }

    private Boolean isRetryableException(Exception e) {
        return binaryExceptionClassifier.classify(e);
    }

    @Override
    public void registerSeekCallback(ConsumerSeekCallback callback) {
        // do nothing
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        // do this only once per start of app
        if (isReplay && !partitonSeekToBeginningDone) {
            assignments.forEach((t, p) -> callback.seekToBeginning(t.topic(), t.partition()));
            partitonSeekToBeginningDone = true;
        }
    }

    @Override
    public void onIdleContainer(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        LOG.info("Container is IDLE; no messages to pull.");
        assignments.forEach((t, p) -> LOG.info("Topic:{}, Partition:{}, Offset:{}", t.topic(), t.partition(), p));
    }

    boolean isPartitionSeekToBeginningDone() {
        return partitonSeekToBeginningDone;
    }

    void setPartitonSeekToBeginningDone(boolean partitonSeekToBeginningDone) {
        this.partitonSeekToBeginningDone = partitonSeekToBeginningDone;
    }

}
When the app is no longer consuming, at the end of the log I always see the statement that the consumer sent a LEAVE_GROUP signal.
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5347 to node 2147482638
2019-05-02 18:31:05.872 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:10.856 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:10.857 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5348 to node 2147482638
2019-05-02 18:31:10.958 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending LeaveGroup request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send LEAVE_GROUP {group_id=app,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5349 to node 2147482638
2019-05-02 18:31:11.768 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Disabling heartbeat thread
Full log
Thanks to all who replied. It turns out it was indeed the broker dropping the consumer on session timeout. The broker, a very old version (0.10.0.1), did not support the newer features outlined in KIP-62 that the spring-kafka version we used could take advantage of.
Since we could not dictate an upgrade to the broker or changes to the session timeout, we simply modified our processing logic so as to finish the task within the session timeout.
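For reference: on brokers new enough for KIP-62, heartbeats are sent from a background thread, so slow record processing is bounded by max.poll.interval.ms rather than session.timeout.ms. In that situation, raising the poll interval (value illustrative) would be the usual fix, without touching the session timeout:
spring.kafka.consumer.properties.max.poll.interval.ms=600000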

Apache Ignite Kafka connection issues

I'm trying to do stream processing and CEP on a Kafka message stream. For this I picked Apache Ignite to realise a prototype first. However, I cannot connect to the topic:
Used:
kafka_2.11-0.10.1.0
apache-ignite-fabric-1.8.0-bin
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Kafka works properly; I tested it with a consumer.
Then I start Ignite, and then I run the following in a Spring Boot command-line app.
KafkaStreamer<String, String, String> kafkaStreamer = new KafkaStreamer<>();

Ignition.setClientMode(true);
Ignite ignite = Ignition.start();

Properties settings = new Properties();
// Set a few key parameters
settings.put("bootstrap.servers", "localhost:9092");
settings.put("group.id", "test");
settings.put("zookeeper.connect", "localhost:2181");
settings.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
settings.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
settings.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
settings.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

// Create an instance of ConsumerConfig from the Properties instance
kafka.consumer.ConsumerConfig config = new ConsumerConfig(settings);

IgniteCache<String, String> cache = ignite.getOrCreateCache("myCache");

try (IgniteDataStreamer<String, String> stmr = ignite.dataStreamer("myCache")) {
    // allow overwriting cache data
    stmr.allowOverwrite(true);

    kafkaStreamer.setIgnite(ignite);
    kafkaStreamer.setStreamer(stmr);

    // set the topic
    kafkaStreamer.setTopic("test");

    // set the number of threads to process Kafka streams
    kafkaStreamer.setThreads(1);

    // set Kafka consumer configurations
    kafkaStreamer.setConsumerConfig(config);

    // set decoders
    StringDecoder keyDecoder = new StringDecoder(null);
    StringDecoder valueDecoder = new StringDecoder(null);
    kafkaStreamer.setKeyDecoder(keyDecoder);
    kafkaStreamer.setValueDecoder(valueDecoder);

    kafkaStreamer.start();
} finally {
    kafkaStreamer.stop();
}
When the application starts I get
2017-02-23 10:25:23.409 WARN 1388 --- [ main] kafka.utils.VerifiableProperties : Property bootstrap.servers is not valid
2017-02-23 10:25:23.410 INFO 1388 --- [ main] kafka.utils.VerifiableProperties : Property group.id is overridden to test
2017-02-23 10:25:23.410 WARN 1388 --- [ main] kafka.utils.VerifiableProperties : Property key.deserializer is not valid
2017-02-23 10:25:23.411 WARN 1388 --- [ main] kafka.utils.VerifiableProperties : Property key.serializer is not valid
2017-02-23 10:25:23.411 WARN 1388 --- [ main] kafka.utils.VerifiableProperties : Property value.deserializer is not valid
2017-02-23 10:25:23.411 WARN 1388 --- [ main] kafka.utils.VerifiableProperties : Property value.serializer is not valid
2017-02-23 10:25:23.411 INFO 1388 --- [ main] kafka.utils.VerifiableProperties : Property zookeeper.connect is overridden to localhost:2181
Then
2017-02-23 10:25:24.057 WARN 1388 --- [r-finder-thread] kafka.client.ClientUtils$ : Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [BrokerEndPoint(0,user.local,9092)] failed
java.nio.channels.ClosedChannelException: null
at kafka.network.BlockingChannel.send(BlockingChannel.scala:110) ~[kafka_2.11-0.10.0.1.jar:na]
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:80) ~[kafka_2.11-0.10.0.1.jar:na]
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:79) ~[kafka_2.11-0.10.0.1.jar:na]
at kafka.producer.SyncProducer.send(SyncProducer.scala:124) ~[kafka_2.11-0.10.0.1.jar:na]
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:59) [kafka_2.11-0.10.0.1.jar:na]
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:94) [kafka_2.11-0.10.0.1.jar:na]
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66) [kafka_2.11-0.10.0.1.jar:na]
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) [kafka_2.11-0.10.0.1.jar:na]
And reading from the queue doesn't work.
Does anyone have an idea how to fix this?
Edit: If I comment out the contents of the finally block, the following error comes:
2017-02-27 16:42:27.780 ERROR 29946 --- [pool-3-thread-1] : Message is ignored due to an error [msg=MessageAndMetadata(test,0,Message(magic = 1, attributes = 0, CreateTime = -1, crc = 2558126716, key = java.nio.HeapByteBuffer[pos=0 lim=1 cap=79], payload = java.nio.HeapByteBuffer[pos=0 lim=74 cap=74]),15941704,kafka.serializer.StringDecoder@74a96647,kafka.serializer.StringDecoder@42849d34,-1,CreateTime)]
java.lang.IllegalStateException: Data streamer has been closed.
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.enterBusy(DataStreamerImpl.java:401) ~[ignite-core-1.8.0.jar:1.8.0]
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.addDataInternal(DataStreamerImpl.java:613) ~[ignite-core-1.8.0.jar:1.8.0]
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.addData(DataStreamerImpl.java:667) ~[ignite-core-1.8.0.jar:1.8.0]
at org.apache.ignite.stream.kafka.KafkaStreamer$1.run(KafkaStreamer.java:180) ~[ignite-kafka-1.8.0.jar:1.8.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Thanks!
I think this happens because the KafkaStreamer is getting closed right after it's started (the kafkaStreamer.stop() call in the finally block). kafkaStreamer.start() is not synchronous; it just spins up threads to consume from Kafka and exits.
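So a likely fix is to keep the streamer open for the lifetime of the application and stop it only on shutdown; a minimal sketch of that lifecycle, using the same setters as the code above (no try-with-resources, so the data streamer isn't closed immediately):
// Keep the streamer alive instead of stopping it right after start().
IgniteDataStreamer<String, String> stmr = ignite.dataStreamer("myCache");
stmr.allowOverwrite(true);

kafkaStreamer.setIgnite(ignite);
kafkaStreamer.setStreamer(stmr);
kafkaStreamer.setTopic("test");
kafkaStreamer.setThreads(1);
kafkaStreamer.setConsumerConfig(config);
kafkaStreamer.setKeyDecoder(new StringDecoder(null));
kafkaStreamer.setValueDecoder(new StringDecoder(null));
kafkaStreamer.start();

// Stop consuming only when the application shuts down.
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    kafkaStreamer.stop();
    stmr.close();
}));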