CommitFailedException from a Spring Kafka consumer

I sometimes get the error message below when using a Spring Kafka consumer. I have implemented at-least-once semantics, as shown in the code snippet.
1) My doubt is: do I miss any messages when consuming?
2) Do I need to handle this error? It was not reported by the seekToCurrentErrorHandler().
org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
My Spring Kafka consumer code snippet:
@Configuration
public class KafkaConsumerConfig implements KafkaListenerConfigurer {
    @Bean
    public SeekToCurrentErrorHandler seekToCurrentErrorHandler() {
        // the recoverer below runs only after the back-off (3 retries, 500 ms apart) is exhausted
        SeekToCurrentErrorHandler seekToCurrentErrorHandler = new SeekToCurrentErrorHandler((record, e) -> {
            System.out.println("RECORD from topic " + record.topic() + " at partition " + record.partition()
                    + " at offset " + record.offset() + " did not process correctly due to a " + e.getCause());
        }, new FixedBackOff(500L, 3L));
        return seekToCurrentErrorHandler;
    }
    @Bean
    public ConsumerFactory<String, ValidatedConsumerClass> consumerFactory() {
        ErrorHandlingDeserializer<ValidatedConsumerClass> errorHandlingDeserializer =
                new ErrorHandlingDeserializer<>(new JsonDeserializer<>(ValidatedConsumerClass.class));
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "grpid-098");
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(),
                errorHandlingDeserializer);
    }
    @Bean
    KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, ValidatedConsumerClass>> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, ValidatedConsumerClass> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.getContainerProperties().setAckMode(AckMode.RECORD);
        factory.setErrorHandler(seekToCurrentErrorHandler());
        return factory;
    }
    // configureKafkaListeners(...) from KafkaListenerConfigurer omitted from the snippet
}
The consumer that reads the messages:
@Service
public class KafKaConsumerService extends AbstractConsumerSeekAware {
    // databaseService field and its injection omitted from the snippet
    @KafkaListener(id = "foo", topics = "mytopic-5", concurrency = "5", groupId = "mytopic-1-groupid")
    public void consumeFromTopic1(@Payload @Valid ValidatedConsumerClass message, ConsumerRecordMetadata c) {
        databaseService.save(message);
        System.out.println("-- Consumer End -- " + c.partition() + " ---consumer thread-- " + Thread.currentThread().getName());
    }
}

No, you are not missing anything.
No, you do not need to handle it; the STCEH (SeekToCurrentErrorHandler) has already handled it, and the record will be redelivered on the next poll.
In this case, the exception is thrown outside of record processing (after processing is complete). Since the commit failed due to a rebalance, there is no need for the STCEH to re-seek (and it can't anyway, because the records are no longer available). It simply rethrows the exception.
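The reasoning above can be illustrated with a toy model in plain Java (no Kafka dependency; the class and method names are hypothetical). It assumes the record is processed before the commit is attempted, as with AckMode.RECORD, and that whichever consumer owns the partition after a rebalance resumes from the last committed offset. Because the commit fails after the listener has already returned, there is no failed record to hand to the error handler's recoverer; the only consequence is that the record is processed again.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a single partition consumed with AckMode.RECORD; not Spring Kafka code.
// The committed offset only moves when a commit succeeds, so a failed commit
// means the same record is delivered again (a duplicate), never skipped.
final class AtLeastOnceModel {
    private final List<String> partition;             // records in the partition, by offset
    private long committedOffset = 0;                 // where a (re)assigned consumer resumes
    final List<String> processed = new ArrayList<>(); // listener side effects, e.g. a DB save

    AtLeastOnceModel(List<String> partition) {
        this.partition = partition;
    }

    long committedOffset() {
        return committedOffset;
    }

    // One poll/process/commit cycle. The listener runs first; the commit happens
    // afterwards and is rejected if the consumer has been removed from the group
    // (CommitFailedException in the real client). Returns whether the commit succeeded.
    boolean pollProcessCommit(boolean stillInGroup) {
        long offset = committedOffset;               // resume from the committed position
        processed.add(partition.get((int) offset)); // the side effect has already happened
        if (!stillInGroup) {
            return false;                            // commit failed: offset not advanced
        }
        committedOffset = offset + 1;
        return true;
    }
}
```

For a partition holding "foo" and "bar", a cycle whose commit fails leaves the committed offset at 0, so the next cycle processes "foo" again before committing offset 1. The side effect in the listener (such as databaseService.save) may therefore run twice, but no record is skipped.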
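Everything works as expected. In the following demo, the listener sleeps for 6 seconds per record, longer than the 5-second max.poll.interval.ms, so every commit fails: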
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.properties.max.poll.interval.ms=5000
@SpringBootApplication
public class So69016372Application {
    public static void main(String[] args) {
        SpringApplication.run(So69016372Application.class, args);
    }
    @KafkaListener(id = "so69016372", topics = "so69016372")
    public void listen(String in, @Header(KafkaHeaders.OFFSET) long offset) throws InterruptedException {
        System.out.println(in + " #" + offset);
        Thread.sleep(6000);
    }
    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("so69016372").partitions(1).replicas(1).build();
    }
}
Result
2021-09-01 13:47:26.963 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
2021-09-01 13:47:31.991 INFO 13195 --- [ad | so69016372] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Member consumer-so69016372-1-f02f8d74-c2b8-47d9-92d3-bf68e5c81a8f sending LeaveGroup request to coordinator localhost:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-09-01 13:47:32.989 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Failing OffsetCommit request since the consumer is not part of an active group
2021-09-01 13:47:32.994 ERROR 13195 --- [o69016372-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:200) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:112) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1602) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1210) ~[spring-kafka-2.7.6.jar:2.7.6]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1495) ~[kafka-clients-2.7.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2710) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2705) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2691) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2489) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1235) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1161) ~[spring-kafka-2.7.6.jar:2.7.6]
... 3 common frames omitted
2021-09-01 13:47:32.994 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-09-01 13:47:32.994 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Lost previously assigned partitions so69016372-0
2021-09-01 13:47:32.995 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions lost: [so69016372-0]
2021-09-01 13:47:32.995 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions revoked: [so69016372-0]
...
2021-09-01 13:47:33.102 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
2021-09-01 13:47:38.141 INFO 13195 --- [ad | so69016372] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Member consumer-so69016372-1-e6ec685a-d9aa-43d3-b526-b04418095f09 sending LeaveGroup request to coordinator localhost:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-09-01 13:47:39.108 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Failing OffsetCommit request since the consumer is not part of an active group
2021-09-01 13:47:39.109 ERROR 13195 --- [o69016372-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:200) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:112) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1602) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1210) ~[spring-kafka-2.7.6.jar:2.7.6]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1495) ~[kafka-clients-2.7.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2710) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2705) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2691) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2489) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1235) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1161) ~[spring-kafka-2.7.6.jar:2.7.6]
... 3 common frames omitted
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Lost previously assigned partitions so69016372-0
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions lost: [so69016372-0]
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions revoked: [so69016372-0]
...
2021-09-01 13:47:39.217 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
It will retry indefinitely.
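The numbers in this demo explain the endless loop: each record takes 6000 ms (Thread.sleep(6000)) while max.poll.interval.ms is 5000, so the consumer is always removed from the group before it can commit. A rough check, as a plain-Java sketch with hypothetical names, assuming the slowest record dominates and ignoring commit and poll overhead:

```java
// Rough rule of thumb (an illustration, not a Kafka API): the container only polls
// again after every record from the previous poll has been processed, so the gap
// between polls is about max.poll.records x the slowest per-record processing time.
final class PollIntervalCheck {
    static long worstCaseGapMs(int maxPollRecords, long slowestRecordMs) {
        return maxPollRecords * slowestRecordMs;
    }

    // True if the estimated gap stays under max.poll.interval.ms, i.e. the consumer
    // is not expected to be removed from the group between polls.
    static boolean staysInGroup(int maxPollRecords, long slowestRecordMs, long maxPollIntervalMs) {
        return worstCaseGapMs(maxPollRecords, slowestRecordMs) < maxPollIntervalMs;
    }
}
```

With one record per poll, 6000 ms exceeds 5000 ms, so the check fails; with the Kafka default max.poll.interval.ms of 300000 ms (5 minutes), the same listener would stay in the group. The remedies are the ones the Kafka log message itself suggests: increase max.poll.interval.ms or reduce max.poll.records.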

Related

Spring @KafkaListener with topicPattern: handle runtime topic creation

I'm using Spring @KafkaListener with a topicPattern. If, while this application is running, I create a new topic matching the pattern and start publishing to it, the listener application simply ignores those messages. In other words, it only picks up the topics matching the pattern at startup and listens to those.
What's the easiest way to "refresh" that? Thanks!
By default, new topics are picked up within 5 minutes, according to the setting of https://kafka.apache.org/documentation/#consumerconfigs_metadata.max.age.ms:
The period of time in milliseconds after which we force a refresh of metadata even if we haven't seen any partition leadership changes to proactively discover any new brokers or partitions.
You can reduce it to speed things up at the expense of increased traffic.
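A simplified model of this answer, in plain Java with hypothetical names: the cluster's topics are matched against the subscription pattern (assumed here to require a full match, as with Matcher.matches()), and a newly created topic is only noticed at the next periodic metadata refresh. Under that assumption, a new topic is discovered within metadata.max.age.ms of its creation, plus the time for the resulting rebalance. Real clients also refresh metadata for other reasons, so this is an upper-bound sketch rather than the exact client logic.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Simplified model (not the Kafka client's code) of pattern subscription plus
// periodic metadata refresh, as controlled by metadata.max.age.ms.
final class PatternRefreshModel {
    // Topics from the cluster's topic list that the subscription pattern matches in full.
    static List<String> matching(String regex, List<String> clusterTopics) {
        Pattern pattern = Pattern.compile(regex);
        return clusterTopics.stream()
                .filter(t -> pattern.matcher(t).matches())
                .collect(Collectors.toList());
    }

    // Time of the first refresh at or after createdAtMs, assuming refreshes happen
    // exactly every maxAgeMs starting at firstRefreshMs.
    static long discoveredAtMs(long createdAtMs, long firstRefreshMs, long maxAgeMs) {
        if (createdAtMs <= firstRefreshMs) {
            return firstRefreshMs;
        }
        long periods = (createdAtMs - firstRefreshMs + maxAgeMs - 1) / maxAgeMs; // ceiling division
        return firstRefreshMs + periods * maxAgeMs;
    }
}
```

In the EDIT's log, the container is first assigned partitions at 15:41:07, and the newly created topics show up in the assignments at 15:42:07 and 15:43:08, roughly 60 seconds apart, which is consistent with metadata.max.age.ms:60000.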
EDIT
This shows it working as expected...
@SpringBootApplication
public class So71386069Application {
    private static final Logger log = LoggerFactory.getLogger(So71386069Application.class);
    public static void main(String[] args) {
        SpringApplication.run(So71386069Application.class, args);
    }
    @KafkaListener(id = "so71386069", topicPattern = "so71386069.*",
            properties = "metadata.max.age.ms:60000")
    void listen(String in) {
        System.out.println(in);
    }
    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("so71386069").partitions(1).replicas(1).build();
    }
    @Bean
    ApplicationRunner runner(KafkaAdmin admin) {
        return args -> {
            try (AdminClient client = AdminClient.create(admin.getConfigurationProperties())) {
                IntStream.range(0, 10).forEach(i -> {
                    try {
                        Thread.sleep(30_000);
                        String topic = "so71386069-" + i;
                        log.info("Creating {}", topic);
                        client.createTopics(Collections.singleton(
                                TopicBuilder.name(topic).partitions(1).replicas(1).build())).all().get();
                    }
                    catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    catch (ExecutionException e) {
                        e.printStackTrace();
                    }
                });
            }
        };
    }
}
2022-03-07 15:41:07.131 INFO 33630 --- [o71386069-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so71386069: partitions assigned: [so71386069-0]
2022-03-07 15:41:34.007 INFO 33630 --- [ main] com.example.demo.So71386069Application : Creating so71386069-0
2022-03-07 15:42:04.193 INFO 33630 --- [ main] com.example.demo.So71386069Application : Creating so71386069-1
...
2022-03-07 15:42:07.590 INFO 33630 --- [o71386069-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so71386069: partitions revoked: [so71386069-0]
...
2022-03-07 15:42:07.599 INFO 33630 --- [o71386069-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so71386069: partitions assigned: [so71386069-0, so71386069-1-0, so71386069-0-0]
2022-03-07 15:42:34.378 INFO 33630 --- [ main] com.example.demo.So71386069Application : Creating so71386069-2
2022-03-07 15:43:04.554 INFO 33630 --- [ main] com.example.demo.So71386069Application : Creating so71386069-3
...
2022-03-07 15:43:08.403 INFO 33630 --- [o71386069-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so71386069: partitions revoked: [so71386069-0, so71386069-1-0, so71386069-0-0]
...
2022-03-07 15:43:08.411 INFO 33630 --- [o71386069-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so71386069: partitions assigned: [so71386069-0, so71386069-3-0, so71386069-2-0, so71386069-1-0, so71386069-0-0]
...
I think that’s how it is by design. The Kafka client always has to subscribe to a topic before being able to get messages.
In this case, the Kafka client/consumer subscribes to the topics matching the pattern once, at startup, and carries on with those.
But this is a really interesting question. The easiest and simplest answer is “Restarting the client/consumer”. However, I will keep watching the other answers to learn about any better ideas.

Exactly-once semantics with Spring Kafka

I'm trying to test my exactly-once configuration to make sure all the configs I set are correct and the behavior is as I expect.
I seem to be running into a problem with duplicate sends:
public static void main(String[] args) {
    MessageProducer producer = new ProducerBuilder()
            .setBootstrapServers("kafka:9992")
            .setKeySerializerClass(StringSerializer.class)
            .setValueSerializerClass(StringSerializer.class)
            .setProducerEnableIdempotence(true).build();
    MessageConsumer consumer = new ConsumerBuilder()
            .setBootstrapServers("kafka:9992")
            .setIsolationLevel("read_committed")
            .setTopics("someTopic2")
            .setGroupId("bla")
            .setKeyDeserializerClass(StringDeserializer.class)
            .setValueDeserializerClass(MapDeserializer.class)
            .setConsumerMessageLogic(new ConsumerMessageLogic() {
                @Override
                public void onMessage(ConsumerRecord cr, Acknowledgment acknowledgment) {
                    producer.sendMessage(new TopicPartition("someTopic2", cr.partition()),
                            new OffsetAndMetadata(cr.offset() + 1), "something1", "im in transaction", cr.key());
                    acknowledgment.acknowledge();
                }
            }).build();
    consumer.start();
}
This is my "test"; you can assume the builder applies the right configuration.
ConsumerMessageLogic is a class that handles the "process" part of the read-process-write cycle that exactly-once semantics support.
Inside the producer class, I have a sendMessage method like this:
public void sendMessage(TopicPartition topicPartition, OffsetAndMetadata offsetAndMetadata, String sendToTopic, V message, PK partitionKey) {
    try {
        KafkaRecord<PK, V> partitionAndMessagePair = producerMessageLogic.prepareMessage(topicPartition.topic(), partitionKey, message);
        if (kafkaTemplate.getProducerFactory().transactionCapable()) {
            kafkaTemplate.executeInTransaction(operations -> {
                sendMessage(message, partitionKey, sendToTopic, partitionAndMessagePair, operations);
                operations.sendOffsetsToTransaction(
                        Map.of(topicPartition, offsetAndMetadata), "bla");
                return true;
            });
        } else {
            sendMessage(message, partitionKey, topicPartition.topic(), partitionAndMessagePair, kafkaTemplate);
        }
    } catch (Exception e) {
        failureHandler.onFailure(partitionKey, message, e);
    }
}
I create my consumer like so:
/**
 * Start the message consumer.
 * The record event will be delegated to onMessage().
 */
public void start() {
    initConsumerMessageListenerContainer();
    container.start();
}
/**
 * Initialize the Kafka message listener.
 */
private void initConsumerMessageListenerContainer() {
    // create an acknowledging message listener to allow manual commits
    messageListener = consumerMessageLogic::onMessage;
    // create and initialize the consumer container
    container = initContainer(messageListener);
    // set the number of consumers; the topic partitions are divided among them
    container.setConcurrency(springConcurrency);
    springContainerPollTimeoutOpt.ifPresent(p -> container.getContainerProperties().setPollTimeout(p));
    if (springAckMode != null) {
        container.getContainerProperties().setAckMode(springAckMode);
    }
}
private ConcurrentMessageListenerContainer<PK, V> initContainer(AcknowledgingMessageListener<PK, V> messageListener) {
    return new ConcurrentMessageListenerContainer<>(
            consumerFactory(props),
            containerProperties(messageListener));
}
When I create my producer, I create it with a UUID as the transaction id prefix, like this:
public ProducerFactory<PK, V> producerFactory(boolean isTransactional) {
    ProducerFactory<PK, V> res = new DefaultKafkaProducerFactory<>(props);
    if (isTransactional) {
        ((DefaultKafkaProducerFactory<PK, V>) res).setTransactionIdPrefix(UUID.randomUUID().toString());
        ((DefaultKafkaProducerFactory<PK, V>) res).setProducerPerConsumerPartition(true);
    }
    return res;
}
Now, after everything is set up, I bring up 2 instances on a topic with 2 partitions;
each instance gets 1 partition of the consumed topic.
I send a message and, in instance A, wait at a breakpoint for the transaction timeout (to simulate a loss of connection).
Once the timeout passes, the other instance (instance B) automatically processes the record and sends it to the target topic, because a rebalance occurred.
So far so good.
Now, when I release the breakpoint on instance A, it says it is rebalancing and couldn't commit, but I still see another output record in my destination topic.
My expectation was that instance A wouldn't continue its work once I released the breakpoint, as the record was already processed.
Am I doing something wrong?
Can this scenario be achieved?
Edit 2:
After Gary's remarks about executeInTransaction, I still get the duplicate record if I freeze one of the instances until the timeout and release it after the other instance has processed the record: the frozen instance then processes and produces the same record to the output topic...
public static void main(String[] args) {
    MessageProducer producer = new ProducerBuilder()
            .setBootstrapServers("kafka:9992")
            .setKeySerializerClass(StringSerializer.class)
            .setValueSerializerClass(StringSerializer.class)
            .setProducerEnableIdempotence(true).build();
    MessageConsumer consumer = new ConsumerBuilder()
            .setBootstrapServers("kafka:9992")
            .setIsolationLevel("read_committed")
            .setTopics("someTopic2")
            .setGroupId("bla")
            .setKeyDeserializerClass(StringDeserializer.class)
            .setValueDeserializerClass(MapDeserializer.class)
            .setConsumerMessageLogic(new ConsumerMessageLogic() {
                @Override
                public void onMessage(ConsumerRecord cr, Acknowledgment acknowledgment) {
                    producer.sendMessage("something1", "im in transaction");
                }
            }).build();
    consumer.start(producer.getProducerFactory());
}
The new sendMessage method in the producer, without executeInTransaction:
public void sendMessage(V message, PK partitionKey, String topicName) {
    try {
        KafkaRecord<PK, V> partitionAndMessagePair = producerMessageLogic.prepareMessage(topicName, partitionKey, message);
        sendMessage(message, partitionKey, topicName, partitionAndMessagePair, kafkaTemplate);
    } catch (Exception e) {
        failureHandler.onFailure(partitionKey, message, e);
    }
}
I also changed the consumer container creation to use a transaction manager with the same ProducerFactory, as suggested:
/**
 * Initialize the Kafka message listener.
 */
private void initConsumerMessageListenerContainer(ProducerFactory<PK, V> producerFactory) {
    // create an acknowledging message listener to allow manual commits
    acknowledgingMessageListener = consumerMessageLogic::onMessage;
    // create and initialize the consumer container
    container = initContainer(acknowledgingMessageListener, producerFactory);
    // set the number of consumers; the topic partitions are divided among them
    container.setConcurrency(springConcurrency);
    springContainerPollTimeoutOpt.ifPresent(p -> container.getContainerProperties().setPollTimeout(p));
    if (springAckMode != null) {
        container.getContainerProperties().setAckMode(springAckMode);
    }
}
private ConcurrentMessageListenerContainer<PK, V> initContainer(AcknowledgingMessageListener<PK, V> messageListener, ProducerFactory<PK, V> producerFactory) {
    return new ConcurrentMessageListenerContainer<>(
            consumerFactory(props),
            containerProperties(messageListener, producerFactory));
}
@NonNull
private ContainerProperties containerProperties(MessageListener<PK, V> messageListener, ProducerFactory<PK, V> producerFactory) {
    ContainerProperties containerProperties = new ContainerProperties(topics);
    containerProperties.setMessageListener(messageListener);
    // the transaction manager uses the same ProducerFactory as the KafkaTemplate
    containerProperties.setTransactionManager(new KafkaTransactionManager<>(producerFactory));
    return containerProperties;
}
My expectation is that when the broker receives the processed record from the frozen instance, it will know that the record was already handled by another instance, because it contains exactly the same metadata (or does it? The PID will be different, but should it be different?).
Maybe the scenario I'm looking for is not even supported by the current exactly-once support that Kafka and Spring provide...
If I have 2 instances of read-process-write, that means I have 2 producers with 2 different PIDs.
Now, when I freeze one of the instances and the unfrozen instance takes over processing the record due to a rebalance, it sends the record with its own PID and a sequence number in the metadata.
Then, when I release the frozen instance, it sends the same record with its own PID, so there's no way for the broker to know it's a duplicate...
Am I wrong? How can I avoid this scenario? I thought the rebalance would stop the instance and not let it complete its processing (where it produces the duplicate record), because it is no longer responsible for that record.
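The reasoning about producer ids can be explored with a toy model (plain Java, hypothetical names, not the broker's real logic). It assumes that, in the mode configured here with setProducerPerConsumerPartition(true), zombie fencing works per transactional.id: each time a producer is initialized for an id, the broker bumps that id's epoch and rejects commits from older epochs. The logs below show transactional ids of the form prefix + group + "." + topic + "." + partition, with a different random UUID prefix on each instance (b76e8aba-... and 1d8e74d3-...); under this model, two different ids are never fenced against each other.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of epoch-based zombie fencing keyed on transactional.id.
final class FencingModel {
    private final Map<String, Integer> epochs = new HashMap<>();

    // Like initTransactions(): bumps and returns the epoch for this transactional.id.
    int init(String transactionalId) {
        return epochs.merge(transactionalId, 1, Integer::sum);
    }

    // A commit is accepted only from the latest epoch of its transactional.id.
    boolean commit(String transactionalId, int epoch) {
        return epochs.getOrDefault(transactionalId, 0) == epoch;
    }

    // Hypothetical helper reproducing the id shape seen in the logs.
    static String txId(String prefix, String group, String topic, int partition) {
        return prefix + group + "." + topic + "." + partition;
    }
}
```

Under these assumptions, the frozen instance would only be fenced if both instances derived the same transactional.id for someTopic2-0, that is, if they shared the same prefix instead of a random UUID each. The exact fencing rules depend on the Kafka and Spring Kafka versions in use.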
Adding the logs:
Frozen instance: you can see the freeze time at 10:53:34; I released it at 10:54:02 (the rebalance time is 10 seconds).
2020-06-16 10:53:34,393 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] Created new Producer: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906]
2020-06-16 10:53:34,394 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906] beginTransaction()
2020-06-16 10:53:34,395 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.doBegin:149] Created Kafka transaction on producer [CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906]]
2020-06-16 10:54:02,157 INFO [${sys:spring.application.name}] [kafka-coordinator-heartbeat-thread | bla] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Group coordinator X.X.X.X:9992 (id: 2147482646 rack: null) is unavailable or invalid, will attempt rediscovery
2020-06-16 10:54:02,181 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
2020-06-16 10:54:02,189 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] [i.i.k.s.p.SimpleSuccessHandler.:] Sent message=[im in transaction] with offset=[252] to topic something1
2020-06-16 10:54:02,193 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] [o.a.k.c.p.i.TransactionManager.:] [Producer clientId=producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0, transactionalId=b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] Discovered group coordinator X.X.X.X:9992 (id: 1001 rack: null)
2020-06-16 10:54:02,263 INFO [${sys:spring.application.name}] [kafka-coordinator-heartbeat-thread | bla] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Discovered group coordinator 192.168.144.1:9992 (id: 2147482646 rack: null)
2020-06-16 10:54:02,295 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.processCommit:740] Initiating transaction commit
2020-06-16 10:54:02,296 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906] commitTransaction()
2020-06-16 10:54:02,299 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
2020-06-16 10:54:02,301 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Attempt to heartbeat failed for since member id consumer-bla-1-b3ad1c09-ad06-4bc4-a891-47a2288a830f is not valid.
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.ConsumerCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.ConsumerCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Lost previously assigned partitions someTopic2-0
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.ConcurrentMessageListenerContainer.info:279] bla: partitions lost: [someTopic2-0]
2020-06-16 10:54:02,303 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.ConcurrentMessageListenerContainer.info:279] bla: partitions revoked: [someTopic2-0]
2020-06-16 10:54:02,303 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
The regular instance, which takes over the partition and produces the record after a rebalance:
2020-06-16 10:53:46,536 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] Created new Producer: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153]
2020-06-16 10:53:46,537 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153] beginTransaction()
2020-06-16 10:53:46,539 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.doBegin:149] Created Kafka transaction on producer [CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153]]
2020-06-16 10:53:46,556 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
2020-06-16 10:53:46,563 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0] [i.i.k.s.p.SimpleSuccessHandler.:] Sent message=[im in transaction] with offset=[250] to topic something1
2020-06-16 10:53:46,566 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0] [o.a.k.c.p.i.TransactionManager.:] [Producer clientId=producer-1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0, transactionalId=1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0] Discovered group coordinator X.X.X.X:9992 (id: 1001 rack: null)
2020-06-16 10:53:46,668 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.processCommit:740] Initiating transaction commit
2020-06-16 10:53:46,669 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153] commitTransaction()
2020-06-16 10:53:46,672 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
2020-06-16 10:53:51,673 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Received: 0 records
I noticed that both instances send the exact same offsets to the transaction:
Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
I thought that when both try to commit the exact same offset, the broker would abort one of the transactions.
I also noticed that if I reduce transaction.timeout.ms to just 2 seconds, the transaction is not aborted no matter how long I freeze the instance in the debugger.
Maybe the transaction.timeout.ms timer only starts after I send the message?
You must not use executeInTransaction at all - see its Javadocs; it is used when there is no active transaction or if you explicitly don't want an operation to participate in an existing transaction.
You need to add a KafkaTransactionManager to the listener container; it must have a reference to same ProducerFactory as the template.
Then, the container will start the transaction and, if successful, send the offset to the transaction.

app is alive but stops consuming messages after a while

The spring-kafka consumer stops consuming messages after a while. This happens every time, but never after the same duration. When the app is no longer consuming, the end of the log always shows that the consumer sent a LEAVE_GROUP request. If I am not seeing any errors or exceptions, why is the consumer leaving the group?
org.springframework.boot:spring-boot-starter-parent:2.0.4.RELEASE
spring-kafka:2.1.8.RELEASE
org.apache.kafka:kafka-clients:1.0.2
I've set logging as follows:
logging.level.org.apache.kafka=DEBUG
logging.level.org.springframework.kafka=INFO
Other settings:
spring.kafka.listener.concurrency=5
spring.kafka.listener.type=single
spring.kafka.listener.ack-mode=record
spring.kafka.listener.poll-timeout=10000
spring.kafka.consumer.heartbeat-interval=5000
spring.kafka.consumer.max-poll-records=50
spring.kafka.consumer.fetch-max-wait=10000
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.properties.security.protocol=SSL
spring.kafka.consumer.retry.maxAttempts=3
spring.kafka.consumer.retry.backoffperiod.millisecs=2000
ContainerFactory setup
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> recordsKafkaListenerContainerFactory(RetryTemplate retryTemplate) {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory);
factory.setConcurrency(listenerCount);
factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.RECORD);
factory.getContainerProperties().setPollTimeout(pollTimeoutMillis);
factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
factory.getContainerProperties().setAckOnError(false);
factory.setRetryTemplate(retryTemplate);
factory.setStatefulRetry(true);
factory.getContainerProperties().setIdleEventInterval(60000L);
return factory;
}
Listener configuration
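@Component
public class RecordsEventListener implements ConsumerSeekAware {
private static final org.slf4j.Logger LOG = org.slf4j.LoggerFactory.getLogger(RecordsEventListener.class);
// Fields used below but not shown in the question
private boolean isReplay;
private boolean partitonSeekToBeginningDone;
private org.springframework.classify.BinaryExceptionClassifier binaryExceptionClassifier; // assumed to be configured elsewhere (not shown)
@Value("${mode.replay:false}")
public void setModeReplay(boolean enabled) {
this.isReplay = enabled;
}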
@KafkaListener(topics = "${event.topic}", containerFactory = "recordsKafkaListenerContainerFactory")
public void handleEvent(@Payload String payload) throws RecordsEventListenerException {
try {
//business logic
} catch (Exception e) {
LOG.error("Process error for event: {}",payload,e);
if(isRetryableException(e)) {
LOG.warn("Retryable exception detected. Going to retry.");
throw new RecordsEventListenerException(e);
}else{
LOG.warn("Dropping event because non retryable exception");
}
}
}
private Boolean isRetryableException(Exception e) {
return binaryExceptionClassifier.classify(e);
}
@Override
public void registerSeekCallback(ConsumerSeekCallback callback) {
//do nothing
}
@Override
public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
//do this only once per start of app
if (isReplay && !partitonSeekToBeginningDone) {
assignments.forEach((t, p) -> callback.seekToBeginning(t.topic(), t.partition()));
partitonSeekToBeginningDone = true;
}
}
@Override
public void onIdleContainer(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
//do nothing
LOG.info("Container is IDLE; no messages to pull.");
assignments.forEach((t,p)->LOG.info("Topic:{}, Partition:{}, Offset:{}",t.topic(),t.partition(),p));
}
boolean isPartitionSeekToBeginningDone() {
return partitonSeekToBeginningDone;
}
void setPartitonSeekToBeginningDone(boolean partitonSeekToBeginningDone) {
this.partitonSeekToBeginningDone = partitonSeekToBeginningDone;
}
}
The end of the log, where the consumer sends the LEAVE_GROUP request:
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5347 to node 2147482638
2019-05-02 18:31:05.872 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:10.856 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:10.857 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5348 to node 2147482638
2019-05-02 18:31:10.958 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending LeaveGroup request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send LEAVE_GROUP {group_id=app,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5349 to node 2147482638
2019-05-02 18:31:11.768 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Disabling heartbeat thread
Thanks to all who replied. It turns out it was indeed the broker dropping the consumer on session timeout. The broker was a very old version (0.10.0.1) and did not support the newer features described in KIP-62, which the spring-kafka version we used could make use of.
Since we could not dictate the upgrade to the broker or changes to session timeout, we simply modified our processing logic so as to finish the task under the session timeout.
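A rough back-of-the-envelope check shows how far a single poll can overrun that budget with the question's settings (max-poll-records=50, retry maxAttempts=3, a 2000 ms back-off). This is plain Java arithmetic, not a Spring or Kafka API. It assumes every record fails until its last attempt, that the back-off sleeps only between attempts, and it ignores re-polls after a seek and network time; the 100 ms per-record processing time and the 10 s budget are hypothetical figures.

```java
public class PollBudget {

    /**
     * Rough worst-case time (ms) to get through one poll's worth of records, assuming every
     * record fails until its last attempt and a fixed back-off sleeps between attempts.
     */
    static long worstCaseBatchMillis(int maxPollRecords, long processingMillis,
                                     int maxAttempts, long backOffMillis) {
        long perRecord = maxAttempts * processingMillis   // the listener runs on every attempt
                + (maxAttempts - 1) * backOffMillis;      // back-off only between attempts
        return maxPollRecords * perRecord;
    }

    /** True if the batch would finish before the given timeout. */
    static boolean fitsWithin(long batchMillis, long timeoutMillis) {
        return batchMillis < timeoutMillis;
    }

    public static void main(String[] args) {
        // 50 records, 3 attempts and 2000 ms back-off come from the question's properties;
        // the 100 ms processing time and the 10 s budget are hypothetical.
        long batch = worstCaseBatchMillis(50, 100, 3, 2000);
        long timeoutMillis = 10_000;
        System.out.println("worst case for one poll: " + batch + " ms"); // 215000 ms
        System.out.println("fits within timeout: " + fitsWithin(batch, timeoutMillis)); // false
    }
}
```

Under these assumptions one poll can take 215,000 ms, far beyond a 10 s timeout; lowering max-poll-records, the number of attempts, or the back-off shrinks the total.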

Kafka Streams - Consumer memory overload

I am planning a Spring+Kafka Streams application that handles incoming messages and stores updated internal state as a result of these messages.
This state is predicted to reach ~500 MB per unique key (there are likely to be ~10k unique keys distributed across 2k partitions).
This state must generally be held in-memory for effective operation of my application but even on disk I would still face a similar problem (albeit just at a later date of scaling).
I am planning to deploy this application into a dynamically scaling environment such as AWS and will set a minimum number of instances, but I am wary of 2 situations:
On first startup (where perhaps just one consumer starts first), it will not be able to take assignment of all the partitions, because the in-memory state will overflow the instance's available memory.
After a major outage (e.g. an AWS availability zone outage), 33% of the consumers could be taken out of the group, and the additional memory load on the remaining instances could take out everyone who remains.
How do people protect their consumers from taking on more partitions than they can handle such that they do not overflow available memory/disk?
See the Kafka documentation.
Since 0.11...
EDIT
For your second use case (and it also works for the first), perhaps you could implement a custom PartitionAssignor that limits the number of partitions assigned to each instance.
I haven't tried it; I don't know how the broker will react to the presence of unassigned partitions.
EDIT2
This seems to work ok; but YMMV...
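package com.example;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerPartitionAssignor.Subscription; // kafka-clients 2.4+; older clients: org.apache.kafka.clients.consumer.internals.PartitionAssignor.Subscription
import org.apache.kafka.clients.consumer.RoundRobinAssignor;
import org.apache.kafka.common.TopicPartition;
// Caps each member's round-robin assignment at 5 partitions; any excess stays unassigned
public class NoMoreThanFiveAssignor extends RoundRobinAssignor {
@Override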
public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
Map<String, Subscription> subscriptions) {
Map<String, List<TopicPartition>> assignments = super.assign(partitionsPerTopic, subscriptions);
assignments.forEach((memberId, assigned) -> {
if (assigned.size() > 5) {
System.out.println("Reducing assignments from " + assigned.size() + " to 5 for " + memberId);
assignments.put(memberId,
assigned.stream()
.limit(5)
.collect(Collectors.toList()));
}
});
return assignments;
}
}
and
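import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
@SpringBootApplication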
public class So54072362Application {
public static void main(String[] args) {
SpringApplication.run(So54072362Application.class, args);
}
@Bean
public NewTopic topic() {
return new NewTopic("so54072362", 15, (short) 1);
}
@KafkaListener(id = "so54072362", topics = "so54072362")
public void listen(ConsumerRecord<?, ?> record) {
System.out.println(record);
}
@Bean
public ApplicationRunner runner(KafkaTemplate<String, String> template) {
return args -> {
for (int i = 0; i < 15; i++) {
template.send("so54072362", i, "foo", "bar");
}
};
}
}
and
spring.kafka.consumer.properties.partition.assignment.strategy=com.example.NoMoreThanFiveAssignor
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.auto-offset-reset=earliest
and
Reducing assignments from 15 to 5 for consumer-2-f37221f8-70bb-421d-9faf-6591cc26a76a
2019-01-07 15:24:28.288 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 7
2019-01-07 15:24:28.289 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:28.296 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Attempt to heartbeat failed since group is rebalancing
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Revoking previously assigned partitions [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.304 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] (Re-)joining group
Reducing assignments from 8 to 5 for consumer-2-c9a6928a-520c-4646-9dd9-4da14636744b
Reducing assignments from 7 to 5 for consumer-2-f37221f8-70bb-421d-9faf-6591cc26a76a
2019-01-07 15:24:46.310 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 8
2019-01-07 15:24:46.311 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:46.315 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Attempt to heartbeat failed since group is rebalancing
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Revoking previously assigned partitions [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] (Re-)joining group
2019-01-07 15:24:58.330 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 9
2019-01-07 15:24:58.332 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-14, so54072362-11, so54072362-5, so54072362-8, so54072362-2]
2019-01-07 15:24:58.336 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-14, so54072362-11, so54072362-5, so54072362-8, so54072362-2]
Of course, this leaves the unassigned partitions dangling, but it sounds like that's what you want, until the region comes back online.
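To see where the dangling partitions come from, here is a simplified plain-Java model of the round-robin-then-truncate idea behind NoMoreThanFiveAssignor. It is not Kafka's RoundRobinAssignor (which also handles multiple topics and per-member subscriptions); it only deals partitions 0..n-1 to the sorted member ids and truncates each list to the cap. The member ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class CappedRoundRobin {

    /**
     * Simplified model (not Kafka's RoundRobinAssignor): deals partitions 0..n-1 to the
     * members in sorted order, then truncates each member's list to maxPerMember.
     */
    static Map<String, List<Integer>> assign(int partitions, List<String> members, int maxPerMember) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(members));
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String m : sorted) {
            result.put(m, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(sorted.get(p % sorted.size())).add(p);
        }
        // The same cap NoMoreThanFiveAssignor applies: keep only the first maxPerMember entries.
        result.replaceAll((member, assigned) ->
                new ArrayList<>(assigned.subList(0, Math.min(maxPerMember, assigned.size()))));
        return result;
    }

    /** Partitions that no member owns after capping; nothing consumes them. */
    static List<Integer> unassigned(int partitions, Map<String, List<Integer>> assignment) {
        boolean[] owned = new boolean[partitions];
        for (List<Integer> assigned : assignment.values()) {
            for (int p : assigned) {
                owned[p] = true;
            }
        }
        List<Integer> missing = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            if (!owned[p]) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> assignment =
                assign(15, Arrays.asList("member-a", "member-b"), 5); // hypothetical member ids
        System.out.println(assignment);
        System.out.println("unassigned: " + unassigned(15, assignment));
    }
}
```

With 15 partitions and two members, round robin hands out 8 and 7 partitions, which the cap cuts to 5 each (the same 8 to 5 and 7 to 5 reductions as in the log), leaving partitions 10 to 14 without a consumer.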

Apache Ignite Kafka connection issues

I'm trying to do stream processing and CEP on a Kafka message stream. For this I picked Apache Ignite to realise a prototype first. However I cannot connect to the queue:
Setup:
kafka_2.11-0.10.1.0
apache-ignite-fabric-1.8.0-bin
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Kafka works properly, I tested it with a consumer.
Then I start Ignite and run the following in a Spring Boot command-line app:
KafkaStreamer<String, String, String> kafkaStreamer = new KafkaStreamer<>();
Ignition.setClientMode(true);
Ignite ignite = Ignition.start();
Properties settings = new Properties();
// Set a few key parameters
settings.put("bootstrap.servers", "localhost:9092");
settings.put("group.id", "test");
settings.put("zookeeper.connect", "localhost:2181");
settings.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
settings.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
settings.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
settings.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Create the old (Scala) consumer's kafka.consumer.ConsumerConfig from the Properties instance
kafka.consumer.ConsumerConfig config = new kafka.consumer.ConsumerConfig(settings);
IgniteCache<String, String> cache = ignite.getOrCreateCache("myCache");
try (IgniteDataStreamer<String, String> stmr = ignite.dataStreamer("myCache")) {
// allow overwriting cache data
stmr.allowOverwrite(true);
kafkaStreamer.setIgnite(ignite);
kafkaStreamer.setStreamer(stmr);
// set the topic
kafkaStreamer.setTopic("test");
// set the number of threads to process Kafka streams
kafkaStreamer.setThreads(1);
// set Kafka consumer configurations
kafkaStreamer.setConsumerConfig(config);
// set decoders
StringDecoder keyDecoder = new StringDecoder(null);
StringDecoder valueDecoder = new StringDecoder(null);
kafkaStreamer.setKeyDecoder(keyDecoder);
kafkaStreamer.setValueDecoder(valueDecoder);
kafkaStreamer.start();
} finally {
kafkaStreamer.stop();
}
When the application starts, I get:
2017-02-23 10:25:23.409 WARN 1388 --- [ main] kafka.utils.VerifiableProperties : Property bootstrap.servers is not valid
2017-02-23 10:25:23.410 INFO 1388 --- [ main] kafka.utils.VerifiableProperties : Property group.id is overridden to test
2017-02-23 10:25:23.410 WARN 1388 --- [ main] kafka.utils.VerifiableProperties : Property key.deserializer is not valid
2017-02-23 10:25:23.411 WARN 1388 --- [ main] kafka.utils.VerifiableProperties : Property key.serializer is not valid
2017-02-23 10:25:23.411 WARN 1388 --- [ main] kafka.utils.VerifiableProperties : Property value.deserializer is not valid
2017-02-23 10:25:23.411 WARN 1388 --- [ main] kafka.utils.VerifiableProperties : Property value.serializer is not valid
2017-02-23 10:25:23.411 INFO 1388 --- [ main] kafka.utils.VerifiableProperties : Property zookeeper.connect is overridden to localhost:2181
Then
2017-02-23 10:25:24.057 WARN 1388 --- [r-finder-thread] kafka.client.ClientUtils$ : Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [BrokerEndPoint(0,user.local,9092)] failed
java.nio.channels.ClosedChannelException: null
at kafka.network.BlockingChannel.send(BlockingChannel.scala:110) ~[kafka_2.11-0.10.0.1.jar:na]
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:80) ~[kafka_2.11-0.10.0.1.jar:na]
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:79) ~[kafka_2.11-0.10.0.1.jar:na]
at kafka.producer.SyncProducer.send(SyncProducer.scala:124) ~[kafka_2.11-0.10.0.1.jar:na]
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:59) [kafka_2.11-0.10.0.1.jar:na]
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:94) [kafka_2.11-0.10.0.1.jar:na]
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66) [kafka_2.11-0.10.0.1.jar:na]
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) [kafka_2.11-0.10.0.1.jar:na]
And reading from the queue doesn't work.
Does anyone have an idea how to fix this?
Edit: If I comment out the contents of the finally block, the following error appears:
2017-02-27 16:42:27.780 ERROR 29946 --- [pool-3-thread-1] : Message is ignored due to an error [msg=MessageAndMetadata(test,0,Message(magic = 1, attributes = 0, CreateTime = -1, crc = 2558126716, key = java.nio.HeapByteBuffer[pos=0 lim=1 cap=79], payload = java.nio.HeapByteBuffer[pos=0 lim=74 cap=74]),15941704,kafka.serializer.StringDecoder@74a96647,kafka.serializer.StringDecoder@42849d34,-1,CreateTime)]
java.lang.IllegalStateException: Data streamer has been closed.
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.enterBusy(DataStreamerImpl.java:401) ~[ignite-core-1.8.0.jar:1.8.0]
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.addDataInternal(DataStreamerImpl.java:613) ~[ignite-core-1.8.0.jar:1.8.0]
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.addData(DataStreamerImpl.java:667) ~[ignite-core-1.8.0.jar:1.8.0]
at org.apache.ignite.stream.kafka.KafkaStreamer$1.run(KafkaStreamer.java:180) ~[ignite-kafka-1.8.0.jar:1.8.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Thanks!
I think this happens because KafkaStreamer is stopped right after it's started (the kafkaStreamer.stop() call in the finally block). kafkaStreamer.start() is not synchronous; it just spins up threads to consume from Kafka and returns.
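The same asynchronous start likely also explains the error in the edit: the try-with-resources block closes the IgniteDataStreamer as soon as kafkaStreamer.start() returns, so the consumer thread later writes to a closed data streamer. The sketch below reproduces that ordering problem in plain Java. Sink and startAsync are hypothetical stand-ins for the data streamer and KafkaStreamer, not Ignite APIs, and a latch simulates the first record arriving after start() has returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncLifecycleDemo {

    /** Hypothetical stand-in for a data streamer: rejects writes once it is closed. */
    static final class Sink implements AutoCloseable {
        private volatile boolean closed;
        final List<String> received = Collections.synchronizedList(new ArrayList<>());

        void add(String value) {
            if (closed) {
                throw new IllegalStateException("Sink has been closed.");
            }
            received.add(value);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for an async streamer: start() only submits a worker and returns. */
    static Future<?> startAsync(ExecutorService pool, Sink sink, List<String> messages,
                                CountDownLatch firstRecord) {
        return pool.submit(() -> {
            firstRecord.await(); // records only show up after start() has returned
            for (String m : messages) {
                sink.add(m);
            }
            return null;
        });
    }

    /** Returns how many messages reached the sink. */
    static int run(List<String> messages, boolean closeRightAfterStart) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch firstRecord = new CountDownLatch(1);
        Sink sink = new Sink();
        Future<?> worker = startAsync(pool, sink, messages, firstRecord);
        if (closeRightAfterStart) {
            sink.close(); // what leaving the try-with-resources block does right after start()
        }
        firstRecord.countDown();
        try {
            worker.get(); // a real app would instead block here until shutdown
        } catch (ExecutionException e) {
            System.out.println("worker failed: " + e.getCause().getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow(); // stop the worker threads first (the kafkaStreamer.stop() step)...
            sink.close();       // ...then close the sink
        }
        return sink.received.size();
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("a", "b", "c");
        System.out.println("closed right after start: " + run(messages, true));
        System.out.println("closed after the worker finished: " + run(messages, false));
    }
}
```

In the real application, the equivalent is to keep the data streamer open, call kafkaStreamer.stop() only when the application shuts down (for example from a shutdown hook), and close the data streamer after that.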