I get the error message below sometimes while using a Spring Kafka consumer. I have implemented at-least-once semantics, as shown in the code snippet.
1) My doubt is: do I miss any message from consuming?
2) Do I need to handle this error? It was not reported by seekToCurrentErrorHandler().
org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
My Spring Kafka consumer code snippet:
public class KafkaConsumerConfig implements KafkaListenerConfigurer {

    @Bean
    public SeekToCurrentErrorHandler seekToCurrentErrorHandler() {
        SeekToCurrentErrorHandler seekToCurrentErrorHandler = new SeekToCurrentErrorHandler((record, e) -> {
            System.out.println("RECORD from topic " + record.topic() + " at partition " + record.partition()
                    + " at offset " + record.offset() + " did not process correctly due to a " + e.getCause());
        }, new FixedBackOff(500L, 3L));
        return seekToCurrentErrorHandler;
    }

    @Bean
    public ConsumerFactory<String, ValidatedConsumerClass> consumerFactory() {
        ErrorHandlingDeserializer<ValidatedConsumerClass> errorHandlingDeserializer =
                new ErrorHandlingDeserializer<>(new JsonDeserializer<>(ValidatedConsumerClass.class));
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "grpid-098");
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(), errorHandlingDeserializer);
    }

    @Bean
    KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, ValidatedConsumerClass>> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, ValidatedConsumerClass> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.getContainerProperties().setAckMode(AckMode.RECORD);
        factory.setErrorHandler(seekToCurrentErrorHandler());
        return factory;
    }
The consumer reading the message:
@Service
public class KafKaConsumerService extends AbstractConsumerSeekAware {

    @KafkaListener(id = "foo", topics = "mytopic-5", concurrency = "5", groupId = "mytopic-1-groupid")
    public void consumeFromTopic1(@Payload @Valid ValidatedConsumerClass message, ConsumerRecordMetadata c) {
        databaseService.save(message);
        System.out.println("-- Consumer End -- " + c.partition() + " ---consumer thread-- " + Thread.currentThread().getName());
    }
}
No, you are not missing anything.
No, you do not need to handle it; the STCEH already handled it and the record will be redelivered on the next poll.
In this case, the exception is raised outside of record processing (after processing is complete). Since the commit failed because of a rebalance, there is no need for the STCEH to re-seek (and it can't anyway, because the records are no longer available); it simply rethrows the exception.
Everything works as expected. In the demo below, the listener's 6-second sleep deliberately exceeds the 5-second max.poll.interval.ms, which forces the consumer out of the group on every record:
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.properties.max.poll.interval.ms=5000
@SpringBootApplication
public class So69016372Application {

    public static void main(String[] args) {
        SpringApplication.run(So69016372Application.class, args);
    }

    @KafkaListener(id = "so69016372", topics = "so69016372")
    public void listen(String in, @Header(KafkaHeaders.OFFSET) long offset) throws InterruptedException {
        System.out.println(in + " #" + offset);
        Thread.sleep(6000);
    }

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("so69016372").partitions(1).replicas(1).build();
    }

}
Result
2021-09-01 13:47:26.963 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
2021-09-01 13:47:31.991 INFO 13195 --- [ad | so69016372] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Member consumer-so69016372-1-f02f8d74-c2b8-47d9-92d3-bf68e5c81a8f sending LeaveGroup request to coordinator localhost:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-09-01 13:47:32.989 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Failing OffsetCommit request since the consumer is not part of an active group
2021-09-01 13:47:32.994 ERROR 13195 --- [o69016372-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:200) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:112) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1602) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1210) ~[spring-kafka-2.7.6.jar:2.7.6]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1495) ~[kafka-clients-2.7.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2710) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2705) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2691) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2489) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1235) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1161) ~[spring-kafka-2.7.6.jar:2.7.6]
... 3 common frames omitted
2021-09-01 13:47:32.994 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-09-01 13:47:32.994 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Lost previously assigned partitions so69016372-0
2021-09-01 13:47:32.995 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions lost: [so69016372-0]
2021-09-01 13:47:32.995 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions revoked: [so69016372-0]
...
2021-09-01 13:47:33.102 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
2021-09-01 13:47:38.141 INFO 13195 --- [ad | so69016372] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Member consumer-so69016372-1-e6ec685a-d9aa-43d3-b526-b04418095f09 sending LeaveGroup request to coordinator localhost:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-09-01 13:47:39.108 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Failing OffsetCommit request since the consumer is not part of an active group
2021-09-01 13:47:39.109 ERROR 13195 --- [o69016372-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:200) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:112) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1602) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1210) ~[spring-kafka-2.7.6.jar:2.7.6]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1495) ~[kafka-clients-2.7.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2710) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2705) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2691) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2489) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1235) ~[spring-kafka-2.7.6.jar:2.7.6]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1161) ~[spring-kafka-2.7.6.jar:2.7.6]
... 3 common frames omitted
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-so69016372-1, groupId=so69016372] Lost previously assigned partitions so69016372-0
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions lost: [so69016372-0]
2021-09-01 13:47:39.109 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions revoked: [so69016372-0]
...
2021-09-01 13:47:39.217 INFO 13195 --- [o69016372-0-C-1] o.s.k.l.KafkaMessageListenerContainer : so69016372: partitions assigned: [so69016372-0]
foo #0
It will retry indefinitely.
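As the broker log itself suggests, the way out of that loop is to make processing fit inside the poll interval: increase max.poll.interval.ms or reduce max.poll.records. A minimal sketch of the extra consumer properties (the values shown are illustrative assumptions, not recommendations):

// give each poll's worth of records up to 10 minutes of processing time
// before the broker considers the consumer dead and kicks it out of the group
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000);
// fetch a single record per poll so one slow record cannot starve the next poll
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);

These would go in the same props map that the question's consumerFactory() already builds.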
I'm using Flume to get data from Kafka to HDFS (Kafka source and HDFS sink). These are the versions I'm using:
hadoop-3.2.2
flume-1.9.0
kafka_2.11-0.10.1.0
This is my kafka-fluem-hdfs.conf:
a1.sources=r1 r2
a1.channels=c1 c2
a1.sinks=k1 k2
## source1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers=h01:9092,h02:9092,h03:9092
a1.sources.r1.kafka.topics=topic_start
## source2
a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r2.batchSize = 5000
a1.sources.r2.batchDurationMillis = 2000
a1.sources.r2.kafka.bootstrap.servers=h01:9092,h02:9092,h03:9092
a1.sources.r2.kafka.topics=topic_event
## channel1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir=/usr/local/flume/checkpoint/behavior1
a1.channels.c1.dataDirs = /usr/local/flume/data/behavior1/
a1.channels.c1.keep-alive = 6
## channel2
a1.channels.c2.type = file
a1.channels.c2.checkpointDir=/usr/local/flume/checkpoint/behavior2
a1.channels.c2.dataDirs = /usr/local/flume/data/behavior2/
a1.channels.c2.keep-alive = 6
## sink1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_start/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = logstart-
##sink2
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = /origin_data/gmall/log/topic_event/%Y-%m-%d
a1.sinks.k2.hdfs.filePrefix = logevent-
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.rollSize = 134217728
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k2.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
a1.sinks.k2.hdfs.codeC = gzip
#a1.sinks.k1.hdfs.codeC=com.hadoop.compression.lzo.LzopCodec
#a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.callTimeout=360000
#a1.sinks.k1.hdfs.maxIoWorkers=32
#a1.sinks.k1.hdfs.fileSuffix=.lzo
#a1.sinks.k2.hdfs.codeC=com.hadoop.compression.lzo.LzopCodec
#a1.sinks.k2.hdfs.writeFormat=Text
a1.sinks.k2.hdfs.callTimeout=360000
#a1.sinks.k2.hdfs.maxIoWorkers=32
#a1.sinks.k2.hdfs.fileSuffix=.lzo
a1.sources.r1.channels = c1
a1.sinks.k1.channel= c1
a1.sources.r2.channels = c2
a1.sinks.k2.channel= c2
Part of the log file:
2021-08-19 15:37:39,308 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:40,476 INFO [hdfs-k1-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:37:40,509 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,516 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,525 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:40,522 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:40,858 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,889 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas
My problem:
Flume logs Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp, but there is no such file in HDFS.
More logs after I start Flume:
....
....
....
2021-08-19 15:30:01,748 INFO [lifecycleSupervisor-1-0] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 0
2021-08-19 15:30:01,754 INFO [lifecycleSupervisor-1-1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387001047, queueSize: 0, queueHead: 5765
2021-08-19 15:30:01,758 INFO [lifecycleSupervisor-1-0] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387001048, queueSize: 0, queueHead: 5778
2021-08-19 15:30:01,783 INFO [lifecycleSupervisor-1-0] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-26 position: 0 logWriteOrderID: 1629387001048
2021-08-19 15:30:01,783 INFO [lifecycleSupervisor-1-0] file.FileChannel (FileChannel.java:start(289)) - Queue Size after replay: 0 [channel=c1]
2021-08-19 15:30:01,784 INFO [lifecycleSupervisor-1-1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-26 position: 0 logWriteOrderID: 1629387001047
2021-08-19 15:30:01,787 INFO [lifecycleSupervisor-1-1] file.FileChannel (FileChannel.java:start(289)) - Queue Size after replay: 0 [channel=c2]
2021-08-19 15:30:01,789 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(196)) - Starting Sink k1
2021-08-19 15:30:01,795 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
2021-08-19 15:30:01,795 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SINK, name: k1 started
2021-08-19 15:30:01,797 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(196)) - Starting Sink k2
2021-08-19 15:30:01,798 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(207)) - Starting Source r2
2021-08-19 15:30:01,799 INFO [lifecycleSupervisor-1-5] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SINK, name: k2: Successfully registered new MBean.
2021-08-19 15:30:01,803 INFO [lifecycleSupervisor-1-5] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SINK, name: k2 started
2021-08-19 15:30:01,799 INFO [lifecycleSupervisor-1-6] kafka.KafkaSource (KafkaSource.java:doStart(524)) - Starting org.apache.flume.source.kafka.KafkaSource{name:r2,state:IDLE}...
2021-08-19 15:30:01,815 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(207)) - Starting Source r1
2021-08-19 15:30:01,818 INFO [lifecycleSupervisor-1-0] kafka.KafkaSource (KafkaSource.java:doStart(524)) - Starting org.apache.flume.source.kafka.KafkaSource{name:r1,state:IDLE}...
2021-08-19 15:30:01,918 INFO [lifecycleSupervisor-1-6] consumer.ConsumerConfig (AbstractConfig.java:logAll(279)) - ConsumerConfig values:
......
.......
.......
2021-08-19 15:30:01,926 INFO [lifecycleSupervisor-1-0] consumer.ConsumerConfig (AbstractConfig.java:logAll(279)) - ConsumerConfig values:
.....
......
......
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-0] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r2: Successfully registered new MBean.
2021-08-19 15:30:02,211 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r2 started
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-0] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:30:02,213 INFO [lifecycleSupervisor-1-0] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r1 started.
2021-08-19 15:30:02,214 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
2021-08-19 15:30:02,214 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r1 started
2021-08-19 15:30:02,726 INFO [PollableSourceRunner-KafkaSource-r1] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:30:02,730 INFO [PollableSourceRunner-KafkaSource-r2] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:30:02,740 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-1, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:30:02,747 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-2, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:30:02,748 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-1, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:30:02,770 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-1, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,776 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-2, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:30:02,776 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,845 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,935 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-1, groupId=flume] Successfully joined group with generation 66
2021-08-19 15:30:02,936 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-1, groupId=flume] Setting newly assigned partitions [topic_event-0]
2021-08-19 15:30:02,936 INFO [PollableSourceRunner-KafkaSource-r2] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_event - partition 0 assigned.
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-2, groupId=flume] Successfully joined group with generation 66
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-2, groupId=flume] Setting newly assigned partitions [topic_start-0]
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_start - partition 0 assigned.
2021-08-19 15:30:04,912 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:30:04,912 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:30:04,984 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp
2021-08-19 15:30:06,577 INFO [hdfs-k2-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:30:06,606 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:30:06,648 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:30:06,665 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp
2021-08-19 15:30:06,675 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:30:06,916 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:30:06,927 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:30:06,931 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:30:16,676 INFO [hdfs-k2-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:30:16,676 INFO [hdfs-k2-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp
2021-08-19 15:30:16,682 INFO [hdfs-k2-call-runner-2] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp to /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz
2021-08-19 15:30:16,932 INFO [hdfs-k1-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:30:16,932 INFO [hdfs-k1-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp
2021-08-19 15:30:16,934 INFO [hdfs-k1-call-runner-2] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp to /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz
2021-08-19 15:30:30,932 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior2/checkpoint, elements to sync = 970
2021-08-19 15:30:30,936 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 967
2021-08-19 15:30:30,951 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387004945, queueSize: 0, queueHead: 6733
2021-08-19 15:30:30,953 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387004946, queueSize: 0, queueHead: 6743
2021-08-19 15:30:30,963 INFO [Log-BackgroundWorker-c2] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-26 position: 1147366 logWriteOrderID: 1629387004945
2021-08-19 15:30:30,964 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-20
2021-08-19 15:30:30,967 INFO [Log-BackgroundWorker-c1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-26 position: 487027 logWriteOrderID: 1629387004946
.....
.....
.....
2021-08-19 15:36:19,570 INFO [lifecycleSupervisor-1-8] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:36:19,570 INFO [lifecycleSupervisor-1-8] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-8] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r1 started.
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r2 started.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-8] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r2: Successfully registered new MBean.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-8] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r1 started
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r2 started
2021-08-19 15:36:20,012 INFO [PollableSourceRunner-KafkaSource-r2] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:36:20,015 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-1, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:36:20,025 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-1, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:36:20,027 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-1, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,030 INFO [PollableSourceRunner-KafkaSource-r1] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:36:20,034 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-2, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:36:20,039 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-2, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:36:20,039 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,068 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,152 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-1, groupId=flume] Successfully joined group with generation 69
2021-08-19 15:36:20,153 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-1, groupId=flume] Setting newly assigned partitions [topic_event-0]
2021-08-19 15:36:20,154 INFO [PollableSourceRunner-KafkaSource-r2] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_event - partition 0 assigned.
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-2, groupId=flume] Successfully joined group with generation 69
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-2, groupId=flume] Setting newly assigned partitions [topic_start-0]
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_start - partition 0 assigned.
2021-08-19 15:37:39,286 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:37:39,286 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:37:39,308 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:40,476 INFO [hdfs-k1-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:37:40,509 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,516 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,525 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:40,522 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:40,858 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,889 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,889 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:48,532 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior2/checkpoint, elements to sync = 949
2021-08-19 15:37:48,533 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 1002
2021-08-19 15:37:48,562 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387382580, queueSize: 0, queueHead: 7743
2021-08-19 15:37:48,562 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387382581, queueSize: 0, queueHead: 7680
2021-08-19 15:37:48,570 INFO [Log-BackgroundWorker-c1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-27 position: 504908 logWriteOrderID: 1629387382580
2021-08-19 15:37:48,571 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-21
2021-08-19 15:37:48,578 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-22
2021-08-19 15:37:48,578 INFO [Log-BackgroundWorker-c2] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-27 position: 1144058 logWriteOrderID: 1629387382581
2021-08-19 15:37:48,581 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-20
2021-08-19 15:37:48,585 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-23
2021-08-19 15:37:48,587 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-21
2021-08-19 15:37:48,591 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-24
2021-08-19 15:37:48,593 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-22
2021-08-19 15:37:48,597 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-25
2021-08-19 15:37:48,600 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-23
2021-08-19 15:37:48,606 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-24
2021-08-19 15:37:48,612 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-25
2021-08-19 15:37:50,526 INFO [hdfs-k1-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:37:50,526 INFO [hdfs-k1-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:50,694 INFO [hdfs-k1-call-runner-6] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp to /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz
2021-08-19 15:37:50,890 INFO [hdfs-k2-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:37:50,890 INFO [hdfs-k2-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:50,893 INFO [hdfs-k2-call-runner-3] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp to /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz
.......
.......
.......
I'm trying to test my exactly-once configuration to make sure all the configs I set are correct and the behavior is as I expect, but I seem to encounter a problem with duplicate sends.
public static void main(String[] args) {
    MessageProducer producer = new ProducerBuilder()
            .setBootstrapServers("kafka:9992")
            .setKeySerializerClass(StringSerializer.class)
            .setValueSerializerClass(StringSerializer.class)
            .setProducerEnableIdempotence(true).build();

    MessageConsumer consumer = new ConsumerBuilder()
            .setBootstrapServers("kafka:9992")
            .setIsolationLevel("read_committed")
            .setTopics("someTopic2")
            .setGroupId("bla")
            .setKeyDeserializerClass(StringDeserializer.class)
            .setValueDeserializerClass(MapDeserializer.class)
            .setConsumerMessageLogic(new ConsumerMessageLogic() {
                @Override
                public void onMessage(ConsumerRecord cr, Acknowledgment acknowledgment) {
                    producer.sendMessage(new TopicPartition("someTopic2", cr.partition()),
                            new OffsetAndMetadata(cr.offset() + 1), "something1", "im in transaction", cr.key());
                    acknowledgment.acknowledge();
                }
            }).build();

    consumer.start();
}
this is my "test", you can assume the builder puts the right configuration.
ConsumerMessageLogic is a class that handles the "process" part of the read-process-write that the exactly once semantic is supporting
inside the producer class i have a send message method like so:
public void sendMessage(TopicPartition topicPartition, OffsetAndMetadata offsetAndMetadata, String sendToTopic, V message, PK partitionKey) {
    try {
        KafkaRecord<PK, V> partitionAndMessagePair = producerMessageLogic.prepareMessage(topicPartition.topic(), partitionKey, message);
        if (kafkaTemplate.getProducerFactory().transactionCapable()) {
            kafkaTemplate.executeInTransaction(operations -> {
                sendMessage(message, partitionKey, sendToTopic, partitionAndMessagePair, operations);
                operations.sendOffsetsToTransaction(
                        Map.of(topicPartition, offsetAndMetadata), "bla");
                return true;
            });
        } else {
            sendMessage(message, partitionKey, topicPartition.topic(), partitionAndMessagePair, kafkaTemplate);
        }
    } catch (Exception e) {
        failureHandler.onFailure(partitionKey, message, e);
    }
}
I create my consumer like so:
/**
 * Start the message consumer
 * Each record will be delegated to onMessage()
 */
public void start() {
    initConsumerMessageListenerContainer();
    container.start();
}

/**
 * Initialize the kafka message listener
 */
private void initConsumerMessageListenerContainer() {
    // start an acknowledging message listener to allow manual commits
    messageListener = consumerMessageLogic::onMessage;
    // create and initialize the consumer container
    container = initContainer(messageListener);
    // sets the number of consumers; the topic partitions will be divided among them
    container.setConcurrency(springConcurrency);
    springContainerPollTimeoutOpt.ifPresent(p -> container.getContainerProperties().setPollTimeout(p));
    if (springAckMode != null) {
        container.getContainerProperties().setAckMode(springAckMode);
    }
}

private ConcurrentMessageListenerContainer<PK, V> initContainer(AcknowledgingMessageListener<PK, V> messageListener) {
    return new ConcurrentMessageListenerContainer<>(
            consumerFactory(props),
            containerProperties(messageListener));
}
When I create my producer, I create it with a UUID as the transaction prefix, like so:
public ProducerFactory<PK, V> producerFactory(boolean isTransactional) {
ProducerFactory<PK, V> res = new DefaultKafkaProducerFactory<>(props);
if(isTransactional){
((DefaultKafkaProducerFactory<PK, V>) res).setTransactionIdPrefix(UUID.randomUUID().toString());
((DefaultKafkaProducerFactory<PK, V>) res).setProducerPerConsumerPartition(true);
}
return res;
}
Now, after everything is set up, I bring up 2 instances on a topic with 2 partitions; each instance gets 1 partition of the consumed topic.
I send a message and wait in debug for the transaction timeout (to simulate loss of connection) in instance A. Once the timeout passes, the other instance (instance B) automatically processes the record and sends it to the target topic, because a rebalance occurred.
So far so good.
Now when I release the breakpoint on instance A, it says it's rebalancing and couldn't commit, but I still see another output record in my destination topic.
My expectation was that instance A would not continue its work once I released the breakpoint, as the record was already processed.
Am I doing something wrong?
Can this scenario be achieved?
Edit 2:
After Gary's remarks about executeInTransaction: I get the duplicate record if I freeze one of the instances until the timeout and release it after the other instance has processed the record; the frozen instance then processes and produces the same record to the output topic...
public static void main(String[] args) {
    MessageProducer producer = new ProducerBuilder()
            .setBootstrapServers("kafka:9992")
            .setKeySerializerClass(StringSerializer.class)
            .setValueSerializerClass(StringSerializer.class)
            .setProducerEnableIdempotence(true).build();

    MessageConsumer consumer = new ConsumerBuilder()
            .setBootstrapServers("kafka:9992")
            .setIsolationLevel("read_committed")
            .setTopics("someTopic2")
            .setGroupId("bla")
            .setKeyDeserializerClass(StringDeserializer.class)
            .setValueDeserializerClass(MapDeserializer.class)
            .setConsumerMessageLogic(new ConsumerMessageLogic() {
                @Override
                public void onMessage(ConsumerRecord cr, Acknowledgment acknowledgment) {
                    producer.sendMessage("something1", "im in transaction");
                }
            }).build();

    consumer.start(producer.getProducerFactory());
}
The new sendMessage method in the producer, without executeInTransaction:
public void sendMessage(V message, PK partitionKey, String topicName) {
    try {
        KafkaRecord<PK, V> partitionAndMessagePair = producerMessageLogic.prepareMessage(topicName, partitionKey, message);
        sendMessage(message, partitionKey, topicName, partitionAndMessagePair, kafkaTemplate);
    } catch (Exception e) {
        failureHandler.onFailure(partitionKey, message, e);
    }
}
I also changed the consumer container creation to use a transaction manager with the same producer factory, as suggested:
/**
 * Initialize the kafka message listener
 */
private void initConsumerMessageListenerContainer(ProducerFactory<PK, V> producerFactory) {
    // start an acknowledging message listener to allow manual commits
    acknowledgingMessageListener = consumerMessageLogic::onMessage;
    // create and initialize the consumer container
    container = initContainer(acknowledgingMessageListener, producerFactory);
    // sets the number of consumers; the topic partitions will be divided among them
    container.setConcurrency(springConcurrency);
    springContainerPollTimeoutOpt.ifPresent(p -> container.getContainerProperties().setPollTimeout(p));
    if (springAckMode != null) {
        container.getContainerProperties().setAckMode(springAckMode);
    }
}

private ConcurrentMessageListenerContainer<PK, V> initContainer(AcknowledgingMessageListener<PK, V> messageListener, ProducerFactory<PK, V> producerFactory) {
    return new ConcurrentMessageListenerContainer<>(
            consumerFactory(props),
            containerProperties(messageListener, producerFactory));
}

@NonNull
private ContainerProperties containerProperties(MessageListener<PK, V> messageListener, ProducerFactory<PK, V> producerFactory) {
    ContainerProperties containerProperties = new ContainerProperties(topics);
    containerProperties.setMessageListener(messageListener);
    containerProperties.setTransactionManager(new KafkaTransactionManager<>(producerFactory));
    return containerProperties;
}
My expectation is that once the broker receives the processed record from the frozen instance, it will know that the record was already handled by another instance, since it contains the exact same metadata (or does it? The PID will be different, but should it be?).
Maybe the scenario I'm looking for is not even supported by the exactly-once support Kafka and Spring currently provide...
If I have 2 instances of read-process-write, that means I have 2 producers with 2 different PIDs.
Now when I freeze one of the instances, and the unfrozen instance takes over responsibility for processing the record due to a rebalance, it sends the record with its own PID and a sequence number in the metadata.
Now when I release the frozen instance, it sends the same record, but with its own PID, so there's no way the broker will know it's a duplicate...
Am I wrong? How can I avoid this scenario? I thought the rebalance stops the instance and doesn't let it complete its processing (where it produces the duplicate record), because it no longer has responsibility for that record.
Adding the logs:
Frozen instance (you can see the freeze time at 10:53:34; I released it at 10:54:02; the rebalance time is 10 seconds):
2020-06-16 10:53:34,393 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] Created new Producer: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906]
2020-06-16 10:53:34,394 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906] beginTransaction()
2020-06-16 10:53:34,395 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.doBegin:149] Created Kafka transaction on producer [CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906]]
2020-06-16 10:54:02,157 INFO [${sys:spring.application.name}] [kafka-coordinator-heartbeat-thread | bla] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Group coordinator X.X.X.X:9992 (id: 2147482646 rack: null) is unavailable or invalid, will attempt rediscovery
2020-06-16 10:54:02,181 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
2020-06-16 10:54:02,189 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] [i.i.k.s.p.SimpleSuccessHandler.:] Sent message=[im in transaction] with offset=[252] to topic something1
2020-06-16 10:54:02,193 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] [o.a.k.c.p.i.TransactionManager.:] [Producer clientId=producer-b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0, transactionalId=b76e8aba-8149-48f8-857b-a19195f5a20abla.someTopic2.0] Discovered group coordinator X.X.X.X:9992 (id: 1001 rack: null)
2020-06-16 10:54:02,263 INFO [${sys:spring.application.name}] [kafka-coordinator-heartbeat-thread | bla] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Discovered group coordinator 192.168.144.1:9992 (id: 2147482646 rack: null)
2020-06-16 10:54:02,295 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.processCommit:740] Initiating transaction commit
2020-06-16 10:54:02,296 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906] commitTransaction()
2020-06-16 10:54:02,299 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
2020-06-16 10:54:02,301 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Attempt to heartbeat failed for since member id consumer-bla-1-b3ad1c09-ad06-4bc4-a891-47a2288a830f is not valid.
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.ConsumerCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.a.k.c.c.i.ConsumerCoordinator.:] [Consumer clientId=consumer-bla-1, groupId=bla] Lost previously assigned partitions someTopic2-0
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.ConcurrentMessageListenerContainer.info:279] bla: partitions lost: [someTopic2-0]
2020-06-16 10:54:02,303 INFO [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.ConcurrentMessageListenerContainer.info:279] bla: partitions revoked: [someTopic2-0]
2020-06-16 10:54:02,303 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
The regular instance that takes over the partition and produces the record after a rebalance:
2020-06-16 10:53:46,536 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] Created new Producer: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153]
2020-06-16 10:53:46,537 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153] beginTransaction()
2020-06-16 10:53:46,539 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.doBegin:149] Created Kafka transaction on producer [CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153]]
2020-06-16 10:53:46,556 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
2020-06-16 10:53:46,563 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0] [i.i.k.s.p.SimpleSuccessHandler.:] Sent message=[im in transaction] with offset=[250] to topic something1
2020-06-16 10:53:46,566 INFO [${sys:spring.application.name}] [kafka-producer-network-thread | producer-1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0] [o.a.k.c.p.i.TransactionManager.:] [Producer clientId=producer-1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0, transactionalId=1d8e74d3-8986-4458-89b7-6d3e5756e213bla.someTopic2.0] Discovered group coordinator X.X.X.X:9992 (id: 1001 rack: null)
2020-06-16 10:53:46,668 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.processCommit:740] Initiating transaction commit
2020-06-16 10:53:46,669 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296] CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153] commitTransaction()
2020-06-16 10:53:46,672 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Commit list: {}
2020-06-16 10:53:51,673 DEBUG [${sys:spring.application.name}] [consumer-0-C-1] [o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296] Received: 0 records
I noticed they both record the exact same offset to commit:
Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
I thought that when they both try to commit the exact same offsets, the broker would abort one of the transactions...
I also noticed that if I reduce transaction.timeout.ms to just 2 seconds, it doesn't abort the transaction no matter how long I freeze the instance in the debugger...
Maybe the transaction.timeout.ms timer starts only after I send the message?
You must not use executeInTransaction at all - see its Javadocs; it is used when there is no active transaction or if you explicitly don't want an operation to participate in an existing transaction.
You need to add a KafkaTransactionManager to the listener container; it must have a reference to the same ProducerFactory as the template.
Then, the container will start the transaction and, if successful, send the offset to the transaction.
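As a hedged sketch of that wiring (the configuration class name, bean names, and String/String generic types here are assumptions for illustration, not from the original post):
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.transaction.KafkaTransactionManager;

@Configuration
public class ListenerTxConfig {

    @Bean
    public KafkaTransactionManager<String, String> kafkaTransactionManager(ProducerFactory<String, String> producerFactory) {
        // Must wrap the SAME ProducerFactory instance that the KafkaTemplate uses.
        return new KafkaTransactionManager<>(producerFactory);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory,
            KafkaTransactionManager<String, String> kafkaTransactionManager) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // The container starts the transaction before invoking the listener
        // and sends the consumed offsets to it before committing.
        factory.getContainerProperties().setTransactionManager(kafkaTransactionManager);
        return factory;
    }

}
With this in place, the listener just calls template.send(...) directly (no executeInTransaction); the send and the offset commit both participate in the transaction the container started.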
My spring-kafka consumer stops consuming messages after a while. The stoppage happens every time, but never after the same duration. When the app is no longer consuming, at the end of the log I always see the statement that the consumer sent a LEAVE_GROUP signal. If I am not seeing any errors or exceptions, why is the consumer leaving the group?
org.springframework.boot:spring-boot-starter-parent:2.0.4.RELEASE
spring-kafka:2.1.8.RELEASE
org.apache.kafka:kafka-clients:1.0.2
I've set logging as
logging.level.org.apache.kafka=DEBUG
logging.level.org.springframework.kafka=INFO
Other settings:
spring.kafka.listener.concurrency=5
spring.kafka.listener.type=single
spring.kafka.listener.ack-mode=record
spring.kafka.listener.poll-timeout=10000
spring.kafka.consumer.heartbeat-interval=5000
spring.kafka.consumer.max-poll-records=50
spring.kafka.consumer.fetch-max-wait=10000
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.properties.security.protocol=SSL
spring.kafka.consumer.retry.maxAttempts=3
spring.kafka.consumer.retry.backoffperiod.millisecs=2000
ContainerFactory setup
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> recordsKafkaListenerContainerFactory(RetryTemplate retryTemplate) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setConcurrency(listenerCount);
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.RECORD);
    factory.getContainerProperties().setPollTimeout(pollTimeoutMillis);
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    factory.getContainerProperties().setAckOnError(false);
    factory.setRetryTemplate(retryTemplate);
    factory.setStatefulRetry(true);
    factory.getContainerProperties().setIdleEventInterval(60000L);
    return factory;
}
Listener configuration
@Component
public class RecordsEventListener implements ConsumerSeekAware {

    private static final org.slf4j.Logger LOG = org.slf4j.LoggerFactory.getLogger(RecordsEventListener.class);

    // These fields are referenced below; their declarations were missing from the original snippet.
    private volatile boolean isReplay;

    private volatile boolean partitonSeekToBeginningDone;

    private BinaryExceptionClassifier binaryExceptionClassifier;

    @Value("${mode.replay:false}")
    public void setModeReplay(boolean enabled) {
        this.isReplay = enabled;
    }

    @KafkaListener(topics = "${event.topic}", containerFactory = "recordsKafkaListenerContainerFactory")
    public void handleEvent(@Payload String payload) throws RecordsEventListenerException {
        try {
            //business logic
        } catch (Exception e) {
            LOG.error("Process error for event: {}", payload, e);
            if (isRetryableException(e)) {
                LOG.warn("Retryable exception detected. Going to retry.");
                throw new RecordsEventListenerException(e);
            } else {
                LOG.warn("Dropping event because of a non-retryable exception");
            }
        }
    }

    private Boolean isRetryableException(Exception e) {
        return binaryExceptionClassifier.classify(e);
    }

    @Override
    public void registerSeekCallback(ConsumerSeekCallback callback) {
        //do nothing
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        //do this only once per start of app
        if (isReplay && !partitonSeekToBeginningDone) {
            assignments.forEach((t, p) -> callback.seekToBeginning(t.topic(), t.partition()));
            partitonSeekToBeginningDone = true;
        }
    }

    @Override
    public void onIdleContainer(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        LOG.info("Container is IDLE; no messages to pull.");
        assignments.forEach((t, p) -> LOG.info("Topic:{}, Partition:{}, Offset:{}", t.topic(), t.partition(), p));
    }

    boolean isPartitionSeekToBeginningDone() {
        return partitonSeekToBeginningDone;
    }

    void setPartitonSeekToBeginningDone(boolean partitonSeekToBeginningDone) {
        this.partitonSeekToBeginningDone = partitonSeekToBeginningDone;
    }

}
When the app is no longer consuming, at the end of the log I always see the statement that the consumer sent a LEAVE_GROUP signal:
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5347 to node 2147482638
2019-05-02 18:31:05.872 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:10.856 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:10.857 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5348 to node 2147482638
2019-05-02 18:31:10.958 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending LeaveGroup request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send LEAVE_GROUP {group_id=app,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5349 to node 2147482638
2019-05-02 18:31:11.768 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Disabling heartbeat thread
Full log
Thanks to all who replied. It turns out it was indeed the broker dropping the consumer on session timeout. The broker, a very old version (0.10.0.1), did not support the newer features outlined in KIP-62 that the spring-kafka version we used could take advantage of.
Since we could not dictate an upgrade to the broker or changes to the session timeout, we simply modified our processing logic so as to finish the task within the session timeout.
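For context, a minimal sketch of the knobs involved, assuming the thread's conclusion that the old broker ties group liveness to the session timeout (the class name and property values here are illustrative, not from the original post):
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;

public class SessionTimeoutTuning {

    // Values are illustrative only.
    public static Map<String, Object> consumerProps() {
        Map<String, Object> props = new HashMap<>();
        // With an old broker like 0.10.0.1, the coordinator evicts a consumer
        // that does not appear alive within session.timeout.ms.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);
        // Fewer records per poll keeps each processing loop short enough to
        // stay inside that window.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 10);
        return props;
    }

}
The same idea applies when the configuration cannot be changed, as was the case here: make whatever work happens between polls cheap enough to complete within the session timeout.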
I am planning a Spring+Kafka Streams application that handles incoming messages and stores updated internal state as a result of these messages.
This state is predicted to reach ~500 MB per unique key (there are likely to be ~10k unique keys distributed across 2k partitions).
This state must generally be held in memory for effective operation of my application, but even on disk I would still face a similar problem (albeit just at a later point of scaling).
I am planning to deploy this application into a dynamically scaling environment such as AWS and will set a minimum number of instances, but I am wary of 2 situations:
On first startup (where perhaps just 1 consumer starts first) it will not be able to handle taking assignment of all the partitions, because the in-memory state will overflow the instance's available memory.
After a major outage (AWS availability zone outage) it could be that 33% of consumers are taken out of the group, and the additional memory load on the remaining instances could actually take out everyone who remains.
How do people protect their consumers from taking on more partitions than they can handle such that they do not overflow available memory/disk?
See the Kafka documentation.
Since 0.11...
EDIT
For your second use case (and it also works for the first), perhaps you could implement a custom PartitionAssignor that limits the number of partitions assigned to each instance.
I haven't tried it; I don't know how the broker will react to the presence of unassigned partitions.
EDIT2
This seems to work ok; but YMMV...
public class NoMoreThanFiveAssignor extends RoundRobinAssignor {

    @Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
            Map<String, Subscription> subscriptions) {

        Map<String, List<TopicPartition>> assignments = super.assign(partitionsPerTopic, subscriptions);
        assignments.forEach((memberId, assigned) -> {
            if (assigned.size() > 5) {
                System.out.println("Reducing assignments from " + assigned.size() + " to 5 for " + memberId);
                assignments.put(memberId,
                        assigned.stream()
                            .limit(5)
                            .collect(Collectors.toList()));
            }
        });
        return assignments;
    }

}
and
@SpringBootApplication
public class So54072362Application {

    public static void main(String[] args) {
        SpringApplication.run(So54072362Application.class, args);
    }

    @Bean
    public NewTopic topic() {
        return new NewTopic("so54072362", 15, (short) 1);
    }

    @KafkaListener(id = "so54072362", topics = "so54072362")
    public void listen(ConsumerRecord<?, ?> record) {
        System.out.println(record);
    }

    @Bean
    public ApplicationRunner runner(KafkaTemplate<String, String> template) {
        return args -> {
            for (int i = 0; i < 15; i++) {
                template.send("so54072362", i, "foo", "bar");
            }
        };
    }

}
and
spring.kafka.consumer.properties.partition.assignment.strategy=com.example.NoMoreThanFiveAssignor
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.auto-offset-reset=earliest
and
Reducing assignments from 15 to 5 for consumer-2-f37221f8-70bb-421d-9faf-6591cc26a76a
2019-01-07 15:24:28.288 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 7
2019-01-07 15:24:28.289 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:28.296 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Attempt to heartbeat failed since group is rebalancing
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Revoking previously assigned partitions [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.304 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] (Re-)joining group
Reducing assignments from 8 to 5 for consumer-2-c9a6928a-520c-4646-9dd9-4da14636744b
Reducing assignments from 7 to 5 for consumer-2-f37221f8-70bb-421d-9faf-6591cc26a76a
2019-01-07 15:24:46.310 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 8
2019-01-07 15:24:46.311 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:46.315 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Attempt to heartbeat failed since group is rebalancing
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Revoking previously assigned partitions [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] (Re-)joining group
2019-01-07 15:24:58.330 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 9
2019-01-07 15:24:58.332 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-14, so54072362-11, so54072362-5, so54072362-8, so54072362-2]
2019-01-07 15:24:58.336 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-14, so54072362-11, so54072362-5, so54072362-8, so54072362-2]
Of course, this leaves the unassigned partitions dangling, but it sounds like that's what you want, until the region comes back online.