Flume logs "Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp", but there is no such file in HDFS

I'm using Flume to move data from Kafka to HDFS (Kafka source and HDFS sink). These are the versions I'm using:
hadoop-3.2.2
flume-1.9.0
kafka_2.11-0.10.1.0
This is my kafka-fluem-hdfs.conf:
a1.sources=r1 r2
a1.channels=c1 c2
a1.sinks=k1 k2
## source1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers=h01:9092,h02:9092,h03:9092
a1.sources.r1.kafka.topics=topic_start
## source2
a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r2.batchSize = 5000
a1.sources.r2.batchDurationMillis = 2000
a1.sources.r2.kafka.bootstrap.servers=h01:9092,h02:9092,h03:9092
a1.sources.r2.kafka.topics=topic_event
## channel1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir=/usr/local/flume/checkpoint/behavior1
a1.channels.c1.dataDirs = /usr/local/flume/data/behavior1/
a1.channels.c1.keep-alive = 6
## channel2
a1.channels.c2.type = file
a1.channels.c2.checkpointDir=/usr/local/flume/checkpoint/behavior2
a1.channels.c2.dataDirs = /usr/local/flume/data/behavior2/
a1.channels.c2.keep-alive = 6
## sink1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_start/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = logstart-
##sink2
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = /origin_data/gmall/log/topic_event/%Y-%m-%d
a1.sinks.k2.hdfs.filePrefix = logevent-
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.rollSize = 134217728
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k2.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
a1.sinks.k2.hdfs.codeC = gzip
#a1.sinks.k1.hdfs.codeC=com.hadoop.compression.lzo.LzopCodec
#a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.callTimeout=360000
#a1.sinks.k1.hdfs.maxIoWorkers=32
#a1.sinks.k1.hdfs.fileSuffix=.lzo
#a1.sinks.k2.hdfs.codeC=com.hadoop.compression.lzo.LzopCodec
#a1.sinks.k2.hdfs.writeFormat=Text
a1.sinks.k2.hdfs.callTimeout=360000
#a1.sinks.k2.hdfs.maxIoWorkers=32
#a1.sinks.k2.hdfs.fileSuffix=.lzo
a1.sources.r1.channels = c1
a1.sinks.k1.channel= c1
a1.sources.r2.channels = c2
a1.sinks.k2.channel= c2
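For reference, the roll settings above (rollInterval = 10, rollSize = 134217728, rollCount = 0) mean that each sink should close its in-use file after 10 seconds or 128 MiB, whichever comes first, with event-count rolling disabled (0 turns a trigger off). A minimal Java sketch of that decision follows. It is a simplified model, not Flume's actual implementation, and the class and method names are made up for illustration.

```java
import java.time.Duration;

/** Simplified model of the HDFS sink's roll triggers; not Flume's real code. */
final class RollPolicySketch {
    // Values copied from the config above; 0 disables a trigger.
    static final long ROLL_INTERVAL_SECONDS = 10;
    static final long ROLL_SIZE_BYTES = 134_217_728L; // 128 MiB
    static final long ROLL_COUNT_EVENTS = 0;          // disabled

    /** Returns true when any enabled trigger has been reached. */
    static boolean shouldRoll(Duration openFor, long bytesWritten, long eventsWritten) {
        boolean byTime = ROLL_INTERVAL_SECONDS > 0
                && openFor.getSeconds() >= ROLL_INTERVAL_SECONDS;
        boolean bySize = ROLL_SIZE_BYTES > 0 && bytesWritten >= ROLL_SIZE_BYTES;
        boolean byCount = ROLL_COUNT_EVENTS > 0 && eventsWritten >= ROLL_COUNT_EVENTS;
        return byTime || bySize || byCount;
    }

    public static void main(String[] args) {
        // A small file that has been open for 3 seconds is kept open.
        System.out.println(shouldRoll(Duration.ofSeconds(3), 500_000, 1_000));
        // The same file after 10 seconds is rolled by the time trigger.
        System.out.println(shouldRoll(Duration.ofSeconds(10), 500_000, 1_000));
    }
}
```

With these values, a low-volume topic would produce a new small file roughly every 10 seconds while events keep arriving.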
Part of the log file:
2021-08-19 15:37:39,308 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:40,476 INFO [hdfs-k1-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:37:40,509 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,516 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,525 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:40,522 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:40,858 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,889 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas
My problem:
The log says Flume is creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp, but there is no such file in HDFS.
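One thing that may be worth checking: the log excerpt reports fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer, which suggests the sink may be writing through Hadoop's checksummed local filesystem rather than HDFS. The hdfs.path values in the config carry no hdfs:// scheme, so they are resolved against fs.defaultFS, and that defaults to file:/// when the cluster's core-site.xml is not picked up by Flume. The Java sketch below only illustrates the scheme check with java.net.URI. The NameNode host and port are hypothetical, and the date directory is taken from the log because the raw %Y-%m-%d escapes are not valid URI syntax.

```java
import java.net.URI;

/** Illustrates whether a sink path names its filesystem explicitly. */
final class SinkPathSchemeSketch {
    /** Returns the URI scheme of the path, or null when none is given. */
    static String schemeOf(String path) {
        return URI.create(path).getScheme();
    }

    /** Describes how a Hadoop client would typically resolve the path. */
    static String describe(String path) {
        String scheme = schemeOf(path);
        if (scheme == null) {
            return path + " -> no scheme, resolved against fs.defaultFS";
        }
        return path + " -> explicit scheme '" + scheme + "'";
    }

    public static void main(String[] args) {
        // Path as it appears in the log (already expanded by the sink).
        System.out.println(describe("/origin_data/gmall/log/topic_start/2021-08-19"));
        // Hypothetical fully qualified form; host and port are assumptions.
        System.out.println(describe(
                "hdfs://namenode.example:8020/origin_data/gmall/log/topic_start/2021-08-19"));
    }
}
```

If this is what is happening, the files would show up under /origin_data/gmall/log/... on the local disk of the machine running the Flume agent.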
More logs from after I start Flume:
....
....
....
2021-08-19 15:30:01,748 INFO [lifecycleSupervisor-1-0] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 0
2021-08-19 15:30:01,754 INFO [lifecycleSupervisor-1-1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387001047, queueSize: 0, queueHead: 5765
2021-08-19 15:30:01,758 INFO [lifecycleSupervisor-1-0] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387001048, queueSize: 0, queueHead: 5778
2021-08-19 15:30:01,783 INFO [lifecycleSupervisor-1-0] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-26 position: 0 logWriteOrderID: 1629387001048
2021-08-19 15:30:01,783 INFO [lifecycleSupervisor-1-0] file.FileChannel (FileChannel.java:start(289)) - Queue Size after replay: 0 [channel=c1]
2021-08-19 15:30:01,784 INFO [lifecycleSupervisor-1-1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-26 position: 0 logWriteOrderID: 1629387001047
2021-08-19 15:30:01,787 INFO [lifecycleSupervisor-1-1] file.FileChannel (FileChannel.java:start(289)) - Queue Size after replay: 0 [channel=c2]
2021-08-19 15:30:01,789 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(196)) - Starting Sink k1
2021-08-19 15:30:01,795 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
2021-08-19 15:30:01,795 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SINK, name: k1 started
2021-08-19 15:30:01,797 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(196)) - Starting Sink k2
2021-08-19 15:30:01,798 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(207)) - Starting Source r2
2021-08-19 15:30:01,799 INFO [lifecycleSupervisor-1-5] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SINK, name: k2: Successfully registered new MBean.
2021-08-19 15:30:01,803 INFO [lifecycleSupervisor-1-5] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SINK, name: k2 started
2021-08-19 15:30:01,799 INFO [lifecycleSupervisor-1-6] kafka.KafkaSource (KafkaSource.java:doStart(524)) - Starting org.apache.flume.source.kafka.KafkaSource{name:r2,state:IDLE}...
2021-08-19 15:30:01,815 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(207)) - Starting Source r1
2021-08-19 15:30:01,818 INFO [lifecycleSupervisor-1-0] kafka.KafkaSource (KafkaSource.java:doStart(524)) - Starting org.apache.flume.source.kafka.KafkaSource{name:r1,state:IDLE}...
2021-08-19 15:30:01,918 INFO [lifecycleSupervisor-1-6] consumer.ConsumerConfig (AbstractConfig.java:logAll(279)) - ConsumerConfig values:
......
.......
.......
2021-08-19 15:30:01,926 INFO [lifecycleSupervisor-1-0] consumer.ConsumerConfig (AbstractConfig.java:logAll(279)) - ConsumerConfig values:
.....
......
......
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-0] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r2: Successfully registered new MBean.
2021-08-19 15:30:02,211 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r2 started
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-0] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:30:02,213 INFO [lifecycleSupervisor-1-0] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r1 started.
2021-08-19 15:30:02,214 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
2021-08-19 15:30:02,214 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r1 started
2021-08-19 15:30:02,726 INFO [PollableSourceRunner-KafkaSource-r1] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:30:02,730 INFO [PollableSourceRunner-KafkaSource-r2] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:30:02,740 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-1, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:30:02,747 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-2, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:30:02,748 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-1, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:30:02,770 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-1, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,776 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-2, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:30:02,776 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,845 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,935 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-1, groupId=flume] Successfully joined group with generation 66
2021-08-19 15:30:02,936 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-1, groupId=flume] Setting newly assigned partitions [topic_event-0]
2021-08-19 15:30:02,936 INFO [PollableSourceRunner-KafkaSource-r2] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_event - partition 0 assigned.
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-2, groupId=flume] Successfully joined group with generation 66
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-2, groupId=flume] Setting newly assigned partitions [topic_start-0]
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_start - partition 0 assigned.
2021-08-19 15:30:04,912 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:30:04,912 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:30:04,984 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp
2021-08-19 15:30:06,577 INFO [hdfs-k2-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:30:06,606 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:30:06,648 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:30:06,665 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp
2021-08-19 15:30:06,675 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:30:06,916 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:30:06,927 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:30:06,931 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:30:16,676 INFO [hdfs-k2-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:30:16,676 INFO [hdfs-k2-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp
2021-08-19 15:30:16,682 INFO [hdfs-k2-call-runner-2] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp to /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz
2021-08-19 15:30:16,932 INFO [hdfs-k1-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:30:16,932 INFO [hdfs-k1-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp
2021-08-19 15:30:16,934 INFO [hdfs-k1-call-runner-2] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp to /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz
2021-08-19 15:30:30,932 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior2/checkpoint, elements to sync = 970
2021-08-19 15:30:30,936 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 967
2021-08-19 15:30:30,951 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387004945, queueSize: 0, queueHead: 6733
2021-08-19 15:30:30,953 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387004946, queueSize: 0, queueHead: 6743
2021-08-19 15:30:30,963 INFO [Log-BackgroundWorker-c2] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-26 position: 1147366 logWriteOrderID: 1629387004945
2021-08-19 15:30:30,964 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-20
2021-08-19 15:30:30,967 INFO [Log-BackgroundWorker-c1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-26 position: 487027 logWriteOrderID: 1629387004946
.....
.....
.....
2021-08-19 15:36:19,570 INFO [lifecycleSupervisor-1-8] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:36:19,570 INFO [lifecycleSupervisor-1-8] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-8] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r1 started.
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r2 started.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-8] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r2: Successfully registered new MBean.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-8] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r1 started
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r2 started
2021-08-19 15:36:20,012 INFO [PollableSourceRunner-KafkaSource-r2] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:36:20,015 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-1, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:36:20,025 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-1, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:36:20,027 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-1, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,030 INFO [PollableSourceRunner-KafkaSource-r1] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:36:20,034 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-2, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:36:20,039 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-2, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:36:20,039 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,068 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,152 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-1, groupId=flume] Successfully joined group with generation 69
2021-08-19 15:36:20,153 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-1, groupId=flume] Setting newly assigned partitions [topic_event-0]
2021-08-19 15:36:20,154 INFO [PollableSourceRunner-KafkaSource-r2] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_event - partition 0 assigned.
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-2, groupId=flume] Successfully joined group with generation 69
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-2, groupId=flume] Setting newly assigned partitions [topic_start-0]
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_start - partition 0 assigned.
2021-08-19 15:37:39,286 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:37:39,286 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:37:39,308 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:40,476 INFO [hdfs-k1-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:37:40,509 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,516 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,525 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:40,522 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:40,858 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,889 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,889 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:48,532 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior2/checkpoint, elements to sync = 949
2021-08-19 15:37:48,533 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 1002
2021-08-19 15:37:48,562 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387382580, queueSize: 0, queueHead: 7743
2021-08-19 15:37:48,562 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387382581, queueSize: 0, queueHead: 7680
2021-08-19 15:37:48,570 INFO [Log-BackgroundWorker-c1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-27 position: 504908 logWriteOrderID: 1629387382580
2021-08-19 15:37:48,571 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-21
2021-08-19 15:37:48,578 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-22
2021-08-19 15:37:48,578 INFO [Log-BackgroundWorker-c2] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-27 position: 1144058 logWriteOrderID: 1629387382581
2021-08-19 15:37:48,581 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-20
2021-08-19 15:37:48,585 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-23
2021-08-19 15:37:48,587 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-21
2021-08-19 15:37:48,591 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-24
2021-08-19 15:37:48,593 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-22
2021-08-19 15:37:48,597 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-25
2021-08-19 15:37:48,600 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-23
2021-08-19 15:37:48,606 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-24
2021-08-19 15:37:48,612 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-25
2021-08-19 15:37:50,526 INFO [hdfs-k1-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:37:50,526 INFO [hdfs-k1-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:50,694 INFO [hdfs-k1-call-runner-6] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp to /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz
2021-08-19 15:37:50,890 INFO [hdfs-k2-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:37:50,890 INFO [hdfs-k2-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:50,893 INFO [hdfs-k2-call-runner-3] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp to /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz
.......
.......
.......
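The last entries in the log above show the in-use file being closed and renamed from .gz.tmp to .gz about ten seconds after it was created, which matches rollInterval = 10. A listing taken after that point would therefore show the .gz name, not the .gz.tmp name. The Java sketch below derives the final name and decodes the number embedded in the file name, which appears to be an epoch-millisecond value. The helper names are hypothetical, and the parsing assumes the prefix.number.extension.tmp layout seen in these logs.

```java
import java.time.Instant;

/** Helpers for reading the file names that appear in the log above. */
final class TmpFileNameSketch {
    private static final String IN_USE_SUFFIX = ".tmp"; // the sink's default in-use suffix

    /** Name the file has after the sink closes and renames it. */
    static String finalName(String inUsePath) {
        if (inUsePath.endsWith(IN_USE_SUFFIX)) {
            return inUsePath.substring(0, inUsePath.length() - IN_USE_SUFFIX.length());
        }
        return inUsePath;
    }

    /** Decodes the numeric part of the file name as epoch milliseconds. */
    static Instant createdAt(String path) {
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        // e.g. "logstart-.1629387459287.gz.tmp" -> ["logstart-", "1629387459287", "gz", "tmp"]
        String[] parts = fileName.split("\\.");
        return Instant.ofEpochMilli(Long.parseLong(parts[1]));
    }

    public static void main(String[] args) {
        String tmp = "/origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp";
        System.out.println(finalName(tmp));
        System.out.println(createdAt(tmp)); // UTC instant, close to the "Creating" log timestamp
    }
}
```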

Related

Stateful Kafka Stream process is losing state when moving tasks from one pod to another during the rebalancing process

Stateful Kafka Stream process is losing state when moving tasks from one pod to another during the rebalancing process.
When we kill the pod, it restarts, the task stays assigned to the same pod, and the process restarts correctly (no issues with this scenario).
When we scale down and the task is forced to move to another pod, we can see that Kafka Streams restored the changelog, but the restore did not take effect: the process created a new state with version 1 instead of 3992 and with CounterSize 1 instead of 2964 (this can be checked in the logs).
After checking the logs, we saw that the same key goes to different partitions in the changelog topic when the task is assigned to a new pod, as per the screenshots below (not sure whether this is an issue).
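As a general illustration of how one logical key can end up in different partitions (not a diagnosis of this application): a key-hashing partitioner works on the serialized key bytes, so the same customer id serialized in two different ways can map to different partitions. The Java sketch below uses Arrays.hashCode as a stand-in hash. Kafka's default partitioner uses murmur2, so the actual partition numbers would differ. The two encodings and the partition count are assumptions chosen for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Shows that partition choice depends on key bytes, not on the logical key. */
final class KeyPartitionSketch {
    /** The id encoded as UTF-8 text. */
    static byte[] stringKey(String id) {
        return id.getBytes(StandardCharsets.UTF_8);
    }

    /** The same id encoded as an 8-byte big-endian long. */
    static byte[] longKey(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    /** Stand-in for a key-hashing partitioner; NOT Kafka's murmur2. */
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 6; // hypothetical partition count
        byte[] asText = stringKey("1234567890");
        byte[] asLong = longKey(1234567890L);
        System.out.println("same bytes? " + Arrays.equals(asText, asLong));
        System.out.println("text key -> partition " + partitionFor(asText, numPartitions));
        System.out.println("long key -> partition " + partitionFor(asLong, numPartitions));
    }
}
```

Comparing the raw key bytes of the records seen in each partition would show whether something like this applies here.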
Details of what we are using:
Application name is customer-state.
We are using AWS MSK - Apache Kafka version 2.8.0 with 6 brokers
Application deployed as a StatefulSet on EKS/Kubernetes with 3 replicas
Java 11
Spring Cloud 2020.0.3
Kafka Streams 2.8.0
We applied a few configuration changes, but we cannot work out why it is not picking up the last state from the changelog.
spring.cloud.stream.kafka.streams.binder.configuration:
max.request.size: 5242892
max.partition.fetch.bytes: 5242892
max.fetch.bytes: 15728676
acceptable.recovery.lag: 0
num.standby.replicas: 1
num.stream.threads: 2
spring.kafka.streams.binder:
functions:
process.applicationId: customer-state-process
configuration:
group.instance.id: ${POD_NAME} => A Helm chart populates this variable with the pod name. Because the app is a StatefulSet, the names are always customer-state-0, customer-state-1, customer-state-2
session.timeout.ms: 30000
acceptable.recovery.lag: 0
What are the settings for the state-store?
@Bean
public StreamsBuilderFactoryBeanConfigurer streamsBuilderFactoryBeanCustomizer() {
    return factoryBean -> {
        try {
            final StreamsBuilder streamsBuilder = factoryBean.getObject();
            if (isNull(streamsBuilder)) {
                throw new NullPointerException("streamsBuilder is null");
            }
            // Customer's State initialization: a persistent key-value store keyed by customer id
            final PrimitiveAvroSerde<String> customerKeySerde = createKeySerde(factoryBean, true);
            final SpecificAvroSerde<StateAvro> valueSerde = createValueSerde(factoryBean);
            final StoreBuilder<KeyValueStore<String, StateAvro>> storeBuilder = Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore(STATE_STORE_NAME),
                    customerKeySerde,
                    valueSerde);
            streamsBuilder.addStateStore(storeBuilder);
        } catch (Exception e) {
            // Keep the original exception as the cause so the root failure is not lost
            throw new RuntimeException("Can't create State Store", e);
        }
    };
}
How is CounterSize implemented and saved to kafka?
It's a Java Map serialized to Avro, and then published to Kafka
The CounterSize printed in the logs is just the number of entries in that Map.
"type" : "record",
"name" : "StateAvro",
"namespace" : "com.dpml.avro.state",
"fields" : [ {
"name" : "version",
"type" : "long"
}, {
"name" : "DepositCounter",
"type" : [
"null",
"MetaCounterAvro"
],
"default": null
} ...
{
"type" : "record",
"name" : "MetaCounterAvro",
"namespace" : "com.dpml.avro.state",
"fields" : [ {
"name" : "entries",
"type" : {
"type" : "array",
"items" : "PairLongString",
"java-class" : "java.util.Map"
}
}, {
"name" : "hours",
"type" : "long"
} ]
}
This is the logs print:
log.debug("Successfully updated state for customerId={}, {}", getCustomerId(), createLogInfo(avro));
private String createLogInfo(StateAvro stateAvro) {
    // Number of entries in the deposit counter, or 0 when the counter is absent
    final var depositSize = Optional.ofNullable(stateAvro.getDepositCounter())
            .map(MetaCounterAvro::getEntries)
            .map(List::size)
            .orElse(0);
    return String.format("version=%d, eventTime=%d, triggerEvent=%s, updatedByEventId=%s, CounterSize=%s",
            stateAvro.getVersion(),
            stateAvro.getEventTime(),
            stateAvro.getTriggerEvent().name(),
            stateAvro.getUpdatedByEventId(),
            depositSize);
}
LOGS:
=> offset from incoming event partition=2 offset=74800529
2022-02-24T10:52:58.867Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=topic-input-name, offset=74800529 task=1_2 - Received event {"header": {"timestamp": 1645699978378, "eventId": "8f10dc2e-fcf2-4fcb-a312-0bba7170e16d", "customerId": 1234567890}
2022-02-24T10:52:58.868Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - State already exists for customerId=1234567890 version=3990, eventTime=1645699976221, updatedByEventId=21844581-5fd9-41da-b05d-e1cb719349a5, CounterSize=2962
2022-02-24T10:52:58.868Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699978378 eventId=8f10dc2e-fcf2-4fcb-a312-0bba7170e16d Technical info -> partition=2 offset=74800529 task=1_2
2022-02-24T10:52:58.869Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890 version=3991, eventTime=1645699978378, triggerEvent=CustomerEvent, updatedByEventId=8f10dc2e-fcf2-4fcb-a312-0bba7170e16d, CounterSize=2963
=> scaled down from 3 pods to 2 pods to force the rebalance
2022-02-24T10:52:59.011Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Attempt to heartbeat failed since group is rebalancing
2022-02-24T10:52:59.012Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] (Re-)joining group
2022-02-24T10:52:59.090Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Successfully joined group with generation Generation{generationId=192444, memberId='customer-state-0-1-6693263a-0054-46bb-b775-697741beb01a', protocol='stream'}
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Successfully synced group in generation Generation{generationId=192444, memberId='customer-state-0-1-6693263a-0054-46bb-b775-697741beb01a', protocol='stream'}
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Updating assignment with Assigned partitions: [topic-input-name-2, topic-input-name-0] Current owned partitions: [topic-input-name-2] Added partitions (assigned - owned): [topic-input-name-0] Revoked partitions (owned - assigned): []
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Notifying assignor about the new Assignment(partitions=[topic-input-name-0, topic-input-name-2], userDataSize=98)
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.StreamsPartitionAssignor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer] No followup rebalance was requested, resetting the rebalance schedule.
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.TaskManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Handle new assignment with: New active tasks: [1_0, 1_2] New standby tasks: [1_3] Existing active tasks: [1_2] Existing standby tasks: [1_1]
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_1] Suspended running
2022-02-24T10:52:59.150Z INFO o.a.k.c.c.KafkaConsumer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): customer-state-process-customer-state-store-changelog-2
2022-02-24T10:52:59.154Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_1] Closed clean
2022-02-24T10:52:59.155Z INFO i.c.k.s.KafkaAvroSerializerConfig [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - KafkaAvroSerializerConfig values: bearer.auth.token = [hidden] schema.registry.url = [https://cp-schema-registry.cp-schema-registry.svc.cluster.local:443] basic.auth.user.info = [hidden] auto.register.schemas = true max.schemas.per.subject = 1000 basic.auth.credentials.source = URL schema.registry.basic.auth.user.info = [hidden] bearer.auth.credentials.source = STATIC_TOKEN value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicRecordNameStrategy key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
2022-02-24T10:52:59.155Z INFO i.c.k.s.KafkaAvroDeserializerConfig [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - KafkaAvroDeserializerConfig values: bearer.auth.token = [hidden] schema.registry.url = [https://cp-schema-registry.cp-schema-registry.svc.cluster.local:443] basic.auth.user.info = [hidden] auto.register.schemas = true max.schemas.per.subject = 1000 basic.auth.credentials.source = URL schema.registry.basic.auth.user.info = [hidden] bearer.auth.credentials.source = STATIC_TOKEN specific.avro.reader = true value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicRecordNameStrategy key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
2022-02-24T10:52:59.157Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Adding newly assigned partitions: topic-input-name-0
2022-02-24T10:52:59.157Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] State transition from RUNNING to PARTITIONS_ASSIGNED
2022-02-24T10:52:59.157Z INFO o.a.k.s.KafkaStreams [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-client [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf] State transition from RUNNING to REBALANCING
2022-02-24T10:52:59.158Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Setting offset for partition topic-input-name-0 to the committed offset FetchPosition{offset=73628671, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 2 rack: euc1-az3)], epoch=141}}
2022-02-24T10:52:59.199Z INFO o.a.k.s.p.i.ProcessorStateManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] State store customer-state-store did not find checkpoint offset, hence would default to the starting offset at changelog customer-state-process-customer-state-store-changelog-0
2022-02-24T10:52:59.199Z INFO o.a.k.s.p.i.StreamTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] Initialized
2022-02-24T10:52:59.240Z INFO o.a.k.s.p.i.ProcessorStateManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_3] State store customer-state-store did not find checkpoint offset, hence would default to the starting offset at changelog customer-state-process-customer-state-store-changelog-3
2022-02-24T10:52:59.240Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_3] Initialized
2022-02-24T10:52:59.286Z INFO o.a.k.c.c.KafkaConsumer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): customer-state-process-customer-state-store-changelog-0, customer-state-process-customer-state-store-changelog-3, customer-state-process-customer-state-store-changelog-2
2022-02-24T10:52:59.286Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Seeking to EARLIEST offset of partition customer-state-process-customer-state-store-changelog-0
2022-02-24T10:52:59.287Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Seeking to EARLIEST offset of partition customer-state-process-customer-state-store-changelog-3
2022-02-24T10:52:59.471Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Resetting offset for partition customer-state-process-customer-state-store-changelog-3 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 2 rack: euc1-az3)], epoch=0}}.
2022-02-24T10:52:59.471Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Resetting offset for partition customer-state-process-customer-state-store-changelog-0 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-4.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 4 rack: euc1-az3)], epoch=0}}.
2022-02-24T10:53:09.480Z INFO o.a.k.s.p.i.StoreChangelogReader [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Restoration in progress for 1 partitions. {customer-state-process-customer-state-store-changelog-0: position=19011, end=42297, totalRestored=19011}
2022-02-24T10:53:11.769Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Processed 939 total records, ran 2 punctuators, and committed 4 total tasks since the last update
2022-02-24T10:53:14.105Z INFO o.a.k.s.p.i.StoreChangelogReader [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Finished restoring changelog customer-state-process-customer-state-store-changelog-0 to store customer-state-store with a total number of 42297 records
2022-02-24T10:53:14.106Z INFO c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Fetching state store in transformer
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] Restored and ready to run
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Restoration took 14995 ms for all tasks [1_0, 1_2, 1_3]
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2022-02-24T10:53:14.152Z INFO o.a.k.s.KafkaStreams [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-client [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf] State transition from REBALANCING to RUNNING
2022-02-24T10:53:14.162Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699978060, "eventId": "96d3e5cc-f667-4735-91b8-529374c00d82", "customerId": 1234567890 }
=> At this point the framework has finished restoring the changelog, so it should find the state of 2022-02-24T10:52:58.868Z, but it does not
=> the offset of the incoming event becomes -1
2022-02-24T10:53:14.162Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Creating new state for customerId=1234567890
2022-02-24T10:53:14.514Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699978060 eventId=96d3e5cc-f667-4735-91b8-529374c00d82 Technical info -> partition=-1 offset=-1 task=1_0
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890, version=1, eventTime=1645699978060, triggerEvent=CustomerEvent, updatedByEventId=96d3e5cc-f667-4735-91b8-529374c00d82, CounterSize=1
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699977697, "eventId": "42a9535c-ab90-41da-965c-ee4c5d871626", "customerId": 1234567890 }
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - State already exists for customerId=1234567890, version=1, eventTime=1645699978060, triggerEvent=CustomerEvent, updatedByEventId=96d3e5cc-f667-4735-91b8-529374c00d82, CounterSize=1
2022-02-24T10:53:14.515Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699977697 eventId=42a9535c-ab90-41da-965c-ee4c5d871626 Technical info -> partition=-1 offset=-1 task=1_0
2022-02-24T10:53:14.516Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890, version=2, eventTime=1645699977697, triggerEvent=CustomerEvent, updatedByEventId=42a9535c-ab90-41da-965c-ee4c5d871626, CounterSize=2
2022-02-24T10:53:14.516Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699975892, "eventId": "459dc2ec-7a69-47f1-8c3e-53a5e5f4303b", "customerId": 1234567890 }

Why does the log show DEBUG?

I've set the following configuration for Akka:
akka {
loggers = ["akka.event.slf4j.Slf4jLogger"]
loglevel = "ERROR"
logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
}
I expected this to show only ERROR logs, but it shows everything:
17:12:20.758 [SAP-SENDER-akka.kafka.default-dispatcher-6] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-1, groupId=SAP-SENDER-GROUP)] Initializing the Kafka consumer
17:12:20.806 [SAP-SENDER-akka.kafka.default-dispatcher-6] DEBUG org.apache.kafka.clients.Metadata - Updated cluster metadata version 1 to Cluster(id = null, nodes = [localhost:9092 (id: -1 rack: null)], partitions = [], controller = null)
17:12:20.811 [SAP-SENDER-akka.actor.default-dispatcher-2] ERROR SAP-SENDER - It is done Connection failed..
17:12:20.828 [SAP-SENDER-akka.kafka.default-dispatcher-6] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name fetch-throttle-time
17:12:20.845 [SAP-SENDER-akka.kafka.default-dispatcher-6] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name connections-closed:
17:12:20.846 [SAP-SENDER-akka.kafka.default-dispatcher-6] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name connections-created:
17:12:20.846 [SAP-SENDER-akka.kafka.default-dispatcher-6] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name successful-authentication:
Why does it show everything?

Kafka Streams - Consumer memory overload

I am planning a Spring+Kafka Streams application that handles incoming messages and stores updated internal state as a result of these messages.
This state is predicted to reach ~500 MB per unique key (there are likely to be ~10k unique keys distributed across 2k partitions).
This state must generally be held in-memory for effective operation of my application but even on disk I would still face a similar problem (albeit just at a later date of scaling).
I am planning to deploy this application into a dynamically scaling environment such as AWS and will set a minimum number of instances, but I am wary of 2 situations:
On first startup (where perhaps just 1 consumer starts first), it will not be able to take assignment of all the partitions, because the in-memory state will overflow the instance's available memory.
After a major outage (such as an AWS availability zone outage), 33% of consumers could be taken out of the group, and the additional memory load on the remaining instances could take out everyone who remains.
How do people protect their consumers from taking on more partitions than they can handle such that they do not overflow available memory/disk?
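To put rough numbers on the concern, here is a back-of-the-envelope sketch using the figures quoted above. It assumes keys are spread evenly across partitions, and the 64 GiB per-instance budget is a hypothetical value chosen only for illustration. The class name `StateSizingSketch` is likewise made up.

```java
/** Back-of-the-envelope sizing for the figures quoted in the question. */
final class StateSizingSketch {
    static final long MB_PER_KEY = 500;      // ~500 MB of state per unique key (from the question)
    static final long UNIQUE_KEYS = 10_000;  // ~10k unique keys (from the question)
    static final long PARTITIONS = 2_000;    // 2k partitions (from the question)

    static long totalStateMb() {
        return MB_PER_KEY * UNIQUE_KEYS;
    }

    /** Average state per partition, assuming keys spread evenly (an assumption). */
    static long stateMbPerPartition() {
        return totalStateMb() / PARTITIONS;
    }

    /** Largest number of partitions whose state fits in the given memory budget. */
    static long maxPartitionsPerInstance(long memoryBudgetMb) {
        return memoryBudgetMb / stateMbPerPartition();
    }

    /** Fewest instances that can hold every partition without exceeding the budget. */
    static long minInstances(long memoryBudgetMb) {
        long perInstance = maxPartitionsPerInstance(memoryBudgetMb);
        if (perInstance == 0) {
            throw new IllegalArgumentException("budget is smaller than one partition's state");
        }
        return (PARTITIONS + perInstance - 1) / perInstance; // ceiling division
    }

    public static void main(String[] args) {
        long budgetMb = 64L * 1024; // hypothetical: 64 GiB usable per instance
        System.out.println("total state (MB): " + totalStateMb());
        System.out.println("state per partition (MB): " + stateMbPerPartition());
        System.out.println("max partitions per instance: " + maxPartitionsPerInstance(budgetMb));
        System.out.println("min instances: " + minInstances(budgetMb));
    }
}
```

Under these assumptions the total state is about 5 TB, so a single instance cannot take every partition at startup, and any cap on partitions per instance has to be derived from the per-partition figure rather than from the instance count.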
See the kafka documentation.
Since 0.11...
EDIT
For your second use case (and it also works for the first), perhaps you could implement a custom PartitionAssignor that limits the number of partitions assigned to each instance.
I haven't tried it; I don't know how the broker will react to the presence of unassigned partitions.
EDIT2
This seems to work ok; but YMMV...
public class NoMoreThanFiveAssignor extends RoundRobinAssignor {

    @Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
            Map<String, Subscription> subscriptions) {
        Map<String, List<TopicPartition>> assignments = super.assign(partitionsPerTopic, subscriptions);
        assignments.forEach((memberId, assigned) -> {
            if (assigned.size() > 5) {
                System.out.println("Reducing assignments from " + assigned.size() + " to 5 for " + memberId);
                assignments.put(memberId,
                        assigned.stream()
                                .limit(5)
                                .collect(Collectors.toList()));
            }
        });
        return assignments;
    }

}
and
@SpringBootApplication
public class So54072362Application {

    public static void main(String[] args) {
        SpringApplication.run(So54072362Application.class, args);
    }

    @Bean
    public NewTopic topic() {
        return new NewTopic("so54072362", 15, (short) 1);
    }

    @KafkaListener(id = "so54072362", topics = "so54072362")
    public void listen(ConsumerRecord<?, ?> record) {
        System.out.println(record);
    }

    @Bean
    public ApplicationRunner runner(KafkaTemplate<String, String> template) {
        return args -> {
            for (int i = 0; i < 15; i++) {
                template.send("so54072362", i, "foo", "bar");
            }
        };
    }

}
and
spring.kafka.consumer.properties.partition.assignment.strategy=com.example.NoMoreThanFiveAssignor
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.auto-offset-reset=earliest
and
Reducing assignments from 15 to 5 for consumer-2-f37221f8-70bb-421d-9faf-6591cc26a76a
2019-01-07 15:24:28.288 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 7
2019-01-07 15:24:28.289 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:28.296 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Attempt to heartbeat failed since group is rebalancing
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Revoking previously assigned partitions [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.303 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [so54072362-0, so54072362-1, so54072362-2, so54072362-3, so54072362-4]
2019-01-07 15:24:46.304 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] (Re-)joining group
Reducing assignments from 8 to 5 for consumer-2-c9a6928a-520c-4646-9dd9-4da14636744b
Reducing assignments from 7 to 5 for consumer-2-f37221f8-70bb-421d-9faf-6591cc26a76a
2019-01-07 15:24:46.310 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 8
2019-01-07 15:24:46.311 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:46.315 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Attempt to heartbeat failed since group is rebalancing
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Revoking previously assigned partitions [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [so54072362-9, so54072362-5, so54072362-7, so54072362-1, so54072362-3]
2019-01-07 15:24:58.324 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] (Re-)joining group
2019-01-07 15:24:58.330 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Successfully joined group with generation 9
2019-01-07 15:24:58.332 INFO 23485 --- [o54072362-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=so54072362] Setting newly assigned partitions [so54072362-14, so54072362-11, so54072362-5, so54072362-8, so54072362-2]
2019-01-07 15:24:58.336 INFO 23485 --- [o54072362-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so54072362-14, so54072362-11, so54072362-5, so54072362-8, so54072362-2]
Of course, this leaves the unassigned partitions dangling, but it sounds like that's what you want, until the region comes back online.
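To see which partitions are left dangling after the cap, here is a small self-contained sketch of the same capping idea on plain partition numbers. It does not use the Kafka API. `CappedAssignmentSketch` and its methods are hypothetical helpers, and partitions are modelled as integers 0..n-1 for a single topic.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrates capping per-member assignments and listing what is left unassigned. */
final class CappedAssignmentSketch {

    /** Returns a copy of the assignments in which no member keeps more than max partitions. */
    static Map<String, List<Integer>> cap(Map<String, List<Integer>> assignments, int max) {
        Map<String, List<Integer>> capped = new LinkedHashMap<>();
        assignments.forEach((member, partitions) ->
                capped.put(member, new ArrayList<>(partitions.subList(0, Math.min(max, partitions.size())))));
        return capped;
    }

    /** Partitions in 0..partitionCount-1 that no member owns after capping. */
    static List<Integer> unassigned(int partitionCount, Map<String, List<Integer>> capped) {
        Set<Integer> owned = new HashSet<>();
        capped.values().forEach(owned::addAll);
        List<Integer> dangling = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            if (!owned.contains(p)) {
                dangling.add(p);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        // One consumer initially given all 15 partitions, as in the first log line above.
        Map<String, List<Integer>> initial = new LinkedHashMap<>();
        List<Integer> all = new ArrayList<>();
        for (int p = 0; p < 15; p++) {
            all.add(p);
        }
        initial.put("consumer-2", all);

        Map<String, List<Integer>> capped = cap(initial, 5);
        System.out.println("kept: " + capped.get("consumer-2"));
        System.out.println("unassigned: " + unassigned(15, capped));
    }
}
```

Tracking the unassigned list this way could feed a metric or alert, so that dangling partitions are a visible, deliberate state rather than a silent one.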

What parameters should I pass for the schema-registry to run on non-master mode?

I want to run the schema-registry in non-master mode in Kubernetes. I passed the environment variable master.eligibility=false; however, it is still electing a master.
Please point me to where else I should change the configuration. There are no errors suggesting that the environment value is wrong.
cmd:
helm install helm-test-0.1.0.tgz --set env.name.SCHEMA_REGISTRY_KAFKASTORE_BOOTSERVERS="PLAINTEXT://xx.xx.xx.xx:9092\,PLAINTEXT://xx.xx.xx.xx:9092\,PLAINTEXT://xx.xx.xx.xx:9092" --set env.name.SCHEMA_REGISTRY_LISTENERS="http://0.0.0.0:8083" --set env.name.SCHEMA_REGISTRY_MASTER_ELIGIBILITY=false
Details:
replicaCount: 1
image:
repository: confluentinc/cp-schema-registry
tag: "5.0.0"
pullPolicy: IfNotPresent
env:
name:
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: "PLAINTEXT://xx.xxx.xx.xx:9092, PLAINTEXT://xx.xxx.xx.xx:9092, PLAINTEXT://xx.xxx.xx.xx:9092"
SCHEMA_REGISTRY_LISTENERS: "http://0.0.0.0:8883"
SCHEMA_REGISTRY_HOST_NAME: localhost
SCHEMA_REGISTRY_MASTER_ELIGIBILITY: false
Pod - schema-registry properties:
root@test-app-788455bb47-tjlhw:/# cat /etc/schema-registry/schema-registry.properties
master.eligibility=false
listeners=http://0.0.0.0:8883
host.name=xx.xx.xxx.xx
kafkastore.bootstrap.servers=PLAINTEXT://xx.xx.xx.xx:9092,PLAINTEXT://xx.xx.xx.xx:9092,PLAINTEXT://xx.xx.xx.xx:9092
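The property names in this file line up with the environment variable names under a simple rule: drop the SCHEMA_REGISTRY_ prefix, lower-case the rest, and turn underscores into dots. The sketch below assumes that rule is how the image derives the names. It covers only this simple case, and `EnvToPropertySketch` is a hypothetical helper, not code from the image.

```java
import java.util.Locale;

/** Maps SCHEMA_REGISTRY_* environment variable names to property names (simple case only). */
final class EnvToPropertySketch {
    static final String PREFIX = "SCHEMA_REGISTRY_";

    /** Assumed convention: strip the prefix, lower-case, and replace '_' with '.'. */
    static String toPropertyName(String envName) {
        if (!envName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a schema-registry variable: " + envName);
        }
        return envName.substring(PREFIX.length()).toLowerCase(Locale.ROOT).replace('_', '.');
    }

    public static void main(String[] args) {
        String[] names = {
            "SCHEMA_REGISTRY_MASTER_ELIGIBILITY",
            "SCHEMA_REGISTRY_LISTENERS",
            "SCHEMA_REGISTRY_HOST_NAME",
            "SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS",
            "SCHEMA_REGISTRY_KAFKASTORE_BOOTSERVERS" // spelling used in the helm command above
        };
        for (String name : names) {
            System.out.println(name + " -> " + toPropertyName(name));
        }
    }
}
```

Under this assumed rule, the BOOTSERVERS spelling in the helm command would map to kafkastore.bootservers rather than kafkastore.bootstrap.servers, so it is worth checking which of the two sources actually produced the file shown here.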
echo "===> Launching ... "
+ echo '===> Launching ... '
exec /etc/confluent/docker/launch
+ exec /etc/confluent/docker/launch
===> Launching ...
===> Launching schema-registry ...
[2018-10-15 18:52:45,993] INFO SchemaRegistryConfig values:
resource.extension.class = []
metric.reporters = []
kafkastore.sasl.kerberos.kinit.cmd = /usr/bin/kinit
response.mediatype.default = application/vnd.schemaregistry.v1+json
kafkastore.ssl.trustmanager.algorithm = PKIX
inter.instance.protocol = http
authentication.realm =
ssl.keystore.type = JKS
kafkastore.topic = _schemas
metrics.jmx.prefix = kafka.schema.registry
kafkastore.ssl.enabled.protocols = TLSv1.2,TLSv1.1,TLSv1
kafkastore.topic.replication.factor = 3
ssl.truststore.password = [hidden]
kafkastore.timeout.ms = 500
host.name = xx.xxx.xx.xx
kafkastore.bootstrap.servers = [PLAINTEXT://xx.xxx.xx.xx:9092, PLAINTEXT://xx.xxx.xx.xx:9092, PLAINTEXT://xx.xxx.xx.xx:9092]
schema.registry.zk.namespace = schema_registry
kafkastore.sasl.kerberos.ticket.renew.window.factor = 0.8
kafkastore.sasl.kerberos.service.name =
schema.registry.resource.extension.class = []
ssl.endpoint.identification.algorithm =
compression.enable = false
kafkastore.ssl.truststore.type = JKS
avro.compatibility.level = backward
kafkastore.ssl.protocol = TLS
kafkastore.ssl.provider =
kafkastore.ssl.truststore.location =
response.mediatype.preferred = [application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json]
kafkastore.ssl.keystore.type = JKS
authentication.skip.paths = []
ssl.truststore.type = JKS
kafkastore.ssl.truststore.password = [hidden]
access.control.allow.origin =
ssl.truststore.location =
ssl.keystore.password = [hidden]
port = 8081
kafkastore.ssl.keystore.location =
metrics.tag.map = {}
master.eligibility = false
Logs of the schema-registry pod:
(org.apache.kafka.clients.consumer.ConsumerConfig)
[2018-10-15 18:52:48,571] INFO Kafka version : 2.0.0-cp1 (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-15 18:52:48,571] INFO Kafka commitId : 4b1dd33f255ddd2f (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-15 18:52:48,599] INFO Cluster ID: V-MGQtptQnuWK_K9-wot1Q (org.apache.kafka.clients.Metadata)
[2018-10-15 18:52:48,602] INFO Initialized last consumed offset to -1 (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-15 18:52:48,605] INFO [kafka-store-reader-thread-_schemas]: Starting (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-15 18:52:48,715] INFO [Consumer clientId=KafkaStore-reader-_schemas, groupId=schema-registry-10.100.4.189-8083] Resetting offset for partition _schemas-0 to offset 0. (org.apache.kafka.clients.consumer.internals.Fetcher)
[2018-10-15 18:52:48,721] INFO Cluster ID: V-MGQtptQnuWK_K9-wot1Q (org.apache.kafka.clients.Metadata)
[2018-10-15 18:52:48,775] INFO Wait to catch up until the offset of the last message at 228 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2018-10-15 18:52:49,831] INFO Joining schema registry with Kafka-based coordination (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry)
[2018-10-15 18:52:49,852] INFO Kafka version : 2.0.0-cp1 (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-15 18:52:49,852] INFO Kafka commitId : 4b1dd33f255ddd2f (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-15 18:52:49,909] INFO Cluster ID: V-MGQtptQnuWK_K9-wot1Q (org.apache.kafka.clients.Metadata)
[2018-10-15 18:52:49,915] INFO [Schema registry clientId=sr-1, groupId=schema-registry] Discovered group coordinator ip-10-150-4-5.ec2.internal:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-10-15 18:52:49,919] INFO [Schema registry clientId=sr-1, groupId=schema-registry] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-10-15 18:52:52,975] INFO [Schema registry clientId=sr-1, groupId=schema-registry] Successfully joined group with generation 92 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-10-15 18:52:52,980] INFO Finished rebalance with master election result: Assignment{version=1, error=0, master='sr-1-abcd4cf2-8a02-4105-8361-9aa82107acd8', masterIdentity=version=1,host=ip-xx-xxx-xx-xx.ec2.internal,port=8083,scheme=http,masterEligibility=true} (io.confluent.kafka.schemaregistry.masterelector.kafka.KafkaGroupMasterElector)
[2018-10-15 18:52:53,088] INFO Adding listener: http://0.0.0.0:8083 (io.confluent.rest.Application)
[2018-10-15 18:52:53,347] INFO jetty-9.4.11.v20180605; built: 2018-06-05T18:24:03.829Z; git: d5fc0523cfa96bfebfbda19606cad384d772f04c; jvm 1.8.0_172-b01 (org.eclipse.jetty.server.Server)
[2018-10-15 18:52:53,428] INFO DefaultSessionIdManager workerName=node0 (org.eclipse.jetty.server.session)
[2018-10-15 18:52:53,429] INFO No SessionScavenger set, using defaults (org.eclipse.jetty.server.session)
[2018-10-15 18:52:53,432] INFO node0 Scavenging every 660000ms (org.eclipse.jetty.server.session)
Oct 15, 2018 6:52:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider io.confluent.kafka.schemaregistry.rest.resources.SubjectsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.kafka.schemaregistry.rest.resources.SubjectsResource will be ignored.
Oct 15, 2018 6:52:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider io.confluent.kafka.schemaregistry.rest.resources.ConfigResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.kafka.schemaregistry.rest.resources.ConfigResource will be ignored.
Oct 15, 2018 6:52:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider io.confluent.kafka.schemaregistry.rest.resources.SchemasResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.kafka.schemaregistry.rest.resources.SchemasResource will be ignored.
Oct 15, 2018 6:52:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider io.confluent.kafka.schemaregistry.rest.resources.SubjectVersionsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.kafka.schemaregistry.rest.resources.SubjectVersionsResource will be ignored.
Oct 15, 2018 6:52:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider io.confluent.kafka.schemaregistry.rest.resources.CompatibilityResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.kafka.schemaregistry.rest.resources.CompatibilityResource will be ignored.
[2018-10-15 18:52:54,364] INFO HV000001: Hibernate Validator 5.1.3.Final (org.hibernate.validator.internal.util.Version)
[2018-10-15 18:52:54,587] INFO Started o.e.j.s.ServletContextHandler@764faa6{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
[2018-10-15 18:52:54,619] INFO Started o.e.j.s.ServletContextHandler@14a50707{/ws,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
[2018-10-15 18:52:54,642] INFO Started NetworkTrafficServerConnector@62656be4{HTTP/1.1,[http/1.1]}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector)
[2018-10-15 18:52:54,644] INFO Started @9700ms (org.eclipse.jetty.server.Server)
[2018-10-15 18:52:54,644] INFO Server started, listening for requests... (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain)
I checked, and your configs look good. I believe the instance is in fact starting as a follower, and the log line below is simply reporting which instance was elected master:
Assignment{version=1, error=0, master='sr-1-abcd4cf2-8a02-4105-8361-9aa82107acd8', masterIdentity=version=1,host=ip-xx-xxx-xx-xx.ec2.internal,port=8083,scheme=http,masterEligibility=true}
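To check this from the logs rather than by eye, the elected master's identity can be pulled out of the `Assignment{...}` line and compared with the local instance. This is a minimal sketch in Python. The regex is written against the line above and may not match other Schema Registry versions, and the function names and the local host value passed to `is_follower` are hypothetical placeholders.

```python
import re

# Field layout copied from the "Finished rebalance" log line above;
# this is an assumption about the format, not a documented contract.
ASSIGNMENT_RE = re.compile(
    r"master='(?P<master>[^']+)'.*?"
    r"host=(?P<host>[^,]+),port=(?P<port>\d+),scheme=(?P<scheme>\w+),"
    r"masterEligibility=(?P<eligible>true|false)"
)

def parse_assignment(line):
    """Return the elected master's identity as a dict, or None if no match."""
    m = ASSIGNMENT_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["port"] = int(info["port"])
    info["eligible"] = info["eligible"] == "true"
    return info

def is_follower(line, my_host, my_port):
    """True when the elected master is a different host/port than this instance."""
    info = parse_assignment(line)
    if info is None:
        raise ValueError("no Assignment found in log line")
    return (info["host"], info["port"]) != (my_host, my_port)

line = ("Assignment{version=1, error=0, "
        "master='sr-1-abcd4cf2-8a02-4105-8361-9aa82107acd8', "
        "masterIdentity=version=1,host=ip-xx-xxx-xx-xx.ec2.internal,"
        "port=8083,scheme=http,masterEligibility=true}")

print(parse_assignment(line))
# "10.100.4.189" is a placeholder for the pod's own advertised host.
print(is_follower(line, "10.100.4.189", 8083))
```

Because the host in the posted log is masked, the comparison only becomes meaningful once the real advertised host name of the pod is substituted.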

Kafka Streams not writing to sink topic

I am trying to learn Kafka Streams using Confluent's test platform and the setup instructions here. I can start up and connect to my test broker, but the streams application never writes to my sink topic. Looking at the logs, Kafka Streams is constantly fetching and monitoring the offset (if I am reading the logs correctly), but it never actually reads or writes anything.
14:07:29.654 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Received successful Heartbeat response
14:07:29.770 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:29.770 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:29.770 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:30.273 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:30.273 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:30.273 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:30.775 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:30.776 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:30.776 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:31.279 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:31.279 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:31.279 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:31.782 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:31.782 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:31.782 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:32.284 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:32.284 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:32.284 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:32.656 [kafka-coordinator-heartbeat-thread | katanaTest] DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending Heartbeat request to coordinator localhost:29092 (id: 2147483646 rack: null)
I can't tell from this log output what the issue is, and no error is ever logged. How can I debug why my streams application isn't working? What is the recommended method of debugging in Kafka Streams?
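One thing the DEBUG output above does show: every fetch is issued at offset 4, and the broker answers with highWaterMark=4 and recordsSizeInBytes=0. That suggests the consumer in group katanaTest is already at the end of the partition and no new records are arriving. It is not evidence of a failure. A minimal sketch in Python that makes this check explicit follows. The regex is written against the Fetcher lines above, and the function name and result keys are hypothetical.

```python
import re

# Matches the Fetcher "returned fetch data" DEBUG line shown above.
# The layout is an assumption based on this log, not a stable format.
FETCH_RE = re.compile(
    r"Fetch \w+ at offset (?P<offset>\d+) for partition (?P<partition>\S+) "
    r"returned fetch data \(error=(?P<error>\w+), highWaterMark=(?P<hwm>\d+),"
    r".*?recordsSizeInBytes=(?P<size>\d+)\)"
)

def fetch_status(line):
    """Summarize one fetch response, or return None if the line does not match."""
    m = FETCH_RE.search(line)
    if m is None:
        return None
    offset, hwm, size = int(m["offset"]), int(m["hwm"]), int(m["size"])
    return {
        "partition": m["partition"],
        "error": m["error"],
        # Records between the fetch position and the high watermark.
        "lag": hwm - offset,
        # Nothing left to read: position is at the high watermark and
        # the response carried no record bytes.
        "caught_up": offset >= hwm and size == 0,
    }

sample = ("Fetch READ_UNCOMMITTED at offset 4 for partition "
          "exportStatusUpdatesV2_rquinlivan-0 returned fetch data "
          "(error=NONE, highWaterMark=4, lastStableOffset = -1, "
          "logStartOffset = 0, abortedTransactions = null, "
          "recordsSizeInBytes=0)")

print(fetch_status(sample))
```

If `caught_up` is true on every line, a reasonable next step is to produce a fresh record to the source topic while the application is running and watch whether the fetch offset advances and output appears on the sink topic.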