I'm using Flume to move data from Kafka to HDFS (Kafka source and HDFS sink). These are the versions I'm using:
hadoop-3.2.2
flume-1.9.0
kafka_2.11-0.10.1.0
This is my kafka-flume-hdfs.conf:
a1.sources=r1 r2
a1.channels=c1 c2
a1.sinks=k1 k2
## source1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers=h01:9092,h02:9092,h03:9092
a1.sources.r1.kafka.topics=topic_start
## source2
a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r2.batchSize = 5000
a1.sources.r2.batchDurationMillis = 2000
a1.sources.r2.kafka.bootstrap.servers=h01:9092,h02:9092,h03:9092
a1.sources.r2.kafka.topics=topic_event
## channel1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir=/usr/local/flume/checkpoint/behavior1
a1.channels.c1.dataDirs = /usr/local/flume/data/behavior1/
a1.channels.c1.keep-alive = 6
## channel2
a1.channels.c2.type = file
a1.channels.c2.checkpointDir=/usr/local/flume/checkpoint/behavior2
a1.channels.c2.dataDirs = /usr/local/flume/data/behavior2/
a1.channels.c2.keep-alive = 6
## sink1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_start/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = logstart-
##sink2
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = /origin_data/gmall/log/topic_event/%Y-%m-%d
a1.sinks.k2.hdfs.filePrefix = logevent-
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.rollSize = 134217728
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k2.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
a1.sinks.k2.hdfs.codeC = gzip
#a1.sinks.k1.hdfs.codeC=com.hadoop.compression.lzo.LzopCodec
#a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.callTimeout=360000
#a1.sinks.k1.hdfs.maxIoWorkers=32
#a1.sinks.k1.hdfs.fileSuffix=.lzo
#a1.sinks.k2.hdfs.codeC=com.hadoop.compression.lzo.LzopCodec
#a1.sinks.k2.hdfs.writeFormat=Text
a1.sinks.k2.hdfs.callTimeout=360000
#a1.sinks.k2.hdfs.maxIoWorkers=32
#a1.sinks.k2.hdfs.fileSuffix=.lzo
a1.sources.r1.channels = c1
a1.sinks.k1.channel= c1
a1.sources.r2.channels = c2
a1.sinks.k2.channel= c2
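I start the agent with roughly this command (a sketch; the exact paths on my machines may differ):
# illustrative only; conf directory and file path depend on my install layout
bin/flume-ng agent --name a1 --conf conf --conf-file conf/kafka-flume-hdfs.conf -Dflume.root.logger=INFO,console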
Part of the log file:
2021-08-19 15:37:39,308 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:40,476 INFO [hdfs-k1-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:37:40,509 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,516 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,525 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:40,522 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:40,858 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,889 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas
My problem:
Flume logs "Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp", but there is no such file in HDFS.
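I check with something like this (listing the dated directory from the sink path above; the exact command I run may differ slightly), and the file does not show up:
hdfs dfs -ls /origin_data/gmall/log/topic_start/2021-08-19/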
More logs after I start Flume:
....
....
....
2021-08-19 15:30:01,748 INFO [lifecycleSupervisor-1-0] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 0
2021-08-19 15:30:01,754 INFO [lifecycleSupervisor-1-1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387001047, queueSize: 0, queueHead: 5765
2021-08-19 15:30:01,758 INFO [lifecycleSupervisor-1-0] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387001048, queueSize: 0, queueHead: 5778
2021-08-19 15:30:01,783 INFO [lifecycleSupervisor-1-0] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-26 position: 0 logWriteOrderID: 1629387001048
2021-08-19 15:30:01,783 INFO [lifecycleSupervisor-1-0] file.FileChannel (FileChannel.java:start(289)) - Queue Size after replay: 0 [channel=c1]
2021-08-19 15:30:01,784 INFO [lifecycleSupervisor-1-1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-26 position: 0 logWriteOrderID: 1629387001047
2021-08-19 15:30:01,787 INFO [lifecycleSupervisor-1-1] file.FileChannel (FileChannel.java:start(289)) - Queue Size after replay: 0 [channel=c2]
2021-08-19 15:30:01,789 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(196)) - Starting Sink k1
2021-08-19 15:30:01,795 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
2021-08-19 15:30:01,795 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SINK, name: k1 started
2021-08-19 15:30:01,797 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(196)) - Starting Sink k2
2021-08-19 15:30:01,798 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(207)) - Starting Source r2
2021-08-19 15:30:01,799 INFO [lifecycleSupervisor-1-5] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SINK, name: k2: Successfully registered new MBean.
2021-08-19 15:30:01,803 INFO [lifecycleSupervisor-1-5] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SINK, name: k2 started
2021-08-19 15:30:01,799 INFO [lifecycleSupervisor-1-6] kafka.KafkaSource (KafkaSource.java:doStart(524)) - Starting org.apache.flume.source.kafka.KafkaSource{name:r2,state:IDLE}...
2021-08-19 15:30:01,815 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(207)) - Starting Source r1
2021-08-19 15:30:01,818 INFO [lifecycleSupervisor-1-0] kafka.KafkaSource (KafkaSource.java:doStart(524)) - Starting org.apache.flume.source.kafka.KafkaSource{name:r1,state:IDLE}...
2021-08-19 15:30:01,918 INFO [lifecycleSupervisor-1-6] consumer.ConsumerConfig (AbstractConfig.java:logAll(279)) - ConsumerConfig values:
......
.......
.......
2021-08-19 15:30:01,926 INFO [lifecycleSupervisor-1-0] consumer.ConsumerConfig (AbstractConfig.java:logAll(279)) - ConsumerConfig values:
.....
......
......
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-0] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r2: Successfully registered new MBean.
2021-08-19 15:30:02,211 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r2 started
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-0] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:30:02,213 INFO [lifecycleSupervisor-1-0] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r1 started.
2021-08-19 15:30:02,214 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
2021-08-19 15:30:02,214 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r1 started
2021-08-19 15:30:02,726 INFO [PollableSourceRunner-KafkaSource-r1] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:30:02,730 INFO [PollableSourceRunner-KafkaSource-r2] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:30:02,740 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-1, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:30:02,747 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-2, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:30:02,748 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-1, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:30:02,770 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-1, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,776 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-2, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:30:02,776 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,845 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,935 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-1, groupId=flume] Successfully joined group with generation 66
2021-08-19 15:30:02,936 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-1, groupId=flume] Setting newly assigned partitions [topic_event-0]
2021-08-19 15:30:02,936 INFO [PollableSourceRunner-KafkaSource-r2] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_event - partition 0 assigned.
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-2, groupId=flume] Successfully joined group with generation 66
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-2, groupId=flume] Setting newly assigned partitions [topic_start-0]
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_start - partition 0 assigned.
2021-08-19 15:30:04,912 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:30:04,912 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:30:04,984 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp
2021-08-19 15:30:06,577 INFO [hdfs-k2-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:30:06,606 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:30:06,648 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:30:06,665 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp
2021-08-19 15:30:06,675 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:30:06,916 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:30:06,927 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:30:06,931 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:30:16,676 INFO [hdfs-k2-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:30:16,676 INFO [hdfs-k2-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp
2021-08-19 15:30:16,682 INFO [hdfs-k2-call-runner-2] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp to /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz
2021-08-19 15:30:16,932 INFO [hdfs-k1-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:30:16,932 INFO [hdfs-k1-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp
2021-08-19 15:30:16,934 INFO [hdfs-k1-call-runner-2] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp to /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz
2021-08-19 15:30:30,932 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior2/checkpoint, elements to sync = 970
2021-08-19 15:30:30,936 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 967
2021-08-19 15:30:30,951 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387004945, queueSize: 0, queueHead: 6733
2021-08-19 15:30:30,953 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387004946, queueSize: 0, queueHead: 6743
2021-08-19 15:30:30,963 INFO [Log-BackgroundWorker-c2] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-26 position: 1147366 logWriteOrderID: 1629387004945
2021-08-19 15:30:30,964 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-20
2021-08-19 15:30:30,967 INFO [Log-BackgroundWorker-c1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-26 position: 487027 logWriteOrderID: 1629387004946
.....
.....
.....
2021-08-19 15:36:19,570 INFO [lifecycleSupervisor-1-8] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:36:19,570 INFO [lifecycleSupervisor-1-8] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-8] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r1 started.
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r2 started.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-8] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r2: Successfully registered new MBean.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-8] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r1 started
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r2 started
2021-08-19 15:36:20,012 INFO [PollableSourceRunner-KafkaSource-r2] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:36:20,015 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-1, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:36:20,025 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-1, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:36:20,027 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-1, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,030 INFO [PollableSourceRunner-KafkaSource-r1] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:36:20,034 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-2, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:36:20,039 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-2, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:36:20,039 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,068 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,152 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-1, groupId=flume] Successfully joined group with generation 69
2021-08-19 15:36:20,153 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-1, groupId=flume] Setting newly assigned partitions [topic_event-0]
2021-08-19 15:36:20,154 INFO [PollableSourceRunner-KafkaSource-r2] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_event - partition 0 assigned.
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-2, groupId=flume] Successfully joined group with generation 69
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-2, groupId=flume] Setting newly assigned partitions [topic_start-0]
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_start - partition 0 assigned.
2021-08-19 15:37:39,286 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:37:39,286 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:37:39,308 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:40,476 INFO [hdfs-k1-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:37:40,509 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,516 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,525 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:40,522 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:40,858 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,889 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,889 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:48,532 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior2/checkpoint, elements to sync = 949
2021-08-19 15:37:48,533 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 1002
2021-08-19 15:37:48,562 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387382580, queueSize: 0, queueHead: 7743
2021-08-19 15:37:48,562 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387382581, queueSize: 0, queueHead: 7680
2021-08-19 15:37:48,570 INFO [Log-BackgroundWorker-c1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-27 position: 504908 logWriteOrderID: 1629387382580
2021-08-19 15:37:48,571 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-21
2021-08-19 15:37:48,578 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-22
2021-08-19 15:37:48,578 INFO [Log-BackgroundWorker-c2] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-27 position: 1144058 logWriteOrderID: 1629387382581
2021-08-19 15:37:48,581 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-20
2021-08-19 15:37:48,585 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-23
2021-08-19 15:37:48,587 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-21
2021-08-19 15:37:48,591 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-24
2021-08-19 15:37:48,593 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-22
2021-08-19 15:37:48,597 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-25
2021-08-19 15:37:48,600 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-23
2021-08-19 15:37:48,606 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-24
2021-08-19 15:37:48,612 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-25
2021-08-19 15:37:50,526 INFO [hdfs-k1-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:37:50,526 INFO [hdfs-k1-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:50,694 INFO [hdfs-k1-call-runner-6] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp to /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz
2021-08-19 15:37:50,890 INFO [hdfs-k2-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:37:50,890 INFO [hdfs-k2-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:50,893 INFO [hdfs-k2-call-runner-3] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp to /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz
.......
.......
.......
When I run the code locally it runs fine, but when I run the same code on the server I get the error shown in the traceback below.
When I run locally I read the data from a local MongoDB and there is no error; on the server I read the data from a MongoDB replica set.
I have tried changing
.config("spark.mongodb.input.partitionerOptions", "MongoPaginateByCountPartitioner")
to MongoDefaultPartitioner and MongoSplitVectorPartitioner (a sketch of how I set this is shown after the code below).
def save_n_rename(df):
    print('------------------------------------- WRITING INITIATED -------------------------------------------')
    df.write.format('com.mongodb.spark.sql.DefaultSource').mode('overwrite')\
        .option('uri', '{}/{}.Revenue_Analytics'.format(mongo_final_url, mongo_final_db)).save()
    print('------------------------------------- WRITING COMPLETED -------------------------------------------')

def main():
    spark = SparkSession.builder \
        .master(props.get(env, 'executionMode')) \
        .appName("Revenue_Analytics") \
        .config("spark.mongodb.input.partitionerOptions", "MongoPaginateByCountPartitioner") \
        .getOrCreate()

    start = time()
    df = processing(spark)
    mins_elapsed, secs_elapsed = divmod(time() - start, 60)
    print("----------- Completed processing in {}m {:.2f}s -----------".format(mins_elapsed, secs_elapsed))
    save_n_rename(df)

if __name__ == '__main__':
    main()
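This is roughly how I switched the partitioner when trying the other options (a sketch, not my exact code; as far as I understand the mongo-spark connector docs, the partitioner class is selected via spark.mongodb.input.partitioner and partitioner-specific settings go under spark.mongodb.input.partitionerOptions.*; the numberOfPartitions value below is just an example):

from pyspark.sql import SparkSession

# Sketch only: choose the partitioner explicitly and pass its option
# (numberOfPartitions here is an arbitrary example value).
spark = SparkSession.builder \
    .appName("Revenue_Analytics") \
    .config("spark.mongodb.input.partitioner", "MongoPaginateByCountPartitioner") \
    .config("spark.mongodb.input.partitionerOptions.numberOfPartitions", "64") \
    .getOrCreate()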
EDIT 1:
MongoDB version: 4.2.0
PySpark version: 2.4.4
Log output and traceback:
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 7.006073 ms
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 4.714324 ms
19/10/24 12:57:45 INFO cluster: Cluster created with settings {hosts=[172.16.10.252:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
19/10/24 12:57:45 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:45, serverValue:172200}] to 172.16.10.252:27017
19/10/24 12:57:45 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=172.16.10.252:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 0]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=419102, setName='rs0', canonicalAddress=mongo-repl-3:27017, hosts=[172.16.10.250:27017, mongo-repl-2:27017, mongo-repl-3:27017], passives=[], arbiters=[], primary='172.16.10.250:27017', tagSet=TagSet{[]}, electionId=null, setVersion=3, lastWriteDate=Thu Oct 24 12:57:45 IST 2019, lastUpdateTimeNanos=2312527044492704}
19/10/24 12:57:45 INFO MongoClientCache: Creating MongoClient: [172.16.10.252:27017]
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:46, serverValue:172201}] to 172.16.10.252:27017
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 6.280343 ms
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 3.269567 ms
19/10/24 12:57:45 INFO cluster: Cluster created with settings {hosts=[172.16.10.252:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
19/10/24 12:57:45 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:47, serverValue:172202}] to 172.16.10.252:27017
19/10/24 12:57:45 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=172.16.10.252:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 0]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=570933, setName='rs0', canonicalAddress=mongo-repl-3:27017, hosts=[172.16.10.250:27017, mongo-repl-2:27017, mongo-repl-3:27017], passives=[], arbiters=[], primary='172.16.10.250:27017', tagSet=TagSet{[]}, electionId=null, setVersion=3, lastWriteDate=Thu Oct 24 12:57:45 IST 2019, lastUpdateTimeNanos=2312527212534350}
19/10/24 12:57:45 INFO MongoClientCache: Creating MongoClient: [172.16.10.252:27017]
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:48, serverValue:172203}] to 172.16.10.252:27017
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 6.001824 ms
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 3.610373 ms
19/10/24 12:57:45 INFO cluster: Cluster created with settings {hosts=[172.16.10.252:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
19/10/24 12:57:45 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:49, serverValue:172204}] to 172.16.10.252:27017
19/10/24 12:57:45 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=172.16.10.252:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 0]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=502689, setName='rs0', canonicalAddress=mongo-repl-3:27017, hosts=[172.16.10.250:27017, mongo-repl-2:27017, mongo-repl-3:27017], passives=[], arbiters=[], primary='172.16.10.250:27017', tagSet=TagSet{[]}, electionId=null, setVersion=3, lastWriteDate=Thu Oct 24 12:57:45 IST 2019, lastUpdateTimeNanos=2312527352871977}
19/10/24 12:57:45 INFO MongoClientCache: Creating MongoClient: [172.16.10.252:27017]
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:50, serverValue:172205}] to 172.16.10.252:27017
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.552305 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 3.230598 ms
19/10/24 12:57:46 INFO cluster: Cluster created with settings {hosts=[172.16.10.252:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
19/10/24 12:57:46 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out
19/10/24 12:57:46 INFO connection: Opened connection [connectionId{localValue:51, serverValue:172206}] to 172.16.10.252:27017
19/10/24 12:57:46 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=172.16.10.252:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 0]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=535708, setName='rs0', canonicalAddress=mongo-repl-3:27017, hosts=[172.16.10.250:27017, mongo-repl-2:27017, mongo-repl-3:27017], passives=[], arbiters=[], primary='172.16.10.250:27017', tagSet=TagSet{[]}, electionId=null, setVersion=3, lastWriteDate=Thu Oct 24 12:57:46 IST 2019, lastUpdateTimeNanos=2312527492689014}
19/10/24 12:57:46 INFO MongoClientCache: Creating MongoClient: [172.16.10.252:27017]
19/10/24 12:57:46 INFO connection: Opened connection [connectionId{localValue:52, serverValue:172207}] to 172.16.10.252:27017
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 14.755534 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.132629 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.480881 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 4.944708 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.26496 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.270467 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.068084 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 4.947876 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 4.996435 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.080908 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 4.843392 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 4.93398 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 6.395543 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.189256 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 6.958948 ms
19/10/24 12:57:46 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:46 INFO connection: Closed connection [connectionId{localValue:32, serverValue:172187}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:46 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:46 INFO connection: Closed connection [connectionId{localValue:30, serverValue:172185}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:49 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:49 INFO connection: Closed connection [connectionId{localValue:36, serverValue:172191}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:49 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:49 INFO connection: Closed connection [connectionId{localValue:38, serverValue:172193}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:49 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:49 INFO connection: Closed connection [connectionId{localValue:40, serverValue:172195}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:50 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:50 INFO connection: Closed connection [connectionId{localValue:42, serverValue:172197}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:50 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:50 INFO connection: Closed connection [connectionId{localValue:44, serverValue:172199}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:50 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:50 INFO connection: Closed connection [connectionId{localValue:46, serverValue:172201}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:50 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:50 INFO connection: Closed connection [connectionId{localValue:48, serverValue:172203}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:51 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:51 INFO connection: Closed connection [connectionId{localValue:50, serverValue:172205}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:51 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:51 INFO connection: Closed connection [connectionId{localValue:52, serverValue:172207}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:58:03 ERROR MongoRDD:
-----------------------------
WARNING: Partitioning failed.
-----------------------------
Partitioning using the 'DefaultMongoPartitioner$' failed.
Please check the stacktrace to determine the cause of the failure or check the Partitioner API documentation.
Note: Not all partitioners are suitable for all toplogies and not all partitioners support views.%n
-----------------------------
19/10/24 12:58:04 INFO SparkContext: Invoking stop() from shutdown hook
19/10/24 12:58:04 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:58:04 INFO connection: Closed connection [connectionId{localValue:34, serverValue:172189}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:58:04 INFO SparkUI: Stopped Spark web UI at http://172.16.10.242:4040
19/10/24 12:58:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/10/24 12:58:04 INFO MemoryStore: MemoryStore cleared
19/10/24 12:58:04 INFO BlockManager: BlockManager stopped
19/10/24 12:58:04 INFO BlockManagerMaster: BlockManagerMaster stopped
19/10/24 12:58:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/10/24 12:58:04 INFO SparkContext: Successfully stopped SparkContext
19/10/24 12:58:04 INFO ShutdownHookManager: Shutdown hook called
19/10/24 12:58:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-04e7bf58-133a-4c10-b5c4-20ac740ab880
19/10/24 12:58:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-e36f3499-1c23-4f25-b5ce-3a6a9685f9bb
19/10/24 12:58:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-e36f3499-1c23-4f25-b5ce-3a6a9685f9bb/pyspark-28bc9fe4-4bd8-44dd-b541-a25def4e3930
------------------------------------- WRITING INITIATED -------------------------------------------
Traceback (most recent call last):
File "/home/svr_data_analytic/hmis-analytics-data-processing/src/main/python/sales/revenue.py", line 402, in <module>
main()
File "/home/svr_data_analytic/hmis-analytics-data-processing/src/main/python/sales/revenue.py", line 398, in main
save_n_rename(df)
File "/home/svr_data_analytic/hmis-analytics-data-processing/src/main/python/sales/revenue.py", line 383, in save_n_rename
.option('uri', '{}/{}.Revenue_Analytics'.format(mongo_final_url, mongo_final_db)).save()
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 736, in save
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o740.save.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange hashpartitioning(itemtype_id#4652, 200)
+- *(70) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, Patient_Type#6307, bill_type#4350, admit_doc_id#1096, item_name#4400, itemtype_id#4652, group_name#5625, category_name#5511, classification_name#5397]
+- SortMergeJoin [item_classification_id#4961], [item_cls_id#5404], LeftOuter
:- *(67) Sort [item_classification_id#4961 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(item_classification_id#4961, 200)
: +- *(66) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, Patient_Type#6307, bill_type#4350, admit_doc_id#1096, item_name#4400, itemtype_id#4652, item_classification_id#4961, group_name#5625, category_name#5511]
: +- SortMergeJoin [item_category_id#4857], [item_cat_id#5510], LeftOuter
: :- *(63) Sort [item_category_id#4857 ASC NULLS FIRST], false, 0
: : +- Exchange hashpartitioning(item_category_id#4857, 200)
: : +- *(62) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, Patient_Type#6307, bill_type#4350, admit_doc_id#1096, item_name#4400, itemtype_id#4652, item_category_id#4857, item_classification_id#4961, group_name#5625]
: : +- SortMergeJoin [item_group_id#4754], [item_grp_id#5624], LeftOuter
: : :- *(59) Sort [item_group_id#4754 ASC NULLS FIRST], false, 0
: : : +- Exchange hashpartitioning(item_group_id#4754, 200)
: : : +- *(58) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, Patient_Type#6307, bill_type#4350, admit_doc_id#1096, item_name#4400, itemtype_id#4652, item_group_id#4754, item_category_id#4857, item_classification_id#4961]
: : : +- SortMergeJoin [billitems_item_id#3857], [item_id#4551], LeftOuter
: : : :- *(55) Sort [billitems_item_id#3857 ASC NULLS FIRST], false, 0
: : : : +- Exchange hashpartitioning(billitems_item_id#3857, 200)
: : : : +- *(54) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, billitems_item_id#3857, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, Patient_Type#6307, bill_type#4350, admit_doc_id#1096]
: : : : +- SortMergeJoin [ip_app_id#6144], [ipapp_id#1094], LeftOuter
: : : : :- *(51) Sort [ip_app_id#6144 ASC NULLS FIRST], false, 0
: : : : : +- Exchange hashpartitioning(ip_app_id#6144, 200)
: : : : : +- *(50) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, billitems_item_id#3857, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, ip_app_id#6144, Patient_Type#6307, bill_type#4350]
: : : : : +- *(50) SortMergeJoin [bill_id#3836], [bills_id#6120], Inner
: : : : : :- *(39) Sort [bill_id#3836 ASC NULLS FIRST], false, 0
: : : : : : +- Exchange hashpartitioning(bill_id#3836, 200)
: : : : : : +- *(38) Project [quantity#3658, amount#3578, discount#3591, total#3666 AS item_net_amount#3761, _id#3577.oid AS billitems_id#3816, bills#3586.$id.oid AS bill_id#3836, item#3620.$id.oid AS billitems_item_id#3857, bill_doctor#3582.$id.oid AS bill_doctor_id#3879, doctor#3594.$id.oid AS item_doctor_id#3902]
: : : : : : +- *(38) Filter ((((cast(from_unixtime(unix_timestamp(bill_date#3581, yyyy-MM-dd h:mm:ss, Some(Asia/Kolkata)), yyyy, Some(Asia/Kolkata)) as int) >= 2018) && isnotnull(bills#3586.$id.oid)) && isnotnull(is_previous_bill_item#3616)) && (is_previous_bill_item#3616 = false))
: : : : : : +- *(38) Scan MongoRelation(MongoRDD[25] at RDD at MongoRDD.scala:51,Some(StructType(StructField(_id,StructType(StructField(oid,StringType,true)),true), StructField(amount,DoubleType,true), StructField(billDoctor,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true)),true), StructField(billType,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(bill_date,TimestampType,true), StructField(bill_doctor,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(bill_doctor_name,StringType,true), StructField(bill_item_unique_id,StringType,true), StructField(bill_unique_id,StringType,true), StructField(bills,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(cgst_amount,DoubleType,true), StructField(cgst_per,DoubleType,true), StructField(created_at,TimestampType,true), StructField(description,StringType,true), StructField(discount,DoubleType,true), StructField(discount_amount,IntegerType,true), StructField(discount_per,DoubleType,true), StructField(doctor,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(doctor_fee,DoubleType,true), StructField(etl_billType,StringType,true), StructField(etl_billedOutlet,StringType,true), StructField(etl_data,BooleanType,true), StructField(etl_data_batch,StringType,true), StructField(etl_doctor,StringType,true), StructField(etl_item,StringType,true), StructField(etl_surgery,StringType,true), StructField(etl_taxMaster,StringType,true), StructField(igst_amount,DoubleType,true), StructField(igst_per,DoubleType,true), StructField(initial_amount,DoubleType,true), StructField(inventoryItemBatchDetail,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(inventoryLocationStock,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(inventoryStockLocation,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(ipAppointment,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(is_deleted,BooleanType,true), StructField(is_despatched_item,BooleanType,true), StructField(is_modified,BooleanType,true), StructField(is_modified_deleted,BooleanType,true), StructField(is_old_bill,BooleanType,true), StructField(is_previous_bill_item,BooleanType,true), StructField(is_sponsor_bill,BooleanType,true), StructField(is_stent_invoice_loaded,BooleanType,true), StructField(is_tax_reversed,BooleanType,true), StructField(item,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(item_category,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), 
StructField($ref,StringType,true)),true), StructField(item_group,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(item_movement_summary,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(legacy_billno,StringType,true), StructField(legacy_branchcode,StringType,true), StructField(legacy_concessionrate,StringType,true), StructField(legacy_dailycharge,StringType,true), StructField(legacy_dosage,StringType,true), StructField(legacy_emergency,StringType,true), StructField(legacy_itemcessamount,StringType,true), StructField(legacy_medicineusagereference,StringType,true), StructField(legacy_oldbillitemcost,StringType,true), StructField(legacy_oldvatamount,StringType,true), StructField(legacy_oldvatpercentage,StringType,true), StructField(legacy_prescriptionreference,StringType,true), StructField(legacy_productserialnumber,StringType,true), StructField(legacy_recordlocked,StringType,true), StructField(legacy_salestaxpercentage,StringType,true), StructField(legacy_sellingcgstamount,StringType,true), StructField(legacy_sellingdiscountamount,DoubleType,true), StructField(legacy_sellingdiscountpercentage,DoubleType,true), StructField(legacy_sellingsgstamount,StringType,true), StructField(legacy_slno,StringType,true), StructField(legacy_transfered,StringType,true), StructField(legacy_updategstvat,StringType,true), StructField(legacy_vatamount,StringType,true), StructField(legacy_vatinclusive,StringType,true), StructField(legacy_vatpercentage,StringType,true), StructField(less,BooleanType,true), StructField(local_storage_delete,BooleanType,true), StructField(master_tax,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(modified_at,TimestampType,true), StructField(mrp_price,DoubleType,true), StructField(organization,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(organization_code,StringType,true), StructField(package_order,IntegerType,true), StructField(previous_return_qty,StringType,true), StructField(quantity,IntegerType,true), StructField(rack,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(reversed_gst_amount,DoubleType,true), StructField(sess_amount,DoubleType,true), StructField(sgst_amount,DoubleType,true), StructField(sgst_per,DoubleType,true), StructField(surgery,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true)),true), StructField(taxMaster,NullType,true), StructField(total,DoubleType,true), StructField(total_sales_return_amount,DoubleType,true), StructField(unit_price,DoubleType,true)))) [bill_doctor#3582,is_previous_bill_item#3616,total#3666,item#3620,bills#3586,doctor#3594,_id#3577,quantity#3658,bill_date#3581,discount#3591,amount#3578] PushedFilters: [IsNotNull(is_previous_bill_item), EqualTo(is_previous_bill_item,false)], ReadSchema: struct<bill_doctor:struct<$ref:string,$id:struct<oid:string>,$db:string>,is_previous_bill_item:bo...
: : : : : +- *(49) Sort [bills_id#6120 ASC NULLS FIRST], false, 0
: : : : : +- Exchange hashpartitioning(bills_id#6120, 200)