Airflow duplicates content of the logs while writing to GCS - google-cloud-storage

I configured Airflow 1.9 to store DAG logs in Google Cloud Storage, following (exactly) this description. It is working, however parts of the content of all DAG logs seem to be duplicated (see below). It appears as if the log were appended to itself, together with additional information about the upload. The log file on the local drive doesn't have those duplicates.
It seems that gcs_write uses append mode by default, so the only hack I found is to change it to False. Is there a configuration option for that? What is the reason for this behaviour anyway?
I have changed the following variables in the cfg file:
task_log_reader=gcs.task
logging_config_class=log_config.LOGGING_CONFIG
remote_log_conn_id=gcs
log_config.py:
# imports required by this config module
import os

from airflow import configuration as conf

GCS_LOG_FOLDER = 'gs://XXXX/'

LOG_LEVEL = conf.get('core', 'LOGGING_LEVEL').upper()
LOG_FORMAT = conf.get('core', 'log_format')

BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
PROCESSOR_LOG_FOLDER = conf.get('scheduler', 'child_process_log_directory')

FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
PROCESSOR_FILENAME_TEMPLATE = '{{ filename }}.log'

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': LOG_FORMAT,
        },
        'airflow.processor': {
            'format': LOG_FORMAT,
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow.task',
            'stream': 'ext://sys.stdout'
        },
        'file.task': {
            'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            'filename_template': FILENAME_TEMPLATE,
        },
        'file.processor': {
            'class': 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
            'formatter': 'airflow.processor',
            'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
            'filename_template': PROCESSOR_FILENAME_TEMPLATE,
        },
        'gcs.task': {
            'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            'gcs_log_folder': GCS_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
    },
    'loggers': {
        '': {
            'handlers': ['console'],
            'level': LOG_LEVEL
        },
        'airflow': {
            'handlers': ['console'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.processor': {
            'handlers': ['file.processor'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
        'airflow.task': {
            'handlers': ['gcs.task'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.task_runner': {
            'handlers': ['gcs.task'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
    }
}
Log:
*** Reading remote log from gs://XXXX/mwt1/mwt1_task1/2018-10-02T15:30:00/1.log.
[2018-11-16 10:27:17,304] {{cli.py:374}} INFO - Running on host fdfdf2f790e1
[2018-11-16 10:27:17,336] {{models.py:1197}} INFO - Dependencies all met for <TaskInstance: mwt1.mwt1_task1 2018-10-02 15:30:00 [queued]>
[2018-11-16 10:27:17,342] {{models.py:1197}} INFO - Dependencies all met for <TaskInstance: mwt1.mwt1_task1 2018-10-02 15:30:00 [queued]>
[2018-11-16 10:27:17,342] {{models.py:1407}} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 4
--------------------------------------------------------------------------------
[2018-11-16 10:27:17,354] {{models.py:1428}} INFO - Executing <Task(BashOperator): mwt1_task1> on 2018-10-02 15:30:00
[2018-11-16 10:27:17,355] {{base_task_runner.py:115}} INFO - Running: ['bash', '-c', 'airflow run mwt1 mwt1_task1 2018-10-02T15:30:00 --job_id 48 --raw -sd DAGS_FOLDER/mwt1.py']
[2018-11-16 10:27:17,939] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:17,938] {{__init__.py:45}} INFO - Using executor LocalExecutor
[2018-11-16 10:27:18,231] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,230] {{models.py:189}} INFO - Filling up the DagBag from /usr/local/airflow/dags/mwt1.py
[2018-11-16 10:27:18,451] {{cli.py:374}} INFO - Running on host fdfdf2f790e1
[2018-11-16 10:27:18,473] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,473] {{bash_operator.py:70}} INFO - Tmp dir root location:
[2018-11-16 10:27:18,473] {{base_task_runner.py:98}} INFO - Subtask: /tmp
[2018-11-16 10:27:18,474] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,473] {{bash_operator.py:80}} INFO - Temporary script location: /tmp/airflowtmp5g0d6e4h//tmp/airflowtmp5g0d6e4h/mwt1_task1_8ob3n0y
[2018-11-16 10:27:18,474] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,473] {{bash_operator.py:88}} INFO - Running command: bdasdasdasd
[2018-11-16 10:27:18,479] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,479] {{bash_operator.py:97}} INFO - Output:
[2018-11-16 10:27:18,479] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,479] {{bash_operator.py:101}} INFO - /tmp/airflowtmp5g0d6e4h/mwt1_task1_8ob3n0y: line 1: bdasdasdasd: command not found
[2018-11-16 10:27:18,480] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,480] {{bash_operator.py:105}} INFO - Command exited with return code 127
[2018-11-16 10:27:18,488] {{models.py:1595}} ERROR - Bash command failed
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1493, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/bash_operator.py", line 109, in execute
raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed
[2018-11-16 10:27:18,490] {{models.py:1616}} INFO - Marking task as UP_FOR_RETRY
[2018-11-16 10:27:18,503] {{models.py:1644}} ERROR - Bash command failed
[2018-11-16 10:27:17,304] {{cli.py:374}} INFO - Running on host fdfdf2f790e1
[2018-11-16 10:27:17,336] {{models.py:1197}} INFO - Dependencies all met for <TaskInstance: mwt1.mwt1_task1 2018-10-02 15:30:00 [queued]>
[2018-11-16 10:27:17,342] {{models.py:1197}} INFO - Dependencies all met for <TaskInstance: mwt1.mwt1_task1 2018-10-02 15:30:00 [queued]>
[2018-11-16 10:27:17,342] {{models.py:1407}} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 4
--------------------------------------------------------------------------------
[2018-11-16 10:27:17,354] {{models.py:1428}} INFO - Executing <Task(BashOperator): mwt1_task1> on 2018-10-02 15:30:00
[2018-11-16 10:27:17,355] {{base_task_runner.py:115}} INFO - Running: ['bash', '-c', 'airflow run mwt1 mwt1_task1 2018-10-02T15:30:00 --job_id 48 --raw -sd DAGS_FOLDER/mwt1.py']
[2018-11-16 10:27:17,939] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:17,938] {{__init__.py:45}} INFO - Using executor LocalExecutor
[2018-11-16 10:27:18,231] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,230] {{models.py:189}} INFO - Filling up the DagBag from /usr/local/airflow/dags/mwt1.py
[2018-11-16 10:27:18,451] {{cli.py:374}} INFO - Running on host fdfdf2f790e1
[2018-11-16 10:27:18,473] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,473] {{bash_operator.py:70}} INFO - Tmp dir root location:
[2018-11-16 10:27:18,473] {{base_task_runner.py:98}} INFO - Subtask: /tmp
[2018-11-16 10:27:18,474] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,473] {{bash_operator.py:80}} INFO - Temporary script location: /tmp/airflowtmp5g0d6e4h//tmp/airflowtmp5g0d6e4h/mwt1_task1_8ob3n0y
[2018-11-16 10:27:18,474] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,473] {{bash_operator.py:88}} INFO - Running command: bdasdasdasd
[2018-11-16 10:27:18,479] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,479] {{bash_operator.py:97}} INFO - Output:
[2018-11-16 10:27:18,479] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,479] {{bash_operator.py:101}} INFO - /tmp/airflowtmp5g0d6e4h/mwt1_task1_8ob3n0y: line 1: bdasdasdasd: command not found
[2018-11-16 10:27:18,480] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,480] {{bash_operator.py:105}} INFO - Command exited with return code 127
[2018-11-16 10:27:18,488] {{models.py:1595}} ERROR - Bash command failed
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1493, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/bash_operator.py", line 109, in execute
raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed
[2018-11-16 10:27:18,490] {{models.py:1616}} INFO - Marking task as UP_FOR_RETRY
[2018-11-16 10:27:18,503] {{models.py:1644}} ERROR - Bash command failed
[2018-11-16 10:27:18,504] {{base_task_runner.py:98}} INFO - Subtask: /usr/local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
[2018-11-16 10:27:18,504] {{base_task_runner.py:98}} INFO - Subtask: """)
[2018-11-16 10:27:18,504] {{base_task_runner.py:98}} INFO - Subtask: Traceback (most recent call last):
[2018-11-16 10:27:18,504] {{base_task_runner.py:98}} INFO - Subtask: File "/usr/local/bin/airflow", line 27, in <module>
[2018-11-16 10:27:18,504] {{base_task_runner.py:98}} INFO - Subtask: args.func(args)
[2018-11-16 10:27:18,505] {{base_task_runner.py:98}} INFO - Subtask: File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 392, in run
[2018-11-16 10:27:18,505] {{base_task_runner.py:98}} INFO - Subtask: pool=args.pool,
[2018-11-16 10:27:18,505] {{base_task_runner.py:98}} INFO - Subtask: File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 50, in wrapper
[2018-11-16 10:27:18,505] {{base_task_runner.py:98}} INFO - Subtask: result = func(*args, **kwargs)
[2018-11-16 10:27:18,505] {{base_task_runner.py:98}} INFO - Subtask: File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1493, in _run_raw_task
[2018-11-16 10:27:18,505] {{base_task_runner.py:98}} INFO - Subtask: result = task_copy.execute(context=context)
[2018-11-16 10:27:18,505] {{base_task_runner.py:98}} INFO - Subtask: File "/usr/local/lib/python3.6/site-packages/airflow/operators/bash_operator.py", line 109, in execute
[2018-11-16 10:27:18,506] {{base_task_runner.py:98}} INFO - Subtask: raise AirflowException("Bash command failed")
[2018-11-16 10:27:18,506] {{base_task_runner.py:98}} INFO - Subtask: airflow.exceptions.AirflowException: Bash command failed
[2018-11-16 10:27:18,515] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,515] {{gcp_api_base_hook.py:82}} INFO - Getting connection using a JSON key file.
[2018-11-16 10:27:18,535] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,535] {{discovery.py:852}} INFO - URL being requested: GET https://www.googleapis.com/storage/v1/b/XXXX/o/mwt1%2Fmwt1_task1%2F2018-10-02T15%3A30%3A00%2F1.log?alt=media
[2018-11-16 10:27:18,535] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,535] {{client.py:595}} INFO - Attempting refresh to obtain initial access_token
[2018-11-16 10:27:18,537] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,537] {{client.py:893}} INFO - Refreshing access_token
[2018-11-16 10:27:18,911] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,911] {{gcp_api_base_hook.py:82}} INFO - Getting connection using a JSON key file.
[2018-11-16 10:27:18,922] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,922] {{util.py:134}} WARNING - __init__() takes at most 2 positional arguments (3 given)
[2018-11-16 10:27:18,928] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,928] {{discovery.py:852}} INFO - URL being requested: POST https://www.googleapis.com/upload/storage/v1/b/XXXX/o?name=mwt1%2Fmwt1_task1%2F2018-10-02T15%3A30%3A00%2F1.log&alt=json&uploadType=media
[2018-11-16 10:27:18,928] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,928] {{client.py:595}} INFO - Attempting refresh to obtain initial access_token
[2018-11-16 10:27:18,930] {{base_task_runner.py:98}} INFO - Subtask: [2018-11-16 10:27:18,930] {{client.py:893}} INFO - Refreshing access_token

This is a known issue that affects both GCS and S3 remote logging; see AIRFLOW-1916. It is fixed in Airflow 1.10, so you can either upgrade or run a fork that includes the fix.
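If upgrading is not an option right away, the append-mode hack mentioned in the question can be packaged as a small handler subclass instead of patching the installed package. This is only a minimal sketch, assuming the Airflow 1.9 signature gcs_write(self, log, remote_log_location, append=True); the module and class names here are made up, so verify both against your installed version:

# my_gcs_task_handler.py -- hypothetical module, must be importable (e.g. next to log_config.py)
from airflow.utils.log.gcs_task_handler import GCSTaskHandler


class OverwritingGCSTaskHandler(GCSTaskHandler):
    """Stop-gap for the duplication above: always overwrite the remote log
    instead of appending to it."""

    def gcs_write(self, log, remote_log_location, append=True):
        # Ignore the requested append mode and force an overwrite.
        return super(OverwritingGCSTaskHandler, self).gcs_write(
            log, remote_log_location, append=False)

To use it, point the 'class' entry of the 'gcs.task' handler in log_config.py at this subclass (e.g. 'my_gcs_task_handler.OverwritingGCSTaskHandler') instead of the stock GCSTaskHandler.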

Related

Create /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp, but there is no such file in hdfs

I'm using Flume to get data from Kafka into HDFS (Kafka source and HDFS sink). These are the versions I'm using:
hadoop-3.2.2
flume-1.9.0
kafka_2.11-0.10.1.0
This is my kafka-fluem-hdfs.conf:
a1.sources=r1 r2
a1.channels=c1 c2
a1.sinks=k1 k2
## source1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers=h01:9092,h02:9092,h03:9092
a1.sources.r1.kafka.topics=topic_start
## source2
a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r2.batchSize = 5000
a1.sources.r2.batchDurationMillis = 2000
a1.sources.r2.kafka.bootstrap.servers=h01:9092,h02:9092,h03:9092
a1.sources.r2.kafka.topics=topic_event
## channel1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir=/usr/local/flume/checkpoint/behavior1
a1.channels.c1.dataDirs = /usr/local/flume/data/behavior1/
a1.channels.c1.keep-alive = 6
## channel2
a1.channels.c2.type = file
a1.channels.c2.checkpointDir=/usr/local/flume/checkpoint/behavior2
a1.channels.c2.dataDirs = /usr/local/flume/data/behavior2/
a1.channels.c2.keep-alive = 6
## sink1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_start/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = logstart-
##sink2
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = /origin_data/gmall/log/topic_event/%Y-%m-%d
a1.sinks.k2.hdfs.filePrefix = logevent-
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.rollSize = 134217728
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k2.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
a1.sinks.k2.hdfs.codeC = gzip
#a1.sinks.k1.hdfs.codeC=com.hadoop.compression.lzo.LzopCodec
#a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.callTimeout=360000
#a1.sinks.k1.hdfs.maxIoWorkers=32
#a1.sinks.k1.hdfs.fileSuffix=.lzo
#a1.sinks.k2.hdfs.codeC=com.hadoop.compression.lzo.LzopCodec
#a1.sinks.k2.hdfs.writeFormat=Text
a1.sinks.k2.hdfs.callTimeout=360000
#a1.sinks.k2.hdfs.maxIoWorkers=32
#a1.sinks.k2.hdfs.fileSuffix=.lzo
a1.sources.r1.channels = c1
a1.sinks.k1.channel= c1
a1.sources.r2.channels = c2
a1.sinks.k2.channel= c2
Part of the log file:
2021-08-19 15:37:39,308 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:40,476 INFO [hdfs-k1-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:37:40,509 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,516 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,525 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:40,522 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:40,858 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,889 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas
My problem:
The log shows Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp, but there is no such file in HDFS.
More logs after I start Flume:
....
....
....
2021-08-19 15:30:01,748 INFO [lifecycleSupervisor-1-0] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 0
2021-08-19 15:30:01,754 INFO [lifecycleSupervisor-1-1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387001047, queueSize: 0, queueHead: 5765
2021-08-19 15:30:01,758 INFO [lifecycleSupervisor-1-0] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387001048, queueSize: 0, queueHead: 5778
2021-08-19 15:30:01,783 INFO [lifecycleSupervisor-1-0] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-26 position: 0 logWriteOrderID: 1629387001048
2021-08-19 15:30:01,783 INFO [lifecycleSupervisor-1-0] file.FileChannel (FileChannel.java:start(289)) - Queue Size after replay: 0 [channel=c1]
2021-08-19 15:30:01,784 INFO [lifecycleSupervisor-1-1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-26 position: 0 logWriteOrderID: 1629387001047
2021-08-19 15:30:01,787 INFO [lifecycleSupervisor-1-1] file.FileChannel (FileChannel.java:start(289)) - Queue Size after replay: 0 [channel=c2]
2021-08-19 15:30:01,789 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(196)) - Starting Sink k1
2021-08-19 15:30:01,795 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
2021-08-19 15:30:01,795 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SINK, name: k1 started
2021-08-19 15:30:01,797 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(196)) - Starting Sink k2
2021-08-19 15:30:01,798 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(207)) - Starting Source r2
2021-08-19 15:30:01,799 INFO [lifecycleSupervisor-1-5] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SINK, name: k2: Successfully registered new MBean.
2021-08-19 15:30:01,803 INFO [lifecycleSupervisor-1-5] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SINK, name: k2 started
2021-08-19 15:30:01,799 INFO [lifecycleSupervisor-1-6] kafka.KafkaSource (KafkaSource.java:doStart(524)) - Starting org.apache.flume.source.kafka.KafkaSource{name:r2,state:IDLE}...
2021-08-19 15:30:01,815 INFO [conf-file-poller-0] node.Application (Application.java:startAllComponents(207)) - Starting Source r1
2021-08-19 15:30:01,818 INFO [lifecycleSupervisor-1-0] kafka.KafkaSource (KafkaSource.java:doStart(524)) - Starting org.apache.flume.source.kafka.KafkaSource{name:r1,state:IDLE}...
2021-08-19 15:30:01,918 INFO [lifecycleSupervisor-1-6] consumer.ConsumerConfig (AbstractConfig.java:logAll(279)) - ConsumerConfig values:
......
.......
.......
2021-08-19 15:30:01,926 INFO [lifecycleSupervisor-1-0] consumer.ConsumerConfig (AbstractConfig.java:logAll(279)) - ConsumerConfig values:
.....
......
......
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-0] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r2: Successfully registered new MBean.
2021-08-19 15:30:02,211 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r2 started
2021-08-19 15:30:02,210 INFO [lifecycleSupervisor-1-0] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:30:02,213 INFO [lifecycleSupervisor-1-0] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r1 started.
2021-08-19 15:30:02,214 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
2021-08-19 15:30:02,214 INFO [lifecycleSupervisor-1-0] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r1 started
2021-08-19 15:30:02,726 INFO [PollableSourceRunner-KafkaSource-r1] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:30:02,730 INFO [PollableSourceRunner-KafkaSource-r2] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:30:02,740 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-1, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:30:02,747 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-2, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:30:02,748 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-1, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:30:02,770 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-1, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,776 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-2, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:30:02,776 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,845 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:30:02,935 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-1, groupId=flume] Successfully joined group with generation 66
2021-08-19 15:30:02,936 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-1, groupId=flume] Setting newly assigned partitions [topic_event-0]
2021-08-19 15:30:02,936 INFO [PollableSourceRunner-KafkaSource-r2] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_event - partition 0 assigned.
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-2, groupId=flume] Successfully joined group with generation 66
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-2, groupId=flume] Setting newly assigned partitions [topic_start-0]
2021-08-19 15:30:02,950 INFO [PollableSourceRunner-KafkaSource-r1] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_start - partition 0 assigned.
2021-08-19 15:30:04,912 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:30:04,912 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:30:04,984 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp
2021-08-19 15:30:06,577 INFO [hdfs-k2-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:30:06,606 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:30:06,648 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:30:06,665 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp
2021-08-19 15:30:06,675 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:30:06,916 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:30:06,927 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:30:06,931 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:30:16,676 INFO [hdfs-k2-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:30:16,676 INFO [hdfs-k2-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp
2021-08-19 15:30:16,682 INFO [hdfs-k2-call-runner-2] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz.tmp to /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387004913.gz
2021-08-19 15:30:16,932 INFO [hdfs-k1-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:30:16,932 INFO [hdfs-k1-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp
2021-08-19 15:30:16,934 INFO [hdfs-k1-call-runner-2] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz.tmp to /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387004913.gz
2021-08-19 15:30:30,932 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior2/checkpoint, elements to sync = 970
2021-08-19 15:30:30,936 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 967
2021-08-19 15:30:30,951 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387004945, queueSize: 0, queueHead: 6733
2021-08-19 15:30:30,953 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387004946, queueSize: 0, queueHead: 6743
2021-08-19 15:30:30,963 INFO [Log-BackgroundWorker-c2] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-26 position: 1147366 logWriteOrderID: 1629387004945
2021-08-19 15:30:30,964 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-20
2021-08-19 15:30:30,967 INFO [Log-BackgroundWorker-c1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-26 position: 487027 logWriteOrderID: 1629387004946
.....
.....
.....
2021-08-19 15:36:19,570 INFO [lifecycleSupervisor-1-8] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:36:19,570 INFO [lifecycleSupervisor-1-8] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] utils.AppInfoParser (AppInfoParser.java:<init>(109)) - Kafka version : 2.0.1
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] utils.AppInfoParser (AppInfoParser.java:<init>(110)) - Kafka commitId : fa14705e51bd2ce5
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-8] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r1 started.
2021-08-19 15:36:19,572 INFO [lifecycleSupervisor-1-6] kafka.KafkaSource (KafkaSource.java:doStart(547)) - Kafka source r2 started.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-8] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:register(119)) - Monitored counter group for type: SOURCE, name: r2: Successfully registered new MBean.
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-8] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r1 started
2021-08-19 15:36:19,573 INFO [lifecycleSupervisor-1-6] instrumentation.MonitoredCounterGroup (MonitoredCounterGroup.java:start(95)) - Component type: SOURCE, name: r2 started
2021-08-19 15:36:20,012 INFO [PollableSourceRunner-KafkaSource-r2] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:36:20,015 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-1, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:36:20,025 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-1, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:36:20,027 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-1, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,030 INFO [PollableSourceRunner-KafkaSource-r1] clients.Metadata (Metadata.java:update(285)) - Cluster ID: erHI3p-1SzKgC1ywVUE_Dw
2021-08-19 15:36:20,034 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(677)) - [Consumer clientId=consumer-2, groupId=flume] Discovered group coordinator h01:9092 (id: 2147483647 rack: null)
2021-08-19 15:36:20,039 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinPrepare(472)) - [Consumer clientId=consumer-2, groupId=flume] Revoking previously assigned partitions []
2021-08-19 15:36:20,039 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,068 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:sendJoinGroupRequest(509)) - [Consumer clientId=consumer-2, groupId=flume] (Re-)joining group
2021-08-19 15:36:20,152 INFO [PollableSourceRunner-KafkaSource-r2] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-1, groupId=flume] Successfully joined group with generation 69
2021-08-19 15:36:20,153 INFO [PollableSourceRunner-KafkaSource-r2] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-1, groupId=flume] Setting newly assigned partitions [topic_event-0]
2021-08-19 15:36:20,154 INFO [PollableSourceRunner-KafkaSource-r2] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_event - partition 0 assigned.
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] internals.AbstractCoordinator (AbstractCoordinator.java:onSuccess(473)) - [Consumer clientId=consumer-2, groupId=flume] Successfully joined group with generation 69
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] internals.ConsumerCoordinator (ConsumerCoordinator.java:onJoinComplete(280)) - [Consumer clientId=consumer-2, groupId=flume] Setting newly assigned partitions [topic_start-0]
2021-08-19 15:36:20,159 INFO [PollableSourceRunner-KafkaSource-r1] kafka.SourceRebalanceListener (KafkaSource.java:onPartitionsAssigned(648)) - topic topic_start - partition 0 assigned.
2021-08-19 15:37:39,286 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:37:39,286 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.HDFSCompressedDataStream (HDFSCompressedDataStream.java:configure(64)) - Serializer = TEXT, UseRawLocalFileSystem = false
2021-08-19 15:37:39,308 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:40,476 INFO [hdfs-k1-call-runner-0] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)) - Successfully loaded & initialized native-zlib library
2021-08-19 15:37:40,509 INFO [hdfs-k1-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,516 INFO [hdfs-k1-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,525 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:40,522 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:open(246)) - Creating /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:40,858 INFO [hdfs-k2-call-runner-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.gz]
2021-08-19 15:37:40,889 INFO [hdfs-k2-call-runner-0] hdfs.AbstractHDFSWriter (AbstractHDFSWriter.java:reflectGetNumCurrentReplicas(190)) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
2021-08-19 15:37:40,889 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] hdfs.BucketWriter (BucketWriter.java:getRefIsClosed(197)) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
2021-08-19 15:37:48,532 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior2/checkpoint, elements to sync = 949
2021-08-19 15:37:48,533 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:beginCheckpoint(230)) - Start checkpoint for /usr/local/flume/checkpoint/behavior1/checkpoint, elements to sync = 1002
2021-08-19 15:37:48,562 INFO [Log-BackgroundWorker-c1] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387382580, queueSize: 0, queueHead: 7743
2021-08-19 15:37:48,562 INFO [Log-BackgroundWorker-c2] file.EventQueueBackingStoreFile (EventQueueBackingStoreFile.java:checkpoint(255)) - Updating checkpoint metadata: logWriteOrderID: 1629387382581, queueSize: 0, queueHead: 7680
2021-08-19 15:37:48,570 INFO [Log-BackgroundWorker-c1] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior1/log-27 position: 504908 logWriteOrderID: 1629387382580
2021-08-19 15:37:48,571 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-21
2021-08-19 15:37:48,578 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-22
2021-08-19 15:37:48,578 INFO [Log-BackgroundWorker-c2] file.Log (Log.java:writeCheckpoint(1065)) - Updated checkpoint for file: /usr/local/flume/data/behavior2/log-27 position: 1144058 logWriteOrderID: 1629387382581
2021-08-19 15:37:48,581 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-20
2021-08-19 15:37:48,585 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-23
2021-08-19 15:37:48,587 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-21
2021-08-19 15:37:48,591 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-24
2021-08-19 15:37:48,593 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-22
2021-08-19 15:37:48,597 INFO [Log-BackgroundWorker-c1] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior1/log-25
2021-08-19 15:37:48,600 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-23
2021-08-19 15:37:48,606 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-24
2021-08-19 15:37:48,612 INFO [Log-BackgroundWorker-c2] file.LogFile (LogFile.java:close(520)) - Closing RandomReader /usr/local/flume/data/behavior2/log-25
2021-08-19 15:37:50,526 INFO [hdfs-k1-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:37:50,526 INFO [hdfs-k1-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp
2021-08-19 15:37:50,694 INFO [hdfs-k1-call-runner-6] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp to /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz
2021-08-19 15:37:50,890 INFO [hdfs-k2-roll-timer-0] hdfs.HDFSEventSink (HDFSEventSink.java:run(393)) - Writer callback called.
2021-08-19 15:37:50,890 INFO [hdfs-k2-roll-timer-0] hdfs.BucketWriter (BucketWriter.java:doClose(438)) - Closing /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp
2021-08-19 15:37:50,893 INFO [hdfs-k2-call-runner-3] hdfs.BucketWriter (BucketWriter.java:call(681)) - Renaming /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz.tmp to /origin_data/gmall/log/topic_event/2021-08-19/logevent-.1629387459287.gz
.......
.......
.......

rxjava2-jdbc: Iterating over a toIterable() is blocking, which is not supported in thread reactor-http-nio-2

I'm building a reactive (Flux/Mono) REST service. The backend is Oracle, accessed via rxjava2-jdbc.
How do I get past this blocking error?
I'm learning Rx by example, so it would be great to know the conceptual details that prevent what feels like routine list manipulation.
The repository returns a Flux from rxjava2-jdbc (Repository.java below).
The handler (AppHandler.java below) tries to add that list/Flux into another Protobuf object, SearchResponse, but fails.
Short stack trace:
Full stack trace:
2020-02-23T10:43:37,967 INFO [main] o.s.d.r.c.RepositoryConfigurationDelegate: Bootstrapping Spring Data JDBC repositories in DEFAULT mode.
2020-02-23T10:43:38,046 INFO [main] o.s.d.r.c.RepositoryConfigurationDelegate: Finished Spring Data repository scanning in 69ms. Found 0 JDBC repository interfaces.
2020-02-23T10:43:38,988 INFO [main] c.z.h.HikariDataSource: HikariPool-1 - Starting...
2020-02-23T10:43:39,334 INFO [main] c.z.h.HikariDataSource: HikariPool-1 - Start completed.
2020-02-23T10:43:40,196 INFO [main] o.s.b.w.e.n.NettyWebServer: Netty started on port(s): 8080
2020-02-23T10:43:40,199 INFO [main] o.s.b.StartupInfoLogger: Started App in 4.298 seconds (JVM running for 5.988)
2020-02-23T10:44:01,307 ERROR [reactor-http-nio-2] o.s.c.l.CompositeLog: [d806b05e] 500 Server Error for HTTP GET "/webflux/customers"
java.lang.IllegalStateException: Iterating over a toIterable() / toStream() is blocking, which is not supported in thread reactor-http-nio-2
at reactor.core.publisher.BlockingIterable$SubscriberIterator.hasNext(BlockingIterable.java:160)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ HTTP GET "/webflux/customers" [ExceptionHandlingWebHandler]
Stack trace:
at reactor.core.publisher.BlockingIterable$SubscriberIterator.hasNext(BlockingIterable.java:160)
at com.google.protobuf.AbstractMessageLite$Builder.addAllCheckingNulls(AbstractMessageLite.java:372)
at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:434)
at pn.api.protobuf.Proto$SearchResponse$Builder.addAllCustomers(Proto.java:3758)
at pn.api.controller.AppHandler.getAllCustomers(AppHandler.java:24)
at org.springframework.web.reactive.function.server.support.HandlerFunctionAdapter.handle(HandlerFunctionAdapter.java:61)
at org.springframework.web.reactive.DispatcherHandler.invokeHandler(DispatcherHandler.java:161)
at org.springframework.web.reactive.DispatcherHandler.lambda$handle$1(DispatcherHandler.java:146)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:118)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:76)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerNext(FluxConcatMap.java:274)
at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onNext(FluxConcatMap.java:851)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:121)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onNext(FluxPeekFuseable.java:203)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2199)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.request(FluxPeekFuseable.java:137)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.request(FluxMapFuseable.java:162)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2007)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onSubscribe(Operators.java:1881)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onSubscribe(FluxMapFuseable.java:90)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onSubscribe(FluxPeekFuseable.java:171)
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:54)
at reactor.core.publisher.Mono.subscribe(Mono.java:4105)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:441)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onSubscribe(FluxConcatMap.java:211)
at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:139)
at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:63)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.Mono.subscribe(Mono.java:4105)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:172)
at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
at reactor.netty.http.server.HttpServerHandle.onStateChange(HttpServerHandle.java:64)
at reactor.netty.tcp.TcpServerBind$ChildObserver.onStateChange(TcpServerBind.java:228)
at reactor.netty.http.server.HttpServerOperations.onInboundNext(HttpServerOperations.java:465)
at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:90)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
at reactor.netty.http.server.HttpTrafficHandler.channelRead(HttpTrafficHandler.java:167)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:830)
Repository.java
public Flux<Proto.Customer> allCustomers() { // rxjava2 returns Flowable<> ... Flux<>
    Flowable<Proto.Customer> customerFlowable =
            db.select(queryAllCustomers).get(new CustomerResultSetMapper());
    return Flux.from(customerFlowable);
}
AppHandler.java
public Mono<ServerResponse> getAllCustomers(ServerRequest request) {
    Flux<Proto.Customer> customers = repository.allCustomers();
    Proto.SearchResponse out = Proto.SearchResponse.newBuilder()
            .addAllCustomers(customers.toIterable()).build();
    return ServerResponse.ok()
            .contentType(MediaType.APPLICATION_JSON)
            .body(out, Proto.SearchResponse.class);
}

how to fix "ERROR MongoRDD: WARNING: Partitioning failed. Partitioning using the 'DefaultMongoPartitioner$' failed." in pyspark

When I run the code locally it runs fine, but when I run the same code on the server I get the above error.
Locally I read the data from a local MongoDB and there is no error; on the server I read the data from a MongoDB replica set.
I have tried changing
.config("spark.mongodb.input.partitionerOptions", "MongoPaginateByCountPartitioner")
to MongoDefaultPartitioner and MongoSplitVectorPartitioner.
def save_n_rename(df):
    print('------------------------------------- WRITING INITIATED -------------------------------------------')
    df.write.format('com.mongodb.spark.sql.DefaultSource').mode('overwrite')\
        .option('uri', '{}/{}.Revenue_Analytics'.format(mongo_final_url, mongo_final_db)).save()
    print('------------------------------------- WRITING COMPLETED -------------------------------------------')


def main():
    spark = SparkSession.builder \
        .master(props.get(env, 'executionMode')) \
        .appName("Revenue_Analytics") \
        .config("spark.mongodb.input.partitionerOptions", "MongoPaginateByCountPartitioner") \
        .getOrCreate()
    start = time()
    df = processing(spark)
    mins_elapsed, secs_elapsed = divmod(time() - start, 60)
    print("----------- Completed processing in {}m {:.2f}s -----------".format(mins_elapsed, secs_elapsed))
    save_n_rename(df)


if __name__ == '__main__':
    main()
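For reference, a minimal sketch of how a partitioner is usually selected with the MongoDB Spark connector, as I understand its documented configuration keys: spark.mongodb.input.partitioner picks the partitioner class, while spark.mongodb.input.partitionerOptions.* carries that partitioner's parameters. The URI, master and partition count below are placeholders; verify the exact option names against the connector version you have installed:

from pyspark.sql import SparkSession

# Hedged example only -- option names as I recall them from the
# mongo-spark-connector docs; all values are placeholders.
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("Revenue_Analytics") \
    .config("spark.mongodb.input.uri", "mongodb://<host>:27017/<db>.<collection>") \
    .config("spark.mongodb.input.partitioner", "MongoPaginateByCountPartitioner") \
    .config("spark.mongodb.input.partitionerOptions.numberOfPartitions", "64") \
    .getOrCreate()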
EDIT 1:
MongoDB Version: 4.2.0
Pyspark version: 2.4.4
traceback:
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 7.006073 ms
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 4.714324 ms
19/10/24 12:57:45 INFO cluster: Cluster created with settings {hosts=[172.16.10.252:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
19/10/24 12:57:45 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:45, serverValue:172200}] to 172.16.10.252:27017
19/10/24 12:57:45 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=172.16.10.252:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 0]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=419102, setName='rs0', canonicalAddress=mongo-repl-3:27017, hosts=[172.16.10.250:27017, mongo-repl-2:27017, mongo-repl-3:27017], passives=[], arbiters=[], primary='172.16.10.250:27017', tagSet=TagSet{[]}, electionId=null, setVersion=3, lastWriteDate=Thu Oct 24 12:57:45 IST 2019, lastUpdateTimeNanos=2312527044492704}
19/10/24 12:57:45 INFO MongoClientCache: Creating MongoClient: [172.16.10.252:27017]
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:46, serverValue:172201}] to 172.16.10.252:27017
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 6.280343 ms
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 3.269567 ms
19/10/24 12:57:45 INFO cluster: Cluster created with settings {hosts=[172.16.10.252:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
19/10/24 12:57:45 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:47, serverValue:172202}] to 172.16.10.252:27017
19/10/24 12:57:45 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=172.16.10.252:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 0]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=570933, setName='rs0', canonicalAddress=mongo-repl-3:27017, hosts=[172.16.10.250:27017, mongo-repl-2:27017, mongo-repl-3:27017], passives=[], arbiters=[], primary='172.16.10.250:27017', tagSet=TagSet{[]}, electionId=null, setVersion=3, lastWriteDate=Thu Oct 24 12:57:45 IST 2019, lastUpdateTimeNanos=2312527212534350}
19/10/24 12:57:45 INFO MongoClientCache: Creating MongoClient: [172.16.10.252:27017]
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:48, serverValue:172203}] to 172.16.10.252:27017
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 6.001824 ms
19/10/24 12:57:45 INFO CodeGenerator: Code generated in 3.610373 ms
19/10/24 12:57:45 INFO cluster: Cluster created with settings {hosts=[172.16.10.252:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
19/10/24 12:57:45 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:49, serverValue:172204}] to 172.16.10.252:27017
19/10/24 12:57:45 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=172.16.10.252:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 0]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=502689, setName='rs0', canonicalAddress=mongo-repl-3:27017, hosts=[172.16.10.250:27017, mongo-repl-2:27017, mongo-repl-3:27017], passives=[], arbiters=[], primary='172.16.10.250:27017', tagSet=TagSet{[]}, electionId=null, setVersion=3, lastWriteDate=Thu Oct 24 12:57:45 IST 2019, lastUpdateTimeNanos=2312527352871977}
19/10/24 12:57:45 INFO MongoClientCache: Creating MongoClient: [172.16.10.252:27017]
19/10/24 12:57:45 INFO connection: Opened connection [connectionId{localValue:50, serverValue:172205}] to 172.16.10.252:27017
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.552305 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 3.230598 ms
19/10/24 12:57:46 INFO cluster: Cluster created with settings {hosts=[172.16.10.252:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
19/10/24 12:57:46 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out
19/10/24 12:57:46 INFO connection: Opened connection [connectionId{localValue:51, serverValue:172206}] to 172.16.10.252:27017
19/10/24 12:57:46 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=172.16.10.252:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 0]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=535708, setName='rs0', canonicalAddress=mongo-repl-3:27017, hosts=[172.16.10.250:27017, mongo-repl-2:27017, mongo-repl-3:27017], passives=[], arbiters=[], primary='172.16.10.250:27017', tagSet=TagSet{[]}, electionId=null, setVersion=3, lastWriteDate=Thu Oct 24 12:57:46 IST 2019, lastUpdateTimeNanos=2312527492689014}
19/10/24 12:57:46 INFO MongoClientCache: Creating MongoClient: [172.16.10.252:27017]
19/10/24 12:57:46 INFO connection: Opened connection [connectionId{localValue:52, serverValue:172207}] to 172.16.10.252:27017
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 14.755534 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.132629 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.480881 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 4.944708 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.26496 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.270467 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.068084 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 4.947876 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 4.996435 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.080908 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 4.843392 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 4.93398 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 6.395543 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 5.189256 ms
19/10/24 12:57:46 INFO CodeGenerator: Code generated in 6.958948 ms
19/10/24 12:57:46 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:46 INFO connection: Closed connection [connectionId{localValue:32, serverValue:172187}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:46 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:46 INFO connection: Closed connection [connectionId{localValue:30, serverValue:172185}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:49 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:49 INFO connection: Closed connection [connectionId{localValue:36, serverValue:172191}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:49 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:49 INFO connection: Closed connection [connectionId{localValue:38, serverValue:172193}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:49 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:49 INFO connection: Closed connection [connectionId{localValue:40, serverValue:172195}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:50 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:50 INFO connection: Closed connection [connectionId{localValue:42, serverValue:172197}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:50 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:50 INFO connection: Closed connection [connectionId{localValue:44, serverValue:172199}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:50 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:50 INFO connection: Closed connection [connectionId{localValue:46, serverValue:172201}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:50 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:50 INFO connection: Closed connection [connectionId{localValue:48, serverValue:172203}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:51 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:51 INFO connection: Closed connection [connectionId{localValue:50, serverValue:172205}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:57:51 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:57:51 INFO connection: Closed connection [connectionId{localValue:52, serverValue:172207}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:58:03 ERROR MongoRDD:
-----------------------------
WARNING: Partitioning failed.
-----------------------------
Partitioning using the 'DefaultMongoPartitioner$' failed.
Please check the stacktrace to determine the cause of the failure or check the Partitioner API documentation.
Note: Not all partitioners are suitable for all toplogies and not all partitioners support views.%n
-----------------------------
19/10/24 12:58:04 INFO SparkContext: Invoking stop() from shutdown hook
19/10/24 12:58:04 INFO MongoClientCache: Closing MongoClient: [172.16.10.252:27017]
19/10/24 12:58:04 INFO connection: Closed connection [connectionId{localValue:34, serverValue:172189}] to 172.16.10.252:27017 because the pool has been closed.
19/10/24 12:58:04 INFO SparkUI: Stopped Spark web UI at http://172.16.10.242:4040
19/10/24 12:58:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/10/24 12:58:04 INFO MemoryStore: MemoryStore cleared
19/10/24 12:58:04 INFO BlockManager: BlockManager stopped
19/10/24 12:58:04 INFO BlockManagerMaster: BlockManagerMaster stopped
19/10/24 12:58:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/10/24 12:58:04 INFO SparkContext: Successfully stopped SparkContext
19/10/24 12:58:04 INFO ShutdownHookManager: Shutdown hook called
19/10/24 12:58:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-04e7bf58-133a-4c10-b5c4-20ac740ab880
19/10/24 12:58:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-e36f3499-1c23-4f25-b5ce-3a6a9685f9bb
19/10/24 12:58:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-e36f3499-1c23-4f25-b5ce-3a6a9685f9bb/pyspark-28bc9fe4-4bd8-44dd-b541-a25def4e3930
------------------------------------- WRITING INITIATED -------------------------------------------
Traceback (most recent call last):
File "/home/svr_data_analytic/hmis-analytics-data-processing/src/main/python/sales/revenue.py", line 402, in <module>
main()
File "/home/svr_data_analytic/hmis-analytics-data-processing/src/main/python/sales/revenue.py", line 398, in main
save_n_rename(df)
File "/home/svr_data_analytic/hmis-analytics-data-processing/src/main/python/sales/revenue.py", line 383, in save_n_rename
.option('uri', '{}/{}.Revenue_Analytics'.format(mongo_final_url, mongo_final_db)).save()
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 736, in save
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o740.save.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange hashpartitioning(itemtype_id#4652, 200)
+- *(70) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, Patient_Type#6307, bill_type#4350, admit_doc_id#1096, item_name#4400, itemtype_id#4652, group_name#5625, category_name#5511, classification_name#5397]
+- SortMergeJoin [item_classification_id#4961], [item_cls_id#5404], LeftOuter
:- *(67) Sort [item_classification_id#4961 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(item_classification_id#4961, 200)
: +- *(66) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, Patient_Type#6307, bill_type#4350, admit_doc_id#1096, item_name#4400, itemtype_id#4652, item_classification_id#4961, group_name#5625, category_name#5511]
: +- SortMergeJoin [item_category_id#4857], [item_cat_id#5510], LeftOuter
: :- *(63) Sort [item_category_id#4857 ASC NULLS FIRST], false, 0
: : +- Exchange hashpartitioning(item_category_id#4857, 200)
: : +- *(62) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, Patient_Type#6307, bill_type#4350, admit_doc_id#1096, item_name#4400, itemtype_id#4652, item_category_id#4857, item_classification_id#4961, group_name#5625]
: : +- SortMergeJoin [item_group_id#4754], [item_grp_id#5624], LeftOuter
: : :- *(59) Sort [item_group_id#4754 ASC NULLS FIRST], false, 0
: : : +- Exchange hashpartitioning(item_group_id#4754, 200)
: : : +- *(58) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, Patient_Type#6307, bill_type#4350, admit_doc_id#1096, item_name#4400, itemtype_id#4652, item_group_id#4754, item_category_id#4857, item_classification_id#4961]
: : : +- SortMergeJoin [billitems_item_id#3857], [item_id#4551], LeftOuter
: : : :- *(55) Sort [billitems_item_id#3857 ASC NULLS FIRST], false, 0
: : : : +- Exchange hashpartitioning(billitems_item_id#3857, 200)
: : : : +- *(54) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, billitems_item_id#3857, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, Patient_Type#6307, bill_type#4350, admit_doc_id#1096]
: : : : +- SortMergeJoin [ip_app_id#6144], [ipapp_id#1094], LeftOuter
: : : : :- *(51) Sort [ip_app_id#6144 ASC NULLS FIRST], false, 0
: : : : : +- Exchange hashpartitioning(ip_app_id#6144, 200)
: : : : : +- *(50) Project [quantity#3658, amount#3578, discount#3591, item_net_amount#3761, billitems_id#3816, billitems_item_id#3857, bill_doctor_id#3879, item_doctor_id#3902, bill_no#5673, name#5803, bill_date#5671, type#5851, total_amount#6053, bill_discount#6054, bills_id#6120, ip_app_id#6144, Patient_Type#6307, bill_type#4350]
: : : : : +- *(50) SortMergeJoin [bill_id#3836], [bills_id#6120], Inner
: : : : : :- *(39) Sort [bill_id#3836 ASC NULLS FIRST], false, 0
: : : : : : +- Exchange hashpartitioning(bill_id#3836, 200)
: : : : : : +- *(38) Project [quantity#3658, amount#3578, discount#3591, total#3666 AS item_net_amount#3761, _id#3577.oid AS billitems_id#3816, bills#3586.$id.oid AS bill_id#3836, item#3620.$id.oid AS billitems_item_id#3857, bill_doctor#3582.$id.oid AS bill_doctor_id#3879, doctor#3594.$id.oid AS item_doctor_id#3902]
: : : : : : +- *(38) Filter ((((cast(from_unixtime(unix_timestamp(bill_date#3581, yyyy-MM-dd h:mm:ss, Some(Asia/Kolkata)), yyyy, Some(Asia/Kolkata)) as int) >= 2018) && isnotnull(bills#3586.$id.oid)) && isnotnull(is_previous_bill_item#3616)) && (is_previous_bill_item#3616 = false))
: : : : : : +- *(38) Scan MongoRelation(MongoRDD[25] at RDD at MongoRDD.scala:51,Some(StructType(StructField(_id,StructType(StructField(oid,StringType,true)),true), StructField(amount,DoubleType,true), StructField(billDoctor,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true)),true), StructField(billType,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(bill_date,TimestampType,true), StructField(bill_doctor,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(bill_doctor_name,StringType,true), StructField(bill_item_unique_id,StringType,true), StructField(bill_unique_id,StringType,true), StructField(bills,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(cgst_amount,DoubleType,true), StructField(cgst_per,DoubleType,true), StructField(created_at,TimestampType,true), StructField(description,StringType,true), StructField(discount,DoubleType,true), StructField(discount_amount,IntegerType,true), StructField(discount_per,DoubleType,true), StructField(doctor,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(doctor_fee,DoubleType,true), StructField(etl_billType,StringType,true), StructField(etl_billedOutlet,StringType,true), StructField(etl_data,BooleanType,true), StructField(etl_data_batch,StringType,true), StructField(etl_doctor,StringType,true), StructField(etl_item,StringType,true), StructField(etl_surgery,StringType,true), StructField(etl_taxMaster,StringType,true), StructField(igst_amount,DoubleType,true), StructField(igst_per,DoubleType,true), StructField(initial_amount,DoubleType,true), StructField(inventoryItemBatchDetail,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(inventoryLocationStock,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(inventoryStockLocation,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(ipAppointment,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(is_deleted,BooleanType,true), StructField(is_despatched_item,BooleanType,true), StructField(is_modified,BooleanType,true), StructField(is_modified_deleted,BooleanType,true), StructField(is_old_bill,BooleanType,true), StructField(is_previous_bill_item,BooleanType,true), StructField(is_sponsor_bill,BooleanType,true), StructField(is_stent_invoice_loaded,BooleanType,true), StructField(is_tax_reversed,BooleanType,true), StructField(item,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(item_category,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), 
StructField($ref,StringType,true)),true), StructField(item_group,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(item_movement_summary,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(legacy_billno,StringType,true), StructField(legacy_branchcode,StringType,true), StructField(legacy_concessionrate,StringType,true), StructField(legacy_dailycharge,StringType,true), StructField(legacy_dosage,StringType,true), StructField(legacy_emergency,StringType,true), StructField(legacy_itemcessamount,StringType,true), StructField(legacy_medicineusagereference,StringType,true), StructField(legacy_oldbillitemcost,StringType,true), StructField(legacy_oldvatamount,StringType,true), StructField(legacy_oldvatpercentage,StringType,true), StructField(legacy_prescriptionreference,StringType,true), StructField(legacy_productserialnumber,StringType,true), StructField(legacy_recordlocked,StringType,true), StructField(legacy_salestaxpercentage,StringType,true), StructField(legacy_sellingcgstamount,StringType,true), StructField(legacy_sellingdiscountamount,DoubleType,true), StructField(legacy_sellingdiscountpercentage,DoubleType,true), StructField(legacy_sellingsgstamount,StringType,true), StructField(legacy_slno,StringType,true), StructField(legacy_transfered,StringType,true), StructField(legacy_updategstvat,StringType,true), StructField(legacy_vatamount,StringType,true), StructField(legacy_vatinclusive,StringType,true), StructField(legacy_vatpercentage,StringType,true), StructField(less,BooleanType,true), StructField(local_storage_delete,BooleanType,true), StructField(master_tax,StructType(StructField($db,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($ref,StringType,true)),true), StructField(modified_at,TimestampType,true), StructField(mrp_price,DoubleType,true), StructField(organization,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(organization_code,StringType,true), StructField(package_order,IntegerType,true), StructField(previous_return_qty,StringType,true), StructField(quantity,IntegerType,true), StructField(rack,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true), StructField($db,StringType,true)),true), StructField(reversed_gst_amount,DoubleType,true), StructField(sess_amount,DoubleType,true), StructField(sgst_amount,DoubleType,true), StructField(sgst_per,DoubleType,true), StructField(surgery,StructType(StructField($ref,StringType,true), StructField($id,StructType(StructField(oid,StringType,true)),true)),true), StructField(taxMaster,NullType,true), StructField(total,DoubleType,true), StructField(total_sales_return_amount,DoubleType,true), StructField(unit_price,DoubleType,true)))) [bill_doctor#3582,is_previous_bill_item#3616,total#3666,item#3620,bills#3586,doctor#3594,_id#3577,quantity#3658,bill_date#3581,discount#3591,amount#3578] PushedFilters: [IsNotNull(is_previous_bill_item), EqualTo(is_previous_bill_item,false)], ReadSchema: struct<bill_doctor:struct<$ref:string,$id:struct<oid:string>,$db:string>,is_previous_bill_item:bo...
: : : : : +- *(49) Sort [bills_id#6120 ASC NULLS FIRST], false, 0
: : : : : +- Exchange hashpartitioning(bills_id#6120, 200)

What parameters should I pass for the schema-registry to run in non-master mode?

I want to run the schema-registry in non-master mode in Kubernetes, so I passed the environment variable master.eligibility=false. However, it is still electing a master.
Please point me to where else I should change the configuration; there are no errors suggesting that the environment value is wrong.
cmd:
helm install helm-test-0.1.0.tgz --set env.name.SCHEMA_REGISTRY_KAFKASTORE_BOOTSERVERS="PLAINTEXT://xx.xx.xx.xx:9092\,PLAINTEXT://xx.xx.xx.xx:9092\,PLAINTEXT://xx.xx.xx.xx:9092" --set env.name.SCHEMA_REGISTRY_LISTENERS="http://0.0.0.0:8083" --set env.name.SCHEMA_REGISTRY_MASTER_ELIGIBILITY=false
Details:
replicaCount: 1
image:
  repository: confluentinc/cp-schema-registry
  tag: "5.0.0"
  pullPolicy: IfNotPresent
env:
  name:
    SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: "PLAINTEXT://xx.xxx.xx.xx:9092, PLAINTEXT://xx.xxx.xx.xx:9092, PLAINTEXT://xx.xxx.xx.xx:9092"
    SCHEMA_REGISTRY_LISTENERS: "http://0.0.0.0:8883"
    SCHEMA_REGISTRY_HOST_NAME: localhost
    SCHEMA_REGISTRY_MASTER_ELIGIBILITY: false
Pod - schema-registry properties:
root@test-app-788455bb47-tjlhw:/# cat /etc/schema-registry/schema-registry.properties
master.eligibility=false
listeners=http://0.0.0.0:8883
host.name=xx.xx.xxx.xx
kafkastore.bootstrap.servers=PLAINTEXT://xx.xx.xx.xx:9092,PLAINTEXT://xx.xx.xx.xx:9092,PLAINTEXT://xx.xx.xx.xx:9092
echo "===> Launching ... "
+ echo '===> Launching ... '
exec /etc/confluent/docker/launch
+ exec /etc/confluent/docker/launch
===> Launching ...
===> Launching schema-registry ...
[2018-10-15 18:52:45,993] INFO SchemaRegistryConfig values:
resource.extension.class = []
metric.reporters = []
kafkastore.sasl.kerberos.kinit.cmd = /usr/bin/kinit
response.mediatype.default = application/vnd.schemaregistry.v1+json
kafkastore.ssl.trustmanager.algorithm = PKIX
inter.instance.protocol = http
authentication.realm =
ssl.keystore.type = JKS
kafkastore.topic = _schemas
metrics.jmx.prefix = kafka.schema.registry
kafkastore.ssl.enabled.protocols = TLSv1.2,TLSv1.1,TLSv1
kafkastore.topic.replication.factor = 3
ssl.truststore.password = [hidden]
kafkastore.timeout.ms = 500
host.name = xx.xxx.xx.xx
kafkastore.bootstrap.servers = [PLAINTEXT://xx.xxx.xx.xx:9092, PLAINTEXT://xx.xxx.xx.xx:9092, PLAINTEXT://xx.xxx.xx.xx:9092]
schema.registry.zk.namespace = schema_registry
kafkastore.sasl.kerberos.ticket.renew.window.factor = 0.8
kafkastore.sasl.kerberos.service.name =
schema.registry.resource.extension.class = []
ssl.endpoint.identification.algorithm =
compression.enable = false
kafkastore.ssl.truststore.type = JKS
avro.compatibility.level = backward
kafkastore.ssl.protocol = TLS
kafkastore.ssl.provider =
kafkastore.ssl.truststore.location =
response.mediatype.preferred = [application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json]
kafkastore.ssl.keystore.type = JKS
authentication.skip.paths = []
ssl.truststore.type = JKS
kafkastore.ssl.truststore.password = [hidden]
access.control.allow.origin =
ssl.truststore.location =
ssl.keystore.password = [hidden]
port = 8081
kafkastore.ssl.keystore.location =
metrics.tag.map = {}
master.eligibility = false
Logs of the schema-registry pod:
(org.apache.kafka.clients.consumer.ConsumerConfig)
[2018-10-15 18:52:48,571] INFO Kafka version : 2.0.0-cp1 (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-15 18:52:48,571] INFO Kafka commitId : 4b1dd33f255ddd2f (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-15 18:52:48,599] INFO Cluster ID: V-MGQtptQnuWK_K9-wot1Q (org.apache.kafka.clients.Metadata)
[2018-10-15 18:52:48,602] INFO Initialized last consumed offset to -1 (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-15 18:52:48,605] INFO [kafka-store-reader-thread-_schemas]: Starting (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-15 18:52:48,715] INFO [Consumer clientId=KafkaStore-reader-_schemas, groupId=schema-registry-10.100.4.189-8083] Resetting offset for partition _schemas-0 to offset 0. (org.apache.kafka.clients.consumer.internals.Fetcher)
[2018-10-15 18:52:48,721] INFO Cluster ID: V-MGQtptQnuWK_K9-wot1Q (org.apache.kafka.clients.Metadata)
[2018-10-15 18:52:48,775] INFO Wait to catch up until the offset of the last message at 228 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2018-10-15 18:52:49,831] INFO Joining schema registry with Kafka-based coordination (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry)
[2018-10-15 18:52:49,852] INFO Kafka version : 2.0.0-cp1 (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-15 18:52:49,852] INFO Kafka commitId : 4b1dd33f255ddd2f (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-15 18:52:49,909] INFO Cluster ID: V-MGQtptQnuWK_K9-wot1Q (org.apache.kafka.clients.Metadata)
[2018-10-15 18:52:49,915] INFO [Schema registry clientId=sr-1, groupId=schema-registry] Discovered group coordinator ip-10-150-4-5.ec2.internal:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-10-15 18:52:49,919] INFO [Schema registry clientId=sr-1, groupId=schema-registry] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-10-15 18:52:52,975] INFO [Schema registry clientId=sr-1, groupId=schema-registry] Successfully joined group with generation 92 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-10-15 18:52:52,980] INFO Finished rebalance with master election result: Assignment{version=1, error=0, master='sr-1-abcd4cf2-8a02-4105-8361-9aa82107acd8', masterIdentity=version=1,host=ip-xx-xxx-xx-xx.ec2.internal,port=8083,scheme=http,masterEligibility=true} (io.confluent.kafka.schemaregistry.masterelector.kafka.KafkaGroupMasterElector)
[2018-10-15 18:52:53,088] INFO Adding listener: http://0.0.0.0:8083 (io.confluent.rest.Application)
[2018-10-15 18:52:53,347] INFO jetty-9.4.11.v20180605; built: 2018-06-05T18:24:03.829Z; git: d5fc0523cfa96bfebfbda19606cad384d772f04c; jvm 1.8.0_172-b01 (org.eclipse.jetty.server.Server)
[2018-10-15 18:52:53,428] INFO DefaultSessionIdManager workerName=node0 (org.eclipse.jetty.server.session)
[2018-10-15 18:52:53,429] INFO No SessionScavenger set, using defaults (org.eclipse.jetty.server.session)
[2018-10-15 18:52:53,432] INFO node0 Scavenging every 660000ms (org.eclipse.jetty.server.session)
Oct 15, 2018 6:52:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider io.confluent.kafka.schemaregistry.rest.resources.SubjectsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.kafka.schemaregistry.rest.resources.SubjectsResource will be ignored.
Oct 15, 2018 6:52:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider io.confluent.kafka.schemaregistry.rest.resources.ConfigResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.kafka.schemaregistry.rest.resources.ConfigResource will be ignored.
Oct 15, 2018 6:52:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider io.confluent.kafka.schemaregistry.rest.resources.SchemasResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.kafka.schemaregistry.rest.resources.SchemasResource will be ignored.
Oct 15, 2018 6:52:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider io.confluent.kafka.schemaregistry.rest.resources.SubjectVersionsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.kafka.schemaregistry.rest.resources.SubjectVersionsResource will be ignored.
Oct 15, 2018 6:52:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider io.confluent.kafka.schemaregistry.rest.resources.CompatibilityResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.kafka.schemaregistry.rest.resources.CompatibilityResource will be ignored.
[2018-10-15 18:52:54,364] INFO HV000001: Hibernate Validator 5.1.3.Final (org.hibernate.validator.internal.util.Version)
[2018-10-15 18:52:54,587] INFO Started o.e.j.s.ServletContextHandler@764faa6{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
[2018-10-15 18:52:54,619] INFO Started o.e.j.s.ServletContextHandler@14a50707{/ws,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
[2018-10-15 18:52:54,642] INFO Started NetworkTrafficServerConnector@62656be4{HTTP/1.1,[http/1.1]}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector)
[2018-10-15 18:52:54,644] INFO Started @9700ms (org.eclipse.jetty.server.Server)
[2018-10-15 18:52:54,644] INFO Server started, listening for requests... (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain)
I checked, and your configs look good. I believe it is in fact starting as a follower, and the log is simply reporting who the elected master is:
Assignment{version=1, error=0, master='sr-1-abcd4cf2-8a02-4105-8361-9aa82107acd8', masterIdentity=version=1,host=ip-xx-xxx-xx-xx.ec2.internal,port=8083,scheme=http,masterEligibility=true}
The masterIdentity in that line (note masterEligibility=true) describes the instance that won the election, not your pod, so your master.eligibility=false setting is being respected.
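As a side note on why the Helm env block takes effect at all: the Confluent images derive schema-registry.properties keys from SCHEMA_REGISTRY_* environment variables, which is why SCHEMA_REGISTRY_MASTER_ELIGIBILITY shows up as master.eligibility=false in the pod's properties file above. Below is a minimal sketch of that naming convention, simplified and purely illustrative (the real mapping also handles dashes and literal underscores, and the class name here is made up):
import java.util.Locale;

public class EnvToPropertySketch {
    // Simplified convention: drop the SCHEMA_REGISTRY_ prefix, lower-case the rest,
    // and turn underscores into dots. Edge cases (dashes, literal underscores) are ignored.
    static String toPropertyKey(String envVar) {
        final String prefix = "SCHEMA_REGISTRY_";
        if (!envVar.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a schema-registry variable: " + envVar);
        }
        return envVar.substring(prefix.length()).toLowerCase(Locale.ROOT).replace('_', '.');
    }

    public static void main(String[] args) {
        // Prints "master.eligibility", matching the entry in the pod's properties file.
        System.out.println(toPropertyKey("SCHEMA_REGISTRY_MASTER_ELIGIBILITY"));
    }
}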

Kafka consumer Java code not working

I have just started with Kafka. I am able to produce and consume data through the command prompt, and I can also produce data through Java code, even from a remote server.
But when I try this simple consumer Java code, it does not work.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class Simpleconsumer {
    private final ConsumerConnector consumer;
    private final String topic;

    public Simpleconsumer(String topic) {
        // Old (pre-0.9) high-level consumer: configured via ZooKeeper, not brokers.
        Properties props = new Properties();
        props.put("zookeeper.connect", "127.0.0.1:2181");
        props.put("group.id", "topic1");
        props.put("auto.offset.reset", "smallest");
        consumer = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        this.topic = topic;
    }

    public void testConsumer() {
        try {
            // Request a single stream (thread) for the topic.
            Map<String, Integer> topicCount = new HashMap<String, Integer>();
            topicCount.put(topic, 1);
            Map<String, List<KafkaStream<byte[], byte[]>>> consumerStreams = consumer.createMessageStreams(topicCount);
            List<KafkaStream<byte[], byte[]>> streams = consumerStreams.get(topic);
            System.out.println("start.......");
            for (final KafkaStream<byte[], byte[]> stream : streams) {
                ConsumerIterator<byte[], byte[]> it = stream.iterator();
                System.out.println("iterate.......");
                // hasNext() blocks until a message arrives (no consumer.timeout.ms is set).
                while (it.hasNext()) {
                    System.out.println("Message from Single Topic: " + new String(it.next().message()));
                }
            }
            System.out.println("end.......");
            if (consumer != null) {
                consumer.shutdown();
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }

    public static void main(String[] args) {
        // String topic = args[0];
        Simpleconsumer simpleHLConsumer = new Simpleconsumer("topic1");
        simpleHLConsumer.testConsumer();
    }
}
Output:
log4j:WARN No appenders could be found for logger (kafka.utils.VerifiableProperties).
log4j:WARN Please initialize the log4j system properly.
start.......
iterate.......
There is no error; the program neither terminates nor gives any further output.
ZooKeeper log:
2016-02-18 17:31:31,790 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:33338
2016-02-18 17:31:31,793 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /127.0.0.1:33338
2016-02-18 17:31:31,821 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x152f4265b0b0009 with negotiated timeout 6000 for client /127.0.0.1:33338
2016-02-18 17:31:31,891 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x152f4265b0b0009 type:create cxid:0x1 zxid:0x718 txntype:-1 reqpath:n/a Error Path:/consumers Error:KeeperErrorCode = NodeExists for /consumers
2016-02-18 17:31:31,892 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x152f4265b0b0009 type:create cxid:0x2 zxid:0x719 txntype:-1 reqpath:n/a Error Path:/consumers/artinew Error:KeeperErrorCode = NodeExists for /consumers/artinew
2016-02-18 17:31:31,892 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x152f4265b0b0009 type:create cxid:0x3 zxid:0x71a txntype:-1 reqpath:n/a Error Path:/consumers/artinew/ids Error:KeeperErrorCode = NodeExists for /consumers/artinew/ids
2016-02-18 17:31:32,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x152f4265b0b0008, timeout of 6000ms exceeded
2016-02-18 17:31:32,000 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x152f4265b0b0008
2016-02-18 17:31:32,002 [myid:] - INFO [SyncThread:0:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:33337 which had sessionid 0x152f4265b0b0
I am getting the following in the Kafka console in an infinite loop. Please explain:
[2016-02-17 20:50:08,594] INFO Closing socket connection to /xx.xx.xx.xx. (kafka.network.Processor)
[2016-02-17 20:50:08,174] INFO Closing socket connection to /xx.xx.xx.xx. (kafka.network.Processor)
[2016-02-17 20:50:08,385] INFO Closing socket connection to /xx.xx.xx.xx. (kafka.network.Processor)
[2016-02-17 20:50:08,760] INFO Closing socket connection to /xx.xx.xx.xx. (kafka.network.Processor)
I created the topic in the following manner:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 5 --topic topic1
I am able to consume it from the command prompt using:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic topic1
I am not able to understand what the issue is.
Try localhost instead of 127.0.0.1 in the code to make sure local name resolution is working fine.
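For example, the connector could be built like this (a minimal sketch of the suggested change; ConsumerFactorySketch is just an illustrative name, and consumer.timeout.ms is an optional extra I am adding so the iterator fails fast instead of blocking forever when no messages arrive):
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class ConsumerFactorySketch {
    // Builds the old high-level consumer with the answer's suggested change (localhost)
    // plus an optional timeout for easier troubleshooting.
    static ConsumerConnector create() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // was 127.0.0.1:2181
        props.put("group.id", "topic1");
        props.put("auto.offset.reset", "smallest");
        // Optional addition (not part of the original answer): throw
        // kafka.consumer.ConsumerTimeoutException after 10s without messages,
        // which distinguishes "connected but topic is empty" from a hang.
        props.put("consumer.timeout.ms", "10000");
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }
}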