I'm running a Celery-based application. Every now and then I see the following in the log:
[... ERROR/MainProcess] Task [...] raised unexpected: WorkerLostError('Worker exited prematurely: signal 6 (SIGIOT).',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/billiard/pool.py", line 1170, in mark_as_worker_lost
    human_status(exitcode)),
WorkerLostError: Worker exited prematurely: signal 6 (SIGIOT).
Can anyone come up with an explanation for this?
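For what it's worth, signal 6 is SIGABRT (SIGIOT is an old alias for it on Linux), so the worker process aborted. A minimal sketch of how an aborting pool worker surfaces as exactly this error (the broker URL and task are assumptions, not from the original post):

import os
from celery import Celery

app = Celery('repro', broker='redis://localhost:6379/0')  # hypothetical broker

@app.task
def crash():
    os.abort()  # sends SIGABRT (signal 6) to this pool worker process

Calling crash.delay() against a running worker makes the parent process log WorkerLostError: Worker exited prematurely: signal 6 (SIGIOT).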
Related
I am facing the error below while building the kernel from a local workspace (created by devtool modify virtual/kernel). If the workspace has not been created, I don't see any error.
ERROR: ExpansionError during parsing /home/aws-fsp-build/rax-workspace/yocto/meta-ti/recipes-kernel/linux/linux-ti-staging-rt_5.10.bb
Traceback (most recent call last):
File "Var <KERNEL_LOCALVERSION>", line 1, in <module>
bb.data_smart.ExpansionError: Failure expanding variable KERNEL_LOCALVERSION, expression was -g${#d.getVar('SRCPV', True).split('+')[1]} which triggered exception IndexError: list index out of range
Can you help me resolve this? I need the workspace since I am working on kernel-related changes. I am using the dunfell branch of meta-ti.
Loading cache: 100% |#########################################################################################################| Time: 0:00:00
Loaded 4480 entries from dependency cache.
WARNING: /home/aws-fsp-build/rax-workspace/yocto/meta-ti/recipes-kernel/linux/linux-ti-staging-rt_5.10.bb: Exception during build_dependencies for do_configure
WARNING: /home/aws-fsp-build/rax-workspace/yocto/meta-ti/recipes-kernel/linux/linux-ti-staging-rt_5.10.bb: Error during finalise of /home/aws-fsp-build/rax-workspace/yocto/meta-ti/recipes-kernel/linux/linux-ti-staging-rt_5.10.bb
ERROR: ExpansionError during parsing /home/aws-fsp-build/rax-workspace/yocto/meta-ti/recipes-kernel/linux/linux-ti-staging-rt_5.10.bb
Traceback (most recent call last):
File "Var <KERNEL_LOCALVERSION>", line 1, in <module>
bb.data_smart.ExpansionError: Failure expanding variable KERNEL_LOCALVERSION, expression was -g${#d.getVar('SRCPV', True).split('+')[1]} which triggered exception IndexError: list index out of range
WARNING: /home/aws-fsp-build/rax-workspace/yocto/meta-ti/recipes-kernel/linux/linux-ti-staging-rt_5.4.bb: Cooker received SIGTERM, shutting down...
WARNING: /home/aws-fsp-build/rax-workspace/yocto/meta-carrier/recipes-kernel/linux/linux-ti-staging_4.19.bb: Cooker received SIGTERM, shutting down...
WARNING: /home/aws-fsp-build/rax-workspace/yocto/meta-carrier/recipes-kernel/mstp-mod/mstp-mod.bb: Cooker received SIGTERM, shutting down...
Summary: There were 5 WARNING messages shown.
Summary: There was 1 ERROR message shown, returning a non-zero exit code.
It seems that after moving to the workspace, the SRCPV variable changes format, which leads to the parsing failure. Try adding something like this to the build/workspace/appends/linux-ti-staging-rt_5.4.bbappend file:
KERNEL_LOCALVERSION = "-g999"
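For context, the failing expression -g${#d.getVar('SRCPV', True).split('+')[1]} takes the part of SRCPV after the '+'. A minimal sketch of why that indexing breaks in a workspace (the SRCPV values here are illustrative assumptions, not taken from the build):

# Illustrative SRCPV values (assumptions):
normal_srcpv = "AUTOINC+1234abcd"   # typical git recipe: "AUTOINC+<hash>"
workspace_srcpv = "AUTOINC"         # devtool workspace: no "+<hash>" suffix

print(normal_srcpv.split('+')[1])     # "1234abcd" -- expansion succeeds
print(workspace_srcpv.split('+')[1])  # IndexError: list index out of range

Pinning KERNEL_LOCALVERSION to a fixed string, as above, sidesteps the expansion entirely.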
I'm running snakemake on fairly large workflows. Somewhat randomly I get errors like
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/prog/Python/3.7.9-foss-2018a/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/prog/Python/3.7.9-foss-2018a/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<home>/.pip/CentOS/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 1069, in _wait_for_jobs
    status = job_status(active_job)
  File "<home>/.pip/CentOS/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 1051, in job_status
    os.remove(active_job.jobscript)
FileNotFoundError: [Errno 2] No such file or directory: '<current working dir>/.snakemake/tmp.0w7jh5bc/snakejob.<name of rule>.6868.sh'
It happens somewhat randomly, and restarting the workflow usually resolves the problem. I think this might be due to filesystem latency; however, the latency-wait flag seems to apply only to output files. Is there a way to make snakemake wait for jobscripts as well?
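One workaround in that spirit, not a snakemake option but a sketch of the retry logic one could wrap around the failing os.remove call (the function name, timeout, and interval are all assumptions):

import os
import time

def remove_with_retry(path, timeout=60, interval=5):
    # Retry removal for up to `timeout` seconds, tolerating the window in
    # which a laggy shared filesystem has not yet made the file visible.
    deadline = time.time() + timeout
    while True:
        try:
            os.remove(path)
            return
        except FileNotFoundError:
            if time.time() >= deadline:
                raise
            time.sleep(interval)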
I have Airflow up and running, but I have an issue where my task is failing in Celery.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 52, in execute_command
    subprocess.check_call(command, shell=True)
  File "/usr/local/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'airflow run airflow_tutorial_v01 print_hello 2017-06-01T15:00:00 --local -sd /usr/local/airflow/dags/hello_world.py' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 375, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 632, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 55, in execute_command
    raise AirflowException('Celery command failed')
airflow.exceptions.AirflowException: Celery command failed
It is a very basic DAG (taken from the hello world tutorial: https://github.com/apache/incubator-airflow/blob/master/airflow/example_dags/tutorial.py).
Also, I do not see any logs from my worker; I got this stack trace from the Flower web interface.
If I manually run the airflow run command mentioned in the stack trace on the worker node, it works.
How can I get more information to debug further?
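One way to surface the command's own output is to re-run it the way the executor does, but with stdout and stderr captured; a minimal sketch using only the standard library (the command string is the one from the stack trace above):

import subprocess

cmd = ("airflow run airflow_tutorial_v01 print_hello 2017-06-01T15:00:00 "
       "--local -sd /usr/local/airflow/dags/hello_world.py")
# Capture both streams so the underlying failure is visible; on Python 3.6
# use universal_newlines= (the text= keyword only arrived in 3.7).
result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, universal_newlines=True)
print("exit status:", result.returncode)
print(result.stdout)
print(result.stderr)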
The only log I get when starting `airflow worker` is
root@ip-10-0-4-85:~#
/usr/local/lib/python3.5/dist-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
""")
[2018-07-25 17:49:43,430] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/Grammar.txt
[2018-07-25 17:49:43,469] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/PatternGrammar.txt
[2018-07-25 17:49:43,594] {__init__.py:45} INFO - Using executor CeleryExecutor
Starting flask
[2018-07-25 17:49:43,665] {_internal.py:88} INFO - * Running on http://0.0.0.0:8793/ (Press CTRL+C to quit)
^C
The config I use is the default one, with PostgreSQL and Redis backends for Celery.
I see the worker online in Flower.
Thanks.
Edit: edited to add more information.
I am new to Celery. I have added a backup script to Celery using periodic_task. From the logs I see "Task accepted: main", and immediately after that I see the error below.
[2017-09-21 06:01:00,257: ERROR/MainProcess] Process 'PoolWorker-5' pid:XXXX exited with 'exitcode 2'
[2017-09-21 06:01:00,268: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: exitcode 2.',)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1224, in mark_as_worker_lost
    human_status(exitcode)),
WorkerLostError: Worker exited prematurely: exitcode 2.
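For reference, a minimal sketch of a periodic_task in the Celery 3.x style the post implies (the schedule and task body are assumptions). If anything inside the task terminates the child process itself, for example os._exit(2) or a backup script wrapper that calls exit, the parent pool only sees the child vanish and raises exactly this WorkerLostError:

from celery.task import periodic_task  # Celery 3.x-style decorator
from celery.schedules import crontab

@periodic_task(run_every=crontab(hour=6, minute=1))  # assumed schedule
def main():
    # Run the backup here. Avoid anything that exits the process directly;
    # raise an exception instead so Celery can record a normal task failure.
    pass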
Thanks in advance.
I am running a small StarCluster and using it to run an IPython Notebook. Every time I have an error in the code I'm writing in the notebook, I get the following error message appended to the end of the notebook's output:
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 872, in emit
    stream.write(fs % msg)
IOError: [Errno 32] Broken pipe
Logged from file ipkernel.py, line 427
Other than that, it seems to be running OK, but I don't know why this might be happening or how I can find out more about it.
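The traceback itself comes from the logging module: a StreamHandler's emit() is writing to a stream whose reader has gone away. If the goal is just to suppress these secondary messages, the standard library has a switch for that; a minimal sketch (note this silences handler errors globally, which can hide real problems):

import logging

# logging prints "Logged from file ..." diagnostics when a handler's emit()
# fails; this module-level flag makes handlers swallow such errors instead.
logging.raiseExceptions = False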