How to remove all due tasks from celery scheduler DatabaseScheduler - celery

My project has a lot of pending tasks (for example task.com-43) scheduled to execute every 5 seconds. I want to remove all of my pending tasks.
→ celery -A Project beat --loglevel=debug --scheduler django_celery_beat.schedulers:DatabaseScheduler
celery beat v4.2.1 (windowlicker) is starting.
__ - ... __ - _
LocalTime -> 2018-12-30 08:44:30
Configuration ->
. broker -> redis://localhost:6379//
. loader -> celery.loaders.app.AppLoader
. scheduler -> django_celery_beat.schedulers.DatabaseScheduler
. logfile -> [stderr]@%DEBUG
. maxinterval -> 5.00 seconds (5s)
[2018-12-30 08:44:30,310: DEBUG/MainProcess] Setting default socket timeout to 30
[2018-12-30 08:44:30,311: INFO/MainProcess] beat: Starting...
[2018-12-30 08:44:30,312: DEBUG/MainProcess] DatabaseScheduler: initial read
[2018-12-30 08:44:30,312: INFO/MainProcess] Writing entries...
[2018-12-30 08:44:30,312: DEBUG/MainProcess] DatabaseScheduler: Fetching database schedule
[2018-12-30 08:44:30,348: DEBUG/MainProcess] Current schedule:
[2018-12-30 08:44:30,418: INFO/MainProcess] Scheduler: Sending due task task5.com-43 (project_monitor_tasks)
[2018-12-30 08:44:30,438: DEBUG/MainProcess] beat: Synchronizing schedule...
[2018-12-30 08:44:30,438: INFO/MainProcess] Writing entries...
[2018-12-30 08:44:30,455: DEBUG/MainProcess] project_monitor_tasks sent. id->d440432f-111d-4c96-ab4f-00923f4cf7e1
[2018-12-30 08:44:30,464: DEBUG/MainProcess] beat: Waking up in 4.93 seconds.
[2018-12-30 08:44:35,413: INFO/MainProcess] Scheduler: Sending due task task.com-43 (project_monitor_tasks)
[2018-12-30 08:44:35,414: DEBUG/MainProcess] project_monitor_tasks sent. id->ff0438ce-9fb9-4ab0-aa8a-8a7636c67d90
[2018-12-30 08:44:35,424: DEBUG/MainProcess] beat: Waking up in 4.98 seconds.
[2018-12-30 08:44:40,419: INFO/MainProcess] Scheduler: Sending due task task.com-43 (project_monitor_tasks)
[2018-12-30 08:44:40,420: DEBUG/MainProcess] project_monitor_tasks sent. id->d0022780-7d5f-4e7b-965e-9fda0d607cbe
[2018-12-30 08:44:40,431: DEBUG/MainProcess] beat: Waking up in 4.98 seconds.
[2018-12-30 08:44:45,425: INFO/MainProcess] Scheduler: Sending due task task.com-43 (project_monitor_tasks)
[2018-12-30 08:44:45,427: DEBUG/MainProcess] project_monitor_tasks sent. id->9b3eb775-60d5-4daa-a019-e0dfae932380
[2018-12-30 08:44:45,439: DEBUG/MainProcess] beat: Waking up in 4.98 seconds.
....
....
I'm using Redis as the backend database for the project's tasks. I tried purging Celery and flushing Redis, but it still executes all the pending tasks.
ps auxww | grep 'celery worker' | awk '{print $2}' | xargs kill -9 ## Stopping all workers first
celery -A project purge
redis-cli FLUSHALL
service redis-server restart

One way to remove all the due tasks is to delete (or disable) them in the Periodic Tasks model used by django-celery-beat, but first stop all your workers and purge the project's tasks.
A related answer is here:
https://stackoverflow.com/a/33047721/10372434
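For example, a minimal sketch of doing that from a Django shell (it assumes django-celery-beat's PeriodicTask model; adjust to your project), run only after beat and the workers are stopped:
# python manage.py shell
from django_celery_beat.models import PeriodicTask

# Disable every periodic task so beat stops sending them as due ...
PeriodicTask.objects.all().update(enabled=False)

# ... or remove the schedule entries entirely.
PeriodicTask.objects.all().delete()
DatabaseScheduler picks up the schedule change on its next sync; anything already sitting in the broker can then be purged with celery -A Project purge as above.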

Related

Celery. Running single celery beat + multiple celery workers scale

I have a single celery beat running with:
celery -A app:celery beat --loglevel=DEBUG
and three workers running with:
celery -A app:celery worker -E --loglevel=ERROR -n n1
celery -A app:celery worker -E --loglevel=ERROR -n n2
celery -A app:celery worker -E --loglevel=ERROR -n n3
The same Redis DB is used as the message broker for all workers and beat.
All workers are started on the same machine for development purposes; in production they will be deployed to different Kubernetes pods. The main idea of using multiple workers is to distribute 50-150 tasks across different Kube pods, each running on a 4-8 core machine. We expect that no pod will take more tasks than it has cores while any other worker still has fewer tasks than available cores, so that the maximum number of tasks runs concurrently.
So I'm having trouble testing this locally.
Here the local beat triggers three tasks:
[2021-08-23 21:35:32,700: DEBUG/MainProcess] Current schedule:
<ScheduleEntry: task-5872-accrual Task5872Accrual() <crontab: 36 21 * * * (m/h/d/dM/MY)>
<ScheduleEntry: task-5872-accrual2 Task5872Accrual2() <crontab: 37 21 * * * (m/h/d/dM/MY)>
<ScheduleEntry: task-5872-accrual3 Task5872Accrual3() <crontab: 38 21 * * * (m/h/d/dM/MY)>
[2021-08-23 21:35:32,700: DEBUG/MainProcess] beat: Ticking with max interval->5.00 minutes
[2021-08-23 21:35:32,701: DEBUG/MainProcess] beat: Waking up in 27.29 seconds.
[2021-08-23 21:36:00,017: DEBUG/MainProcess] beat: Synchronizing schedule...
[2021-08-23 21:36:00,026: INFO/MainProcess] Scheduler: Sending due task task-5872-accrual (Task5872Accrual)
[2021-08-23 21:36:00,035: DEBUG/MainProcess] Task5872Accrual sent. id->96e671f8-bd07-4c36-a595-b963659bee5c
[2021-08-23 21:36:00,035: DEBUG/MainProcess] beat: Waking up in 59.95 seconds.
[2021-08-23 21:37:00,041: INFO/MainProcess] Scheduler: Sending due task task-5872-accrual2 (Task5872Accrual2)
[2021-08-23 21:37:00,043: DEBUG/MainProcess] Task5872Accrual2 sent. id->532eac4d-1d10-4117-9d7e-16b3f1ae7aee
[2021-08-23 21:37:00,043: DEBUG/MainProcess] beat: Waking up in 59.95 seconds.
[2021-08-23 21:38:00,027: INFO/MainProcess] Scheduler: Sending due task task-5872-accrual3 (Task5872Accrual3)
[2021-08-23 21:38:00,029: DEBUG/MainProcess] Task5872Accrual3 sent. id->68729b64-807d-4e13-8147-0b372ce536af
[2021-08-23 21:38:00,029: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.
I expected each worker to take a single task so the load is balanced across workers, but unfortunately this is how they are distributed:
So I'm not sure whether different workers synchronize with each other to distribute the load smoothly. If not, can I achieve that somehow? I tried searching Google, but the results are mostly about concurrency between tasks within a single worker; what should I do if I need to run more tasks concurrently than a single machine in the Kube cluster can handle?
You should do two things in order to achieve what you want:
Run workers with the -O fair option. Example: celery -A app:celery worker -E --loglevel=ERROR -n n1 -O fair
Make workers prefetch as little as possible with worker_prefetch_multiplier=1 in your config.
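As a rough sketch of the prefetch setting (the module name and broker URL below are placeholders, not the asker's actual code):
# celery_app.py
from celery import Celery

app = Celery("app", broker="redis://localhost:6379/0")

# Reserve at most one message per worker process so that idle workers on
# other pods can pick up the remaining tasks instead of one worker
# prefetching several of them.
app.conf.worker_prefetch_multiplier = 1
Combined with -O fair, a process is only handed a new task once it is actually free, which spreads a small burst of tasks across the available workers.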

Slow running Airflow 1.10.2 ETL when using ExternalTaskSensor for DAG task dependency?

I have two DAGs that I need to run with Airflow 1.10.2 + the CeleryExecutor. The first DAG (DAG1) is a long-running data load from s3 into Redshift (3+ hours). My second DAG (DAG2) performs computations on data loaded by DAG1. I want to include an ExternalTaskSensor in DAG2 so that the computations are reliably performed after the data loads. Theoretically so simple!
I can successfully get DAG2 to wait for DAG1 to complete by ensuring both DAGs are scheduled to start at the same time (schedule="0 8 * * *" for both DAGs) and DAG2 is dependent on the final task in DAG1. But I'm seeing a massive delay in our ETL on DAG1 when I introduce the sensor. At first I thought it was because my original implementation used mode="poke", which I understand locks a worker. However, even after I changed this to mode="reschedule", as described in the docs https://airflow.readthedocs.io/en/stable/_modules/airflow/sensors/base_sensor_operator.html, I still see a massive ETL delay.
I'm using the ExternalTaskSensor code below in DAG2:
wait_for_data_load = ExternalTaskSensor(
    dag=dag,
    task_id="wait_for_data_load",
    external_dag_id="dag1",
    external_task_id="dag1_final_task_id",
    mode="reschedule",
    poke_interval=1800,  # check every 30 min
    timeout=43200,       # timeout after 12 hours (catch delayed data load runs)
    soft_fail=False,     # if the task fails, we assume a failure
)
If the code were working properly, I'd expect the sensor to perform a quick check of whether DAG1 had finished and, if not, reschedule itself for 30 minutes later as defined by poke_interval, causing no delay to the DAG1 ETL. If DAG1 fails to complete after 12 hours, then DAG2 would stop poking and fail.
Instead, I'm getting frequent errors for each of the tasks in DAG1 saying (for example) Executor reports task instance <TaskInstance: dag1.data_table_temp_redshift_load 2019-05-20 08:00:00+00:00 [queued]> finished (failed) although the task says its queued. Was the task killed externally? even though the tasks are completing successfully (with some delay). Just before this error is sent, I see a line in our Sentry logs saying Executor reports dag1.data_table_temp_redshift_load execution_date=2019-05-20 08:00:00+00:00 as failed for try_number 1 though (again) I can see the task succeeded.
The logs on DAG2 are also looking a bit strange. I'm seeing repeated attempts logged at the same time intervals like the excerpt below:
--------------------------------------------------------------------------------
Starting attempt 1 of 4
--------------------------------------------------------------------------------
[2019-05-21 08:01:48,417] {{models.py:1593}} INFO - Executing <Task(ExternalTaskSensor): wait_for_data_load> on 2019-05-20T08:00:00+00:00
[2019-05-21 08:01:48,419] {{base_task_runner.py:118}} INFO - Running: ['bash', '-c', 'airflow run dag2 wait_for_data_load 2019-05-20T08:00:00+00:00 --job_id 572075 --raw -sd DAGS_FOLDER/dag2.py --cfg_path /tmp/tmp4g2_27c7']
[2019-05-21 08:02:02,543] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load [2019-05-21 08:02:02,542] {{settings.py:174}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=28219
[2019-05-21 08:02:12,000] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load [2019-05-21 08:02:11,996] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-05-21 08:02:15,840] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load [2019-05-21 08:02:15,827] {{models.py:273}} INFO - Filling up the DagBag from /usr/local/airflow/dags/dag2.py
[2019-05-21 08:02:16,746] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load [2019-05-21 08:02:16,745] {{dag2.py:40}} INFO - Waiting for the dag1_final_task_id operator to complete in the dag1 DAG
[2019-05-21 08:02:17,199] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load [2019-05-21 08:02:17,198] {{cli.py:520}} INFO - Running <TaskInstance: dag2.wait_for_data_load 2019-05-20T08:00:00+00:00 [running]> on host 11d93b0b0c2d
[2019-05-21 08:02:17,708] {{external_task_sensor.py:91}} INFO - Poking for dag1.dag1_final_task_id on 2019-05-20T08:00:00+00:00 ...
[2019-05-21 08:02:17,890] {{models.py:1784}} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2019-05-21 08:02:17,892] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load /usr/local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.2) or chardet (3.0.4) doesn't match a supported version!
[2019-05-21 08:02:17,893] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load RequestsDependencyWarning)
[2019-05-21 08:02:17,893] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load /usr/local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
[2019-05-21 08:02:17,894] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load """)
[2019-05-21 08:02:22,597] {{logging_mixin.py:95}} INFO - [2019-05-21 08:02:22,589] {{jobs.py:2527}} INFO - Task exited with return code 0
[2019-05-21 08:01:48,125] {{models.py:1359}} INFO - Dependencies all met for <TaskInstance: dag2.wait_for_data_load 2019-05-20T08:00:00+00:00 [queued]>
[2019-05-21 08:01:48,311] {{models.py:1359}} INFO - Dependencies all met for <TaskInstance: dag2.wait_for_data_load 2019-05-20T08:00:00+00:00 [queued]>
[2019-05-21 08:01:48,311] {{models.py:1571}} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 4
--------------------------------------------------------------------------------
[2019-05-21 08:01:48,417] {{models.py:1593}} INFO - Executing <Task(ExternalTaskSensor): wait_for_data_load> on 2019-05-20T08:00:00+00:00
[2019-05-21 08:01:48,419] {{base_task_runner.py:118}} INFO - Running: ['bash', '-c', 'airflow run dag2 wait_for_data_load 2019-05-20T08:00:00+00:00 --job_id 572075 --raw -sd DAGS_FOLDER/dag2.py --cfg_path /tmp/tmp4g2_27c7']
[2019-05-21 08:02:02,543] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load [2019-05-21 08:02:02,542] {{settings.py:174}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=28219
[2019-05-21 08:02:12,000] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load [2019-05-21 08:02:11,996] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-05-21 08:02:15,840] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load [2019-05-21 08:02:15,827] {{models.py:273}} INFO - Filling up the DagBag from /usr/local/airflow/dags/dag2.py
[2019-05-21 08:02:16,746] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load [2019-05-21 08:02:16,745] {{dag2.py:40}} INFO - Waiting for the dag1_final_task_id operator to complete in the dag1 DAG
[2019-05-21 08:02:17,199] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load [2019-05-21 08:02:17,198] {{cli.py:520}} INFO - Running <TaskInstance: dag2.wait_for_data_load 2019-05-20T08:00:00+00:00 [running]> on host 11d93b0b0c2d
[2019-05-21 08:02:17,708] {{external_task_sensor.py:91}} INFO - Poking for dag1.dag1_final_task_id on 2019-05-20T08:00:00+00:00 ...
[2019-05-21 08:02:17,890] {{models.py:1784}} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2019-05-21 08:02:17,892] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load /usr/local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.2) or chardet (3.0.4) doesn't match a supported version!
[2019-05-21 08:02:17,893] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load RequestsDependencyWarning)
[2019-05-21 08:02:17,893] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load /usr/local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
[2019-05-21 08:02:17,894] {{base_task_runner.py:101}} INFO - Job 572075: Subtask wait_for_data_load """)
[2019-05-21 08:02:22,597] {{logging_mixin.py:95}} INFO - [2019-05-21 08:02:22,589] {{jobs.py:2527}} INFO - Task exited with return code 0
[2019-05-21 08:33:31,875] {{models.py:1359}} INFO - Dependencies all met for <TaskInstance: dag2.wait_for_data_load 2019-05-20T08:00:00+00:00 [queued]>
[2019-05-21 08:33:31,903] {{models.py:1359}} INFO - Dependencies all met for <TaskInstance: dag2.wait_for_data_load 2019-05-20T08:00:00+00:00 [queued]>
[2019-05-21 08:33:31,903] {{models.py:1571}} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 4
--------------------------------------------------------------------------------
Though all the logs say Starting attempt 1 of 4, I do see attempt records roughly every 30 min, but with multiple copies of the same logs for each interval (10+ identical log blocks printed per 30-min interval).
From searching around I see other people are using sensors in production flows https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff, which makes me think there's a way around this or I'm implementing something wrong. But I'm also seeing open issues in the airflow project related to this issue, so perhaps there's a deeper issue in the project? I also found a related, but unanswered post here Apache Airflow 1.10.3: Executor reports task instance ??? finished (failed) although the task says its queued. Was the task killed externally?
Also, we are using the following config settings:
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32
# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16
# Are DAGs paused by default at creation
dags_are_paused_at_creation = True
# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 128
# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 16
These symptoms were actually caused by a call to Variable.set() in the body of DAG1, which DAG2 then used to retrieve DAG1's dynamically generated dag_id. The Variable.set() call was raising an error (discovered in the worker logs). As described here, the scheduler re-parses the DAG definitions with every heartbeat to keep the DAGs up-to-date, so that meant an error on every heartbeat, which caused the large ETL delay.
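A hedged sketch of the fix (the DAG name, schedule, and variable key below are illustrative, not the actual project code): move the Variable.set() call out of top-level DAG code, which the scheduler re-parses on every heartbeat, and into a task callable that only runs when DAG1 executes:
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x import path

dag = DAG("dag1", schedule_interval="0 8 * * *", start_date=datetime(2019, 5, 1))

def publish_dag_id(**context):
    # Runs only when the task executes, not on every scheduler parse.
    Variable.set("dag1_dag_id", dag.dag_id)

publish_dag_id_task = PythonOperator(
    task_id="publish_dag_id",
    python_callable=publish_dag_id,
    provide_context=True,
    dag=dag,
)
DAG2 can then read the value with Variable.get("dag1_dag_id") where it builds its ExternalTaskSensor.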

Question: Usage of django celery.backend_cleanup

There is not much documentation available on the actual usage of django celery.backend_cleanup.
Let's assume I have the following 4 tasks scheduled at different intervals.
Checking the DatabaseScheduler logs, I found that only Task1 is executing on its interval.
[2018-12-28 11:21:08,241: INFO/MainProcess] Writing entries...
[2018-12-28 11:24:08,778: INFO/MainProcess] Writing entries...
[2018-12-28 11:27:09,315: INFO/MainProcess] Writing entries...
[2018-12-28 11:28:32,948: INFO/MainProcess] Scheduler: Sending due TASK1(project_monitor_tasks)
[2018-12-28 11:30:13,215: INFO/MainProcess] Writing entries...
[2018-12-28 11:33:13,772: INFO/MainProcess] Writing entries...
[2018-12-28 11:36:14,316: INFO/MainProcess] Writing entries...
[2018-12-28 11:39:14,868: INFO/MainProcess] Writing entries...
[2018-12-28 11:42:15,397: INFO/MainProcess] Writing entries...
[2018-12-28 11:43:55,700: INFO/MainProcess] DatabaseScheduler: Schedule changed.
[2018-12-28 11:43:55,700: INFO/MainProcess] Writing entries...
[2018-12-28 11:45:15,997: INFO/MainProcess] Writing entries...
.....
....
[2018-12-28 17:16:28,613: INFO/MainProcess] Writing entries...
[2018-12-28 17:19:29,138: INFO/MainProcess] Writing entries...
[2018-12-28 17:22:29,625: INFO/MainProcess] Writing entries...
[2018-12-28 17:25:30,140: INFO/MainProcess] Writing entries...
[2018-12-28 17:28:30,657: INFO/MainProcess] Writing entries...
[2018-12-28 17:28:32,943: INFO/MainProcess] Scheduler: Sending due TASK1(project_monitor_tasks)
[2018-12-28 17:31:33,441: INFO/MainProcess] Writing entries...
[2018-12-28 17:34:34,009: INFO/MainProcess] Writing entries...
[2018-12-28 17:37:34,578: INFO/MainProcess] Writing entries...
[2018-12-28 17:40:35,130: INFO/MainProcess] Writing entries...
[2018-12-28 17:43:35,657: INFO/MainProcess] Writing entries...
[2018-12-28 17:43:50,716: INFO/MainProcess] DatabaseScheduler: Schedule changed.
[2018-12-28 17:43:50,716: INFO/MainProcess] Writing entries...
[2018-12-28 17:46:36,266: INFO/MainProcess] Writing entries...
[2018-12-28 17:49:36,809: INFO/MainProcess] Writing entries...
[2018-12-28 17:52:37,352: INFO/MainProcess] Writing entries...
Q1) Why are the other tasks, which run at different intervals such as 24, 8, and 10 hours, not executing? I'm assuming this is because the crontab of celery.backend_cleanup is set to every 4 hours, which is cleaning up the queued tasks. Should I keep a larger interval for the celery.backend_cleanup task?
Q2) Why should we keep the celery.backend_cleanup task? Does it load new tasks on every cleanup?
Q1: We can't say more without seeing the actual schedules, your Celery configuration, or logs covering the past twenty-four hours. The backend_cleanup job has no effect on the broker; its purpose is to delete expired task results from an RDBMS Celery result backend, so it has no effect on whether a task executes properly.
Q2: See above. You should keep this task if you are using an RDBMS / database result backend and want expired Celery results to be deleted from it.
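As a hedged illustration (the app name, backend URL, and expiry value are assumptions): with a database result backend, beat schedules celery.backend_cleanup automatically and it only deletes results older than result_expires; the broker queue and the periodic-task schedule are untouched.
from celery import Celery

# SQLAlchemy (database) result backend; the URL is a placeholder.
app = Celery("project", backend="db+postgresql://user:pass@localhost/celery_results")

# Results older than this are removed by the built-in celery.backend_cleanup
# periodic task (run by beat, daily by default). It does not touch queued
# or scheduled tasks.
app.conf.result_expires = 60 * 60 * 24  # keep task results for 24 hours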

celery beat instantly stopping with resource error

The celery beat logs look like this; after the last line it simply stopped and never continued or recovered.
[2018-08-20 11:20:59,002: INFO/MainProcess] Scheduler: Sending due task check result delays every 10sec (notify_delay)
[2018-08-20 11:21:00,000: INFO/MainProcess] Scheduler: Sending due task load abnormal schedules (load_abnormal_schedules)
[2018-08-20 11:21:00,004: INFO/MainProcess] Scheduler: Sending due task check close schedule every 5sec (close_schedule)
[2018-08-20 11:21:05,000: INFO/MainProcess] Scheduler: Sending due task check close schedule every 5sec (close_schedule)
[2018-08-20 11:21:10,000: INFO/MainProcess] Scheduler: Sending due task check close schedule every 5sec (close_schedule)
[2018-08-20 11:21:14,002: INFO/MainProcess] Scheduler: Sending due task check result delays every 10sec (notify_delay)
[2018-08-20 11:21:15,000: INFO/MainProcess] Scheduler: Sending due task load abnormal schedules (load_abnormal_schedules)
[2018-08-20 11:21:15,003: INFO/MainProcess] Scheduler: Sending due task check close schedule every 5sec (close_schedule)
[2018-08-20 11:21:20,000: INFO/MainProcess] Scheduler: Sending due task check close schedule every 5sec (close_schedule)
[2018-08-20 11:21:25,000: INFO/MainProcess] Scheduler: Sending due task check close schedule every 5sec (close_schedule)
[2018-08-20 11:21:29,003: INFO/MainProcess] Scheduler: Sending due task check result delays every 10sec (notify_delay)
It runs inside a Docker container. When I checked via top, it showed a high CPU percentage:
120549 root 20 0 356016 150144 16388 S 23.4 1.0 3:36.33 celery
Then, when I ssh into the container and run the celery beat command, the error below is returned:
root@4a298cc9c6e2:/usr/src/app# celery -A ghost beat -l info --pidfile=
celery beat v4.2.0 (windowlicker) is starting.
__ - ... __ - _
LocalTime -> 2018-08-20 11:32:51
Configuration ->
. broker -> amqp://ghost:**@ghost-rabbitmq:5672/ghost
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@%INFO
. maxinterval -> 5.00 minutes (300s)
[2018-08-20 11:32:51,526: INFO/MainProcess] beat: Starting...
[2018-08-20 11:32:51,535: ERROR/MainProcess] Removing corrupted schedule file 'celerybeat-schedule': error(11, 'Resource temporarily unavailable')
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
return obj.__dict__[self.__name__]
KeyError: 'scheduler'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/celery/beat.py", line 476, in setup_schedule
self._store = self._open_schedule()
File "/usr/local/lib/python3.6/site-packages/celery/beat.py", line 466, in _open_schedule
return self.persistence.open(self.schedule_filename, writeback=True)
File "/usr/local/lib/python3.6/shelve.py", line 243, in open
return DbfilenameShelf(filename, flag, protocol, writeback)
File "/usr/local/lib/python3.6/shelve.py", line 227, in __init__
Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
File "/usr/local/lib/python3.6/dbm/__init__.py", line 94, in open
return mod.open(file, flag, mode)
_gdbm.error: [Errno 11] Resource temporarily unavailable
Take note that I'm only using pure celery and not django-celery-beat
Every time you bring your Docker container up, celery beat tries to create a celerybeat.pid file, and if one already exists it raises an error. So you should add commands to your Dockerfile that delete the existing pid file via an entrypoint script, like this:
COPY entrypoint.sh /code/entrypoint.sh
RUN chmod +x /code/entrypoint.sh
ENTRYPOINT ["/code/entrypoint.sh"]
And you should create an entrypoint.sh file like the one below:
#!/bin/sh
rm -rf /code/badpanty/*.pid
exec "$@"
I hope it's helpful.
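Since the traceback above is about the celerybeat-schedule shelve file rather than the pid file, it may also help (the paths below are assumptions) to remove the stale schedule file, or to point beat at a fresh one with -s/--schedule:
rm -f /usr/src/app/celerybeat-schedule
celery -A ghost beat -l info --pidfile= --schedule=/tmp/celerybeat-schedule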

Airflow scheduler keep on Failing jobs without heartbeat

I'm new to Airflow, and I tried to manually trigger a job through the UI. When I did that, the scheduler kept logging that it is Failing jobs without heartbeat, as follows:
[2018-05-28 12:13:48,248] {jobs.py:1662} INFO - Heartbeating the executor
[2018-05-28 12:13:48,250] {jobs.py:1672} INFO - Heartbeating the scheduler
[2018-05-28 12:13:48,259] {jobs.py:368} INFO - Started process (PID=58141) to work on /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:48,264] {jobs.py:1742} INFO - Processing file /Users/gkumar6/airflow/dags/tutorial.py for tasks to queue
[2018-05-28 12:13:48,265] {models.py:189} INFO - Filling up the DagBag from /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:48,275] {jobs.py:1754} INFO - DAG(s) ['tutorial'] retrieved from /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:48,298] {models.py:341} INFO - Finding 'running' jobs without a recent heartbeat
[2018-05-28 12:13:48,299] {models.py:345} INFO - Failing jobs without heartbeat after 2018-05-28 06:38:48.299278
[2018-05-28 12:13:48,304] {jobs.py:375} INFO - Processing /Users/gkumar6/airflow/dags/tutorial.py took 0.045 seconds
[2018-05-28 12:13:49,266] {jobs.py:1627} INFO - Heartbeating the process manager
[2018-05-28 12:13:49,267] {dag_processing.py:468} INFO - Processor for /Users/gkumar6/airflow/dags/tutorial.py finished
[2018-05-28 12:13:49,271] {dag_processing.py:537} INFO - Started a process (PID: 58149) to generate tasks for /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:49,272] {jobs.py:1662} INFO - Heartbeating the executor
[2018-05-28 12:13:49,283] {jobs.py:368} INFO - Started process (PID=58149) to work on /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:49,288] {jobs.py:1742} INFO - Processing file /Users/gkumar6/airflow/dags/tutorial.py for tasks to queue
[2018-05-28 12:13:49,289] {models.py:189} INFO - Filling up the DagBag from /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:49,300] {jobs.py:1754} INFO - DAG(s) ['tutorial'] retrieved from /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:49,326] {models.py:341} INFO - Finding 'running' jobs without a recent heartbeat
[2018-05-28 12:13:49,327] {models.py:345} INFO - Failing jobs without heartbeat after 2018-05-28 06:38:49.327218
[2018-05-28 12:13:49,332] {jobs.py:375} INFO - Processing /Users/gkumar6/airflow/dags/tutorial.py took 0.049 seconds
[2018-05-28 12:13:50,279] {jobs.py:1627} INFO - Heartbeating the process manager
[2018-05-28 12:13:50,280] {dag_processing.py:468} INFO - Processor for /Users/gkumar6/airflow/dags/tutorial.py finished
[2018-05-28 12:13:50,283] {dag_processing.py:537} INFO - Started a process (PID: 58150) to generate tasks for /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:50,285] {jobs.py:1662} INFO - Heartbeating the executor
[2018-05-28 12:13:50,296] {jobs.py:368} INFO - Started process (PID=58150) to work on /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:50,301] {jobs.py:1742} INFO - Processing file /Users/gkumar6/airflow/dags/tutorial.py for tasks to queue
[2018-05-28 12:13:50,302] {models.py:189} INFO - Filling up the DagBag from /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:50,312] {jobs.py:1754} INFO - DAG(s) ['tutorial'] retrieved from /Users/gkumar6/airflow/dags/tutorial.py
[2018-05-28 12:13:50,338] {models.py:341} INFO - Finding 'running' jobs without a recent heartbeat
[2018-05-28 12:13:50,339] {models.py:345} INFO - Failing jobs without heartbeat after 2018-05-28 06:38:50.339147
[2018-05-28 12:13:50,344] {jobs.py:375} INFO - Processing /Users/gkumar6/airflow/dags/tutorial.py took 0.048 seconds
And the status of the job in the UI is stuck at running. Is there something I need to configure to solve this issue?
It seems that it's not a "Failing jobs" problem but a logging problem. Here's what I found when I tried to fix this problem.
Does this message indicate that there's something wrong that I should be concerned about?
No.
"Finding 'running' jobs" and "Failing jobs..." are INFO-level logs generated by the find_zombies function of the heartbeat utility, so these logs will be produced every heartbeat interval even if you don't have any failing jobs running.
How do I turn it off?
The logging_level option in airflow.cfg does not control the scheduler logging.
There's a hard-coded value in airflow/settings.py:
LOGGING_LEVEL = logging.INFO
You could change this to:
LOGGING_LEVEL = logging.WARN
Then restart the scheduler and the problem will be gone.
I think that for point 2, if you just change logging_level = INFO to WARN in airflow.cfg, you won't get the INFO logs; you don't need to modify the settings.py file.
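For reference, a minimal sketch of that airflow.cfg change (the section placement is assumed for Airflow 1.10.x):
[core]
# Suppress the INFO-level find_zombies messages ("Finding 'running' jobs...",
# "Failing jobs without heartbeat...") from the scheduler log.
logging_level = WARN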