Celery active tasks persistence

If Celery crashes while executing a task, that task is lost after Celery restarts. Tasks that were still in the queue at the moment of the crash are restored fine by RabbitMQ. But how can I make active tasks persistent?

Celery is configured by default with task_acks_late=False. [1] This means that a task is acked as soon as a worker receives it from the queue, so if the worker then crashes, the queue has no way of knowing the task was never finished.
Set task_acks_late to True and the task will be acked only after it has been processed; if the worker dies mid-task, the unacknowledged message is redelivered and the task runs again (a minimal configuration sketch follows the links below). [2] But be careful: your tasks must be idempotent. [3]
[1] http://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-task_acks_late
[2] http://docs.celeryproject.org/en/latest/faq.html#faq-acks-late-vs-retry
[3] https://en.wikipedia.org/wiki/Idempotence
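
For illustration, a minimal sketch of enabling late acknowledgement; the app name, broker URL, and example task are placeholders, not part of the question:

from celery import Celery

app = Celery("tasks", broker="amqp://guest@localhost//")  # placeholder broker URL

# Ack the message only after the task has finished, so a task that was running
# when the worker crashed is redelivered once a worker is back.
app.conf.task_acks_late = True

@app.task
def resize_image(path: str) -> None:
    ...  # must be idempotent: it may run more than once after a redelivery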

How to safely restart Airflow and kill a long-running task?

I have Airflow running in Kubernetes using the CeleryExecutor. Airflow submits and monitors Spark jobs using the DatabricksOperator.
My streaming Spark jobs have a very long runtime (they run forever unless they fail or are cancelled). When the Airflow worker pods are killed while a streaming job is running, the following happens:
Associated task becomes a zombie (running state, but no process with heartbeat)
Task is marked as failed when Airflow reaps zombies
Spark streaming job continues to run
How can I force the worker to kill my Spark job before it shuts down?
I've tried killing the Celery worker with a TERM signal, but apparently that causes Celery to stop accepting new tasks and wait for current tasks to finish (docs).
You need to be clearer about the issue. If you are saying that the Spark cluster finishes the jobs as expected and the on_kill function is not called, that's expected behavior. As per the docs, the on_kill function is for cleaning up after the task gets killed.
def on_kill(self) -> None:
"""
Override this method to cleanup subprocesses when a task instance
gets killed. Any use of the threading, subprocess or multiprocessing
module within an operator needs to be cleaned up or it will leave
ghost processes behind.
"""
In your case, when you manually kill the job, it is doing what it has to do.
Now, if you want a clean-up even after successful completion of the job, override the post_execute function.
As per the docs, post_execute is:
def post_execute(self, context: Any, result: Any = None):
"""
This hook is triggered right after self.execute() is called.
It is passed the execution context and any results returned by the
operator.
"""

Celery Beat runs duplicate tasks

I have one Celery beat task that runs other scraping tasks.
When those tasks are not processed, the queue starts to grow.
I know Celery uses a backend DB, but it only stores: id, task_id, status, result, date_done, traceback.
My idea is to switch from Celery beat to having the tasks reschedule themselves, but some tasks are unconnected or can get lost, so Celery beat is useful in those cases.
A second idea is to add my own logging, e.g. a table where I save the task id and task context, so I can find out whether a task already exists.
Maybe you have a better approach? Thanks
Celery tasks can be given an expiry with the expires argument, so tasks that are not picked up in time are discarded instead of piling up:
http://docs.celeryproject.org/en/latest/userguide/calling.html#expiration
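
For illustration, a sketch of passing expires both for an ad-hoc call and through a beat schedule entry; the task name, URL, and timings are made up:

from celery import Celery

app = Celery("scraper", broker="amqp://guest@localhost//")  # placeholder broker URL

@app.task(name="scraper.scrape_site")
def scrape_site(url: str) -> None:
    ...  # the actual scraping work

# Ad-hoc call: discard the message if no worker starts it within 10 minutes.
scrape_site.apply_async(args=["https://example.com"], expires=600)

# Beat entry: the "options" dict is passed to apply_async, so a scheduled run
# that is still sitting in the queue when the next one is due simply expires.
app.conf.beat_schedule = {
    "scrape-every-10-minutes": {
        "task": "scraper.scrape_site",
        "args": ("https://example.com",),
        "schedule": 600.0,
        "options": {"expires": 600},
    },
}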

What if I schedule tasks for Celery to perform every minute and it is not able to complete them in time?

If I schedule a task for every minute and it cannot be completed in that time (one minute), will the tasks keep waiting in the queue and pile up? If that happens, then after a few hours the queue will be overloaded. Is there any solution for this kind of problem?
I am using a beat and worker combination for this. It works fine when there are few records to process, but for a large database I think this could cause a problem.
Each task is assigned to a queue (RabbitMQ, for example).
Workers are queue consumers; more workers (or a worker with higher concurrency) means more tasks can be handled in parallel.
Your periodic task produces messages of the same type (I guess) and your Celery router routes them to the same queue.
Just set your workers to consume messages from that queue and that's all.
celery worker -A celeryapp:app -l info -Q default -c 4 -n default_worker#%h -Ofair
In the example above I used -c 4 for a concurrency of four (equivalent to 4 consumers/workers). You can also start more workers and let them consume from the same queue with -Q <queue_name> (in my example it's the default queue).
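For illustration, a sketch of how such a periodic task might be declared and routed to that default queue; the module, task name, and one-minute schedule are assumptions:

from celery import Celery

app = Celery("celeryapp", broker="amqp://guest@localhost//")  # placeholder broker URL

@app.task(name="celeryapp.process_records")
def process_records() -> None:
    ...  # work through a batch of records

# Beat enqueues one message per minute...
app.conf.beat_schedule = {
    "process-records-every-minute": {
        "task": "celeryapp.process_records",
        "schedule": 60.0,
    },
}

# ...and the router sends those messages to the "default" queue that the
# worker started above (-Q default) consumes from.
app.conf.task_routes = {"celeryapp.process_records": {"queue": "default"}}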
EDIT:
When using Celery (the worker code) you instantiate a Celery object. In the Celery constructor you set your broker and backend (Celery uses them as parts of the system).
For more info: http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#application
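A minimal sketch of that initialization, with placeholder broker and backend URLs:

from celery import Celery

# The broker carries the task messages; the result backend stores task state.
# Both URLs are placeholders for your own RabbitMQ / Redis instances.
app = Celery(
    "celeryapp",
    broker="amqp://guest@localhost//",
    backend="redis://localhost:6379/0",
)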

Celery beat fails silently

I'm having issues with a celery beat worker not sending out tasks to Celery. Celery runs on three servers, with a RabbitMQ cluster behind HAProxy as the broker.
Celery beat is used to schedule a task every day at 9 AM. When I start the worker, usually the first task succeeds, but after that it seems like the following tasks are never sent to RabbitMQ. In the celery beat log file (celery beat is run with the -l debug option), I see messages such as: Scheduler: Sending due task my-task (tasks.myTask), but no sign of the task being received by any Celery worker.
I also tried tracing messages in RabbitMQ via the rabbitmq_tracing plugin, which only confirmed that the task never reached RabbitMQ.
Any idea what could be happening? Thanks!

Does Quartz scheduler shutdown(true) wait for all threads started from running Jobs to stop?

If I have a job and from that job I create some threads, what happens when I call scheduler.shutdown(true)?
Will the scheduler wait for all of my threads to finish or not?
Quartz 1.8.1 API docs:
Halts the Scheduler's firing of Triggers, and cleans up all resources associated with the Scheduler.
Parameters:
waitForJobsToComplete - if true the scheduler will not allow this method to return until all currently executing jobs have completed.
Quartz neither knows nor cares about any threads spawned by your job; it will simply wait for the job to complete. If your job spawns new threads and then exits, then as far as Quartz is concerned, it's finished.
If your job needs to wait for its spawned threads to complete, then you need to use something like an ExecutorService (see the javadoc for java.util.concurrent), which will allow the job thread to wait for its spawned threads to complete. If you're using raw Java threads, use Thread.join().