What does celery daemonization mean? - celery

Can I know what does celery daemonization mean? Also, I would like to start running both celery worker and celery beat using single command. Anyway to do it (one way I can think of is using supervisor module for worker and beat and then writing the starting scripts for those in a separate .sh file and running that script..any other way?)
In other terms, I can start the worker and beat process as background process manually even..right? So, daemonization in celery just runs the processes as background processes or is there anything else?

Related

Is it mandatory to run celery as daemon in production

I have seen celery documentation that its advisable to run celery as daemon process. In my case each celery worker is a docker container whose sole purpose is to execute celery tasks. In that scnario also, is it recommended to execute as daemon process?
No, if Celery worker runs inside a container there is no need to run it as daemon.

Django celery delete specific tasks

we know we have some badly formed tasks in the celery queue that are crashing our workers, is there an easy way to manually delete them?
We don't want to flush the whole thing as there might be some important emails to be sent...
Use celery flower to manage and monitor the celery task. Refer to https://flower.readthedocs.io/en/latest/

Why does Celery discourage worker & beat together?

From the celery help function:
> celery worker -h
...
Embedded Beat Options:
-B, --beat Also run the celery beat periodic task scheduler. Please note that there must only be
one instance of this service. .. note:: -B is meant to be used for development
purposes. For production environment, you need to start celery beat separately.
This also appears in the docs.
You can also embed beat inside the worker by enabling the workers -B
option, this is convenient if you’ll never run more than one worker
node, but it’s not commonly used and for that reason isn’t recommended
for production use:
celery -A proj worker -B
But it's not actually explained why it's "bad" to use this in production. Would love some insight.
The --beat option will start a beat scheduler along with the worker.
But you only need one beat scheduler。
In the production environment, you usually have more than one worker running. Using --beat option will be a disaster.
For example: you have a event scheduled at 12:am each day.
If you started two beat process, the event will run twice at 12:am each day.
If you’ll never run more than one worker node, --beat option if just fine.

Airflow: what do `airflow webserver`, `airflow scheduler` and `airflow worker` exactly do?

I've been working with Airflow for a while now, which was set up by a colleague. Lately I run into several errors, which require me to more in dept know how to fix certain things within Airflow.
I do understand what the 3 processes are, I just don't understand the underlying things that happen when I run them. What exactly happens when I run one of the commands? Can I somewhere see afterwards that they are running? And if I run one of these commands, does this overwrite older webservers/schedulers/workers or add a new one?
Moreover, if I for example run airflow webserver, the screen shows some of the things that are happening. Can I simply get out of this by pressing CTRL + C? Because when I do this, it says things like Worker exiting and Shutting down: Master. Does this mean I'm shutting everything down? How else should I get out of the webserver screen then?
Each process does what they are built to do while they are running (webserver provides a UI, scheduler determines when things need to be run, and workers actually run the tasks).
I think your confusion is that you may be seeing them as commands that tell some sort of "Airflow service" to do something, but they are each standalone commands that start the processes to do stuff. ie. Starting from nothing, you run airflow scheduler: now you have a scheduler running. Run airflow webserver: now you have a webserver running. When you run airflow webserver, it is starting a python flask app. While that process is running, the webserver is running, if you kill command, is goes down.
All three have to be running for airflow as a whole to work (assuming you are using an executor that needs workers). You should only ever had one scheduler running, but if you were to run two processes of airflow webserver (ignoring port conflicts, you would then have two separate http servers running using the same metadata database. Workers are a little different in that you may want multiple worker processes running so you can execute more tasks concurrently. So if you create multiple airflow worker processes, you'll end up with multiple processes taking jobs from the queue, executing them, and updating the task instance with the status of the task.
When you run any of these commands you'll see the stdout and stderr output in console. If you are running them as a daemon or background process, you can check what processes are running on the server.
If you ctrl+c you are sending a signal to kill the process. Ideally for a production airflow cluster, you should have some supervisor monitoring the processes and ensuring that they are always running. Locally you can either run the commands in the foreground of separate shells, minimize them and just keep them running when you need them. Or run them in as a background daemon with the -D argument. ie airflow webserver -D.

Should Restart celery beat after restart celery workers?

i have two systemd service
one handles my celery workers(10 queue for different tasks) and one handles celery beat
after deploying new code i restart celery worker service to get new tasks and update celery jobs
Should i restart celery beat with celery worker service too?
or it gets new tasks automatically ?
It depends on what type of scheduler you're using.
If it's default PersistentScheduler then yes, you need to restart beat daemon to allow it to pick up new configuration from the beat_schedule setting.
But if you're using something like django-celery-beat which allows managing periodic tasks at runtime then you don't have to restart celery beat.