Run celery periodic tasks on one machine only - celery

I am working on a Django project where I am using Celery. I have two big modules in the project, named app1 and app2. I have created two Celery apps for them, which are running on two separate machines. app1 and app2 contain different tasks which I want to run on different machines, and that is working fine. But my problem is that I have some periodic tasks. I have defined a queue named periodic_tasks for them. I want to run these periodic tasks on a separate third machine; in other words, on the third machine I want to run only the periodic tasks, and these periodic tasks shouldn't be executed on the other two machines. Is this possible using Celery?

On your third machine, start Celery with the -Q or --queues option set to periodic_tasks. On the app1 and app2 machines, start Celery without the periodic_tasks queue. You can read more about queue handling here: http://docs.celeryproject.org/en/latest/reference/celery.bin.worker.html#cmdoption-celery-worker-Q
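If you also want the schedule itself to pin those tasks to that queue, here is a minimal sketch (assuming Celery 4+ setting names; the app name, task path, schedule and broker URL are placeholders, not taken from the question):

from celery import Celery
from celery.schedules import crontab

app = Celery("app1", broker="redis://localhost:6379/0")  # placeholder broker URL

app.conf.beat_schedule = {
    "nightly-cleanup": {
        "task": "app1.tasks.cleanup",             # hypothetical periodic task
        "schedule": crontab(hour=3, minute=0),
        "options": {"queue": "periodic_tasks"},   # route this entry to the dedicated queue
    },
}

# Third machine (runs beat and consumes only the periodic queue):
#   celery -A app1 worker -B -Q periodic_tasks
# app1 / app2 machines (do not list periodic_tasks):
#   celery -A app1 worker -Q app1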

Related

How to make celery worker find the tasks to include?

I have a FastAPI app and I use Celery for some async tasks. I also use Docker, so FastAPI runs in one container and Celery in another. Now I am breaking the workers into different queues, and they will run in different containers. Right now I am using almost the same image for FastAPI and Celery, but for this new worker I would end up with an image far bigger than it needs to be, since it would contain code and packages that the worker doesn't need. To get around that I now have two different Dockerfiles, one for each worker, but in both of them I have the exact same file for setting up the Celery app.
This is the Celery app I was setting up:
from celery import Celery

celery_app = Celery(
    broker=config.celery_settings.broker_url,
    backend=config.celery_settings.result_backend,
    include=[
        "src.iam.service_layer.tasks",
        "src.receipt_tracking.service_layer.tasks",
        "src.cfe_scraper.tasks",
    ],
)
And the idea is to split off src.cfe_scraper.tasks, so in that Docker image I will not have src.iam.service_layer.tasks or src.receipt_tracking.service_layer.tasks. But when I try to build the image I get an error saying that those paths don't exist, which is correct for that case; yet if I simply delete the include argument, the worker won't have any tasks registered. Is there an easy way to solve that without having to have two modules setting up different Celery apps?
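One possible approach, not from the original thread, is to drive the include list from an environment variable so the same setup module works in every image. CELERY_TASK_MODULES below is a made-up variable name, and config refers to the question's own settings object:

import os

from celery import Celery

# Each image sets e.g. CELERY_TASK_MODULES="src.cfe_scraper.tasks" (hypothetical variable),
# so only the task modules actually present in that image get imported.
task_modules = os.environ.get(
    "CELERY_TASK_MODULES",
    "src.iam.service_layer.tasks,src.receipt_tracking.service_layer.tasks,src.cfe_scraper.tasks",
).split(",")

celery_app = Celery(
    broker=config.celery_settings.broker_url,       # config as in the question's project
    backend=config.celery_settings.result_backend,
    include=task_modules,
)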

Airflow: what do `airflow webserver`, `airflow scheduler` and `airflow worker` exactly do?

I've been working with Airflow for a while now, which was set up by a colleague. Lately I have run into several errors, which require me to understand in more depth how to fix certain things within Airflow.
I do understand what the 3 processes are, I just don't understand the underlying things that happen when I run them. What exactly happens when I run one of the commands? Can I somewhere see afterwards that they are running? And if I run one of these commands, does this overwrite older webservers/schedulers/workers or add a new one?
Moreover, if I for example run airflow webserver, the screen shows some of the things that are happening. Can I simply get out of this by pressing CTRL + C? Because when I do this, it says things like Worker exiting and Shutting down: Master. Does this mean I'm shutting everything down? How else should I get out of the webserver screen then?
Each process does what it is built to do while it is running (the webserver provides a UI, the scheduler determines when things need to be run, and the workers actually run the tasks).
I think your confusion is that you may be seeing them as commands that tell some sort of "Airflow service" to do something, but they are each standalone commands that start the processes themselves. I.e. starting from nothing, you run airflow scheduler: now you have a scheduler running. Run airflow webserver: now you have a webserver running. When you run airflow webserver, it starts a Python Flask app. While that process is running, the webserver is running; if you kill the process, it goes down.
All three have to be running for Airflow as a whole to work (assuming you are using an executor that needs workers). You should only ever have one scheduler running, but if you were to run two airflow webserver processes (ignoring port conflicts), you would then have two separate HTTP servers running against the same metadata database. Workers are a little different, in that you may want multiple worker processes running so you can execute more tasks concurrently. So if you create multiple airflow worker processes, you'll end up with multiple processes taking jobs from the queue, executing them, and updating the task instances with the status of each task.
When you run any of these commands you'll see the stdout and stderr output in the console. If you are running them as a daemon or background process, you can check what processes are running on the server.
If you press CTRL + C you are sending a signal to kill the process. Ideally, for a production Airflow cluster, you should have a supervisor monitoring the processes and ensuring that they are always running. Locally you can either run the commands in the foreground of separate shells, minimize them and just keep them running when you need them, or run them as background daemons with the -D argument, e.g. airflow webserver -D.

Gracefully update running celery pod in Kubernetes

I have a Kubernetes cluster running Django, Celery, RabbitMQ and Celery Beat. I have several periodic tasks spaced out throughout the day (so as to keep server load down). There are only a few hours when no tasks are running, and I want to limit my rolling updates to those times, without having to track it manually. So I'm looking for a solution that will allow me to fire off a script or task of some sort that will monitor the Celery server and trigger a rolling update once there's a window in which no tasks are actively running. There are two possible ways I thought of doing this, but I'm not sure which is best, nor how to implement either one.
Run a script (bash or otherwise) that checks on the Celery server every few minutes, and initiates the rolling update if the server is idle (a sketch of this approach is given below)
Increment the Celery app name before each update (in the Beat run command, the Celery run command, and in the celery.py config file), create a new Celery pod, rolling-update the Beat pod, and then delete the old Celery pod 12 hours later (a reasonable time span for all running tasks to finish)
Any thoughts would be greatly appreciated.
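A minimal sketch of the first approach, assuming the Celery app object is importable and kubectl is available where the script runs; the import path, deployment name and poll interval are placeholders:

import subprocess
import time

from myproject.celery import celery_app  # hypothetical import path for the Celery app

def workers_are_idle() -> bool:
    """Return True when no worker reports active or reserved tasks."""
    inspector = celery_app.control.inspect(timeout=5)
    active = inspector.active() or {}
    reserved = inspector.reserved() or {}
    return not any(active.values()) and not any(reserved.values())

while True:
    if workers_are_idle():
        # Placeholder deployment name; restarting it triggers a normal rolling update.
        subprocess.run(
            ["kubectl", "rollout", "restart", "deployment/celery-worker"],
            check=True,
        )
        break
    time.sleep(300)  # check again in five minutes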

Can I tie Celery workers to a particular instance given a shared database?

I have a number of machines each with a Django instance, sharing a single Postgres database.
I want to run Celery, preferably using the Django broker and the Postgres database for simplicity. I do not have a high volume of tasks to run, so there is no need to use a different broker for that reason.
I want to run celery tasks which operate on local file storage. This means that I want the celery worker only to run tasks which are on the same machine that triggered the event.
Is this possible with the current setup? If not, how do I do it? A local Redis instance for each machine?
I worked out how to make this work. No need for fancy routing or brokers.
I run each celeryd instance with a special queue named after the host. This can be done automatically, like:
./manage.py celeryd -Q celery,`hostname`
I then add a setting in settings.py that stores the hostname:
import socket
CELERY_HOSTNAME = socket.gethostname()
In each Django instance this will have a different value.
I can then specify this queue when I asynchronously call my task:
my_task.apply_async(args=[one, two], queue=settings.CELERY_HOSTNAME)
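Putting it together, a minimal sketch (the task, file path and decorator style are illustrative, not from the original answer) of dispatching a local-file task to the queue of the machine that holds the file:

from celery import shared_task
from django.conf import settings

@shared_task
def process_local_file(path):
    # Runs only on a worker consuming this host's queue, so the path exists locally.
    with open(path) as handle:
        return len(handle.read())

# At the call site, route to the queue named after this machine:
process_local_file.apply_async(
    args=["/var/data/uploads/report.csv"],   # hypothetical local path
    queue=settings.CELERY_HOSTNAME,          # equals socket.gethostname() on this host
)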

Using celeryd as a daemon with multiple django apps?

I'm just starting to use django-celery and I'd like to set up celeryd running as a daemon. The instructions, however, appear to suggest that it can be configured for only one site/project at a time. Can celeryd handle more than one project, or only one? And, if that is the case, is there a clean way to set up celeryd to be started automatically for each configuration, without requiring me to create a separate init script for each one?
Like all interesting questions, the answer is it depends. :)
It is definitely possible to come up with a scenario in which celeryd can be used by two independent sites. If multiple sites are submitting tasks to the same exchange, and the tasks do not require access to any specific database -- say, they operate on email addresses, or credit card numbers, or something other than a database record -- then one celeryd may be sufficient. Just make sure that the task code is in a shared module that is loaded by all sites and the celery server.
Usually, though, you'll find that celery needs access to the database -- either it loads objects based on the ID that was passed as a task parameter, or it has to write some changes to the database, or, most often, both. And multiple sites / projects usually don't share a database, even if they share the same apps, so you'll need to keep the task queues separate.
In that case, what will usually happen is that you set up a single message broker (RabbitMQ, for example) with multiple exchanges. Each exchange receives messages from a single site. Then you run one or more celeryd processes somewhere for each exchange (in the celery config settings, you have to specify the exchange. I don't believe celeryd can listen to multiple exchanges). Each celeryd server knows its exchange, the apps it should load, and the database that it should connect to.
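As a minimal sketch of that layout (queue names, broker URL and the second site's values are invented for illustration), each site's settings.py points its default queue at its own exchange on the shared broker:

# settings.py for site1 (hypothetical names); site2 gets its own copy using "site2.tasks"
BROKER_URL = "amqp://guest:guest@rabbit-host:5672//"   # shared RabbitMQ broker

CELERY_DEFAULT_QUEUE = "site1.tasks"
CELERY_DEFAULT_EXCHANGE = "site1.tasks"
CELERY_DEFAULT_ROUTING_KEY = "site1.tasks"

# Each site's worker then only consumes its own queue:
#   ./manage.py celeryd -Q site1.tasks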
To manage these, I would suggest looking into cyme -- it's by @asksol, and manages multiple celeryd instances, on multiple servers if necessary. I haven't tried it, but it looks like it should handle different configurations for different instances.
I have not tried this myself, but with Celery 3.1.x, which does not need django-celery, the documentation says you can instantiate a Celery app like this:
from celery import Celery
from django.conf import settings

app1 = Celery('app1')
app1.config_from_object('django.conf:settings')
app1.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

@app1.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
But you can use celery multi for launching several workers, each with its own configuration; you can see examples here. So you can launch several workers with different --app appX parameters so that each uses different tasks and settings:
# 3 workers: Two with 3 processes, and one with 10 processes.
$ celery multi start 3 -c 3 -c:1 10
celery worker -n celery1@myhost -c 10 --config celery1.py --app app1
celery worker -n celery2@myhost -c 3 --config celery2.py --app app2
celery worker -n celery3@myhost -c 3 --config celery3.py --app app3