Why does Celery discourage worker & beat together?

From the celery help output:
> celery worker -h
...
Embedded Beat Options:
-B, --beat Also run the celery beat periodic task scheduler. Please note that there must only be
one instance of this service. .. note:: -B is meant to be used for development
purposes. For production environment, you need to start celery beat separately.
This also appears in the docs.
You can also embed beat inside the worker by enabling the workers -B
option, this is convenient if you’ll never run more than one worker
node, but it’s not commonly used and for that reason isn’t recommended
for production use:
celery -A proj worker -B
But it's not actually explained why it's "bad" to use this in production. Would love some insight.

The --beat option will start a beat scheduler along with the worker.
But you only need one beat scheduler.
In a production environment you usually have more than one worker running, so using the --beat option on each of them would be a disaster.
For example: you have an event scheduled at 12 AM each day.
If you start two beat processes, the event will run twice at 12 AM each day.
If you'll never run more than one worker node, the --beat option is just fine.
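To make the duplication concrete, here is a minimal sketch (the app name, broker URL, task name, and schedule entry are all hypothetical). Every beat process that loads this schedule enqueues the task independently, so two workers started with -B produce two runs at midnight:
from celery import Celery
from celery.schedules import crontab

app = Celery('proj', broker='redis://localhost:6379/0')  # hypothetical broker URL

app.conf.beat_schedule = {
    'nightly-event': {                       # hypothetical schedule entry
        'task': 'proj.tasks.nightly_event',  # hypothetical task name
        'schedule': crontab(minute=0, hour=0),  # 12 AM each day
    },
}
Each embedded beat reads this schedule on its own and sends its own message to the broker, which is why only one beat instance may ever run.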


How to get process ID of celery concurrency in task

From some answers on SO I found that the best practice for a higher processing rate in celery is to use its concurrency feature.
AFAIK celery uses multiprocessing for concurrency.
One of my tasks runs a docker image with the -p option.
So if multiple processes run this task, they will conflict when choosing the host port.
So how do I know whether the current process is the 1st/2nd/3rd/4th process?
What I want to achieve is (e.g.):
1st process will use 127.0.0.1:8001
2nd process will use 127.0.0.1:8002
etc etc
Sincerely
-bino-
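One common approach (a sketch, assuming the default prefork pool; the broker URL, base port, and docker command are placeholders) is billiard's current_process().index, which gives each pool child a stable 0-based index:
import subprocess

from billiard import current_process
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')  # hypothetical broker URL

BASE_PORT = 8001  # hypothetical base port

@app.task
def run_container(image):
    # Under the prefork pool each child process carries a stable 0-based
    # index (only set inside pool children, so use it in a task body).
    idx = current_process().index
    port = BASE_PORT + idx  # 1st process -> 8001, 2nd -> 8002, ...
    # Hypothetical docker invocation; substitute your real image and ports.
    subprocess.run(
        ['docker', 'run', '-p', f'127.0.0.1:{port}:8000', image],
        check=True,
    )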

Airflow Celery Worker celery-hostname

On Airflow 2.1.3, looking at the documentation for the CLI (airflow celery -h), it shows:
-H CELERY_HOSTNAME, --celery-hostname CELERY_HOSTNAME
Set the hostname of celery worker if you have multiple workers on a single machine
I am familiar with Celery and I know you can run multiple workers on the same machine. But with Airflow (and the celery executor) I don't understand how to do so.
if you do, on the same machine:
> airflow celery worker -H 'foo'
> airflow celery worker -H 'bar'
The second command will fail, complaining about the pid, so then:
> airflow celery worker -H 'bar' --pid some-other-pid-file
This will run another worker and it will sync with the first worker BUT you will get a port binding error since airflow binds the worker process to http://0.0.0.0:8793/ no matter what (unless I missed a parameter?).
It seems you are not supposed to run multiple workers per machine... Then my question is, what is the '-H' (--celery-hostname) option for? How would I use it?
The port that celery listens to is also configurable - it is used to serve logs to the webserver:
https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#worker-log-server-port
You can run multiple celery workers with different settings for that port or run them with --skip-serve-logs to not open the webserver in the first place (for example if you send logs to sentry/s3/gcs etc.).
BTW, it sounds strange to run several celery workers on one machine, because you can utilise the machine's CPUs with a single worker's concurrency instead. That seems much more practical and easier to manage.
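For example, something like this should let two workers coexist on one machine (a sketch: the AIRFLOW__CELERY__WORKER_LOG_SERVER_PORT variable assumes worker_log_server_port sits in the [celery] config section on 2.1.x, so check the reference above for your version, and the pid paths are placeholders):
> AIRFLOW__CELERY__WORKER_LOG_SERVER_PORT=8793 airflow celery worker -H 'foo' --pid /tmp/airflow-foo.pid
> AIRFLOW__CELERY__WORKER_LOG_SERVER_PORT=8794 airflow celery worker -H 'bar' --pid /tmp/airflow-bar.pid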

How to prevent celery.backend_cleanup from executing in default queue

I am using python + flask + SQS and I'm also using celery beat to execute some scheduled tasks.
Recently I went from having one single default "celery" queue to execute all my tasks to having dedicated queues/workers for each task. This includes tasks scheduled by celery beat which now all go to a queue named "scheduler".
Before dropping the "celery" queue, I monitored it to see if any tasks would wind up in that queue. To my surprise, they did.
Since I had no worker consuming from that queue, I could easily inspect the messages which piled up using the AWS console. What I saw was that all the tasks were celery.backend_cleanup!
I cannot find out from the celery docs how to prevent celery.backend_cleanup from getting tossed into this default "celery" queue, which I want to get rid of! And the docs on beat do not show an option to pass a queue name. So how do I do this?
This is how I am starting celery beat:
/venv/bin/celery -A backend.app.celery beat -l info --pidfile=
And this is how I am starting the worker
/venv/bin/celery -A backend.app.celery worker -l info -c 2 -Ofair -Q scheduler
Keep in mind, I don't want to stop backend_cleanup from executing, I just want it to go in whatever queue I specify.
Thanks ahead for the assistance!
You can override this in the beat task setup. You could also change the scheduled time to run here if you wanted to.
from celery.schedules import crontab

app.conf.beat_schedule = {
    'backend_cleanup': {
        'task': 'celery.backend_cleanup',
        # A beat entry needs a schedule; daily at 4 AM is celery's
        # built-in default for backend_cleanup.
        'schedule': crontab(minute=0, hour=4),
        'options': {'queue': <name>,
                    'exchange': <name>,
                    'routing_key': <name>},
    },
}
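An alternative sketch (routing rather than rescheduling; the "scheduler" queue name comes from the question): task_routes applies at send time to everything beat dispatches, so the schedule itself does not need to be restated:
app.conf.task_routes = {
    'celery.backend_cleanup': {'queue': 'scheduler'},
}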

What if I schedule tasks for celery to perform every minute and it is not able to complete them in time?

If I schedule the task for every minute and it cannot be completed within that time (one minute), will the task wait in the queue, and will this go on indefinitely? If that happens, the queue will be overloaded after a few hours. Is there any solution for this kind of problem?
I am using a beat and worker combination for this. It works fine when there are few records to process, but for a large database I think this could cause a problem.
Tasks are assigned to a queue (RabbitMQ, for example).
Workers are queue consumers; more workers (or a worker with higher concurrency) means more tasks can be handled in parallel.
Your periodic task produces messages of the same type (I guess) and your celery router routes them to the same queue.
Just set your workers to consume messages from that queue and that's all.
celery worker -A celeryapp:app -l info -Q default -c 4 -n default_worker#%h -Ofair
In the example above I used -c 4 for a concurrency of four (equivalent to 4 consumers/workers). You can also start more workers and let them consume from the same queue with -Q <queue_name> (in my example it's the default queue).
EDIT:
When using celery (the worker code) you instantiate a Celery object. In the Celery constructor you set your broker and backend (celery uses them as parts of the system).
for more info: http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#application
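A minimal sketch of that initialisation, assuming a local RabbitMQ broker and a Redis result backend (both URLs are placeholders):
from celery import Celery

# Hypothetical broker/backend URLs; point these at your own services.
app = Celery(
    'celeryapp',
    broker='amqp://guest:guest@localhost:5672//',  # RabbitMQ as the message queue
    backend='redis://localhost:6379/0',            # stores task results
)

@app.task
def process_record(record_id):
    ...  # periodic work goes here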

Using Celery queues with multiple apps

How do you use a Celery queue with the same name for multiple apps?
I have an application with N client databases, which all require Celery task processing on a specific queue M.
For each client database, I have a separate celery worker that I launch like:
celery worker -A client1 -n client1#%h -P solo -Q long
celery worker -A client2 -n client2#%h -P solo -Q long
celery worker -A client3 -n client3#%h -P solo -Q long
When I ran all the workers at once and tried to kick off a task to client1, I found it never seemed to execute. Then I killed all workers except the first, and now the first worker receives and executes the task. It turned out that even though each worker's app used a different BROKER_URL, using the same queue name caused them to steal each other's tasks.
This surprised me, because if I don't specify -Q, meaning Celery pulls from the "default" queue, this doesn't happen.
How do I prevent this with my custom queue? Is the only solution to include a client ID in the queue name? Or is there a more "proper" solution?
For multiple applications I use different Redis databases, like:
redis://localhost:6379/0
redis://localhost:6379/1
etc.
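A sketch of that isolation (the app names follow the question, the URLs follow the answer): each app gets its own Redis database, so a queue named "long" on one broker is invisible to the others:
from celery import Celery

# Separate Redis databases keep identically named queues apart.
client1 = Celery('client1', broker='redis://localhost:6379/0')
client2 = Celery('client2', broker='redis://localhost:6379/1')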