From some answers on SO I learned that the best practice for a higher processing rate in Celery is to use its concurrency feature.
AFAIK Celery uses multiprocessing for concurrency.
One of my tasks runs a Docker image with the -p option.
So if multiple processes run this task, they will conflict when choosing the host port.
So how do I know whether the current process is the 1st/2nd/3rd/4th process?
What I want to achieve is (e.g.):
1st process will use 127.0.0.1:8001
2nd process will use 127.0.0.1:8002
etc etc
Sincerely
-bino-
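One possible way to get the mapping described above, as a minimal sketch: derive the host port from a per-process index. The helper names and the container port are hypothetical, and where the index comes from is an assumption. With the prefork pool, billiard appears to set an `index` attribute on each pool process (readable via `billiard.current_process().index`), but check that against your Celery version before relying on it.

```python
BASE_PORT = 8001  # first host port, taken from the example in the question


def host_port_for(process_index, base_port=BASE_PORT):
    """Map a zero-based pool process index to a host port."""
    if process_index < 0:
        raise ValueError("process_index must be non-negative")
    return base_port + process_index


def docker_publish_arg(process_index, container_port=80):
    """Build the value for docker's -p option, bound to localhost.

    container_port=80 is a placeholder; the question does not say which
    container port is published.
    """
    return "127.0.0.1:{}:{}".format(host_port_for(process_index), container_port)


# Assumption: inside a task, process_index would come from something like
# billiard.current_process().index (not imported here, so the sketch runs
# without Celery installed).
print(docker_publish_arg(0))
print(docker_publish_arg(1))
```

With this mapping the first pool process publishes on 127.0.0.1:8001, the second on 127.0.0.1:8002, and so on, as long as only one worker node runs on the host.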
Related
What is the algorithm used to distribute the task load between workers in celery?
I checked the documentation, could not find the info.
This depends on the broker in use. For example, with Redis, each worker process uses kombu's Redis transport, which in turn calls BRPOP to get the next available task. Redis serves blocked BRPOP clients in longest-waiting-first order, so a task goes to whichever client (waiting Celery worker process) has been blocked the longest.
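The longest-waiting-client rule can be illustrated with a toy simulation. This is only a sketch of the allocation order, not Redis or kombu code; the class and the worker and task names are made up for illustration.

```python
from collections import deque


class BlockingListSim:
    """Toy model of a list with blocked pop clients (not a Redis client)."""

    def __init__(self):
        self.items = deque()    # queued tasks
        self.waiting = deque()  # blocked clients, longest-waiting first
        self.delivered = []     # (client, task) pairs in delivery order

    def blocking_pop(self, client):
        """Deliver a task right away if one is queued, else block the client."""
        if self.items:
            self.delivered.append((client, self.items.popleft()))
        else:
            self.waiting.append(client)

    def push(self, task):
        """Hand the task to the longest-waiting client, or queue it."""
        if self.waiting:
            self.delivered.append((self.waiting.popleft(), task))
        else:
            self.items.append(task)


sim = BlockingListSim()
for worker in ("worker-1", "worker-2", "worker-3"):
    sim.blocking_pop(worker)   # all three block on an empty list
for task in ("task-a", "task-b"):
    sim.push(task)             # each task goes to the longest waiter

print(sim.delivered)
```

Here worker-1 blocked first, so it receives the first task, worker-2 receives the second, and worker-3 keeps waiting.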
We know we have some badly formed tasks in the Celery queue that are crashing our workers. Is there an easy way to manually delete them?
We don't want to flush the whole thing as there might be some important emails to be sent...
Use Celery Flower to manage and monitor Celery tasks. Refer to https://flower.readthedocs.io/en/latest/
From the celery help function:
> celery worker -h
...
Embedded Beat Options:
-B, --beat Also run the celery beat periodic task scheduler. Please note that there must only be
one instance of this service. .. note:: -B is meant to be used for development
purposes. For production environment, you need to start celery beat separately.
This also appears in the docs.
You can also embed beat inside the worker by enabling the workers -B
option, this is convenient if you’ll never run more than one worker
node, but it’s not commonly used and for that reason isn’t recommended
for production use:
celery -A proj worker -B
But it's not actually explained why it's "bad" to use this in production. Would love some insight.
The --beat option will start a beat scheduler along with the worker.
But you only need one beat scheduler.
In a production environment you usually have more than one worker running, so starting every worker with the --beat option would be a disaster.
For example, say you have an event scheduled at 12:00 am each day.
If you start two beat processes, the event will run twice at 12:00 am each day.
If you'll never run more than one worker node, the --beat option is just fine.
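The duplication can be shown with a small simulation. This is a sketch of the reasoning only, not Celery code; the function and the task name are hypothetical.

```python
def run_schedulers(num_beat_instances, due_tasks):
    """Return every task message that would be sent to the broker.

    Each beat instance keeps its own schedule and does not know about the
    others, so each one sends every due task.
    """
    sent = []
    for instance in range(num_beat_instances):
        for task in due_tasks:
            sent.append((instance, task))
    return sent


one_beat = run_schedulers(1, ["send_daily_report"])
two_beats = run_schedulers(2, ["send_daily_report"])

print(len(one_beat), len(two_beats))
```

With two workers both started with -B, the same scheduled task is submitted twice, once by each embedded scheduler.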
I know that it's currently possible to use Django Celery to schedule tasks using Django's built-in ORM, but is there a way to use MongoDB in this regard?
(I'm not asking about brokers or result backends, as I know that Celery supports those, I'm specifically asking about scheduling.)
I think what you're looking for is celerybeat-mongo
Yes, it is possible. I just use the worker -B argument for my workers.
I am not sure whether the Celery scheduler puts these tasks in the queue (MongoDB in this case), because it is already running in the backend. But you can always trigger (delay) a task from inside a scheduled task.
I'm just starting to use django-celery and I'd like to set celeryd running as a daemon. The instructions, however, appear to suggest that it can be configured for only one site/project at a time. Can celeryd handle more than one project, or can it handle only one? And, if this is the case, is there a clean way to set up celeryd to be automatically started for each configuration, without requiring me to create a separate init script for each one?
Like all interesting questions, the answer is it depends. :)
It is definitely possible to come up with a scenario in which celeryd can be used by two independent sites. If multiple sites are submitting tasks to the same exchange, and the tasks do not require access to any specific database -- say, they operate on email addresses, or credit card numbers, or something other than a database record -- then one celeryd may be sufficient. Just make sure that the task code is in a shared module that is loaded by all sites and the celery server.
Usually, though, you'll find that celery needs access to the database -- either it loads objects based on the ID that was passed as a task parameter, or it has to write some changes to the database, or, most often, both. And multiple sites / projects usually don't share a database, even if they share the same apps, so you'll need to keep the task queues separate.
In that case, what will usually happen is that you set up a single message broker (RabbitMQ, for example) with multiple exchanges. Each exchange receives messages from a single site. Then you run one or more celeryd processes somewhere for each exchange (in the celery config settings, you have to specify the exchange. I don't believe celeryd can listen to multiple exchanges). Each celeryd server knows its exchange, the apps it should load, and the database that it should connect to.
To manage these, I would suggest looking into cyme -- it's by @asksol, and manages multiple celeryd instances, on multiple servers if necessary. I haven't tried it, but it looks like it should handle different configurations for different instances.
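The one-broker, one-exchange-per-site layout described above could be sketched as a settings helper like the one below. The function, the site names, and the broker URL are hypothetical. The setting names are the old uppercase ones from the celeryd era; newer Celery versions use lowercase equivalents such as task_default_exchange, so check the names against your version.

```python
def celery_settings_for(site, broker_url="amqp://guest@localhost//"):
    """Build per-site Celery settings so each site uses its own exchange.

    The broker URL is a placeholder; all sites share the same broker.
    """
    return {
        "BROKER_URL": broker_url,            # one shared message broker
        "CELERY_DEFAULT_QUEUE": site,        # queue consumed by this site's celeryd
        "CELERY_DEFAULT_EXCHANGE": site,     # exchange this site publishes to
        "CELERY_DEFAULT_ROUTING_KEY": site,  # routing key binding the two
    }


site_a = celery_settings_for("site_a")
site_b = celery_settings_for("site_b")

print(site_a["CELERY_DEFAULT_EXCHANGE"], site_b["CELERY_DEFAULT_EXCHANGE"])
```

Each site (and the celeryd processes serving it) would load its own settings, so tasks from one site never reach a worker that is connected to another site's database.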
I have not tried it, but with Celery 3.1.x, which does not need django-celery, according to the documentation you can instantiate a Celery app like this:
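from celery import Celery
from django.conf import settings

app1 = Celery('app1')
app1.config_from_object('django.conf:settings')
app1.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

@app1.task(bind=True)
def debug_task(self):
    # Print the request context of the running task.
    print('Request: {0!r}'.format(self.request))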
You can also use celery multi to launch several workers, each with its own configuration; you can see examples here. Launching the workers with different --app appX parameters makes each one use different tasks and settings:
# 3 workers: Two with 3 processes, and one with 10 processes.
$ celery multi start 3 -c 3 -c:1 10
celery worker -n celery1@myhost -c 10 --config celery1.py --app app1
celery worker -n celery2@myhost -c 3 --config celery2.py --app app2
celery worker -n celery3@myhost -c 3 --config celery3.py --app app3