too many celery processes running - celery

I have a tornado application with a celery application initiated like this.
celery worker --app=pluto -l INFO -Q pluto -c 4 -Ofair --logfile=pluto-celery.log
The application is eating a lot of memory. When I run htop, I see a lot of celery processes (around 30), most of them sleeping but each consuming 0.1% memory. Why do I have so many processes? Is there anything wrong with the command I am using? I tried to look, but could not find anything about the -c 4 part of that command. How can I find more info or debug this memory leak issue?

Related

What does celery daemonization mean?

What does celery daemonization mean? Also, I would like to start both celery worker and celery beat with a single command. Is there any way to do it? One way I can think of is using supervisor for the worker and beat, writing the start scripts for those in a separate .sh file, and running that script. Is there any other way?
In other words, I can even start the worker and beat processes manually as background processes, right? So does daemonization in celery just run the processes in the background, or is there anything else to it?

Why does Celery discourage worker & beat together?

From the celery help function:
> celery worker -h
...
Embedded Beat Options:
-B, --beat Also run the celery beat periodic task scheduler. Please note that there must only be
one instance of this service. .. note:: -B is meant to be used for development
purposes. For production environment, you need to start celery beat separately.
This also appears in the docs.
You can also embed beat inside the worker by enabling the workers -B
option, this is convenient if you’ll never run more than one worker
node, but it’s not commonly used and for that reason isn’t recommended
for production use:
celery -A proj worker -B
But it's not actually explained why it's "bad" to use this in production. Would love some insight.
The --beat option starts a beat scheduler along with the worker.
But you only need one beat scheduler.
In a production environment, you usually have more than one worker running, so starting each of them with the --beat option would be a disaster.
For example, say you have an event scheduled at 12 am each day.
If you start two beat processes, the event will run twice at 12 am each day.
If you’ll never run more than one worker node, the --beat option is just fine.
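To make the duplication concrete, here is a minimal pure-Python sketch (it is not Celery code). It assumes that beat processes do not coordinate with each other, so each one enqueues every due entry on its own. The names simulate_beat and nightly_report are made up for illustration.

```python
from collections import Counter


def simulate_beat(num_beat_processes, due_tasks):
    """Return how many times each due task is enqueued in one scheduler tick.

    Assumption: beat processes do not coordinate, so each one
    enqueues every due task independently.
    """
    enqueued = Counter()
    for _ in range(num_beat_processes):
        for task_name in due_tasks:
            enqueued[task_name] += 1
    return enqueued


if __name__ == "__main__":
    # Two workers started with --beat means two schedulers.
    print(simulate_beat(2, ["nightly_report"]))
    # One dedicated beat process next to any number of plain workers.
    print(simulate_beat(1, ["nightly_report"]))
```

Under that assumption, the number of executions per scheduled time equals the number of beat processes, which is why a single separate beat process is the usual production setup.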

What if i schedule tasks for celery to perform every minute and it is not able to complete it in time?

If I schedule a task to run every minute and it cannot complete within that time (one minute), will the next task wait in the queue, and will this keep going on? If so, after a few hours the queue will be overloaded. Is there any solution for this kind of problem?
I am using a beat and worker combination for this. It works fine when there are few records to process, but for a large database I think this could cause problems.
Tasks are assigned to a queue (in RabbitMQ, for example).
Workers are queue consumers: the more workers (or the higher a worker's concurrency), the more tasks can be handled in parallel.
Your periodic task produces messages of the same type (I guess), and your celery router routes them to the same queue.
Just set your workers to consume messages from that queue and that's all.
celery worker -A celeryapp:app -l info -Q default -c 4 -n default_worker@%h -Ofair
In the example above I used -c 4 for a concurrency of four (equivalent to 4 consumers/workers). You can also start more workers and let them consume from the same queue with -Q <queue_name> (in my example it's the default queue).
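As a rough illustration of why concurrency matters here, the following pure-Python sketch simulates a queue that receives one task per period. It is a simplified model, not Celery code: every task is assumed to take exactly the same time, tasks are started in FIFO order, and the name waiting_tasks is hypothetical.

```python
import heapq


def waiting_tasks(minutes, task_seconds, concurrency, period_seconds=60):
    """Count tasks that have not started yet after `minutes` of operation.

    Simplified model: one task is enqueued every `period_seconds`,
    each task takes exactly `task_seconds`, and up to `concurrency`
    tasks run at the same time.
    """
    horizon = minutes * 60
    # Time at which each pool slot becomes free.
    slots = [0] * concurrency
    heapq.heapify(slots)
    waiting = 0
    for arrival in range(0, horizon, period_seconds):
        free_at = heapq.heappop(slots)
        start = max(arrival, free_at)
        heapq.heappush(slots, start + task_seconds)
        if start >= horizon:
            waiting += 1
    return waiting


if __name__ == "__main__":
    # A 90-second task scheduled every minute, observed for one hour.
    print(waiting_tasks(60, 90, concurrency=1))
    print(waiting_tasks(60, 90, concurrency=2))
```

In this model, a 90-second task with a single pool slot leaves 20 tasks waiting after an hour, while two slots leave none. The backlog only stays bounded when the combined throughput of the workers is at least the rate at which beat produces tasks.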
EDIT:
When you use celery (the worker code), you instantiate a Celery object. In the Celery constructor you set your broker and backend (celery uses them as part of the system).
For more info: http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#application

Using Celery queues with multiple apps

How do you use a Celery queue with the same name for multiple apps?
I have an application with N client databases, which all require Celery task processing on a specific queue M.
For each client database, I have a separate celery worker that I launch like:
celery worker -A client1 -n client1@%h -P solo -Q long
celery worker -A client2 -n client2@%h -P solo -Q long
celery worker -A client3 -n client3@%h -P solo -Q long
When I ran all the workers at once and tried to kick off a task for client1, I found it never seemed to execute. Then I killed all workers except the first, and the first worker received and executed the task. It turned out that even though each worker's app used a different BROKER_URL, using the same queue caused them to steal each other's tasks.
This surprised me, because if I don't specify -Q, meaning Celery pulls from the "default" queue, this doesn't happen.
How do I prevent this with my custom queue? Is the only solution to include a client ID in the queue name? Or is there a more "proper" solution?
For multiple applications I use a different Redis database for each one, like:
redis://localhost:6379/0
redis://localhost:6379/1
etc.
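A minimal sketch of that idea, assuming a Redis broker on localhost and the three client names from the question. The helper broker_url_for is hypothetical; each app would pass its own URL as the broker setting when it creates its Celery object.

```python
def broker_url_for(client_index, host="localhost", port=6379):
    """Build a Redis broker URL that uses a separate database per client."""
    if client_index < 0:
        raise ValueError("client_index must be non-negative")
    return f"redis://{host}:{port}/{client_index}"


# Client names taken from the question; one Redis database per client.
clients = ["client1", "client2", "client3"]
broker_urls = {name: broker_url_for(i) for i, name in enumerate(clients)}

if __name__ == "__main__":
    for name, url in broker_urls.items():
        print(name, url)
```

Because each app then talks to its own database, a queue named long in one app is separate from the queue named long in another. Note that a default Redis configuration provides 16 databases (0 to 15), so this approach assumes the number of clients fits within that limit or that the limit has been raised.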

In celery, what would be the purpose of having multiple workers process the same queue?

In the documentation for celeryd-multi, we find this example:
# Advanced example starting 10 workers in the background:
# * Three of the workers processes the images and video queue
# * Two of the workers processes the data queue with loglevel DEBUG
# * the rest processes the default' queue.
$ celeryd-multi start 10 -l INFO -Q:1-3 images,video -Q:4,5 data
-Q default -L:4,5 DEBUG
( From here: http://docs.celeryproject.org/en/latest/reference/celery.bin.celeryd_multi.html#examples )
What would be a practical example of why it would be good to have more than one worker on a single host process the same queue, as in the above example? Isn't that what setting the concurrency is for?
More specifically, would there be any practical difference between the following two lines (A and B)?:
A:
$ celeryd-multi start 10 -c 2 -Q data
B:
$ celeryd-multi start 1 -c 20 -Q data
I am concerned that I am missing some valuable bit of knowledge about task queues by my not understanding this practical difference, and I would greatly appreciate if somebody could enlighten me.
Thanks!
What would be a practical example of why it would be good to have more
than one worker on a single host process the same queue, as in the
above example?
Answer:
So, you may want to run multiple worker instances on the same machine
node if:
- You're using the multiprocessing pool and want to consume messages in parallel. Some report better performance using multiple worker instances instead of running a single instance with many pool workers.
- You're using the eventlet/gevent pool (and, due to the infamous GIL, also the 'threads' pool), and you want to execute tasks on multiple CPU cores.
Reference: http://www.quora.com/Celery-distributed-task-queue/What-is-the-difference-between-workers-and-processes
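To compare lines A and B from the question in numbers, here is a small arithmetic sketch. It assumes the default prefork (multiprocessing) pool, where each worker instance consists of one main process that consumes messages plus as many pool child processes as the -c value. The function name process_counts is made up for illustration.

```python
def process_counts(worker_instances, concurrency):
    """Count OS processes for prefork workers.

    Assumption: each worker instance is one main (consumer) process
    plus `concurrency` pool child processes.
    """
    pool_processes = worker_instances * concurrency
    return {
        "main": worker_instances,
        "pool": pool_processes,
        "total": worker_instances + pool_processes,
    }


if __name__ == "__main__":
    print("A:", process_counts(10, 2))   # celeryd-multi start 10 -c 2
    print("B:", process_counts(1, 20))   # celeryd-multi start 1 -c 20
```

Both setups end up with 20 pool processes executing tasks. The difference is that A has ten main processes consuming from the queue in parallel (30 processes in total), while B funnels every message through a single main process (21 processes in total).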