How to fix malformed output when multiple celery workers write payloads to the same file - celery

A legacy Django & Celery based service has been writing payloads, one per line, to a single file from its Celery workers. But I noticed a lot of malformed payloads ... a payload would start in the middle of another.
To do a root-cause analysis, I tried logging all payloads to the Celery task logger. Even in the logger output, one entry would start in the middle of another.
Could all this be because multiple Celery workers are writing to the same file? How can I avoid this ... I'm okay with each worker writing to a different file.
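In case it helps, this is roughly the per-worker-file approach I have in mind (just a sketch; the task name and target directory are made up):

import os

from celery import shared_task

PAYLOAD_DIR = "/var/log/myservice"  # placeholder directory

@shared_task
def write_payload(payload):
    # One file per worker process (keyed by PID), so two workers never
    # append to the same file and their lines cannot interleave.
    path = os.path.join(PAYLOAD_DIR, "payloads-{0}.log".format(os.getpid()))
    with open(path, "a") as fh:
        fh.write(payload + "\n")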

Related

Load balancer and celery result backends

I have a task that takes approximately 3 minutes to run. It pulls data from a remote server and runs CPU-intensive analysis on it. The task will be invoked by an API call. Upon the API call, I am planning to give the client a unique task id and assign the task to a Celery worker. The client will then poll the server with the given task id to see whether the task has been completed by a Celery worker and its result saved to a result backend. I'm thinking of using nginx, gunicorn and flask, and dockerizing them for an easy deployment in case I need to distribute this architecture across multiple machines.
The problem is that the client may poll different servers because of the load balancer, and if this is not handled well, the polled server's Celery result backend might not have the task's result while another server's result backend does.
Is it possible to use a single result backend across multiple Celery instances and have the different Celery instances query the same result backend? What might be other ways to solve this, other than using cloud storage like S3?
Would I have this problem only with multiple machines, or would it happen even with multiple gunicorn instances on a single machine where nginx acts as a load balancer over them?
Not only is it possible to use a single result backend for all Celery workers, it is the only setup that makes sense! The same goes for the broker in most cases, unless you have a complicated Celery infrastructure with exchanges and complicated routes...
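A minimal sketch of that setup, assuming a shared Redis instance (the URLs and names are placeholders):

from celery import Celery

# Every web node and every worker points at the same broker and the same
# result backend, so whichever server receives the poll can look up the
# result by task id.
app = Celery(
    "tasks",
    broker="redis://shared-redis.internal:6379/0",
    backend="redis://shared-redis.internal:6379/1",
)

# On any web server handling the poll request:
#   result = app.AsyncResult(task_id)
#   if result.ready():
#       payload = result.get()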

How to debug celery delays and errors?

I am continuing a Django project of someone who was using Celery along with Mandrill. There are daily reports which are sent to customers, and for some reason not a single mail was sent for three days; the reports accumulated and were all sent together after three days. Since I am new to Celery, I want to know how to debug Celery delays and errors: what are the popular commands, and what execution path should I follow?
Short tips:
Set debug=True in your Celery config; it will show you the registration and execution time for every task (see the sketch after this list).
Install Flower, a popular tool for monitoring Celery tasks.
Use Sentry for handy error tracking and aggregation.
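A small sketch of the logging side (the task and project names are invented): use the task logger so every line carries the task name and id, and run the worker with a verbose log level while you investigate.

from celery import shared_task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@shared_task(bind=True)
def send_daily_report(self, customer_id):
    # Each line is prefixed with the task name and id, which makes delayed
    # or duplicated runs much easier to trace in the worker log.
    logger.info("building report for customer %s (task id %s)",
                customer_id, self.request.id)

# While investigating, start the worker with verbose logging:
#   celery -A proj worker --loglevel=DEBUG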
Happy debugging ;)

Celery Workers as mirror frontends for Webservices

I am looking for a way to distribute jobs over SOAP-based web services that can be randomly switched on and off in the cloud, and that can exist in one or several instances.
I went through the Celery tutorials, and it seems a very interesting tool for distributing tasks.
However, in my case I don't have access to the hosts of the SOAP web services, so I can't add any extra services on them, and I can't turn them into "worker nodes" for Celery.
I thought I could maybe create "mirror" worker nodes (one per SOAP web service) on a machine that would act as an intermediary between the Celery client and the SOAP services.
My knowledge of Celery being limited, I wonder whether this could be a good solution, and what its limits would be.
I have read in the documentation that it is possible to tune the number of processes executed on a machine with:
CELERYD_CONCURRENCY
The default value being CELERYD_CONCURRENCY = number of CPUs
It seems to me that I could use this option on the "mirror workers", which would all sit on the same machine, each "mirror worker" having a CELERYD_CONCURRENCY value corresponding to how many concurrent calls I would allow against its SOAP service.
Does this seem achievable with Celery, or is it very "hacky"?
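To make it concrete, each "mirror worker" would be roughly something like this (just a sketch; the SOAP client here is zeep and the WSDL URL is made up):

from celery import Celery
from zeep import Client  # any SOAP client library would do

app = Celery("soap_mirror_1", broker="amqp://localhost")

# Cap how many calls may hit this particular SOAP service at once
# (CELERYD_CONCURRENCY in older Celery versions).
app.conf.worker_concurrency = 4

@app.task
def call_service(method_name, **kwargs):
    client = Client("http://example.com/service?wsdl")  # placeholder WSDL
    return getattr(client.service, method_name)(**kwargs)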

Using celeryd as a daemon with multiple django apps?

I'm just starting to use django-celery and I'd like to set up celeryd to run as a daemon. The instructions, however, appear to suggest that it can be configured for only one site/project at a time. Can celeryd handle more than one project, or only one? And if it can, is there a clean way to set up celeryd so that it is automatically started for each configuration, without requiring me to create a separate init script for each one?
Like all interesting questions, the answer is it depends. :)
It is definitely possible to come up with a scenario in which celeryd can be used by two independent sites. If multiple sites are submitting tasks to the same exchange, and the tasks do not require access to any specific database -- say, they operate on email addresses, or credit card numbers, or something other than a database record -- then one celeryd may be sufficient. Just make sure that the task code is in a shared module that is loaded by all sites and the celery server.
Usually, though, you'll find that Celery needs access to the database -- either it loads objects based on the ID that was passed as a task parameter, or it has to write some changes to the database, or, most often, both. And multiple sites/projects usually don't share a database, even if they share the same apps, so you'll need to keep the task queues separate.
In that case, what will usually happen is that you set up a single message broker (RabbitMQ, for example) with multiple exchanges. Each exchange receives messages from a single site. Then you run one or more celeryd processes somewhere for each exchange (in the Celery config settings you have to specify the exchange; I don't believe celeryd can listen to multiple exchanges). Each celeryd server knows its exchange, the apps it should load, and the database that it should connect to.
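A sketch of that per-site split in current Celery terms (queue and settings-module names are invented; the older exchange-based configuration follows the same idea):

# site1's Django settings (site2 would use its own queue name)
CELERY_DEFAULT_QUEUE = "site1"

# Then run one worker per site, each consuming only its own queue and
# loading only that site's settings and apps:
#   DJANGO_SETTINGS_MODULE=site1.settings celery worker -Q site1
#   DJANGO_SETTINGS_MODULE=site2.settings celery worker -Q site2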
To manage these, I would suggest looking into cyme -- it's by @asksol and manages multiple celeryd instances, on multiple servers if necessary. I haven't tried it, but it looks like it should handle different configurations for different instances.
I did not try it, but with Celery 3.1.x, which does not need django-celery, the documentation says you can instantiate a Celery app like this:
from celery import Celery
from django.conf import settings

# picks up the Django settings module and registers tasks from INSTALLED_APPS
app1 = Celery('app1')
app1.config_from_object('django.conf:settings')
app1.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

@app1.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
You can also use celery multi to launch several workers, each with a single configuration; you can see examples here. So you can launch several workers with different --app appX parameters so that each uses different tasks and settings:
# 3 workers: Two with 3 processes, and one with 10 processes.
$ celery multi start 3 -c 3 -c:1 10
celery worker -n celery1@myhost -c 10 --config celery1 --app app1
celery worker -n celery2@myhost -c 3 --config celery2 --app app2
celery worker -n celery3@myhost -c 3 --config celery3 --app app3

Queuing systems - what is a good way to start up multiple workers?

How have you set-up one or more worker scripts for queue-oriented systems?
How do you arrange to start up - and restart if necessary - worker scripts as required? (I'm thinking of tools such as init.d, the Ruby-based 'god', DJB's daemontools, etc.)
I'm developing an asynchronous queue/worker system, in this case using PHP & Beanstalkd (though the actual language and daemon aren't important). The tasks themselves are not too hard - encoding an array with the commands and parameters into JSON for transport through the Beanstalkd daemon, then picking them up in a worker script to action them as required.
There are a number of other similar queue/worker setups out there, such as Starling, Gearman, Amazon's SQS, and more 'enterprise'-oriented systems like IBM's MQ and RabbitMQ. If you run something like Gearman or SQS, how do you start and control the worker pool? The question is about the initial worker startup, and then being able to add extra workers and shut them down at will (though I can send a message through the queue to shut them down - as long as some 'watcher' won't automatically restart them). This is not a PHP problem; it's about straight Unix processes: setting up one or more processes to run on startup, or adding more workers to the pool.
A bash loop script is already in place - it calls the PHP script, which then collects and runs tasks from the queue, occasionally exiting so it can clean itself up (it can also pause for a few seconds on failure, or via a planned event). This works fine, and building the worker processes on top of it won't be very hard at all.
Getting a good worker-controller system is about flexibility: starting one or two workers automatically when the machine boots, being able to add a couple more from the command line when the queue is busy, and shutting down the extras when they're no longer required.
I've been helping a friend who's working on a project that involves a Gearman-based queue that will dispatch various asynchronous jobs to various PHP and C daemons on a pool of several servers.
The workers have been designed to behave just like classic unix/linux daemons, thanks to simple shell scripts in /etc/init.d/ and commands like:
invoke-rc.d myWorker start|stop|restart|reload
This mechanism is simple and efficient. And since it relies on standard Linux features, even people with limited knowledge of your app can launch a daemon or stop one, as long as they know what it's called system-wise ("myWorker" in the above example).
Another advantage of this mechanism is that it makes managing your worker pool easy as well. You could have 10 daemons on your machine (myWorker1, myWorker2, ...) and have a "worker manager" start or stop them depending on the queue length. And since these commands can be run over ssh, you can easily manage several servers.
This solution may sound cheap, but if you build it with well-coded daemons and reliable management scripts, I don't see why it would be less effective than big-bucks solutions for any average (as in "non-critical") project.
Real message-queuing middleware like WebSphere MQ or MSMQ offers "triggers", where a service that is part of the MQM starts a worker when new messages are placed into a queue.
AFAIK, no "web service" queuing system can do that, by the nature of the beast. However, I have only looked hard at SQS. There you have to poll the queue, and in Amazon's case overly eager polling is going to cost you some real $$.
I've recently been working on such a tool. It's not entirely finished (though it shouldn't take more than a few more days before I hit something I could call 1.0) and it's clearly not ready for production yet, but the important parts are already coded. Anybody can have a look at the code here: https://gitorious.org/workers_pool.
Supervisor is a good monitoring tool. It includes a web UI where you can monitor and manage workers.
Here is a simple config file for a worker.
[program:demo]
command=php worker.php ; php command to run worker file
numprocs=2 ; number of processes
process_name=%(program_name)s_%(process_num)03d ; unique name for each process if numprocs > 1
directory=/var/www/demo/ ; directory containing worker file
stdout_logfile=/var/www/demo/worker.log ; log file location
autostart=true ; auto start program when supervisor starts
autorestart=true ; auto restart program if it exits