Will Celery handle tasks gracefully if supervisorctl stop/start/restart is used?

Recently I had to restart some inexplicably idle workers run by supervisord. We are thinking about adding a periodic restart, say once or twice a day.
This could easily be done using supervisorctl, but is there any chance tasks will be lost while the restart occurs?
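For context, Celery's default response to SIGTERM is a warm shutdown: the worker stops fetching new tasks and waits for the ones in progress to finish. Whether supervisorctl restart loses tasks therefore mostly comes down to the supervisord stop settings. A minimal sketch (the program name, app module and timeout are placeholders, not from the question):

    [program:celery]
    command=celery -A proj worker --loglevel=INFO
    ; SIGTERM triggers Celery's warm shutdown: finish current tasks, accept no new ones
    stopsignal=TERM
    ; give in-flight tasks up to 10 minutes before supervisord escalates to SIGKILL
    stopwaitsecs=600
    ; send the stop signal to the whole process group, not just the parent
    stopasgroup=true

Tasks still queued in the broker are unaffected by a worker restart; only a task that is killed after the hard timeout expires is at risk.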

Related

Task executes slowly after Celery has been serving for a long time, but works quickly after a restart

This kind of issue has occurred several times.
The log shows that after Celery has been serving for a long time, there is a stall of about 50 seconds. I don't know what happens during that "break".
The task works properly after restarting Celery, and it then takes less than 10 seconds.
I suppose something in Celery's backend becomes overloaded and is released by the restart.
But I have no idea what the real problem is, and I can hardly reproduce the issue.
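When the stall is actually happening, it is worth asking the running workers what they are doing before reaching for a restart. Two standard Celery inspection commands (the app name proj is a placeholder):

    # list the tasks each worker is currently executing, with start timestamps
    celery -A proj inspect active
    # per-worker pool and prefetch statistics, useful for spotting saturation
    celery -A proj inspect stats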

Will celeryd restart drop currently running tasks?

If I have two worker processes doing long-running operations and I use /etc/init.d/celeryd restart, as in the official documentation, to restart them while they are in the middle of processing tasks, what happens? Will they wait until they finish the tasks before shutting down? If new tasks keep coming in right then, do they line up in the queue until a worker finishes restarting? Or will Celery start a new worker before the old ones are shut down and route new tasks to it, so that there is no moment when no workers are available?
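Whatever the init script does, one setting decides whether a task interrupted mid-restart is lost or retried: by default Celery acknowledges a message as soon as a worker picks it up, so a task killed in flight disappears. A sketch with late acknowledgement enabled (app and task names are made up):

    from celery import Celery

    app = Celery("proj", broker="amqp://localhost")

    # With acks_late the message is acknowledged only after the task finishes,
    # so the broker redelivers tasks whose worker died mid-execution.
    # The task must then be safe to run twice (idempotent).
    @app.task(acks_late=True)
    def long_running_job(item_id):
        ...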

systemd `systemctl stop` aggressively kills subprocesses

I have a daemon-like process that starts two subprocesses (and one of the subprocesses starts about ten more). When I systemctl stop my process, the child subprocesses appear to be 'aggressively' killed by systemd, which doesn't give my process a chance to clean up.
How do I get systemctl stop to skip the aggressive kill and allow my process to orchestrate an orderly clean-up?
I tried TimeoutSec=30 to no avail.
KillMode= defaults to control-group. That means every process of your service is killed with SIGTERM.
You have two options:
Handle SIGTERM in each of your processes and shut down within TimeoutStopSec (which defaults to 90 seconds).
If you really want to delegate the shutdown to your main process, set KillMode=mixed. SIGTERM will then be sent to the main process only; again, shut down within TimeoutStopSec. If you do not shut down within TimeoutStopSec, systemd sends SIGKILL to all your processes.
Note: I suggest using KillMode=mixed in option 2 instead of KillMode=process, as the latter would send the final SIGKILL only to your main process, meaning your sub-processes would not be killed if they have locked up.
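A unit-file sketch of option 2 (service name and path are placeholders):

    [Service]
    ExecStart=/usr/local/bin/mydaemon
    # Send SIGTERM to the main process only; it is expected to shut down
    # its own children.
    KillMode=mixed
    # After this many seconds systemd sends SIGKILL to every process still
    # left in the service's control group.
    TimeoutStopSec=120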
A late (possible) answer, but as I googled for weeks with a similar issue and found nothing, I figured I'd add my solution.
My mistake was that I ran the systemd unit as root and switched (using sudo) to "the correct" user in the start script (inherited from a SysVinit script).
That starts the processes in the user.slice, which is killed mercilessly on shutdown. When I changed the unit file to run as the correct user (User=myuser) and removed sudo from the start script, the processes start in the system.slice and are handled properly on shutdown.
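In unit-file terms the fix amounts to this (user name and script path invented for illustration):

    [Service]
    # Run as the service user directly instead of starting as root and
    # dropping privileges with sudo in the script; the processes then stay
    # in the service's own cgroup under system.slice.
    User=myuser
    ExecStart=/opt/myapp/start.sh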

Gracefully update running celery pod in Kubernetes

I have a Kubernetes cluster running Django, Celery, RabbitMQ and Celery Beat. I have several periodic tasks spaced out throughout the day (to keep server load down). There are only a few hours when no tasks are running, and I want to limit my rolling updates to those times without having to track them manually. So I'm looking for a solution that will let me fire off a script or task of some sort that monitors the Celery server and triggers a rolling update once there is a window in which no tasks are actively running. I can think of two possible ways of doing this, but I'm not sure which is best, nor how to implement either one.
Run a script (bash or otherwise) that checks the Celery server every few minutes and initiates the rolling update if the server is idle (see the sketch below)
Increment the Celery app name before each update (in the Beat run command, the Celery run command, and in the celery.py config file), create a new Celery pod, rolling-update the Beat pod, and then delete the old Celery pod 12 hours later (a reasonable time span for all running tasks to finish)
Any thoughts would be greatly appreciated.
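For the first option, a rough sketch of the polling script (the app name proj, the deployment name celery-worker and the five-minute interval are all assumptions, and it needs a Celery version whose inspect command supports --json output):

    #!/usr/bin/env bash
    # Poll until no worker reports an active task, then trigger the rollout.
    while true; do
        # 'inspect active --json' prints a JSON map of worker -> active tasks;
        # if no task objects (with an "id" field) appear, nothing is running.
        if ! celery -A proj inspect active --json | grep -q '"id"'; then
            kubectl rollout restart deployment/celery-worker
            break
        fi
        sleep 300
    done

There is an unavoidable race (a task can start between the check and the rollout), so a generous terminationGracePeriodSeconds on the Celery pod is still advisable.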

Setting up deployments with Capistrano, Sidekiq and Monit

My application uses Sidekiq to handle long-running (several minutes) background tasks. Deployments are done with Capistrano 2, and all processes are monitored with Monit.
I have used capistrano-sidekiq to manage the Sidekiq process during deployments, but it has not worked perfectly: sometimes during a deployment a new Sidekiq process is started but the old one is not killed. I believe this happens because capistrano-sidekiq does not operate through Monit during the deployment.
The second problem is that, because my background tasks can take several minutes to complete, my deployment should allow two Sidekiq processes to co-exist: the old process should be allowed to finish the tasks it is working on while a new process starts taking new tasks.
I have been thinking about adding something like this to my deploy script (rough sketch below):
When deployment starts:
I tell Monit to unmonitor the sidekiq process
I stop the current sidekiq process and give it 10 minutes to finish its tasks
After the code has been updated:
I start a new sidekiq process and tell Monit to start monitoring it.
I may need to move the sidekiq process pid file into the release directory if the pid file is not removed until the stopped sidekiq process has eventually been killed.
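A rough Capistrano 2 sketch of that plan (task names, pid-file path and the Monit service name are assumptions; quieting is USR1 on older Sidekiq versions, TSTP on Sidekiq 5+):

    # config/deploy.rb -- sketch only
    namespace :sidekiq do
      task :quiet, :roles => :app do
        run "monit unmonitor sidekiq"
        # stop picking up new jobs, but let the running ones finish
        run "kill -USR1 $(cat #{shared_path}/pids/sidekiq.pid)"
      end

      task :restart, :roles => :app do
        # TERM begins Sidekiq's shutdown; jobs still running when its -t
        # timeout expires are pushed back to Redis for the next process
        run "kill -TERM $(cat #{shared_path}/pids/sidekiq.pid)"
        run "monit monitor sidekiq"
        run "monit start sidekiq"
      end
    end

    before "deploy:update_code", "sidekiq:quiet"
    after  "deploy:restart", "sidekiq:restart"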
How does this sound? Any caveats spotted?
EDIT:
Found a good thread about this same issue.
http://librelist.com/browser//sidekiq/2014/6/5/rollback-signal-after-usr1/#f6898deccb46801950f40ad22e75471d
Seems reasonable to me. The only possible issue is losing track of the old Sidekiq's PID, but you should be able to use ps and grep for "stopping" to find old Sidekiqs.