howto: elastic beanstalk + deploy docker + graceful shutdown - deployment

Hi great people of stackoverflow,
We're hosting a Docker container on EB with a Node.js-based app running in it.
When redeploying our Docker container we'd like the old one to do a graceful shutdown.
I've found help & guides on how our code could receive a SIGTERM signal produced by the 'docker stop' command.
However, further investigation into the EB machine running Docker at:
/opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh
shows that when "flipping" from the current container to the newly staged one, the old one is killed with 'docker kill'.
Is there any way to change this behaviour to docker stop?
Or in general a recommended approach to handling graceful shutdown of the old container?
Thanks!

Self-answering, as I've found a solution that works for us:
tl;dr: use .ebextensions scripts to run your script before 01flip; your script will make sure a graceful shutdown of whatever's inside the container takes place.
first,
your app (or whatever you're running in Docker) has to be able to catch a signal, SIGINT for example, and shut down gracefully upon it.
This is totally unrelated to Docker; you can test it running anywhere (locally, for example).
There is a lot of info on the net about getting this kind of behaviour for different kinds of apps (be it Ruby, Node.js, etc.).
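For example, a quick way to sanity-check the handler locally is to send the signal by hand (the worker name here is hypothetical; adjust to whatever you actually run):
# start your worker locally, e.g. node worker.js, then from another terminal:
kill -INT "$(pgrep -f 'node worker.js')"
# the worker should finish in-flight work and exit cleanly instead of dying immediately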
Second,
your EB/Docker-based project can have a .ebextensions folder that holds all kinds of scripts to execute while deploying.
We put two custom config files into it, gracefulshutdown_01.config and gracefulshutdown_02.config, which look something like this:
# gracefulshutdown_01.config
commands:
  backup-original-flip-hook:
    command: cp -f /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh /opt/elasticbeanstalk/hooks/appdeploy/01flip.sh.bak
    test: '[ ! -f /opt/elasticbeanstalk/hooks/appdeploy/01flip.sh.bak ]'
  cleanup-custom-hooks:
    command: rm -f 05gracefulshutdown.sh
    cwd: /opt/elasticbeanstalk/hooks/appdeploy/enact
    ignoreErrors: true
and:
# gracefulshutdown_02.config
commands:
  reorder-original-flip-hook:
    command: mv /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh /opt/elasticbeanstalk/hooks/appdeploy/enact/10flip.sh
    test: '[ -f /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh ]'
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/enact/05gracefulshutdown.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/sh
      # find currently running docker
      EB_CONFIG_DOCKER_CURRENT_APP_FILE=$(/opt/elasticbeanstalk/bin/get-config container -k app_deploy_file)
      EB_CONFIG_DOCKER_CURRENT_APP=""
      if [ -f $EB_CONFIG_DOCKER_CURRENT_APP_FILE ]; then
        EB_CONFIG_DOCKER_CURRENT_APP=`cat $EB_CONFIG_DOCKER_CURRENT_APP_FILE | cut -c 1-12`
        echo "Graceful shutdown on app container: $EB_CONFIG_DOCKER_CURRENT_APP"
      else
        echo "NO CURRENT APP TO GRACEFUL SHUTDOWN FOUND"
        exit 0
      fi
      # give graceful kill command to all running .js files (not stats!!)
      docker exec $EB_CONFIG_DOCKER_CURRENT_APP sh -c "ps x -o pid,command | grep -E 'workers' | grep -v -E 'forever|grep' " | awk '{print $1}' | xargs docker exec $EB_CONFIG_DOCKER_CURRENT_APP kill -s SIGINT
      echo "sent kill signals"
      # wait (max 5 mins) until processes are done and terminate themselves
      TRIES=100
      until [ $TRIES -eq 0 ]; do
        PIDS=`docker exec $EB_CONFIG_DOCKER_CURRENT_APP sh -c "ps x -o pid,command | grep -E 'workers' | grep -v -E 'forever|grep' " | awk '{print $1}' | cat`
        echo TRIES $TRIES PIDS $PIDS
        if [ -z "$PIDS" ]; then
          echo "finished graceful shutdown of docker $EB_CONFIG_DOCKER_CURRENT_APP"
          exit 0
        else
          let TRIES-=1
          sleep 3
        fi
      done
      echo "failed to graceful shutdown, please investigate manually"
      exit 1
gracefulshutdown_01.config is a small util that backs up the original 01flip.sh and deletes our custom script (if it exists).
gracefulshutdown_02.config is where the magic happens.
It creates a 05gracefulshutdown.sh enact script and makes sure the flip will happen afterwards by renaming 01flip.sh to 10flip.sh.
05gracefulshutdown.sh, the custom script, basically does this:
find the currently running docker container
find all processes that need to be sent a SIGINT (for us, processes with 'workers' in their name)
send a SIGINT to the above processes
loop:
check if the processes from before were killed
continue looping for a fixed number of tries
if the tries run out, exit with status 1 and don't continue to 10flip; manual intervention is needed.
This assumes you only have one container running on the machine, and that you are able to hop on manually to check what's wrong in case it fails (for us it has never happened yet).
I imagine it can also be improved in many ways, so have fun.
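If you want to double-check the ordering on a deployed instance, listing the enact hooks directory should show the custom script sorted before the renamed flip hook (alongside whatever other hooks EB ships):
ls -1 /opt/elasticbeanstalk/hooks/appdeploy/enact/
# expected to include, in this order:
# 05gracefulshutdown.sh
# 10flip.sh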

Related

MongoDB Docker container: ERROR: Cannot write pid file to /tmp/tmp.aLmNg7ilAm: No space left on device

I started a MongoDB container like so:
docker run -d -p 27017:27017 --net=cdt-net --name cdt-mongo mongo
I saw that my MongoDB container exited:
0e35cf68a29c mongo "docker-entrypoint.s…" Less than a second ago Exited (1) 3 seconds ago cdt-mongo
I checked my Docker logs and I see:
$ docker logs 0e35cf68a29c
about to fork child process, waiting until server is ready for connections.
forked process: 21
2018-01-12T23:42:03.413+0000 I CONTROL [main] ***** SERVER RESTARTED *****
2018-01-12T23:42:03.417+0000 I CONTROL [main] ERROR: Cannot write pid file to /tmp/tmp.aLmNg7ilAm: No space left on device
ERROR: child process failed, exited with error number 1
Does anyone know what this error is about? Not enough space in the container?
I had to delete old Docker images to free up space; here are the commands I used:
# remove all unused / orphaned images
echo -e "Removing unused images..."
docker rmi -f $(docker images --no-trunc | grep "<none>" | awk "{print \$3}") 2>&1 | cat;
echo -e "Done removing unused images"
# clean up stuff -> using these instructions https://lebkowski.name/docker-volumes/
echo -e "Cleaning up old containers..."
docker ps --filter status=dead --filter status=exited -aq | xargs docker rm -v 2>&1 | cat;
echo -e "Cleaning up old volumes..."
docker volume ls -qf dangling=true | xargs docker volume rm 2>&1 | cat;
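If you want to see how much space these cleanups actually reclaim, docker system df reports Docker's disk usage; run it before and after:
# show disk usage for images, containers and volumes
docker system df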
We've experienced this problem recently while using docker-compose with mongo and a bunch of other services. There are two fixes which have worked for us.
Clear down unused stuff
# close down all services
docker-compose down
# clear unused docker images
docker system prune
# press y
Increase the disk image size available to Docker - this will depend on your installation of Docker. On Mac, for example, it defaults to 64GB and we doubled it to 128GB via the UI.
We've had this problem in both Windows and Mac and the above fixed it.

How can I have a custom restart script for runit?

I'm using runit to manage an HAProxy and want to do a safe restart to reload a configuration file (specifically: haproxy -f /etc/haproxy/haproxy.cfg -sf $OLD_PROCESS_ID). I figure that I could run sv restart haproxy and tried to add a custom script named /etc/service/haproxy/restart, but it never seems to execute. How do I have a special restart script? Is my approach even good here? How do I reload my config with minimal impact using runit?
HAProxy runit service script
/etc/service/haproxy/run
#!/bin/sh
#
# runit haproxy
#
# forward stderr to stdout for use with runit svlogd
exec 2>&1
PID_PATH=/var/run/haproxy/haproxy.pid
BIN_PATH=/opt/haproxy/sbin/haproxy
CFG_PATH=/opt/haproxy/etc/haproxy.cfg
exec /bin/bash <<EOF
$BIN_PATH -f $CFG_PATH -D -p $PID_PATH
trap "echo SIGHUP caught; $BIN_PATH -f $CFG_PATH -D -p $PID_PATH -sf \\\$(cat $PID_PATH)" SIGHUP
trap "echo SIGTERM caught; kill -TERM \\\$(cat $PID_PATH) && exit 0" SIGTERM SIGINT
while true; do # Iterate to keep job running.
  sleep 1      # Wake up to handle signals
done
EOF
Graceful reload that keeps things up and running:
sv reload haproxy
Full stop and start:
sv restart haproxy
This solution was inspired by https://gist.github.com/gfrey/8472007
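One way to confirm the reload actually handed over to a new process is to compare the PID the run script writes before and after (PID_PATH as defined above):
cat /var/run/haproxy/haproxy.pid   # note the current master PID
sv reload haproxy                  # sends SIGHUP, triggering the -sf handover in the trap above
cat /var/run/haproxy/haproxy.pid   # should now contain a different PID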

How do you stop a perl Dancer/Starman/Plack server?

I started a Dancer/Starman server using:
sudo plackup -s Starman -p 5001 -E deployment --workers=10 -a mywebapp/bin/app.pl
but I'm unsure how I can stop the server. Can someone provide me with a quick way of stopping it and all the workers it has spawned?
Use the
--pid /path/to/the/pid.file
option and you can kill the process based on its PID.
So, using the above option, you can use
kill $(cat /path/to/the/pid.file)
The pid.file simply stores the master's PID - no need to analyze the ps output...
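Putting it together with the original command, a sketch might look like this (the pid file path is arbitrary, and --daemonize is optional; both are passed through to Starman):
# start, recording the master PID
sudo plackup -s Starman -p 5001 -E deployment --workers=10 --pid /var/run/mywebapp.pid --daemonize -a mywebapp/bin/app.pl
# later, stop the master and the workers it spawned
sudo kill $(cat /var/run/mywebapp.pid)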
pkill -f starman
Kill processes based on name.
On Windows you can press Ctrl+C - the same keystroke as copy, but here it cancels the running process. Tested working.

Why is sleep needed after fabric call to pg_ctl restart

I'm using Fabric to initialize a postgres server. I have to add a "sleep 1" at the end of the command or the postgres server processes die without explanation or an entry in the log:
sudo('%(pgbin)s/pg_ctl -D %(pgdata)s -l /tmp/pg.log restart && sleep 1' % env, user='postgres')
That is, I see this output on the terminal:
[dbserv] Executing task 'setup_postgres'
[dbserv] run: /bin/bash -l -c "sudo -u postgres /usr/lib/postgresql/9.1/bin/pg_ctl -D /data/pg -l /tmp/pg.log restart && sleep 1"
[dbserv] out: waiting for server to shut down.... done
[dbserv] out: server stopped
[dbserv] out: server starting
Without the && sleep 1, there's nothing in /tmp/pg.log (though the file is created), and no postgres processes are running. With the sleep, everything works fine.
(And if I execute the same command directly on target machine's command line, it works fine without the sleep.)
Since it's working, it doesn't really matter, but I'm asking anyway: Does someone know what the sleep is allowing to happen and why?
You might also try setting the pty option to False and see if it's related to how Fabric handles pseudo-ttys: if the remote pty is torn down as soon as the command returns, a daemon that hasn't fully detached yet may be killed with it, which would explain why a short sleep helps.

Boto - how do I wait for a background process (e.g. mdadm) to finish before running a new command?

I'm scripting up my Amazon deployment, and I haven't managed to automate one step in it.
The step is between setting up RAID (via mdadm) and installing my db (mongo) on the newly mounted directory. This is because I have to wait for mdadm to finish in the background before installing mongo. I know when mdadm is finished by running the following command:
sudo mdadm --detail /dev/md0
When mdadm is still in progress this command will produce a progress indicator e.g.:
Rebuild Status : 2% complete
When mdadm is finished this status will be gone.
Does anyone have a clean solution for being able to tell when mdadm is finished, so that the script can run entirely on its own, and then continue on to install mongo once mdadm is done?
At the moment I'm contemplating placing a script of sorts on the box using boto, running the script from boto, and having the script exit once it parses and reads that mdadm is finished...
Thanks a lot for your help!
I am using:
mdadm --wait /dev/md0
Note that the above command will return a non-zero exit status if it did not have to wait...you might need to take that into account in the script.
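So if your script runs with set -e (or otherwise checks exit codes), one way to tolerate that is something like:
# mdadm --wait exits non-zero when there was nothing to wait for, so don't treat that as a failure
sudo mdadm --wait /dev/md0 || true
echo "RAID resync finished (or had already finished)"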
Ok... so as I said I'll post up my solution (I'm completely new to writing bash scripts, so if you have any advice for improvement I'm all ears!!!)
#!/bin/bash

false=1
true=0

function drives_are_ready {
  RAID_INFO=`sudo mdadm --detail /dev/md0`
  rebuild_status_line_count=`echo "$RAID_INFO" | grep "Rebuild Status" | wc -l`
  echo `echo "$RAID_INFO" | grep "Rebuild Status"`
  if (( rebuild_status_line_count == 0 )); then
    return $true
  else
    return $false
  fi
}

drives_are_ready
raid_is_finished=$?
while (( raid_is_finished == $false )); do
  echo "RAID isn't finished yet... will check again in 10s"
  sleep 10s
  drives_are_ready
  raid_is_finished=$?
done
echo "RAID is done."
I scp the file to my instance, and then chmod and run the script via boto.
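Done by hand, those steps look roughly like this (key, user, host and script names are all hypothetical):
scp -i mykey.pem wait_for_raid.sh ec2-user@my-instance:/tmp/
ssh -i mykey.pem ec2-user@my-instance 'chmod +x /tmp/wait_for_raid.sh && /tmp/wait_for_raid.sh'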
You don't necessarily need to wait for the superblock resynchronization before using the disks, but in my experience (and I'm sure in yours as well) it is a very good idea with EC2 instances.
You could simply check for it in a bash while loop:
#!/bin/bash
... stuff in your script that doesn't require raid ...
# Pause until mdadm --detail returns nothing
while [[ `sudo mdadm --detail /dev/md0 | grep 'Rebuild Status'` != '' ]]; do
  sleep 30
done
echo "Raid superblock resynchronization complete"
... stuff in your script that requires raid synchronization to be complete...