Does PgBouncer respond to the kill command? - postgresql

I have configured PgBouncer as a sidecar in one of my pods on Azure Kubernetes Service, based on the Azure OSS DB Tools PgBouncer sidecar documentation. It has the following container lifecycle hook:
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "killall -INT pgbouncer && sleep 120"]
I believe the intended purpose of this hook is to tell PgBouncer to shut down gracefully and then wait 120 seconds so that any running queries have time to finish before the pod is terminated.
To understand what it actually does, I opened two interactive shells inside the PgBouncer container. In the first shell, I executed the killall command, and in the second shell, I ran the ps command several times.
First shell:
/ $ ps
PID USER TIME COMMAND
1 postgres 0:00 /usr/bin/pgbouncer /etc/pgbouncer/pgbouncer.ini
6 postgres 0:00 /bin/sh
30 postgres 0:00 ps
/ $
/ $
/ $ killall -INT pgbouncer && sleep 120
Second shell:
/ $ ps
PID USER TIME COMMAND
1 postgres 0:00 /usr/bin/pgbouncer /etc/pgbouncer/pgbouncer.ini
6 postgres 0:00 /bin/sh
33 postgres 0:00 /bin/sh
40 postgres 0:00 sleep 120
41 postgres 0:00 ps
/ $
/ $
/ $ ps
PID USER TIME COMMAND
1 postgres 0:00 /usr/bin/pgbouncer /etc/pgbouncer/pgbouncer.ini
6 postgres 0:00 /bin/sh
33 postgres 0:00 /bin/sh
42 postgres 0:00 ps
After 120 seconds, the PgBouncer main process is still running (see the output from the second shell). I expected both of my terminal sessions to be terminated, since the command was supposed to kill the PgBouncer process (PID 1) and stop the container.
If I instead kill it with the command below:
/ $ kill 1
/ $ command terminated with exit code 137
I see that both of my terminal sessions are terminated immediately and the container is stopped.
I want to understand whether we really need this lifecycle hook, given that it does not seem to work properly. Or did I make a mistake in understanding what it does?
Thanks for the help! 🙏

There is a difference here: killall -INT sends SIGINT, while kill sends SIGTERM when no signal is specified. You can try again with kill -INT 1 and see whether you get the same behavior as with killall. PgBouncer catches SIGINT and handles it differently from SIGTERM.
Here is the relevant part of the PgBouncer source:
int cf_shutdown; /* 1 - wait for queries to finish, 2 - shutdown immediately */
...

static void handle_sigterm(evutil_socket_t sock, short flags, void *arg)
{
    log_info("got SIGTERM, fast exit");
    /* pidfile cleanup happens via atexit() */
    exit(1);
}

static void handle_sigint(evutil_socket_t sock, short flags, void *arg)
{
    log_info("got SIGINT, shutting down");
    sd_notify(0, "STOPPING=1");
    if (cf_reboot)
        die("takeover was in progress, going down immediately");
    if (cf_pause_mode == P_SUSPEND)
        die("suspend was in progress, going down immediately");
    cf_pause_mode = P_PAUSE;
    cf_shutdown = 1;
}
On SIGINT, PgBouncer pauses: it stops handing out new queries and waits for the ones in flight to finish before it exits. On SIGTERM it exits immediately.
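If you want to see the difference for yourself, here is a rough sketch of an experiment to run inside the container (same tools as in your session above; the log lines come from the handlers shown in the source):

# SIGINT: PgBouncer logs "got SIGINT, shutting down", pauses, and keeps
# running until outstanding queries are finished, so PID 1 stays up.
killall -INT pgbouncer
ps

# SIGTERM: PgBouncer logs "got SIGTERM, fast exit" and exits right away,
# which stops the container because PID 1 is gone.
kill -TERM 1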

Related

Hiding Docker Exec output of a Postgres query

I'm using a PowerShell script to start a Docker container running Postgres:
docker run -p ${host_port}:${remote_port} --name $container_name -d $database_name

# 0b. Wait for the container and the postgres database to be ready
Do
{
    echo "Waiting for database system to start up..."
    $timeout++
    sleep 1
} until ((docker exec $container_name psql --username=$database_user_name --dbname=$database_name --command="SELECT 1;") -Or ($timeout -eq $timeout_limit))

if ($timeout -eq $timeout_limit)
{
    Throw "Database system failed to start up."
    exit
}
else {
    # Do stuff
}
An issue I ran into was that the database was not always ready by the time I imported my schema. I added the Do-Until loop to continue "pinging" the database until it gets a response, or until 10 seconds have passed.
This works okay. And this is the console output:
Waiting for database system to start up...
psql: FATAL: the database system is starting up
Waiting for database system to start up...
Is there any way to prevent that second line from appearing?
psql: FATAL: the database system is starting up
I've tried redirecting the "pinging" output to /dev/null, but it fails when executed like so:
docker exec $container_name psql --username=$database_user_name --dbname=$database_name --command="SELECT 1;" > /dev/null
I guess I have to redirect stderr too. How does 2>&1 work, as opposed to just >?
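If I understand the syntax correctly, the combined redirection would look something like the sketch below (Bourne-shell syntax; since my script runs the command from PowerShell, the operators may need to be handled on the PowerShell side instead):

# '>' alone only redirects stdout (fd 1); psql writes the FATAL message to
# stderr (fd 2). '2>&1' points fd 2 at wherever fd 1 currently goes, so both
# streams end up in /dev/null. Order matters: '> /dev/null' must come first.
docker exec $container_name psql --username=$database_user_name --dbname=$database_name --command="SELECT 1;" > /dev/null 2>&1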

howto: elastic beanstalk + deploy docker + graceful shutdown

Hi great people of Stack Overflow,
We're hosting a Docker container on EB with Node.js-based code running in it.
When redeploying our Docker container, we'd like the old one to do a graceful shutdown.
I've found help and guides on how our code can receive the SIGTERM signal produced by the 'docker stop' command.
However, further investigation on the EB machine running Docker, at:
/opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh
shows that when "flipping" from the current container to the newly staged one, the old one is killed with 'docker kill'.
Is there any way to change this behaviour to docker stop?
Or, more generally, is there a recommended approach to handling graceful shutdown of the old container?
Thanks!
Self-answering, as I've found a solution that works for us:
tl;dr: use .ebextensions scripts to run your own script before 01flip; your script makes sure that whatever is inside the Docker container shuts down gracefully.
First, your app (or whatever you're running in Docker) has to be able to catch a signal, SIGINT for example, and shut down gracefully upon receiving it (see the sketch below). This is totally unrelated to Docker; you can test it anywhere (locally, for example). There is a lot of info on the net about getting this kind of behaviour for different kinds of apps (be it Ruby, Node.js, etc.).
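As a rough illustration only (our real handler lives in the application code, this is just a hypothetical shell worker), catching SIGINT and shutting down gracefully means something like:

#!/bin/sh
# Hypothetical worker: on SIGINT, clean up and exit with status 0.
graceful_shutdown() {
    echo "got SIGINT, cleaning up before exit"
    # ... finish the current job, flush queues, close connections ...
    exit 0
}
trap graceful_shutdown INT

while true; do
    # ... do one unit of work ...
    sleep 1
done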
Second, your EB/Docker-based project can have a .ebextensions folder that holds all kinds of scripts to execute while deploying.
We put two custom scripts into it, gracefulshutdown_01.config and gracefulshutdown_02.config, which look something like this:
# gracefulshutdown_01.config
commands:
  backup-original-flip-hook:
    command: cp -f /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh /opt/elasticbeanstalk/hooks/appdeploy/01flip.sh.bak
    test: '[ ! -f /opt/elasticbeanstalk/hooks/appdeploy/01flip.sh.bak ]'
  cleanup-custom-hooks:
    command: rm -f 05gracefulshutdown.sh
    cwd: /opt/elasticbeanstalk/hooks/appdeploy/enact
    ignoreErrors: true
and:
# gracefulshutdown_02.config
commands:
  reorder-original-flip-hook:
    command: mv /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh /opt/elasticbeanstalk/hooks/appdeploy/enact/10flip.sh
    test: '[ -f /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh ]'

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/enact/05gracefulshutdown.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/sh

      # find currently running docker
      EB_CONFIG_DOCKER_CURRENT_APP_FILE=$(/opt/elasticbeanstalk/bin/get-config container -k app_deploy_file)
      EB_CONFIG_DOCKER_CURRENT_APP=""
      if [ -f $EB_CONFIG_DOCKER_CURRENT_APP_FILE ]; then
        EB_CONFIG_DOCKER_CURRENT_APP=`cat $EB_CONFIG_DOCKER_CURRENT_APP_FILE | cut -c 1-12`
        echo "Graceful shutdown on app container: $EB_CONFIG_DOCKER_CURRENT_APP"
      else
        echo "NO CURRENT APP TO GRACEFUL SHUTDOWN FOUND"
        exit 0
      fi

      # give graceful kill command to all running .js files (not stats!!)
      docker exec $EB_CONFIG_DOCKER_CURRENT_APP sh -c "ps x -o pid,command | grep -E 'workers' | grep -v -E 'forever|grep' " | awk '{print $1}' | xargs docker exec $EB_CONFIG_DOCKER_CURRENT_APP kill -s SIGINT
      echo "sent kill signals"

      # wait (max 5 mins) until processes are done and terminate themselves
      TRIES=100
      until [ $TRIES -eq 0 ]; do
        PIDS=`docker exec $EB_CONFIG_DOCKER_CURRENT_APP sh -c "ps x -o pid,command | grep -E 'workers' | grep -v -E 'forever|grep' " | awk '{print $1}' | cat`
        echo TRIES $TRIES PIDS $PIDS
        if [ -z "$PIDS" ]; then
          echo "finished graceful shutdown of docker $EB_CONFIG_DOCKER_CURRENT_APP"
          exit 0
        else
          let TRIES-=1
          sleep 3
        fi
      done

      echo "failed to graceful shutdown, please investigate manually"
      exit 1
gracefulshutdown_01.config is a small utility that backs up the original 01flip and deletes our custom script if it already exists.
gracefulshutdown_02.config is where the magic happens.
It creates the 05gracefulshutdown enact script and makes sure the flip happens afterwards by renaming the original to 10flip.
05gracefulshutdown, the custom script, basically does this:
find the currently running Docker container
find all processes that need to be sent a SIGINT (for us, processes with 'workers' in their name)
send a SIGINT to those processes
loop:
  check whether the processes from before have exited
  keep looping for a set number of tries
if the tries run out, exit with status 1 and don't continue to 10flip; manual intervention is needed.
This assumes you only have one Docker container running on the machine, and that you are able to hop on manually and check what's wrong in case it fails (which has never happened for us yet).
I imagine it can also be improved in many ways, so have fun.

Celery doesn't restart subprocesses

I have an issue with my Celery deployment: when I restart it, the old subprocesses don't stop and continue to process some of the jobs. I use supervisord to run Celery. Here is my config:
$ cat /etc/supervisor/conf.d/celery.conf
[program:celery]
; Full path to use virtualenv, honcho to load .env
command=/home/ubuntu/venv/bin/honcho run celery -A stargeo worker -l info --no-color
directory=/home/ubuntu/app
environment=PATH="/home/ubuntu/venv/bin:%(ENV_PATH)s"
user=ubuntu
numprocs=1
stdout_logfile=/home/ubuntu/logs/celery.log
stderr_logfile=/home/ubuntu/logs/celery.err
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
Here is how the Celery processes look:
$ ps axwu | grep celery
ubuntu 983 0.0 0.1 47692 10064 ? S 11:47 0:00 /home/ubuntu/venv/bin/python /home/ubuntu/venv/bin/honcho run celery -A stargeo worker -l info --no-color
ubuntu 984 0.0 0.0 4440 652 ? S 11:47 0:00 /bin/sh -c celery -A stargeo worker -l info --no-color
ubuntu 985 0.0 0.5 168720 41356 ? S 11:47 0:01 /home/ubuntu/venv/bin/python /home/ubuntu/venv/bin/celery -A stargeo worker -l info --no-color
ubuntu 990 0.0 0.4 167936 36648 ? S 11:47 0:00 /home/ubuntu/venv/bin/python /home/ubuntu/venv/bin/celery -A stargeo worker -l info --no-color
ubuntu 991 0.0 0.4 167936 36648 ? S 11:47 0:00 /home/ubuntu/venv/bin/python /home/ubuntu/venv/bin/celery -A stargeo worker -l info --no-color
When I run sudo supervisorctl restart celery, it only stops the first process (the python ... honcho one) and all the other ones keep running. And if I try to kill them with a plain kill, they survive (kill -9 works).
This appeared to be a bug in honcho. I ended up with a workaround: pointing supervisord at the following wrapper script instead:
#!/bin/bash
# Activate the virtualenv, export the variables from .env (skipping comments),
# and exec celery so it replaces this shell and receives signals directly.
source /home/ubuntu/venv/bin/activate
exec env $(cat .env | grep -v ^# | xargs) \
    celery -A stargeo worker -l info --no-color

Why is sleep needed after fabric call to pg_ctl restart

I'm using Fabric to initialize a postgres server. I have to add a "sleep 1" at the end of the command or the postgres server processes die without explanation or an entry in the log:
sudo('%(pgbin)s/pg_ctl -D %(pgdata)s -l /tmp/pg.log restart && sleep 1' % env, user='postgres')
That is, I see this output on the terminal:
[dbserv] Executing task 'setup_postgres'
[dbserv] run: /bin/bash -l -c "sudo -u postgres /usr/lib/postgresql/9.1/bin/pg_ctl -D /data/pg -l /tmp/pg.log restart && sleep 1"
[dbserv] out: waiting for server to shut down.... done
[dbserv] out: server stopped
[dbserv] out: server starting
Without the && sleep 1, there's nothing in /tmp/pg.log (though the file is created), and no postgres processes are running. With the sleep, everything works fine.
(And if I execute the same command directly on the target machine's command line, it works fine without the sleep.)
Since it's working, it doesn't really matter, but I'm asking anyway: Does someone know what the sleep is allowing to happen and why?
You might also try setting the pty option to False, to see whether it's related to how Fabric handles pseudo-ttys. A plausible explanation is that once the remote command returns, Fabric tears the pseudo-tty down, and the freshly restarted server process, still attached to that terminal, gets hung up on before it has finished detaching; the sleep simply keeps the session open long enough.

memcached restart starts a new memcached and doesn't kill the old one

I'm running my Rails app in production mode and in staging mode on the same server, in different folders. They both use memcache-client, which requires memcached to be running.
I haven't set up a deploy script yet, so I just deploy manually by sshing onto the server, going to the appropriate directory, updating the code, restarting memcached and then restarting unicorn (the processes which actually run the Rails app). I restart memcached like this:
sudo /etc/init.d/memcached restart &
This starts a new memcached, but it doesn't kill the old one: check it out:
ip-<an-ip>:test.millionaire[subjects]$ ps afx | grep memcache
11176 pts/2 S+ 0:00 | \_ grep --color=auto memcache
10939 pts/3 R 8:13 \_ sudo /etc/init.d/memcached restart
7453 ? Sl 0:00 /usr/bin/memcached -m 64 -p 11211 -u nobody -l 127.0.0.1
ip-<an-ip>:test.millionaire[subjects]$ sudo /etc/init.d/memcached restart &
[1] 11187
ip-<an-ip>:test.millionaire[subjects]$ ps afx | grep memcache
11187 pts/2 T 0:00 | \_ sudo /etc/init.d/memcached restart
11199 pts/2 S+ 0:00 | \_ grep --color=auto memcache
10939 pts/3 R 8:36 \_ sudo /etc/init.d/memcached restart
7453 ? Sl 0:00 /usr/bin/memcached -m 64 -p 11211 -u nobody -l 127.0.0.1
[1]+ Stopped sudo /etc/init.d/memcached restart
ip-<an-ip>:test.millionaire[subjects]$ sudo /etc/init.d/memcached restart &
[2] 11208
ip-<an-ip>:test.millionaire[subjects]$ ps afx | grep memcache
11187 pts/2 T 0:00 | \_ sudo /etc/init.d/memcached restart
11208 pts/2 R 0:01 | \_ sudo /etc/init.d/memcached restart
11218 pts/2 S+ 0:00 | \_ grep --color=auto memcache
10939 pts/3 R 8:42 \_ sudo /etc/init.d/memcached restart
7453 ? Sl 0:00 /usr/bin/memcached -m 64 -p 11211 -u nobody -l 127.0.0.1
What might be causing it is that there's another memcached already running (see the bottom line). I'm mystified as to where this came from; my instinct is to kill it, but I thought I'd better check with someone who actually knows more about memcached than I do.
Grateful for any advice - max
EDIT - solution
I figured this out after a bit of detective work with a colleague. In the Rails console I typed CACHE.stats, which prints out a hash of values, including "pid". I could see it was set to the instance of memcached that wasn't started by memcached restart, i.e. this process:
7453 ? Sl 0:00 /usr/bin/memcached -m 64 -p 11211 -u nobody -l 127.0.0.1
The memcached control script (i.e. the one that defines the start, stop and restart commands) is /etc/init.d/memcached.
A line in this says:
# Edit /etc/default/memcached to change this.
ENABLE_MEMCACHED=no
So I looked in /etc/default/memcached, which was also set to ENABLE_MEMCACHED=no.
This was basically preventing memcached from being stopped and started. I changed it to ENABLE_MEMCACHED=yes, and then it would stop and start fine. Now when I stop and start memcached, it's the process above, the in-use memcached, that is stopped and started.
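For anyone hitting the same thing, the fix boils down to something like this (paths as on our Ubuntu box; a sketch, not a copy-paste recipe):

# Let the init script actually manage the daemon, then restart it.
sudo sed -i 's/^ENABLE_MEMCACHED=no/ENABLE_MEMCACHED=yes/' /etc/default/memcached
sudo /etc/init.d/memcached restart

# Check that only one memcached instance is left running.
ps afx | grep '[m]emcache'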
try using:
killall memcached