how do I gracefully kill stale server process postgres - postgresql

Occasionally in our lab, our postgres 8.3 database will get orphaned from the pid file, and we get this message when trying to shut down the database:
Error: pid file is invalid, please manually kill the stale server process postgres
When this happens, we immediately do a pg_dump so we can restore the database later. But if we just kill -9 the orphaned postgres process and then start the server, the database comes up with only the data from the last successful shutdown. If you connect with psql before killing it, all the data is still available, which is why the pg_dump works.
Is there a way to gracefully shutdown the orphaned postgres process so we don't have to go through the pg_dump and restore? Or is there a way to have the database recover after killing the orphaned process?

According to the documentation you could either send SIGTERM or SIGQUIT. SIGTERM is preferred. Either way never use SIGKILL (as you know from personal experience).
Edit: on the other hand what you experience is not normal and could indicate a mis-configuration or a bug. Please, ask for assistance on the pgsql-admin mailing list.

Never use kill -9.
And I would strongly advise you to try to figure out exactly how this happens. Where exactly does the error message come from? It's not a PostgreSQL error message. Are you by any chance mixing different ways to start/stop the server (initscripts sometimes and pg_ctl sometimes, for example)? That could probably cause things to go out of sync.
But to answer the direct question - use a regular kill (no -9) on the process to shut it down. Make sure you kill all the postgres processes if there is more than one running.
The database always performs automatic recovery when it is started after an unclean shutdown. This should happen after kill -9 as well: any data that was committed should still be there. This almost sounds like you have two different data directories mounted on top of each other or something like that; this has been a known issue with NFS at least before.
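The "regular kill, never -9" advice can be sketched as a small helper. This is only an illustration: stop_gracefully is a hypothetical name, the 60 second default timeout is an assumption, and the PID has to be found by hand (for example with ps -o pid,ppid,cmd -C postgres) because the pid file is invalid in this scenario.

```shell
#!/bin/bash
# Sketch: send a catchable signal and wait for the process to exit.
# It never escalates to SIGKILL; on timeout it just reports and gives up.
# Usage: stop_gracefully PID [TIMEOUT_SECONDS] [SIGNAL]
stop_gracefully() {
    local pid=$1 timeout=${2:-60} sig=${3:-TERM} waited=0

    # kill -0 sends no signal; it only tests whether the process exists.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid"
        return 0
    fi

    kill -s "$sig" "$pid"

    # Poll once per second until the process is gone or the timeout expires.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "still running after ${timeout}s: $pid"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "stopped: $pid"
}
```

Called with the PID of the main postgres process, SIGTERM requests a smart shutdown, which waits for open sessions to end. If it times out, passing INT as the third argument requests a fast shutdown instead, which is still a clean one.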

I use a script like the following run by cron every minute.
#!/bin/bash
DB="YOUR_DB"
# Here's a snippet to watch how long each connection to the db has been open:
# watch -n 1 'ps -o pid,cmd,etime -C postgres | grep $DB'
# This program kills any postgres workers/connections to the specified database
# which have been running for 2 or 3 minutes. It actually kills workers which
# have an elapsed time including "02:" or "03:". That'll be anything running
# for at least 2 minutes and less than 4. It'll also cover anything that
# managed to stay around until an hour and 2 or 3 minutes, etc.
#
# Run this once a minute via cron and it should catch any connection open
# between 2 and 3 minutes. You can temporarily disable it if you need to run
# a long connection once in a while.
#
# The check for "03:" is in case there's a little lag starting the cron job and
# the timing is really bad and it never sees a worker in the 1 minute window
# when it's got "02:".
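# egrep is deprecated in current GNU grep; grep -E is the equivalent.
old=$(ps -o pid,cmd,etime -C postgres | grep "$DB" | grep -E '0[23]:')
if [ -n "$old" ]; then
    echo "Killing:"
    echo "$old"
    # First column is the PID; plain kill sends SIGTERM.
    echo "$old" | awk '{print $1}' | xargs kill
fi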
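Matching the strings "02:" and "03:" is approximate, as the script's own comments admit. One possible alternative, sketched here under assumptions, is to convert the etime column to seconds and compare against a threshold. The names etime_to_seconds and list_old_backends are hypothetical, and the sketch assumes the procps ps, which prints etime as [[dd-]hh:]mm:ss.

```shell
#!/bin/bash
# Convert a ps etime value ([[dd-]hh:]mm:ss) to a number of seconds.
# awk treats "08" as decimal 8, so leading zeros are not a problem.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        n = NF
        s = $n + 60 * $(n - 1)            # seconds and minutes
        if (n >= 3) s += 3600 * $(n - 2)  # hours, if present
        if (n >= 4) s += 86400 * $(n - 3) # days, if present
        print s
    }'
}

# Print the PIDs of postgres processes whose command line mentions DB
# and whose elapsed time is at least MIN_SECONDS.
# Usage: list_old_backends DB MIN_SECONDS
list_old_backends() {
    local db=$1 min_seconds=$2 pid etime cmd
    # "-o name=" prints the column without a header line.
    ps -o pid= -o etime= -o cmd= -C postgres | grep "$db" |
    while read -r pid etime cmd; do
        if [ "$(etime_to_seconds "$etime")" -ge "$min_seconds" ]; then
            echo "$pid"
        fi
    done
}
```

With these, something like list_old_backends "$DB" 120 | xargs kill would replace the grep pipeline. Like the original, it signals anything whose command line contains the database name, so the same caution about long-running connections applies.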

Related

Postgresql needs to run command every time for startup

I'm on RHEL6 and have installed PostgreSQL. Now whenever I want to start development I need to run the following command to start PostgreSQL
/opt/PostgreSQL/9.5/bin/postgres -D /opt/PostgreSQL/9.5/data
Then it blocks that terminal, and I need to open another terminal to start a PostgreSQL session. What is wrong with the installation, and how can I fix it?
https://www.postgresql.org/docs/current/static/app-postgres.html
The utility command pg_ctl can be used to start and shut down the
postgres server safely and comfortably.
If at all possible, do not use SIGKILL to kill the main postgres
server. Doing so will prevent postgres from freeing the system
resources (e.g., shared memory and semaphores) that it holds before
terminating. This might cause problems for starting a fresh postgres
run.
Use pg_ctl -D /opt/PostgreSQL/9.5/data start instead; otherwise, one day your database will tell you about corrupted data.
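As a rough sketch of that suggestion, the wrappers below start, check and stop the server through pg_ctl so the terminal is not blocked. The paths come from the question; the log file name server.log, the function names and the PG_CTL override variable are assumptions made for illustration.

```shell
#!/bin/bash
# Paths as given in the question; override through the environment if needed.
PGBIN=${PGBIN:-/opt/PostgreSQL/9.5/bin}
PGDATA=${PGDATA:-/opt/PostgreSQL/9.5/data}
PG_CTL=${PG_CTL:-$PGBIN/pg_ctl}

# Start in the background, write server output to a log file (-l),
# and wait (-w) until startup has completed before returning.
pg_start() {
    "$PG_CTL" -D "$PGDATA" -l "$PGDATA/server.log" -w start
}

# Report whether a server is running in this data directory.
pg_status() {
    "$PG_CTL" -D "$PGDATA" status
}

# Fast shutdown: disconnect clients and shut down cleanly.
pg_stop() {
    "$PG_CTL" -D "$PGDATA" -m fast -w stop
}
```

These need to run as the operating system user that owns the data directory, which is usually postgres.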

Is (sudo) service postgresql restart a clean shutdown

I know database indexes can become corrupted if the server crashes. If I do:
sudo service postgresql restart
can that cause the same kind of corruption as a server crash?
That depends on the system, I believe. You should look into the script to check the actual command issued. E.g. here we see that restart is equal to stop followed by start. Checking stop, we see that it runs killproc postmaster and removes the pid file. According to the killproc man page, it sends SIGTERM unless another signal is specified. From the documentation:
SIGTERM
This is the Smart Shutdown mode. After receiving SIGTERM, the
server disallows new connections, but lets existing sessions end their
work normally. It shuts down only after all of the sessions terminate.
If the server is in online backup mode, it additionally waits until
online backup mode is no longer active. While backup mode is active,
new connections will still be allowed, but only to superusers (this
exception allows a superuser to connect to terminate online backup
mode). If the server is in recovery when a smart shutdown is
requested, recovery and streaming replication will be stopped only
after all regular sessions have terminated.
So in the presented case, indexes should survive. But you should definitely check your /etc/init.d/ script to be sure.

Postgres Restore taking ages (days)

I've been working on a backup / restore for a Postgres server for quite a while now. It's an Azure Windows Virtual Machine (Windows server 2012).
The database isn't that big (about 5 GB), but the restore takes (literally) days. I've tried several times with different settings to restore the database, but every time it took days to "finish" (it didn't finish - I killed the process because I didn't see anything happening, which is why I'm running the job verbose this time).
I've now been running the job (verbose one) for 5 days straight and still it isn't finished. It's inserting rows (or at least displaying the rows), but it's still running.
Currently I'm using this command:
pg_restore -Fc -v --jobs=2 --host=localhost [filename]
Jobs is set at 2 because it's a dual core server. Like I said: different settings still very very slow.
What is wrong - should I be "tuning" the database before the restore or what?
This is a test-server setup. When we're done with the test, the current data needs to be restored (again) to the new production server: we can't afford to wait days on end before the production environment comes online.
It's not pushing errors into the logs or something - it just keeps running and running and running...
So what am I doing wrong?

Why is pgsql sometimes not listening for the first few seconds after start even though "service postgres status" returns OK?

I have a web app that uses postgresql 9.0 with some plperl functions that call custom libraries of mine. So, when I want to start fresh as if just released, my build process for my development area does basically this:
dumps data and roles from production
drops dev data and roles
restores production data and roles onto dev
restarts postgresql so that any cached versions of my custom libraries are flushed and newly-changed ones will be picked up
applies my dev delta
vacuums
Since switching my app's stack from win32 to CentOS, I now sometimes (it seems, if and only if I haven't run this build process in "a while", perhaps at least a day) get an error when my build script tries to apply the delta:
psql: could not connect to server: No such file or directory
Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
Specifically, what's failing to execute at the shell level is this:
psql --host=$host -U $superuser -p $port -d $db -f "$delta_filename.sql"
If, immediately after seeing this error, I try to connect to the dev database with psql, I can do so with no trouble. Also, if I just re-run the build script, it works fine the second time, every time I've encountered this. Acceptable workaround, but is the underlying cause something to be concerned about?
So far in my attempts to debug this, I inserted a step just after the server restart (which of course reports OK shutdown, OK startup) whereby I check the results of service postgresql-dev status in a loop, waiting 2 seconds between tries if it fails. On my latest build script run, said loop succeeds on the first try--status returns "is running"--but then applying the delta still fails with the above connection error. Again, second try succeeds, as does connecting via psql outside the script just after it fails.
My next debug attempt was to sleep for 5 seconds before the first status check and see what happens. So far this seems to solve the problem.
So why is pgsql not listening on the socket after it starts [OK] and also has status running ok, for up to 5 seconds, unless it has "recently" been restarted?
The status check only checks whether the process is running. It doesn't check whether you can connect. There can be any amount of time between starting the process and the process being ready to accept connections. It's usually a few seconds, but it could be longer. If you need to cope with this, you need to script it so that it checks whether it is possible to connect before proceeding. You could argue that the CentOS package should do this for you, but it doesn't.
Actually, I think in your case there is no reason to do a full restart. Unless you are loading libraries with shared_preload_libraries, it is sufficient to restart the connection to pick up new libraries.
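The "check whether it is possible to connect before proceeding" step could look roughly like the sketch below. wait_for_postgres is a hypothetical helper name; by default it probes with a trivial psql query, which picks up connection settings from the standard PGHOST, PGPORT, PGUSER and PGDATABASE environment variables. On PostgreSQL 9.3 and later, pg_isready could serve as the probe instead, but it is not available on the 9.0 server from the question.

```shell
#!/bin/bash
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_for_postgres TRIES DELAY_SECONDS [PROBE COMMAND...]
wait_for_postgres() {
    local tries=$1 delay=$2 i=0
    shift 2

    # Default probe: a trivial query through psql.
    [ $# -gt 0 ] || set -- psql -At -c 'SELECT 1'

    while [ "$i" -lt "$tries" ]; do
        # Success means the server accepted a connection and ran the probe.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

In the build script from the question, this would go between the restart and the delta step, for example: wait_for_postgres 30 1 psql --host=$host -U $superuser -p $port -d $db -c 'SELECT 1' || exit 1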

how to terminate postgresql 8.3 sessions?

I am trying to terminate a session (a specific session or all sessions, doesn't matter) in postgresql 8.3 and am having trouble doing that. I know in newer versions (8.4 and later) there is a pg_terminate_backend command that will do the trick but this is not available in postgresql 8.3. If I use pg_stat_activity, I can see all the sessions that are active but have no way of terminating them.
The solution does not have to necessarily be sql commands but I would like it to be independent of the OS that is being used (i.e. no DOS/UNIX commands).
Stopping and starting the postgres service in windows services works perfectly but this is an OS specific approach. Using 'pg_ctl restart -D DATA_DIR' does not stop the service however. Actually using pg_ctl to try and restart the service at the time I am trying to do it causes some weird behavior. If there is a way I can somehow use pg_ctl to force shutdown the process like I assume windows does, then I can probably use that.
Anyways, I am looking for a way to terminate one or all sessions in postgresql 8.3 that is not platform specific. Any help would be great!
You can use pg_cancel_backend():
select pg_cancel_backend(55555);
You can use this with pg_stat_activity. For example:
select pg_cancel_backend(procpid)
from pg_stat_activity where current_query='<IDLE>';
If that doesn't work you can try this:
pg_ctl kill TERM pid
That should be OS independent. I'm not sure if there's any real difference in behaviour.
Other than that you could try stopping and starting the server, but you indicated odd behaviour from that. (What kind?)
Finally, for an OS specific option, on linux you can of course try using the kill command. kill -15 (SIGTERM) is safe; that's basically what pg_terminate_backend uses: kill -15 <pid>. kill -9 is moderately unsafe and you should use it only as a last resort.
su - postgres
psql
SELECT pg_terminate_backend(pg_stat_activity.procpid) FROM pg_stat_activity WHERE procpid <> pg_backend_pid() AND datname = 'dbname' ;
drop database "database name";