Diagnosing application connection leaks in PostgreSQL

I am investigating a connection leak in PostgreSQL and would like to gather some diagnostics.
In particular, I would like to associate some information with each connection I make and then be able to query it directly.
If I were designing this kind of thing in MS-SQL, I would change my connection factory to execute an upsert against a diagnostics table after each connection is made, recording the @@SPID, a backtrace, and other diagnostic information.
Then, to diagnose what is happening, I could query sysprocesses, joining to my diagnostics table on spid. That would give me a clean application backtrace associated with each connection.
How can I achieve something similar on PostgreSQL?

PostgreSQL forks a new process to handle each connection. You can get the PID of this process easily:
SELECT pg_backend_pid();
This corresponds to the process ID visible in standard administration tools (top, ps, etc.). You can terminate connections using standard tools too (kill) or, with the appropriate permissions, by saying SELECT pg_terminate_backend(pid).
A list of current sessions is also accessible in the database:
SELECT * FROM pg_stat_activity;
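Since you're hunting a leak, it can also help to aggregate the sessions by source to see where connections are piling up; a minimal sketch:
SELECT usename, application_name, client_addr, count(*) AS connections
FROM pg_stat_activity
GROUP BY usename, application_name, client_addr
ORDER BY connections DESC;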
One last note: PIDs are guaranteed to be unique as of the time the query is run, but will be re-used by the operating system eventually. You can ensure uniqueness by pairing the PID with the backend_start column of pg_stat_activity. For that matter, you might as well lump that in with your logging:
INSERT INTO log_table (pid, backend_start, message, backtrace)
SELECT pid, backend_start, 'my message', 'my backtrace'
FROM pg_stat_activity
WHERE pid = pg_backend_pid();
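Putting it all together, here is a rough sketch of the MS-SQL-style pattern from the question (the connection_diag table and its columns are illustrative, not a standard schema):
-- run once
CREATE TABLE connection_diag (
    pid integer,
    backend_start timestamptz,
    backtrace text,
    PRIMARY KEY (pid, backend_start)
);

-- run from the connection factory right after each connect
INSERT INTO connection_diag (pid, backend_start, backtrace)
SELECT pid, backend_start, 'application backtrace here'
FROM pg_stat_activity
WHERE pid = pg_backend_pid();

-- to diagnose: live sessions joined to their application backtraces
SELECT a.pid, a.state, a.query, d.backtrace
FROM pg_stat_activity a
JOIN connection_diag d USING (pid, backend_start);
Rows left behind by closed connections won't match any live (pid, backend_start) pair, so they can be pruned periodically without being confused with a reused PID.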

How to cancel a PostgreSQL query?

I am not very familiar with databases or SQL and wanted to make sure that I don't mess anything up. I did:
SELECT pid, state, usename, query FROM pg_stat_activity;
to check whether I had any running queries, and there were several whose state was active. Do I just cancel them by doing:
select pg_cancel_backend(PID);
And this won't affect anything except my queries, correct? I also wanted to figure out why those queries were still in the active state. I have a Python script that reads in my SQL file, but I stopped the script partway through. Is that possibly why this happened and why the states are still active?
Yes, this is what pg_cancel_backend(pid) is for. Why exactly a query is still running depends on a few things: it could be waiting to grab a lock, or the query could just take a long time. But given that the Python processes that started the queries have exited, the connections are technically already closed; the PG backend processes just haven't noticed yet. A backend won't notice until the query completes and it tries to return the query status to the client, at which point it will roll back the transaction when it sees the connection is no longer present.
The only effect calling pg_cancel_backend on the PIDs of those backends should have is to make PG notice that the connection is closed immediately, rather than whenever the query completes.
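If you would rather cancel everything in one go, an untested sketch like the following cancels all of your own user's active queries while skipping your current session:
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND usename = current_user
  AND pid <> pg_backend_pid();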

Did this idle query cause my create unique index command to lock up?

I had an open connection from Matlab to my Postgres server, and the last query was insert into table_name ..., which had state idle when I looked at the processes on the database server using:
SELECT datname,pid,state,query FROM pg_stat_activity;
I tried to create a unique index on table_name and it was taking a very long time, with no discernible CPU usage for the pgAdmin process. When I closed the connection from Matlab, the query dropped out of pg_stat_activity, and the unique index was built immediately.
Was the idle query the reason why it took so long to build the index?
No, a session in state “idle” does not hold any locks and cannot block anything. It's “idle in transaction” sessions that usually are the problem, because they tend to hold locks over a long time. Such sessions are almost always caused by an application bug.
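A minimal sketch for spotting them, listing idle-in-transaction sessions and how long their transactions have been open:
SELECT pid, usename, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;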
To investigate problems like that, look at the view pg_locks. A hanging CREATE INDEX statement will usually be stuck acquiring the SHARE lock it needs on the table to be indexed. You should see that as a value of false in the granted column of pg_locks. Then figure out which backend (pid) holds a conflicting lock on the table in question, and you have the culprit(s).
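For example, assuming the table is named table_name, a sketch like this lists every lock held or awaited on it; the ungranted CREATE INDEX lock and the sessions holding conflicting granted locks should stand out:
SELECT l.pid, l.mode, l.granted, a.state, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'table_name'::regclass;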

Dropping index concurrently PostgreSQL 9.1

DROP INDEX CONCURRENTLY first appeared in PostgreSQL 9.2, but my server runs 9.1. Unfortunately, a plain DROP INDEX locks out my app for an unpredictable amount of time, which is very painful in production.
Is there a way to drop an index concurrently?
No, there's no simple workaround - otherwise DROP INDEX CONCURRENTLY would have been rather less likely to be added in 9.2.
However, you can kill all sessions to force the drop to occur promptly.
What you want to avoid is the drop waiting on a partially acquired exclusive lock: that blocks other transactions from proceeding, but doesn't let the drop proceed either while it waits for the remaining transactions to finish and release their share locks. The best way to avoid that is to kill all concurrent sessions first.
So, in one session:
DROP INDEX my_index;
In another session, as a superuser, terminate all other sessions using the following untested query, which you'll need to adapt appropriately and test before use (note that on 9.1 the relevant pg_stat_activity columns are procpid and current_query):
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity
WHERE procpid <> (
    SELECT procpid
    FROM pg_stat_activity
    WHERE current_query = 'DROP INDEX my_index;')
AND procpid <> pg_backend_pid();
Your well-written and well-tested application will immediately reconnect and retry its queries without bothering the user or getting upset, because it knows that transient errors are something it has to cope with, so it runs all its database access in retry loops. If it isn't well written, you'll discover that via a flood of user-visible error messages. If it's really not well written, you'll have to restart it before it gets its head together, but it's rare to see apps that are quite that broken.
This is a heavy-handed approach. You can be rather softer about it by joining against pg_locks and terminating only sessions that actually hold a lock on the relation you're interested in or the index you wish to modify. You get to enjoy writing that query, because my interest in working around limitations of older database versions is limited.
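That said, here is a very rough, untested starting point (my_table stands in for the table the index belongs to, and 12345 is a placeholder for the pid of the session running the DROP, which you'd look up first):
SELECT pg_terminate_backend(pid)
FROM (SELECT DISTINCT pid
      FROM pg_locks
      WHERE relation = 'my_table'::regclass
        AND granted) AS holders   -- only sessions actually holding a lock
WHERE pid <> pg_backend_pid()
  AND pid <> 12345;               -- exclude the session running the DROP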

How to drop a Redshift database with connected users

Is it possible to drop active connections to Redshift in order to drop a database?
In my development environment I find myself recreating the schema very frequently, and if some stray process happens to be connected to the database, this fails. I know it's possible to do this in PostgreSQL using pg_terminate_backend, but that doesn't seem to work on Redshift.
Deleting rows from the STV_SESSIONS table isn't an option, either.
Any ideas?
See PG_TERMINATE_BACKEND in the Redshift documentation: http://docs.aws.amazon.com/redshift/latest/dg/PG_TERMINATE_BACKEND.html
Find the PIDs of currently running queries:
select pid from stv_recents where status = 'Running';
and terminate each one by substituting its PID:
select pg_terminate_backend(pid);

All queries in a given transaction

I am currently trying to debug "idle in transaction" scenarios in my application. I can find out the process ID and transaction start time for a session in state 'idle in transaction' by looking at pg_stat_activity:
select pid, query from pg_stat_activity where state = 'idle in transaction' or state = 'idle';
Is there any way to identify the list of all queries executed within the transaction corresponding to a session that is 'idle in transaction'?
Are you attempting to get a list of all previous statements run by the transaction that is now showing up as idle in transaction?
If so, there is no easy and safe way to do so at the SQL level. You should use CSV logging mode to analyze your query history and group queries into transactions. Handily, you can do this with SQL, by COPYing the CSV into a PostgreSQL table for easier analysis.
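A sketch of that route, assuming csvlog output with log_statement = 'all' and a postgres_log table created from the DDL given in the PostgreSQL documentation on error reporting and logging (the file path here is illustrative):
COPY postgres_log FROM '/var/log/postgresql/postgresql.csv' WITH csv;

SELECT session_id, transaction_id, session_line_num, message
FROM postgres_log
WHERE transaction_id <> 0  -- 0 means no transaction ID was assigned
ORDER BY session_id, session_line_num;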
Alternately, use ordinary text logs, and set a log_line_prefix that includes the transaction ID and process ID.
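For example, something like this in postgresql.conf (a sketch; %m is the timestamp, %p the process ID, and %x the transaction ID):
log_line_prefix = '%m [%p] xid=%x '
log_statement = 'all'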
(I could've sworn I saw an extension for debugging that collected a query trace, but cannot find it now, and I'm not sure it's that useful as you must run a command on the problem connection to extract the data it's collected).