If calling PQfinish, is there any benefit of calling PQcancel? - postgresql

If I call PQfinish, which "closes the connection to the server", is there any benefit to calling PQcancel beforehand?
That is, if the connection is closed, would the PostgreSQL server cancel any in-progress queries on this connection, just as it would with PQcancel?

I've tested this with a simple SELECT pg_sleep(120) query. Even after PQfinish was called, connecting via psql and running
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc
still showed the query to be running.
So I think there is a benefit to calling PQcancel first: it increases the chance that in-progress queries are cancelled, which reduces resource use on the server.
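The same pattern can be sketched from Python, since psycopg2 is built on libpq. This is only an illustration, assuming `conn` is a psycopg2 connection (whose `cancel()` method requests cancellation of the running operation); the helper name `cancel_then_close` is made up for this example.

```python
def cancel_then_close(conn):
    """Ask the server to cancel the in-flight query, then close the connection.

    `conn` is assumed to be a psycopg2 connection, or any object that
    exposes cancel() and close().
    """
    try:
        # Request cancellation of whatever the connection is currently running.
        conn.cancel()
    except Exception:
        # Cancellation is best effort; the connection is still closed below.
        pass
    finally:
        conn.close()
```

Cancellation is a request, not a guarantee, so the server may still finish the query before it notices.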

Related

Postgresql - Unable to see exact Query or Transaction that blocked my actual Transaction

Note: I'm using DBeaver 21.1.3 as my PostgreSQL development tool.
For my testing, I have created a table:
CREATE TABLE test_sk1(n numeric(2));
Then I disabled auto-commit in DBeaver to check whether I can see the query that blocks my other transaction.
I then executed an insert on the table:
INSERT INTO test_sk1(n) values(10);
This insert is now uncommitted, so its transaction holds a lock on the table.
Then I opened a new SQL window and tried an ALTER command on the table:
ALTER TABLE test_sk1 ADD COLUMN v VARCHAR(2);
Now I see that the ALTER transaction is blocked.
But when I checked the locks, the ALTER transaction was reported as blocked by a "Show search_path;" statement, whereas I expected the "INSERT..." statement to be the blocking query.
I used the query below to fetch the locks:
SELECT p1.pid,
p1.query as blocked_query,
p2.pid as blocked_by_pid,
p2.query AS blocking_query
FROM pg_stat_activity p1, pg_stat_activity p2
WHERE p2.pid IN (SELECT UNNEST(pg_blocking_pids(p1.pid)))
AND cardinality(pg_blocking_pids(p1.pid)) > 0;
Why is this happening on our databases?
Try using psql for such experiments. DBeaver and many other tools will execute many SQL statements that you didn't intend to run.
The query that you see in pg_stat_activity is just the latest query done by that process, not necessarily the one locking any resource.
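Because the query column only holds the last statement, it can help to look at the blocker's state and transaction start time as well. Below is a rough sketch, assuming a DB-API cursor such as psycopg2's and PostgreSQL 9.6 or later for pg_blocking_pids; the names `BLOCKERS_SQL` and `find_blockers` are invented for this example.

```python
# Join each blocked session to the sessions blocking it, and show the
# blocker's state and transaction start rather than trusting its last query.
BLOCKERS_SQL = """
SELECT blocked.pid        AS blocked_pid,
       blocker.pid        AS blocker_pid,
       blocker.state      AS blocker_state,
       blocker.xact_start AS blocker_xact_start,
       blocker.query      AS blocker_last_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocker
  ON blocker.pid = ANY (pg_blocking_pids(blocked.pid))
"""


def find_blockers(cur):
    """Return one dict per (blocked, blocker) pair, keyed by column name."""
    cur.execute(BLOCKERS_SQL)
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

A blocker in state "idle in transaction" with an old xact_start is typically the uncommitted session, whatever its last statement was.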

Postgres statement_timeout does not work as expected

I am working on a use case where my API application needs to kill any query that has been running for more than 30 seconds (my server has a timeout of 30 seconds, but the query keeps running on Postgres).
After some searching I came across the statement_timeout configuration in Postgres and implemented it in my SQLAlchemy code like this:
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker


@contextmanager
def db_session():
    """Executes the query."""
    import os
    from my_aws import secretsmanager
    secret_name = f'<my_secrey_key>'
    secret = secretsmanager.get_secret(secret_name)
    conn = f'{secret["dbname"]}://{secret["username"]}:{secret["password"]}@' \
           f'{secret["host"]}:{secret["port"]}/{secret["dbname"]}'
    # Ask the server to abort any statement that runs longer than 30 seconds.
    eng = create_engine(
        conn,
        connect_args={'options': '-c statement_timeout=30s'})
    connection = eng.connect()
    db_session = scoped_session(sessionmaker(autocommit=False, autoflush=True, bind=eng))
    yield db_session
    db_session.close()
    connection.close()
My expectation was that any query which cannot complete within 30 seconds should time out and return an error.
To test this, I place a lock on one of my tables to delay my queries:
BEGIN WORK;
LOCK TABLE <schema>.<table_name> IN ACCESS EXCLUSIVE mode;
Then I trigger an API call which queries the locked table. The API does not respond, as expected, because the query is unable to execute within 30 seconds.
However, the query does not terminate and I can still see it running in pg_stat_activity:
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%' and usename='api_user'
ORDER BY query_start desc;
The above query gives this response:
pid |age |usename |query
----|---------------|--------|-------------------------------
3334|00:05:17.962059|api_user|SELECT count(*) AS count_1 ¶FRO
1752|00:05:22.577919|api_user|COMMIT
1754|00:05:22.627446|api_user|COMMIT
3270|00:05:22.791417|api_user|SELECT count(*) AS count_1 ¶FRO
1755|00:05:23.058261|api_user|COMMIT
1753|00:05:23.123582|api_user|COMMIT
1689|00:05:24.149163|api_user|SELECT count(*) AS count_1 ¶FRO
1759|00:05:24.579171|api_user|SELECT DISTINCT sum(public.dema
1760|00:05:24.631371|api_user|SELECT count(*) AS count_1 ¶FRO
As you can see, the queries on the locked table have been waiting for more than 5 minutes.
Is there something wrong with my understanding of statement_timeout here?
FYI: I can see that the timeout is set on the postgres as the result of this query:
show statement_timeout;
Result:
statement_timeout|
-----------------|
30s |
I recommend that you set the parameter in postgresql.conf (then it is valid for the whole PostgreSQL server) or with ALTER DATABASE (then it is valid only for new connections to that database).
If that does not do the trick, the setting must be overridden somewhere. To debug, run the following SQL statement using SQLAlchemy:
SELECT current_setting('statement_timeout');
However, when I look at your query, perhaps everything is working anyway: add the state column to the pg_stat_activity query and check if the state is indeed active. Perhaps the query has already been canceled, and the state is idle or idle in transaction (aborted) (note that query shows the last query on that connection, which need not be active any more).
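The state check suggested above could look roughly like this. It is only a sketch: `ACTIVITY_SQL` and `still_active` are names made up for the example, and the rows are assumed to come from a DB-API cursor in the column order of the query.

```python
# Same monitoring query as in the question, with the state column added.
ACTIVITY_SQL = """
SELECT pid, state, age(clock_timestamp(), query_start) AS age, query
FROM pg_stat_activity
WHERE usename = %s
ORDER BY query_start DESC
"""


def still_active(rows):
    """Keep only sessions whose state is 'active', i.e. running a statement now."""
    return [row for row in rows if row[1] == "active"]
```

With psycopg2 this might be used as cur.execute(ACTIVITY_SQL, ("api_user",)) followed by still_active(cur.fetchall()); anything filtered out is showing its last query, not a running one.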
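I think statement_timeout is interpreted as milliseconds when no unit is given. PostgreSQL also accepts values with a unit such as 30s, and the SHOW output above suggests the setting was applied, but to rule out a problem with how the value is passed, try using 30000 for 30 seconds.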
eng = create_engine(
    conn,
    connect_args={'options': '-c statement_timeout=30000'})
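To keep the unit explicit, the conversion can be wrapped in a small helper. This is a sketch only; `statement_timeout_connect_args` is a hypothetical name, and it simply builds the same options string as above.

```python
def statement_timeout_connect_args(seconds):
    """Build connect_args that set statement_timeout, expressed in milliseconds."""
    millis = int(seconds * 1000)
    if millis < 0:
        raise ValueError("timeout must not be negative")
    # "-c name=value" sets a server parameter for this connection at startup.
    return {"options": f"-c statement_timeout={millis}"}
```

It would then be passed as create_engine(conn, connect_args=statement_timeout_connect_args(30)).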

Is postgres caching my query?

I have a pretty simple snippet of Python code to run a Postgres query then send the results to a dashboard. I'm using psycopg2 to periodically run the same query. Let's not worry about the looping mechanism for now.
conn = psycopg2.connect(<connection info>)
while True:
    # Run query and update dashboard
    cur = conn.cursor()
    cur.execute(q_tcc)
    query_results = cur.fetchall()
    update_dashboard(query_results)
    time.sleep(5)
For reference, the actual query is :
q_tcc = """SELECT client_addr, application_name, count(*) cnt FROM pg_stat_activity
GROUP BY client_addr, application_name ORDER BY cnt DESC;"""
When I run this, I keep getting the same results even though they should be changing. If I move the psycopg2.connect() line into the loop, together with a conn.close(), everything works fine. According to the connection and cursor docs, however, I should be able to keep using the same cursor (and, therefore, connection) the whole time.
Does this mean Postgres is caching my query on a per-client-connection basis?
PostgreSQL doesn't have a query cache.
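However, if you're using SERIALIZABLE isolation, you might be seeing the same snapshot of the data, since you appear to do all your queries within a single transaction. In addition, pg_stat_activity data is collected when it is first requested within a transaction, and the same information is then shown until that transaction ends.
You should really commit (or roll back) the transaction after each query in your loop, for example with conn.rollback().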
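One way to apply this is to end the transaction inside the function that runs the query. A minimal sketch, assuming `conn` is a psycopg2 connection with the default (non-autocommit) behaviour; `fetch_fresh` is a name invented for this example.

```python
def fetch_fresh(conn, query):
    """Run `query`, return all rows, and end the transaction afterwards."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        cur.close()
        # End the transaction that was opened implicitly by execute(),
        # so the next call starts a new one and sees current data.
        conn.rollback()
```

The loop body would then be update_dashboard(fetch_fresh(conn, q_tcc)) followed by the sleep. Setting conn.autocommit = True should have a similar effect.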

PostgreSQL - REINDEX still working even after two hours

I have started REINDEX on my PostgreSQL database. I could see in the GUI that it processed a number of tables, and then it stopped responding. It looks like it is still working, even after two hours. The GUI is not responsive and its last row says: NOTICE: table "public.res_request_history" was reindexed.
Can I safely stop REINDEX? What can I do to actually make REINDEX work?
Thanks.
Yes, you can use pg_cancel_backend(pid). You can find the PID by querying the pg_stat_activity view.
For example:
--Will display running queries and corresponding pid
SELECT query, pid FROM pg_stat_activity;
--You can then cancel one of them by calling this method with its pid
SELECT pg_cancel_backend(<pid>);
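The same call can be made from a script. A small sketch, assuming a DB-API cursor such as psycopg2's and a role that is allowed to cancel the backend; `cancel_backend` is a hypothetical helper name.

```python
def cancel_backend(cur, pid):
    """Ask the server to cancel the current query of backend `pid`.

    Returns True if the server reports that the cancel signal was sent.
    """
    # Pass the pid as a parameter instead of formatting it into the SQL.
    cur.execute("SELECT pg_cancel_backend(%s)", (pid,))
    row = cur.fetchone()
    return bool(row and row[0])
```

A True result only means the signal was sent; the backend may take a moment to stop.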

Transaction time out workaround for PostgreSQL

AFAIK, PostgreSQL 8.3 does not support transaction timeouts. I've read that this feature may be supported in the future and there's some discussion about it. However, for specific reasons, I need a solution to this problem now. So I wrote a script that runs periodically:
1) Based on locks and activity, query for the process ID of the transactions that are taking too long, keeping only the oldest (trxTimeOut.sql):
SELECT procpid
FROM
(
SELECT DISTINCT age(now(), query_start) AS age, procpid
FROM pg_stat_activity, pg_locks
WHERE pg_locks.pid = pg_stat_activity.procpid
) AS foo
WHERE age > '30 seconds'
ORDER BY age DESC
LIMIT 1
2) Based on this query, kill the corresponding process (trxTimeOut.sh):
psql -h localhost -U postgres -t -d test_database -f trxTimeOut.sql | xargs kill
Although I've tested it and seems to work, I'd like to know if it's an acceptable approach or should I consider a different one?
PostgreSQL provides idle_in_transaction_session_timeout since version 9.6, to automatically terminate transactions that are idle for too long.
It's also possible to set a limit on how long a command can take, through statement_timeout, independently of the duration of the transaction it's in, or of why it's stuck (busy query or waiting for a lock).
To auto-abort transactions that are stuck specifically waiting for a lock, see lock_timeout.
These settings can be set at the SQL level with commands like SET shown below, or can be set as defaults to a database with ALTER DATABASE, or to a user with ALTER USER, or to the entire instance through postgresql.conf.
SET statement_timeout=10000; -- time out after 10 seconds
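If several of these timeouts are set per session from application code, the SET commands can be generated in one place. A rough sketch only; `timeout_statements` and its parameter names are made up here, and all values are taken to be milliseconds.

```python
def timeout_statements(statement_ms=None, lock_ms=None, idle_in_tx_ms=None):
    """Return SET commands for the timeouts that were given (in milliseconds)."""
    settings = [
        ("statement_timeout", statement_ms),
        ("lock_timeout", lock_ms),
        # Available since PostgreSQL 9.6, as noted above.
        ("idle_in_transaction_session_timeout", idle_in_tx_ms),
    ]
    return [
        f"SET {name} = {int(value)}"
        for name, value in settings
        if value is not None
    ]
```

Each returned string can be executed on a new connection before it runs any other statement.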