Transaction timeout workaround for PostgreSQL

AFAIK, PostgreSQL 8.3 does not support transaction timeouts. I've read about this feature possibly being supported in the future, and there's some discussion about it, but for specific reasons I need a solution to this problem now. So what I did is a script that runs periodically:
1) Based on locks and activity, query for the process IDs of the transactions that are taking too long, keeping only the oldest (trxTimeOut.sql):
SELECT procpid
FROM
(
    SELECT DISTINCT age(now(), query_start) AS age, procpid
    FROM pg_stat_activity, pg_locks
    WHERE pg_locks.pid = pg_stat_activity.procpid
) AS foo
WHERE age > '30 seconds'
ORDER BY age DESC
LIMIT 1;
2) Based on this query, kill the corresponding process (trxTimeOut.sh):
psql -h localhost -U postgres -t -d test_database -f trxTimeOut.sql | xargs kill
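As a side note: on 8.4 or later, the shell step could be dropped entirely by calling pg_terminate_backend() from SQL; on 8.3 only pg_cancel_backend() exists, which cancels the current query rather than the whole backend. A sketch of the SQL-only variant, assuming the same pre-9.2 procpid column and superuser privileges:
-- Terminate the single oldest backend that has held locks for over 30 seconds.
-- pg_terminate_backend() requires PostgreSQL 8.4+.
SELECT pg_terminate_backend(procpid)
FROM
(
    SELECT DISTINCT age(now(), query_start) AS age, procpid
    FROM pg_stat_activity, pg_locks
    WHERE pg_locks.pid = pg_stat_activity.procpid
) AS foo
WHERE age > '30 seconds'
ORDER BY age DESC
LIMIT 1;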
Although I've tested it and it seems to work, I'd like to know if it's an acceptable approach or whether I should consider a different one.

PostgreSQL provides idle_in_transaction_session_timeout since version 9.6, to automatically terminate transactions that are idle for too long.
It's also possible to limit how long a single command can take through statement_timeout, independently of the duration of the transaction it's in or of why it's stuck (busy query or waiting for a lock).
To auto-abort transactions that are stuck specifically waiting for a lock, see lock_timeout.
These settings can be set at the SQL level with commands like SET shown below, or can be set as defaults to a database with ALTER DATABASE, or to a user with ALTER USER, or to the entire instance through postgresql.conf.
SET statement_timeout=10000; -- time out after 10 seconds
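The other two settings can be applied the same way; a sketch at the various scopes (the role name is a placeholder):
SET idle_in_transaction_session_timeout = '5min'; -- abort transactions idle for over 5 minutes (9.6+)
SET lock_timeout = '10s';                         -- give up waiting for a lock after 10 seconds
ALTER DATABASE test_database SET statement_timeout = '30s';           -- default for one database
ALTER USER app_user SET idle_in_transaction_session_timeout = '5min'; -- default for one user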

Related

kill the long running queries automatically

I want to kill queries that have been running for more than 2 hours, automatically.
I tried creating a trigger like the one below:
create or replace function stop_query()
RETURNS trigger
language plpgsql
as $$
begin
    with pid_tbl as
    (
        SELECT pid
        FROM pg_stat_activity
        WHERE (now() - pg_stat_activity.query_start) > interval '120 minutes';
    )
    select * from pid_tbl;
    SELECT pg_cancel_backend(var_pid);
end;$$
CREATE TRIGGER stop_query
FOR EACH ROW EXECUTE FUNCTION stop_query();
Please advise me how I can achieve this. Is there any way to do it without writing a trigger function?
You don't need this trigger at all. As I mentioned in the comment, it should be enough for you to run one of these queries:
SET LOCAL statement_timeout='2 h';--applies only until the end of the current transaction within the current session
SET SESSION statement_timeout='2 h';--only in the current session/connection
ALTER ROLE your_user_name SET statement_timeout='2 h';--all new sessions of this user
ALTER DATABASE your_db_name SET statement_timeout='2 h';--all new sessions on this db
ALTER SYSTEM SET statement_timeout='2 h';--all new sessions on all dbs on this system
They all set the statement_timeout setting, which defaults to 0 (meaning "no limit"), to '2 h' (which simply stands for "2 hours"). It's best to apply this only in the specific context where it's required, e.g. for a specific user that tends to run queries you don't want hanging for too long.
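To illustrate, once the limit is in effect, any statement that runs past it is aborted with an error (a quick demo with a deliberately short limit and pg_sleep):
SET statement_timeout = '2s';
SELECT pg_sleep(10);
-- ERROR:  canceling statement due to statement timeout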
Documentation:
statement_timeout (integer)
Abort any statement that takes more than the specified amount of time. If log_min_error_statement is set to ERROR or lower, the statement that timed out will also be logged. If this value is specified without units, it is taken as milliseconds. A value of zero (the default) disables the timeout.
The timeout is measured from the time a command arrives at the server until it is completed by the server. If multiple SQL statements appear in a single simple-Query message, the timeout is applied to each statement separately. (PostgreSQL versions before 13 usually treated the timeout as applying to the whole query string.) In extended query protocol, the timeout starts running when any query-related message (Parse, Bind, Execute, Describe) arrives, and it is canceled by completion of an Execute or Sync message.
Setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions.
If you try to use unsupported units, you'll get a hint with your error:
ERROR: invalid value for parameter "statement_timeout": "2 hours"
HINT: Valid units for this parameter are "us", "ms", "s", "min", "h", and "d".
Which are microseconds, milliseconds, seconds, minutes, hours and days respectively.

Postgres statement_timeout does not work as expected

I am working on a use case where, in my current API application, I need to kill any query that has been running for more than 30 sec (my server has a timeout of 30 sec, but the query keeps running on Postgres).
So after some research I came across the statement_timeout configuration in Postgres, and implemented it in my SQLAlchemy code like this:
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker


@contextmanager
def db_session():
    """Executes the query."""
    from my_aws import secretsmanager
    secret_name = '<my_secrey_key>'
    secret = secretsmanager.get_secret(secret_name)
    conn = f'{secret["dbname"]}://{secret["username"]}:{secret["password"]}@' \
           f'{secret["host"]}:{secret["port"]}/{secret["dbname"]}'
    # statement_timeout is passed as a server option on every new connection
    eng = create_engine(
        conn,
        connect_args={'options': '-c statement_timeout=30s'})
    connection = eng.connect()
    db_session = scoped_session(sessionmaker(autocommit=False, autoflush=True, bind=eng))
    yield db_session
    db_session.close()
    connection.close()
So my expectation here was that any query which cannot complete within 30s would time out and return an error.
So when testing this,
I placed a lock on one of my tables to delay my queries:
BEGIN WORK;
LOCK TABLE <schema>.<table_name> IN ACCESS EXCLUSIVE mode;
then I triggered an API call which queries the locked table (from above). This API does not respond as expected, because the query is unable to execute within 30 sec.
However, the query does not terminate, and I can still see it running in pg_stat_activity:
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%' and usename='api_user'
ORDER BY query_start desc;
The above query gives this response:
pid |age |usename |query
----|---------------|--------|-------------------------------
3334|00:05:17.962059|api_user|SELECT count(*) AS count_1 ¶FRO
1752|00:05:22.577919|api_user|COMMIT
1754|00:05:22.627446|api_user|COMMIT
3270|00:05:22.791417|api_user|SELECT count(*) AS count_1 ¶FRO
1755|00:05:23.058261|api_user|COMMIT
1753|00:05:23.123582|api_user|COMMIT
1689|00:05:24.149163|api_user|SELECT count(*) AS count_1 ¶FRO
1759|00:05:24.579171|api_user|SELECT DISTINCT sum(public.dema
1760|00:05:24.631371|api_user|SELECT count(*) AS count_1 ¶FRO
As you can see, the queries on the locked table have been waiting for more than 5 minutes.
Is there something wrong with my understanding of statement_timeout here?
FYI: I can see that the timeout is set in Postgres from the result of this query:
show statement_timeout;
Result:
statement_timeout|
-----------------|
30s |
I recommend that you set the parameter in postgresql.conf (then it is valid for the whole PostgreSQL server) or with ALTER DATABASE (then it is valid only for new connections to that database).
If that does not do the trick, the setting must be overridden somewhere. To debug, run the following SQL statement using SQLAlchemy:
SELECT current_setting('statement_timeout');
However, when I look at your query, perhaps everything is working anyway: add the state column to the pg_stat_activity query and check if the state is indeed active. Perhaps the query has already been canceled, and the state is idle or idle in transaction (aborted) (note that query shows the last query on that connection, which need not be active any more).
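For reference, that check could look like this (your original query with the state column added):
SELECT pid, state, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query NOT ILIKE '%pg_stat_activity%' AND usename = 'api_user'
ORDER BY query_start DESC;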
I think the statement_timeout should be a value in milliseconds. If you are really passing in 30s, that might be the wrong parameter value. Try using 30000 for 30 seconds.
eng = create_engine(
    conn,
    connect_args={'options': '-c statement_timeout=30000'})

Logging slow queries on Google Cloud SQL PostgreSQL instances

The company I work for uses Google Cloud SQL to manage their SQL databases in production.
We're having performance issues and I thought it'd be a good idea (among other things) to see/monitor all queries above a specific threshold (e.g. 250ms).
By looking at the PostgreSQL documentation I think log_min_duration_statement seems like the flag I need.
log_min_duration_statement (integer)
Causes the duration of each completed statement to be logged if the statement ran for at least the specified number of milliseconds. Setting this to zero prints all statement durations.
But judging from the Cloud SQL documentation, I see that it is only possible to set a narrow set of database flags (as in, for each DB instance), and as you can see from here, log_min_duration_statement is not among those supported flags.
So here comes the question. How do I log/monitor my slow PostgreSQL queries with Google Cloud SQL? If not possible then what kind of tool/methodologies do you suggest I use to achieve a similar result?
April 3, 2019 UPDATE
It is now possible to log slow queries on Google Cloud SQL PostgreSQL instances, see https://cloud.google.com/sql/docs/release-notes#april_3_2019:
database_flags = [
  {
    name  = "log_min_duration_statement"
    value = "1000"
  },
]
Once you enable log_min_duration_statement, you can view the logs using Stackdriver Logging. Select Cloud SQL Database -> cloudsql.googleapis.com/postgres.log and you will see log lines like this:
[103402]: [9-1] db=cloudsqladmin,user=cloudsqladmin LOG: duration: 11.211 ms statement: [YOUR SQL HERE]
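The same entries can also be fetched from the command line; something like this may work (the exact filter is an assumption, adjust it for your project):
# Read recent Cloud SQL Postgres log entries (filter syntax may need adjusting)
gcloud logging read 'resource.type="cloudsql_database" AND logName:"postgres.log"' --limit=20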
References:
Full list of supported flags (CTRL+F for log_min_duration_statement): https://cloud.google.com/sql/docs/postgres/flags#postgres-l
Issue tracker: https://issuetracker.google.com/issues/74578509#comment54
PostgreSQL docs: https://www.postgresql.org/docs/9.6/runtime-config-logging.html#GUC-LOG-MIN-DURATION-STATEMENT
The possibility of monitoring slow PostgreSQL queries for Cloud SQL instances is currently not available. As you comment, the log_min_duration_statement flag is currently not supported by Cloud SQL.
Right now, work is being made on adding this feature to Cloud SQL, and you can keep track on the progress made through this link. You can click on the star icon on the top left corner to get email notifications whenever any significant progress has been achieved.
There is a way to log slow queries through the pg_stat_statements extension, which is supported by Cloud SQL.
Since Cloud SQL doesn't grant superuser rights to any of the users, you need a workaround.
First, you need to enable the extension with
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
then you can check slow queries with a query like
SELECT pd.datname,
us.usename,
pss.userid,
pss.query AS SQLQuery,
pss.rows AS TotalRowCount,
(pss.total_time / 1000) AS TotalSecond,
((pss.total_time / 1000) / calls) as TotalAverageSecond
FROM pg_stat_statements AS pss
INNER JOIN pg_database AS pd
ON pss.dbid = pd.oid
INNER JOIN pg_user AS us
ON pss.userid = us.usesysid
ORDER BY TotalAverageSecond DESC
LIMIT 10;
As the postgres user you can look at all slow queries, but since the user is not a superuser, you will see <insufficient privilege> for all other users' queries.
To get around this limitation, you can install the extension on the other databases too (normally only the postgres user has rights to install extensions) and check the query texts as the owner of each database.
Not ideal by any measure, but what we do is run something like this on a cron once a minute and log out the result:
SELECT EXTRACT(EPOCH FROM now() - query_start) AS seconds, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - query_start > interval '1 seconds' AND query NOT LIKE '%pg_stat_activity%'
ORDER BY seconds DESC LIMIT 20
You'd need to fiddle with the query to get millisecond granularity, and even then it'll only catch queries that overlap with your cron frequency, but probably better than nothing?
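A sketch of the crontab side, assuming the SELECT above is saved to a file, passwordless access via .pgpass, and placeholder database name and paths:
# Run once a minute; append unaligned psql output to a log file
* * * * * psql -h localhost -U postgres -d mydb -t -A -F ' | ' -f /opt/slow_queries.sql >> /var/log/slow_queries.log 2>&1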

Postgresql: How to repeat query as soon as finished?

Let's say I have a query like so:
SELECT * FROM a WHERE a.Category = 'liquid' ORDER BY a.MeasurementTime DESC;
and I want to see the results coming into the database 'live'.
How can I write a query for PostgreSQL which will repeat as soon as it finishes?
You can use the \watch n command in the terminal to re-execute the query every n seconds.
Example:
postgres=# SELECT * FROM TABLE WHERE CONDITION
postgres=# \watch 5
-- now the "SELECT * FROM TABLE WHERE CONDITION" is re-executed every 5 seconds
You can't see them 'live'. Queries complete before returning to the calling environment.
You could wrap this in a cron job (depending on your environment) or a similar scheduler and have it run every minute, or put it in a function and add that to pgAgent to run every minute.
Having a DML statement constantly running is not really a good idea, and I would not recommend it for performance and table-management reasons.
However...
Within a function you can create a loop with a wait using pg_sleep and simply no exit condition, but a scheduled job is really the best way to go.
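For completeness, such a loop might look like the sketch below (the function name is made up, and it blocks its connection forever until cancelled, which is exactly why a scheduled job is preferable):
CREATE OR REPLACE FUNCTION poll_measurements() RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    LOOP
        -- do something with the latest rows; RAISE NOTICE is only for illustration
        RAISE NOTICE 'liquid rows: %', (SELECT count(*) FROM a WHERE a.Category = 'liquid');
        PERFORM pg_sleep(5); -- wait five seconds between iterations
    END LOOP;
END;
$$;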
watch -n1 'psql -h {ip} {db} {user} -c "select * from condition;"'
Make sure that you set the password of the {user} inside an environment variable:
Linux> export PGPASSWORD="password"

PostgreSQL - REINDEX still working even after two hours

I have started a REINDEX on my PostgreSQL database. In the GUI I could see it process a number of tables, and then it stopped responding. It looks like it is still working, even after two hours. The GUI is not responsive, and its last row says: NOTICE: table "public.res_request_history" was reindexed.
Can I safely stop REINDEX? What can I do to actually make REINDEX work?
Thanks.
Yes, you can use pg_cancel_backend(pid). You can find the PID by querying pg_stat_activity.
For example:
--Will display running queries and corresponding pid
SELECT query, pid FROM pg_stat_activity;
--You can then cancel one of them by calling this method with its pid
SELECT pg_cancel_backend(<pid>);
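If the backend ignores the cancel request, pg_terminate_backend is the stronger variant; it closes the whole connection instead of just cancelling the running statement:
SELECT pg_terminate_backend(<pid>);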