Kill long-running queries automatically - PostgreSQL

I want to automatically kill queries that have been running for more than 2 hours.
I tried creating a trigger like the one below:
create or replace function stop_query()
RETURNS trigger
language plpgsql
as $$
begin
    with pid_tbl as
    (
        SELECT pid
        FROM pg_stat_activity
        WHERE (now() - pg_stat_activity.query_start) > interval '120 minutes';
    )
    select * from pid_tbl;
    SELECT pg_cancel_backend(var_pid);
end;$$
CREATE TRIGGER stop_query
FOR EACH ROW EXECUTE FUNCTION stop_query();
Please advise how I can achieve this. Is there any way to do it without writing a trigger function?

You don't need this trigger at all. As I mentioned in the comment, it should be enough for you to run one of these queries:
SET LOCAL statement_timeout='2 h';   -- applies only until the end of the current transaction within the current session
SET SESSION statement_timeout='2 h'; -- only in the current session/connection
ALTER ROLE your_user_name SET statement_timeout='2 h';   -- all new sessions of this user
ALTER DATABASE your_db_name SET statement_timeout='2 h'; -- all new sessions on this db
ALTER SYSTEM SET statement_timeout='2 h';                -- all new sessions on all dbs on this system
They all set the statement_timeout setting, which is 0 by default (meaning "no limit"), to '2 h' (which simply stands for "2 hours"). It's best to apply this only to the specific context where it's required, e.g. for a specific user that tends to run queries you don't want hanging for too long.
Documentation:
statement_timeout (integer)
Abort any statement that takes more than the specified amount of time. If log_min_error_statement is set to ERROR or lower, the statement that timed out will also be logged. If this value is specified without units, it is taken as milliseconds. A value of zero (the default) disables the timeout.
The timeout is measured from the time a command arrives at the server until it is completed by the server. If multiple SQL statements appear in a single simple-Query message, the timeout is applied to each statement separately. (PostgreSQL versions before 13 usually treated the timeout as applying to the whole query string.) In extended query protocol, the timeout starts running when any query-related message (Parse, Bind, Execute, Describe) arrives, and it is canceled by completion of an Execute or Sync message.
Setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions.
If you try to use unsupported units, you'll get a hint with your error:
ERROR: invalid value for parameter "statement_timeout": "2 hours"
HINT: Valid units for this parameter are "us", "ms", "s", "min", "h", and "d".
These are microseconds, milliseconds, seconds, minutes, hours, and days, respectively.
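If you want to see the setting in action before applying the 2-hour value, a quick check in a scratch session could look like this (the 2-second value and the pg_sleep() call are only for demonstration):

SET statement_timeout = '2s';
SHOW statement_timeout;   -- confirm the current value
SELECT pg_sleep(5);       -- cancelled with "ERROR: canceling statement due to statement timeout"
RESET statement_timeout;  -- back to the session default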

Caused by: com.amazon.redshift.util.RedshiftException: ERROR: Query (659514) cancelled on user's request

I am trying to save data in Redshift using Java code through a multi-row insert, and I am getting the error below.
Caused by: com.amazon.redshift.util.RedshiftException: ERROR: Query (659514) cancelled on user's request.
As per the official AWS documentation:
The statement_timeout value is the maximum amount of time that a query can run before Amazon Redshift terminates it. When a statement timeout is exceeded, then queries submitted during the session are aborted with the following error message:
ERROR: Query (150) cancelled on user's request
To verify whether a query was aborted because of a statement timeout, run the following query:
select * from SVL_STATEMENTTEXT
where text ilike '%set%statement_timeout%to%'
and pid in (select pid from STL_QUERY where query = <queryid>);
I tried running the above query with my query ID, but it doesn't return any results. Also, statement_timeout is 0, which turns off the timeout limit.
What might be the problem?
Checking for statement timeouts is a good path to look down. The query you provided only looks for a statement_timeout set by the user with a SET command. That is not the only way this parameter can be set, nor is it the only timeout. The parameter can also be set for all connections of a user through the ALTER USER command. If you think this parameter is causing the issue, you can run "SET STATEMENT_TIMEOUT TO 0;" early in your session to remove the limit.
If this doesn't fix the issue, then the problem may be elsewhere. WLM settings can also time out queries, so check the STL_WLM_RULE_ACTION system table to see if any rules were applied to your query.
Statement timeouts can also be set at the cluster level through the parameter group, so you may want to check the parameter group for a statement_timeout setting.
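For reference, a rough sketch of those checks in SQL (keep <queryid> as the placeholder for your query ID; the exact columns returned may vary):

SET statement_timeout TO 0;  -- clears any session-level limit inherited from the user or session

-- did a WLM query monitoring rule act on the query?
SELECT query, rule, action, recordtime
FROM stl_wlm_rule_action
WHERE query = <queryid>;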

How to not execute INSERT in read-only transaction

The Postgres server is in hot standby mode.
Asynchronous streaming binary replication is used.
A command like
INSERT INTO logfile (logdate) values (current_date)
causes the error
cannot execute INSERT in a read-only transaction.
Maybe it should be changed to
INSERT INTO logfile (logdate)
SELECT current_date
WHERE ???
What WHERE condition should be used?
It should work starting with Postgres 9.0.
If a plain WHERE clause is not possible, maybe some PL/pgSQL function can be used in the WHERE clause.
Maybe the result of
show transaction_read_only
should be captured, or some function can be used.
Alternatively, the application can determine at startup whether the database is read-only. Should the show transaction_read_only result be used for this?
Running INSERT on a standby server is not possible in pure (non-procedural) SQL, because when the server is in standby mode all data-modification queries are rejected in the planning phase, before they are executed.
It's possible with conditionals in PL/PgSQL.
DO $code$
BEGIN
    IF NOT pg_is_in_recovery() THEN
        INSERT INTO logfile (logdate) VALUES (current_date);
    END IF;
END;
$code$;
However, it's probably not recommended - it's usually better to test pg_is_in_recovery() once (in application code) and then act accordingly.
I'm using the pg_is_in_recovery() system function instead of the transaction_read_only GUC because they are not exactly the same thing. But if you prefer the latter, please use:
SELECT current_setting('transaction_read_only')::bool
More info: DO command, conditionals in PL/PgSQL, system information functions.
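As a rough sketch of the "test once in application code" approach, the application could run either of these one-off checks when it connects (the two are not interchangeable, as noted above):

SELECT pg_is_in_recovery();                             -- true only while the server is a standby
SELECT current_setting('transaction_read_only')::bool;  -- also true when a transaction is explicitly read-only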

INSERT OPENQUERY timeout

I'm trying to execute an insert query against a linked server in SQL Server.
For that I'm using an INSERT INTO OPENQUERY statement.
The linked server is an Apache Hive server using the Cloudera ODBC Provider.
The insert operation takes around 1 minute in my setup when performed from the Hive client.
However, the SQL INSERT always times out after 30 seconds.
I set the Query Timeout parameter to 0, but it does not seem to affect the INSERT statement; it works fine for SELECT statements that take longer.
Is this a known limitation?
Is there a way to change the timeout for the insert statement when using OPENQUERY?
EDIT
I would like to clarify the setup I'm working with.
----------                        ----------------------        ---------------
| MS SQL |  =>  Linked Server =>  | Hive ODBC Provider |   =>    | Hive Server |
----------                        ----------------------        ---------------
In Hive, I have a table called calc_result where I would like to periodically store calculation results from SQL Server. For example, I try to insert using a query like this:
insert openquery(HIVE, 'select timestamp timestamp , tag tag, value value from calc_result')
values('2019-04-22 11:50:41', 'test',2.0)
The insert operation is captured correctly by the Hive server and a MapReduce job starts. However, the job gets killed after 30 seconds due to the timeout.
SQL Server shows the error message below.
OLE DB provider "MSDASQL" for linked server "HIVE" returned message "[Cloudera][Hardy] (72) Query execution timeout expired.".
However, SELECT ... OPENQUERY works fine and follows the Query Timeout setting of the linked server (which is set to 0 in this case).
Edit: that is a completely different use case from what I had imagined. In that case there should not be any difference between SELECT and INSERT.
As you have configured your linked server timeout, there is a second place in the linked server properties you can check: the Command Timeout setting in the provider string.
Another option that comes to mind is the instance-wide timeout. The default is 600 seconds (10 minutes), which is well above your 30 seconds. However, you can still try it to see if there is any impact.
For infinite wait:
sp_configure 'show advanced options',1
go
reconfigure
go
sp_configure 'remote query timeout (s)',0
go
reconfigure
go
I would try using SELECT ... INTO a temporary table and then materializing it with a regular INSERT INTO:
SELECT c1, c2
INTO #temp_tab
FROM OPENQUERY(mylinkedserver, 'SELECT c1, c2 FROM remote_table');
INSERT INTO normal_table(col1, col2)
SELECT c1, c2
FROM #temp_tab;
EDIT:
You could try wrapping it in a transaction and removing the aliases:
BEGIN TRAN;
insert openquery(HIVE, 'select timestamp, tag, value from calc_result')
values('2019-04-22 11:50:41', 'test',2.0);
COMMIT;
If necessary, set up DTC: How can I enable distributed transactions for a linked server?
While I didn't find a way to change the OPENQUERY timeout from 30 seconds, I found that using EXEC ... AT <linked server> works fine for INSERT queries while adhering to the timeout settings.
I accidentally stumbled upon the solution in this 2009 blog post. Databases might not be my strength, but I feel the SQL Server documentation could be improved. A simple page that lists the possible ways to interact with a linked server could have saved me a lot of retries.
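For illustration, a sketch of what the EXEC ... AT form could look like for the calc_result example above (this assumes RPC Out is enabled on the HIVE linked server, and the exact Hive INSERT syntax may need adjusting for your setup):

EXEC ('insert into calc_result values (''2019-04-22 11:50:41'', ''test'', 2.0)') AT HIVE;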

How can pgsql sequence be undefined when I just called nextval?

I've got an app built on top of PostgreSQL, which makes use of a custom sequence. I think I understand sequences pretty well by now: they are non-transactional, currval is defined only within the current session, etc. But I don't understand this:
2015-10-13 10:37:16 SQLSelect: SELECT nextval('commit_id_seq')
2015-10-13 10:37:16 commit_id_seq: 57
2015-10-13 10:37:16 SQLExecute: UPDATE bid SET is_archived=false,company_id=1436,contact_id=15529,...(etc)...,sharing_policy='' WHERE id = 56229
2015-10-13 10:37:16 ERROR: ERROR: currval of sequence "commit_id_seq" is not yet defined in this session
CONTEXT: SQL statement "INSERT INTO history (table_name, record_id, sec_user_id, created, action, notes, status, before, after, commit_id)
SELECT TG_TABLE_NAME, rec.id, (SELECT id FROM sec_user WHERE name = CURRENT_USER), now(), SUBSTR(TG_OP,1,1), note, stat, oldH, newH, currval('commit_id_seq')"
PL/pgSQL function log_to_history() line 28 at SQL statement
We log every call to the database, and in the case of the SELECT nextval, I also log the result. The above are the exact calls, except that I trimmed the UPDATE statement (because the original is really long).
So, you can see that we just called nextval on the sequence, got a reasonable number back, and then we do an UPDATE that invokes a trigger function that attempts to use currval on that sequence... and it fails, claiming currval is not defined.
Note that this doesn't usually happen, but once it does start happening, it does so consistently (perhaps until the user disconnects from the DB).
How can this be? And what can I do about it?
Your UPDATE statement obviously calls a trigger. The most plausible cause of this error is that the trigger function is in a different schema from where the sequence is defined, and the schema of the sequence is not in the search_path. That gives you two options to resolve this (both sketched below):
Make the schema of the sequence visible to the trigger function using SET search_path TO .... Note that this will make all objects in the schema of the sequence visible, which may be something of a security risk, depending on your database design.
Schema-qualify the sequence name in the trigger function: currval('my_schema.commit_id_seq').
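For example, a minimal sketch of both options, assuming the sequence lives in a schema named my_schema (adjust it to wherever the sequence actually is):

-- Option 1: pin the search_path for just this function
ALTER FUNCTION log_to_history() SET search_path = my_schema, public;

-- Option 2: schema-qualify the sequence inside the trigger function body
-- ... currval('my_schema.commit_id_seq') ...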
Another plausible cause is connection pooling at your application end. Log the "session ID" (really just the start time and PID of the current session) by adding %c to the log_line_prefix parameter in postgresql.conf. In PostgreSQL every command runs in its own transaction unless a transaction is explicitly established. Connection pooling software also works at the transaction level (i.e. you start a transaction and your connection stays with you until you close it; outside of a transaction there are no guarantees about session persistence). If that is the case, you can wrap your entire set of commands in a BEGIN ... COMMIT block (you should probably use the specific call your pooling software provides), or better yet, change your code so it does not depend on a previous nextval() call.
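As a small sketch of that logging change, on a server new enough for ALTER SYSTEM (9.4+; otherwise edit postgresql.conf directly), it could look like this (%m and %p just add a timestamp and the PID for context; %c is the session ID):

ALTER SYSTEM SET log_line_prefix = '%m [%p] %c ';
SELECT pg_reload_conf();  -- log_line_prefix only needs a reload, not a restart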

Transaction time out workaround for PostgreSQL

AFAIK, PostgreSQL 8.3 does not support transaction timeouts. I've read about supporting this feature in the future and there's some discussion about it. However, for specific reasons, I need a solution to this problem. So what I did is a script that runs periodically:
1) Based on locks and activity, query for the process IDs of transactions that have been running too long, keeping only the oldest (trxTimeOut.sql):
SELECT procpid
FROM
(
    SELECT DISTINCT age(now(), query_start) AS age, procpid
    FROM pg_stat_activity, pg_locks
    WHERE pg_locks.pid = pg_stat_activity.procpid
) AS foo
WHERE age > '30 seconds'
ORDER BY age DESC
LIMIT 1
2) Based on this query, kill the corresponding process (trxTimeOut.sh):
psql -h localhost -U postgres -t -d test_database -f trxTimeOut.sql | xargs kill
Although I've tested it and it seems to work, I'd like to know whether it's an acceptable approach or whether I should consider a different one.
Since version 9.6, PostgreSQL provides idle_in_transaction_session_timeout to automatically terminate transactions that are idle for too long.
It's also possible to set a limit on how long a command can take, through statement_timeout, independently of the duration of the transaction it's in or of why it's stuck (busy query or waiting for a lock).
To auto-abort transactions that are stuck specifically waiting for a lock, see lock_timeout.
These settings can be set at the SQL level with commands like the SET shown below, or as defaults for a database with ALTER DATABASE, for a user with ALTER USER, or for the entire instance through postgresql.conf.
SET statement_timeout=10000; -- time out after 10 seconds
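For completeness, a sketch of the other scopes mentioned above (test_database is the database from the question; app_user is a placeholder role name):

ALTER DATABASE test_database SET idle_in_transaction_session_timeout = '5min';
ALTER USER app_user SET statement_timeout = '2min';
ALTER USER app_user SET lock_timeout = '30s';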