I'm trying to create indexes through HikariCP with a "create index concurrently" statement on PostgreSQL 9.6
The create statement is blocked by another transaction that is working on a different table; that transaction's state is IIT (idle in transaction)
The code creates the indexes dynamically through a HikariCP connection pool
There is only one connection pool for all the actions, such as select/create index and so on
The two SQL statements run in the same thread but on different DB connections, in async mode
When using "create index" instead of "create index concurrently", everything works
PostgreSQL shows that "create index concurrently on B" (active) is blocked by "select * from A" (idle in transaction)
I tried to reproduce the issue from the command line and everything worked fine: I opened two windows, executed "begin; select * from A;" in the first, then executed "create index concurrently on B;" in the second. The index was created as expected and no blocking happened (I checked that the first session was in the "IIT" state)
To use fetch_size on a cursor, the select statement disables autocommit when it gets a connection from the pool; Hikari itself then resets the value back to the pool's global setting (the default pool setting is autocommit=true)
The two statements work on different tables; there is no relation between them
When the "IIT" transaction was cancelled, the "create" statement continued to work as expected
wait_event_type | pid   | state               | query
----------------+-------+---------------------+------------------------------------------------------
Lock            | 25707 | active              | CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS "idx_tr-parameters__id_json" ON "tr-parameters" ((info->'_id') ASC)
                | 25701 | idle in transaction | SELECT t.info FROM "configuration-profiles" t
05-29 21:22:53.458 [vert.x-worker-thread-11] DEBUG com.calix.sxa.VertxPGVertice - SELECT t.info FROM "organizations" t HikariProxyConnection#379242839 wrapping org.postgresql.jdbc.PgConnection#645bae4d
05-29 21:22:53.529 [vert.x-worker-thread-11] DEBUG com.zaxxer.hikari.pool.PoolBase - hikari-cp-threads - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection#645bae4d
05-29 21:22:53.533 [vert.x-worker-thread-11] DEBUG com.calix.sxa.VertxPGVertice - SELECT t.info FROM "configuration-profiles" t HikariProxyConnection#358392671 wrapping org.postgresql.jdbc.PgConnection#645bae4d
05-29 21:22:53.693 [vert.x-worker-thread-11] DEBUG com.calix.sxa.VertxPGVertice - SELECT t.info FROM "groups" t HikariProxyConnection#269112314 wrapping org.postgresql.jdbc.PgConnection#63822471
05-29 21:22:53.701 [vert.x-worker-thread-11] DEBUG com.zaxxer.hikari.pool.PoolBase - hikari-cp-threads - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection#63822471
05-29 21:22:53.701 [vert.x-worker-thread-11] DEBUG com.calix.sxa.VertxPGVertice - SELECT t.info FROM "configuration-profiles" t WHERE COALESCE((t.info->'configurations'->'parameterValues')::jsonb ?? 'OUI_FilterList', false) = true HikariProxyConnection#1431456353 wrapping org.postgresql.jdbc.PgConnection#63822471
05-29 21:22:53.704 [vert.x-worker-thread-11] DEBUG com.zaxxer.hikari.pool.PoolBase - hikari-cp-threads - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection#63822471
05-29 21:22:53.712 [vert.x-worker-thread-11] DEBUG com.calix.sxa.VertxPGVertice - CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_tr-parameters__id_json" ON "tr-parameters" ((info->>'timestamp') ASC) HikariProxyConnection#454316525 wrapping org.postgresql.jdbc.PgConnection#63822471
I don't know why the blocking happens, since the tables are completely different, and everything worked when testing manually with two command-line windows
How can I fix this issue? Is there any workaround?
Thanks for your kind help
As mentioned by @jjanes, it's blocked by other transactions (transactions on the same table, or any transaction holding a snapshot) during the two scans performed when building an index concurrently.
The official documentation also mentions this: https://www.postgresql.org/docs/9.1/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
After the second scan, the index build must wait for any transactions
that have a snapshot (see Chapter 13) predating the second scan to terminate
In my case,
wait_event_type | pid | state | backend_xid | backend_xmin | query
----------------+-------+---------------------+-------------+--------------+--------------------------------------------------
| 5226 | idle in transaction | | 7973432 | select * from "configuration-profiles"
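For reference, output like the above comes from a pg_stat_activity query along these lines (a sketch; add filters as needed):

SELECT wait_event_type, pid, state, backend_xid, backend_xmin, query
FROM pg_stat_activity
WHERE datname = current_database();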
The backend_xmin of the IIT session is 7973432, so the "create index concurrently" is blocked by the IIT session holding that snapshot.
By the way, when using the command line with isolation level "read committed", the "create index concurrently" is not blocked, but with the Java code the "create" action is blocked even at the same isolation level:
wait_event_type | pid | state | backend_xid | backend_xmin | query
----------------+-------+---------------------+-------------+--------------+--------------------------------------------------
| 5226 | idle in transaction | | 7973432 | select * from "configuration-profiles"
| 5210 | idle in transaction | | | select * from "configuration-profiles";
| 5455 | idle in transaction | | 7973432 | declare cur cursor for select * from "configuration-profiles";
As shown above:
Using "select * from "configuration-profiles";" on the command line, there is no backend_xmin, since no cursor is opened; all the records are returned as soon as the statement executes
Using "declare cur cursor for select * from "configuration-profiles";" on the command line, backend_xmin has a value, since the cursor is open and waiting for fetches
Using "select * from "configuration-profiles"" through Java, backend_xmin also has a value, since the driver library uses a cursor as well
PostgreSQL has no way of knowing that that other connection will not want to use the table the index is being built on at some point in the future of its snapshot. Just because it hasn't used the table yet doesn't mean it never will. The way to know that for sure is to wait for that transaction (or snapshot) to finish, which is what it does.
The two SQL statements run in the same thread but on different DB connections, in async mode
Why is the other connection IIT? What is it waiting for? (It is waiting in your code, not in the database). Since it is async, it shouldn't be waiting on the CIC. Is it just waiting for you to issue a COMMIT? Since you turned off autocommit, it is your responsibility to issue COMMITs at the appropriate points. If you are using higher isolation levels, even SELECT only statements need to be committed.
I tried to reproduce the issue through the command line and everything worked fine: open two windows, execute "begin; select * from A;" in the first window, and execute "create index concurrently on B;" in the second window.
You can reproduce it by changing the first one to "begin isolation level repeatable read; select * from A;"
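Spelled out (idx_b and some_column are placeholders):

-- session 1: repeatable read keeps one snapshot for the whole transaction
begin isolation level repeatable read;
select * from A;
-- leave the transaction open

-- session 2: now blocks until session 1 commits or rolls back
create index concurrently idx_b on B (some_column);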
Workarounds:
Don't leave things hanging around in an open transaction
Don't use higher isolation levels than you need to.
Don't use CIC.
Related
I'm trying to rename a database using:
ALTER DATABASE xxx RENAME TO yyy
I got an error message saying there is another open session. I killed it using:
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'xxx' AND pid <> pg_backend_pid();
However, I then got an error message saying there are 2 prepared transactions pending. I made sure to kill all processes in the background with:
SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid();
But still, I am not able to rename the database.
I then saw there is a view provided by Postgres called pg_prepared_xacts that shows the prepared transactions (I shortened the gid for a better overview), and indeed there are two of them:
transaction|gid |prepared |owner|database|
-----------+--------------+-----------------------------+-----+--------+
5697779 |4871251_EAAAAC|2022-08-05 15:50:59.782 +0200|xxx |xxx |
5487701 |4871251_DAAAAA|2022-07-08 08:05:36.066 +0200|xxx |xxx |
According to the Postgres documentation on prepared transactions, I can execute a Rollback on the transaction id - which is what I did:
ROLLBACK PREPARED '5697779';
I made sure to execute the ROLLBACK with the same user, but it shows an error message saying that the transaction does not exist...
How can I get rid of it / kill it in order to be able to rename the database?
Thank you for reading and taking the time to respond.
From here Prepared transaction:
transaction_id
An arbitrary identifier that later identifies this transaction for COMMIT PREPARED or ROLLBACK PREPARED. The identifier must be written as a string literal, and must be less than 200 bytes long. It must not be the same as the identifier used for any currently prepared transaction.
and from here Rollback prepared:
transaction_id
The transaction identifier of the transaction that is to be rolled back.
Lastly from here pg_prepared_xacts:
gid text
Global transaction identifier that was assigned to the transaction
So to roll back the transaction you need the global identifier assigned in PREPARE TRANSACTION, as shown in the gid column of pg_prepared_xacts.
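In other words (the gid values below are the shortened ones from the listing above; use the full value from your own pg_prepared_xacts):

ROLLBACK PREPARED '4871251_EAAAAC';
ROLLBACK PREPARED '4871251_DAAAAA';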
I am working on a use case where, in my current API application, I need to kill any query that has been running for more than 30 seconds (my server has a timeout of 30 seconds, but the query keeps running on Postgres).
After some research I came across the statement_timeout configuration in Postgres and implemented it in my SQLAlchemy code like this:
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

@contextmanager
def db_session():
    """Executes the query."""
    from my_aws import secretsmanager
    secret_name = '<my_secret_key>'
    secret = secretsmanager.get_secret(secret_name)
    conn = f'{secret["dbname"]}://{secret["username"]}:{secret["password"]}@' \
           f'{secret["host"]}:{secret["port"]}/{secret["dbname"]}'
    eng = create_engine(
        conn,
        connect_args={'options': '-c statement_timeout=30s'})
    connection = eng.connect()
    session = scoped_session(sessionmaker(autocommit=False, autoflush=True, bind=eng))
    yield session
    session.close()
    connection.close()
So my expectation here was that any query which cannot complete within 30s should time out and return an error.
To test this,
I placed a lock on one of my tables to delay my queries:
BEGIN WORK;
LOCK TABLE <schema>.<table_name> IN ACCESS EXCLUSIVE mode;
Then I triggered an API call that queries the locked table (from above). The API does not respond as expected, because the query cannot execute within 30 seconds.
However, the query does not terminate, and I can still see it running in pg_stat_activity:
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%' and usename='api_user'
ORDER BY query_start desc;
The above query gives the response:
pid |age |usename |query
----|---------------|--------|-------------------------------
3334|00:05:17.962059|api_user|SELECT count(*) AS count_1 ¶FRO
1752|00:05:22.577919|api_user|COMMIT
1754|00:05:22.627446|api_user|COMMIT
3270|00:05:22.791417|api_user|SELECT count(*) AS count_1 ¶FRO
1755|00:05:23.058261|api_user|COMMIT
1753|00:05:23.123582|api_user|COMMIT
1689|00:05:24.149163|api_user|SELECT count(*) AS count_1 ¶FRO
1759|00:05:24.579171|api_user|SELECT DISTINCT sum(public.dema
1760|00:05:24.631371|api_user|SELECT count(*) AS count_1 ¶FRO
As you can see, the queries on the locked table have been waiting for more than 5 minutes.
Is there something wrong with my understanding of statement_timeout here?
FYI: I can see that the timeout is set on Postgres, per the result of this query:
show statement_timeout;
Result:
statement_timeout|
-----------------|
30s |
I recommend that you set the parameter in postgresql.conf (then it is valid for the whole PostgreSQL server) or with ALTER DATABASE (then it is valid only for new connections to that database).
If that does not do the trick, the setting must be overridden somewhere. To debug, run the following SQL statement using SQLAlchemy:
SELECT current_setting('statement_timeout');
However, when I look at your query, perhaps everything is working anyway: add the state column to the pg_stat_activity query and check if the state is indeed active. Perhaps the query has already been canceled, and the state is idle or idle in transaction (aborted) (note that query shows the last query on that connection, which need not be active any more).
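For example, a sketch of that check:

SELECT pid, state, age(clock_timestamp(), query_start) AS age, query
FROM pg_stat_activity
WHERE usename = 'api_user'
ORDER BY query_start DESC;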
I think the statement_timeout should be a value in milliseconds. If you are really passing in 30s, that might be the wrong parameter value. Try using 30000 for 30 seconds:
eng = create_engine(
conn,
connect_args={'options': '-c statement_timeout=30000'})
On a development server I'd like to remove unused databases. To do that, I need to know whether a database is still being used by anyone.
Is there a way to get the last access or modification date of a given database, schema or table?
You can do it by checking the last modification time of the table's file.
In PostgreSQL, every table corresponds to one or more OS files. Like this:
select relfilenode from pg_class where relname = 'test';
The relfilenode is the file name of table "test". You can then find the file in the database's directory.
In my test environment:
cd /data/pgdata/base/18976
ls -l -t | head
The last command lists all files, ordered by last modification time.
There is no built-in way to do this - and all the approaches that check the file mtime described in other answers here are wrong. The only reliable option is to add triggers to every table that record a change to a single change-history table, which is horribly inefficient and can't be done retroactively.
If you only care about "database used" vs "database not used" you can potentially collect this information from the CSV-format database log files. Detecting "modified" vs "not modified" is a lot harder; consider SELECT writes_to_some_table(...).
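For reference, CSV logging is enabled with postgresql.conf settings along these lines (logging_collector requires a server restart; which log_* options you turn on depends on what activity you want to detect):

logging_collector = on          # write server logs to files (restart required)
log_destination = 'csvlog'      # emit CSV-format log files
log_connections = on
log_disconnections = on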
If you don't need to detect old activity, you can use pg_stat_database, which records activity since the last stats reset. e.g.:
-[ RECORD 6 ]--+------------------------------
datid | 51160
datname | regress
numbackends | 0
xact_commit | 54224
xact_rollback | 157
blks_read | 2591
blks_hit | 1592931
tup_returned | 26658392
tup_fetched | 327541
tup_inserted | 1664
tup_updated | 1371
tup_deleted | 246
conflicts | 0
temp_files | 0
temp_bytes | 0
deadlocks | 0
blk_read_time | 0
blk_write_time | 0
stats_reset | 2013-12-13 18:51:26.650521+08
so I can see that there has been activity on this DB since the last stats reset. However, I don't know anything about what happened before the stats reset, so if I had a DB showing zero activity since a stats reset half an hour ago, I'd know nothing useful.
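(The record above is psql's expanded-display output of a query like the following.)

SELECT * FROM pg_stat_database WHERE datname = 'regress';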
PostgreSQL 9.5 lets us track the last commit timestamp.
Check whether commit tracking is on or off using the following query:
show track_commit_timestamp;
If it returns "on", go to step 3; otherwise modify postgresql.conf:
cd /etc/postgresql/9.5/main/
vi postgresql.conf
Change
track_commit_timestamp = off
to
track_commit_timestamp = on
Restart Postgres / the system.
Repeat step 1.
Use one of the following queries to track the last commit:
SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME;
SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME where COLUMN_NAME=VALUE;
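Building on that, the most recent commit that touched a table can be approximated as below. Note this is a sketch: it only sees row versions that still exist (so it misses deletes), and pg_xact_commit_timestamp returns NULL for transactions older than the retained commit-timestamp data.

SELECT max(pg_xact_commit_timestamp(xmin)) AS last_modified FROM YOUR_TABLE_NAME;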
My way to get the modification date of my tables:
Python Function
CREATE OR REPLACE FUNCTION py_get_file_modification_timestamp(afilename text)
RETURNS timestamp without time zone AS
$BODY$
import os
import datetime
return datetime.datetime.fromtimestamp(os.path.getmtime(afilename))
$BODY$
LANGUAGE plpythonu VOLATILE
COST 100;
SQL Query
SELECT
schemaname,
tablename,
py_get_file_modification_timestamp('*postgresql_data_dir*/*tablespace_folder*/'||relfilenode)
FROM
pg_class
INNER JOIN
pg_catalog.pg_tables ON (tablename = relname)
WHERE
schemaname = 'public'
I'm not sure whether things like vacuum can mess with this approach, but in my tests it's a pretty accurate way to find tables that are no longer used, at least for INSERT/UPDATE operations.
I guess you should activate some log options. You can find information about logging in PostgreSQL here.
Below are the steps I followed to test SKIP LOCKED:
Open an SQL console in some Postgres UI client
Connect to the Postgres DB
Execute the queries:
CREATE TABLE t_demo AS
SELECT *
FROM generate_series(1, 4) AS id;
Check that rows were created in that table:
TABLE t_demo
Select rows using the query below:
SELECT *
FROM t_demo
WHERE id = 2
FOR UPDATE SKIP LOCKED;
It returns the row with id 2.
Now execute the above query again:
SELECT *
FROM t_demo
WHERE id = 2
FOR UPDATE SKIP LOCKED;
This second query should not return any results, but it also returns the row with id 2.
https://www.postgresql.org/docs/current/static/sql-select.html#SQL-FOR-UPDATE-SHARE
To prevent the operation from waiting for other transactions to
commit, use either the NOWAIT or SKIP LOCKED option
(emphasis mine)
If you run both queries in one window, you probably either ran both in one transaction (then your next statement is not "another transaction"), or you are autocommitting after each statement (the default). In the latter case, the first statement's transaction commits before the second statement starts, so the lock is released and you observe no effect.
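To see the expected behavior, keep the first transaction open in one session while querying from a second session:

-- Session 1: lock the row and keep the transaction open
BEGIN;
SELECT * FROM t_demo WHERE id = 2 FOR UPDATE SKIP LOCKED;  -- returns the row

-- Session 2: a different connection, while session 1 is still open
SELECT * FROM t_demo WHERE id = 2 FOR UPDATE SKIP LOCKED;  -- returns no rows

-- Session 1:
COMMIT;  -- releases the lock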
AFAIK, PostgreSQL 8.3 does not support transaction timeouts. I've read about support for this feature in the future, and there's some discussion about it. However, for specific reasons, I need a solution to this problem. So what I did is write a script that runs periodically:
1) Based on locks and activity, query for the process ID of the transaction that has been taking too long, keeping the oldest (trxTimeOut.sql):
SELECT procpid
FROM
(
SELECT DISTINCT age(now(), query_start) AS age, procpid
FROM pg_stat_activity, pg_locks
WHERE pg_locks.pid = pg_stat_activity.procpid
) AS foo
WHERE age > '30 seconds'
ORDER BY age DESC
LIMIT 1
2) Based on this query, kill the corresponding process (trxTimeOut.sh):
psql -h localhost -U postgres -t -d test_database -f trxTimeOut.sql | xargs kill
Although I've tested it and it seems to work, I'd like to know whether it's an acceptable approach or whether I should consider a different one.
PostgreSQL provides idle_in_transaction_session_timeout since version 9.6, to automatically terminate transactions that are idle for too long.
It's also possible to set a limit on how long a command can take, through statement_timeout, independently on the duration of the transaction it's in, or why it's stuck (busy query or waiting for a lock).
To auto-abort transactions that are stuck specifically waiting for a lock, see lock_timeout.
These settings can be set at the SQL level with commands like SET shown below, or can be set as defaults to a database with ALTER DATABASE, or to a user with ALTER USER, or to the entire instance through postgresql.conf.
SET statement_timeout=10000; -- time out after 10 seconds
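The other two settings work the same way; as a sketch (the values are illustrative, and test_database is the database from the question):

SET lock_timeout = '5s';                           -- give up waiting for a lock after 5 seconds
SET idle_in_transaction_session_timeout = '30s';   -- terminate sessions idle inside a transaction (9.6+)

-- make it the default for new connections to one database:
ALTER DATABASE test_database SET statement_timeout = '10s';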