Why does python alembic postgres migration leave locks that prevent future migrations? - postgresql

I have a psql database that I want to do alembic migrations on. After the migrations, locks are still present:
SELECT l.pid, l.locktype, l.mode
FROM pg_locks l
INNER JOIN pg_stat_activity s ON (l.pid = s.pid) where usename='migrations_user';
pid | locktype | mode
-------+------------+-----------------
19918 | relation | AccessShareLock
19918 | relation | AccessShareLock
19918 | relation | AccessShareLock
19918 | relation | AccessShareLock
19918 | virtualxid | ExclusiveLock
(5 rows)
The database is owned by db_owner and I ran grant db_owner to migrations_user; so it has all the permissions to run the migration, and the migration succeeds. But a followup migration on the same table would fail due to these locks, and I need to go in and manually run pg_terminate_backend on these locks to proceed.
What could be causing this? The alembic migration code is pretty standard:
def run_migrations_online():
"""Run migrations in 'online' mode.
In this scenario we need to create an Engine
and associate a connection with the context.
"""
LOG.info("Running run_migrations_online")
connectable = create_engine(get_conn_url_from_env())
with connectable.connect() as connection:
context.configure(connection=connection, target_metadata=target_metadata)
with context.begin_transaction():
context.run_migrations()
connectable.dispose()

Related

dbeaver tab holding 'idle in transaction' transaction

I am seeing that dbeaver tabs sometimes block a postgres DB snapshot for 1 hour or more. The tab is set to 'manual commit - read committed' and 100% shows NO uncommited work. I have also seen this not coming from a tab but from what Dbeaver calls 'Main'.
The transaction in pg_stat_activity looks like below, you can see the session is idle in transaction. You can see that a backend_xmin is set and that pg_catalog.pg_proc was queried which was 100% not done by the user. From 'Main' I have seen it being idle in transaction on SHOW TRANSACTION ISOLATION LEVEL. When clicking rollback in the tab, the idle in transaction session immediately goes idle.
I do not want to set a server side idle_in_transaction_session_timeout timeout for this user. I have already set 'automatically end long idle transactions' in dbeaver.
How can I prevent dbeaver from holding transactions open so that backend_xmin or backend_xid get old and endanger autovacuum work?
Name |Value |
----------------+-----------------------------------------------------------------------------------------------------------------------------------------+
datid |16417 |
datname |<removed> |
pid |18974 |
leader_pid | |
usesysid |16394 |
usename |sys |
application_name|DBeaver - SQLEditor <Script-10.sql> |
client_addr |10.135.31.67 |
client_hostname | |
client_port |65397 |
backend_start |2023-02-08 12:12:45.098 +0000 |
xact_start |2023-02-08 12:16:23.121 +0000 |
query_start |2023-02-08 12:16:23.676 +0000 |
state_change |2023-02-08 12:16:23.677 +0000 |
wait_event_type |Client |
wait_event |ClientRead |
state |idle in transaction |
backend_xid | |
backend_xmin |222236770 |
query_id | |
query |SELECT pp.oid as poid, pp.* FROM pg_catalog.pg_proc pp WHERE pp.proname ILIKE $1 AND pp.pronamespace IN ($2) ORDER BY pp.proname LIMIT 10|
backend_type |client backend

flyway sessions locking itself

I was trying to implement code through flyway:
create index concurrently if not exists api_client_system_role_idx2 on profile.api_client_system_role (api_client_id);
create index concurrently if not exists api_client_system_role_idx3 on profile.api_client_system_role (role_type_id);
create index concurrently if not exists api_key_idx2 on profile.api_key (api_client_id);
However flyway sessions were blocking each other and script is in "pending" state.
| Versioned | 20.1 | add email verification table | SQL | 2021-11-01 21:55:52 | Success |
| Versioned | 21.1 | create role for doc api | SQL | 2021-11-01 21:55:52 | Success |
| Versioned | 22 | create indexes for profile | SQL | 2022-10-21 10:23:41 | Success |
| Versioned | 23 | test flyway | SQL | | Pending |
+-----------+---------+----------------------------------------------+--------+---------------------+---------+
Flyway: Flyway Community Edition 9.3.1 by Redgate
Database: Postgresql 14.4
Can you please advice how to properly implement creating indexes concurrently in postgresql?
I've tried simply to kill blocking session and let the script to continue, however then implementation failed and scripts stayed in "Pending" status.

GitLab encountering issues with PostgreSQL after OS upgrade to RHEL 7.6

We recently upgraded the OS:
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
After upgrading, we are facing lot of issues with GitLab (predominantly with Postgres)..
Our GitLab is dockerized i.e. GitLab (and all its internal services including PostgreSQL) is running inside a single container. The container does not have it's own glibc, so it is using the one from the OS.
ERROR: canceling statement due to statement timeout
STATEMENT:
SELECT relnamespace::regnamespace as schemaname,
relname as relname,
pg_total_relation_size(oid) bytes FROM pg_class WHERE relkind = 'r';
The timeout messages appear continuously and this results in users facing 502 errors when accessing GitLab.
I checked the statement timeout set on the database.
gitlabhq_production=# show statement_timeout;
statement_timeout
-------------------
1min
(1 row)
I don't know what to make of this. This is probably the default setting. Is this an issue with postgres? What does this mean? Anything I can do to fix this?
EDIT:
Checked pg_stat_activity and don't see any locks as the server was rebooted earlier. The same query is running fine now but we keep seeing this issue intermittently.
Ran \d pg_class to check whether the table uses any indexes and also to check the string column.
gitlabhq_production=# \d pg_class
Table "pg_catalog.pg_class"
Column | Type | Modifiers
---------------------+-----------+-----------
relname | name | not null
relnamespace | oid | not null
reltype | oid | not null
reloftype | oid | not null
relowner | oid | not null
relam | oid | not null
relfilenode | oid | not null
reltablespace | oid | not null
relpages | integer | not null
reltuples | real | not null
relallvisible | integer | not null
reltoastrelid | oid | not null
relhasindex | boolean | not null
relisshared | boolean | not null
relpersistence | "char" | not null
relkind | "char" | not null
relnatts | smallint | not null
relchecks | smallint | not null
relhasoids | boolean | not null
relhaspkey | boolean | not null
relhasrules | boolean | not null
relhastriggers | boolean | not null
relhassubclass | boolean | not null
relrowsecurity | boolean | not null
relforcerowsecurity | boolean | not null
relispopulated | boolean | not null
relreplident | "char" | not null
relfrozenxid | xid | not null
relminmxid | xid | not null
relacl | aclitem[] |
reloptions | text[] |
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
"pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)
Would reindexing all tables and possibly alter tables help?
You should check whether the query us running for a minute or whether it is blocked behind a database lock. This can be seen from the pg_stat_activity row for the backend, which will show if the query is waiting for a lock or not (state=active and wait_event_type and wait_event indicate a lock).
If it is a lock, get rid of the locking transaction. It may be a prepared transaction, so check for these too.
If there is no lock at fault, it could be that your indexes have become corrupted by the operating system upgrade:
Since PostgreSQL uses operating system collations, database indexes on strings are sorted in collation order and an operating system upgrade can (and often does) lead to changed collations due to bug fixes in the C library, you should rebuild all indexes on string columns after such an upgrade.
The statement that you are showing does not use an index scan, so it should not be affected, but other statements may be.
Also, if you are using Docker, it may be that your container uses its own glibc that was not upgraded, and then you are not affected.

How to delete replication slot in postgres 9.4

I have replication slot which I want to delete but when I do delete I got an error that I can't delete from view. Any ideas?
postgres=# SELECT * FROM pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
--------------+--------------+-----------+--------+----------+--------+------+--------------+-------------
bottledwater | bottledwater | logical | 12141 | postgres | t | | 374036 | E/FE8D9010
(1 row)
postgres=# delete from pg_replication_slots;
ERROR: cannot delete from view "pg_replication_slots"
DETAIL: Views that do not select from a single table or view are not automatically updatable.
HINT: To enable deleting from the view, provide an INSTEAD OF DELETE trigger or an unconditional ON DELETE DO INSTEAD rule.
postgres=#
Use pg_drop_replication_slot:
select pg_drop_replication_slot('bottledwater');
See the docs and this blog.
The replication slot must be inactive, i.e. no active connections. So if there's a streaming replica using the slot you must stop the streaming replica. Or you can change its recovery.conf so it doesn't use a slot anymore and restart it.
As a complement to the accepted answer, I'd like to mention that following command will not fail in case the slot does not exist (this was useful for me because I scripted that).
select pg_drop_replication_slot(slot_name) from pg_replication_slots where slot_name = 'bottledwater';

postgres ALTER TABLE being blocked

Im running Postgres 8.3 and I am having trouble running AN ALTER TABLE ADD COLUMN statement which seems to be blocked by an AccessShareLock when I run this query
SELECT t.relname,l.locktype,page,virtualtransaction,pid,mode,granted FROM pg_locks l, pg_stat_all_tables t WHERE l.relation=t.relid ORDER BY relation asc;
The table's name is dealer.
relname | locktype | page | virtualtransaction | pid | mode | granted
dealer | relation | | 2/40 | 12719 | AccessExclusiveLock | f
dealer | relation | | -1/154985751 | | AccessShareLock | t
I also ran
SELECT * FROM pg_prepared_xacts
That returned
transaction | gid | prepared | owner | database
154985751 | 131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM= | 2014-09-19 08:01:49.650957+10 | user | database
The transaction id 154985751 looks similar to the virtualtransaction in the pg_locks table -1/154985751
I ran this command to view any processes that may be running queries on the database
ps axu | grep postgres | grep -v idle
and have confirmed there are no other processes running queries on the database.
The log file shows this after the query has been run
2014-11-14 17:25:00.794 EST (pid: 12719) LOG: statement: BEGIN;
2014-11-14 17:25:00.794 EST (pid: 12719) LOG: statement: ALTER TABLE dealer ADD bullet1 varchar;
2014-11-14 17:25:01.795 EST (pid: 12719) LOG: process 12719 still waiting for AccessExclusiveLock on relation 2321398 of database 2321293 after 1000.133 ms
2014-11-14 17:25:01.795 EST (pid: 12719) STATEMENT: ALTER TABLE dealer ADD bullet1 varchar;
What could be causing the AccessShareLock on the dealer table? Im guessing it has something to do with the transaction 154985751 is there a way to terminate a transaction with using the virtual id?
You have a prepared transaction in place. Prepared transactions - those where PREPARE TRANSACTION but not COMMIT PREPARED or ROLLBACK PREPARED has been run - hold locks, just like normal running transactions do.
Prepared transactions may be used by XA transaction managers, JTA, etc, not necessarily directly by your app. Many queuing systems use them too. If you don't know what the transaction is and you commit it or roll it back you may disrupt something that is relying on two-phase commit.
If you are certain that you know what it is you can:
COMMIT PREPARED '131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM='
or
ROLLBACK PREPARED '131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM='
depending on whether you wish to commit or abort the prepared xact.
You can't inspect the transaction to see what it did/does, you need to figure out what app/tool created it and why if you don't know what it is.
The identifier looks suspiciously like [number]_[base64]_[base64] so lets see what we can do with that:
postgres=> SELECT decode((string_to_array('131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM=','_'))[2], 'base64');
decode
------------------------------------------------------------------
\x312d613332303361373a623032333a35343130663433313a31633565383939
(1 row)
postgres=> SELECT decode((string_to_array('131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM=','_'))[3], 'base64');
decode
--------------------------------------------------------------
\x613332303361373a623032333a35343130663433313a31633565383963
(1 row)
Hm, looks like ASCII or similar, lets see:
postgres=> SELECT convert_from(decode((string_to_array('131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM=','_'))[2], 'base64'), 'utfpostgres=> SELECT convert_from(decode((string_to_array('131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM=','_'))[2], 'base64'), 'utf-8');
convert_from
---------------------------------
1-a3203a7:b023:5410f431:1c5e899
(1 row)
postgres=> SELECT convert_from(decode((string_to_array('131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM=','_'))[3], 'base64'), 'utf-8');
convert_from
-------------------------------
a3203a7:b023:5410f431:1c5e89c
(1 row)
Looks vaguely GUID/UUID-ish, with odd formatting and grouping.
Maybe those identifiers will help you figure out where the xact came from.
BTW, 8.3 is exceedingly obsolete. Plan your upgrade.