DBeaver tab holding 'idle in transaction' transaction - PostgreSQL

I am seeing that DBeaver tabs sometimes block a Postgres DB snapshot for an hour or more. The tab is set to 'manual commit - read committed' and definitely shows NO uncommitted work. I have also seen this coming not from a tab but from what DBeaver calls 'Main'.
The transaction in pg_stat_activity looks like the output below: you can see the session is idle in transaction, that a backend_xmin is set, and that pg_catalog.pg_proc was queried, which was definitely not done by the user. From 'Main' I have seen it sitting idle in transaction on SHOW TRANSACTION ISOLATION LEVEL. When I click rollback in the tab, the idle in transaction session immediately goes idle.
I do not want to set a server-side idle_in_transaction_session_timeout for this user. I have already enabled 'automatically end long idle transactions' in DBeaver.
How can I prevent DBeaver from holding transactions open, so that backend_xmin or backend_xid do not grow old and endanger autovacuum's work?
Name |Value |
----------------+-----------------------------------------------------------------------------------------------------------------------------------------+
datid |16417 |
datname |<removed> |
pid |18974 |
leader_pid | |
usesysid |16394 |
usename |sys |
application_name|DBeaver - SQLEditor <Script-10.sql> |
client_addr |10.135.31.67 |
client_hostname | |
client_port |65397 |
backend_start |2023-02-08 12:12:45.098 +0000 |
xact_start |2023-02-08 12:16:23.121 +0000 |
query_start |2023-02-08 12:16:23.676 +0000 |
state_change |2023-02-08 12:16:23.677 +0000 |
wait_event_type |Client |
wait_event |ClientRead |
state |idle in transaction |
backend_xid | |
backend_xmin |222236770 |
query_id | |
query |SELECT pp.oid as poid, pp.* FROM pg_catalog.pg_proc pp WHERE pp.proname ILIKE $1 AND pp.pronamespace IN ($2) ORDER BY pp.proname LIMIT 10|
backend_type |client backend
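Such sessions can be spotted with a query along these lines (just a sketch over pg_stat_activity; age(backend_xmin) reports how many transaction IDs old the held-back snapshot is):
SELECT pid, usename, application_name, state,
       now() - xact_start AS xact_age,
       age(backend_xmin)  AS xmin_age_in_xids
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;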

Related

Flyway sessions locking each other

I was trying to apply the following migration through Flyway:
create index concurrently if not exists api_client_system_role_idx2 on profile.api_client_system_role (api_client_id);
create index concurrently if not exists api_client_system_role_idx3 on profile.api_client_system_role (role_type_id);
create index concurrently if not exists api_key_idx2 on profile.api_key (api_client_id);
However, the Flyway sessions were blocking each other and the script is stuck in the "Pending" state.
+-----------+---------+------------------------------+------+---------------------+---------+
| Category  | Version | Description                  | Type | Installed On        | State   |
+-----------+---------+------------------------------+------+---------------------+---------+
| Versioned | 20.1    | add email verification table | SQL  | 2021-11-01 21:55:52 | Success |
| Versioned | 21.1    | create role for doc api      | SQL  | 2021-11-01 21:55:52 | Success |
| Versioned | 22      | create indexes for profile   | SQL  | 2022-10-21 10:23:41 | Success |
| Versioned | 23      | test flyway                  | SQL  |                     | Pending |
+-----------+---------+------------------------------+------+---------------------+---------+
Flyway: Flyway Community Edition 9.3.1 by Redgate
Database: PostgreSQL 14.4
Can you please advise how to properly implement creating indexes concurrently in PostgreSQL?
I've tried simply killing the blocking session and letting the script continue, however the implementation then failed and the script stayed in "Pending" status.
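While the migration is stuck, it can help to see exactly which session is blocking the CREATE INDEX CONCURRENTLY (which cannot run inside a transaction block and has to wait out other open transactions). A diagnostic sketch using pg_blocking_pids(), available since PostgreSQL 9.6:
-- which backend is blocking each CREATE INDEX CONCURRENTLY, and what is it doing
SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.state AS blocking_state,
       blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid) ON true
JOIN pg_stat_activity blocking ON blocking.pid = b.pid
WHERE blocked.query ILIKE 'create index concurrently%';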

Why does a Python Alembic Postgres migration leave locks that prevent future migrations?

I have a PostgreSQL database that I want to run Alembic migrations on. After the migrations, locks are still present:
SELECT l.pid, l.locktype, l.mode
FROM pg_locks l
INNER JOIN pg_stat_activity s ON (l.pid = s.pid) where usename='migrations_user';
pid | locktype | mode
-------+------------+-----------------
19918 | relation | AccessShareLock
19918 | relation | AccessShareLock
19918 | relation | AccessShareLock
19918 | relation | AccessShareLock
19918 | virtualxid | ExclusiveLock
(5 rows)
The database is owned by db_owner and I ran grant db_owner to migrations_user; so it has all the permissions to run the migration, and the migration succeeds. But a follow-up migration on the same table fails because of these locks, and I have to go in and manually run pg_terminate_backend on the sessions holding them to proceed.
What could be causing this? The alembic migration code is pretty standard:
def run_migrations_online():
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    LOG.info("Running run_migrations_online")
    connectable = create_engine(get_conn_url_from_env())
    with connectable.connect() as connection:
        context.configure(connection=connection, target_metadata=target_metadata)
        with context.begin_transaction():
            context.run_migrations()
    connectable.dispose()
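While those locks are present, checking the session state on the PostgreSQL side tells you whether the migration connection is still sitting in an open transaction (a quick sketch, reusing the migrations_user filter from above):
SELECT pid, state, xact_start, state_change, query
FROM pg_stat_activity
WHERE usename = 'migrations_user';
If pid 19918 shows state = 'idle in transaction', the locks most likely belong to a transaction on that connection that was never committed or rolled back.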

GitLab encountering issues with PostgreSQL after OS upgrade to RHEL 7.6

We recently upgraded the OS:
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
After upgrading, we are facing a lot of issues with GitLab (predominantly with Postgres).
Our GitLab is dockerized, i.e. GitLab (and all its internal services, including PostgreSQL) runs inside a single container. The container does not have its own glibc, so it uses the one from the OS.
ERROR: canceling statement due to statement timeout
STATEMENT:
SELECT relnamespace::regnamespace as schemaname,
relname as relname,
pg_total_relation_size(oid) bytes FROM pg_class WHERE relkind = 'r';
The timeout messages appear continuously and this results in users facing 502 errors when accessing GitLab.
I checked the statement timeout set on the database.
gitlabhq_production=# show statement_timeout;
statement_timeout
-------------------
1min
(1 row)
I don't know what to make of this. This is probably the default setting. Is this an issue with postgres? What does this mean? Anything I can do to fix this?
EDIT:
I checked pg_stat_activity and don't see any locks, as the server was rebooted earlier. The same query is running fine now, but we keep seeing this issue intermittently.
I ran \d pg_class to check whether the table uses any indexes and also to check the string column.
gitlabhq_production=# \d pg_class
Table "pg_catalog.pg_class"
Column | Type | Modifiers
---------------------+-----------+-----------
relname | name | not null
relnamespace | oid | not null
reltype | oid | not null
reloftype | oid | not null
relowner | oid | not null
relam | oid | not null
relfilenode | oid | not null
reltablespace | oid | not null
relpages | integer | not null
reltuples | real | not null
relallvisible | integer | not null
reltoastrelid | oid | not null
relhasindex | boolean | not null
relisshared | boolean | not null
relpersistence | "char" | not null
relkind | "char" | not null
relnatts | smallint | not null
relchecks | smallint | not null
relhasoids | boolean | not null
relhaspkey | boolean | not null
relhasrules | boolean | not null
relhastriggers | boolean | not null
relhassubclass | boolean | not null
relrowsecurity | boolean | not null
relforcerowsecurity | boolean | not null
relispopulated | boolean | not null
relreplident | "char" | not null
relfrozenxid | xid | not null
relminmxid | xid | not null
relacl | aclitem[] |
reloptions | text[] |
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
"pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)
Would reindexing all tables, and possibly altering tables, help?
You should check whether the query is really running for a minute or whether it is blocked behind a database lock. This can be seen from the pg_stat_activity row for the backend, which shows whether the query is waiting for a lock (state is active, and wait_event_type and wait_event indicate a lock wait).
If it is a lock, get rid of the locking transaction. It may be a prepared transaction, so check for these too.
If there is no lock at fault, it could be that your indexes have become corrupted by the operating system upgrade:
PostgreSQL uses operating system collations, so database indexes on strings are sorted in collation order. An operating system upgrade can (and often does) change those collations because of bug fixes in the C library, so you should rebuild all indexes on string columns after such an upgrade.
The statement that you are showing does not use an index scan, so it should not be affected, but other statements may be.
Also, if you are using Docker, it may be that your container uses its own glibc that was not upgraded, and then you are not affected.
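Concretely, the checks above boil down to something like this (only a sketch; REINDEX DATABASE rebuilds every index in the database, which is the blunt but simple way to deal with a suspected collation change):
-- is the statement actually waiting for a lock?
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state = 'active';

-- are prepared transactions holding locks?
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- if a collation change is suspected, rebuild the indexes
REINDEX DATABASE gitlabhq_production;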

Google Cloud SQL VERY SLOW

I am thinking of migrating my website to Google Cloud SQL, and I signed up for a free account (D32).
Upon testing on a table with 23k records, performance was very poor, so I read that moving from the free account to a full paid account would give me access to a faster CPU and HDD... so I did.
Performance is still VERY POOR.
I have been running my own MySQL server for years now, upgrading as needed to handle more and more connections and to gain raw speed (needed because of a legacy application). I heavily optimize tables and configuration, make heavy use of the query cache, etc.
A few pages of our legacy system run over 1.5k queries per page. Currently I am able to keep the total MySQL query time (execution and pulling of the data) down to 3.6 seconds for all those queries, meaning MySQL takes about 0.0024 seconds per query to execute and return the values: not the greatest, but acceptable for those pages.
I uploaded a table involved in many of those queries to Google Cloud SQL. I noticed that an INSERT already takes SECONDS to execute instead of milliseconds, but I think that might be the sync vs. async setting. I changed it to async and the execution time for the INSERT doesn't seem to change. Not a big problem for now; I am only testing queries at this point.
I run a simple select * FROM <table> and notice that it takes over 6 seconds. I think that maybe the query cache needs to be built, so I try again, and this time it takes 4 seconds (excluding network traffic). I run the same query on my backup server after a restart, with no connections at all, and it takes less than 1 second; running it again, 0.06 seconds.
Maybe the problem is the cache, too big... let's try a smaller subset
select * from <table> limit 5;
to my server: 0.00 seconds
GCS: 0.04
so I decide to try a dumb select on an empty table, no records at all, just created with only 1 field
to my server: 0.00 seconds
GCS: 0.03
Profiling doesn't give any insight, except that the query cache is not running on Google Cloud SQL and that query execution seems faster, but... it is not.
My Server:
mysql> show profile;
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000225 |
| Waiting for query cache lock | 0.000116 |
| init | 0.000115 |
| checking query cache for query | 0.000131 |
| checking permissions | 0.000117 |
| Opening tables | 0.000124 |
| init | 0.000129 |
| System lock | 0.000124 |
| Waiting for query cache lock | 0.000114 |
| System lock | 0.000126 |
| optimizing | 0.000117 |
| statistics | 0.000127 |
| executing | 0.000129 |
| end | 0.000117 |
| query end | 0.000116 |
| closing tables | 0.000120 |
| freeing items | 0.000120 |
| Waiting for query cache lock | 0.000140 |
| freeing items | 0.000228 |
| Waiting for query cache lock | 0.000120 |
| freeing items | 0.000121 |
| storing result in query cache | 0.000116 |
| cleaning up | 0.000124 |
+--------------------------------+----------+
23 rows in set, 1 warning (0.00 sec)
Google Cloud SQL:
mysql> show profile;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000061 |
| checking permissions | 0.000012 |
| Opening tables | 0.000115 |
| System lock | 0.000019 |
| init | 0.000023 |
| optimizing | 0.000008 |
| statistics | 0.000012 |
| preparing | 0.000005 |
| executing | 0.000021 |
| end | 0.000024 |
| query end | 0.000007 |
| closing tables | 0.000030 |
| freeing items | 0.000018 |
| logging slow query | 0.000006 |
| cleaning up | 0.000005 |
+----------------------+----------+
15 rows in set (0.03 sec)
Keep in mind that I connect to both servers remotely from a server located in VA, and my own server is located in Texas (even if that should not matter much).
What am I doing wrong? Why do simple queries take this long? Am I missing or not understanding something here?
As of right now I won't be able to use Google Cloud SQL, because a page with 1500 queries would take way too long (circa 45 seconds).
I know this question is old but....
Cloud SQL has poor support for MyISAM tables; it's recommended to use InnoDB.
We had poor performance when migrating a legacy app; after reading through the docs and contacting the paid support, we had to migrate the tables to InnoDB. The lack of a query cache was also a killer.
You may also find later on that you need to tweak the MySQL configuration via the 'flags' in the Google console. One example: 'wait_timeout' is set too high by default (IMO).
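As an illustration (the table name below is just a placeholder), the MyISAM-to-InnoDB switch is a plain ALTER TABLE, and you can list what is still on MyISAM first:
-- find tables still using MyISAM (system schemas excluded)
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');

-- convert one table (placeholder name) to InnoDB
ALTER TABLE my_legacy_table ENGINE = InnoDB;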
Hope this helps someone :)
The query cache is not yet a feature of Cloud SQL, which may explain the results. However, I recommend closing this question, as it is quite broad and doesn't fit the format of a neat and tidy Q&A. There are too many unmentioned variables at play, and it isn't clear what a decisive "answer" to such a general optimization question would look like.

postgres ALTER TABLE being blocked

I'm running Postgres 8.3 and I am having trouble running an ALTER TABLE ... ADD COLUMN statement, which seems to be blocked by an AccessShareLock, as shown when I run this query:
SELECT t.relname,l.locktype,page,virtualtransaction,pid,mode,granted FROM pg_locks l, pg_stat_all_tables t WHERE l.relation=t.relid ORDER BY relation asc;
The table's name is dealer.
relname | locktype | page | virtualtransaction | pid | mode | granted
dealer | relation | | 2/40 | 12719 | AccessExclusiveLock | f
dealer | relation | | -1/154985751 | | AccessShareLock | t
I also ran
SELECT * FROM pg_prepared_xacts
That returned
transaction | gid | prepared | owner | database
154985751 | 131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM= | 2014-09-19 08:01:49.650957+10 | user | database
The transaction id 154985751 looks like it corresponds to the virtualtransaction -1/154985751 in the pg_locks table.
I ran this command to view any processes that may be running queries on the database
ps axu | grep postgres | grep -v idle
and have confirmed there are no other processes running queries on the database.
The log file shows this after the query has been run
2014-11-14 17:25:00.794 EST (pid: 12719) LOG: statement: BEGIN;
2014-11-14 17:25:00.794 EST (pid: 12719) LOG: statement: ALTER TABLE dealer ADD bullet1 varchar;
2014-11-14 17:25:01.795 EST (pid: 12719) LOG: process 12719 still waiting for AccessExclusiveLock on relation 2321398 of database 2321293 after 1000.133 ms
2014-11-14 17:25:01.795 EST (pid: 12719) STATEMENT: ALTER TABLE dealer ADD bullet1 varchar;
What could be causing the AccessShareLock on the dealer table? I'm guessing it has something to do with transaction 154985751. Is there a way to terminate a transaction using the virtual id?
You have a prepared transaction in place. Prepared transactions - those where PREPARE TRANSACTION but not COMMIT PREPARED or ROLLBACK PREPARED has been run - hold locks, just like normal running transactions do.
Prepared transactions may be used by XA transaction managers, JTA, etc., not necessarily directly by your app. Many queuing systems use them too. If you don't know what the transaction is and you commit it or roll it back, you may disrupt something that relies on two-phase commit.
If you are certain that you know what it is you can:
COMMIT PREPARED '131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM='
or
ROLLBACK PREPARED '131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM='
depending on whether you wish to commit or abort the prepared xact.
You can't inspect the transaction to see what it did or does; if you don't know what it is, you need to figure out what app/tool created it and why.
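You can at least see which locks the prepared transaction is holding before you decide. A rough sketch, relying on the '-1/<xid>' form that prepared transactions use in pg_locks.virtualtransaction (as in your output above):
SELECT x.gid, x.prepared, x.owner,
       l.locktype, l.relation::regclass AS relation, l.mode
FROM pg_prepared_xacts x
JOIN pg_locks l ON l.virtualtransaction = '-1/' || x.transaction;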
The identifier looks suspiciously like [number]_[base64]_[base64], so let's see what we can do with that:
postgres=> SELECT decode((string_to_array('131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM=','_'))[2], 'base64');
decode
------------------------------------------------------------------
\x312d613332303361373a623032333a35343130663433313a31633565383939
(1 row)
postgres=> SELECT decode((string_to_array('131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM=','_'))[3], 'base64');
decode
--------------------------------------------------------------
\x613332303361373a623032333a35343130663433313a31633565383963
(1 row)
Hm, looks like ASCII or similar, let's see:
postgres=> SELECT convert_from(decode((string_to_array('131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM=','_'))[2], 'base64'), 'utf-8');
convert_from
---------------------------------
1-a3203a7:b023:5410f431:1c5e899
(1 row)
postgres=> SELECT convert_from(decode((string_to_array('131075_MS1hMzIwM2E3OmIwMjM6NTQxMGY0MzE6MWM1ZTg5OQ==_YTMyMDNhNzpiMDIzOjU0MTBmNDMxOjFjNWU4OWM=','_'))[3], 'base64'), 'utf-8');
convert_from
-------------------------------
a3203a7:b023:5410f431:1c5e89c
(1 row)
Looks vaguely GUID/UUID-ish, with odd formatting and grouping.
Maybe those identifiers will help you figure out where the xact came from.
BTW, 8.3 is exceedingly obsolete. Plan your upgrade.