Flyway migration hangs for postgres CREATE INDEX CONCURRENTLY - postgresql

I am trying to run a CREATE INDEX CONCURRENTLY command against a Postgres 9.2 database. I implemented a MigrationResolver as shown in issue 655. When this migration step is run via mvn flyway:migrate or similar, the command starts but hangs in waiting mode.
I verified that the command is executing via the pg_stat_activity table:
test_2015_04_13_110536=# select * from pg_stat_activity;
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | waiting | state | query
-------+------------------------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+---------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
21095 | test_2015_04_13_110536 | 56695 | 16385 | postgres | psql | | | -1 | 2015-04-13 11:10:01.127768-06 | 2015-04-13 11:13:08.936651-06 | 2015-04-13 11:13:08.936651-06 | 2015-04-13 11:13:08.936655-06 | f | active | select * from pg_stat_activity;
21095 | test_2015_04_13_110536 | 56824 | 16385 | postgres | | 127.0.0.1 | | 52437 | 2015-04-13 11:12:55.438927-06 | 2015-04-13 11:12:55.476442-06 | 2015-04-13 11:12:55.487139-06 | 2015-04-13 11:12:55.487175-06 | f | idle in transaction | SELECT "version_rank","installed_rank","version","description","type","script","checksum","installed_on","installed_by","execution_time","success" FROM "public"."schema_version" ORDER BY "version_rank"
21095 | test_2015_04_13_110536 | 56825 | 16385 | postgres | | 127.0.0.1 | | 52438 | 2015-04-13 11:12:55.443687-06 | 2015-04-13 11:12:55.49024-06 | 2015-04-13 11:12:55.49024-06 | 2015-04-13 11:12:55.490241-06 | t | active | CREATE UNIQUE INDEX CONCURRENTLY person_restrict_duplicates_2_idx ON person(name, person_month, person_year)
(3 rows)
An example project that replicates this problem can be found in my github: chrisphelps/flyway-experiment
My suspicion is that the Flyway query against schema_version, which is idle in transaction, is preventing Postgres from proceeding with the index creation.
How can I resolve the conflict so that postgres will proceed with the migration? Has anyone been able to apply this sort of migration to postgres via flyway?

In the meantime, there is a Resolver included in Flyway which looks for some magic in the filename.
Just add the prefix 'NT' (for No-Transaction) to your migration file, for example:
V01__usual_migration_1.sql
V02__another_migration.sql
NTV03__migration_that_does_not_run_in_transaction.sql
V04__classical_migration_4.sql
etc.
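To make the naming convention above concrete, here is a minimal Python sketch that only classifies file names by the 'NT' prefix. It is not Flyway's resolver code; the regular expression and the classify function are assumptions made for illustration.

```python
import re

# Hypothetical pattern for the convention described above: an optional "NT"
# prefix, then "V", a version, a double underscore, a description and ".sql".
MIGRATION_NAME = re.compile(r"^(NT)?V(\d+(?:[._]\d+)*)__(.+)\.sql$")


def classify(filename):
    """Split a migration file name into version, description and a flag
    telling whether it is meant to run inside a transaction."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError("not a recognised migration name: " + filename)
    return {
        "version": match.group(2),
        "description": match.group(3),
        # The "NT" prefix marks a migration that must run without a transaction.
        "transactional": match.group(1) is None,
    }


if __name__ == "__main__":
    for name in [
        "V02__another_migration.sql",
        "NTV03__migration_that_does_not_run_in_transaction.sql",
    ]:
        print(name, classify(name))
```

A CREATE INDEX CONCURRENTLY migration would go into a file of the second kind, since Postgres refuses to run that statement inside a transaction block.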


PostgreSQL insert performance - why would it be so slow?

I've got a PostgreSQL database running inside a docker container on an AWS Linux instance. I've got some telemetry running, uploading records in batches of ten. A Python server inserts these records into the database. The table looks like this:
postgres=# \d raw_journey_data ;
Table "public.raw_journey_data"
Column | Type | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
email | character varying | | |
t | timestamp without time zone | | |
lat | numeric(20,18) | | |
lng | numeric(21,18) | | |
speed | numeric(21,18) | | |
There aren't that many rows in the table; about 36,000 presently. But committing the transactions that insert the data is taking about a minute each time:
postgres=# SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
pid | age | usename | query
-----+-----------------+----------+--------
30 | | |
32 | | postgres |
28 | | |
27 | | |
29 | | |
37 | 00:00:11.439313 | postgres | COMMIT
36 | 00:00:11.439565 | postgres | COMMIT
39 | 00:00:36.454011 | postgres | COMMIT
56 | 00:00:36.457828 | postgres | COMMIT
61 | 00:00:56.474446 | postgres | COMMIT
35 | 00:00:56.474647 | postgres | COMMIT
(11 rows)
The load average on the system's CPUs is zero and about half of the 4GB system RAM is available (as shown by free). So what causes the super-slow commits here?
The insertion is being done with SqlAlchemy:
db.session.execute(import_table.insert([
    {
        "email": current_user.email,
        "t": row.t.ToDatetime(),
        "lat": row.lat,
        "lng": row.lng,
        "speed": row.speed
    }
    for row in data.data
]))
Edit: updated with the state column:
postgres=# SELECT pid, age(clock_timestamp(), query_start), usename, state, query
FROM pg_stat_activity
WHERE query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
pid | age | usename | state | query
-----+-----------------+----------+-------+--------
32 | | postgres | |
30 | | | |
28 | | | |
27 | | | |
29 | | | |
46 | 00:00:08.390177 | postgres | idle | COMMIT
49 | 00:00:08.390348 | postgres | idle | COMMIT
45 | 00:00:23.35249 | postgres | idle | COMMIT
(8 rows)

How is backend_start greater than xact_start

How can backend_start be more than 2 days later than xact_start/query_start? The 3rd session looks fine, but the first 2 look weird. Is this possible? Does it mean anything?
pg=> select * from pg_catalog.pg_stat_activity where usename = 'etl_user' and state = 'active' and backend_xmin = 65201266;
datid | datname | pid |usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event| state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+----------+------------------------+----------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------
16408 | pg| 37908 | 229661 | etl_user | PostgreSQL JDBC Driver | | | | 2021-04-20 21:36:22.540271+00 | 2021-04-17 22:31:32.314106+00 | 2021-04-17 22:31:32.317577+00 | 2021-04-20 21:36:22.541472+00 | || active | | 65201266 | SELECT 1 FROM (SELECT ...) | parallel worker
16408 | pg| 37909 | 229661 | etl_user | PostgreSQL JDBC Driver | | | | 2021-04-20 21:36:22.540909+00 | 2021-04-17 22:31:32.314106+00 | 2021-04-17 22:31:32.317577+00 | 2021-04-20 21:36:22.542134+00 | || active | | 65201266 | SELECT 1 FROM (SELECT ...) | parallel worker
16408 | pg| 3601 | 229661 | etl_user | PostgreSQL JDBC Driver | 10.175.130.142 | | 49832 | 2021-04-17 22:31:32.232008+00 | 2021-04-17 22:31:32.314106+00 | 2021-04-17 22:31:32.317577+00 | 2021-04-17 22:31:32.317578+00 | || active | | 65201266 | SELECT 1 FROM (SELECT ...) | client backend
(3 rows)
It looks to me like those are parallel workers started up to help the leader, and they inherit the leader's xact_start, but not its backend_start. It would help to see the rest of the columns in pg_stat_activity, and to know the version.
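The parallel-worker explanation can be checked mechanically against rows like the ones in the question. The following is a minimal sketch, assuming each row is available as a dict of strings keyed by column name and that timestamps carry fractional seconds and a UTC offset as psql prints them; the function names are made up for illustration.

```python
import re
from datetime import datetime


def parse_pg_timestamp(text):
    """Parse a timestamptz as printed by psql, e.g. '2021-04-20 21:36:22.540271+00'."""
    # psql prints hour-only offsets such as "+00"; pad them to "+0000" for %z.
    text = re.sub(r"([+-]\d{2})$", r"\g<1>00", text.strip())
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f%z")


def started_after_transaction(row):
    """True when the backend process is younger than the transaction it reports."""
    return parse_pg_timestamp(row["backend_start"]) > parse_pg_timestamp(row["xact_start"])


def explained_by_parallel_worker(row):
    """True when the odd ordering belongs to a parallel worker, which reports
    the leader's xact_start but its own backend_start."""
    return started_after_transaction(row) and row["backend_type"] == "parallel worker"


if __name__ == "__main__":
    worker = {
        "backend_start": "2021-04-20 21:36:22.540271+00",
        "xact_start": "2021-04-17 22:31:32.314106+00",
        "backend_type": "parallel worker",
    }
    print(explained_by_parallel_worker(worker))
```

A row where started_after_transaction is true but backend_type is not 'parallel worker' would need a different explanation.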
Yes, that looks impossible.
The only explanation that I have is that someone changed the system time since the sessions started.

CREATE DATABASE never ends

I cannot create a database with Postgres 9.6.12. Looking at pg_activity, there are no blocking or waiting queries.
This is my query:
-[ RECORD 1 ]----+------------------------------------
datid | 16390
datname | mydb
pid | 7275
usesysid | 10
usename | postgres96
application_name | pgAdmin III - Query Tool
client_addr | myip
client_hostname | mypc
client_port | 55202
backend_start | 2019-07-22 09:12:11.238705-04
xact_start | 2019-07-22 09:12:13.010278-04
query_start | 2019-07-22 09:12:13.010278-04
state_change | 2019-07-22 09:12:13.010282-04
wait_event_type |
wait_event |
state | active
backend_xid | 991367173
backend_xmin | 991367173
query | CREATE DATABASE mydb2\r +
| WITH OWNER = postgres96\r +
| ENCODING = 'UTF8'\r +
| TABLESPACE = system\r +
| LC_COLLATE = 'en_US.UTF-8'\r+
| LC_CTYPE = 'en_US.UTF-8'\r +
| CONNECTION LIMIT = -1;
Why is it taking so long?
Well... after dropping all pglogical subscriptions and restarting the service, I could create the database (I couldn't after a simple restart).

postgresql remove stale lock

After a system crash, my PostgreSQL database has a lock on a row.
The pg_locks table contains a lot of rows without a pid, e.g.:
select locktype,database,relation,virtualtransaction, pid,mode,granted from pg_locks p1;
locktype | database | relation | virtualtransaction | pid | mode | granted
---------------+----------+----------+--------------------+-------+------------------+---------
relation | 16408 | 31459 | -1/40059 | | AccessShareLock | t
relation | 16408 | 31459 | -1/40059 | | RowExclusiveLock | t
relation | 16408 | 31022 | -1/40060 | | AccessShareLock | t
transactionid | | | -1/40060 | | ExclusiveLock | t
relation | 16408 | 31485 | -1/40060 | | AccessShareLock | t
How do I get the transaction 40060 killed and the locks removed?
OK, I found the solution myself:
Find the gid of the transaction (40060 in the case above) with select * from pg_prepared_xacts where transaction = 40060;
This returns an awfully long gid.
Run ROLLBACK PREPARED 'gid'; with that gid as a quoted string.
This will clear the locks.
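The steps above can be sketched as a small Python helper that only builds the SQL text; nothing is sent to the server. The function names are hypothetical, and the quoting assumes standard_conforming_strings is on, so that doubling single quotes is enough.

```python
def xid_from_virtualtransaction(vxid):
    """Extract the transaction id from a pg_locks virtualtransaction value
    such as '-1/40060' (the form shown for the pid-less rows above)."""
    backend, _, local = vxid.partition("/")
    if backend.strip() != "-1" or not local.strip().isdigit():
        raise ValueError("not a prepared-transaction vxid: " + vxid)
    return int(local)


def lookup_statement(xid):
    """SQL that finds the gid of the prepared transaction with this xid."""
    return "SELECT gid FROM pg_prepared_xacts WHERE transaction = %d;" % xid


def quote_literal(value):
    """Quote a string as an SQL literal by doubling embedded single quotes
    (assumes standard_conforming_strings = on)."""
    return "'" + value.replace("'", "''") + "'"


def rollback_statement(gid):
    """SQL that rolls back the prepared transaction and releases its locks."""
    return "ROLLBACK PREPARED " + quote_literal(gid) + ";"


if __name__ == "__main__":
    xid = xid_from_virtualtransaction("-1/40060")
    print(lookup_statement(xid))
    # 'some-long-gid' is a placeholder; use the gid the lookup returns.
    print(rollback_statement("some-long-gid"))
```

Run the first statement in psql, copy the gid it returns, and pass that to the second.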

PostgreSQL CREATE INDEX CONCURRENTLY waiting column

I'm trying to create an index on a large table:
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | waiting | state | backend_xid | backend_xmin | query
-------+----------------------+-------+----------+-------------------------+-----------------------------+--------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+---------+--------+-------------+--------------+-------------------------------------------------------------------------------------------
25439 | messengerdb | 30692 | 25438 | messengerdb_rw | pgAdmin III - Przeglądarka | 10.167.12.52 | | 50593 | 2016-08-11 05:27:12.101452+02 | 2016-08-11 05:28:01.535943+02 | 2016-08-11 05:28:01.535943+02 | 2016-08-11 05:28:01.535958+02 | t | active | | 1173740991 | CREATE INDEX CONCURRENTLY user_time_idx +
| | | | | | | | | | | | | | | | | ON core.conversations (user_id ASC NULLS LAST, last_message_timestamp ASC NULLS LAST);+
Is this query working? I'm worried that the 'waiting' column is 't'. Does it mean that the query is waiting for a lock or something?
Creating an index concurrently may take a long time, since it does not block writes to the table and it waits until other transactions have finished. However, it may wait forever if you have connections that stay idle in transaction (for example, when a client or application keeps a connection open without a rollback/commit).
Check whether any connections are idle in transaction (you should be able to see them in the process list). You can also check the PostgreSQL logs.
The section on building indexes concurrently in the PostgreSQL documentation can be helpful. There is also a nice article about concurrent indexes under this link.
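The check for idle-in-transaction sessions suggested above could look like the following minimal sketch. It assumes a server where pg_stat_activity has a state column (as in the output in the question) and that the rows have already been fetched as dicts; the query constant and function name are made up for illustration.

```python
# Query to run in psql or through any driver; it lists sessions that hold a
# transaction open without doing work, oldest transaction first.
IDLE_IN_TX_QUERY = """
SELECT pid, usename, xact_start, state, query
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;
"""


def idle_in_transaction(rows):
    """Filter already-fetched pg_stat_activity rows (dicts) down to sessions
    that are idle inside an open transaction."""
    return [
        row for row in rows
        # state is NULL (None) for some background processes.
        if (row.get("state") or "").startswith("idle in transaction")
    ]


if __name__ == "__main__":
    sample = [
        {"pid": 101, "state": "active"},
        {"pid": 102, "state": "idle in transaction"},
        {"pid": 103, "state": None},
    ]
    for row in idle_in_transaction(sample):
        print("possible blocker:", row["pid"])
```

Any session this returns is a candidate for what the concurrent index build is waiting on. Whether to commit, roll back or terminate it depends on the application that owns it.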