PostgreSQL CREATE INDEX CONCURRENTLY waiting column

I'm trying to create an index on a large table; pg_stat_activity shows the session like this:
-[ RECORD 1 ]----+------------------------------------------
datid            | 25439
datname          | messengerdb
pid              | 30692
usesysid         | 25438
usename          | messengerdb_rw
application_name | pgAdmin III - Przeglądarka
client_addr      | 10.167.12.52
client_hostname  |
client_port      | 50593
backend_start    | 2016-08-11 05:27:12.101452+02
xact_start       | 2016-08-11 05:28:01.535943+02
query_start      | 2016-08-11 05:28:01.535943+02
state_change     | 2016-08-11 05:28:01.535958+02
waiting          | t
state            | active
backend_xid      |
backend_xmin     | 1173740991
query            | CREATE INDEX CONCURRENTLY user_time_idx
                 |   ON core.conversations (user_id ASC NULLS LAST, last_message_timestamp ASC NULLS LAST);
Is this query actually working? I'm worried about the waiting column being 't'. Does that mean it is waiting for a lock or something?

Creating an index concurrently may take a long time, since it does not lock the table against writes and it waits until other transactions have finished. However, it may wait forever if you have connections that stay idle in transaction (for example, when a client or application keeps a transaction open without a commit or rollback).
Check whether there are idle-in-transaction connections (you should be able to see them in the process list). You can also check the PostgreSQL logs.
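A quick way to spot them is to query pg_stat_activity directly. A minimal sketch (note that on 9.6+ the waiting column shown above was replaced by wait_event_type/wait_event, but state works the same):

-- Sessions holding a transaction open while doing nothing
SELECT pid, usename, application_name, xact_start, state_change, state
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;

Any session listed there with an old xact_start is a candidate for holding back the concurrent index build.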
The section about building indexes concurrently in the PostgreSQL documentation can be helpful.

Related

Timescaledb: retention policy isn't removing data from hypertable

(Note: I've also posted this as a GitHub issue: https://github.com/timescale/timescaledb/issues/3653)
I have a hypertable request_logs configured with a 24 hour retention policy. The retention policy is marked as running successfully; however, no old data is being removed from the table, and it continues to grow day by day.
I checked and don't see any errors in the PostgreSQL log files.
I could use additional guidance on where to look for information to troubleshoot this issue.
request_logs table structure
\d+ request_logs;
Table "public.request_logs"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+--------------------------+-----------+----------+---------+---------+--------------+-------------
time | timestamp with time zone | | not null | | plain | |
referer | bigint | | | | plain | |
useragent | bigint | | | | plain | |
Indexes:
"request_logs_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON request_logs FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_37_chunk,
_timescaledb_internal._hyper_1_38_chunk,
_timescaledb_internal._hyper_1_40_chunk
Access method: heap
This is the hypertable description retrieved by running select * from _timescaledb_catalog.hypertable;
id | schema_name | table_name | associated_schema_name | associated_table_prefix | num_dimensions | chunk_sizing_func_schema | chunk_sizing_func_name | chunk_target_size | compression_state | compressed_hypertable_id | replication_factor
----+-------------+--------------+------------------------+-------------------------+----------------+--------------------------+--------------------------+-------------------+-------------------+--------------------------+--------------------
1 | public | request_logs | _timescaledb_internal | _hyper_1 | 1 | _timescaledb_internal | calculate_chunk_interval | 0 | 0 | |
(1 row)
These are the retention policy job stats, retrieved by running SELECT * FROM timescaledb_information.job_stats;
hypertable_schema | hypertable_name | job_id | last_run_started_at | last_successful_finish | last_run_status | job_status | last_run_duration | next_start | total_runs | total_successes | total_failures
-------------------+-----------------+--------+-------------------------------+-------------------------------+-----------------+------------+-------------------+-------------------------------+------------+-----------------+----------------
public | request_logs | 1002 | 2021-10-05 23:59:01.601404+00 | 2021-10-05 23:59:01.638441+00 | Success | Scheduled | 00:00:00.037037 | 2021-10-06 23:59:01.638441+00 | 8 | 8 | 0
| | 1 | 2021-10-05 08:38:20.473945+00 | 2021-10-05 08:38:21.153468+00 | Success | Scheduled | 00:00:00.679523 | 2021-10-06 08:38:21.153468+00 | 45 | 45 | 0
(2 rows)
Relevant system information:
OS: Ubuntu 20.04.3 LTS
PostgreSQL version (output of postgres --version): 12
TimescaleDB version (output of \dx in psql): 2.4.1
Installation method: apt, following the process described at https://docs.timescale.com/timescaledb/latest/how-to-guides/install-timescaledb/self-hosted/ubuntu/installation-apt-ubuntu/#installation-apt-ubuntu
It looks as though this might be related to a bug that was fixed in version 2.4.2 of TimescaleDB. The GitHub report has been updated; if you find that the issue remains after upgrading, please update the issue on GitHub with your example. Thanks for reporting!
Transparency: I work for Timescale
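If you want to double-check the policy while testing the upgrade, the job's configuration and a manual run are both accessible from SQL. A sketch for TimescaleDB 2.x, using job id 1002 from the job_stats output above:

-- Inspect the retention policy's configuration (drop_after etc.)
SELECT job_id, proc_name, schedule_interval, config
FROM timescaledb_information.jobs
WHERE job_id = 1002;

-- Trigger the policy immediately instead of waiting for next_start
CALL run_job(1002);

-- See which chunks remain afterwards
SELECT show_chunks('request_logs');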

Multiple autovacuum: ANALYZE/VACUUM on the same table

I am currently using Postgres 11. Whenever I check the running autovacuum processes, I see multiple autovacuum: ANALYZE or autovacuum: VACUUM entries for the same table.
postgres=# SELECT backend_type,
pid,
now() - xact_start AS duration,
query,
state
FROM pg_stat_activity
WHERE query LIKE 'autovacuum%'
ORDER BY duration DESC LIMIT 64;
backend_type | pid | duration | query | state
-------------------+------+-----------------+-------------------------------------------------------------------+--------
autovacuum worker | 5900 | 00:48:29.175656 | autovacuum: ANALYZE someschema.foo | active
autovacuum worker | 3552 | 00:48:18.72438 | autovacuum: ANALYZE someschema.foo | active
autovacuum worker | 6122 | 00:47:29.907655 | autovacuum: VACUUM someschema.foo | active
autovacuum worker | 5897 | 00:42:45.380395 | autovacuum: VACUUM someschema.bar | active
autovacuum worker | 5583 | 00:38:45.406516 | autovacuum: VACUUM someschema.bar | active
The DDL looks like the following:
Table "someschema.foo"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------------+------+-----------+----------+---------+----------+--------------+-------------
foo_id | text | | not null | | extended | |
baz_id | text | | | | extended | |
Indexes:
"foo_pkey" PRIMARY KEY, btree (foo_id)
"baz_reference_id_idx" UNIQUE, btree (foo_id, baz_id)
Options: autovacuum_vacuum_cost_limit=2000, autovacuum_vacuum_cost_delay=10
I am unsure what is going on, and I cannot find answers in the official documentation:
https://www.postgresql.org/docs/11/routine-vacuuming.html#AUTOVACUUM
https://www.postgresql.org/docs/11/sql-analyze.html
max_worker_processes = 15
autovacuum_max_workers = 64
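Independent of the worker-count question, it is worth checking whether autovacuum is actually completing on the table; pg_stat_user_tables records that. A generic sketch:

-- Last completed (auto)vacuum/analyze runs and current dead-tuple count
SELECT relname,
       last_autovacuum, autovacuum_count,
       last_autoanalyze, autoanalyze_count,
       n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE schemaname = 'someschema' AND relname = 'foo';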

how is backend_start greater than xact_start

How can backend_start be more than 2 days later than xact_start/query_start? The 3rd session looks fine, but the first 2 look weird. Is this possible, and does it mean anything?
pg=> select * from pg_catalog.pg_stat_activity where usename = 'etl_user' and state = 'active' and backend_xmin = 65201266;
 datid | datname | pid   | usesysid | usename  | application_name       | client_addr    | client_hostname | client_port | backend_start                 | xact_start                    | query_start                   | state_change                  | wait_event_type | wait_event | state  | backend_xid | backend_xmin | query                      | backend_type
-------+---------+-------+----------+----------+------------------------+----------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+----------------------------+-----------------
 16408 | pg      | 37908 |   229661 | etl_user | PostgreSQL JDBC Driver |                |                 |             | 2021-04-20 21:36:22.540271+00 | 2021-04-17 22:31:32.314106+00 | 2021-04-17 22:31:32.317577+00 | 2021-04-20 21:36:22.541472+00 |                 |            | active |             |     65201266 | SELECT 1 FROM (SELECT ...) | parallel worker
 16408 | pg      | 37909 |   229661 | etl_user | PostgreSQL JDBC Driver |                |                 |             | 2021-04-20 21:36:22.540909+00 | 2021-04-17 22:31:32.314106+00 | 2021-04-17 22:31:32.317577+00 | 2021-04-20 21:36:22.542134+00 |                 |            | active |             |     65201266 | SELECT 1 FROM (SELECT ...) | parallel worker
 16408 | pg      |  3601 |   229661 | etl_user | PostgreSQL JDBC Driver | 10.175.130.142 |                 |       49832 | 2021-04-17 22:31:32.232008+00 | 2021-04-17 22:31:32.314106+00 | 2021-04-17 22:31:32.317577+00 | 2021-04-17 22:31:32.317578+00 |                 |            | active |             |     65201266 | SELECT 1 FROM (SELECT ...) | client backend
(3 rows)
It looks to me like those are parallel workers started up to help the leader, and they inherit the leader's xact_start, but not its backend_start. It would help to see the rest of the columns in pg_stat_activity, and to know the version.
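On PostgreSQL 13 and later you could confirm this directly, since pg_stat_activity gained a leader_pid column that links a parallel worker to its leader. A sketch (the column does not exist in older releases):

-- Group parallel workers under the client backend they are helping
SELECT pid, leader_pid, backend_type, backend_start, xact_start
FROM pg_stat_activity
WHERE backend_type = 'parallel worker' OR leader_pid IS NOT NULL
ORDER BY leader_pid NULLS FIRST, pid;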
Yes, that looks impossible.
The only explanation that I have is that someone changed the system time since the sessions started.

postgresql remove stale lock

After a system crash, my PostgreSQL database has a lock on a row.
The pg_locks table contains a lot of rows without a pid, e.g.:
select locktype, database, relation, virtualtransaction, pid, mode, granted from pg_locks p1;
locktype | database | relation | virtualtransaction | pid | mode | granted
---------------+----------+----------+--------------------+-------+------------------+---------
relation | 16408 | 31459 | -1/40059 | | AccessShareLock | t
relation | 16408 | 31459 | -1/40059 | | RowExclusiveLock | t
relation | 16408 | 31022 | -1/40060 | | AccessShareLock | t
transactionid | | | -1/40060 | | ExclusiveLock | t
relation | 16408 | 31485 | -1/40060 | | AccessShareLock | t
How do I get the transaction 40060 killed and the locks removed?
OK, I found the solution myself:
Find the gid of the transaction (i.e. 40060 in the case above): select * from pg_prepared_xacts where transaction = 40060;
That returns an awfully long gid.
ROLLBACK PREPARED 'gid'; (the gid must be given as a quoted string literal)
This will clear the locks.
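If there are several orphaned prepared transactions, you can let the server generate the cleanup statements. A sketch; review the output before executing it:

-- Build a ROLLBACK PREPARED statement for every prepared transaction
SELECT format('ROLLBACK PREPARED %L;', gid)
FROM pg_prepared_xacts;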

Flyway migration hangs for postgres CREATE INDEX CONCURRENTLY

I am trying to run a CREATE INDEX CONCURRENTLY command against a Postgres 9.2 database. I implemented a MigrationResolver as shown in issue 655. When this migration step is run via mvn flyway:migrate or similar, the command starts but then hangs in a waiting state.
I verified that the command is executing via the pg_stat_activity table:
test_2015_04_13_110536=# select * from pg_stat_activity;
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | waiting | state | query
-------+------------------------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+---------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
21095 | test_2015_04_13_110536 | 56695 | 16385 | postgres | psql | | | -1 | 2015-04-13 11:10:01.127768-06 | 2015-04-13 11:13:08.936651-06 | 2015-04-13 11:13:08.936651-06 | 2015-04-13 11:13:08.936655-06 | f | active | select * from pg_stat_activity;
21095 | test_2015_04_13_110536 | 56824 | 16385 | postgres | | 127.0.0.1 | | 52437 | 2015-04-13 11:12:55.438927-06 | 2015-04-13 11:12:55.476442-06 | 2015-04-13 11:12:55.487139-06 | 2015-04-13 11:12:55.487175-06 | f | idle in transaction | SELECT "version_rank","installed_rank","version","description","type","script","checksum","installed_on","installed_by","execution_time","success" FROM "public"."schema_version" ORDER BY "version_rank"
21095 | test_2015_04_13_110536 | 56825 | 16385 | postgres | | 127.0.0.1 | | 52438 | 2015-04-13 11:12:55.443687-06 | 2015-04-13 11:12:55.49024-06 | 2015-04-13 11:12:55.49024-06 | 2015-04-13 11:12:55.490241-06 | t | active | CREATE UNIQUE INDEX CONCURRENTLY person_restrict_duplicates_2_idx ON person(name, person_month, person_year)
(3 rows)
An example project that replicates this problem can be found in my github: chrisphelps/flyway-experiment
My suspicion is that the Flyway session that queried the schema_version table, and is now idle in transaction, is preventing Postgres from proceeding with the index creation.
How can I resolve the conflict so that Postgres will proceed with the migration? Has anyone been able to apply this sort of migration to Postgres via Flyway?
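As a general way to confirm such a suspicion: while CREATE INDEX CONCURRENTLY waits, it shows up in pg_locks with an ungranted lock request (typically of locktype virtualxid, on the transactions it must wait out). A sketch:

-- Ungranted lock requests of the waiting session
SELECT locktype, relation::regclass AS relation, virtualxid, mode, pid, granted
FROM pg_locks
WHERE NOT granted;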
In the meantime, there is a resolver included in Flyway which looks for some magic in the filename. Just add the prefix 'NT' (for No-Transaction) to your migration file, e.g.:
V01__usual_migration_1.sql
V02__another_migration.sql
NTV03__migration_that_does_not_run_in_transaction.sql
V04__classical_migration_4.sql
etc.