We recently upgraded the OS:
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
After upgrading, we are facing a lot of issues with GitLab (predominantly with Postgres).
Our GitLab is dockerized, i.e. GitLab (and all its internal services, including PostgreSQL) runs inside a single container. The container does not have its own glibc, so it uses the one from the OS.
ERROR: canceling statement due to statement timeout
STATEMENT:
SELECT relnamespace::regnamespace as schemaname,
relname as relname,
pg_total_relation_size(oid) bytes FROM pg_class WHERE relkind = 'r';
The timeout messages appear continuously and this results in users facing 502 errors when accessing GitLab.
I checked the statement timeout set on the database.
gitlabhq_production=# show statement_timeout;
statement_timeout
-------------------
1min
(1 row)
I don't know what to make of this. This is probably the default setting. Is this an issue with postgres? What does this mean? Anything I can do to fix this?
EDIT:
I checked pg_stat_activity and don't see any locks, as the server was rebooted earlier. The same query runs fine now, but we keep seeing this issue intermittently.
I ran \d pg_class to check whether the table has any indexes and to look at its string column.
gitlabhq_production=# \d pg_class
Table "pg_catalog.pg_class"
Column | Type | Modifiers
---------------------+-----------+-----------
relname | name | not null
relnamespace | oid | not null
reltype | oid | not null
reloftype | oid | not null
relowner | oid | not null
relam | oid | not null
relfilenode | oid | not null
reltablespace | oid | not null
relpages | integer | not null
reltuples | real | not null
relallvisible | integer | not null
reltoastrelid | oid | not null
relhasindex | boolean | not null
relisshared | boolean | not null
relpersistence | "char" | not null
relkind | "char" | not null
relnatts | smallint | not null
relchecks | smallint | not null
relhasoids | boolean | not null
relhaspkey | boolean | not null
relhasrules | boolean | not null
relhastriggers | boolean | not null
relhassubclass | boolean | not null
relrowsecurity | boolean | not null
relforcerowsecurity | boolean | not null
relispopulated | boolean | not null
relreplident | "char" | not null
relfrozenxid | xid | not null
relminmxid | xid | not null
relacl | aclitem[] |
reloptions | text[] |
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
"pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)
Would reindexing all tables and possibly alter tables help?
You should check whether the query is really running for a minute or whether it is blocked behind a database lock. You can see this in the pg_stat_activity row for the backend: if state is 'active' and wait_event_type is 'Lock' (wait_event then names the kind of lock), the query is waiting for a lock rather than doing work.
If it is a lock, get rid of the transaction holding it. That may be a prepared transaction, so check the pg_prepared_xacts view too.
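A minimal sketch of the check described above, assuming you have already fetched pg_stat_activity rows as dictionaries with a driver of your choice. The helper names and sample rows are hypothetical; the SQL only uses built-in columns and pg_blocking_pids(), which exists in PostgreSQL 9.6 and later.

```python
from typing import Any, Dict, List

# Run this with your own client; pg_blocking_pids(pid) lists the PIDs
# holding the lock that `pid` is waiting for.
ACTIVITY_SQL = """
SELECT pid, state, wait_event_type, wait_event,
       pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
"""

# Prepared transactions can hold locks without any connected session.
PREPARED_SQL = "SELECT gid, prepared, owner, database FROM pg_prepared_xacts"


def is_waiting_on_lock(row: Dict[str, Any]) -> bool:
    """True if the backend is executing a statement but is blocked on a lock."""
    return row.get("state") == "active" and row.get("wait_event_type") == "Lock"


def lock_waiters(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the pid and blocking PIDs of every backend waiting on a lock."""
    return [
        {"pid": r["pid"], "blocked_by": list(r.get("blocked_by") or [])}
        for r in rows
        if is_waiting_on_lock(r)
    ]


# Made-up rows: 101 waits on 202; 303 is genuinely running (no wait event).
rows = [
    {"pid": 101, "state": "active", "wait_event_type": "Lock",
     "wait_event": "relation", "blocked_by": [202]},
    {"pid": 202, "state": "idle in transaction", "wait_event_type": "Client",
     "wait_event": "ClientRead", "blocked_by": []},
    {"pid": 303, "state": "active", "wait_event_type": None,
     "wait_event": None, "blocked_by": []},
]
print(lock_waiters(rows))  # [{'pid': 101, 'blocked_by': [202]}]
```

A backend like 303 in the sample, active with no wait event, is the "really running for a minute" case, which points away from locking and towards the index question.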
If there is no lock at fault, it could be that your indexes have become corrupted by the operating system upgrade:
PostgreSQL uses the operating system's collations, so database indexes on strings are sorted in collation order. An operating system upgrade can (and often does) change collations because of bug fixes in the C library, so you should rebuild all indexes on string columns after such an upgrade.
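To see why a changed collation breaks an index, here is a toy model: a sorted Python list stands in for a B-tree, and old_key and new_key are hypothetical stand-ins for the collation before and after the upgrade (the "new" one ignores hyphens, loosely mimicking locale-aware sorting). This is an illustration only, not how PostgreSQL actually compares strings.

```python
from typing import Callable, List


def old_key(s: str) -> str:
    # Hypothetical "old" collation: plain code-point order (like the C locale).
    return s


def new_key(s: str) -> str:
    # Hypothetical "new" collation: hyphens are ignored when comparing.
    return s.replace("-", "")


def build_index(values: List[str], key: Callable[[str], str]) -> List[str]:
    # An index keeps its entries sorted by the collation in force at build time.
    return sorted(values, key=key)


def lookup(index: List[str], target: str, key: Callable[[str], str]) -> bool:
    # Binary search; only correct if `key` matches the order the index was built with.
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < key(target):
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == target


index = build_index(["ab", "b", "a-c"], old_key)   # ['a-c', 'ab', 'b']
print(lookup(index, "ab", old_key))   # True: same collation as at build time
print(lookup(index, "ab", new_key))   # False: the entry exists but is not found
rebuilt = build_index(index, new_key)             # the equivalent of REINDEX
print(lookup(rebuilt, "ab", new_key))  # True again
```

In PostgreSQL the last step corresponds to rebuilding the affected indexes with REINDEX (per index, per table, or REINDEX DATABASE for everything in the current database).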
The statement that you are showing does not use an index scan, so it should not be affected, but other statements may be.
Also, if you are using Docker, it may be that your container uses its own glibc that was not upgraded, and then you are not affected.
I am seeing that DBeaver tabs sometimes hold a PostgreSQL snapshot open for 1 hour or more. The tab is set to 'manual commit - read committed' and definitely shows NO uncommitted work. I have also seen this come not from a tab but from what DBeaver calls 'Main'.
The transaction in pg_stat_activity looks like the output below. You can see that the session is idle in transaction, that backend_xmin is set, and that pg_catalog.pg_proc was queried, which the user definitely did not do. From 'Main' I have seen it idle in transaction on SHOW TRANSACTION ISOLATION LEVEL. When I click Rollback in the tab, the idle-in-transaction session immediately goes idle.
I do not want to set a server-side idle_in_transaction_session_timeout for this user. I have already enabled 'automatically end long idle transactions' in DBeaver.
How can I prevent DBeaver from holding transactions open, so that backend_xmin or backend_xid do not grow old and hold back autovacuum?
Name |Value |
----------------+-----------------------------------------------------------------------------------------------------------------------------------------+
datid |16417 |
datname |<removed> |
pid |18974 |
leader_pid | |
usesysid |16394 |
usename |sys |
application_name|DBeaver - SQLEditor <Script-10.sql> |
client_addr |10.135.31.67 |
client_hostname | |
client_port |65397 |
backend_start |2023-02-08 12:12:45.098 +0000 |
xact_start |2023-02-08 12:16:23.121 +0000 |
query_start |2023-02-08 12:16:23.676 +0000 |
state_change |2023-02-08 12:16:23.677 +0000 |
wait_event_type |Client |
wait_event |ClientRead |
state |idle in transaction |
backend_xid | |
backend_xmin |222236770 |
query_id | |
query |SELECT pp.oid as poid, pp.* FROM pg_catalog.pg_proc pp WHERE pp.proname ILIKE $1 AND pp.pronamespace IN ($2) ORDER BY pp.proname LIMIT 10|
backend_type |client backend
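Without a server-side timeout, one option is to monitor for such sessions. A minimal sketch, assuming pg_stat_activity rows are fetched as dictionaries with timezone-aware timestamps; the sample row copies values from the output above, while the "current" time and the 30-minute threshold are arbitrary assumptions. It only reports offending sessions; it does not change DBeaver's behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

# age(backend_xmin) shows how many transactions old the held snapshot is,
# which is what keeps VACUUM from removing dead rows.
IDLE_TX_SQL = """
SELECT pid, usename, application_name, xact_start, state_change,
       backend_xmin, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE state = 'idle in transaction'
"""


def stale_idle_transactions(rows: List[Dict[str, Any]], now: datetime,
                            max_idle: timedelta) -> List[int]:
    """Return PIDs of sessions idle in transaction for longer than max_idle."""
    return [
        r["pid"]
        for r in rows
        if r["state"] == "idle in transaction"
        and now - r["state_change"] > max_idle
    ]


row = {
    "pid": 18974,
    "application_name": "DBeaver - SQLEditor <Script-10.sql>",
    "state": "idle in transaction",
    "state_change": datetime(2023, 2, 8, 12, 16, 23, 677000, tzinfo=timezone.utc),
    "backend_xmin": 222236770,
}
now = datetime(2023, 2, 8, 13, 30, tzinfo=timezone.utc)  # assumed "current" time
print(stale_idle_transactions([row], now, timedelta(minutes=30)))  # [18974]
```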
I'm new to Flyway and have been going through its documentation, but couldn't find a document that describes what each column in schema_version_history (or whatever you have configured the Flyway table to be called) means. I'm specifically intrigued by the column named "type". So far, the only values for this column that I've observed in a legacy project at work are SQL and DELETE.
But I have no clue what this means in terms of flyway migrations.
Below are some sample rows from the table. Note that for installed ranks 54, 56 and 58, the same migration file is present with the same checksum, but ranks 54 and 58 have type SQL while rank 56 has type DELETE.
-[ RECORD 53 ]-+---------------------------------------------------------------------------------------------------
installed_rank | 54
version | 2022.11.18.11.35.49.65
description | add column seqence in attribute table
type | SQL
script | V2022_11_18_11_35_49_65__add_column_seqence_in_attribute_table.sql
checksum | 408921517
installed_by | postgres
installed_on | 2022-11-18 12:04:47.652058
execution_time | 345
success | t
-[ RECORD 54 ]-+---------------------------------------------------------------------------------------------------
installed_rank | 55
version | 2022.11.15.14.17.44.36
description | update address column in attribute table
type | DELETE
script | V2022_11_15_14_17_44_36__update_address_column_in_attribute_table.sql
checksum | 1347853326
installed_by | postgres
installed_on | 2022-11-18 14:52:09.265902
execution_time | 0
success | t
-[ RECORD 55 ]-+---------------------------------------------------------------------------------------------------
installed_rank | 56
version | 2022.11.18.11.35.49.65
description | add column seqence in attribute table
type | DELETE
script | V2022_11_18_11_35_49_65__add_column_seqence_in_attribute_table.sql
checksum | 408921517
installed_by | postgres
installed_on | 2022-11-18 14:52:09.265902
execution_time | 0
success | t
-[ RECORD 56 ]-+---------------------------------------------------------------------------------------------------
installed_rank | 58
version | 2022.11.18.11.35.49.65
description | add column seqence in attribute table
type | SQL
script | V2022_11_18_11_35_49_65__add_column_seqence_in_attribute_table.sql
checksum | 408921517
installed_by | postgres
installed_on | 2022-12-09 14:01:59.352589
execution_time | 174
success | t
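Since the column is undocumented, a practical way to read the table is to group rows by version and look at each version's sequence of types in installed_rank order. The sketch below does that for the four rows shown above; history_by_version is a hypothetical helper, and in practice you would feed it the result of something like SELECT installed_rank, version, type FROM your history table (flyway_schema_history by default). It does not interpret what DELETE means; it only makes the sequence visible.

```python
from typing import Dict, List, Tuple

# (installed_rank, version, type), copied from the rows above.
HISTORY: List[Tuple[int, str, str]] = [
    (54, "2022.11.18.11.35.49.65", "SQL"),
    (55, "2022.11.15.14.17.44.36", "DELETE"),
    (56, "2022.11.18.11.35.49.65", "DELETE"),
    (58, "2022.11.18.11.35.49.65", "SQL"),
]


def history_by_version(rows: List[Tuple[int, str, str]]) -> Dict[str, List[str]]:
    """Map each version to its types, ordered by installed_rank."""
    result: Dict[str, List[str]] = {}
    for _rank, version, row_type in sorted(rows):  # tuples sort by rank first
        result.setdefault(version, []).append(row_type)
    return result


for version, types in history_by_version(HISTORY).items():
    # The last entry is the most recent row for that version.
    print(version, " -> ".join(types))
```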
Great question. This is as close as I got to documentation on that table:
https://www.red-gate.com/hub/product-learning/flyway/exploring-the-flyway-schema-history-table
That article doesn't really describe the type column well at all; it suggests the column has only two possible values, and I've seen at least three: DELETE, SQL and JDBC. Not sure what else it may have.
EDIT: I have now also confirmed these two values: BASELINE and UNDO_SQL.
It's actually marked as intentionally not documented since it's not a part of the public API:
https://flywaydb.org/documentation/learnmore/faq#case-sensitive
I was trying to run the following migration through Flyway:
create index concurrently if not exists api_client_system_role_idx2 on profile.api_client_system_role (api_client_id);
create index concurrently if not exists api_client_system_role_idx3 on profile.api_client_system_role (role_type_id);
create index concurrently if not exists api_key_idx2 on profile.api_key (api_client_id);
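However, the Flyway sessions were blocking each other and the script is stuck in the "Pending" state.
+-----------+---------+----------------------------------------------+--------+---------------------+---------+
| Category  | Version | Description                                  | Type   | Installed On        | State   |
+-----------+---------+----------------------------------------------+--------+---------------------+---------+
| Versioned | 20.1    | add email verification table                 | SQL    | 2021-11-01 21:55:52 | Success |
| Versioned | 21.1    | create role for doc api                      | SQL    | 2021-11-01 21:55:52 | Success |
| Versioned | 22      | create indexes for profile                   | SQL    | 2022-10-21 10:23:41 | Success |
| Versioned | 23      | test flyway                                  | SQL    |                     | Pending |
+-----------+---------+----------------------------------------------+--------+---------------------+---------+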
Flyway: Flyway Community Edition 9.3.1 by Redgate
Database: Postgresql 14.4
Can you please advise how to properly create indexes concurrently in PostgreSQL?
I tried simply killing the blocking session and letting the script continue; however, the migration then failed and the script stayed in "Pending" status.
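One constraint that matters here: PostgreSQL refuses to run CREATE INDEX CONCURRENTLY inside a transaction block. A minimal sketch of a pre-check that flags such statements in a migration script; non_transactional_statements is a hypothetical helper, not part of Flyway, and its statement splitting is deliberately naive.

```python
import re
from typing import List

# Statements PostgreSQL will not run inside a transaction block.
# Deliberately incomplete: the index cases relevant here plus VACUUM.
NON_TRANSACTIONAL = re.compile(
    r"^\s*(create\s+(unique\s+)?index\s+concurrently"
    r"|drop\s+index\s+concurrently"
    r"|vacuum\b)",
    re.IGNORECASE,
)


def split_statements(script: str) -> List[str]:
    # Naive split on ';': it does not understand quoted strings, comments or
    # dollar-quoted function bodies, so use it only on simple DDL scripts.
    return [s.strip() for s in script.split(";") if s.strip()]


def non_transactional_statements(script: str) -> List[str]:
    """Return the statements that must run outside BEGIN/COMMIT (autocommit)."""
    return [s for s in split_statements(script) if NON_TRANSACTIONAL.match(s)]


SCRIPT = """
create index concurrently if not exists api_client_system_role_idx2 on profile.api_client_system_role (api_client_id);
create index concurrently if not exists api_client_system_role_idx3 on profile.api_client_system_role (role_type_id);
create index concurrently if not exists api_key_idx2 on profile.api_key (api_client_id);
"""

print(len(non_transactional_statements(SCRIPT)))  # 3
```

Besides needing autocommit, a concurrent index build also waits for existing transactions that could use the table (and later for transactions holding older snapshots) to finish, so a session left idle in transaction elsewhere can keep it waiting indefinitely.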
Apparently my PostgreSQL 12 server stopped working from one moment to the next.
It no longer accepts any INSERT statements, while SELECT statements work perfectly fine.
psql (12.11 (Ubuntu 12.11-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
cfoo=> \dt
cfoo=> select * from django_session;
cfoo=> \d django_session;
Table "public.django_session"
Column | Type | Collation | Nullable | Default
--------------+--------------------------+-----------+----------+---------
session_key | character varying(40) | | not null |
session_data | text | | not null |
expire_date | timestamp with time zone | | not null |
Indexes:
"django_session_pkey" PRIMARY KEY, btree (session_key)
"django_session_expire_date_a5c62663" btree (expire_date)
"django_session_session_key_c0390e0f_like" btree (session_key varchar_pattern_ops)
cfoo=> select * from django_session;
cfoo=> insert into django_session (session_key, session_data, expire_date) values ('123', '123', '2021-12-18 18:01:44.119+01');
And then nothing. No error, no timeout. Just staying like this.
This is the master, and the slave is also not replicating right now. Not sure whether this is related, as I had already uncommented the replication config in pg_hba.conf and restarted.
Any clues?
It turned out that streaming replication was still active and had to be removed in several places, including postgresql.conf.
PostgreSQL was waiting for the replication to happen and therefore blocked the insert.
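An INSERT that hangs with no error while SELECTs keep working is a classic symptom of synchronous replication: when synchronous_standby_names is set and synchronous_commit is on (the default), remote_write or remote_apply, every commit waits for a standby to confirm it, indefinitely if none is connected. A minimal sketch of checking a configuration for this, assuming a simplified postgresql.conf syntax; parse_conf and commits_wait_for_standby are hypothetical helpers.

```python
from typing import Dict


def parse_conf(text: str) -> Dict[str, str]:
    """Tiny postgresql.conf reader: later settings override earlier ones."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # naive: assumes no '#' inside values
        if not line:
            continue
        if "=" in line:
            name, value = line.split("=", 1)
        else:
            name, _, value = line.partition(" ")
        settings[name.strip().lower()] = value.strip().strip("'")
    return settings


def commits_wait_for_standby(settings: Dict[str, str]) -> bool:
    """True if COMMIT (and so an autocommit INSERT) waits for a standby."""
    standbys = settings.get("synchronous_standby_names", "")
    mode = settings.get("synchronous_commit", "on").lower()
    # "local" and "off" (or their boolean spelling) do not wait for standbys.
    waiting_modes = {"on", "true", "yes", "1", "remote_write", "remote_apply"}
    return standbys != "" and mode in waiting_modes


primary_conf = """
wal_level = replica
synchronous_standby_names = 'standby1'   # leftover from the replication setup
#synchronous_commit = on
"""
print(commits_wait_for_standby(parse_conf(primary_conf)))  # True
```

Settings made with ALTER SYSTEM live in postgresql.auto.conf and override postgresql.conf, so check that file as well; on a running server, SHOW synchronous_standby_names reports the effective value.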
Steps:
1. We created a Kafka topic called pgsqlcountry, which holds all the streaming data from the PostgreSQL DB.
2. We created a stream called country to process the topic into a table. The stream was created successfully:
ksql> describe country;
Field | Type
------------------------------
ROWTIME | BIGINT
ROWKEY | VARCHAR(STRING)
ID | BIGINT
COUNTRY | VARCHAR(STRING)
CREATED_AT | BIGINT
UPDATED_AT | BIGINT
3. We ran the SQL command "select * from country" and got the following error:
ksql> select * from country;
null | null | null | null | null | null
Exception in thread "ksql_query_1-8f1f36a7-e83c-476d-8561-98fe9ed8866b-StreamThread-2" java.lang.NullPointerException
My full stack trace is in the attached screenshot.
When I got a java.lang.NullPointerException, it was because I was using incompatible versions of the ksqlDB images.
There are images with the cp- prefix, e.g.:
cp-ksqldb-server
cp-ksqldb-cli
Don't mix and match these with the following images (stick to one group or the other):
ksqldb-server
ksqldb-cli
You can run some other commands (e.g. SHOW PROPERTIES;) and see whether they also fail with java.lang.NullPointerException. If they do, the images are probably the problem.
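If the images are listed in a docker-compose file or similar, a quick check like the following can catch a mixed setup. It is a hypothetical helper; the image names in the example are illustrative, and it only looks at the name prefix, not at whether the versions are compatible.

```python
from typing import Iterable, Set


def ksqldb_image_families(images: Iterable[str]) -> Set[str]:
    """Classify ksqlDB images as 'cp' (cp-ksqldb-*) or 'standalone' (ksqldb-*)."""
    families: Set[str] = set()
    for image in images:
        # Drop registry/organization and tag: "confluentinc/cp-ksqldb-cli:latest" -> "cp-ksqldb-cli"
        name = image.rsplit("/", 1)[-1].split(":", 1)[0]
        if name.startswith("cp-ksqldb-"):
            families.add("cp")
        elif name.startswith("ksqldb-"):
            families.add("standalone")
    return families


def mixes_image_families(images: Iterable[str]) -> bool:
    """True if both the cp- and the standalone ksqlDB images are in use."""
    return len(ksqldb_image_families(images)) > 1


images = ["confluentinc/cp-ksqldb-server:latest", "confluentinc/ksqldb-cli:latest"]
print(mixes_image_families(images))  # True: server and CLI come from different groups
```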