PostgreSQL hangs while executing DDL without log message

I have PostgreSQL 9.2.10 on CentOS.
I am experiencing the following error:
DETAIL: Multiple failures --- write error might be permanent.
ERROR: could not open file "pg_tblspc/143862353/PG_9.2_201204301/16439/199534370_fsm": No such file or directory
This has been happening since I stopped the PostgreSQL service, ran pg_resetxlog and started the service again. The logs in pg_log look good, and the service is listed without any problem.
DML statements work fine, but DDL statements like CREATE TABLE do not; no error message is thrown, and nothing shows up in the logs in pg_log.
If I try to create a table, there is no reaction at all; it looks like the statement is blocked by a lock.
So I tried the following query to look for locks:
SELECT blocked_locks.pid AS blocked_pid,
       blocked_activity.usename AS blocked_user,
       blocking_locks.pid AS blocking_pid,
       blocking_activity.usename AS blocking_user,
       blocked_activity.query AS blocked_statement,
       blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
     ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
     ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
     ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

You probably corrupted the PostgreSQL cluster with pg_resetxlog. How exactly did you run the command?
I would restore from the last good backup.
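To see which relation the missing file belonged to, you can map the filenode from the error message back to a relation. A minimal sketch (run it while connected to the affected database; 199534370 is the filenode taken from the error, and _fsm is just its free-space-map fork):
-- map the filenode from the error message back to a relation
SELECT oid::regclass AS relation
FROM pg_class
WHERE relfilenode = 199534370;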

Related

Postgres, I am getting ERROR: unexpected chunk number 0 (expected 1) for toast value 12599063 in pg_toast_16687

I am new to Postgres, and one of my reports that uses SELECT to extract JSON returns the following error.
ERROR: unexpected chunk number 0 (expected 1) for toast value 12599063 in pg_toast_16687
SQL state: XX000
I do not know how to proceed in fixing my query. Any idea?
Run this command:
select reltoastrelid::regclass from pg_class where relname = 'table_name';
where table_name is the table where the error occurs. Then check whether the result matches the toast table from the error message, e.g. pg_toast.pg_toast_XXXXX. Mine happened to be 16687.
Then run these commands to reindex:
REINDEX table table_name;
REINDEX table pg_toast.pg_toast_16687;
VACUUM analyze table_name;
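Going the other direction, if you only know the toast table's number from the error message, a small sketch to find the owning table:
-- reverse lookup: which table owns pg_toast.pg_toast_16687?
SELECT relname
FROM pg_class
WHERE reltoastrelid = 'pg_toast.pg_toast_16687'::regclass;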
That is data corruption:
restore from backup
upgrade to the latest PostgreSQL minor release
check the hardware
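If you are on PostgreSQL 10 or later, the amcheck extension can also help confirm whether index corruption is involved; a hedged sketch, using the toast index name implied by the error:
-- requires PostgreSQL 10+ and the amcheck extension
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check('pg_toast.pg_toast_16687_index'::regclass);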

PostGIS doesn't work after update in Manjaro

I recently ran a full update on my Manjaro system. Afterwards, when I tried to run a small script I use to start and automatically back up my Postgres DB, I got the following error in the shell.
pg_dump: error: query failed: ERROR: could not load library
"/usr/lib/postgresql/postgis-3.so": /usr/lib/postgresql/postgis-3.so:
undefined symbol: list_make1_impl
pg_dump: error: query was:
SELECT
a.attnum,
a.attname,
a.atttypmod,
a.attstattarget,
a.attstorage,
t.typstorage,
a.attnotnull,
a.atthasdef,
a.attisdropped,
a.attlen,
a.attalign,
a.attislocal,
pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,
a.attgenerated,
CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval ELSE null END AS attmissingval,
a.attidentity,
pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || ' ' || pg_catalog.quote_literal(option_value) FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E',
') AS attfdwoptions,
CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation,
array_to_string(a.attoptions, ', ') AS attoptions
FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_type t ON a.atttypid = t.oid
WHERE a.attrelid = '18597'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2
ORDER BY a.attnum
Also, when I start the database in pgAdmin I can open all the tables except the one that holds the geography column, which needs PostGIS. It shows me the following error instead:
ERROR: could not load library /usr/lib/postgresql/postgis-3.so: /usr/lib/postgresql/postgis-3.so: undefined symbol: list_make1_impl SQL state: 58P01
Apparently something in the proj package is messed up. According to this thread, it could have something to do with the package being installed several times. However, I reinstalled proj manually, from the official repo as well as from the AUR, each time with a different version, and cleaned out the old version every time. The error is still there.
Currently the version setup is:
QGIS: Version 3.16.5
Postgres: Version 12.6-1
PostGIS: Version 3.0.3-1
proj: Version 6.3.2-1
Manjaro: KDE-Plasma 5.21.3
Kernel: 4.19.183-1-Manjaro
Does anyone have a solution for this?
Thanks to the local Linux User Group I found a solution to the problem. It turned out that PostGIS had been updated in the background and the new version no longer matched PostgreSQL 12.6. Since PostGIS was already installed, it did not tell me this directly but showed the error message above instead.
To check whether PostGIS was still compatible with PostgreSQL, I created a new database and then tried to add the PostGIS extension. I got the following messages, which cleared things up:
test=# CREATE EXTENSION postgis;
ERROR: could not open extension control file "/usr/share/postgresql/extension/postgis.control": No such file or directory
test=# CREATE EXTENSION postgis;
ERROR: PostGIS built for PostgreSQL 13.0 cannot be loaded in PostgreSQL 12.6
So I updated PostgreSQL, set up a new database cluster and used pg_restore with the last .sql file from my automatic pg_dump to recreate the database. It now works as before.
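After such an upgrade, a quick way to confirm that PostGIS and PostgreSQL match again is to load the extension and print its version; a minimal check, not part of the original post:
-- confirm the extension loads and report the linked library versions
CREATE EXTENSION IF NOT EXISTS postgis;
SELECT postgis_full_version();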

"500 Failed to retrieve records" error using PostgreSQL service with DreamFactory

I've configured the service, app, and roles for my endpoint. I can successfully run API calls from the DreamFactory API Docs on /schema, /function, /table - basically all the single nodes that don't have a table name.
When I do add a table name (e.g., /table/myTableName) I receive a 500 error:
{
    "error": {
        "code": 500,
        "context": null,
        "message": "Failed to retrieve records from 'getresidents'.\nSQLSTATE[42703]: Undefined column: 7 ERROR: column d.adsrc does not exist\nLINE 1: ...ER(format_type(a.atttypid, a.atttypmod)) AS type, d.adsrc, a...\n ^ (SQL: SELECT a.attname, LOWER(format_type(a.atttypid, a.atttypmod)) AS type, d.adsrc, a.attnotnull, a.atthasdef,\n\tpg_catalog.col_description(a.attrelid, a.attnum) AS comment\nFROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum\nWHERE a.attnum > 0 AND NOT a.attisdropped\n\tAND a.attrelid = (SELECT oid FROM pg_catalog.pg_class WHERE relname=:table\n\t\tAND relnamespace = (SELECT oid FROM pg_catalog.pg_namespace WHERE nspname = :schema))\nORDER BY a.attnum)",
        "status_code": 500
    }
}
This even occurs when running from the API Docs inside the DreamFactory app.
I am running this as an admin user and still getting the error.
Are there any specific permissions the user needs? I am even trying with the default postgres user and still getting the error.
I could really use some help. Thanks.
Found a solution to this.
I was running PostgreSQL 13, so I decided to create a PostgreSQL 12 install instead. The connection worked. Not sure why, but it looks like things work without a hitch on 12 but not on 13. It would be interesting to hear any input from the developers on this.
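For background, the failing query references pg_attrdef.adsrc, a catalog column that was removed in PostgreSQL 12; on modern servers the default expression has to be reconstructed from adbin instead. A hedged sketch of the equivalent lookup ('mytable' is a placeholder name):
-- pg_attrdef.adsrc is gone as of PostgreSQL 12; use pg_get_expr on adbin
SELECT a.attname,
       pg_get_expr(d.adbin, d.adrelid) AS column_default,
       a.attnotnull,
       a.atthasdef
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
WHERE a.attrelid = 'mytable'::regclass  -- placeholder table name
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;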
We've looked into this issue and indeed PostgreSQL 13 is not supported in the current DreamFactory version due to a simple coding oversight pertaining to version number detection. We've identified the issue and will be releasing a new version of DreamFactory later this week with the fix in place!
Thank you,
Jason

Kill Db2 select statement after the declared time

In Db2 LUW 9.7, how to implement a query timeout?
When performing queries, is there a 'timeout' parameter that I can declare or implement somehow that will make a given query abort after a certain time?
So far, I can only consider the potentially unsafe practice of killing the process that performs the select query in Unix.
This question is not about programming, it is about administration.
A query will never "kill itself". But there are tools available which (after suitable configuration) will limit the resources that a query can consume at run time, and will cause the query to return an error (if desired) if such resources are exceeded.
You should never kill a process that is owned by a Db2 instance; doing so carries the risk of trashing your database.
To control the resources (time, memory, CPU, disk, etc.) that a query can use, it is possible on V9.7 to use the Workload Manager (WLM). This requires configuration and experience to use properly.
Before introducing WLM, IBM also offered the Query Patroller tool, which was a different solution; I think Query Patroller was also available in V9.7. It has since been replaced by WLM.
In principle there are two ways to achieve what you need, depending on whether you have control over the application or the database.
From the application end, one can use the SQL_ATTR_QUERY_TIMEOUT attribute with CLI/ODBC or setQueryTimeout with JDBC (there should be an equivalent for other interfaces). I will attach a CLI example at the end.
If you only have control over the database, one could consider creating a threshold with ACTIVITYTOTALTIME that would automatically interrupt long-running queries.
If you are using the Db2 CLP client, then interrupting execution with Ctrl+C will propagate the interrupt to the database, i.e. you should no longer see the statement executing inside the database. If you break it more "forcefully", e.g. by running kill against the db2bp process associated with the shell, the statement will continue executing for some time, but it will eventually be interrupted (you will see an AgentBreathingPoint warning in db2diag.log when the Db2 engine periodically checks whether the client is still around and finds it gone).
Db2 CLI example. I will use:
the db2cli executable to run Db2 CLI commands,
a "bad query" that is expected to run for a long time:
select t1.*, t2.* from sysibm.syscolumns t1, sysibm.syscolumns t2, sysibm.syscolumns t3, sysibm.syscolumns t4, sysibm.syscolumns t5 order by t5.name desc, t4.name asc fetch first 10 rows only
and a timeout of 10 seconds.
The code:
SQLAllocEnv 1
SQLAllocConnect 1 1
SQLConnect 1 SAMPLE sql_nts db2v115 sql_nts **** sql_nts
SQLAllocStmt 1 1
SQLSetStmtAttr 1 SQL_ATTR_QUERY_TIMEOUT 10
SQLPrepare 1 "select t1.*, t2.* from sysibm.syscolumns t1, sysibm.syscolumns t2, sysibm.syscolumns t3, sysibm.syscolumns t4, sysibm.syscolumns t5 order by t5.name desc, t4.name asc fetch first 10 rows only" sql_nts
SQLExecute 1
SQLFetch 1
SQLError 1 1 1
It is SQLFetch 1 that fails; the full output, including SQLCODEs, is below:
$ db2cli
IBM DATABASE 2 Interactive CLI Sample Program
(C) COPYRIGHT International Business Machines Corp. 1993,1996
All Rights Reserved
Licensed Materials - Property of IBM
US Government Users Restricted Rights - Use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
> SQLAllocEnv 1
SQLAllocConnect 1 1
SQLConnect 1 SAMPLE sql_nts db2v115 sql_nts *** sql_nts
SQLAllocStmt 1 1
SQLSetStmtAttr 1 SQL_ATTR_QUERY_TIMEOUT 10
SQLAllocEnv: rc = 0 (SQL_SUCCESS)
CLI henv = 1, Test Driver henv = 1
> SQLAllocConnect: rc = 0 (SQL_SUCCESS)
CLI hdbc = 1, Test Driver hdbc = 1
> SQLConnect: rc = 0 (SQL_SUCCESS)
> SQLAllocStmt: rc = 0 (SQL_SUCCESS)
CLI hstmt = 1, Test Driver hstmt = 1
> SQLSetStmtAttr: rc = 0 (SQL_SUCCESS)
> SQLPrepare 1 "select t1.*, t2.* from sysibm.syscolumns t1, sysibm.syscolumns t2, sysibm.syscolumns t3, sysibm.syscolumns t4, sysibm.syscolumns t5 order by t5.name desc, t4.name asc fetch first 10 rows only" sql_nts
SQLPrepare: rc = 0 (SQL_SUCCESS)
> SQLExecute 1
SQLExecute: rc = 0 (SQL_SUCCESS)
> SQLFetch 1
SQLFetch: rc = -1 (SQL_ERROR)
> SQLError 1 1 1
SQLError: rc = 0 (SQL_SUCCESS)
SQLError: SQLState : S1008
fNativeError : -952
szErrorMsg : [IBM][CLI Driver][DB2/6000] SQL0952N Processing was cancelled due to an interrupt. SQLSTATE=57014
cbErrorMsg : 100
Documentation references:
ODBC/CLI: the QueryTimeout keyword
JDBC: the commandTimeout property, described in "Common IBM Data Server Driver for JDBC and SQLJ properties for all supported database products"
"If I kill the process that performs the select query in Unix, then the query should also be aborted."
Yes.
Create a threshold on ACTIVITYTOTALTIME or UOWTOTALTIME and set the action to STOP EXECUTION, as sketched below:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.7.0/com.ibm.db2.luw.sql.ref.doc/doc/r0050563.html
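A sketch of such a threshold, following the V9.7 syntax; the threshold name and the 5-minute limit are arbitrary choices:
-- stop any SQL activity that runs for longer than 5 minutes
CREATE THRESHOLD STOP_LONG_QUERIES
  FOR DATABASE ACTIVITIES
  ENFORCEMENT DATABASE
  WHEN ACTIVITYTOTALTIME > 5 MINUTES
  STOP EXECUTION;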

How do I use ROW_NUMBER in DB2 10 on z/OS?

I am running a SQL query and trying to break the results down into chunks.
select task_id, owner_cnum
from (select row_number() over (order by owner_cnum, task_id) as this_row,
             wdpt.vtasks.*
      from wdpt.vtasks)
where this_row between 1 and 5;
That SQL works with DB2 10.5 on Windows and Linux, but fails on DB2 10.1 on z/OS with the following error messages:
When I run the SQL from IBM DataStudio 4.1.1 running on my Windows machine connected to the database, I am getting:
ILLEGAL SYMBOL "<EMPTY>". SOME SYMBOLS THAT MIGHT BE LEGAL ARE: CORRELATION NAME. SQLCODE=-104, SQLSTATE=42601, DRIVER=4.18.60
When I run my Java program on a zLinux system connecting to the database, I get the following error:
DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=<EMPTY>;CORRELATION NAME, DRIVER=3.65.97
Any ideas what I'm doing wrong?
In some DB2 versions you must use a correlation name for a subselect, as suggested by the error message:
select FOO from (
select FOO from BAR
) as T
Here "T" is the correlation name.