I'm using PGWatch2 to monitor a master (write) server and two replica (read) servers. The monitoring uses this function to get query information. There are no errors on the master server, but the replicas report this error:
pgwatch2#database ERROR: canceling statement due to statement timeout
pgwatch2#database STATEMENT:
with q_data as (
select
...
This query takes about 7s on master and
both replication servers and the master have statement_timeout=0:
statement_timeout
-------------------
0
(1 row)
I'm using PostgreSQL 12.9 on Ubuntu 20.04.1.
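One thing that may be worth checking: SHOW statement_timeout only reports the value in effect for the session that runs it. A timeout set for a specific role or database (ALTER ROLE ... SET or ALTER DATABASE ... SET), or one set by the client after connecting, would not be visible from a different login. A minimal sketch, assuming the monitoring role is called pgwatch2 (a guess based on the log prefix above):

```sql
-- Run while connected as the monitoring role (assumed here to be pgwatch2):
-- shows the effective value and where it came from.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name = 'statement_timeout';

-- Per-role and per-database overrides stored in the catalog.
-- setrole = 0 or setdatabase = 0 means "all roles" / "all databases".
SELECT r.rolname, d.datname, s.setconfig
FROM pg_db_role_setting s
LEFT JOIN pg_roles r    ON r.oid = s.setrole
LEFT JOIN pg_database d ON d.oid = s.setdatabase;
```

A source of user, database, client or session in the first query would point to something other than postgresql.conf setting the timeout.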
I have a view in Postgres with the following definition:
CREATE VIEW participant_data_view AS
SELECT participant_profile.*,
"user".public_id, "user".created, "user".status, "user".region,"user".name, "user".email, "user".locale,
(SELECT COUNT(id) FROM message_log WHERE message_log.target_id = "user".id AND message_log.type = 'diary') AS diary_reminder_count,
(SELECT SUM(pills) FROM "order" WHERE "order".user_id = "user".id AND "order".status = 'complete') AS pills
FROM participant_profile
JOIN "user" ON "user".id = participant_profile.id
;
The view creation works just fine. However, when I query the view with SELECT * FROM participant_data_view, Postgres crashes with:
10:24:46.345 WARN HikariPool-1 - Connection org.postgresql.jdbc.PgConnection#172d19fe marked as broken because of SQLSTATE(08006), ErrorCode(0) c.z.h.p.ProxyConnection
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
This question suggests to me that it might be an internal assertion that causes the crash.
If I remove the diary_reminder_count field from the view definition, the select works just fine.
What am I doing wrong? How can I fix my view, or change it so I can query the same data in a different way?
Note that creating the view works just fine, it only crashes when querying it.
I tried running explain (analyze) select * from participant_data_view; from the IntelliJ query console, which only returns
[2020-12-08 11:13:56] [08006] An I/O error occurred while sending to the backend.
[2020-12-08 11:13:56] java.io.EOFException
I ran the same statement using psql, where it returns:
my-database=# explain (analyze) select * from participant_data_view;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The log file contains:
2020-12-08 10:24:01.383 CET [111] LOG: server process (PID 89670) was terminated by signal 9: Killed: 9
2020-12-08 10:24:01.383 CET [111] DETAIL: Failed process was running: select "public"."participant_data_view"."id", "public"."participant_data_view"."study_number", <snip many other fields>,
"public"."participant_data_view"."diary_reminder_count", "public"."participant
2020-12-08 10:24:01.383 CET [111] LOG: terminating any other active server processes
In all likelihood, the Linux kernel out-of-memory killer killed your query because the system ran out of memory.
Either restrict the number of database sessions (for example with a connection pool) or reduce work_mem.
It is usually a good idea to set vm.overcommit_memory = 2 in the kernel and tune vm.overcommit_ratio appropriately.
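As a rough sketch of the second suggestion: the values below are placeholders, not recommendations, and need to be sized for the machine. Keep in mind that work_mem can be used several times within a single query (once per sort or hash node), and in every session.

```sql
-- Inspect the current settings first.
SHOW work_mem;
SHOW max_connections;

-- Lower work_mem cluster-wide; '4MB' is an illustrative value only.
ALTER SYSTEM SET work_mem = '4MB';
SELECT pg_reload_conf();   -- work_mem takes effect on reload, no restart needed

-- The kernel settings are not SQL; on Linux they go into /etc/sysctl.conf, e.g.:
--   vm.overcommit_memory = 2
--   vm.overcommit_ratio  = <tuned for your RAM and swap>
```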
I am following this link and trying to simulate a deadlock:
http://www.dba-db2.com/2012/06/how-to-monitor-a-deadlock-in-db2.html
I can see that my commands ran successfully.
After that I simulated a deadlock error through the DbVisualizer tool. However, I didn't see any file being generated in the path.
Can someone point out my mistake?
Also, when I read back the old 0000000.evt file, it shows the following:
EVENT LOG HEADER
Event Monitor name: DB2DETAILDEADLOCK
Server Product ID: SQL10059
Version of event monitor data: 12
Byte order: BIG ENDIAN
Number of nodes in db2 instance: 1
Codepage of database: 1208
Territory code of database: 1
Server instance name: db2inst1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Database Name: MYDB
Database Path: /db2home/db2inst1/NODE0000/SQL00003/MEMBER0000/
First connection timestamp: 01/29/2018 10:00:17.694784
Event Monitor Start time: 01/29/2018 10:00:18.951331
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Database Name: MYDB
Database Path: /db2home/db2inst1/NODE0000/SQL00003/MEMBER0000/
First connection timestamp: 01/29/2018 10:12:54.382936
Event Monitor Start time: 01/29/2018 10:12:54.697223
--------------------------------------------------------------------------
Does this mean there was no deadlock?
Works correctly for me (linux, Db2 v11.1). Here are some command lines with annotations. You need to have suitable authorisation/privilege for each command. I was using the instance owner account.
Disable the default db2detaildeadlock monitor first, then create your own:
$ db2 "set event monitor db2detaildeadlock state=0"
DB20000I The SQL command completed successfully.
$
$ db2 "create event monitor dlmon for deadlocks write to file '/tmp'"
DB20000I The SQL command completed successfully.
$
$ db2 "set event monitor dlmon state=1"
DB20000I The SQL command completed successfully.
$
Generate a deadlock and make sure you see SQLCODE -911 with reason code 2.
If you don't see reason code 2, then you don't have a deadlock. You might have a timeout instead, and timeouts don't get recorded by the deadlock monitor.
Here I show the victim of the deadlock getting notified of rollback and you can see the correct reason code:
$ db2 +c "select * from db2inst1.dlk where a=4 with rr"
SQL0911N The current transaction has been rolled back because of a deadlock
or timeout. Reason code "2". SQLSTATE=40001
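If you need a repeatable way to produce a real deadlock for testing, one option is to have two sessions take table locks in opposite order. This is only a sketch: dl_a and dl_b are hypothetical scratch tables, and both sessions must run with autocommit off (for example via db2 +c as above) so that the locks are held between statements.

```sql
-- Setup (once): two hypothetical scratch tables.
CREATE TABLE dl_a (id INT);
CREATE TABLE dl_b (id INT);
COMMIT;

-- Session 1:
LOCK TABLE dl_a IN EXCLUSIVE MODE;
-- Session 2:
LOCK TABLE dl_b IN EXCLUSIVE MODE;
-- Session 1 (waits for session 2):
LOCK TABLE dl_b IN EXCLUSIVE MODE;
-- Session 2 (closes the cycle):
LOCK TABLE dl_a IN EXCLUSIVE MODE;
-- The deadlock detector should now roll back one of the two sessions
-- with SQL0911N, reason code 2.
```

The detector runs at an interval (the DLCHKTIME database configuration parameter), so the error may take a few seconds to appear. If LOCKTIMEOUT is set to a small value, the wait may end with reason code 68 (timeout) first.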
Investigate the monitor output with db2evmon and view the resulting file:
$ db2evmon -db mydb -evm dlmon > /tmp/db2evmon.dlmon.1
Reading /tmp/00000000.evt ...
$ view /tmp/db2evmon.dlmon.1
...<snip>
...
3) Deadlock Event ...
Deadlock ID: 2
Number of applications deadlocked: 2
Deadlock detection time: 01/03/2018 09:06:39.019854
Rolled back Appl participant no: 2
Rolled back Appl Id: *LOCAL.db2inst1.180301090546
Rolled back Appl seq number: 00001
Rolled back Appl handle: 11872
...<snip>
Error:
postgres=# insert into company values(4,'tom',21,'pune' ,21 );
^CCancel request sent
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
INSERT 0 1
Even after the error, it executes the query on the master and also replicates the transaction to the slave.
On Master:
postgres=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
0/1900D0C0
(1 row)
On Slave:
postgres=# SELECT pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
0/1900D0C0
(1 row)
The synchronous_standby_names value I set in the master's config file is different from the application_name I see in pg_stat_replication. Many of the solutions suggest changing the application name. However, I am not sure where the master is taking the application name walreceiver from.
On Master:
postgres=# select application_name, sync_state from pg_stat_replication;
application_name | sync_state
------------------+------------
walreceiver | async
(1 row)
postgres=# show synchronous_standby_names;
synchronous_standby_names
---------------------------
slave1
(1 row)
postgres=# show synchronous_commit;
synchronous_commit
--------------------
on
(1 row)
One of the solutions I found is to create a tablespace directory under the path '/var/lib/pgsql/9.2/data/', which I currently don't have. I am not sure whether that solution will work for PostgreSQL 9.5.
Any help on this is appreciated. Thank you.
Changing synchronous_standby_names to 'walreceiver' resolved the error. Follow the link to learn more about synchronous_standby_names.
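For context on where walreceiver comes from: the name shown in pg_stat_replication is the application_name the standby sends when it connects, and walreceiver is the default when the standby's primary_conninfo does not set one. So besides changing synchronous_standby_names on the master, the other route is adding application_name=slave1 to primary_conninfo in the standby's recovery.conf. A sketch of the master-side change described above, assuming superuser access (ALTER SYSTEM exists in 9.4 and later):

```sql
-- On the master: match the name the standby actually reports.
ALTER SYSTEM SET synchronous_standby_names = 'walreceiver';
SELECT pg_reload_conf();   -- this setting is picked up on reload

-- Verify: sync_state should change from async once the names match.
SELECT application_name, state, sync_state
FROM pg_stat_replication;
```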
I'm running a workflow in PowerCenter that is constantly getting an SQL1224N error.
The process executes a query against one table (POLIZA) with 800k rows and retrieves the first 10k rows. It then starts a query against another table with 75M rows. At this moment an idle thread error appears in DB2, but the PowerCenter process keeps running and retrieving the 75M rows. When that completes (after 20 minutes), the following errors come up, related to the first table:
[IBM][CLI Driver] SQL1224N A database agent could not be started to service a request, or was terminated as a result of a database system shutdown or a force command. SQLSTATE=55032
sqlstate = 40003
[IBM][CLI Driver] SQL1224N A database agent could not be started to service a request, or was terminated as a result of a database system shutdown or a force command. SQLSTATE=55032
sqlstate = 40003
Database driver error...
Function Name : Fetch
SQL Stmt : SELECT POLIZA.BSPOL_BSCODCIA, POLIZA.BSPOL_BSRAMOCO
FROM POLIZA
WHERE
EXA01.POLIZA.BSPOL_IDEMPR='0015' for read only with ur
Native error code = -1224
DB2 Fatal Error].
I have a similar process running against the same 2 tables and it is working fine. The only difference I can see is that the DB2 user is different.
Any idea how I can fix this?
Regards
The common causes for -1224 are:
Your instance or database has crashed, or
Something/somebody is forcing off your application (FORCE APPLICATION or equivalent)
As for the crash, I think you would know by now. This typically requires a database or instance restart. At any rate, please look in your DIAGPATH for any FODC* directories whose timestamps match those of the -1224 errors.
As for the FORCE case, you should find some evidence of the -1224 in db2diag.log. Try searching for the decimal -1224, but also for its hex representation (0xFFFFFB38).
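The hex value is just -1224 written as a 32-bit two's-complement integer, which can be checked by hand:

```latex
\[
2^{32} - 1224 = 4294967296 - 1224 = 4294966072 = \mathrm{FFFFFB38}_{16},
\qquad
1224 = \mathrm{4C8}_{16}
\]
```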
I am running a SQL query and trying to break the results down into chunks.
select task_id, owner_cnum
from (select row_number() over(order by owner_cnum, task_id)
as this_row, wdpt.vtasks.*
from wdpt.vtasks)
where this_row between 1 and 5;
That SQL works with DB2 10.5 on Windows and Linux, but fails on DB2 10.1 on z/OS with the following error messages:
When I run the SQL from IBM DataStudio 4.1.1 running on my Windows machine connected to the database, I am getting:
ILLEGAL SYMBOL "<EMPTY>". SOME SYMBOLS THAT MIGHT BE LEGAL ARE: CORRELATION NAME. SQLCODE=-104, SQLSTATE=42601, DRIVER=4.18.60
When I run my Java program on a zLinux system connecting to the database, I get the following error:
DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=<EMPTY>;CORRELATION NAME, DRIVER=3.65.97
Any ideas what I'm doing wrong?
In some DB2 versions you must use a correlation name for a subselect, as suggested by the error message:
select FOO from (
select FOO from BAR
) as T
Here "T" is the correlation name.
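Applied to the query in the question, that would look roughly like this (t is an arbitrary name, and I have not run this against DB2 10.1 for z/OS):

```sql
select task_id, owner_cnum
from (select row_number() over (order by owner_cnum, task_id) as this_row,
             wdpt.vtasks.*
      from wdpt.vtasks) as t   -- correlation name for the nested table expression
where this_row between 1 and 5;
```

The only change from the original statement is the "as t" after the closing parenthesis.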