My postgres crashes for long query. It's on Debian 7 64bit, and postgresql-9.3.2. I uses all default configuration. Could anyone suggest what problem it could be? thanks.
--part1:
SELECT r1.f2 as b, r1.e as l
FROM r r8,r r7,r r6,r r5,r r4,r r3,r r2,r r1
WHERE
r1.f2=r2.f1 AND
r1.f2=r3.f1 AND
r1.f2=r4.f1 AND
r1.f1=r5.f2 AND
r1.f1=r8.f1 AND
r2.f1=r3.f1 AND
r2.f1=r4.f1 AND
r2.f2=r6.f2 AND
r2.f2=r7.f1 AND
r3.f1=r4.f1 AND
r3.f2=r7.f2 AND
r3.f2=r8.f2 AND
r4.f2=r5.f1 AND
r4.f2=r6.f1 AND
r5.f1=r6.f1 AND
r5.f2=r8.f1 AND
r6.f2=r7.f1 AND
r7.f2=r8.f2 AND
r1.d=1 AND
r2.d=1 AND
r3.d=2 AND
r4.d=2 AND
r5.d=2 AND
r6.d=2 AND
r7.d=2 AND
r8.d=2
-- part2
group by r1.f2,r1.e
having
calc_empty_a() AND
calc_empty_b();
In the query, calc_empty_a() are just empty boolean functions (return true), so they should have no problem.
If I run the query in client, the server crashes. There is nothing useful information in the log (please refer to the error info at end of the post).
If I run the part 1 query, the query works well.
If I first run the part 1 query, then I run the whole query, it works well.
If I change the query, reduce the r numbers, for example, there are only r1 to r6 FROM tables, delete the predicates with r8, r7, but keep the part 2's GROUP BY and HAVING clause. The query still works well.
If the query have one empty function in HVING clause, the query also works well, but will crash if there are two functions.
The following query works well
SELECT r1.f2 as b, r1.f1 as a , r1.e as e
FROM r r8,r r7,r r6,r r5,r r4,r r3,r r2,r r1
WHERE
r1.f2=r2.f1 AND
r1.f2=r3.f1 AND
r1.f2=r4.f1 AND
r1.f1=r5.f2 AND
r1.f1=r8.f1 AND
r2.f1=r3.f1 AND
r2.f1=r4.f1 AND
r2.f2=r6.f2 AND
r2.f2=r7.f1 AND
r3.f1=r4.f1 AND
r3.f2=r7.f2 AND
r3.f2=r8.f2 AND
r4.f2=r5.f1 AND
r4.f2=r6.f1 AND
r5.f1=r6.f1 AND
r5.f2=r8.f1 AND
r6.f2=r7.f1 AND
r7.f2=r8.f2
group by r1.f2,r1.f1, r1.e
having
calc_empty_a() AND
calc_empty_a();
I have set the os to use strict overcommit mode:
sysctl -w vm.overcommit_memory=2
Error info:
at client
The connection to the server was lost. Attempting reset: Succeeded.
at server
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2014-11-07 16:47:03 GMT
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/2126C98
LOG: record with zero length at 0/21A9D98
LOG: redo done at 0/21A9D68
LOG: last completed transaction was at log time 2014-11-07 16:47:26.844406+00
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
Related
I have a simple process that is reading logical replication messages from postgres. This process runs every second and generates a lot of messages in the postgres logs like:
2021-02-15 20:35:11.032 UTC [35] STATEMENT: SELECT * FROM pg_logical_slot_get_changes('lazy_cloud', NULL, NULL);
2021-02-15 20:35:11.032 UTC [35] LOG: logical decoding found consistent point at 0/167C618
2021-02-15 20:35:11.032 UTC [35] DETAIL: There are no running transactions.
I've configured logging with the following settings:
log_min_messages=ERROR
log_statement=none
log_replication_commands=0
But, the logical replication logs are still produced.
Is there a setting to disable these messages? I can use sed or something like that, but would prefer a built in solution.
There is no way to disable that message short of setting
log_min_messages = fatal
in postgresql.conf, but that is not a smart idea, because then you'd miss out on all error messages in the log file and essentially disable logging.
I have a view in Postgres with the following definition:
CREATE VIEW participant_data_view AS
SELECT participant_profile.*,
"user".public_id, "user".created, "user".status, "user".region,"user".name, "user".email, "user".locale,
(SELECT COUNT(id) FROM message_log WHERE message_log.target_id = "user".id AND message_log.type = 'diary') AS diary_reminder_count,
(SELECT SUM(pills) FROM "order" WHERE "order".user_id = "user".id AND "order".status = 'complete') AS pills
FROM participant_profile
JOIN "user" ON "user".id = participant_profile.id
;
The view creation works just fine. However, when I query the view SELECT * FROM participant_data_view, postgres crashes with
10:24:46.345 WARN HikariPool-1 - Connection org.postgresql.jdbc.PgConnection#172d19fe marked as broken because of SQLSTATE(08006), ErrorCode(0) c.z.h.p.ProxyConnection
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
this question suggests to me that it might be an internal assertion that causes it to crash.
If I remove the diary_reminder_count field from the view definition, the select works just fine.
What am I doing wrong? How can I fix my view, or change it so I can query the same data in a different way?
Note that creating the view works just fine, it only crashes when querying it.
I tried running explain (analyze) select * from participant_data_view; from the IntelliJ query console, which only returns
[2020-12-08 11:13:56] [08006] An I/O error occurred while sending to the backend.
[2020-12-08 11:13:56] java.io.EOFException
I ran the same using psql, there it returns
my-database=# explain (analyze) select * from participant_data_view;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Looking at the log files, it contains:
2020-12-08 10:24:01.383 CET [111] LOG: server process (PID 89670) was terminated by signal 9: Killed: 9
2020-12-08 10:24:01.383 CET [111] DETAIL: Failed process was running: select "public"."participant_data_view"."id", "public"."participant_data_view"."study_number", <snip many other fields>,
"public"."participant_data_view"."diary_reminder_count", "public"."participant
2020-12-08 10:24:01.383 CET [111] LOG: terminating any other active server processes
In all likelihood, the Linux kernel out-of-memory killer killed your query because the system ran out of memory.
Either restrict the number of database sessions (for example with a connection pool) or reduce work_mem.
It is usually a good idea to set vm.overcommit_memory = 2 in the kernel and tune vm.overcommit_ratio appropriately.
I am following this link and try to simulate the deadlock issue:
http://www.dba-db2.com/2012/06/how-to-monitor-a-deadlock-in-db2.html
I can see my command run successful.
After that I go to simulate a deadlock error through DbVisualiser tool. However I didnt see any file being generated to the path.
Can someone point the mistake to me?
And also, I try to read back those old 0000000.evt file, it show me something as follow:
EVENT LOG HEADER
Event Monitor name: DB2DETAILDEADLOCK
Server Product ID: SQL10059
Version of event monitor data: 12
Byte order: BIG ENDIAN
Number of nodes in db2 instance: 1
Codepage of database: 1208
Territory code of database: 1
Server instance name: db2inst1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Database Name: MYDB
Database Path: /db2home/db2inst1/NODE0000/SQL00003/MEMBER0000/
First connection timestamp: 01/29/2018 10:00:17.694784
Event Monitor Start time: 01/29/2018 10:00:18.951331
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Database Name: MYDB
Database Path: /db2home/db2inst1/NODE0000/SQL00003/MEMBER0000/
First connection timestamp: 01/29/2018 10:12:54.382936
Event Monitor Start time: 01/29/2018 10:12:54.697223
--------------------------------------------------------------------------
This means no deadlock?
Works correctly for me (linux, Db2 v11.1). Here are some command lines with annotations. You need to have suitable authorisation/privilege for each command. I was using the instance owner account.
Disable default db2detaildeadlock monitor first and then create your own:
$ db2 "set event monitor db2detaildeadlock state=0"
DB20000I The SQL command completed successfully.
$
$ db2 "create event monitor dlmon for deadlocks write to file '/tmp'"
DB20000I The SQL command completed successfully.
$
$ db2 "set event monitor dlmon state=1"
DB20000I The SQL command completed successfully.
$
Generate a deadlock, ensure you see this SQLCODE -911 with reason code 2.
If you dont' see the reason code 2 then you don't have any deadlock but you might have a timeout and timeouts don't get recorded in the deadlock monitor.
Here I show the victim of the deadlock getting notified of rollback and you can see the correct reason code:
$ db2 +c "select * from db2inst1.dlk where a=4 with rr"
SQL0911N The current transaction has been rolled back because of a deadlock
or timeout. Reason code "2". SQLSTATE=40001
Investigate the monitor output with db2evmon and view resulting file
$ db2evmon -db mydb -evm dlmon > /tmp/db2evmon.dlmon.1
Reading /tmp/00000000.evt ...
$ view /tmp/db2evmon.dlmon.1
...<snip>
...
3) Deadlock Event ...
Deadlock ID: 2
Number of applications deadlocked: 2
Deadlock detection time: 01/03/2018 09:06:39.019854
Rolled back Appl participant no: 2
Rolled back Appl Id: *LOCAL.db2inst1.180301090546
Rolled back Appl seq number: 00001
Rolled back Appl handle: 11872
...<snip>
According to the log the server was startet with alias "myrepos" as expected.
But if I try to connect to this alias, I get an error, also visible in the log (last line).
What could be the cause?
[Server#28fc19eb]: Initiating startup sequence...
[Server#28fc19eb]: Server socket opened successfully in 6 ms.
[Server#28fc19eb]: Database [index=0, id=0, db=file:/Users/t..../myrepos, alias=myrepos ] opened sucessfully in 1238 ms.
[Server#28fc19eb]: Startup sequence completed in 1247 ms.
[Server#28fc19eb]: 2016-04-08 10:32:33.871 HSQLDB server 2.3.3 is online on port 9001
[Server#28fc19eb]: To close normally, connect and execute SHUTDOWN SQL
[Server#28fc19eb]: From command line, use [Ctrl]+[C] to abort abruptly
[Server#28fc19eb]: [Thread[HSQLDB Connection #2304d78b,5,HSQLDB Connections #28fc19eb]]: database alias=myrepos does not exist
Solved. I had spaces behind the database name in my configuration, which had been incorporated to the alias: 'alias=myrepos ]'
So, the alias actually wasn't "myrepos" but "myrepos "
Im running a workflow in powercenter that is constatnly getting an SQL1224N error.
This process execute a query against one table (POLIZA) with 800k rows, it retrieves the first 10k rows and then it start to execute to another table with 75M rows, at ths moment in DB2 an idle thread error appear but the PWC process still running retrieving the 75M rows, when it is completed (after 20 minutes) the errros comes up related with the first table:
[IBM][CLI Driver] SQL1224N A database agent could not be started to service a request, or was terminated as a result of a database system shutdown or a force command. SQLSTATE=55032
sqlstate = 40003
[IBM][CLI Driver] SQL1224N A database agent could not be started to service a request, or was terminated as a result of a database system shutdown or a force command. SQLSTATE=55032
sqlstate = 40003
Database driver error...
Function Name : Fetch
SQL Stmt : SELECT POLIZA.BSPOL_BSCODCIA, POLIZA.BSPOL_BSRAMOCO
FROM POLIZA
WHERE
EXA01.POLIZA.BSPOL_IDEMPR='0015' for read only with ur
Native error code = -1224
DB2 Fatal Error].
I have a similar process runing against the same 2 tables and it is woking fine where the only difference I can see is that the DB2 user is different.
Any idea how can i fix this?
Regards
The common causes for -1224 are:
Your instance or database has crashed, or
Something/somebody is forcing off your application (FORCE APPLICATION or equivalent)
As for the crash, I think you would know by know. This typically requires a database or instance restart. At any rate, can you please have a look into your DIAGPATH to check for any FODC* directories whose timestamp would match the timestamp of the -1224 errors?
As for the FORCE case, you should find some evidence of the -1224 in db2diag.log. Try searching for the decimal -1224, but also for its hex representation (0xFFFFFB38).