My application is deployed on a remote application server (Linux), and from there it connects to a DB server (PostgreSQL 9.4) that runs on another remote Linux server. I send a long message to the app server through JMS, and processing this message takes many hours. Unfortunately, I am facing performance issues with the DB server. In the postgresql.log file I can see the following errors and warnings:
< 2017-05-05 09:18:00.676 CEST >LOG: could not receive data from client: Connection timed out
< 2017-05-05 13:38:33.704 CEST >LOG: incomplete startup packet
< 2017-05-05 13:42:29.158 CEST >LOG: unexpected EOF on client connection with an open transaction
< 2017-05-05 13:50:49.163 CEST >LOG: checkpoints are occurring too frequently (1 second apart)
< 2017-05-05 13:50:49.163 CEST >HINT: Consider increasing the configuration parameter "checkpoint_segments".
Do I need to update something in the postgresql.conf file? Can somebody please advise what I should do to avoid these errors?
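When a log mixes several unrelated messages like this, it can help to count how often each one occurs before touching postgresql.conf. The following is only a minimal sketch: it assumes the log lines have the `< timestamp >LEVEL: message` shape shown in the excerpt, and the names `LINE_RE` and `summarize` are made up for this example.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
#   < 2017-05-05 09:18:00.676 CEST >LOG: message
LINE_RE = re.compile(r"^<\s*(?P<ts>[^>]+?)\s*>\s*(?P<level>[A-Z]+):\s*(?P<msg>.*)$")


def summarize(lines):
    """Count (level, message) pairs so that repeated problems stand out."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed prefix
        # Drop parenthesised details such as "(1 second apart)" so repeats group together.
        msg = re.sub(r"\s*\(.*?\)", "", m.group("msg"))
        counts[(m.group("level"), msg)] += 1
    return counts


SAMPLE = [
    "< 2017-05-05 09:18:00.676 CEST >LOG: could not receive data from client: Connection timed out",
    "< 2017-05-05 13:38:33.704 CEST >LOG: incomplete startup packet",
    "< 2017-05-05 13:42:29.158 CEST >LOG: unexpected EOF on client connection with an open transaction",
    "< 2017-05-05 13:50:49.163 CEST >LOG: checkpoints are occurring too frequently (1 second apart)",
    '< 2017-05-05 13:50:49.163 CEST >HINT: Consider increasing the configuration parameter "checkpoint_segments".',
]

if __name__ == "__main__":
    for (level, msg), n in summarize(SAMPLE).most_common():
        print(n, level, msg)
```

Run against the full log file instead of `SAMPLE`, this would show whether the checkpoint warning or the client disconnects dominate.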
We are facing this issue with PostgreSQL 12.11, only on Windows; on Linux it works fine.
If the executable that is using the database session crashes, the entire database goes into recovery mode and restarts. In the Postgres log we find the messages below.
2022-10-06 17:44:09.210 CEST [8860] LOG: server process (PID 9980) exited with exit code 0
2022-10-06 17:44:09.210 CEST [8860] LOG: terminating any other active server processes
2022-10-06 17:44:09.211 CEST [9992] WARNING: terminating the connection because of the crash of another server process
2022-10-06 17:44:09.211 CEST [9992] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit because another server process exited abnormally and possibly corrupted shared memory.
2022-10-06 17:44:09.211 CEST [9992] HINT: In a moment you should be able to reconnect to the database and repeat your command.
and then it goes into recovery mode.
Any idea or workaround to avoid this issue? And if Postgres crashes, where is the crash dump generated on Windows?
I'm running an application with about 20 processes connected to a Postgres DB (10.0) on Windows Server 2016.
For about a month I have had unexpected crashes of postgres.exe.
To isolate the problem I extended the logging by setting log_min_duration_statement = 0
This creates a more detailed logfile. What I can see is:
LOG: server process (PID xxxxx) was terminated by exception 0xFFFFFFFF
DETAIL: Failed process was running: COMMIT
HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
Then it tears down all 20 processes like this:
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
Then the DB recovers:
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2021-06-11 18:17:18 CEST
DB enters recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
LOG: database system was not properly shut down; automatic recovery in progress
...
LOG: redo starts at 1B2/33319E58
FATAL: the database system is in recovery mode
LOG: invalid record length at 1B2/33D29930: wanted 24, got 0
LOG: redo done at 1B2/33D29908
LOG: last completed transaction was at log time 2021-06-11 18:21:39.830526+02
FATAL: the database system is in recovery mode
...
FATAL: the database system is in recovery mode
LOG: database system is ready to accept connections
Now it's running again like normal
I can trace the crashed PID xxxxx to a postgres.exe serving one of the 20 application processes. It's not always the same one. This happens about every 5-10 days.
Can anybody give me some advice on how to track down the reason for this crash?
Extensions used:
oracle_fdw 2.0.0, PostgreSQL 10.0, Oracle client 11.2.0.3.0, Oracle server 11.2.0.2.0
Crashdump:
Followed the link :
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows
Although the postgres user has "full control" of the crashdump folder in the security tab, nothing is written to it. The folder stays empty.
Follow-up on the comment by @Laurenz Albe:
The COMMIT is not the reason for the crash. It is the last successfully executed command of the session, as the following example shows:
The process gets a job and starts to do its job
2021-06-15 16:27:51.100 CEST [25604] LOG: duration: 0.061 ms statement: DISCARD ALL
2021-06-15 16:27:51.100 CEST [25604] LOG: duration: 0.012 ms statement: BEGIN
2021-06-15 16:27:51.100 CEST [25604] LOG: duration: 0.015 ms statement: SET TRANSACTION ISOLATION LEVEL READ COMMITTED
Now a lot of activity goes on within session 25604,
including, among others, the Oracle foreign data wrapper
2021-06-15 16:28:13.792 CEST [25604] LOG: duration: 0.016 ms execute <unnamed>: FETCH ALL FROM "<unnamed portal 689>"
finishes action successfully (data of the transaction in the database)
2021-06-15 16:28:13.823 CEST [25604] LOG: duration: 0.059 ms statement: COMMIT
A lot of activity goes on in different sessions,
including, among others, the Oracle foreign data wrapper
More than 7 minutes later the next job is requested, and now postgres.exe crashes
2021-06-15 16:36:01.524 CEST [17904] LOG: server process (PID 25604) was terminated by exception 0xFFFFFFFF
The process does not get as far as DISCARD ALL, BEGIN and SET TRANSACTION ISOLATION LEVEL READ COMMITTED
It crashes immediately
My conclusion:
"The possibly corrupted shared memory" was caused by one of the processes earlier, that is, between the last successful COMMIT and the new request.
That is a 7-minute time span in which the problem occurs.
Any feedback on this conclusion?
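The manual trace above (find the crashed PID, then look at what that backend logged last) could be automated roughly as follows. This is a sketch under assumptions: it relies on a log_line_prefix that puts the backend PID in square brackets, as in the excerpt, and the names `CRASH_RE`, `PID_RE` and `last_lines_before_crash` are invented for this example.

```python
import re
from collections import defaultdict, deque

# Postmaster message reporting the crashed backend, as seen in the excerpt.
CRASH_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by exception (0x[0-9A-Fa-f]+)"
)
# Assumes the backend PID is logged in square brackets, e.g. "[25604]".
PID_RE = re.compile(r"\[(\d+)\]")


def last_lines_before_crash(lines, keep=3):
    """Return (pid, exception, last `keep` lines of that pid) for the first crash, or None."""
    history = defaultdict(lambda: deque(maxlen=keep))
    for line in lines:
        crash = CRASH_RE.search(line)
        if crash:
            pid = crash.group(1)
            return pid, crash.group(2), list(history[pid])
        m = PID_RE.search(line)
        if m:
            history[m.group(1)].append(line)
    return None


SAMPLE = [
    "2021-06-15 16:27:51.100 CEST [25604] LOG: duration: 0.061 ms statement: DISCARD ALL",
    "2021-06-15 16:27:51.100 CEST [25604] LOG: duration: 0.012 ms statement: BEGIN",
    "2021-06-15 16:28:13.823 CEST [25604] LOG: duration: 0.059 ms statement: COMMIT",
    "2021-06-15 16:36:01.524 CEST [17904] LOG: server process (PID 25604) was terminated by exception 0xFFFFFFFF",
]

if __name__ == "__main__":
    pid, exc, tail = last_lines_before_crash(SAMPLE)
    print("crashed PID", pid, "exception", exc)
    for line in tail:
        print("  ", line)
```

Comparing the timestamp of the last line in the result with the timestamp of the crash line gives the idle gap discussed above.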
I took a dump of the DB, and this made the server run short of memory for PostgreSQL.
I then restarted PostgreSQL, but it failed to restart and kept giving me this error:
[FAIL] Starting PostgreSQL 9.4 database server: main[....] The PostgreSQL server failed to start. Please check the log output: ... failed!
failed!
and the log file contained the following lines:
2017-05-05 05:49:25 UTC LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device
2017-05-05 05:49:30 UTC LOG: using stale statistics instead of current ones because stats collector is not responding
2017-05-05 05:49:35 UTC LOG: using stale statistics instead of current ones because stats collector is not responding
2017-05-05 05:49:35 UTC LOG: could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No space left on device
2017-05-05 05:49:35 UTC LOG: could not write temporary statistics file "pg_stat_tmp/db_85990.tmp": No space left on device
2017-05-05 05:49:35 UTC LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device
2017-05-05 05:49:40 UTC LOG: could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No space left on device
2017-05-05 05:49:40 UTC LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device
2017-05-05 05:49:45 UTC LOG: using stale statistics instead of current ones because stats collector is not responding
Please help me solve this issue if anyone can.
Thanks.
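Since the log reports "No space left on device", one precaution is to check free space on the target filesystem before writing a dump. The following is only a sketch: the 10% reserve, the expected dump size and the names `enough_space` and `can_write_dump` are assumptions for this example, not recommendations from PostgreSQL.

```python
import shutil


def enough_space(free, total, needed, reserve_fraction=0.1):
    """True if writing `needed` bytes still leaves `reserve_fraction` of the disk free."""
    return free - needed >= total * reserve_fraction


def can_write_dump(path, needed_bytes, reserve_fraction=0.1):
    """Check the filesystem that holds `path` before writing a dump of `needed_bytes`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return enough_space(usage.free, usage.total, needed_bytes, reserve_fraction)


if __name__ == "__main__":
    # "." and the 2 GiB estimate are placeholders for this sketch.
    print(can_write_dump(".", 2 * 1024**3))
```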
PostgreSQL version 9.1.3. OS is Windows XP. Anti-virus is F-Secure. Six instances of postgres.exe are running.
Here's what's in the pg_log:
2012-04-08 14:58:23 PDT LOG: incomplete startup packet
2012-04-08 14:58:24 PDT LOG: database system is ready to accept connections
2012-04-08 14:58:24 PDT LOG: autovacuum launcher started
2012-04-08 14:58:25 PDT LOG: could not receive data from client: An operation was attempted on something that is not a socket.
2012-04-08 14:58:25 PDT LOG: incomplete startup packet
2012-04-08 14:58:27 PDT LOG: could not receive data from client: An operation was attempted on something that is not a socket.
I disabled F-Secure but it made no difference. Any idea why?
It is not unusual for antivirus products to cause problems even when stopped or disabled. They must sometimes be completely uninstalled to avoid having them get in the way of normal database operations. Another likely possibility is that there is a firewall which needs to be configured to allow the TCP server socket to be opened or the UDP socket used by the various PostgreSQL processes to communicate regarding statistics.
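To see whether a firewall or antivirus product is blocking the server's TCP socket, a plain connection test from the client machine can help. This is a minimal sketch: the host name is an assumption, 5432 is PostgreSQL's default port, and `can_connect` is a name made up for this example.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; use the address the clients actually connect to.
    print(can_connect("localhost", 5432))
```

Note that a probe like this connects and disconnects without sending a startup packet, so the server will typically log "incomplete startup packet" for each attempt.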
The database in use is Postgres V8. Every hour there is a server connection error: the server gets disconnected and needs to be reconnected.
Please find the error log below and let me know how to resolve this error.
2012-01-05 13:28:52 CEST LOG: server process (PID 6128) was terminated by exception 0xC0000017
2012-01-05 13:28:52 CEST HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
2012-01-05 13:28:52 CEST LOG: terminating any other active server processes
2012-01-05 13:28:52 CEST WARNING: terminating connection because of crash of another server process
2012-01-05 13:28:52 CEST DETAIL:The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2012-01-05 13:28:52 CEST HINT:In a moment you should be able to reconnect to the database and repeat your command.
2012-01-05 13:28:52 CEST WARNING:terminating connection because of crash of another server process
2012-01-05 13:28:52 CEST DETAIL:The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory
Thanks in Advance
Apparently that status is STATUS_NO_MEMORY, so look at your server memory setup (shared_buffers, work_mem et al) and monitor the memory usage on the machine around the time it crashes (if it is regular). Does it always coincide with some sort of scheduled task?
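The lookup from the hexadecimal value to a name can be scripted for a handful of common codes. This is a small sketch, not a complete table: it lists only a few well-known NTSTATUS values from ntstatus.h, and `NTSTATUS_NAMES` and `describe_exception` are names invented for this example.

```python
import re

# A few well-known NTSTATUS values from ntstatus.h; deliberately incomplete.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000017: "STATUS_NO_MEMORY",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
    0xC0000142: "STATUS_DLL_INIT_FAILED",
}

EXC_RE = re.compile(r"terminated by exception (0x[0-9A-Fa-f]+)")


def describe_exception(log_line):
    """Return 'code name' for a crash line, or None if the line reports no exception."""
    m = EXC_RE.search(log_line)
    if not m:
        return None
    code = int(m.group(1), 16)
    name = NTSTATUS_NAMES.get(code, "(not in this table)")
    return f"0x{code:08X} {name}"


if __name__ == "__main__":
    line = ("2012-01-05 13:28:52 CEST LOG: server process (PID 6128) "
            "was terminated by exception 0xC0000017")
    print(describe_exception(line))
```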