We are facing this issue with POSTGRES 12.11, only on windows OS, with Linux OS, it works fine,
if the executable that is using the database session crashes, then the entire database goes to recovery mode and restarts, in the Postgres log, we can find the below messages.
2022-10-06 17:44:09.210 CEST [8860] LOG: server process (PID 9980) exited with exit code 0
2022-10-06 17:44:09.210 CEST [8860] LOG: terminating any other active server processes
2022-10-06 17:44:09.211 CEST [9992] WARNING: terminating the connection because of the crash of another server process
2022-10-06 17:44:09.211 CEST [9992] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit because another server process exited abnormally and possibly corrupted shared memory.
2022-10-06 17:44:09.211 CEST [9992] HINT: In a moment you should be able to reconnect to the database and repeat your command.
and then it goes to recovery mode.
any idea or workaround to avoid this issue? and if the Postgres crashes, where does the crash dump gets generated in the windows?
Related
I'm running an application with about 20 processes connected to a postgres DB (10.0) on windows server 2016.
Since about a month I have unexpected crashes of postgres.exe.
To isolate the problem I extended the logging by setting log_min_duration_statement = 0
This creates more detailed logfile. What I can see is:
LOG: server process (PID xxxxx) was terminated by exception
0xFFFFFFFF DETAIL: Failed process was running: COMMIT HINT: See C
include file "ntstatus.h" for a description of the hexadecimal value.
Then it tears down all 20 processes like this:
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.Then DB recovers:
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2021-06-11 18:17:18 CEST
DB enters recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
LOG: database system was not properly shut down; automatic recovery in progress
...
LOG: redo starts at 1B2/33319E58
FATAL: the database system is in recovery mode
LOG: invalid record length at 1B2/33D29930: wanted 24, got 0
LOG: redo done at 1B2/33D29908
LOG: last completed transaction was at log time 2021-06-11 18:21:39.830526+02
FATAL: the database system is in recovery mode
...
FATAL: the database system is in recovery mode
LOG: database system is ready to accept connections
Now it's running again like normal
The crashed PID xxxxx I can identify to a postgres.exe running for one of the 20 application processes. It's not always the same one. This happens about every 5-10 days.
Can anybody give me some advice how to track down the reason of this crash?
Extensions used:
oracle_fdw 2.0.0, PostgreSQL 10.0, Oracle client 11.2.0.3.0, Oracle server 11.2.0.2.0
Crashdump:
Followed the link :
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows
Although the postgres user has "full control" of the crashdump folder in the security tab it does not write something. Folder stays empty.
Follow-Up on the comment #Laurenz Albe:
The COMMIT is not the reason of the crash. It is the last successfull executed command of the session. Explained on the following example:
Process gets a job and starts to do it's job
2021-06-15 16:27:51.100 CEST [25604] LOG: duration: 0.061 ms statement: DISCARD ALL
2021-06-15 16:27:51.100 CEST [25604] LOG: duration: 0.012 ms statement: BEGIN
2021-06-15 16:27:51.100 CEST [25604] LOG: duration: 0.015 ms statement: SET TRANSACTION ISOLATION LEVEL READ COMMITTED
now a lot of action going on within session 25604
and among others the oracle foreign datawrapper
2021-06-15 16:28:13.792 CEST [25604] LOG: duration: 0.016 ms execute <unnamed>: FETCH ALL FROM "<unnamed portal 689>"
finishes action successfully (data of the transaction in the database)
2021-06-15 16:28:13.823 CEST [25604] LOG: duration: 0.059 ms statement: COMMIT
a lot of action is going in different sessions
among others the oracle foreign datawrapper
more the 7 minutes afterwards the next job is requested and now postgres.exe crash
2021-06-15 16:36:01.524 CEST [17904] LOG: server process (PID 25604) was terminated by exception 0xFFFFFFFF
The process does not do DISCARD ALL, BEGIN and SET TRANSACTION ISOLATION LEVEL READ COMMITTED
It crashes immediately
My Conclusion:
"the possibly corrupted shared memory" was initiated by one of the processes before. Meaning between the last successful COMMIT and the new request.
That's a 7 minutes time span where the problem occurs.
Some feedback on this conclusion?
We have an application pipeline and Postgres-12(TimescaleDB, managed through Patroni) on a separate server (VM with Ubuntu 18.04 LTS).
We are facing an issue with the DB, it suddenly stuck in the recovery mode, and also we can’t access it from the psql client and select queries also hung.
After an hour or late all got back to normal (As my current pipeline terminated) and able to run queries against the DB server.
Master DB error details:
2020-11-03 18:35:08.612 IST [9773] [unknown]#[unknown] LOG: connection received: host=x.x.x.x port=58780
2020-11-03 18:35:08.612 IST [9773] FATAL: the database system is in recovery mode
2020-11-03 18:35:08.596 IST [18276] LOG: could not send data to client: Broken pipe
Replica server error details:
2020-11-03 18:34:55 IST [18316]: [85649-1] user=postgres,db=postgres,app=[unknown],client=x.x.x.x LOG: duration: 10.228 ms statement: SELECT * FROM pg_stat_bgwriter;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-11-03 18:35:08 IST [18322]: [2-1] user=,db=,app=,client= FATAL: could not receive data from WAL stream: SSL SYSCALL error: EOF detected
2020-11-03 18:35:08 IST [20500]: [1-1] user=,db=,app=,client= FATAL: could not connect to the primary server: FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
Pipeline error details:
Job aborted due to stage failure: Task 4 in stage 0.0 failed 3 times, most recent failure: Lost task 4.2 in stage 0.0 (TID 29, ip-x-x-x-x.ap-southeast-1.compute.internal, executor 19): org.postgresql.util.PSQLException: FATAL: the database system is in recovery mode at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:514) at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:141) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195) at org.postgresql.Driver.makeConnection(Driver.java:454) at org.postgresql.Driver.connect(Driver.java:256) at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
Please any advise on this issue?
What version of TimescaleDB are you running? In particular, there were some issues with 1.7.x if you try to query a read replica; we recommend upgrading to 1.7.4.
(Otherwise, there's not much information about to suggest what might have happened.)
https://github.com/timescale/timescaledb/releases/tag/1.7.4
We have a postgresql 9.6.14 postgres server, where we run a query which caused a termination of postgresql process and restart of the postgresql process.
We don't know why it happened.
The query runs fine when we query it with another filter value, so I guess it has to do with the amount of data it is querying. But can this really cause a restart of the whole postgres service? So maybe a memory problem?
postgresql.log
2019-07-12 17:54:13.487 CEST [6459]: [7-1] user=,db=,app=,client= LOG:
server process (PID 11064) was terminated by signal 11: Segmentation
fault 2019-07-12 17:54:13.487 CEST [6459]: [8-1]
user=,db=,app=,client= DETAIL: Failed process was running:
2019-07-12 17:54:13.487 CEST [6459]: [9-1] user=,db=,app=,client= LOG:
terminating any other active server processes 2019-07-12 17:54:13.488
CEST [11501]: [1-1] user=hg,db=test,app=[unknown],client=172.31.0.43
WARNING: terminating connection because of crash of another server
process 2019-07-12 17:54:13.488 CEST [11501]: [2-1]
user=hg,db=test,app=[unknown],client=172.31.0.43 DETAIL: The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally
and possibly corrupted shared memory. 2019-07-12 17:54:13.488 CEST
[11501]: [3-1] user=hg,db=test,app=[unknown],client=172.31.0.43 HINT:
In a moment you should be able to reconnect to the database and repeat
your command. 2019-07-12 17:54:13.488 CEST [8889]: [2-1]
user=hg,db=_test,app=[unknown],client=172.31.0.46 WARNING:
terminating connection because of crash of another server process
select stat.*,
(
Select
1
From
table1 a, table2 pg
Where
a.field_1::Text = stat.field_1::Text And
a.field_2::Text = stat.field_2::Text And
stat.field_3::Text = pg.field_3::Text And
a.field_4= pg.field_4
limit 1
)
from table3 stat
where field_1= 'xyz';
Database is postgresql-9.5.1 in docker. My host machine has 3.75 GB memory, linux. In some methods I am inserting 490000 rows one after another using psycopg2 with below code.
student_list = [(name, surname, explanation)]
args_str = ','.join(cur.mogrify("(%s,%s,%s)", x) for x in student_list)
cur.execute('INSERT INTO students (name, surname, explanation) VALUES ' + args_str)
This makes my database docker memory seems full and gives these errors:
LOG: server process (PID 11219) was terminated by signal 9:
Killed DETAIL: Failed process was running LOG: terminating
any other active server processes docker#test_db WARNING:
terminating connection because of crash of another server process
docker#test_db DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared
memory. docker#test_db HINT: In a moment you should be able to
reconnect to the database and repeat your command. docker#test_db
WARNING: terminating connection because of crash of another server
process docker#test_db DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory. ... docker#test_db FATAL: the database system is in
recovery mode LOG: all server processes terminated;
reinitializing LOG: database system was interrupted; last known
up at 2017-06-06 09:39:40 UTC LOG: database system was not
properly shut down; automatic recovery in progress docker#test_db
FATAL: the database system is in recovery mode docker#test_db
FATAL: the database system is in recovery mode docker#test_db
FATAL: the database system is in recovery mode LOG: autovacuum
launcher started
Script gives that log:
Inner exception
SSL SYSCALL error: EOF detected
I tried put some sleep time between consecutive queries but got same result. Is there any limitation for that?
Also I tried to connect and disconnect for each query but got same result. These are my connect and disconnect methods.
def connect():
conn = psycopg2.connect(database=database_name,
user=database_user,
host=database_host,
port=database_port)
conn
.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
return conn, cur
def disconnect(conn, cur):
cur.close()
conn.close()
Here is what I did. Actually my memory was full enough. That's why linux OS used to kill the process in Postgresql. There were 1M values in every insert process. The trick was I divided data lists to chunks and tried it 100k by 100k. That works very well. Thanks for your helps.
The database in use is Postgres database V8.Every one hour there is a server connection error.The server gets disconnected and needs to be re connected again.
Please find below the log of the error and let know on a solution to resolve this error
2012-01-05 13:28:52 CEST LOG: server process (PID 6128) was terminated by exception 0xC0000017
2012-01-05 13:28:52 CEST HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
2012-01-05 13:28:52 CEST LOG: terminating any other active server processes
2012-01-05 13:28:52 CEST WARNING: terminating connection because of crash of another server process
2012-01-05 13:28:52 CEST DETAIL:The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2012-01-05 13:28:52 CEST HINT:In a moment you should be able to reconnect to the database and repeat your command.
2012-01-05 13:28:52 CEST WARNING:terminating connection because of crash of another server process
2012-01-05 13:28:52 CEST DETAIL:The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory
Thanks in Advance
Apparently that status is STATUS_NO_MEMORY, so look at your server memory setup (shared_buffers, work_mem et al) and monitor the memory usage on the machine around the time it crashes (if it is regular). Does it always coincide with some sort of scheduled task?