"FATAL: the database system is shutting down" while creating connection to PostgreSQL - postgresql

I am getting FATAL: the database system is shutting down errors while creating PostgreSQL JDBC connections to a PostgreSQL 9.2 server. The specific exception path I'm getting from JDBC is here:
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:398)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:173)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:64)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:136)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:29)
at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:21)
at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:31)
at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:393)
at org.postgresql.Driver.connect(Driver.java:267)
From the various log files (PostgreSQL's, our management layer's, and those of the app using PostgreSQL), I do not see any database shutdown actually happening: other connections to PostgreSQL are created as usual, and no shutdown was initiated from our management layer. However, the PostgreSQL server logs do show the error message with the same timestamp:
2014-06-16 12:30:00.736 GMT LOG: connection received: host=127.0.0.1 port=38530
2014-06-16 12:30:00.737 GMT FATAL: the database system is shutting down
Researching online, I learned that this error message is used whenever PostgreSQL shuts down connections.
Why would PostgreSQL refuse to give me a new connection? Could this be caused by some sort of resource contention? How can I get more information about the error from PostgreSQL itself?

This issue turned out to be caused by a very bad misuse of PostgreSQL: our server had booted two PostgreSQL instances on the same data directory. It had deleted postmaster.pid and used a new port number, so the normal safeguards against this were bypassed. That also explains why the logs did not hold any useful information: they were being overwritten by the PostgreSQL instance that wasn't refusing connections. The actual error came from a complex interaction between the competing PostgreSQL instances, and I hope nobody else runs into it!
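A check along these lines might help spot a missing or mismatched lock file before a second instance is started. This is only a minimal sketch: the function names are made up for illustration, and it assumes the postmaster.pid layout used by PostgreSQL 9.1 and later, where the first line holds the postmaster PID and the fourth line holds the port.

```python
from pathlib import Path
from typing import Optional


def read_postmaster_pid(data_dir: str) -> Optional[dict]:
    """Return the PID and port recorded in postmaster.pid, or None if the file is missing."""
    pid_file = Path(data_dir) / "postmaster.pid"
    try:
        lines = pid_file.read_text().splitlines()
    except FileNotFoundError:
        return None
    # Assumed layout (PostgreSQL 9.1+): line 1 = postmaster PID, line 4 = port.
    pid = int(lines[0].strip())
    port = int(lines[3].strip()) if len(lines) > 3 else None
    return {"pid": pid, "port": port}


def lock_file_matches(data_dir: str, expected_port: int) -> bool:
    """True only if a lock file exists and records the port we expect to serve."""
    info = read_postmaster_pid(data_dir)
    return info is not None and info["port"] == expected_port
```

If the lock file is absent while a server is apparently running on that data directory, or it records a different port than the one being started, that is a hint that something has tampered with the safeguard.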

We encountered this problem after the server was restarted: the reboot had not brought PostgreSQL back up. Check the status with "/etc/init.d/postgresql status"; if it returns "no server running", start it with "/etc/init.d/postgresql start".

Maybe there is not enough free space on the hard drive.
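One quick way to rule this out is to check the free space on the volume holding the data directory. A minimal sketch, assuming you know that path; the function name and the threshold are illustrative, not PostgreSQL settings:

```python
import shutil


def free_space_ok(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least min_free_bytes free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= min_free_bytes
```

For example, free_space_ok("/var/lib/postgresql", 1024**3) would ask whether at least 1 GiB is free on that volume (the path here is just an assumed location).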

SOLUTION:
brew remove postgresql
Temporarily rename the lock files: mv /tmp/.s.PGSQL.5432.lock /tmp/BK.s.PGSQL.5432.lock and mv /tmp/.s.PGSQL.5432 /tmp/BK.s.PGSQL.5432
brew install postgresql
Enjoy!

Related

Troubleshoot org.postgresql.util.PSQLException: Connection attempt timed out that only occasionally happens

I have several applications running on Tomcat with a local PostgreSQL database, and Tomcat occasionally reports the following error:
org.postgresql.util.PSQLException: Connection attempt timed out.
I am able to connect to the database using other tools such as DBeaver, and the problem seems to happen only when several applications are connecting to the database. How can I troubleshoot this? Is there any PostgreSQL log that I can check?
PostgreSQL does have logging. It is very configurable, and we can't tell you how you have it configured. Common locations are /var/log/postgresql/ and PGDATA/log/. However, a connection timeout will probably not appear in the PostgreSQL log, because the client probably never reached the PostgreSQL server in the first place.
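Since a timeout suggests the client never reached the server, a plain TCP probe can help separate network or listener problems from anything PostgreSQL itself would log. This is a rough sketch: the function name is made up, and it only tests whether something accepts TCP connections on the port, not whether PostgreSQL authentication would succeed.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within timeout_s seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Running can_reach("127.0.0.1", 5432) in a loop while the applications are under load (5432 being the default port, which may differ in your setup) would show whether the timeouts coincide with the port being unreachable.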

Postgres streaming replication - servers keep shutting down

I am new to PostgreSQL and I am trying to set up a streaming replication from our server to a test DB on my laptop. I have been following this tutorial https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql/ along with the Postgres documentation here https://www.postgresql.org/docs/11/runtime-config-replication.html.
I'm running Windows 10, PostgreSQL 11, PostGIS 2.5 extension.
The server and my local machine both keep shutting down and the logs are filled with postmaster.pid errors such as:
LOG: performing immediate shutdown because data directory lock file is invalid
LOG: received immediate shutdown request
LOG: could not open file "postmaster.pid": No such file or directory
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Could anyone point me towards the issue here? I know my servers aren't configured properly, but I just don't know which settings need to be changed.
Here is an image of my standby server's most recent log: [image: standby log]
Here is an image of my master server's most recent log: [image: master log]
You must have messed up in many ways.
You removed or overwrote postmaster.pid on the master server.
That is very dangerous and causes the server to die with the error message you quote.
You didn't create recovery.conf before starting the standby server, or you removed backup_label. From the error messages I'd suspect the second, with ensuing data corruption.

Exception raised when try to run web application using postgres database

When I run Tomcat's start.bat file on Linux with a Postgres database, it gives the following error after I made changes to the postgres_hba.conf file.
C3P0PooledConnectionPoolManager[identityToken->2u13c99u1fd86im176rze7|779becbd]-HelperThread-#1--WARN -com.mchange.v2.resourcepool.BasicResourcePool:com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask#1de8b573 -- Acquisition Attempt Failed!!! Clearing pending acquires. While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (30). Last acquisition attempt exception:
org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections
The error message is fairly clear. You have tried to open more database connections than postgresql.conf allows (the max_connections setting, part of which is reserved for superusers). Even the shortest search of your logs and the manuals would have made this clear.
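The arithmetic behind that limit can be sketched as follows. The defaults used here (max_connections = 100, superuser_reserved_connections = 3) are the stock PostgreSQL defaults, not values read from the asker's configuration, and the function names are made up for illustration.

```python
from typing import Iterable


def client_slots(max_connections: int = 100, superuser_reserved: int = 3) -> int:
    """Connection slots left for ordinary clients once the superuser reserve is set aside."""
    return max_connections - superuser_reserved


def pools_fit(pool_max_sizes: Iterable[int],
              max_connections: int = 100,
              superuser_reserved: int = 3) -> bool:
    """True if every pool could grow to its maximum size without exhausting the slots."""
    return sum(pool_max_sizes) <= client_slots(max_connections, superuser_reserved)
```

With the defaults, three pools capped at 30 connections each fit (90 <= 97), while four such pools do not (120 > 97), which is the situation the error message describes.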

incorrect resource manager data checksum in record at 2/XYZ + terminating walreceiver process due to administrator command

I am running a streaming replication environment with PostgreSQL 9.1 (1 master, 3 slaves). Everything worked fine for approx. 2 months. Yesterday, replication to one of the slaves failed, with the slave's log showing:
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
FATAL: terminating walreceiver process due to administrator command
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
The slave was no longer in sync with the master. Two hours later, during which the log gained a new line like the above every 5 seconds, I restarted the slave database server:
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
FATAL: terminating connection due to administrator command
FATAL: terminating connection due to administrator command
LOG: shutting down
LOG: database system is shut down
The new log file on the slave contains:
LOG: database system was shut down in recovery at 2016-02-29 05:12:11 CET
LOG: entering standby mode
LOG: redo starts at 61/D92C10C9
LOG: consistent recovery state reached at 61/DA2710A7
LOG: database system is ready to accept read only connections
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: streaming replication successfully connected to primary
Now the slave is in sync with the master, but the checksum entry is still there. I also checked the network logs: the network was available the whole time.
My questions are:
Does anyone know why the walreceiver was terminated?
Why didn't PostgreSQL retry the replication?
What can I do to prevent this in the future?
Thank you.
EDIT:
The database servers are running on SLES 11 with ext3. I found an article about low performance of SLES 11 with large RAM but I am not sure if it applies since my machine has only 8 GB RAM (https://www.novell.com/support/kb/doc.php?id=7010287)
Any help would be appreciated.
EDIT (2):
The PostgreSQL version is 9.1.5. It seems that PostgreSQL 9.1.6 provides a fix for a similar issue:
Fix persistence marking of shared buffers during WAL replay (Jeff Davis)
This mistake can result in buffers not being written out during checkpoints, resulting in data corruption if the server later crashes without ever having written those buffers. Corruption can occur on any server following crash recovery, but it is significantly more likely to occur on standby slave servers since those perform much more WAL replay.
Source: http://www.postgresql.org/docs/9.1/static/release-9-1-6.html
Might this be the fix? If I upgrade to PostgreSQL 9.1.6, would everything run smoothly?
In case someone stumbles across this question, I ended up reinstalling the databases from backed-up data and set up replication again. Never really figured out what went wrong.
"Never really figured out what went wrong."
I'm experiencing the same error, except that in my case the slave has never fully synced, right from the very beginning.
Then the primary server got some kernel errors (a heat problem in the server case?). The server had to be switched off because the shutdown did not complete. While the primary was still shutting down, the slave already logged
LOG: incorrect resource manager data checksum in record at 1/63663CB0
After restart of the primary server and restart of the slave server, the situation doesn't change: same log entries every 5 seconds.
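To tell whether the standby is stuck on one WAL record or failing at several different positions, the repeated log lines can be tallied by location. A minimal sketch: the regular expression is derived only from the message format quoted in this thread, and the function name is made up.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "incorrect resource manager data checksum in record at 61/DA2710A7"
CHECKSUM_RE = re.compile(
    r"incorrect resource manager data checksum in record at ([0-9A-Fa-f]+/[0-9A-Fa-f]+)"
)


def count_checksum_errors(lines: Iterable[str]) -> Counter:
    """Count checksum-error log lines, grouped by the WAL location they mention."""
    counts = Counter()
    for line in lines:
        match = CHECKSUM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A single location with a steadily growing count, as in the logs above, points at one bad record that recovery keeps retrying rather than at ongoing corruption across the WAL stream.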

Heroku PostgreSQL Connection reset by peer

We are using the PostgreSQL Crane plan and are getting a lot of log lines like this:
app postgres - - [5-1] ... LOG: could not receive data from client: Connection reset by peer
We are using about 50 dynos.
Is PostgreSQL running out of connections with this many dynos?
Can someone help explain what is going on?
Thanks
From what I've found, the cause of these errors is either the client not disconnecting at the end of the session or a new connection not being created.
New connection solving the problem:
Postgres error on Heroku with Resque
Explicit disconnection solving the problem:
https://github.com/resque/resque/issues/367 (comment #2)
There's a Heroku FAQ entry on this: Understanding Heroku Postgres Log Statements and Common Errors: could not receive data from client: Connection reset by peer.
Although this log is emitted from postgres, the cause for the error has nothing to do with the database itself. Your application happened to crash while connected to postgres, and did not clean up its connection to the database. Postgres noticed that the client (your application) disappeared without ending the connection properly, and logged a message saying so.
If you are not seeing your application’s backtrace, you may need to ensure that you are, in fact, logging to stdout (instead of a file) and that you have stdout sync’d.
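The "explicit disconnection" idea mentioned above can be sketched like this. FakeConnection is a hypothetical stand-in for whatever connection object your driver returns, and run_job is an illustrative name; the only assumption is that the real connection has a close() method.

```python
from contextlib import closing


class FakeConnection:
    """Hypothetical stand-in for a real database connection object."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_job(connect, work):
    """Open a connection, run work(conn), and always close the connection afterwards."""
    # closing() calls conn.close() on exit, even if work() raises an exception.
    with closing(connect()) as conn:
        return work(conn)
```

This covers jobs that fail with an exception. It cannot help when the process is killed outright, in which case Postgres will still log the reset.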