Failure when the hot_standby parameter is changed - PostgreSQL

I have configured a primary/standby pair on version 10. If I change the hot_standby parameter to off on the standby, I can no longer connect, and this error appears in the log: FATAL: the database system is starting up.
I don't understand this error.

"Hot standby" is the database feature that allows you to connect to a recovering server (like a standby). If you disable that feature, it is no surprise if you can no longer connect to the standby.
The error message is the normal message you get from a recovering server that does not allow connections.
The FATAL message is not as bad as it seems. "Fatal" indicates an error message that terminates a database session. It is the message you get if you try to connect to a database that is not yet ready for connections.
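So if the standby should keep accepting read-only queries, hot_standby has to stay on. A minimal sketch of turning it back on (the data directory path is a placeholder, and how you restart the service depends on your installation):

# in postgresql.conf on the standby
hot_standby = on        # allow read-only connections while the server is in recovery

# the parameter only takes effect at server start, so restart the standby, for example:
pg_ctl restart -D /path/to/standby/data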

Related

Postgres streaming replication - servers keep shutting down

I am new to PostgreSQL and I am trying to set up streaming replication from our server to a test DB on my laptop. I have been following this tutorial https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql/ along with the Postgres documentation here: https://www.postgresql.org/docs/11/runtime-config-replication.html.
I'm running Windows 10, PostgreSQL 11, PostGIS 2.5 extension.
The server and my local machine both keep shutting down and the logs are filled with postmaster.pid errors such as:
LOG: performing immediate shutdown because data directory lock file is invalid
LOG: received immediate shutdown request
LOG: could not open file "postmaster.pid": No such file or directory
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Could anyone point me towards the issue here? I know my servers aren't configured properly, but I just don't know which configuration settings need to be changed.
Here is an image of my standby server's most recent log.
standby log
Here is an image of my master server's most recent log.
master log
You must have messed up in many ways.
You removed or overwrote postmaster.pid on the master server.
That is very dangerous and causes the server to die with the error message you quote.
You didn't create recovery.conf before starting the standby server, or you removed backup_label. From the error messages, I'd suspect the latter, with ensuing data corruption.
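For reference, the usual way to build the standby on PostgreSQL 11 is to take a fresh base backup of the master (which also gives you backup_label) and to have recovery.conf in the standby's data directory before the first start. A minimal sketch, with host, user and path as placeholders:

# on the standby, take a fresh base backup of the master; the target directory must be empty
pg_basebackup -h master.example.com -U replication_user -D "C:\pgdata_standby" -P -R

# -R writes a recovery.conf into that directory; on PostgreSQL 11 it contains roughly:
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replication_user'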

How to disable a subscription automatically on connection loss (PostgreSQL Logical Replication)?

I am currently working on logical replication (master/slave) with PostgreSQL. It is working very well, but I would like to anticipate the scenario where the connection is broken unexpectedly (master offline, broken socket, ...).
I have found a way to mimic this scenario, and the following logs are produced:
2019-10-01 15:05:31.205 GMT [165] ERROR: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2019-10-01 15:05:31.206 GMT [121] LOG: background worker "logical replication worker" (PID 165) exited with exit code 1
2019-10-01 15:05:31.207 GMT [170] LOG: logical replication apply worker for subscription "sub_2" has started
2019-10-01 15:05:31.207 GMT [170] ERROR: could not connect to the publisher: could not connect to server: Connection refused
Is the server running on host "100.x.x.x" and accepting
TCP/IP connections on port 5432?
Now, what I would like to do is find a way to disable the subscription automatically when this happens (i.e. execute the following command), so that the ERROR no longer appears in the logs, because the worker will not exist and will not try to connect to an offline master.
ALTER SUBSCRIPTION sub2 DISABLE
To do that, I see a few options, but I don't know how to apply them:
Catch the ERROR (either in WAL or "logical replication worker")
Find a way to execute some code at "logical replication worker" startup
Use a cron job (at the application level, so not native to PostgreSQL) to check whether the server is offline (a heartbeat check), and if so, execute the above command manually. This solution may work, but I would like to avoid it because some errors will still be raised if the connection is lost between two "ticks" of the cron job (see the sketch below).
Thank you very much for your help, cheers
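For the cron-style option, a minimal sketch could be run periodically on the subscriber (e.g. via psql from a cron job) and disable the subscription when its apply worker has no running process. The subscription name and the pid IS NULL heuristic are assumptions, and the between-ticks window mentioned above still applies:

-- disable subscription sub_2 if it is still enabled but its apply worker is not running
DO $$
BEGIN
    IF EXISTS (SELECT 1
               FROM pg_subscription s
               JOIN pg_stat_subscription st ON st.subid = s.oid
               WHERE s.subname = 'sub_2'
                 AND s.subenabled          -- still marked enabled
                 AND st.pid IS NULL) THEN  -- but no apply worker process
        ALTER SUBSCRIPTION sub_2 DISABLE;
    END IF;
END
$$;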

Terminating connection because of crash of another server process - Postgres

Every time I run the same query, I get this error:
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
I'm using phpPgAdmin 5.1.
A concurrent query in a different database session has crashed its server backend process. As a consequence, the postmaster terminates all server processes and the whole database instance performs crash recovery from the latest checkpoint.
You should look into the database server log to see what the problem is.
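If you are unsure where that log is written, the relevant settings can be read once you can reconnect (pg_current_logfile() exists only on PostgreSQL 10 and later; on older versions look under log_directory in the data directory):

SHOW log_destination;          -- stderr, csvlog, syslog, ...
SHOW logging_collector;        -- whether stderr is captured into log files
SHOW log_directory;            -- relative to the data directory unless absolute
SHOW data_directory;
SELECT pg_current_logfile();   -- current log file name (PostgreSQL 10+ only)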

incorrect resource manager data checksum in record at 2/XYZ + terminating walreceiver process due to administrator command

I am running a streaming replication environment with PostgreSQL 9.1 (1 master, 3 slaves). Everything worked fine for approx. 2 months. Yesterday, the replication to one of the slaves failed, and the log on the slave showed:
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
FATAL: terminating walreceiver process due to administrator command
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
The slave was no longer in sync with the master. Two hours later, during which the log had gained a new line like the above every 5 seconds, I restarted the slave database server:
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
FATAL: terminating connection due to administrator command
FATAL: terminating connection due to administrator command
LOG: shutting down
LOG: database system is shut down
The new log file on the slave contains:
LOG: database system was shut down in recovery at 2016-02-29 05:12:11 CET
LOG: entering standby mode
LOG: redo starts at 61/D92C10C9
LOG: consistent recovery state reached at 61/DA2710A7
LOG: database system is ready to accept read only connections
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: streaming replication successfully connected to primary
Now the slave is in sync with the master, but the checksum entry is still there. One more thing I checked was the network logs: the network was available.
My questions are:
Does anyone know why the walreceiver was terminated?
Why didn't PostgreSQL retry the replication?
What can I do to prevent this in the future?
Thank you.
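Regarding the last question, one thing that helps is monitoring whether the standby keeps receiving and applying WAL, so a stuck walreceiver is noticed right away rather than hours later. A minimal check using the 9.1 function names (they were renamed in PostgreSQL 10):

-- on the slave: WAL position received from the master vs. position already replayed
SELECT pg_is_in_recovery(),
       pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location();

-- on the master: one row per connected standby, with its sent/write/flush/replay positions
SELECT * FROM pg_stat_replication;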
EDIT:
The database servers are running on SLES 11 with ext3. I found an article about low performance of SLES 11 with large RAM but I am not sure if it applies since my machine has only 8 GB RAM (https://www.novell.com/support/kb/doc.php?id=7010287)
Any help would be appreciated.
EDIT (2):
The PostgreSQL version is 9.1.5. It seems that PostgreSQL 9.1.6 provides a fix for a similar issue:
Fix persistence marking of shared buffers during WAL replay (Jeff Davis)
This mistake can result in buffers not being written out during checkpoints, resulting in data corruption if the server later crashes without ever having written those buffers. Corruption can occur on any server following crash recovery, but it is significantly more likely to occur on standby slave servers since those perform much more WAL replay.
Source: http://www.postgresql.org/docs/9.1/static/release-9-1-6.html
Might this be the fix? Should I upgrade to PostgreSQL 9.1.6 so that everything runs smoothly?
In case someone stumbles across this question: I ended up reinstalling the databases from backed-up data and setting up replication again. Never really figured out what went wrong.
Never really figured out what went wrong.
I'm experiencing the same error, except that in my case the standby never syncs fully, right from the very beginning.
Then the primary server got some kernel errors (a heat problem in the server case?). It had to be switched off because the shutdown did not complete. Already while it was shutting down, the slave showed:
LOG: incorrect resource manager data checksum in record at 1/63663CB0
After restarting the primary server and the slave server, the situation doesn't change: the same log entries appear every 5 seconds.

Heroku PostgreSQL Connection reset by peer

We are using the PostgreSQL Crane plan and are getting a lot of log entries like this:
app postgres - - [5-1] ... LOG: could not receive data from client: Connection reset by peer
We are using about 50 dynos.
Is PostgreSQL running out of connections with a bunch of dynos?
Can someone help me understand this case?
Thanks
From what I've found, the cause of the errors is the client not disconnecting at the end of the session, or a new connection not being created.
New connection solving the problem:
Postgres error on Heroku with Resque
Explicit disconnection solving the problem:
https://github.com/resque/resque/issues/367 (comment #2)
There's a Heroku FAQ entry on this: Understanding Heroku Postgres Log Statements and Common Errors: could not receive data from client: Connection reset by peer.
Although this log is emitted from postgres, the cause for the error has nothing to do with the database itself. Your application happened to crash while connected to postgres, and did not clean up its connection to the database. Postgres noticed that the client (your application) disappeared without ending the connection properly, and logged a message saying so.
If you are not seeing your application’s backtrace, you may need to ensure that you are, in fact, logging to stdout (instead of a file) and that you have stdout sync’d.