How to disable a subscription automatically when the connection is lost (PostgreSQL logical replication)? - postgresql

I am currently working on logical replication (master / slave) with PostgreSQL. It works very well, but I would like to anticipate the scenario where the connection breaks unexpectedly (master offline, broken socket, ...).
I have found a way to mimic this scenario, and the following logs are produced:
2019-10-01 15:05:31.205 GMT [165] ERROR: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2019-10-01 15:05:31.206 GMT [121] LOG: background worker "logical replication worker" (PID 165) exited with exit code 1
2019-10-01 15:05:31.207 GMT [170] LOG: logical replication apply worker for subscription "sub_2" has started
2019-10-01 15:05:31.207 GMT [170] ERROR: could not connect to the publisher: could not connect to server: Connection refused
Is the server running on host "100.x.x.x" and accepting
TCP/IP connections on port 5432?
Now, what I would like to do is find a way to disable the subscription automatically when this happens (i.e. execute the following command). The ERROR would then no longer appear in the logs, because the worker would not exist and would not try to connect to an offline master.
ALTER SUBSCRIPTION sub_2 DISABLE;
To do that, I see some options, but I don't know how to apply them:
Catch the ERROR (either in WAL or "logical replication worker")
Find a way to execute some code at "logical replication worker" startup
Use a cron job (at application level, so not native to PostgreSQL) to check whether the server is offline (heartbeat). If so, execute the above command manually. Note that this solution may work, but I would like to avoid it because errors will still be raised if the connection is lost between two "ticks" of the cron job.
Thank you very much for your help, cheers
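The third option above (an external heartbeat) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not a built-in PostgreSQL mechanism: the names publisher_reachable, quote_ident and watchdog_step are hypothetical, and run_sql stands for whatever function executes a statement on the subscriber (for example through a database driver). A successful TCP connect only shows that the port is open, not that the publisher accepts replication connections.

```python
import socket
from typing import Callable


def publisher_reachable(host: str, port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure and timeouts.
        return False


def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def watchdog_step(subscription: str, reachable: bool,
                  run_sql: Callable[[str], None]) -> bool:
    """Send ALTER SUBSCRIPTION ... DISABLE when the publisher is unreachable.

    run_sql is supplied by the caller (an assumption of this sketch) and
    executes one statement on the subscriber.
    Returns True if the DISABLE statement was sent, False otherwise.
    """
    if reachable:
        return False
    run_sql(f"ALTER SUBSCRIPTION {quote_ident(subscription)} DISABLE")
    return True
```

A scheduler would call watchdog_step("sub_2", publisher_reachable(host), run_sql) on every tick, with host set to the publisher's address. As noted in the question, errors raised between two ticks would still reach the log, so this only shortens the noisy period.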

Related

Failed when hot standby parameter is changed

I have configured a primary / standby setup in version 10. If I set the hot_standby parameter to off on the standby, I can no longer connect, and this error appears in the log: FATAL: the database system is starting up.
I don't understand this error.
"Hot standby" is the database feature that allows you to connect to a recovering server (like a standby). If you disable that feature, it is no surprise if you can no longer connect to the standby.
The error message is the normal message you get from a recovering server that does not allow connections.
The FATAL message is not as bad as it seems. "Fatal" indicates an error message that terminates a database session. It is the message you get if you try to connect to a database that is not yet ready for connections.

Postgres streaming replication - servers keep shutting down

I am new to PostgreSQL and I am trying to set up a streaming replication from our server to a test DB on my laptop. I have been following this tutorial https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql/ along with the Postgres documentation here https://www.postgresql.org/docs/11/runtime-config-replication.html.
I'm running Windows 10, PostgreSQL 11, PostGIS 2.5 extension.
The server and my local machine both keep shutting down and the logs are filled with postmaster.pid errors such as:
LOG: performing immediate shutdown because data directory lock file is invalid
LOG: received immediate shutdown request
LOG: could not open file "postmaster.pid": No such file or directory
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Could anyone point me towards the issue here? I know my servers aren't configured properly, but I just don't know which settings need to be changed.
Here is an image of my standby server's most recent log:
[image: standby log]
Here is an image of my master server's most recent log:
[image: master log]
You must have messed up in many ways.
You removed or overwrote postmaster.pid on the master server.
That is very dangerous and causes the server to die with the error message you quote.
You didn't create recovery.conf before starting the standby server, or you removed backup_label. From the error messages I'd suspect the second, with ensuing data corruption.

incorrect resource manager data checksum in record at 2/XYZ + terminating walreceiver process due to administrator command

I am running a streaming replication environment with PostgreSQL 9.1 (1 master, 3 slaves). Everything worked fine for approx. 2 months. Yesterday, replication to one of the slaves failed, with the slave's log showing:
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
FATAL: terminating walreceiver process due to administrator command
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
The slave was no longer in sync with the master. For the next two hours the log got a new line like the above every 5 seconds; then I restarted the slave database server:
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
FATAL: terminating connection due to administrator command
FATAL: terminating connection due to administrator command
LOG: shutting down
LOG: database system is shut down
The new log file on the slave contains:
LOG: database system was shut down in recovery at 2016-02-29 05:12:11 CET
LOG: entering standby mode
LOG: redo starts at 61/D92C10C9
LOG: consistent recovery state reached at 61/DA2710A7
LOG: database system is ready to accept read only connections
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: streaming replication successfully connected to primary
Now the slave is in sync with the master, but the checksum entry is still there. I also checked the network logs: the network was available.
My questions are:
Does anyone know why the walreceiver was terminated?
Why didn't PostgreSQL retry the replication?
What can I do to prevent this in the future?
Thank you.
EDIT:
The database servers are running on SLES 11 with ext3. I found an article about low performance of SLES 11 with large RAM but I am not sure if it applies since my machine has only 8 GB RAM (https://www.novell.com/support/kb/doc.php?id=7010287)
Any help would be appreciated.
EDIT (2):
The PostgreSQL version is 9.1.5. It seems that PostgreSQL 9.1.6 provides a fix for a similar issue?
Fix persistence marking of shared buffers during WAL replay (Jeff Davis)
This mistake can result in buffers not being written out during checkpoints, resulting in data corruption if the server later crashes without ever having written those buffers. Corruption can occur on any server following crash recovery, but it is significantly more likely to occur on standby slave servers since those perform much more WAL replay.
Source: http://www.postgresql.org/docs/9.1/static/release-9-1-6.html
Might this be the fix? Should I upgrade to PostgreSQL 9.1.6, and would everything then run smoothly?
In case someone stumbles across this question, I ended up reinstalling the databases from backed-up data and set up replication again. Never really figured out what went wrong.
I'm experiencing the same error, except that in my case the slave never synced in full from the very beginning.
Then the primary server got some kernel errors (heat problem in the server case?). The server had to be switched off because it could not shut down completely. Already while it was shutting down, the slave logged:
LOG: incorrect resource manager data checksum in record at 1/63663CB0
After restarting both the primary server and the slave server, the situation hasn't changed: the same log entry appears every 5 seconds.
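The positions quoted in these messages (for example "redo starts at 61/D92C10C9" and "record at 61/DA2710A7") are WAL locations written as two hexadecimal numbers. A small Python sketch for comparing them is shown here; the function names are hypothetical, and the conversion assumes the modern layout in which the first number is the high 32 bits and the second the low 32 bits of a byte position. For locations that share the same first number, as above, the result is simply the difference of the second numbers; across different first numbers on old releases such as 9.1 this simple formula may not hold.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a WAL location such as '61/DA2710A7' to a single integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_distance(start: str, end: str) -> int:
    """Number of bytes between two WAL locations (end minus start)."""
    return lsn_to_int(end) - lsn_to_int(start)
```

Calling lsn_distance("61/D92C10C9", "61/DA2710A7") gives the amount of WAL the slave replayed between the redo start point and the record it keeps complaining about.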

postgreSQL 8.4. Error connecting to the server: FATAL: the database system is in recovery mode

I really need someone's help with this error in PostgreSQL.
I have postgres installed on Windows Server 2008 R2. I'm trying to connect to it using pgAdmin, some custom C# code and another third-party tool that works with postgres.
Today, I've noticed that I can't connect to postgres. It gives me this error:
I'm no expert at postgres and this is a serious problem that I could not fix in a few hours. I've tried rebooting the server and restarting the postgresql-8.4 service. The result is the same.
Update:
I've connected to the server with the problem via RDP.
The logs right before and after the problem contain the following information:
2014-01-29 18:47:46 MSK STATEMENT: INSERT INTO "TapeSegments" (umid ,clip_index, markin, markout_duration, clip_name, state, clip_filename) VALUES (:umid, :clip_index, :markin, :markout_duration, :clip_name, :state, :clip_filename)
2014-01-29 18:51:51 MSK LOG: server process (PID 7844) was terminated by exception 0xC000012D
2014-01-29 18:51:51 MSK HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
2014-01-29 18:51:51 MSK LOG: terminating any other active server processes
2014-01-29 18:51:51 MSK WARNING: terminating connection because of crash of another server process
2014-01-29 18:51:51 MSK DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2014-01-29 18:51:51 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command.
2014-01-29 18:51:51 MSK WARNING: terminating connection because of crash of another server process
2014-01-29 18:51:51 MSK DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2014-01-29 18:51:51 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command.
2014-01-29 18:51:51 MSK WARNING: terminating connection because of crash of another server process
2014-01-29 18:51:51 MSK DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2014-01-29 18:51:51 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command.
2014-01-29 18:51:51 MSK WARNING: terminating connection because of crash of another server process
2014-01-29 18:51:51 MSK DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2014-01-29 18:51:51 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command.
2014-01-29 18:51:52 MSK WARNING: terminating connection because of crash of another server process
2014-01-29 18:51:52 MSK DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2014-01-29 18:51:52 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command.
2014-01-29 18:51:52 MSK WARNING: terminating connection because of crash of another server process
2014-01-29 18:51:52 MSK DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2014-01-29 18:51:52 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command.
2014-01-29 18:51:53 MSK FATAL: the database system is in recovery mode
2014-01-29 18:51:54 MSK FATAL: the database system is in recovery mode
So, basically, postgres is stuck on "the database system is shutting down". Is there anything I can do to "kick it", so to speak?
This is a development server with no clusters, nothing fancy.
Update 2:
I've tried to connect to the server with the following command:
"C:\Program Files (x86)\PostgreSQL\8.4\bin\psql.exe" -U postgres -l -h ntv.ncdev.ru -p 5433
It gives me the same error:
psql: FATAL: the database system is shutting down
The exception 0xC000012D is "Out of Virtual Memory".
Increase the size of your virtual memory paging file.
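The lookup that the log HINT suggests (checking the hexadecimal value against ntstatus.h) can be sketched in Python. This is only an illustration: the function names are hypothetical and the table holds just a few well-known NTSTATUS values, with 0xC000012D being STATUS_COMMITMENT_LIMIT, the code behind the "Out of Virtual Memory" message.

```python
import re
from typing import Optional

# Small excerpt of NTSTATUS values; extend it from ntstatus.h as needed.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000017: "STATUS_NO_MEMORY",
    0xC000012D: "STATUS_COMMITMENT_LIMIT",
}

# Matches the wording PostgreSQL uses on Windows when a backend dies.
_EXCEPTION_RE = re.compile(r"was terminated by exception (0x[0-9A-Fa-f]+)")


def exception_code(log_line: str) -> Optional[int]:
    """Extract the exception code from a log line, or None if absent."""
    match = _EXCEPTION_RE.search(log_line)
    return int(match.group(1), 16) if match else None


def describe(code: int) -> str:
    """Return the symbolic name for a code, or a placeholder if unknown."""
    return NTSTATUS_NAMES.get(code, f"unknown NTSTATUS 0x{code:08X}")
```

Feeding the "server process (PID 7844) was terminated by exception 0xC000012D" line from the log above through exception_code and then describe yields the symbolic name to search for.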

LOG: server process (PID 11748) was terminated by signal 11: Segmentation fault

I am using Postgres 8.3.7 on a Fedora Core 2 Linux box, and the Postgres service is crashing.
When I restart the system, it works fine for some time, then crashes again at some random time.
What could be the possible reasons for these random segfaults?
FATAL: the database system is in recovery mode
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
LOG: server process (PID 11748) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2010-05-24 13:28:06 PDT
LOG: database system was not properly shut down; automatic recovery in progress
A little too specific, with few details - and perhaps more appropriate for serverfault.com or the PostgreSQL mailing lists.
Some random suggestions:
VACUUM VERBOSE ANALYZE?
Can't you upgrade to the latest version?
Are there special circumstances when this happens? Disk nearly full? High load? Anything suspicious in the OS logs (/var/log/messages)?
Can't you raise PostgreSQL's log level to log the queries and see whether this is related to some particular query (e.g. a function)?
PostgreSQL has a very responsive developer community.