LOG: server process (PID 11748) was terminated by signal 11: Segmentation fault - postgresql

I am using Postgres-8.3.7 on fedora core 2 linux box. And Postgres service is crashing.
When I restart the system, it is working fine for some time. At some random time it is crashing again.
What could be the possible reasons for this segfaults which are random?
FATAL: the database system is in recovery mode
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
LOG: server process (PID 11748) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2010-05-24 13:28:06 PDT
LOG: database system was not properly shut down; automatic recovery in progress

A little too specific, few details - and perhaps more appropiate to serverfault.com , or the postgresql mailing lists.
Some random suggestions:
VACUUM ANALYZE VERBOSE ?
Can't you upgrade to the last version ?
Some special circumnstances when this happens ? Disk nearly full ? High load ? Nothing suspicious in the OS logs ( /var/log/message ) ?
Can't you raise the log level of postgresql to log the queries and see if this is related to some particular query (e.g. function)?
Postgresql has a very responsive developers community.

Related

Postgres streaming replication - servers keep shutting down

I am new to PostgreSQL and I am trying to set up a streaming replication from our server to a test DB on my laptop. I have been following this tutorial https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql/ along with the Postgres documentation here https://www.postgresql.org/docs/11/runtime-config-replication.html.
I'm running Windows 10, PostgreSQL 11, PostGIS 2.5 extension.
The server and my local machine both keep shutting down and the logs are filled with postmaster.pid errors such as:
LOG: performing immediate shutdown because data directory lock file is invalid
LOG: received immediate shutdown request
LOG: could not open file "postmaster.pid": No such file or directory
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Could anyone point me towards the issue here? I know my server's aren't configured properly but I just don't know what configurations need to be changed.
Here is an image of my standby server's most recent log.
standby log
Here is an image of my master server's most recent log.
master log
You must have messed up in many ways.
You removed or overwrote postmaster.pid on the master server.
That is very dangerous and causes the server to die with the error message you quote.
You didn't create recovery.conf before starting the standby server, or you removed backup_label. From the error messages I'd suspect the second, with ensuing data corruption.

Postgresql fatal the database system is starting up - windows 10

I have installed postgresql on windows 10 on usb disk.
Every day when i start my pc in work from sleep and plug in the disk again then trying to start postgresql i get this error:
FATAL: the database system is starting up
The service starts with following command:
E:\PostgresSql\pg96\pgservice.exe "//RS//PostgreSQL 9.6 Server"
It is the default one.
logs from E:\PostgresSql\data\logs\pg96
2019-02-28 10:30:36 CET [21788]: [1-1] user=postgres,db=postgres,app=[unknown],client=::1 FATAL: the database system is starting up
2019-02-28 10:31:08 CET [9796]: [1-1] user=postgres,db=postgres,app=[unknown],client=::1 FATAL: the database system is starting up
I want this start up to happen faster.
When you commit data to a Postgres database, the only thing which is immediately saved to disk is the write-ahead log. The actual table changes are only applied to the in-memory buffers, and won't be permanently saved to disk until the next checkpoint.
If the server is stopped abruptly, or if it suddenly loses access to the file system, then everything in memory is lost, and the next time you start it up, it needs to resort to replaying the log in order to get the tables back to the correct state (which can take quite a while, depending on how much has happened since the last checkpoint). And until it's finished, any attempt to use the server will result in FATAL: the database system is starting up.
If you make sure you shut the server down cleanly before unplugging the disk - giving it a chance to set a checkpoint and flush all of its buffers - then it should be able to start up again more or less immediately.

incorrect resource manager data checksum in record at 2/XYZ + terminating walreceiver process due to administrator command

I am running a streaming replication environment with PostgreSQL 9.1 (1 master, 3 slaves). Everything worked fine for aprox. 2 months. Yesterday, the replication to one of the slaves failed with the log on the slave having:
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
FATAL: terminating walreceiver process due to administrator command
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
The slave was no longer in sync with the master. Two hours later, in which the log gets a new line like above every 5 seconds, I restarted the slave database server:
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
FATAL: terminating connection due to administrator command
FATAL: terminating connection due to administrator command
LOG: shutting down
LOG: database system is shut down
The new log file on the slave contains:
LOG: database system was shut down in recovery at 2016-02-29 05:12:11 CET
LOG: entering standby mode
LOG: redo starts at 61/D92C10C9
LOG: consistent recovery state reached at 61/DA2710A7
LOG: database system is ready to accept read only connections
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: streaming replication successfully connected to primary
Now the slave is in sync with the master but the checksum entry is still there. One more thing I checked were the network logs -> the network was available.
My questions are:
Does anyone know why the walreceiver was terminated?
Why didn't PostgreSQL retry the replication?
What can I do to prevent this in the future?
Thank you.
EDIT:
The database servers are running on SLES 11 with ext3. I found an article about low performance of SLES 11 with large RAM but I am not sure if it applies since my machine has only 8 GB RAM (https://www.novell.com/support/kb/doc.php?id=7010287)
Any help would be appreciated.
EDIT (2):
PostgreSQL version is 9.1.5. Seem that PostgreSQL version 9.1.6 provides a fix for similar issue?
Fix persistence marking of shared buffers during WAL replay (Jeff Davis)
This mistake can result in buffers not being written out during checkpoints, resulting in data corruption if the server later crashes without ever having written those buffers. Corruption can occur on any server following crash recovery, but it is significantly more likely to occur on standby slave servers since those perform much more WAL replay.
Source: http://www.postgresql.org/docs/9.1/static/release-9-1-6.html
Might this be the fix? Should I upgrade to PostgreSQL 9.1.6 and everything would run smooth?
In case someone stumbles across this question, I ended up reinstalling the databases from backed-up data and set up replication again. Never really figured out what went wrong.
Never really figured out what went wrong.
I'm experiencing the same error - just that it never syncs in full from the very beginning.
Then, the primary server had got some kernel errors (heat problem in server case?). The server was required to be switched off because of incomplete shutdown. Already while shutting down, the slave showed up with
LOG: incorrect resource manager data checksum in record at 1/63663CB0
After restart of the primary server and restart of the slave server, the situation doesn't change: same log entries every 5 seconds.

Can't start postgresql replication

We have postgresql replication on different server. So today I was doing some optimization on replication cluster postgresql.conf
After doing replication, I restarted postgresql with this command:
pg_ctlcluster 9.2 main2 restart
But instead of restarting, it gave this error:
The PostgreSQL server failed to start. Please check the log output.
And checking the log, I see this:
2015-06-16 12:18:16 EEST [10655]: [2-1] LOG: received smart shutdown request
2015-06-16 12:18:16 EEST [10661]: [2-1] FATAL: terminating walreceiver process due to administrator command
2015-06-16 12:18:16 EEST [10658]: [1-1] LOG: shutting down
2015-06-16 12:18:16 EEST [10658]: [2-1] LOG: database system is shut down
Checking log now it shows the last restart log, but log does not update, when I try to start server. It says check the log, but there is no new information, even if I try to start server again.
P.S. Do I need to do anything on master?
Update
Changing postgresql.conf settings back, started replication. But from error it is hard to tell what was wrong.
here are settings I changed (after they changing, they were the same as on master. When I commented it, only then I could start replication):
shared_buffers = 1536MB
effective_cache_size = 3072MB
checkpoint_segments = 15
checkpoint_completion_target = 0.9
autovacuum = on
track_counts = on
work_mem = 25MB
So as I said, after commenting these, I could start it. But don't get it why it won't let start with these settings.
If I were you, and if upgrade is an option, the first thing I would do is to upgrade to PostgreSQL 9.4 (or newer). There's a good reason for do this when it comes to replication - a new feature called "replication slots" (see the announcement).
In short: replication slots are more robust and easier to implement than WAL archiving (you obviously use according to your logs).
In this post you'll find a comprehensive guide on implementing the feature.

Issue with PostgreSQL installation

I have started learning RoR, now i am trying to install PostgreSQL for performing backend operation with the database.
I have two machines where i installed Postgresql, it runs well in one machine. In another machine i uninstalled the PostgreSQL and re installed it. After re installing i could not able to start the PostgreSQL service.
Below is the error log that i see in PostgreSQL log file
2011-12-02 04:40:53 PST LOG: incomplete startup packet
2011-12-02 04:40:53 PST LOG: database system was not properly shut down; automatic recovery in progress
2011-12-02 04:40:54 PST LOG: consistent recovery state reached at 0/15F44E0
2011-12-02 04:40:54 PST LOG: record with zero length at 0/15F44E0
2011-12-02 04:40:54 PST LOG: redo is not required
2011-12-02 04:40:54 PST LOG: database system is ready to accept connections
2011-12-02 04:40:54 PST LOG: autovacuum launcher started
2011-12-02 04:40:54 PST LOG: could not receive data from client: An operation was attempted on something that is not a socket.
I have been trying several ways to solve this, but no use on all the steps. Could any one please give me a hand on this?
I am installing PostgreSQL Version 9.0.5 in Windows 7 machine (64 bit)
Am I correct in saying that you once had a working installation of Postgres on the same machine? But you uninstalled it and are now trying to install it on the same machine again?
I was in a similar situation once but it had to do with the service account the installer creates. Did you create a new service account for postgres, or are you trying to re-use the old one? If so, are you sure you are using the right password for the account?
Another hunch I got; the port on which the new Postgres server is listening, is it really free? It appears to me that it cannot open the listening port specified in postgresql.conf.
Or, yet another hunch; if I read the log correct I get the feeling that you are re-using the old data directory. Is this true? And if so, what was the version of the previous Postgres installation?
Sorry I don't have answer, but I do have questions that might lead you to a solution.