priyank#infahmcpu0010:/etc/pgpool2$ pgpool -n
2022-06-20 18:48:40.282: main pid 33987: LOG: reading status file: 1 th backend is set to down status
2022-06-20 18:48:40.282: main pid 33987: LOG: health_check_stats_shared_memory_size: requested size: 12288
2022-06-20 18:48:40.282: main pid 33987: LOG: memory cache initialized
2022-06-20 18:48:40.282: main pid 33987: DETAIL: memcache blocks :64
2022-06-20 18:48:40.282: main pid 33987: LOG: allocating (136981824) bytes of shared memory segment
2022-06-20 18:48:40.282: main pid 33987: LOG: allocating shared memory segment of size: 136981824
2022-06-20 18:48:40.340: main pid 33987: LOG: health_check_stats_shared_memory_size: requested size: 12288
2022-06-20 18:48:40.340: main pid 33987: LOG: health_check_stats_shared_memory_size: requested size: 12288
2022-06-20 18:48:40.340: main pid 33987: LOG: memory cache initialized
2022-06-20 18:48:40.340: main pid 33987: DETAIL: memcache blocks :64
2022-06-20 18:48:40.341: main pid 33987: LOG: pool_discard_oid_maps: discarded memqcache oid maps
2022-06-20 18:48:40.344: main pid 33987: LOG: Setting up socket for 0.0.0.0:9999
2022-06-20 18:48:40.344: main pid 33987: FATAL: failed to create INET domain socket
2022-06-20 18:48:40.344: main pid 33987: DETAIL: bind on socket failed with error "Address already in use"
2022-06-20 18:48:40.347: main pid 33987: LOG: shutting down
All the configuration is done, but whenever I start pgpool2 I get an error. As you can see in the log:
2022-06-20 18:48:40.344: main pid 33987: DETAIL: bind on socket failed with error "Address already in use"
You need to stop the already running instance using the command:
pgpool stop
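The "Address already in use" error means something is still bound to port 9999, usually a previous pgpool instance. A quick way to confirm and stop it (the systemctl unit name assumes the Debian/Ubuntu pgpool2 package, which matches the /etc/pgpool2 path in the prompt above):
sudo ss -tlnp | grep ':9999'    # show which process is bound to the pgpool listen port
pgpool -m fast stop             # stop a pgpool instance that was started by hand
sudo systemctl stop pgpool2     # or stop the packaged service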
In the Datastore logs, I encountered the following error. I'm not sure what has gone wrong.
[7804] LOG: starting PostgreSQL 13.1, compiled by Visual C++ build 1914, 64-bit
2021-08-23 22:56:15.980 CEST [7804] LOG: listening on IPv4 address "127.0.0.1", port 9003
2021-08-23 22:56:15.983 CEST [7804] LOG: listening on IPv4 address "10.91.198.36", port 9003
2021-08-23 22:56:16.041 CEST [8812] LOG: database system was shut down at 2021-08-23 22:54:51 CEST
2021-08-23 22:56:16.044 CEST [8812] LOG: invalid primary checkpoint record
2021-08-23 22:56:16.045 CEST [8812] PANIC: could not locate a valid checkpoint record
2021-08-23 22:56:16.076 CEST [7804] LOG: startup process (PID 8812) was terminated by exception 0xC0000409
2021-08-23 22:56:16.076 CEST [7804] HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
2021-08-23 22:56:16.078 CEST [7804] LOG: aborting startup due to startup process failure
2021-08-23 22:56:16.094 CEST [7804] LOG: database system is shut down
Somebody deleted crucial WAL files (to free space?), and now your cluster is corrupted.
Restore from backup. If you have no backup, running pg_resetwal is an option, since it seems there was a clean shutdown.
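If it comes to pg_resetwal, the rough sequence is the following (a sketch only: the data directory path is just an example, the server must not be running, and the commands must be run as the OS account that owns the cluster):
pg_resetwal -D /path/to/data    # discard the broken WAL and write a fresh control file
pg_ctl start -D /path/to/data   # then try starting the server again
Because pg_resetwal throws away WAL, recent transactions can be lost or left inconsistent, so dump and reload the cluster afterwards if at all possible.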
I have a potentially corrupted Postgres database running in a Docker container. This container mounts a data volume at /var/lib/postgresql/data.
I exec'd into the Docker container and ran the command gosu postgres pg_resetxlog /var/lib/postgresql/data. The error I received is the following:
pg_resetxlog: lock file "postmaster.pid" exists
Is a server running? If not, delete the lock file and try again.
I tried 3 things:
Matching the PID listed in postmaster.pid to my Postgres process and manually killing Postgres with kill <PID>. This did nothing to shut down Postgres.
Deleting the postmaster.pid file under /var/lib/postgresql/data. This forced the container to restart, but the same issue persisted.
Running docker restart <POSTGRES> to restart Postgres.
None of the above helped. What I'm trying to do is essentially give this container a way to recover without completely destroying it and forcing it to start anew. I'm using the postgres:9.5 Docker image.
Any ideas?
EDIT: added the container logs
Sep 11 18:23:34 VM postgres-container[1045]: LOG: incomplete startup packet
Sep 11 18:24:36 VM postgres-container[1045]: LOG: incomplete startup packet
Sep 11 18:24:46 VM postgres-container[1045]: LOG: received smart shutdown request
Sep 11 18:24:46 VM postgres-container[1045]: LOG: autovacuum launcher shutting down
Sep 11 18:24:58 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:25:01 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:25:39 VM postgres-container[1045]: LOG: incomplete startup packet
Sep 11 18:25:39 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:25:59 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:26:02 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:26:40 VM postgres-container[1045]: LOG: incomplete startup packet
Sep 11 18:26:40 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:27:00 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:27:03 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:27:41 VM postgres-container[1045]: LOG: incomplete startup packet
Sep 11 18:27:41 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:28:01 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:28:04 VM postgres-container[1045]: FATAL: the database system is shutting down
Sep 11 18:28:41 VM postgres-container[1045]: LOG: could not open file "postmaster.pid": No such file or directory
Sep 11 18:28:41 VM postgres-container[1045]: LOG: performing immediate shutdown because data directory lock file is invalid
Sep 11 18:28:41 VM postgres-container[1045]: LOG: received immediate shutdown request
Sep 11 18:28:41 VM postgres-container[1045]: WARNING: terminating connection because of crash of another server process
Sep 11 18:28:41 VM postgres-container[1045]: DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Sep 11 18:28:41 VM postgres-container[1045]: HINT: In a moment you should be able to reconnect to the database and repeat your command.
Sep 11 18:28:41 VM postgres-container[1045]: LOG: incomplete startup packet
Sep 11 18:28:43 VM postgres-container[1045]: LOG: database system was interrupted; last known up at 2019-09-12 01:24:31 UTC
Sep 11 18:28:56 VM postgres-container[1045]: LOG: database system was not properly shut down; automatic recovery in progress
Sep 11 18:28:56 VM postgres-container[1045]: LOG: redo starts at 0/1C384970
Sep 11 18:28:56 VM postgres-container[1045]: LOG: invalid record length at 0/1C44DAC8
Sep 11 18:28:56 VM postgres-container[1045]: LOG: redo done at 0/1C44DAA0
Sep 11 18:28:56 VM postgres-container[1045]: LOG: last completed transaction was at log time 2019-09-12 01:24:42.2848+00
Sep 11 18:28:57 VM postgres-container[1045]: LOG: MultiXact member wraparound protections are now enabled
Sep 11 18:28:57 VM postgres-container[1045]: LOG: database system is ready to accept connections
Sep 11 18:28:57 VM postgres-container[1045]: LOG: autovacuum launcher started
Sep 11 18:29:42 VM postgres-container[1045]: LOG: incomplete startup packet
Sep 11 18:30:45 VM postgres-container[1045]: LOG: incomplete startup packet
Sep 11 18:31:47 VM postgres-container[1045]: LOG: incomplete startup packet
Good news, thestateofmay!!! The server is up and running :)
You can connect to the container using
docker exec -ti postgres /bin/bash
You should be the root user; switch to postgres:
su postgres
then start
psql
and check whether the data is present in the server.
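Put together, a quick sanity check might look like this (the container name postgres comes from the command above; the database name is a placeholder):
docker exec -ti postgres /bin/bash
su postgres
psql -c '\l'             # list the databases
psql -d mydb -c '\dt'    # list the tables in one of them
If the databases and tables are all there, the automatic crash recovery shown in the log above did its job.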
I'm using PostgreSQL v11.0. I am executing a simple SQL query:
delete from base.sys_attribute where id=20;
It fails and returns the error:
server process (PID 29) was terminated by signal 11 (see log below)
Any idea how I can troubleshoot this issue? I'm completely stuck...
listof-db | 2019-05-10 08:54:46.425 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
listof-db | 2019-05-10 08:54:46.425 UTC [1] LOG: listening on IPv6 address "::", port 5432
listof-db | 2019-05-10 08:54:46.425 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
listof-db | 2019-05-10 08:54:46.435 UTC [20] LOG: database system was shut down at 2019-05-10 08:54:26 UTC
listof-db | 2019-05-10 08:54:46.438 UTC [1] LOG: database system is ready to accept connections
listof-db | 2019-05-10 08:56:52.295 UTC [1] LOG: server process (PID 29) was terminated by signal 11
listof-db | 2019-05-10 08:56:52.295 UTC [1] DETAIL: Failed process was running: delete from base.sys_attribute where id=20
listof-db | 2019-05-10 08:56:52.295 UTC [1] LOG: terminating any other active server processes
listof-db | 2019-05-10 08:56:52.295 UTC [24] WARNING: terminating connection because of crash of another server process
listof-db | 2019-05-10 08:56:52.295 UTC [24] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
listof-db | 2019-05-10 08:56:52.295 UTC [24] HINT: In a moment you should be able to reconnect to the database and repeat your command.
listof-db | 2019-05-10 08:56:52.486 UTC [1] LOG: all server processes terminated; reinitializing
listof-db | 2019-05-10 08:56:53.093 UTC [30] LOG: database system was interrupted; last known up at 2019-05-10 08:54:46 UTC
listof-db | 2019-05-10 08:56:53.187 UTC [30] LOG: database system was not properly shut down; automatic recovery in progress
listof-db | 2019-05-10 08:56:53.189 UTC [30] LOG: redo starts at 0/75955A8
listof-db | 2019-05-10 08:56:53.189 UTC [30] LOG: invalid record length at 0/75971B8: wanted 24, got 0
listof-db | 2019-05-10 08:56:53.189 UTC [30] LOG: redo done at 0/7597180
listof-db | 2019-05-10 08:56:53.194 UTC [1] LOG: database system is ready to accept connections
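A crash on a plain DELETE like this often points to a corrupted index or data file rather than to the query itself. One relatively low-risk check (a sketch: the database name is a placeholder, and it only covers B-tree indexes on that table) is the amcheck extension shipped with PostgreSQL 11:
psql -d mydb -c "CREATE EXTENSION IF NOT EXISTS amcheck;"
psql -d mydb -c "
  SELECT c.relname, bt_index_check(c.oid)
  FROM pg_index i
  JOIN pg_class c  ON c.oid  = i.indexrelid
  JOIN pg_class t  ON t.oid  = i.indrelid
  JOIN pg_am    am ON am.oid = c.relam
  WHERE am.amname = 'btree' AND t.relname = 'sys_attribute';"
If bt_index_check reports corruption, a REINDEX of the affected index (or a restore from backup) is the usual next step.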
I've set up streaming replication with Postgres 9.3.
My problem is that on the slave server the pg_xlog folder just keeps getting fuller and WAL files are not being recycled.
The slave server has the following (relevant) values in postgresql.conf:
wal_keep_segments = 150
hot_standby = on
checkpoint_segments = 32
checkpoint_completion_target = 0.9
archive_mode = off
#archive_command = ''
My initial replication command was:
pg_basebackup --xlog-method=stream -h <master-ip> -D . --username=replication --password
So I guess my WAL files are OK.
Here is my slave server startup log:
2017-05-08 09:55:31 IDT LOG: database system was shut down in recovery at 2017-05-08 09:55:19 IDT
2017-05-08 09:55:31 IDT LOG: entering standby mode
2017-05-08 09:55:31 IDT LOG: redo starts at 361/C76DD3E8
2017-05-08 09:55:31 IDT LOG: consistent recovery state reached at 361/C89A8278
2017-05-08 09:55:31 IDT LOG: database system is ready to accept read only connections
2017-05-08 09:55:31 IDT LOG: record with zero length at 361/C89A8278
2017-05-08 09:55:31 IDT LOG: started streaming WAL from primary at 361/C8000000 on timeline 1
2017-05-08 09:55:32 IDT LOG: incomplete startup packet
2017-05-08 09:58:34 IDT LOG: received SIGHUP, reloading configuration files
2017-05-08 09:58:34 IDT LOG: parameter "checkpoint_completion_target" changed to "0.9"
I even tried to copy older WAL files from the master server to the slave manually, but that also didn't help.
What am I doing wrong? How can I stop the pg_xlog folder from growing indefinitely?
Is it related to the "incomplete startup packet" log message?
One last thing: under the pg_xlog/archive_status folder, all of the WAL files have the .done suffix.
I'd appreciate any help I can get on this.
Edit:
I enabled log_checkpoints in postgresql.conf.
Here are the relevant log entries since I enabled it:
2017-05-12 08:43:11 IDT LOG: parameter "log_checkpoints" changed to "on"
2017-05-12 08:43:24 IDT LOG: checkpoint complete: wrote 2128 buffers (0.9%); 0 transaction log file(s) added, 0 removed, 9 recycled; write=189.240 s, sync=0.167 s, total=189.549 s; sync files=745, longest=0.010 s, average=0.000 s
2017-05-12 08:45:15 IDT LOG: checkpoint starting: time
2017-05-12 08:48:46 IDT LOG: checkpoint complete: wrote 15175 buffers (6.6%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=209.078 s, sync=1.454 s, total=210.617 s; sync files=769, longest=0.032 s, average=0.001 s
2017-05-12 08:50:15 IDT LOG: checkpoint starting: time
2017-05-12 08:53:45 IDT LOG: checkpoint complete: wrote 2480 buffers (1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=209.162 s, sync=0.991 s, total=210.253 s; sync files=663, longest=0.076 s, average=0.001 s
Edit2:
Given that my slave server shows no restartpoints in the log, here is the relevant log from starting the slave and recovering WAL files before it reaches a consistent recovery state:
2017-05-12 09:35:42 IDT LOG: database system was shut down in recovery at 2017-05-12 09:35:41 IDT
2017-05-12 09:35:42 IDT LOG: entering standby mode
2017-05-12 09:35:42 IDT LOG: incomplete startup packet
2017-05-12 09:35:43 IDT FATAL: the database system is starting up
2017-05-12 09:35:43 IDT LOG: restored log file "0000000100000369000000B1" from archive
2017-05-12 09:35:43 IDT FATAL: the database system is starting up
2017-05-12 09:35:44 IDT FATAL: the database system is starting up
2017-05-12 09:35:44 IDT LOG: restored log file "0000000100000369000000AF" from archive
2017-05-12 09:35:44 IDT LOG: redo starts at 369/AFD28900
2017-05-12 09:35:44 IDT FATAL: the database system is starting up
2017-05-12 09:35:45 IDT FATAL: the database system is starting up
2017-05-12 09:35:45 IDT FATAL: the database system is starting up
2017-05-12 09:35:46 IDT LOG: restored log file "0000000100000369000000B0" from archive
2017-05-12 09:35:46 IDT FATAL: the database system is starting up
2017-05-12 09:35:46 IDT FATAL: the database system is starting up
2017-05-12 09:35:47 IDT FATAL: the database system is starting up
2017-05-12 09:35:47 IDT LOG: restored log file "0000000100000369000000B1" from archive
2017-05-12 09:35:47 IDT FATAL: the database system is starting up
2017-05-12 09:35:48 IDT FATAL: the database system is starting up
2017-05-12 09:35:48 IDT LOG: incomplete startup packet
2017-05-12 09:35:49 IDT LOG: restored log file "0000000100000369000000B2" from archive
2017-05-12 09:35:50 IDT LOG: restored log file "0000000100000369000000B3" from archive
2017-05-12 09:35:52 IDT LOG: restored log file "0000000100000369000000B4" from archive
.
.
.
2017-05-12 09:42:33 IDT LOG: restored log file "000000010000036A000000C0" from archive
2017-05-12 09:42:35 IDT LOG: restored log file "000000010000036A000000C1" from archive
2017-05-12 09:42:36 IDT LOG: restored log file "000000010000036A000000C2" from archive
2017-05-12 09:42:37 IDT LOG: restored log file "000000010000036A000000C3" from archive
2017-05-12 09:42:37 IDT LOG: consistent recovery state reached at 36A/C3ACEB28
2017-05-12 09:42:37 IDT LOG: database system is ready to accept read only connections
2017-05-12 09:42:39 IDT LOG: restored log file "000000010000036A000000C4" from archive
2017-05-12 09:42:40 IDT LOG: restored log file "000000010000036A000000C5" from archive
2017-05-12 09:42:42 IDT LOG: restored log file "000000010000036A000000C6" from archive
ERROR: WAL file '000000010000036A000000C7' not found in server 'main-db-server'
2017-05-12 09:42:42 IDT LOG: started streaming WAL from primary at 36A/C6000000 on timeline 1
Thanks!
The problem seems to have been resolved.
Apparently I had hardware issues on the master server.
I was able to perform a full pg_dump and reindex my DB, so I was fairly sure I did not have any data integrity issues.
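For reference, that kind of check boils down to something like this (the database name is a placeholder):
pg_dump mydb > /dev/null                     # a full dump reads every row, so heap corruption tends to surface as errors
psql -d mydb -c 'REINDEX DATABASE mydb;'     # rebuilding the indexes exercises (and replaces) the index files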
But when looking at the master server logs after enabling log_checkpoints in the config, a few minutes before the slave server stopped performing checkpoints I saw the following message:
IDT ERROR: failed to re-find parent key in index "<table_name>_id_udx" for split pages 17/18
After seeing that, I decided to switch hosting providers and moved my DB to a new server.
Since then (almost a week now), everything has been running smoothly; replication and checkpoints are working as expected.
I really hope this helps other people: when something like this happens, be aware that it may be caused by data integrity or hardware issues.
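Two quick checks that can help separate this kind of hardware/corruption problem from a plain configuration issue are whether the standby is actually streaming and when it last completed a restartpoint (the data directory path is only an example for a 9.3 Debian-style install):
psql -x -c 'SELECT application_name, state, sent_location, replay_location FROM pg_stat_replication;'    # run on the master
pg_controldata /var/lib/postgresql/9.3/main | grep -i checkpoint    # run on the standby; restartpoints update these fields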
I'm running a Postgres server locally on my computer, and it seems that even simple queries like the ones below are giving me an "EOF detected" error.
For instance, these statements:
ALTER TABLE maintab ADD COLUMN testing numeric;
UPDATE maintab SET testing = numeric1 * numeric2;
And similar activities throw an EOF error. I'm also running PostGIS with QGIS, and my spatial queries, no matter how simple, throw this error.
I've looked around on forums and in the documentation, but nothing seems to help solve this problem. Is there anything I can do to stop this?
EDIT
I ran a check on my error logs after doing some Googling and found these entries; I'm not sure what to make of them:
2015-09-04 11:18:31 EDT [1138-4] LOG: terminating any other active server processes
2015-09-04 11:18:31 EDT [1208-3] WARNING: terminating connection because of crash of another server process
2015-09-04 11:18:31 EDT [1208-4] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2015-09-04 11:18:31 EDT [1208-5] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-09-04 11:18:31 EDT [1138-5] LOG: all server processes terminated; reinitializing
2015-09-04 11:18:31 EDT [3861-1] LOG: database system was interrupted; last known up at 2015-09-04 15:08:49 EDT
2015-09-04 11:18:32 EDT [3861-2] LOG: database system was not properly shut down; automatic recovery in progress
2015-09-04 11:18:32 EDT [3861-3] LOG: record with zero length at 1D/123A250
2015-09-04 11:18:32 EDT [3861-4] LOG: redo is not required
2015-09-04 11:18:32 EDT [3861-5] LOG: MultiXact member wraparound protections are now enabled
2015-09-04 11:18:32 EDT [1138-6] LOG: database system is ready to accept connections
2015-09-04 11:18:32 EDT [3865-1] LOG: autovacuum launcher started
2015-09-04 16:07:22 EDT [1122-1] LOG: database system was interrupted; last known up at 2015-09-04 16:06:25 EDT
2015-09-04 16:07:22 EDT [1179-1] [unknown]#[unknown] LOG: incomplete startup packet
2015-09-04 16:07:23 EDT [1122-2] LOG: database system was not properly shut down; automatic recovery in progress
2015-09-04 16:07:23 EDT [1122-3] LOG: record with zero length at 1D/123A320
2015-09-04 16:07:23 EDT [1122-4] LOG: redo is not required
2015-09-04 16:07:23 EDT [1122-5] LOG: MultiXact member wraparound protections are now enabled
2015-09-04 16:07:23 EDT [1114-1] LOG: database system is ready to accept connections
2015-09-04 16:07:23 EDT [1183-1] LOG: autovacuum launcher started
2015-09-04 12:15:05 EDT [1183-2] LOG: stats collector's time 2015-09-04 16:07:23.363257-04 is later than backend local time 2015-09-04 12:15:05.07308-04
2015-09-04 12:17:34 EDT [1114-2] LOG: server process (PID 3824) was terminated by signal 11: Segmentation fault
2015-09-04 12:17:34 EDT [1114-4] LOG: terminating any other active server processes
2015-09-04 12:17:34 EDT [1183-3] WARNING: terminating connection because of crash of another server process
2015-09-04 12:17:34 EDT [1183-4] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2015-09-04 12:17:34 EDT [1183-5] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-09-04 12:17:34 EDT [1114-5] LOG: all server processes terminated; reinitializing
2015-09-04 12:17:34 EDT [3828-1] LOG: database system was interrupted; last known up at 2015-09-04 16:07:23 EDT
2015-09-04 12:17:35 EDT [3828-2] LOG: database system was not properly shut down; automatic recovery in progress
2015-09-04 12:17:35 EDT [3828-3] LOG: redo starts at 1D/123A388
2015-09-04 12:17:35 EDT [3828-4] LOG: unexpected pageaddr 1C/F9258000 in log segment 000000010000001D00000001, offset 2457600
2015-09-04 12:17:35 EDT [3828-5] LOG: redo done at 1D/1255C18
2015-09-04 12:17:36 EDT [3828-6] LOG: MultiXact member wraparound protections are now enabled
2015-09-04 12:17:36 EDT [3833-1] LOG: autovacuum launcher started
2015-09-04 12:17:36 EDT [1114-6] LOG: database system is ready to accept connections