I want to move my PostgreSQL databases to an external hard drive (2 TB HDD, USB 3.0). I copied the whole directory:
/var/lib/postgresql/9.4/main/
to the external drive, preserving permissions, with this command (run as the user postgres):
$ rsync -aHAX /var/lib/postgresql/9.4/main/* new_dir_path
The first run of this command was interrupted, but on the second attempt I copied everything (essentially one 800 GB database). In the file
/etc/postgresql/9.4/main/postgresql.conf
I changed the line
data_directory = '/var/lib/postgresql/9.4/main'
to point to the new location. I restarted the PostgreSQL service, and when I run psql as the user postgres, I get:
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I didn't change any other settings. There is no pidfile 'postmaster.pid' in the new location (or in the old one). When I run the command
$ /usr/lib/postgresql/9.4/bin/postgres --single -D /etc/postgresql/9.4/main -P -d 1
I get
2017-03-16 20:47:39 CET [2314-1] DEBUG: mmap with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2017-03-16 20:47:39 CET [2314-2] NOTICE: database system was shut down at 2017-03-16 20:01:23 CET
2017-03-16 20:47:39 CET [2314-3] DEBUG: checkpoint record is at 647/4041B3A0
2017-03-16 20:47:39 CET [2314-4] DEBUG: redo record is at 647/4041B3A0; shutdown TRUE
2017-03-16 20:47:39 CET [2314-5] DEBUG: next transaction ID: 1/414989450; next OID: 112553
2017-03-16 20:47:39 CET [2314-6] DEBUG: next MultiXactId: 485048384; next MultiXactOffset: 1214064579
2017-03-16 20:47:39 CET [2314-7] DEBUG: oldest unfrozen transaction ID: 259446705, in database 12141
2017-03-16 20:47:39 CET [2314-8] DEBUG: oldest MultiXactId: 476142442, in database 12141
2017-03-16 20:47:39 CET [2314-9] DEBUG: transaction ID wrap limit is 2406930352, limited by database with OID 12141
2017-03-16 20:47:39 CET [2314-10] DEBUG: MultiXactId wrap limit is 2623626089, limited by database with OID 12141
2017-03-16 20:47:39 CET [2314-11] DEBUG: starting up replication slots
2017-03-16 20:47:39 CET [2314-12] DEBUG: oldest MultiXactId member is at offset 1191132700
2017-03-16 20:47:39 CET [2314-13] DEBUG: MultiXact member stop limit is now 1191060352 based on MultiXact 476142442
PostgreSQL stand-alone backend 9.4.9
backend>
but I don't know how to interpret this output. When I revert the changes in the postgresql.conf file, everything works fine. Interestingly, a few months ago I moved the database the same way, but to a local directory, and it worked.
I am using PostgreSQL 9.4 on Debian Jessie.
Thanks for your help!
UPDATE
Content of the log file:
$ cat /var/log/postgresql/postgresql-9.4-main.log
2017-03-14 17:07:16 CET [13822-2] LOG: received fast shutdown request
2017-03-14 17:07:16 CET [13822-3] LOG: aborting any active transactions
2017-03-14 17:07:16 CET [13827-3] LOG: autovacuum launcher shutting down
2017-03-14 17:07:16 CET [13824-1] LOG: shutting down
2017-03-14 17:07:16 CET [13824-2] LOG: database system is shut down
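When relocating a data directory, a frequent pitfall is that the new directory itself does not end up with the ownership and mode PostgreSQL insists on: the server refuses to start if the data directory is not owned by the user running it, or (on 9.4) if it has any group or world access. Note that a glob such as main/* copies the contents of main but not the attributes of main itself onto new_dir_path. The sketch below is a hypothetical pre-flight check (check_data_dir is not part of PostgreSQL or Debian's tooling); it mirrors a few of those startup checks plus a PG_VERSION sanity check.

```python
from __future__ import annotations

import os
import stat
from pathlib import Path


def check_data_dir(path: str, expected_version: str = "9.4") -> list[str]:
    """Return human-readable problems found in a candidate data directory.

    Hypothetical helper: it only mirrors some of the checks the server
    performs at startup, so an empty result is not a guarantee.
    """
    p = Path(path)
    if not p.is_dir():
        return [f"{path} is not a directory"]

    problems = []
    st = p.stat()
    # The server must run as the owner of the data directory.
    if st.st_uid != os.geteuid():
        problems.append(f"{path} is owned by uid {st.st_uid}, not {os.geteuid()}")

    # 9.4 rejects any group or world permission bits (i.e. it wants 0700).
    mode = stat.S_IMODE(st.st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} has mode {mode:o}; expected 700")

    # PG_VERSION sits at the top of every data directory.
    version_file = p / "PG_VERSION"
    if not version_file.is_file():
        problems.append("PG_VERSION is missing; was the copy complete?")
    else:
        found = version_file.read_text().strip()
        if found != expected_version:
            problems.append(f"PG_VERSION is {found!r}, expected {expected_version!r}")
    return problems
```

Run it as the postgres user against new_dir_path (for example `print(check_data_dir("new_dir_path"))`); any reported problem would also stop the server from starting on that directory.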
Postgres restarts continuously after setting shared_preload_libraries.
https://postgresqlco.nf/doc/en/param/shared_preload_libraries/
I am running PostgreSQL 15.1, managed by a Python-based daemon, on CentOS 7 (32-bit). It works fine if we do not set shared_preload_libraries, but after setting it with the "ALTER SYSTEM SET shared_preload_libraries" command, Postgres restarts every few seconds.
It previously worked fine with PostgreSQL 9.6.4.
Postgres logs:
waiting for server to start....2023-02-15 07:13:45.676 GMT [28605] LOG: skipping missing configuration file "/home/runtime/pgsql/data/postgresql.auto.conf"
2023-02-15 07:13:45.825 GMT [28605] LOG: starting PostgreSQL 15.1 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 32-bit
2023-02-15 07:13:45.825 GMT [28605] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-02-15 07:13:45.933 GMT [28605] LOG: listening on Unix socket "/home/runtime/pgsql/.s.PGSQL.5432"
2023-02-15 07:13:45.969 GMT [28608] LOG: database system was shut down at 2023-02-15 07:13:35 GMT
2023-02-15 07:13:45.989 GMT [28605] LOG: database system is ready to accept connections
done
server started
ALTER SYSTEM
ALTER SYSTEM
ALTER SYSTEM
ALTER SYSTEM
2023-02-15 07:13:51.480 GMT [28605] LOG: received fast shutdown request
waiting for server to shut down....2023-02-15 07:13:51.512 GMT [28605] LOG: aborting any active transactions
2023-02-15 07:13:51.513 GMT [28605] LOG: background worker "logical replication launcher" (PID 28611) exited with exit code 1
2023-02-15 07:13:51.513 GMT [28606] LOG: shutting down
2023-02-15 07:13:51.536 GMT [28606] LOG: checkpoint starting: shutdown immediate
2023-02-15 07:13:51.908 GMT [28606] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.090 s, sync=0.028 s, total=0.395 s; sync files=2, longest=0.021 s, average=0.014 s; distance=0 kB, estimate=0 kB
2023-02-15 07:13:51.909 GMT [28605] LOG: database system is shut down
done
server stopped
I tried PostgreSQL 15.0 and 14.4 and got the same behavior with both. I could not find any open issues regarding shared_preload_libraries in newer versions of Postgres.
PS: I built this Postgres from source with OpenSSL 1.1.1i.
I am using the "citus" library with this:
ALTER SYSTEM SET shared_preload_libraries="citus";
I built a new citus.so file from its source code against PostgreSQL 15.1 (github.com/citusdata/citus).
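ALTER SYSTEM does not change the running server's shared_preload_libraries; it writes the setting to postgresql.auto.conf in the data directory, and this particular parameter only takes effect at the next server start. The startup log above shows that file was missing before the ALTER SYSTEM commands ran, so it can be worth confirming what was actually persisted. The following is a deliberately minimal, hypothetical reader (read_auto_conf and read_auto_conf_file are illustrative names) for the simple name = 'value' lines ALTER SYSTEM writes; it is not a full postgresql.conf parser.

```python
from __future__ import annotations

from pathlib import Path


def read_auto_conf(text: str) -> dict[str, str]:
    """Parse `name = 'value'` lines, skipping comments and blank lines.

    Minimal sketch: handles single-quoted values with doubled '' escapes,
    but not include directives or trailing comments.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == "'":
            # Strip the quotes and undo SQL-style quote doubling.
            value = value[1:-1].replace("''", "'")
        settings[name.strip()] = value
    return settings


def read_auto_conf_file(data_dir: str) -> dict[str, str]:
    # postgresql.auto.conf always lives directly in the data directory.
    return read_auto_conf((Path(data_dir) / "postgresql.auto.conf").read_text())
```

For the setup above, `read_auto_conf_file("/home/runtime/pgsql/data").get("shared_preload_libraries")` should return 'citus' if the ALTER SYSTEM command was persisted.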
I've correctly installed Postgres 13 with BDR. The first node is configured correctly:
create_node
-------------
661510928
(1 row)
create_node_group
-------------------
3209631483
(1 row)
wait_for_join_completion
--------------------------
ACTIVE
(1 row)
The problem is on the second node, when I try to join it to node 1 with the command:
bdr_init_physical -D /home/postgres/data -n bdr_node_rm1_02 --local-dsn="port=5432 dbname=lmw host=192.168.0.101 user=postgres password=PWD" -d "port=5432 dbname=lmw host=192.168.0.102 user=postgres password=PWD"
Starting bdr_init_physical ...
Getting remote server identification ...
Creating replication slot on remote node ...
Creating base backup of the remote node ...
38798/38798 kB (100%), 1/1 tablespace
Creating temporary synchronization replication slot on remote node ...
Bringing local node to the target lsn ...
In the log I can see that this command is idle:
Feb 15 10:16:43 localhost postgres[8080]: [11-1] 2023-02-15 10:16:43.685 CET [8080] LOG: logical decoding found consistent point at 0/405EA10
Feb 15 10:16:43 localhost postgres[8080]: [11-2] 2023-02-15 10:16:43.685 CET [8080] DETAIL: There are no running transactions.
Feb 15 10:16:43 localhost postgres[8080]: [11-3] 2023-02-15 10:16:43.685 CET [8080] STATEMENT: SELECT pg_catalog.pg_create_logical_replication_slot('bdr_lmw_lmw_bdr_node_rm1_02', 'pglogical_output') -- bdr_init_physical
Feb 15 10:16:43 localhost postgres[8081]: [11-1] 2023-02-15 10:16:43.734 CET [8081] LOG: using default exclude directory 0xa0a220 0xa0a220
Feb 15 10:16:43 localhost postgres[8081]: [11-2] 2023-02-15 10:16:43.734 CET [8081] STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS FAST NOWAIT MANIFEST 'yes'
Feb 15 10:16:45 localhost postgres[8078]: [11-1] 2023-02-15 10:16:45.851 CET [8078] LOG: logical decoding found consistent point at 0/6000028
Feb 15 10:16:45 localhost postgres[8078]: [11-2] 2023-02-15 10:16:45.851 CET [8078] DETAIL: There are no running transactions.
Feb 15 10:16:45 localhost postgres[8078]: [11-3] 2023-02-15 10:16:45.851 CET [8078] STATEMENT: CREATE_REPLICATION_SLOT "bdr_lmw_lmw_bdr_node_rm1_02_tmp" TEMPORARY LOGICAL pglogical_output
Feb 15 10:16:45 localhost postgres[8078]: [12-1] 2023-02-15 10:16:45.851 CET [8078] LOG: exported logical decoding snapshot: "00000008-0000002C-1" with 0 transaction IDs
Feb 15 10:16:45 localhost postgres[8078]: [12-2] 2023-02-15 10:16:45.851 CET [8078] STATEMENT: CREATE_REPLICATION_SLOT "bdr_lmw_lmw_bdr_node_rm1_02_tmp" TEMPORARY LOGICAL pglogical_output
Do you know why this command sits idle?
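The last step printed is "Bringing local node to the target lsn", which presumably means bdr_init_physical waits until the local node has replayed WAL up to some target LSN (this reading of the message is an assumption; BDR's internals are not shown here). To reason about such waits it helps to compare LSNs numerically: PostgreSQL prints an LSN as two hexadecimal halves, high/low, so the 64-bit position is (high << 32) | low. The sketch below does this in Python with hypothetical helper names; on the server, pg_wal_lsn_diff() computes the same byte difference.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/405EA10' into a 64-bit byte position."""
    high, sep, low = text.strip().partition("/")
    if not sep:
        raise ValueError(f"not an LSN: {text!r}")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Inverse of parse_lsn, in PostgreSQL's uppercase X/X style."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def lsn_diff(a: str, b: str) -> int:
    """Bytes of WAL from b to a (a - b), like pg_wal_lsn_diff(a, b)."""
    return parse_lsn(a) - parse_lsn(b)
```

For the two consistent points in the log, `lsn_diff("0/6000028", "0/405EA10")` is 33166872 bytes (about 31.6 MiB), i.e. the temporary slot was created roughly 32 MB of WAL after the permanent one.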
I am trying to restore a PostgreSQL database to a point in time.
When I use only restore_command in recovery.conf, it works fine:
restore_command = 'cp /var/lib/pgsql/pg_log_archive/%f %p'
When I add the recovery_target_time parameter, it does not restore to the target time:
restore_command = 'cp /var/lib/pgsql/pg_log_archive/%f %p'
recovery_target_time='2018-06-05 06:43:00.0'
Below is the log file content:
2018-06-05 07:31:39.166 UTC [22512] LOG: database system was interrupted; last known up at 2018-06-05 06:35:52 UTC
2018-06-05 07:31:39.664 UTC [22512] LOG: starting point-in-time recovery to 2018-06-05 06:43:00+00
2018-06-05 07:31:39.671 UTC [22512] LOG: restored log file "00000005.history" from archive
2018-06-05 07:31:39.769 UTC [22512] LOG: restored log file "00000005000000020000008F" from archive
2018-06-05 07:31:39.816 UTC [22512] LOG: redo starts at 2/8F000028
2018-06-05 07:31:39.817 UTC [22512] LOG: consistent recovery state reached at 2/8F000130
2018-06-05 07:31:39.818 UTC [22510] LOG: database system is ready to accept read only connections
2018-06-05 07:31:39.912 UTC [22512] LOG: restored log file "000000050000000200000090" from archive
2018-06-05 07:31:39.996 UTC [22512] LOG: recovery stopping before abort of transaction 9525, time 2018-06-05 06:45:02.088502+00
2018-06-05 07:31:39.996 UTC [22512] LOG: recovery has paused
I am trying to restore the database instance to 06:43:00. Why is it recovering up to 06:45:02?
EDIT
In the first scenario, recovery.conf was renamed to recovery.done, but this did not happen in the second scenario.
What could be the reason for this?
You forgot to set
recovery_target_action = 'promote'
After point-in-time recovery reaches its target, recovery_target_action determines how PostgreSQL proceeds.
The default value is pause, which means that PostgreSQL will do nothing and wait for you to tell it how to proceed.
To complete recovery, connect to the database and run
SELECT pg_wal_replay_resume();
(On PostgreSQL 9.x, this function is named pg_xlog_replay_resume().)
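It seems that no transaction committed or aborted between 06:43:00 and 06:45:02. With recovery_target_time, recovery can only stop at a commit or abort record, and the log says "recovery stopping before abort of transaction 9525": that abort at 06:45:02 was simply the first such record after the target, so the recovered data is the same as it was at 06:43:00.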
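A simplified model of how recovery_target_time picks its stopping point, following the reasoning above: only commit and abort records are examined, and with the default recovery_target_inclusive = on, recovery stops before the first such record whose timestamp is later than the target (with it off, at or later than the target). XactRecord and stop_point are hypothetical names; this models the rule, not PostgreSQL's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class XactRecord:
    xid: int
    kind: str       # "commit" or "abort"
    time: datetime  # timestamp stored in the commit/abort record


def stop_point(records: list[XactRecord], target: datetime,
               inclusive: bool = True) -> XactRecord | None:
    """Return the record recovery would stop *before*, or None.

    None means no commit/abort record passes the target, so recovery
    would simply replay everything available.
    """
    for rec in records:
        past = rec.time > target if inclusive else rec.time >= target
        if past:
            return rec
    return None
```

With a hypothetical commit at 06:40:00 followed by the abort of transaction 9525 at 06:45:02.088502 (from the log), a target of 06:43:00 yields transaction 9525, matching "recovery stopping before abort of transaction 9525".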
I've set up streaming replication with PostgreSQL 9.3.
My problem is that on the slave server the pg_xlog folder keeps growing and WAL files are not being recycled.
The slave server has the following (relevant) values in its postgresql.conf:
wal_keep_segments = 150
hot_standby = on
checkpoint_segments = 32
checkpoint_completion_target = 0.9
archive_mode = off
#archive_command = ''
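As a sanity check on how large pg_xlog should get with these settings: the 9.x documentation on WAL configuration says the number of segment files will normally not exceed (2 + checkpoint_completion_target) * checkpoint_segments + 1, or checkpoint_segments + wal_keep_segments + 1, whichever is larger. The sketch below just evaluates that documented bound, assuming the default 16 MB segment size; max_wal_segments is a hypothetical name.

```python
import math

SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16 MB)


def max_wal_segments(checkpoint_segments: int,
                     completion_target: float,
                     wal_keep_segments: int) -> int:
    """Documented 'normal' upper bound on the number of files in pg_xlog."""
    by_checkpoints = math.floor((2 + completion_target) * checkpoint_segments + 1)
    by_keep = checkpoint_segments + wal_keep_segments + 1
    return max(by_checkpoints, by_keep)
```

With checkpoint_segments = 32, checkpoint_completion_target = 0.9 and wal_keep_segments = 150, this gives 183 segments, or 2928 MB (about 2.9 GB). A pg_xlog that keeps growing well beyond that suggests segments are not being recycled at all.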
My initial replication command was:
pg_basebackup --xlog-method=stream -h <master-ip> -D . --username=replication --password
So I guess my WAL files are OK.
Here is my slave server startup log:
2017-05-08 09:55:31 IDT LOG: database system was shut down in recovery at 2017-05-08 09:55:19 IDT
2017-05-08 09:55:31 IDT LOG: entering standby mode
2017-05-08 09:55:31 IDT LOG: redo starts at 361/C76DD3E8
2017-05-08 09:55:31 IDT LOG: consistent recovery state reached at 361/C89A8278
2017-05-08 09:55:31 IDT LOG: database system is ready to accept read only connections
2017-05-08 09:55:31 IDT LOG: record with zero length at 361/C89A8278
2017-05-08 09:55:31 IDT LOG: started streaming WAL from primary at 361/C8000000 on timeline 1
2017-05-08 09:55:32 IDT LOG: incomplete startup packet
2017-05-08 09:58:34 IDT LOG: received SIGHUP, reloading configuration files
2017-05-08 09:58:34 IDT LOG: parameter "checkpoint_completion_target" changed to "0.9"
I even tried manually copying older WAL files from the master server to the slave, but that didn't help either.
What am I doing wrong? How can I stop the pg_xlog folder from growing indefinitely?
Is it related to the "incomplete startup packet" log message?
One last thing: in the pg_xlog/archive_status folder, all of the WAL files have a .done suffix.
Appreciate any help I can get on this.
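Both symptoms can be quantified by counting files. In archive_status, a .ready marker means a segment is still waiting to be archived, while .done means it is considered archived, which suggests the .done markers alone are not what keeps segments around. The helpers below are hypothetical (count_segments and archive_status_summary are illustrative names); they recognize segment files by their 24-hex-digit names.

```python
from collections import Counter
from pathlib import Path

HEX_DIGITS = set("0123456789ABCDEF")


def count_segments(pg_xlog: str) -> int:
    """Count WAL segment files (24 uppercase hex digits) in pg_xlog."""
    return sum(1 for p in Path(pg_xlog).iterdir()
               if p.is_file() and len(p.name) == 24
               and set(p.name) <= HEX_DIGITS)


def archive_status_summary(pg_xlog: str) -> Counter:
    """Count archive_status marker files by suffix ('.ready', '.done', ...)."""
    status_dir = Path(pg_xlog) / "archive_status"
    return Counter(p.suffix for p in status_dir.iterdir() if p.is_file())
```

Running `count_segments` periodically against the slave's pg_xlog shows whether the count stays under the expected bound or grows without limit.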
Edit:
I enabled log_checkpoints in postgresql.conf.
Here are the relevant log entries since I enabled it:
2017-05-12 08:43:11 IDT LOG: parameter "log_checkpoints" changed to "on"
2017-05-12 08:43:24 IDT LOG: checkpoint complete: wrote 2128 buffers (0.9%); 0 transaction log file(s) added, 0 removed, 9 recycled; write=189.240 s, sync=0.167 s, total=189.549 s; sync files=745, longest=0.010 s, average=0.000 s
2017-05-12 08:45:15 IDT LOG: checkpoint starting: time
2017-05-12 08:48:46 IDT LOG: checkpoint complete: wrote 15175 buffers (6.6%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=209.078 s, sync=1.454 s, total=210.617 s; sync files=769, longest=0.032 s, average=0.001 s
2017-05-12 08:50:15 IDT LOG: checkpoint starting: time
2017-05-12 08:53:45 IDT LOG: checkpoint complete: wrote 2480 buffers (1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=209.162 s, sync=0.991 s, total=210.253 s; sync files=663, longest=0.076 s, average=0.001 s
Edit2:
Since my slave server shows no restartpoints in the log, here is the relevant slave log covering startup and WAL recovery up to the point where it reaches a consistent recovery state:
2017-05-12 09:35:42 IDT LOG: database system was shut down in recovery at 2017-05-12 09:35:41 IDT
2017-05-12 09:35:42 IDT LOG: entering standby mode
2017-05-12 09:35:42 IDT LOG: incomplete startup packet
2017-05-12 09:35:43 IDT FATAL: the database system is starting up
2017-05-12 09:35:43 IDT LOG: restored log file "0000000100000369000000B1" from archive
2017-05-12 09:35:43 IDT FATAL: the database system is starting up
2017-05-12 09:35:44 IDT FATAL: the database system is starting up
2017-05-12 09:35:44 IDT LOG: restored log file "0000000100000369000000AF" from archive
2017-05-12 09:35:44 IDT LOG: redo starts at 369/AFD28900
2017-05-12 09:35:44 IDT FATAL: the database system is starting up
2017-05-12 09:35:45 IDT FATAL: the database system is starting up
2017-05-12 09:35:45 IDT FATAL: the database system is starting up
2017-05-12 09:35:46 IDT LOG: restored log file "0000000100000369000000B0" from archive
2017-05-12 09:35:46 IDT FATAL: the database system is starting up
2017-05-12 09:35:46 IDT FATAL: the database system is starting up
2017-05-12 09:35:47 IDT FATAL: the database system is starting up
2017-05-12 09:35:47 IDT LOG: restored log file "0000000100000369000000B1" from archive
2017-05-12 09:35:47 IDT FATAL: the database system is starting up
2017-05-12 09:35:48 IDT FATAL: the database system is starting up
2017-05-12 09:35:48 IDT LOG: incomplete startup packet
2017-05-12 09:35:49 IDT LOG: restored log file "0000000100000369000000B2" from archive
2017-05-12 09:35:50 IDT LOG: restored log file "0000000100000369000000B3" from archive
2017-05-12 09:35:52 IDT LOG: restored log file "0000000100000369000000B4" from archive
...
2017-05-12 09:42:33 IDT LOG: restored log file "000000010000036A000000C0" from archive
2017-05-12 09:42:35 IDT LOG: restored log file "000000010000036A000000C1" from archive
2017-05-12 09:42:36 IDT LOG: restored log file "000000010000036A000000C2" from archive
2017-05-12 09:42:37 IDT LOG: restored log file "000000010000036A000000C3" from archive
2017-05-12 09:42:37 IDT LOG: consistent recovery state reached at 36A/C3ACEB28
2017-05-12 09:42:37 IDT LOG: database system is ready to accept read only connections
2017-05-12 09:42:39 IDT LOG: restored log file "000000010000036A000000C4" from archive
2017-05-12 09:42:40 IDT LOG: restored log file "000000010000036A000000C5" from archive
2017-05-12 09:42:42 IDT LOG: restored log file "000000010000036A000000C6" from archive
ERROR: WAL file '000000010000036A000000C7' not found in server 'main-db-server'
2017-05-12 09:42:42 IDT LOG: started streaming WAL from primary at 36A/C6000000 on timeline 1
Thanks!
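The log shows the standby restoring segments from the archive until the next one (...C7) is not found, then switching to streaming at 36A/C6000000, which lies inside segment ...C6, the last one restored. With the default 16 MB segment size, a segment's file name is the timeline, the LSN's high 32 bits, and the low 32 bits divided by the segment size, each as 8 hex digits. The sketch below (hypothetical helper names) reproduces that mapping; on a primary, pg_xlogfile_name() (pg_walfile_name() from version 10) does the same, but it cannot be run during recovery.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default; configurable at build time on 9.x


def parse_lsn(text: str) -> int:
    """'36A/C6000000' -> 64-bit WAL byte position."""
    high, _, low = text.strip().partition("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: str,
                  seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file containing the given LSN."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_id = 0x100000000 // seg_size  # segments per 4 GB of WAL
    return f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"
```

For example, `wal_file_name(1, "36A/C6000000")` gives "000000010000036A000000C6", the last segment restored before streaming started, and `wal_file_name(1, "36A/C3ACEB28")` gives the ...C3 segment in which the consistent recovery state was reached.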
The problem seems to have been resolved.
Apparently I had hardware issues on the master server.
I was able to perform a full pg_dump and reindex my DB, so I was fairly sure I did not have any data integrity issues.
But when I looked at the master server logs after enabling log_checkpoints in the config, I saw the following message a few minutes before the slave server stopped performing checkpoints:
IDT ERROR: failed to re-find parent key in index "<table_name>_id_udx" for split pages 17/18
After seeing that, I decided to switch hosting providers and moved my DB to a new server.
Since then (almost a week now), everything has been running smoothly: replication and checkpoints are working as expected.
I hope this helps other people: when something like this happens, keep in mind that it might be caused by data integrity or hardware problems.
I've just stumbled upon this error while testing failover of a PostgreSQL 9.4 cluster I've set up. Here I'm trying to promote a slave to be the new master:
$ repmgr -f /etc/repmgr/repmgr.conf --verbose standby promote
2014-09-22 10:46:37 UTC LOG: database system shutdown was interrupted; last known up at 2014-09-22 10:44:02 UTC
2014-09-22 10:46:37 UTC LOG: database system was not properly shut down; automatic recovery in progress
2014-09-22 10:46:37 UTC LOG: redo starts at 0/18000028
2014-09-22 10:46:37 UTC LOG: consistent recovery state reached at 0/19000600
2014-09-22 10:46:37 UTC LOG: record with zero length at 0/1A000090
2014-09-22 10:46:37 UTC LOG: redo done at 0/1A000028
2014-09-22 10:46:37 UTC LOG: last completed transaction was at log time 2014-09-22 10:36:22.679806+00
2014-09-22 10:46:37 UTC FATAL: could not open directory "pg_logical/snapshots": No such file or directory
2014-09-22 10:46:37 UTC LOG: startup process (PID 2595) exited with exit code 1
2014-09-22 10:46:37 UTC LOG: aborting startup due to startup process failure
The pg_logical/snapshots directory does in fact exist on the master node, and it is empty.
UPD: I manually created the empty directories pg_logical/snapshots and pg_logical/mappings, and the server started without complaint. repmgr standby clone seems to omit these directories while syncing. The question still stands, though: I'm curious what this directory is for, in case I'm missing something in my setup. Googling it did not yield any meaningful results.
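The manual workaround described in the update can be scripted so it is repeatable after each clone. This is a hypothetical helper (ensure_logical_dirs is an illustrative name, not part of repmgr); it creates only the directories named above, with mode 0700, and should be run as the postgres user before starting the server.

```python
from __future__ import annotations

import os
from pathlib import Path

# Parent first, so the subdirectories can be created in order.
LOGICAL_DIRS = ("pg_logical", "pg_logical/snapshots", "pg_logical/mappings")


def ensure_logical_dirs(data_dir: str) -> list[str]:
    """Create any missing pg_logical directories; return the ones created."""
    created = []
    for rel in LOGICAL_DIRS:
        path = Path(data_dir) / rel
        if not path.is_dir():
            path.mkdir(mode=0o700)
            # mkdir's mode is filtered by the umask, so set it explicitly.
            os.chmod(path, 0o700)
            created.append(rel)
    return created
```

Calling it a second time returns an empty list, so it is safe to run unconditionally after every standby clone.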
It's for the new logical changeset extraction / logical replication feature in 9.4.
This shouldn't happen, though; it suggests a significant bug somewhere, probably in repmgr. I'll wait for details (repmgr version, etc.).
Update: Confirmed, it's a repmgr bug. It was already fixed in git master (before this report) and will be in the next release, which had better be soon given the significance of this issue.