Is the parameter recovery_target_timeline='latest' mandatory for switchover and switchback in PostgreSQL 9.4.1? - postgresql

I followed the steps below for switchover and switchback.
Step 1:-
Disconnect application services from 10.x.x.10 and do the following
#Master(10.x.x.10)
pg_ctl -D /DATA_VEC/pgdata stop --mode=fast
#DR(20.x.x.20)
Promote DR to read-write mode
Step 2:- Start master as DR from new primary
#Master(10.x.x.10)
create recovery.conf
standby_mode = 'on'
primary_conninfo = 'user= password= host=20.x.x.20 port=9999'
trigger_file = '/tmp/node1'
restore_command = 'cp /DATA_VEC/restore/%f "%p"'
pg_ctl -D /DATA_VEC/pgdata start
After promoting the standby, the old primary does not sync with the new primary server (the old standby).
Logs from the DR server, which was previously the primary:
2019-12-01 18:46:56 IST LOG: database system was shut down in recovery at 2019-12-01 18:46:53 IST
2019-12-01 18:46:56 IST LOG: entering standby mode
cp: cannot stat `/DATA_VEC/restore/00000002.history': No such file or directory
2019-12-01 18:46:56 IST LOG:
2019-12-01 18:46:56 IST LOG: restored log file "00000002000000000000000C" from archive
2019-12-01 18:46:57 IST LOG: consistent recovery state reached at 0/C000090
2019-12-01 18:46:57 IST LOG: record with zero length at 0/C000090
2019-12-01 18:46:57 IST LOG: database system is ready to accept read only connections
2019-12-01 18:46:57 IST LOG: started streaming WAL from primary at 0/C000000 on timeline 2
2019-12-01 18:46:57 IST LOG: replication terminated by primary server
2019-12-01 18:46:57 IST DETAIL: End of WAL reached on timeline 2 at 0/C000090.
2019-12-01 18:46:57 IST LOG: restored log file "00000002000000000000000C" from archive
2019-12-01 18:46:57 IST LOG: record with zero length at 0/C000090
2019-12-01 18:46:57 IST LOG: restarted WAL streaming at 0/C000000 on timeline 2
2019-12-01 18:46:57 IST LOG: replication terminated by primary server
2019-12-01 18:46:57 IST DETAIL: End of WAL reached on timeline 2 at 0/C000090.
#Master(10.x.x.10)
Pg_xlog content
-bash-4.1$ cd pg_xlog
-bash-4.1$ ll
total 65552
-rw------- 1 postgres postgres 302 Dec 1 12:52 00000002000000000000000A.00000028.backup
-rw------- 1 postgres postgres 16777216 Dec 1 13:52 00000002000000000000000B
-rw------- 1 postgres postgres 16777216 Dec 1 14:28 00000002000000000000000C
-rw------- 1 postgres postgres 16777216 Dec 1 12:52 00000002000000000000000D
-rw------- 1 postgres postgres 16777216 Dec 1 12:52 00000002000000000000000E
-rw------- 1 postgres postgres 41 Dec 1 13:57 00000002.history
-rw------- 1 postgres postgres 83 Dec 1 13:57 00000003.history
drwx------ 2 postgres postgres 4096 Dec 1 13:57 archive_status
#in restore_command location content:-
-bash-4.1$ cd /DATA_VEC/restore/
-bash-4.1$ ll
total 49156
-rw------- 1 postgres postgres 16777216 Dec 1 18:45 00000002000000000000000A
-rw------- 1 postgres postgres 16777216 Nov 30 21:22 00000002000000000000000B
-rw------- 1 postgres postgres 16777216 Dec 1 18:45 00000002000000000000000C
-rw------- 1 postgres postgres 83 Dec 1 18:45 00000003.history
-bash-4.1$
As the pg_xlog listing shows, the timeline history file 00000003.history has arrived at the standby, but it still does not start streaming from the new primary.
Questions:
1. Is the parameter recovery_target_timeline='latest' mandatory in recovery.conf so that the standby picks up the latest timeline ID from the new primary and starts streaming replication?
2. If yes, does that apply to all PostgreSQL versions, e.g. from 9.3 to 11.5?

If you want switch-back functionality, you will have to set recovery_target_timeline='latest', as any promotion will increment the timeline. Using a fixed target timeline is usually reserved for very specific cases (e.g., you need to recover changes after a split-brain, diverged-timeline scenario).
To answer your specific questions:
1. Yes
2. Yes
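To see the timeline jump in the listings from the question, the timeline ID can be read from the first eight hex digits of each WAL file name. The following is only a minimal sketch: the helper name timeline_of is made up for illustration, and the file names are copied from the pg_xlog listing above. It shows that the old primary holds WAL segments for timeline 2 only, while a history file for timeline 3 is already present, which is consistent with a standby that keeps asking for timeline 2 unless it is told to follow the latest one.

```python
import re

# WAL segment names are 24 hex digits: timeline (8), log (8), segment (8).
# Timeline history files are named <8 hex digits>.history.
SEGMENT_RE = re.compile(r"^([0-9A-F]{8})[0-9A-F]{16}$")
HISTORY_RE = re.compile(r"^([0-9A-F]{8})\.history$")


def timeline_of(filename):
    """Return the timeline ID encoded in a WAL segment or history file name, or None."""
    match = SEGMENT_RE.match(filename) or HISTORY_RE.match(filename)
    return int(match.group(1), 16) if match else None


# File names copied from the pg_xlog listing in the question.
pg_xlog = [
    "00000002000000000000000A.00000028.backup",
    "00000002000000000000000B",
    "00000002000000000000000C",
    "00000002000000000000000D",
    "00000002000000000000000E",
    "00000002.history",
    "00000003.history",
    "archive_status",
]

segment_timelines = {timeline_of(f) for f in pg_xlog if SEGMENT_RE.match(f)}
history_timelines = {timeline_of(f) for f in pg_xlog if HISTORY_RE.match(f)}

print("timelines with WAL segments:", sorted(segment_timelines))
print("timelines with history files:", sorted(history_timelines))
```

Backup label files and the archive_status directory do not match either pattern, so they are ignored.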

Related

Postgres bdr_init_physical command sits idle

I've correctly installed Postgres 13 with BDR. The first node is configured correctly:
create_node
-------------
661510928
(1 row)
create_node_group
-------------------
3209631483
(1 row)
wait_for_join_completion
--------------------------
ACTIVE
(1 row)
The problem is on the second node, when I try to join node 1 with this command:
bdr_init_physical -D /home/postgres/data -n bdr_node_rm1_02 --local-dsn="port=5432 dbname=lmw host=192.168.0.101 user=postgres password=PWD" -d "port=5432 dbname=lmw host=192.168.0.102 user=postgres password=PWD"
Starting bdr_init_physical ...
Getting remote server identification ...
Creating replication slot on remote node ...
Creating base backup of the remote node ...
38798/38798 kB (100%), 1/1 tablespace
Creating temporary synchronization replication slot on remote node ...
Bringing local node to the target lsn ...
I see in the log that this command sits idle:
Feb 15 10:16:43 localhost postgres[8080]: [11-1] 2023-02-15 10:16:43.685 CET [8080] LOG: logical decoding found consistent point at 0/405EA10
Feb 15 10:16:43 localhost postgres[8080]: [11-2] 2023-02-15 10:16:43.685 CET [8080] DETAIL: There are no running transactions.
Feb 15 10:16:43 localhost postgres[8080]: [11-3] 2023-02-15 10:16:43.685 CET [8080] STATEMENT: SELECT pg_catalog.pg_create_logical_replication_slot('bdr_lmw_lmw_bdr_node_rm1_02', 'pglogical_output') -- bdr_init_physical
Feb 15 10:16:43 localhost postgres[8081]: [11-1] 2023-02-15 10:16:43.734 CET [8081] LOG: using default exclude directory 0xa0a220 0xa0a220
Feb 15 10:16:43 localhost postgres[8081]: [11-2] 2023-02-15 10:16:43.734 CET [8081] STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS FAST NOWAIT MANIFEST 'yes'
Feb 15 10:16:45 localhost postgres[8078]: [11-1] 2023-02-15 10:16:45.851 CET [8078] LOG: logical decoding found consistent point at 0/6000028
Feb 15 10:16:45 localhost postgres[8078]: [11-2] 2023-02-15 10:16:45.851 CET [8078] DETAIL: There are no running transactions.
Feb 15 10:16:45 localhost postgres[8078]: [11-3] 2023-02-15 10:16:45.851 CET [8078] STATEMENT: CREATE_REPLICATION_SLOT "bdr_lmw_lmw_bdr_node_rm1_02_tmp" TEMPORARY LOGICAL pglogical_output
Feb 15 10:16:45 localhost postgres[8078]: [12-1] 2023-02-15 10:16:45.851 CET [8078] LOG: exported logical decoding snapshot: "00000008-0000002C-1" with 0 transaction IDs
Feb 15 10:16:45 localhost postgres[8078]: [12-2] 2023-02-15 10:16:45.851 CET [8078] STATEMENT: CREATE_REPLICATION_SLOT "bdr_lmw_lmw_bdr_node_rm1_02_tmp" TEMPORARY LOGICAL pglogical_output
Do you know why this command sits idle?

postgresql archive permission denied

We have installed postgres v12 on ubuntu 20.04 (with apt install -y postgresql postgresql-contrib) and wish to enable archiving to /data/db/postgres/archive by setting the following in postgresql.conf:
max_wal_senders=2
wal_keep_segments=256
wal_sender_timeout=60s
archive_mode=on
archive_command=cp %p /data/db/postgres/archive/%f
However the postgres service fails to write there:
2022-11-15 15:02:26.212 CET [392860] FATAL: archive command failed with exit code 126
2022-11-15 15:02:26.212 CET [392860] DETAIL: The failed archive command was: archive_command=cp pg_wal/000000010000000000000002 /data/db/postgres/archive/000000010000000000000002
2022-11-15 15:02:26.213 CET [392605] LOG: archiver process (PID 392860) exited with exit code 1
sh: 1: pg_wal/000000010000000000000002: Permission denied
This directory /data/db/postgres/archive/ is owned by the postgres user and when we su postgres we are able to create and delete files without a problem.
Why can the postgresql service (running as postgres) not write to a directory it owns?
Here are the permissions on all the parents of the archive directory:
drwxr-xr-x 2 postgres root 6 Nov 15 14:59 /data/db/postgres/archive
drwxr-xr-x 3 root root 21 Nov 15 14:29 /data/db/postgres
drwxr-xr-x 3 root root 22 Nov 15 14:29 /data/db
drwxr-xr-x 5 root root 44 Nov 15 14:29 /data
2022-11-15 15:02:26.212 CET [392860] DETAIL: The failed archive command was: archive_command=cp pg_wal/000000010000000000000002 /data/db/postgres/archive/000000010000000000000002
So, your archive_command is apparently set to the peculiar string archive_command=cp %p /data/db/postgres/archive/%f.
After the %variables are substituted, the result is passed to the shell. The shell does what it was told: it sets the (unused) environment variable 'archive_command' to 'cp' and then tries to execute the file pg_wal/000000010000000000000002, which it is not allowed to do because that file doesn't have the execute bit set.
I don't know how you managed to get such a deformed archive_command, but it didn't come from anything you showed us.
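The parsing described above can be imitated without running anything. The following is a rough sketch, not PostgreSQL's or the shell's actual code: expand_archive_command and split_simple_command are hypothetical helpers that mimic the %p/%f substitution and the shell rule that leading NAME=value words are variable assignments, so the first remaining word is the program to execute.

```python
import re
import shlex


def expand_archive_command(template, wal_path, wal_name):
    """Replace %p, %f and %% in an archive_command template (illustrative only)."""
    replacements = {"%p": wal_path, "%f": wal_name, "%%": "%"}
    return re.sub(r"%[pf%]", lambda m: replacements[m.group(0)], template)


ASSIGNMENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def split_simple_command(command_line):
    """Split a shell simple command into leading NAME=value words and the rest."""
    words = shlex.split(command_line)
    assignments = []
    while words and ASSIGNMENT_RE.match(words[0]):
        assignments.append(words.pop(0))
    return assignments, words


segment = "000000010000000000000002"

# The value reported in the log: the setting name is part of the command string.
bad = expand_archive_command(
    "archive_command=cp %p /data/db/postgres/archive/%f", "pg_wal/" + segment, segment)
# What was presumably intended: only the cp command.
good = expand_archive_command(
    "cp %p /data/db/postgres/archive/%f", "pg_wal/" + segment, segment)

bad_assignments, bad_argv = split_simple_command(bad)
good_assignments, good_argv = split_simple_command(good)

print("bad :", bad_assignments, "-> runs", bad_argv[0])
print("good:", good_assignments, "-> runs", good_argv[0])
```

With the deformed value, the word the shell tries to run is the WAL segment itself, which matches the "Permission denied" line in the log; with the plain cp command, the program is cp.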

PostgreSQL keeps WAL segments not required by any replication slot

I have wal_keep_segments set to 3000, but the pg_xlog directory contains more than 6000 WAL segments. Interestingly, there are about 3000 files dated after Aug 14, so I would expect the files dated before Aug 14 not to exist. These older files also have the executable bit set.
$ ls -al pg_xlog | grep -A2 -B2 00000001000034DB0000003B
-rwx------ 1 postgres postgres 16777216 Jul 19 07:58 00000001000034DB00000039
-rwx------ 1 postgres postgres 16777216 Jul 19 07:58 00000001000034DB0000003A
-rwx------ 1 postgres postgres 16777216 Jul 19 07:58 00000001000034DB0000003B
-rw------- 1 postgres postgres 16777216 Aug 14 19:17 0000000100003826000000EA
-rw------- 1 postgres postgres 16777216 Aug 14 19:17 0000000100003826000000EB
This cluster has no replication slots, archive_mode is enabled but archive_command is set to /bin/true. I think the new WAL segments are recycled and total amount is about 6000 but postgres does not delete the old files for some reason. Any ideas?
PostgreSQL is not in the habit of setting executable flags on WAL segments.
Besides, it looks like there is a gap in the numbering.
These files must be there by accident; you can delete them.
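The size of the gap can be estimated from the two adjacent file names in the listing. This is a small sketch under stated assumptions: the default 16 MB segment size and the naming used since PostgreSQL 9.3, where each 8-digit "log" number covers 256 segments. The helper segment_number is made up for illustration.

```python
# Assumption: default 16 MiB segments, so one "log" (4 GiB) holds 0x100 segments.
SEGMENTS_PER_LOG = 0x100


def segment_number(filename):
    """Convert a 24-hex-digit WAL file name into a linear segment number."""
    log = int(filename[8:16], 16)   # middle 8 hex digits
    seg = int(filename[16:24], 16)  # last 8 hex digits
    return log * SEGMENTS_PER_LOG + seg


# Neighbouring entries in the ls output from the question.
last_old = "00000001000034DB0000003B"
first_new = "0000000100003826000000EA"

gap = segment_number(first_new) - segment_number(last_old)
print("segments between the two files:", gap)
```

Under these assumptions the two files are far more than 3000 segments apart, so the July files cannot be part of the range that wal_keep_segments retains.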

Postgresql Start failed

psql (PostgreSQL) 9.5.5
Sorry for my English.
I can't connect to the database.
postgresql.service - LSB: PostgreSQL RDBMS server
Loaded: loaded (/etc/init.d/postgresql; bad; vendor preset: enabled)
Active: active (exited) since Thu 2016-12-01 00:53:23 UTC; 2s ago
Docs: man:systemd-sysv-generator(8)
Process: 25257 ExecStop=/etc/init.d/postgresql stop (code=exited, status=0/SUCCESS)
Process: 24764 ExecReload=/etc/init.d/postgresql reload (code=exited, status=0/SUCCESS)
Process: 25293 ExecStart=/etc/init.d/postgresql start (code=exited, status=0/SUCCESS)
Main PID: 1058 (code=exited, status=0/SUCCESS)
Dec 01 00:53:23 Ubuntu-1604-xenial-64-minimal systemd[1]: Starting LSB: PostgreSQL RDBMS server...
Dec 01 00:53:23 Ubuntu-1604-xenial-64-minimal systemd[1]: Started LSB: PostgreSQL RDBMS server.
Try to connect:
psql -h localhost -p 5432 -U postgres -W
Password for user postgres:
psql: could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
In my postgresql.conf I have listen_addresses = '*' and port = 5432.
sudo netstat -pant | grep postgres shows nothing.
root@Ubuntu-1604-xenial-64-minimal /var/log # tail postgresql/postgresql-9.5-main.log
2016-11-28 23:58:21 UTC [897-3] LOG: invalid record length at 0/14CC9C90
2016-11-28 23:58:21 UTC [897-4] LOG: redo is not required
2016-11-28 23:58:21 UTC [897-5] LOG: MultiXact member wraparound protections are now enabled
2016-11-28 23:58:21 UTC [847-1] LOG: database system is ready to accept connections
2016-11-28 23:58:21 UTC [909-1] LOG: autovacuum launcher started
2016-11-28 23:58:21 UTC [915-1] [unknown]@[unknown] LOG: incomplete startup packet
2016-11-29 22:43:00 UTC [847-2] LOG: received smart shutdown request
2016-11-29 22:43:00 UTC [909-2] LOG: autovacuum launcher shutting down
2016-11-29 22:43:00 UTC [906-1] LOG: shutting down
2016-11-29 22:43:00 UTC [906-2] LOG: database system is shut down
postgres@Ubuntu-1604-xenial-64-minimal:~/9.5/main$ ls
base global pg_clog pg_commit_ts pg_dynshmem pg_logical pg_multixact pg_notify pg_replslot pg_serial pg_snapshots pg_stat pg_stat_tmp pg_subtrans pg_tblspc pg_twophase PG_VERSION pg_xlog postgresql.auto.conf postmaster.opts
And
root@Ubuntu-1604-xenial-64-minimal ~ # sudo systemctl start postgresql
root@Ubuntu-1604-xenial-64-minimal ~ # sudo su - postgres -c psql
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
After a reboot, the status check shows:
● postgresql.service - LSB: PostgreSQL RDBMS server
Loaded: loaded (/etc/init.d/postgresql; bad; vendor preset: enabled)
Active: active (exited) since Thu 2016-12-01 02:21:43 UTC; 13min ago
Docs: man:systemd-sysv-generator(8)
Dec 01 02:21:43 Ubuntu-1604-xenial-64-minimal systemd[1]: Starting LSB: PostgreSQL RDBMS server...
Dec 01 02:21:43 Ubuntu-1604-xenial-64-minimal systemd[1]: Started LSB: PostgreSQL RDBMS server.
Dec 01 02:25:56 Ubuntu-1604-xenial-64-minimal systemd[1]: Started LSB: PostgreSQL RDBMS server.
I would look at /var/log to see if it wrote a log.
If it did not, I would attempt to start the server manually:
su - postgres
postgres -d 5 -D /var/db/postgres/data96
The -d 5 option sets the debug level to 5. The -D option tells PostgreSQL where the database files are. The directory above is where PostgreSQL 9.6 runs on FreeBSD. If you are running Ubuntu, the data directory is normally /var/lib/postgresql/[PostgreSQL version]/main/ (the listing in the question was taken in ~/9.5/main). The PostgreSQL default data directory is /usr/local/pgsql/data.

Postgres restore from WAL files without having a basebackup using pg_basebackup

I have the following situation.
I have a master/replica setup.
Somehow the database was dropped and a new database with the same name was created by Django. In this case, will the WAL files of the previous database still be there?
I have not created a backup of the previous database with a tool like pg_basebackup, but I do have some WAL files in pg_xlog.
Now I am trying to do the following:
- Shut down the postgres server.
- Create a recovery.conf file in the PGDATA directory (/var/lib/postgresql/9.3/main) with the following content:
restore_command = 'cp /var/lib/postgresql/9.3/main/pg_xlog_backup_jan072016/%f %p'
recovery_target_time = '2016-01-07 03:00:00'
- Start the postgres server again.
What I see in the log file is:
postgres:~/9.3/main/pg_log$ tail -100f postgresql-2016-01-07_170256.log
2016-01-07 17:02:56 UTC LOG: database system was shut down at 2016-01-07 17:02:55 UTC
2016-01-07 17:02:56 UTC LOG: starting point-in-time recovery to 2016-01-06 00:00:00+00
cp: cannot stat ‘/var/lib/postgresql/9.3/main/pg_xlog_backup_jan072016/0000000A.history’: No such file or directory
cp: cannot stat ‘/var/lib/postgresql/9.3/main/pg_xlog_backup_jan072016/0000000A00000000000000CC’: No such file or directory
2016-01-07 17:02:56 UTC LOG: consistent recovery state reached at 0/CC000090
2016-01-07 17:02:56 UTC LOG: record with zero length at 0/CC000090
2016-01-07 17:02:56 UTC LOG: redo is not required
2016-01-07 17:02:56 UTC LOG: database system is ready to accept read only connections
cp: cannot stat ‘/var/lib/postgresql/9.3/main/pg_xlog_backup_jan072016/0000000A00000000000000CC’: No such file or directory
cp: cannot stat ‘/var/lib/postgresql/9.3/main/pg_xlog_backup_jan072016/0000000B.history’: No such file or directory
2016-01-07 17:02:56 UTC LOG: selected new timeline ID: 11
cp: cannot stat ‘/var/lib/postgresql/9.3/main/pg_xlog_backup_jan072016/0000000A.history’: No such file or directory
2016-01-07 17:02:57 UTC LOG: archive recovery complete
2016-01-07 17:02:57 UTC LOG: autovacuum launcher started
2016-01-07 17:02:57 UTC LOG: database system is ready to accept connections
2016-01-07 17:02:57 UTC LOG: incomplete startup packet
My pg_xlog_backup_jan072016 folder has the following content:
:~/9.3/main/pg_xlog_backup_jan072016$ ls -larth
total 161M
-rw------- 1 postgres postgres 257 Jan 7 15:57 00000007.history
-rw------- 1 postgres postgres 16M Jan 7 15:57 0000000700000000000000CA
-rw------- 1 postgres postgres 16M Jan 7 15:57 0000000700000000000000C9
-rw------- 1 postgres postgres 16M Jan 7 15:57 0000000700000000000000C8
-rw------- 1 postgres postgres 16M Jan 7 15:57 0000000700000000000000C7
-rw------- 1 postgres postgres 16M Jan 7 15:57 0000000700000000000000C6
-rw------- 1 postgres postgres 16M Jan 7 15:57 0000000700000000000000C5
-rw------- 1 postgres postgres 16M Jan 7 15:57 0000000700000000000000C4
-rw------- 1 postgres postgres 16M Jan 7 15:57 0000000700000000000000C3
-rw------- 1 postgres postgres 16M Jan 7 15:57 0000000700000000000000C2
-rw------- 1 postgres postgres 16M Jan 7 15:57 0000000700000000000000C1
-rw------- 1 postgres postgres 298 Jan 7 15:57 00000007000000000000005D.00000028.backup
-rw------- 1 postgres postgres 214 Jan 7 15:57 00000006.history
-rw------- 1 postgres postgres 171 Jan 7 15:57 00000005.history
drwx------ 3 postgres postgres 4.0K Jan 7 16:54 .
drwx------ 18 postgres postgres 4.0K Jan 7 17:02 ..
drwx------ 2 postgres postgres 4.0K Jan 7 17:08 archive_status
The thing I am trying to figure out is:
Is it possible to restore from WAL without having a backup made with the pg_basebackup command? I am just trying to restore from WAL in an existing postgres installation.
Why does the log say that it cannot find 0000000A.history and 0000000A00000000000000CC files? The folder does not contain any such files.
Can anyone help us with this?
Thanks.
Sarthak