Postgresql 10 replication mode error - postgresql

I'm trying to set up a basic master-slave configuration using streaming replication for Postgres 10 and Docker.
Since the official Docker image provides a docker-entrypoint-initdb.d folder for initialization scripts, I thought it would be a swell idea to put my preparation code there.
What I'm trying to do is automate the way the database is restored before starting the slave in standby mode, so I run
rm -rf /var/lib/postgresql/data/* && pg_basebackup -d 'host=postgres-master port=5432 user=foo password=foo' -D /var/lib/postgresql/data/
and this succeeds.
Then the server is shut down and restarted as per the Docker initialization script, which produces a message saying
database system identifier differs between the primary and standby
I've been searching online for a while, and the only two explanations I found are these. The first is that I have a misconfigured recovery.conf file, which looks like this
standby_mode = 'on'
primary_conninfo = 'host=postgres-master port=5432 user=foo password=foo'
trigger_file = '/tmp/postgresql.trigger'
where the connection string is the same one I used for the base backup.
The second explanation circulating is that the backup command could be messed up, but the only thing I add to the data folder after the backup is the recovery.conf file.
Does anyone have any idea where I'm messing up?
Should I just go for repmgr and call it a day?
Thanks in advance
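
One way to narrow down a "database system identifier differs" message is to compare the identifier each node actually reports. The following is only a sketch: it assumes pg_controldata (which ships with PostgreSQL) is on the PATH inside both containers, and the helper name system_identifier is made up for illustration.

```shell
#!/bin/sh
# Extract the "Database system identifier" value from pg_controldata output
# read on stdin. The function name is hypothetical, not part of PostgreSQL.
system_identifier() {
    awk -F: '/^Database system identifier/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Hypothetical usage, run once on each node (data directory path as in the
# question above):
#   pg_controldata /var/lib/postgresql/data | system_identifier
```

If the two values differ, the standby is not running on the copy that pg_basebackup produced, for example because something re-initialized the data directory after the backup.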

To answer my own question: the issue lay in how the Dockerfile entrypoint scripts were called. They ended up starting the instance as a master or as a slave depending on variables that I had not set properly.
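
A guard like the one below can make that kind of mistake fail loudly instead of silently picking a role. It is a minimal sketch only: the original entrypoint scripts and their variables are not shown, so REPLICATION_ROLE and resolve_role are hypothetical names.

```shell
#!/bin/sh
# REPLICATION_ROLE is a hypothetical variable name chosen for this sketch;
# substitute whatever variable the entrypoint scripts really read.
resolve_role() {
    case "${REPLICATION_ROLE:-}" in
        master|standby)
            printf '%s\n' "$REPLICATION_ROLE"
            ;;
        *)
            # Refuse to guess: an unset or misspelled role aborts the init script.
            echo "REPLICATION_ROLE must be 'master' or 'standby', got '${REPLICATION_ROLE:-<unset>}'" >&2
            return 1
            ;;
    esac
}

# Hypothetical usage inside an init script:
#   role=$(resolve_role) || exit 1
```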

Related

PostgreSQL restoration throwing error : replication slot does not exist

Environment: Postgresql 13.x (dockerized)
I was trying to test the DR setup for PostgreSQL nodes.
The pg_basebackup and the WAL file archive were taken from a standby node.
I restored onto a new node by copying the base backup and configuring postgresql.conf to use a restore_command pointing to the WAL file archive.
#----------------------- RECOVERY CONFIGS -----------------------
restore_command = 'cp /db-restore/mydb/walfiles/%f "%p"'
recovery_target_timeline = 'latest'
recovery_target_action = promote
Recovery seems to be fine: some random SELECT queries return correct results.
But the log file frequently shows the error below.
2022-04-19 10:19:53 UTC [291] rep_usr#[unknown] ERROR: replication slot "slot_name" does not exist
2022-04-19 10:19:58 UTC [296] rep_usr#[unknown] ERROR: replication slot "slot_name" does not exist
As I took the backup from a standby, is this restoration making the new node a standby that looks for the replication slot it used in its previous life?
How can I make the new node a master (that is, remove the replication slot info)?
What are the proper steps to recover if the backup was taken from a standby?
I have 1 master and 2 standby nodes, and I am planning to take backups from a standby. Are any specific changes required for archive_mode and archive_command when using them on a standby node? Current settings:
archive_mode = always
wal_level = logical
archive_command = 'test ! -f /db-archives/walfiles/%f && cp %p /db-archives/walfiles/%f'
Could someone help with this? Any pointers?
I am sure the DB backup contains the replication slot and connection info, since the pg_basebackup itself is a clone of the entire DB. To revert those settings, I am manually removing postgresql.auto.conf in the main directory, which contains the parameters above.
So how can I remove any other references to the replication slot, if there are any in the DB backup?
These error messages don't seem to be thrown by recovery, but by some other tool that connects as database user rep_usr.
Create the replication slot if your application needs it!
I removed all configs and started fresh.
I removed main/postgresql.auto.conf, which was present in the backup.
main/postgresql.auto.conf is present on standby nodes when we take a pg_basebackup; it contains the settings used for replication on the standby (the slot name and the connection info).
As I was restoring a backup from a standby to a master, I don't need that postgresql.auto.conf.
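
Deleting the whole postgresql.auto.conf also discards any other ALTER SYSTEM settings it holds. A narrower variant is sketched below, assuming the PostgreSQL 12+ layout, where the standby settings are primary_conninfo and primary_slot_name and standby mode is requested by a standby.signal file. The function name prepare_primary_restore is made up for this example, and the data directory path is whatever the restore uses.

```shell
#!/bin/sh
# Turn a data directory restored from a standby's base backup into one that
# performs archive recovery instead of starting as a standby.
# Sketch only; run it as the user that owns the data directory.
prepare_primary_restore() {
    datadir=$1
    auto_conf="$datadir/postgresql.auto.conf"

    if [ -f "$auto_conf" ]; then
        # Drop only the standby connection settings and keep everything else.
        # grep exits non-zero when nothing is left to print, hence "|| true".
        grep -Ev '^[[:space:]]*(primary_conninfo|primary_slot_name)[[:space:]]*=' \
            "$auto_conf" > "$auto_conf.tmp" || true
        mv "$auto_conf.tmp" "$auto_conf"
    fi

    # standby.signal keeps the server in standby mode; recovery.signal requests
    # archive recovery using the restore_command from postgresql.conf.
    rm -f "$datadir/standby.signal"
    : > "$datadir/recovery.signal"
}

# Hypothetical usage before the first start of the restored node:
#   prepare_primary_restore /path/to/restored/main
```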

How to take periodic pg_basebackup without losing any WAL files. How to pause wal archive

Environment: PostgreSQL 13.x Docker container.
I took a pg_basebackup and configured PostgreSQL 13.x with WAL archiving enabled (archive_mode = on), and it is working as expected.
I see that it is recommended to take a pg_basebackup periodically. How can I rotate the base backups weekly or daily?
Example: if a new pg_basebackup runs every Saturday night, should we consider stopping or pausing WAL archiving for that duration?
#Locations:
pg_basebackup : /db-backup/basebackup
archive_command: /db-backup/wal_files
So I want to move the db-backup archive aside every Saturday:
mv /db-backup /db-backup-old
While doing this, should I pause the WAL archiving process? As per the docs (24.3.1. Setting Up WAL Archiving),
we can stop or pause it by setting
archive_command = ''
Is this the right approach? If so, should we reload the configuration, or is there a way to update this setting on the fly?
Note: using Postgres-docker container.
What I am trying to achieve is:
If some data is written to the DB during backup rotation, it should end up either in the new base backup or in the new WAL files directory.
Please correct me if these confusions are irrelevant.
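
On the reload part of the question: archive_command can be changed without a restart, because a configuration reload is enough for it. The sketch below pauses and resumes archiving around the rotation using ALTER SYSTEM and pg_reload_conf(). It assumes archive_command is set in postgresql.conf (not on the container's command line, which would take precedence), and the PSQL override and function names are hypothetical conveniences for trying it without a server.

```shell
#!/bin/sh
# PSQL is a hypothetical override so the SQL can be inspected without a
# server, e.g. PSQL=cat. By default the statements are piped to psql.
run_sql() {
    ${PSQL:-psql -X -v ON_ERROR_STOP=1}
}

pause_archiving() {
    # An empty archive_command pauses archiving; completed WAL segments then
    # accumulate in pg_wal until a command is configured again.
    printf '%s\n' "ALTER SYSTEM SET archive_command = '';" "SELECT pg_reload_conf();" | run_sql
}

resume_archiving() {
    # RESET removes the override so the value from postgresql.conf applies again.
    printf '%s\n' "ALTER SYSTEM RESET archive_command;" "SELECT pg_reload_conf();" | run_sql
}

# Hypothetical rotation, using the paths from the question:
#   pause_archiving && mv /db-backup /db-backup-old && mkdir -p /db-backup/wal_files && resume_archiving
```

Because the segments wait in pg_wal while archiving is paused, nothing written during the rotation is lost; it is archived to the new directory once archiving resumes.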

Postgres Recovery Failure

What I am trying to accomplish is a recovery using a continuous archive backup.
I am running a CentOS 6.8 VM with Postgres 9.1, the same version as the DB that I am pulling from.
I installed Postgres and initialized the DB, started up fine.
Then, following these directions: https://www.postgresql.org/docs/9.3/static/continuous-archiving.html
Stopped the destination pSQL server (as root: service postgresql-9.1 stop)
Copied the destination cluster data folder to the side (as postgres)
Removed the cluster data files (as postgres)
Copied in my source data folder (as postgres)
Copied WAL files into a clean pg_xlog folder under the data folder (as postgres)
Created a recovery.conf file which contained:
restore_command = 'cp /var/lib/pgsql/database_sample_backup/wal_archives/0A/%f %p'
This points to a second location for the WAL files, separate from the copy I placed in pg_xlog (I was not sure whether I needed both).
But when I attempt to restart my server, it fails. (as root: service postgresql-9.1 start)
My pgstartup.log at one point spit out "runuser: cannot set groups: Operation not permitted" but it doesn't consistently do this with every attempt to start.
I've also tried turning off the archiving and replication directives in postgresql.conf (so that it can run standalone) and tried copying over the pg_hba.conf from the new DB I had created, to see if either would resolve the issue. Neither did.
I've also done a netstat -ntap | grep 5432 which confirmed that I don't have anything else running on the port.
What else can I provide in the form of details, and what else may I attempt in this restoration process?
Thank you for your help!
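
For reference, the recovery.conf step above can be scripted as in the sketch below. It is not a diagnosis of the startup failure; it only writes the file and tightens the data directory permissions, since the server refuses to start on a data directory that is accessible to group or others. The function name is made up, and both arguments are placeholders. When restore_command points at the archive, the continuous-archiving docs only call for copying WAL segments that were never archived into pg_xlog, so the archive copy alone should normally be enough.

```shell
#!/bin/sh
# Write a minimal recovery.conf into a restored data directory.
# Sketch only; run it as the postgres user so ownership stays correct.
write_recovery_conf() {
    datadir=$1
    archive_dir=$2
    # %f and %p are expanded by the server, so they must reach the file literally.
    printf "restore_command = 'cp %s/%%f %%p'\n" "$archive_dir" > "$datadir/recovery.conf"
    # Keep the data directory private to its owner.
    chmod 700 "$datadir"
}

# Hypothetical usage; the data directory path depends on the installation:
#   write_recovery_conf "$PGDATA" /var/lib/pgsql/database_sample_backup/wal_archives/0A
```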

PostgreSQL DB after replication inaccessible - "role "postgres" does not exist"

We have master-slave replication on
PostgreSQL 9.4.9
CentOS 6.8
and until today we had a beautiful time with replication between our two roughly identical servers. But today I ran VACUUM FULL (on the master, of course), which destroyed replication (as expected). That was not supposed to be a big deal, as we have turned replication off and back on many times before. But this time it was different.
After executing our well-proven script (basically pg_start_backup(), a full rsync of the data/ directory (with some conf excludes) and pg_stop_backup()), the synchronization looked OK, but the slave DB is no longer accessible (read-only) via psql. The error reads:
psql: FATAL: could not open file "global/12745": No such file or directory
After a couple of re-runs I gave up and created empty global/12745 to see what's going to happen, but instead I am always getting
psql: FATAL: role "postgres" does not exist
Actually, it seems that none of the roles we have on the master exist in the slave DB, and this is still true even after disabling replication.
So now, I have no idea how even to access the slave database.
At the same time, the master DB has no such issue, and "postgres" (or any other user we have) is functioning there perfectly.
I did many attempts, including complete removal of /var/lib/pgsql/9.4 directory and reinstall of rpms with initdb. (Fresh empty DB works fine on the slave.)
What could have gone wrong? Has my primary DB somehow become "non-replicable"? That would be a pity, as this is our primary means of backup.
Any help is greatly appreciated. Thanks a lot.

Postgresql 9.0 streaming replication - processes not starting

I have followed the PostgreSQL wiki binary replication tutorial and cannot get the wal_sender and wal_receiver processes to start on the master or slave server. I'm not seeing any relevant information in the log files to help. I'm able to connect via psql from my slave to my master server, so I'm relatively certain the connection configuration for SR has been set up correctly. Any pointers or tips on setting up SR without log shipping would be wonderful.
Assuming you have PG installed, the settings are:
On Master:
add the wal_level = hot_standby and max_wal_senders = 5 settings to postgresql.conf
add to pg_hba.conf: host replication [insert uname] [insert slave ip]/32 trust
On Slave:
create recovery.conf file and add standby_mode = 'on' and primary_conninfo = 'host=localhost port=5432 user=eggie5 password=asdf'
Create baseline:
This is the hard part. You need to get a "snapshot" of the master data (directory) and get it to the slave so they start in sync. You can do this in any number of ways; see this page for simple instructions: http://eggie5.com/15-setting-up-pg9-streaming-replication
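
Once both servers are running, a quick way to see whether streaming actually started is to look for the WAL sender and receiver in the process list. The helper below is a sketch: has_wal_process is a made-up name, and the matched process title ("postgres: wal sender process ...") is the format I would expect from 9.x servers, so treat the pattern as an assumption and adjust it if your ps output differs.

```shell
#!/bin/sh
# Succeeds if the process listing read on stdin contains a WAL sender or
# receiver. $1 is "sender" or "receiver".
has_wal_process() {
    # The [p] bracket keeps this grep from matching its own command line
    # when it is fed from ps.
    grep -Eq "postgres: wal $1 [p]rocess"
}

# Hypothetical usage:
#   ps -eo args | has_wal_process sender   && echo "master is streaming"
#   ps -eo args | has_wal_process receiver && echo "slave is receiving"
```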
I had the same problem. I traced it back to having used the Postgres-9.0 package that Martin Pitt provides (which I had used since Ubuntu 10.10 doesn't have Postgres-9* in its package repository yet). I'm guessing that he didn't build the package with streaming replication support.
I then downloaded and installed the binary package that PostgreSQL provides, and streaming replication started to work smoothly.