PostgreSQL 12 Replication Failure

I'm trying to replicate a database server that is still running and accepting requests from users, such as inserts and updates.
I ran the command below to copy my primary server to the replica server:
root@replica:~# sudo -u postgres pg_basebackup -h [PRIMARY_IP] -D /var/lib/postgresql/12/main -U replication -P -v
The backup finished without errors, but I get the following error when I try to start the PostgreSQL server:
root@replica:~# tail /var/log/postgresql/postgresql-12-main.log
2020-10-03 01:15:12.198 UTC [552567] FATAL: database system identifier differs between the primary and standby
2020-10-03 01:15:12.198 UTC [552567] DETAIL: The primary's identifier is 6805716485467355646, the standby's identifier is 6875279138564418280.
2020-10-03 01:15:17.204 UTC [552568] FATAL: database system identifier differs between the primary and standby
2020-10-03 01:15:17.204 UTC [552568] DETAIL: The primary's identifier is 6805716485467355646, the standby's identifier is 6875279138564418280.
2020-10-03 01:15:22.207 UTC [552570] FATAL: database system identifier differs between the primary and standby
2020-10-03 01:15:22.207 UTC [552570] DETAIL: The primary's identifier is 6805716485467355646, the standby's identifier is 6875279138564418280.
2020-10-03 01:15:27.212 UTC [552579] FATAL: database system identifier differs between the primary and standby
2020-10-03 01:15:27.212 UTC [552579] DETAIL: The primary's identifier is 6805716485467355646, the standby's identifier is 6875279138564418280.
2020-10-03 01:15:32.216 UTC [552581] FATAL: database system identifier differs between the primary and standby
2020-10-03 01:15:32.216 UTC [552581] DETAIL: The primary's identifier is 6805716485467355646, the standby's identifier is 6875279138564418280.
Does anyone have an idea how to fix this without running pg_basebackup again? It took a lot of time and bandwidth.

Yes, I had never set up replication before. I followed @jjanes's suggestion: I deleted my previous copy and then ran:
sudo -u postgres pg_basebackup -h [PRIMARY_IP] -D /var/lib/postgresql/12/main -U replication -P -v -R -X stream -C -S pgbackup1
And it's working.
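To confirm that a standby's data directory really came from its primary, you can compare the `Database system identifier` that `pg_controldata` prints on each host; a correctly seeded standby shows the same value as its primary. Below is a minimal sketch that parses saved `pg_controldata` output. The function names and file names are hypothetical, and `/usr/lib/postgresql/12/bin` is assumed to be where the Debian/Ubuntu packages install the binary.

```shell
#!/usr/bin/env bash
# Print the "Database system identifier" from pg_controldata output read on stdin.
system_id() {
  awk -F': *' '/^Database system identifier/ { print $2 }'
}

# Compare two saved pg_controldata outputs (primary first, standby second).
# Prints "match" and returns 0 only when both identifiers are present and equal.
same_cluster() {
  local primary_id standby_id
  primary_id=$(system_id < "$1")
  standby_id=$(system_id < "$2")
  if [ -n "$primary_id" ] && [ "$primary_id" = "$standby_id" ]; then
    echo "match: $primary_id"
  else
    echo "mismatch: primary=$primary_id standby=$standby_id" >&2
    return 1
  fi
}

# Example usage (run on each host, then copy both files to one place):
#   sudo -u postgres /usr/lib/postgresql/12/bin/pg_controldata /var/lib/postgresql/12/main > controldata.txt
#   same_cluster primary-controldata.txt standby-controldata.txt
```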

This answer is in the context of Postgres 14.
I faced the same error while bringing up a standby in streaming replication mode. In my case, these were the steps to reproduce it:
1. Brought up an empty standby server.
2. All required configs were already in place on the standby to enable streaming replication.
3. Took a backup from the standby with the pg_basebackup utility:
pg_basebackup -D /backup -F t -P -v -U replicator -X stream -w --no-password
4. The backup was created successfully. Note, however, that the -h option was left out of the command above, so the backup was taken from the standby instead of the primary.
5. Stopped the standby and cleared its data directory.
6. Extracted /backup/base.tar into the empty standby data directory.
7. Created an empty standby.signal in the standby data directory.
8. Restarted the standby.
9. The standby fails to start with "database system identifier differs between the primary and standby".
10. Executed step 3 again with an additional -h option specifying the primary host.
11. Repeated steps 4 to 8.
12. The standby comes up without error.
It is also important that the standby data directory be fully replaced by the backup; the same error may reappear if it is not.
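The root cause in that reproduction was a pg_basebackup run without -h, which backed up the local standby instead of the primary. A small pre-flight guard can catch this before any data is copied. This is a sketch under assumptions: the function name is hypothetical, and it assumes the replication user may also log in to the ordinary `postgres` database (pg_hba.conf needs an entry for that in addition to the `replication` entry).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight guard: refuse to continue unless a source host was given
# explicitly and that host reports it is not in recovery, i.e. it is the primary.
check_backup_source() {
  local host="$1" user="$2" in_recovery
  if [ -z "$host" ]; then
    echo "refusing: no host given; pg_basebackup would fall back to the default (usually local) server" >&2
    return 1
  fi
  # pg_is_in_recovery() returns 't' on a standby and 'f' on a primary.
  in_recovery=$(psql -h "$host" -U "$user" -d postgres -Atc 'SELECT pg_is_in_recovery()') || return 1
  if [ "$in_recovery" != "f" ]; then
    echo "refusing: $host is in recovery (a standby), not the primary" >&2
    return 1
  fi
  echo "ok: $host is not in recovery"
}

# Example usage (host name is a placeholder):
#   check_backup_source primary.example.com replicator &&
#     pg_basebackup -h primary.example.com -D /backup -F t -P -v -U replicator -X stream -w
```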

Related

Postgresql 12 streaming replication: restored replica but still not working

The primary and slave servers started showing a "requested WAL segment has already been removed" error after I changed the PostgreSQL configs.
So I decided to restore the backup from the primary server using the following steps:
1. Shut down the replica server.
2. Removed the PostgreSQL data directory on the replica (/var/lib/postgresql/12/main).
3. Performed the base backup (sudo -u postgres pg_basebackup -h [PRIMARY_IP] -D /var/lib/postgresql/12/main -U replication -P -v -R), which completed successfully.
4. Started the replica again.
But the error mentioned above is still showing.
Configs on primary and slave:
wal_level = 'replica'
archive_mode = on
archive_command = 'cd .'
max_wal_senders = 48
wal_keep_segments = 50
hot_standby = on
When I ran the pg_basebackup command, I got these logs:
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 188/92000148 on timeline 1
pg_basebackup: starting background WAL receiver
I'm curious why it starts from 188 and not 0.
You need a replication slot to keep the primary from removing WAL that is still needed by the standby.
1. Create a physical replication slot with pg_create_physical_replication_slot on the primary. (A logical slot, as created by pg_create_logical_replication_slot, cannot be used for streaming replication.)
2. Use the replication slot with the -S option of pg_basebackup.
3. Make sure primary_slot_name is set in the standby configuration. pg_basebackup's -R option does that automatically when -S is given.
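Put together, these steps might look like the following sketch. The host, slot name and data directory are placeholders, and it assumes a superuser named `postgres` can connect to the primary from the standby. Passing `true` as the second argument of pg_create_physical_replication_slot makes the slot reserve WAL immediately. As an alternative, pg_basebackup's -C option (together with -S) creates the slot as part of the backup, as in the working command from the first question.

```shell
#!/usr/bin/env bash
# Sketch: create a physical replication slot on the primary, then take the base
# backup through that slot. Arguments: primary host, slot name, standby data directory.
seed_standby_with_slot() {
  local primary="$1" slot="$2" datadir="$3"
  # Slot names may only contain lower-case letters, digits and underscores.
  psql -h "$primary" -U postgres -Atc \
    "SELECT pg_create_physical_replication_slot('$slot', true)" || return 1
  # -S streams through the slot; with -R, PostgreSQL 12 also writes primary_conninfo
  # and primary_slot_name to postgresql.auto.conf and creates standby.signal.
  pg_basebackup -h "$primary" -D "$datadir" -U replication -P -v -R -X stream -S "$slot"
}

# Example usage, as the postgres OS user, with the standby stopped and its data directory empty:
#   seed_standby_with_slot PRIMARY_IP standby1 /var/lib/postgresql/12/main
```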

How to bring back replication in PostgreSQL if the master IP address has changed on Ubuntu?

Our PostgreSQL replication setup has two servers, one master and one slave. For some reason the master's IP address changed, and the old address was used in several places on the slave server. After replacing the old address with the new one on the slave server, replication no longer works as before. Can someone help me resolve this issue?
These are the steps I used to set up the slave server:
1. Add the master IP address to the pg_hba.conf file for the user replication:
nano /etc/postgresql/11/main/pg_hba.conf
host replication master-IP/24 md5
2. Modify the following lines in the slave server's postgresql.conf, where listen_addresses should be the IP of the slave server:
nano /etc/postgresql/11/main/postgresql.conf
listen_addresses = 'localhost,slave-IP'
wal_level = replica
max_wal_senders = 10
wal_keep_segments = 64
3. Take a backup of the master server by specifying its IP:
pg_basebackup -h master-ip -D /var/lib/postgresql/11/main/ -P -U replication --wal-method=fetch
4. Create a recovery file (recovery.conf) with the following settings:
standby_mode = 'on'
primary_conninfo = 'host=master-ip port=5432 user=replication password= '
trigger_file = '/tmp/MasterNow'
Below is the error from the log file:
started streaming WAL from primary at A/B3000000 on timeline 2
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000020000000A000000B3 has already been removed
FATAL: could not connect to the primary server: could not connect to server: Connection timed out
Is the server running on host "master ip" and accepting
TCP/IP connections on port 5432?
record with incorrect prev-link 33018C00/0 at 0/D8E15D18
The standby server was down long enough that the primary server does not have the required transaction log information any more.
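The segment named in the error follows directly from the position in the log line before it. An LSN such as A/B3000000 is a byte position in the WAL stream, written as two 32-bit hexadecimal halves, and the segment file name encodes the timeline plus that position. (This is also why pg_basebackup reports start points like 188/92000148 rather than 0: the number roughly reflects how much WAL the server has written so far.) A minimal sketch, assuming the default 16 MB wal_segment_size; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Map an LSN (e.g. A/B3000000) and a timeline ID to the name of the WAL segment
# file containing it. Assumes the default 16 MB wal_segment_size.
lsn_to_segment() {
  local lsn="$1" timeline="$2" seg_size=$((16 * 1024 * 1024))
  local hi=$((0x${lsn%/*}))   # upper 32 bits of the LSN
  local lo=$((0x${lsn#*/}))   # lower 32 bits of the LSN
  # Segment file name: timeline, upper half, then the segment number within that half.
  printf '%08X%08X%08X\n' "$timeline" "$hi" $((lo / seg_size))
}

# lsn_to_segment A/B3000000 2   prints 000000020000000A000000B3
```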
There are three remedies:
1. Set the restore_command parameter in the standby server's recovery configuration to restore WAL segments from the archive (it should be the inverse of archive_command on your primary server). Then restart the standby.
This is the only option that allows you to recover without rebuilding the standby server from scratch.
2. Set wal_keep_segments on the primary server high enough that it retains enough WAL to cover the outage.
This won't help you recover now, but it will avoid the problem in the future.
3. Define a physical replication slot on the primary and put its name in the primary_slot_name parameter in the standby server's recovery configuration.
This won't help you recover now, but it will avoid the problem in the future.
Note: when using replication slots, monitor replication. Otherwise a standby that is down will cause WAL segments to pile up on the primary, eventually filling up the disk.
All but the first option require that you rebuild your standby with pg_basebackup, because the required WAL information is no longer available.
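For the PostgreSQL 11 setup in the question, the first and third remedies both end up in recovery.conf on the standby (from PostgreSQL 12 on, the same settings go into postgresql.conf plus an empty standby.signal file). A sketch that writes such a file follows. The archive directory and slot name are placeholders, the cp-based restore_command only works if the primary's archive_command copies segments into a directory the standby can read, and the password is assumed to live in ~/.pgpass rather than in primary_conninfo.

```shell
#!/usr/bin/env bash
# Sketch: write a PostgreSQL 11 recovery.conf that streams through a replication slot
# and falls back to restoring WAL from an archive. Arguments: data directory,
# primary host, slot name, archive directory (all placeholders).
write_recovery_conf() {
  local datadir="$1" primary="$2" slot="$3" archive_dir="$4"
  cat > "$datadir/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$primary port=5432 user=replication'
primary_slot_name = '$slot'
restore_command = 'cp $archive_dir/%f "%p"'
trigger_file = '/tmp/MasterNow'
EOF
}

# Example usage as the postgres OS user (then restart the standby):
#   write_recovery_conf /var/lib/postgresql/11/main master-ip standby1 /mnt/wal_archive
```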
host replication master-IP/24 md5
This line is missing a field: the USER field.
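A complete entry has the form TYPE DATABASE USER ADDRESS METHOD, for example host replication replication master-IP/24 md5, where the third field is the role that pg_basebackup -U replication and primary_conninfo log in as. The pg_hba.conf that matters for this connection is the primary's, and its ADDRESS must cover the standby's IP. A rough sketch of a check for host-type lines with too few fields (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Flag host/hostssl/hostnossl lines in a pg_hba.conf file that have fewer than the
# five required fields (TYPE DATABASE USER ADDRESS METHOD). Returns 1 if any are found.
check_hba() {
  awk '
    /^[ \t]*#/ || NF == 0 { next }    # skip comments and blank lines
    $1 ~ /^host/ && NF < 5 {
      printf "line %d: %d fields, expected at least 5: %s\n", NR, NF, $0
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Example usage:
#   check_hba /etc/postgresql/11/main/pg_hba.conf
```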
listen_addresses = 'localhost,slave-IP'
It is rarely necessary for this to be anything other than '*'. If you don't try to micromanage it, that is one less thing you will need to change. Also, changing wal_keep_segments on the replica doesn't do much unless you are using cascading replication. It needs to be changed on the master.
pg_basebackup -h master-ip -D /var/lib/postgresql/11/main/ -P -U replication --wal-method=fetch
Did this indicate that it succeeded?
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000020000000A000000B3 has already been removed
FATAL: could not connect to the primary server: could not connect to server: Connection timed out
Is the server running on host "master ip" and accepting
TCP/IP connections on port 5432?
This is strange. In order to be informed that the file "has already been removed", it necessarily had to have connected. But the next line says it can't connect. It is not unusual to have a misconfiguration that prevents you from connecting, but in that case it wouldn't have been able to connect the first time. Did you change configuration between these two log messages? Is your network connection flaky?
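One way to answer the flakiness question is to probe the primary from the standby at regular intervals with pg_isready, which ships with PostgreSQL and reports the connection state through its exit code. A minimal sketch; the function name, the 5-second timeout and the loop interval are arbitrary choices:

```shell
#!/usr/bin/env bash
# Probe a server with pg_isready and translate its documented exit codes:
# 0 = accepting, 1 = rejecting (e.g. starting up), 2 = no response, 3 = no attempt.
probe_primary() {
  local host="$1" port="${2:-5432}" rc=0
  pg_isready -h "$host" -p "$port" -t 5 >/dev/null || rc=$?
  case "$rc" in
    0) echo "$host:$port accepting connections" ;;
    1) echo "$host:$port reachable but rejecting connections" ;;
    2) echo "$host:$port no response (network, firewall or wrong address?)" ;;
    *) echo "$host:$port no attempt made (invalid parameters)" ;;
  esac
  return "$rc"
}

# Example usage on the standby, logging once a minute:
#   while true; do echo "$(date) $(probe_primary master-ip)"; sleep 60; done
```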

Installing PostgreSQL and `createdb` with Terminal

After installing PostgreSQL from the terminal with Homebrew...
➜ ~ brew link postgresql
Warning: Already linked: /usr/local/Cellar/postgresql/11.2_1: 3,186 files, 35.3MB
To relink: brew unlink postgresql && brew link postgresql
➜ ~ brew services restart postgresql
Successfully stopped postgresql (label: homebrew.mxcl.postgresql)
Successfully started postgresql (label: homebrew.mxcl.postgresql)
➜ ~ createdb 'test'
createdb: could not connect to database template1: could not connect to server: No such file or directory
Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
I would like to be able to work strictly from the terminal instead of having to use the PSequel GUI...
Thanks,
Solved
My main issue was:
➜ ~ createdb 'test'
createdb: could not connect to database template1: could not connect to server: No such file or directory
Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
This issue was solved simply by running:
➜ ~ postgres -D /usr/local/var/postgres
which started the server on that data directory:
2019-05-06 14:20:37.367 EDT [41854] LOG: listening on IPv6 address "::1", port 5432
2019-05-06 14:20:37.367 EDT [41854] LOG: listening on IPv4 address "127.0.0.1", port 5432
2019-05-06 14:20:37.367 EDT [41854] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-05-06 14:20:37.384 EDT [41854] LOG: database system is ready to accept connections
Keeping that running and opening a new tab in my terminal:
➜ ~ createdb 'test'
➜ ~ psql 'test'
psql (11.2)
Type "help" for help.
test=#
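Instead of keeping postgres -D running in a separate tab, the server can be started in the background with pg_ctl, which can also report whether it is already running. A sketch assuming the Homebrew data directory used above; the function name and the log file location are hypothetical:

```shell
#!/usr/bin/env bash
# Start the server in the background unless it is already running.
# pg_ctl status exits 0 if running, 3 if not running, 4 if the data directory is inaccessible.
ensure_running() {
  local datadir="${1:-/usr/local/var/postgres}" logfile="${2:-$HOME/postgres-server.log}" rc=0
  pg_ctl -D "$datadir" status >/dev/null || rc=$?
  case "$rc" in
    0) echo "already running" ;;
    3) pg_ctl -D "$datadir" -l "$logfile" -w start >/dev/null && echo "started" ;;
    *) echo "cannot use $datadir (pg_ctl status exited with $rc)" >&2; return 1 ;;
  esac
}

# Example usage:
#   ensure_running && createdb test && psql test
```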
Something else of use during my debugging:
Removing the old database files (this is dangerous: it deletes every existing database):
➜ ~ rm -rf /usr/local/var/postgres
followed up with:
➜ ~ initdb /usr/local/var/postgres
initdb is not used to create a "new database"
As documented in the manual, you need it to create a "cluster" or "data directory", which then stores the databases created with createdb.
Quote from the manual:
Before you can do anything, you must initialize a database storage area on disk. We call this a database cluster. (The SQL standard uses the term catalog cluster.) A database cluster is a collection of databases that is managed by a single instance of a running database server
[...]
In file system terms, a database cluster is a single directory under which all data will be stored. We call this the data directory or data area
In short: initdb creates the necessary directory layout on the hard disk to be able to create and manage databases.
It's a necessary part of the installation process of a Postgres server.
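Because initdb creates the whole cluster rather than a single database, it only needs to run once per data directory; individual databases are then added with createdb. A sketch of a guard that runs initdb only when the directory does not already contain a cluster, relying on the PG_VERSION file that every initialized data directory has at its top level (the function name is hypothetical). initdb itself also refuses to run on a non-empty directory, so the guard mainly makes the intent explicit and avoids reaching for rm -rf.

```shell
#!/usr/bin/env bash
# Run initdb only if the data directory does not already hold a cluster.
# Every initialized data directory contains a PG_VERSION file at its top level.
init_cluster_once() {
  local datadir="$1"
  if [ -f "$datadir/PG_VERSION" ]; then
    echo "cluster already initialized (version $(cat "$datadir/PG_VERSION"))"
    return 0
  fi
  initdb -D "$datadir"
}

# Example usage with the Homebrew path from the question:
#   init_cluster_once /usr/local/var/postgres
```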

Postgres/Mac - 'service already loaded', but 'could not connect to server'

I'm running Homebrew-installed Postgres version 9.6.3 on my Mac (High Sierra, 10.13.3), and this morning I'm finding that Postgres is having some issues. It was working fine last night, then I put the computer to sleep... when I woke it up this morning and tried to run a Phoenix app, I got
[error] Postgrex.Protocol (#PID<0.306.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (localhost:5432): connection refused - :econnrefused
Running psql returned
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
so it seemed that perhaps the server had stopped... however, running my alias pg-start, which translates to launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist returns
/usr/local/Cellar/postgresql/9.6.3/homebrew.mxcl.postgresql.plist: service already loaded
So this is confusing, because it seems that one command suggests that Postgres is not running, while the other suggests that it is.
I can't recall for sure, but I may have stopped the server before putting the computer to sleep last night, which I actually usually do not do... my pg-stop alias is launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
though I can't see why that would cause problems, it's the only thing that sticks out in my mind as something 'different' that I may have done.
I've tried restarting my machine, but the problem persists. I'm not terribly experienced with debugging this sort of issue, so any guidance or suggestions would be much appreciated.
Well, I resolved it, though I'm not sure what the exact problem was. To get some more error info, I ran
postgres -D /usr/local/var/postgres
which gave me
FATAL: lock file "postmaster.pid" already exists
HINT: Is another postmaster (PID 323) running in data directory "/usr/local/var/postgres"?
which I'd encountered before, so I kind of knew how to proceed...
Here are the steps I took to resolve this:
First, I ran
pg_ctl -D /usr/local/var/postgres start
which returned
pg_ctl: another server might be running; trying to start server anyway
server starting
My-MBP:~ me$ FATAL: lock file "postmaster.pid" already exists
HINT: Is another postmaster (PID 1188) running in data directory "/usr/local/var/postgres"?
then I ran my alias pg-stop
launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
followed by
pg_ctl -D /usr/local/var/postgres start
again. This time, it returned
server starting
My-MBP:~ me$ LOG: database system was shut down at 2018-02-07 11:10:43 EST
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
and now Postgres was running correctly - psql commands, etc.
However, now my alias pg-stop wouldn't work -
/Users/me/Library/LaunchAgents/homebrew.mxcl.postgresql.plist: Could not find specified service
I ran
pg_ctl -D /usr/local/var/postgres stop
and then my pg-stop alias was restored. So now pg-start and pg-stop are working as they should be.
I hope that this is helpful to someone in the future, but if anyone can explain what happened here, I'd really appreciate having a deeper understanding of what went wrong.
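The HINT messages mean the server found a postmaster.pid in the data directory whose PID looked alive, so it refused to start a second postmaster. When diagnosing this, it helps to check whether that PID really is a postgres process, since a PID recorded before a restart may since have been reused by an unrelated process. A sketch of such a check; the first line of postmaster.pid is the postmaster's PID, the function name is hypothetical, and it should run as the user that owns the server, because kill -0 fails for other users' processes.

```shell
#!/usr/bin/env bash
# Report whether the postmaster.pid in a data directory points at a live postgres
# process. The first line of postmaster.pid is the postmaster's PID.
pidfile_status() {
  local datadir="$1" pid comm
  if [ ! -f "$datadir/postmaster.pid" ]; then
    echo "no postmaster.pid"
    return 0
  fi
  pid=$(head -n 1 "$datadir/postmaster.pid")
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "stale: no process with PID $pid"
    return 1
  fi
  # PIDs are reused, so also look at the process name when ps is available.
  comm=$(ps -p "$pid" -o comm= 2>/dev/null || true)
  case "$comm" in
    *postgres*|*postmaster*|"") echo "live: PID $pid $comm" ;;
    *) echo "suspicious: PID $pid is '$comm', not postgres"; return 1 ;;
  esac
}

# Example usage:
#   pidfile_status /usr/local/var/postgres
```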

Error in Postgres TX

I'm trying to create a user in the database, and it tells me that it cannot execute **** in a read-only transaction. I have no idea what's causing this. Is this a bad state in the database or in the connection? Why does it tell me that it's not in a transaction, but then tell me it's a read-only transaction? Does "transaction" refer to the same thing in both messages?
$ psql --host localhost --port 5432 --username **** postgres --no-password -v dbuser=root -t -X
psql (10.1, server 9.4.13)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
postgres=# set transaction read write;
WARNING: SET TRANSACTION can only be used in transaction blocks
ERROR: cannot set transaction read-write mode during recovery
postgres=# CREATE USER root WITH PASSWORD 'root';
ERROR: cannot execute CREATE ROLE in a read-only transaction
Your database is in recovery.
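Either it is a hot standby (most likely), or recovery was paused with SELECT pg_wal_replay_pause() or recovery_target_action = 'pause' in recovery.conf (check with SELECT pg_is_wal_replay_paused()), or a point-in-time recovery is still running. On the 9.4 server shown here, these functions are named pg_xlog_replay_pause() and pg_is_xlog_replay_paused(), and the setting is pause_at_recovery_target = true.
Use the primary, or complete the recovery.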
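A quick way to see which of these cases applies is to ask the server directly. A sketch, assuming psql can connect to the `postgres` database on the host in question (the function name is hypothetical); because the replay functions were renamed from xlog to wal in PostgreSQL 10, it picks the name from server_version_num:

```shell
#!/usr/bin/env bash
# Report whether a server is in recovery and, if so, whether WAL replay is paused.
# Returns 0 when not in recovery, 1 when in recovery, 2 when a query fails.
recovery_state() {
  local host="$1" in_rec ver fn paused
  in_rec=$(psql -h "$host" -d postgres -Atc 'SELECT pg_is_in_recovery()') || return 2
  if [ "$in_rec" != "t" ]; then
    echo "not in recovery"
    return 0
  fi
  # pg_is_xlog_replay_paused() was renamed pg_is_wal_replay_paused() in PostgreSQL 10.
  ver=$(psql -h "$host" -d postgres -Atc 'SHOW server_version_num') || return 2
  if [ "$ver" -ge 100000 ]; then fn=pg_is_wal_replay_paused; else fn=pg_is_xlog_replay_paused; fi
  paused=$(psql -h "$host" -d postgres -Atc "SELECT $fn()") || return 2
  if [ "$paused" = "t" ]; then echo "in recovery, WAL replay paused"; else echo "in recovery"; fi
  return 1
}

# Example usage:
#   recovery_state localhost
```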