We currently have a postgres server on AWS pulling updates from an internal server. We want to stop the replication, but keep the data on both the internal and the AWS server - the internal server is no longer being updated.
Is it just a case of removing the
host replication replicator x.x.x.x/32 md5
line from the pg_hba.conf file on both master and slave and restarting Postgres, and then running
pg_ctl -D /target_directory promote
on the old slave server to promote the read-only slave to a read/write master?
You should drop the replication slot on the master database (physical replication):
SELECT pg_drop_replication_slot('your_slot_name');
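If you are not sure which slot to drop, you can list the existing slots on the master first; a quick check against the standard pg_replication_slots view:
SELECT slot_name, slot_type, active FROM pg_replication_slots;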
Then promote replica server.
/usr/lib/postgresql/12/bin/pg_ctl promote -D /pg_data/12/main/
Edit pg_hba.conf, remove the replication line, and run SELECT pg_reload_conf(); on the master.
I am trying to configure pgpool with PostgreSQL and repmgr, but after configuring it I found it is not working as expected.
When I switch a standby over to primary using repmgr it works as expected, but the primary connection info in pgpool does not change.
My question is: how can I sync repmgr events such as switchover and failover with pgpool?
When I run pcp_watchdog_info, it shows server1 as leader and server2 and server3 as standby.
But in repmgr, after the switchover, the new primary is server2 and server3 and server4 are standbys.
Why is pgpool still showing server1 as the leader? Although, as per the architecture, the watchdog only monitors pgpool status.
Is there any relation between the repmgr primary and the pgpool LEADER?
Need expert opinion, thanks in advance.
If you are making repmgr handle automatic failover, you need to add another step to the follow_command parameter in the repmgr.conf file, as follows:
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr/15/repmgr.conf --log-to-file --upstream-node-id=%n && /usr/sbin/pcp_attach_node -w -h 172.16.0.100 -U pgpool -p 9898 -n %n'
Where 172.16.0.100 is the VIP of Pgpool-II cluster.
Why?
Because, when the primary node is unavailable and another standby is promoted to be the new primary, repmgr needs to resync the other standby nodes in the cluster to the new primary. It shuts down the PostgreSQL service before doing this, which detaches the standby node from the Pgpool-II cluster.
So you need to re-attach the node to the Pgpool-II cluster in follow_command.
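As a rough sanity check after a switchover (a sketch reusing the VIP and PCP port above, and assuming Pgpool-II itself listens on the default port 9999; the node ID 2, user and database names are only examples), you can confirm the standby was attached again and that Pgpool-II sees the new primary:
pcp_node_info -w -h 172.16.0.100 -U pgpool -p 9898 -n 2
psql -h 172.16.0.100 -p 9999 -U postgres -d postgres -c "show pool_nodes"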
Our PostgreSQL replication setup has two servers, one master and one slave. For some reason the master's IP address changed, and that address was used in several places on the slave server. After replacing the old IP address with the new one on the slave server, replication is no longer working as before. Can someone help resolve this issue?
Following are the steps used in setting up the slave server:
1. Add the master IP address in the pg_hba.conf file for the user replication
nano /etc/postgresql/11/main/pg_hba.conf
host replication master-IP/24 md5
2. Modify the following lines in the postgresql.conf file of the slave server, where listen_addresses should be the IP of the slave server
nano /etc/postgresql/11/main/postgresql.conf
listen_addresses = 'localhost,slave-IP'
wal_level = replica
max_wal_senders = 10
wal_keep_segments = 64
3. Take a backup of the master server, specifying its IP
pg_basebackup -h master-ip -D /var/lib/postgresql/11/main/ -P -U replication --wal-method=fetch
4. Create a recovery file and add the following commands
standby_mode = 'on'
primary_conninfo = 'host=master-ip port=5432 user=replication password= '
trigger_file = '/tmp/MasterNow'
Below is the error from the log file:
started streaming WAL from primary at A/B3000000 on timeline 2
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000020000000A000000B3 has already been removed
FATAL: could not connect to the primary server: could not connect to server: Connection timed out
Is the server running on host "master ip" and accepting
TCP/IP connections on port 5432?
record with incorrect prev-link 33018C00/0 at 0/D8E15D18
The standby server was down long enough that the primary server does not have the required transaction log information any more.
There are three remedies:
Set the restore_command parameter in the standby server's recovery configuration to restore WAL segments from the archive (that should be the inverse of archive_command on your primary server). Then restart the standby.
This is the only option that allows you to recover without rebuilding the standby server from scratch.
Set wal_keep_segments on the primary server high enough that it retains enough WAL to cover the outage.
This won't help you recover now, but it will avoid the problem in the future.
Define a physical replication slot on the primary and put its name in the primary_slot_name parameter in the standby server's recovery configuration.
This won't help you recover now, but it will avoid the problem in the future.
Note: When using replication slots, monitor the replication. Otherwise a standby that is down will lead to WAL segments piling up on the primary, eventually filling up the disk.
All options but the first require that you rebuild your standby with pg_basebackup, because the required WAL information is no longer available.
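For reference, here is roughly what those remedies look like on PostgreSQL 11 as used in this question (the archive path, slot name and segment count are placeholders, not values from your setup):
# remedies 1 and 3: standby's recovery.conf
restore_command = 'cp /path/to/wal_archive/%f %p'   # inverse of the primary's archive_command
primary_slot_name = 'standby1_slot'
# remedy 2: primary's postgresql.conf
wal_keep_segments = 256
# remedy 3: create the slot on the primary (run in psql)
SELECT pg_create_physical_replication_slot('standby1_slot');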
host replication master-IP/24 md5
This line is missing a field. The USER field.
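For comparison, a version with the user column filled in (assuming the role is literally named replication, as in the pg_basebackup command further down) would be:
host replication replication master-IP/24 md5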
listen_addresses = 'localhost,slave-IP'
It rarely needs to be anything other than '*'. If you don't try to micromanage it, that is one less thing you will need to change. Also, changing wal_keep_segments on the replica doesn't do much unless you are using cascading replication. It needs to be changed on the master.
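In this thread's terms, that would mean roughly (a sketch, keeping the value 64 from your replica config):
# slave postgresql.conf
listen_addresses = '*'
# master postgresql.conf (retention has to be set here to matter)
wal_keep_segments = 64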
pg_basebackup -h master-ip -D /var/lib/postgresql/11/main/ -P -U replication --wal-method=fetch
Did this indicate that it succeeded?
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000020000000A000000B3 has already been removed
FATAL: could not connect to the primary server: could not connect to server: Connection timed out
Is the server running on host "master ip" and accepting
TCP/IP connections on port 5432?
This is strange. In order to be informed that the file "has already been removed", it necessarily had to have connected. But the next line says it can't connect. It is not unusual to have a misconfiguration that prevents you from connecting, but in that case it wouldn't have been able to connect the first time. Did you change configuration between these two log messages? Is your network connection flaky?
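One way to check whether the connection itself is flaky (a rough sketch, reusing master-ip from the question) is to poll the primary from the standby for a while and watch for intermittent failures:
while true; do pg_isready -h master-ip -p 5432; sleep 5; done
If the output flips between "accepting connections" and "no response", the problem is network or firewall related rather than WAL retention.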
I'm trying to set up a Postgres cluster of two nodes (primary and standby). In order to activate automatic failover, I'm using pgpool-II.
I followed the following article: https://www.pgpool.net/docs/41/en/html/example-cluster.html
and the only thing I did differently was installing PostgreSQL version 12 instead of version 11.
Note that I'm trying this using two CentOS 7 images on VMware. I faced the following issues:
When I run systemctl status pgpool.service on both nodes, it returned success.
Also I can access postgresql using the watchdog delegate IP.
But when testing failover, everything goes wrong.
Scenario 1:
I accessed my database using watchdog delegate Ip.
I disconnect the standby server.
Result: my session to PostgreSQL continued to work for less than a minute and then failed, and I'm unable to connect again until I reconnect the standby node and restart the pgpool service.
Scenario 2:
I accessed my database using watchdog delegate Ip.
I disconnect the primary server.
Result: my session stopped immediately, and the standby server is not promoted to master.
I noticed something (might be related to the above described problem): when I try to run the following command
psql 192.168.220.146 -p 9999 -U postgres -c "show pool_nodes"
it fails and returns the following:
psql: error: could not connect to server: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.9999"
However if I ran: psql 192.168.220.160 -p 5432 -U postgres
it works fine and I can access the postgres shell.
My pool_hba file:
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
host all pgpool 0.0.0.0/0 scram-sha-256
host all postgres 0.0.0.0/0 scram-sha-256
Any help would be appreciated.
I followed the same article: https://www.pgpool.net/docs/41/en/html/example-cluster.html and the only difference is that I installed PostgreSQL version 11.
I cannot ping the delegate_IP = '192.168.1.233'. Can you help me?
Thank you.
You are not providing the -h argument to psql to specify the IP address, so effectively psql is trying to connect to a UNIX domain socket and treating the IP address in the command as the database name.
Try putting -h before the IP address:
psql -h 192.168.220.146 -p 9999 -U postgres -c "show pool_nodes"
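Equivalently (same address and port, just expressed through libpq environment variables), you could run:
PGHOST=192.168.220.146 PGPORT=9999 psql -U postgres -c "show pool_nodes"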
I am trying to set up a pgpool load balancer for a PostgreSQL streaming replication cluster.
I am using postgresql-12 and pgpool2-4.1.0 from the Postgresql repo https://apt.postgresql.org/pub/repos/apt/ on Debian 10.2 (latest stable).
I have set up a PostgreSQL cluster with streaming replication using physical slots (not WAL shipping) and everything seems to be working properly. The secondaries connect and replicate data without any issues.
Then I installed the pgpool2-4.1.0 on the same servers. I have made the proper modifications to pgpool.conf according to the pgpool wiki and I have enabled the watchdog process.
When I start pgpool, on all three nodes, I can see that watchdog is working properly, quorum exists and pgpool elects a master (pgpool node) which also enables the virtual IP from the configuration.
I can connect to the postgres backend via pgpool and issue read and write commands successfully.
The problem appears in the pgpool logs; from syslog, I get:
Jan 13 15:10:30 debian10 pgpool[9826]: 2020-01-13 15:10:30: pid 9870: LOG: failed to connect to PostgreSQL server on "pg1:5433", getsockopt() detected error "Connection refused"
Jan 13 15:10:30 debian10 pgpool[9826]: 2020-01-13 15:10:30: pid 9870: LOCATION: pool_connection_pool.c:680
When checking the PID mentioned above, I get the pgpool healthcheck process.
pg1, pg2, pg3 are the database servers listening on all addresses on port 5433, pg1 is the primary.
pgpool listens on 5432.
The database user that is used for the healthcheck is "pgpool", I have verified that I can connect to the database using that user from all hosts on the particular subnet.
When I disable the healthcheck the issue goes away, but that defeats the purpose.
Any ideas?
Turns out it was name resolution in the /etc/hosts file and the postgresql.conf.
Specifically, the /etc/hosts was like this:
vagrant@pg1:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 pg1
....
10.10.20.11 pg1
....
And postgresql.conf like this:
....
listen_addresses = 'localhost,10.10.20.11' # what IP address(es) to listen on;
....
So when the healthcheck tried to reach the local node on each machine, it would look it up by hostname (pg1, pg2, etc.). With the hosts file above, that resolves to 127.0.1.1, which PostgreSQL doesn't listen on, so it would fail, hence the error, and then try 10.10.20.11, which would succeed. That also explains why there was no error from healthchecks of remote hosts.
I changed the hosts file to the following:
vagrant@pg1:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 pg1-local
....
10.10.20.11 pg1
....
And the logs are clear.
This is Debian-specific, as Red Hat-based distros don't have a 127.0.1.1 hostname record in their /etc/hosts.
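A quick way to verify which address a backend hostname resolves to on a given node (and therefore which address the healthcheck will try first) is, for example:
getent hosts pg1
psql -h pg1 -p 5433 -U pgpool -d postgres -c "SELECT 1"
The second command is only a sketch using the healthcheck user and backend port from this setup, and it assumes the pgpool role can log in to the postgres database; once the hosts file is fixed it should go straight to the routable address.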
I'm trying to configure pgpool to load balance between two servers (both running Debian 8.2 and PostgreSQL 9.4).
I already have streaming replication working between the two (master on 10.0.0.153 and slave on 10.0.0.155). Now I installed pgpool and configured it with the two servers:
backend_hostname0 = '10.0.0.153'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.4/main'
backend_hostname1 = '10.0.0.155'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.4/main'
and configure pool_hba, pool_passwd, and load_balance_mode is on.
My problem is: when I try to use psql via pgpool it displays an error:
"psql: FATAL: password authentication failed for user 'postgres'"
BUT if I comment out the whole backend1 section, changing nothing else, and restart the pgpool2 service, I can connect without problems, on the same machine, using the exact same user and password. I don't know if there's another parameter that I should set in order to use two servers and load balance between them.
I can connect to each backend server from each client with psql just fine, so credentials shouldn't be the problem.
PS: Don't know if that helps, but in pgpool.conf, replication_mode is off, because I'm using streaming replication, and as far as I've heard, it's possible to use load balancing without doing replication via pgpool.
Thanks.
Try to log in to the pgpool server and run the command:
pg_md5 -m -u postgres your_password_for_postgres_user
Now see the content of pool_passwd. It should contain something like:
postgres:md55ebe2294ecd0e0f08eab7690d2a6ee69
Now restart your pgpool and try to connect again. Hopefully your problem will be resolved.
pgpool's load balancing can be used with streaming replication; for that you need to set master_slave_mode and load_balance_mode to on and also set master_slave_sub_mode to 'stream'.
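For the pgpool2 versions of that era, a minimal sketch of the relevant pgpool.conf lines would be:
replication_mode = off
master_slave_mode = on
master_slave_sub_mode = 'stream'
load_balance_mode = on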
For the authentication error you are getting, check the pg_hba.conf settings for the pgpool host. And depending on the type of authentication method specified, see the respective client authentication section in the pgpool documentation.
http://www.pgpool.net/docs/latest/pgpool-en.html#hba