PostgreSQL - data replication stopped - postgresql

Data replication has stopped from one of my three nodes. The replication slot on the errant node has disappeared. Does anyone have insight as to what happened or how to fix it?
DETAILS:
Nodes SS1, SS2, and SS3 have publications to which SSK subscribes. Replication from SS2 is now failing. Using PostgreSQL 10.1.
SSK PostgreSQL server log:
2019-02-07 10:21:13.953 CST [26274] LOG: logical replication apply worker for subscription "SS2" has started
2019-02-07 10:21:14.309 CST [26274] ERROR: could not start WAL streaming: ERROR: replication slot "SS2" does not exist
2019-02-07 10:21:14.311 CST [1641] LOG: worker process: logical replication worker for subscription 17237 (PID 26274) exited with exit code 1
SS2 replication slots table:
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)
For comparison, SS1 replication slots table:
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
SS1 | pgoutput | logical | 33280 | DBAdd | f | t | 2113 | | 56655301 | 3/114FB460 | 3/114FB498
(1 row)

Replication slots don't just disappear.
Somebody or something must have deleted it.
Perhaps the PostgreSQL database log of the primary server has valuable information.
Did you promote a standby recently? Since replication slots are not replicated, that would make them disappear.

Postgres Register Standby fails

I am trying to set up a primary and a standby using repmgr. I think I have set up the master successfully, but the standby setup keeps failing.
On the standby node:
/usr/pgsql-12/bin/repmgr -h master_ip standby clone
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=master_ip
DETAIL: current installation size is 32 MB
ERROR: repmgr extension is available but not installed in database "(null)"
HINT: check that you are cloning from the database where "repmgr" is installed
On Master Node:
/usr/pgsql-12/bin/repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | hostname | primary | * running | | default | 100 | 1 | host=master_ip dbname=repmgr user=repmgr connect_timeout=2
postgres=# SELECT * FROM pg_available_extensions WHERE name='repmgr';
name | default_version | installed_version | comment
--------+-----------------+-------------------+------------------------------------
repmgr | 5.3 | | Replication manager for PostgreSQL
Resolved by adding -U repmgr -d repmgr to the clone command, so that repmgr connects to the database where the repmgr extension is installed.

Using pgpool, I get an empty value in replication_state

I'm trying to use pgpool for PostgreSQL HA.
node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | master | 5432 | up | up | 0.500000 | primary | primary | 1 | false | 0 | | | 2022-05-30 10:33:21
1 | slave | 5432 | up | up | 0.500000 | standby | primary | 0 | true | 419431440 | | | 2022-05-30 10:33:21
Everything else works well, but replication_state and replication_sync_state are empty,
and replication_delay shows a very high value.
Why are those values empty, and why is the delay so high?
Do I need to change any settings in postgresql.conf or pgpool.conf for replication?
I created the slave with 'pg_basebackup -h host -U Repuser -p port -D dir -X stream'.
This is the output of pcp_node_info:
master 5432 2 0.500000 up up primary primary 0 none none 2022-05-30 10:42:40
slave 5432 2 0.500000 up up standby primary 419431848 none none 2022-05-30 10:42:40
Sorry for my English, and thank you for your help.
My versions:
postgres 14.2
pgpool 4.3.1
You need to provide application_name in both configuration files for each node: myrecovery.conf (in the primary_conninfo setting) and pgpool.conf.
You should also check the recovery_1st_stage and follow_primary.sh scripts, as they contain a block with application_name too. pgpool uses these scripts to recover a replica (with pcp_recover_node) or to promote a new master.
Afterwards you can check the current values with "select * from pg_stat_replication;" (on the master) or "select * from pg_stat_wal_receiver;" (on the replica).
More information: https://www.pgpool.net/docs/pgpool-II-4.3.1/en/html/example-cluster.html

repmgr - how to make previous Primary to become a standby after failover

After performing a fail over, I had the previous Primary down, and the old standby became the Primary, as expected.
$ repmgr -f /etc/repmgr.conf cluster show --compact
ID | Name | Role | Status | Upstream | Location | Prio. | TLI
----+-----------------+---------+-----------+----------+----------+-------+-----
1 | server1 | primary | - failed | | default | 100 | ?
2 | server2 | primary | * running | | default | 100 | 2
3 | PG-Node-Witness | witness | * running | server2 | default | 0 | 1
I would like to make the old Primary join the cluster as a standby.
I gather the rejoin command should do that.
However, when I try to rejoin it as the new standby, I get the error below (I run this on the old primary, which is down):
repmgr -f /etc/repmgr.conf -d 'host=10.9.7.97 user=repmgr dbname=repmgr' node rejoin
where 10.9.7.97 is the IP of the node I am running the command from.
I get this error:
$ repmgr -f /etc/repmgr.conf -d 'host=10.97.7.97 user=repmgr dbname=repmgr' node rejoin --verbose -
NOTICE: using provided configuration file "/etc/repmgr.conf"
ERROR: connection to database failed
DETAIL:
could not connect to server: Connection refused
Is the server running on host "10.97.7.97" and accepting
TCP/IP connections on port 5432?
Of course postgres is down on 10.9.7.97 - the old primary.
If I start it however, it starts as another primary:
$ repmgr -f /etc/repmgr.conf cluster show --compact
ID | Name | Role | Status | Upstream | Location | Prio. | TLI
----+-----------------+---------+-----------+----------+----------+-------+-----
1 | server1 | primary | ! running | | default | 100 | 1
2 | server2 | primary | * running | | default | 100 | 2
3 | PG-Node-Witness | witness | * running | server2 | default | 0 | 1
So how do I make the old primary the new standby?
Thanks
Apparently the host given in the -d 'host=...' connection string of the rejoin command should be the current primary (the previous standby), not the node being rejoined.
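
A minimal Python sketch of that fix, assembling the rejoin command so that -d points at the current primary rather than at the node being rejoined. The address 192.0.2.20 is a hypothetical placeholder for server2 (the question does not give its IP), and the helper name rejoin_command is made up for illustration. The --dry-run option is added so that repmgr only checks the prerequisites; as far as I know, repmgr expects PostgreSQL on the old primary to be shut down when node rejoin is run.

```python
import shlex


def rejoin_command(current_primary_host, conf="/etc/repmgr.conf", dry_run=True):
    """Build the argv for 'repmgr node rejoin', to be run on the old primary.

    current_primary_host is the node that is primary NOW (the promoted
    standby), not the address of the node being rejoined.
    """
    conninfo = f"host={current_primary_host} user=repmgr dbname=repmgr"
    argv = ["repmgr", "-f", conf, "-d", conninfo, "node", "rejoin", "--verbose"]
    if dry_run:
        # Only check whether the rejoin could work; change nothing.
        argv.append("--dry-run")
    return argv


# 192.0.2.20 is a placeholder address for the current primary (server2).
print(shlex.join(rejoin_command("192.0.2.20")))
```

If the dry run reports that the timelines have diverged, repmgr's --force-rewind option (which relies on pg_rewind) may be needed as well.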

PostgreSql streaming replication - table not created on the slave

I am new to PostgreSQL replication.
I tried to set up streaming replication, and at the end I created a database on the master, which I could afterwards see on the slave.
However, when I created a table on the master, it was not replicated to the slave.
Checking the table pg_stat_replication on the master, it looks OK as far as I can understand:
select usename,application_name,client_addr,backend_start,state,sync_state from pg_stat_replication ;
usename | application_name | client_addr | backend_start | state | sync_state
------------+------------------+-------------+-------------------------------+-----------+------------
replicator | walreceiver | 10.97.7.150 | 2020-06-28 20:48:15.463922+03 | streaming | async
select client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn from pg_stat_replication;
client_addr | state | sent_lsn | write_lsn | flush_lsn | replay_lsn
-------------+-----------+------------+------------+------------+------------
10.97.7.150 | streaming | 0/2701AFB8 | 0/2701AFB8 | 0/2701AFB8 | 0/2701AFB8
On the slave side I see this:
SELECT pg_last_xact_replay_timestamp();
pg_last_xact_replay_timestamp
-------------------------------
2020-06-28 20:52:22.915897+03
select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
Still, when I create a table, I cannot find it on the slave side.
What should I check further?

Can't get new postgres config file settings to take effect

I have a somewhat large table in my database and I am inserting new records into it. As the number of records grows, I have started having issues and can no longer insert.
My postgresql log files suggest I increase WAL size:
[700] LOG: checkpoints are occurring too frequently (6 seconds apart)
[700] HINT: Consider increasing the configuration parameter "max_wal_size".
I got the path to my config file with =# show config_file; and made some modifications with vim:
max_wal_senders = 0
wal_level = minimal
max_wal_size = 4GB
When I check the file I see the changes I made.
I then tried reloading and restarting the database:
(I get the data directory with =# show data_directory ;)
I tried reload:
pg_ctl reload -D path
server signaled
I tried restart
pg_ctl restart -D path
waiting for server to shut down.... done
server stopped
waiting for server to start....
2020-01-17 13:08:19.063 EST [16913] LOG: listening on IPv4 address
2020-01-17 13:08:19.063 EST [16913] LOG: listening on IPv6 address
2020-01-17 13:08:19.079 EST [16913] LOG: listening on Unix socket
2020-01-17 13:08:19.117 EST [16914] LOG: database system was shut down at 2020-01-17 13:08:18 EST
2020-01-17 13:08:19.126 EST [16913] LOG: database system is ready to accept connections
done
server started
But when I connect to the database and check for my settings:
name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart
-----------------+---------+------+-------------------------------+-------------------------------------------------------------------------+------------+------------+---------+---------+---------+------------+---------------------------+----------+-----------+------------+------------+-----------------
max_wal_senders | 10 | | Replication / Sending Servers | Sets the maximum number of simultaneously running WAL sender processes. | | postmaster | integer | default | 0 | 262143 | | 10 | 10 | | | f
max_wal_size | 1024 | MB | Write-Ahead Log / Checkpoints | Sets the WAL size that triggers a checkpoint. | | sighup | integer | default | 2 | 2147483647 | | 1024 | 1024 | | | f
wal_level | replica | | Write-Ahead Log / Settings | Set the level of information written to the WAL. | | postmaster | enum | default | | | {minimal,replica,logical} | replica | replica | | | f
(3 rows)
I still see the old default settings.
What am I missing here? How can I get these settings to take effect?
Configuration settings can come from several sources:
postgresql.conf
postgresql.auto.conf (set with ALTER SYSTEM)
command line arguments at server start
set with ALTER DATABASE or ALTER USER
Moreover, if a parameter occurs twice in a configuration file, the second entry wins.
To figure out from where in this mess your setting originates, run
SELECT name, source, sourcefile, sourceline, pending_restart
FROM pg_settings
WHERE name IN ('wal_level', 'max_wal_size', 'max_wal_senders');
If the source is database or user, you can use the psql command \drds to figure out the details.
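The query result in your question shows source = default for all three parameters, which means none of the sources above set them: the server is using its built-in defaults, so your edits were not picked up (for example because the lines are still commented out, or because a different file was edited).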
You'd have to override these defaults with any of the methods shown above.
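
To illustrate the "second entry wins" rule, here is a minimal Python sketch that scans the text of a configuration file and reports, for each parameter, the last value set and the line it came from. It is a simplification, not PostgreSQL's actual parser: it strips comments naively (a # inside a quoted value would confuse it) and ignores include directives. The function name last_wins and the sample text are made up for illustration.

```python
import re

# "name = value" or "name value"; a simplification of the real file syntax.
_SETTING = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*(?:=\s*|\s+)(.*?)\s*$")


def last_wins(conf_text):
    """Return {parameter: (value, line_number)}, keeping the last occurrence."""
    result = {}
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0]  # naive comment stripping
        match = _SETTING.match(line)
        if not match:
            continue  # blank line or comment-only line
        name = match.group(1).lower()        # parameter names are case-insensitive
        value = match.group(2).strip("'")    # drop surrounding single quotes
        result[name] = (value, lineno)       # a later entry overwrites an earlier one
    return result


# Hypothetical file content: max_wal_size appears twice, wal_level once
# commented out and once active.
sample = (
    "max_wal_size = 1GB\n"
    "#wal_level = logical\n"
    "wal_level = minimal\n"
    "max_wal_size = 4GB  # override\n"
)
for name, (value, lineno) in sorted(last_wins(sample).items()):
    print(f"{name} = {value} (line {lineno})")
```

Running the same function over postgresql.conf and then postgresql.auto.conf gives a rough idea of which file and line a value would come from; pg_settings remains the authoritative answer for a running server.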
Locations of the config files, ordered by priority:
/var/lib/postgresql/12/main/postgresql.auto.conf
/etc/postgresql/12/main/postgresql.conf