Automatic failover not happening in repmgr - postgresql

I have a two-node postgres active/standby cluster that I created with repmgr. The issue is that automatic failover does not happen when I stop the postgres service on the master node. The contents of repmgr.conf on the master are as follows:
node_id=1
data_directory='/data/pgdatabase/masterdb/data'
node_name=node1
conninfo='host=IP-Of-Master user=repmgr dbname=repmgr'
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/11/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/11/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/var/log/repmgr/repmgr.log'
log_level=NOTICE
reconnect_attempts=4
reconnect_interval=5
repmgrd_service_start_command='sudo systemctl repmgr11 start'
repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
Contents of postgresql.conf are as follows:
listen_addresses = '*'
shared_preload_libraries = 'repmgr'
max_wal_senders = 15
max_replication_slots = 15
wal_level = 'replica'
hot_standby = on
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/11/archive/%f'
The contents are the same on master and slave except for the node name, which is node2 on the slave.
Can anyone tell me what the possible reason could be for automatic failover not happening?

I resolved it myself: I needed to create the file /etc/default/repmgrd and add the following lines to it:
REPMGRD_ENABLED=yes
REPMGRD_CONF="/etc/repmgr.conf"
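
As a minimal sketch of the fix above, the two settings can be written with a small shell function. The function name write_repmgrd_defaults is hypothetical, and the sketch assumes a packaging where the repmgrd service reads /etc/default/repmgrd. Note that the question's promote and follow commands reference /etc/repmgr/11/repmgr.conf, so the path given to REPMGRD_CONF should presumably be wherever repmgr.conf actually lives on the node.

```shell
#!/bin/sh
# Hypothetical helper: writes the two settings from the answer to a defaults file.
#   $1 = defaults file (the answer uses /etc/default/repmgrd)
#   $2 = path to repmgr.conf (the answer uses /etc/repmgr.conf)
write_repmgrd_defaults() {
    defaults_file="$1"
    repmgr_conf="$2"
    printf 'REPMGRD_ENABLED=yes\nREPMGRD_CONF="%s"\n' "$repmgr_conf" > "$defaults_file"
}

# Example (run as root, then restart the repmgrd service; its name varies by package):
#   write_repmgrd_defaults /etc/default/repmgrd /etc/repmgr.conf
```

After restarting repmgrd on both nodes, stopping postgres on the master again should show whether the standby is now promoted.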

Problem with barman to backup instance with archive_mode = off

I am using postgres v14.4 and would like to use barman to back up this instance. Log archives are not needed, hence archive_mode = off.
The problem is that barman refuses to make a backup with the archive mode disabled. Is there a way to do a consistent backup with barman without keeping the archives?
Below is the barman configuration used:
[xxxxxxx]
description = "Cluster x of type oltp"
active = true
archiver = true
streaming_archiver = false
; The mandatory connection info to PostgreSQL
conninfo = host=xxxxxxx port=xxxxx user=barman dbname=postgres
; Activate pg_basebackup
backup_method = postgres
reuse_backup = off
; The mandatory connection info to login with REPLICATION privileges
streaming_conninfo = host=xxxxxxxxx port=xxxxx user=streaming_barman
slot_name = barman_ge
; recover WAL files with barman-wal-restore command
recovery_options = get-wal
path_prefix=/usr/pgsql-14/bin
backup_directory=/barman/aa/bb

My Postgres replication isn't functioning, see below for specific error

I have two Postgres databases set up in a Primary/Secondary configuration. I tried to set up replication between them, but have hit a roadblock. Where am I going wrong?
I have checked various configuration files: recovery.conf, postgresql.conf, pg_hba.conf, and all seem to be set up correctly.
This is the error I have found in the pg_log folder:
cp: cannot stat ‘/var/lib/pgsql/walfiles/00000002000001CA0000003E’: No such file or directory
cp: cannot stat ‘/var/lib/pgsql/walfiles/00000003.history’: No such file or directory
2019-04-16 16:17:19 AEST FATAL: database system identifier differs between the primary and standby
2019-04-16 16:17:19 AEST DETAIL: The primary's identifier is 6647133350114885049, the standby's identifier is 6456613398298492847.
I am using PostgreSQL 9.2.23.
This is my recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=10.201.108.25 port=5432 user=repl-master password=111222333'
restore_command = 'cp -p /var/lib/pgsql/walfiles/%f %p'
trigger_file = '/var/lib/pgsql/i_am_master.pg.trigger'
recovery_target_timeline = 'latest'
archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/walfiles %r'
I'd expect replication from Primary to Secondary. So far, nothing.
Appreciate any input/ideas.
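You didn't set up replication correctly. The differing database system identifiers show that the standby was initialized as a separate cluster rather than copied from the primary. You cannot use pg_dump to create the replica; you have to use a physical backup technique like pg_basebackup.
See the documentation for details.
Do not use PostgreSQL 9.2; it is out of support.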
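
A rough sketch of re-seeding the standby with pg_basebackup, as the answer suggests. The helper basebackup_cmd is hypothetical and only prints the command instead of running it. The host and user come from the question's primary_conninfo, while the target data directory /var/lib/pgsql/9.2/data is an assumption. The standby must be stopped and the target directory empty before the printed command is run.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a pg_basebackup command
# that copies the primary's data directory over a replication connection.
#   $1 = primary host, $2 = replication user, $3 = empty target data directory
basebackup_cmd() {
    primary_host="$1"
    repl_user="$2"
    target_dir="$3"
    # -X stream: also stream the WAL needed to make the copy consistent
    # -P: show progress
    printf 'pg_basebackup -h %s -p 5432 -U %s -D %s -X stream -P\n' \
        "$primary_host" "$repl_user" "$target_dir"
}

# Target directory below is an assumption, not taken from the question.
basebackup_cmd 10.201.108.25 repl-master /var/lib/pgsql/9.2/data
```

Once the copy finishes, recovery.conf goes into the new data directory before the standby is started. Running pg_controldata against both data directories should then report the same database system identifier.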

psql: FATAL: Could not obtain a transaction ID from GTM. The GTM might have failed or lost connectivity

I want to create a postgres-xl cluster. The cluster includes 5 nodes: 1 GTM, 2 Coordinators and 2 Datanodes. The details of the nodes are as follows:
GTM:
hostname=localhost
nodename=gtm
IP=127.0.0.1
port=20001
Coordinator1:
hostname=localhost
nodename=coord1
IP=127.0.0.1
pooler_port=30011,port=30001
Coordinator2:
hostname=host2
nodename=coord2
IP=10.4.6.36
pooler_port=30012,port=30002
Datanode1:
hostname=localhost
nodename=dn1
IP=127.0.0.1
pooler_port=40011, port=40001
Datanode2:
hostname=host2
nodename=dn2
IP=10.4.6.36
pooler_port=40012, port=40002
I have installed pgxc_ctl and added /usr/local/pgsql/bin to PATH for postgres. I have configured ssh authentication so that pgxc_ctl does not prompt for a password. I have edited postgresql.conf and pg_hba.conf on both nodes.
Then I built the cluster as follows:
$ pgxc_ctl
PGXC$ add gtm master gtm localhost 20001 $dataDirRoot/gtm
PGXC$ add coordinator master coord1 localhost 30001 30011 $dataDirRoot/coord_master.1 none none
PGXC$ add coordinator master coord2 10.4.6.36 30002 30012 $dataDirRoot/coord_master.2 none none
After adding coord2, I got the following error:
psql: FATAL: Could not obtain a transaction ID from GTM. The GTM might have failed or lost connectivity
PGXC$ add datanode master dn1 localhost 40001 40011 $dataDirRoot/dn_master.1 none none none
PGXC$ add datanode master dn2 10.4.6.36 40002 40012 $dataDirRoot/dn_master.2 none none none
After adding dn2, I got the following error:
ERROR: Failed to get pooled connections
HINT: This may happen because one or more nodes are currently unreachable, either because of node or network failure.
It's also possible that the target node may have hit the connection limit or the pooler is configured with low connections.
Please check if all nodes are running fine and also review max_connections and max_pool_size configuration parameters
But when I monitor all the nodes, it shows
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
I could not connect to coord2 by running
psql -h 10.4.6.36 -p 30002 -U user -d postgres
It shows
psql: FATAL: Could not obtain a transaction ID from GTM. The GTM might have failed or lost connectivity
But I could connect to the coord1 by running
psql -p 30001 -U user -d postgres
I could ping host2 from my localhost without the password.
I need to resolve the above errors. Any help?
Adding the configuration:
pgxcInstallDir=$HOME/pgxc
pgxcOwner=$USER
pgxcUser=$pgxcOwner
tmpDir=/tmp
localTmpDir=$tmpDir
configBackup=n
configBackupHost=pgxc-linker
configBackupDir=$HOME/pgxc
configBackupFile=pgxc_ctl.bak
dataDirRoot=$HOME/DATA/pgxl/nodes
#---- Coordinators ----------------------------------------------------------------------------------------------------
coordMasterDir=$dataDirRoot/coord_master
coordSlaveDir=$HOME/coord_slave
coordArchLogDir=$HOME/coord_archlog
coordExtraConfig=coordExtraConfig
cat > $coordExtraConfig <<EOF
#================================================
# Added to all the coordinator postgresql.conf
# Original: $coordExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_pool_size=300
max_connections=200
hot_standby = off
EOF
#---- Datanodes -------------------------------------------------------------------------------------------------------
datanodeMasterDir=$dataDirRoot/dn_master
datanodeSlaveDir=$dataDirRoot/dn_slave
datanodeArchLogDir=$dataDirRoot/datanode_archlog
datanodeExtraConfig=datanodeExtraConfig
cat > $datanodeExtraConfig <<EOF
#================================================
# Added to all the datanode postgresql.conf
# Original: $datanodeExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_pool_size=300
max_connections=200
hot_standby = off
EOF
#---- GTM ------------------------------------------------------------------------------------
gtmName=gtm
gtmMasterServer=localhost
gtmMasterPort=20001
gtmMasterDir=$dataDirRoot/gtm
coordNames=( coord1 coord2 )
coordMasterServers=( localhost 10.4.6.36 )
coordPorts=( 30001 30002 )
poolerPorts=( 30011 30012 )
coordMasterDirs=( $dataDirRoot/coord_master.1 $dataDirRoot/coord_master.2 )
coordMaxWALSenders=( 5 5 )
coordSlave=n
coordSlaveServers=( none none )
coordSlavePorts=( none none )
coordSlavePoolerPorts=( none none )
coordSlaveDirs=( none none )
coordArchLogDirs=( none none )
coordSpecificExtraConfig=( coordExtraConfig coordExtraConfig )
coordSpecificExtraPgHba=( none none )
datanodeNames=( dn1 dn2 )
datanodeMasterServers=( localhost 10.4.6.36 )
datanodePorts=( 40001 40002 )
datanodePoolerPorts=( 40011 40012 )
datanodeMasterDirs=( $dataDirRoot/dn_master.1 $dataDirRoot/dn_master.2 )
datanodeMasterWALDirs=( none none )
datanodeMaxWALSenders=( 5 5 )
datanodeSpecificExtraConfig=( datanodeExtraConfig datanodeExtraConfig )
datanodeSpecificExtraPgHba=( none none )
Could you show us your configuration?
What are your max_connections and max_pool_size? What did the initdb show for your kernel? My guess is that when you add the datanode2 (dn2) you don't have enough connections.
You have:
cluster includes 5 nodes, 1 GTM, 2 Coordinator and 2 Datanodes. The
following are the details of nodes.
Postgres-xl specific:
max_pool_size=300
max_coordinators=2
max_datanodes=2
In case of Coordinator (minimal settings):
max_connections=100 # number of connections accepted from application(s)
max_prepared_transactions = 100 # same as number of connections
In case of Datanode (minimal settings):
max_connections=200 # 2 coordinators
max_prepared_transactions=2 #Specify at least total number of Coordinators in the cluster.
Excerpt from the Postgres(-xl) documentation
max_connections (integer)
Determines the maximum number of concurrent connections to the database server. The default is typically 100 connections, but might
be less if your kernel settings will not support it (as determined
during initdb). This parameter can only be set at server start.
When running a standby server, you must set this parameter to the same or higher value than on the master server. Otherwise, queries
will not be allowed in the standby server.
In the case of the Coordinator, this parameter determines how many connections can each Coordinator accept.
In the case of the Datanode, number of connection to each Datanode may become as large as max_connections multiplied by the number of
Coordinators.
max_pool_size (integer)
Specify the maximum connection pool of the Coordinator to Datanodes. Because each transaction can be involved by all the
Datanodes, this parameter should at least be max_connections
multiplied by number of Datanodes.
Edit - for the updated configuration in the question
Try this:
Coordinator
max_connections=100
max_pool_size=300
Datanode (you have 2 datanodes defined)
max_connections=200
max_pool_size=500
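
The sizing rules in the documentation excerpt above can be worked through with shell arithmetic. This is only an illustration: the variable names are made up for the sketch, and the input values are the coordinator max_connections=100 suggested in this answer plus the two Coordinators and two Datanodes from the question.

```shell
#!/bin/sh
# Inputs: the answer's suggested coordinator setting and the question's topology.
coord_max_connections=100
num_coordinators=2
num_datanodes=2

# max_pool_size should be at least max_connections multiplied by the number of Datanodes.
min_pool_size=$((coord_max_connections * num_datanodes))

# Connections to each Datanode may reach max_connections multiplied by the number of Coordinators.
datanode_max_connections=$((coord_max_connections * num_coordinators))

echo "coordinator max_pool_size >= $min_pool_size"
echo "datanode max_connections >= $datanode_max_connections"
```

With these inputs both lower bounds come out to 200. Applying the same rule to the question's own coordinator setting of max_connections=200 gives 200 × 2 = 400, which is above the configured max_pool_size=300.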

ERROR could not access file "$libdir/repmgr_funcs" No such file or directory

I followed this link to create master-slave replication on an Ubuntu postgresql server.
My repmgr and postgresql configuration is:
Postgresql 9.5-: /opt/PostgreSQL/9.5/
repmgr-: /usr/lib/postgresql/9.5/bin/repmgr
repmgr.conf -: /etc/rep.conf
pg_config --pkglibdir => /usr/lib/postgresql/9.5/lib
ls /usr/lib/postgresql/9.5/lib | grep repmgr_funcs => repmgr_funcs.so
I am getting this error:
ERROR: unable to create the function repmgr_update_last_updated: ERROR: could not access file "$libdir/repmgr_funcs": No such file or directory
ERROR: Unable to create repmgr schema - see preceding error message(s); aborting
If you are using repmgr version 4 or above, you need to change this setting in postgresql.conf:
From: shared_preload_libraries = 'repmgr_funcs'
To: shared_preload_libraries = 'repmgr'
Below is the relevant passage from their upgrade notes:
The repmgr shared library has been renamed from repmgr_funcs to
repmgr, meaning shared_preload_libraries in postgresql.conf needs to
be updated to the new name: shared_preload_libraries = 'repmgr'
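
One way to see which name applies to an installation is to look at which shared library is actually present in the library directory. The helper below is a hypothetical sketch, not part of repmgr. It assumes the library files are named repmgr.so and repmgr_funcs.so, the latter matching the ls output in the question.

```shell
#!/bin/sh
# Hypothetical helper: prints the value to use in shared_preload_libraries,
# based on which repmgr shared library exists in the given directory.
#   $1 = library directory, e.g. the output of: pg_config --pkglibdir
repmgr_preload_name() {
    libdir="$1"
    if [ -f "$libdir/repmgr.so" ]; then
        echo repmgr            # repmgr 4 and later
    elif [ -f "$libdir/repmgr_funcs.so" ]; then
        echo repmgr_funcs      # older releases
    else
        echo "no repmgr library found in $libdir" >&2
        return 1
    fi
}

# Example:
#   repmgr_preload_name "$(pg_config --pkglibdir)"
```

The directory has to be the one used by the running server. If pg_config on the PATH belongs to a different PostgreSQL installation than the server, the result may not apply.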

postgresql 9.4 streaming replication

I have the following problem: I am trying to set up a streaming replication scenario with load balancing. I have read various tutorials but cannot find the mistake. The replication does not work: there is no "wal sender/receiver process". The archiving works, and every time the master restarts, the archived WAL files are copied to the slave. I do not even get an error. Everything in the configuration files looks fine to me, e.g. on the master:
wal_level = hot_standby
wal_keep_segments = 32
max_wal_senders = 5
max_replication_slots = 5
wal_sender_timeout = 60s
What irritates me the most is that there is no "wal sender process" and there is no error thrown.
Thank you for any idea,
Sven
UPDATE 1: my recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=arcserver1 port=5432 user=postgres pass=postgres'
restore_command = 'pg_standby /db/pg_archived %f %p >> /var/log/standby.log'
primary_slot_name='standby1'
and my client postgresql.conf contains:
hot_standby = on
I found the solution: I replaced pg_standby with cp, because pg_standby seems to be intended only for warm standby, not hot standby.
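
For illustration, the changed restore_command line might look like the output of the sketch below. The helper restore_command_line is hypothetical; the archive directory is the one from the pg_standby command in the question. A likely reason this helps is that pg_standby waits for the next archived file instead of returning an error, so the standby never falls through to streaming, whereas cp fails immediately when the file is missing.

```shell
#!/bin/sh
# Hypothetical helper: prints a recovery.conf restore_command line that uses cp.
#   $1 = directory holding the archived WAL files
restore_command_line() {
    archive_dir="$1"
    # %%f and %%p print the literal placeholders %f and %p
    printf "restore_command = 'cp %s/%%f %%p'\n" "$archive_dir"
}

restore_command_line /db/pg_archived
```

After restarting the standby with this line in recovery.conf, the wal sender and wal receiver processes should appear once the archived files have been replayed.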