I'm trying to configure pgpool in my PostgreSQL environment (2 PostgreSQL servers + 1 pgpool) to provide HA, while repmgr is responsible for the replication.
I'm getting the following messages in the log:
2017-12-03 19:27:07: pid 19033: DEBUG: pool_flush_it: flush size: 0
2017-12-03 19:27:07: pid 19033: DEBUG: pool_read: read 103 bytes from backend 1
2017-12-03 19:27:07: pid 19033: ERROR: failed to authenticate
2017-12-03 19:27:07: pid 19033: DETAIL: password authentication failed for user "nobody"
2017-12-03 19:27:07: pid 19033: DEBUG: find_primary_node: no primary node found
2017-12-03 19:27:08: pid 19033: LOG: find_primary_node: checking backend no 0
2017-12-03 19:27:08: pid 19033: DEBUG: SSL is requested but SSL support is not available
2017-12-03 19:34:27: pid 22132: ERROR: unable to read data from DB node 1
2017-12-03 19:34:27: pid 22132: DETAIL: EOF encountered with backend
2017-12-03 19:28:27: pid 19033: DEBUG: find_primary_node: no primary node found
The pool_hba.conf:
TYPE DATABASE USER CIDR-ADDRESS METHOD
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
In the PostgreSQL pg_hba.conf I enabled connections from the pgpool server:
####pgpool####
host all all 172.22.13.170/32 trust
1. What could be the problem?
2. If repmgr is responsible for the replication, should I set the parameter backend_flag to 'DISALLOW_TO_FAILOVER'?
Thanks.
I'm just getting up to speed on repmgr and pgpool, but I think there are multiple issues here:
1) Your pgpool.conf has some default settings for alive checking, and the default user for that is 'nobody', so to get that to work you need to create a PostgreSQL user with that name so that pgpool can query all hosts to find the current master.
2) pgpool executes scripts to change which node is the master etc., and such a script would normally just run repmgr commands to promote a new primary at failover, so I don't think DISALLOW_TO_FAILOVER is needed.
If repmgr did the failover, then point 1 of your question would make pgpool find the new master anyway, but in that case I would configure repmgr not to fail over automatically, since the two could fight over who should do what.
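For point 1, either create the role pgpool expects on the primary (CREATE ROLE nobody LOGIN PASSWORD '...';) or point the checks at an existing role in pgpool.conf. A minimal sketch; the role name and password are illustrative, the parameter names are standard pgpool.conf ones:

```
# pgpool.conf — which role the health and streaming-replication checks log in as
health_check_user = 'pgpool_check'
health_check_password = 'check_pass'
sr_check_user = 'pgpool_check'
sr_check_password = 'check_pass'
# relevant to point 2: failover behaviour is a per-backend flag
backend_flag0 = 'ALLOW_TO_FAILOVER'
```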
We have a highly-available database architecture that contains a PGPool cluster with 3 instances and, at the backend, a PostgreSQL database cluster with 2 instances.
Our PostgreSQL and PGPool versions are PostgreSQL 10.17 and pgpool-II 4.1.9 (karasukiboshi), respectively.
We tested our PGPool cluster for failover. Our steps were:
Stopping PGPool service on the master node. -> Successful.
Killing PGPool service on the master node. -> Successful.
Rebooting PGPool master node server. -> Unsuccessful.
I found failover-related topics covering my first 2 cases, but there is no topic about rebooting the PGPool master node server.
Has any of you had a similar problem? Is this not a capability of PGPool? We don't know what the problem is.
PS: If needed, I will update the topic with my configuration files.
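For reference, how quickly the surviving PGPool nodes declare a silent master dead is governed by the watchdog heartbeat settings; a sketch of the relevant pgpool-II 4.1 parameters (the deadtime value and the peer hostname are illustrative):

```
# pgpool.conf, watchdog section (pgpool-II 4.1 parameter names)
use_watchdog = on
wd_lifecheck_method = 'heartbeat'
wd_heartbeat_deadtime = 30              # seconds of silence before a node is declared dead
heartbeat_destination0 = 'pgpool-node2' # illustrative peer hostname
heartbeat_destination_port0 = 9694
```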
The log entries when we reboot the master PGPool server:
pid 669752: LOG: authentication timeout
May 10 17:22:51 my_server_name pgpool[3536624]: 2022-05-10 17:22:51: pid 670080: LOG: authentication timeout
May 10 17:22:51 my_server_name pgpool[3536624]: 2022-05-10 17:22:51: pid 670090: LOG: authentication timeout
May 10 17:22:51 my_server_name pgpool[3536624]: 2022-05-10 17:22:51: pid 669677: LOG: authentication timeout
May 10 17:22:51 my_server_name pgpool[3536624]: 2022-05-10 17:22:51: pid 670092: LOG: authentication timeout
May 10 17:22:51 my_server_name pgpool[3536624]: 2022-05-10 17:22:51: pid 670094: LOG: authenticatio
Thanks!
I'm trying to configure pgpool as the load balancer for my Postgres cluster.
I have two postgres nodes, 1 master and 1 slave.
My pg_hba.conf looks like this:
hostssl user mydb 1.1.1.1/32 md5
hostssl user postgres 1.1.1.1/32 md5
host user mydb 1.1.1.1/32 md5
host user postgres 1.1.1.1/32 md5
where 1.1.1.1/32 is my actual pgpool server IP.
If I try to establish a connection to either the master or the slave using psql right from the pgpool container, I can do it without any problems.
But when I start pgpool, I get this error message:
2021-10-26 13:50:13: pid 753: ERROR: backend authentication failed
2021-10-26 13:50:13: pid 753: DETAIL: backend response with kind 'E' when expecting 'R'
2021-10-26 13:50:13: pid 753: HINT: This issue can be caused by version mismatch (current version 3)
2021-10-26 13:50:13: pid 736: ERROR: backend authentication failed
2021-10-26 13:50:13: pid 736: DETAIL: backend response with kind 'E' when expecting 'R'
2021-10-26 13:50:13: pid 736: HINT: This issue can be caused by version mismatch (current version 2)
If I edit the pool_passwd file and set some invalid password, I get a proper error:
2021-10-26 13:59:03: pid 736: ERROR: md5 authentication failed
2021-10-26 13:59:03: pid 736: DETAIL: password does not match
So I guess it's not a problem with my Postgres credentials.
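For reference, pool_passwd entries for md5 auth are normally generated with pgpool's pg_md5 tool rather than edited by hand; a sketch, with the username and password illustrative:

```
# appends a "user:md5<hash>" line to pool_passwd
pg_md5 --md5auth --username=user 'my_real_password'
```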
Any ideas?
I am unable to start the Postgres server, and whenever I use pg_ctl I get the following error. Can someone help me fix this? I changed the folder permissions using chmod and also tried running with sudo -s, but the problem persists.
One mistake I made was deleting postmaster.pid while the server was running; since then I get this issue whenever I try to start the server through pg_ctl, and another error when I use pgAdmin.
Any suggestions here will be really helpful - thanks.
Using the macOS shell command:
pg_ctl start -D /Library/PostgreSQL/12/data
waiting for server to start....2020-05-05 11:40:04.838 IST [1216] FATAL: data directory "/Library/PostgreSQL/12/data" has wrong ownership
2020-05-05 11:40:04.838 IST [1216] HINT: The server must be started by the user that owns the data directory.
stopped waiting
pg_ctl: could not start server
Examine the log output.
Using pgAdmin, the error is as follows:
could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5434?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting TCP/IP connections on port 5434?
P.S.: I modified pg_hba.conf and also postgresql.conf to allow connections from the local IP.
Error received on 5 May:
waiting for server to start....2020-05-05 19:54:13.029 IST [7274] LOG: starting PostgreSQL 12.2 on x86_64-apple-darwin, compiled by Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn), 64-bit
2020-05-05 19:54:13.030 IST [7274] LOG: listening on IPv6 address "::", port 5433
2020-05-05 19:54:13.030 IST [7274] LOG: listening on IPv4 address "0.0.0.0", port 5433
2020-05-05 19:54:13.030 IST [7274] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2020-05-05 19:54:13.039 IST [7274] LOG: redirecting log output to logging collector process...
2020-05-05 19:54:13.039 IST [7274] HINT: Future log output will appear in directory "log"
stopped waiting
pg_ctl: could not start server
Examine the log output.
Log file details:
2020-05-05 21:29:30.748 IST [8853] LOG: invalid authentication method "0.0.0.0/0"
2020-05-05 21:29:30.748 IST [8853] CONTEXT: line 80 of configuration file "/Library/PostgreSQL/12/data/pg_hba.conf"
2020-05-05 21:29:30.748 IST [8853] FATAL: could not load pg_hba.conf
2020-05-05 21:29:30.749 IST [8853] LOG: database system is shut down
Details of my pg_hba.conf:
# "local" is for Unix domain socket connections only
local all all 0.0.0.0/0 md5
local all all md5
# IPv4 local connections:
host all all 127.0.0.1/32 md5
# IPv6 local connections:
host all all ::1/128 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all md5
host replication all 127.0.0.1/32 md5
host replication all ::1/128 md5
host all all 0.0.0.0/0 md5
host all all ::/0 md5
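(The FATAL above points at the first local line: "local" entries take no address column, so PostgreSQL reads "0.0.0.0/0" as the authentication method. Dropping that line leaves the valid form, which the file already contains:)

```
local all all md5
```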
Latest log file:
bash-3.2$ cat postgresql-2020-05-05_221328.log
2020-05-05 22:13:28.794 IST [9834] LOG: database system was interrupted; last known up at 2020-05-05 22:13:09 IST
2020-05-05 22:13:28.872 IST [9834] LOG: database system was not properly shut down; automatic recovery in progress
2020-05-05 22:13:28.874 IST [9834] LOG: redo starts at 0/17742C8
2020-05-05 22:13:28.874 IST [9834] LOG: invalid record length at 0/1774300: wanted 24, got 0
2020-05-05 22:13:28.874 IST [9834] LOG: redo done at 0/17742C8
2020-05-05 22:13:28.881 IST [9832] LOG: database system is ready to accept connections
......
Also, I found this error while starting the server, and the PID is changing every time:
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2020-05-05 22:09:21.941 IST [9746] FATAL: lock file "postmaster.pid" already exists
2020-05-05 22:09:21.941 IST [9746] HINT: Is another postmaster (PID 9735) running in data directory "/Library/PostgreSQL/12/data"?
stopped waiting
pg_ctl: could not start server
Examine the log output.
bash-3.2$ kill -9 9735
bash-3.2$ pg_ctl start -D /Library/PostgreSQL/12/data
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2020-05-05 22:09:35.829 IST [9758] FATAL: lock file "postmaster.pid" already exists
2020-05-05 22:09:35.829 IST [9758] HINT: Is another postmaster (PID 9747) running in data directory "/Library/PostgreSQL/12/data"?
stopped waiting
pg_ctl: could not start server
Examine the log output.
502 9833 9832 0 10:13PM ?? 0:00.00 postgres: logger
502 9835 9832 0 10:13PM ?? 0:00.00 postgres: checkpointer
502 9836 9832 0 10:13PM ?? 0:00.04 postgres: background writer
502 9837 9832 0 10:13PM ?? 0:00.01 postgres: walwriter
502 9838 9832 0 10:13PM ?? 0:00.01 postgres: autovacuum launcher
502 9839 9832 0 10:13PM ?? 0:00.01 postgres: stats collector
502 9840 9832 0 10:13PM ?? 0:00.00 postgres: logical replication launcher
0 9641 9504 0 10:03PM ttys000 0:00.02 sudo -u postgres -s /bin/bash
502 9904 9642 0 10:37PM ttys000 0:00.00 grep postgres
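(The first line of postmaster.pid holds the PID of the running postmaster, so rather than kill -9 the server can be stopped cleanly; a sketch using the paths from the question:)

```
head -1 /Library/PostgreSQL/12/data/postmaster.pid   # PID of the running postmaster
sudo -u postgres /Library/PostgreSQL/12/bin/pg_ctl stop -D /Library/PostgreSQL/12/data -m fast
```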
The data directory should be owned by the postgres user and have user-only access (700 or u+rwx)
Does this match what you have set up?
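A sketch of the fix, assuming the standard macOS installer layout from the question; afterwards start the server as that user (sudo -u postgres pg_ctl start -D ...):

```
sudo chown -R postgres /Library/PostgreSQL/12/data   # give the data directory to postgres
sudo chmod 700 /Library/PostgreSQL/12/data           # user-only access, as pg_ctl requires
```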
Thom Brown
Disclosure: I am an EnterpriseDB employee.
Try running this command:
pg_ctl -D /usr/local/var/postgres start
I am trying to deploy an automated, highly-available PostgreSQL cluster on Kubernetes. In cases of master failover or temporary master failures, the standby loses its streaming replication connection, and when retrying it takes a long time before the attempt fails and is retried.
I use PostgreSQL 10 and streaming replication (cluster-main-cluster-master-service is a service that always routes to the master, and all the replicas connect to this service for replication). I've tried setting options like connect_timeout and keepalives in primary_conninfo of recovery.conf, and wal_receiver_timeout in postgresql.conf on the standby, but I could not make any progress with them.
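(For reference, a sketch of what those attempts look like; the service hostname is from the question, the user and timeout values are illustrative:)

```
# recovery.conf on the standby (PostgreSQL 10)
primary_conninfo = 'host=cluster-main-cluster-master-service port=5432 user=replication connect_timeout=10 keepalives=1 keepalives_idle=5 keepalives_interval=2 keepalives_count=2'

# postgresql.conf on the standby
wal_receiver_timeout = 10s
```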
First, when the master goes down, replication stops with the following error (state 1):
2019-10-06 14:14:54.042 +0330 [3039] LOG: replication terminated by primary server
2019-10-06 14:14:54.042 +0330 [3039] DETAIL: End of WAL reached on timeline 17 at 0/33000098.
2019-10-06 14:14:54.042 +0330 [3039] FATAL: could not send end-of-streaming message to primary: no COPY in progress
2019-10-06 14:14:55.534 +0330 [12] LOG: record with incorrect prev-link 0/2D000028 at 0/33000098
After investigating Postgres activity, I found that the walreceiver process gets stuck in the LibPQWalReceiverConnect wait_event (state 2), but the timeout is much longer than what I configured (although I set connect_timeout to 10 seconds, it takes about 2 minutes). Then it fails with the following error (state 3):
2019-10-06 14:17:06.035 +0330 [3264] FATAL: could not connect to the primary server: could not connect to server: Connection timed out
Is the server running on host "cluster-main-cluster-master-service" (192.168.0.166) and accepting
TCP/IP connections on port 5432?
On the next try, it successfully connects to the primary (state 4):
2019-10-06 14:17:07.892 +0330 [5786] LOG: started streaming WAL from primary at 0/33000000 on timeline 17
I also tried killing the process when the stuck event occurs (state 2); when I do, the process starts again, connects, and then streams normally (jumps to state 4).
After checking netstat, I also found that (in the failover case) the walreceiver process holds a connection in SYN_SENT state to the old master.
connect_timeout governs how long PostgreSQL will wait for the replication connection to succeed, but that does not include establishing the TCP connection.
To reduce the time that the kernel waits for a successful answer to a TCP SYN request, reduce the number of retries. In /etc/sysctl.conf, set:
net.ipv4.tcp_syn_retries = 3
and run sysctl -p.
That should reduce the time significantly: with the default of 6 retries, the exponential backoff adds up to roughly 127 seconds (about the two minutes you observed), while 3 retries give up after roughly 15 seconds.
Reducing the value too much might make your system less stable.
I have been trying to implement this, and I cannot figure out why it will not work. I have read of many people downloading and running it as-is, but pgpool never connects to the master or slave. I pulled the Dockerfile from paunin's example in issue 57 and changed the image to the current postdock/postgres.
My docker-compose file is as follows, and I am starting it with the following command:
docker-compose -f .\basic.yml up -d
```
version: '2'
networks:
  cluster:
    driver: bridge
services:
  pgmaster:
    image: postdock/postgres
    environment:
      PARTNER_NODES: "pgmaster,pgslave1"
      NODE_ID: 1 # Integer number of node
      NODE_NAME: node1 # Node name
      CLUSTER_NODE_NETWORK_NAME: pgmaster
      POSTGRES_PASSWORD: monkey_pass
      POSTGRES_USER: monkey_user
      POSTGRES_DB: monkey_db
      CONFIGS: "listen_addresses:'*'"
    ports:
      - 5431:5432
    networks:
      cluster:
        aliases:
          - pgmaster
  pgslave1:
    image: postdock/postgres
    environment:
      PARTNER_NODES: "pgmaster,pgslave1"
      REPLICATION_PRIMARY_HOST: pgmaster
      NODE_ID: 2
      NODE_NAME: node2
      CLUSTER_NODE_NETWORK_NAME: pgslave1
    ports:
      - 5441:5432
    networks:
      cluster:
        aliases:
          - pgslave1
  pgpool:
    image: postdock/pgpool
    environment:
      PCP_USER: pcp_user
      PCP_PASSWORD: pcp_pass
      WAIT_BACKEND_TIMEOUT: 60
      CHECK_USER: monkey_user
      CHECK_PASSWORD: monkey_pass
      CHECK_PGCONNECT_TIMEOUT: 3
      DB_USERS: monkey_user:monkey_pass
      BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::"
      CONFIGS: "num_init_children:250,max_pool:4"
    ports:
      - 5432:5432
      - 9898:9898 # PCP
    networks:
      cluster:
        aliases:
          - pgpool
```
Both the master and the replica DBs seem to come up fine. I can see both in pgAdmin, and when I create a table it appears in monkey_db. However, it never makes it over to the replica.
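(One way to confirm whether the standby ever attached is to ask the master who is streaming from it; this runs against the master's published port from the compose file, and no rows means no standby is connected:)

```
psql -h localhost -p 5431 -U monkey_user -d monkey_db \
     -c 'SELECT client_addr, state, sync_state FROM pg_stat_replication;'
```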
Here is the log for the master container:
```
PS C:\platform\docker\basic> docker logs basic_pgmaster_1
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
No pre-populated ssh keys!
cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
>>> Check all partner nodes for common upstream node...
>>>>>> Checking NODE=pgmaster...
psql: could not connect to server: Connection refused
Is the server running on host "pgmaster" (172.22.0.3) and accepting
TCP/IP connections on port 5432?
>>>>>> Skipping: failed to get master from the node!
>>>>>> Checking NODE=pgslave1...
psql: could not connect to server: Connection refused
Is the server running on host "pgslave1" (172.22.0.2) and accepting
TCP/IP connections on port 5432?
>>>>>> Skipping: failed to get master from the node!
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
>>> Sending in background postgres start...
>>> Waiting for local postgres server recovery if any in progress:LAUNCH_RECOVERY_CHECK_INTERVAL=30
>>> Recovery is in progress:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
waiting for server to start....2018-09-20 06:03:29.170 UTC [85] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-20 06:03:29.197 UTC [86] LOG: database system was shut down at 2018-09-20 06:03:28 UTC
2018-09-20 06:03:29.202 UTC [85] LOG: database system is ready to accept connections
done
server started
CREATE DATABASE
CREATE ROLE
/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/entrypoint.sh
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Config file was replaced with standard one!
>>>>>> Adding config 'listen_addresses'=''*''
>>>>>> Adding config 'shared_preload_libraries'=''repmgr_funcs''
>>> Creating replication user 'replication_user'
CREATE ROLE
>>> Creating replication db 'replication_db'
waiting for server to shut down...2018-09-20 06:03:30.494 UTC [85] LOG: received fast shutdown request
.2018-09-20 06:03:30.514 UTC [85] LOG: aborting any active transactions
2018-09-20 06:03:30.517 UTC [85] LOG: worker process: logical replication launcher (PID 92) exited with exit code 1
2018-09-20 06:03:30.517 UTC [87] LOG: shutting down
2018-09-20 06:03:30.542 UTC [85] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2018-09-20 06:03:30.608 UTC [47] LOG: listening on IPv4 address "0.0.0.0", port 5432
2018-09-20 06:03:30.608 UTC [47] LOG: listening on IPv6 address "::", port 5432
2018-09-20 06:03:30.616 UTC [47] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-20 06:03:30.646 UTC [131] LOG: database system was shut down at 2018-09-20 06:03:30 UTC
2018-09-20 06:03:30.664 UTC [47] LOG: database system is ready to accept connections
>>>>>> RECOVERY_WAL_ID is empty!
>>> Not in recovery state (anymore)
>>> Waiting for local postgres server start...
>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90)
>>>>>> Schema replication_db.public exists on host pgmaster:5432!
>>> Registering node with role master
INFO: connecting to master database
INFO: master register: creating database objects inside the 'repmgr_pg_cluster' schema
INFO: retrieving node list for cluster 'pg_cluster'
[REPMGR EVENT] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-09-20 06:03:56.560674+00; Details:
[REPMGR EVENT] will execute script '/usr/local/bin/cluster/repmgr/events/execs/master_register.sh' for the event
[REPMGR EVENT::master_register] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-09-20 06:03:56.560674+00; Details:
[REPMGR EVENT::master_register] Locking master...
[REPMGR EVENT::master_register] Unlocking standby...
NOTICE: master node correctly registered for cluster 'pg_cluster' with id 1 (conninfo: user=replication_user password=replication_pass host=pgmaster dbname=replication_db port=5432 connect_timeout=2)
>>> Starting repmgr daemon...
[2018-09-20 06:03:56] [NOTICE] looking for configuration file in current directory
[2018-09-20 06:03:56] [NOTICE] looking for configuration file in /etc
[2018-09-20 06:03:56] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-09-20 06:03:56] [INFO] connecting to database 'user=replication_user password=replication_pass host=pgmaster dbname=replication_db port=5432 connect_timeout=2'
[2018-09-20 06:03:56] [INFO] connected to database, checking its state
[2018-09-20 06:03:56] [INFO] checking cluster configuration with schema 'repmgr_pg_cluster'
[2018-09-20 06:03:56] [INFO] checking node 1 in cluster 'pg_cluster'
[2018-09-20 06:03:56] [INFO] reloading configuration file
[2018-09-20 06:03:56] [INFO] configuration has not changed
[2018-09-20 06:03:56] [INFO] starting continuous master connection check
```
Here is the log for the slave. It appears that the primary db is cloned successfully:
> ```
>
> >>> Setting up STOP handlers...
> >>> STARTING SSH (if required)...
> No pre-populated ssh keys!
> cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
> >>> SSH is not enabled!
> >>> STARTING POSTGRES...
> >>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
> >>> TUNING UP POSTGRES...
> >>> Cleaning data folder which might have some garbage...
> >>> Check all partner nodes for common upstream node...
> >>>>>> Checking NODE=pgmaster...
> psql: could not connect to server: Connection refused
> Is the server running on host "pgmaster" (172.22.0.3) and accepting
> TCP/IP connections on port 5432?
> >>>>>> Skipping: failed to get master from the node!
> >>>>>> Checking NODE=pgslave1...
> psql: could not connect to server: Connection refused
> Is the server running on host "pgslave1" (172.22.0.2) and accepting
> TCP/IP connections on port 5432?
> >>>>>> Skipping: failed to get master from the node!
> >>> Auto-detected master name: ''
> >>> Setting up repmgr...
> >>> Setting up repmgr config file '/etc/repmgr.conf'...
> >>> Setting up upstream node...
> cat: /var/lib/postgresql/data/standby.lock: No such file or directory
> >>> Previously Locked standby upstream node LOCKED_STANDBY=''
> >>> Waiting for upstream postgres server...
> >>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
> psql: could not connect to server: Connection refused
> Is the server running on host "pgmaster" (172.22.0.3) and accepting
> TCP/IP connections on port 5432?
> >>>>>> Host pgmaster:5432 is not accessible (will try 30 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 29 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 28 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 27 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
> >>> REPLICATION_UPSTREAM_NODE_ID=1
> >>> Sending in background postgres start...
> >>> Waiting for upstream postgres server...
> >>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
> >>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
> >>> Starting standby node...
> >>> Instance hasn't been set up yet.
> >>> Clonning primary node...
> >>> Waiting for upstream postgres server...
> >>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
> NOTICE: destination directory '/var/lib/postgresql/data' provided
> INFO: connecting to upstream node
> INFO: Successfully connected to upstream node. Current installation size is 37 MB
> INFO: checking and correcting permissions on existing directory /var/lib/postgresql/data ...
> >>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
> >>> Waiting for cloning on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
> >>> Replicated: 4
> NOTICE: starting backup (using pg_basebackup)...
> INFO: executing: '/usr/lib/postgresql/10/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h pgmaster -p 5432 -U replication_user -c fast -X stream -S repmgr_slot_2 '
> NOTICE: standby clone (using pg_basebackup) complete
> NOTICE: you can now start your PostgreSQL server
> HINT: for example : pg_ctl -D /var/lib/postgresql/data start
> HINT: After starting the server, you need to register this standby with "repmgr standby register"
> [REPMGR EVENT] Node id: 2; Event type: standby_clone; Success [1|0]: 1; Time: 2018-09-20 06:04:08.427899+00; Details: Cloned from host 'pgmaster', port 5432; backup method: pg_basebackup; --force: Y
> >>> Configuring /var/lib/postgresql/data/postgresql.conf
> >>>>>> Will add configs to the exists file
> >>>>>> Adding config 'shared_preload_libraries'=''repmgr_funcs''
> >>> Starting postgres...
> >>> Waiting for local postgres server recovery if any in progress:LAUNCH_RECOVERY_CHECK_INTERVAL=30
> >>> Recovery is in progress:
> 2018-09-20 06:04:08.517 UTC [163] LOG: listening on IPv4 address "0.0.0.0", port 5432
> 2018-09-20 06:04:08.517 UTC [163] LOG: listening on IPv6 address "::", port 5432
> 2018-09-20 06:04:08.521 UTC [163] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
> 2018-09-20 06:04:08.549 UTC [171] LOG: database system was interrupted; last known up at 2018-09-20 06:04:06 UTC
> 2018-09-20 06:04:09.894 UTC [171] LOG: entering standby mode
> 2018-09-20 06:04:09.903 UTC [171] LOG: redo starts at 0/2000028
> 2018-09-20 06:04:09.908 UTC [171] LOG: consistent recovery state reached at 0/20000F8
> 2018-09-20 06:04:09.908 UTC [163] LOG: database system is ready to accept read only connections
> 2018-09-20 06:04:09.916 UTC [175] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
> >>> Cloning is done
> >>>>>> WAL id: 000000010000000000000003
> >>>>>> WAL_RECEIVER_FLAG=1!
> >>> Not in recovery state (anymore)
> >>> Waiting for local postgres server start...
> >>> Wait schema replication_db.public on pgslave1:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90)
> >>>>>> Schema replication_db.public exists on host pgslave1:5432!
> >>> Unregister the node if it was done before
> DELETE 0
> >>> Registering node with role standby
> INFO: connecting to standby database
> INFO: connecting to master database
> INFO: retrieving node list for cluster 'pg_cluster'
> INFO: registering the standby
> [REPMGR EVENT] Node id: 2; Event type: standby_register; Success [1|0]: 1; Time: 2018-09-20 06:04:38.676889+00; Details:
> INFO: standby registration complete
> NOTICE: standby node correctly registered for cluster pg_cluster with id 2 (conninfo: user=replication_user password=replication_pass host=pgslave1 dbname=replication_db port=5432 connect_timeout=2)
> Locking standby (NEW_UPSTREAM_NODE_ID=1)...
> >>> Starting repmgr daemon...
> [2018-09-20 06:04:38] [NOTICE] looking for configuration file in current directory
> [2018-09-20 06:04:38] [NOTICE] looking for configuration file in /etc
> [2018-09-20 06:04:38] [NOTICE] configuration file found at: /etc/repmgr.conf
> [2018-09-20 06:04:38] [INFO] connecting to database 'user=replication_user password=replication_pass host=pgslave1 dbname=replication_db port=5432 connect_timeout=2'
> [2018-09-20 06:04:38] [INFO] connected to database, checking its state
> [2018-09-20 06:04:38] [INFO] connecting to master node of cluster 'pg_cluster'
> [2018-09-20 06:04:38] [INFO] retrieving node list for cluster 'pg_cluster'
> [2018-09-20 06:04:38] [INFO] checking role of cluster node '1'
> [2018-09-20 06:04:38] [INFO] checking cluster configuration with schema 'repmgr_pg_cluster'
> [2018-09-20 06:04:38] [INFO] checking node 2 in cluster 'pg_cluster'
> [2018-09-20 06:04:38] [INFO] reloading configuration file
> [2018-09-20 06:04:38] [INFO] configuration has not changed
> [2018-09-20 06:04:38] [INFO] starting continuous standby node monitoring
> ```
Here is the pgpool log:
> ```
> >>> STARTING SSH (if required)...
> cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
> No pre-populated ssh keys!
> >>> SSH is not enabled!
> >>> TURNING PGPOOL...
> >>> Opening access from all hosts by md5 in /usr/local/etc/pool_hba.conf
> >>> Adding user pcp_user for PCP
> >>> Creating a ~/.pcppass file for pcp_user
> >>> Adding users for md5 auth
> >>>>>> Adding user monkey_user
> >>> Adding check user 'monkey_user' for md5 auth
> >>> Adding user 'monkey_user' as check user
> >>> Adding user 'monkey_user' as health-check user
> >>> Adding backends
> >>>>>> Waiting for backend 0 to start pgpool (WAIT_BACKEND_TIMEOUT=60)
> 2018/09/20 06:03:26 Waiting for host: tcp://pgmaster:5432
> 2018/09/20 06:04:26 Timeout after 1m0s waiting on dependencies to become available: [tcp://pgmaster:5432]
> >>>>>> Will not add node 0 - it's unreachable!
> >>>>>> Waiting for backend 1 to start pgpool (WAIT_BACKEND_TIMEOUT=60)
> 2018/09/20 06:04:26 Waiting for host: tcp://pgslave1:5432
> 2018/09/20 06:05:26 Timeout after 1m0s waiting on dependencies to become available: [tcp://pgslave1:5432]
> >>>>>> Will not add node 1 - it's unreachable!
> >>> Checking if we have enough backends to start
> >>>>>> Will start pgpool REQUIRE_MIN_BACKENDS=0, BACKENDS_COUNT=0
> >>> Configuring /usr/local/etc/pgpool.conf
> >>>>>> Adding config 'num_init_children' with value '250'
> >>>>>> Adding config 'max_pool' with value '4'
> >>> STARTING PGPOOL...
> 2018-09-20 06:05:26: pid 62: LOG: Backend status file /var/log/postgresql/pgpool_status does not exist
> 2018-09-20 06:05:26: pid 62: LOG: Setting up socket for 0.0.0.0:5432
> 2018-09-20 06:05:26: pid 62: LOG: Setting up socket for :::5432
> 2018-09-20 06:05:26: pid 62: LOG: find_primary_node_repeatedly: waiting for finding a primary node
> 2018-09-20 06:05:26: pid 320: FATAL: pgpool is not accepting any new connections
> 2018-09-20 06:05:26: pid 320: DETAIL: all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:05:26: pid 320: HINT: repair the backend nodes and restart pgpool
> 2018-09-20 06:05:26: pid 62: LOG: child process with pid: 320 exits with status 256
> 2018-09-20 06:05:26: pid 62: LOG: fork a new child process with pid: 333
> 2018-09-20 06:06:26: pid 319: FATAL: pgpool is not accepting any new connections
> 2018-09-20 06:06:26: pid 319: DETAIL: all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:06:26: pid 319: HINT: repair the backend nodes and restart pgpool
> 2018-09-20 06:06:26: pid 62: LOG: child process with pid: 319 exits with status 256
> 2018-09-20 06:06:26: pid 62: LOG: fork a new child process with pid: 351
> 2018-09-20 06:07:26: pid 333: FATAL: pgpool is not accepting any new connections
> 2018-09-20 06:07:26: pid 333: DETAIL: all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:07:26: pid 333: HINT: repair the backend nodes and restart pgpool
> 2018-09-20 06:07:26: pid 62: LOG: child process with pid: 333 exits with status 256
> 2018-09-20 06:07:26: pid 62: LOG: fork a new child process with pid: 370
> 2018-09-20 06:08:26: pid 370: FATAL: pgpool is not accepting any new connections
> 2018-09-20 06:08:26: pid 370: DETAIL: all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:08:26: pid 370: HINT: repair the backend nodes and restart pgpool
> 2018-09-20 06:08:26: pid 62: LOG: child process with pid: 370 exits with status 256
> 2018-09-20 06:08:26: pid 62: LOG: fork a new child process with pid: 388
> 2018-09-20 06:09:27: pid 302: FATAL: pgpool is not accepting any new connections
> 2018-09-20 06:09:27: pid 302: DETAIL: all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:09:27: pid 302: HINT: repair the backend nodes and restart pgpool
> 2018-09-20 06:09:27: pid 62: LOG: child process with pid: 302 exits with status 256
> 2018-09-20 06:09:27: pid 62: LOG: fork a new child process with pid: 406
> 2018-09-20 06:10:27: pid 316: FATAL: pgpool is not accepting any new connections
> 2018-09-20 06:10:27: pid 316: DETAIL: all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:10:27: pid 316: HINT: repair the backend nodes and restart pgpool
> 2018-09-20 06:10:27: pid 62: LOG: child process with pid: 316 exits with status 256
> 2018-09-20 06:10:27: pid 62: LOG: fork a new child process with pid: 424
> 2018-09-20 06:11:27: pid 351: FATAL: pgpool is not accepting any new connections
> 2018-09-20 06:11:27: pid 351: DETAIL: all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:11:27: pid 351: HINT: repair the backend nodes and restart pgpool
> 2018-09-20 06:11:27: pid 62: LOG: child process with pid: 351 exits with status 256
> 2018-09-20 06:11:27: pid 62: LOG: fork a new child process with pid: 442
> ```
I thought this was an issue with WAL shipping, but based on the logs it clones the db successfully and registers as well. This appears to be something with pgpool, and I don't see what I am missing.
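(Once pgpool does accept connections, its view of the backends can be checked with SHOW pool_nodes through the pgpool port; the user and database here are the ones from the compose file:)

```
psql -h localhost -p 5432 -U monkey_user -d monkey_db -c 'SHOW pool_nodes;'
```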
Any help would be greatly appreciated.
Thanks.
From czarny94 on the GitHub issues page:
Try changing the "createdb" line of the /src/pgsql/bin/postgres/primary/entrypoint.sh file. The diff between origin/master and mine after the changes is below:
```
diff --git a/src/pgsql/bin/postgres/primary/entrypoint.sh b/src/pgsql/bin/postgres/primary/entrypoint.sh
index b8451f5..030cbc7 100755
--- a/src/pgsql/bin/postgres/primary/entrypoint.sh
+++ b/src/pgsql/bin/postgres/primary/entrypoint.sh
@@ -3,11 +3,11 @@ set -e
 FORCE_RECONFIGURE=1 postgres_configure
...
 echo ">>> Creating replication db '$REPLICATION_DB'"
-createdb $REPLICATION_DB -O $REPLICATION_USER
+createdb -U "${POSTGRES_USER}" "${REPLICATION_DB}" -O "${REPLICATION_USER}"
...
-echo "host replication $REPLICATION_USER 0.0.0.0/0 md5" >> $PGDATA/pg_hba.conf
+echo "host replication $REPLICATION_USER 0.0.0.0/0 trust" >> $PGDATA/pg_hba.conf
```