pgpool 4.1.0 healthcheck getsockopt() detected error "Connection refused" - postgresql

I am trying to set up a pgpool load balancer for a PostgreSQL streaming replication cluster.
I am using postgresql-12 and pgpool2-4.1.0 from the PostgreSQL repo https://apt.postgresql.org/pub/repos/apt/ on Debian 10.2 (latest stable).
I have set up a PostgreSQL cluster with streaming replication using physical slots (not WAL shipping) and everything seems to be working properly. The secondaries connect and replicate data without any issues.
Then I installed the pgpool2-4.1.0 on the same servers. I have made the proper modifications to pgpool.conf according to the pgpool wiki and I have enabled the watchdog process.
When I start pgpool, on all three nodes, I can see that watchdog is working properly, quorum exists and pgpool elects a master (pgpool node) which also enables the virtual IP from the configuration.
I can connect to the postgres backend via pgpool and issue read and write commands successfully.
The problem appears on the pgpool logs, from syslog, I get:
Jan 13 15:10:30 debian10 pgpool[9826]: 2020-01-13 15:10:30: pid 9870: LOG: failed to connect to PostgreSQL server on "pg1:5433", getsockopt() detected error "Connection refused"
Jan 13 15:10:30 debian10 pgpool[9826]: 2020-01-13 15:10:30: pid 9870: LOCATION: pool_connection_pool.c:680
When checking the PID mentioned above, I get the pgpool healthcheck process.
pg1, pg2, pg3 are the database servers listening on all addresses on port 5433, pg1 is the primary.
pgpool listens on 5432.
The database user that is used for the healthcheck is "pgpool", I have verified that I can connect to the database using that user from all hosts on the particular subnet.
When I disable the healthcheck the issue goes away, but that defeats the purpose.
Any ideas?

Turns out it was name resolution in the /etc/hosts file and the postgresql.conf.
Specifically, the /etc/hosts was like this:
vagrant@pg1:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 pg1
....
10.10.20.11 pg1
....
And postgresql.conf like this:
....
listen_addresses = 'localhost,10.10.20.11' # what IP address(es) to listen on;
....
So when the healthcheck tried to reach the local node on each machine, it connected by hostname (pg1, pg2, etc). With the hosts file above, that hostname first resolves to 127.0.1.1, an address PostgreSQL does not listen on, so the first attempt failed with the "Connection refused" error. The healthcheck then tried 10.10.20.11, which succeeded. That also explains why healthchecks of remote hosts produced no errors.
I changed the hosts file to the following:
vagrant@pg1:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 pg1-local
....
10.10.20.11 pg1
....
And the logs are clear.
This is Debian specific, as Red Hat-based distros don't have a "127.0.1.1 hostname" record in their /etc/hosts.
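To see which addresses a connection by hostname may end up trying, here is a minimal Python sketch that reads hosts-file text in file order. The function name `addresses_for`, the sample data, and the `LISTENING` set are illustrative assumptions modelled on the files above; real name resolution also depends on nsswitch.conf and DNS.

```python
def addresses_for(hosts_text, name):
    """Return every address that hosts-file text maps to `name`, in file order."""
    found = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            found.append(address)
    return found


# Sample data modelled on the hosts files above (elided lines left out).
HOSTS_BEFORE = """127.0.0.1 localhost
127.0.1.1 pg1
10.10.20.11 pg1
"""
HOSTS_AFTER = """127.0.0.1 localhost
127.0.1.1 pg1-local
10.10.20.11 pg1
"""
# Assumption: listen_addresses = 'localhost,10.10.20.11' binds these two IPs.
LISTENING = {"127.0.0.1", "10.10.20.11"}

for label, text in (("before", HOSTS_BEFORE), ("after", HOSTS_AFTER)):
    candidates = addresses_for(text, "pg1")
    refused = [a for a in candidates if a not in LISTENING]
    print(label, candidates, "not listened on:", refused)
```

With the original file, pg1 maps to both 127.0.1.1 and 10.10.20.11, and only the second is in the listening set. After the rename, pg1 maps to 10.10.20.11 alone.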

postgres doesnt restart after making the necessary .conf file changes

apologies first. On a steep learning curve more like a wall and will be verbose and lacking in jargon but too old to change that now. I'm trying to access a Postgresql-13 (with postgis-3 extensions) database from a machine other than the one where it is hosted. Before doing anything at all other than install, create and consume a spatial data file, this is the partial screen dump from
sudo netstat -ltpn
Proto  Recv-Q  Send-Q  Local Address   Foreign Address  State   PID/Program name
tcp    0       0       127.0.0.1:5432  0.0.0.0:*        LISTEN  993/postgres
... and the database is accessible from pgadmin4 on the Ubuntu 20.04 machine which is hosting it. I can also connect to the database in QGIS on the same Ubuntu machine through 127.0.0.1 and port 5432. Nothing special about that other than knowing it works. What I want to do is connect to that database from any machine running QGIS (or another GIS platform which can consume postgis).
Port forwarding rules on the router are set for 80, 8080 and 5900 to point to 10.0.0.55, which is the IP of the Ubuntu 20.04 host. I have a dynamic DNS pointing to the router's IP to give it the name http://blah.blah.net (not actually that, but close).
pgadmin4 is installed and configured to run in server mode and I can access the database through pgadmin4 from the host or any other computer using http://blah.blah.net/pgadmin4. Not sure I got this config quite right but it works. Geoserver is also running smoothly on Tomcat9, accessible from anywhere through http://blah.blah.net:8080/Geoserver/web/. And a fancy front end in the making on http://blah.blah.net.
Have then done the following mods to the two .conf files in /etc/postgresql/13/main, which are well noted and documented already ...
host all all 0.0.0.0/0 md5
in the pg_hba.conf
and also ...
listen_addresses='*' (and removed the leading #)
port = 5432 (was already)
in the postgresql.conf
Then restarted postgres with ...
sudo service postgresql restart
... and then port 5432 disappears completely and the database is not accessible from anywhere. Not even the host computer. Checked that postgresql was running with ...
sudo systemctl status postgresql
... which it appears to be. But am getting nothing from QGIS or pgadmin4. Not even from the host machine.
Got a lot of questions but the most obvious is am I missing something? Not sure if I need a rule at the router for port 5432 - did try that in a similar fashion to the rules above but it didn't change anything. Appreciate any help. Cheers ... R
The local address in the netstat output:
127.0.0.1:5432
indicates that PostgreSQL is only listening on the loopback interface. Since you changed listen_addresses in postgresql.conf, I conclude that you forgot to restart PostgreSQL.
To confirm, run this query:
SELECT setting, pending_restart
FROM pg_settings
WHERE name = 'listen_addresses';
It should either show the wrong value (if you didn't reload or changed the wrong file altogether) or show pending_restart as TRUE (if you reloaded, but didn't restart).
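As a complement to the query, the "Local Address" column from netstat can be checked mechanically. This is a minimal Python sketch, assuming entries in netstat's `host:port` form. The helper name `loopback_only` is hypothetical.

```python
import ipaddress


def loopback_only(local_addresses):
    """True if every 'host:port' entry is bound to a loopback address."""
    # rsplit keeps IPv6 hosts such as '::1' or '::' intact.
    hosts = [entry.rsplit(":", 1)[0] for entry in local_addresses]
    return bool(hosts) and all(ipaddress.ip_address(h).is_loopback for h in hosts)


# Value taken from the netstat output in the question.
print(loopback_only(["127.0.0.1:5432"]))
# What netstat typically shows once listen_addresses = '*' is in effect.
print(loopback_only(["0.0.0.0:5432", ":::5432"]))
```

If the first form is still reported after the change, the server is still running with the old setting.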

pgpool-II Session to Watchdog Delegate Ip Terminated When Either Primary or Standby Node Failed

I'm trying to set up a Postgres cluster of two nodes (primary and standby). In order to activate automatic failover, I'm using pgpool-II.
I followed the following article: https://www.pgpool.net/docs/41/en/html/example-cluster.html
and the only thing I did differently was install PostgreSQL version 12 instead of version 11.
I'm testing this with two CentOS 7 images on VMware, and I ran into the following issues:
When I run systemctl status pgpool.service on both nodes, it returned success.
Also I can access postgresql using the watchdog delegate IP.
But when testing failover, everything goes wrong.
Scenario 1:
I accessed my database using the watchdog delegate IP.
I disconnected the standby server.
Result: My session to PostgreSQL continued to work for less than a minute and then it failed. I was unable to connect again until I reconnected the standby node and restarted the pgpool service.
Scenario 2:
I accessed my database using the watchdog delegate IP.
I disconnected the primary server.
Result: My session stopped immediately, and the standby server was not promoted to master.
I noticed something (might be related to the above described problem): when I try to run the following command
psql 192.168.220.146 -p 9999 -U postgres -c "show pool_nodes"
it fails to work and returned the following:
psql: error: could not connect to server: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.9999"
However, if I run: psql 192.168.220.160 -p 5432 -U postgres
it works fine and I can access the postgres shell.
My pool_hba file:
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
host all pgpool 0.0.0.0/0 scram-sha-256
host all postgres 0.0.0.0/0 scram-sha-256
Any help would be appreciated.
I followed the same article: https://www.pgpool.net/docs/41/en/html/example-cluster.html and the only difference is that I installed PostgreSQL version 11.
I cannot ping delegate_IP = '192.168.1.233'. Can you help me?
Thank you.
You are not passing the -h argument to psql to specify the IP address. So psql is effectively trying to connect through the Unix domain socket and treating the IP address in the command as the database name.
Try putting -h before the IP address:
psql -h 192.168.220.146 -p 9999 -U postgres -c "show pool_nodes"
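The difference can be illustrated with a toy model of psql's documented synopsis, `psql [option...] [dbname [username]]`. This is a Python sketch built on argparse, not psql's actual parser, and the name `psql_like_parser` is hypothetical.

```python
import argparse


def psql_like_parser():
    """Toy model: -h/-p/-U/-c options plus optional positional dbname and username."""
    # add_help=False frees -h so it can mean "host", as it does in psql.
    parser = argparse.ArgumentParser(prog="psql-model", add_help=False)
    parser.add_argument("-h", dest="host")
    parser.add_argument("-p", dest="port")
    parser.add_argument("-U", dest="user")
    parser.add_argument("-c", dest="command")
    parser.add_argument("dbname", nargs="?")
    parser.add_argument("username", nargs="?")
    return parser


parser = psql_like_parser()
common = ["-p", "9999", "-U", "postgres", "-c", "show pool_nodes"]

without_h = parser.parse_args(["192.168.220.146"] + common)
with_h = parser.parse_args(["-h", "192.168.220.146"] + common)

# Without -h the address lands in dbname and host stays unset,
# which is why psql falls back to the Unix domain socket.
print("without -h:", "host =", without_h.host, "dbname =", without_h.dbname)
print("with -h:   ", "host =", with_h.host, "dbname =", with_h.dbname)
```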

Postgres failing to start on Windows Server 2016 in Azure

I am having issues starting Postgres 9.6.6 on Windows Server 2016 on an Azure VM.
When I try to start Postgres it generates a log file with the following:
LOG: could not bind IPv6 socket: Permission denied
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
LOG: could not bind IPv4 socket: Permission denied
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
WARNING: could not create listen socket for "localhost"
FATAL: could not create any TCP/IP sockets
LOG: database system is shut down
It's a brand new VM. I've amended listen_addresses in postgresql.conf to include 127.0.0.1 but I still get the same binding errors.
If I run nslookup on the VM I receive back the following;
Server: Unknown
Address: 148.43.119.15
*** UnKnown can't find localhost: Non-existent domain
So I think the fact that lookup is failing for localhost is possibly causing the problem. I've amended the hosts file on the VM to have:
127.0.0.1 localhost
:1 localhost
But the same errors are occurring, so I think it's something that I've not set up in terms of networking, but I'm not sure where to look.
If my understanding is right, 148.43.119.15 is your VM's public IP. PostgreSQL cannot listen on that IP; I get the same result with a similar setting.
Please modify postgresql.conf:
listen_addresses = '*'
On an Azure VM, if you want to access PostgreSQL through the public IP, you also need to open the port in the Azure NSG and in Windows Firewall.
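To check outside PostgreSQL whether the operating system allows binding a given address at all, a small Python sketch such as the following may help. The helper name `try_bind` is hypothetical. Port 0 lets the OS pick a free port; pass 5432 (with PostgreSQL stopped) to test the real one.

```python
import socket


def try_bind(address, port=0):
    """Try to bind a TCP socket; return (True, None) or (False, error text)."""
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((address, port))
    except OSError as exc:
        # The OS error text shows whether the problem is permissions,
        # an address that is not local, or a port already in use.
        return False, str(exc)
    return True, None


print(try_bind("127.0.0.1"))
```

If this also fails with a permission error for 127.0.0.1, the cause is in the OS or security software and not in postgresql.conf.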

Heroku Postgresql pg:psql No Route to host

For now I am just trying to make the heroku pg:psql command work but my final purpose is to copy a database that I have on my computer (localhost) to the heroku postgresql database with the pg:push command.
For now when I simply try to access the database that I created on heroku, the heroku pg:psql command returns:
psql: could not connect to server: No route to host
Is the server running on host "ec*-**-***-***-**.eu-west-1.compute.amazonaws.com" (**.***.***.**) and accepting
TCP/IP connections on port 5432?
postgresql.conf: (the lines are not commented)
listen_address ='*'
port = 5432
ssl = true
and host all all **.***.***.** trust in pg_hba.conf
I also tried to add rules to iptables in order to give access to the database from the host IP address provided by heroku.
I am on a Debian computer, how can I solve this?
psql: could not connect to server: No route to host
It means your PostgreSQL server is not starting up or is starting up on a different port.
Solutions you may try:
Check that the PostgreSQL service is running with the command ps -ef | grep postgres (the process name is lowercase).
Check which port PostgreSQL is listening on with the command netstat -tupln | grep postgres.
Make sure your server allows UDP on the loopback interface, because PostgreSQL uses a loopback UDP socket for the stats collector service.
Check the startup logs or the database logs in pg_log for details about the problem.
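The port check can also be run from the client side. The sketch below is a minimal Python probe that reports how a TCP connection attempt fails. The function name `probe_tcp` and the three-second timeout are assumptions made for illustration.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt a TCP connection; return 'open' or a short failure description."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timed out"
    except socket.gaierror as exc:
        return f"name resolution failed: {exc}"
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. ECONNREFUSED.
        name = errno.errorcode.get(exc.errno, "unknown")
        return f"{name}: {exc.strerror}"


if __name__ == "__main__":
    # Replace host and port with the server from the error message.
    print(probe_tcp("127.0.0.1", 5432))
```

The same error text that psql prints appears in the result, so the probe separates network problems from PostgreSQL authentication or configuration problems.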

Can't connect to Postgres server

I am attempting to run pg_basebackup in order to create a slave server, but I keep getting this error:
pg_basebackup: could not connect to server: could not connect to server: No route to host
Is the server running on host "192.168.1.164" and accepting
TCP/IP connections on port 5432?
On the 192.168.1.164 server, the postgresql.conf file has:
listen_addresses = '*' # originally 'localhost, 192.168.1.63'
port = 5432
and the pg_hba.conf file has:
host replication replicator 192.168.1.63/32 md5
where 192.168.1.63 is the slave server.
The link between the two machines is fine since I can SSH from either one to the other using those IPs. Also, the postgresql service is started on the master and stopped on the slave. The master has a Postgres user replicator.
I am running both machines with PostgreSQL 9.4.4 and Fedora 22.
EDIT: from the master's psql, running SHOW config_file; and SHOW hba_file; matches with the files I've been editing, and of course, the server was restarted after the edits.
It turns out this was a firewall issue. The solution is this:
firewall-cmd --permanent --add-port=5432/tcp
firewall-cmd --add-port=5432/tcp
Note: I came from Ubuntu which doesn't have this port blocked, so I didn't realize it needed to be opened up.
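For reading such errors, a rough lookup of what the common connection errors tend to indicate on Linux can be sketched in Python. The hints are rules of thumb added here as assumptions, not something stated in the thread, and `hint_for` is a hypothetical name.

```python
import errno

# Rules of thumb only; the same errno can have other causes.
HINTS = {
    errno.ECONNREFUSED: "host reachable, but nothing accepts connections on that address and port",
    errno.EHOSTUNREACH: "no route to the host, or a firewall rejected the packet with an ICMP reply",
    errno.ETIMEDOUT: "packets silently dropped, or the host is down",
}


def hint_for(err_no):
    """Return a rule-of-thumb explanation for a connection errno."""
    return HINTS.get(err_no, "no rule of thumb for this error")


for code in (errno.ECONNREFUSED, errno.EHOSTUNREACH, errno.ETIMEDOUT):
    print(errno.errorcode[code], "->", hint_for(code))
```

A "No route to host" error between machines that can reach each other over SSH therefore points towards a per-port firewall rule, which matches the fix above.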