Issue in PostgreSQL HA mode: switching of master node fails - postgresql

I am new to PostgreSQL configuration. I am trying to configure PostgreSQL in HA mode with the help of Pgpool-II and an Elastic IP. The full setup runs on AWS RHEL 8 servers.
Pgpool-II version: 4.1.2
PostgreSQL version: 12
I followed these links during the configuration:
https://www.pgpool.net/docs/pgpool-II-4.1.2/en/html/example-cluster.html#EXAMPLE-CLUSTER-STRUCTURE
https://www.pgpool.net/docs/42/en/html/example-aws.html
https://www.enterprisedb.com/docs/pgpool/latest/03_configuring_connection_pooling/
Currently the PostgreSQL and Pgpool-II services are up on all 3 nodes. But if I stop the master PostgreSQL service/server, the whole setup goes down and the standby node does not take over as master. Here is the status of the pool nodes when the master is down:
 node_id |   hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | server1      | 5432 | down   | 0.333333  | standby | 0          | false             | 0                 |                   |                        | 2022-10-12 12:10:13
 1       | server2      | 5432 | up     | 0.333333  | standby | 0          | true              | 0                 |                   |                        | 2022-10-13 09:16:07
 2       | server3      | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 |                   |                        | 2022-10-13 09:16:07
Any help would be appreciated. Thanks in advance.
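For pgpool to promote a standby, failover_command must be set in pgpool.conf and the script it points to must succeed on the node running pgpool. As a reference point, here is a minimal sketch along the lines of the example-cluster document linked above; the script paths and the exact placeholder list are assumptions and must match your own setup:

failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R'
follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %H %M %P %r %R'

Here %d/%h/%p/%D describe the failed node, %m/%H/%r/%R the new master, and %M/%P the old master and old primary. If failover_command is empty, or the script fails (a common cause is missing passwordless SSH between the nodes), pgpool marks the failed node down but never promotes a standby, which would match the pool_nodes output above.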

Related

pg_wal is very big in size

I have a Postgres cluster with 3 nodes: etcd + Patroni + PostgreSQL 13.
The problem is a constantly growing pg_wal folder, which now contains 5127 files. After searching the internet, I found an article advising to pay attention to the following database parameters (their values at the time of the incident were):
archive_mode off;
wal_level replica;
max_wal_size 1G;
postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+------------
slot_name | db2
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 2247228
xmin |
catalog_xmin |
restart_lsn | 2D/D0ADC308
confirmed_flush_lsn |
wal_status | reserved
safe_wal_size |
-[ RECORD 2 ]-------+------------
slot_name | db1
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 2247227
xmin |
catalog_xmin |
restart_lsn | 2D/D0ADC308
confirmed_flush_lsn |
wal_status | reserved
safe_wal_size |
All other functionality of the Patroni cluster works (switchover, reinit, replication).
root@srvdb3:~# patronictl -c /etc/patroni/patroni.yml list
+ Cluster: mobile (7173650272103321745) --+----+-----------+
| Member | Host       | Role    | State   | TL | Lag in MB |
+--------+------------+---------+---------+----+-----------+
| db1    | 10.01.1.01 | Replica | running | 17 |         0 |
| db2    | 10.01.1.02 | Replica | running | 17 |         0 |
| db3    | 10.01.1.03 | Leader  | running | 17 |           |
+--------+------------+---------+---------+----+-----------+
Patroni configuration (from patronictl edit-config):
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    checkpoint_timeout: 30
    hot_standby: 'on'
    max_connections: '1100'
    max_replication_slots: 5
    max_wal_senders: 5
    shared_buffers: 2048MB
    wal_keep_segments: 5120
    wal_level: replica
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 100
Please help: what could be the problem?
This is what I see in pg_stat_archiver:
postgres=# select * from pg_stat_archiver;
-[ RECORD 1 ]------+------------------------------
archived_count | 0
last_archived_wal |
last_archived_time |
failed_count | 0
last_failed_wal |
last_failed_time |
stats_reset | 2023-01-06 10:21:45.615312+00
If you have wal_keep_segments set to 5120, it is completely normal if you have 5127 WAL segments in pg_wal, because PostgreSQL will always retain at least 5120 old WAL segments. If that is too many for you, reduce the parameter. If you are using replication slots, the only disadvantage is that you might only be able to pg_rewind soon after a failover.
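For example, since the parameter is managed by Patroni in this setup, it would be changed through patronictl rather than in postgresql.conf directly; a minimal sketch (512 is an arbitrary example value, not a recommendation):

patronictl -c /etc/patroni/patroni.yml edit-config
# in the editor, lower the retention, e.g.:
#   wal_keep_segments: 512
# then verify on the leader:
psql -U postgres -c "SHOW wal_keep_segments;"
psql -U postgres -c "SELECT count(*) FROM pg_ls_waldir();"

pg_ls_waldir() (available since PostgreSQL 10) counts everything in pg_wal, including a few non-segment files, so expect the count to stay slightly above the configured retention.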

Unable to create new database

When I create a new MySQL db, SlashDB's test connection fails.
Here is how I log into mysql:
$ mysql -u 7stud -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.5.5-10.4.13-MariaDB Homebrew
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| chat |
| ectoing_repo |
| ejabberd |
| information_schema |
| mydb |
| mysql |
| performance_schema |
| test |
+--------------------+
8 rows in set (0.00 sec)
mysql> use mydb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+----------------+
| Tables_in_mydb |
+----------------+
| cheetos |
| greetings |
| mody |
| people |
+----------------+
4 rows in set (0.00 sec)
mysql> select * from people;
+----+--------+------+
| id | name | info |
+----+--------+------+
| 1 | 7stud | abc |
| 2 | Beth | xxx |
| 3 | Diane | xyz |
| 4 | Kathy | xyz |
| 5 | Kathy | xyz |
| 6 | Dave | efg |
| 7 | Tom | zzz |
| 8 | David | abc |
| 9 | Eloise | abc |
| 10 | Jess | xyz |
| 11 | Jeffsy | 2.0 |
| 12 | XXX | xxx |
| 13 | XXX | xxx |
+----+--------+------+
13 rows in set (0.00 sec)
In the slashdb form for creating a new database, here is the info I entered:
Hostname: 127.0.0.1
Port: 80
Database Login: 7stud
Database Password: **
Database Name: mydb
Then I hit the "Test Connection" button, whereupon I get a spinning wheel, which disappears after a few minutes, but no "Connection Successful" message. What am I doing wrong?
Now, I'm using port 3306:
mysql> SHOW GLOBAL VARIABLES LIKE 'PORT';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| port | 3306 |
+---------------+-------+
1 row in set (0.00 sec)
but when slashdb tries to connect, I get the error:
Host localhost:3306 is not accessible
Your port is wrong in the database connection. You said that your MySQL is configured on port 3306, but you also posted your SlashDB config for the database with port 80. Please change that to 3306.
Also, check whether you need to enable remote access to MySQL. Even if your SlashDB is running on the same machine as the MySQL database, it uses TCP/IP to connect.
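To rule out the MySQL side independently of SlashDB, you can force the client to connect over TCP/IP instead of the local socket; a quick check along these lines:

$ mysql -h 127.0.0.1 -P 3306 --protocol=TCP -u 7stud -p mydb

If this fails while the plain "mysql -u 7stud -p" login works, the server is only reachable via its Unix socket; in that case look at the skip-networking and bind-address settings in my.cnf.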

PostgreSQL master server hangs on replication flow

First of all, I'm not a data engineer, so I'll do my best to give you everything needed to resolve my problem :/
Context:
I'm trying to set up 2 PostgreSQL servers: 1 master and 1 slave.
psql (PostgreSQL) 10.9 (Ubuntu 10.9-0ubuntu0.18.04.1)
As far as I understand, it's not a good idea to use synchronous replication when we only have 2 servers. But I have to understand what's going on here...
Problem:
The master server hangs when I try to execute CREATE SCHEMA test;.
The schema does get created on the master, and it exists on the slave too, but the master hangs because it waits for the slave's commit status...
Configuration of Master:
/etc/postgresql/10/main/conf.d/master.conf
# Connection
listen_addresses = '127.0.0.1,slave-ip'
ssl = on
ssl_cert_file = '/etc/ssl/postgresql/certs/server.pem'
ssl_key_file = '/etc/ssl/postgresql/private/server.key'
ssl_ca_file = '/etc/ssl/postgresql/certs/server.pem'
password_encryption = scram-sha-256
# WAL
wal_level = replica
synchronous_commit = remote_apply #local works, remote_apply hangs
# Archive
archive_mode = on
archive_command = 'rsync -av %p postgres@lab-3:/var/lib/postgresql/wal_archive_lab_2/%f'
# Replication master
max_wal_senders = 2
wal_keep_segments = 100
synchronous_standby_names = 'ANY 1 ("lab-3")'
/etc/postgresql/10/main/pg_hba.conf
hostssl replication replicate slave-ip/32 scram-sha-256
Configuration of Slave:
/etc/postgresql/10/main/conf.d/standby.conf
# Connection
listen_addresses = '127.0.0.1,master-ip'
ssl = on
ssl_cert_file = '/etc/ssl/postgresql/certs/server.pem'
ssl_key_file = '/etc/ssl/postgresql/private/server.key'
ssl_ca_file = '/etc/ssl/postgresql/certs/server.pem'
password_encryption = scram-sha-256
# WAL
wal_level = replica
# Archive
archive_mode = on
archive_command = 'rsync -av %p postgres@lab-3:/var/lib/postgresql/wal_archive_lab_3/%f'
# Replication slave
max_wal_senders = 2
wal_keep_segments = 100
hot_standby = on
/var/lib/postgresql/10/main/recovery.conf
standby_mode = on
primary_conninfo = 'host=master-ip port=5432 user=replicate password=replicate_password sslmode=require application_name="lab-3"'
trigger_file = '/var/lib/postgresql/10/postgresql.trigger'
I get absolutely NOTHING in the log files when it hangs, just this warning when I Ctrl+C to abort on the master instance:
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
Is there a way to check what happens, and why it stays stuck like this?
EDIT 1
The content of pg_stat_replication:
Before query
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state
-------+----------+-----------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------
54431 | 16384 | replicate | "lab-3" | slave-ip | | 47742 | 2019-08-06 07:56:48.105056+02 | | streaming | 0/110000D0 | 0/110000D0 | 0/110000D0 | 0/110000D0 | | | | 0 | async
(1 row)
While it hangs / after
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state
-------+----------+-----------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------------+-----------------+---------------+---------------+------------
54431 | 16384 | replicate | "lab-3" | slave-ip | | 47742 | 2019-08-06 07:56:48.105056+02 | | streaming | 0/11000C10 | 0/11000C10 | 0/11000C10 | 0/11000C10 | 00:00:00.000521 | 00:00:00.004421 | 00:00:00.0045 | 0 | async
(1 row)
Thanks!
As Laurenz Albe said, the problem was the quoting of the synchronous standby name.
The documentation explains that the name should be quoted in the synchronous_standby_names entry on the master server if it contains a dash, but it must not be quoted in the primary_conninfo value on the slave.
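Concretely, that means the pair of settings from the question should look like this (a sketch based on the answer above, reusing the same placeholder host names):

# master: /etc/postgresql/10/main/conf.d/master.conf
synchronous_standby_names = 'ANY 1 ("lab-3")'

# slave: /var/lib/postgresql/10/main/recovery.conf
primary_conninfo = 'host=master-ip port=5432 user=replicate password=replicate_password sslmode=require application_name=lab-3'

With application_name="lab-3" in primary_conninfo, the quotes become part of the value, so the standby registers under a name that never matches synchronous_standby_names; the master then waits forever for a synchronous standby that does not exist, and pg_stat_replication keeps reporting sync_state = async, exactly as in the output above.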

pgpool-II 3.7.5 not caching PG connections

Shouldn't pgpool cache PG backend processes? After disconnecting and reconnecting, pool_backendpid changes.
Relevant parameters:
num_init_children = 1
max_pool = 1
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
connection_cache = on
Test:
postgres@node3:/etc/pgpool2$ psql -p 5433 -U postgres postgres
psql (9.6.10)
Type "help" for help.
postgres=# show pool_pools;
LOG: statement: show pool_pools;
pool_pid | start_time | pool_id | backend_id | database | username | create_time | majorversion | minorversion | pool_counter | pool_backendpid | pool_connected
----------+---------------------+---------+------------+----------+----------+---------------------+--------------+--------------+--------------+-----------------+----------------
3569 | 2018-09-13 20:18:22 | 0 | 0 | postgres | postgres | 2018-09-13 20:25:04 | 3 | 0 | 1 | 3631 | 1
(1 row)
postgres=# \q
postgres@node3:/etc/pgpool2$ psql -p 5433 -U postgres postgres
psql (9.6.10)
Type "help" for help.
postgres=# show pool_pools;
LOG: statement: show pool_pools;
pool_pid | start_time | pool_id | backend_id | database | username | create_time | majorversion | minorversion | pool_counter | pool_backendpid | pool_connected
----------+---------------------+---------+------------+----------+----------+---------------------+--------------+--------------+--------------+-----------------+----------------
3569 | 2018-09-13 20:18:22 | 0 | 0 | postgres | postgres | 2018-09-13 20:25:15 | 3 | 0 | 1 | 3640 | 1
(1 row)
Found out why:
connection_cache (boolean)
Caches connections to backends when set to on. Default is on. However,
connections to template0, template1, postgres and regression databases
are not cached even if connection_cache is on.
I was connecting to the postgres database.
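To see the caching work, the same test can be repeated against a non-system database; a sketch, assuming a database named mydb exists (create it with createdb first if needed):

postgres@node3:/etc/pgpool2$ psql -p 5433 -U postgres mydb -c 'show pool_pools;'
postgres@node3:/etc/pgpool2$ psql -p 5433 -U postgres mydb -c 'show pool_pools;'

With connection_cache = on, pool_backendpid should now stay the same across both invocations, because mydb is not on the excluded list (template0, template1, postgres, regression).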

Unable to start HandlerSocket with mariadb

For some reason, I cannot get HandlerSocket to start listening when I start MariaDB (version 10.0.14). I am using CentOS 6.5.
my.cnf has the following settings:
handlersocket_port = 9998
handlersocket_port_wr = 9999
handlersocket_address = 127.0.0.1
Calling "SHOW GLOBAL VARIABLES LIKE 'handlersocket%'" from the mariaDb prompt shows:
+-------------------------------+-----------+
| Variable_name | Value |
+-------------------------------+-----------+
| handlersocket_accept_balance | 0 |
| handlersocket_address | 127.0.0.1 |
| handlersocket_backlog | 32768 |
| handlersocket_epoll | 1 |
| handlersocket_plain_secret | |
| handlersocket_plain_secret_wr | |
| handlersocket_port | 9998 |
| handlersocket_port_wr | 9999 |
| handlersocket_rcvbuf | 0 |
| handlersocket_readsize | 0 |
| handlersocket_sndbuf | 0 |
| handlersocket_threads | 16 |
| handlersocket_threads_wr | 1 |
| handlersocket_timeout | 300 |
| handlersocket_verbose | 10 |
| handlersocket_wrlock_timeout | 12 |
+-------------------------------+-----------+
I can start MariaDB successfully, but when I check to see which ports are actively listening, neither 9998 nor 9999 shows up. I've checked the mysqld.log file, but no errors seem to be occurring.
Answering my own question here:
SELinux needed to be set to permissive mode to get HandlerSocket started.
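For reference, a sketch of that change on CentOS 6 (the service name may differ in your installation):

# switch to permissive immediately (lasts until reboot)
sudo setenforce 0
# make it persistent: set SELINUX=permissive in /etc/selinux/config
sudo service mysql restart
# confirm HandlerSocket is now listening
netstat -tlnp | grep -E '9998|9999'

Note that permissive mode relaxes SELinux for the whole system; a narrower alternative would be to allow just the HandlerSocket ports with a targeted policy (e.g. via semanage port).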