I am using Postgres v14.4 and would like to use Barman to back up this instance. WAL archives are not needed, hence archive_mode = off.
The problem is that Barman refuses to make a backup with archive mode disabled. Is there a way to take a consistent backup with Barman without keeping the archives?
Below is the barman configuration used:
[xxxxxxx]
description = "Cluster x of type oltp"
active = true
archiver = true
streaming_archiver = false
; The mandatory connection info to PostgreSQL
conninfo = host=xxxxxxx port=xxxxx user=barman dbname=postgres
; Activate pg_basebackup
backup_method = postgres
reuse_backup = off
; The mandatory connection info to login with REPLICATION privileges
streaming_conninfo = host=xxxxxxxxx port=xxxxx user=streaming_barman
slot_name = barman_ge
; recover WAL files with barman-wal-restore command
recovery_options = get-wal
path_prefix=/usr/pgsql-14/bin
backup_directory=/barman/aa/bb
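For what it's worth, Barman does need WAL in some form to make a backup consistent, but it does not require archive_mode = on on the server if the WAL is taken over the streaming replication connection instead. Below is a rough sketch of that variant of the configuration above, assuming a Barman 2.x release with pg_receivewal available under path_prefix (not a verified setup, just the shape it would take):
[xxxxxxx]
description = "Cluster x of type oltp"
active = true
; ship WAL over the replication protocol instead of archive_command
archiver = false
streaming_archiver = true
conninfo = host=xxxxxxx port=xxxxx user=barman dbname=postgres
backup_method = postgres
streaming_conninfo = host=xxxxxxxxx port=xxxxx user=streaming_barman
slot_name = barman_ge
recovery_options = get-wal
path_prefix=/usr/pgsql-14/bin
backup_directory=/barman/aa/bb
With this, archive_mode = off can stay on the PostgreSQL side; the trade-off is that Barman itself still keeps the WAL it streams, so it is not a completely archive-free backup.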
So I am running a k8s cluster with a 3-pod Postgres cluster fronted by a 3-pod PgBouncer cluster. Connecting to that is a batch job with multiple parallel workers which stream data into the database via PgBouncer. If I run 10 of these batch job pods, everything works smoothly. If I go up an order of magnitude to 100 job pods, a large portion of them fail to connect to the database with the error got error driver: bad connection. Multiple workers run on the same node (5 worker pods per node), so it's only ~26 pods in the k8s cluster.
What's maddening is that I'm not seeing any Postgres or PgBouncer error/warning logs in Kibana, and their pods aren't failing. Prometheus monitoring also shows the connection count to be well under max_connections.
Below are the Postgres and PgBouncer configs along with the connection code of the workers.
Relevant Connection Code From Worker:
err = backoff.Retry(func() error {
    p.connection, err = gorm.Open(postgres.New(postgres.Config{
        DSN: p.postgresUrl,
    }), &gorm.Config{Logger: newLogger})
    return err
}, backoff.NewExponentialBackOff())
if err != nil {
    log.Panic(err)
}
Postgres Config:
postgresql:
parameters:
max_connections = 200
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 6990kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 6
max_parallel_workers_per_gather = 3
max_parallel_workers = 6
max_parallel_maintenance_workers = 3
PgBouncer Config:
[databases]
* = host=loot port=5432 auth_user=***
[pgbouncer]
listen_port = 5432
listen_addr = *
auth_type = md5
auth_file = /pgconf/users.txt
auth_query = SELECT username, password from pgbouncer.get_auth($1)
pidfile = /tmp/pgbouncer.pid
logfile = /dev/stdout
admin_users = ***
stats_users = ***
default_pool_size = 20
max_client_conn = 600
max_db_connections = 190
min_pool_size = 0
pool_mode = session
reserve_pool_size = 0
reserve_pool_timeout = 5
query_timeout = 0
ignore_startup_parameters = extra_float_digits
Screenshot of Postgres DB Stats
Things I've tried:
Having the jobs connect directly to the cluster IP of the Pgbouncer service to rule out DNS.
Increasing PgBouncer connection pool
I'm not sure what the issue is here, since I don't have any errors from the DB side to fix and only a basic error message from the job side. Any help would be appreciated, and I can add more context if a key piece is missing.
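(For anyone debugging something similar: PgBouncer's admin console shows per-pool and per-client state at the moment the workers fail, which would confirm whether the limits above are actually being hit. This is standard PgBouncer, sketched here with the listen port from the config and placeholder host/user:)
# psql -h <pgbouncer-service> -p 5432 -U *** pgbouncer
pgbouncer=# SHOW POOLS;
pgbouncer=# SHOW CLIENTS;
pgbouncer=# SHOW STATS;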
This ended up being an issue of Postgres not actually using the ConfigMap I had set. The map specified 200 connections, but the actual DB was still at the default of 100.
Not much to learn here other than to make sure the configs you set actually propagate to the running service.
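A quick way to confirm what the running server is actually using (rather than what the ConfigMap says) is to ask it directly; these are standard PostgreSQL commands, e.g.:
postgres=# SHOW max_connections;
postgres=# SELECT count(*) FROM pg_stat_activity;
postgres=# SELECT name, setting, source, sourcefile FROM pg_settings WHERE name = 'max_connections';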
I have a two-node Postgres active/standby cluster that I created with repmgr. The issue is that automatic failover does not happen when I stop the Postgres service on the master node. The contents of repmgr.conf on the master are as follows:
node_id=1
data_directory='/data/pgdatabase/masterdb/data'
node_name=node1
conninfo='host=IP-Of-Master user=repmgr dbname=repmgr'
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/11/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/11/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/var/log/repmgr/repmgr.log'
log_level=NOTICE
reconnect_attempts=4
reconnect_interval=5
repmgrd_service_start_command='sudo systemctl repmgr11 start'
repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
Contents of postgresql.conf are as follows:
listen_addresses = '*'
shared_preload_libraries = 'repmgr'
max_wal_senders = 15
max_replication_slots = 15
wal_level = 'replica'
hot_standby = on
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/11/archive/%f'
The contents are the same on the master and the slave, except for the node name, which is node2 on the slave.
Can anyone tell me what the possible reason is for automatic failover not happening?
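(A quick way to check whether repmgrd is actually running and sees both nodes, using the repmgr.conf path from the config above, is something like:)
# repmgr -f /etc/repmgr/11/repmgr.conf cluster show
# ps -ef | grep [r]epmgrd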
I have resolved it myself. I needed to create the file /etc/default/repmgrd and add the following lines to it:
REPMGRD_ENABLED=yes
REPMGRD_CONF="/etc/repmgr.conf"
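With that file in place, the daemon still needs to be (re)started so it picks up the setting; assuming the repmgr11 unit name referenced in the config above:
# systemctl restart repmgr11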
For the second day now I cannot get past a connection error through PgBouncer when I use auth_type = hba:
postgres=# create user monitoring with password 'monitoring';
postgres=# create database monitoring owner monitoring;
postgres=# \du+ monitoring
List of roles
Role name | Attributes | Member of | Description
------------+------------+-----------+-------------
monitoring | | {} |
postgres=# \l+ monitoring
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
------------+------------+----------+-------------+-------------+-------------------+---------+------------+-------------
monitoring | monitoring | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 7861 kB | pg_default |
/var/lib/pgsql/10/data/pg_hba.conf:
# TYPE DATABASE USER ADDRESS METHOD
host monitoring monitoring 0.0.0.0/0 trust
local monitoring monitoring trust
/etc/pgbouncer/pgbouncer.ini:
pidfile = /var/run/pgbouncer/pgbouncer.pid
reserve_pool_size = 5
reserve_pool_timeout = 2
listen_port = 6432
listen_addr = *
auth_type = hba
auth_hba_file = /etc/pgbouncer/hba_bouncer.conf
auth_file = /etc/pgbouncer/userlist.txt
logfile = /var/log/pgbouncer/pgbouncer.log
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1
max_client_conn = 5000
server_idle_timeout = 30
pool_mode = transaction
server_reset_query =
admin_users = root
stats_users = root,monitoring
[databases]
* = client_encoding=UTF8 host=localhost port=5432 pool_size=1000
In PgBouncer's hba file I also tried specifying the specific addresses of the server's interfaces with a /32 mask, as well as /8 and /16 (the real mask of my network segment).
The result is always the same: login rejected!
/etc/pgbouncer/hba_bouncer.conf:
host monitoring monitoring 0.0.0.0/0 trust
host monitoring monitoring 127.0.0.1/32 trust
/etc/pgbouncer/userlist.txt:
"monitoring" "monitoring"
Connection attempt:
# psql -U monitoring -p 5432 -h 127.0.0.1
psql (10.1)
Type "help" for help.
monitoring=>
# psql -U monitoring -p 6432 -h 127.0.0.1
psql: ERROR: login rejected
We have a use case similar to yours. We are running version 1.12.0, and we ran into the same issue where we also got the "ERROR: login rejected" message.
It turned out after investigation that the permissions on our pg_hba.conf for PgBouncer were incorrect. Once we gave PgBouncer read permission, it worked as expected. Unfortunately, nothing in the more verbose logging we turned on revealed this; we happened to stumble across the solution through our own testing.
PS: we left the password hash in the PgBouncer userlist as "" since we're using trust for our connections. Otherwise I don't think there is anything different between our config and what you posted.
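For reference, the kind of check and fix described above would look roughly like this; the path is the one from the question, while the pgbouncer user and group names are an assumption that depends on how PgBouncer was installed:
# ls -l /etc/pgbouncer/hba_bouncer.conf
# chown pgbouncer:pgbouncer /etc/pgbouncer/hba_bouncer.conf
# chmod 640 /etc/pgbouncer/hba_bouncer.conf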
I have two Postgres databases set up in a Primary/Secondary configuration. I tried to set up replication between them but have hit a roadblock. Where am I going wrong?
I have checked various configuration files: recovery.conf, postgresql.conf, pg_hba.conf, and all seem to be set up correctly.
This is the error I have found in the pg_log folder:
cp: cannot stat ‘/var/lib/pgsql/walfiles/00000002000001CA0000003E’: No such file or directory
cp: cannot stat ‘/var/lib/pgsql/walfiles/00000003.history’: No such file or directory
2019-04-16 16:17:19 AEST FATAL: database system identifier differs between the primary and standby
2019-04-16 16:17:19 AEST DETAIL: The primary's identifier is 6647133350114885049, the standby's identifier is 6456613398298492847.
I am using PostgreSQL 9.2.23.
This is my recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=10.201.108.25 port=5432 user=repl-master password=111222333'
restore_command = 'cp -p /var/lib/pgsql/walfiles/%f %p'
trigger_file = '/var/lib/pgsql/i_am_master.pg.trigger'
recovery_target_timeline = 'latest'
archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/walfiles %r'
I'd expect replication from Primary to Secondary. So far, nothing.
Appreciate any input/ideas.
You didn't set up replication correctly: the "database system identifier differs" error means the standby's data directory was not created from a physical copy of this primary. You cannot use pg_dump to create the replica; you have to use a physical backup technique like pg_basebackup.
See the documentation for details.
Do not use PostgreSQL 9.2, it is out of support.
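For reference, the physical copy would be taken roughly like this on the standby (with the standby stopped and its data directory emptied first); the host and user come from the recovery.conf in the question, the data directory path is only an assumption, and on 9.2 the recovery.conf still has to be written by hand afterwards:
# pg_basebackup -h 10.201.108.25 -p 5432 -U repl-master -D /var/lib/pgsql/9.2/data -X stream -P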
I have the following problem: I am trying to set up a streaming replication scenario with load balancing. I have read various tutorials, but I cannot find my mistake. The replication does not work, and I do not have a "wal sender/receiver process". Archiving works, and every time the master restarts, the archived WAL files are copied to the slave. I do not even get any error, and in the configuration file(s) everything looks fine to me, e.g. on the master:
wal_level = hot_standby
wal_keep_segments = 32
max_wal_senders = 5
max_replication_slots = 5
wal_sender_timeout = 60s
What irritates me the most is that there is no "wal sender process" and there is no error thrown.
Thank you for any ideas,
Sven
UPDATE 1: my recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=arcserver1 port=5432 user=postgres pass=postgres'
restore_command = 'pg_standby /db/pg_archived %f %p >> /var/log/standby.log'
primary_slot_name='standby1'
and my client postgresql.conf contains:
hot_standby = on
I found the solution: I replaced pg_standby with cp, because pg_standby seems to be intended only for warm standby, not hot standby.
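For reference, the adjusted recovery.conf would then look roughly like this (keeping the host, archive directory and slot name from the update above, and spelling the password keyword out, since libpq expects password= rather than pass=):
standby_mode = 'on'
primary_conninfo = 'host=arcserver1 port=5432 user=postgres password=postgres'
restore_command = 'cp /db/pg_archived/%f %p'
primary_slot_name = 'standby1'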