I have 3 VM nodes running master-slave PostgreSQL 11, managed by Pacemaker.
Node Attributes:
* Node node04:
+ master-pgsqlins : 1000
+ pgsqlins-data-status : LATEST
+ pgsqlins-master-baseline : 000000C0D8000098
+ pgsqlins-status : PRI
* Node node05:
+ master-pgsqlins : -INFINITY
+ pgsqlins-data-status : STREAMING|ASYNC
+ pgsqlins-status : HS:async
* Node node06:
+ master-pgsqlins : 100
+ pgsqlins-data-status : STREAMING|SYNC
+ pgsqlins-status : HS:sync
At times, the async node throws an error saying that a required WAL file is missing. It then stops replication and starts it again.
On the master node, WAL archiving is enabled and the WAL files are synced to another folder named wal_archive. There is another process that keeps removing files from that wal_archive folder. So I understand why the slave node throws that error, but what I want to understand is how it is able to start replicating again without that missing file.
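(For illustration only, since the cleanup job itself isn't shown here: think of that removal process as a periodic job of roughly this shape, run every couple of minutes; the path is the same placeholder used in the configs below and the real selection logic differs.)
# hypothetical sketch of the external cleanup job, not the real script
find /xxxxx/wal_archive -type f -delete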
The postgresql.conf:
# Connection settings
# -------------------
listen_addresses = '*'
port = 5432
max_connections = 600
tcp_keepalives_idle = 0
tcp_keepalives_interval = 0
tcp_keepalives_count = 0
# Memory-related settings
# -----------------------
shared_buffers = 2GB # Physical memory 1/4
##DEBUG: mmap(1652555776) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
#huge_pages = try # on, off, or try
#temp_buffers = 16MB # depends on DB checklist
work_mem = 8MB # Need tuning
effective_cache_size = 4GB # Physical memory 1/2
maintenance_work_mem = 512MB
wal_buffers = 64MB
# WAL/Replication/HA settings
# --------------------
wal_level = logical
synchronous_commit = remote_write
archive_mode = on
archive_command = 'rsync -a %p /xxxxx/wal_archive/%f'
#archive_command = ':'
max_wal_senders = 5
hot_standby = on
restart_after_crash = off
wal_sender_timeout = 60000
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
hot_standby_feedback = on
random_page_cost = 1.5
max_wal_size = 5GB
min_wal_size = 200MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 30min
# Logging settings
# ----------------
log_destination = 'csvlog,syslog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql_%Y%m%d.log'
log_truncate_on_rotation = off
log_rotation_age = 1h
log_rotation_size = 0
log_timezone = 'Japan'
log_line_prefix = '%t [%p]: [%l-1] %h:%u#%d:[XXXPG]:CODE:%e '
log_statement = ddl
log_min_messages = info # DEBUG5
log_min_error_statement = info # DEBUG5
log_error_verbosity = default
log_checkpoints = on
log_lock_waits = on
log_temp_files = 0
log_connections = on
log_disconnections = on
log_duration = off
log_min_duration_statement = 1000
log_autovacuum_min_duration = 3000ms
track_functions = pl
track_activity_query_size = 8192
# Locale/display settings
# -----------------------
lc_messages = 'C'
lc_monetary = 'en_US.UTF-8' # ja_JP.eucJP
lc_numeric = 'en_US.UTF-8' # ja_JP.eucJP
lc_time = 'en_US.UTF-8' # ja_JP.eucJP
timezone = 'Asia/Tokyo'
bytea_output = 'escape'
# Auto vacuum settings
# -----------------------
autovacuum = on
autovacuum_max_workers = 3
autovacuum_vacuum_cost_limit = 200
#shared_preload_libraries = 'pg_stat_statements,auto_explain' <------------------check this
auto_explain.log_min_duration = 10000
auto_explain.log_analyze = on
include '/var/lib/pgsql/tmp/rep_mode.conf' # added by pgsql RA
On the async slave node, this is the recovery.conf:
primary_conninfo = 'host=1xx.xx.xx.xx port=5432 user=replica application_name=node05 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'rsync -a /xxxxx/wal_archive/%f %p'
recovery_target_timeline = 'latest'
standby_mode = 'on'
The logs about the error from the master:
2021-07-05 23:35:02.321 JST,,,28926,,60e16b42.70fe,122,,2021-07-04 17:03:14 JST,,0,LOG,00000,"checkpoint complete: wrote 2897 buffers (1.1%); 0 WAL file(s) added, 0 removed
, 2 recycled; write=106.770 s, sync=0.050 s, total=106.827 s; sync files=251, longest=0.017 s, average=0.001 s; distance=20262 kB, estimate=46658 kB",,,,,,,,,""
2021-07-05 23:35:02.322 JST,,,28926,,60e16b42.70fe,123,,2021-07-04 17:03:14 JST,,0,LOG,00000,"checkpoint starting: immediate force wait",,,,,,,,,""
2021-07-05 23:35:02.347 JST,,,28926,,60e16b42.70fe,124,,2021-07-04 17:03:14 JST,,0,LOG,00000,"checkpoint complete: wrote 173 buffers (0.1%); 0 WAL file(s) added, 0 removed,
1 recycled; write=0.007 s, sync=0.012 s, total=0.026 s; sync files=43, longest=0.005 s, average=0.001 s; distance=14410 kB, estimate=43434 kB",,,,,,,,,""
2021-07-05 23:35:02.348 JST,"replica","",3451,"1xx.xx.xx.xxx:45120",60e16bfc.d7b,3,"streaming C1/97C3E000",2021-07-04 17:06:20 JST,116/0,0,ERROR,XX000,"requested WAL segment 00000001000000C100000097 has already been removed",,,,,,,,,"node05"
2021-07-05 23:35:02.361 JST,"replica","",3451,"1xx.xx.xx.xxx:45120",60e16bfc.d7b,4,"idle",2021-07-04 17:06:20 JST,,0,LOG,00000,"disconnection: session time: 30:28:41.550 user=replica database= host=172.17.48.141 port=45120",,,,,,,,,"node05"
2021-07-05 23:35:02.399 JST,,,24896,"1xx.xx.xx.xxx:49278",60e31896.6140,1,"",2021-07-05 23:35:02 JST,,0,LOG,00000,"connection received: host=1xx.xx.xx.xxx port=49278",,,,,,,,,""
2021-07-05 23:35:02.401 JST,"postgres","postgres",24851,"[local]",60e31896.6113,3,"idle",2021-07-05 23:35:02 JST,,0,LOG,00000,"disconnection: session time: 0:00:00.251 user=postgres database=postgres host=[local]",,,,,,,,,"postgres#node04"
2021-07-05 23:35:02.403 JST,"replica","",24896,"1xx.xx.xx.xxx:49278",60e31896.6140,2,"authentication",2021-07-05 23:35:02 JST,116/72,0,LOG,00000,"replication connection authorized: user=replica",,,,,,,,,""
The logs about the error from the async slave node:
2021-07-05 23:35:02.359 JST,,,2541,,60e16bfc.9ed,2,,2021-07-04 17:06:20 JST,,0,FATAL,XX000,"could not receive data from WAL stream: ERROR: requested WAL segment 00000001000000C100000097 has already been removed",,,,,,,,,""
2021-07-05 23:35:02.408 JST,,,4703,,60e31896.125f,1,,2021-07-05 23:35:02 JST,,0,LOG,00000,"started streaming WAL from primary at C1/98000000 on timeline 1",,,,,,,,,""
2021-07-05 23:35:03.318 JST,,,4835,"[local]",60e31897.12e3,1,"",2021-07-05 23:35:03 JST,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,""
The sync slave node doesn't throw this error, only the async slave node does, and even that one recovers without any manual intervention. Is there a way to avoid this error other than to stop removing the archived WAL files from the wal_archive folder every 2 minutes?
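The only alternative I can think of (just a sketch, untested on this cluster, and the number is a guess) would be to keep more WAL on the primary itself via postgresql.conf:
# untested sketch -- the value is a guess; 128 x 16 MB segments = ~2 GB of extra WAL kept in pg_wal
wal_keep_segments = 128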
I have installed Postgres in a containerized environment using docker-compose, with the Docker image crunchydata/crunchy-postgres-gis:centos7-11.5-2.4.2. Everything was running fine until I realized that PG_DIR/pg_wal is taking up a lot of disk space. I don't want to run pg_archivecleanup by hand every time, nor from a cron job; I want to configure Postgres to do the cleanup automatically. What is the correct configuration for that?
This is my postgresql.conf file.
listen_addresses = '*' # what IP address(es) to listen on;
port = 5432 # (change requires restart)
unix_socket_directories = '/tmp' # comma-separated list of directories
unix_socket_permissions = 0777 # begin with 0 to use octal notation
temp_buffers = 8MB # min 800kB
max_connections = 400
shared_buffers = 1536MB
effective_cache_size = 4608MB
maintenance_work_mem = 384MB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 4MB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
shared_preload_libraries = 'pg_stat_statements.so' # (change requires restart)
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------
wal_level = hot_standby # minimal, archive, or hot_standby
max_wal_senders = 6 # max number of walsender processes
wal_keep_segments = 400 # in logfile segments, 16MB each; 0 disables
hot_standby = on # "on" allows queries during recovery
max_standby_archive_delay = 30s # max delay before canceling queries
max_standby_streaming_delay = 30s # max delay before canceling queries
wal_receiver_status_interval = 10s # send replies at least this often
archive_mode = on # enables archiving; off, on, or always
# (change requires restart)
archive_command = 'pgbackrest archive-push %p' # command to use to archive a logfile segment
# placeholders: %p = path of file to archive
# %f = file name only
# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
archive_timeout = 60 # force a logfile segment switch after this
# number of seconds; 0 disables
#------------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#------------------------------------------------------------------------------
log_destination = 'stderr' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_directory = 'pg_log' # directory where log files are written,
log_filename = 'postgresql-%a.log' # log file name pattern,
log_truncate_on_rotation = on # If on, an existing log file with the
log_rotation_age = 1d # Automatic rotation of logfiles will
log_rotation_size = 0 # Automatic rotation of logfiles will
log_min_duration_statement = 0 # -1 is disabled, 0 logs all statements
log_checkpoints = on
log_connections = on
log_disconnections = on
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h'
log_lock_waits = on # log lock waits >= deadlock_timeout
log_timezone = 'US/Eastern'
log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and
datestyle = 'iso, mdy'
timezone = 'US/Eastern'
lc_messages = 'C' # locale for system error message
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
Thanks
You haven't shown us any evidence that pgbackrest has anything to do with this. If it is failing, you should see messages about that in the server's log file. If it is succeeding, then the archived WAL should be taking up space in the archive, wherever that is, not in pg_wal.
But wal_keep_segments = 400 will lead to over 6.25GB of pg_wal being retained (400 segments × 16 MB each is 6.25 GB by itself). I don't know whether that constitutes "a lot" or not.
pg_archivecleanup isn't for cleaning up pg_wal; it is for cleaning up the archive.
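If what you actually want is automatic cleanup of a plain archive directory that a standby restores from, the usual place for that is archive_cleanup_command in the standby's recovery.conf, for example (the directory here is only a placeholder):
# recovery.conf on a standby -- /path/to/archivedir is a placeholder
archive_cleanup_command = 'pg_archivecleanup /path/to/archivedir %r'
With pgbackrest, though, WAL retention in the repository is normally handled by pgbackrest's own retention settings rather than by pg_archivecleanup.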
We're running OrientDB 2.1.11 (Community Edition) along with JDK 1.8.0.74.
We're noticing the memory consumption of the 'orientdb' java process slowly creeping up, and within a few days the database becomes unresponsive (we have to stop/start OrientDB in order to release the memory).
We also see the same behavior within a few hours when we index the database.
The total size of the database is only 60 GB, with no more than 200 million records!
As you can see below, the process already consumes 11.44 GB VIRT and 8.62 GB RES.
We're running CentOS 7.1.x.
We even changed the heap from 512M to 256M and set diskCache.bufferSize to 8GB:
MAXHEAP=-Xmx256m
# ORIENTDB MAXIMUM DISKCACHE IN MB, EXAMPLE, ENTER -Dstorage.diskCache.bufferSize=8192 FOR 8GB
MAXDISKCACHE="-Dstorage.diskCache.bufferSize=8192"
top output:
Tasks: 155 total, 1 running, 154 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16269052 total, 229492 free, 9510740 used, 6528820 buff/cache
KiB Swap: 8257532 total, 8155244 free, 102288 used. 6463744 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2367 nmx 20 0 11.774g 8.620g 14648 S 0.3 55.6 81:26.82 java
ps aux output:
nmx 2367 4.3 55.5 12345680 9038260 ? Sl May02 81:28 /bin/java
-server -Xmx256m -Djna.nosys=true -XX:+HeapDumpOnOutOfMemoryError
-Djava.awt.headless=true -Dfile.encoding=UTF8 -Drhino.opt.level=9
-Dprofiler.enabled=true -Dstorage.diskCache.bufferSize=8192
How do I control memory usage?
Is there a DB memory leak?
Could you set the following JVM options: -XX:+UseLargePages -XX:LargePageSizeInBytes=2m.
This should solve your issue.
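On the setup shown in the question, these flags could simply be appended next to the existing heap setting in the startup script (the exact variable layout depends on your OrientDB version, so treat this as a sketch); note that -XX:+UseLargePages also requires huge pages to be enabled at the OS level:
# sketch only -- append the flags wherever your JVM options are assembled
MAXHEAP="-Xmx256m -XX:+UseLargePages -XX:LargePageSizeInBytes=2m"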
This page solved my issue.
In a nutshell, add this configuration to your database:
// Requires: import com.orientechnologies.orient.core.config.OGlobalConfiguration;
// and, on JDK 8, import sun.misc.VM; (used for VM.maxDirectMemory() below)
static {
    // Execute a sync against the file system on every record operation.
    // This slows down record updates but guarantees reliability on unreliable drives.
    OGlobalConfiguration.NON_TX_RECORD_UPDATE_SYNCH.setValue(true);
    OGlobalConfiguration.STORAGE_LOCK_TIMEOUT.setValue(300000);
    OGlobalConfiguration.RID_BAG_EMBEDDED_TO_SBTREEBONSAI_THRESHOLD.setValue(-1);
    OGlobalConfiguration.FILE_LOCK.setValue(false);
    OGlobalConfiguration.SBTREEBONSAI_LINKBAG_CACHE_SIZE.setValue(5000);
    OGlobalConfiguration.INDEX_IGNORE_NULL_VALUES_DEFAULT.setValue(true);
    OGlobalConfiguration.MEMORY_CHUNK_SIZE.setValue(32);

    // Size the disk cache at roughly 128 MB per GB of heap (2 * 65536 / 1024 = 128).
    long maxMemory = Runtime.getRuntime().maxMemory();
    long maxMemoryGB = maxMemory / 1024L / 1024L / 1024L;
    maxMemoryGB = maxMemoryGB < 1 ? 1 : maxMemoryGB;
    long cacheSizeMb = 2 * 65536 * maxMemoryGB / 1024;
    long maxDirectMemoryMb = VM.maxDirectMemory() / 1024L / 1024L;

    // Only set the property if it wasn't already passed on the command line.
    String cacheProp = System.getProperty("storage.diskCache.bufferSize");
    if (cacheProp == null) {
        // Cap the cache at a third of the available direct memory,
        // then round it up to the next power of two.
        long maxDirectMemoryOrientMb = maxDirectMemoryMb / 3L;
        long cappedCacheSizeMb = cacheSizeMb > maxDirectMemoryOrientMb ? maxDirectMemoryOrientMb : cacheSizeMb;
        cacheSizeMb = (long) Math.pow(2, Math.ceil(Math.log(cappedCacheSizeMb) / Math.log(2)));
        System.setProperty("storage.diskCache.bufferSize", Long.toString(cacheSizeMb));
        // The call below generates a NullPointerException in Orient 2.2.15-snapshot:
        // OGlobalConfiguration.DISK_CACHE_SIZE.setValue(cacheSizeMb);
    }
}
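One caveat (this is an assumption about how these globals are applied, so verify it for your version): the static block has to run before the first database/storage is opened, otherwise settings such as the disk cache size are read too late to take effect.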