Postgres walwriter and background writer using memory - postgresql

I just did a large import of renderd tiles for an OSM server. I want to start my next process (running the nominatim import), but it takes a lot of memory. The problem I have is that the walwriter, background writer, and checkpointer each appear to be consuming 131GB of memory. I checked pg_top and the processes are sleeping. Is there a way to safely clear these processes, or to force Postgres to finish whatever the walwriter and background writer are doing?
I am using Postgres v12, and shared_buffers is set to 128GB.
htop output (screenshot not shown):
pg_top:
last pid: 628600; load avg 0.08, 0.03, 0.04; up 1+00:31:38 02:22:22
5 processes: 5 sleeping
CPU states: 0.0% user, 0.0% nice, 0.0% system, 100% idle, 0.0% iowait
Memory: 487G used, 16G free, 546M buffers, 253G cached
DB activity: 0 tps, 0 rollbs/s, 0 buffer r/s, 100 hit%, 43 row r/s, 0 row w/s -
DB I/O: 0 reads/s, 0 KB/s, 0 writes/s, 0 KB/s
DB disk: 3088.7 GB total, 2538.8 GB free (17% used)
Swap: 45M used, 8147M free, 588K cached
627692 postgres 20 0 131G 4368K sleep 0:00 0.00% 0.00% postgres: 12/main: background writer
627691 postgres 20 0 131G 6056K sleep 0:00 0.00% 0.00% postgres: 12/main: checkpointer
627693 postgres 20 0 131G 4368K sleep 0:00 0.00% 0.00% postgres: 12/main: walwriter
628601 postgres 20 0 131G 11M sleep 0:00 0.00% 0.00% postgres: 12/main: postgres postgres [local] idle
627695 postgres 20 0 131G 6924K sleep 0:00 0.00% 0.00% postgres: 12/main: logical replication launcher
pg_wal directory (listing not shown):

Everything is just fine, and htop is lying to you.
Of course the background processes that access shared buffers will use that memory, and since it is shared memory, it is reported for each of these processes. In reality, it is allocated only once.
The shared memory allocated by PostgreSQL is slightly bigger than shared_buffers, so if that parameter is set to 128GB, the data you report makes sense and is perfectly normal.
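If you want to see through the double counting yourself, compare a backend's resident set size (Rss) with its proportional set size (Pss), which divides shared pages among all processes mapping them. A minimal sketch, assuming a Linux kernel new enough (4.14+) to expose smaps_rollup, using the background writer PID from the pg_top output above:
# Pss splits each shared page across every process that maps it, so the
# per-process figure will be far smaller than the 131G htop shows.
# 627692 is the background writer PID from the pg_top output above.
$ grep -E '^(Rss|Pss|Shared)' /proc/627692/smaps_rollup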
If you set max_wal_size = 32GB, it is normal to have a lot of WAL segments.
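To see how much of the disk footprint is actually sitting in pg_wal, you can compare the directory size with the configured limits. A minimal sketch; the data directory path is an assumption for a stock Ubuntu/Debian install, so adjust it to yours:
# Size of the WAL directory versus the settings that bound how much WAL is kept.
$ sudo du -sh /var/lib/postgresql/12/main/pg_wal
$ psql -c "SHOW max_wal_size;" -c "SHOW wal_keep_segments;"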

Related

checkpoint_completion_target being ignored

I'm testing checkpoint_completion_target in RDS PostgreSQL and see that a checkpoint is taking a total time of 28.5 seconds. However, I have configured:
checkpoint_completion_target = 0.9
checkpoint_timeout = 300
According to this, shouldn't the checkpoint be spread over 300 * 0.9 = 270 seconds?
PostgreSQL version 11.10
Log:
2021-03-19 16:06:47 UTC::#:[25023]:LOG: checkpoint starting: time
2021-03-19 16:07:16 UTC::#:[25023]:LOG: checkpoint complete: wrote 283 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=28.500 s, sync=0.006 s, total=28.533 s; sync files=56, longest=0.006 s, average=0.000 s; distance=64990 kB, estimate=68721 kB
https://www.postgresql.org/docs/10/runtime-config-wal.html
https://www.postgresql.org/docs/11/wal-configuration.html
The checkpointer implements its throttling by napping in 0.1 second chunks. And there is no provision for taking more than one nap per buffer needing to be written. So if there is very little work to be done, it will finish early despite the setting of checkpoint_completion_target.
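That matches the log above: the checkpoint wrote only 283 buffers, and with at most one 0.1-second nap per buffer the write phase can be stretched to roughly 283 × 0.1 s ≈ 28.3 s, which is almost exactly the reported write=28.500 s. The full 270-second spread would only be approached once a checkpoint has on the order of 2,700 or more buffers to write.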

joblib Parallel running out of memory

I have something like this
outputs = Parallel(n_jobs=12, verbose=10)(delayed(_process_article)(article, config) for article in data)
Case 1: Run on ubuntu with 80 cores:
CPU(s): 80
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
There are a total of 90,000 tasks. At around 67k it fails and is terminated.
joblib.externals.loky.process_executor.BrokenProcessPool: A process in the executor was terminated abruptly, the pool is not usable anymore.
When I monitor top at around 67k tasks, I see a sharp fall in memory:
top - 11:40:25 up 2 days, 18:35, 4 users, load average: 7.09, 7.56, 7.13
Tasks: 32 total, 3 running, 29 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.6 us, 2.6 sy, 0.0 ni, 89.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 33554432 total, 40 free, 33520996 used, 33396 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 40 avail Mem
Case 2: Mac with 8 cores
hw.physicalcpu: 4
hw.logicalcpu: 8
But on the Mac it is much, much slower, and surprisingly it does not get killed at 67k.
Additionally, I reduced the parallelism (in case 1) to 2 and 4, and it still fails :(
Why is this happening? Has anyone faced this issue before and has a fix?
Note: when I run for 50,000 tasks it runs well and does not give any problems.
Thank you!
I got a machine with 128GB of memory, and that solved the problem!
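For anyone hitting the same thing: the top snapshot above shows roughly 32 GB total with essentially nothing free or available, and a worker being "terminated abruptly" together with a sudden drop in memory is the classic signature of the Linux OOM killer. One way to check that on the Ubuntu box is to look at the kernel log right after a failure:
# The OOM killer logs which process it killed and why; look for
# "Out of memory" / "Killed process" entries around the time of the crash.
$ dmesg -T | grep -iE "out of memory|killed process"
$ sudo journalctl -k --since "1 hour ago" | grep -i oom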

Minimum hardware requirements for JIRA Software, Confluence and MySQL?

My company is considering a self-hosted option for a combination of JIRA, Confluence and MySQL running behind an nginx proxy. We are a very small team of 5, and expect extremely mild usage for now. I hardly even expect any concurrent usage at this point.
I am a bit puzzled by the various guidelines posted by Atlassian:
https://confluence.atlassian.com/enterprise/jira-sizing-guide-461504623.html
https://confluence.atlassian.com/adminjiraserver075/jira-applications-installation-requirements-935390824.html
https://confluence.atlassian.com/doc/example-size-and-hardware-specifications-from-customer-survey-76840961.html
https://confluence.atlassian.com/doc/server-hardware-requirements-guide-30736403.html
It seems they don't want to bother providing actual minimum hardware requirements. For example, on the same page they manage to say that the "minimum heap size to allocate to Confluence is 1 GB and 1 GB for Synchrony (which is required for collaborative editing)" and also that the "minimum hardware recommendation" is 6GB. The leap from 1 GB required plus 1 GB optional to 6 GB recommended minimum is bizarre, to say the least.
I think what I want to know is whether I will be able to fit this setup into a 2GB RAM machine or a 4GB RAM machine (both dual CPU).
OK, I have done a test with the following configuration:
VM with 2 cores capped at ~2.2Ghz and 4GB RAM
Ubuntu 16.04 server
Docker and docker-compose
Containers:
nginx
jwilder/docker-gen
jrcs/letsencrypt-nginx-proxy-companion
cptactionhank/atlassian-jira-software
cptactionhank/atlassian-confluence
mysql
This 4GB RAM machine is barely capable of running this setup:
$ free -m
total used free shared buff/cache available
Mem: 3951 3553 107 0 291 157
Swap: 974 725 249
CPU usage was going up to 200% only during initialisation when JIRA and Confluence started with empty home dirs. The following top output is after:
creating a space and a page in Confluence
and a project with ~10 issues in JIRA
and linking JIRA and Confluence together
$ top -o %MEM | head -15
top - 16:14:33 up 6:12, 2 users, load average: 0.15, 0.04, 0.01
Tasks: 132 total, 1 running, 131 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.6 us, 0.5 sy, 0.0 ni, 95.8 id, 1.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 4046364 total, 128808 free, 3638444 used, 279112 buff/cache
KiB Swap: 998396 total, 252956 free, 745440 used. 161144 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6328 bin 20 0 3306232 1.468g 0 S 0.0 38.1 12:03.27 java
6418 bin 20 0 2860000 1.320g 0 S 0.0 34.2 10:56.24 java
7205 bin 20 0 2807088 476592 1724 S 0.0 11.8 1:58.37 java
5752 999 20 0 1815480 99804 4728 S 0.0 2.5 1:11.29 mysqld
1070 root 20 0 621908 28672 8904 S 0.0 0.7 0:30.74 dockerd
1179 root 20 0 623004 7536 2520 S 0.0 0.2 0:16.66 docker-containe
968 root 20 0 291352 6536 1912 S 0.0 0.2 0:00.77 snapd
8310 root 20 0 15388 5064 3056 S 0.0 0.1 0:21.39 docker-gen
Confluence also allocated ~500MB RAM to Synchrony:
$ ps aux --sort -rss | head -4
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bin 6328 3.3 38.3 3306232 1551120 ? Ssl 10:14 12:12 /usr/lib/jvm/java-1.8-openjdk/bin/java -Djava.util.logging.config.file=/opt/atlassian/confluence...
bin 6418 2.9 34.1 2860000 1382868 ? Ssl 10:14 10:57 /usr/lib/jvm/java-1.8-openjdk/bin/java -Djava.util.logging.config.file=/opt/atlassian/jira/...
bin 7205 0.5 11.7 2807088 476588 ? Sl 10:44 1:59 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -classpath /opt/atlassian/confluence/temp/... synchrony.core sql
During JIRA and Confluence install stage, MySQL peaked at around 500MB RAM usage, and during normal operation it sits around 100MB.
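If you want to see where the memory goes in a setup like this without mapping PIDs back to containers, docker stats gives a quick per-container view (a minimal sketch; the reported names depend on how your compose project names its containers):
# One-shot snapshot of memory use per container; drop --no-stream for a live view.
$ docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"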
In my attempts, a 2GB machine was only enough to run either JIRA or Confluence without MySQL.
Conclusion:
It looks like a 4GB RAM, dual-core machine is the absolute minimum required for JIRA + Confluence + MySQL. But keep in mind that such a machine is barely enough for a practically empty project.
I personally was not expecting these applications to be this RAM-hungry when they are essentially empty.

pg_top output analysis of puppetdb with postgres

I recently started using a tool called pg_top that shows statistics for Postgres; however, since I am not very well versed in the internals of Postgres, I need a bit of clarification on the output.
last pid: 6152; load avg: 19.1, 18.6, 20.4; up 119+20:31:38 13:09:41
41 processes: 5 running, 36 sleeping
CPU states: 52.1% user, 0.0% nice, 0.8% system, 47.1% idle, 0.0% iowait
Memory: 47G used, 16G free, 2524M buffers, 20G cached
DB activity: 151 tps, 0 rollbs/s, 253403 buffer r/s, 86 hit%, 1550639 row r/s, 21 row w/s
DB I/O: 0 reads/s, 0 KB/s, 35 writes/s, 2538 KB/s
DB disk: 233.6 GB total, 195.1 GB free (16% used)
Swap:
My question is about the DB activity line: is 1.5 million row r/s a lot? If so, what can be done to improve it? I am running puppetdb 2.3.8 with 6.8 million resources, 2500 nodes, and Postgres 9.1. All of this runs on a single 24-core box with 64GB of memory.
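As a side note on reading these numbers: pg_top's buffer r/s and hit% are derived from the server's own statistics views, so you can cross-check them from inside Postgres. A minimal sketch; connecting to a database literally named puppetdb is an assumption:
# Cumulative block reads vs. buffer hits since the last stats reset for this database.
$ psql -d puppetdb -c "
SELECT blks_read, blks_hit,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 1) AS hit_pct
FROM pg_stat_database
WHERE datname = current_database();"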

Finding size of postgres shared memory segment on CentOS 6.3

I am trying to debug some shared memory issues with Postgres 9.3.1 and CentOS release 6.3 (Final). Using top, I can see that many of the postgres connections are using shared memory:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3534 postgres 20 0 2330m 1.4g 1.1g S 0.0 20.4 1:06.99 postgres: deploy mtalcott 10.222.154.172(53495) idle
9143 postgres 20 0 2221m 1.1g 983m S 0.0 16.9 0:14.75 postgres: deploy mtalcott 10.222.154.167(35811) idle
6026 postgres 20 0 2341m 1.1g 864m S 0.0 16.4 0:46.56 postgres: deploy mtalcott 10.222.154.167(37110) idle
18538 postgres 20 0 2327m 1.1g 865m S 0.0 16.1 2:06.59 postgres: deploy mtalcott 10.222.154.172(47796) idle
1575 postgres 20 0 2358m 1.1g 858m S 0.0 15.9 1:41.76 postgres: deploy mtalcott 10.222.154.172(52560) idle
...
There are about 29 total idle connections. However, sudo ipcs -m only shows:
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 163840 postgres 600 48 21
Surprisingly, it only shows it using 48 bytes. Why doesn't ipcs show a larger segment? Is there a different command I should be using?
I think it is because your Postgres is version 9.3, which uses mmap'd (POSIX-style) shared memory for its main shared memory area, while ipcs -m only shows System V shared memory segments, which is what earlier Postgres versions used. Since 9.3, only a tiny System V segment is kept (the 48-byte one you see), which the postmaster uses as an interlock against a second instance starting on the same data directory.
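To actually see the large region, look at the postmaster's memory map rather than ipcs; the main shared memory area appears as one big shared anonymous mapping roughly the size of shared_buffers. A minimal sketch, assuming the stock CentOS data directory path (adjust to your install):
# First line of postmaster.pid is the postmaster's PID.
$ PGPID=$(sudo head -1 /var/lib/pgsql/9.3/data/postmaster.pid)
# Extended map listing, largest mappings sorted to the end.
$ sudo pmap -x "$PGPID" | sort -n -k2 | tail -5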