Finding size of postgres shared memory segment on CentOS 6.3 - postgresql

I am trying to debug some shared memory issues with Postgres 9.3.1 and CentOS release 6.3 (Final). Using top, I can see that many of the postgres connections are using shared memory:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3534 postgres 20 0 2330m 1.4g 1.1g S 0.0 20.4 1:06.99 postgres: deploy mtalcott 10.222.154.172(53495) idle
9143 postgres 20 0 2221m 1.1g 983m S 0.0 16.9 0:14.75 postgres: deploy mtalcott 10.222.154.167(35811) idle
6026 postgres 20 0 2341m 1.1g 864m S 0.0 16.4 0:46.56 postgres: deploy mtalcott 10.222.154.167(37110) idle
18538 postgres 20 0 2327m 1.1g 865m S 0.0 16.1 2:06.59 postgres: deploy mtalcott 10.222.154.172(47796) idle
1575 postgres 20 0 2358m 1.1g 858m S 0.0 15.9 1:41.76 postgres: deploy mtalcott 10.222.154.172(52560) idle
...
There are about 29 total idle connections. However, sudo ipcs -m only shows:
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 163840 postgres 600 48 21
Surprisingly, it only shows it using 48 bytes. Why doesn't ipcs show a larger segment? Is there a different command I should be using?

I think it is because your postgre is of version 9.3, which uses POSIX type of shared memory. And ipcs -m shows sysV shared memory segments, which were used in Postgre of prior versions.

Related

Postgres walwriter and background writer using memory

I just did a large import of renderd tiles for an osm server. I want to start my next process (running the import of nominatim) but it takes a lot of memory. The problem I have is that walwriter, background writer, checkpointer are consuming 131GB of memory. I checked pg_top and the processes are sleeping. Is there any way to clear these processes safely or just force postgres to complete the walwriter and background writer?
I am using Postgres v12, and shared_buffers is set to 128GB.
HTOP:
pg_top:
last pid: 628600; load avg 0.08, 0.03, 0.04; up 1+00:31:38 02:22:22
5 5 sleeping
CPU states: 0.0% user, 0.0% nice, 0.0% system, 100% idle, 0.0% iowait
Memory: 487G used, 16G free, 546M buffers, 253G cached
DB activity: 0 tps, 0 rollbs/s, 0 buffer r/s, 100 hit%, 43 row r/s, 0 row w/s -
DB I/O: 0 reads/s, 0 KB/s, 0 writes/s, 0 KB/s
DB disk: 3088.7 GB total, 2538.8 GB free (17% used)
Swap: 45M used, 8147M free, 588K cached
627692 postgres 20 0 131G 4368K sleep 0:00 0.00% 0.00% postgres: 12/main: background writer
627691 postgres 20 0 131G 6056K sleep 0:00 0.00% 0.00% postgres: 12/main: checkpointer
627693 postgres 20 0 131G 4368K sleep 0:00 0.00% 0.00% postgres: 12/main: walwriter
628601 postgres 20 0 131G 11M sleep 0:00 0.00% 0.00% postgres: 12/main: postgres postgres [local] idle
627695 postgres 20 0 131G 6924K sleep 0:00 0.00% 0.00% postgres: 12/main: logical replication launcher
pg_wal directory:
Everything is just fine, and htop is lying to you.
Of course the background processes that access shared buffers will use that memory, and since it is shared memory, it is reported for each of these processes. In reality, it is allocated only once.
The shared memory allocated by PostgreSQL is slightly bigger than shared_buffers, so if that parameter is set to 128GB, you reported data make sense and are perfectly normal.
If you set max_wal_size = 32GB, it is normal to have a lot of WAL segments.

How to stop postgres stats collector properly?

I tried to profile plpgsql functions.
And I use pg_stat_statements to do that.
I can do profile, but after that I can not restart postgresql.
Result of ps -aux is below
postgres 22129 0.0 0.1 1276896 37276 ? Ss 07:13 0:00 postgres: dbname: checkpointer process
postgres 22134 0.0 0.0 178008 4360 ? Ss 07:13 0:00 postgres: dbname: stats collector process
postgres 23030 18.5 0.9 1436048 316336 ? Ss 07:17 4:53 postgres: dbname: postgres sakura 127.0.0.1(56496) idle waiting for D1E/CEA4D9C8
I think stats process may be lock the server to restart.
I think it's locking to stop server, because after that I have to wait long time to only shutdown the PC. May be shutdown process is waiting until stats process stop.
Is there any way to stop stats process properly?
Or, the way to force top stats process?
I tried kill -9, but it does not help.

Minimum hardware requirements for JIRA Software, Confluence and MySQL?

My company is considering a self-hosted option for a combination of JIRA, Confluence and MySQL running behind an nginx proxy. We are a very small team of 5, and expect extremely mild usage for now. I hardly even expect any concurrent usage at this point.
I am a bit puzzled by the various guidelines posted by Atlassian:
https://confluence.atlassian.com/enterprise/jira-sizing-guide-461504623.html
https://confluence.atlassian.com/adminjiraserver075/jira-applications-installation-requirements-935390824.html
https://confluence.atlassian.com/doc/example-size-and-hardware-specifications-from-customer-survey-76840961.html
https://confluence.atlassian.com/doc/server-hardware-requirements-guide-30736403.html
It seems they don't want to bother providing actual minimum hardware requirements. For example, on the same page they could say "minimum heap size to allocate to Confluence is 1 GB and 1 GB for Synchrony (which is required for collaborative editing)" and also that " minimum hardware recommendation" is 6GB. The leap from 1 required plus 1 optional to 6 recommended minimum is bizarre, to say the least.
I think what I want to know is whether I will be able to fit this setup into a 2GB RAM machine or a 4GB RAM machine (both dual CPU).
OK, I have done a test with following configuration:
VM with 2 cores capped at ~2.2Ghz and 4GB RAM
Ubuntu 16.04 server
Docker and docker-compose
Containers:
nginx
jwilder/docker-gen
jrcs/letsencrypt-nginx-proxy-companion
cptactionhank/atlassian-jira-software
cptactionhank/atlassian-confluence
mysql
This 4GB RAM machine is barely capable of running this setup:
$ free -m
total used free shared buff/cache available
Mem: 3951 3553 107 0 291 157
Swap: 974 725 249
CPU usage was going up to 200% only during initialisation when JIRA and Confluence started with empty home dirs. The following top output is after:
creating a space and a page in Confluence
and a project with ~10 issues in JIRA
and linking JIRA and Confluence together
$ top -o %MEM | head -15
top - 16:14:33 up 6:12, 2 users, load average: 0.15, 0.04, 0.01
Tasks: 132 total, 1 running, 131 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.6 us, 0.5 sy, 0.0 ni, 95.8 id, 1.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 4046364 total, 128808 free, 3638444 used, 279112 buff/cache
KiB Swap: 998396 total, 252956 free, 745440 used. 161144 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6328 bin 20 0 3306232 1.468g 0 S 0.0 38.1 12:03.27 java
6418 bin 20 0 2860000 1.320g 0 S 0.0 34.2 10:56.24 java
7205 bin 20 0 2807088 476592 1724 S 0.0 11.8 1:58.37 java
5752 999 20 0 1815480 99804 4728 S 0.0 2.5 1:11.29 mysqld
1070 root 20 0 621908 28672 8904 S 0.0 0.7 0:30.74 dockerd
1179 root 20 0 623004 7536 2520 S 0.0 0.2 0:16.66 docker-containe
968 root 20 0 291352 6536 1912 S 0.0 0.2 0:00.77 snapd
8310 root 20 0 15388 5064 3056 S 0.0 0.1 0:21.39 docker-gen
Confluence also allocated ~500MB RAM to Synchrony:
$ ps aux --sort -rss | head -4
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bin 6328 3.3 38.3 3306232 1551120 ? Ssl 10:14 12:12 /usr/lib/jvm/java-1.8-openjdk/bin/java -Djava.util.logging.config.file=/opt/atlassian/confluence...
bin 6418 2.9 34.1 2860000 1382868 ? Ssl 10:14 10:57 /usr/lib/jvm/java-1.8-openjdk/bin/java -Djava.util.logging.config.file=/opt/atlassian/jira/...
bin 7205 0.5 11.7 2807088 476588 ? Sl 10:44 1:59 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -classpath /opt/atlassian/confluence/temp/... synchrony.core sql
During JIRA and Confluence install stage, MySQL peaked at around 500MB RAM usage, and during normal operation it sits around 100MB.
In my attempts, a 2GB machine was only enough to run either JIRA or Confluence without MySQL.
Conclusion:
It looks like 4GB RAM Dual core machine is the absolute minimum required for JIRA+Confluence+MySQL. But keep in mind that such a machine is barely enough for a practically empty project.
I personally was not expecting these applications to be that RAM hungry being empty.

pg_top output analysis of puppetdb with postgres

I recently started using a tool called pg_top that shows statistics for Postgres, however since I am not verify versed with the internals of Postgres I needed a bit of clarification on the output.
last pid: 6152; load avg: 19.1, 18.6, 20.4; up 119+20:31:38 13:09:41
41 processes: 5 running, 36 sleeping
CPU states: 52.1% user, 0.0% nice, 0.8% system, 47.1% idle, 0.0% iowait
Memory: 47G used, 16G free, 2524M buffers, 20G cached
DB activity: 151 tps, 0 rollbs/s, 253403 buffer r/s, 86 hit%, 1550639 row r/s,
21 row w/s
DB I/O: 0 reads/s, 0 KB/s, 35 writes/s, 2538 KB/s
DB disk: 233.6 GB total, 195.1 GB free (16% used)
Swap:
My question is under the DB Activity, the 1.5 million row r/s, is that a lot? If so what can be done to improve it? I am running puppetdb 2.3.8, with 6.8 million resources, 2500 nodes, and Postgres 9.1. All of this runs on a single 24 core box with 64GB of memory.

Interpret differences in prstat vs. 'prstat -m' on Solaris

I've been using prstat and prstat -m a lot to investigate performance issues lately, and I think I've basically understood the differences of sampling vs. microstate accounting in Solaris 10. So I don't expect both to always show the exactly same number.
Today I came across an occasion where the 2 showed so vastly different outputs, that I have problems interpreting them and making sense of the output. The machine is a heavily loaded 8-CPU Solaris 10, with several large WebSphere processes and an Oracle database. The system practically came to a halt today for about 15 minutes (load averages of >700). I had difficulties to get any prstat information, but was able to get some outputs from "prtstat 1 1" and "prtstat -m 1 1", issued shortly one after another.
The top lines of the outputs:
prstat 1 1:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
8379 was 3208M 2773M cpu5 60 0 5:29:13 19% java/145
7123 was 3159M 2756M run 59 0 5:26:45 7.7% java/109
5855 app1 1132M 26M cpu2 60 0 0:01:01 7.7% java/18
16503 monitor 494M 286M run 59 19 1:01:08 7.1% java/106
7112 oracle 15G 15G run 59 0 0:00:10 4.5% oracle/1
7124 oracle 15G 15G cpu3 60 0 0:00:10 4.5% oracle/1
7087 app1 15G 15G run 58 0 0:00:09 4.0% oracle/1
7155 oracle 96M 6336K cpu1 60 0 0:00:07 3.6% oracle/1
...
Total: 495 processes, 4581 lwps, load averages: 74.79, 35.35, 23.8
prstat -m 1 1 (some seconds later)
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
7087 app1 0.1 56 0.0 0.2 0.4 0.0 13 30 96 2 33 0 oracle/1
7153 oracle 0.1 53 0.0 3.2 1.1 0.0 1.0 42 82 0 14 0 oracle/1
7124 oracle 0.1 47 0.0 0.2 0.2 0.0 0.0 52 77 2 16 0 oracle/1
7112 oracle 0.1 47 0.0 0.4 0.1 0.0 0.0 52 79 1 16 0 oracle/1
7259 oracle 0.1 45 9.4 0.0 0.3 0.0 0.1 45 71 2 32 0 oracle/1
7155 oracle 0.0 42 11 0.0 0.5 0.0 0.1 46 90 1 9 0 oracle/1
7261 oracle 0.0 37 9.5 0.0 0.3 0.0 0.0 53 61 1 17 0 oracle/1
7284 oracle 0.0 32 5.9 0.0 0.2 0.0 0.1 62 53 1 21 0 oracle/1
...
Total: 497 processes, 4576 lwps, load averages: 88.86, 39.93, 25.51
I have a very hard time interpreting the output. prstat seems to tell me that a fair amount of Java processing is going on, together with some Oracle stuff, just as I would expect in normal situation. prtstat -m shows a machine totally dominated by Oracle, consuming huge amounts of system time, and the overall CPU being heavily overloaded (large numbers in LAT).
I'm inclined to believe the output of prstat -m, because that matches much mor closely to what the system felt like during this time. Totally sluggish, almost no more user request processing going on from WebSphere, etc. But why would prstat show so heavily differing numbers?
Any explanation of this would be welcome!
CU, Joe
There's a known problem with prstat -m on Solaris in the way cpu usage figures are calculated - the value you see has been averaged over all threads (LWPs) in a process, and hence is far far too low for heavily multithreaded processes - such as your Java app servers, which can have hundreds of threads each (see your NLWP). Less than a dozen of them are probably CPU hogs hence the CPU usage by java looks "low". You'd need to call it with prstat -Lm to get the per-LWP (thread) breakdown to see that effect. Reference:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6780169
Without further performance monitoring data it's hard to give non-speculative explanations of what you've seen there. I assume lock contention within java. One particular workload that can cause this is heavily multithreaded memory mapped I/O, they'll all pile up on the process address space lock. But it could be a purely java userside lock of course. A plockstat on one of the java processes, and/or simple dtrace profiling would be helpful.