Ubuntu PostgreSQL 12: Out of memory: Killed process (postgres) - postgresql

I am using Ubuntu 20.04 and PostgreSQL 12.
My machine has 128 GB of memory, a 1 TB SSD, and an i7 CPU (16 cores, 20 threads).
I made a simple C++ program which connects to PostgreSQL and generates a tile map (just map images).
It's similar to osmtilemaker.
Once the program starts, it takes several hours to several months to finish the job.
For the first 4-5 hours, it runs well.
I monitored memory usage, and it never occupies more than 10% of total memory.
Here is the output of the top command:
Tasks: 395 total, 16 running, 379 sleeping, 0 stopped, 0 zombie
%Cpu(s): 70.3 us, 2.9 sy, 0.0 ni, 26.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 128565.5 total, 111124.7 free, 11335.7 used, 6105.1 buff/cache
MiB Swap: 2048.0 total, 2003.6 free, 44.4 used. 115108.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2217811 postgres 20 0 2369540 1.7g 678700 R 99.7 1.3 25:27.36 postgres
2217407 postgres 20 0 2393448 1.7g 678540 R 99.0 1.4 25:32.04 postgres
2217836 postgres 20 0 2352936 1.7g 679348 R 98.0 1.3 25:26.22 postgres
2217715 postgres 20 0 2368268 1.7g 680144 R 97.7 1.3 25:29.78 postgres
2217684 postgres 20 0 2384308 1.7g 679248 R 97.3 1.4 25:29.49 postgres
2217539 postgres 20 0 2386156 1.7g 680124 R 97.0 1.4 25:30.46 postgres
2216651 postgres 20 0 2429348 1.8g 678128 R 95.7 1.4 26:05.99 postgres
2217025 postgres 20 0 2396408 1.7g 679292 R 94.4 1.4 25:51.85 postgres
2238487 postgres 20 0 1294752 83724 54024 R 14.3 0.1 0:00.43 postgres
2238488 postgres 20 0 1294968 219304 189116 R 14.0 0.2 0:00.42 postgres
2238489 postgres 20 0 1294552 85624 56068 R 12.6 0.1 0:00.38 postgres
2062928 j 20 0 861492 536088 47396 S 6.6 0.4 19:18.64 mapTiler
2238490 postgres 20 0 1290132 73244 48084 R 6.3 0.1 0:00.19 postgres
2238491 postgres 20 0 1289876 73064 48160 R 6.3 0.1 0:00.19 postgres
928763 postgres 20 0 1181720 61368 59300 S 0.7 0.0 11:59.45 postgres
1306124 j 20 0 19668 2792 2000 S 0.3 0.0 0:06.84 screen
2238492 postgres 20 0 1273864 49192 40108 R 0.3 0.0 0:00.01 postgres
2238493 postgres 20 0 1273996 50172 40852 R 0.3 0.0 0:00.01 postgres
1 root 20 0 171468 9564 4864 S 0.0 0.0 0:09.40 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
I used 8 threads in the program, so 8 processes are using a lot of CPU,
but memory usage is always below 10%.
However, after 4-5 hours the oom-killer killed the postgres processes and the program stopped running.
Here is the dmesg output:
[62585.503398] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),
cpuset=/,mems_allowed=0,global_oom,
task_memcg=/system.slice/system-postgresql.slice/postgresql@12-main.service,
task=postgres,pid=463942, uid=129
[62585.503406] Out of memory: Killed process 463942 (postgres)
total-vm:19010060kB, anon-rss:17369476kB, file-rss:0kB,
shmem-rss:848380kB, UID:129 pgtables:36776kB oom_score_adj:0
It looks like an out-of-memory error.
But how can that happen when I have more than 100 GB of free memory?
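The dmesg entry above shows the killed backend with a total-vm of roughly 19 GB and an anon-rss of roughly 17 GB, far more than the ~1.7 GB per backend visible in top, so the growth presumably happens between the snapshots. As a hedged sketch (not from the original post), one way to log the backends' resident memory over time and catch that growth, assuming the standard procps ps on Ubuntu; the interval and log file name are arbitrary:
# log VSZ/RSS of all postgres backends once a minute (hypothetical helper)
while true; do
    date >> pg_mem.log
    ps -C postgres -o pid,vsz,rss,cmd --sort=-rss >> pg_mem.log
    sleep 60
done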

Related

Europe import gets stuck on ways

A full import of https://download.geofabrik.de/europe-latest.osm.pbf gets stuck as soon as it gets to ways.
Processing: Node(2605985k 456.3k/s) Way(638k 0.18k/s) Relation(0 0.0/s)
I have tried it with both osm2pgsql v1.20, which comes with Nominatim v3.5.1, and osm2pgsql 1.3.0, which is the most up-to-date version.
I'm running the import on a 96-core machine with 50 GB of RAM and a 750 GB SSD. It has previously worked very quickly for other regions. My --osm2pgsql-cache is set to 25000, which is enough to hold the entire pbf file in memory.
My postgres.conf is:
# Defaults from postgres db init
max_connections = 100 # (change requires restart)
dynamic_shared_memory_type = posix # the default is the first option
min_wal_size = 80MB
log_timezone = 'Etc/UTC'
datestyle = 'iso, mdy'
timezone = 'Etc/UTC'
lc_messages = 'C.UTF-8' # locale for system error message
lc_monetary = 'C.UTF-8' # locale for monetary formatting
lc_numeric = 'C.UTF-8' # locale for number formatting
lc_time = 'C.UTF-8' # locale for time formatting
# Based on https://wiki.openstreetmap.org/wiki/PostgreSQL
shared_buffers = 4GB
maintenance_work_mem = 10GB
autovacuum_work_mem = 1GB
work_mem = 256MB
checkpoint_timeout = 10min
max_wal_size = 25GB
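As a side note (a hedged sketch, not from the original post): one quick way to confirm the running server actually picked up these values is to read pg_settings, assuming local psql access as the postgres user:
psql -U postgres -c "SELECT name, setting, unit FROM pg_settings
                     WHERE name IN ('shared_buffers', 'maintenance_work_mem',
                                    'work_mem', 'max_wal_size');"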
Output of top as well:
top - 03:21:56 up 7:36, 0 users, load average: 1.10, 1.09, 1.04
Tasks: 19 total, 1 running, 18 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.0 sy, 0.0 ni, 99.7 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 87092.1 total, 30475.4 free, 27253.4 used, 29363.3 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 54874.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
38 postgres 20 0 4404008 4.1g 4.1g S 1.7 4.8 1:30.25 postgres
169 postgres 20 0 4392784 4.1g 4.1g D 1.3 4.8 4:41.63 postgres
165 nominat+ 20 0 27.2g 24.7g 11836 S 1.0 29.1 31:39.34 osm2pgsql
39 postgres 20 0 4391716 4.0g 4.0g S 0.3 4.7 0:23.79 postgres
1 root 20 0 8700 3572 3276 S 0.0 0.0 0:00.01 init.sh
36 postgres 20 0 4391580 127672 125776 S 0.0 0.1 0:00.97 postgres
40 postgres 20 0 4391580 21996 20080 S 0.0 0.0 1:31.52 postgres
41 postgres 20 0 4392120 8544 6328 S 0.0 0.0 0:00.33 postgres
42 postgres 20 0 71768 4808 2912 S 0.0 0.0 0:13.07 postgres
43 postgres 20 0 4392004 6920 4828 S 0.0 0.0 0:00.00 postgres
62 nominat+ 20 0 78824 25384 19972 S 0.0 0.0 0:00.03 setup.php
67 postgres 20 0 4410852 36408 32900 S 0.0 0.0 0:00.01 postgres
164 nominat+ 20 0 2612 608 540 S 0.0 0.0 0:00.00 sh
168 postgres 20 0 4410044 4.1g 4.1g S 0.0 4.8 62:42.30 postgres
171 postgres 20 0 4427500 1.9g 1.9g S 0.0 2.3 4:04.50 postgres

Ceph OSDs are full, but I have not stored that much data

I have a Ceph cluster running with 18 x 600 GB OSDs. There are three pools (size: 3, pg_num: 64) with a 200 GB image on each, and there are 6 servers connected to these images via iSCSI, storing about 20 VMs on them. Here is the output of "ceph df":
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
cephfs_data 1 0 B 0 0 B 0 0 B
cephfs_metadata 2 17 KiB 22 1.5 MiB 100.00 0 B
defaults.rgw.buckets.data 3 0 B 0 0 B 0 0 B
defaults.rgw.buckets.index 4 0 B 0 0 B 0 0 B
.rgw.root 5 2.0 KiB 5 960 KiB 100.00 0 B
default.rgw.control 6 0 B 8 0 B 0 0 B
default.rgw.meta 7 393 B 2 384 KiB 100.00 0 B
default.rgw.log 8 0 B 207 0 B 0 0 B
rbd 9 150 GiB 38.46k 450 GiB 100.00 0 B
rbd3 13 270 GiB 69.24k 811 GiB 100.00 0 B
rbd2 14 150 GiB 38.52k 451 GiB 100.00 0 B
Based on this, I expect about 1.7 TB of RAW capacity usage, but it is currently about 9 TB!
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 9.8 TiB 870 GiB 9.0 TiB 9.0 TiB 91.35
TOTAL 9.8 TiB 870 GiB 9.0 TiB 9.0 TiB 91.35
And the cluster is down because very little capacity remains. I wonder what causes this and how I can get it fixed.
Your help is much appreciated.
The problem was mounting the iSCSI target without the discard option.
Since I am using Red Hat Virtualization, I just modified all storage domains created on top of Ceph and enabled "discard" on them. After just a few hours, about 1 TB of storage was released. Now about 12 hours have passed and 5 TB of storage has been released.
Thanks
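For reference, outside of RHV the same effect on a plain Linux iSCSI initiator would come from mounting the filesystem with the discard option, or from trimming on a schedule; a hedged sketch with placeholder device and mount point:
# /etc/fstab entry with online discard (device and mount point are placeholders)
/dev/mapper/iscsi-vol1  /mnt/vol1  xfs  defaults,discard  0 0
# or keep discard off and trim periodically instead
sudo fstrim -v /mnt/vol1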

Postgres suddenly says "relation does not exist" (it exists for sure)

I have a weird problem with Postgres on a production server (DigitalOcean).
Suddenly, my Django project started raising this error. I have not changed anything on the server for 2 months, so it is not caused by any code changes etc.
relation "mainapp_price" does not exist
LINE 1: ...rom_3", "mainapp_price"."stelinka_12kg_pack" FROM "mainapp_p...
I've checked /var/log/postgres/..., which says something similar:
2019-04-27 13:40:26 UTC [13288-11] postgres@brennholzdb ERROR: relation "mainapp_availability" does not exist at character 179
2019-04-27 13:40:26 UTC [13288-12] postgres@brennholzdb STATEMENT: SELECT "mainapp_availability"."id", "mainapp_availability"."dry_wood", "mainapp_availability"."wet_wood", "mainapp_availability"."briquettes", "mainapp_availability"."area" FROM "mainapp_availability" WHERE "mainapp_availability"."area" = 'ar' LIMIT 21
I don't have a clue where the problem is. As I said, everything is migrated and there are no new changes in the code.
Do you know what to do?
EDIT
The postgres process eats almost 100% CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13277 postgres 20 0 385212 4916 2740 S 98.0 1.0 352:44.85 postgres
14096 django 20 0 40388 3524 2996 R 0.3 0.7 0:00.02 top
1 root 20 0 119992 5124 2996 S 0.0 1.0 0:03.54 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:02.45 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
6 root 20 0 0 0 0 S 0.0 0.0 0:01.25 kworker/u2:0
7 root 20 0 0 0 0 S 0.0 0.0 0:03.56 rcu_sched
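Not part of the original post, but a quick way to check whether the table is really visible to the database the app connects to (assuming the database name shown in the log and local psql access; mainapp_price is the table from the error message):
psql -d brennholzdb -c "SHOW search_path;"
psql -d brennholzdb -c "SELECT schemaname, tablename FROM pg_tables WHERE tablename = 'mainapp_price';"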

Can't run sbt on EC2 due to lack of memory (unable to allocate new memory)

For some reason, I can't run sbt on an Amazon EC2 free-tier instance:
ubuntu@ip-xx-xx-xx-xx:~$ sbt seed
Loading /usr/share/sbt/bin/sbt-launch-lib.bash
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c5550000, 715849728, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 715849728 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/ubuntu/hs_err_pid10392.log
Here is top:
ubuntu@ip-xx-xx-xx-xx:~$ top > 1.txt
ubuntu@ip-xx-xx-xx-xx:~$ cat 1.txt
top - 03:33:21 up 2 days, 2:41, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 66 total, 1 running, 65 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 603108 total, 438676 used, 164432 free, 34976 buffers
KiB Swap: 0 total, 0 used, 0 free, 337716 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 26828 2172 924 S 0.0 0.4 0:01.05 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
6 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/u2:0
//....................
Note that sbt seed does work on my local machine ("seed" is my custom task).
ubuntu@ip-xx-xx-xx-xx:~$ free
total used free shared buffers cached
Mem: 603108 438668 164440 0 35012 337720
-/+ buffers/cache: 65936 537172
Swap: 0 0 0
Micro instances come with 613 MB of physical memory in total. You can enable swap if you need more, but keep in mind that it will be much slower.
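A minimal sketch of adding a 1 GB swap file on such an instance (the size is an arbitrary example; on older kernels or filesystems dd may be needed instead of fallocate):
sudo fallocate -l 1G /swapfile     # reserve the file (or: dd if=/dev/zero of=/swapfile bs=1M count=1024)
sudo chmod 600 /swapfile           # restrict permissions as mkswap requires
sudo mkswap /swapfile              # format it as swap space
sudo swapon /swapfile              # enable it immediately
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist across reboots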

prstat on Solaris - can you make it flash when SIZE exceeds a limit?

I have been told to make prstat flash the background from white to black a few times when any value in the SIZE column passes a threshold. Is there a way to modify the command to do this, or will this never happen?
I'm not trying to be mean, but whoever asked for this is not being reasonable or does not understand. I would guess the "asker" has no clue about prstat. Look at these two examples:
example% prstat -u root -n 5 -P 1,2 1 1
PID USERNAME SWAP RSS STATE PRI NICE TIME CPU PROCESS/LWP
306 root 3024K 1448K sleep 58 0 0:00.00 0.3% sendmail/1
102 root 1600K 592K sleep 59 0 0:00.00 0.1% in.rdisc/1
250 root 1000K 552K sleep 58 0 0:00.00 0.0% utmpd/1
288 root 1720K 1032K sleep 58 0 0:00.00 0.0% sac/1
1 root 744K 168K sleep 58 0 0:00.00 0.0% init/1
TOTAL: 25, load averages: 0.05, 0.08, 0.12
example% prstat -S rss -n 5 -vc -u root,john
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWP
1 root 0.0 0.0 - - - - 100 - 0 0 0 0 init/1
102 root 0.0 0.0 - - - - 100 - 0 0 3 0 in.rdisc/1
250 root 0.0 0.0 - - - - 100 - 0 0 0 0 utmpd/1
1185 john 0.0 0.0 - - - - 100 - 0 0 0 0 csh/1
240 root 0.0 0.0 - - - - 100 - 0 0 0 0 powerd/4
TOTAL: 71, load averages: 0.02, 0.04, 0.08
So, what value do you look for? There are lots of things prstat displays, so you have to learn all of them and then code for whatever each of the many possible outputs means.
To do this:
What you will have to do is run prstat, with the arguments entered on the command line, in a child process, read and interpret everything it produces, then map it to output and flash the screen as appropriate. You can do this with coprocesses in ksh or zsh, or by using fifos in bash. Consider running prstat in -e mode regardless of what the user enters so you have full screens to read and manipulate.
Flashing the screen can be done with escape sequences, like changing the background color or whatever you want. Here is a starting point for Windows-based terminals:
ANSI escape sequences
And for VT100 (UNIX):
terminal escape codes
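As a rough illustration of the approach described above, here is a hedged bash sketch that polls prstat, parses the size column, and "flashes" via the terminal's visible bell. It assumes bash, awk, and tput are available on the Solaris box, that the size value is the third field of the default output, and that sizes are printed with a K/M/G suffix; the threshold is an arbitrary example value.
#!/usr/bin/env bash
# Hypothetical sketch: flash the terminal when any process size passes a limit.
limit_kb=2000000                  # threshold in kilobytes (arbitrary example)

while true; do
    over=$(prstat -c -n 20 1 1 | awk -v lim="$limit_kb" '
        $3 ~ /[0-9][KMG]$/ {
            n = $3 + 0                        # numeric part of e.g. "3024K"
            u = substr($3, length($3), 1)     # unit suffix: K, M or G
            kb = n
            if (u == "M") kb = n * 1024
            if (u == "G") kb = n * 1024 * 1024
            if (kb > lim) print $1            # report the offending PID
        }')
    if [ -n "$over" ]; then
        tput flash; tput flash; tput flash    # visible-bell "flash"
    fi
    sleep 5
done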