I am setting up a new install of PostgreSQL 10.4, and during load testing I keep receiving out-of-memory errors.
Configuration:
$ cat /etc/security/limits.conf
postgres hard memlock 508559360
postgres soft memlock 508559360
$ cat /etc/sysctl.conf
vm.nr_hugepages = 248320
vm.hugetlb_shm_group = 118
vm.overcommit_memory=2
vm.swappiness=1
vm.vfs_cache_pressure=50
kernel.sem = 250 32000 32 128
System Specs:
$ cat /proc/cpuinfo | grep "core id" | wc -l
32
Each CPU is an Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz: 16 physical cores, 32 logical as shown above.
$ free -m
total used free shared buff/cache available
Mem: 510969 498048 2434 3098 10485 8343
Swap: 1071 1071 0
Note: we have memlock set to approximately 485 GB, dedicated directly to Postgres, which shows up in the high "used" column.
$ df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
tmpfs 250G 64K 250G 1% /dev/shm
$ cat /proc/meminfo
MemTotal: 523232496 kB
MemFree: 1216648 kB
MemAvailable: 8348808 kB
Buffers: 65220 kB
Cached: 10211780 kB
SwapCached: 7620 kB
Active: 5556092 kB
Inactive: 5432428 kB
Active(anon): 2208064 kB
Inactive(anon): 1593084 kB
Active(file): 3348028 kB
Inactive(file): 3839344 kB
Unevictable: 24128 kB
Mlocked: 24128 kB
SwapTotal: 1097724 kB
SwapFree: 0 kB
Dirty: 52 kB
Writeback: 0 kB
AnonPages: 728044 kB
Mapped: 62904 kB
Shmem: 3085592 kB
Slab: 1454276 kB
SReclaimable: 1345008 kB
SUnreclaim: 109268 kB
KernelStack: 11984 kB
PageTables: 22856 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 15770860 kB
Committed_AS: 7920136 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 71680 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 248320
HugePages_Free: 245883
HugePages_Rsvd: 1800
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 329600 kB
DirectMap2M: 534444032 kB
Postgres Info:
SELECT version();
PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit
max_connections = 600
shared_buffers = '8192MB'
work_mem = '20MB'
maintenance_work_mem = '2GB'
max_parallel_workers = 8
wal_buffers = 16MB
max_wal_size = 20GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = '364 GB'
default_statistics_target = 1000
log_timezone = 'US/Eastern'
track_activities = on
track_counts = on
track_io_timing = on
stats_temp_directory = 'pg_stat_tmp'
datestyle = 'iso, mdy'
timezone = 'US/Eastern'
default_text_search_config = 'pg_catalog.english'
transform_null_equals = on
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 16384
track_functions = all
track_io_timing = true
pg_stat_statements.track = all
session_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '3s'
auto_explain.log_nested_statements='on'
auto_explain.log_analyze=true
Note: I started with work_mem set as high as 250 MB and have slowly brought it down to 20 MB, and I still receive the errors. I've also verified that we never exceed 120 concurrent connections. We use PgBouncer in front of the instance in session mode.
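As a back-of-the-envelope check on work_mem sizing (a sketch only: each backend can allocate work_mem once per sort/hash node, so this is a lower bound on the true worst case):

```shell
# Worst-case private memory if every allowed connection used one
# work_mem allocation at once (underestimates complex queries).
max_connections=600
work_mem_mb=20
echo "$(( max_connections * work_mem_mb )) MB"   # 12000 MB
```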
The errors are too large to post on Stack Overflow, but they are linked here:
https://codepad.co/snippet/derPU4E8
The standout errors are:
2018-06-18 15:02:22 EDT,28197,mydb,ERROR: could not resize shared memory segment "/PostgreSQL.1552129380" to 192088 bytes: No space left on device
I don't understand what device it's talking about. I don't have any that appear to be even close to full, let alone short by the tiny 192088 bytes it mentions.
2018-06-18 15:02:22 EDT,19708,mydb,ERROR: out of memory
2018-06-18 15:02:22 EDT,19708,mydb,DETAIL: Failed on request of size 7232.
2018-06-18 15:02:22 EDT,16688,,LOG: could not fork worker process: Cannot allocate memory
2018-06-18 15:02:22 EDT,4555,,ERROR: out of memory
2018-06-18 15:02:22 EDT,4555,,DETAIL: Failed on request of size 78336.
2018-06-18 15:02:22 EDT,4552,,LOG: could not open directory "/usr/lib/postgresql/10/share/timezone": Cannot allocate memory
2018-06-18 15:02:22 EDT,19935,mydb,ERROR: could not load library "/usr/lib/postgresql/10/lib/auto_explain.so": /usr/lib/postgresql/10/lib/auto_explain.so: failed to map segment from shared object
2018-06-18 15:02:22 EDT,28193,mydb,ERROR: out of memory
2018-06-18 15:02:22 EDT,28193,mydb,DETAIL: Failed on request of size 8192.
2018-06-18 15:02:22 EDT,26927,mydb,ERROR: could not resize shared memory segment "/PostgreSQL.1101931262" to 192088 bytes: No space left on device
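For what it's worth, the "/PostgreSQL.NNNN" segments are POSIX shared memory created by PostgreSQL 10's parallel query machinery, and they live on the tmpfs mounted at /dev/shm, so that is the "device" in the resize error. ENOSPC there can mean the tmpfs itself filled up, or, with vm.overcommit_memory = 2, that the kernel refused to commit more memory even though df shows free space. A quick way to look:

```shell
# Inspect the tmpfs backing POSIX shared memory.
df -h /dev/shm          # size and usage of the tmpfs
ls /dev/shm | head      # PostgreSQL.* segments appear here under load
```

If the tmpfs is genuinely too small it can be remounted larger (and the size made persistent in /etc/fstab); alternatively, dynamic_shared_memory_type can be switched away from posix.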
Question: How do I debug this issue and more importantly how can I resolve it?
I followed Laurenz's advice and no longer run out of memory:
vm.overcommit_ratio = 100
Additionally, I removed the memlock and huge pages definitions in sysctl.conf and limits.conf.
Check to see if your queries are so complicated that they need to consume a large amount of temporary tablespace.
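With vm.overcommit_memory = 2, the kernel refuses allocations once Committed_AS would exceed CommitLimit, which is (MemTotal − huge pages) × vm.overcommit_ratio / 100 + SwapTotal; huge pages are excluded from the limit. Plugging in the numbers from the /proc/meminfo output above reproduces the CommitLimit shown there (values in kB):

```shell
# CommitLimit = (MemTotal - HugePages) * overcommit_ratio/100 + SwapTotal
mem_total_kb=523232496                 # MemTotal
huge_kb=$((248320 * 2048))             # HugePages_Total * Hugepagesize
swap_kb=1097724                        # SwapTotal
ratio=100                              # vm.overcommit_ratio
commit_limit_kb=$(( (mem_total_kb - huge_kb) * ratio / 100 + swap_kb ))
echo "${commit_limit_kb} kB"           # 15770860 kB, matching CommitLimit above
```

In other words, with ~485 GB locked away in huge pages, everything outside shared_buffers had to fit in roughly 15 GB of commit, which is why removing the huge pages definition helped.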
Related
I have a 5-node MongoDB replica set: 1 primary, 3 secondaries, and 1 arbiter.
I am using MongoDB version 4.2.3.
Sizes:
"dataSize" : 688.4161271536723,
"indexes" : 177,
"indexSize" : 108.41889953613281
My primary is very slow: each command from the shell takes a long time to return.
Memory usage seems very high, and it looks like MongoDB is consuming more than 50% of the RAM:
# free -lh
total used free shared buff/cache available
Mem: 188Gi 187Gi 473Mi 56Mi 740Mi 868Mi
Low: 188Gi 188Gi 473Mi
High: 0B 0B 0B
Swap: 191Gi 117Gi 74Gi
------------------------------------------------------------------
Top Memory Consuming Process Using ps command
------------------------------------------------------------------
PID PPID %MEM %CPU CMD
311 49145 97.8 498 mongod --config /etc/mongod.conf
23818 23801 0.0 3.8 /bin/prometheus --config.file=/etc/prometheus/prometheus.yml
23162 23145 0.0 8.4 /usr/bin/cadvisor -logtostderr
25796 25793 0.0 0.4 postgres: checkpointer
23501 23484 0.0 1.0 /postgres_exporter
24490 24473 0.0 0.1 grafana-server --homepath=/usr/share/grafana --config=/etc/grafana/grafana.ini --packaging=docker cfg:default.log.mode=console
top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
311 systemd+ 20 0 313.9g 184.6g 2432 S 151.7 97.9 26229:09 mongod
23818 nobody 20 0 11.3g 150084 17988 S 20.7 0.1 8523:47 prometheus
23162 root 20 0 12.7g 93948 5964 S 65.5 0.0 18702:22 cadvisor
serverStatus memory shows this:
octopusrs0:PRIMARY> db.serverStatus().mem
{
"bits" : 64,
"resident" : 189097,
"virtual" : 321404,
"supported" : true
}
octopusrs0:PRIMARY> db.serverStatus().tcmalloc.tcmalloc.formattedString
------------------------------------------------
MALLOC: 218206510816 (208097.9 MiB) Bytes in use by application
MALLOC: + 96926863360 (92436.7 MiB) Bytes in page heap freelist
MALLOC: + 3944588576 ( 3761.9 MiB) Bytes in central cache freelist
MALLOC: + 134144 ( 0.1 MiB) Bytes in transfer cache freelist
MALLOC: + 713330688 ( 680.3 MiB) Bytes in thread cache freelists
MALLOC: + 1200750592 ( 1145.1 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 320992178176 (306122.0 MiB) Actual memory used (physical + swap)
MALLOC: + 13979086848 (13331.5 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 334971265024 (319453.5 MiB) Virtual address space used
MALLOC:
MALLOC: 9420092 Spans in use
MALLOC: 234 Thread heaps in use
MALLOC: 4096 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
How can I determine what is causing this high memory consumption, and what can I do to return to normal memory consumption?
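One thing worth checking first: the tcmalloc stats above show ~208 GiB in use by the application and another ~92 GiB parked in the page-heap freelist, while WiredTiger's default cache target is roughly 50% of (RAM − 1 GB), about 93 GiB on a 188 GiB box. Compare the cache's actual size (db.serverStatus().wiredTiger.cache, "bytes currently in the cache") against that target; if the cache is the driver, it can be capped explicitly in mongod.conf. A hedged sketch (the 64 GB figure is an illustrative value, not a recommendation):

```yaml
# /etc/mongod.conf (illustrative fragment)
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 64   # example cap; default is ~50% of (RAM - 1 GB)
```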
Thanks,
Tamar
We have an ASP.NET web service application that is hitting OutOfMemory issues in IIS w3wp.exe. We got a mini memory dump of w3wp when the OOM happened, and we see it has a large mem_reserve but a small managed heap size. The question is: how can we find out what is using mem_reserve (virtual address space)? I expect to see some mem_reserve, but not 2.9 GB. Since mem_reserve doesn't add up to !eeheap, would it be from an unmanaged heap? If so, is there any way to confirm that?
0:000> !address -summary
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
<unknown> 1664 d4c1d000 ( 3.324 GB) 92.34% 83.11%
Free 348 19951000 ( 409.316 MB) 9.99%
Image 1184 ad58000 ( 173.344 MB) 4.70% 4.23%
Heap 75 5ab4000 ( 90.703 MB) 2.46% 2.21%
Stack 193 1200000 ( 18.000 MB) 0.49% 0.44%
TEB 64 40000 ( 256.000 kB) 0.01% 0.01%
Other 10 35000 ( 212.000 kB) 0.01% 0.01%
PEB 1 1000 ( 4.000 kB) 0.00% 0.00%
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE 1323 d6f04000 ( 3.358 GB) 93.28% 83.96%
MEM_IMAGE 1828 dd75000 ( 221.457 MB) 6.01% 5.41%
MEM_MAPPED 40 1a26000 ( 26.148 MB) 0.71% 0.64%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_RESERVE 746 b8b4e000 ( 2.886 GB) 80.16% 72.15%
MEM_COMMIT 2445 2db51000 ( 731.316 MB) 19.84% 17.85%
MEM_FREE 348 19951000 ( 409.316 MB) 9.99%
--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE 1024 1f311000 ( 499.066 MB) 13.54% 12.18%
PAGE_EXECUTE_READ 248 b2e6000 ( 178.898 MB) 4.85% 4.37%
PAGE_READONLY 613 1c1d000 ( 28.113 MB) 0.76% 0.69%
PAGE_WRITECOPY 219 103e000 ( 16.242 MB) 0.44% 0.40%
PAGE_EXECUTE_READWRITE 144 59c000 ( 5.609 MB) 0.15% 0.14%
PAGE_EXECUTE_WRITECOPY 69 21f000 ( 2.121 MB) 0.06% 0.05%
PAGE_READWRITE|PAGE_GUARD 128 144000 ( 1.266 MB) 0.03% 0.03%
--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
<unknown> c2010000 4032000 ( 64.195 MB)
Free 86010000 4000000 ( 64.000 MB)
Image 72f48000 f5d000 ( 15.363 MB)
Heap 64801000 fcf000 ( 15.809 MB)
Stack 1ed0000 7a000 ( 488.000 kB)
TEB ffe8a000 1000 ( 4.000 kB)
Other fffb0000 23000 ( 140.000 kB)
PEB fffde000 1000 ( 4.000 kB)
0:000> !ao
---------Heap 1 ---------
Managed OOM occured after GC #2924 (Requested to allocate 0 bytes)
Reason: Didn't have enough memory to allocate an LOH segment
Detail: LOH: Failed to reserve memory (117440512 bytes)
---------Heap 7 ---------
Managed OOM occured after GC #2308 (Requested to allocate 0 bytes)
Reason: Could not do a full GC
0:000> !vmstat
TYPE MINIMUM MAXIMUM AVERAGE BLK COUNT TOTAL
~~~~ ~~~~~~~ ~~~~~~~ ~~~~~~~ ~~~~~~~~~ ~~~~~
Free:
Small 4K 64K 34K 219 7,475K
Medium 72K 832K 273K 80 21,903K
Large 1,080K 65,536K 7,954K 49 389,759K
Summary 4K 65,536K 1,204K 348 419,139K
Reserve:
Small 4K 64K 14K 436 6,519K
Medium 88K 1,020K 303K 169 51,303K
Large 1,088K 65,528K 21,052K 141 2,968,407K
Summary 4K 65,528K 4,056K 746 3,026,231K
Commit:
Small 4K 64K 16K 2,019 33,386K
Medium 68K 1,024K 287K 275 79,043K
Large 1,028K 65,736K 7,315K 87 636,435K
Summary 4K 65,736K 314K 2,381 748,866K
Private:
Small 4K 64K 28K 851 23,911K
Medium 88K 1,024K 313K 222 69,687K
Large 1,028K 65,736K 18,429K 186 3,427,951K
Summary 4K 65,736K 2,797K 1,259 3,521,550K
Mapped:
Small 4K 64K 18K 23 431K
Medium 68K 1,004K 385K 11 4,239K
Large 1,520K 6,640K 3,684K 6 22,104K
Summary 4K 6,640K 669K 40 26,775K
Image:
Small 4K 64K 9K 1,581 15,562K
Medium 68K 1,000K 267K 211 56,419K
Large 1,028K 15,732K 4,299K 36 154,787K
Summary 4K 15,732K 124K 1,828 226,771K
0:000> !eeheap -gc
Number of GC Heaps: 8
------------------------------
Heap 0 (01cd1720)
generation 0 starts at 0x44470de4
generation 1 starts at 0x43f51000
generation 2 starts at 0x02161000
ephemeral segment allocation context: none
segment begin allocated size
02160000 02161000 02dee4a8 0xc8d4a8(13161640)
43f50000 43f51000 444b9364 0x568364(5669732)
Large object heap starts at 0x12161000
segment begin allocated size
12160000 12161000 121bd590 0x5c590(378256)
c2010000 c2011000 c6041020 0x4030020(67305504)
Heap Size: Size: 0x5281dbc (86515132) bytes.
------------------------------
Heap 1 (01cd65a8)
generation 0 starts at 0x8c4b38e0
generation 1 starts at 0x8c011000
generation 2 starts at 0x04161000
ephemeral segment allocation context: none
segment begin allocated size
04160000 04161000 054114f0 0x12b04f0(19596528)
25c40000 25c41000 25fc7328 0x386328(3695400)
8c010000 8c011000 8c4c4778 0x4b3778(4929400)
Large object heap starts at 0x13161000
segment begin allocated size
13160000 13161000 13161010 0x10(16)
Heap Size: Size: 0x1ae9fa0 (28221344) bytes.
------------------------------
Heap 2 (01cdb5c0)
generation 0 starts at 0x4a71a420
generation 1 starts at 0x49f51000
generation 2 starts at 0x06161000
ephemeral segment allocation context: none
segment begin allocated size
06160000 06161000 06e89b18 0xd28b18(13798168)
27c40000 27c41000 27f8f6dc 0x34e6dc(3466972)
9a010000 9a011000 9a5900b8 0x57f0b8(5763256)
49f50000 49f51000 4a72cc2c 0x7dbc2c(8240172)
Large object heap starts at 0x14161000
segment begin allocated size
14160000 14161000 14161010 0x10(16)
Heap Size: Size: 0x1dd1ee8 (31268584) bytes.
------------------------------
Heap 3 (01ce05d8)
generation 0 starts at 0x34fbe540
generation 1 starts at 0x34d11000
generation 2 starts at 0x08161000
ephemeral segment allocation context: none
segment begin allocated size
08160000 08161000 08c26248 0xac5248(11293256)
2bc40000 2bc41000 2bfc9294 0x388294(3703444)
34d10000 34d11000 34fcdbb0 0x2bcbb0(2870192)
Large object heap starts at 0x15161000
segment begin allocated size
15160000 15161000 151c3b10 0x62b10(404240)
Heap Size: Size: 0x116cb9c (18271132) bytes.
------------------------------
Heap 4 (01ce55f0)
generation 0 starts at 0x934a7cec
generation 1 starts at 0x93011000
generation 2 starts at 0x0a161000
ephemeral segment allocation context: none
segment begin allocated size
0a160000 0a161000 0b4bf898 0x135e898(20310168)
45f50000 45f51000 45f70df4 0x1fdf4(130548)
93010000 93011000 934b83ec 0x4a73ec(4879340)
Large object heap starts at 0x16161000
segment begin allocated size
16160000 16161000 161ab050 0x4a050(303184)
Heap Size: Size: 0x186fac8 (25623240) bytes.
------------------------------
Heap 5 (01cec608)
generation 0 starts at 0x31f1d4d0
generation 1 starts at 0x31c41000
generation 2 starts at 0x0c161000
ephemeral segment allocation context: none
segment begin allocated size
0c160000 0c161000 0d2120b0 0x10b10b0(17502384)
7aea0000 7aea1000 7af5c074 0xbb074(766068)
31c40000 31c41000 31f2caac 0x2ebaac(3062444)
Large object heap starts at 0x17161000
segment begin allocated size
17160000 17161000 17179ad0 0x18ad0(101072)
Heap Size: Size: 0x14706a0 (21431968) bytes.
------------------------------
Heap 6 (01cf5a20)
generation 0 starts at 0x5fbb5598
generation 1 starts at 0x5f821000
generation 2 starts at 0x0e161000
ephemeral segment allocation context: none
segment begin allocated size
0e160000 0e161000 0eb22284 0x9c1284(10228356)
5f820000 5f821000 5fbc9838 0x3a8838(3835960)
Large object heap starts at 0x18161000
segment begin allocated size
18160000 18161000 18161010 0x10(16)
Heap Size: Size: 0xd69acc (14064332) bytes.
------------------------------
Heap 7 (01cfaa38)
generation 0 starts at 0xad530e00
generation 1 starts at 0xad011000
generation 2 starts at 0x10161000
ephemeral segment allocation context: none
segment begin allocated size
10160000 10161000 109bd218 0x85c218(8765976)
a4010000 a4011000 a42f418c 0x2e318c(3027340)
41f50000 41f51000 42a99090 0xb48090(11829392)
ad010000 ad011000 ad5399a0 0x5289a0(5409184)
Large object heap starts at 0x19161000
segment begin allocated size
19160000 19161000 1980ddb0 0x6acdb0(6999472)
Heap Size: Size: 0x225cb84 (36031364) bytes.
------------------------------
GC Heap Size: Size: 0xf950f98 (261427096) bytes.
0:000> !heap -s
SEGMENT HEAP ERROR: failed to initialize the extention
LFH Key : 0x3a251113
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
Virtual block: 1a830000 - 1a830000 (size 00000000)
Virtual block: 1a9d0000 - 1a9d0000 (size 00000000)
007b0000 00000002 16384 15520 16384 198 418 5 2 3 LFH
00280000 00001002 3136 1608 3136 7 23 3 0 0 LFH
00a40000 00001002 1088 828 1088 79 25 2 0 0 LFH
00c30000 00001002 1088 308 1088 13 5 2 0 0 LFH
00100000 00041002 256 4 256 2 1 1 0 0
01560000 00001002 256 24 256 3 4 1 0 0
01510000 00001002 64 4 64 2 1 1 0 0
01670000 00001002 64 4 64 2 1 1 0 0
00290000 00011002 256 4 256 1 2 1 0 0
01870000 00001002 256 8 256 3 3 1 0 0
004e0000 00001002 256 4 256 1 2 1 0 0
019e0000 00001002 256 4 256 2 1 1 0 0
01af0000 00001002 256 4 256 2 1 1 0 0
02080000 00001002 1280 396 1280 1 34 2 0 0 LFH
01fc0000 00041002 256 4 256 2 1 1 0 0
1b700000 00041002 256 256 256 6 4 1 0 0 LFH
21840000 00001002 64192 1880 64192 1639 13 12 0 0 LFH
External fragmentation 87 % (13 free blocks)
-----------------------------------------------------------------------------
If we recycle w3wp, all memory goes back to the system, so there seems to be no memory leak. I understand that mem_reserve is virtual address space that has been allocated but not committed. The question is: is there a way to figure out what is using up all this virtual address space?
I have the memory dump, but I don't have access to the server to collect performance counters or anything similar.
Thanks folks!!
My Windows 10 machine has bluescreened several times. I have a memory dump, and running !vm produces the output below showing 0 free system PTEs. How do I find out postmortem who leaked them, or monitor on a live system which process/driver is responsible for the leak?
0: kd> !vm
Page File: \??\C:\pagefile.sys
Current: 9961472 Kb Free Space: 9961464 Kb
Minimum: 9961472 Kb Maximum: 62351824 Kb
Page File: \??\C:\swapfile.sys
Current: 16384 Kb Free Space: 16376 Kb
Minimum: 16384 Kb Maximum: 49881460 Kb
No Name for Paging File
Current: 129378408 Kb Free Space: 129362880 Kb
Minimum: 129378408 Kb Maximum: 129378408 Kb
Physical Memory: 16756646 ( 67026584 Kb)
Available Pages: 11876350 ( 47505400 Kb)
ResAvail Pages: 15917045 ( 63668180 Kb)
Locked IO Pages: 0 ( 0 Kb)
Free System PTEs: 0 ( 0 Kb)
********** Running out of system PTEs **************
Modified Pages: 249289 ( 997156 Kb)
Modified PF Pages: 249261 ( 997044 Kb)
Modified No Write Pages: 25 ( 100 Kb)
NonPagedPool Usage: 3042 ( 12168 Kb)
NonPagedPoolNx Usage: 103383 ( 413532 Kb)
NonPagedPool Max: 4294967296 (17179869184 Kb)
PagedPool 0: 144048 ( 576192 Kb)
PagedPool 1: 31595 ( 126380 Kb)
PagedPool 2: 31923 ( 127692 Kb)
PagedPool 3: 31631 ( 126524 Kb)
PagedPool 4: 31714 ( 126856 Kb)
PagedPool Usage: 270911 ( 1083644 Kb)
PagedPool Maximum: 4294967296 (17179869184 Kb)
Processor Commit: 1348 ( 5392 Kb)
Session Commit: 17782 ( 71128 Kb)
Shared Commit: 658461 ( 2633844 Kb)
Special Pool: 0 ( 0 Kb)
Kernel Stacks: 26919 ( 107676 Kb)
Pages For MDLs: 395401 ( 1581604 Kb)
Pages For AWE: 0 ( 0 Kb)
NonPagedPool Commit: 97838 ( 391352 Kb)
PagedPool Commit: 270911 ( 1083644 Kb)
Driver Commit: 19721 ( 78884 Kb)
Boot Commit: 2732 ( 10928 Kb)
PFN Array Commit: 196913 ( 787652 Kb)
System PageTables: 3267 ( 13068 Kb)
ProcessLockedFilePages: 306 ( 1224 Kb)
Pagefile Hash Pages: 0 ( 0 Kb)
Sum System Commit: 1691599 ( 6766396 Kb)
Total Private: 4330147 ( 17320588 Kb)
Misc/Transient Commit: 9281 ( 37124 Kb)
Committed pages: 6031027 ( 24124108 Kb)
Commit limit: 19247014 ( 76988056 Kb)
Pid ImageName Commit SharedCommit Debt
598c vmmem 3677256 Kb 0 Kb 0 Kb
4d4 RemoteDesktopManager.exe 1367276 Kb 473008 Kb 0 Kb
6300 vmmem 1050684 Kb 0 Kb 0 Kb
207c vmmem 1050684 Kb 0 Kb 0 Kb
1d18 vmmem 1050684 Kb 0 Kb 0 Kb
276c powershell.exe 713052 Kb 4660 Kb 0 Kb
483c chrome.exe 635112 Kb 130192 Kb 0 Kb
3ad4 chrome.exe 525988 Kb 20672 Kb 0 Kb
From Russinovich, Mark; Solomon, David; Ionescu, Alex. Windows Internals, Part 2 (6th Edition) (Developer Reference)
you can enable system PTE tracking by creating a new DWORD value in
the HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory
Management key called TrackPtes and setting its value to 1. You can
then use !sysptes 4 to show a list of allocators
So you could try to:
set the registry key and reboot
use livekd on the running system and execute !sysptes 4 to get a list of allocators
or execute !sysptes 4 on a dump you collect
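The registry step above can be done in one command from an elevated prompt (reboot afterwards for it to take effect):

```shell
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v TrackPtes /t REG_DWORD /d 1 /f
```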
I am trying to get a feel for what I should expect in terms of performance from cloud storage.
I just ran gsutil perfdiag from a Compute Engine instance in the same location (US) and the same project as my Cloud Storage bucket.
For Nearline storage, I get 25 Mibit/s read and 353 Mibit/s write. Is that low, high, or average, and why is there such a discrepancy between read and write?
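One way to narrow down whether the gap is a per-stream limit or a storage-class property is to rerun perfdiag with larger and more concurrent transfers. A sketch (flag spellings should be confirmed against `gsutil help perfdiag` for your gsutil version; the bucket name is from the runs below):

```shell
# Read/write throughput only, 1 GiB objects, 5 objects, more parallelism.
gsutil perfdiag -t rthru,wthru -s 1G -n 5 -c 4 gs://test_throughput_standard/
```

If parallel reads scale up while a single stream stays slow, the discrepancy is per-connection rather than a bucket-wide limit.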
==============================================================================
DIAGNOSTIC RESULTS
==============================================================================
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 5 112.0 52.9 78.2 173.6
Delete 1 KiB 5 94.1 17.5 90.8 115.0
Delete 100 KiB 5 80.4 2.5 79.9 83.4
Delete 1 MiB 5 86.7 3.7 88.2 90.4
Download 0 B 5 58.1 3.8 57.8 62.2
Download 1 KiB 5 2892.4 1071.5 2589.1 4111.9
Download 100 KiB 5 1955.0 711.3 1764.9 2814.3
Download 1 MiB 5 2679.4 976.2 2216.2 3869.9
Metadata 0 B 5 69.1 57.0 42.8 129.3
Metadata 1 KiB 5 37.4 1.5 37.1 39.0
Metadata 100 KiB 5 64.2 47.7 40.9 113.0
Metadata 1 MiB 5 45.7 9.1 49.4 55.1
Upload 0 B 5 138.3 21.0 122.5 164.8
Upload 1 KiB 5 170.6 61.5 139.4 242.0
Upload 100 KiB 5 387.2 294.5 245.8 706.1
Upload 1 MiB 5 257.4 51.3 228.4 319.7
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Write throughput: 353.13 Mibit/s.
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Read throughput: 25.16 Mibit/s.
------------------------------------------------------------------------------
System Information
------------------------------------------------------------------------------
IP Address:
##.###.###.##
Temporary Directory:
/tmp
Bucket URI:
gs://pl_twitter/
gsutil Version:
4.12
boto Version:
2.30.0
Measurement time:
2015-05-11 07:03:26 PM
Google Server:
Google Server IP Addresses:
##.###.###.###
Google Server Hostnames:
Google DNS thinks your IP is:
CPU Count:
4
CPU Load Average:
[0.16, 0.05, 0.06]
Total Memory:
14.38 GiB
Free Memory:
11.34 GiB
TCP segments sent during test:
5592296
TCP segments received during test:
2417850
TCP segments retransmit during test:
3794
Disk Counter Deltas:
disk reads writes rbytes wbytes rtime wtime
sda1 31 5775 126976 1091674112 856 1603544
TCP /proc values:
wmem_default = 212992
wmem_max = 212992
rmem_default = 212992
tcp_timestamps = 1
tcp_window_scaling = 1
tcp_sack = 1
rmem_max = 212992
Boto HTTPS Enabled:
True
Requests routed through proxy:
False
Latency of the DNS lookup for Google Storage server (ms):
2.5
Latencies connecting to Google Storage server IPs (ms):
##.###.###.### = 1.1
------------------------------------------------------------------------------
In-Process HTTP Statistics
------------------------------------------------------------------------------
Total HTTP requests made: 94
HTTP 5xx errors: 0
HTTP connections broken: 0
Availability: 100%
For standard storage I get:
==============================================================================
DIAGNOSTIC RESULTS
==============================================================================
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 5 121.9 34.8 105.1 158.9
Delete 1 KiB 5 159.3 58.2 126.0 232.3
Delete 100 KiB 5 106.8 17.0 103.3 125.7
Delete 1 MiB 5 167.0 77.3 145.1 251.0
Download 0 B 5 87.2 10.3 81.1 100.0
Download 1 KiB 5 95.5 18.0 92.4 115.6
Download 100 KiB 5 156.7 20.5 155.8 179.6
Download 1 MiB 5 219.6 11.7 213.4 232.6
Metadata 0 B 5 59.7 4.5 57.8 64.4
Metadata 1 KiB 5 61.0 21.8 49.6 85.4
Metadata 100 KiB 5 55.3 10.4 50.7 67.7
Metadata 1 MiB 5 75.6 27.8 67.4 109.0
Upload 0 B 5 162.7 37.0 139.0 207.7
Upload 1 KiB 5 165.2 23.6 152.3 194.1
Upload 100 KiB 5 392.1 235.0 268.7 643.0
Upload 1 MiB 5 387.0 79.5 340.9 486.1
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Write throughput: 515.63 Mibit/s.
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Read throughput: 123.14 Mibit/s.
------------------------------------------------------------------------------
System Information
------------------------------------------------------------------------------
IP Address:
10.240.133.190
Temporary Directory:
/tmp
Bucket URI:
gs://test_throughput_standard/
gsutil Version:
4.12
boto Version:
2.30.0
Measurement time:
2015-05-21 11:08:50 AM
Google Server:
Google Server IP Addresses:
##.###.##.###
Google Server Hostnames:
Google DNS thinks your IP is:
CPU Count:
8
CPU Load Average:
[0.28, 0.18, 0.08]
Total Memory:
49.91 GiB
Free Memory:
47.9 GiB
TCP segments sent during test:
5165461
TCP segments received during test:
1881727
TCP segments retransmit during test:
3423
Disk Counter Deltas:
disk reads writes rbytes wbytes rtime wtime
dm-0 0 0 0 0 0 0
loop0 0 0 0 0 0 0
loop1 0 0 0 0 0 0
sda1 0 4229 0 1080618496 0 1605286
TCP /proc values:
wmem_default = 212992
wmem_max = 212992
rmem_default = 212992
tcp_timestamps = 1
tcp_window_scaling = 1
tcp_sack = 1
rmem_max = 212992
Boto HTTPS Enabled:
True
Requests routed through proxy:
False
Latency of the DNS lookup for Google Storage server (ms):
1.2
Latencies connecting to Google Storage server IPs (ms):
##.###.##.### = 1.3
------------------------------------------------------------------------------
In-Process HTTP Statistics
------------------------------------------------------------------------------
Total HTTP requests made: 94
HTTP 5xx errors: 0
HTTP connections broken: 0
Availability: 100%
==============================================================================
DIAGNOSTIC RESULTS
==============================================================================
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 5 145.1 59.4 117.8 215.2
Delete 1 KiB 5 178.0 51.4 190.6 224.3
Delete 100 KiB 5 98.3 5.0 96.6 104.3
Delete 1 MiB 5 117.7 19.2 112.0 140.2
Download 0 B 5 109.4 38.9 91.9 156.5
Download 1 KiB 5 149.5 41.0 141.9 192.5
Download 100 KiB 5 106.9 20.3 108.6 127.8
Download 1 MiB 5 121.1 16.0 112.2 140.9
Metadata 0 B 5 70.0 10.8 76.8 79.9
Metadata 1 KiB 5 113.8 36.6 124.0 148.7
Metadata 100 KiB 5 63.1 20.2 55.7 86.5
Metadata 1 MiB 5 59.2 4.9 61.3 62.9
Upload 0 B 5 127.5 22.6 117.4 153.6
Upload 1 KiB 5 215.2 54.8 221.4 270.4
Upload 100 KiB 5 229.8 79.2 171.6 329.8
Upload 1 MiB 5 489.8 412.3 295.3 915.4
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Write throughput: 503 Mibit/s.
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Read throughput: 1.05 Gibit/s.
------------------------------------------------------------------------------
System Information
------------------------------------------------------------------------------
IP Address:
################
Temporary Directory:
/tmp
Bucket URI:
gs://test_throughput_standard/
gsutil Version:
4.12
boto Version:
2.30.0
Measurement time:
2015-05-21 06:20:49 PM
Google Server:
Google Server IP Addresses:
#############
Google Server Hostnames:
Google DNS thinks your IP is:
CPU Count:
8
CPU Load Average:
[0.08, 0.03, 0.05]
Total Memory:
49.91 GiB
Free Memory:
47.95 GiB
TCP segments sent during test:
4958020
TCP segments received during test:
2326124
TCP segments retransmit during test:
2163
Disk Counter Deltas:
disk reads writes rbytes wbytes rtime wtime
dm-0 0 0 0 0 0 0
loop0 0 0 0 0 0 0
loop1 0 0 0 0 0 0
sda1 0 4202 0 1080475136 0 1610000
TCP /proc values:
wmem_default = 212992
wmem_max = 212992
rmem_default = 212992
tcp_timestamps = 1
tcp_window_scaling = 1
tcp_sack = 1
rmem_max = 212992
Boto HTTPS Enabled:
True
Requests routed through proxy:
False
Latency of the DNS lookup for Google Storage server (ms):
1.6
Latencies connecting to Google Storage server IPs (ms):
############ = 1.3
2nd Run:
==============================================================================
DIAGNOSTIC RESULTS
==============================================================================
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 5 91.5 14.0 85.1 106.0
Delete 1 KiB 5 125.4 76.2 91.7 203.3
Delete 100 KiB 5 104.4 15.9 99.0 123.2
Delete 1 MiB 5 128.2 36.0 116.4 170.7
Download 0 B 5 60.2 8.3 63.0 68.7
Download 1 KiB 5 62.6 11.3 61.6 74.8
Download 100 KiB 5 103.2 21.3 110.7 123.8
Download 1 MiB 5 137.1 18.5 130.3 159.8
Metadata 0 B 5 73.4 35.9 62.3 114.2
Metadata 1 KiB 5 55.9 18.1 55.3 75.6
Metadata 100 KiB 5 45.7 11.0 42.5 59.1
Metadata 1 MiB 5 49.9 7.9 49.2 58.8
Upload 0 B 5 128.2 24.6 115.5 158.8
Upload 1 KiB 5 153.5 44.1 132.4 206.4
Upload 100 KiB 5 176.8 26.8 165.1 209.7
Upload 1 MiB 5 277.9 80.2 214.7 378.5
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Write throughput: 463.76 Mibit/s.
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Read throughput: 184.96 Mibit/s.
------------------------------------------------------------------------------
System Information
------------------------------------------------------------------------------
IP Address:
#################
Temporary Directory:
/tmp
Bucket URI:
gs://test_throughput_standard/
gsutil Version:
4.12
boto Version:
2.30.0
Measurement time:
2015-05-21 06:24:31 PM
Google Server:
Google Server IP Addresses:
####################
Google Server Hostnames:
Google DNS thinks your IP is:
CPU Count:
8
CPU Load Average:
[0.19, 0.17, 0.11]
Total Memory:
49.91 GiB
Free Memory:
47.9 GiB
TCP segments sent during test:
5180256
TCP segments received during test:
2034323
TCP segments retransmit during test:
2883
Disk Counter Deltas:
disk reads writes rbytes wbytes rtime wtime
dm-0 0 0 0 0 0 0
loop0 0 0 0 0 0 0
loop1 0 0 0 0 0 0
sda1 0 4209 0 1080480768 0 1604066
TCP /proc values:
wmem_default = 212992
wmem_max = 212992
rmem_default = 212992
tcp_timestamps = 1
tcp_window_scaling = 1
tcp_sack = 1
rmem_max = 212992
Boto HTTPS Enabled:
True
Requests routed through proxy:
False
Latency of the DNS lookup for Google Storage server (ms):
3.5
Latencies connecting to Google Storage server IPs (ms):
################ = 1.1
------------------------------------------------------------------------------
In-Process HTTP Statistics
------------------------------------------------------------------------------
Total HTTP requests made: 94
HTTP 5xx errors: 0
HTTP connections broken: 0
Availability: 100%
3rd run
==============================================================================
DIAGNOSTIC RESULTS
==============================================================================
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 5 157.0 78.3 101.5 254.9
Delete 1 KiB 5 153.5 49.1 178.3 202.5
Delete 100 KiB 5 152.9 47.5 168.0 202.6
Delete 1 MiB 5 110.6 20.4 105.7 134.5
Download 0 B 5 104.4 50.5 66.8 167.6
Download 1 KiB 5 68.1 11.1 68.7 79.2
Download 100 KiB 5 85.5 5.8 86.0 90.8
Download 1 MiB 5 126.6 40.1 100.5 175.0
Metadata 0 B 5 67.9 16.2 61.0 86.6
Metadata 1 KiB 5 49.3 8.6 44.9 59.5
Metadata 100 KiB 5 66.6 35.4 44.2 107.8
Metadata 1 MiB 5 53.9 13.2 52.1 69.4
Upload 0 B 5 136.7 37.1 114.4 183.5
Upload 1 KiB 5 145.5 58.3 116.8 208.2
Upload 100 KiB 5 227.3 37.6 233.3 259.3
Upload 1 MiB 5 274.8 45.2 261.8 328.5
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Write throughput: 407.03 Mibit/s.
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Read throughput: 629.07 Mibit/s.
------------------------------------------------------------------------------
System Information
------------------------------------------------------------------------------
IP Address:
###############
Temporary Directory:
/tmp
Bucket URI:
gs://test_throughput_standard/
gsutil Version:
4.12
boto Version:
2.30.0
Measurement time:
2015-05-21 06:32:48 PM
Google Server:
Google Server IP Addresses:
################
Google Server Hostnames:
Google DNS thinks your IP is:
CPU Count:
8
CPU Load Average:
[0.11, 0.13, 0.13]
Total Memory:
49.91 GiB
Free Memory:
47.94 GiB
TCP segments sent during test:
5603925
TCP segments received during test:
2438425
TCP segments retransmit during test:
4586
Disk Counter Deltas:
disk reads writes rbytes wbytes rtime wtime
dm-0 0 0 0 0 0 0
loop0 0 0 0 0 0 0
loop1 0 0 0 0 0 0
sda1 0 4185 0 1080353792 0 1603851
TCP /proc values:
wmem_default = 212992
wmem_max = 212992
rmem_default = 212992
tcp_timestamps = 1
tcp_window_scaling = 1
tcp_sack = 1
rmem_max = 212992
Boto HTTPS Enabled:
True
Requests routed through proxy:
False
Latency of the DNS lookup for Google Storage server (ms):
2.2
Latencies connecting to Google Storage server IPs (ms):
############## = 1.6
All things being equal, write performance is generally higher on modern storage systems because of the caching layer that sits between the application and the disks. That said, what you are seeing is within the expected range for "Nearline" storage.
I have observed far superior throughput when using "Standard" storage buckets, though latency did not improve much. Consider using a "Standard" bucket if your application requires high throughput. If your application is latency-sensitive, then using local storage as a cache (or scratch space) may be the only option.
Here is a snippet from one of my experiments on a "Standard" bucket:
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 10 91.5 12.4 89.0 98.5
Delete 1 KiB 10 96.4 9.1 95.6 105.6
Delete 100 KiB 10 92.9 22.8 85.3 102.4
Delete 1 MiB 10 86.4 9.1 84.1 93.2
Download 0 B 10 54.2 5.1 55.4 58.8
Download 1 KiB 10 83.3 18.7 78.4 94.9
Download 100 KiB 10 75.2 14.5 68.6 92.6
Download 1 MiB 10 95.0 19.7 86.3 126.7
Metadata 0 B 10 33.5 7.9 31.1 44.8
Metadata 1 KiB 10 36.3 7.2 35.8 46.8
Metadata 100 KiB 10 37.7 9.2 36.6 44.1
Metadata 1 MiB 10 116.1 231.3 36.6 136.1
Upload 0 B 10 151.4 67.5 122.9 195.9
Upload 1 KiB 10 134.2 22.4 127.9 149.3
Upload 100 KiB 10 168.8 20.5 168.6 188.6
Upload 1 MiB 10 213.3 37.6 200.2 262.5
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied 5 1 GiB file(s) for a total transfer size of 10 GiB.
Write throughput: 3.46 Gibit/s.
Parallelism strategy: both
------------------------------------------------------------------------------
Write Throughput With File I/O
------------------------------------------------------------------------------
Copied 5 1 GiB file(s) for a total transfer size of 10 GiB.
Write throughput: 3.9 Gibit/s.
Parallelism strategy: both
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied 5 1 GiB file(s) for a total transfer size of 10 GiB.
Read throughput: 7.04 Gibit/s.
Parallelism strategy: both
------------------------------------------------------------------------------
Read Throughput With File I/O
------------------------------------------------------------------------------
Copied 5 1 GiB file(s) for a total transfer size of 10 GiB.
Read throughput: 1.64 Gibit/s.
Parallelism strategy: both
Hope that is helpful.
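As a side note, the throughput figures perfdiag prints are in mebibits per second (Mibit/s), not megabytes, which is easy to misread when comparing runs. A minimal sketch of the conversion (the 88.3 s elapsed time is an assumed value chosen to roughly reproduce the 463.76 Mibit/s write figure above; perfdiag does not print the elapsed time directly):

```python
def throughput_mibit_per_s(total_bytes, seconds):
    """Convert a transfer (bytes over seconds) to Mibit/s, the unit perfdiag reports."""
    bits = total_bytes * 8
    return bits / (1024 ** 2) / seconds

# 5 GiB (five 1 GiB copies) written in ~88.3 s comes out near the figure above
five_gib = 5 * 1024 ** 3
print(round(throughput_mibit_per_s(five_gib, 88.3), 2))  # ~463.87 Mibit/s
```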
I can't understand where my Ceph raw space has gone.
cluster 90dc9682-8f2c-4c8e-a589-13898965b974
health HEALTH_WARN 72 pgs backfill; 26 pgs backfill_toofull; 51 pgs backfilling; 141 pgs stuck unclean; 5 requests are blocked > 32 sec; recovery 450170/8427917 objects degraded (5.341%); 5 near full osd(s)
monmap e17: 3 mons at {enc18=192.168.100.40:6789/0,enc24=192.168.100.43:6789/0,enc26=192.168.100.44:6789/0}, election epoch 734, quorum 0,1,2 enc18,enc24,enc26
osdmap e3326: 14 osds: 14 up, 14 in
pgmap v5461448: 1152 pgs, 3 pools, 15252 GB data, 3831 kobjects
31109 GB used, 7974 GB / 39084 GB avail
450170/8427917 objects degraded (5.341%)
18 active+remapped+backfill_toofull
1011 active+clean
64 active+remapped+wait_backfill
8 active+remapped+wait_backfill+backfill_toofull
51 active+remapped+backfilling
recovery io 58806 kB/s, 14 objects/s
OSD tree (each host has 2 OSD):
# id weight type name up/down reweight
-1 36.45 root default
-2 5.44 host enc26
0 2.72 osd.0 up 1
1 2.72 osd.1 up 0.8227
-3 3.71 host enc24
2 0.99 osd.2 up 1
3 2.72 osd.3 up 1
-4 5.46 host enc22
4 2.73 osd.4 up 0.8
5 2.73 osd.5 up 1
-5 5.46 host enc18
6 2.73 osd.6 up 1
7 2.73 osd.7 up 1
-6 5.46 host enc20
9 2.73 osd.9 up 0.8
8 2.73 osd.8 up 1
-7 0 host enc28
-8 5.46 host archives
12 2.73 osd.12 up 1
13 2.73 osd.13 up 1
-9 5.46 host enc27
10 2.73 osd.10 up 1
11 2.73 osd.11 up 1
Real usage:
/dev/rbd0 14T 7.9T 5.5T 59% /mnt/ceph
Pool size:
osd pool default size = 2
Pools:
ceph osd lspools
0 data,1 metadata,2 rbd,
rados df
pool name category KB objects clones degraded unfound rd rd KB wr wr KB
data - 0 0 0 0 0 0 0 0 0
metadata - 0 0 0 0 0 0 0 0 0
rbd - 15993591918 3923880 0 444545 0 82936 1373339 2711424 849398218
total used 32631712348 3923880
total avail 8351008324
total space 40982720672
Raw usage is 4x the real usage. As I understand it, it should be 2x?
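The ratios can be checked directly from the numbers above (a minimal sketch; the GB figures are copied verbatim from the pgmap line and the df output):

```python
# Figures from the outputs above (GB)
raw_used = 31109       # "31109 GB used" from the pgmap line
pool_data = 15252      # "15252 GB data" from the pgmap line
fs_used = 7.9 * 1024   # "7.9T" used on the mounted /dev/rbd0

# Raw usage vs. the data RADOS actually stores: ~2x, matching size=2
print(round(raw_used / pool_data, 2))  # → 2.04

# Raw usage vs. what the filesystem inside the one mapped image reports: ~3.8x
print(round(raw_used / fs_used, 2))
```

Note the 2.04x ratio against the RADOS-level data is exactly what a replication size of 2 predicts; the apparent 4x only appears when comparing raw usage against the filesystem inside a single mapped RBD image.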
Yes, it should be 2x. I'm not really sure that the real raw usage is 7.9T. Why do you check this value on the mapped disk?
These are my pools:
pool name KB objects clones degraded unfound rd rd KB wr wr KB
admin-pack 7689982 1955 0 0 0 693841 3231750 40068930 353462603
public-cloud 105432663 26561 0 0 0 13001298 638035025 222540884 3740413431
rbdkvm_sata 32624026697 7968550 31783 0 0 4950258575 232374308589 12772302818 278106113879
total used 98289353680 7997066
total avail 34474223648
total space 132763577328
You can see that the total used space is roughly 3 times the used space of the rbdkvm_sata pool.
ceph -s shows the same result too:
pgmap v11303091: 5376 pgs, 3 pools, 31220 GB data, 7809 kobjects
93736 GB used, 32876 GB / 123 TB avail
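The roughly 3x ratio can be verified from the rados df totals (a minimal sketch using the KB figures above):

```python
# KB figures copied from the "rados df" output above
total_used = 98289353680          # "total used"
rbdkvm_sata_used = 32624026697    # "rbdkvm_sata" pool KB

print(round(total_used / rbdkvm_sata_used, 2))  # → 3.01
```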
I don't think you have just one RBD image. The result of "ceph osd lspools" indicates that you have 3 pools, and one of the pools is named "metadata" (maybe you were using CephFS). /dev/rbd0 appeared because you mapped one image, but you could have other images as well. To list the images in a pool you can use "rbd list -p <pool-name>", and you can see an image's info with "rbd info -p <pool-name> <image-name>".