For years I have used !heap -p -a for various tasks.
Now I'm starting to debug on Win8 using the WinDbg 6.2.9200 found in the latest Win8 SDK.
Here I have found that !heap -p -a does not always work, and that the output of
!address "advertises" the use of !heap -x instead (see below).
After reading !heap -? I still can't understand the difference.
Does anyone know the difference?
Which command do you use to see the details of a heap block?
0:008> !address 335168f8
<cut cut>
Usage: Heap
Base Address: 32b43000
End Address: 33540000
Region Size: 009fd000
State: 00001000 MEM_COMMIT
Protect: 00000004 PAGE_READWRITE
Type: 00020000 MEM_PRIVATE
Allocation Base: 32570000
Allocation Protect: 00000004 PAGE_READWRITE
More info: heap owning the address: !heap 0xa80000
More info: heap segment
More info: heap entry containing the address: !heap -x 0x335168f8
0:008> !heap -x 0x335168f8
Entry User Heap Segment Size PrevSize Unused Flags
-----------------------------------------------------------------------------
335168f0 335168f8 00a80000 32570000 30 30 1c busy extra fill
0:008> !heap -p -a 0x335168f8
0:008> .echo "nothing !!"
nothing !!
WinDbg uses a different mechanism for looking up heap information depending on which flag you use.
The -p flag tells it that you have enabled Page Heap via gflags.exe or similar. When Page Heap is enabled, Windows keeps a separate set of structures (_DPH_HEAP_ROOT and co.) for tracking allocations. If Page Heap is not on, there won't be any such structures, so you will get no output. I also expect that -p -a simply searches backward from the address to find the _DPH_HEAP_BLOCK that describes the allocation.
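That matches the empty output above: page heap is evidently not enabled for this process. As a hedged sketch (myapp.exe is a placeholder for your target image name), you can confirm this with !gflag inside the debugger, which lists hpa when page heap is on, and enable full page heap from an elevated command prompt before reproducing the problem:

0:008> !gflag
C:\> gflags.exe /p /enable myapp.exe /full

Newly started instances of the image then run with page heap, and !heap -p -a will have something to report.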
The -x flag tells WinDbg to walk the _HEAP/_HEAP_ENTRY structures which Windows uses to keep track of allocations. This set of structures describes all active allocations that have gone through the standard allocators (e.g. malloc, new, LocalAlloc, HeapAlloc, etc.).
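If you want to look at those structures yourself, the public ntdll symbols expose them; a minimal sketch, reusing the heap handle 0x00a80000 reported by !address above:

0:008> dt ntdll!_HEAP_ENTRY
0:008> dt ntdll!_HEAP 0x00a80000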
There are a few great papers on the internals of the Windows heap allocators. I really like the paper Chris Valasek (@nudehaberdasher) wrote a few years ago on the Low Fragmentation Heap in Windows 7 (the principles still apply in Win8).
In processing a stream of logs (via UDP) in a raku (v2022.07) app, I'm
hitting what appears to be a memory leak using IO::Socket::Async.
I pulled the code out into a simpler program which I've included below
(~ identical to code at https://docs.raku.org/type/IO::Socket::Async):
#!/usr/bin/env raku
#
my $socket = IO::Socket::Async.bind-udp('localhost', 24225);
react {
    whenever $socket.Supply -> $v {
        print $v if $v.chars > 0;
    };
};
It leaks substantial RAM - I let it run for about 12 hours and
when I checked it was still running (on a 1T RAM machine) - with
ps auwwx [pid]
it showed 314974456 and 20739784 for VSZ and RSS (so, roughly 300G virtual size and 20G resident).
[btw, the UDP traffic is fairly light - an average of 350 (~100 byte) packets/sec, with spikes to ~1000/sec]
So I rewrote the above in perl5 (after similar leaky results with
a couple of raku variants), which stabilizes quickly at about 8M resident - that's fine/stable/etc. -
but I'd prefer this process to feed a raku channel (without a separate perl process, file
tailing, etc.).
My environment: FreeBSD 13.1-RELEASE-p2 GENERIC amd64 and raku:
v2022.07 built on MoarVM 2022.07 (installed with rakubrew).
I'm guessing this is unique to raku on freebsd but not sure.
I did attempt to upgrade (rakubrew) to v2022.12 to see if the problem was resolved there,
but while rebuilding modules (zef) too many failed (some issue with
Digest/Digest::HMAC), so I had to revert to 2022.07.
I'd be grateful for any suggestions for addressing the leak, or for alternative
ways of reading from a UDP port.
Not exactly a solution to your problem, but you can monitor memory usage from within your Raku code using a built-in feature:
use Telemetry;
say T{"max-rss"};
Also remember that Supply decodes Unicode characters by default. If your protocol is binary, you can pass :bin so the socket's Supply emits Blobs instead of treating the data as text.
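Putting the two suggestions together, here is a minimal sketch (the 60-second reporting interval is just an example, and the body of the datagram handler is left to you) of the same UDP reader taking binary blobs and periodically printing its max RSS:

use Telemetry;

my $socket = IO::Socket::Async.bind-udp('localhost', 24225);
react {
    # :bin makes the supply emit Blob objects instead of decoded Str
    whenever $socket.Supply(:bin) -> $buf {
        # ... process the datagram in $buf ...
    }
    # report resident-set size once a minute
    whenever Supply.interval(60) {
        say "max-rss: " ~ T<max-rss>;
    }
}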
I used the top command to check on the Linux server and found that the memory of the deployed Vert.x program keeps increasing. I printed the memory usage with Native Memory Tracking and found that the Internal memory keeps increasing. I didn't manually allocate any off-heap memory in the code.
Native Memory Tracking:
Total: reserved=7595MB, committed=6379MB
    Java Heap (reserved=4096MB, committed=4096MB)
              (mmap: reserved=4096MB, committed=4096MB)
    Class (reserved=1101MB, committed=86MB)
          (classes #12776)
          (malloc=7MB #18858)
          (mmap: reserved=1094MB, committed=79MB)
    Thread (reserved=122MB, committed=122MB)
           (thread #122)
           (stack: reserved=121MB, committed=121MB)
    Code (reserved=253MB, committed=52MB)
         (malloc=9MB #12566)
         (mmap: reserved=244MB, committed=43MB)
    GC (reserved=155MB, committed=155MB)
       (malloc=6MB #302)
       (mmap: reserved=150MB, committed=150MB)
    Internal (reserved=1847MB, committed=1847MB)
             (malloc=1847MB #35973)
    Symbol (reserved=17MB, committed=17MB)
           (malloc=14MB #137575)
           (arena=3MB #1)
    Native Memory Tracking (reserved=4MB, committed=4MB)
                           (tracking overhead=3MB)
vertx version: 3.9.8
Cluster: Hazelcast
startup script:su web -s /bin/bash -c "/usr/bin/nohup /usr/bin/java -XX:NativeMemoryTracking=detail -javaagent:../target/showbiz-server-game-1.0-SNAPSHOT-fat.jar -javaagent:../../quasar-core-0.8.0.jar=b -Dvertx.hazelcast.config=/data/appdata/webdata/Project/config/cluster.xml -jar -Xms4G -Xmx4G -XX:-OmitStackTraceInFastThrow ../target/server-1.0-SNAPSHOT-fat.jar start-Dvertx-id=server -conf application-conf.json -Dlog4j.configurationFile=log4j2_logstash.xml -cluster >nohup.out 2>&1 &"
If your producers are much faster than your consumers and back pressure isn't handled properly, it's possible for memory to keep increasing.
Also, this can vary depending on how the code is written.
This similar reported issue could be of help; consider exploring writeStream too, as in the sketch below.
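As a hedged illustration only (readStream and writeStream stand in for whatever streams your verticle connects, for example an event source and an HTTP response), the usual Vert.x back-pressure pattern looks like this:

// Stop reading when the write queue is full and resume once it drains,
// so unconsumed buffers do not pile up in memory.
readStream.handler(buffer -> {
    writeStream.write(buffer);
    if (writeStream.writeQueueFull()) {
        readStream.pause();
        writeStream.drainHandler(v -> readStream.resume());
    }
});

In Vert.x 3.9 the same wiring is also available as readStream.pipeTo(writeStream), which handles pause/resume for you.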
I would like to get a time series
t0, misses
...
tN, misses
where tN is a timestamp (second resolution) and misses is the number of times the kernel performed disk I/O for my PID to load a missing page of an mmap()-ed memory region when the process accessed that memory. OK, maybe the connection between disk I/O and memory access is harder to track, so let's assume my program cannot do any disk I/O for any reason other than accessing missing mmapped memory. I THINK I need to track something called node-load-misses in the perf world.
Any ideas how eBPF can be used to collect such data? What probes should I use?
I tried to use perf record for a similar purpose, but I dislike how much data perf records. As I recall, the attempt looked like this (I also don't remember how I parsed that output.data file):
perf record -p $PID -a -F 10 -e node-loads -e node-load-misses -o output.data
I thought eBPF could provide a way to implement such a thing with less overhead.
Loading of mmapped pages which are not present in memory is not a hardware event like perf's cache-misses, node-loads or node-load-misses. When your program accesses a not-present memory address, a GPFault/page-fault exception is generated by the hardware and handled in software by the Linux kernel. For a first access to anonymous memory, a physical page is allocated and mapped for that virtual address; for access to mmapped file data, disk I/O is initiated. There are two kinds of page faults in Linux, minor and major; a fault that requires disk I/O is a major page fault.
You should try trace-cmd, ftrace or perf trace. Support for fault tracing was planned for the perf tool in 2012, and patches were proposed in https://lwn.net/Articles/602658/
There is a tracepoint for page faults from user-space code, and this command prints some events with the memory address of the page fault:
echo 2^123456%2 | perf trace -e 'exceptions:page_fault_user' bc
With a recent perf tool (https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/) there is perf trace record, which can record both mmap syscalls and page_fault_user events into perf.data; perf script will then print all events, and they can be counted by some awk or python script.
Some useful links on perf and tracing: http://www.brendangregg.com/perf.html http://www.brendangregg.com/ebpf.html https://github.com/iovisor/bpftrace/blob/master/INSTALL.md
And some bcc tools may be used to trace disk I/O, like https://github.com/iovisor/bcc/blob/master/examples/tracing/disksnoop.py or https://github.com/brendangregg/perf-tools/blob/master/examples/iosnoop_example.txt
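To answer the eBPF part directly, here is a minimal bpftrace sketch (assuming bpftrace is installed per the link above; <PID> and the one-second interval are the only parameters) that counts major page faults, i.e. faults that required disk I/O, for one process every second:

# count major page faults for the given PID, printed once per second
bpftrace -e '
  software:major-faults:1 /pid == $1/ { @misses = count(); }
  interval:s:1 { time("%H:%M:%S "); print(@misses); clear(@misses); }
' <PID>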
And for a simple time-series stat you can use the perf stat -I 1000 command with the appropriate software events:
perf stat -e cpu-clock,page-faults,minor-faults,major-faults -I 1000 ./program
...
# time counts unit events
1.000112251 413.59 msec cpu-clock # 0.414 CPUs utilized
1.000112251 5,361 page-faults # 0.013 M/sec
1.000112251 5,301 minor-faults # 0.013 M/sec
1.000112251 60 major-faults # 0.145 K/sec
2.000490561 16.32 msec cpu-clock # 0.016 CPUs utilized
2.000490561 1 page-faults # 0.005 K/sec
2.000490561 1 minor-faults # 0.005 K/sec
2.000490561 0 major-faults # 0.000 K/sec
So I know that a memory address (e.g. 12208e6c) is within a specific heap. Using WinDbg, is there a way to determine what the starting address for this heap is and which function was responsible for allocating it?
!address <address> gives you information about the heap an address is contained in:
0:005> !address 03051234
Usage: Heap
Base Address: 03050000
End Address: 0307c000
Region Size: 0002c000
State: 00001000 MEM_COMMIT
Protect: 00000004 PAGE_READWRITE
Type: 00020000 MEM_PRIVATE
Allocation Base: 03050000
Allocation Protect: 00000004 PAGE_READWRITE
More info: heap owning the address: !heap 0x3050000
More info: heap segment
More info: heap entry containing the address: !heap -x 0x3051234
The "Base Address" is what you called the "starting address".
To find out who allocated that heap, you have to enable a feature called "Create user mode stack trace database" and set a buffer size in GFlags.
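As a hedged sketch (myapp.exe stands in for your executable's image name), the same flag can be set from an elevated command prompt; the target process must then be restarted for it to take effect:

gflags.exe /i myapp.exe +ust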
After doing so, you can find out the allocation call stack like this:
0:005> !gflag
Current NtGlobalFlag contents: 0x00001000
ust - Create user mode stack trace database
0:005> !heap -p -a 00591234
address 00591234 found in
_HEAP # 590000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
00590f28 0103 0000 [00] 00590f40 00800 - (busy)
msvcrt!_iob
7782e159 ntdll!RtlAllocateHeap+0x00000274
7629ade8 msvcrt!_calloc_impl+0x00000136
7629ae43 msvcrt!_calloc_crt+0x00000016
762a1e48 msvcrt!__initstdio+0x0000000d
762a1fc8 msvcrt!_cinit+0x0000001e
762a1a94 msvcrt!_core_crt_dll_init+0x000001b2
7629a48c msvcrt!_CRTDLL_INIT+0x0000001b
777e92e0 ntdll!__RtlUserThreadStart+0x00000021
777f061b ntdll!RtlpAllocateHeap+0x0000083a
777f6d84 ntdll!LdrpInitializeProcess+0x0000137e
777f583e ntdll!RtlSetEnvironmentVariable+0x00000020
777e9809 ntdll!LdrpUpdateLoadCount2+0x00000047
I am currently load testing my clustered application, which is running in JBoss 5.1 with JDK 1.6.0_45, and I am experiencing intermittent JVM crashes. From the error report (further details from the report below), it appears that the eden space is full (100%) at the time of the crash, so I suspect that this is the most likely candidate.
I have therefore been running JVisualVM to look for memory leaks, specifically monitoring my own classes. I can see these classes growing in memory, but then they are periodically cleaned up by the garbage collector.
Even if there was a memory leak, I would have expected to see OutOfMemory errors before a complete JVM crash anyway. Can anyone help to point me in the right direction with where the problem may lie? Any guidance would be very much appreciated.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000006dba43f7, pid=3980, tid=2556
#
# JRE version: 6.0_45-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.45-b01 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# V [jvm.dll+0x2c43f7]
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
snip
Heap
PSYoungGen      total 670272K, used 662831K [0x00000007d5560000, 0x0000000800000000, 0x0000000800000000)
  eden space 641728K, 100% used [0x00000007d5560000,0x00000007fc810000,0x00000007fc810000)
  from space 28544K, 73% used [0x00000007fc810000,0x00000007fdcabf68,0x00000007fe3f0000)
  to   space 28352K, 12% used [0x00000007fe450000,0x00000007fe7d0e60,0x0000000800000000)
PSOldGen        total 1398144K, used 1096904K [0x0000000780000000, 0x00000007d5560000, 0x00000007d5560000)
  object space 1398144K, 78% used [0x0000000780000000,0x00000007c2f32250,0x00000007d5560000)
PSPermGen       total 422848K, used 378606K [0x0000000760000000, 0x0000000779cf0000, 0x0000000780000000)
  object space 422848K, 89% used [0x0000000760000000,0x00000007771bb800,0x0000000779cf0000)
snip
VM Arguments:
jvm_args: -Dprogram.name=run.bat -XX:MaxPermSize=512m -Xms2G -Xmx2G -Dhttp.proxyHost=testproxy -Dhttp.proxyPort=8010 -Dhttps.proxyHost=testproxy -Dhttps.proxyPort=8010 -Djavax.net.ssl.trustStore=cacerts -Djavax.net.ssl.trustStorePassword=changeit -Djavax.net.ssl.keyStore=testkeystore.jks -Djavax.net.ssl.keyStorePassword=testkeystore -Djboss.messaging.ServerPeerID=2 -Dhttp.nonProxyHosts=*.mydomain.com -Dsun.rmi.dgc.client.gcInterval=900000 -Dsun.rmi.dgc.server.gcInterval=900000 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Djava.library.path=C:\jboss-5.1.0.GA\bin\native;C:\Program Files (x86)\Windows Resource Kits\Tools\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\jboss-5.1.0.GA\bin -Djava.endorsed.dirs=C:\jboss-5.1.0.GA\lib\endorsed
java_command: org.jboss.Main -c hops-cnf -b 0.0.0.0
Launcher Type: SUN_STANDARD
This is most likely a problem with the JVM version and the eden space. Your best bet is probably to reduce GC threading. Try with:
-XX:LargePageSizeInBytes=5m -XX:ParallelGCThreads=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC