How to count cache-misses in mmap-ed memory (using eBPF)? - mmap

I would like to get timeseries
t0, misses
...
tN, misses
where tN is a timestamp (second-resolution) and misses is a number of times the kernel made disk-IO for my PID to load missing page of the mmap()-ed memory region when process did access to that memory. Ok, maybe connection between disk-IO and memory-access is harder to track, lets assume my program can not do any disk-io with another (than assessing missing mmapped memory) reason. I THINK, I need to track something called node-load-misses in perf world.
Any ideas how eBPF can be used to collect such data? What probes should I use?
Tried to use perf record for similar purpose: I dislike how much data perf records. As I recall the try was like (also I dont remember how I parsed that output.data file):
perf record -p $PID -a -F 10 -e node-loads -e node-load-misses -o output.data
I thought eBPF could give some facility to implement such thing in less overhead way.

Loading of mmaped pages which are not present in memory is not hardware event like perf's cache-misses or node-loads or node-load-misses. When your program assess not present memory address, GPFault/pagefault exception is generated by hardware and it is handled in software by Linux kernel codes. For first access to anonymous memory physical page will be allocated and mapped for this virtual address; for access of mmaped file disk I/O will be initiated. There are two kinds of page faults in linux: minor and major, and disk I/O is major page fault.
You should try to use trace-cmd or ftrace or perf trace. Support of fault tracing was planned for perf tool in 2012, and patches were proposed in https://lwn.net/Articles/602658/
There is a tracepoint for page faults from userspace code, and this command prints some events with memory address of page fault:
echo 2^123456%2 | perf trace -e 'exceptions:page_fault_user' bc
With recent perf tool (https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/) there is perf trace record which can record both mmap syscalls and page_fault_user into perf.data and perf script will print all events and they can be counted by some awk or python script.
Some useful links on perf and tracing: http://www.brendangregg.com/perf.html http://www.brendangregg.com/ebpf.html https://github.com/iovisor/bpftrace/blob/master/INSTALL.md
And some bcc tools may be used to trace disk I/O, like https://github.com/iovisor/bcc/blob/master/examples/tracing/disksnoop.py or https://github.com/brendangregg/perf-tools/blob/master/examples/iosnoop_example.txt
And for simple time-series stat you can use perf stat -I 1000 command with correct software events
perf stat -e cpu-clock,page-faults,minor-faults,major-faults -I 1000 ./program
...
# time counts unit events
1.000112251 413.59 msec cpu-clock # 0.414 CPUs utilized
1.000112251 5,361 page-faults # 0.013 M/sec
1.000112251 5,301 minor-faults # 0.013 M/sec
1.000112251 60 major-faults # 0.145 K/sec
2.000490561 16.32 msec cpu-clock # 0.016 CPUs utilized
2.000490561 1 page-faults # 0.005 K/sec
2.000490561 1 minor-faults # 0.005 K/sec
2.000490561 0 major-faults # 0.000 K/sec

Related

Memory leak (?) using IO::Socket::Async (on FreeBSD 13.1)

In processing a stream of logs (via UDP) in a raku (v2022.07) app, I'm
hitting what appears to be a memory leak using IO::Socket::Async.
I pulled the code out into a simpler program which I've included below
(~ identical to code at https://docs.raku.org/type/IO::Socket::Async):
#!/usr/bin/env raku
#
my $socket = IO::Socket::Async.bind-udp('localhost', 24225);
react {
whenever $socket.Supply -> $v {
print $v if $v.chars > 0;
};
};
It leaks substantial ram - I let it run about 12 hours and
when I checked -- still running (on a 1T ram machine) -- with
ps auwwx [pid]
it showed 314974456 and 20739784 for VSZ and RSS (so, roughly 300G v size and 20G resident).
[btw, the UDP traffic is fairly light - average of 350 (~100 byte) packets/sec (spikes to ~1000/sec)]
So .. I rewrote above in perl5 (after similar leaky results w/
a couple of raku variants) which stabilizes quickly at about 8M resident - that's fine/stable/etc. -
but I'd prefer this process to feed a raku channel (without separate perl process/file
tailing, etc.).
My environment: FreeBSD 13.1-RELEASE-p2 GENERIC amd64 and raku:
v2022.07 built on MoarVM 2022.07 (installed with rakubrew).
I'm guessing this is unique to raku on freebsd but not sure.
I did attempt to upgrade (rakubrew) to v2022.12 to see if problem resolved there -
but in rebuilding modules (zef), too many failed (some issue with
Digest/Digest::HMAC) - so I had to revert to 2022.07.
I'll sure be grateful for any suggestions for addressing the leak or alternative
methods to address reading from a UDP port.
Not exactly a solution to your problem, but you can monitor memory usage from within your Raku code using built-in feature:
use Telemetry;
say T{"max-rss"};
Also remember that Supply by default decodes unicode chars. If your protocol is binary you may add :bin to Socket params to avoid treating binary data as text.

How can I solve the error "Can't switch processors on a single processor kernel triage dump" in WinDbg?

I have a mini dump generated with the default parameters described at Collecting User-Mode Dumps.
The dump was generated when the system was hanging through right CTRL+SCROLL LOCK+SCROLL LOCK as set in the following register keys:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters]
"CrashOnCtrlScroll"=dword:00000001
So the call stack that WinDbg shows me after the command 0: kd> !analyze -v is the one of the thread that was executing from kbdhid device driver.
When I tried to switch to a different processor I get the error:
0: kd> ~1
Can't switch processors on a single processor kernel triage dump
How can I solve this error?
What is a "single processor kernel triage dump"? If I search with Google I will get 3 or 4 results... no more, maybe someone from Microsoft could be of great help here :-).
Is there some particular value of CustomDumpFlags that I have to set? See MINIDUMP_TYPE enumeration.
I know that my system is multiprocessor and WinDbg confirms it:
0: kd> ~8
8 is not a valid processor number
0: kd> ~7
Can't switch processors on a single processor kernel triage dump
A Single Processor Kernel Dump or a kernel triage dump is a feature
where you can collect the kernel mode stack trace of an user mode process
on a machine that was not booted with /DEBUG on iirc available from vista+
you can also collect this dump using kdbgctrl
D:\>tasklist | grep -i edge
xxxxxxxxxxxxxxxxxxxxxx
MicrosoftEdgeCP.exe 12588 Console 5 41,892 K
MicrosoftEdgeCP.exe 9152 Console 5 1,49,064 K
xxxxxxxxxxxxxx
D:\>kdbgctrl -td 9152 edgy.dmp
Dump created in edgy.dmp, 1048564 bytes
D:\>file edgy.dmp
edgy.dmp: MS Windows 64bit crash dump, 1018708 pages
run !process -1 1f command to get the stack of all the threads for the current process
only one process kernel memory will be available in this dump
!process 0 0 wont work
it is not full kernel memory dump and may not be having information about any other processor stack aswell
run !cpuid only the info about 0 processor will be present in this dump
0: kd> !cpuid
CP F/M/S Manufacturer MHz
0 6,142,9 GenuineIntel 2304
Unable to get information for processor 1
Unable to get information for processor 2
Unable to get information for processor 3
0: kd>
or irql
0: kd> !irql 0
Debugger saved IRQL for processor 0x0 -- 0 (LOW_LEVEL)
0: kd> !irql 1
Cannot get PRCB address from processor 0x1
0: kd> !irql 2
Cannot get PRCB address from processor 0x2
0: kd> !irql 3
Cannot get PRCB address from processor 0x3
0: kd>

Kitura slow or low request per second?

I've download Kitura 0.20 and created a new project for a benchmark on a swift build -c release
import Kitura
let router = Router()
router.get("/") {
request, response, next in
response.send("Hello, World!")
next()
}
Kitura.addHTTPServer(onPort: 8090, with: router)
Kitura.run()
and the score appear to be low compare to Zewo and Vapor which could hit 400k+ request/s?
MacBook-Pro:hello2 yanli$ wrk -t1 -c100 -d30 --latency http://localhost:8090
Running 30s test # http://localhost:8090
1 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 415.36us 137.54us 3.09ms 91.11%
Req/Sec 5.80k 2.47k 7.19k 85.71%
Latency Distribution
50% 391.00us
75% 443.00us
90% 513.00us
99% 0.93ms
16229 requests in 30.01s, 1.67MB read
Socket errors: connect 0, read 342, write 55, timeout 0
Requests/sec: 540.84
Transfer/sec: 57.04KB
I suspect you are running out of ephemeral ports. Your issue is probably the same as this one: 'ab' program freezes after lots of requests, why?
Kitura currently does not support HTTP keepalive, and so every request requires a new connection. One symptom of this is that regardless of how many seconds you attempt to drive load, you'll see a similar number of completed requests (16229 in your example).
On OS X, there are 16,384 ephemeral ports available by default, and these will be rapidly exhausted unless you tune the network settings.
[1] http://danielmendel.github.io/blog/2013/04/07/benchmarkers-beware-the-ephemeral-port-limit/
[2] https://rolande.wordpress.com/2010/12/30/performance-tuning-the-network-stack-on-mac-osx-10-6/
My approach has been to reduce the Maximum Segment Lifetime tunable (which defaults to 15000, or 15 seconds) and increase the range of available ports temporarily while benchmarking, for example:
sudo sysctl -w net.inet.tcp.msl=1000
sudo sysctl -w net.inet.ip.portrange.first=32768
<run benchmark>
sudo sysctl -w net.inet.tcp.msl=15000
sudo sysctl -w net.inet.ip.portrange.first=49152

mongodb higher faults on Windows than on Linux

I am executing below C# code -
for (; ; )
{
Console.WriteLine("Doc# {0}", ctr++);
BsonDocument log = new BsonDocument();
log["type"] = "auth";
BsonDateTime time = new BsonDateTime(DateTime.Now);
log["when"] = time;
log["user"] = "staticString";
BsonBoolean bol = BsonBoolean.False;
log["res"] = bol;
coll.Insert(log);
}
When I run it on a MongoDB instance (version 2.0.2) running on virtual 64 bit Linux machine with just 512 MB ram, I get about 5k inserts with 1-2 faults as reported by mongostat after few mins.
When same code is run against a MongoDB instance (version 2.0.2) running on a physical Windows machine with 8 GB of ram, I get 2.5k inserts with about 80 faults as reported by mongostat after few mins.
Why more faults are occurring on Windows? I can see following message in logs-
[DataFileSync] FlushViewOfFile failed 33 file
Journaling is disable on both instances
Also, is 5k insert on a virtual machine with 1-2 faults a good enough speed? or should I be expecting better inserts?
Looks like this is a known issue - https://jira.mongodb.org/browse/SERVER-1163
page fault counter on Windows is in fact the total page faults which include both hard and soft page fault.
Process : Page Faults/sec. This is an indication of the number of page faults that
occurred due to requests from this particular process. Excessive page faults from a
particular process are an indication usually of bad coding practices. Either the
functions and DLLs are not organized correctly, or the data set that the application
is using is being called in a less than efficient manner.

Hadoop streaming job failure: Task process exit with nonzero status of 137

I've been banging my head on this one for a few days, and hope that somebody out there will have some insight.
I've written a streaming map reduce job in perl that is prone to having one or two reduce tasks take an extremely long time to execute. This is due to a natural asymmetry in the data: some of the reduce keys have over a million rows, where most have only a few dozen.
I've had problems with long tasks before, and I've been incrementing counters throughout to ensure that map reduce doesn't time them out. But now they are failing with an error message I hadn't seen before:
java.io.IOException: Task process exit with nonzero status of 137.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
This is not the standard timeout error message, but the error code 137 = 128+9 suggests that my reducer script received a kill -9 from Hadoop. The tasktracker log contains the following:
2011-09-05 19:18:31,269 WARN org.mortbay.log: Committed before 410 getMapOutput(attempt_201109051336_0003_m_000029_1,7) failed :
org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
at org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:548)
at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:946)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:646)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2940)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72)
at sun.nio.ch.IOUtil.write(IOUtil.java:43)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
... 24 more
2011-09-05 19:18:31,289 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.92.8.202:50060, dest: 10.92.8.201:46436, bytes: 7340032, op: MAPRED_SHUFFLE, cliID: attempt_201109051336_0003_m_000029_1
2011-09-05 19:18:31,292 ERROR org.mortbay.log: /mapOutput
java.lang.IllegalStateException: Committed
at org.mortbay.jetty.Response.resetBuffer(Response.java:994)
at org.mortbay.jetty.Response.sendError(Response.java:240)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2963)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
I've been looking around the forums, and it sounds like the Warnings are commonly found in jobs that run without error, and can usually be ignored. The error makes it look like the reducer lost contact with the map output, but I can't figure out why. Does anyone have any ideas?
Here's a potentially relevant fact: These long tasks were making my job take days where it should take minutes. Since I can live without the output from one or two keys, I tried to implement a ten minute timeout in my reducer as follows:
eval {
local $SIG{ALRM} = sub {
print STDERR "Processing of new merchant names in $prev_merchant_zip timed out...\n";
print STDERR "reporter:counter:tags,failed_zips,1\n";
die "timeout";
};
alarm 600;
#Code that could take a long time to execute
alarm 0;
};
This timeout code works like a charm when I test the script locally, but the strange mapreduce errors started after I introduced it. However, the failures seem to occur well after the first timeout, so I'm not sure if it's related.
Thanks in advance for any help!
Two possibilities come to mind:
RAM usage, if a tasks uses too much RAM the OS can kill it (after horrible swapping, etc).
Are you using any non-reentrant libraries? Maybe the timer is being triggered at an inopportune point in a library call.
Exit code 137 is a typical sign of the infamous OOM killer. You can easily check it using dmesg command for messages like this:
[2094250.428153] CPU: 23 PID: 28108 Comm: node Tainted: G C O 3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u2
[2094250.428155] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
[2094250.428156] ffff880773439400 ffffffff8150dacf ffff881328ea32f0 ffffffff8150b6e7
[2094250.428159] ffff881328ea3808 0000000100000000 ffff88202cb30080 ffff881328ea32f0
[2094250.428162] ffff88107fdf2f00 ffff88202cb30080 ffff88202cb30080 ffff881328ea32f0
[2094250.428164] Call Trace:
[2094250.428174] [<ffffffff8150dacf>] ? dump_stack+0x41/0x51
[2094250.428177] [<ffffffff8150b6e7>] ? dump_header+0x76/0x1e8
[2094250.428183] [<ffffffff8114044d>] ? find_lock_task_mm+0x3d/0x90
[2094250.428186] [<ffffffff8114088d>] ? oom_kill_process+0x21d/0x370
[2094250.428188] [<ffffffff8114044d>] ? find_lock_task_mm+0x3d/0x90
[2094250.428193] [<ffffffff811a053a>] ? mem_cgroup_oom_synchronize+0x52a/0x590
[2094250.428195] [<ffffffff8119fac0>] ? mem_cgroup_try_charge_mm+0xa0/0xa0
[2094250.428199] [<ffffffff81141040>] ? pagefault_out_of_memory+0x10/0x80
[2094250.428203] [<ffffffff81057505>] ? __do_page_fault+0x3c5/0x4f0
[2094250.428208] [<ffffffff8109d017>] ? put_prev_entity+0x57/0x350
[2094250.428211] [<ffffffff8109be86>] ? set_next_entity+0x56/0x70
[2094250.428214] [<ffffffff810a2c61>] ? pick_next_task_fair+0x6e1/0x820
[2094250.428219] [<ffffffff810115dc>] ? __switch_to+0x15c/0x570
[2094250.428222] [<ffffffff81515ce8>] ? page_fault+0x28/0x30
You can easily if OOM is enabled:
$ cat /proc/sys/vm/overcommit_memory
0
Basically OOM killer tries to kill process that eats largest part of memory. And you probably don't want to disable it:
The OOM killer can be completely disabled with the following command.
This is not recommended for production environments, because if an
out-of-memory condition does present itself, there could be unexpected
behavior depending on the available system resources and
configuration. This unexpected behavior could be anything from a
kernel panic to a hang depending on the resources available to the
kernel at the time of the OOM condition.
sysctl vm.overcommit_memory=2
echo "vm.overcommit_memory=2" >> /etc/sysctl.conf
Same situation can happen if you use e.g. cgroups for limiting memory. When process exceeds given limit it gets killed without warning.
I got this error. Kill a day and found it was an infinite loop somewhere in the code.