Error with !runaway command - windbg

I am looking at a dump file collected from a production environment for a high CPU usage issue. I ran the !threadpool and !runaway commands as follows:
0:000> !ThreadPool
CPU utilization: 100%
Worker Thread: Total: 6 Running: 2 Idle: 4 MaxLimit: 32767 MinLimit: 4
Work Request in Queue: 0
--------------------------------------
Number of Timers: 8
--------------------------------------
Completion Port Thread:Total: 8 Free: 3 MaxFree: 8 CurrentLimit: 8 MaxLimit: 1000 MinLimit: 4
0:000> !runaway
ERROR: !runaway: extension exception 0x80004002.
"Unable to get thread times - dumps may not have time information"
I want to know which threads are consuming the most CPU time, but I cannot run the !runaway command. Are there any other commands in SOS, SOSEX, or any other extension that could be helpful in this case?

You need a tool that adds the necessary information to the dump.
In WinDbg, the .dump command has the /mt MiniOption, which, according to the documentation,
adds additional thread information to the minidump. This includes thread times, which can be displayed by using the !runaway extension or the .ttime (Display Thread Times) command when debugging the minidump.
The t option is included in the a option as well, so .dump /ma works too.
To find out whether or not your dump has that information, use the undocumented command .dumpdebug like this:
.shell -ci ".dumpdebug" findstr "MiniDump"
If there is a line
1000 MiniDumpWithThreadInfo
then the information is contained in the dump and you have a different issue. If it's not there, the time information is not available.
Most other tools I know of do not offer such fine-grained settings, so it's more or less luck whether this information is included or not.
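If you can still capture a fresh dump from the live process, a minimal sketch using cdb (assuming the target is w3wp.exe and that C:\dumps exists; adjust both to your environment):
cdb -pn w3wp.exe -c ".dump /mt C:\dumps\w3wp.dmp; qd"
With the new dump loaded, !runaway 7 should then show user, kernel, and elapsed times per thread.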

Related

Gem5 in full system running spec2006 runs out of memory

What I am trying to do is run a SPEC2006 benchmark (namely 410.bwaves) in full system mode.
I have made an .rcS script to pass to the fs.py script, and the command I type to start the simulation is as follows:
build/X86/gem5.opt configs/example/fs.py --script="../run_bwaves.rcS" --disk-image=ubuntu-14.04.img --kernel=x86_64-vmlinux-2.6.22.9
The result after some time is:
Free swap: 0kB
131072 pages of RAM
3650 reserved pages
18 pages shared
0 pages swap cached
Out of memory: kill process 807 (bwaves) score 13154 or a child
Killed process 807 (bwaves)
/tmp/script: line 39: 807 Killed ./bwaves
Full linux output here
Gem5 output here
I am guessing it has something to do with this line: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
but I am not sure.
I have tried adding a --mem-size=... flag, but it breaks the simulation with a Memory size not divisible by page size error.
If anyone could help me I would be glad.
Edit: As suggested in a comment, I used a large enough --mem-size value that is divisible by the page size. The error has now turned into:
bwaves[807]: segfault at 00007ffee647624c rip 0000000000410eb5 rsp 00007fff664761c0 error 4
/tmp/script: line 39: 807 Segmentation fault ./bwaves
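For reference, the invocation with the flag added would look like this (a sketch; the value 8192MB is an assumption, chosen to match the DRAM device capacity from the warning, and it must be divisible by the page size):
build/X86/gem5.opt configs/example/fs.py --script="../run_bwaves.rcS" --disk-image=ubuntu-14.04.img --kernel=x86_64-vmlinux-2.6.22.9 --mem-size=8192MB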

Getting CPU cycles from user mode dump

Process Explorer has columns for CPU time (down to milliseconds) and CPU Cycles. In WinDbg I am aware of the !runaway command (and !runaway 7 for more details), but it shows CPU time only.
Are the CPU cycles also available somehow in a user mode crash dump?
What I have tried:
I looked at dt nt!_KTHREAD and I see it has a CycleTime property
ntdll!_KTHREAD
+0x000 Header : _DISPATCHER_HEADER
+0x018 CycleTime : Uint8B
I tried to query that property in a !for_each_thread, but WinDbg responds that it's available in kernel mode only.
Why do I want those CPU cycles?
I am working on a training for JetBrains dotTrace. It has an option to count CPU cycles and I'd like to explain where these cycles come from. The kernel structure above plus Process Explorer is probably enough, but it would be awesome to see it live or post mortem in a user mode dump. I explain a lot of basics with WinDbg.
Following the implementation of GetProcessTimes() in ReactOS, you can see that the information is copied from the process' KPROCESS. So, indeed, it's only physically present in a dump that includes kernel memory.
C:\tw>ls -l
total 0
C:\tw>cdb -c ".dump /ma .\tw.dmp;q" calc.exe | grep writ
Dump successfully written
C:\tw>cdb -c "lm;!peb;.dump /ma .\tw1.dmp;q" calc.exe | grep writ
Dump successfully written
C:\tw>cdb -c ".ttime;q" -z tw.dmp | grep -B 3 quit
Created: Wed Apr 5 20:03:55.919 2017 ()
Kernel: 0 days 0:00:00.046
User: 0 days 0:00:00.000
quit:
C:\tw>cdb -c ".ttime;q" -z tw1.dmp | grep -B 3 quit
Created: Wed Apr 5 20:04:28.682 2017 ()
Kernel: 0 days 0:00:00.031
User: 0 days 0:00:00.000
quit:
C:\tw>
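If you want to show the counter live during the training, one option (my suggestion, not part of the answer above) is Sysinternals livekd, which gives a kernel-mode view of the running system; the thread address below is hypothetical:
livekd
kd> !process 0 4 calc.exe                          (list the process and its threads)
kd> dt nt!_KTHREAD CycleTime fffffa80`0636a060     (dump just the CycleTime field of one thread)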

Windbg - !clrstack

I'm attempting to debug a manual dump file of a 64bit w3wp process with 64bit Windbg (Version 6.10). The dump was taken with taskmgr. I can't get anything from the !clrstack command. Here is what I'm getting:
!loadby sos clr
!runaway
User Mode Time
Thread Time
17:cf4 0 days 5:37:42.455
~17s
ntdll!ZwDelayExecution+0xa:
00000000`776208fa c3 ret
!clrstack
GetFrameContext failed: 1
What is GetFrameContext failed: 1?
Use the !dumpstack command instead of !clrstack. It usually works.
Try getting the "native" call stack with k and see what that gets you. Sometimes the stack isn't quite right, and the !ClrStack extension is pretty sensitive to that.
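Combining both suggestions, the busy thread from the !runaway output above could be inspected like this (a sketch):
0:000> ~17s          (switch to the thread with the highest user mode time)
0:017> k             (native stack first, to see whether it is sane)
0:017> !dumpstack    (more tolerant than !clrstack)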

Process information in dump

I learned that the .tlist command in WinDbg dumps all the processes running in the system at the time the crash dump was created.
I would like to see the memory information of each process, so I can tell whether the system is overloaded by any specific process.
!process 0 1 will list all the processes and show memory-related info for each. I issued this command using livekd and got all the processes. Here's my chrome process, which I picked out of the output:
PROCESS fffffa8007cb4200
SessionId: 1 Cid: 1158 Peb: 7efdf000 ParentCid: 0ff8
DirBase: 1b7962000 ObjectTable: fffff8a00addb010 HandleCount: 135.
Image: chrome.exe
VadRoot fffffa80090a6f80 Vads 169 Clone 0 Private 4037. Modified 3702. Locked 0.
DeviceMap 0000000000000000
Token fffff8a0091f9120
ElapsedTime 00:05:49.161
UserTime 00:00:00.000
KernelTime 00:00:00.000
QuotaPoolUsage[PagedPool] 0
QuotaPoolUsage[NonPagedPool] 0
Working Set Sizes (now,min,max) (8020, 50, 345) (32080KB, 200KB, 1380KB)
PeakWorkingSetSize 10137
VirtualSize 144 Mb
PeakVirtualSize 151 Mb
PageFaultCount 66631
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 5784
Job fffffa8009822e30
Note the memory-related properties such as "Working Set Sizes", "Virtual Size", etc.
P.S. This works with livekd and with system memory dumps (which I believe is essentially what livekd provides).
Marc
This information is not contained in a process dump. .tlist queries your current system, not the state when the dump was taken. If you can take a system dump, then you can check out the processes and their memory usage, as Marc Sherman already answered.
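If you only care about a single process, !process also accepts an image name instead of a process address, which avoids scrolling through the full list (same livekd session as above):
kd> !process 0 1 chrome.exe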

Solaris CPU run queue

Is there a command which can tell me what's in the Solaris run queue?
I can get a count using vmstat, but I need to know what processes/threads are in there.
The run-queue is always changing, so it's almost impossible to get the set of processes in the current run-queue.
That said, you can get an approximation by looking at the STAT (state) field of the process list from ps. When running the command below:
$ ps aux
...then, if the STAT field begins with R, the process is marked RUNNABLE by the kernel, which on most operating systems means that it is in the run queue. Here's what a runnable process looks like on my machine:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 78179 0.0 0.0 599828 480 s003 R+ 7:51AM 0:00.00 ps aux
On Solaris, you can also use the prstat command and look at the STATE column. The value run indicates that the process is on the run queue. (Also note that the value cpuN indicates that the process is currently running on processor N.)
For example:
$ prstat -s cpu -n 5
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
13974 kincaid 888K 432K run 40 0 36:14.51 67% cpuhog/1
27354 kincaid 2216K 1928K run 31 0 314:48.51 27% server/5
14690 root 136M 46M sleep 59 0 0:00.59 2.3% Xsun/1
14797 kincaid 9192K 7496K sleep 59 0 0:00.10 0.9% dtwm/8
14851 kincaid 24M 14M sleep 48 0 0:00.03 0.3% netscape/1
Total: 97 processes, 190 lwps, load averages: 2.18, 2.15, 2.11
I was about to correct 0xfe's answer when I saw you already did it. The run queue contains threads, not processes, so the -L option is mandatory with the prstat command if you want the number of "state run" lines to more or less match the run queue. Beware that sampling artifacts will probably prevent you from getting accurate matches.
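For example, a per-thread variant of the listing above would be (a sketch; same options as before plus -L):
$ prstat -L -s cpu -n 5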
In any case, if you want to know precisely which processes/threads are sitting in the run queue, you'd better go the DTrace way, assuming you are running Solaris 10 or newer.
The whoqueue.d script, which might already be in the /usr/demo/dtrace directory on your machine, is a good start:
# dtrace -s /usr/demo/dtrace/whoqueue.d
Run queue of length 1:
24349/1 (dtrace)
Run queue of length 3:
0/0 (sched)
0/0 (sched)
0/0 (sched)
Run queue of length 4:
22468/30 (java)
22468/17 (java)
22468/23 (java)
22468/10 (java)
Have a look at this page for details.