I'm trying to determine the cause of a high cpu condition in windbg, and encounter lines like:
196fe804 69088346 00000000 00000000 00000000 System_ni+0x1bbf4b
When running '~* kb'
I've set it up so that it loads the symbols from MS' server, but for certain dll's it doesnt seem to work. Specifically:
System.ni
mscorlib.ni
System.Web.ni
System.Web.Mvc.ni
System.Web.Extensions.ni
(I'm not sure where the "ni" comes from)
Anyone with some good ideas?
These are NGen'ed images.
In your Windbg type this: .loadby sos clr, followed by !ClrStack. That will show function names out of managed metadata.
Related
I have a huge application (made in PowerBuilder) that crashes every once in a while so it is hard to reproduce this error. We have it set up so that when a crash like this occurs, we recieve a .dmp file.
I used WinDbg to analyze my .dmp file with the command !analyze -v. From this I can deduct that the error that occured was an Access Violation C0000005. Based on the [0] and [1] parameters, it attempted to dereference a null pointer.
WinDbg also showed me STACK_TEXT consisting of around 30 lines, but I am not sure how to read it. From what I have seen I need to use some sort of symbols.
First line of my STACK_TEXT is this:
00000000`00efca10 00000000`75d7fa46 : 00000000`10df1ae0 00000000`0dd62828 00000000`04970000 00000000`10e00388 : pbvm!ob_get_runtime_class+0xad
From this, my goal is to analyze this file to figure out where exactly in the program this error happened or which function it was in. Is this something I will be able to find after further analyzing the stack trace?
How can I pinpoint where in the program a crash happened using .dmp and WinDbg so I can fix my code?
If you analyze a crash dump with !analyze -v, the lines after STACK TEXT is the stack trace. The output is equivalent to kb, given you set the correct thread and context.
The output of kb is
Child EBP
Return address
First 4 values on the stack
Symbol
The backticks ` tell you that you are running in 64 bit and they split the 64 bit values in the middle.
On 32 bit, the first 4 parameters on the stack were often equivalent to the first 4 parameters to the function, depending on the calling convention.
On 64 bit, the stack is not so relevant any more, because with the 64 bit calling convention, parameters are passed via registers. Therefore you can probably ignore those values.
The interesting part is the symbol like pbvm!ob_get_runtime_class+0xad.
In front of ! is the module name, typically a DLL or EXE name. Look for something that you built. After the ! and before the + is a method name. After the + is the offset in bytes from the beginning of the function.
As long as you don't have functions with thousands of lines of code, that number should be small, like < 0x200. If the number is larger than that, it typically means that you don't have correct symbols. In that case, the method name is no longer reliable, since it's probably just the last known (the last exported) method name and a faaaar way from there, so don't trust it.
In case of pbvm!ob_get_runtime_class+0xad, pbvm is the DLL name, ob_get_runtime_class is the method name and +0xad is the offset within the method where the instruction pointer is.
To me (not knowing anything about PowerBuilder) PBVM sounds like the PowerBuilder DLL implementation for Virtual Memory. So that's not your code, it's the code compiled by Sybase. You'd need to look further down the call stack to find the culprit code in your DLL.
After reading Wikipedia, it seems that PowerBuilder does not necessarily compile to native code, but to intermediate P-Code instead. In this case you're probably out of luck, since your code is never really on the call stack and you need a special debugger or a WinDbg extension (which might not exist, like for Java). Run it with the -pbdebug command line switch or compile it to native code and let it crash again.
With bcc tools' profile, I am getting mostly "[unknown]" in the profile output on my C program. This is, of course, expected because the program symbol are not loaded. However, I am not sure how to properly load the symbols so that "profile" program can pick it up. I have built my program with debug enabled "-g", but how do I load the debug symbols to "profile"?
Please see the DEBUGGING section in bcc profile's manpage:
See "[unknown]" frames with bogus addresses? This can happen for different
reasons. Your best approach is to get Linux perf to work first, and then to
try this tool. Eg, "perf record -F 49 -a -g -- sleep 1; perf script", and
to check for unknown frames there.
The most common reason for "[unknown]" frames is that the target software has
not been compiled
with frame pointers, and so we can't use that simple method for walking the
stack. The fix in that case is to use software that does have frame pointers,
eg, gcc -fno-omit-frame-pointer, or Java's -XX:+PreserveFramePointer.
Another reason for "[unknown]" frames is JIT compilers, which don't use a
traditional symbol table. The fix in that case is to populate a
/tmp/perf-PID.map file with the symbols, which this tool should read. How you
do this depends on the runtime (Java, Node.js).
If you seem to have unrelated samples in the output, check for other
sampling or tracing tools that may be running. The current version of this
tool can include their events if profiling happened concurrently. Those
samples may be filtered in a future version.
In your case, since it's a C program, I would recommend compiling with -fno-omit-frame-pointer.
For investigating managed heap corruption I would like to use ba (break on access) breakpoints. Can I use them in managed code? If yes, how can I set them programmatically?
UPDATE: It would also be okay so set them in WinDbg (-> set ba for every object of type XY)
Breakpoints set by 'ba' command are called processor or hardware breakpoints.
First the good news
It is easy to set hardware breakpoint. You will need to set one of the processor's debug registers (DR0, DR1, DR2 or DR3) with the address of the data and set debug control register DR7 with fields to set size of memory and type of access. The instruction (in x64 assembler) looks like:
MOV rax, DR0
Obviously you will have to somehow execute this assembler instruction from your language of choice, or use interop to C++ and inline assembly, but this is easier than for example setting software breakpoint.
Now the bad news
First of all, on SMP machines you will have to do this for all processors that can touch your code. This is probably solvable if you configure processor affinity for you process, or do debugging on single-proc machine. Second, there are only 4 debug processors on Intel architecture. If you try setting processor breakpoints with WinDbg, after 4th it will complain Too many data breakpoints for thread N after you hit g.
I assume the whole purpose you are asking about automation is because there are too many objects to set breakpoints by hand. Since you are limited to 4 ba breakpoints anyways, there is not much point in automating this.
Is there a way from WinDbg, without using the DbgEng API, to display the symbol server paths (i.e. PdbSig70 and PdbAge) for all loaded modules?
I know that
lml
does this for the modules whose symbols have loaded. I would like to know these paths for the symbols that did not load so as to diagnose the problem. Anyone know if this is possible without having to utilize the DbgEng API?
edited:
I also realize that you can use
!sym noisy
to get error messages about symbols loading. While this does have helpful output it is interleaved with other output that I want and is not simple and clear like 'lml'
!sym noisy and !sym quiet can turn on additional output for symbol loading, i.e.:
!sym noisy
.reload <dll>
X <some symbol in that DLL to cause a load>
!sym quiet
When the debugger attempts to load the PDB you will see every path that it tries to load and if PDB's weren't found or were rejected.
To my knowledge there's no ready solution in windbg.
Your options would be to either write a nifty script or an extension dependent on where you're the fittest.
It is pretty doable within windbg as a script. The information you're after is described in the PE debug directory.
Here's a link to the c++ sample code that goes into detail on extracting useful information (like the name of the symbol file in your case). Adapting it to windbg script should be no sweat.
Here's another useful pointer with tons of information on automating windbg. In particular, it talks about ways of passing arguments to windbg scripts (which is useful in your case as well, to have a common debug info extraction code which you can invoke from within the loaded modules iteration loop).
You can use the command
lme
to show the modules that did not have any symbols loaded.
http://ntcoder.com/bab/tag/lme/
I want to work with PE files in Perl and didn't find a module, so I think I will write my own (already did that in delphi once).
I only got one problem, when mapping the executable to a buffer, how can i search for octals like 0x00004550 (IMAGE_NT_SIGNATURE), convert them back to writeable strings etc?
There is a Perl module to manipulate portable executables: Win32::Exe.
I don't have a clue on your exact question, but if you still want to write your own library, Win32::Exe might be a good reference.
For converting that value to a bytestring representation, use pack. The constant you are dealing is a little-endian 32 bit value, so 'V' in the template.
$ perl -e 'print pack q[V], 0x00004550' | hd
00000000 50 45 00 00 |PE..|
00000004
See perldoc -f pack for details.
You probably won't need to search for strings like "PE\0\0", just use them to verify whether the file you are reading actually is a PE file. The 'PE' section usually comes right after the DOS ('MZ') section which has its own length field.
(I agree that Win32::Exe may be worth a look, depending on what you want to do.)