How to load symbol files to BCC profiler - bpf

With bcc tools' profile, I am getting mostly "[unknown]" in the profile output on my C program. This is, of course, expected because the program symbol are not loaded. However, I am not sure how to properly load the symbols so that "profile" program can pick it up. I have built my program with debug enabled "-g", but how do I load the debug symbols to "profile"?

Please see the DEBUGGING section in bcc profile's manpage:
See "[unknown]" frames with bogus addresses? This can happen for different
reasons. Your best approach is to get Linux perf to work first, and then to
try this tool. Eg, "perf record -F 49 -a -g -- sleep 1; perf script", and
to check for unknown frames there.
The most common reason for "[unknown]" frames is that the target software has
not been compiled
with frame pointers, and so we can't use that simple method for walking the
stack. The fix in that case is to use software that does have frame pointers,
eg, gcc -fno-omit-frame-pointer, or Java's -XX:+PreserveFramePointer.
Another reason for "[unknown]" frames is JIT compilers, which don't use a
traditional symbol table. The fix in that case is to populate a
/tmp/perf-PID.map file with the symbols, which this tool should read. How you
do this depends on the runtime (Java, Node.js).
If you seem to have unrelated samples in the output, check for other
sampling or tracing tools that may be running. The current version of this
tool can include their events if profiling happened concurrently. Those
samples may be filtered in a future version.
In your case, since it's a C program, I would recommend compiling with -fno-omit-frame-pointer.

Related

Why can Eclipse not "see" the structure of a class?

I am debugging my program with apache mina 2.0.2. The specific library is irrelevant.
The problem is that Eclipse can see internal structure of some classes and cannot see one of the others. No apparent differences visible for me: both classes have both code and source.
You can see that Eclipse draws arrow near AbstractPollingConnector class and does not so near AbstractPollingProcessor.
Of course Eclipse can't set line breakpoints inside "bad" classes.
What is the reason of it and what to do?
I guess, it depends whether the referenced classes were compiled with the debugger options on or off. According an older Javac manual, the -g switch seems related:
-g
Generate all debugging information, including local variables. By default, only line number and >source file information is generated.
-g:none
Do not generate any debugging information.
-g:{keyword list}
Generate only some kinds of debugging information, specified by a comma separated list of >keywords. Valid keywords are:
source
Source file debugging information
lines
Line number debugging information
vars
Local variable debugging information

"All programs are interpreted". How?

A computer scientist will correctly explain that all programs are
interpreted and that the only question is at what level. --perlfaq
How are all programs interpreted?
A Perl program is a text file read by the perl program which causes the perl program to follow a sequence of actions.
A Java program is a text file which has been converted into a series of byte codes which are then interpreted by the java program to follow a sequence of actions.
A C program is a text file which is converted via the C compiler into an assembly program which is converted into machine code by the assembler. The machine code is loaded into memory which causes the CPU to follow a sequence of actions.
The CPU is a jumble of transistors, resistors, and other electrical bits which is laid out by hardware engineers so that when electrical impulses are applied, it will follow a sequence of actions as governed by the laws of physics.
Physicists are currently working out what makes those rules and how they are interpreted.
Essentially, every computer program is interpreted by something else which converts it into something else which eventually gets translated into how the electrons in your local neighborhood fly around.
EDIT/ADDED: I know the above is a bit tongue-in-cheek, so let me add a slightly less goofy addition:
Interpreted languages are where you can go from a text file to something running on your computer in one simple step.
Compiled languages are where you have to take an extra step in the middle to convert the language text into machine- or byte-code.
The latter can easily be easily be converted into the former by a simple transformation:
Make a program called interpreted-c, which can take one or more C files and can run a program which doesn't take any arguments:
#!/bin/sh
MYEXEC=/tmp/myexec.$$
gcc -o $MYEXEC ${1+"$#"} && $MYEXEC
rm -f $MYEXEC
Now which definition does your C program fall into? Compare & contrast:
$ perl foo.pl
$ interpreted-c foo.c
Machine code is interpreted by the processor at runtime, given that the same machine code supplied to a processor of a certain arch (x86, PowerPC etc), should theoretically work the same regardless of the specific model's 'internal wiring'.
EDIT:
I forgot to mention that an arch may add new instructions for things like accessing new registers, in which case code written to use it won't work on older processors in the range. Much like when you try to use an old version of a library and then try to use capabilities only found in newer libraries.
Example: many Linux distros are released as 686 only, despite the fact it's in the 'x86 family'. This is due to the use of new instructions.
My first thought was too look inside the CPU — see below — but that's not right. The answer is much much simpler than that.
A high-level description of a CPU is:
1. execute the current op
2. grab the next op
3. goto 1
Compare it to Perl's interpreter:
while ((PL_op = op = op->op_ppaddr(aTHX))) {
}
(Yeah, that's the whole thing.)
There can be no doubt that the CPU is an interpreter.
It just goes to show how useless it is to classify something is interpreted or not.
Original answer:
Even at the CPU level, programs get rewritten into simpler instructions to allow the CPU to execute more them more quickly. This is done by changing the order in which they are executed and executing them in parallel. For example, Intel's Hyperthreading.
Even deeper, each instruction is considered a program of its own, one that routes electronic signals. See microcode.
The Levels of interpretions are really easy to explain:
2: Runtimelanguage (CLR, Java Runtime...) & Scriptlanguage (Python, Ruby...)
1: Assemblies
0: Binary Code
Edit: I changed the level of Scriptinglanguages to the same level of Runtimelanguages. Thank's for the hint. :-)
I can write a Game Boy interpreter that works similarly to how the Java Virtual Machine works, treating the z80 machine instructions as byte code. Assuming the original was written in C1, does that mean C suddenly became an interpreted language just because I used it like one?
From another angle, gcc can compile C into machine code for a number of different processors. There's no reason the target machine has to be the same as the machine you're compiling on. In fact, this is a common way to compile C code for AVRs and other microcontrollers.
As a matter of abstraction, the compiler's job is to translate flat text into a structure, then translate that structure into something that can be executed somewhere. Whatever is doing the execution may have its own levels of breaking out the structure before really executing it.
A lot of power becomes available once you start thinking along these lines.
A good book on this is Structure and Interpretation of Computer Programs. Even if you only get through the first chapter (or half of the first chapter), I think you'll learn a lot.
1 I think most Game Boy stuff was hand coded ASM, but the principle remains.

Is there some kind of tool to look at the encoding of Intel x86 instructions?

Forgive me if this might be a dumb question but, I'm in an assembly class that was mostly taught using an emulated CPU that was supposed to teach the concepts of assembly code. We haven't even written an Intel program, so I'm trying to adjust. In our emulated CPU, we were able to generate a symbol table file that gave the bytes equivalent for instructions:
http://imgur.com/tw5S8.png
Would I be able to do such a thing with Intel x86 instructions?
Try IDA. It has an option to show binary values of opcodes.
EDIT: Well.. it's a disassembler. Try opening a binary file, and set the number of opcode bytes to show (in Options/General/) to something that is not zero.
If you are looking for an IDE that shows you in real time the opcodes for the instruction you've used, then I don't think you'll find one, because of lack of "market". Can you explain why you need it? Do you want to know just their length, or want to learn them? There is simple pattern for lengths, so by dissasembling many binaries you'll catch it. If it's the opcodes you want.. well, there are lots of them, almost no rules, and practically no use to do it.
I see.. then you have to generate the list file . Your assembler should have an option for that. (for NASM it's -l listfile). Just put any instruction(s) in your .asm file, and generate listing for it. It should contain the binary encoding for each instruction.
First, get Intel Instruction Set Refference, or, better, this link: http://siyobik.info/index.php?module=x86 . There you'll find that most opcodes have several encodings. In your particular case, the bit 1 of the opcode specifies direction, and since both operands are registers, you can toggle the direction and swap the register codes, and the result will be the same. Usually you have this freedom on most register to register arithmetic operations. To check this, try decompiling with IDA this source file:
db 02h, E0h
db 00h, C4h
There is a demo program shipped with fasm.dll which has an editor and hex-viewer:

Need help debugging a minidump with WinDbg

I've read a lot of similar questions, but I can't seem to find an answer to exactly what my problem is.
I've got a set of minidumps from a 32-bit application that was running on 64-bit Windows 2008. The 32-bit Visual Studio on my 32-Bit Vista Business wouldn't touch them at all, so I've been trying to open them in WinDbg.
I don't have the EXACT corresponding .pdb files (we only started saving them AFTER this particular release), but I have .pdbs built by the same machine with the same code. I also have access to the exact executable that created the minidumps.
I found a nifty little application called ChkMatch that can make .pdbs match an executable... the only difference (according to ChkMatch) was age, so I matched my newer .pdbs to the original executable.
However, when I load it in WinDbg, it still says that it is a "mismatched pdb" then, since I had set .symopts+0x40 it tries to load them anyway. I then get the warning:
*** WARNING: Unable to verify checksum for myexe.exe
I ran !lmi myexe and saw that, indeed, the checksum of the executable was in fact zero. From poking around a bit, I've found that the executable should have been built with the /release flag to have a checksum. That's all well and good, but I can't exactly go back in time and rebuild (if I did though, I'd definitely save the original .pdbs :-P ).
Is there anything I can do here? Seems a little ridiculous I can't make things match here at least enough to get a call stack.
you don't need the checksum to get a call stack - this warning can be safely ignored.
to get the stack you need to issue the stack command (any variant of k).
if the minidumps are any good (i.e. describe an actual fault), you should first try the auto analysis !analyze -v which will get you started.
come back when you have exhausted your expertise :o)
If you're working with minidumps then you have to set your image path (Ctrl+I) to point to a location with the images in the dump. The trouble with minidumps is that they don't contain any code or data from the executables on the target, so you have to supply them yourself.
-scott

WinDbg, display Symbol Server paths of loaded modules (even if the symbols did not load)?

Is there a way from WinDbg, without using the DbgEng API, to display the symbol server paths (i.e. PdbSig70 and PdbAge) for all loaded modules?
I know that
lml
does this for the modules whose symbols have loaded. I would like to know these paths for the symbols that did not load so as to diagnose the problem. Anyone know if this is possible without having to utilize the DbgEng API?
edited:
I also realize that you can use
!sym noisy
to get error messages about symbols loading. While this does have helpful output it is interleaved with other output that I want and is not simple and clear like 'lml'
!sym noisy and !sym quiet can turn on additional output for symbol loading, i.e.:
!sym noisy
.reload <dll>
X <some symbol in that DLL to cause a load>
!sym quiet
When the debugger attempts to load the PDB you will see every path that it tries to load and if PDB's weren't found or were rejected.
To my knowledge there's no ready solution in windbg.
Your options would be to either write a nifty script or an extension dependent on where you're the fittest.
It is pretty doable within windbg as a script. The information you're after is described in the PE debug directory.
Here's a link to the c++ sample code that goes into detail on extracting useful information (like the name of the symbol file in your case). Adapting it to windbg script should be no sweat.
Here's another useful pointer with tons of information on automating windbg. In particular, it talks about ways of passing arguments to windbg scripts (which is useful in your case as well, to have a common debug info extraction code which you can invoke from within the loaded modules iteration loop).
You can use the command
lme
to show the modules that did not have any symbols loaded.
http://ntcoder.com/bab/tag/lme/