How to interpret double entries in a WinDbg "x /2" result?

I'm debugging a dump file (a memory dump, not a crash dump) that seems to contain twice the expected number of objects. While investigating the corresponding symbols, I noticed the following:
0:000> x /2 <product_name>!<company>::<main_product>::<chapter>::<subchapter>::<Current_Object>*
012511cc <product_name>!<company>::<main_product>::<chapter>::<subchapter>::<Current_ObjectID>::`vftable'
012511b0 <product_name>!<company>::<main_product>::<chapter>::<subchapter>::<Current_ObjectID>::`vftable'
01251194 <product_name>!<company>::<main_product>::<chapter>::<subchapter>::<Current_Object>::`vftable'
0125115c <product_name>!<company>::<main_product>::<chapter>::<subchapter>::<Current_Object>::`vftable'
For your information, the entries Current_Object and Current_ObjectID both exist in the code, so no problem there.
What I don't understand is that there seem to be two entries for every symbol, and their memory addresses are very close to each other.
Does anybody know how I can interpret this?

It can be due to a variety of reasons, optimization and redundant code elimination at link time being one of them (the PDB is normally produced when you compile). See this link by Raymond Chen for an overview.
Quoting the relevant paragraph from the link:
And when you step into the call to p->GetValue() you find yourself in Class1::GetQ.
What happened?
What happened is that the Microsoft linker combined functions that are identical
at the code generation level.
?GetQ@Class1@@QAEPAHXZ PROC NEAR ; Class1::GetQ, COMDAT
00000 8b 41 04 mov eax, DWORD PTR [ecx+4]
00003 c3 ret 0
?GetQ@Class1@@QAEPAHXZ ENDP ; Class1::GetQ
?GetValue@Class2@@UAEHXZ PROC NEAR ; Class2::GetValue, COMDAT
00000 8b 41 04 mov eax, DWORD PTR [ecx+4]
00003 c3 ret 0
?GetValue@Class2@@UAEHXZ ENDP ; Class2::GetValue
Observe that at the object code level, the two functions are identical.
(Note that whether two functions are identical at the object code level is
highly dependent on which version of what compiler you're using, and with
which optimization flags. Identical code generation for different functions
occurs with very high frequency when you use templates.) Therefore, the
linker says, "Well, what's the point of having two identical functions? I'll
just keep one copy and use it to stand for both Class1::GetQ and
Class2::GetValue."
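For a self-contained illustration of that folding (my own sketch, not part of Raymond Chen's article; the exact behaviour depends on your compiler/linker version), build the following with MSVC as cl /O2 /Gy fold.c /link /OPT:ICF. /Gy puts each function in its own COMDAT and /OPT:ICF lets the linker fold identical ones, which is what makes a debugger show several symbol names around a single address; /OPT:NOICF turns the folding off if it confuses a debugging session. The file and function names here are made up.
/* fold.c - two deliberately identical functions */
#include <stdio.h>
int get_a(const int *p) { return p[1]; }  /* identical machine code ...      */
int get_b(const int *p) { return p[1]; }  /* ... so the linker may fold them */
int main(void)
{
    int v[2] = { 0, 7 };
    printf("%d %d\n", get_a(v), get_b(v));
    /* with ICF these two pointers may print the same address */
    printf("%p %p\n", (void *)get_a, (void *)get_b);
    return 0;
}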

Extending .text section of executable fails

Warning: it is a rather long question.
Background
I'm working on a small project for instrumenting PE files. At the moment my main focus is extending the .text section of an executable. Not adding a new segment, not modifying the entry point, but really extending the existing .text section.
My approach is very naive. I rely entirely on https://learn.microsoft.com/en-us/windows/win32/debug/pe-format, so I'm happy to hear suggestions on how I can improve. Moreover, maybe you will spot something I've missed. For now I'm:
- fixing all raw/virtual addresses of sections that come after .text, making sure they stay correctly aligned,
- fixing fields in the headers such as size of data, size of code, and checksum,
- iterating over every data directory and fixing all fields that hold an RVA (the header part of this is sketched below).
When comparing the files in PE-bear, everything seems to be correctly updated.
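For concreteness, here is a minimal sketch of the header fix-up listed above (not the actual project code): it assumes a 32-bit PE already mapped read/write at base, uses the Windows SDK structures that mirror the PE format doc, and leaves out error handling. Physically moving the later sections' raw bytes inside the file and patching every data-directory RVA and the CheckSum are still separate steps.
#include <windows.h>
#include <string.h>

static DWORD align_up(DWORD v, DWORD a) { return (v + a - 1) & ~(a - 1); }

void extend_text_headers(BYTE *base, DWORD extra)
{
    IMAGE_DOS_HEADER     *dos = (IMAGE_DOS_HEADER *)base;
    IMAGE_NT_HEADERS32   *nt  = (IMAGE_NT_HEADERS32 *)(base + dos->e_lfanew);
    IMAGE_SECTION_HEADER *sec = IMAGE_FIRST_SECTION(nt);

    DWORD raw_delta  = align_up(extra, nt->OptionalHeader.FileAlignment);
    DWORD virt_delta = align_up(extra, nt->OptionalHeader.SectionAlignment);

    for (WORD i = 0; i < nt->FileHeader.NumberOfSections; i++) {
        if (memcmp(sec[i].Name, ".text", 6) != 0)   /* name is NUL-padded */
            continue;
        sec[i].SizeOfRawData    += raw_delta;
        sec[i].Misc.VirtualSize += extra;
        for (WORD j = i + 1; j < nt->FileHeader.NumberOfSections; j++) {
            sec[j].PointerToRawData += raw_delta;   /* shift later sections */
            sec[j].VirtualAddress   += virt_delta;
        }
        break;
    }
    nt->OptionalHeader.SizeOfCode  += raw_delta;
    nt->OptionalHeader.SizeOfImage += virt_delta;
}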
Current state
I am able to extend a small project written by myself which uses TLS, exceptions, imports, delay imports, safe SEH and resources. But I fail to extend real, big applications available on the internet.
Notepad++
One of the applications I'm unable to extend is Notepad++. By this I mean that the modified Notepad++ binary crashes. I've tracked down the place which seems to be the cause.
push ebp
mov ebp, esp
mov ebx, dword ptr ss:[ebp+8] ; <-- 1) this argument in the original is equal to 2. In the permutated executable it is equal to some huge number
xor ecx, ecx
push edi
xor eax, eax
lea edi, dword ptr ds:[ebx*4 + 0x641fdc] ; <-- and so this causes an ACCESS VIOLATION
Some screenshots:
x64dbg - original Notepad++ before 1). Base: 0x41000. The ebx will become: 0x2.
x64dbg - extended Notepad++ before 1). Base: 0x42000. The ebx will become: 0x00177FF4
The RVA 0x146dde is a relocation. Here is a screenshot of this relocation in the original Notepad++.
And in the modified one. Note that only the relocation value has changed, from 0x611fdc to 0x621fdc (which is the expected behaviour):
And finally a screenshot of the sections:
original:
modified:
Link to the original Notepad++ binary: https://notepad-plus-plus.org/downloads/v7.8.6/
Link to modified Notepad++ binary: https://www.dropbox.com/s/hokh4ulmtgn7om1/notepad%2B%2B.permutated.exe?dl=0

Immediate Addressing mode difference?

Recently, when I was studying the concept of addressing modes (the first type being immediate addressing mode), I came across the example ADD #NUM1, R0 (instruction execution from left to right).
Here, is the address of NUM1 stored in register R1?
What about when we do ADD #4, R0 to make it point to the next data? When we use #4, I understood that it adds 4 to the contents of register R0. Is there a difference between using #NUM1 and #4? Please explain!
Is there a difference when we use #NUM1 and #4
In the final machine code in the executable that a CPU will actually run, no, there isn't.
If you have an assembler that directly creates an executable (no separate linking step), then the assembler will know at assemble time the numeric address of NUM1, and simply expand it as an immediate, producing exactly the same machine code as if you'd written add #0x700, R0. (Assuming the NUM1 label ends up at address 0x700 for this example.)
e.g. if the machine encoding for add #imm, R0 is 00 00 imm16, then you'll get 00 00 07 00 (assuming a big-endian immediate).
Here, is the address of NUM1 stored in Register R1?
No, it's added to R0. If R0 previously contained 4, then R0 will now hold the address NUM1+4.
R1 isn't affected.
Often you have an assembler and a separate linker (e.g. as foo.s -o foo.o to assemble and then link with ld -o foo foo.o).
The numeric address isn't available at assemble time, only at link time. An object file format holds metadata for the symbol relocations, which let the linker fill in the absolute numeric addresses once it decides where the code will be loaded.
The resulting machine code will still be the same.

Why is RAX not used to pass a parameter in System V AMD64 ABI?

I don't understand what the benefit is of not passing a parameter in RAX.
Since the return value is in RAX, it is going to be clobbered by the callee anyway.
Can someone explain?
x86-64 System V does use AL for variadic functions: the caller passes, in AL, the number of FP args passed in XMM registers.
(This is only an optimization to allow the callee not to dump all the vector regs into an array; the number in AL is allowed to be higher than the actual number of FP args. In practice, gcc's code-gen for variadic functions just checks whether it's non-zero and dumps either none or all 8 of xmm0..7. I think the ABI guarantees that it's safe to always pass al=8 even if there aren't actually any FP args, and that you can't pass FP args on the stack instead by setting al=0.)
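As a small concrete illustration of that AL rule (my own example, not part of the ABI text): a variadic call with one double argument passes the value in xmm0, and the caller sets AL to 1 — compilers typically emit something like mov eax, 1 right before the call.
#include <stdio.h>

int main(void)
{
    double x = 3.14;
    /* x is passed in xmm0; per the x86-64 SysV ABI the caller also sets
     * AL to the number of vector registers used (1 here), which the
     * callee's va_arg machinery uses to decide whether to spill xmm0..7 */
    printf("%f\n", x);
    return 0;
}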
But why not use r9b for that, and use RAX for the 6th arg? Or RAX for some earlier arg?
Because RAX has so many implicit uses in x86, and experiments when designing the calling convention (http://web.archive.org/web/20140414124645/http://www.x86-64.org/pipermail/discuss/2000-November/001257.html) found that using RAX tended to require extra instructions in the caller or callee. e.g. because RAX was often needed as part of computing other args in the caller, or was needed while doing something with one of the other args before the code gets around to using the arg that was passed in RAX.
RAX is used for rep stos (which gcc used to use more aggressively to inline memset), and it's used for div and widening (one-operand) mul/imul, which gcc uses for division by a compile-time constant. (Why does GCC use multiplication by a strange number in implementing integer division?).
Most of the other RAX special uses are just shorter encodings of things you can also do with other registers, like cdqe vs. movsxd rax, eax (or between any other registers). Or add eax,imm32 (no ModRM) vs. add r/m32, imm32 (or most other ALU instructions). See one of my answers on
Tips for golfing in x86/x64 machine code. Original 8086 lacked many of the longer non-AX alternatives, but between 8086 and 386, stuff like imul r32,r32 and movsx/movzx were added. Other RAX-only instructions aren't worth using when optimizing for speed (like xlatb, lodsd), or are obsolete by P6 / AMD64 extensions (lahf as part of FP compares obsoleted by fucomi and using SSE/SSE2 ucomisd for FP math), or are specialized instructions like cmpxchg or cpuid that are too rare to have an impact on calling convention design. Compilers didn't use the BCD instructions like aaa anyway, and AMD64 removed them.
The designers of the x86-64 System V calling convention (primarily Jan Hubička for the integer arg-passing register design) generally aimed to avoid registers with many / common implicit uses. rdx comes before rcx in the arg-passing order, because cl is needed for variable shift counts (without BMI2). These are maybe more common than mul and div, because 2-operand imul reg,reg allows normal non-widening multiplies without clobbering RDX:RAX.
The choice of rdi and rsi as the first 2 args was apparently motivated by inlining memset or memcpy as rep movs (which gcc did back in 2000, even though it wasn't actually a good choice in many of the cases where gcc did that). Even though rep-string instructions use RCX as the counter, they still found it on average saved instructions to pass the 3rd arg in RDX instead of RCX, so the calling convention doesn't quite work out for memcpy to be just rep movsb / ret.
Jan Hubička evaluated multiple variations on arg-passing registers by compiling SpecInt with a then-current version of x86-64 gcc. See my answer on Why does Windows64 use a different calling convention from all other OSes on x86-64? for some more details and links.
One of the arg-register orders he evaluated was RAX, RDX, RCX, RBX, RSI, RDI, but he found that less good than other options. (See the mailing list message linked above).
It's fairly common for RISC calling conventions to pass the first arg in the first return-value register. ARM does this (r0), and I think so does PowerPC. Others (like MIPS) don't. But all of those architectures have no implicit uses of most integer registers, often just a link register and maybe the stack pointer.
x86-64 SysV and Windows do this for FP args: xmm0 for passing and returning.
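For reference (my summary, not part of the answer above), the integer/pointer argument registers in x86-64 SysV are rdi, rsi, rdx, rcx, r8, r9, with rax used only for the return value:
/* x86-64 SysV: a = rdi, b = rsi, c = rdx, d = rcx, e = r8, f = r9 */
long sum6(long a, long b, long c, long d, long e, long f)
{
    return a + b + c + d + e + f;   /* result is returned in rax */
}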

Trouble Figuring out loading to register with offset from different register

I am creating an 8-bit CPU. I have basic instructions like mov, ld, st, add, sub, mult, jmp. I am trying to put my instructions together. First I move the base address of a value into register 1 (R1). I then want to load register 2 (R2) with that value. So my instructions look like:
1 mov R1, 0xFFFF
2 ld R2, [R1+0]
My opcode definitions are:
ld: 0001
mov: 1111
Register codes are:
R1: 0001
R2: 0010
So my instructions in binary look like:
1 mov R1, 0xFFFF = 1111 0001 0xFFFF
2 ld R2, [R1+0] = 0001 00010
But with my second instruction, the load, how can I ensure the value stored at the memory location I moved into R1 is going to be used? This is my first time doing anything with computer architecture, so I am a little lost.
how can I ensure the value stored at the memory location I moved to R1 is going to be used.
By building your hardware to correctly handle the read-after-write hazard (https://en.wikipedia.org/wiki/Hazard_(computer_architecture)#Data_hazards).
Either:
- make it a simple non-pipelined CPU where one instruction writes back to the registers before the next instruction reads any registers (a small software sketch of this option follows this list),
- detect the dependency and stall the pipeline, or
- use bypass forwarding (https://en.wikipedia.org/wiki/Hazard_(computer_architecture)#Eliminating_hazards).
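Here is a tiny software sketch of the first (non-pipelined) option; the opcodes, register width and memory size are hypothetical stand-ins for your design, not taken from your encoding table.
#include <stdint.h>
#include <stdio.h>

uint16_t regs[4];          /* R0..R3; wide enough to hold an address */
uint8_t  mem[65536];       /* 8-bit data memory */

/* each helper models one instruction: it only returns after write-back,
 * so the next instruction always reads up-to-date registers */
void mov_imm(int rd, uint16_t imm)         { regs[rd] = imm; }
void ld_off (int rd, int rs, uint16_t off) { regs[rd] = mem[(uint16_t)(regs[rs] + off)]; }

int main(void)
{
    mem[0xFFFF] = 42;
    mov_imm(1, 0xFFFF);    /* mov R1, 0xFFFF  -- R1 is written back here */
    ld_off (2, 1, 0);      /* ld  R2, [R1+0]  -- sees the updated R1     */
    printf("R2 = %d\n", (int)regs[2]);
    return 0;
}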

Is it possible to replace every instance of a particular function with a dummy in a compiled binary?

Is it possible to alter the way that an existing x86-64 binary references and/or calls one particular function? Specifically, is it possible to alter the binary such that nothing happens (similar to a nop) at the times when that function would normally have executed?
I realize that there are powerful specialty tools out there (i.e. decompilers/disassemblers) for just this sort of task, but what I'm really wondering is whether the executable formats are human-readable "enough" to be able to do this sort of thing (on small programs, at least) with just vim and a hex editor.
Are certain executable file formats (e.g. Mach-O, ELF, whatever the heck Windows uses, etc.) more readable than others? Are they all just completely incomprehensible gibberish? Any expert views and/or good jumping-off points/references would be greatly appreciated.
Disclaimer
Someone came by and quickly downvoted the initial version of this question, so I want to make this perfectly clear: I am not interested in disabling any serial or security checks or anything of the sort. Originally I had wanted a program to stop making a really irritating noise, but now I'm just curious about how compilers and executables work.
I'm in this for the educational value, and I think that other people on SE will be interested in the answer. However, I appreciate that others might not be as comfortable with this topic. If you have a concern about something I've said, please leave a comment and I promise I'll change my post.
This is trivial to do when the function in question is in the binary itself and uses standard calling conventions. Example:
void make_noise() { printf("Quack!\n"); }
int fn1() { puts("fn1"); make_noise(); return 1; }
int fn2() { puts("fn2"); make_noise(); return 2; }
int main() { puts("main"); return fn1() + fn2() - 3; }
gcc -w t.c -o a.out && ./a.out
This outputs (as expected):
main
fn1
Quack!
fn2
Quack!
Now let's get rid of the noise:
gdb -q --write ./a.out
(gdb) disas/r make_noise
Dump of assembler code for function make_noise:
0x000000000040052d <+0>: 55 push %rbp
0x000000000040052e <+1>: 48 89 e5 mov %rsp,%rbp
0x0000000000400531 <+4>: bf 34 06 40 00 mov $0x400634,%edi
0x0000000000400536 <+9>: e8 d5 fe ff ff callq 0x400410 <puts@plt>
0x000000000040053b <+14>: 5d pop %rbp
0x000000000040053c <+15>: c3 retq
End of assembler dump.
This tells us a few things:
The function that we want to get rid of starts at address 0x40052d
The op-code of retq instruction is 0xC3.
Let's patch retq as the first instruction of make_noise, and see what happens:
(gdb) set *(char*)0x40052d = 0xc3
(gdb) disas make_noise
Dump of assembler code for function make_noise:
0x000000000040052d <+0>: retq
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: mov $0x400634,%edi
0x0000000000400536 <+9>: callq 0x400410 <puts@plt>
0x000000000040053b <+14>: pop %rbp
0x000000000040053c <+15>: retq
End of assembler dump.
It worked!
(gdb) q
Segmentation fault (core dumped) ## This is a long-standing GDB bug
And now let's run patched binary:
$ ./a.out
main
fn1
fn2
Voila! No noise.
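If you want to do the same thing with a plain hex editor (or a few lines of C) rather than gdb --write, the only extra step is translating the virtual address to a file offset. In a typical non-PIE build like this one, the first loadable segment maps file offset 0 at virtual address 0x400000, so 0x40052d usually lives at file offset 0x52d — but that is an assumption; check readelf -l / objdump -h for your own binary before trusting the arithmetic. A sketch:
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("a.out", "r+b");
    if (!f) { perror("a.out"); return 1; }
    fseek(f, 0x52d, SEEK_SET);   /* assumed offset: 0x40052d - 0x400000 */
    fputc(0xC3, f);              /* overwrite make_noise's first byte with retq */
    fclose(f);
    return 0;
}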
If the function is in a different binary, the LD_PRELOAD technique mentioned by Florian Weimer is usually easier than binary patching.
ELF dynamic linking implementations often support LD_PRELOAD and LD_AUDIT modules, which can both intercept calls into another shared object. LD_AUDIT offers more control, and exists on GNU/Linux (but the Solaris documentation is the canonical reference).
For calls within the same shared object, this may not be possible if the target function is not exported (or the call is executed via a hidden alias; glibc does this a lot). If you have debugging information, you can use systemtap to intercept the call. If the function is inlined, intercepting the call might not be possible even with systemtap because there is no exact place in the instruction stream where the call takes place.
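As a sketch of the LD_PRELOAD route (reusing the make_noise name from the example above; this only works when the call is resolved through the dynamic linker, i.e. the target function is exported from a shared library rather than statically linked into the executable):
/* quiet.c -- build: gcc -shared -fPIC -o quiet.so quiet.c
 * run:   LD_PRELOAD=./quiet.so ./a.out                     */
void make_noise(void)
{
    /* empty body: the dynamic linker resolves make_noise here first,
     * so the original implementation is never called */
}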