Extending .text section of executable fails - code-injection

Warning: this is a rather long question.
Background
I'm working on a small project for instrumenting PE files. At the moment my main focus is extending the .text section of an executable. Not adding a new section, not modifying the entry point, but really extending the existing .text section.
My approach is very naive. I rely fully on https://learn.microsoft.com/en-us/windows/win32/debug/pe-format, so I'm happy to hear suggestions on how I can improve. Maybe you will also spot something I've missed. For now I'm:
fixing all raw/virtual addresses of the sections that come after the .text section, making sure they are aligned correctly,
fixing some header fields such as size of data, size of code, and checksum,
iterating over every data directory and fixing all fields that hold an RVA.
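As a rough sketch of the first and last steps above (not the asker's actual code), the alignment and RVA fix-up arithmetic could look like this in C. The names align_up and fix_rva, and the parameters old_text_end and delta, are hypothetical and only illustrate the idea; alignment is assumed to be a power of two, as SectionAlignment and FileAlignment are in practice.

```c
#include <stdint.h>

/* Round value up to the next multiple of alignment.
   Assumption: alignment is a non-zero power of two. */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: shift an RVA by delta if it points at or past the
   old end of .text; RVAs before that point are left unchanged. */
uint32_t fix_rva(uint32_t rva, uint32_t old_text_end, uint32_t delta)
{
    return (rva >= old_text_end) ? rva + delta : rva;
}
```

For example, growing .text by 0x234 bytes with a section alignment of 0x1000 would give delta = align_up(0x234, 0x1000), and every RVA found in the headers and data directories would then go through fix_rva.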
When I compare the files in PE-bear, everything seems to be updated correctly.
Current state
I am able to extend a small project I wrote myself, which uses TLS, exceptions, imports, delay imports, SafeSEH and resources. But I fail to extend real, big applications available on the internet.
Notepad++
One of the applications which I'm unable to extend is Notepad++. By this I mean that the modified Notepad++ binary crashes. I've traced it down to a place which seems to be the cause.
push ebp
mov ebp, esp
mov ebx, dword ptr ss:[ebp+8] ; <-- 1) in the original this argument is equal to 2; in the modified executable it is some huge number
xor ecx, ecx
push edi
xor eax, eax
lea edi, dword ptr ds:[ebx*4 + 0x641fdc] ; <-- and so this causes an ACCESS VIOLATION
Some screenshots:
x64dbg - original Notepad++ before 1). Base: 0x41000. ebx will become 0x2.
x64dbg - extended Notepad++ before 1). Base: 0x42000. ebx will become 0x00177FF4.
The address at RVA 0x146dde is a relocation. Here is a screenshot of this relocation in the original Notepad++:
And in the modified one. Note that only the relocation value has changed, from 0x611fdc to 0x621fdc (which is the expected behaviour):
And finally, screenshots of the sections:
original:
modified:
Link to original Notepad++ binary: https://notepad-plus-plus.org/downloads/v7.8.6/
Link to modified Notepad++ binary: https://www.dropbox.com/s/hokh4ulmtgn7om1/notepad%2B%2B.permutated.exe?dl=0

Related

Why is RAX not used to pass a parameter in System V AMD64 ABI?

I don't understand the benefit of not passing a parameter in RAX.
Since the return value is in RAX, it is going to be clobbered by the callee anyway.
Can someone explain?
x86-64 System V does use AL for variadic functions: the caller passes the number of FP args in XMM registers.
(This is only an optimization to allow the callee to not dump all the vector regs into an array; the number in AL is allowed to be higher than the number of FP args. In practice, gcc's code-gen for variadic functions just checks if it's non-zero and dumps either none or all 8 of xmm0..7. I think the ABI guarantees that it's safe to always pass al=8 even if there aren't actually any FP args, and that you can't pass FP args on the stack instead by setting al=0.)
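To make the AL convention concrete, here is a minimal C sketch of a variadic function that takes floating-point arguments. The function name sum_doubles is made up for illustration. On x86-64 System V, a call such as sum_doubles(3, 1.0, 2.0, 3.5) would pass count in EDI and the three doubles in XMM0..XMM2, and the caller would set AL to an upper bound on the number of vector registers used (at least 3 here) before the call. The C source itself is portable; only the register assignment described here is specific to that ABI.

```c
#include <stdarg.h>

/* Sum `count` double arguments passed through the variadic part.
   On x86-64 System V the callee's prologue can use AL to decide whether
   the XMM argument registers need to be saved for va_arg to read. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);

    return total;
}
```

Compiling this with gcc -O2 -S and looking at the call site and the prologue should show the mov into eax/al in the caller and the test of al in the callee.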
But why not use r9b for that, and use RAX for the 6th arg? Or RAX for some earlier arg?
Because RAX has so many implicit uses in x86, and experiments when designing the calling convention (http://web.archive.org/web/20140414124645/http://www.x86-64.org/pipermail/discuss/2000-November/001257.html) found that using RAX tended to require extra instructions in the caller or callee. e.g. because RAX was often needed as part of computing other args in the caller, or was needed while doing something with one of the other args before the code gets around to using the arg that was passed in RAX.
RAX is used for rep stos (which gcc used to use more aggressively to inline memset), and it's used for div and widening (one-operand) mul/imul, which gcc uses for division by a compile-time constant. (Why does GCC use multiplication by a strange number in implementing integer division?).
Most of the other RAX special uses are just shorter encodings of things you can also do with other registers, like cdqe vs. movsxd rax, eax (or between any other registers). Or add eax,imm32 (no ModRM) vs. add r/m32, imm32 (or most other ALU instructions). See one of my answers on Tips for golfing in x86/x64 machine code.
Original 8086 lacked many of the longer non-AX alternatives, but between 8086 and 386, stuff like imul r32,r32 and movsx/movzx were added. Other RAX-only instructions aren't worth using when optimizing for speed (like xlatb, lodsd), or were obsoleted by P6 / AMD64 extensions (lahf as part of FP compares obsoleted by fucomi and using SSE/SSE2 ucomisd for FP math), or are specialized instructions like cmpxchg or cpuid that are too rare to have an impact on calling convention design. Compilers didn't use the BCD instructions like aaa anyway, and AMD64 removed them.
The designers of the x86-64 System V calling convention (primarily Jan Hubička for the integer arg-passing register design) generally aimed to avoid registers with many / common implicit uses. rdx comes before rcx in the arg-passing order, because cl is needed for variable shift counts (without BMI2). These are maybe more common than mul and div, because 2-operand imul reg,reg allows normal non-widening multiplies without clobbering RDX:RAX.
The choice of rdi and rsi as the first 2 args was apparently motivated by inlining memset or memcpy as rep stos / rep movs (which gcc did back in 2000, even though it wasn't actually a good choice in many of the cases where gcc did that). Even though rep-string instructions use RCX as the counter, they still found it on average saved instructions to pass the 3rd arg in RDX instead of RCX, so the calling convention doesn't quite work out for memcpy to be rep movsb/ret.
Jan Hubička evaluated multiple variations on arg-passing registers by compiling SpecInt with a then-current version of x86-64 gcc. See my answer on Why does Windows64 use a different calling convention from all other OSes on x86-64? for some more details and links.
One of the arg-register orders he evaluated was RAX, RDX, RCX, RBX, RSI, RDI, but he found that less good than other options. (See the mailing list message linked above).
It's fairly common for RISC calling conventions to pass the first arg in the first return-value register. ARM does this (r0), and I think so does PowerPC. Others (like MIPS) don't. But all of those architectures have no implicit uses of most integer registers, often just a link register and maybe the stack pointer.
x86-64 SysV and Windows do this for FP args: xmm0 for passing and returning.

How to interpret double entries in Windbg "x /2" result?

I'm debugging a dump file (a memory dump, not a crash dump), which seems to contain twice the expected number of objects. While investigating the corresponding symbols, I've noticed the following:
0:000> x /2 <product_name>!<company>::<main_product>::<chapter>::<subchapter>::<Current_Object>*
012511cc <product_name>!<company>::<main_product>::<chapter>::<subchapter>::<Current_ObjectID>::`vftable'
012511b0 <product_name>!<company>::<main_product>::<chapter>::<subchapter>::<Current_ObjectID>::`vftable'
01251194 <product_name>!<company>::<main_product>::<chapter>::<subchapter>::<Current_Object>::`vftable'
0125115c <product_name>!<company>::<main_product>::<chapter>::<subchapter>::<Current_Object>::`vftable'
For your information, the entries Current_Object and Current_ObjectID are present in the code, no problem there.
What I don't understand is that there seem to be two entries for every symbol, and their memory addresses are very close to each other.
Does anybody know how I can interpret this?
It can be due to a variety of reasons, optimization and redundant code elimination at link time being one of them (the PDB is normally made when you compile). See this link by Raymond Chen for an overview.
Quoting the relevant paragraph from the link:
And when you step into the call to p->GetValue() you find yourself in Class1::GetQ.
What happened?
What happened is that the Microsoft linker combined functions that are identical
at the code generation level.
?GetQ@Class1@@QAEPAHXZ PROC NEAR ; Class1::GetQ, COMDAT
00000 8b 41 04 mov eax, DWORD PTR [ecx+4]
00003 c3 ret 0
?GetQ@Class1@@QAEPAHXZ ENDP ; Class1::GetQ
?GetValue@Class2@@UAEHXZ PROC NEAR ; Class2::GetValue, COMDAT
00000 8b 41 04 mov eax, DWORD PTR [ecx+4]
00003 c3 ret 0
?GetValue@Class2@@UAEHXZ ENDP ; Class2::GetValue
Observe that at the object code level, the two functions are identical.
(Note that whether two functions are identical at the object code level is
highly dependent on which version of what compiler you're using, and with
which optimization flags. Identical code generation for different functions
occurs with very high frequency when you use templates.) Therefore, the
linker says, "Well, what's the point of having two identical functions? I'll
just keep one copy and use it to stand for both Class1::GetQ and
Class2::GetValue."

What makes read() a syscall?

The following link says that read is a syscall:
What is the difference between read() and fread()?
Now, I am trying to understand what makes read a system call.
For example:
I use NuttX and have registered a device structure flash_dev (path '/dev/flash0') with open, close and ioctl methods. It is added as an inode in the pseudo file system, with semaphore support for mutual exclusion.
Now, from the application I open '/dev/flash0' and do reads and ioctls.
Which part of the above process makes read a syscall?
The read() function is a thin wrapper around whatever instructions are necessary to call into the system, in other words, to make a system call. When you call read() (and fread() calls it as well), the relevant kernel/driver code gets invoked and does whatever is necessary to read from a file.
A system call is a call whose functionality lives almost entirely in the kernel rather than in user space. Traditionally, open(), read(), write(), etc, are in the kernel whereas fread(), fwrite(), etc, have code that runs in user space that calls into the kernel as needed.
For example, in Linux when you call read() the standard library your application linked against might do the following:
mov eax, 3 ;3 -> read
mov ebx, 2 ;file id
mov ecx, buffer
mov edx, 5 ;5 bytes
int 80h
That's it - it simply takes the parameters you passed in and invokes the kernel via the int 80h (interrupt) instruction. As an application programmer, it's not usually important whether the call runs in user space, in the kernel, or both. It can be important for debugging or performance reasons, but for simple applications it really doesn't matter much.
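The same idea can be sketched in C, assuming a Linux system with glibc (this is not how NuttX does it). The wrapper name raw_read is made up for illustration; syscall() and SYS_read come from <unistd.h> and <sys/syscall.h>. It passes the arguments straight to the kernel, which is essentially what the library's read() does after its own bookkeeping. Using the SYS_read macro avoids hard-coding the number, which differs between architectures (3 is the 32-bit x86 value used in the assembly above).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical thin wrapper: hand fd, buf and count directly to the
   kernel's read system call, bypassing the libc read() function. */
ssize_t raw_read(int fd, void *buf, size_t count)
{
    return (ssize_t)syscall(SYS_read, fd, buf, count);
}
```

Calling raw_read(fd, buf, 5) on an open descriptor should behave like read(fd, buf, 5): the work of fetching the data happens in the kernel (and, for a device node, in the driver's read method), not in the wrapper.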

Calling a function at a memory address on x86-64

I'm writing some shellcode, and I would like to know how to properly call a function that resides in libc from assembly. Note that the address below is the address of the system function. I'm certain I've got this address correct, because if I overwrite the return address with this address, system is called successfully, but it seems to segfault within the assembly (which is contained in a buffer). If anything is unclear, let me know. I'm trying to get a libc function call working on an executable stack, and I'm pulling my hair out here. Note that the code reaches the buffer okay and starts going through the appropriate nop sled, but segfaults on the call instruction (code below).
mov 0xf7ff7fa5b640, %rax
mov (hex representation of /bin/sh), %rdi
call *%rax

Is it possible to overwrite %eax using buffer overflow?

I know that a program stack looks somewhat like this (from high to low):
EIP | EBP | local variables
But where could I find %eax, and the other general registers? Is it possible to overwrite them using a buffer overflow?
Update: In the end, I did not even have to overwrite %eax, because it turned out that the program pointed %eax to the user input at some point.
A register, by definition, is not in RAM. Registers are in the CPU and do not have addresses, so you cannot overwrite them with a buffer overflow. However, there are very few registers, so the compiler really uses them as a kind of cache for the most used stack elements. This means that while you cannot overflow into registers stricto sensu, values overwritten in RAM will sooner or later be loaded into registers.
(On SPARC CPUs, the registers-as-stack-cache policy is even hardwired.)
In your schema, EIP and EBP are not on the stack; the corresponding slots on the stack are areas from which these two registers will be reloaded (upon function exit). EAX, on the other hand, is a general purpose register that the code will use here and there, without a strict convention.
EAX will probably never appear on the stack. For most x86 compilers EAX is the 32-bit return-value register, and is never saved on the stack nor restored from the stack (RAX in 64-bit systems).
This is not to say that a buffer overflow cannot be used to alter the contents of EAX by putting executable code on the stack; if code execution on the stack has not been disabled by the OS, this can be done, or you can force a bogus return address on the stack which transfers control to a code sequence that loads a value into EAX, but these are quite difficult to pull off. Similarly, if the value returned by a function is known to be stored in a local variable, a stack smash that altered that variable would change the value copied to EAX, but optimizing compilers can change stack layout on a whim, so an exploit that works on one version may fail completely on a new release or hotfix.
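The local-variable case can be modelled safely in C. The sketch below is only an illustration, not an exploit: the struct frame_model and the function run_with_input are hypothetical stand-ins for a real stack frame, whose actual layout is up to the compiler. The copy is bounded by the size of the whole model so the program stays well-defined, but it deliberately ignores the size of buf, which is the bug being modelled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a stack frame: a buffer followed by a slot
   holding a value the function later returns (i.e. loads into EAX). */
struct frame_model {
    char    buf[8];
    int32_t saved;   /* models a returned local or a spilled register */
};

/* Copy len bytes of input into the frame, starting at buf.
   The length is checked against the whole model, not against buf,
   so input longer than 8 bytes overwrites `saved`. */
int32_t run_with_input(const unsigned char *input, size_t len)
{
    struct frame_model f;

    memset(&f, 0, sizeof f);
    f.saved = 2;                 /* the value the program intended */

    if (len > sizeof f)
        len = sizeof f;
    memcpy(&f, input, len);      /* the modelled overflow */

    return f.saved;              /* whatever is in memory now is returned */
}
```

With 8 bytes of input the function returns the intended 2; with 12 bytes, the last four bytes of the input end up in the return value, which is how attacker-controlled memory reaches a register without the register itself ever being "overwritten".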
See Thomas Pornin (+1) and flouder's (+1) answers, they are good. I want to add a supplementary answer that may have been alluded to but not specifically stated, and that is register spilling.
The "where" of the original question (at least as worded) appears to rest on the false premise that %eax is on the stack. Being a register, it isn't part of the stack on x86 (you can emulate any hardware register set on a stack, and some architectures actually do this, but that isn't relevant here). However, registers are spilled to and filled from the stack frequently, so it is possible to smash the value of a register with a stack overflow if the register has been spilled to the stack. This requires knowing the spilling mechanism of the particular compiler: for that function call you would need to know that %eax had been spilled and where it had been spilled to, then stomp that stack location; when the register is next filled from its memory copy, it gets the new value. As unlikely as this seems, these attacks are usually inspired by reading source code and knowing something about the compiler in question, so it isn't really that far-fetched.
See this for more reading on register spilling
gcc argument register spilling on x86-64
https://software.intel.com/en-us/articles/dont-spill-that-register-ensuring-optimal-performance-from-intrinsics