I'm writing some shellcode, and I would like to know how to properly call, from assembly, a function that resides in libc. Note that the address below is the address of the system function. I'm certain I've got this address correct, because if I overwrite the return address with this address, system is called successfully, but it seems to segfault within the assembly (which is contained in a buffer). If anything is unclear, let me know. I'm trying to get a libc function call working on an executable stack, and I'm pulling my hair out here. Note that the code reaches the buffer okay, and starts going through the appropriate nop sled, but segfaults on the call instruction (code below).
mov 0xf7ff7fa5b640, %rax
mov (hex representation of /bin/sh), %rdi
call *%rax
Warning: this is a rather long question.
Background
I'm working on a small project for instrumenting PE files. At the moment my main focus is extending the .text section of an executable: not adding a new segment, not modifying the entry point, but really extending the existing .text section.
My approach is very naive. I fully rely on https://learn.microsoft.com/en-us/windows/win32/debug/pe-format and so I'm happy to hear suggestions on how I can improve. Moreover maybe you will spot something I've missed. For now I'm:
fixing all raw/virtual addresses of sections that come after the .text section, making sure they are aligned correctly,
fixing some fields in the headers, such as size of data, size of code, and checksum,
iterating over every data directory and fixing all fields that hold an RVA.
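The alignment step in the first item can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: align_up and next_section_start are hypothetical helper names. The only thing taken from the PE format is that raw addresses are aligned to FileAlignment and virtual addresses to SectionAlignment.

```c
#include <stdint.h>

/* Round value up to the next multiple of alignment (alignment must be non-zero). */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

/*
 * Hypothetical helper: new start address for the section that follows .text,
 * given where the extended .text now ends. Pass FileAlignment when fixing raw
 * addresses and SectionAlignment when fixing virtual addresses.
 */
uint32_t next_section_start(uint32_t text_start, uint32_t new_text_size,
                            uint32_t alignment)
{
    return align_up(text_start + new_text_size, alignment);
}
```

Every later section would then be shifted by the difference between its old and new start, and any field holding an RVA into a shifted section needs the same delta applied.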
When comparing the files in PE-bear, everything seems to be correctly updated.
Current state
I am able to extend small projects that I wrote myself, which use TLS, exceptions, imports, delay imports, SafeSEH and resources. But I fail to extend real, big applications available on the internet.
Notepad++
One of the applications which I'm unable to extend is Notepad++. By this I mean that the modified Notepad++ binary crashes. I've tracked down the place that seems to be the cause.
push ebp
mov ebp, esp
mov ebx, dword ptr ss:[ebp+8] ; <-- 1) this argument is equal to 2 in the original; in the modified executable it is equal to some huge number
xor ecx, ecx
push edi
xor eax, eax
lea edi, dword ptr ds:[ebx*4 + 0x641fdc] ; <-- and so this causes an ACCESS VIOLATION
Some screenshots:
x64dbg - original Notepad++ before 1). Base: 0x41000. The ebx will become: 0x2.
x64dbg - extended Notepad++ before 1). Base: 0x42000. The ebx will become: 0x00177FF4
The RVA 0x146dde is a relocation. Here is a screenshot of this relocation in the original Notepad++:
And in the modified one. Note that only the relocation value has changed, from 0x611fdc to 0x621fdc (which is expected behaviour):
And finally, a screenshot of the sections:
original:
modified:
Link to original Notepad++ binary: https://notepad-plus-plus.org/downloads/v7.8.6/
Link to modified Notepad++ binary: https://www.dropbox.com/s/hokh4ulmtgn7om1/notepad%2B%2B.permutated.exe?dl=0
Is it possible to alter the way that an existing x86-64 binary references and/or calls one particular function? Specifically, is it possible to alter the binary such that nothing happens (similar to a nop) at the times when that function would normally have executed?
I realize that there are powerful speciality tools out there (i.e. decompilers/disassemblers) for just this sort of task, but what I'm really wondering is if the executable formats are human-readable "enough" to be able to do this sort of thing (on small programs, at least) with just vim and a hex editor.
Are certain executable file formats (e.g. Mach-O, ELF, whatever the heck Windows uses, etc.) more readable than others? Are they all just completely incomprehensible gibberish? Any expert views and/or good jumping off points/references would be greatly appreciated.
Disclaimer
Someone came by and quickly downvoted the initial version of this question, so I want to make this perfectly clear: I am not interested in disabling any serial or security checks or anything of the sort. Originally I had wanted a program to stop making a really irritating noise, but now I'm just curious about how compilers and executables work.
I'm in this for the educational value, and I think that other people on SE will be interested in the answer. However, I appreciate that others might not be as comfortable with this topic. If you have a concern about something I've said, please leave a comment and I promise I'll change my post.
This is trivial to do when the function in question is in the binary itself and uses standard calling conventions. Example:
#include <stdio.h>

void make_noise() { printf("Quack!\n"); }
int fn1() { puts("fn1"); make_noise(); return 1; }
int fn2() { puts("fn2"); make_noise(); return 2; }
int main() { puts("main"); return fn1() + fn2() - 3; }
gcc -w t.c -o a.out && ./a.out
This outputs (expected):
main
fn1
Quack!
fn2
Quack!
Now let's get rid of the noise:
gdb -q --write ./a.out
(gdb) disas/r make_noise
Dump of assembler code for function make_noise:
0x000000000040052d <+0>: 55 push %rbp
0x000000000040052e <+1>: 48 89 e5 mov %rsp,%rbp
0x0000000000400531 <+4>: bf 34 06 40 00 mov $0x400634,%edi
0x0000000000400536 <+9>: e8 d5 fe ff ff callq 0x400410 <puts@plt>
0x000000000040053b <+14>: 5d pop %rbp
0x000000000040053c <+15>: c3 retq
End of assembler dump.
This tells us a few things:
The function that we want to get rid of starts at address 0x40052d
The op-code of retq instruction is 0xC3.
Let's patch retq as the first instruction of make_noise, and see what happens:
(gdb) set *(char*)0x40052d = 0xc3
(gdb) disas make_noise
Dump of assembler code for function make_noise:
0x000000000040052d <+0>: retq
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: mov $0x400634,%edi
0x0000000000400536 <+9>: callq 0x400410 <puts@plt>
0x000000000040053b <+14>: pop %rbp
0x000000000040053c <+15>: retq
End of assembler dump.
It worked!
(gdb) q
Segmentation fault (core dumped) ## This is a long-standing GDB bug
And now let's run the patched binary:
$ ./a.out
main
fn1
fn2
Voila! No noise.
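The same edit can be made without gdb, which is essentially what a hex editor would do. A minimal sketch: patch_ret is a hypothetical helper that overwrites one byte of an in-memory copy of the file with 0xC3. It takes a file offset, not a virtual address. Translating 0x40052d to a file offset requires the ELF program headers. For a non-PIE binary like this one, the offset would be 0x52d if the first segment is mapped at 0x400000 from file offset 0; that is an assumption to check with readelf -l.

```c
#include <stddef.h>

#define X86_RET 0xC3  /* opcode of retq, as shown in the disassembly above */

/*
 * Overwrite the byte at file_offset with a ret instruction.
 * Returns 0 on success, -1 if the offset lies outside the image.
 */
int patch_ret(unsigned char *image, size_t image_size, size_t file_offset)
{
    if (image == NULL || file_offset >= image_size)
        return -1;
    image[file_offset] = X86_RET;
    return 0;
}
```

Reading the file into a buffer, calling the helper, and writing the buffer back out is all that remains; nothing else in the file needs to change because the patch does not alter any sizes or offsets.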
If the function is in a different binary, the LD_PRELOAD technique mentioned by Florian Weimer is usually easier than binary patching.
ELF dynamic linking implementations often support LD_PRELOAD and LD_AUDIT modules, which can both intercept calls into another shared object. LD_AUDIT offers more control, and exists on GNU/Linux (but the Solaris documentation is the canonical reference).
For calls within the same shared object, this may not be possible if the target function is not exported (or the call is executed via a hidden alias; glibc does this a lot). If you have debugging information, you can use systemtap to intercept the call. If the function is inlined, intercepting the call might not be possible even with systemtap because there is no exact place in the instruction stream where the call takes place.
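As a rough illustration of the LD_PRELOAD approach, here is a minimal sketch. It assumes that make_noise is exported by some other shared object and is called through the dynamic linker. The file name quiet.c, the counter, and suppressed_calls() are hypothetical additions for illustration only.

```c
/* quiet.c: hypothetical LD_PRELOAD shim that replaces make_noise(). */

static unsigned long suppressed;  /* how many calls were swallowed */

/* Same name and signature as the function being replaced; it makes no noise. */
void make_noise(void)
{
    suppressed++;
}

/* Illustrative helper, not needed for the interposition itself. */
unsigned long suppressed_calls(void)
{
    return suppressed;
}
```

Built with gcc -shared -fPIC quiet.c -o quiet.so and run as LD_PRELOAD=./quiet.so ./app (app being a placeholder), the dynamic linker finds this definition of make_noise first. It would not help when the function is defined and called inside the executable itself, where the call is bound directly and never goes through the dynamic linker.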
In particular, the print command typically does not work (80-90% failure rate).
I've verified already:
https://developer.apple.com/library/content/qa/qa1947/_index.html
Example 1
(lldb) p prevMsg
error: Couldn't materialize: couldn't get the value of runOnce: extracting data from value failed error: errored out in DoExecute, couldn't PrepareToExecuteJITExpression
Example 2: a more typical example, which puts you in the stone age of computing:
(lldb) p activeNetworkRequests
error: Execution was interrupted, reason: EXC_BAD_ACCESS (code=1, address=0x1700530). The process has been returned to the state before expression evaluation.
This seems to have gotten progressively worse since Xcode 7.
Printing variables scoped from the enclosing function of a closure is particularly hopeless.
The code base is not small, about 15K lines. It would not be practical to isolate and reproduce all the code here.
Surely others are experiencing this?
UPDATE: I've been told about the merits of expression --unwind-on-error=0 -- variable-in-question, presumably for Example 2.
UPDATE 2:
Code:
Util.log("Returning \(key) from file cache", [.Caches])
Output:
08:03:11.201 v2.0.64d other TwoStageCache.swift objectForKey(_:completion:)[95]: Returning https://example.server.com/Storage/Retrieve?FileName=accounts/person#domain.com/resource/47a58660-26d1-11e7-8e7f-c9f4cd679b03.html from file cache
(So the value of key is fine)
(lldb) fr var key
(URL) key = unable to read data
(lldb) print key
error: Execution was interrupted, reason: EXC_BAD_ACCESS (code=1, address=0x1d787583).
The process has been returned to the state before expression evaluation.
If we look at the crash:
(lldb) expression --unwind-on-error=0 -- key
libobjc.A.dylib`objc_retain:
0x22562b0 <+0>: pushl %ebp
0x22562b1 <+1>: movl %esp, %ebp
0x22562b3 <+3>: subl $0x8, %esp
0x22562b6 <+6>: calll 0x22562bb ; <+11>
0x22562bb <+11>: popl %ecx
0x22562bc <+12>: movl 0x8(%ebp), %eax
0x22562bf <+15>: testl %eax, %eax
0x22562c1 <+17>: je 0x22562e1 ; <+49>
0x22562c3 <+19>: movl (%eax), %edx
-> 0x22562c5 <+21>: testb $0x2, 0x10(%edx)
From:
1 $__lldb_expr(UnsafeMutablePointer<Any>) -> ()
2 Beta Viewer`@objc AppDelegate.init() -> AppDelegate:
3 sharedEnchantment`partial apply for TwoStageCache.(objectForKey(URL, completion : (imgData : Data?, err : BBError?) -> ()) -> ()).(closure #1)
4 sharedEnchantment`thunk:
Sorry in advance for the essay, but hopefully the info will be worth the read...
lldb has two ways of looking at variables(*): print and frame variable.
print isn't really meant primarily for printing variables - that's just a side effect of what it really does. print is an alias for expression which gives you a little more sense of what it is: a full expression evaluator which runs the expression you pass at the point where you are stopped in your code.
It builds a context that emulates the code at the current pc (including the Class/Protocol context) and then takes the snippet you pass it, compiles it in that context, JIT's the result, inserts the JIT'ed code into the process you are debugging and runs it. That's quite powerful - you can change values, call functions in your program, introduce new functions, new types, etc. But there is also a lot of machinery just to get it going, and with swift some of this machinery is tricky to get right.
frame variable can only print locals and arguments in the current frame (with the -g flag it can also print globals & statics). It can't call functions, or any of the other fancy things print can do. It does understand a limited subset of the variable access syntax, so:
(lldb) frame variable foo.bar.baz
will work. But under the covers, all it needs to do is read the debug information to find the variable, its type, and where it is in memory, and then it can extract the value from that information. So it is faster and more robust for what it does do - which is a large percentage of the work people generally ask print to do.
Note, you can get "object printing" for variables you access with frame variable by using the -O flag, and it supports the same formatting options for results as print. For context, the Xcode "Locals" view is roughly equivalent to calling frame variable.
I tend to use frame variable for simple locals printing, but even if you like to use one command for all your needs - which will be print - it's good to know that there's a fallback if print is failing for some reason.
Back to your examples...
Example 1: one of the things print does in Swift is introduce all the visible local variables into the context of the expression, so they are available to your code. The error in Example 1 is because one of the local variables couldn't be realized - maybe it was only specified by a protocol conformance and we couldn't figure out what it really was - so we failed building the context, which means the parse or JIT steps failed. The print code does a pre-scan for this sort of failure and omits failing locals, but you've found a case this scan misses.
frame variable would have probably also failed to print runOnce but since it doesn't depend on the current context, the inability to do that wouldn't have affected your ability to print other variables.
If you can reproduce this issue, even if you can't make the project available to us we can often figure out what's going on from lldb's debugging log. So drive the debug session to the point where the print is going to fail, and do:
(lldb) log enable -f /tmp/lldb-log.txt lldb expr types
then run the failing expression. Then grab that log, and file a bug as described here:
https://swift.org/contributing/#reporting-bugs
Example 2: Was activeNetworkRequests a property? Those require us to call the "get" method to access them, and I have seen a few cases where lldb doesn't emit the code to call the property getters correctly. The log above will show us the code that was emitted, and we might be able to tell from there what went wrong. Of course, if you can make a test case you can send with the bug that is always best, but that's often not possible...
(*) For gdb users, this is pretty close to info locals vs. print...
The following link says that read is a syscall:
What is the difference between read() and fread()?
Now, I am trying to understand what makes read a system call.
For example:
I use the NuttX OS and registered a device structure flash_dev (path '/dev/flash0') with open, close and ioctl methods. This is added as an inode in the pseudo file system, with semaphore support for mutual exclusion.
Now, from the application, I open '/dev/flash0' and do reads and ioctls.
Now, which part in the above process makes read a syscall?
The read() function is a thin wrapper around whatever instructions are necessary to call into the system, in other words, to make a system call. When you call read() (and fread() calls it as well), the relevant kernel/driver code gets invoked and does whatever is necessary to read from a file.
A system call is a call whose functionality lives almost entirely in the kernel rather than in user space. Traditionally, open(), read(), write(), etc, are in the kernel whereas fread(), fwrite(), etc, have code that runs in user space that calls into the kernel as needed.
For example, in Linux when you call read() the standard library your application linked against might do the following:
mov eax, 3 ;3 -> read
mov ebx, 2 ;file id
mov ecx, buffer
mov edx, 5 ;5 bytes
int 80h
That's it - it simply takes the parameters you passed in and invokes the kernel via the int 80h (interrupt) instruction. As an application programmer, it's not usually important whether the call runs in user space, in the kernel, or both. It can be important for debugging or performance reasons, but for simple applications it really doesn't matter much.
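On a current Linux system the same idea can be seen from C through the generic syscall() wrapper. A minimal sketch, assuming Linux with glibc; raw_read is a hypothetical name. It hands its arguments straight to the kernel's read system call, which is what the library's read() does on your behalf.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_read */
#include <sys/types.h>    /* ssize_t */
#include <unistd.h>       /* syscall() */

/* Invoke the kernel's read system call directly, bypassing the read() wrapper. */
ssize_t raw_read(int fd, void *buf, size_t count)
{
    return (ssize_t)syscall(SYS_read, fd, buf, count);
}
```

From the caller's point of view raw_read(fd, buf, n) behaves like read(fd, buf, n); the work of actually fetching the bytes happens in the kernel and, for a device node, in the driver's read method.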
I know that a program stack looks somewhat like this (from high to low):
EIP | EBP | local variables
But where could I find %eax, and the other general registers? Is it possible to overwrite them using a buffer overflow?
Update: In the end, I did not even have to overwrite %eax, because it turned out that the program pointed %eax to the user input at some point.
A register, by definition, is not in RAM. Registers are in the CPU and do not have addresses, so you cannot overwrite them with a buffer overflow. However, there are very few registers, so the compiler really uses them as a kind of cache for the most used stack elements. This means that while you cannot overflow into registers stricto sensu, values overwritten in RAM will sooner or later be loaded into registers.
(In SPARC CPUs, the registers-as-stack-cache policy is even hardwired.)
In your schema, EIP and EBP are not on the stack; the corresponding slots on the stack are areas from which these two registers will be reloaded (upon function exit). EAX, on the other hand, is a general purpose register that the code will use here and there, without a strict convention.
EAX will probably never appear on the stack. For most x86 compilers, EAX (RAX on 64-bit systems) is the return-value register, and it is neither saved on the stack nor restored from it.
This is not to say that a buffer overflow cannot be used to alter the contents of EAX by putting executable code on the stack; if code execution on the stack has not been disabled by the OS, this can be done, or you can force a bogus return address on the stack which transfers control to a code sequence that loads a value into EAX, but these are quite difficult to pull off. Similarly, if the value returned by a function is known to be stored in a local variable, a stack smash that altered that variable would change the value copied to EAX, but optimizing compilers can change stack layout on a whim, so an exploit that works on one version may fail completely on a new release or hotfix.
See Thomas Pornin (+1) and flouder's (+1) answers, they are good. I want to add a supplementary answer that may have been alluded to but not specifically stated, and that is register spilling.
The "where" of the original question (at least as worded) appears to be based on the false premise that %eax is on the stack. Being a register, it isn't part of the stack on x86 (you can emulate any hardware register set on a stack, and some architectures actually do this, but that isn't relevant here). However, registers are frequently spilled to and filled from the stack, so it is possible to smash the value of a register with a stack overflow if the register has been spilled to the stack. This would require you to know the spilling mechanism of the particular compiler: for that function call, you would need to know that %eax had been spilled and where it had been spilled to, and then stomp that stack location. When the register is next filled from its memory copy, it gets the new value. As unlikely as this seems, these attacks are usually inspired by reading source code and knowing something about the compiler in question, so it isn't really that far-fetched.
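The spill-and-fill scenario can be modelled without relying on undefined behaviour. A minimal sketch: struct fake_frame is a hypothetical stand-in for a stack frame in which a compiler has spilled a register next to a buffer. Real layouts depend on the compiler and its options. The copy is bounded by the struct, so the model itself stays well defined while showing the effect.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout: a buffer followed by a spilled register value. */
struct fake_frame {
    char     buf[8];
    uint32_t spilled_eax;
};

/*
 * Copy len bytes of input over the frame, starting at buf, the way an
 * unchecked copy would, then "fill" the register from its spill slot.
 * len is clamped to the frame size so the model never writes out of bounds.
 */
uint32_t overflow_then_fill(struct fake_frame *frame, const void *input, size_t len)
{
    if (len > sizeof *frame)
        len = sizeof *frame;
    memcpy(frame, input, len);
    return frame->spilled_eax;  /* value the register would be reloaded with */
}
```

An input that fits in buf leaves the spilled value alone; an input of 12 bytes replaces it with whatever the last four bytes were, and that is the value the register holds after the fill.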
See this for more reading on register spilling
gcc argument register spilling on x86-64
https://software.intel.com/en-us/articles/dont-spill-that-register-ensuring-optimal-performance-from-intrinsics