What makes read() a syscall? - system-calls

The following link says that read is a syscall:
What is the difference between read() and fread()?
Now, I am trying to understand what makes read a system call.
For example:
I use Nuttx OS and registered a device structure flash_dev (path '/dev/flash0') with open, close and ioctl methods. This is added as a inode in pesudo file system with semaphore support for mutual exclusion.
Now, from application I open ('/dev/flash0') and do read & ioctls.
Now, which part in the above process makes read a syscall?

The read() function is a thin wrapper around whatever instructions are necessary to call into the system, IOW, to make a system call. When you call read() (and fread() call it as well), the relevant kernel/driver code gets invoked and does whatever is necessary to read from a file.

A system call is a call whose functionality lives almost entirely in the kernel rather than in user space. Traditionally, open(), read(), write(), etc, are in the kernel whereas fread(), fwrite(), etc, have code that runs in user space that calls into the kernel as needed.
For example, in Linux when you call read() the standard library your application linked against might do the following:
mov eax, 3 ;3 -> read
mov ebx, 2 ;file id
mov ecx, buffer
mov edx, 5 ;5 bytes
int 80h
That's it - it simply takes the parameters you passed in and invokes the kernel via the int 80h (interrupt) instruction. As an application programmer, it's not usually important whether the call runs in user space, in the kernel, or both. It can be important for debugging or performance reasons, but for simple applications it really doesn't matter much.

Related

Compiling an OS and defining the system calls

I'm trying to better understand operating systems, not the theory behind them but how real people write real OS code.
I know most OS's are written in C. I know the source code for these OS's include calls to functions like malloc, calloc, etc, to allocate memory for a process, etc.
Under normal conditions, i.e, when compiling code destined to run on an OS, I know that the C compiler will use the underlying OS's system calls to execute these functions. But when compiling the source code for these OS's, how does the compiler know what to do. The system calls don't exist cause they're defined by the OS. Does the compiler just call some assembly routine, which will eventually become a system call?
It is complex because you need to understand several things about OS development to get the big picture. Overall, the OS isn't like a process which executes in-order like average user mode code. When the computer boots, the OS will execute in-order to set up its environment. After that, the OS is basically system calls and interrupts.
Each CPU works differently but most CPUs will have an interrupt table mechanism and a syscall mechanism. The interrupt table specifies where to jump for a certain interrupt number and the syscall mechanism is probably one register containing the address of the entry point for a syscall. It works like this on x86-64 (most desktop/laptop computers). x86-64 has the IDT (Interrupt Descriptor Table) and the syscall register is IA32_LSTAR.
The OS isn't written in C because you can call malloc() or anything. The OS is written in C because you can make C static and freestanding (all code needed is in the executable and it doesn't rely on any external software to the executable). Actually, when writing an OS, you cannot call malloc(). You need to avoid any standard library function implementations and use static and freestanding code (the base of C like structs, pointers, arithmetic, variables, etc). C is also used because you can modify arbitrary memory locations with pointers. For example,
unsigned int* ptr = (unsigned int*)0x1234;
*ptr = 0x87654321;
makes sure that address 0x1234 contains 0x87654321. You can also use binary operators (and, or, xor, shift, etc) to modify memory at the bit level.
To answer your question, if you want to define the system calls in an OS that you write yourself, you simply consider your syscalls to be that way. When you write your syscall handler, you consider that someone using your OS knows that a certain syscall number is requesting a certain operation so you oblige by doing that. For example, Linux uses SysV as a convention for syscall numbers and registers used to pass arguments to them (including the syscall number). On x86-64 Linux, from user mode, you put your syscall number in RAX and use the instruction syscall. The processor then looks in IA32_LSTAR for the address of the syscall handler and jumps to it. The processor (the core) is now in kernel mode (in the kernel). The kernel now looks at RAX for the syscall number and answers the request by the doing the associated operation (after several checkups and a bunch of other things).

Possible to see tracing when using cat or vi opening a text file

Is it possible to trace through what is being read through a text file using eBPF? There are ways to see the amount of memory being used and count reads and writes but I would like to even output the user data using bpf_trace_print if possible.
I think this would require tracing open() (or openat()) system call and correlate it (fd in particular) with traced read calls.
/sys/kernel/debug/tracing/events/syscalls/sys_enter_read/format defines what syscall arguments can be accessed. What may interest you is char *buf buffer pointer, where read() places bytes it has read.
However, it is possible that the trace call occurs before any bytes have been read (need to check the kernel source). So, may be more reliable way is to use raw tracepoint (BPF_PROG_TYPE_RAW_TRACEPOINT) hooked at read() return.

Calling function at memory address x86_x64

I'm writing some shellcode, and I would like to know how to properly call a function from assembly that resides in libc.Note that the address below is the address of the system function. I'm certain I've got this address correct, because if I overwrite the return address with this address, system is called successfully, but it seems to segfault within the assembly (which is contained in a buffer). If anything is unclear, let me know. I'm trying to get a libc function call working on a executable stack, andI'm pulling my hair out here. Note that the code reaches the buffer okay, and starts going through the appropriate nop sled, but segfaults on the call instruction (code below).
mov 0xf7ff7fa5b640, %rax
mov (hex representation of /bin/sh), %rdi
call *%rax

Is it possible to overwrite %eax using buffer overflow?

I know that a program stack looks somewhat like this (from high to low):
EIP | EBP | local variables
But where could I find %eax, and the other general registers? Is it possible to overwrite them using a buffer overflow?
Update: In the end, I did not even have to overwrite %eax, because it turned out that the program pointed %eax to the user input at some point.
A register, by definition, is not in RAM. Registers are in the CPU and do not have addresses, so you cannot overwrite them with a buffer overflow. However, there are very few registers, so the compiler really uses them as a kind of cache for the most used stack elements. This means that while you cannot overflow into registers stricto sensu, values overwritten in RAM will sooner or later be loaded into registers.
(In Sparc CPU, the registers-as-stack-cache policy is even hardwired.)
In your schema, EIP and EBP are not on the stack; the corresponding slots on the stack are areas from which these two registers will be reloaded (upon function exit). EAX, on the other hand, is a general purpose register that the code will use here and there, without a strict convention.
EAX will probably never appear on the stack. For most x86 compilers EAX is the 32-bit return-value register, and is never saved on the stack nor restored from the stack (RAX in 64-bit systems).
This is not to say that a buffer overflow cannot be used to alter the contents of EAX by putting executable code on the stack; if code execution on the stack has not been disabled by the OS, this can be done, or you can force a bogus return address on the stack which transfers control to a code sequence that loads a value into EAX, but these are quite difficult to pull off. Similarly, if the value returned by a function is known to be stored in a local variable, a stack smash that altered that variable would change the value copied to EAX, but optimizing compilers can change stack layout on a whim, so an exploit that works on one version may fail completely on a new release or hotfix.
See Thomas Pornin (+1) and flouder's (+1) answers, they are good. I want to add a supplementary answer that may have been alluded to but not specifically stated, and that is register spilling.
Though the "where" of the original question (at least as worded) appears to be based on the false premise that %eax is on the stack, and being a register, it isn't part of the stack on x86 (though you can emulate any hardware register set on a stack, and some architectures actually do this, but that isn't relevant), incidentally, registers are spilled/filled from the stack frequently. So it is possible to smash the value of a register with a stack overflow if the register has been spilled to the stack. This would require you to know the spilling mechanism of the particular compiler, and for that function call, you would need to know that %eax had been spilled, where it had been spilled to, and stomp that stack location, and when it is next filled from its memory copy, it gets the new value. As unlikely as this seems, these attacks are usually inspired by reading source code, and knowing something about the compiler in question, so it isn't really that far fetched.
See this for more reading on register spilling
gcc argument register spilling on x86-64
https://software.intel.com/en-us/articles/dont-spill-that-register-ensuring-optimal-performance-from-intrinsics

entering ring 0 from user mode

Most modern operating systems run in the protected mode. Now is it possible for the user programs to enter the "ring 0" by directly setting the corresponding bits in some control registers. Or does it have to go through some syscall.
I believe to access the hardware we need to go through the operating system. But if we know the address of the hardware device can we just write some assembly language code with reference to the location of the device and access it. What happens when we give the address of some hardware device in the assembly language code.
Thanks.
To enter Ring 0, you must perform a system call, and by its nature, the system controls where you go, because for the call you simply give an index to the CPU, and the CPU looks inside a table to know what to call. You can't really get around the security aspect (obviously) to do something else, but maybe this link will help.
You can ask the operating system to map the memory of the hardware device into the memory space of your program. Once that's done, you can just read and write that memory from ring 3. Whether that's possible to do, or how to do that, depends on the operating system or the device.
; set PE bit
mov cr0, eax
or eax, 1
mov eax, cr0
; far jump (cs = selector of code segment)
jmp cs:#pm
#pm:
; Now we are in PM
Taken from Wikipedia.
Basic idea is to set (to 1) 0th bit in cr0 control register.
But if you are already in protected mode (i.e. you are in windows/linux), security restricts you to do it (you are in ring 3 - lowest trust).
So be the first one to get into protected mode.