Why are there ioctl calls in socket.c? - sockets

Trying to understand why there are ioctl calls in socket.c ? I can see a modified kernel that I am using, it has some ioctl calls which load in the required modules when the calls are made.
I was wondering why these calls ended up in socket.c ? Isn't socket kind of not-a-device and ioctls are primarily used for device.
Talking about 2.6.32.0 heavily modified kernel here.

ioctl suffers from its historic name. While originally developed to perform i/o controls on devices, it has a generic enough construct that it may be used for arbitrary service requests to the kernel in context of a file descriptor. A file descriptor is an opaque value (just an int) provided by the kernel that can be associated with anything.
Now if you treat a file descriptor and think of things as files, which most *nix constructs do, open/read/write/close isn't enough. What if you want to label a file (rename)? what if you want to wait for a file to become available (ioctl)? what if you want to terminate everything if a file closes (termios)? all the "meta" operations that don't make sense in the core read/write context are lumped under ioctls; fctls; etc. unless they are so frequently used that they deserve their own system call (e.g. flock(2) functionality in BSD4.2)

Related

How do I add a missing peripheral register to a STM32 MCU model in Renode?

I am trying out this MCU / SoC emulator, Renode.
I loaded their existing model template under platforms/cpus/stm32l072.repl, which just includes the repl file for stm32l071 and adds one little thing.
When I then load & run a program binary built with STM32CubeIDE and ST's LL library, and the code hits the initial function of SystemClock_Config(), where the Flash:ACR register is being probed in a loop, to observe an expected change in value, it gets stuck there, as the Renode Monitor window is outputting:
[WARNING] sysbus: Read from an unimplemented register Flash:ACR (0x40022000), returning a value from SVD: 0x0
This seems to be expected, not all existing templates model nearly everything out of the box. I also found that the stm32L071 model is missing some of the USARTs and NVIC channels. I saw how, probably, the latter might be added, but there seems to be not a single among the default models defining that Flash:ACR register that I could use as example.
How would one add such a missing register for this particular MCU model?
Note1: For this test, I'm using a STM32 firmware binary which works as intended on actual hardware, e.g. a devboard for this MCU.
Note2:
The stated advantage of Renode over QEMU, which does apparently not emulate peripherals, is also allowing to stick together a more complex system, out of mocked external e.g. I2C and other devices (apparently C# modules, not yet looked into it).
They say "use the same binary as on the real system".
Which is my reason for trying this out - sounds like a lot of potential for implementing systems where the hardware is not yet fully available, and also automatted testing.
So the obvious thing, commenting out a lot of parts in init code, to only test some hardware-independent code while sidestepping such issues, would defeat the purpose here.
If you want to just provide the ACR register for the flash to pass your init, use a tag.
You can either provide it via REPL (recommended, like here https://github.com/renode/renode/blob/master/platforms/cpus/stm32l071.repl#L175) or via RESC.
Assuming that your software would like to read value 0xDEADBEEF. In the repl you'd use:
sysbus:
init:
Tag <0x40022000, 0x40022003> "ACR" 0xDEADBEEF
In the resc or in the Monitor it would be just:
sysbus Tag <0x40022000, 0x40022003> "ACR" 0xDEADBEEF
If you want more complex logic, you can use a Python peripheral, as described in the docs (https://renode.readthedocs.io/en/latest/basic/using-python.html#python-peripherals-in-a-platform-description):
flash: Python.PythonPeripheral # sysbus 0x40022000
size: 0x1000
initable: false
filename: "script_with_complex_python_logic.py"
```
If you really need advanced implementation, then you need to create a complete C# model.
As you correctly mentioned, we do not want you to modify your binary. But we're ok with mocking some parts we're not interested in for a particular use case if the software passes with these mocks.
Disclaimer: I'm one of the Renode developers.

Understanding higher level call to systemcalls

I am going through the book by Galvin on OS . There is a section at the end of chapter 2 where the author writes about "adding a system call " to the kernel.
He describes how using asmlinkage we can create a file containing a function and make it qualify as a system call . But in the next part about how to call the system call he writes the following :
" Unfortunately, these are low-level operations that cannot be performed using C language statements and instead require assembly instructions. Fortunately, Linux provides macros for instantiating wrapper functions that contain the appropriate assembly instructions. For instance, the following C program uses the _syscallO() macro to invoke the newly defined system call:
Basically , I want to understand how syscall() function generally works . Now , what I understand by Macros is a system for text substitution .
(Please correct me If I am wrong)
How does a macro call an assembly language instruction ?
Is it so that syscallO() when compiled is translated into the address(op code) of the instruction to execute a trap ?(But this somehow doesn't fit with concept or definition of macros that I have )
What exactly are the wrapper functions that are contained inside and are they also written in assembly language ?
Suppose , I want to create a function of my own which performs the system call then what are the things that I need to do . Do , I need to compile it to generate the machine code for performing Trap instructions ?
Man, you have to pay $156 dollars to by the thing, then you actually have to read it. You could probably get an VMS Internals and Data Structures book for under $30.
That said, let me try to translate that gibberish into English.
System calls do not use the same kind of linkage (i.e. method of passing parameters and calling functions) that other functions use.
Rather than executing a call instruction of some kind, to execute a system service, you trigger an exception (which in Intel is bizarrely called an interrupt).
The CPU expects the operating system to create a DISPATCH TABLE and store its location and size in a special hardware register(s). The dispatch table is an array of pointers to handlers for exceptions and interrupts.
Exceptions and interrupts have numbers so, when exception or interrupt number #1 occurs, the CPU invokes the 2d exception handler (not #0, but #1) in the dispatch table in kernel mode.
What exactly are the wrapper functions that are contained inside and are they also written in assembly language ?
The operating system devotes usually one (but sometimes more) exceptions to system services. You need to do some thing like this in assembly language to invoke a system service:
INT $80 ; Explicitly trigger exception 80h
Because you have to execute a specific instruction, this has to be one in assembly language. Maybe your C compiler can do assembly language in line to call system service like that. But even if it could, it would be a royal PITA to have to do it each time you wanted to call a system service.
Plus I have not filled in all the details here (only the actual call to the system service). Normally, when you call functions in C (or whatever), the arguments are pushed on the program stack. Because the stack usually changes when you enter kernel mode, arguments to system calls need to be stored in registers.
PLUS you need to identify what system service you want to execute. Usually, system services have numbers. The number of the system service is loaded into the first register (e.g., R0 or AX).
The full process when you need to invoke a system service is:
Save the registers you are going to overwrite on the stack.
Load the arguments you want to pass to the system service into hardware registers.
Load the number of the system service into the lowest register.
Trigger the exception to enter kernel mode.
Unload the arguments returned by the system service from registers
Possibly do some error checking
Restore the registers you saved before.
Instead of doing this each time you call a system service, operating systems provide wrapper functions for high level languages to use. You call the wrapper as you would normally call a function. The wrapper (in assembly language) does the steps above for you.
Because these wrappers are pretty much the same (usually the only difference is the result of different numbers of arguments), wrappers can be created using macros. Some assemblers have powerful macro facilities that allow a single macro to define all wrappers, even with different numbers of arguments.
Linux provides multiple _syscall C macros that create wrappers. There is one for each number of arguments. Note that these macros are just for operating system developers. Once the wrapper is there, everyone can use it.
How does a macro call an assembly language instruction ?
These _syscall macros have to generate in line assembly code.
Finally, note that these wrappers do not define the actual system service. That has to be set up in the dispatch table and the system service exception handler.

What exactly happens when an OS goes into kernel mode?

I find that neither my textbooks or my googling skills give me a proper answer to this question. I know it depends on the operating system, but on a general note: what happens and why?
My textbook says that a system call causes the OS to go into kernel mode, given that it's not already there. This is needed because the kernel mode is what has control over I/O-devices and other things outside of a specific process' adress space. But if I understand it correctly, a switch to kernel mode does not necessarily mean a process context switch (where you save the current state of the process elsewhere than the CPU so that some other process can run).
Why is this? I was kinda thinking that some "admin"-process was switched in and took care of the system call from the process and sent the result to the process' address space, but I guess I'm wrong. I can't seem to grasp what ACTUALLY is happening in a switch to and from kernel mode and how this affects a process' ability to operate on I/O-devices.
Thanks alot :)
EDIT: bonus question: does a library call necessarily end up in a system call? If no, do you have any examples of library calls that do not end up in system calls? If yes, why do we have library calls?
Historically system calls have been issued with interrupts. Linux used the 0x80 vector and Windows used the 0x2F vector to access system calls and stored the function's index in the eax register. More recently, we started using the SYSENTER and SYSEXIT instructions. User applications run in Ring3 or userspace/usermode. The CPU is very tricky here and switching from kernel mode to user mode requires special care. It actually involves fooling the CPU to think it was from usermode when issuing a special instruction called iret. The only way to get back from usermode to kernelmode is via an interrupt or the already mentioned SYSENTER/EXIT instruction pairs. They both use a special structure called the TaskStateSegment or TSS for short. These allows to the CPU to find where the kernel's stack is, so yes, it essentially requires a task switch.
But what really happens?
When you issue an system call, the CPU looks for the TSS, gets its esp0 value, which is the kernel's stack pointer and places it into esp. The CPU then looks up the interrupt vector's index in another special structure the InterruptDescriptorTable or IDT for short, and finds an address. This address is where the function that handles the system call is. The CPU pushes the flags register, the code segment, the user's stack and the instruction pointer for the next instruction that is after the int instruction. After the systemcall has been serviced, the kernel issues an iret. Then the CPU returns back to usermode and your application continues as normal.
Do all library calls end in system calls?
Well most of them do, but there are some which don't. For example take a look at memcpy and the rest.

Kernel Code vs User Code

Here's a passage from the book
When executing kernel code, the system is in kernel-space execut-
ing in kernel mode.When running a regular process, the system is in user-space executing
in user mode.
Now what really is a kernel code and user code. Can someone explain with example?
Say i have an application that does printf("HelloWorld") now , while executing this application, will it be a user code, or kernel code.
I guess that at some point of time, user-code will switch into the kernel mode and kernel code will take over, but I guess that's not always the case since I came across this
For example, the open() library function does little except call the open() system call.
Still other C library functions, such as strcpy(), should (one hopes) make no direct use
of the kernel at all.
If it does not make use of the kernel, then how does it make everything work?
Can someone please explain the whole thing in a lucid way.
There isn't much difference between kernel and user code as such, code is code. It's just that the code that executes in kernel mode (kernel code) can (and does) contain instructions only executable in kernel mode. In user mode such instructions can't be executed (not allowed there for reliability and security reasons), they typically cause exceptions and lead to process termination as a result of that.
I/O, especially with external devices other than the RAM, is usually performed by the OS somehow and system calls are the entry points to get to the code that does the I/O. So, open() and printf() use system calls to exercise that code in the I/O device drivers somewhere in the kernel. The whole point of a general-purpose OS is to hide from you, the user or the programmer, the differences in the hardware, so you don't need to know or think about accessing this kind of network card or that kind of display or disk.
Memory accesses, OTOH, most of the time can just happen without the OS' intervention. And strcpy() works as is: read a byte of memory, write a byte of memory, oh, was it a zero byte, btw? repeat if it wasn't, stop if it was.
I said "most of the time" because there's often page translation and virtual memory involved and memory accesses may result in switched into the kernel, so the kernel can load something from the disk into the memory and let the accessing instruction that's caused the switch continue.

System call or function call - performance-wise

In Linux, when you can choose between a system call or a function call to do a task, which option is the better one due to a better performance?
We should note that in most of the cases we do not directly use system call. We use the interface provided by glibc.
http://www.kernel.org/doc/man-pages/online/pages/man2/syscalls.2.html
http://www.gnu.org/software/libc/manual/html_node/System-Calls.html
Now in cases like File Mangement/IPC/ process management etc which are the core resource management activities of the Operating System the only option is system call and not library functions.
In these cases, typically we use Library function which works as a wrapper over a system call. That is say for reading a file, we have many library functions like
fgetc/fgets/fscanf/fread - all should invoke read system call.
So shall we use read system call? or the other library functions?
This should depend on the particular application.If we are using read, then we again need to change the code to run this, on some other operating system where read is not available.
We are losing some flexibilty. It may be useful when we are sure of the platform and we can do some optimisations by using read only or may be the application must use only file descriptors and not file pointer etc.
Now in cases where we need to consider only say user level operations and invoke
no service from operating system , like say copying a string.(strcpy).
In this case definitely we shall not use any system call unnecessarily, if at
all something is there, since it should be an extra overhead due to operating
system intervention, which is not needed in this case.
So I feel choosing between a system call and a library function only occurs for cases where we have a library function built on top of a system call.
(like adding to examples above we can have say malloc which calls system call brk).
Here the choice will depend on the particular type of software, the platform on which it should run, the precise non functional requirements like speed (Though you cannot say with certainty that your code will run faster if you are using brk instead of malloc), portability etc.