Simple System Call Implementation example?

Interestingly, I couldn't find a simple example on the web. Can you share one, please? I'm trying to understand the following by analyzing an example.
⦁ Typically,
⦁ a number associated with each system call
⦁ Number used as an index to a table: System Call table
⦁ Table keeps addresses of system calls (routines)
⦁ System call runs and returns
⦁ Caller does not know system call implementation
⦁ Just knows interface

This depends on which architecture you want to add a system call for, or if you want to add the system call for all architectures. I will explain one way to add a system call for ARM.
Pick a name for your syscall. For example, mysyscall.
Choose a syscall number. In arch/arm/include/asm/unistd.h, take note of how each syscall has a specific number (__NR_SYSCALL_BASE+<number>) assigned to it. Choose an unused number for your syscall. Let us choose syscall number 223. Then add:
#define __NR_mysyscall (__NR_SYSCALL_BASE+223)
where the index 223 would be in that header file. This assigns the number 223 to your syscall on ARM architectures.
Modify architecture-specific syscall table. In linux/arch/arm/kernel/calls.S, change the line that corresponds to syscall 223 to:
CALL(sys_mysyscall)
Add your function prototype. Suppose you wanted to add a non-architecture-specific syscall. Edit the file: include/linux/syscalls.h and add your syscall's prototype:
asmlinkage long sys_mysyscall(struct dummy_struct __user *buf);
If you want to add it specifically for ARM, do the following step in this file instead: arch/arm/kernel/sys_arm.c.
Implement your syscall somewhere. Create a file wherever you please, for example in the kernel/ directory. You need to have at least:
#include <linux/syscalls.h>
...
SYSCALL_DEFINE1(mysyscall, struct dummy_struct __user *, buf)
{
/* Implement your syscall */
}
Note the macro, SYSCALL_DEFINE1. The number at the end should correspond to how many input parameters your syscall has. In this case, our system call only has 1 parameter, so you use SYSCALL_DEFINE1. If it had two parameters, you would use SYSCALL_DEFINE2, etc.
Don't forget to add the object (.o) file to the Makefile in the directory where you put it.
Compile your new kernel and test. You haven't modified your C libraries, so you cannot invoke your syscall with mysyscall(). You need to use the syscall() function which takes a system call number as its first argument:
struct dummy_struct *buf = calloc(1, sizeof(*buf));
long res = syscall(223, buf);
Do note that this was for ARM. The process will be very similar for other architectures.
Edit: Don't forget to add your syscall file to the Makefile in kernel/.

Related

Is it possible to tail call eBPF codes that use different modes?

For example, if I wrote a program that calls printk("hello world") from a kprobe,
would I be able to tail call an XDP program afterwards, or vice versa?
I wrote an eBPF program that uses a socket buffer, and when I try to tail call another program that uses a kprobe, it doesn't load.
I wanted to tail call a program that returns XDP_PASS after one in BPF.SOCKET_FILTER mode, but the tail call doesn't seem to work.
I've been trying to figure this out, but I can't find any documentation about tail calling programs that use different modes :P
Thanks in advance!

No, it is not.
Have a look at kernel commit 04fd61ab36ec, which introduced tail calls. The first piece of code in it (in the internal kernel header bpf.h) defines struct bpf_array with an owner_prog_type member, and explains it in a comment:
/* 'ownership' of prog_array is claimed by the first program that
* is going to use this map or by the first program which FD is stored
* in the map to make sure that all callers and callees have the same
* prog_type and JITed flag
*/
So once the program type associated with a BPF program array used for tail calls has been defined, the array cannot be used with other program types. This makes sense: different program types work with different contexts (packet data vs. traced function context vs. ...), can use different helpers, have return values with different meanings, require different checks from the verifier, and so on. So it's hard to see how jumping from one type to another would work. How could you start by processing a network packet and all of a sudden jump to a piece of code that is supposed to trace some kernel internals? :)
Note that it is also impossible to mix JIT-ed and non-JIT-ed programs, as indicated by the owner_jited member of the struct.

Understanding higher level call to systemcalls

I am going through the book on operating systems by Galvin. There is a section at the end of chapter 2 where the author writes about "adding a system call" to the kernel.
He describes how, using asmlinkage, we can create a file containing a function and make it qualify as a system call. But in the next part, about how to call the system call, he writes the following:
"Unfortunately, these are low-level operations that cannot be performed using C language statements and instead require assembly instructions. Fortunately, Linux provides macros for instantiating wrapper functions that contain the appropriate assembly instructions. For instance, the following C program uses the _syscall0() macro to invoke the newly defined system call:"
Basically, I want to understand how the syscall() function generally works. What I understand by macros is a system for text substitution.
(Please correct me if I am wrong.)
How does a macro call an assembly language instruction?
Is it that _syscall0(), when compiled, is translated into the address (opcode) of the instruction that executes a trap? (But this somehow doesn't fit with the concept or definition of macros that I have.)
What exactly are the wrapper functions that are contained inside, and are they also written in assembly language?
Suppose I want to create a function of my own that performs the system call. What do I need to do? Do I need to compile it to generate the machine code for performing trap instructions?

Man, you have to pay $156 to buy the thing, then you actually have to read it. You could probably get a VMS Internals and Data Structures book for under $30.
That said, let me try to translate that gibberish into English.
System calls do not use the same kind of linkage (i.e. method of passing parameters and calling functions) that other functions use.
Rather than executing a call instruction of some kind, to execute a system service, you trigger an exception (which in Intel is bizarrely called an interrupt).
The CPU expects the operating system to create a DISPATCH TABLE and store its location and size in a special hardware register(s). The dispatch table is an array of pointers to handlers for exceptions and interrupts.
Exceptions and interrupts have numbers, so when exception or interrupt number 1 occurs, the CPU invokes the 2nd handler in the dispatch table (index 1, since numbering starts at 0) in kernel mode.
What exactly are the wrapper functions that are contained inside and are they also written in assembly language ?
The operating system usually devotes one exception (but sometimes more) to system services. You need to do something like this in assembly language to invoke a system service:
INT 0x80 ; Explicitly trigger exception 80h
Because you have to execute a specific instruction, this has to be done in assembly language. Maybe your C compiler supports inline assembly that lets you call a system service like that. But even if it does, it would be a royal PITA to have to do it each time you wanted to call a system service.
Plus I have not filled in all the details here (only the actual call to the system service). Normally, when you call functions in C (or whatever), the arguments are pushed on the program stack. Because the stack usually changes when you enter kernel mode, arguments to system calls need to be stored in registers.
PLUS you need to identify what system service you want to execute. Usually, system services have numbers. The number of the system service is loaded into the first register (e.g., R0 or AX).
The full process when you need to invoke a system service is:
1. Save the registers you are going to overwrite on the stack.
2. Load the arguments you want to pass to the system service into hardware registers.
3. Load the number of the system service into the lowest register.
4. Trigger the exception to enter kernel mode.
5. Unload the arguments returned by the system service from registers.
6. Possibly do some error checking.
7. Restore the registers you saved before.
Instead of doing this each time you call a system service, operating systems provide wrapper functions for high level languages to use. You call the wrapper as you would normally call a function. The wrapper (in assembly language) does the steps above for you.
Because these wrappers are pretty much the same (usually the only difference is the result of different numbers of arguments), wrappers can be created using macros. Some assemblers have powerful macro facilities that allow a single macro to define all wrappers, even with different numbers of arguments.
Linux provides multiple _syscall C macros that create wrappers. There is one for each number of arguments. Note that these macros are just for operating system developers. Once the wrapper is there, everyone can use it.
How does a macro call an assembly language instruction ?
These _syscall macros have to generate inline assembly code.
Finally, note that these wrappers do not define the actual system service. That has to be set up in the dispatch table and the system service exception handler.

Specman e vr_ad: How to use read_reg_field?

The UVM e Reference document says:
You can call read_reg_field or write_reg_field for registers whose fields
are defined as single_field_access (see “vr_ad_port_unit Syntax and Examples”).
...
For example:
write_reg_fields tx_mode_reg {.resv = 4; .dest = 2};
But there is no example for using read_reg_field...
Could you please explain how it should be used?
(I tried the following code, but it gives a compilation error:
some_var = read_reg_field my_reg_file.my_reg {.my_reg_field} )
Thank you for your help.

As far as I know, there is no read_reg_field macro. If you want to read a register and then save the value of a certain field, do this:
read_reg my_reg;
value = my_reg.my_reg_field;
Normally, when you read registers, you read them in full. Reading only individual fields makes sense if your bus protocol allows narrow transfers (i.e. your data width is 32 bits, but you can do 16-bit transfers on it). I haven't seen such a thing implemented in vr_ad (it could be there and I just don't know of it), but UVM RAL (the SystemVerilog register package) supports it.
Long story short, if you just care about getting your data from your DUT, using read_reg is enough.

When the Design Under Test is implemented in Verilog or VHDL, you can read the register as a whole; you cannot "read just some of its fields".
A register is at a specific address, reading this register -> read from this address.
The quote of the spec about fields access is when the DUT is a SystemC model.
Connecting to SC models is done using ports. If the model defines a port for each field - you can read a field.

how sys_open works?

I have written a simple char device driver (mydev) with an "open" file operation in it.
In a user space application I open this driver node using open("/dev/mydev", O_RDONLY);
The open() system call internally calls sys_open().
I just want to know the flow by which sys_open() ends up calling my driver's open file operation: how the VFS handles this, and which functions it calls internally.

I found the answer in the book Understanding the Linux Kernel, Section 12.5.1.
The steps are:
1. Invokes getname() to read the file pathname from the process address space.
2. Invokes get_unused_fd() to find an empty slot in current->files->fd. The corresponding index (the new file descriptor) is stored in the fd local variable.
3. Invokes the filp_open() function, passing as parameters the pathname, the access mode flags, and the permission bit mask. This function, in turn, executes the following steps:
a. Invokes get_empty_filp() to get a new file object.
b. Sets the f_flags and f_mode fields of the file object according to the values of the flags and modes parameters.
c. Invokes open_namei(), which executes the following operations:
i. Invokes lookup_dentry() to interpret the file pathname and gets the dentry object associated with the requested file.
ii. Performs a series of checks to verify whether the process is permitted to open the file as specified by the values of the flags parameter. If so, returns the address of the dentry object; otherwise, returns an error code.
d. If the access is for writing, checks the value of the i_writecount field of the inode object. A negative value means that the file has been memory-mapped, specifying that write accesses must be denied (see Section 15.2 in Chapter 15). In this case, returns an error code. Any other value specifies the number of processes that are actually writing into the file. In the latter case, increments the counter.
e. Initializes the fields of the file object; in particular, sets the f_op field to the contents of the i_op->default_file_ops field of the inode object. This sets up all the right functions for future file operations.
f. If the open method of the (default) file operations is defined, invokes it.
g. Clears the O_CREAT, O_EXCL, O_NOCTTY, and O_TRUNC flags in f_flags.
h. Returns the address of the file object.
4. Sets current->files->fd[fd] to the address of the file object.
5. Returns fd.

How to use the select() function in socket programming?

The prototype is:
int select(int nfds,
fd_set *readfds,
fd_set *writefds,
fd_set *exceptfds,
struct timeval *timeout);
I've been struggling to understand this function for quite some time. My question is: if it checks all the file descriptors from 0 to nfds-1, and modifies readfds, writefds and exceptfds on return, why do I need to use FD_SET to add file descriptors to the set at the beginning? Won't it check all the file descriptors anyway?

It won't check from 0 to nfds-1. The first argument just provides an upper bound on how large, numerically, the file descriptors used are. This is because the set itself might be represented as a bitvector, without a way to know how many bits are actually used. Specifying this as a separate argument helps select() avoid checking file descriptors that are not in use.
Also, a descriptor that is not in e.g. the read set when you call select() is not being checked at all, so it cannot appear in the set when the call returns, either.

I once had the same doubt. You can look at the following question and its answers:
Query on Select System Call
