How to generate delay using eBPF kernel program - ebpf

I'm trying to generate delay in acknowledgement using eBPF kernel program for egress packet.
I'm running the python+c program using bcc.
i tried mdelay/msleep/udelay etc functions with delay.h library in c, it gives me LLVM error that "Program using external function which can't be resolved".
Then I tried implementing sleep functionality :
Taking 3 variables:
tprev (which get the current time at starting of prog)
tnow(this gets current time as the loop starts and gets updates in each iteration with current time)
timer: this is the duration for which we want program to be delayed.
the while loop is: while((tnow - tprev) ≤ timer)
but ebpf prog treat it as n infinite loop and gives error that infinite loop detected.
Whereas it's not an infinite loop.
Is there a way to introduce delay in Ack or delay in general in eBPF program and How?

The short answer is no(not in eBPF). At least not as of kernel 5.18. This is because eBPF programs, particularly those running in the network stack are often called from code that should never sleep.
What perhaps is most useful in your case is that TC(Traffic Control) programs can ask TC to delay a packet for you. The actual delay happens outside of the eBPF program, in the TC subsystem. You can request TC to send the packet at a given time by setting __sk_buff->tstamp. Note: this only works for Egress(outgoing) traffic not Ingress(incomming) traffic. This behavior can also be triggered via TC configuration without using eBPF.
tried mdelay/msleep/udelay etc functions with delay.h library in c, it gives me LLVM error that "Program using external function which can't be resolved".
Yes, you can't use the stdlib in eBPF programs since they use kernel facilities which are unavailable in eBPF.
Side notes:
We do have "sleepable" programs, but only syscall, LSM and tracing programs can be sleepable. "sleepable" doesn't mean we can call some sort of sleep helper with a duration, it means we can call helper functions which in turn may sleep(for example bpf_copy_from_user_stack). So you don't have control over how long the program will sleep.
Another time related feature is BPF timers, which actually allow you to set a timer and execute a callback after a given time. The limitation here is that you can't pass any arguments to this callback, and it is called without a context. So after setting the timer, the original program will continue and return as usual.

Related

how does the operating system treat few interrupts and keep processes going?

I'm learning computer organization and structure (I'm using Linux OS with x86-64 architecture). we've studied that when an interrupt occurs in user mode, the OS is notified and it switches between the user stack and the kernel stack by loading the kernels rsp from the TSS, afterwards it saves the necessary registers (such as rip) and in case of software interrupt it also saves the error-code. in the end, just before jumping to the adequate handler routine it zeroes the TF and in case of hardware interrupt it zeroes the IF also. I wanted to ask about few things:
the error code is save in the rip, so why loading both?
if I consider a case where few interrupts happen together which causes the IF and TF to turn on, if I zero the TF and IF, but I treat only one interrupt at a time, aren't I leave all the other interrupts untreated? in general, how does the OS treat few interrupts that occur at the same time when using the method of IDT with specific vector for each interrupt?
does this happen because each program has it's own virtual memory and thus the interruption handling processes of all the programs are unrelated? where can i read more about it?
how does an operating system keep other necessary progresses running while handling the interrupt?
thank you very much for your time and attention!
the error code is save in the rip, so why loading both?
You're misunderstanding some things about the error code. Specifically:
it's not generated by software interrupts (e.g. instructions like int 0x80)
it is generated by some exceptions (page fault, general protection fault, double fault, etc).
the error code (if used) is not saved in the RIP, it's pushed on the stack so that the exception handler can use it to get more information about the cause of the exception
2a. if I consider a case where few interrupts happen together which causes the IF and TF to turn on, if I zero the TF and IF, but I treat only one interrupt at a time, aren't I leave all the other interrupts untreated?
When the IF flag is clear, mask-able IRQs (which doesn't include other types of interrupts - software interrupts, exceptions) are postponed (not disabled) until the IF flag is set again. They're "temporarily untreated" until they're treated later.
The TF flag only matters for debugging (e.g. single-step debugging, where you want the CPU to generate a trap after every instruction executed). It's only cleared in case the process (in user-space) was being debugged, so that you don't accidentally continue debugging the kernel itself; but most processes aren't being debugged like this so most of the time the TF flag is already clear (and clearing it when it's already clear doesn't really do anything).
2b. in general, how does the OS treat few interrupts that occur at the same time when using the method of IDT with specific vector for each interrupt? does this happen because each program has it's own virtual memory and thus the interruption handling processes of all the programs are unrelated? where can i read more about it?
There's complex rules that determine when an interrupt can interrupt (including when it can interrupt another interrupt). These rules mostly only apply to IRQs (not software interrupts that the kernel won't ever use itself, and not exceptions which are taken as soon as they occur). Understanding the rules means understanding the IF flag and the interrupt controller (e.g. how interrupt vectors and the "task priority register" in the local APIC influence the "processor priority register" in the local APIC, which determines which groups of IRQs will be postponed when the IF flag is set). Information about this can be obtained from Intel's manuals, but how Linux uses it can only be obtained from Linux source code and/or Linux specific documentation.
On top of that there's "whatever mechanisms and practices the OS felt like adding on top" (e.g. deferred procedure calls, tasklets, softIRQs, additional stack management) that add more complications (which can also only be obtained from Linux source code and/or Linux specific documentation).
Note: I'm not a Linux kernel developer so can't/won't provide links to places to look for Linux specific documentation.
how does an operating system keep other necessary progresses running while handling the interrupt?
A single CPU can't run 2 different pieces of code (e.g. an interrupt handler and user-space code) at the same time. Instead it runs them one at a time (e.g. runs user-space code, then switches to an IRQ handler for very short amount of time, then returns to the user-space code). Because the IRQ handler only runs for a very short amount of time it creates the illusion that everything is happening at the same time (even though it's not).
Of course when you have multiple CPUs, different CPUs can/do run different pieces of code at the same time.

How can I invoke UART_Receive_IT() automatically when I receive a data?

I am new to STM32 and freertos. I need to write a program to send and receive data from a module via UART port. I have to send(Transmit) a data to that module(for eg. M66). Then I would return to do some other tasks. once the M66 send a response to that, my seial-port-receive-function(HAL_UART_Receive_IT) has to be invoked and receive that response. How can I achieve this?
The way HAL_UART_Receive_IT works is that you configure it to receive specified amount of data into given buffer. You give it your buffer to which it'll read received data and number of bytes you want to receive. It then starts receiving data. Once exactly this amount of data is received, a callback function HAL_UART_RxCpltCallback gets called (from IRQ) where you can do whatever you want with this data, e.g. add it to some kind of queue for later processing in the task context.
If I was to express my experiences related to working with HAL's UART module is that it's not the greatest one for generic use where you don't know the amount of data you expect to receive in advance. In the case of M66 modem you mention, this will happen all the time.
To solve this you have two choices:
Simply don't use HAL functions at all in case of UART, other than the initialization functions. Implement your own UART interrupt handler (most of the code can be copied from handler in HAL) where upon receiving data you place received bytes in a receive byte queue handled in your RTOS task. In this task you implement protocol parsing. This is the approach I use personally.
If you really want to use HAL but also work with a module that sends varying amount of data, call HAL_UART_Receive_IT and specify that you want to receive 1 byte each time. This will work, but will be (potentially much) slower than the first approach. Assuming you'll later want to implement some tcp/ip communication (you mentioned M66 GPRS module) you probably don't want to do it this way.
You should try the following way.
Enable UARTX Rx interrupt in NVIC.
Set Interrupt priority.
Unmask Interrupt request in EXTI.
Then use USARTX Interrupt Handler Function Define in you Vector.
Whenever the data is received from USARTX this function get automatically called and you can copy data from USARTX Receive Data Register.
I would rather suggest another approach. You probably want to archive higher speeds (lets say 921600 bods) and the interrupt way is fat to slow for it.
You need to implement the DMA transmition with the data end detection features. Run your USART in the DMA mode in the circular mode. You will have two events to serve. The first one is the DMA end of thransmition interrupt (then you copy the data from the current tail pointer to the end of the buffer to avoid data override) and USART IDLE interrupt - this will detect the end of the receive.

How the callback functions work in stm32 Hal Library?

As we all know,the Hal Lib provides some callback function to manage hardware interrupt.But i don't know how them work?
Te fact is that I am using HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart) this function so as to receive other devices' data and check those data.So I use the usart interrupt to receive them.
But I don't know when the callback function will be executed,is it depends on the receive buffer's length or the data's buffer?
I guess the hardware interrupt will be triggered while a character has been received,but the callback function will be executed after the receive buffer is full.
PS:I am using the stm32-nucleo-f410 development board to communicate with an AT commend device,and I am a novice about it.
(So sorry for my poor English!)
Thanks a lot.
The callback you are referring to is called when the amount of data specified in the receive functions (the third argument to HAL_UART_Receive_IT). You are correct that the UART interrupt service routine (ISR) is called every time a character is received, but when using the HAL that happens internally to the library and doesn't need to be managed by you. Every time the ISR is called, the received character is moved into the array you provide via the second argument of HAL_UART_Receive_IT, and when the number of characters specified by the call is reached, the callback will be called in that ISR (so make sure not to do anything that will take too much time to complete - ISRs should be short, and the ISRs in the HAL library are already pretty lengthy to handle every possible use case).
Further, if you find that the callback is not being triggered even if you are sending enough data to the peripheral, make sure the interrupt is actually enabled - the HAL_UART_Receive_IT function doesn't actually enable the interrupt, that has to be done during initialization of the peripheral.

Context switch by random system call

I know that an interrupt causes the OS to change a CPU from its current task and to run a kernel routine. I this case, the system has to save the current context of the process running on the CPU.
However, I would like to know whether or not a context switch occurs when any random process makes a system call.
I would like to know whether or not a context switch occurs when any random process makes a system call.
Not precisely. Recall that a process can only make a system call if it's currently running -- there's no need to make a context switch to a process that's already running.
If a process makes a blocking system call (e.g, sleep()), there will be a context switch to the next runnable process, since the current process is now sleeping. But that's another matter.
There are generally 2 ways to cause a content switch. (1) a timer interrupt invokes the scheduler that forcibly makes a context switch or (2) the process yields. Most operating systems have a number of system services that will cause the process to yield the CPU.
well I got your point. so, first I clear a very basic idea about system call.
when a process/program makes a syscall and interrupt the kernel to invoke syscall handler. TSS loads up Kernel stack and jump to syscall function table.
See It's actually same as running a different part of that program itself, the only major change is Kernel play a role here and that piece of code will be executed in ring 0.
now your question "what will happen if a context switch happen when a random process is making a syscall?"
well, nothing will happen. Things will work in same way as they were working earlier. Just instead of having normal address in TSS you will have address pointing to Kernel stack and SysCall function table address in that random process's TSS.

how does the processor know an instruction is making a system call

system call -- It is an instruction that generates an interrupt that causes OS to gain
control of processor.
so if a running process issue a system call (e.g. create/terminate/read/write etc), a interrupt is generated which cause the KERNEL TO TAKE CONTROL of the processor which then executes the required interrupt handler routine. correct?
then can anyone tell me how the processor known that this instruction is supposed to block the process, go to privileged mode, and bring kernel code.
I mean as a programmer i would just type stream1=system.io.readfile(ABC) or something, which translates to open and read file ABC.
Now what is monitoring the execution of this process, is there a magical power in the cpu to detect this?
As from what i have read a PROCESSOR can only execute only process at a time, so WHERE IS THE MONITOR PROGRAM RUNNING?
How can the KERNEL monitor if a system call is made or not when IT IS NOT IN RUNNING STATE!!
or does the computer have a SYSTEM CALL INSTRUCTION TABLE which it compares with before executing any instruction?
please help
thanku
The kernel doesn't monitor the process to detect a system call. Instead, the process generates an interrupt which transfers control to the kernel, because that's what software-generated interrupts do according to the instruction set reference manual.
For example, on Unix the process stuffs the syscall number in eax and runs an an int 0x80 instruction, which generates interrupt 0x80. The CPU reacts to this by looking in the Interrupt Descriptor Table to find the kernel's handler for that interrupt. This handler is the entry point for system calls.
So, to call _exit(0) (the raw system call, not the glibc exit() function which flushes buffers) in 32-bit x86 Linux:
movl $1, %eax # The system-call number. __NR_exit is 1 for 32-bit
xor %ebx,%ebx # put the arg (exit status) in ebx
int $0x80
Let's analyse each questions you have posed.
Yes, your understanding is correct.
See, if any process/thread wants to get inside kernel there are only two mechanisms, one is by executing TRAP machine instruction and other is through interrupts. Usually interrupts are generated by the hardware, so any other process/threads wants to get into kernel it goes through TRAP. So as usual when TRAP is executed by the process it issues interrupt (mostly software interrupt) to your kernel. Along with trap you will also mentions the system call number, this acts as input to your interrupt handler inside kernel. Based on system call number your kernel finds the system call function inside system call table and it starts to execute that function. Kernel will set the mode bit inside cs register as soon as it starts to handle interrupts to intimate the processor as current instruction is a privileged instruction. By this your processor will comes to know whether the current instruction is privileged or not. Once your system call function finished it's execution your kernel will execute IRET instruction. Which will clear mode bit inside CS register to inform whatever instruction from now inwards are from user mode.
There is no magical power inside processor, switching between user and kernel context makes us to think that processor is a magical thing. It is just a piece of hardware which has the capability to execute tons of instructions at a very high rate.
4..5..6. Answers for all these questions are answered in above cases.
I hope I've answered your questions up to some extent.
The interrupt controller signals the CPU that an interrupt has occurred, passes the interrupt number (since interrupts are assigned priorities to handle simultaneous interrupts) thus the interrupt number to determine wich handler to start. The CPu jumps to the interrupt handler and when the interrupt is done, the program state reloaded and resumes.
[Reference: Silberchatz, Operating System Concepts 8th Edition]
What you're looking for is mode bit. Basically there is a register called cs register. Normally its value is set to 3 (user mode). For privileged instructions, kernel sets its value to 0. Looking at this value, processor knows which kind of instruction it is. If you're interested digging more please refer this excellent article.
Other Ref.
Where is mode bit
Modern hardware supports multiple user sessions. If your hw supports multi user mode, i provides a mechanism called interrupt. An interrupt basically stops the execution of the current code to execute other code (e.g kernel code).
Which code is executed is decided by parameters, that get passed to the interrupt, by the code that issues the interrupt. The hw will increase the run level, load the kernel code into the memory and forces the cpu to execute this code. When the kernel code returns, it again directly informs the hw and the run level gets decreased.
The HW will then restore the cpu state before the interrupt and set the cpu the the next line in the code that started the interrupt. Done.
Since the code is actively calling the hw, which again actively calls the kernel, no monitoring needs to be done by the kernel itself.
Side note:
Try to keep your question short. Make clear what you want. The first answer was correct for the question you posted, you just didnt phrase it well. Make clear that you are new to the topic and need a detailed explanation of basic concepts instead of explaining what you understood so far and don't use caps lock.
Please accept the answer cnicutar provided. thank you.