Is it possible to tail call eBPF codes that use different modes? - ebpf

Is it possible to tail call eBPF codes that use different modes?
For example, if I coded a code that printk("hello world") using kprobe,
would I be able to tail call a XDP code afterwards or vice versa?
I programmed something on eBPF that uses a socket buffer and seems like when I try to tail call another code that uses kprobe, it doesn't load the program.
I wanted to tail call a code that uses XDP_PASS after using a BPF.SOCKET_FILTER mode but seems like tail call isn't working.
I've been trying to figure this out but I can't find any documentations regarding tail calling codes that use different modes :P
Thanks in advance!

No, it is not.
Have a look at kernel commit 04fd61ab36ec, which introduced tail calls: the comment in the first piece of code (in internal kernel header bpf.h), defining the struct bpf_array, sets a owner_prog_type member, and explains the following in a comment:
/* 'ownership' of prog_array is claimed by the first program that
* is going to use this map or by the first program which FD is stored
* in the map to make sure that all callers and callees have the same
* prog_type and JITed flag
*/
So once the program type associated with a BPF program array, used for tail calls, has been defined, it is not possible to use it with other program types. Which makes sense, since different program types work with different context (packet data VS traced function context VS ...), can use different helpers, have return functions with different meanings, necessitate different checks from the verifier, ... So it's hard to see how jumping from one type to another would work. How could you start with processing a network packet, and all of a sudden jump to a piece of code that is supposed to trace some internals of the kernel? :)
Note that it is also impossible to mix JIT-ed and non-JIT-ed programs, as indicated by the owner_jited of the struct.

Related

Is it a pure function?

I have following function:
def timestamp(key: String)
: String
= Monoid.combine(key, Instant.now().getEpochSecond.toString)
and wanted to know, if it is pure or not? A pure function for me is, given the same input returns always the same output. But the function above, given always the same string will returns another string with another time, that it is in my opinion not pure.
No, it's not pure by any definition I know of. A good discussion of pure functions is here: https://alvinalexander.com/scala/fp-book/definition-of-pure-function. In Alvin's definition of purity he says:
A pure function has no “back doors,” which means:
...
It cannot depend on any external I/O. It can’t rely on input from files, databases, web services, UIs, etc; it can’t produce output, such as writing to a file, database, or web service, writing to a screen, etc.
Reading the time of the current system uses I/O so it is not pure.
You are right, it is not a pure function as it returns different result for the same arguments. Mathematically speaking it is not a function at all.
Definition of Pure function from Wikipedia
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change while program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices (usually—see below).
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices (usually—see below).

What is the architecture behind Scratch programming blocks?

I need to build a mini version of the programming blocks that are used in Scratch or later in snap! or openblocks.
The code in all of them is big and hard to follow, especially in Scratch which is written in some kind of subset of SmallTalk, which I don't know.
Where can I find the algorithm they all use to parse the blocks and transform it into a set of instructions that work on something, such as animations or games as in Scratch?
I am really interested in the algorithmic or architecture behind the concept of programming blocks.
This is going to be just a really general explanation, and it's up to you to work out specifics.
Defining a block
There is a Block class that all blocks inherit from. They get initialized with their label (name), shape, and a reference to the method. When they are run/called, the associated method is passed the current context (sprite) and the arguments.
Exact implementations differ among versions. For example, In Scratch 1.x, methods took arguments corresponding to the block's arguments, and the context (this or self) is the sprite. In 2.0, they are passed a single argument containing all of the block's arguments and context. Snap! seems to follow the 1.x method.
Stack (command) blocks do not return anything; reporter blocks do.
Interpreting
The interpreter works somewhat like this. Each block contains a reference to the next one, and any subroutines (reporter blocks in arguments; command blocks in a C-slot).
First, all arguments are resolved. Reporters are called, and their return value stored. This is done recursively for lots of Reporter blocks inside each other.
Then, the command itself is executed. Ideally this is a simple command (e.g. move). The method is called, the Stage is updated.
Continue with the next block.
C blocks
C blocks have a slightly different procedure. These are the if <> style, and the repeat <> ones. In addition to their ordinary arguments, they reference their "miniscript" subroutine.
For a simple if/else C block, just execute the subroutine normally if applicable.
When dealing with loops though, you have to make sure to thread properly, and wait for other scripts.
Events
Keypress/click events can be dealt with easily enough. Just execute them on keypress/click.
Something like broadcasts can be done by executing the hat when the broadcast stack is run.
Other events you'll have to work out on your own.
Wait blocks
This, along with threading, is the most confusing part of the interpretation to me. Basically, you need to figure out when to continue with the script. Perhaps set a timer to execute after the time, but you still need to thread properly.
I hope this helps!

VHDL Bus Functional Modelling - Can't put groups of procedures into a package to clean up the code

I want to organize a working bus functional model and push commonly used procedures (which look like CPU subroutines) out into a package and get them out of the main cpu model, but I'm stuck.
The procedures don't have access to the hardware bits when they're pushed out in a package.
In Verilog, I would put commonly used procedures out into an include file and link them into the CPU model as required for a given test suite.
More details:
I have a working bus functional model of a CPU, for simulation test benching.
At the "user interface" level I have a process called "main" running inside the CPU model which calls my predefined "instruction set" like this:
cpu_read(address, read_result);
cpu_write(address, write_data);
etc.
I bundle groups of those calls up into higher level procedures like
configure_communication_bus;
clear_all_packet_counters;
etc.
At the next layer these generic functions call a more hardware specific version which knows the interface timing for the design,
and those procedures then use an input record and output record to connect to the hardware module ports and waggle the cpu bus signals as required.
cpu_read calls hardware_cpu_read(cpu_input_record, cpu_output_record, address);
Something like this:
procedure cpu_read (address : in std_logic_vector(15 downto 0);
read_result : out std_logic_vector(31 downto 0));
begin
hardware_cpu_read(cpu_input_record, cpu_output_record, address, read_result);
end procedure;
The cpu_input_record and cpu_output_record are declared as signals of type nnn_record in the cpu model vhdl file.
So this is all working, but every single one of these procedures is all stored in the cpu VHDL module file, and all in the procedure declaration section so that they are all in the same scope.
If I share the model with team members they will need to add their own testing subroutines, and those also are all in the same location in the file, as well, their simulation test code has to go into the "main" process along with mine.
I'd rather link in various tests from outside the model, and only keep model specific procedures in the model file..
Ironically I can push the lowest level hardware procedure out to a package, and call those procedures from within the "main" process, but the higher level processes can't be put out into that package or any other packages because they don't have access to the cpu_read_record and cpu_write_record.
I feel like there must be a simple way to clean up this code and make it modular, and I'm just missing something obvious.
I don't really think making a command interpreter and loading my test code into a behavioral ROM is the right way to go by the way. Nor is fighting with the simulator interface to connect up a C program, but I may break down and try this..
Quick sketch of an answer (to the question I think you are asking! :-) though I may be off-beam...
To move the BFM subprograms into a reusable package, they need to be independent of the execution scope - that usually means a long parameter list for each of them. So using them in a testbench quickly gets tedious compared with the parameterless (or parameter-lite) versions you have now..
The usual workaround is to implement the BFM in a package, with long parameter lists.
Then write parameter-lite local equivalents (wrappers) in the execution scope, which simply call the package versions supplying all the parameters explicitly.
This is just boilerplate - not pretty but it does allow you to move the BFM into a package. These wrappers can be local to the testbench, to a process within it, or even to a subprogram within that process.
(The parameter types can be records for tidiness : these are probably declared in a third package, shared between BFM. TB, and synthesisable device under test...)
Thanks to overloading, there is no ambiguity between the local and BFM package versions, so the actual testbench remains as simple as possible.
Example wrapper function :
function cpu_read(address : unsigned) return slv_32 is
begin
return BFM_pack.cpu_read (
address => address,
rd_data_bus => tb_rd_data_bus,
wait => tb_wait_signal,
oe => tb_mem_oe,
-- ditto for all the signals constants variables it needs from the tb_ scope
);
end cpu_read;
Currently your test procedures require two extra signals on them, cpu_input_record and cpu_output_record. This is not so bad. It is not uncommon to just have these on all procedures that interact with the cpu and be done with it. So use hardware_cpu_read and not cpu_read. Add cpu_input_record, cpu_output_record to your configure_communication_bus and clear_all_packet_counters procedures and be done. Perhaps choose shorter names.
I do a similar approach, except I use only one record with resolved elements. To make this work, you need to initialize the record so that all elements are non-driving (ie: 'Z' for std_logic). To make this more flexible, I have created resolution functions for integer, time, and real. However, this only saves you one signal. Not a real huge win. Perhaps half way to where you think you want to be. But it is more work than what you are doing.
For VHDL-201X, we are working on syntax to allow parameters/ports automatically map to a identically named signal. This will get you to where you want to be with any of the approaches (yours, mine, or Brian's without the extra wrapper subprogram). It is posted here: http://www.eda.org/twiki/bin/view.cgi/P1076/ImplicitConnections. Given this, I would add the two records to your procedures and call it good enough for now.
Once you get by this problem, you seem to also be asking is how do I write separate tests using the same testbench. For this I use multiple architectures - I like to think of these as a Factory Class for concurrent code. To make this feasible, I separate the stimulus generation code from the rest of the testbench (typically: netlist connections and clock). My presentation, "VHDL Testbench Techniques that Leapfrog SystemVerilog", has an overview of this architecture along with a number of other goodies. It is available at: http://www.synthworks.com/papers/index.htm
You're definitely on the right track, in fact I have a variant like this (what you describe).
The catch is, now I build up a whole subroutine using the "parameter light" procedures, and those are what I want to put in a package to share and reuse. The problem is that any procedure pushed out to a package can't call to the parameter light procedures in the main vhdl file..
So what happens is we have one main vhdl file with all the common CPU hardware setup routines, and every designer's test code all in the same vhdl file..
Long story short, putting our test subroutines into separate files is really what I was hoping for..

Does MATLAB perform tail call optimization?

I've recently learned Haskell, and am trying to carry the pure functional style over to my other code when possible. An important aspect of this is treating all variables as immutable, i.e. constants. In order to do so, many computations that would be implemented using loops in an imperative style have to be performed using recursion, which typically incurs a memory penalty due to the allocation a new stack frame for each function call. In the special case of a tail call (where the return value of a called function is immediately returned to the callee's caller), however, this penalty can be bypassed by a process called tail call optimization (in one method, this can be done by essentially replacing a call with a jmp after setting up the stack properly). Does MATLAB perform TCO by default, or is there a way to tell it to?
If I define a simple tail-recursive function:
function tailtest(n)
if n==0; feature memstats; return; end
tailtest(n-1);
end
and call it so that it will recurse quite deeply:
set(0,'RecursionLimit',10000);
tailtest(1000);
then it doesn't look as if stack frames are eating a lot of memory. However, if I make it recurse much deeper:
set(0,'RecursionLimit',10000);
tailtest(5000);
then (on my machine, today) MATLAB simply crashes: the process unceremoniously dies.
I don't think this is consistent with MATLAB doing any TCO; the case where a function tail-calls itself, only in one place, with no local variables other than a single argument, is just about as simple as anyone could hope for.
So: No, it appears that MATLAB does not do TCO at all, at least by default. I haven't (so far) looked for options that might enable it. I'd be surprised if there were any.
In cases where we don't blow out the stack, how much does recursion cost? See my comment to Bill Cheatham's answer: it looks like the time overhead is nontrivial but not insane.
... Except that Bill Cheatham deleted his answer after I left that comment. OK. So, I took a simple iterative implementation of the Fibonacci function and a simple tail-recursive one, doing essentially the same computation in both, and timed them both on fib(60). The recursive implementation took about 2.5 times longer to run than the iterative one. Of course the relative overhead will be smaller for functions that do more work than one addition and one subtraction per iteration.
(I also agree with delnan's sentiment: highly-recursive code of the sort that feels natural in Haskell is typically likely to be unidiomatic in MATLAB.)
There is a simple way to check this. Create this function tail_recursion_check:
function r = tail_recursion_check(n)
if n > 1
r = tail_recursion_check(n - 1);
else
error('error');
end
end
and run tail_recursion_check(10), for example. You are going to see a very long stack trace with 10 items that says error at line 3. If there were tail call optimization, you would only see one.

Why inet_ntoa is designed to be a non-reentrant function?

Glancing at the source code of GNU C Library,I found the inet_ntoa is implementated with
static __thread char buffer[18]
My question is, since there is a need to use reeentrant inet_ntoa,why do not the author of GNU C Library use malloc to implementate it?
thanks.
The reason it's not using the heap is to conform with standards (POSIX) and other systems. The interface is just not such that you are supposed to free the buffer returned. It assumes static storage..
But by declaring it as thread local (with __thread), two threads do not conflict with each other if they happen to both be calling the function. This is glibc's workaround for the brokenness of the interface.
It's true that this is not re-entrant or consistent with the spirit of that term. If you have a recursive function that calls it, you cannot rely on the buffer being the same between calls. But it can be used by multiple threads, which often is good enough.
EDIT: By the way, I just remembered, there is a newer version of this function that uses a caller-provided buffer. See inet_ntop().