__init and __exit macro usage for built-in and loadable modules

I was reading about linux kernel development and I just read some text that I don't understand. Here is the paragraph, which talks about the __init and __exit macros for modules:
This demonstrates a feature of kernel 2.2 and later. Notice the change in the definitions of the init and cleanup functions. The __init macro causes the init function to be discarded and its memory freed once the init function finishes for built-in drivers, but not loadable modules. If you think about when the init function is invoked, this makes perfect sense.
There is also an __initdata which works similarly to __init but for init variables rather than functions.
The __exit macro causes the omission of the function when the module is built into the kernel, and like __init, has no effect for loadable modules. Again, if you consider when the cleanup function runs, this makes perfect sense.
I get the point: the __init macro causes the init function to be discarded and its memory freed once the init function finishes, for built-in drivers. But why not for loadable modules? I couldn't make sense of it.
I know it's a silly thing, but I thought about it for some time and couldn't fully comprehend it. Why for built-in drivers but not for loadable modules? Variables, addresses, etc. assigned in __init would be required by both, right?

You're right; even in a module there could be functions that you really don't need after initialization, and in principle they could be removed from memory. The reason __init has no effect for modules comes down to how hard it would be to implement.
This answer to a question about the nature of __init sheds some light on the subject. Essentially, the kernel build system looks for all of the functions flagged with __init, across all of the pieces of the kernel, and arranges them so that they will all be in the same block of memory.
Then, when the kernel boots, it can free that one block of memory all at once.
This pre-sorting idea doesn't work so well with modules. The init code has to be loaded when the module is loaded, so it can't share space with other init code. Instead, the kernel would have to pick a few hundred bytes out of each module and free them individually.
However, hardware page sizes are typically 4KB, so it's hard to free up memory in chunks of less than that. So trying to free the __init functions in each individual module is probably more trouble than it's worth.
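For concreteness, here is a minimal sketch of how the two macros are typically used (standard module boilerplate; the "hello" name and messages are made up). When this is built into the kernel, hello_init is collected together with all the other __init code and freed after boot; as a loadable module it simply stays resident:
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

static int __init hello_init(void)
{
    printk(KERN_INFO "hello: initialized\n");
    return 0;                   /* 0 means successful initialization */
}

static void __exit hello_exit(void)
{
    printk(KERN_INFO "hello: cleaned up\n");
}

module_init(hello_init);        /* runs at boot (built-in) or at insmod (module) */
module_exit(hello_exit);        /* omitted entirely when built into the kernel */
MODULE_LICENSE("GPL");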

Related

wrap function without dlsym

How to write a shared library that:
wraps a system function (say malloc),
internally uses the real version of wrapped functions (e.g., malloc defined in libc), AND
can be linked from client code without giving --wrap=malloc every time it is used?
I learned from several posts that I can wrap system functions with the --wrap option of ld; something like this:
#include <stddef.h>
void *__real_malloc(size_t);        /* resolved by ld when --wrap=malloc is given */
void *__wrap_malloc(size_t sz) {
    return __real_malloc(sz);
}
and get a shared library with:
gcc -O0 -g -Wl,--wrap=malloc -shared -fPIC m.c -o libwrapmalloc.so
But when client code links against this library, it needs to pass --wrap=malloc every time. I want to hide this from the client code, as the library I am working on actually wraps tons of system functions.
An approach I was using was to define malloc myself and find the real malloc in libc using dlopen and dlsym. This was nearly what I needed, but, as someone posted before in Function interposition in Linux without dlsym, dlsym and dlopen internally call memory-allocation functions (calloc, as I witnessed it), so we cannot easily override calloc/malloc with this approach.
I recently learned about --wrap and thought it was neat, but I just do not want to ask clients to pass tons of --wrap=xxxx arguments every time they build their executables...
I want to have a situation in which malloc in the client code calls malloc defined in my shared library whereas malloc in my shared library calls malloc in libc.
If this is impossible, I would like to reduce the burden of the clients to give lots of --wrap=... arguments correctly.
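For what it's worth, the usual way around the "dlsym allocates" problem is a small static bootstrap arena that satisfies any allocation made while the real symbol is still being resolved. A minimal sketch, assuming glibc and symbol interposition via normal linking or LD_PRELOAD (the arena size and all names here are made up, and calloc would need the same treatment as malloc):
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

static void *(*real_malloc)(size_t);
static void (*real_free)(void *);
static int resolving;               /* nonzero while dlsym() is running */
static char arena[4096];            /* feeds allocations dlsym itself makes */
static size_t arena_used;

void *malloc(size_t sz)
{
    if (real_malloc)
        return real_malloc(sz);     /* normal path: forward to libc */
    if (resolving) {                /* re-entered from inside dlsym */
        void *p = &arena[arena_used];
        arena_used += (sz + 15) & ~(size_t)15;  /* keep 16-byte alignment */
        return arena_used <= sizeof arena ? p : NULL;
    }
    resolving = 1;
    real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    real_free   = (void (*)(void *))dlsym(RTLD_NEXT, "free");
    resolving = 0;
    return real_malloc(sz);
}

void free(void *p)
{
    /* arena chunks were never malloc'd by libc; silently drop them */
    if ((char *)p >= arena && (char *)p < arena + sizeof arena)
        return;
    if (real_free)
        real_free(p);
}
Built with something like gcc -shared -fPIC wrap.c -o libwrap.so -ldl, this needs no --wrap flags from clients at all; whether it is robust enough depends on how early in process startup the wrapped functions are first called.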

Where does the code for dealing with critical sections originate?

While learning about operating systems, I came across the topic of critical sections. To solve this problem, certain methods are provided, like semaphores, certain software solutions, etc. But I have a question: where does the code implementing these solutions originate? Programmers are never found writing such code in their programs. Suppose I write a simple program executing printf in C; I never write any code for the critical-section problem. The code is converted into low-level instructions and executed by the OS, which behaves as our obedient servant. So, where does the code dealing with critical sections originate and fit in? Take a resource like the frame buffer as the critical section.
The OS kernel supplies such inter-thread synchronization mechanisms: mutexes, semaphores, events, critical sections, condition variables, etc. It has to, because the kernel needs to block threads that cannot proceed. Many languages provide convenient wrappers around such calls.
Your app accesses them, directly or indirectly, via system calls, i.e. interrupts that enter kernel state and ask for such services.
In some cases, a short-term user-space spinlock may get plastered on top, but such code should defer to a system call if the spinner is not quickly satisfied.
In the case of C printf, the relevant library (stdio, usually) will make the calls to lock/unlock the I/O stream, assuming you have linked in a multithreaded version of the library.
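To make that concrete, here is a minimal sketch of where the code actually comes from: you call a library wrapper (pthread_mutex_lock), the library tries a cheap user-space atomic operation first, and it falls back to a system call (futex on Linux) only when it has to block. The shared counter below is just a stand-in for a resource like the frame buffer:
#include <pthread.h>
#include <stdio.h>

static long counter;                 /* the shared resource */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;                       /* unused */
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* library code; syscall only if contended */
        counter++;                   /* the critical section */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("%ld\n", counter);        /* prints 200000 because of the mutex */
    return 0;
}
Compile with gcc -pthread; without the mutex the final count would come out short on most runs, which is exactly the critical-section problem.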

VHDL Bus Functional Modelling - Can't put groups of procedures into a package to clean up the code

I want to organize a working bus functional model and push commonly used procedures (which look like CPU subroutines) out into a package, getting them out of the main CPU model, but I'm stuck.
The procedures don't have access to the hardware bits when they're pushed out in a package.
In Verilog, I would put commonly used procedures out into an include file and link them into the CPU model as required for a given test suite.
More details:
I have a working bus functional model of a CPU, for simulation test benching.
At the "user interface" level I have a process called "main" running inside the CPU model which calls my predefined "instruction set" like this:
cpu_read(address, read_result);
cpu_write(address, write_data);
etc.
I bundle groups of those calls up into higher level procedures like
configure_communication_bus;
clear_all_packet_counters;
etc.
At the next layer these generic procedures call a more hardware-specific version which knows the interface timing for the design, and those procedures then use an input record and an output record to connect to the hardware module ports and waggle the CPU bus signals as required.
cpu_read calls hardware_cpu_read(cpu_input_record, cpu_output_record, address);
Something like this:
procedure cpu_read (address     : in  std_logic_vector(15 downto 0);
                    read_result : out std_logic_vector(31 downto 0)) is
begin
    hardware_cpu_read(cpu_input_record, cpu_output_record, address, read_result);
end procedure;
The cpu_input_record and cpu_output_record are declared as signals of type nnn_record in the CPU model VHDL file.
So this is all working, but every single one of these procedures is stored in the CPU VHDL module file, all in the procedure declaration section so that they are all in the same scope.
If I share the model with team members, they will need to add their own testing subroutines, and those also all end up in the same place in the file; on top of that, their simulation test code has to go into the "main" process along with mine.
I'd rather link in various tests from outside the model, and keep only model-specific procedures in the model file.
Ironically, I can push the lowest-level hardware procedures out to a package and call them from within the "main" process, but the higher-level procedures can't be put into that package (or any other package) because they don't have access to cpu_input_record and cpu_output_record.
I feel like there must be a simple way to clean up this code and make it modular, and I'm just missing something obvious.
I don't really think making a command interpreter and loading my test code into a behavioral ROM is the right way to go, by the way. Nor is fighting with the simulator interface to connect up a C program, but I may break down and try this.
Quick sketch of an answer (to the question I think you are asking! :-) though I may be off-beam...
To move the BFM subprograms into a reusable package, they need to be independent of the execution scope - that usually means a long parameter list for each of them. So using them in a testbench quickly gets tedious compared with the parameterless (or parameter-lite) versions you have now.
The usual workaround is to implement the BFM in a package, with long parameter lists.
Then write parameter-lite local equivalents (wrappers) in the execution scope, which simply call the package versions supplying all the parameters explicitly.
This is just boilerplate - not pretty but it does allow you to move the BFM into a package. These wrappers can be local to the testbench, to a process within it, or even to a subprogram within that process.
(The parameter types can be records for tidiness: these are probably declared in a third package, shared between BFM, TB, and synthesisable device under test...)
Thanks to overloading, there is no ambiguity between the local and BFM package versions, so the actual testbench remains as simple as possible.
Example wrapper (written as a procedure here, since the BFM has to wait on clock edges and drive signals, which a VHDL function cannot do):
procedure cpu_read (address     : in  unsigned;
                    read_result : out slv_32) is
begin
    BFM_pack.cpu_read (
        address     => address,
        read_result => read_result,
        rd_data_bus => tb_rd_data_bus,
        wait_n      => tb_wait_signal,  -- "wait" is a reserved word in VHDL, so the formal needs another name
        oe          => tb_mem_oe
        -- ditto for all the signals, constants and variables it needs from the tb_ scope
    );
end procedure;
Currently your test procedures require two extra signals on them, cpu_input_record and cpu_output_record. This is not so bad. It is not uncommon to just have these on all procedures that interact with the cpu and be done with it. So use hardware_cpu_read and not cpu_read. Add cpu_input_record, cpu_output_record to your configure_communication_bus and clear_all_packet_counters procedures and be done. Perhaps choose shorter names.
I take a similar approach, except I use only one record, with resolved elements. To make this work, you need to initialize the record so that all elements are non-driving (i.e. 'Z' for std_logic). To make this more flexible, I have created resolution functions for integer, time, and real. However, this only saves you one signal - not a real huge win; perhaps halfway to where you think you want to be. But it is more work than what you are doing.
For VHDL-201X, we are working on syntax to allow parameters/ports to map automatically to an identically named signal. This will get you where you want to be with any of the approaches (yours, mine, or Brian's without the extra wrapper subprogram). It is posted here: http://www.eda.org/twiki/bin/view.cgi/P1076/ImplicitConnections. Given this, I would add the two records to your procedures and call it good enough for now.
Once you get past this problem, you seem also to be asking how to write separate tests using the same testbench. For this I use multiple architectures - I like to think of them as a Factory Class for concurrent code. To make this feasible, I separate the stimulus-generation code from the rest of the testbench (typically: netlist connections and clock). My presentation, "VHDL Testbench Techniques that Leapfrog SystemVerilog", has an overview of this architecture along with a number of other goodies. It is available at: http://www.synthworks.com/papers/index.htm
You're definitely on the right track; in fact I have a variant like what you describe.
The catch is, I now build up whole subroutines using the "parameter-lite" procedures, and those are what I want to put in a package to share and reuse. The problem is that any procedure pushed out to a package can't call the parameter-lite procedures in the main VHDL file.
So what happens is we have one main VHDL file with all the common CPU hardware setup routines, and every designer's test code, all in the same file.
Long story short, putting our test subroutines into separate files is really what I was hoping for.

Kernel Code vs User Code

Here's a passage from the book:
When executing kernel code, the system is in kernel-space executing in kernel mode. When running a regular process, the system is in user-space executing in user mode.
Now what really are kernel code and user code? Can someone explain with an example?
Say I have an application that does printf("Hello World"). While executing this application, will it be user code or kernel code?
I guess that at some point user code will switch into kernel mode and kernel code will take over, but I guess that's not always the case, since I came across this:
For example, the open() library function does little except call the open() system call. Still other C library functions, such as strcpy(), should (one hopes) make no direct use of the kernel at all.
If it does not make use of the kernel, then how does it make everything work?
Can someone please explain the whole thing in a lucid way?
There isn't much difference between kernel and user code as such; code is code. It's just that the code that executes in kernel mode (kernel code) can (and does) contain instructions that are only executable in kernel mode. In user mode such instructions can't be executed (they aren't allowed there, for reliability and security reasons); they typically cause exceptions and lead to process termination as a result.
I/O, especially with external devices other than the RAM, is usually performed by the OS somehow and system calls are the entry points to get to the code that does the I/O. So, open() and printf() use system calls to exercise that code in the I/O device drivers somewhere in the kernel. The whole point of a general-purpose OS is to hide from you, the user or the programmer, the differences in the hardware, so you don't need to know or think about accessing this kind of network card or that kind of display or disk.
Memory accesses, OTOH, most of the time can just happen without the OS' intervention. And strcpy() works as is: read a byte of memory, write a byte of memory, oh, was it a zero byte, btw? repeat if it wasn't, stop if it was.
I said "most of the time" because there's often page translation and virtual memory involved and memory accesses may result in switched into the kernel, so the kernel can load something from the disk into the memory and let the accessing instruction that's caused the switch continue.

Why is inet_ntoa designed to be a non-reentrant function?

Glancing at the source code of the GNU C Library, I found that inet_ntoa is implemented with
static __thread char buffer[18];
My question is: since there is a need for a reentrant inet_ntoa, why did the authors of the GNU C Library not use malloc to implement it?
Thanks.
The reason it's not using the heap is to conform with standards (POSIX) and other systems. The interface is just not such that you are supposed to free the returned buffer; it assumes static storage.
But by declaring the buffer thread-local (with __thread), two threads do not conflict with each other if they happen to call the function at the same time. This is glibc's workaround for the brokenness of the interface.
It's true that this is not re-entrant or consistent with the spirit of that term. If you have a recursive function that calls it, you cannot rely on the buffer being the same between calls. But it can be used by multiple threads, which often is good enough.
EDIT: By the way, I just remembered, there is a newer version of this function that uses a caller-provided buffer. See inet_ntop().
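For reference, a minimal example of that caller-provided-buffer variant (standard POSIX API, nothing made up here):
#include <arpa/inet.h>
#include <stdio.h>

int main(void)
{
    struct in_addr addr = { .s_addr = htonl(0x7f000001) };  /* 127.0.0.1 */
    char buf[INET_ADDRSTRLEN];                              /* caller owns the storage */

    if (inet_ntop(AF_INET, &addr, buf, sizeof buf) != NULL)
        printf("%s\n", buf);
    return 0;
}
Because the caller supplies the buffer, inet_ntop is fully reentrant, and it handles IPv6 (AF_INET6) as well.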