What are relocatable programs and what makes a program relocatable? From the OS memory management context, why programs (processes) need to be relocatable?
There is position independent code and position dependent code. Position independent code does not depend upon where it is located in memory. Position independent code is generally desirable. There are a lot of techniques processor/compiler-assembler/linker/loader combinations use to generate position independent code.
If you do something like:
extern int b ;
int a = &b;
the code is inherently not position independent because the assignment depends upon where b has been loaded (however, this situation occurs so commonly, that linkers and loaders have ways of dealing with this).
If a program or shared library only contains position independent code it can be loaded anywhere in memory and is relocatable.
Let's suppose you have a program P that is linked to shared libraries L1 and L2. If L1 and L2 require using the same location in memory, then they cannot be loaded together and P cannot run.
Most programs can be relocated. If the program contains relative addresses of its data then it can be placed anywhere in memory. It it contains absolute addresses then the loader will adjust these addresses in the code when loading the program into memory.
http://linker.iecc.com/
Related
When we pin a MAP, we can able to share information from userspace to ebpf but it is system wide.
But if i want to share different value to different instance from tc ingress/egress (array map with size of 1)
Is there any way to pass argument ?
Map (unpinned unique per instance) - update from userspace
Any other way to communicate from userspace to kernel (while attaching or after)
Really appreciate your help.
Pinning a map doesn't make it system wide. Every map is always accessible system wide, pinning just adds a reference to the file system to make it easier to find and to make sure a map isn't removed even when not in use by any program.
Is there any way to pass argument ?
No, once a program is loaded, only the kernel can pass arguments(contexts) to a program, userspace can only use maps to communicate with eBPF programs.
Map (unpinned unique per instance) - update from userspace
Any userspace program with the right permissions can update any map as long as you can obtain a file descriptor to the map. Map FDs can be obtained:
By creating a new map
By opening a map pin
By using its unique ID (directly or by looping over all loaded maps)
By IPC from another process that already has the FD
Any other way to communicate from userspace to kernel (while attaching or after)
Maps are it. You can rewrite the program before loading it into the kernel by setting constants at specific locations, but not at attach time. One way which might be interesting is that on newer kernels, global data is supported. Which allow you to change the value of variables defined in the global scope from userspace. In this case the global data is packed in a array map with a single key/value.
I know how to link modules but could someone explain the flow of calling the modules to be used when I want it to be used.
Like have a state machine and depending on the state I can call a module to activate, or like if I need to repeat a process how to go back to a module earlier in a state machine.
again I get the instantiating part like this
wire clk;
wire sig;
wire out;
A a(clk, sig, topout);
B b(clk, sig);
endmodule
but can someone explain how to call modules and how the control flow works in general for them?
(I am new to HDLs so I appreciate any help)
Verilog is a language specifically developed to simulate behavior of hardware. Hardware is a set of transistors and other elements which always statically presented and function in parallel. Functioning of such elements could be enabled or disabled, but the hardware is still present.
Verilog is similar to the hardware in the sense that all its elements are always present, intended for parallel functioning.
The basic functional elements of Verilog are gates, primitives and procedural blocks (i.e., always blocks). Those blocks are connected by wires.
All those elements are then grouped in modules. Modules are used to create logical partitioning of the hardware mode. They cannot be 'called'. They can be instantiated in a hierarchical manner to describe a logical structure of the model. They cannot be instantiated conditionally since they represent pieces of hardware. Different module instances are connected via wires to express hierarchical connectivity between lower-level elements.
There is one exception however, the contents of an always block is pure software. It describes an algorithmic behavior of it and as such, software flow constructs are allowed inside always block (specific considerations must be used to make it synthesizable).
As it comes to simulation, Verilog implements an event-driven simulation mode which is intended to mimic parallel execution of hardware. In other words, a Verilog low level primitive (gate or always block) is executed only if at least one of its inputs changes.
The only flow control which is usually used in such models is a sequence of input events and clocks. The latter are used to synchronize results of multiple parallel operations and to organize pipes or other sequential functions.
As I mentioned before, hardware elements can be enabled/disabled by different methods, so the only further control you can use by implementing such methods in your hardware description. For example, all hardware inside a particular module can be turned off by disabling clock signal which the module uses. There could be specific enable/disable signals on wires or in registers, and so on.
Now to your question: your code defines hierarchical instantiation of a couple of modules.
module top(out);
output wire out;
wire clk;
wire sig;
A a(clk, sig, out);
B b(clk, sig);
endmodule
Module 'top' (missing in your example) contains instances of two other modules, A and B. A and B are module definitions. They are instantiated as corresponding instances 'a' and 'b'. The instances are connected by signals 'clk', which is probably a clock signal, some signal 'sig' which is probably an output of one of the modules and input in another. 'out' is output of module top, which is probably connected to another module or an element in a higher level of hierarchy, not shown here.
The flow control in some sense is defined by the input/output relation between modules 'A' and 'B'. For example:
module A(input clk, input sig, output out);
assign out = sig;
...
endmodule
module B(input clk, output sig);
always#(posedge clk) sig <= some-new-value;
...
endmodule
However, in general it is defined by the input/output relation of the internal elements inside module (always blocks in the above example). input/output at the module port level is mostly used for semantic checking.
In the event-driven simulation it does not matter hardware of which module is executed first. However as soon as the value of the 'sig' changes in always#(posedge clk) of module 'B', simulation will cause hardware in module 'A' (the assign statement to be evaluated (or re-evaluated). This is the only way you can express a sequence in the flow at this level. Same as in hardware.
If you are like me you are looking at Verilog with the background of a software programmer. Confident in the idea that a program executes linearly. You think of ordered execution. Line 1 before line 2...
Verilog at its heart wants to execute all the lines simultaneously. All the time.
This is a very parallel way to program and until you get it, you will struggle to think the right way about it. It is not how normal software works. (I recall it took me weeks to get my head around it.)
You can prefix blocks of simultaneous execution with conditions, which are saying execute the lines in this block when the condition is true. All the time the condition is true. One class of such conditions is the rising edge of a clock: always #(posedge clk). Using this leads to a block of code that execute once every time the clk ticks (up).
Modules are not like subroutines. They are more like C macros - they effectively inline blocks of code where you place them. These blocks of code execute all the time any conditions that apply to them are true. Typically you conditionalize the internals of a module on the state of the module arguments (or internal register state). It is the connectivity of the modules through the shared arguments that ensures the logic of a system works together.
Compile time. If you know at compile time where the process will reside
in memory, then absolute code can be generated. For example, if you know
that a user process will reside starting at location R, then the generated
compiler code will start at that location and extend up from there. If, at
some later time, the starting location changes, then it will be necessary
to recompile this code. The MS-DOS .COM-format programs are bound at
compile time.
What can be the reason of the starting location to change? Can it be
because of context switching/swapping ?
Does absolute code means binary code?
Load time. If it is not known at compile time where the process will reside
in memory, then the compiler must generate relocatable code. In this case,
final binding is delayed until load time. If the starting address changes, we
need only reload the user code to incorporate this changed value.
How is relocatable code different from absolute code? Does it contain info about base,limit and relocation register?
How is reloading more efficient then recompiling as they mentioned only reload means no recompiling only reload?
Execution time. If the process can be moved during its execution from
one memory segment to another, then binding must be delayed until run
time. .
Why it may be needed to move a process during it's execution?
The compile-time and load-time address-binding methods generate
identical logical and physical addresses. However, the execution-time address-binding scheme results in differing logical and physical addresses.
How compile and load-time methods generate identical logical and physical addresses?
To begin with, I would find a better source for your information. What you have is very poor.
What can be the reason of the starting location to change? Can it be because of context switching/swapping ?
You change the code or need the code to be loaded at a different location in memory.
Does absolute code means binary code?
No. They are independent concepts.
How is relocatable code different from absolute code? Does it contain info about base,limit and relocation register?
Relocatable code uses relative addresses, generally relative to the program counter.
(Base limit and relocation registers would be a system specific ocncept).
How is reloading more efficient then recompiling as they mentioned only reload means no recompiling only reload?
Let's say two different programs use the same dynamic library. They made need to have loaded at different locations in memory. It's not an efficiency issue.
Why it may be needed to move a process during it's execution?
This is what was done in ye olde days before virtual memory. To my knowledge no one does this any more.
How compile and load-time methods generate identical logical and physical addresses?
I don't know what the &^54 they are talking about. That statement makes no sense.
Dynamic libraries (.dll .so) are relocatable, because they might appear at different adresses in different applications, but in order to save memory, the operating system only has one copy in physical memory (virtual memory is great), and each application has read only access.
Same happens for applications that are relocatable. For security, it is also wize that the addresses are random - some remote attacks are slighty harder
Basically, when my system is running, I would like the user to ftp some new code to the SD card, and dynamically load the new function and create a task to run in the system. This is normal for Linux. For example, I can compile a SO, and dynamically load into the memory.
How to do it in uC/OS II or III?
This is not a service uC/OS-II or uC/OS-III can provide by itself.
You would need a program loader that is able to read an ELF file, copy its relevant sections (ex .text, .rodata, etc.) in memory according to load addresses specified and allocate memory for uninitialized memory sections. You would then be able to create a new uC/OS task and pass it the function pointer that corresponds to the ELF entry point.
Most embedded systems don't have a Memory Management Unit (MMU) and thus you would need to pay special care to the linking process to make sure any of those sections don't overlap with any code that would already running on your target. Depending on your toolchain, that would most likely involve carefully crafting your linker script.
Another option that would avoid the problem of potential overlapping of the memory space would be to use a toolchain that can produce position independent code and load the ELF in the heap of your main application or in any other allocated and available memory space that is allocated by your main application.
Yet another option would be to produce relocatable code and use or build a program linker that is able to process relocations at runtime, when you want to load the uploaded code.
I want to organize a working bus functional model and push commonly used procedures (which look like CPU subroutines) out into a package and get them out of the main cpu model, but I'm stuck.
The procedures don't have access to the hardware bits when they're pushed out in a package.
In Verilog, I would put commonly used procedures out into an include file and link them into the CPU model as required for a given test suite.
More details:
I have a working bus functional model of a CPU, for simulation test benching.
At the "user interface" level I have a process called "main" running inside the CPU model which calls my predefined "instruction set" like this:
cpu_read(address, read_result);
cpu_write(address, write_data);
etc.
I bundle groups of those calls up into higher level procedures like
configure_communication_bus;
clear_all_packet_counters;
etc.
At the next layer these generic functions call a more hardware specific version which knows the interface timing for the design,
and those procedures then use an input record and output record to connect to the hardware module ports and waggle the cpu bus signals as required.
cpu_read calls hardware_cpu_read(cpu_input_record, cpu_output_record, address);
Something like this:
procedure cpu_read (address : in std_logic_vector(15 downto 0);
read_result : out std_logic_vector(31 downto 0));
begin
hardware_cpu_read(cpu_input_record, cpu_output_record, address, read_result);
end procedure;
The cpu_input_record and cpu_output_record are declared as signals of type nnn_record in the cpu model vhdl file.
So this is all working, but every single one of these procedures is all stored in the cpu VHDL module file, and all in the procedure declaration section so that they are all in the same scope.
If I share the model with team members they will need to add their own testing subroutines, and those also are all in the same location in the file, as well, their simulation test code has to go into the "main" process along with mine.
I'd rather link in various tests from outside the model, and only keep model specific procedures in the model file..
Ironically I can push the lowest level hardware procedure out to a package, and call those procedures from within the "main" process, but the higher level processes can't be put out into that package or any other packages because they don't have access to the cpu_read_record and cpu_write_record.
I feel like there must be a simple way to clean up this code and make it modular, and I'm just missing something obvious.
I don't really think making a command interpreter and loading my test code into a behavioral ROM is the right way to go by the way. Nor is fighting with the simulator interface to connect up a C program, but I may break down and try this..
Quick sketch of an answer (to the question I think you are asking! :-) though I may be off-beam...
To move the BFM subprograms into a reusable package, they need to be independent of the execution scope - that usually means a long parameter list for each of them. So using them in a testbench quickly gets tedious compared with the parameterless (or parameter-lite) versions you have now..
The usual workaround is to implement the BFM in a package, with long parameter lists.
Then write parameter-lite local equivalents (wrappers) in the execution scope, which simply call the package versions supplying all the parameters explicitly.
This is just boilerplate - not pretty but it does allow you to move the BFM into a package. These wrappers can be local to the testbench, to a process within it, or even to a subprogram within that process.
(The parameter types can be records for tidiness : these are probably declared in a third package, shared between BFM. TB, and synthesisable device under test...)
Thanks to overloading, there is no ambiguity between the local and BFM package versions, so the actual testbench remains as simple as possible.
Example wrapper function :
function cpu_read(address : unsigned) return slv_32 is
begin
return BFM_pack.cpu_read (
address => address,
rd_data_bus => tb_rd_data_bus,
wait => tb_wait_signal,
oe => tb_mem_oe,
-- ditto for all the signals constants variables it needs from the tb_ scope
);
end cpu_read;
Currently your test procedures require two extra signals on them, cpu_input_record and cpu_output_record. This is not so bad. It is not uncommon to just have these on all procedures that interact with the cpu and be done with it. So use hardware_cpu_read and not cpu_read. Add cpu_input_record, cpu_output_record to your configure_communication_bus and clear_all_packet_counters procedures and be done. Perhaps choose shorter names.
I do a similar approach, except I use only one record with resolved elements. To make this work, you need to initialize the record so that all elements are non-driving (ie: 'Z' for std_logic). To make this more flexible, I have created resolution functions for integer, time, and real. However, this only saves you one signal. Not a real huge win. Perhaps half way to where you think you want to be. But it is more work than what you are doing.
For VHDL-201X, we are working on syntax to allow parameters/ports automatically map to a identically named signal. This will get you to where you want to be with any of the approaches (yours, mine, or Brian's without the extra wrapper subprogram). It is posted here: http://www.eda.org/twiki/bin/view.cgi/P1076/ImplicitConnections. Given this, I would add the two records to your procedures and call it good enough for now.
Once you get by this problem, you seem to also be asking is how do I write separate tests using the same testbench. For this I use multiple architectures - I like to think of these as a Factory Class for concurrent code. To make this feasible, I separate the stimulus generation code from the rest of the testbench (typically: netlist connections and clock). My presentation, "VHDL Testbench Techniques that Leapfrog SystemVerilog", has an overview of this architecture along with a number of other goodies. It is available at: http://www.synthworks.com/papers/index.htm
You're definitely on the right track, in fact I have a variant like this (what you describe).
The catch is, now I build up a whole subroutine using the "parameter light" procedures, and those are what I want to put in a package to share and reuse. The problem is that any procedure pushed out to a package can't call to the parameter light procedures in the main vhdl file..
So what happens is we have one main vhdl file with all the common CPU hardware setup routines, and every designer's test code all in the same vhdl file..
Long story short, putting our test subroutines into separate files is really what I was hoping for..