what is the meaning of Instruction dispatch - cpu-architecture

Can any-one tell me what is the exact meaning of instruction dispatch and how Register based architecture avoid instruction dispatch ?

Instruction dispatch involves fetching/reading an instruction from memory, and jumping to the corresponding segment of code that implements the instruction.
In a stack based architecture an addition would look like:
I1: LOAD C
I2: LOAD B
I3: ADD
I4: STORE A
You fetch the values from the stack and push the result back on it (hence the name stack based architecture).
In a register based architecture:
I1 "ADD, a, b, c"
a,b,c being registers.
A register based architecture does not completely avoid fetching instructions but it reduces the number of them.

Related

Can someone explain the control flow of modules in System Verilog

I know how to link modules but could someone explain the flow of calling the modules to be used when I want it to be used.
Like have a state machine and depending on the state I can call a module to activate, or like if I need to repeat a process how to go back to a module earlier in a state machine.
again I get the instantiating part like this
wire clk;
wire sig;
wire out;
A a(clk, sig, topout);
B b(clk, sig);
endmodule
but can someone explain how to call modules and how the control flow works in general for them?
(I am new to HDLs so I appreciate any help)
Verilog is a language specifically developed to simulate behavior of hardware. Hardware is a set of transistors and other elements which always statically presented and function in parallel. Functioning of such elements could be enabled or disabled, but the hardware is still present.
Verilog is similar to the hardware in the sense that all its elements are always present, intended for parallel functioning.
The basic functional elements of Verilog are gates, primitives and procedural blocks (i.e., always blocks). Those blocks are connected by wires.
All those elements are then grouped in modules. Modules are used to create logical partitioning of the hardware mode. They cannot be 'called'. They can be instantiated in a hierarchical manner to describe a logical structure of the model. They cannot be instantiated conditionally since they represent pieces of hardware. Different module instances are connected via wires to express hierarchical connectivity between lower-level elements.
There is one exception however, the contents of an always block is pure software. It describes an algorithmic behavior of it and as such, software flow constructs are allowed inside always block (specific considerations must be used to make it synthesizable).
As it comes to simulation, Verilog implements an event-driven simulation mode which is intended to mimic parallel execution of hardware. In other words, a Verilog low level primitive (gate or always block) is executed only if at least one of its inputs changes.
The only flow control which is usually used in such models is a sequence of input events and clocks. The latter are used to synchronize results of multiple parallel operations and to organize pipes or other sequential functions.
As I mentioned before, hardware elements can be enabled/disabled by different methods, so the only further control you can use by implementing such methods in your hardware description. For example, all hardware inside a particular module can be turned off by disabling clock signal which the module uses. There could be specific enable/disable signals on wires or in registers, and so on.
Now to your question: your code defines hierarchical instantiation of a couple of modules.
module top(out);
output wire out;
wire clk;
wire sig;
A a(clk, sig, out);
B b(clk, sig);
endmodule
Module 'top' (missing in your example) contains instances of two other modules, A and B. A and B are module definitions. They are instantiated as corresponding instances 'a' and 'b'. The instances are connected by signals 'clk', which is probably a clock signal, some signal 'sig' which is probably an output of one of the modules and input in another. 'out' is output of module top, which is probably connected to another module or an element in a higher level of hierarchy, not shown here.
The flow control in some sense is defined by the input/output relation between modules 'A' and 'B'. For example:
module A(input clk, input sig, output out);
assign out = sig;
...
endmodule
module B(input clk, output sig);
always#(posedge clk) sig <= some-new-value;
...
endmodule
However, in general it is defined by the input/output relation of the internal elements inside module (always blocks in the above example). input/output at the module port level is mostly used for semantic checking.
In the event-driven simulation it does not matter hardware of which module is executed first. However as soon as the value of the 'sig' changes in always#(posedge clk) of module 'B', simulation will cause hardware in module 'A' (the assign statement to be evaluated (or re-evaluated). This is the only way you can express a sequence in the flow at this level. Same as in hardware.
If you are like me you are looking at Verilog with the background of a software programmer. Confident in the idea that a program executes linearly. You think of ordered execution. Line 1 before line 2...
Verilog at its heart wants to execute all the lines simultaneously. All the time.
This is a very parallel way to program and until you get it, you will struggle to think the right way about it. It is not how normal software works. (I recall it took me weeks to get my head around it.)
You can prefix blocks of simultaneous execution with conditions, which are saying execute the lines in this block when the condition is true. All the time the condition is true. One class of such conditions is the rising edge of a clock: always #(posedge clk). Using this leads to a block of code that execute once every time the clk ticks (up).
Modules are not like subroutines. They are more like C macros - they effectively inline blocks of code where you place them. These blocks of code execute all the time any conditions that apply to them are true. Typically you conditionalize the internals of a module on the state of the module arguments (or internal register state). It is the connectivity of the modules through the shared arguments that ensures the logic of a system works together.

What is the difference between Program Status Word (PSW) and Program Counter (PC)?

In an Operating Systems course, the instructor introduced PSW and PC when he talked about Interrupt Handling.
His explanation was
PC holds the address of the next instruction to be fetched
PSW contains execution status information
But later I searched online and found that PSW = PC + status register. This makes me quite confused.
On the one hand, I am not sure what "execution status information" refers to. On the other hand, if PSW has the functions of a PC, why do we still need it?
Appreciate any explanation.
This isn't really standardized terminology. Most architectures have some register that plays the role of a status word, containing bits to indicate things like whether an add instruction caused a carry. But different architectures give it different names, and what exactly is included can vary widely. I'm not aware of any architecture that includes the program counter as part of their status word, but if they want to do that, well, who's going to stop them?
This is the kind of thing where you just have to look at the definition given by whatever book or article you are reading (or infer it from context), and realize that a different author may use the word differently.
In general, interrupts are hardware level subroutine calls. They do the same thing as a subroutine call (change the algorithm that the processor is executing) however they do it without warning the "executing code" that they are now operating.
In order to not damage the "executing code" all information that it was using must be stored. This includes the Program Counter (usually saved to the stack by the interrupt hardware in the same way that a subroutine call does) and all of the registers that the interrupt function will alter- these must be saved by pushing them onto the stack. The registers etc must be restored before the return from interrupt (RETI) instruction - the PC is restored by the RETI itself.
The PSW (often called the flag register) is a very important register and must generally be saved first. It contains bits like Zero (the last calculation resulted in a zero result) Carry (the last calculation resulted in a carry ie the result number is bigger than the register can hold) and several other flags. I suggest that you read the data sheet of an 8 bit microcontroller for an idea of what these flags might be. suffice it to say that these flags are needed in order to perform conditional jumps. And whilst they will often be ignored you can't take that chance.
You are probably correct in Your instructor using the term PSW to mean all all of the registers.
The subject of interrupts contains concepts that are common to subroutine calls in general (e.g. don't leave data that you don't want overwritten in a register before entering a subroutine). And later on in operating systems, the concept of context switches that occur during multi-tasking.
Peter

What Problems Might Occur if I Overlap Multi-Register Modbus Data Items?

It is common to use 2 registers to read/write a floating point value in Modbus.
My question is what problems or compatibility issues arise if I specify my device register map with overlapping data as follows..
40001 (float a), 40002 (float b), 40003 (float c), 40004 (float d), and so on.
Float (a) can be read at 40001 with FC03, number of registers is 2.
Float (b) can be read at 40002 with FC03, number of registers is 2.
Float (a) and (b) can be read at 40001 with FC03, number of registers is 4.
This will render your device to become not a modbus-compatible device, but just a modbus-like device.
The drawback is that there are plenty of modbus clients, mostly SCADA systems, which would simply stop working with such register map. So if you don't care about 3rd party clients, you can do it, but what is the purpose?
UPD
Also you get undefined behavior on reading registers which belong to different values at the same time. What is the expected output of reading single word at 4002? LSB of a or MSB of b?
How do I read 2 consequent numbers (a and b)??
Modbus is already only modbus-like when it comes to multi-register
values
Wrong, it is still modbus, but whenever you ready multi-register values or implement time stamps, you explicitly define them in documentation and you rules should not violate general modbus rules like mention above. There is nothing wrong in telling you are using 2 registers with MSB BOM.
So the answer is it could work for some specific cases but is generally not usable at all.

AMD64 ABI Function Calling Sequence for arguments of type __m256

I've been reading through this section for a while, but I can't seem to figure it out. I'm on AMD64 ABI Draft 0.99.6, page 18, section 3.32 Parameter Passing and theres the following text:
Arguments of type __m256 are split into four eightbyte chunks. The least significant one belongs to class SSE and all the others to class SSEUP.
I'm confused because it sounds like I use three SSEUP registers and only one SSE, but that seems wasteful of the other two SSE registers associated with the SSEUP. Am I misreading it? I probably won't even use this datatype, but I've been confused on this text for quite a while. Can someone give an example of how this would work? I'm probably missing something obvious.
Page 18 just contains a list of definitions necessary for a later discussion of the algorithm used to pass the parameters of a function.
Particularly, the SSE class is always passed in a new vector register, the first available of %xmm0-%xmm7.
Note that these names refer to the lower 128-bit parts of the registers but its better to think of them in terms of variable size vector registers %v0-%v7.
The SSEUP class is passed in the next available 64-bit (eight-byte) of the last vector register used.
__m256 is then passed, in processors that support AVX, using a single %ymm register: the lower 64 bits get the SSE class - and hence a new %v0 register - while the other three 64 bits chunks get SSEUP thereby reusing the %v0 register.
Here's the relevant quote from the document:
If the class is SSE, the next available vector register is used, the registers
are taken in the order from %xmm0 to %xmm7.
If the class is SSEUP, the eightbyte is passed in the next available eightbyte
chunk of the last used vector register.
The SSEUP class was introduced earlier in the ABI and it is still present today.
You can quickly consult the Version 0.9 to see the differences: the type _m256 and _m512 were not present for example.
For compiler that doesn't support the new ABI with the _m256 type or for compilers that do support it but target processors with no AVX support, that type is usually an aggregate of two _m128 and thus by the rules described later (particularly the post-merge rules) it is passed in memory:
If the size of an object is larger than two eightbytes, or in C++, is a nonPOD structure or union type, or contains unaligned fields, it has class
MEMORY.
For compilers using the old ABI
If the size of the aggregate exceeds two eightbytes and the first eightbyte
isn’t SSE or any other eightbyte isn’t SSEUP, the whole argument
is passed in memory.
For compilers using the new ABI
The standard is admittedly confusing mostly due to the need to address backward compatibility, the SSE and SSEUP classifications are handy classifications in an architecture where the vector registers keep widening and broad range of different sizes are already present out there.

What is a relocatable program?

What are relocatable programs and what makes a program relocatable? From the OS memory management context, why programs (processes) need to be relocatable?
There is position independent code and position dependent code. Position independent code does not depend upon where it is located in memory. Position independent code is generally desirable. There are a lot of techniques processor/compiler-assembler/linker/loader combinations use to generate position independent code.
If you do something like:
extern int b ;
int a = &b;
the code is inherently not position independent because the assignment depends upon where b has been loaded (however, this situation occurs so commonly, that linkers and loaders have ways of dealing with this).
If a program or shared library only contains position independent code it can be loaded anywhere in memory and is relocatable.
Let's suppose you have a program P that is linked to shared libraries L1 and L2. If L1 and L2 require using the same location in memory, then they cannot be loaded together and P cannot run.
Most programs can be relocated. If the program contains relative addresses of its data then it can be placed anywhere in memory. It it contains absolute addresses then the loader will adjust these addresses in the code when loading the program into memory.
http://linker.iecc.com/