Can 'mov' instructions that do not require any offset/displacement added to them be executed without any assistance from the ALU?

I've recently started exploring the field of computer architecture. While studying instruction set architecture, I came across the 'mov' instruction, which copies data from one location to another. I understand that some types of 'mov' instructions are conditional, while others need an offset or displacement added to them to compute a particular address, and hence need ALU assistance: for example, base-plus-index, register relative, base relative-plus-index, and scaled index addressing.
I was wondering whether it is possible to bypass the ALU for those 'mov' instructions (e.g. register-to-register data transfers) that do not require any ALU assistance.

Yes. Obviously, an instruction that doesn't require any arithmetic to be performed doesn't require the assistance of the ALU.
Obviously, though, it still requires the "intervention of the microprocessor"; the registers, program counter, and instruction fetch/decode/execute pipeline are all part of the CPU.
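As a rough illustration, here is a minimal SystemVerilog sketch of a hypothetical, highly simplified datapath (not modeled on any real CPU) in which a register-to-register mov simply routes the source register through the writeback mux, leaving the ALU out of the path entirely:

// Hypothetical simplified datapath: a register-to-register mov selects the
// source register value directly at the writeback mux, bypassing the ALU.
module tiny_datapath (
  input  logic        clk,
  input  logic        we,          // write enable for the destination register
  input  logic        is_mov_rr,   // decoded: register-to-register mov
  input  logic [2:0]  rs, rd,      // source / destination register numbers
  input  logic [31:0] alu_result   // ALU output, used only by non-mov instructions
);
  logic [31:0] regs [0:7];

  // Writeback mux: for "mov rd, rs" the ALU result is simply ignored.
  wire [31:0] wb_data = is_mov_rr ? regs[rs] : alu_result;

  always_ff @(posedge clk)
    if (we) regs[rd] <= wb_data;
endmodule

In a real design the decode logic and forwarding paths are more involved, but the point stands: nothing about the transfer itself needs an arithmetic operation.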

Related

How does blockchain consensus verify transactions?

A lot of blockchain documentation says that the blockchain uses consensus to verify transactions. But as I understand it, the consensus only generates the hash to create the new block. I don't understand why the documentation says that consensus verifies the transaction when the transaction was already created before the consensus runs to produce the hash for the new block. The consensus flow does not care about its input (the group of transactions); it doesn't know whether the transactions in the group are valid or not. So why does the blockchain documentation say that?
Before getting into details, I'd like to set some context.
First, what is a blockchain: a blockchain is a set of blocks where every next block depends on a previous one. We can extend the chain as long as every new block is correct from the system's point of view.
An example of a blockchain (starting from a special genesis block): A->B->C->D
Every block other than A depends on the previous one - usually this is based on hashes and some other rules, but a blockchain creator may pick any set of rules.
Second, we need to agree on what consensus is. In the blockchain world, consensus is a process for agreeing on the chain of blocks. There may be several chains, but the system will reach consensus on which one is the right one.
Here is why this is important: let's say we have the chain from above, with the last block being D. In a distributed environment it is entirely possible that more than one VALID block will be generated at the same time, and since the system is distributed, several new chains may emerge:
A->B->C->D->E1
A->B->C->D->E2
Consensus in blockchains allows the network as a whole to agree on which of these chains is the valid one. Two important properties of consensus:
it is eventual: it takes a while before every node gets on the same page. Put differently, if you ask different nodes at the same time which chain is the correct one, you may get different answers, but eventually the network will agree on the same one
there are many different ways to pick the right chain; e.g. in Bitcoin, the system eventually agrees on the longest valid chain.
Now we can address the original question: "blockchain uses the consensus to verify transactions". A (distributed) blockchain uses consensus to agree on a chain, and it uses a set of rules to validate that every block is correct. Since every node runs software with the same set of rules, nodes will make the same decision about every block's correctness. But in the distributed case more than one chain may emerge... as described above.
To verify transactions, a blockchain uses a set of rules applied locally to either accept or reject new blocks.
The documentation you mentioned does not sound right to me. Here is an example of a doc: https://www.investopedia.com/terms/c/consensus-mechanism-cryptocurrency.asp - I really like their first takeaway point: "A consensus mechanism refers to any number of methodologies used to achieve agreement, trust, and security across a decentralized computer network." And, as I mentioned above, it is always worth asking what the consensus is about - in distributed blockchains, the consensus is about the chain itself.

Query on various addressing modes?

Can an array be implemented using only the indirect addressing mode? I think we can only access the first element, but what about the other elements? For those, I think, we'd have to use immediate addressing mode.
An add instruction can generate an address in a register.
A CPU with only [register] addressing modes would work, but it would need more instructions than one with an immediate displacement as part of its load/store instructions.
Instruction set design isn't about what's necessary for computation to be possible, but rather about how to make it efficient.
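To make that concrete, the effective address a base-plus-index load/store mode computes in hardware is just an addition (plus a shift for scaling), so a machine with only register-indirect addressing can reproduce it with explicit add/shift instructions into a register before each access. Here is a small sketch of that address computation; the module and signal names are made up for illustration:

// Hypothetical address-generation logic: base + (index << scale) + displacement.
// A CPU with only [register] addressing would compute this same sum with
// ordinary add/shift instructions into a register, then issue the load or store.
module agu (
  input  logic [31:0] base,          // e.g. start address of the array
  input  logic [31:0] index,         // element index held in a register
  input  logic [1:0]  scale,         // log2 of the element size (1, 2, 4, 8 bytes)
  input  logic [31:0] displacement,  // constant offset encoded in the instruction
  output logic [31:0] eff_addr
);
  assign eff_addr = base + (index << scale) + displacement;
endmodule

Either way the adder is the same; the ISA question is only whether that addition is folded into the memory instruction or spelled out as separate instructions.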
related:
Why does the lw instruction's second argument take in both an offset and regSource?
What is the minimum instruction set required for any Assembly language to be considered useful? (note the difference between useful and Turing-complete.)

Bus Functional Models (SystemVerilog)

I'm looking to design a bus functional model for SPI and UART buses. First, I want to understand whether my perception of a bus functional model is correct.
It is supposed to simulate bus transactions without worrying about the underlying implementation specifics.
For instance,
If I were to build a BFM for an SPI bus, the design should be able to simulate transactions on the SPI bus, acting as a master, according to some protocol, for example by reading instructions from a file and then driving them in the simulator accordingly.
For example, a generic data-transfer instruction like send_write(0x0c, 0x0f), which sends the data byte 0x0c to the slave at address 0x0f, should drive the chip-select line low and send the data bits on the appropriate clock edges based on the SPI mode. Is my understanding of a BFM correct in this case?
Now, what I don't understand is: how is it helpful?
Where between the DUT and the testbench does a BFM sit, and how does it help a system designer?
I also want to know if there are any reference BFMs that have been built and tested and are available to study.
I'd appreciate it if someone could help me with an example, preferably in SystemVerilog.
I had to research this a lot, so I thought I would answer. Here's an idea of what a BFM is.
Think of a Bus Functional Model (BFM) as something that simulates the transactions of a bus, like READ and WRITE, relieving the testbench of the overhead of handling the bus timing itself. There are many other interpretations of a BFM, but BFMs typically reduce the job of the testbench by making it more data-focused.
Okay, that was a high-level answer; let's dig a little deeper.
Think of the BFM as a block that sits within the testbench as a whole. When the testbench needs to perform a task, for instance a write to a particular address, it asks the BFM to write at that address; the BFM, which is a black box to the testbench, performs the transaction while taking care of the timing. It can be driven by a file loaded by the testbench, or it can be a set of tasks that the testbench calls to do the transactions.
The Design Under Test's (DUT's) response to the BFM's transactions is what is of interest to the tester of the design. One may argue that the BFM has to change based on the DUT, and that is precisely what distinguishes a better BFM.
If the BFM can be loaded with a configuration vector that initializes it to behave according to the DUT's specifications, then it becomes portable enough to help test other designs.
Further, the BFM may be defined as an abstract class of pure virtual tasks (in SV) that can then have concrete implementations based on the DUT.
virtual class apb_bfm;
  pure virtual task writeData(int unsigned addr, int unsigned data);
  pure virtual task readData (int unsigned addr, output int unsigned data);
  pure virtual task initializeSignals();
endclass
The above BFM abstraction is for an APB master that performs the tasks mentioned; the low-level details of each of these should be encapsulated by interfaces and clocking blocks in order to keep the clocking sane and the interface types abstract. A sketch of one possible concrete implementation follows. I have also referenced a book below that describes beautifully how to architect testbenches and design Transaction Level Models (TLMs); reading it will give you a good understanding of how to design one.
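As an illustration only (this sketch assumes a hypothetical apb_if interface with a clocking block cb and the usual psel/penable/pwrite/paddr/pwdata/prdata signals; it is not taken from the referenced paper):

// Hypothetical concrete BFM: drives simple APB-style writes and reads through a
// virtual interface. Interface, clocking block and signal names are assumptions.
class apb_master_bfm extends apb_bfm;
  virtual apb_if vif;

  function new(virtual apb_if vif);
    this.vif = vif;
  endfunction

  task initializeSignals();
    vif.cb.psel    <= 1'b0;
    vif.cb.penable <= 1'b0;
  endtask

  task writeData(int unsigned addr, int unsigned data);
    @(vif.cb);                  // setup phase
    vif.cb.psel    <= 1'b1;
    vif.cb.pwrite  <= 1'b1;
    vif.cb.paddr   <= addr;
    vif.cb.pwdata  <= data;
    @(vif.cb);                  // access phase
    vif.cb.penable <= 1'b1;
    @(vif.cb);                  // transfer done
    vif.cb.psel    <= 1'b0;
    vif.cb.penable <= 1'b0;
  endtask

  task readData(int unsigned addr, output int unsigned data);
    @(vif.cb);
    vif.cb.psel    <= 1'b1;
    vif.cb.pwrite  <= 1'b0;
    vif.cb.paddr   <= addr;
    @(vif.cb);
    vif.cb.penable <= 1'b1;
    @(vif.cb);
    data = vif.cb.prdata;       // sample the returned data
    vif.cb.psel    <= 1'b0;
    vif.cb.penable <= 1'b0;
  endtask
endclass

The testbench only ever calls writeData/readData; all of the pin wiggling and timing lives behind those tasks, which is exactly the separation a BFM is meant to provide.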
Also, this paper on abstract BFMs gives a good idea of how BFMs should be modeled for a good design; the APB example above is derived from it.
The following image, showing how a BFM could be placed in the test framework, is what I could gather from Bergeron's book.
Hopefully that gives you a basic understanding of what BFMs are. Of course, writing one and getting it to work is difficult, but this background should give you a high-level picture of it.
Book Reference:
Bergeron, J. (n.d.). Writing Testbenches Using SystemVerilog. Springer.
A BFM is a VIP (verification IP) with dual roles: it can act as a driver or as a monitor/receiver. In the driver role it packs transactions and drives them at the signal level using the interface handle; without this, the DUT cannot accept the transactions (it only has a signal-level interface). In the receiver role it unpacks the signal bits coming through the interface handle into transactions and sends them to a scoreboard/checker. It can also act as a protocol checker in some cases.
You can get a good example of BFM usage here
http://www.testbench.in/SL_00_INDEX.html

How do operating systems allow userspace programs to interact with kernelspace programs?

This isn't quite a question about a specific OS, but let's take Windows as an example. A userspace program uses the Windows API to communicate with kernelspace. However, I don't understand how that's possible. The API, according to MS websites, lives in userspace; in order to access kernelspace, it would have to be in kernelspace, if I understand correctly. So what is the mechanism by which the Windows API gets extra privileges to talk to kernelspace? In which space does that mechanism operate? Is this sort of thing universal to all modern PC OSes?
As you're already aware, there are a bunch of facilities exposed to userspace programs by the Windows kernel (if you're curious, there's a list of system calls). These system calls are all identified by a unique number, which isn't part of the publicly documented interface given by Microsoft. Instead, when you call a publicly exposed function from your program, you go through a DLL, installed when you install (or update) Windows, whose entry point is just a normal, unprivileged user-mode function call. This DLL knows the mappings between the public interfaces and the system calls available in the currently running kernel. These mappings are not always 1:1, which allows for tweaks and enhancements without breaking existing code that uses the stable interfaces.
When some userland code calls one of these functions, the DLL's role is to prepare the arguments for the system call and then initiate the jump into kernel mode. How exactly that jump occurs is specific to the architecture Windows is currently running on. In fact it varies not just between x86 and Arm but even between AMD and Intel x86 systems. I'll talk just about the modern Intel 32-bit x86 case (using the SYSENTER instruction) here for simplicity. On x86 most of the other variations are relatively minor; for instance, int 2Eh was used prior to SYSENTER support.
Early in boot-up the operating system does a bunch of work to prepare for enabling a userland and system calls from it. Understanding this is critical to understanding how system calls really work.
First, let's rewind a little and consider what exactly we mean by userland and kernel mode. On x86, when we talk about privileged vs. unprivileged code, we talk about "rings". There are actually 4 (ignoring hypervisors), but for various reasons nobody really uses anything but ring0 (kernel) and ring3 (userland). When we run code on x86, the address being executed (EIP) and the data being read/written come from segments.
Segments are mostly just a historical accident left over from the days before virtual addressing on x86 was a thing. They are, however, important for us here because there are special registers that define which segments are currently being used when we execute instructions or otherwise reference memory. Segments on x86 are all defined in a big table, called the Global Descriptor Table or GDT. (There's also a Local Descriptor Table, LDT, but that's not going to further the current discussion.) The important point for us is that the (arcane) layout of the table entries includes 2 bits, called the DPL, which define the privilege level of the currently active segment. You'll notice that 2 bits is exactly enough to encode 4 levels of privilege.
So, in short, when we talk about "executing in kernel mode" we really just mean that our active code segment (CS) and data segment selectors point to entries in the GDT which have DPL set to 0. Likewise, for userland we have CS and data segment selectors pointing to GDT entries with DPL set to 3 and no access to kernel addresses. (There are other selectors too, but to keep it simple we'll just consider "code" and "data" for now.)
Back to early kernel boot-up: during startup the kernel creates the GDT entries we need. (These have to be laid out in a specific order for SYSENTER to work, but that's mostly just an implementation detail.) There are also some "machine-specific registers" (MSRs) that control how our processor behaves; these can only be set by privileged code. Three of them are important here:
IA32_SYSENTER_ESP
IA32_SYSENTER_EIP
IA32_SYSENTER_CS
Recall that we've got some code running in userland (ring3) that wants to transition to ring0. Let's assume that it has saved any registers it needs to per the calling convention and put the arguments into the registers the call expects. We then hit the SYSENTER instruction. (Actually it goes through KiFastSystemCall, I think.) The SYSENTER instruction is special: it modifies the current code and data segment selectors based on the value that the kernel set up in the machine-specific register IA32_SYSENTER_CS. (The stack/data segment values are computed as an offset from IA32_SYSENTER_CS.) Subsequently the stack pointer (ESP) is set to the kernel stack that was set up earlier for handling system calls and saved into the MSR IA32_SYSENTER_ESP, and likewise the instruction pointer (EIP) is loaded from IA32_SYSENTER_EIP.
Since the CS selector now points to a GDT entry with DPL set to 0, EIP points to kernel-mode code, and we're on a kernel stack, we're running in the kernel at this point.
From here onwards the kernel-mode code can read and write memory from both kernel space and userspace (with some appropriate caution) to undertake the actual work needed to perform the system call. The arguments to the system call can be read from registers etc. according to the calling convention, and any arguments that are actually pointers back to userland, or handles to kernel objects, can be followed to read larger blocks of data too.
When the system call is over, the process is basically reversed and we end up back in userland with DPL 3 selectors.
It's the CPU that acts as the intermediary for transferring information between user memory space (accessible in user mode) and protected memory space (accessible in kernel mode), via CPU registers.
Here's an Example:
Suppose a user writes a program in a higher-level language. When the program executes, the CPU generates virtual addresses.
Before any read/write operation occurs, the virtual address is converted to a physical address. Because the translation mechanism (the memory management unit) is only accessible in kernel mode, since its structures are stored in protected memory, the translation happens in kernel mode; the physical address ends up in a CPU register, and only then does the read/write operation occur.

How to update regmodel with writes going from RTL blocks

I understand that regmodel values are updated as soon as a transaction is initiated from the test environment on any of the connected interfaces.
However, consider these scenarios:
RTL registers being updated from a ROM on boot-up (with values different from the defaults)
The processor in the RTL writing to a register, as opposed to the test environment doing so
In these 2 cases the regmodel doesn't get updated/mirrored with the correct RTL value. I would like to know the correct procedure for getting the regmodel updated; if there is none at the moment, what other approach can be taken to keep the two in sync?
For the first case you have to pre-load your memory model with your ROM contents at the start of the simulation. There isn't any infrastructure to do this in uvm_mem (unfortunately), so you'll have to implement it yourself.
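One hedged sketch of the "implement it yourself" part (the file format, the address-keyed lookup, and the assumption that the ROM-programmed values can be matched to registers by address are all specific to your environment) is to read the boot image and push the values into the mirror with predict():

// Hypothetical preload: after boot-up, force the mirror of each register to the
// value the ROM programmed into the RTL. The file format (address/value pairs
// per line) and the address-based lookup are assumptions about the environment.
task preload_regmodel_from_rom(uvm_reg_block blk, string fname);
  int unsigned     boot_val[longint unsigned];  // value keyed by register address
  int              fd;
  longint unsigned addr;
  int unsigned     val;
  uvm_reg          regs[$];

  fd = $fopen(fname, "r");
  while ($fscanf(fd, "%h %h", addr, val) == 2)
    boot_val[addr] = val;
  $fclose(fd);

  blk.get_registers(regs);
  foreach (regs[i])
    if (boot_val.exists(regs[i].get_address()))
      void'(regs[i].predict(boot_val[regs[i].get_address()]));  // mirror update only
endtask

Call it once the boot sequence has finished, so the mirror reflects what the ROM actually wrote.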
For the second case, you have to use a more grey-box approach to verification. You have to monitor the bus accesses that the processor makes to the peripherals and update the register values based on those transactions. This should be OK from a maintainability point of view, because the architecture of your SoC should be pretty stable in that sense (you've already decided to use a processor, so that will always be there, even if you don't yet know which peripherals will make it to the end).
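In UVM, a common way to implement that monitoring is to put the map into explicit-prediction mode and feed a uvm_reg_predictor from the monitor of the agent that watches the processor's bus. The agent, adapter, sequence-item, and register-block types below are placeholders for whatever your environment already has:

// Sketch: update the register model from observed bus traffic, regardless of
// whether the access came from the test environment or from the embedded CPU.
// cpu_bus_item / cpu_bus_agent / cpu_bus_adapter / my_reg_block are placeholders
// assumed to exist (and to be created in build_phase along with regmodel).
// Assumes `include "uvm_macros.svh" and import uvm_pkg::*; in the enclosing package.
class soc_env extends uvm_env;
  `uvm_component_utils(soc_env)

  my_reg_block                      regmodel;
  cpu_bus_agent                     cpu_agent;
  cpu_bus_adapter                   adapter;
  uvm_reg_predictor #(cpu_bus_item) predictor;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    predictor = uvm_reg_predictor#(cpu_bus_item)::type_id::create("predictor", this);
  endfunction

  function void connect_phase(uvm_phase phase);
    super.connect_phase(phase);
    predictor.map     = regmodel.default_map;
    predictor.adapter = adapter;
    cpu_agent.monitor.ap.connect(predictor.bus_in);  // monitor feeds the predictor
    regmodel.default_map.set_auto_predict(0);        // rely on the predictor only
  endfunction
endclass

With auto-prediction off, the mirror is updated only from what the monitor actually sees on the bus, so processor-initiated writes and test-initiated writes are handled the same way.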