Load immediate direct - cpu-architecture

Load immediate direct - cpu-architecture

Can teach how to do this question?
explain how to use load immediate, direct, indirect?
Thank you.

Check Wikipedia's entry on addressing modes.
Basically, a load immediate (or load literal) will write the number that is contained inside the instruction into the accumulator. E.g. load immediate 5 will place a 5 in the accumulator. accumulator:=5
A load direct will read memory at the address that is contained inside the instruction and place the result in the accumulator. E.g. load direct 5 will read the memory at address 5 and write the result to the accumulator. If contents of memory at address x are designated as memory(address:x) then accumulator:=memory(address:5)
A load indirect will read memory twice. It reads memory at the address that is contained inside the instruction, and then it reads memory another time at the address that was indicated by the first memory location and writes the result in the accumulator. E.g. load indirect 5 will read the memory at the address 5. Let's say that memory location 5 contains a 10. Then, in the second step, the processor will read memory location 10 and place the result in the accumulator. accumulator:=memory(address:memory(address:5))

Related

In MESI cache coherence protocol, when exactly does the state of a cache line change if the data needs to be fetched from memory?

In MESI protocol when a CPU:
Performs a read operation
Finds out the cache line is in Invalid state
There is no other non-invalid copies in other caches
It will need to fetch the data from the memory. This will take a certain number of cycles to do this. So does the state of the cache line change from (I) to (E) instantly or only after data is fetched from memory?

I think a cache would normally wait for the data to arrive; when it's not there yet you can't actually get a hit in cache for other requests to the same line, only to other lines that actually are present (hit under miss). Therefore the state for that line is still Invalid; the data for that tag isn't valid, so you can't set it to a valid state yet.
You'd want another miss to same line (miss under miss) to notice there was already an outstanding request for that line and attach itself to that line-request buffer. (e.g. Intel x86 LFB = line fill buffer). Since finding Invalid triggers looking at fill buffers but Exclusive doesn't, you want Invalid based on this reasoning as well.
e.g. the Skylake perf-counter event mem_load_retired.fb_hit counts, from perf list output:
[Retired load instructions which data sources were load missed L1 but
hit FB due to preceding miss to the same cache line with data not
ready.
Supports address when precise (Precise event)]
In a cache in a very old / simple or toy CPU with no memory-level parallelism (whole pipeline or just memory access totally stalls execution until the data arrives), the distinction is meaningless; nothing else happens to cache while the requested data is in-flight.
In such a CPU it's just an implementation detail. (Except it should still process MESI requests from other cores while a load is in flight so again tags need to reflect the correct state, otherwise it's extra stuff to check when deciding how to reply.)

After data is fetched from memory.
In practice, MESI (or any other protocol) has many transition states in addition to the main states of M/E/S/I. In your example, the coherence protocol would transition to a "Wait for Data Fill" state and will transition to E only after data is fetched and valid bit is set.
Reference: Cache coherence protocols in gem5/ruby-- http://learning.gem5.org/book/part3/MSI/cache-transitions.html (search for "was invalid, going to shared") may be useful.

How do MemReq and MemResp exactly work in RoccIO - RISCV

I'm trying to figure out how can I read from and write to memory in RISCV when I'm using RoCCIO. But I couldn't clearly get what is happening. Especially how can I address the memory or how should I work with memory tag.
Are there any resources that I can find how I can transfer data between Rocket core and my Accelerator?
In the uncore/src/main/scala/consts.scala path they have mentioned different type of memory cmd. But what else?
For example I want to pass starting address of an array and number of elements that I plan to fetch into the accelerator and then start fetching them. What signalling should I use?
Thanks

Within the RoCC interface, the mem field is a connection to the L1 cache. The dmem field is a connection to the L2 cache. Which one you want to use depends upon the memory bandwidth requirements of your accelerator.
Rocket and the RoCC accelerator can either share data through the caches (remember to use a fence instruction on the Rocket core so the memory ordering is correct) or you can directly give data to Rocket through the resp field in the RoCCIO.
The L1 cache's IO can be found in Rocket's (https://github.com/ucb-bar/rocket/blob/master/src/main/scala/nbdcache.scala) whereas the L2 IO can be found in the uncore's (https://github.com/ucb-bar/uncore/blob/master/src/main/scala/tilelink.scala).
Although I don't know which memory tag you are referring to, typically the tag is passed through the memory system and returned to you with the response untouched (if you have multiple requests inflight, this returning tag helps you identify which is which).
I suspect if you want to fetch an array of data, you will need a state machine to request each individual address in your accelerator. Unless you go through the L2 cache interface, in which case I believe it comes in cache-line sizes.

Where does the OS get the needed disk address when page fault happens from?

When a page table entry(PTE) is not marked as valid, it means the data needed is not in memory, but on the disk. So now page fault happens and the OS is responsible to load this page of data from the disk to memory.
My question is, how does the OS know the exact disk address?

You are asking in a system dependent manner. A PTE not marked as valid may mean the address does not exist at all in the process address. A system may have another bit to indicate that the address is valid but logical to physical mapping does not exist.
The operating system needs to maintain a table of where it put the data.
The data can exist in a number of places.
1. It might be uninitialized data that has no mapping anywhere. Respond to the page fault by clearing a physical page and mapping it to the process address space.
It might be in the page file.
Some systems have a separate swap file.
It might be in the executable or shared library file.

The answer given in 2014 is correct. All the processor knows is that the page is missing - or sometimes that it had incorrect permission (e.g., write to a read-only page). At that point the processor generates a "page fault" exception which the kernel gets and now has to handle.
In some cases, this page fault will need to be passed all the way to the application, in Linux as a SIGSEGV ("segmentation violation") signal, e.g., when the user uses a null pointer. But, as you said, more usually, the kernel should and can handle the page fault. The kernel keeps, in its own tables (not inside the page table which is a structure with a specific format dictated by the processor) information about what each virtual-memory page is supposed contain. The following are some of the things the kernel may realize about the faulting page by consulting its own tables. This is not an exhaustive list.
This might be a page mmap()ed from disk. This case includes an application's explicit use of mmap(), but also happens when you run an executable, or use shared libraries - those are also mapped from disk - so the page fault can also happen when the processor executes instructions, not just when reading and writing. The kernel keeps a list of these mappings, so when it gets the page fault it can figure out where on disk it needs to read to get the missing page. So it reads from disk, and when getting the data it puts it in a new page in memory, and sets the page table entry (PTE) to point to this new page with the data, and resumes the application thread - where the faulting instruction is retried and now succeeds.
This may have been a page swapped out to disk. Again, the kernel keeps a table of which pages were swapped out, and where in the swap partition (or swap file, or whatever) this page now lives.
This might have been a write attempt to a "copy on write" page. The kernel needs to make a copy of the source page, and change the address in the page table to point to the new copy, and then allow the write. For example when you allocate a large area of memory, it can point to an existing "zero-filled" page, and only ever allocated when you first write to pages. Another example after fork() the new process's pages are all copy-on-write pages pointing to the original process's pages, and will only be actually copied when first written (by either process).
However, since you are looking for credible sources, maybe you want to read an explanation how the Linux kernel, specifically, does this, for example in:
https://vistech.net/~champ/online-docs/books/linuxkernel2/060.htm.

It is the same as virtual memory addressing.
The addresses that appear in programs are the virtual addresses or program addresses. For every memory access, either to fetch an instruction or data, the CPU must translate the virtual address to a real physical address. A virtual memory address can be considered to be composed of two parts: a page number and an offset into the page. The page number determines which page contains the information and the offset specifies which byte within the page. The size of the offset field is the log base 2 of the size of a page.
If the virtual address is valid, the system checks to see if a page frame is free. If no frames are free, the page replacement algorithm is run to remove a page.

What exactly happens inside the OS to cause the segmentation fault

I have read a lot about the virtual address and paging.
Let me first tell you people what i understood. When a process wants to execute something it tries to load the data from the hard disk to memory. To do this it uses a virtual address. So our MMU validates the virtual address looks into the TLB to find the corresponding physical page, if it doesn't find there it looks into Inverted Page Table and at the end it looks into Page table if it doesn't find an entry over there it generates a page fault and all the swapping of page is done and all the tables will be updated.
And as I read all the processes have different page tables and same virtual address. so if I try access an array element a[1000] which was defined as int a[100] I am sure that am gonna get a segmentation fault cause that instruction might be trying to access a memory that doesn't belong to it. but how OS comes to know that a[1000] doesn't belong to the running process by just using the concept of virtual address and physical pages. Am I missing something here or my entire understanding is wrong?
I know we can say a memory access is illegal if a process is trying to access a read only or sup true memory segment.
at the end the boiling question is how OS decides which memory is allocated to which process and how it decides that this access of memory is illegal.
What is a segmentation fault on Linux?
this link didn't help much .
thanks a lot in advance for all you lovely people's inputs :)

Actually each and every process in the Linux kernel has some well defined structure associated with it called task_struct which stores all the information about the corresponding process like its parent, its PID, its child, its address space, pending signals, threads associated with it etc. Now the address space entries tell the kernel while executing the process the legal address space for that process.Every process has its own address space allotted to it by the kernel right from the very beginning when it is created. So when a executing process tries to access the space outside its legal memory a fault is generated(called segmentation fault in Unix/Linux) and the process is terminated by giving a signal to it by the kernel. Its important for memory protection to be achieved in OS.

On x86, linux uses a combination of segmentation and paging, so the address generated by program first looks up for the corresponding segments base and limit registers values. This gives the virtual address which is then translated using the page table. When you try to access a memory which has not been allocated, the accessed page is beyond what the limit register allows, hence generating a segmentation fault.

Does the MMU mediate everything between the operating system and physical memory or is it just an address translator?

I'm trying to understand how does an operating system work when we want to assign some value to a particular virtual memory address.
My first question concerns whether the MMU handles everything between the CPU and the RAM. Is this true? From what one can read from Wikipedia, I'd say so:
A memory management unit (MMU), sometimes called paged memory
management unit (PMMU), is a computer
hardware component responsible for
handling accesses to memory requested
by the CPU.
If that is the case, how can one tell the MMU I want to get 8 bytes, 64 or 128bytes, for example? What about writing?
If that is not the case, I'm guessing the MMU just translates virtual addresses to physical ones?
What happens when the MMU detects there will be what we call a page-fault? I guess it has to tell it to the CPU so the CPU loads the page itself off disk, or is the MMU able to do this?
Thanks

Devoured Elysium,
I'll attempt to answer your questions one by one but note, it might be a good idea to get your hands on a textbook for an OS course or an introductory computer architecture course.
The MMU consists of some hardware logic and state whose purpose is, indeed, to produce a physical address and provide/receive data to and from the memory controller. Actually, the job of memory translation is one that is taken care of by cooperating hardware and software (OS) mechanisms (at least in modern PCs). Once the physical address is obtained, the CPU has essentially done its job and now sends the address out on a bus which is at some point connected to the actual memory chips. In many systems this bus is called the Front-Side Bus (FSB), which is in turn connected to a memory controller. This controller takes the physical address supplied by the CPU and uses it to interact with the DRAM chips, and ultimately extract the bits in the correct rows and columns of the memory array. The data is then sent back to the CPU, which can now operate on it. Note that I'm not including caching in this description.
So no, the MMU does not interact directly with RAM, which I assume you are using to mean the physical DRAM chips. And you cannot tell the MMU that you want 8 bytes, or 24 bytes, or whatever, you can only supply it with an address. How many bytes that gets you depends on the machine you're on and whether it's byte-addressable or word-addressable.
Your last question urges me to remind you: the MMU is actually a part of the CPU--it sits on the same silicon die (although this was not always the case).
Now, let's take your example with the page fault. Suppose our user-level application wants to, like you said, set someAddress = 10, I'll take it in steps. Let's assume someAddress is 0xDEADBEEF and let's ignore caches for now.
1) The application issues a store instruction to 0xsomeAddress, which, in x86 might look something like
mov %eax, 0xDEADBEEF
where 10 is the value in the eax register.
2) 0xDEADBEEF in this case is a virtual address, which must be translated. Most of the time, the virtual to physical address translation will be available in a hardware structure called the Translation Lookaside Buffer (TLB), which will provide this translation to us very fast. Typically, it can do so in one clock cycle. If the translation is in the TLB, called a TLB hit, execution can continue immediately (i.e. the physical address corresponding to 0xDEADBEEF and the value 10 are sent out to the memory controller to be written).
3) Let's suppose though, that the translation wasn't available in the TLB (called a TLB miss). Then we must find the translation in the page tables, which are structures in memory whose structure is defined by the hardware and managed by the OS. They simply contain entries that map a virtual address to a physical one (more accurately, a virtual page number to a physical page number). But these structures also reside in memory, and so must have addresses! The hardware contains a special register called cr3 which contains the physical address of the current page table. We can index into this page table using our virtual address, so the hardware takes the value in cr3, computes an address by adding an offset, and goes off to memory to fetch the page table entry (PTE). This PTE will (hopefully) contain the physical address corresponding to 0xDEADBEEF, in which case we put this mapping in the TLB (so we don't have to walk the page table again) and continue on our way.
4) But oh no! What if there is no PTE in the page tables for 0xDEADBEEF? This is a page fault, and this is where the Operating System comes into play. The PTE we got out of the page table existed, as in it was (let's assume) a valid memory address to access, but the OS had not created a VA->PA mapping for it yet, so it would have had a bit set to indicate that it is invalid. The hardware is programmed in such a way that when it sees this invalid bit upon an access, it generates an exception, in this case a page fault.
5) The exception causes the hardware to invoke the OS by jumping to a well known location--a piece of code called a handler. There can be many exception handlers, and a page fault handler is one of them. The page fault handler will know the address that caused the fault because it's stored in a register somewhere, and so will create a new mapping for our virtual address 0xDEADBEEF. It will do so by allocating a free page of physical memory and then saying "all virtual addresses between VA x and VA y will map to some address within this newly allocated page of physical memory". 0xDEADBEEF will be somewhere in that range, so the mapping is now securely in the page tables, and we can restart the instruction that caused the page fault (the mov).
6) Now, when we go through the page tables again, we will find a mapping and the PTE we pull out will have a nice physical address, the one we want to store to. We provide this with the value 10 to the memory controller and we're done!
Caches will change this game quite a bit, but I hope this serves to illustrate how paging works. Again, it would benefit you greatly to check out some OS/Computer Architecture books. I hope this was clear.

There are data structures that describe which virtual addresses correspond to which physical addresses. The OS creates and manages these data structures, and the CPU uses them to translate virtual addresses into physical addresses.
For example, the OS might use these data structures to say "virtual addresses in the range from 0x00000000 to 0x00000FFF correspond to physical addresses 0x12340000 to 0x12340FFFF"; and if software tries to read 4 bytes from the virtual address 0x00000468 then the CPU will actually read 4 bytes from the physical address 0x12340468.
Typically everything is effected by the virtual->physical translation (except for when the CPU is accessing the data structures that describe the translation). Also, usually there's some sort of translation cache build into the CPU to help reduce the overhead involved.