What is the TCM's connection with the Icache in this RISC-V version? - latency

In the middle of this page (https://github.com/ultraembedded/riscv) there is a block diagram of the core. I really do not know what the TCM is doing in the same block as the Icache. Is it an optional component inside the CPU?

Some embedded systems provide dedicated memory for code and/or for data.  On some of these systems, Tightly-Coupled Memory (TCM) serves as a replacement for the (instruction) cache, while on others this memory sits alongside a cache and applies to a certain portion of the address space.  This dedicated memory may be on the same chip as the processor.
This memory could be some kind of ROM or other memory that is initialized somehow prior to boot.  In any case, TCM typically isn't backed by main memory, so it doesn't suffer cache misses or need the associated circuitry.  It usually also has high performance, comparable to a cache when a hit occurs.
Some systems refer to this as Instruction Tightly Integrated Memory, ITIM, or Data Tightly Integrated Memory, DTIM.
When a system uses ITIM or DTIM, it performs more like a Harvard architecture than the Modified Harvard architecture of laptops and desktops.

The cache has no address space of its own. The CPU does not ask the cache for data; it just asks for the data at an address, and the memory system first checks whether that data is present in the cache. If it is, the data is fetched from the cache; if not, the controller goes to RAM. All the processor does is ask for data; it does not care where the data came from. In the case of TCM, the CPU can directly write data to the TCM and read data from it, since the TCM occupies a specific address range. Think of TCM as RAM that is close to the CPU.
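The difference between an addressable TCM and a transparent cache can be sketched with a toy model. This is only an illustration: the `MemorySystem` class, the TCM base address and size, and the one-entry-per-address cache are all assumptions made up for the example, not details of the ultraembedded core.

```python
# Hypothetical address map: the TCM occupies a fixed range; every other
# address is main memory fronted by a cache the program never names.
TCM_BASE = 0x0000_0000   # assumed value, for illustration only
TCM_SIZE = 0x1000        # assumed value, for illustration only


class MemorySystem:
    def __init__(self):
        self.tcm = bytearray(TCM_SIZE)  # addressable, never misses
        self.ram = {}                   # sparse model of main memory
        self.cache = {}                 # transparent: address -> value
        self.log = []                   # where each read was served from

    def _in_tcm(self, addr):
        return TCM_BASE <= addr < TCM_BASE + TCM_SIZE

    def write(self, addr, value):
        if self._in_tcm(addr):
            self.tcm[addr - TCM_BASE] = value
        else:
            self.ram[addr] = value
            self.cache.pop(addr, None)  # drop any stale cached copy

    def read(self, addr):
        if self._in_tcm(addr):
            self.log.append("tcm")
            return self.tcm[addr - TCM_BASE]
        if addr in self.cache:
            self.log.append("cache hit")
            return self.cache[addr]
        self.log.append("cache miss")
        value = self.ram.get(addr, 0)
        self.cache[addr] = value        # fill the cache on a miss
        return value


mem = MemorySystem()
mem.write(0x0010, 0xAB)        # lands in the TCM purely because of its address
mem.write(0x8000_0000, 0xCD)   # lands in main memory
mem.read(0x0010)
mem.read(0x8000_0000)
mem.read(0x8000_0000)
print(mem.log)  # ['tcm', 'cache miss', 'cache hit']
```

The program selects the TCM only by choosing an address inside its range; it has no way to name the cache at all.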

Related

Why place the contents of Mem[MA] in MB, then copy from MB to IR, rather than going straight from Mem[MA] to IR?

During the fetch stage of the fetch-execute cycle, why are the contents of the cell whose address is in the MA (memory address register) placed in the MB (memory buffer) and then copied to the IR (instruction register), rather than being placed directly in the IR?
In theory it would be possible to send instruction fetch memory data directly to the IR (or to both the MB and the IR) — this would require extra hardware: wires and muxes.
You may notice that the architecture (depending on which one it is) makes use of few (one or two) busses, and this would effectively add another bus.  So, I think that all we can say is that simplicity is the reason.  Back in the day when processors were this simple, transistor counts were very limited for integrated circuits.
Going in the direction of making things more efficient, nowadays even simple processors separate instruction (usually cache) memory from data (usually cache) memory.  This independence accomplishes a number of improvements.  Take MIPS, even the unpipelined single-cycle processor, as an example:
First, the PC (program counter) register replaces the MA for the instruction fetch side of things and the IR replaces the MB (as if loading directly into that register as you're suggesting), but let's also note that the IR can be reduced from being a true register to being wires whose output is stable for the cycle and thus can be worked on by a decode unit directly.  (Stability is gained by not sharing the instruction memory hardware with the data memory hardware; whereas with only a single memory interface, data has to be copied around and stored somewhere so the interface can be shared for both code & data.)
That saves not only the cycle you're referring to (the transfer of data from MB to IR) but also the cycle before it, which captures the data in the MB register in the first place.  (Generally speaking, enregistering some data requires a cycle, so if you can feed wires without enregistering, that's better, all other factors being the same.)
(Also, depending on the architecture you're looking at, the PC on MIPS uses a dedicated increment unit (adder) rather than attempting to share the main ALU and/or the busses for that increment; that could also save a cycle or two.)
Second, the data memory can meanwhile run concurrently with the instruction memory (a nice win), executing a data load from memory or store to memory in parallel with the fetch of the next instruction.  The data side also forgoes the MB register as a temporary parking place, and instead can load memory data directly into a processor register (the one specified by the load instruction).
Having two dedicated memories creates an independence that reduces the need for register capture while also allowing for parallelism, of course requiring more hardware for the design.
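The two fetch styles described above can be written out as register-transfer steps. This is a rough sketch: the function names and the `Mem`/`IMem` labels are made up for illustration, and the step counts only count register transfers, they are not cycle-accurate figures for any particular processor.

```python
def fetch_shared_interface(mem, pc):
    """Single memory interface: MA and MB sit between memory and the IR."""
    steps = []
    ma = pc
    steps.append("MA <- PC")
    mb = mem[ma]
    steps.append("MB <- Mem[MA]")
    ir = mb
    steps.append("IR <- MB")
    return ir, steps


def fetch_dedicated_imem(imem, pc):
    """Dedicated instruction memory: the PC addresses it directly and its
    output feeds the decoder without an intermediate buffer register."""
    steps = []
    ir = imem[pc]
    steps.append("IR <- IMem[PC]")
    return ir, steps


memory = {0: "add r1, r2, r3", 4: "lw r4, 0(r1)"}   # toy instruction store

ir_a, steps_a = fetch_shared_interface(memory, 0)
ir_b, steps_b = fetch_dedicated_imem(memory, 0)

print(steps_a)  # ['MA <- PC', 'MB <- Mem[MA]', 'IR <- MB']
print(steps_b)  # ['IR <- IMem[PC]']
```

Both paths deliver the same instruction; the dedicated path simply has fewer places where the value must be captured in a register along the way.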

Does a user process have any control over paging?

A program might have some data that, when needed, it wants to access very fast. Let's call this VIP data. It would like to reduce the likelihood that the page of memory on which the VIP data resides gets swapped to disk when memory utilization is high on the system. What types of control/influence does it have over this?
For example, I think it can consider the page replacement policy and try to influence the OS to not swap this VIP data to disk. If the policy is LRU, the program can periodically read the VIP data to ensure that the page has always been accessed fairly recently. A program can also use a very small amount of memory in total, making it likely that all its pages are recently accessed when it runs and therefore the VIP data is not likely swapped to disk.
Can it exert any more explicit control over paging?
In order to do this, you might consider
prioritising the process using the renice command, or
locking the process's pages in main memory using mlock(2)
This is entirely operating system dependent. On some systems, if you have appropriate privileges you can lock pages in physical memory.
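The question's idea of touching the VIP page periodically can be checked against a toy LRU model. This is only a sketch: `run_lru`, the page names, and the frame count are invented for the example, and real kernels typically use approximations of LRU rather than the exact policy simulated here.

```python
from collections import OrderedDict


def run_lru(accesses, frames):
    """Simulate exact LRU replacement; return evicted pages in order."""
    resident = OrderedDict()   # least recently used page comes first
    evicted = []
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)       # mark as most recently used
            continue
        if len(resident) == frames:
            victim, _ = resident.popitem(last=False)
            evicted.append(victim)
        resident[page] = True
    return evicted


# Hypothetical workloads: "vip" is the important page, integers are others.
without_touch = ["vip", 1, 2, 3, 4, 5, 6]
with_touch = ["vip", 1, 2, "vip", 3, 4, "vip", 5, 6]

print(run_lru(without_touch, frames=3))  # ['vip', 1, 2, 3]
print(run_lru(with_touch, frames=3))     # [1, 2, 3, 4]
```

In this model the periodic touches keep the VIP page resident, but only as long as they happen often enough relative to the other traffic. Locking the page, where the OS permits it, removes that dependence on timing.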

HSA Data copy between RAM and GPU-RAM

While reading the Wikipedia page about HSA, I found this block diagram.
I could not understand the benefit of passing a pointer over PCIe.
Does this avoid copying data from system memory to graphics memory?
As far as I understand, to process the content the pointer refers to, the GPU will need it to be present in graphics memory.
If you have separate graphics memory, but you're doing HSA, you have to somehow unify the address spaces. The CPU can see graphics memory, mapped to physical address space. And the GPU can access main memory via DMA. You can set up the CPU and GPU with page tables that direct the same virtual addresses to the same place, which will require one of them to go (transparently) over the PCIe bus.
Where you save time and energy is that you don't have to copy everything you MIGHT want to access; the CPU and GPU access only the data they actually need to use.
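The saving can be illustrated with a back-of-the-envelope count of bytes crossing the bus. This is a sketch under stated assumptions: the 4 KiB transfer granularity, the buffer size, and the offsets are made-up numbers, and real hardware may move data in cache lines or other PCIe transaction sizes instead.

```python
PAGE = 4096  # assumed granularity of an on-demand transfer, illustration only


def bytes_moved_copy_all(buffer_size):
    """Explicit copy model: the whole buffer crosses the bus up front."""
    return buffer_size


def bytes_moved_on_demand(offsets):
    """Shared-address model: only the pages actually touched cross the bus."""
    touched_pages = {offset // PAGE for offset in offsets}
    return len(touched_pages) * PAGE


buffer_size = 64 * 1024 * 1024        # a 64 MiB buffer the GPU MIGHT need
touched = [0, 100, 5000, 1_000_000]   # offsets the GPU actually reads

print(bytes_moved_copy_all(buffer_size))  # 67108864
print(bytes_moved_on_demand(touched))     # 12288
```

The on-demand figure grows with the amount of data actually used, so the advantage shrinks if the GPU ends up touching most of the buffer anyway.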

How are applications and data accessed by the CPU from RAM

I am having a bit of trouble understanding how applications and data are accessed by the CPU from RAM after the application has been loaded into RAM and a file opened (thus data for the file also stored in RAM).
By my understanding, a CPU just gets instructions from RAM as the program counter ticks or carries out tasks after an interrupt. How then does it access the application and data. Is it that it doesn't and still just gets instructions (for example to load a file on the hard drive to be opened in the application) and processes any requests made by the application which are stored in RAM as instructions thereafter (like saving a file). Or does the application and data relating to an opened file (for example) just stay in RAM and not get accessed by the CPU at all.
Similarly, after reading an article, it said that a copy of the operating system is stored in RAM. The CPU can then access the operating system. (I thought the CPU just worked with instructions from RAM). How does it then communicate with the operating system and how are interrupts sent to the CPU, from the copy of the OS in RAM or from the OS in the hard drive.
Sorry if this is really confusing, there is a lot I didn't understand.
The root of your question: a lack of clear differentiation between a computer's hardware and its software.
Components of a Computer System
Just so that we are clear about both of them and that we understand their nature, let me state as follows:
Hardware: It includes the CPU, RAM, disks, registers, graphics card, network card, memory bus and everything else that you can touch and call the 'computer'. It is the body.
Software: It includes the operating system, programs, CPU instructions, compilers, programming languages and almost everything intangible about the computer. It is the soul.
Firmware: It is the basic code that is absolutely essential for the hardware to work. It is stored in Read Only Memory installed in the hardware itself. This piece of software is vital to the hardware, so it is considered to sit midway between hardware and software, hence the name firmware.
We will start with understanding from the time when we say that the computer is up and running and is properly executing our instructions. But at that time you will say - How did I reach here? So I will mention a few points about the startup of the computer.
When the power button is pressed...
...the most primitive and basic input/output system (therefore called the BIOS), which is written permanently into the computer hardware, begins execution. It is stored in Read Only Memory, and it starts the process of getting the machine to stand on its own. It loads the software (the operating system) from one piece of hardware (the disks) into another piece of hardware (RAM and CPU registers), enabling the software to work properly with the hardware.
Now the body and soul are together and the individual (machine) can work.
By now, the OS is already in RAM and the CPU. (Read "When the power button is pressed" above if you doubt it.) Let's handle your question paragraph by paragraph now -
First Paragraph
I am having a bit of trouble understanding how applications and data
are accessed by the CPU from RAM after the application has been loaded
into RAM and a file opened (thus data for the file also stored in
RAM).
The explanation is as follows:
The exact issue here is your thinking that it is the CPU and RAM that access the data. The CPU and RAM are only executing units.
It is the OS (software) that accesses the data by means of the CPU and RAM (hardware). It is in the realm of the OS that applications are executed.
This is why you can install Linux and Windows on the same hardware but cannot execute .exe files on Linux: the OS does the execution, not the RAM/CPU.
Further, how the CPU, RAM and disk physically interact to bring in the data, execute it, save it back etc. is in the domain of hardware. Explaining that would involve logic gates (AND, OR, NOT...), diodes, circuitry and a hell of a lot of other things which an electronics guy can explain.
Second Paragraph
By my understanding, a CPU just gets instructions from RAM as the
program counter ticks or carries out tasks after an interrupt. How
then does it access the application and data. Is it that it doesn't
and still just gets instructions (for example to load a file on the
hard drive to be opened in the application) and processes any
requests made by the application which are stored in RAM as
instructions thereafter (like saving a file).
As you have guessed, the CPU doesn't get instructions; the operating system does it through the CPU. Also, just as the brain doesn't directly instruct the hands and legs to move and instead uses nerves for interaction, the CPU doesn't tell the disks to give or take data. The CPU works with RAM and registers only. Multiple units of hardware work in conjunction to provide a path for data and instructions to travel. The important pieces of hardware involved are:
Processor (CPU and registers built in the CPU)
Cache
Memory (RAM)
Disk
Tape
I like the image provided in this answer. This image not only lists the hardware pieces but also illustrates the mammoth difference in the execution speed of these pieces.
Let's move on to the...
Third Paragraph
Similarly, after reading an article, it said that a copy of the
operating system is stored in RAM. The CPU can then access the
operating system. (I thought the CPU just worked with instructions
from RAM). How does it then communicate with the operating system and
how are interrupts sent to the CPU, from the copy of the OS in RAM or
from the OS in the hard drive.
By now you already know that the OS is indeed present in RAM and CPU registers. That is where it lives. That is from where it tells the CPU how to work. If the OS were small enough (or if registers and caches were big enough), the OS would live even closer to the CPU.
The CPU does not communicate with the OS. It can't. It is the worker that is controlled by a boss. OS is that boss.
CPU cannot access Operating System. CPU is the body, OS is the soul. Soul tells the body what to do, not vice-versa.
CPU doesn't work with instructions from RAM. It merely executes the instructions given by the Operating System (which may be living in RAM). So even when there is an instruction to load some module of OS into the RAM, it is not RAM/CPU but OS itself that issues that instruction.
Interrupts are of two types - Hardware and Software - and your query is about the software interrupts. Since the executive part of OS is in the RAM, in simple words we can say that interrupts are sent to CPU from OS living in RAM.
Conclusions
The lack of distinction between hardware and software is the basic cause of your confusion. Take a course on operating systems on Coursera or Academic Earth for a deeper understanding.
It is confusing indeed. Let me try to explain.
CPU and RAM
The CPU is hardwired to the RAM via the 'motherboard', and they work together. The CPU can perform many instructions, but it has to be told what to do by instructions in RAM. The CPU is basically in a loop: all it does is fetch the next instruction from RAM and execute it, over and over.
So how does this RAM get filled with instructions?
BIOS (basic input/output system)
When the computer first boots up, a portion of RAM is filled with data from a chip on the motherboard (the BIOS chip), and the CPU is turned on and starts processing. These are the factory settings.
The data from the BIOS chip that is copied to RAM consists of a library of instructions to access hardware devices (hard disks, CD-ROM, USB storage, network cards etc.), and a program that uses that library to load what is called the bootsector, the first sector on the boot device, into RAM and transfer control to it (with a jump instruction).
BOOTLOADER
The bootsector data that the BIOS program loaded from the boot device is very small (a single 512-byte sector, of which only about 440 bytes hold code), but with the help of the BIOS library, this is enough to be able to load more sectors and execute these. The bootsector and the data it loads is called the bootloader, which is in charge of loading the Operating System.
In effect, the bootloader is a more dynamic version of the BIOS: the BIOS program resides in flash memory, whereas the bootloader resides on hard disks, USB sticks, SSD drives etc., and thus can be larger and more complex.
OPERATING SYSTEM
In its turn, the operating system (OS) is simply a more advanced version of the bootloader, as it can load and run multiple programs from multiple locations at the same time.
--
The BIOS knows about drives.
The Bootloader knows about drives and partitions.
The OS knows about drives, partitions, and file systems.
The CPU, as you've noticed, reads the program from RAM, instruction by instruction. When an instruction is executed, it might refer to data stored in memory, which is either fetched explicitly into the registers with a separate instruction or read from memory as part of the instruction itself (we are talking about small amounts of data). The registers are the internal storage of the CPU and are quite small: on x86_64 that's a small set of 64-bit general-purpose registers plus other things like segment registers, IP, SP etc. That's all it really does.
Loading a file from a disk is done by asking the appropriate controller to fetch the data into a specific place in memory. The CPU is connected to buses which carry commands to the appropriate controllers.
As for interrupts, these are special things: the CPU has several interrupt lines which can be activated by various devices, for example your network card. When the CPU receives such an interrupt, it is usually handled by an interrupt handler, which is just a program located in a well-known place in memory. Handlers can be registered by, for example, the operating system. Each interrupt line has its own interrupt handler. When an interrupt happens, the CPU saves the current state of the program it happens to be executing, handles the interrupt, restores the state and resumes the program.
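The save, handle, restore sequence described above can be sketched in a few lines. This is a toy model, not how any real CPU is programmed: the `ToyCPU` class, its four registers, the line number 5, and the `network_card_handler` function are all hypothetical names chosen for the example.

```python
class ToyCPU:
    def __init__(self):
        self.pc = 0
        self.regs = [0, 0, 0, 0]
        self.handlers = {}   # interrupt line -> handler function
        self.trace = []      # record of handlers that ran

    def register_handler(self, line, handler):
        """What an operating system would do at start-up."""
        self.handlers[line] = handler

    def interrupt(self, line):
        saved_pc, saved_regs = self.pc, list(self.regs)   # save state
        self.handlers[line](self)                         # run the handler
        self.pc, self.regs = saved_pc, saved_regs         # restore state


def network_card_handler(cpu):
    cpu.regs[0] = 99            # the handler is free to clobber registers
    cpu.trace.append("nic")


cpu = ToyCPU()
cpu.register_handler(5, network_card_handler)   # line 5 is an arbitrary choice
cpu.pc, cpu.regs = 42, [1, 2, 3, 4]             # state of the running program
cpu.interrupt(5)

print(cpu.pc, cpu.regs, cpu.trace)  # 42 [1, 2, 3, 4] ['nic']
```

The interrupted program sees the same program counter and registers afterwards, which is why it can resume as if nothing had happened.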
You seem to be asking about addressing modes. At the risk of gross oversimplification (ignoring caching, segments, and logical memory), memory can be viewed as a sequential array accessed by an integer address.
The CPU has a number of internal storage areas called registers. We will call them R0 to Rn. The processor assigns some registers dedicated purposes. One of those registers is the PC.
One common addressing mode is deferred. I indicate this mode as (Rn). An instruction like this:
MOV (R0), R1
uses the value contained in R0 as a memory address, fetches the value stored at that memory location, and stores a copy of that value in R1.
An instruction sequence like this:
MOV (R0), R1
MOV (R2), R3
is itself stored in memory as data. Ignoring protection, code, data, and variables all use the same type of memory. In other words, any memory location can be interpreted as code, data, or variable.
The CPU executes the next instruction located at (PC). After executing the instruction, the CPU automatically increments the PC to point to the next instruction.
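A tiny interpreter makes the deferred mode and the PC loop concrete. It is a sketch only: the tuple encoding of instructions, the 16-cell memory, and the `run` function are assumptions invented for the example, and the only instruction it understands is the MOV used above.

```python
def run(memory, regs, steps):
    """Execute `steps` instructions, starting at the address held in PC."""
    for _ in range(steps):
        op, src, dst = memory[regs["PC"]]     # fetch the instruction at (PC)
        if op != "MOV":
            raise ValueError(f"unsupported instruction: {op}")
        if src.startswith("(") and src.endswith(")"):
            address = regs[src[1:-1]]         # deferred: register holds an address
            value = memory[address]
        else:
            value = regs[src]                 # plain register operand
        regs[dst] = value
        regs["PC"] += 1                       # advance to the next instruction
    return regs


# Code and data live in the same memory array (hypothetical layout).
memory = [None] * 16
memory[0] = ("MOV", "(R0)", "R1")
memory[1] = ("MOV", "(R2)", "R3")
memory[8] = 111
memory[9] = 222

regs = {"PC": 0, "R0": 8, "R1": 0, "R2": 9, "R3": 0}
run(memory, regs, steps=2)

print(regs["R1"], regs["R3"], regs["PC"])  # 111 222 2
```

Nothing in the array marks cells 0 and 1 as code and cells 8 and 9 as data; what a cell means depends only on whether the PC or an operand address points at it.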

Does it make sense to cache data obtained from a memory mapped file?

Or it would be faster to re-read that data from mapped memory once again, since the OS might implement its own cache?
The nature of data is not known in advance, it is assumed that file reads are random.
I wanted to mention a few things I've read on the subject. The answer is no: you don't want to second-guess the operating system's memory manager.
The first comes from the idea that you want your program (e.g. MongoDB, SQL Server) to try to limit its memory use based on a percentage of free RAM:
Don't try to allocate memory until there is only x% free
Occasionally, a customer will ask for a way to design their program so it continues consuming RAM until there is only x% free. The idea is that their program should use RAM aggressively, while still leaving enough RAM available (x%) for other use. Unless you are designing a system where you are the only program running on the computer, this is a bad idea.
(read the article for the explanation of why it's bad, including pictures)
The next comes from some notes by the author of Varnish, a reverse proxy:
Varnish Cache - Notes from the architect
So what happens with squids elaborate memory management is that it gets into fights with the kernels elaborate memory management, and like any civil war, that never gets anything done.
What happens is this: Squid creates a HTTP object in "RAM" and it gets used some times rapidly after creation. Then after some time it get no more hits and the kernel notices this. Then somebody tries to get memory from the kernel for something and the kernel decides to push those unused pages of memory out to swap space and use the (cache-RAM) more sensibly for some data which is actually used by a program. This however, is done without squid knowing about it. Squid still thinks that these http objects are in RAM, and they will be, the very second it tries to access them, but until then, the RAM is used for something productive.
Imagine you do cache something from a memory-mapped file. At some point in the future that memory holding that "cache" will be swapped out to disk.
the OS has written to the hard-drive something which already exists on the hard drive
Next comes a time when you want to perform a lookup from your "cache" memory, rather than the "real" memory. You attempt to access the "cache", and since it has been swapped out of RAM, the hardware raises a page fault and the cache is swapped back into RAM.
your cache memory is just as slow as the "real" memory, since both are no longer in RAM
Finally, you want to free your cache (perhaps your program is shutting down). If the "cache" has been swapped out, the OS must first swap it back in so that it can be freed. If instead you just unmapped your memory-mapped file, everything is gone (nothing needs to be swapped in).
in this case your cache makes things slower
Again from Raymond Chen: If your application is closing - close already:
When DLL_PROCESS_DETACH tells you that the process is exiting, your best bet is just to return without doing anything
I regularly use a program that doesn't follow this rule. The program
allocates a lot of memory during the course of its life, and when I
exit the program, it just sits there for several minutes, sometimes
spinning at 100% CPU, sometimes churning the hard drive (sometimes
both). When I break in with the debugger to see what's going on, I
discover that the program isn't doing anything productive. It's just
methodically freeing every last byte of memory it had allocated during
its lifetime.
If my computer wasn't under a lot of memory pressure, then most of the
memory the program had allocated during its lifetime hasn't yet been
paged out, so freeing every last drop of memory is a CPU-bound
operation. On the other hand, if I had kicked off a build or done
something else memory-intensive, then most of the memory the program
had allocated during its lifetime has been paged out, which means that
the program pages all that memory back in from the hard drive, just so
it could call free on it. Sounds kind of spiteful, actually. "Come
here so I can tell you to go away."
All this anal-rententive memory management is pointless. The process
is exiting. All that memory will be freed when the address space is
destroyed. Stop wasting time and just exit already.
The reality is that programs no longer run in "RAM", they run in memory - virtual memory.
You can make use of a cache, but you have to work with the operating system's virtual memory manager:
you want to keep your cache within as few pages as possible
you want to ensure they stay in RAM, by the virtue of them being accessed a lot (i.e. actually being a useful cache)
Accessing:
a thousand 1-byte locations around a 400GB file
is much more expensive than accessing
a single 1000-byte location in a 400GB file
In other words: you don't really need to cache data, you need a more localized data structure.
If you keep your important data confined to a single 4k page, you will play much nicer with the VMM; Windows is your cache.
When you add 64-byte aligned cache lines to the picture, there's even more incentive to adjust your data structure layout. But then you don't want it too compact, or you'll start suffering the performance penalties of cache-line invalidations caused by false sharing.
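The comparison between a thousand scattered 1-byte reads and one 1000-byte read comes down to how many pages get touched. A quick count, assuming 4 KiB pages and made-up offsets (evenly spaced reads stand in for "around the file"):

```python
PAGE_SIZE = 4096  # assumed page size


def pages_touched(offsets):
    """Number of distinct pages that a set of byte offsets falls on."""
    return len({offset // PAGE_SIZE for offset in offsets})


file_size = 400 * 10**9   # the 400GB file from the example above

# A thousand 1-byte reads spread evenly across the file.
scattered = [i * (file_size // 1000) for i in range(1000)]

# One 1000-byte read at an arbitrary (hypothetical) position.
contiguous = range(1_000_000, 1_000_000 + 1000)

print(pages_touched(scattered))   # 1000
print(pages_touched(contiguous))  # 1
```

Each distinct page is a potential page fault and a page of RAM the memory manager must keep resident, so the same number of bytes can cost a thousand times more when it is spread out.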
The answer is highly OS-specific. Generally speaking, there is no sense in caching this data. Both the "cached" data and the memory-mapped data can be paged out at any time.
If there is any difference, it will be specific to the OS. Unless you need that level of granularity, there is no sense in caching the data.
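For reference, re-reading straight from the mapping looks like this. It is a minimal sketch using Python's standard `mmap` module on a throwaway temporary file; the file contents, the offset, and the `read_at` helper are invented for the example, and no application-level cache is kept.

```python
import mmap
import os
import tempfile


def read_at(mapped, offset, length):
    """Read bytes straight from the mapping; caching is left to the OS."""
    return mapped[offset:offset + length]


# Build a small throwaway file (16 KiB of a repeating 0..255 pattern).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 64)
    path = tmp.name

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        first = read_at(mapped, 4100, 4)    # first access may fault the page in
        second = read_at(mapped, 4100, 4)   # re-read is served from the same page

os.remove(path)

print(first == second, list(first))  # True [4, 5, 6, 7]
```

Whether the second read is measurably faster than a lookup in a private cache depends on the OS and on memory pressure, which is the point made above.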