How does SEND bandwidth improve when the registered memory is aligned to system page size? (In Mellanox IBD)

How does SEND bandwidth improve when the registered memory is aligned to system page size? (In Mellanox IBD) - pci

Operating System: RHEL Centos 7.9 Latest
Operation:
Sending 500MB chunks 21 times from one System to another connected via Mellanox Cables.
(Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6])
(The registered memory region (500MB) is reused for all the 21 iterations.)
The gain in Message Send Bandwidth when using aligned_alloc() (with system page size 4096B) instead of malloc() for the registered memory is around 35Gbps.
with malloc() : ~86Gbps
with aligned_alloc() : ~121Gbps
Since the CPU is not involved for these operations, how is this operation faster with aligned memory?
Please provide useful reference links if available that explains this.
What change does aligned memory bring to the read/write operations?
Is it the address translation within the device that gets improved?
[Very limited information is present over the internet about this, hence asking here.]

RDMA operations use either MMIO or DMA to transfer data from the main memory to the NIC via the PCI bus - DMA is used for larger transfers.
The behavior you're observing can be entirely explained by the DMA component of the transfer. DMA operates at the physical level, and a contiguous region in the Virtual Address Space is unlikely to be mapped to a contiguous region in the physical space. This fragmentation incurs costs - there's more translation needed per unit of transfer, and DMA transfers get interrupted at physical page boundaries.
[1] https://www.kernel.org/doc/html/latest/core-api/dma-api-howto.html
[2] Memory alignment

Related

Is micro kernel possible without MMU?

In the following link;
https://www.openhub.net/p/f9-kernel
F9 Microkernel runs on Cortex M, but Cortex M series doesn't have MMU. My knowledge on MMU and Virtual Memory are limited hence the following quesitons.
How the visibility of entire physical memory is prevented for each process without MMU?
Is it possible to achieve isolation with some static memory settings without MMU. (with enough on chip RAM to run my application and kernel then, just different hard coded memory regions for my limited processes). But still I don't will this prevent the access?

ARM Cortex-M processors lack of MMU, and there is optional memory protection unit (MPU) in some implementations such as STMicroelectronics' STM32F series.
Unlike other L4 kernels, F9 microkernel is designed for MPU-only environments, optimized for Cortex M3/M4, where ARMv7 Protected Memory System Architecture (PMSAv7) model is supported. The system address space of a PMSAv7 compliant system is protected by a MPU. Also, the available RAM is typically small (about 256 Kbytes), but a larger Physical address space (up to 32-bit) can be used with the aid of bit-banding.
MPU-protected memory is divided up into a set of regions, with the number of regions supported IMPLEMENTATION DEFINED. For example, STM32F429, provides 8 separate memory regions. In PMSAv7, the minimum protect region size is 32 bytes, and maximum is up to 4 GB. MPU provides full access over:
Protection region
Overlapping protection region
Access permissions
Exporting memory attributes to the system
MPU mismatches and permission violations invoke the programmable priority MemManage fault handler.
Memory management in F9 microkernel, can split into three conceptions:
memory pool, which represents the area of PAS with specific attributes (hardcoded in mem map table).
address space - sorted list of fpages bound to particular thread(s).
flexible page - unlike traditional pages in L4, fpage represent in MPU region instead.

Yes, but ....
There is no requirement for an MMU at all, things just get less convenient and flexible. Practically, anything that provides some form of isolation (e.g. MPU) might be good enough to make a system work - assuming you do need isolation at all. If you don't need it for some reason and just want the kernel to do scheduling, then a kernel can do this without an MMU or MPU also.

How does CPU access BIOS instructions stored in external memory?

During the process of booting, CPU reads address of system BIOS from the Reset Vector and jumps to the location where BIOS is stored. My question here is:
*As BIOS is stored on some external memory like EEPROM (and not on main memory) , how does CPU access this external memory ?
*Is this external memory already mapped to some region of main memory?
and does the CPU just jump to this mapped region to access BIOS instructions
Or it actually accesses the instructions from external memory where BIOS is stored?

First I can refer you to a detailed article:
https://resources.infosecinstitute.com/system-address-map-initialization-x86x64-architecture-part-2-pci-express-based-systems/#gref
But I will summarize here:
When CPU is "resetted", the reset vector interrupt (a specific memory address - 0xFFFFFFF0H) is executed - and the ROM content has to be there at that specific address.
Intel Reset Vector
How is the BIOS ROM mapped into address space on PC?
Who loads the BIOS and the memory map during boot-up
0xffff0 and the BIOS (hardwired address mapping is also explained/emphasized here)
When BIOS is executed, it will also initialize hardware like VGA, and initialize DRAM memory. Sometimes RAM memory and BIOS may overlapped, and usually the OS will takeover and reimplement all the functionalities of the BIOS (whis is specific to each motherboard).
What information does BIOS load into RAM?
https://resources.infosecinstitute.com/system-address-map-initialization-in-x86x64-architecture-part-1-pci-based-systems/
Diagram below illustrate how motherboard designer will design the address ranges usable by the different hardware peripherals to lie in certain ranges, and the OS then has the responsibilities to allocate RAM ranges to lie in the unused by hardware regions. Don't forget that each core (for 32-bit) can only access 4GB memory - but phyical memory available can be much more than that. This is where pagetable comes in.
Once the pagetable is setup, then only the TLB and pagetable can be used - which is to provide indirect and efficient access to the RAM memory.

Normally the CPU access the data and information through by interfacing with the SPI in turn communicates with the EEEPROM to fulfill the task requested or deliver the information requested by the CPU.
And no, the external memory is not mapped anywhere and no the CPU does not just jump to it. It communicates with what it or the BIOS needs through SPI or I^C depending on the age of the machine.

Diff. between Logical memory and Physical memory

While understanding the concept of Paging in Memory Management, I came through the terms "logical memory" and "physical memory". Can anyone please tell me the diff. between the two ???
Does physical memory = Hard Disk
and logical memory = RAM

There are three related concepts here:
Physical -- An actual device
Logical -- A translation to a physical device
Virtual -- A simulation of a physical device
The term "logical memory" is rarely used because we normally use the term "virtual memory" to cover both the virtual and logical translations of memory.
In an address translation, we have a page index and a byte index into that page.
The page index to the Nth path in the process could be called a logical memory. The operating system redirects the ordinal page number into some arbitrary physical address.
The reason this is rarely called logical memory is that the page made be simulated using paging, becoming a virtual address.
Address transition is a combination of logical and virtual. The normal usage is to just call the whole thing "virtual memory."
We can imagine that in the future, as memory grows, that paging will go away entirely. Instead of having virtual memory systems we will have logical memory systems.

Not a lot of clarity here thus far, here goes:
Physical Memory is what the CPU addresses on its address bus. It's the lowest level software can get to. Physical memory is organized as a sequence of 8-bit bytes, each with a physical address.
Every application having to manage its memory at a physical level is obviously not feasible. So, since the early days, CPUs introduced abstractions of memory known collectively as "Memory Management." These are all optional, but ubiquitous, CPU features managed by your kernel:
Linear Memory is what user-level programs address in their code. It's seen as a contiguous addresses space, but behind the scenes each linear address maps to a physical address. This allows user-level programs to address memory in a common way and leaves the management of physical memory to the kernel.
However, it's not so simple. User-level programs address linear memory using different memory models. One you may have heard of is the segmented memory model. Under this model, programs address memory using logical addresses. Each logical address refers to a table entry which maps to a linear address space. In this way, the o/s can break up an application into different parts of memory as a security feature (details out of scope for here)
In Intel 64-bit (IA-32e, 64-bit submode), segmented memory is never used, and instead every program can address all 2^64 bytes of linear address space using a flat memory model. As the name implies, all of linear memory is available at a byte-accessible level. This is the most straightforward.
Finally we get to Virtual Memory. This is a feature of the CPU facilitated by the MMU, totally unseen to user-level programs, and managed by the kernel. It allows physical addresses to be mapped to virtual addresses, organized as tables of pages ("page tables"). When virtual memory ("paging") is enabled, tables can be loaded into the CPU, causing memory addresses referenced by a program to be translated to physical addresses transparently. Page tables are swapped in and out on the fly by the kernel when different programs are run. This allows for optimization and security in process/memory management (details out of scope for here)
Keep in mind, Linear and Virtual memory are independent features which can work in conjunction. If paging is disabled, linear addresses map one-to-one with physical addresses. When enabled, linear addresses are mapped to virtual memory.
Notes:
This is all linux/x86 specific but the same concepts apply almost everywhere.
There are a ton of details I glossed over
If you want to know more, read The Intel® 64 and IA-32 Architectures Software Developer Manual, from where I plagiarized most of this

I'd like to add a simple answer here.
Physical Memory : This is the memory that is actually present and every process needs space here to execute their code.
Logical Memory:
To a user program the memory seems contiguous,Suppose a program needs 100 MB of space in memory,To this program a virtual address space / Logical address space starts from 0 and continues to some finite number.This address is generated by CPU and then The MMU then maps this virtual address to real physical address through some page table or any other way the mapping is implemented.
Please correct me or add some more content here. Thanks !

Physical memory is RAM; Actually belongs to main memory. Logical address is the address generated by CPU. In paging,logical address is mapped into physical address with the help of page tables. Logical address contains page number and an offset address.

An address generated by the CPU is commonly referred to as a logical address, whereas an address seen by the memory unit—that is, the one loaded into the memory-address register of the memory—is commonly referred to as a physical address

The physical address is the actual address of the frame where each page will be placed, whereas the logical address is the address generated by the CPU for each page.
What exactly is a frame?
Processes are retrieved from secondary memory and stored in main memory using the paging storing technique.
Processes are kept in secondary memory as non-contiguous pages, which implies they are stored in random locations.
Those non-contiguous pages are retrieved into main Memory as a frame by the paging operating system.
The operating system divides the memory frame size equally in main memory, and all processes retrieved from secondary memory are stored concurrently.

Memory mapped IO - how is it done?

I've read about the difference between port mapped IO and memory mapped IO, but I can't figure out how memory mapped Io is implemented in modern operating systems (windows or linux)
What I know is that a part of the physical memory is reserved to communicate with the hardware and there's a MMIO Unit involved in taking care of the bus communication and other memory-related stuff
How would a driver communicate with underlying hardware? What are the functions that the driver would use? Are the addresses to communicate with a video card fixed or is there some kind of "agreement" before using them?
I'm still rather confused

The following statement in your question is wrong:
What I know is that a part of the physical memory is reserved to communicate with the hardware
A part of the physical memory is not reserved for communication with the hardware. A part of the physical address space, to which the physical memory and memory mapped IO are mapped, is. This memory layout is permanent, but user programs do not see it directly - instead, they run into their own virtual address space to which the kernel can decide to map, wherever it wants, physical memory and IO ranges.
You may want to read the following articles which I believe contain answers to most of your questions:
http://duartes.org/gustavo/blog/post/motherboard-chipsets-memory-map
http://duartes.org/gustavo/blog/post/memory-translation-and-segmentation
http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory

http://en.wikipedia.org/wiki/Memory-mapped_I/O
http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/IO/mapped.html
Essentially it is just a form of accessing the data, as if you are saving / reading from the memory. But the hardware will snoop on the address bus, and when it sees the address targetting for him, it will just receive the data on the data bus.

Are you asking about Memory mapped files, or memory mapped port-IO?
Memory mapped files are done by paging out the pages and intercepting page-faults to those addresses. This is all done by the OS by negotiation between the file-system manager and the page-fault handler.
Memory mapped port-IO is done at the CPU level by overloading address lines as port-IO lines which allow writes to memory to be translated onto the QPI bus lines as port-IO. This is all done by the processor interacting with the motherboard. The only other thing that the OS needs to do is to tell the MMU not to coalese reads and writes through the PAE must-writethrough and no-cache bits.

what is the difference between memory mapped io and io mapped io

Pls explain the difference between memory mapped IO and IO mapped IO

Uhm,... unless I misunderstood, you're talking about two completely different things. I'll give you two very short explanations so you can google up what you need to now.
Memory-mapped I/O means mapping I/O hardware devices' memory into the main memory map. That is, there will be addresses in the computer's memory that won't actually correspond to your RAM, but to internal registers and memory of peripheral devices. This is the machine architecture Pointy was talking about.
There's also mapped I/O, which means taking (say) a file, and having the OS load portions of it in memory for faster access later on. In Unix, this can be accomplished through mmap().
I hope this helped.

On x86 there are two different address spaces, one for memory, and another one for I/O ports.
The port address space is limited to 65536 ports, and is accessed using the IN/OUT instructions.
As an example, a video card's VGA functionality can be accessed using some I/O ports, but the framebuffer is memory-mapped.
Other CPU architectures only have one address space. In those architectures, all devices are memory-mapped.

Memory mapped I/O is mapped into the same address space as program memory and/or user memory, and is accessed in the same way.
Port mapped I/O uses a separate, dedicated address space and is accessed via a dedicated set of microprocessor instructions.
As 16-bit processors will slowly become obsolete and replaced with 32-bit and 64-bit in general use, reserving ranges of memory address space for I/O is less of a problem, as the memory address space of the processor is usually much larger than the required space for all memory and I/O devices in a system.
Therefore, it has become more frequently practical to take advantage of the benefits of memory-mapped I/O.
The disadvantage to this method is that the entire address bus must be fully decoded for every device. For example, a machine with a 32-bit address bus would require logic gates to resolve the state of all 32 address lines to properly decode the specific address of any device. This increases the cost of adding hardware to the machine.
The advantage of IO Mapped IO system is that less logic is needed to decode a discrete address and therefore less cost to add hardware devices to a machine. However more instructions could be needed.
Ref:- Check This link

I have one more clear difference between the two. The memory mapped I/O device is that I/O device which respond when IO/M is low. While a I/O (or peripheral) mapped I/O device is that which respond when IO/M is high.

Memory mapped I/O is mapped into the same address space as program memory and/or user memory, and is accessed in the same way.
I/O mapped I/O uses a separate, dedicated address space and is accessed via a dedicated set of microprocessor instructions.
The difference between the two schemes occurs within the Micro processor’s / Micro controller’s. Intel has, for the most part, used the I/O mapped scheme for their microprocessors and Motorola has used the memory mapped scheme.
https://techdhaba.com/2018/06/16/memory-mapped-i-o-vs-i-o-mapped-i-o/

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse