PCIe DMA aarch64 0x10 Translation Fault - linux-device-driver

I'm trying to write a PCIe driver to DMA pages from the host memory to an FPGA. My host setup is Cavium ThunderX2 and my FPGAs are Xilinx Alveo U50.
A DMA from/to the host causes the ARM SMMU v3.4 to throw an event 0x10 Translation fault. I'm using dma_map_single(..) and dma_alloc_coherent(..) Linux APIs to map the virtual address of the page to a DMA-capable address.
Further inspecting the event records, Context Descriptor, and Stream Table Entries, I have the following information.
Type of Fault - F_TRANSLATION (Translation Fault)
S2 == 0 (Stage 1 Fault - Virtual Address -> Intermediate Physical Address stage)
Class of Fault = TT/TTD (Translation Table Descriptor Fetch)
PnU == Underprivileged Access
T0SZ == 5'b01000 (16); T1SZ == 5'b00000 (IGNORED because EPD1 == 1)
VAS == 49 bits (Virtual Address Size)
TG0 == 00 (4 kB page granule size)
EPD0 == 0 (Stage 1 page table walk enabled)
EPD1 == 1 (Stage 2 is bypassed)
TB0/1 == 0 (Top byte ignore disabled)
IPS == 44 bits (Input Address size)
SMMU Config = 3'b101 (Stage 1 translation enabled, Stage 2 bypassed)
Sample Virtual and DMA address of the page obtained -
Virtual Address - 0xFFFF--- (64-bit value)
DMA Address - 0x9F733CA000 (looks within the range defined by T0SZ and compliant with the IPS)
I'm unable to figure why I'm getting a Stage 1 translation fault when everything looks fine. Technically, I should be getting a Stage 2 fault since it is bypassed and the input address should translate through the TTB0.
P.S. I'm a newbie to ARM v8. Let me know if you need additional information in the comments.
Attached is a picture of the fault F_TRANSLATION.

I was able to fix the issue. There was a synchronization lapse between the IOMMU and the DMA mapping. There were no valid descriptors found for the mapped DMA addresses in the SMMU.
I used dma_alloc_coherent(SZ_2M) to get a buffer region and use IOMMU domain ops to map the IOVA to the SMMU.
int ret = iommu_domain->ops->map(domain, IOVA, size, phys_addr, PROT)
Now the SMMU is able to fetch and translate the IOVA.
For some reason, dma_map_single(..) doesn't work with my current implementation. I have to investigate why a streaming DMA API doesn't work.

Related

Does modern operating-systems use the MCFG ACPI table to find the registers of the xHCI?

I'm working on a minimal hobby operating-system. I want to write a driver for xHCI in order to support USB keyboards and mouses. I'm using QEMU for virtualization. I'm passing the -device qemu-xhci parameter to qemu so that it emulates the xHCI. I was wondering if the modern operating-systems use the MCFG ACPI table to find the configuration space of PCI (and the registers of the xHCI)? I was also wondering how to enable PCI-Express in QEMU so that I can find a MCFG table in RAM so that I can support the latest technology in my hobby OS.
To find all xHCI controllers you search PCI configuration space for devices ("functions") with matching "class/subclass/progID" values (see note 2); which means that you have to find a way to access PCI configuration space first.
On 80x86; there are 3 possible ways to access PCI configuration space - 2 that use IO ports ("mechanism #1" and the deprecated "mechanism #2"), and one that maps PCI configuration space into the physical address space (called "Enhanced Configuration Access Mechanism").
If the Enhanced Configuration Access Mechanism is supported; the MCFG ACPI table describes how PCI configuration space is mapped into the physical address space. Primarily; PCI buses are described as "groups of buses", where each group (defined by a "starting bus number" and "total buses in this group" pair) has a base physical address, and the correct physical address for a PCI function is determined by finding information for the relevant group of buses for the requested bus number, then doing a calculation like:
physical_address = base_physical_address_for_group +
(bus_number - starting_bus_number_for_group) << 20 +
device_number << 15 +
function_number << 12 +
offset;
Note 1: because most operating systems use virtual memory it's possible for an OS to create a nice "virtually linear" mapping of the ("possibly physically disjoint") physical memory areas described by MCFG ACPI table (while using the same page full of zeros mapped as read-only to fill any gaps in the "virtually linear mapping"); so that the OS can use a simplified approach (with no need to find information for the relevant group of buses) like:
virtual_address = PCI_config_space_base_virtual_address +
bus_number << 20 +
device_number << 15 +
function_number << 12 +
offset;
Note 2: An OS doesn't/shouldn't literally search PCI configuration space each time it wants to start a device driver for one specific type of device. Instead an OS typically enumerates PCI buses once during boot (and possibly after boot in response to a notification if "hot-plug PCI" is supported) and starts device drivers based on the results of that enumeration. In other words, it's more like "I found an xHCI controller and need to start the appropriate driver" and not like "I want to start an xHCI driver and need to find the appropriate device/s".

STM32 HAL_I2C_Master_Transmit - Why we need to shift address?

after stumbling upon very strange thing I would like to find out if anyone could provide reasonable explanation.
I have SHT31 humidity sensor running on I2C and after trying to run it on STM32F2 it didn't work.
uint8_t __data[5]={0};
__data[0] = SHT31_SOFTRESET >> 8;
__data[1] = SHT31_SOFTRESET & 0xFF;
HAL_I2C_Master_Transmit(&hi2c3,((uint16_t)0x44)<<1,__data,2,1000);
I have opened the function and saw:
/**
* #brief Transmits in master mode an amount of data in blocking mode.
* #param hi2c Pointer to a I2C_HandleTypeDef structure that contains
* the configuration information for the specified I2C.
* #param DevAddress Target device address: The device 7 bits address value
* in datasheet must be shifted to the left before calling the interface
* #param pData Pointer to data buffer
* #param Size Amount of data to be sent
* #param Timeout Timeout duration
* #retval HAL status
*/
HAL_StatusTypeDef HAL_I2C_Master_Transmit(I2C_HandleTypeDef *hi2c, uint16_t DevAddress, uint8_t *pData, uint16_t Size, uint32_t Timeout)
{
/* Init tickstart for timeout management*/
uint32_t tickstart = HAL_GetTick();
if (hi2c->State == HAL_I2C_STATE_READY)
....... and it goes ....
So I followed the comment and frustration from my scope (looking why my bits are not going on the wire) and did:
HAL_I2C_Master_Transmit(&hi2c3,((uint16_t)0x44)<<1,__data,2,1000);
Finally my bits are going out and device ACKs me back - voila it works!
But why?? What would be the reason behind putting burden on the programmer to shift the address?
Because the programmer should probably be made aware if he wants to read or write data to or from the I2C slave device.
In common I2C communication the first seven bits of the "address byte" contains the slave address, whereas the last bit is a read/write bit. 0 is write and 1 is read.
In your case, you want to write data to the device (to perform a soft reset) and therefore a simple left shift will do the trick.
It has never been agreed whether an I2C address is to be specified:
such that it needs to be shifted for transmission, or
such that it does not need to be shifted for transmission.
Therefore some device datasheets specify it in variant 1 and some in variant 2. Similarly, some I2C APIs take the address in variant 1 and some in variant 2.
If the device and the API use a different variant, it's the programmer's burden to shift the address.
It creates a lot of confusion and is quite annoying. I doubt it will every be clarified.
Sorry for the late reply, I just bumped my head against this myself. This should be considered a bug but ST refuses to acknowledge it as such. If you research the reference manual for the I2C section, the OAR1 register states that the address is stored in bits 7:1 for 7 bit mode. Bits 0, 8 and 9 are ignored. The HAL routine that sets the address should then shift the 7 LSB's so that bits 6:0 of your address get written to bits 7:1 of the OAR1 register. This doesn't happen. Essentially, because the code was released, it is now a "feature" and not a bug. Another way to look at it is that the address byte that you send to the HAL is left aligned. This is extremely irritating as it is not consistent for 7 and 10 bit addresses.

Question about Message Signaled Interrupts (MSI) on x86 LAPIC system

Hi I'm writing a kernel and plan to use MSI interrupt for PCI devices.
However, I'm also quite confused by the documentations.
My understanding about MSI are as follow:
From PCI device point of view:
Documentations indicate that I
need to find Capabillty ID = 0x05 to locate 3 registers: Message control (MCR), Message Address (MAR) and Message Data (MDR) registers
MCR provide control functionality for MSI interrupt,
MAR provide the physical address the PCI device
will write once interrupt occurs
MDR forms out the actual data it will write into the physical address
From CPU point of view:
Documentation shows that Message Address register contains fixed top of 0xFEE, and following by destination ID (LAPIC ID) and other controlling bits as follow:
The Message Data register will contain the following information, including the interrupt vector:
After reading all of these, I am thinking if the APIC_ID is 0x0h would the Message Address conflict with the Local APIC memory mapping? Although the address of FEE00000~FEE00010 are reserved.
In addition, is it true that the vector number in MDR is corresponding to the IDT vector number. In other words, if I put MAR = 0xFEE0000C (Destination ID = 0, Using logical APIC ID) and MDR = 0x0032 (edge trigger, Vector = 50) and enable the MSI interrupt, then once the device issues an interrupt CPU would correspondingly run the function pointed by IDT[50]? After that I write 0h to EOI register to end it?
Finally, according to the documentation, the upper 32 bit of MAR is not used? Can anyone help on this?
Thanks a lot!
Your understanding of how to detect and program MSI in a PCI (or PCIe) device is correct.*
The message address controls the destination (which CPU the interrupt is sent to), while the message data contains the vector number. For normal interrupts, all bits of the message data should be 0 except for the low 8 bits, which contain the vector.
The vector is an index into the IDT, so if the message data is 0x0032, the interrupt is delivered through entry 50 of the IDT.**
If the Destination ID in an interrupt message is 0, the Message Address of the MSI does match the default address of the local APIC, but they do not conflict, because the APIC can only be written by the CPU and MSIs can only be written by devices.
On x86 platforms, the upper 32 bits of the message address must be 0. This can be done by setting the upper part of the message address to 0 or by programming the device to use a 32-bit message address (in which case the upper message address register is not used). The PCI spec was designed to work with systems where 64-bit MSI addresses are used, but x86 systems never use the upper 32 bits of the message address.
Reprogramming the APIC base address by writing to the APIC_BASE MSR does not affect the address range used for MSI; it is always 0xFEExxxxx.
* You should also look at the MSI-X capability, because some devices support MSI-X but not MSI. MSI-X is a bit more flexible, which inevitably makes it a bit more complicated.
** When using the MSI capability, the message data isn't exactly the value in the Message Data Register (MDR). The MSI capability allows the device to use several contiguous vectors. When the device sends an interrupt message, it replaces the low bits of the MDR with a different value depending on the interrupt cause within the device.

How to boot STM32 from user flash?

STM32 microcontrollers are capable to boot from different sources such as:
User-Flash
System-memory
Embedded-SRAM.
On the firmware side, does "boot from user Flash" means executing a custom bootloader?
No.
The "Boot from User Flash" mode means that the application code that will be run after reset is located in user flash memory. The user flash memory in that mode is aliased to start at address 0x00000000 in boot memory space. Upon reset, the top-of-stack value is fetched from address 0x00000000, and code then begins execution at address 0x00000004.
In contrast, the "Boot from System Memory" mode simply means that the system memory (not the user flash) is now aliased to start at address 0x00000000. The application code in this case must have already been loaded into system memory.
The "Boot from Embedded SRAM" mode does not alias the SRAM address. When this mode is selected, the device expects the vector table to have been relocated using the NVIC exception table and offset register, and execution begins at the start of embedded SRAM. The application code in this case must have already been loaded into embedded SRAM.
For more details, refer to the Reference Manual for the specific family of STM32 devices you are using, in the section titled "Boot Configuration".
It depends.
The option "Boot from user Flash" will run whatever is in the user flash. This code could be anything you want. It could be:
a custom bootloader or
it could be application code.
The distinction is a logical one based on what you write your code to do. i.e. it is even possible to have an app that rewrites part or all of itself (what you call it is up to you).
If you do use a custom bootloader then you will normally have two projects.
The custom bootloader will have its own projects with its own generated binary image loaded into the start of flash.
The application code also having its own project and generated .bin file.
The application is loaded into a specific address that the bootloader knows about when it was built (in some free statically allocated memory block - normally on page boundaries to allow for reflashing while keeping the bootloader unchanged). These addresses can be configured in linker files.
Custom bootloaders are a great way of allowing features like:
A custom coms stack, to reflash your application via some other coms protocol.
Dual redundancy of the application code.
Note the STM32 micros "System Memory" already contains ST's own bootloader which supports a range of serial links to reflash the micro without needing a fully custom bootloader.
ST's documentation is stupid. They mention this, but they don't also tell that you need to only configure BOOT0 & BOOT1 pins in order to select the boot source.
So these two pins binary speaking give you three options that are:
(A) BOOT0 = 0, BOOT1 = whatever,
(B) BOOT0 = 1, BOOT1 = 0 and
(C) BOOT0 = 1, BOOT1 = 1.
Option A boots in FLASH, option B boots in bootloader and option C boots in SRAM.
So how to use this. First choose configuration B to start communicating with the bootloader to whom you instruct (in bootloader's own commands) to erase FLASH and upload your program inside FLASH. Then you change the configuration to A and reboot using MCU's RESET pin.
Note:
I perfer the NXP's documentation which has everything black on white.
As a developper I value my time and good documentation is a way to go.
Ditch ST and go for NXP if documentation is what you value. Here is NXP's sample documentation:
https://www.nxp.com/docs/en/user-guide/UM10562.pdf
I am really sorry ST, but you have to do better than what you are currently doing...
STM MCUs are a bit stupid in this regard. You need to configure the Option bytes in Flash so that your MCU will always run code from the user Flash after reset.
Run this code at the beginning of your main:
if ((FLASH->OPTR & FLASH_OPTR_nSWBOOT0_Msk) != 0x0 || ((FLASH->OPTR & FLASH_OPTR_nBOOT0_Msk) == 0x0))
{
while ((FLASH->SR & FLASH_SR_BSY_Msk) != 0x0) { ; }
FLASH->KEYR = 0x45670123;
while ((FLASH->SR & FLASH_SR_BSY_Msk) != 0x0) { ; }
FLASH->KEYR = 0xCDEF89AB;
while ((FLASH->SR & FLASH_SR_BSY_Msk) != 0x0) { ; }
FLASH->OPTKEYR = 0x08192A3B;
while ((FLASH->SR & FLASH_SR_BSY_Msk) != 0x0) { ; }
FLASH->OPTKEYR = 0x4C5D6E7F;
while ((FLASH->SR & FLASH_SR_BSY_Msk) != 0x0) { ; }
FLASH->OPTR = (FLASH->OPTR & ~(FLASH_OPTR_nSWBOOT0_Msk)) | FLASH_OPTR_nBOOT0_Msk;
while ((FLASH->SR & FLASH_SR_BSY_Msk) != 0x0) { ; }
FLASH->CR = FLASH->CR | FLASH_CR_OPTSTRT;
while ((FLASH->SR & FLASH_SR_BSY_Msk) != 0x0) { ; }
}

Can you access PCI cards(32 bits) in "real mode"?

Can you access PCI cards(32 bits) in "real mode" ? Isn't "real mode" 16 bits? I have a developer claiming he can only access hardware in Real mode. But PCI is 32bits...
Yes, you can.
IO ports 0xCF8 and 0xCF9 act as index and data registers for accessing PCI Config space. The address to be written to index register (i.e. 0xCF8) has a fixed predefined format (refer PCI spec). To access pci config data, one writes to index register and then reads from data register.
The Index register is a DWORD (32-bit) register and the format is:
Byte-3 = 0x80
Byte-2 = Bus No
Byte-1 = Upper 5 bits as DEVICE no, and lower 3 bits as FUNCTION no.
Byte-0 = Register no. to read from config space
So to read from Bus:0 Device:0 Func:0 register:0 in real mode, you would say:
IoPortWrite32(0xCF8, 0x80000000);
ValueRead = IoPortRead32(0xCFC);
Hope this helps!
Thanks,
Rohit