Cortex-A5 unaligned access exception - cortex-a

I have some question for Cortex-A5 unaligned access exception
Basic System information blow
I and D cache enabled.
Disabled MMU.
Firmware base
In developing the DMA driver code I wrote the following C code.
UINT32 DMA_InstMOV(UINT8 *buf, tENC_RD rd, UINT32 val)
{
buf[0] = CMD_DMAMOV;
buf[1] = rd;
*((UINT32 *)&buf[2]) = val; // this line is exception occur
return SIZE_DMAMOV;
}
Dissamber the code above to check them as follows
DMA_InstMOV
0x00000bf8: e1a03000 .0.. MOV r3,r0
0x00000bfc: e3a000bc .... MOV r0,#0xbc
0x00000c00: e5c30000 .... STRB r0,[r3,#0]
0x00000c04: e5c31001 .... STRB r1,[r3,#1]
0x00000c08: e5832002 . .. STR r2,[r3,#2]
0x00000c0c: e3a00006 .... MOV r0,#6
0x00000c10: e12fff1e ../. BX lr
R3 Value is 0x08040000
STR instruction is executed with unaligned address Exception(Data Abort) occurs.
Cortex A5 is not support unaligned access?
In DDI0406C_b_arm_architecture_reference_manual.pdf(Table A3-1 Alignment requirements of load/store instructions)
LDM, STR is not support unaligned access.
So Data Exception occurs.
But I still have some question
This drivers code is working good in Cortex-R4 core. It didn’t have any problem.
Disassebly code is same.
This is even more confusing
Many linux drivers also use the above code.
If the MMU is turned on, which would solve this problem?
Let’s me know what’s worng for me?

You need to enable MMU for making an unaligned access in cortex A5. Also make sure the bitfield "A" in SCTLR register is set to 0 to ensure that strict alignment fault is disabled

As I see it, you can't do what you are trying if buf is 32-bit aligned.
The compiler doesn't know how buf is aligned, so it assumes you know what you are doing. If you ask for a 32-bit write (by doing something like *((UINT32 *)&buf[2])) then the compilers assumes it is a valid thing to do. It therefore generates a STR instruction, which is only valid for aligned stores. Hence the fault - as buf is 32-bit aligned (as you state), buf[2] is clearly not.
I have no idea why the Cortex-R4 experience should be different, as far as I can tell it operates with the same instruction set and alignment rules as the A5 (but I could be wrong). Maybe you got lucky and your bufs were unaligned such that buf[2] was 32-bit aligned.

Related

QSPI connection on STM32 microcontrollers with other peripherals instead of Flash memories

I will start a project which needs a QSPI protocol. The component I will use is a 16-bit ADC which supports QSPI with all combinations of clock phase and polarity. Unfortunately, I couldn't find a source on the internet that points to QSPI on STM32, which works with other components rather than Flash memories. Now, my question: Can I use STM32's QSPI protocol to communicate with other devices that support QSPI? Or is it just configured to be used for memories?
The ADC component I want to use is: ADS9224R (16-bit, 3MSPS)
Here is the image of the datasheet that illustrates this device supports the full QSPI protocol.
Many thanks
page 33 of the datasheet
The STM32 QSPI can work in several modes. The Memory Mapped mode is specifically designed for memories. The Indirect mode however can be used for any peripheral. In this mode you can specify the format of the commands that are exchanged: presence of an instruction, of an adress, of data, etc...
See register QUADSPI_CCR.
QUADSPI supports indirect mode, where for each data transaction you manually specify command, number of bytes in address part, number of data bytes, number of lines used for each part of the communication and so on. Don't know whether HAL supports all of that, it would probably be more efficient to work directly with QUADSPI registers - there are simply too many levers and controls you need to set up, and if the library is missing something, things may not work as you want, and QUADSPI is pretty unpleasant to debug. Luckily, after initial setup, you probably won't need to change very much in its settings.
In fact, some time ago, when I was learning QUADSPI, I wrote my own indirect read/write for QUADSPI flash. Purely a demo program for myself. With a bit of tweaking it shouldn't be hard to adapt it. From my personal experience, QUADSPI is a little hard at first, I spent a pair of weeks debugging it with logic analyzer until I got it to work. Or maybe it was due to my general inexperience.
Below you can find one of my functions, which can be used after initial setup of QUADSPI. Other communication functions are around the same length. You only need to set some settings in a few registers. Be careful with the order of your register manipulations - there is no "start communication" flag/bit/command. Communication starts automatically when you set some parameters in specific registers. This is explicitly stated in the reference manual, QUADSPI section, which was the only documentation I used to write my code. There is surprisingly limited information on QUADSPI available on the Internet, even less with registers.
Here is a piece from my basic example code on registers:
void QSPI_readMemoryBytesQuad(uint32_t address, uint32_t length, uint8_t destination[]) {
while (QUADSPI->SR & QUADSPI_SR_BUSY); //Make sure no operation is going on
QUADSPI->FCR = QUADSPI_FCR_CTOF | QUADSPI_FCR_CSMF | QUADSPI_FCR_CTCF | QUADSPI_FCR_CTEF; // clear all flags
QUADSPI->DLR = length - 1U; //Set number of bytes to read
QUADSPI->CR = (QUADSPI->CR & ~(QUADSPI_CR_FTHRES)) | (0x00 << QUADSPI_CR_FTHRES_Pos); //Set FIFO threshold to 1
/*
* Set communication configuration register
* Functional mode: Indirect read
* Data mode: 4 Lines
* Instruction mode: 4 Lines
* Address mode: 4 Lines
* Address size: 24 Bits
* Dummy cycles: 6 Cycles
* Instruction: Quad Output Fast Read
*
* Set 24-bit Address
*
*/
QUADSPI->CCR =
(QSPI_FMODE_INDIRECT_READ << QUADSPI_CCR_FMODE_Pos) |
(QIO_QUAD << QUADSPI_CCR_DMODE_Pos) |
(QIO_QUAD << QUADSPI_CCR_IMODE_Pos) |
(QIO_QUAD << QUADSPI_CCR_ADMODE_Pos) |
(QSPI_ADSIZE_24 << QUADSPI_CCR_ADSIZE_Pos) |
(0x06 << QUADSPI_CCR_DCYC_Pos) |
(MT25QL128ABA1EW9_COMMAND_QUAD_OUTPUT_FAST_READ << QUADSPI_CCR_INSTRUCTION_Pos);
QUADSPI->AR = (0xFFFFFF) & address;
/* ---------- Communication Starts Automatically ----------*/
while (QUADSPI->SR & QUADSPI_SR_BUSY) {
if (QUADSPI->SR & QUADSPI_SR_FTF) {
*destination = *((uint8_t*) &(QUADSPI->DR)); //Read a byte from data register, byte access
destination++;
}
}
QUADSPI->FCR = QUADSPI_FCR_CTOF | QUADSPI_FCR_CSMF | QUADSPI_FCR_CTCF | QUADSPI_FCR_CTEF; //Clear flags
}
It is a little crude, but it may be a good starting point for you, and it's well-tested and definitely works. You can find all my functions here (GitHub). Combine it with reading the QUADSPI section of the reference manual, and you should start to get a grasp of how to make it work.
Your job will be to determine what kind of commands and in what format you need to send to your QSPI slave device. That information is available in the device's datasheet. Make sure you send command and address and every other part on the correct number of QUADSPI lines. For example, sometimes you need to have command on 1 line and data on all 4, all in the same transaction. Make sure you set dummy cycles, if they are required for some operation. Pay special attention at how you read data that you receive via QUADSPI. You can read it in 32-bit words at once (if incoming data is a whole number of 32-bit words). In my case - in the function provided here - I read it by individual bytes, hence such a scary looking *destination = *((uint8_t*) &(QUADSPI->DR));, where I take an address of the data register, cast it to pointer to uint8_t and dereference it. Otherwise, if you read DR just as QUADSPI->DR, your MCU reads 32-bit word for every byte that arrives, and QUADSPI goes crazy and hangs and shows various errors and triggers FIFO threshold flags and stuff. Just be mindful of how you read that register.

How to loop through all page table entries of a process in xv6?

I'm trying to loop through all pages for a process in xv6.
I've looked at this diagram to understand how it works:
but my code is getting:
unexpected trap 14 from cpu 0 eip 801045ff (cr2=0xdfbb000)
Code:
pde_t * physPgDir = (void *)V2P(p->pgdir);
int c = 0;
for(unsigned int i =0; i<NPDENTRIES;i++){
pde_t * pde = &physPgDir[i];
if(*pde & PTE_P){//page directory valid entry
int pde_ppn = (int)((PTE_ADDR(*pde)) >> PTXSHIFT);
pte_t * physPgtab = (void *)(PTE_ADDR(*pde));//grab 20 MSB for inner page table phys pointer;
// go through inner page table
for(unsigned int j =0;j<NPDENTRIES;j++){
pte_t * pte = &physPgtab[j];
if(*pte & PTE_P){//valid entry
c++;
unsigned int pte_ppn = (PTE_ADDR(*pte)) >> PTXSHIFT;//grab 20 MSB for inner page table phys pointer;
//do thing
}
}
}
}
This is in some custom function in proc.c that gets called elsewhere. p is a process pointer.
As far as I understand, cr3 contains the physical address of the current process. However in my case I need to get the page table for a given process pointer. The xv6 code seems to load V2P(p->pgdir) in cr3. That is why I tried to get V2P(p->pgdir). However,the trap happens before pde is dereferenced. Which means there is a problem there. Am I not supposed to use the physical address?
Edit: As Brendan answered, the virtual address p->pgdir is what should be dereferenced. Additionally, the PPN from the Page directory should also be converted via P2V to properly dereference into the page table.
If someone else also gets confused about this aspect of xv6 in the future, I hope that helps.
When dealing with paging the golden rule is "never store physical addresses in any kind of pointer". The reasons are:
a) They aren't virtual addresses and can't be dereferenced, so it's better to make bugs obvious by ensuring you get compile time errors if you try to use a physical address as a pointer.
b) In some cases physical addresses are a different size to virtual addresses (e.g. "PAE paging" in 80x86 where virtual addresses are still 32-bit but physical addresses are potentially up to 52 bits); and it's better (for portability - e.g. so that PAE support can be added to XV6 easier at some point).
With this in mind your first line of code is an obvious bug (it breaks the "golden rule"). It should either be pde_t physPgDir = V2P(p->pgdir); or pde_t * pgDir = p->pgdir;. I'll let you figure out which (as I suspect it's homework, and I'm confident that by adhering to the "golden rule" you'll solve your own problem).

arm64 ptrace SINGLESTEP: are the steps described in this paper correct?

I was reading the paper Hiding in the Shadows: Empowering ARM for Stealthy Virtual Machine Introspection and I was wondering whether the steps they are described in paragraph "2.3 Debug Exceptions" were correct or not:
AArch64 allows to generate Software Step exceptions by setting the SS
bit of the Monitor Debug System Control MDSCR_EL1 and Saved Program
Status Register SPSR of the target exception level. For instance, to
single-step a hit breakpoint in EL1 the monitor must set the
MDSCR_EL1.SS and SPSR_EL1.SS bits. After returning to the trapped
instruction, the SPSR will be written to the process state PSTATE
register in EL1. Consequently, the CPU executes the next instruction
and generates a Software Step exception.
I have tried to understand how single-stepping happens in freeBSD, and I am noticing a mismatch.
I am basing the following lines of code to the release 12.3.0 of freeBSD (4 December 2021), commit: 70cb68e7a00ac0310a2d0ca428c1d5018e6d39e1. I chose to base this question on freeBSD because, in my opinion, following its code is easier than Linux, but the same principles shall be common to both families.
According to my understanding, this is what happens in freeBSD:
1- Ptrace single step is invoked, arriving in the architecture-independent code proc_sstep(), in sys_process.c:
int proc_sstep(struct thread *td)
{
PROC_ACTION(ptrace_single_step(td));
}
2- Architecture-dependent code ptrace_single_step()is called, in arm64/ptrace_machdep.c:
int ptrace_single_step(struct thread *td)
{
td->td_frame->tf_spsr |= PSR_SS;
td->td_pcb->pcb_flags |= PCB_SINGLE_STEP;
return (0);
}
Here single step bit (number 21) is set in the "Process State" of the tracee (tracee = thread that is traced) and a flag is set.
3- After a while, the traced task will be selected for scheduling. In cpu_throw() of swtch.S (where the new thread takes place), the flags of the new thread are checked, to see if it must single step:
/* If we are single stepping, enable it */
ldr w5, [x4, #PCB_FLAGS]
set_step_flag w5, x6
4- set_step_flag macro in defined in the same swtch.S:
.macro set_step_flag pcbflags, tmp
tbz \pcbflags, #PCB_SINGLE_STEP_SHIFT, 999f
mrs \tmp, mdscr_el1
orr \tmp, \tmp, #1
msr mdscr_el1, \tmp
isb
999:
.endm
Here, if the single-step flag is set, it sets the single step bit of register MDSCR_EL1 (bit in position 0).
4- To the best of my understanding, the combination of single step bit on SPSR_EL1 of the "Pstate" + single step bit on MDSCRL_EL1 implies that the tracee execute 1 instruction and it traps.
5- Trap is recognized as a EXCP_SOFTSTP_EL0 and it is handled in do_el0_sync() function of trap.c:
case EXCP_SOFTSTP_EL0:
td->td_frame->tf_spsr &= ~PSR_SS;
td->td_pcb->pcb_flags &= ~PCB_SINGLE_STEP;
WRITE_SPECIALREG(mdscr_el1,
READ_SPECIALREG(mdscr_el1) & ~DBG_MDSCR_SS);
call_trapsignal(td, SIGTRAP, TRAP_TRACE,
(void *)frame->tf_elr, exception);
userret(td, frame);
break;
Here, all the flags are reset and the traced thread receives a SIGTRAP (sent by itself, I think). Being traced, it will stop. And the tracer, at this point, can return from a possible waitpid().
What I could observe differs from the paper explanation. Can you check and correct the steps that I listed, please ?

How can I change the start address on flash?

I'm using STM32F746ZG and FreeRTOS.
The start address of flash is 0x08000000. But I want to change it to 0x08040000. I've searched this issue through google but I didn't find the solution.
I changed the linker script like the following.
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 320K
/* FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 1024K */
FLASH (rx) : ORIGIN = 0x8040000, LENGTH = 768K
}
If I only change it and run the debugger, it has the problem.
If I change the VECT_TAB_OFFSET from 0x00 to 0x4000, it works fine.
/* #define VECT_TAB_SRAM */
#define VECT_TAB_OFFSET 0x40000 /* 0x00 */
SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET;
But if I don't use debugger, it doesn't work anything.
It means it only works when using ST-Linker.
Please let me know if you know the solution.
Thank you for in advance of your reply.
The boot address can be set in the option bytes.
You can set any address in the flash with 16k increments. There are two 16 bit registers in the option bytes area, one is used when the boot pin is low at reset, the other when the pin is high. Write the desired address shifted right by 14 bits, i.e. divided by 16384.
To boot from 0x08040000, write 0x2010 into the register as described in the Option bytes programming chapter of the reference manual.
You could also write a bootloader. Bootloader sits on the 0x0800 0000 address and loads your application firmware meaning jumps to it.
This is the other way to do it.
You need to place 8 bytes at the original beginning of the FLASH. Stm32 boots always from the address 0x00000000 which is aliased to the one of the memories (depending on the boot pins and options).
The first word contains the stack pointer the second one your reset handler. You never get to your code as it boots always from the same address.
You will need to modify your linker script and the startup files where vectors are defined

STM32L073RZ (rev Z) IAP jump to bootloader (system memory)

I use the STM32L073RZ (Nucleo 64 board).
I would like to jump into the system memory in application programming (IAP).
My code works on the revision B of the STM32L073 microcontroller but fails on the latest revision, rev Z.
I read the errata sheet, no details are given, just a limitation fixed on the dual boot mechanism into system memory according to the BFB2 bit.
Is the system memory no longer supports an IAP jumping to execute its code (to flash firmwares through USB or UART without using the BOOT0 pin) ?
The function is the first line of my main program, it tests if the code has to jump to the booloader:
void jumpBootLoader(void)
{
/* to do jump? */
if ( *((unsigned long *)0x20003FF0) == 0xDEADBEEF )
{
/* erase the label */
*((unsigned long *)0x20003FF0) = 0xCAFEFEED;
/* set stack pointer to the bootloader start address */
__set_MSP(*((uint32_t*)(0x1FF00000)));
/* system memory mapped at 0x00000000 */
__HAL_SYSCFG_REMAPMEMORY_SYSTEMFLASH();
/* jump to #bootloader + 4 */
((void (*)(void))(*((uint32_t*)(0x1FF00004))))();
}
}
I call these two lines as soon as the BP1 button is pressed to trig the jump operation after resetting the µC:
*((unsigned long *)0x20003FF0) = 0xDEADBEEF;
NVIC_SystemReset();
I use the HSI 16Mhz clock source.
The solution is to jump twice to the system memory.
First Jump to bootloader startup to initialize Data in RAM until the Program counter will returned to Flash by the Dualbank management.
Second Jump: Jump to the Dualbank bypassed address
How to use: User has first to initialize a variable “ Data_Address” (must be an offset Flash sector aligned address) in Flash to distinguish between first/second Jump.
EraseInitStruct.TypeErase = FLASH_TYPEERASE_PAGES;
EraseInitStruct.PageAddress = Data_Address;
EraseInitStruct.NbPages = 1;
First_jump = *(__IO uint32_t *)(Data_Address);
if (First_jump == 0) {
HAL_FLASH_Unlock();
HAL_FLASH_Program(FLASH_TYPEPROGRAM_WORD, Data_Address, 0xAAAAAAAA);
HAL_FLASH_Lock();
/* Reinitialize the Stack pointer and jump to application address */
JumpAddress = *(__IO uint32_t *)(0x1FF00004);
}
if (First_jump != 0) {
HAL_FLASH_Unlock();
HAL_FLASHEx_Erase(&EraseInitStruct, &PAGEError);
HAL_FLASH_Lock();
/* Reinitialize the Stack pointer and jump to application address */
JumpAddress = (0x1FF00369);
}
Jump_To_Application = (pFunction) JumpAddress;
__set_MSP(*(__IO uint32_t *)(0x1FF00000));
Jump_To_Application();
First important thing: you use 0x1FF0 0000 as the addres where SP is stored, this is correct. Then you use 0x1 FF00 0004 as the address from which you load the function pointer. This is not correct - one zero too many.
Note that using __set_MSP() is generally not such a good idea if you also use MSP as your stack pointer (which you most likely are). The recent definition of this function, which marks "sp" as clobbered register, causes your change to be reverted almost immediately. Incidentally today I was doing exactly the same thing you are doing and I've found that problem. In your assembly listing you'll see that SP is saved into some other register before the msr msp, ... instruction and restored right after that.
Finally I wrote that manually (STM32F4, so different addresses):
constexpr uint32_t systemMemoryBase {0x1fff0000};
asm volatile
(
" msr msp, %[sp] \n"
" bx %[pc] \n"
:: [sp] "r" (*reinterpret_cast<const uint32_t*>(systemMemoryBase)),
[pc] "r" (*reinterpret_cast<const uint32_t*>(systemMemoryBase + 4))
);
BTW - you don't need to set memory remap for the bootloader to work.
Thanks for your help. I have my answer !
The v4.0 bootloader (initial version) does not implement the dual bank mechanism but this feature is supported by v4.1.
Software can jump to bootloader but it will execute the dual boot mechanism.
So the bootloader goes back to bank1 (or bank2 if a code is "valid").
Today it is not possible to bypass the dual bank mechanism to execute the bootloader with my configuration:
The boot0 pin is reset and the protection level is 0 (see "Table 11. Boot pin and BFB2 bit configuration" in the reference manual).
Where is your program counter when you call __HAL_SYSCFG_REMAPMEMORY_SYSTEMFLASH()?
Remapping a memory region while you're executing out of that same region will end poorly! You may need to relocate this code into SRAM, or execute this code with PC set to the fixed FLASH memory mapping (0x0800xxxx).