Understanding STMicro Reset Handler example code for STM32

The example Reset Handler code provided by STMicro for STM32 (in my case it is for STM32H753) is the following:
Reset_Handler:
ldr sp, =_estack
movs r1, #0
b LoopCopyDataInit
...
I don't understand the first instruction, that sets the stack pointer.
Indeed the Vector Table is defined as follows:
This means that the Stack Pointer is set by the CPU from the first word in the Vector Table. This is confirmed by debug (when breaking before executing the very first instruction of the Reset Handler, the SP is set properly).
Is there a reason to keep this instruction ldr sp, =_estack in the Reset Handler?

The first entry of the vector table holds the initial stack address. But the programmer might want to set the stack pointer to another value, or set up the dual-stack arrangement (MSP and PSP).
In the linker script you have:
_estack = address ;
and in the very simple startup file:
g_pfnVectors:
.word _estack
.word Reset_Handler
but you can change those values so that they differ, or the Reset_Handler may be called from a bootloader rather than by a hardware reset. In those cases you need to set the stack pointer to the correct value yourself.
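To illustrate the bootloader case, here is a small host-side model (a sketch, not STMicro code) of the two ways execution can reach the reset handler. The `cpu_state_t` type and the three function names are hypothetical, and any addresses passed in are example values.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the two registers that matter here. */
typedef struct {
    uint32_t sp;
    uint32_t pc;
} cpu_state_t;

/* Hardware reset on Cortex-M: the core itself loads SP from word 0 and
 * PC from word 1 of the vector table (bit 0 of the vector is the Thumb flag). */
static cpu_state_t hardware_reset(const uint32_t vector_table[2])
{
    cpu_state_t cpu = { vector_table[0], vector_table[1] & ~UINT32_C(1) };
    return cpu;
}

/* A bootloader that merely branches to Reset_Handler changes PC only;
 * SP keeps whatever value the bootloader was using. */
static cpu_state_t branch_to_reset_handler(cpu_state_t cpu,
                                           const uint32_t vector_table[2])
{
    cpu.pc = vector_table[1] & ~UINT32_C(1);
    return cpu;
}

/* What `ldr sp, =_estack` does as the first instruction of the handler. */
static cpu_state_t reload_stack_pointer(cpu_state_t cpu, uint32_t estack)
{
    assert((estack & 7u) == 0); /* the ARM procedure call standard expects 8-byte alignment */
    cpu.sp = estack;
    return cpu;
}
```

After `hardware_reset()` the stack pointer already equals word 0 of the table, so the extra load is redundant in that path. After `branch_to_reset_handler()` it still holds the caller's value until `reload_stack_pointer()` runs, which is the situation the explicit instruction guards against.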

Related

arm64 ptrace SINGLESTEP: are the steps described in this paper correct?

I was reading the paper Hiding in the Shadows: Empowering ARM for Stealthy Virtual Machine Introspection and I was wondering whether the steps described in paragraph "2.3 Debug Exceptions" are correct:
AArch64 allows to generate Software Step exceptions by setting the SS
bit of the Monitor Debug System Control MDSCR_EL1 and Saved Program
Status Register SPSR of the target exception level. For instance, to
single-step a hit breakpoint in EL1 the monitor must set the
MDSCR_EL1.SS and SPSR_EL1.SS bits. After returning to the trapped
instruction, the SPSR will be written to the process state PSTATE
register in EL1. Consequently, the CPU executes the next instruction
and generates a Software Step exception.
I have tried to understand how single-stepping happens in FreeBSD, and I am noticing a mismatch.
I am basing the following lines of code on release 12.3.0 of FreeBSD (4 December 2021), commit: 70cb68e7a00ac0310a2d0ca428c1d5018e6d39e1. I chose to base this question on FreeBSD because, in my opinion, its code is easier to follow than Linux's, but the same principles should be common to both families.
According to my understanding, this is what happens in freeBSD:
1- Ptrace single step is invoked, arriving in the architecture-independent code proc_sstep(), in sys_process.c:
int proc_sstep(struct thread *td)
{
PROC_ACTION(ptrace_single_step(td));
}
2- Architecture-dependent code ptrace_single_step() is called, in arm64/ptrace_machdep.c:
int ptrace_single_step(struct thread *td)
{
td->td_frame->tf_spsr |= PSR_SS;
td->td_pcb->pcb_flags |= PCB_SINGLE_STEP;
return (0);
}
Here single step bit (number 21) is set in the "Process State" of the tracee (tracee = thread that is traced) and a flag is set.
3- After a while, the traced task will be selected for scheduling. In cpu_throw() of swtch.S (where the new thread takes place), the flags of the new thread are checked, to see if it must single step:
/* If we are single stepping, enable it */
ldr w5, [x4, #PCB_FLAGS]
set_step_flag w5, x6
4- The set_step_flag macro is defined in the same swtch.S:
.macro set_step_flag pcbflags, tmp
tbz \pcbflags, #PCB_SINGLE_STEP_SHIFT, 999f
mrs \tmp, mdscr_el1
orr \tmp, \tmp, #1
msr mdscr_el1, \tmp
isb
999:
.endm
Here, if the single-step flag is set, it sets the single step bit of register MDSCR_EL1 (bit in position 0).
5- To the best of my understanding, the combination of the single step bit in SPSR_EL1 (restored into PSTATE) and the single step bit in MDSCR_EL1 implies that the tracee executes one instruction and then traps.
6- The trap is recognized as an EXCP_SOFTSTP_EL0 and is handled in the do_el0_sync() function of trap.c:
case EXCP_SOFTSTP_EL0:
td->td_frame->tf_spsr &= ~PSR_SS;
td->td_pcb->pcb_flags &= ~PCB_SINGLE_STEP;
WRITE_SPECIALREG(mdscr_el1,
READ_SPECIALREG(mdscr_el1) & ~DBG_MDSCR_SS);
call_trapsignal(td, SIGTRAP, TRAP_TRACE,
(void *)frame->tf_elr, exception);
userret(td, frame);
break;
Here, all the flags are reset and the traced thread receives a SIGTRAP (sent by itself, I think). Being traced, it will stop. And the tracer, at this point, can return from a possible waitpid().
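The sequence listed above can be sketched as a small host-side state model. This is a deliberately simplified illustration under my own assumptions, not FreeBSD code: `step_model_t` and the function names are hypothetical, and only the two bit positions (bit 21 of SPSR/PSTATE, bit 0 of MDSCR_EL1) are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSR_SS        (UINT32_C(1) << 21) /* SS bit in SPSR / PSTATE */
#define DBG_MDSCR_SS  (UINT32_C(1) << 0)  /* SS bit in MDSCR_EL1 */

typedef struct {
    uint32_t mdscr_el1;   /* debug control register */
    uint32_t saved_spsr;  /* tf_spsr in the tracee's trap frame */
    uint32_t pstate;      /* live process state of the tracee */
    unsigned executed;    /* instructions the tracee has run */
} step_model_t;

/* Step 2: ptrace_single_step() marks the saved frame. */
static void arm_single_step(step_model_t *m) { m->saved_spsr |= PSR_SS; }

/* Steps 3-4: the context switch sets MDSCR_EL1.SS. */
static void switch_in(step_model_t *m) { m->mdscr_el1 |= DBG_MDSCR_SS; }

/* Exception return: the saved SPSR is copied into PSTATE. */
static void exception_return(step_model_t *m) { m->pstate = m->saved_spsr; }

/* Try to run one instruction. Returns true if a Software Step
 * exception is taken instead of executing it. */
static bool run_one_instruction(step_model_t *m)
{
    bool stepping = (m->mdscr_el1 & DBG_MDSCR_SS) != 0;
    if (stepping && (m->pstate & PSR_SS) == 0) {
        m->saved_spsr = m->pstate;  /* exception entry saves PSTATE */
        return true;                /* step pending: trap before executing */
    }
    m->executed++;
    if (stepping) {
        m->pstate &= ~PSR_SS;       /* the single step has been consumed */
    }
    return false;
}

/* Step 6: the trap handler clears both bits again. */
static void handle_step_trap(step_model_t *m)
{
    m->saved_spsr &= ~PSR_SS;
    m->mdscr_el1 &= ~DBG_MDSCR_SS;
}
```

In this model, calling `arm_single_step()`, `switch_in()` and `exception_return()` lets exactly one call to `run_one_instruction()` execute; the next call reports the trap, and after `handle_step_trap()` the tracee runs freely again.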
What I could observe differs from the paper explanation. Can you check and correct the steps that I listed, please ?

In Application Programming issue

I'm working on a project on the STM32L152RCT6, where I have to build a mechanism to self-update the code from a newly received file (HEX file).
For that I have implemented a boot-loader-like mechanism: it checks for new firmware and, if there is any, verifies it and, if it is valid, stores it at the "Application location".
I'm taking the following steps:
1. Boot loader address = 0x08000000
2. Application address = 0x08008000
3. The boot loader program checks a specified location for a new file.
4. If the file is found valid, all the HEX data is copied to the application location (as per the guide).
5. The application code is then run by jumping to that location.
The problem is in step 5: all the steps above work, and even the data is stored properly (verified with the STM32 utility), but when I jump to the application code it doesn't run.
Is there something I have to cross-check, or something I'm missing?
Unlike other ARM controllers that directly jump to address 0 at reset, the Cortex-M series takes the start address from a vector table. If the program is loaded directly (without a bootloader), the vector table is at the start of the binary (loaded or mapped to address 0). The first entry, at offset 0, is the initial value of the stack pointer; the second entry, at offset 4, is called the reset vector and contains the address of the first instruction to be executed.
Programs loaded with a bootloader usually preserve this arrangement, and put the vector table at the start of the binary, 0x08008000 in your case. Then the reset vector would be at 0x08008004. But it's your application, so you should check where you put your vector table. Hint: look at the .map file generated by the linker to be sure. If it's indeed at 0x08008000, then you can transfer control to the application reset vector like so:
void (*app)(void); // declare a pointer to a function
app = *(void (**)(void))0x08008004; // see below
app(); // invoke the function through the pointer
The complicated cast in the second line converts the physical address to a pointer to a pointer to a function, takes the value it points to, which is a pointer to a function, and assigns it to app.
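If the cast is hard to read, a typedef lets you spell out the same steps one at a time. The sketch below is a host-runnable model, not target code: `entry_fn_t`, `fake_vector_table`, `fake_app_reset_handler` and `load_reset_vector` are names made up for illustration, and the table is modelled as an array of function pointers so that slot 1 plays the role of the word at 0x08008004.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*entry_fn_t)(void);   /* pointer to a function taking no arguments */

static int app_started;             /* set by the stand-in application below */
static void fake_app_reset_handler(void) { app_started = 1; }

/* Stand-in for the application's vector table: slot 0 would hold the
 * initial stack pointer (unused here), slot 1 holds the reset vector. */
static entry_fn_t fake_vector_table[2] = { NULL, fake_app_reset_handler };

/* The same steps as `*(void (**)(void))0x08008004`, one per line. */
static entry_fn_t load_reset_vector(const void *table_base)
{
    assert(table_base != NULL);
    /* address -> pointer to (pointer to function) */
    const entry_fn_t *slots = (const entry_fn_t *)table_base;
    /* dereference the second slot -> pointer to function */
    return slots[1];
}
```

On the target you would pass `(const void *)0x08008000` as `table_base`; on a 32-bit Cortex-M each slot is 4 bytes wide, so `slots[1]` reads the word at offset 4. Calling the returned pointer, `load_reset_vector(fake_vector_table)()`, then corresponds to `app()`.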
Then you should manage the switchover to the application vector table. You can do it either in the bootloader or in the application, or divide the steps between them.
Disable all interrupts and stop SysTick. Note that SysTick is a system exception rather than an NVIC interrupt, so don't call NVIC_DisableIRQ() on it; clear its enable bits in SysTick->CTRL instead. I'd do this step in the bootloader, so that it is responsible for disabling whatever it has enabled.
Assign the new vector table address to SCB->VTOR. Beware that the boilerplate SystemInit() function in system_stm32l1xx.c unconditionally changes SCB->VTOR back to the start of the flash, i.e. to 0x08000000, you should edit it to use the proper offset.
You can load the stack pointer value from the vector table too, but it's tricky to do it properly, and not really necessary, the application can just continue to use the stack that was set up in the bootloader. Just check it to make sure it's reasonable.
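For the "make sure it's reasonable" check, a bootloader can inspect the first two words of the application's vector table before jumping. The following is a minimal sketch under stated assumptions: `region_t`, `in_region` and `app_image_looks_valid` are hypothetical names, and the RAM and flash ranges are parameters you would fill in from your device's memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;   /* first address of the region */
    uint32_t size;   /* length in bytes */
} region_t;

static bool in_region(uint32_t addr, region_t r)
{
    return addr >= r.base && (addr - r.base) < r.size;
}

/* vt points at the first two words of the application's vector table. */
static bool app_image_looks_valid(const uint32_t vt[2],
                                  region_t ram, region_t app_flash)
{
    assert(vt != 0);
    uint32_t initial_sp = vt[0];
    uint32_t reset_vector = vt[1];

    /* The stack grows downwards, so the initial SP may equal the end of
     * RAM: accept base < sp <= base + size, 8-byte aligned. */
    bool sp_ok = initial_sp > ram.base
              && (initial_sp - ram.base) <= ram.size
              && (initial_sp & 7u) == 0;

    /* Cortex-M executes Thumb code only, so bit 0 of the vector must be
     * set and the remaining bits must point into the application's flash. */
    bool pc_ok = (reset_vector & 1u) != 0
              && in_region(reset_vector & ~UINT32_C(1), app_flash);

    return sp_ok && pc_ok;
}
```

A blank or half-written application area fails this test, so the bootloader can stay in update mode instead of jumping into garbage. It is only a plausibility check; it does not replace a checksum over the image.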
Have you changed the application according to the new flash position?
For example, the vector table has to be set correctly via
SCB->VTOR = ...
When your bootloader starts the app, it has to configure everything back to the reset state, as the application may rely on the default reset values. In particular, you need to:
Return all hardware registers to their reset values
Switch off all peripheral clocks (do not forget about the SysTick)
Disable all enabled interrupts
Return all clock domains to their reset values
Set the vector table address
Load the stack pointer from the beginning of the APP vector table
Call the APP entry point (vector table start + 4)
Your app has to be compiled and linked using a custom linker script in which the FLASH start address is 0x08008000
for example:
FLASH (rx) : ORIGIN = 0x8000000 + 32K, LENGTH = 512K - 32K
SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET;
where the value of FLASH_BASE must equal the IROM start address configured in Keil
example:
#define FLASH_BASE 0x08004000
Keil configuration
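One thing worth checking when you pick the application offset is that the resulting SCB->VTOR value is properly aligned. As I understand the Arm Cortex-M3 documentation, the table must be aligned to its own size rounded up to a power of two, with a minimum of 32 words. The helper names below are hypothetical, and `num_irqs` is an input you would take from your device's reference manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value written to SCB->VTOR, built the same way as in SystemInit(). */
static uint32_t vtor_value(uint32_t flash_base, uint32_t vect_tab_offset)
{
    return flash_base | vect_tab_offset;
}

/* Required alignment in bytes for a table holding the 16 system
 * exception slots plus num_irqs device interrupt vectors. */
static uint32_t vtor_alignment_bytes(uint32_t num_irqs)
{
    assert(num_irqs <= 240u);          /* upper limit on Cortex-M3 */
    uint32_t words = 16u + num_irqs;
    uint32_t aligned_words = 32u;      /* minimum alignment: 32 words */
    while (aligned_words < words) {
        aligned_words <<= 1;           /* round up to the next power of two */
    }
    return aligned_words * 4u;         /* each vector is one 32-bit word */
}

static bool vtor_value_is_aligned(uint32_t vtor, uint32_t num_irqs)
{
    return (vtor & (vtor_alignment_bytes(num_irqs) - 1u)) == 0;
}
```

With 21 interrupts the table needs 37 words, so the alignment comes out as 64 words (256 bytes). An offset of 0x8000, as in the question, satisfies the check for any interrupt count the function accepts.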

STM32L073RZ (rev Z) IAP jump to bootloader (system memory)

I use the STM32L073RZ (Nucleo 64 board).
I would like to jump into the system memory in application programming (IAP).
My code works on the revision B of the STM32L073 microcontroller but fails on the latest revision, rev Z.
I read the errata sheet, no details are given, just a limitation fixed on the dual boot mechanism into system memory according to the BFB2 bit.
Does the system memory no longer support jumping to it from the application to execute its code (to flash firmware through USB or UART without using the BOOT0 pin)?
The following function is called on the first line of my main program; it tests whether the code has to jump to the bootloader:
void jumpBootLoader(void)
{
/* to do jump? */
if ( *((unsigned long *)0x20003FF0) == 0xDEADBEEF )
{
/* erase the label */
*((unsigned long *)0x20003FF0) = 0xCAFEFEED;
/* set stack pointer to the bootloader start address */
__set_MSP(*((uint32_t*)(0x1FF00000)));
/* system memory mapped at 0x00000000 */
__HAL_SYSCFG_REMAPMEMORY_SYSTEMFLASH();
/* jump to #bootloader + 4 */
((void (*)(void))(*((uint32_t*)(0x1FF00004))))();
}
}
I execute these two lines as soon as the BP1 button is pressed, to trigger the jump after resetting the µC:
*((unsigned long *)0x20003FF0) = 0xDEADBEEF;
NVIC_SystemReset();
I use the HSI 16 MHz clock source.
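The request-and-reset handshake used above can be isolated into two small functions, which makes it easy to test off-target. This is only a sketch of the same idea: the function and macro names are hypothetical, and the flag is passed as a pointer so that on the target it can be `(volatile uint32_t *)0x20003FF0` while a host test uses an ordinary variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BOOT_REQUEST_MAGIC  UINT32_C(0xDEADBEEF)  /* "jump on next boot" */
#define BOOT_REQUEST_CLEAR  UINT32_C(0xCAFEFEED)  /* label erased */

/* Application side: leave a request that survives the software reset.
 * On the target this is followed by NVIC_SystemReset(). */
static void request_bootloader(volatile uint32_t *flag)
{
    assert(flag != 0);
    *flag = BOOT_REQUEST_MAGIC;
}

/* Startup side: returns true exactly once per request and erases the
 * label so that the following reset boots the application normally. */
static bool consume_bootloader_request(volatile uint32_t *flag)
{
    assert(flag != 0);
    if (*flag != BOOT_REQUEST_MAGIC) {
        return false;
    }
    *flag = BOOT_REQUEST_CLEAR;
    return true;
}
```

This relies on the RAM word keeping its value across the reset and on the startup code not overwriting it before the check, so the address should lie outside the sections the startup code initializes.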
The solution is to jump twice to the system memory.
First jump: jump to the bootloader startup to initialize its data in RAM, until the program counter is returned to Flash by the dual-bank management.
Second jump: jump to the address that bypasses the dual-bank check.
How to use: the user first has to initialize a variable "Data_Address" in Flash (it must be a Flash-sector-aligned address) to distinguish between the first and the second jump.
EraseInitStruct.TypeErase = FLASH_TYPEERASE_PAGES;
EraseInitStruct.PageAddress = Data_Address;
EraseInitStruct.NbPages = 1;
First_jump = *(__IO uint32_t *)(Data_Address);
if (First_jump == 0) {
HAL_FLASH_Unlock();
HAL_FLASH_Program(FLASH_TYPEPROGRAM_WORD, Data_Address, 0xAAAAAAAA);
HAL_FLASH_Lock();
/* Reinitialize the Stack pointer and jump to application address */
JumpAddress = *(__IO uint32_t *)(0x1FF00004);
}
if (First_jump != 0) {
HAL_FLASH_Unlock();
HAL_FLASHEx_Erase(&EraseInitStruct, &PAGEError);
HAL_FLASH_Lock();
/* Reinitialize the Stack pointer and jump to application address */
JumpAddress = (0x1FF00369);
}
Jump_To_Application = (pFunction) JumpAddress;
__set_MSP(*(__IO uint32_t *)(0x1FF00000));
Jump_To_Application();
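The decision logic of the snippet above can be separated from the flash and jump calls, which makes the two passes easier to follow. This is a sketch with hypothetical names (`jump_plan_t`, `plan_jump`); the constants 0xAAAAAAAA and 0x1FF00369 are copied from the code above, and treating 0 as "not yet jumped" assumes, as that code does, that the erased page reads back as zero.

```c
#include <assert.h>
#include <stdint.h>

#define FIRST_JUMP_DONE_MARKER  UINT32_C(0xAAAAAAAA)  /* value programmed on pass 1 */
#define DUALBANK_BYPASS_ENTRY   UINT32_C(0x1FF00369)  /* second-jump address from the answer */

typedef struct {
    uint32_t jump_address;  /* where to branch on this pass */
    uint32_t new_marker;    /* what the flash word should hold afterwards */
} jump_plan_t;

/* marker: current content of the word at Data_Address.
 * sysmem_reset_vector: the word read from 0x1FF00004. */
static jump_plan_t plan_jump(uint32_t marker, uint32_t sysmem_reset_vector)
{
    assert((sysmem_reset_vector & 1u) != 0); /* vectors carry the Thumb bit */
    jump_plan_t plan;
    if (marker == 0) {
        /* First pass: record it, then enter the bootloader's normal startup. */
        plan.jump_address = sysmem_reset_vector;
        plan.new_marker = FIRST_JUMP_DONE_MARKER;
    } else {
        /* Second pass: erase the marker and enter past the dual-bank check. */
        plan.jump_address = DUALBANK_BYPASS_ENTRY;
        plan.new_marker = 0;
    }
    return plan;
}
```

The caller would program or erase the flash word according to `new_marker`, set the MSP from 0x1FF00000, and branch to `jump_address`, exactly as in the original sequence.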
First important thing: you use 0x1FF0 0000 as the address where SP is stored, this is correct. Then you use 0x1 FF00 0004 as the address from which you load the function pointer. This is not correct - one zero too many.
Note that using __set_MSP() is generally not such a good idea if you also use MSP as your stack pointer (which you most likely are). The recent definition of this function, which marks "sp" as clobbered register, causes your change to be reverted almost immediately. Incidentally today I was doing exactly the same thing you are doing and I've found that problem. In your assembly listing you'll see that SP is saved into some other register before the msr msp, ... instruction and restored right after that.
Finally I wrote that manually (STM32F4, so different addresses):
constexpr uint32_t systemMemoryBase {0x1fff0000};
asm volatile
(
" msr msp, %[sp] \n"
" bx %[pc] \n"
:: [sp] "r" (*reinterpret_cast<const uint32_t*>(systemMemoryBase)),
[pc] "r" (*reinterpret_cast<const uint32_t*>(systemMemoryBase + 4))
);
BTW - you don't need to set memory remap for the bootloader to work.
Thanks for your help. I have my answer !
The v4.0 bootloader (initial version) does not implement the dual bank mechanism but this feature is supported by v4.1.
Software can jump to bootloader but it will execute the dual boot mechanism.
So the bootloader goes back to bank1 (or bank2 if a code is "valid").
Today it is not possible to bypass the dual bank mechanism to execute the bootloader with my configuration:
The boot0 pin is reset and the protection level is 0 (see "Table 11. Boot pin and BFB2 bit configuration" in the reference manual).
Where is your program counter when you call __HAL_SYSCFG_REMAPMEMORY_SYSTEMFLASH()?
Remapping a memory region while you're executing out of that same region will end poorly! You may need to relocate this code into SRAM, or execute this code with PC set to the fixed FLASH memory mapping (0x0800xxxx).

STM32: Booting and fetching vector table from SRAM

I would like to run my program from the SRAM region of the device.
It seemed quite clear to me, that I have to perform following steps:
Modify the vector table offset register SCB->VTOR (located at 0xE000ED08) to point to the beginning of the SRAM region, as that is where my vector table is located: 0x20000000
Reset the device so it fetches the stack pointer initialization value and the reset handler address again.
Unfortunately, whenever I issue a reset init command in OpenOCD, the value of SCB->VTOR gets cleared. Hence, the stack pointer initialization value and the reset handler address are fetched from 0x00000000 instead of 0x20000000.
Question
How do I get my STM32F4 to fetch the vector table from 0x20000000?
Just load the SP (MSP) from 0x20000000 (=VTOR) and the PC from 0x20000004 (=VTOR+4) manually.
The reset init command will usually reset the whole chip and not just the core - and VTOR will be initialized to zero even then.

change local stack variable value

Using Windbg/SOS, is it possible to change the value of a local variable on the stack? If so, how?
The short answer is: It depends.
By default, local value types are stored on the stack, but due to optimization they will often be held only in registers as needed. Reference types are stored on the heap, with a reference to the instance on the stack (or in a register).
I am going to assume that you're looking to change a local value type. Let's look at a simple example.
[MethodImpl(MethodImplOptions.NoInlining)] // avoid inlining of short method
public static void Method(int x) {
Console.WriteLine("The answer is {0}", x + x);
}
Assuming we set a breakpoint on Method and run until the breakpoint is hit, the stack looks like this:
0:000> !clrstack -a
OS Thread Id: 0x1abc (0)
Child SP IP Call Site
0035f290 003600e0 TestBench2010.Program.Method(Int32)*** WARNING: Unable to verify checksum for C:\workspaces\TestBench2010\TestBench2010\bin\Release\TestBench2010.exe
[C:\workspaces\TestBench2010\TestBench2010\Program.cs # 17]
PARAMETERS:
x (<CLR reg>) = 0x00000002
0035f294 003600a2 TestBench2010.Program.Main(System.String[]) [C:\workspaces\TestBench2010\TestBench2010\Program.cs # 24]
PARAMETERS:
args = <no data>
0035f4c0 636221bb [GCFrame: 0035f4c0]
Notice that the local x is listed as <CLR reg>, but it doesn't tell us which register. We could look at the registers and find the one with the value 2, but there could be more than one. Instead let's look at the JIT compiled code for the method.
0:000> !u 001c37f0
Normal JIT generated code
TestBench2010.Program.Method(Int32)
Begin 003600e0, size 32
C:\workspaces\TestBench2010\TestBench2010\Program.cs # 17:
003600e0 55 push ebp
003600e1 8bec mov ebp,esp
003600e3 56 push esi
003600e4 8bf1 mov esi,ecx
*** WARNING: Unable to verify checksum for C:\windows\assembly\NativeImages_v4.0.30319_32\mscorlib\658bbc023e2f4f4e802be9483e988373\mscorlib.ni.dll
003600e6 b9302be004 mov ecx,offset mscorlib_ni+0x322b30 (04e02b30) (MT: System.Int32)
003600eb e8301fe5ff call 001b2020 (JitHelp: CORINFO_HELP_NEWSFAST)
003600f0 8bd0 mov edx,eax
003600f2 03f6 add esi,esi <==== This is x + x
003600f4 897204 mov dword ptr [edx+4],esi
003600f7 8bf2 mov esi,edx
003600f9 e882709d04 call mscorlib_ni+0x257180 (04d37180)(System.Console.get_Out(), mdToken: 060008cd)
003600fe 56 push esi
003600ff 8bc8 mov ecx,eax
00360101 8b1534204c03 mov edx,dword ptr ds:[34C2034h] ("The answer is {0}")
00360107 8b01 mov eax,dword ptr [ecx]
00360109 8b403c mov eax,dword ptr [eax+3Ch]
0036010c ff5018 call dword ptr [eax+18h]
C:\workspaces\TestBench2010\TestBench2010\Program.cs # 18:
0036010f 5e pop esi
00360110 5d pop ebp
00360111 c3 ret
Looking at the code, we see that the only add instruction uses the esi register, so our value is stored here prior to the calculation. Unfortunately, esi doesn't hold the correct value at this point, but looking backwards we find mov esi,ecx. I.e. the value is initially stored in ecx.
To change the value of ecx use the r command. E.g. to set the value to 0x15 (21 decimal, so x + x gives 42) do the following:
0:000> r ecx=15
The output of the method is now:
The answer is 42
Please keep in mind that the example above is only one of many possible scenarios. Locals are handled differently depending on debug/release build as well as 32/64 bit. Also, for complex methods it may be a bit harder tracking the exact location of the value.
To change the state of an instance, you have to locate the reference on the stack (e.g. using !clrstack or !dso). Once located you can use the offsets to find the memory, that holds the data and use the e* commands to change the values as needed. Let me know if you want an example for that as well.