arm64 ptrace SINGLESTEP: are the steps described in this paper correct?

arm64 ptrace SINGLESTEP: are the steps described in this paper correct? - operating-system

I was reading the paper Hiding in the Shadows: Empowering ARM for Stealthy Virtual Machine Introspection and I was wondering whether the steps they are described in paragraph "2.3 Debug Exceptions" were correct or not:
AArch64 allows to generate Software Step exceptions by setting the SS
bit of the Monitor Debug System Control MDSCR_EL1 and Saved Program
Status Register SPSR of the target exception level. For instance, to
single-step a hit breakpoint in EL1 the monitor must set the
MDSCR_EL1.SS and SPSR_EL1.SS bits. After returning to the trapped
instruction, the SPSR will be written to the process state PSTATE
register in EL1. Consequently, the CPU executes the next instruction
and generates a Software Step exception.
I have tried to understand how single-stepping happens in freeBSD, and I am noticing a mismatch.
I am basing the following lines of code to the release 12.3.0 of freeBSD (4 December 2021), commit: 70cb68e7a00ac0310a2d0ca428c1d5018e6d39e1. I chose to base this question on freeBSD because, in my opinion, following its code is easier than Linux, but the same principles shall be common to both families.
According to my understanding, this is what happens in freeBSD:
1- Ptrace single step is invoked, arriving in the architecture-independent code proc_sstep(), in sys_process.c:
int proc_sstep(struct thread *td)
{
PROC_ACTION(ptrace_single_step(td));
}
2- Architecture-dependent code ptrace_single_step()is called, in arm64/ptrace_machdep.c:
int ptrace_single_step(struct thread *td)
{
td->td_frame->tf_spsr |= PSR_SS;
td->td_pcb->pcb_flags |= PCB_SINGLE_STEP;
return (0);
}
Here single step bit (number 21) is set in the "Process State" of the tracee (tracee = thread that is traced) and a flag is set.
3- After a while, the traced task will be selected for scheduling. In cpu_throw() of swtch.S (where the new thread takes place), the flags of the new thread are checked, to see if it must single step:
/* If we are single stepping, enable it */
ldr w5, [x4, #PCB_FLAGS]
set_step_flag w5, x6
4- set_step_flag macro in defined in the same swtch.S:
.macro set_step_flag pcbflags, tmp
tbz \pcbflags, #PCB_SINGLE_STEP_SHIFT, 999f
mrs \tmp, mdscr_el1
orr \tmp, \tmp, #1
msr mdscr_el1, \tmp
isb
999:
.endm
Here, if the single-step flag is set, it sets the single step bit of register MDSCR_EL1 (bit in position 0).
4- To the best of my understanding, the combination of single step bit on SPSR_EL1 of the "Pstate" + single step bit on MDSCRL_EL1 implies that the tracee execute 1 instruction and it traps.
5- Trap is recognized as a EXCP_SOFTSTP_EL0 and it is handled in do_el0_sync() function of trap.c:
case EXCP_SOFTSTP_EL0:
td->td_frame->tf_spsr &= ~PSR_SS;
td->td_pcb->pcb_flags &= ~PCB_SINGLE_STEP;
WRITE_SPECIALREG(mdscr_el1,
READ_SPECIALREG(mdscr_el1) & ~DBG_MDSCR_SS);
call_trapsignal(td, SIGTRAP, TRAP_TRACE,
(void *)frame->tf_elr, exception);
userret(td, frame);
break;
Here, all the flags are reset and the traced thread receives a SIGTRAP (sent by itself, I think). Being traced, it will stop. And the tracer, at this point, can return from a possible waitpid().
What I could observe differs from the paper explanation. Can you check and correct the steps that I listed, please ?

Related

STM32 bootloader failure to erase

I am writing an external bootloader for the STM32F730Z8 - (why? I need one windows code that can run the bootloader for the STM32, or use the STM32 to reprog a connected ATF1508 for my client). I've done this before, using info in AN3155 and AN2606. On lesser CPUs, this has had no difficulty (i.e. STM32L4P5). In this case, I try the same:
1-cycle \RESET & BOOT0 to boot to supervisor mode
2-autobaud successfully
3-send 0x00 to get the list of commands, successfully
4-send 01 to get the version and protection, successfully (vers 49, rp and nt both 0)
5-send 02 to get chip id (0x0452), successfully
6-send 0x73 to write-unprotect flash, successfully (i.e. receive back two ACK)
7-send 0x44 to begin an extended erase (intending only to erase sector 0).
This is where it fails. I get neither ACK nor NACK - it just times out. I don't even get to the second half of the extended-erase command where I send it the sector info. (On the STM32L4P5 it succeeds here easily and goes on to finish erasing, then to write code successfully.)
I've tried very long waits & repeat loops to wait for the ACK (many minutes). From past experience this should be fast, it is only the second stage where I tell it how much flash to erase that takes any significant time.
I've inspected the protection option areas of memory, at 0x1FFF0010, 0018, and they are unprotected, as per factory defaults.
I'm communicating over an FT231XS-R, using the D2XX driver calls. I can mess with the baud rates and such, but that only prevents it from autobauding...and we're doing that fine (9600/8/1/E). I've played with the D2XX SetTimeouts - if set too hasty that only screws up earlier commands. I'm wired to a 20 MHz crystal, and the application runs at 200 MHz, but my understanding is that the bootloader just runs at the internal RC clock rate.
I'm certainly missing something stupid, but I didn't see it in the documentation. Help?
Jeff Casey / Rockfield Research Inc. / Las Vegas, NV

Fixed, disregard.
The fineprint of AN3155 clued me in. On the description of the Write Unprotect command, it says that a system reset will be performed after completion. How did I miss this on the STM32L4P5? I just didn't read it. But why did it work then? In the really fine print next page, in a footnote to the flowchart, it says that they were just foolin'....system reset is only called for some (..list omitted..) and for other STM32 products no system reset is called for.
My earlier success had the following sequence:
reboot-supervisor
autobaud
get
gvrp
gid
wpun
xerase
wpun
write
verify
reboot-user
obviously that doesn't work for the F730. what works is:
reboot-super
autobaud
get
gvrp
gid
wpun
reboot-super
autobaud
get gvrp
gid
xerase
reboot-super
autobaud
get
gvrp
gid
write
verify
reboot-user
(obviously I can skip a few of the repeated steps, like get-id, but basically it needed a reboot and re-autobaud.)
note that i had to reboot-super a 3rd time...this was because the write attempt timed out after the xerase unless i went through the whole sequence again. funny, though, the spec doesn't say anything about resetting after an erase. i cross posted this question on the STM32 community site, and I'll do the same with this answer and ping them on this.
Thanks for reading, cheers. Jeff

what's the application of sequential transmission of I2C in HAL library in STM32f746ng

I can understand that you can use first frame option for first frame and next frame options for others, but since you can use them as FIRS_FRAME_LAST_FRAME, what is the advantage of other? and when we must use them?
Findings:
A code use wile to continuously transmit two number and get a callback to see if module has accepted that, if this happen correctly the led must blink.
In this simple code I've tested every xferoption of sequential transmission, every options worked except: I2C_LAST_FRAME_NO_STOP and I2C_FIRST_FRAME.
Code:
while (1)
{
value=300;
*(uint16_t*) buffer=(value<<8)|(value>>8);//Data prepared for DAC module
HAL_I2C_Master_Seq_Transmit_IT (&hi2c1, (MCP4725A0_ADDR_A00<<1), buffer, 2,I2C_LAST_FRAME_NO_STOP);
HAL_Delay(1);
HAL_I2C_Master_Receive(&hi2c1, (MCP4725A0_ADDR_A00<<1), rxbuffer, 3, 1000);
if( (uint16_t)(((uint16_t)rxbuffer[1])<<8|((uint16_t)rxbuffer[2]))>>4 == value ){
HAL_GPIO_WritePin(LED_GPIO_Port,LED_Pin,GPIO_PIN_SET);}
HAL_Delay(50);
value=4000;
*(uint16_t*) buffer=(value<<8)|(value>>8);
HAL_I2C_Master_Seq_Transmit_IT (&hi2c1, (MCP4725A0_ADDR_A00<<1), buffer, 2,I2C_LAST_FRAME_NO_STOP);
HAL_Delay(1);
HAL_I2C_Master_Receive(&hi2c1, (MCP4725A0_ADDR_A00<<1), rxbuffer, 3, 1000);
if( (uint16_t)(((uint16_t)rxbuffer[1])<<8|((uint16_t)rxbuffer[2]))>>4 == value ){
HAL_GPIO_WritePin(LED_GPIO_Port,LED_Pin,GPIO_PIN_RESET);}
HAL_Delay(50);
}

The HAL sometimes poorly documents these variables functions, and you will need to dive into the reference manual !
Looking at what the #defines are
https://github.com/STMicroelectronics/STM32CubeF7/blob/f8bda023e34ce9935cb4efb9d1c299860137b6f3/Drivers/STM32F7xx_HAL_Driver/Inc/stm32f7xx_hal_i2c.h#L302-L307
/** #defgroup I2C_XFEROPTIONS I2C Sequential Transfer Options
* #{
*/
#define I2C_FIRST_FRAME ((uint32_t)I2C_SOFTEND_MODE)
#define I2C_FIRST_AND_NEXT_FRAME ((uint32_t)(I2C_RELOAD_MODE | I2C_SOFTEND_MODE))
#define I2C_NEXT_FRAME ((uint32_t)(I2C_RELOAD_MODE | I2C_SOFTEND_MODE))
#define I2C_FIRST_AND_LAST_FRAME ((uint32_t)I2C_AUTOEND_MODE)
#define I2C_LAST_FRAME ((uint32_t)I2C_AUTOEND_MODE)
#define I2C_LAST_FRAME_NO_STOP ((uint32_t)I2C_SOFTEND_MODE)
We can see references to RELOAD and AUTOEND and SOFTEND.
Digging into the reference manual
https://www.st.com/resource/en/reference_manual/rm0385-stm32f75xxx-and-stm32f74xxx-advanced-armbased-32bit-mcus-stmicroelectronics.pdf#page=969
So we can see here the reference to
AUTOEND - as a way to automatically implement a STOP condition after the set bytes end
SOFTEND as a way to prevent the automatic STOP condition and require the software to decide.
Relationship to your observed behaviour
The define's using the SOFTEND mode is where you saw things not working, and this is to be expected, the I2C protocol was not being fulfilled as there was nothing in the code to indicate the STOP condition.
So what does this mean you can do - an example of a variable byte i2c slave receiver
I haven't found a shining example from ST for this, but let me illustrate an example I have implemented in a project for an I2C Slave.
Let us look at the callbacks that are called:
https://github.com/STMicroelectronics/STM32CubeF7/blob/master/Drivers/STM32F7xx_HAL_Driver/Src/stm32f7xx_hal_i2c.c#L76-L97
*** Interrupt mode IO operation ***
===================================
[..]
(+) Transmit in master mode an amount of data in non-blocking mode using HAL_I2C_Master_Transmit_IT()
(+) At transmission end of transfer, HAL_I2C_MasterTxCpltCallback() is executed and users can
add their own code by customization of function pointer HAL_I2C_MasterTxCpltCallback()
(+) Receive in master mode an amount of data in non-blocking mode using HAL_I2C_Master_Receive_IT()
(+) At reception end of transfer, HAL_I2C_MasterRxCpltCallback() is executed and users can
add their own code by customization of function pointer HAL_I2C_MasterRxCpltCallback()
(+) Transmit in slave mode an amount of data in non-blocking mode using HAL_I2C_Slave_Transmit_IT()
(+) At transmission end of transfer, HAL_I2C_SlaveTxCpltCallback() is executed and users can
add their own code by customization of function pointer HAL_I2C_SlaveTxCpltCallback()
(+) Receive in slave mode an amount of data in non-blocking mode using HAL_I2C_Slave_Receive_IT()
(+) At reception end of transfer, HAL_I2C_SlaveRxCpltCallback() is executed and users can
add their own code by customization of function pointer HAL_I2C_SlaveRxCpltCallback()
(+) In case of transfer Error, HAL_I2C_ErrorCallback() function is executed and users can
add their own code by customization of function pointer HAL_I2C_ErrorCallback()
(+) Abort a master I2C process communication with Interrupt using HAL_I2C_Master_Abort_IT()
(+) End of abort process, HAL_I2C_AbortCpltCallback() is executed and users can
add their own code by customization of function pointer HAL_I2C_AbortCpltCallback()
(+) Discard a slave I2C process communication using __HAL_I2C_GENERATE_NACK() macro.
This action will inform Master to generate a Stop condition to discard the communication.
Therefore, you could implement a I2C slave that could read a variable/dynamic amount of data:
Receive 1 byte - using the SOFTEND based options
This prevents the stop condition being raised, but once this first byte is received will trigger the HAL_I2C_SlaveRxCpltCallback().
In the HAL_I2C_SlaveRxCpltCallback() check the value of the first byte and then request more data of any further length, but this time using an AUTOEND based option.

A question about Interrupt cycle FGI and FGO flipflops conditions - Basic Computer Organization and design

Computer System Architecture - Morris Mano In chapter 5 section 7 figure 5-13
When IEN it checks whether "FGI" or "FGO" are set to 1 then an interrupt cycle happens, but as I know is when FGI = 1 it means that information in INPR cannot be changed, and FGO is the reverse to that which means that when FGO is set to 1 then information in AC will be transferred to OUTR 'OUTR can be changed' so the question here shouldn't the condition of applying interrupt cycle happen when "FGI" = 0 or "FGO" = 1 since INPR or OUTR can be changed under these conditions which now make since to execute an interrupt?

Either flag being 1 logically means a device is "ready", but what "ready" means differs for input and for output devices.  In either case, flag being 1 means that the processor can or should now take action.
FGI=1 means the input device is ready, but that really means a new input is available (e.g. the user typed a key on a keyboard) and the processor should accept it.  FGO=1 prevents the input device(s) from overwriting a prior input held in INPR that the processor hasn't accepted yet. When the processor accepts the input, FGI goes to 0 unlocking the INPR register, and that allows the input device to write again, which it will eventually do when the user presses another key (sending FGI back to 1 to signal the processor).
FGO=1 means ready for output, which really means the last output has been fully accepted by the device, so the OUTR register is unlocked for the processor to write a new data (character for the console).  FGO=0 prevents the processor from writing OUTR as the output device hasn't accepted the last one yet.
The interrupt service routine should check each flag, and if FGI=1 then accept an input (INPR->AC) and move it into a buffer for the user program to read when it is ready. Whereas if FGO=1, then move an output character from memory buffer into the AC, and then do AC->OUTR, also lowering FGO to 0, which will preclude the processor writing until that data has been accepted by the device.
So, FGI=0 means that the processor has accepted the prior INPR value provided by the input device, and there is no new character as yet but the register is unlocked so the device can write at will.
FGO=0 means that the processor has written a value to the OUTR register, but the output device hasn't accepted that yet, so the register should be considered locked.

What is the meaning of CANBUS function mode initilazing settings for STM32?

I want to understand meaning of the following function mode definition, there is explanation in the library. But I don't understand that because explanations are very short and not enough. I searched on the net I couldnt find any information about.
CAN_InitStructure.CAN_TTCM = DISABLE;
CAN_InitStructure.CAN_ABOM = DISABLE;
CAN_InitStructure.CAN_AWUM = DISABLE;
CAN_InitStructure.CAN_NART = ENABLE;
CAN_InitStructure.CAN_RFLM = DISABLE;
CAN_InitStructure.CAN_TXFP = ENABLE;

These are the names of the bits located in the CAN master control register (CAN_MCR). So, the proper source for their meaning is the reference manual. My following answer will be somewhat copy & paste from the reference manual, but I will try to explain these bits in detail.
TTCM (Time triggered communication mode): This bit activates the Time Triggered Communication (TTCAN) mode, which is an extension to the CAN standard. I don't know much about TTCAN, but as I understand, it assigns time windows to messages to satisfy some real-time requirements. So, normally this bit should remain 0.
ABOM (Automatic bus-off management): If the transmit error counter (TEC) becomes greater than 255, the CAN hardware switches to bus-off state. To recover, it must wait for the recovery sequence, 128 occurrences of 11 consecutive recessive bits. Only after that, the CAN hardware may return to the normal operating state. This bit controls the returning behavior. If it's 1, returning to normal state is automatic. Otherwise, software should make the request, provided that the recovery sequence has been observed.
AWUM (Automatic wakeup mode): The CAN module can be in one of 3 modes: Initialization mode, normal mode or sleep (low power) mode. Sleep mode is requested by the software. However, you have 2 options to exit sleep mode. If this bit is 0, then you have to exit sleep mode manually. You may enable CAN wakeup interrupt to inform you about bus activity, then exit the sleep mode in ISR. But if this bit is 1, the hardware returns to normal mode automatically when it detects bus activity.
NART (No automatic retransmission): Normally, CAN hardware retries to transmit a message if its previous attempts fail, because of arbitration lost etc. But if you make this bit 1, the transmitter does not retry. This is required when you use Time Triggered Communication (TTCAN). Otherwise, you should keep this bit 0.
RFLM (Receive FIFO locked mode): Your receive mailboxes have 3 levels depth, meaning that they can store maximum 3 messages before they are overrun. This bit controls what happens in case of mailbox overrun. Default behavior is to keep the oldest 2 messages and the newest one. For example, if you received 5 messages, the buffer keeps the messages 1, 2 & 5. However, if you make this bit 1, the mailbox keeps the messages 1, 2 & 3 and discards the new arrivals.
TXFP (Transmit FIFO priority): You have 3 transmit mailboxes. When you fill more than one, the hardware must decide which one to transmit first. Normally, one can assume that a message with a lower ID number is more important and should be transmitted first. But if you want to transfer them in a first-comes-first-served fashion for some reason, you need to make this bit 1. Of course, this is just a local priority. On the physical bus, the messages with lower ID always have priority.

STM32L073RZ (rev Z) IAP jump to bootloader (system memory)

I use the STM32L073RZ (Nucleo 64 board).
I would like to jump into the system memory in application programming (IAP).
My code works on the revision B of the STM32L073 microcontroller but fails on the latest revision, rev Z.
I read the errata sheet, no details are given, just a limitation fixed on the dual boot mechanism into system memory according to the BFB2 bit.
Is the system memory no longer supports an IAP jumping to execute its code (to flash firmwares through USB or UART without using the BOOT0 pin) ?
The function is the first line of my main program, it tests if the code has to jump to the booloader:
void jumpBootLoader(void)
{
/* to do jump? */
if ( *((unsigned long *)0x20003FF0) == 0xDEADBEEF )
{
/* erase the label */
*((unsigned long *)0x20003FF0) = 0xCAFEFEED;
/* set stack pointer to the bootloader start address */
__set_MSP(*((uint32_t*)(0x1FF00000)));
/* system memory mapped at 0x00000000 */
__HAL_SYSCFG_REMAPMEMORY_SYSTEMFLASH();
/* jump to #bootloader + 4 */
((void (*)(void))(*((uint32_t*)(0x1FF00004))))();
}
}
I call these two lines as soon as the BP1 button is pressed to trig the jump operation after resetting the µC:
*((unsigned long *)0x20003FF0) = 0xDEADBEEF;
NVIC_SystemReset();
I use the HSI 16Mhz clock source.

The solution is to jump twice to the system memory.
First Jump to bootloader startup to initialize Data in RAM until the Program counter will returned to Flash by the Dualbank management.
Second Jump: Jump to the Dualbank bypassed address
How to use: User has first to initialize a variable “ Data_Address” (must be an offset Flash sector aligned address) in Flash to distinguish between first/second Jump.
EraseInitStruct.TypeErase = FLASH_TYPEERASE_PAGES;
EraseInitStruct.PageAddress = Data_Address;
EraseInitStruct.NbPages = 1;
First_jump = *(__IO uint32_t *)(Data_Address);
if (First_jump == 0) {
HAL_FLASH_Unlock();
HAL_FLASH_Program(FLASH_TYPEPROGRAM_WORD, Data_Address, 0xAAAAAAAA);
HAL_FLASH_Lock();
/* Reinitialize the Stack pointer and jump to application address */
JumpAddress = *(__IO uint32_t *)(0x1FF00004);
}
if (First_jump != 0) {
HAL_FLASH_Unlock();
HAL_FLASHEx_Erase(&EraseInitStruct, &PAGEError);
HAL_FLASH_Lock();
/* Reinitialize the Stack pointer and jump to application address */
JumpAddress = (0x1FF00369);
}
Jump_To_Application = (pFunction) JumpAddress;
__set_MSP(*(__IO uint32_t *)(0x1FF00000));
Jump_To_Application();

First important thing: you use 0x1FF0 0000 as the addres where SP is stored, this is correct. Then you use 0x1 FF00 0004 as the address from which you load the function pointer. This is not correct - one zero too many.
Note that using __set_MSP() is generally not such a good idea if you also use MSP as your stack pointer (which you most likely are). The recent definition of this function, which marks "sp" as clobbered register, causes your change to be reverted almost immediately. Incidentally today I was doing exactly the same thing you are doing and I've found that problem. In your assembly listing you'll see that SP is saved into some other register before the msr msp, ... instruction and restored right after that.
Finally I wrote that manually (STM32F4, so different addresses):
constexpr uint32_t systemMemoryBase {0x1fff0000};
asm volatile
(
" msr msp, %[sp] \n"
" bx %[pc] \n"
:: [sp] "r" (*reinterpret_cast<const uint32_t*>(systemMemoryBase)),
[pc] "r" (*reinterpret_cast<const uint32_t*>(systemMemoryBase + 4))
);
BTW - you don't need to set memory remap for the bootloader to work.

Thanks for your help. I have my answer !
The v4.0 bootloader (initial version) does not implement the dual bank mechanism but this feature is supported by v4.1.
Software can jump to bootloader but it will execute the dual boot mechanism.
So the bootloader goes back to bank1 (or bank2 if a code is "valid").
Today it is not possible to bypass the dual bank mechanism to execute the bootloader with my configuration:
The boot0 pin is reset and the protection level is 0 (see "Table 11. Boot pin and BFB2 bit configuration" in the reference manual).

Where is your program counter when you call __HAL_SYSCFG_REMAPMEMORY_SYSTEMFLASH()?
Remapping a memory region while you're executing out of that same region will end poorly! You may need to relocate this code into SRAM, or execute this code with PC set to the fixed FLASH memory mapping (0x0800xxxx).

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse