FreeRTOS task priority and stack size - stm32

I have STM32F746ZG Nucleo-144pin board and generated the codes using STMCubeMx.
I chose the FreeRTOS which is version 10.0.0 offered by CubeMx and the toolchain is SW4STM32.
I made two tasks and the following is my function.
My code here:
void led1_task(void)
{
while(1)
{
HAL_GPIO_TogglePin(GPIOB, GPIO_PIN_7);
HAL_Delay(1000);
}
}
void led2_task(void)
{
while(1)
{
HAL_GPIO_TogglePin(GPIOB, GPIO_PIN_14);
HAL_Delay(4100);
}
}
Task priority.
I found that if two tasks have the same task priority, the two tasks work fine, but if they have different priority of task, the low task does not work.
xTaskCreate(led1_task, "led1_task", 1024, NULL, 2, NULL); ==> Works fine.
xTaskCreate(led2_task, "led2_task", 1024, NULL, 2, NULL); ==> Works fine.
----------------------------------------------------------------------------
xTaskCreate(led1_task, "led1_task", 1024, NULL, 2, NULL); ==> This task is not working.
xTaskCreate(led2_task, "led2_task", 1024, NULL, 3, NULL); ==> Works fine.
Task stack size.
If the stack size of the two tasks combined to be greater than 3 KB, it was confirmed that the task was not working correctly.
The code below works correctly.
xTaskCreate(led1_task, "led1_task", 2048, NULL, 2, NULL); ==> Works fine.
xTaskCreate(led2_task, "led2_task", 1024, NULL, 2, NULL); ==> Works fine.
However, the second task does not work if the stack size is changed as follows.
xTaskCreate(led1_task, "led1_task", 2048, NULL, 2, NULL); ==> Works fine.
xTaskCreate(led2_task, "led2_task", 2048, NULL, 2, NULL); ==> This task is not working.
Attempting to change the _Min_Stack_Size from 0x400 to 0x4000 in STM32F746ZGTx_FLASH.ld has the same problem.
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x200; /* required account of heap */
_Min_Stack_Size = 0x4000; /* required account of stack */
Can anyone explain the reason for this?

To answer your questions:
Task priority
Your lower priority task doesn't work because you use HAL_Delay. This function performs "active" blocking, i.e. the task that calls this function will keep on checking the internal tick counter until the condition is met. In other words - it doesn't block this task in RTOS sense. You should use vTaskDelay instead of HAL_Delay.
Task stack size
There are a few points to note here.
a. The stack depth given to xTaskCreate is given in words, not bytes. In your example, the combined size of task stacks is `(2048 + 1024) * sizeof(uint32_t)' bytes. It's a lot in your case, much too much for what you do there currently.
b. Without debugging it's hard to tell with certainty why your second task isn't working, but it's highly likely that the second task doesn't get created at all because you hit some limit, e.g. going past RTOS heap size. It depends which FreeRTOS memory management implementation you use (heap_1, heap_2 etc.). You likely use one that depends on the configTOTAL_HEAP_SIZE - this is the definition that you should check and increase respectively.
c. _Min_Heap_Size and _Min_Stack_Size have nothing to do with FreeRTOS (unless you use heap_3 which uses malloc internally). These correspond to heap and stack outside of RTOS.

For your 2nd question:
Note that xTaskCreate uses FreeRTOS heap, so increase configTOTAL_HEAP_SIZE in FreeRTOS.h file.
Important note: configTOTAL_HEAP_SIZE is in BYTEs while xTaskCreate gets the stack size in WORDs (1 Word equals 4 Bytes).

Related

arm64 ptrace SINGLESTEP: are the steps described in this paper correct?

I was reading the paper Hiding in the Shadows: Empowering ARM for Stealthy Virtual Machine Introspection and I was wondering whether the steps they are described in paragraph "2.3 Debug Exceptions" were correct or not:
AArch64 allows to generate Software Step exceptions by setting the SS
bit of the Monitor Debug System Control MDSCR_EL1 and Saved Program
Status Register SPSR of the target exception level. For instance, to
single-step a hit breakpoint in EL1 the monitor must set the
MDSCR_EL1.SS and SPSR_EL1.SS bits. After returning to the trapped
instruction, the SPSR will be written to the process state PSTATE
register in EL1. Consequently, the CPU executes the next instruction
and generates a Software Step exception.
I have tried to understand how single-stepping happens in freeBSD, and I am noticing a mismatch.
I am basing the following lines of code to the release 12.3.0 of freeBSD (4 December 2021), commit: 70cb68e7a00ac0310a2d0ca428c1d5018e6d39e1. I chose to base this question on freeBSD because, in my opinion, following its code is easier than Linux, but the same principles shall be common to both families.
According to my understanding, this is what happens in freeBSD:
1- Ptrace single step is invoked, arriving in the architecture-independent code proc_sstep(), in sys_process.c:
int proc_sstep(struct thread *td)
{
PROC_ACTION(ptrace_single_step(td));
}
2- Architecture-dependent code ptrace_single_step()is called, in arm64/ptrace_machdep.c:
int ptrace_single_step(struct thread *td)
{
td->td_frame->tf_spsr |= PSR_SS;
td->td_pcb->pcb_flags |= PCB_SINGLE_STEP;
return (0);
}
Here single step bit (number 21) is set in the "Process State" of the tracee (tracee = thread that is traced) and a flag is set.
3- After a while, the traced task will be selected for scheduling. In cpu_throw() of swtch.S (where the new thread takes place), the flags of the new thread are checked, to see if it must single step:
/* If we are single stepping, enable it */
ldr w5, [x4, #PCB_FLAGS]
set_step_flag w5, x6
4- set_step_flag macro in defined in the same swtch.S:
.macro set_step_flag pcbflags, tmp
tbz \pcbflags, #PCB_SINGLE_STEP_SHIFT, 999f
mrs \tmp, mdscr_el1
orr \tmp, \tmp, #1
msr mdscr_el1, \tmp
isb
999:
.endm
Here, if the single-step flag is set, it sets the single step bit of register MDSCR_EL1 (bit in position 0).
4- To the best of my understanding, the combination of single step bit on SPSR_EL1 of the "Pstate" + single step bit on MDSCRL_EL1 implies that the tracee execute 1 instruction and it traps.
5- Trap is recognized as a EXCP_SOFTSTP_EL0 and it is handled in do_el0_sync() function of trap.c:
case EXCP_SOFTSTP_EL0:
td->td_frame->tf_spsr &= ~PSR_SS;
td->td_pcb->pcb_flags &= ~PCB_SINGLE_STEP;
WRITE_SPECIALREG(mdscr_el1,
READ_SPECIALREG(mdscr_el1) & ~DBG_MDSCR_SS);
call_trapsignal(td, SIGTRAP, TRAP_TRACE,
(void *)frame->tf_elr, exception);
userret(td, frame);
break;
Here, all the flags are reset and the traced thread receives a SIGTRAP (sent by itself, I think). Being traced, it will stop. And the tracer, at this point, can return from a possible waitpid().
What I could observe differs from the paper explanation. Can you check and correct the steps that I listed, please ?

How to minimize latency when reading audio with ALSA?

When trying to acquire some signals in the frequency domain, I've encountered the issue of having snd_pcm_readi() take a wildly variable amount of time. This causes problems in the logic section of my code, which is time dependent.
I have that most of the time, snd_pcm_readi() returns after approximately 0.00003 to 0.00006 seconds. However, every 4-5 call to snd_pcm_readi() requires approximately 0.028 seconds. This is a huge difference, and causes the logic part of my code to fail.
How can I get a consistent time for each call to snd_pcm_readi()?
I've tried to experiment with the period size, but it is unclear to me what exactly it does even after re-reading the documentation multiple times. I don't use an interrupt driven design, I simply call snd_pcm_readi() and it blocks until it returns -- with data.
I can only assume that the reason it blocks for a variable amount of time, is that snd_pcm_readi() pulls data from the hardware buffer, which happens to already have data readily available for transfer to the "application buffer" (which I'm maintaining). However, sometimes, there is additional work to do in kernel space or on the hardware side, hence the function call takes longer to return in these cases.
What purpose does the "period size" serve when I'm not using an interrupt driven design? Can my problem be fixed at all by manipulation of the period size, or should I do something else?
I want to achieve that each call to snd_pcm_readi() takes approximately the same amount of time. I'm not asking for a real time compliant API, which I don't imagine ALSA even attempts to be, however, seeing a difference in function call time on the order of being 500 times longer (which is what I'm seeing!) then this is a real problem.
What can be done about it, and what should I do about it?
I would present a minimal reproducible example, but this isn't easy in my case.
Typically when reading and writing audio, the period size specifies how much data ALSA has reserved in DMA silicon. Normally the period size specifies your latency. So for example while you are filling a buffer for writing through DMA to the I2S silicon, one DMA buffer is already being written out.
If you have your period size too small, then the CPU doesn't have time to write audio out in the scheduled execution slot provided. Typically people aim for a minimum of 500 us or 1 ms in latency. If you are doing heavy forms of computation, then you may want to choose 5 ms or 10 ms of latency. You may choose even more latency if you are on a non-powerful embedded system.
If you want to push the limit of the system, then you can request the priority of the audio processing thread be increased. By increasing the priority of your thread, you ask the scheduler to process your audio thread before all other threads with lower priority.
One method for increasing priority taken from the gtkIOStream ALSA C++ OO classes is like so (taken from the changeThreadPriority method) :
/** Set the current thread's priority
\param priority <0 implies maximum priority, otherwise must be between sched_get_priority_max and sched_get_priority_min
\return 0 on success, error code otherwise
*/
static int changeThreadPriority(int priority){
int ret;
pthread_t thisThread = pthread_self(); // get the current thread
struct sched_param origParams, params;
int origPolicy, policy = SCHED_FIFO, newPolicy=0;
if ((ret = pthread_getschedparam(thisThread, &origPolicy, &origParams))!=0)
return ALSA::ALSADebug().evaluateError(ret, "when trying to pthread_getschedparam\n");
printf("ALSA::Stream::changeThreadPriority : Current thread policy %d and priority %d\n", origPolicy, origParams.sched_priority);
if (priority<0) //maximum priority
params.sched_priority = sched_get_priority_max(policy);
else
params.sched_priority = priority;
if (params.sched_priority>sched_get_priority_max(policy))
return ALSA::ALSADebug().evaluateError(ALSA_SCHED_PRIORITY_ERROR, "requested priority is too high\n");
if (params.sched_priority<sched_get_priority_min(policy))
return ALSA::ALSADebug().evaluateError(ALSA_SCHED_PRIORITY_ERROR, "requested priority is too low\n");
if ((ret = pthread_setschedparam(thisThread, policy, &params))!=0)
return ALSA::ALSADebug().evaluateError(ret, "when trying to pthread_setschedparam - are you su or do you have permission to set this priority?\n");
if ((ret = pthread_getschedparam(thisThread, &newPolicy, &params))!=0)
return ALSA::ALSADebug().evaluateError(ret, "when trying to pthread_getschedparam\n");
if(policy != newPolicy)
return ALSA::ALSADebug().evaluateError(ALSA_SCHED_POLICY_ERROR, "requested scheduler policy is not correctly set\n");
printf("ALSA::Stream::changeThreadPriority : New thread priority changed to %d\n", params.sched_priority);
return 0;
}

What is the meaning of CANBUS function mode initilazing settings for STM32?

I want to understand meaning of the following function mode definition, there is explanation in the library. But I don't understand that because explanations are very short and not enough. I searched on the net I couldnt find any information about.
CAN_InitStructure.CAN_TTCM = DISABLE;
CAN_InitStructure.CAN_ABOM = DISABLE;
CAN_InitStructure.CAN_AWUM = DISABLE;
CAN_InitStructure.CAN_NART = ENABLE;
CAN_InitStructure.CAN_RFLM = DISABLE;
CAN_InitStructure.CAN_TXFP = ENABLE;
These are the names of the bits located in the CAN master control register (CAN_MCR). So, the proper source for their meaning is the reference manual. My following answer will be somewhat copy & paste from the reference manual, but I will try to explain these bits in detail.
TTCM (Time triggered communication mode): This bit activates the Time Triggered Communication (TTCAN) mode, which is an extension to the CAN standard. I don't know much about TTCAN, but as I understand, it assigns time windows to messages to satisfy some real-time requirements. So, normally this bit should remain 0.
ABOM (Automatic bus-off management): If the transmit error counter (TEC) becomes greater than 255, the CAN hardware switches to bus-off state. To recover, it must wait for the recovery sequence, 128 occurrences of 11 consecutive recessive bits. Only after that, the CAN hardware may return to the normal operating state. This bit controls the returning behavior. If it's 1, returning to normal state is automatic. Otherwise, software should make the request, provided that the recovery sequence has been observed.
AWUM (Automatic wakeup mode): The CAN module can be in one of 3 modes: Initialization mode, normal mode or sleep (low power) mode. Sleep mode is requested by the software. However, you have 2 options to exit sleep mode. If this bit is 0, then you have to exit sleep mode manually. You may enable CAN wakeup interrupt to inform you about bus activity, then exit the sleep mode in ISR. But if this bit is 1, the hardware returns to normal mode automatically when it detects bus activity.
NART (No automatic retransmission): Normally, CAN hardware retries to transmit a message if its previous attempts fail, because of arbitration lost etc. But if you make this bit 1, the transmitter does not retry. This is required when you use Time Triggered Communication (TTCAN). Otherwise, you should keep this bit 0.
RFLM (Receive FIFO locked mode): Your receive mailboxes have 3 levels depth, meaning that they can store maximum 3 messages before they are overrun. This bit controls what happens in case of mailbox overrun. Default behavior is to keep the oldest 2 messages and the newest one. For example, if you received 5 messages, the buffer keeps the messages 1, 2 & 5. However, if you make this bit 1, the mailbox keeps the messages 1, 2 & 3 and discards the new arrivals.
TXFP (Transmit FIFO priority): You have 3 transmit mailboxes. When you fill more than one, the hardware must decide which one to transmit first. Normally, one can assume that a message with a lower ID number is more important and should be transmitted first. But if you want to transfer them in a first-comes-first-served fashion for some reason, you need to make this bit 1. Of course, this is just a local priority. On the physical bus, the messages with lower ID always have priority.

STM32L073RZ (rev Z) IAP jump to bootloader (system memory)

I use the STM32L073RZ (Nucleo 64 board).
I would like to jump into the system memory in application programming (IAP).
My code works on the revision B of the STM32L073 microcontroller but fails on the latest revision, rev Z.
I read the errata sheet, no details are given, just a limitation fixed on the dual boot mechanism into system memory according to the BFB2 bit.
Is the system memory no longer supports an IAP jumping to execute its code (to flash firmwares through USB or UART without using the BOOT0 pin) ?
The function is the first line of my main program, it tests if the code has to jump to the booloader:
void jumpBootLoader(void)
{
/* to do jump? */
if ( *((unsigned long *)0x20003FF0) == 0xDEADBEEF )
{
/* erase the label */
*((unsigned long *)0x20003FF0) = 0xCAFEFEED;
/* set stack pointer to the bootloader start address */
__set_MSP(*((uint32_t*)(0x1FF00000)));
/* system memory mapped at 0x00000000 */
__HAL_SYSCFG_REMAPMEMORY_SYSTEMFLASH();
/* jump to #bootloader + 4 */
((void (*)(void))(*((uint32_t*)(0x1FF00004))))();
}
}
I call these two lines as soon as the BP1 button is pressed to trig the jump operation after resetting the µC:
*((unsigned long *)0x20003FF0) = 0xDEADBEEF;
NVIC_SystemReset();
I use the HSI 16Mhz clock source.
The solution is to jump twice to the system memory.
First Jump to bootloader startup to initialize Data in RAM until the Program counter will returned to Flash by the Dualbank management.
Second Jump: Jump to the Dualbank bypassed address
How to use: User has first to initialize a variable “ Data_Address” (must be an offset Flash sector aligned address) in Flash to distinguish between first/second Jump.
EraseInitStruct.TypeErase = FLASH_TYPEERASE_PAGES;
EraseInitStruct.PageAddress = Data_Address;
EraseInitStruct.NbPages = 1;
First_jump = *(__IO uint32_t *)(Data_Address);
if (First_jump == 0) {
HAL_FLASH_Unlock();
HAL_FLASH_Program(FLASH_TYPEPROGRAM_WORD, Data_Address, 0xAAAAAAAA);
HAL_FLASH_Lock();
/* Reinitialize the Stack pointer and jump to application address */
JumpAddress = *(__IO uint32_t *)(0x1FF00004);
}
if (First_jump != 0) {
HAL_FLASH_Unlock();
HAL_FLASHEx_Erase(&EraseInitStruct, &PAGEError);
HAL_FLASH_Lock();
/* Reinitialize the Stack pointer and jump to application address */
JumpAddress = (0x1FF00369);
}
Jump_To_Application = (pFunction) JumpAddress;
__set_MSP(*(__IO uint32_t *)(0x1FF00000));
Jump_To_Application();
First important thing: you use 0x1FF0 0000 as the addres where SP is stored, this is correct. Then you use 0x1 FF00 0004 as the address from which you load the function pointer. This is not correct - one zero too many.
Note that using __set_MSP() is generally not such a good idea if you also use MSP as your stack pointer (which you most likely are). The recent definition of this function, which marks "sp" as clobbered register, causes your change to be reverted almost immediately. Incidentally today I was doing exactly the same thing you are doing and I've found that problem. In your assembly listing you'll see that SP is saved into some other register before the msr msp, ... instruction and restored right after that.
Finally I wrote that manually (STM32F4, so different addresses):
constexpr uint32_t systemMemoryBase {0x1fff0000};
asm volatile
(
" msr msp, %[sp] \n"
" bx %[pc] \n"
:: [sp] "r" (*reinterpret_cast<const uint32_t*>(systemMemoryBase)),
[pc] "r" (*reinterpret_cast<const uint32_t*>(systemMemoryBase + 4))
);
BTW - you don't need to set memory remap for the bootloader to work.
Thanks for your help. I have my answer !
The v4.0 bootloader (initial version) does not implement the dual bank mechanism but this feature is supported by v4.1.
Software can jump to bootloader but it will execute the dual boot mechanism.
So the bootloader goes back to bank1 (or bank2 if a code is "valid").
Today it is not possible to bypass the dual bank mechanism to execute the bootloader with my configuration:
The boot0 pin is reset and the protection level is 0 (see "Table 11. Boot pin and BFB2 bit configuration" in the reference manual).
Where is your program counter when you call __HAL_SYSCFG_REMAPMEMORY_SYSTEMFLASH()?
Remapping a memory region while you're executing out of that same region will end poorly! You may need to relocate this code into SRAM, or execute this code with PC set to the fixed FLASH memory mapping (0x0800xxxx).

Very few write cycles in stm32f4

I'm using a STM32F401VCT6U "discovery" board, and I need to provide a way for the user to write addresses in memory at runtime.
I wrote what can be simplified to the following function:
uint8_t Write(uint32_t address, uint8_t* values, uint8_t count)
{
uint8_t index;
for (index = 0; index < count; ++index) {
if (IS_FLASH_ADDRESS(address+index)) {
/* flash write */
FLASH_Unlock();
if (FLASH_ProgramByte(address+index, values[index]) != FLASH_COMPLETE) {
return FLASH_ERROR;
}
FLASH_Lock();
} else {
/* ram write */
((uint8_t*)address)[index] = values[index]
}
}
return NO_ERROR;
}
In the above, address is the base address, values is a buffer of size at least count which contains the bytes to write to memory and count the number of bytes to write.
Now, my problem is the following: when the above function is called with a base address in flash and count=100, it works normally the first few times, writing the passed values buffer to flash. After those first few calls however, I cannot write just any value anymore: I can only reset bits in the values in flash, eg an attempt to write 0xFF to 0x7F will leave 0x7F in the flash, while writing 0xFE to 0x7F will leave 0x7E, and 0x00 to any value will be successful (but no other value will be writable to the address afterwards).
I can still write normally to other addresses in the flash by changing the base address, but again only a few times (two or three calls with count=100).
This behaviour suggests that the maximum write count of the flash has been reached, but I cannot imagine it can be so fast. I'd expect at the very least 10,000 writes before exhaustion.
So what am I doing wrong?
You have missunderstood how flash works - it is not for example as straight forward as writing EEPROM. The behaviour you are discribing is normal for flash.
To repeatidly write the same address of flash the whole sector must be first erased using FLASH_EraseSector. Generally any data that needs to preserved during this erase needs to be either buffered in RAM or in another flash sector.
If you are repeatidly writing a small block of data and are worried about flash burnout do to many erase write cycles you would want to write an interface to the flash where each write you move your data along the flash sector to unwriten flash, keeping track of its current offset from the start of sector. Only then when you run out of bytes in the sector would you need to erase and start again at start of sector.
ST's "right way" is detailed in AN3969: EEPROM emulation in STM32F40x/STM32F41x microcontrollers
This is more or less the process:
Reserve two Flash pages
Write the latest data to the next available location along with its 'EEPROM address'
When you run out of room on the first page, write all of the latest values to the second page and erase the first
Begin writing values where you left off on page 2
When you run out of room on page 2, repeat on page 1
This is insane, but I didn't come up with it.
I have a working and tested solution, but it is rather different from #Ricibob's answer, so I decided to make this an answer.
Since my user can write anywhere in select flash sector, my application cannot handle the responsability of erasing the sector when needed while buffering to RAM only the data that need to be preserved.
As a result, I transferred to my user the responsability of erasing the sector when a write to it doesn't work (this way, the user remains free to use another address in the sector to avoid too many write-erase cycles).
Solution
Basically, I expose a write(uint32_t startAddress, uint8_t count, uint8_t* values) function that has a WRITE_SUCCESSFUL return code and a CANNOT_WRITE_FLASH in case of failure.
I also provide my user with a getSector(uint32_t address) function that returns the id, start address and end address of the sector corresponding to the address passed as a parameter. This way, the user knows what range of address is affected by the erase operation.
Lastly, I expose an eraseSector(uint8_t sectorID) function that erase the flash sector whose id has been passed as a parameter.
Erase Policy
The policy for a failed write is different from #Ricibob's suggestion of "erase if the value in flash is different of FF", as it is documented in the Flash programming manual that a write will succeed as long as it is only bitreset (which matches the behavior I observed in the question):
Note: Successive write operations are possible without the need of an erase operation when
changing bits from ‘1’ to ‘0’.
Writing ‘1’ requires a Flash memory erase operation.
If an erase and a program operation are requested simultaneously, the erase operation is
performed first.
So I use the macro CAN_WRITE(a,b), where a is the original value in flash and b the desired value. The macro is defined as:
!(~a & b)
which works because:
the logical not (!) will transform 0 to true and everything else to false, so ~a & b must equal 0 for the macro to be true;
any bit at 1 in a is at 0 in ~a, so it will be 0 whatever its value in b is (you can transform a 1 in 1 or 0);
if a bit is 0 in a, then it is 1 in ~a, if b equals 1 then ~a & b != 0 and we cannot write, if bequals 0 it's OK (you can transform a 0 to 0 only, not to 1).
List of flash sector in STM32F4
Lastly and for future reference (as it is not that easy to find), the list of sectors of flash in STM32 can be found on page 7 of the Flash programming manual.