Erasing STM32 Flash Sector is hanging program - stm32

I am using FreeRTOS, and in one of the tasks I erase a sector of the flash using the following code:
HAL_FLASH_Unlock();
// Fill EraseInit structure
static FLASH_EraseInitTypeDef EraseInitStruct;
EraseInitStruct.TypeErase = FLASH_TYPEERASE_SECTORS;
EraseInitStruct.VoltageRange = FLASH_VOLTAGE_RANGE_3;
EraseInitStruct.Sector = sector;
EraseInitStruct.NbSectors = numOfSectorsToErase;
HAL_FLASHEx_Erase_IT(&EraseInitStruct);
HAL_FLASH_Lock();
I thought this was a non-blocking call to erase the sector. However, when it is called, all other threads seem to be preempted for 100 ms (as seen on an oscilloscope) until the erase completes. I must be doing something wrong, because I am using the interrupt version of the erase. It shouldn't hang everything like this, correct?
(I am sure that the sector I am erasing, sector 6, is not one where the program code resides.)

Documentation is clear:
Always read the documentation, not internet forums.

So it would seem that #Hs2 is correct. On further research, erasing a sector of the flash will block execution, as stated here: https://community.st.com/s/question/0D50X00009XkXwuSAF/how-to-save-a-variable-in-nonvolatile-memory
which says "flashing blocks code execution".
This brings up even more questions, though: why in the world would the engineers at ST include an interrupt version of a sector erase when the call is going to block either way? It's very misleading. There seems to be no use case for it.

Related

STM32 - Can I subvert the cool context switching for interrupts?

The STM32 family has fantastic interrupt service, they stack a whole slew of extra registers for you, and load the LR with an artificial return to properly unstack while looking for opportunities for tail chaining, aborted entry, etc etc.
HOWEVER....it is too damn slow. I am finding (STM32F730Z8, 200 MHz clock, all code including handlers in ITCM, everything in GNU assembly) that it takes about 120-150 ns overhead to get into an interrupt.
I am still learning about these, used to the old ARM7 where you had to do it all yourself, however, in those chips, if you had a minimal handler you didn't need to stack much.
So -- can I "subvert" the context switching in hardware, and just have it leap to the handler at elevated priority, pausing only to fill the pipeline, and leaving me to take care of stacking what is needed? I don't think so, and haven't seen a way to do it, but I'm working on extremely tight, time-sensitive realtime code, and interrupt switching is eating all my time budget. I'm reverting to doing it all in low-level polled code, but I hate the jitter that gives me on response to pin edges. Help?
No, this is done in pure hardware and is the main defining feature of all "Cortex-M" processors, not just STM32.
150ns at 200MHz is 30 cycles. You can probably get it quite a bit faster.
One way is to mark the floating point unit as unused each time you finish with it, and to set a core flag to tell it not to save the floating point registers. See ARM application note 298 for details.
Another method that you might try is to move your vector table and interrupt handler code to internal SRAM. STM32 has a flash memory accelerator which avoids most wait states on internal flash by performing prefetch of sequential instructions, but an asynchronous interrupt will probably not benefit from this.
"it takes about 120-150 ns overhead to get into an interrupt"
That is not accurate.
Interrupt entry takes 12 clock cycles, which is 60 ns at 200 MHz.
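The cycle figures quoted in the answers above follow from simple clock arithmetic. A minimal sketch for checking them; the helper names `ns_to_cycles` and `cycles_to_ns` are made up for illustration and are not part of any STM32 or CMSIS library:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many core clock cycles fit in `ns` nanoseconds. */
uint32_t ns_to_cycles(uint32_t ns, uint32_t clock_hz)
{
    /* 64-bit intermediate: ns * clock_hz easily exceeds 32 bits. */
    return (uint32_t)(((uint64_t)ns * clock_hz) / 1000000000ULL);
}

/* Hypothetical helper: how long `cycles` core clock cycles take, in ns. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz > 0);   /* avoid division by zero */
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / clock_hz);
}
```

With a 200 MHz clock, `ns_to_cycles(150, 200000000u)` gives the 30 cycles mentioned in the first answer, and `cycles_to_ns(12, 200000000u)` gives the 60 ns mentioned in the second.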

Is there a process to remap/switch CANTX/CANRX pins STM32H7 FDCAN module?

Testing out an STM32 with an FDCAN module (updated from the older BxCAN peripheral). CAN Classic at 500kbps.
I am running into an issue that when using the default pair of pins (D0/D1 in my case) I get expected behavior, but when switching the pins to the secondary option (B8/B9) using GPIO remapping, I get strange output on the bus.
I tried baud settings and options like protocol exception. Nothing seems to explain where this scope output is coming from.
I'm using the HAL to get this working, so I'm certain I'm not missing any registers on remapping. I've de-initialized and re-initialized the FDCAN module, started/stopped it, etc. There seems to be no documented "process" for remapping pins. The entire FDCAN section of the reference manual doesn't contain the letters GPIO.
Picture: Yellow is the CANTX 0-3V signal (low is dominant). Purple is the CAN+ signal that idles at 2.5V and pulls past ~3.5V on a dominant. There is nothing else on this line, so I'm not concerned about the sawtoothing. The large initial CAN "SOF" pulse is wrong for timing. The long recessives are nonsense. Then the small value 1 bits are the correct 2 µs pulses for 500kbps. Changing the data put into the FDCAN FIFO makes no difference; the output is always the same.
Solved.
After sending this message, the INIT bit was set in the FDCAN->CCCR register and there were values in the error counters, which indicates an internal error. I was using the HAL as a time saver, but it was overwriting my desired GPIO settings.
I would set the pins B8/B9 to AF mode for FDCAN, then call FDCAN_DeInit/Init, which via an MSP_INIT callback also calls GPIO Init, but for the original D0/D1 pins. This meant that the B8/B9 pins I had set and the D0/D1 pins were enabled at the same time.
This is an obvious problem. The HAL is fine for prototyping, but be careful because it will try to "help". It's undefined behavior at best, and I normally wouldn't even post such a dumb mistake.
However... Maybe someone else finds it interesting that whatever the FDCAN state machine is doing here, makes this unique output seen in the scope picture. I initially didn't double check my pin setup, because it looked right, I was getting output on the scope, just the wrong output. I spent much more time going over peripheral settings and data I was feeding to it.

How can I determine interrupt source on an stm32?

I recently ended up in the Default_Handler in my stm32 project and couldn't figure out what was causing it:
.section .text.Default_Handler,"ax",%progbits
Default_Handler:
Infinite_Loop:
  b Infinite_Loop   /* <--- here! */
By default, a lot of interrupts are mapped to the default handler and the only way I could figure out what the actual interrupt reason was, would be to write handlers for all the interrupts (60+) and pause the code in the debugger. Bah!
I didn't find a good answer googling, so I thought I'd document the solution for others (or most likely myself in 6 months...)
So, it turns out there are some registers in the NVIC (interrupt controller) that we can use.
In the STM32CubeIDE debugger, the NVIC_IABRx registers contain a bitmask of the currently active interrupts, and I can see that NVIC_IABR1 has a non-zero bit (its value is 0x1000).
Each IABR register is 32 bits wide and 0x1000 is bit 12, so with some simple bit counting I conclude that the interrupt source is 32+12 = 44. Now I need to look at the datasheet for my MCU (an stm32wb55) to see what that corresponds to.
Aha, so it's the IPCC that's causing the interrupt! To double check, I added a handler for this specific interrupt
void IPCC_C1_RX_IRQHandler(void)
{
}
And it got called!
Note: I initially just looked at the interrupt vector table in the startup_stm32xxx.s file and counted from the start of it, but that didn't work out because the first entries in that table (the initial stack pointer and the core system exceptions) are not included in the NVIC_IABRx registers.
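The bit counting described above can be written as a small helper. This is only a sketch: `first_active_irq` and `vector_table_index` are hypothetical names, and the function works on a plain copy of the NVIC_IABRx values (for example, as read in the debugger) rather than touching hardware. The offset of 16 assumes the standard Cortex-M vector table layout, where entry 0 is the initial stack pointer and entries 1 to 15 are system exceptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the lowest IRQ number whose active bit is set in a copy of the
 * NVIC_IABRx registers, or -1 if no bit is set. */
int first_active_irq(const uint32_t *iabr, size_t count)
{
    for (size_t reg = 0; reg < count; reg++) {
        for (int bit = 0; bit < 32; bit++) {
            if (iabr[reg] & (UINT32_C(1) << bit)) {
                /* Each register covers 32 consecutive IRQ numbers. */
                return (int)(reg * 32u) + bit;
            }
        }
    }
    return -1;
}

/* Position of an IRQ in the startup file's vector table, counting from 0.
 * Assumes 16 entries (stack pointer + system exceptions) precede IRQ 0. */
int vector_table_index(int irq)
{
    return irq + 16;
}
```

For the values in the question, passing `{0, 0x1000}` for IABR0 and IABR1 yields 44, and `vector_table_index(44)` yields 60, which is why counting from the start of the vector table does not give the same number.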

What exactly happens when an OS goes into kernel mode?

I find that neither my textbooks nor my googling skills give me a proper answer to this question. I know it depends on the operating system, but on a general note: what happens and why?
My textbook says that a system call causes the OS to go into kernel mode, given that it's not already there. This is needed because kernel mode is what has control over I/O devices and other things outside of a specific process' address space. But if I understand it correctly, a switch to kernel mode does not necessarily mean a process context switch (where you save the current state of the process somewhere other than the CPU so that some other process can run).
Why is this? I was kinda thinking that some "admin"-process was switched in and took care of the system call from the process and sent the result to the process' address space, but I guess I'm wrong. I can't seem to grasp what ACTUALLY is happening in a switch to and from kernel mode and how this affects a process' ability to operate on I/O-devices.
Thanks a lot :)
EDIT: bonus question: does a library call necessarily end up in a system call? If no, do you have any examples of library calls that do not end up in system calls? If yes, why do we have library calls?
Historically, system calls have been issued with interrupts. Linux used the 0x80 vector and Windows used the 0x2E vector to access system calls, and stored the function's index in the eax register. More recently, we started using the SYSENTER and SYSEXIT instructions. User applications run in ring 3, also called userspace or user mode. The CPU is very tricky here, and switching from kernel mode to user mode requires special care. It actually involves fooling the CPU into thinking it came from user mode when issuing a special instruction called iret. The only way to get back from user mode to kernel mode is via an interrupt or the already mentioned SYSENTER/SYSEXIT instruction pair. They both use a special structure called the Task State Segment, or TSS for short. This allows the CPU to find where the kernel's stack is, so yes, it essentially requires a task switch.
But what really happens?
When you issue a system call, the CPU looks for the TSS, gets its esp0 value, which is the kernel's stack pointer, and places it into esp. The CPU then looks up the interrupt vector's index in another special structure, the Interrupt Descriptor Table or IDT for short, and finds an address. This address is where the function that handles the system call is. The CPU pushes the flags register, the code segment, the user's stack pointer and the instruction pointer of the instruction that follows the int instruction. After the system call has been serviced, the kernel issues an iret. Then the CPU returns to user mode and your application continues as normal.
Do all library calls end in system calls?
Well most of them do, but there are some which don't. For example take a look at memcpy and the rest.
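To make the distinction concrete, here is a minimal sketch assuming a POSIX system. The function names `copy_greeting` and `print_greeting` are hypothetical; `memcpy()`, `strlen()` and `write()` are the standard calls.

```c
#include <string.h>
#include <unistd.h>

/* Pure user-space work: memcpy() only reads and writes memory that
 * already belongs to the process, so no system call is needed. */
void copy_greeting(char *dst)
{
    static const char greeting[] = "hello\n";
    memcpy(dst, greeting, sizeof greeting);   /* 7 bytes, including '\0' */
}

/* Needs the kernel: write() is a thin library wrapper around the write
 * system call, because only the kernel may talk to the output device. */
long print_greeting(void)
{
    char buf[8];
    copy_greeting(buf);
    return (long)write(STDOUT_FILENO, buf, strlen(buf));
}
```

On Linux, running a program that calls `print_greeting()` under a tracer such as strace should show a `write` call for the output, while the copy itself does not appear in the trace.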

Kernel Code vs User Code

Here's a passage from the book
When executing kernel code, the system is in kernel-space executing in kernel mode. When running a regular process, the system is in user-space executing in user mode.
Now, what really is kernel code and what is user code? Can someone explain with an example?
Say I have an application that does printf("HelloWorld"). While this application is executing, will it be user code or kernel code?
I guess that at some point user code will switch into kernel mode and kernel code will take over, but I guess that's not always the case, since I came across this:
For example, the open() library function does little except call the open() system call.
Still other C library functions, such as strcpy(), should (one hopes) make no direct use
of the kernel at all.
If it does not make use of the kernel, then how does it make everything work?
Can someone please explain the whole thing in a lucid way?
There isn't much difference between kernel and user code as such, code is code. It's just that the code that executes in kernel mode (kernel code) can (and does) contain instructions only executable in kernel mode. In user mode such instructions can't be executed (not allowed there for reliability and security reasons), they typically cause exceptions and lead to process termination as a result of that.
I/O, especially with external devices other than the RAM, is usually performed by the OS somehow and system calls are the entry points to get to the code that does the I/O. So, open() and printf() use system calls to exercise that code in the I/O device drivers somewhere in the kernel. The whole point of a general-purpose OS is to hide from you, the user or the programmer, the differences in the hardware, so you don't need to know or think about accessing this kind of network card or that kind of display or disk.
Memory accesses, OTOH, most of the time can just happen without the OS' intervention. And strcpy() works as is: read a byte of memory, write a byte of memory, oh, was it a zero byte, btw? repeat if it wasn't, stop if it was.
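The loop described above can be sketched in a few lines. This is an illustrative re-implementation under the hypothetical name `copy_string`, not the actual source of any C library's strcpy(), which is usually more heavily optimized:

```c
#include <assert.h>
#include <stddef.h>

/* Copies the NUL-terminated string `src` into `dst` and returns `dst`.
 * Assumes `dst` is large enough; nothing here involves the kernel. */
char *copy_string(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *out = dst;
    for (;;) {
        char c = *src++;    /* read a byte of memory */
        *out++ = c;         /* write a byte of memory */
        if (c == '\0') {    /* was it a zero byte? then stop */
            break;
        }
    }
    return dst;
}
```

Every step is an ordinary load, store or compare, all of which are allowed in user mode.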
I said "most of the time" because there's often page translation and virtual memory involved, and a memory access may result in a switch into the kernel, so that the kernel can load something from the disk into memory and then let the accessing instruction that caused the switch continue.