STM32 - Can I subvert the cool context switching for interrupts?

STM32 - Can I subvert the cool context switching for interrupts? - stm32

The STM32 family has fantastic interrupt service, they stack a whole slew of extra registers for you, and load the LR with an artificial return to properly unstack while looking for opportunities for tail chaining, aborted entry, etc etc.
HOWEVER....it is too damn slow. I am finding (STM32F730Z8, 200 MHz clock, all code including handlers in ITCM, everything in GNU assembly) that it takes about 120-150 ns overhead to get into an interrupt.
I am still learning about these, used to the old ARM7 where you had to do it all yourself, however, in those chips, if you had a minimal handler you didn't need to stack much.
So -- can I "subvert" the context switching in hardware, and just have it leap to the handler at elevated priority, pausing only to fill the pipeline, and leaving me to take care of stacking what is needed? I don't think so, and haven't seen a way to do it, but I'm working on an extremely tight time-sensitive realtime code, and interrupt switching is eating all my time budget. I'm reverting to doing it all in low-code, polled, but I hate the jitter that gives me on response to pin edges. Help?

No, this is done in pure hardware and is the main defining feature of all "Cortex-M" processors, not just STM32.
150ns at 200MHz is 30 cycles. You can probably get it quite a bit faster.
One way is to mark the floating point unit as unused each time you finish with it, and to set a core flag to tell it not to save the floating point registers. See ARM application note 298 for details.
Another method that you might try is to move your vector table and interrupt handler code to internal SRAM. STM32 has a flash memory accelerator which avoids most wait states on internal flash by performing prefetch of sequential instructions, but an asynchronous interrupt will probably not benefit from this.

that it takes about 120-150 ns overhead to get into an interrupt.
It is not the truth at all
It takes 60ns as it takes 12 clock cycles.

Related

how does the operating system treat few interrupts and keep processes going?

I'm learning computer organization and structure (I'm using Linux OS with x86-64 architecture). we've studied that when an interrupt occurs in user mode, the OS is notified and it switches between the user stack and the kernel stack by loading the kernels rsp from the TSS, afterwards it saves the necessary registers (such as rip) and in case of software interrupt it also saves the error-code. in the end, just before jumping to the adequate handler routine it zeroes the TF and in case of hardware interrupt it zeroes the IF also. I wanted to ask about few things:
the error code is save in the rip, so why loading both?
if I consider a case where few interrupts happen together which causes the IF and TF to turn on, if I zero the TF and IF, but I treat only one interrupt at a time, aren't I leave all the other interrupts untreated? in general, how does the OS treat few interrupts that occur at the same time when using the method of IDT with specific vector for each interrupt?
does this happen because each program has it's own virtual memory and thus the interruption handling processes of all the programs are unrelated? where can i read more about it?
how does an operating system keep other necessary progresses running while handling the interrupt?
thank you very much for your time and attention!

the error code is save in the rip, so why loading both?
You're misunderstanding some things about the error code. Specifically:
it's not generated by software interrupts (e.g. instructions like int 0x80)
it is generated by some exceptions (page fault, general protection fault, double fault, etc).
the error code (if used) is not saved in the RIP, it's pushed on the stack so that the exception handler can use it to get more information about the cause of the exception
2a. if I consider a case where few interrupts happen together which causes the IF and TF to turn on, if I zero the TF and IF, but I treat only one interrupt at a time, aren't I leave all the other interrupts untreated?
When the IF flag is clear, mask-able IRQs (which doesn't include other types of interrupts - software interrupts, exceptions) are postponed (not disabled) until the IF flag is set again. They're "temporarily untreated" until they're treated later.
The TF flag only matters for debugging (e.g. single-step debugging, where you want the CPU to generate a trap after every instruction executed). It's only cleared in case the process (in user-space) was being debugged, so that you don't accidentally continue debugging the kernel itself; but most processes aren't being debugged like this so most of the time the TF flag is already clear (and clearing it when it's already clear doesn't really do anything).
2b. in general, how does the OS treat few interrupts that occur at the same time when using the method of IDT with specific vector for each interrupt? does this happen because each program has it's own virtual memory and thus the interruption handling processes of all the programs are unrelated? where can i read more about it?
There's complex rules that determine when an interrupt can interrupt (including when it can interrupt another interrupt). These rules mostly only apply to IRQs (not software interrupts that the kernel won't ever use itself, and not exceptions which are taken as soon as they occur). Understanding the rules means understanding the IF flag and the interrupt controller (e.g. how interrupt vectors and the "task priority register" in the local APIC influence the "processor priority register" in the local APIC, which determines which groups of IRQs will be postponed when the IF flag is set). Information about this can be obtained from Intel's manuals, but how Linux uses it can only be obtained from Linux source code and/or Linux specific documentation.
On top of that there's "whatever mechanisms and practices the OS felt like adding on top" (e.g. deferred procedure calls, tasklets, softIRQs, additional stack management) that add more complications (which can also only be obtained from Linux source code and/or Linux specific documentation).
Note: I'm not a Linux kernel developer so can't/won't provide links to places to look for Linux specific documentation.
how does an operating system keep other necessary progresses running while handling the interrupt?
A single CPU can't run 2 different pieces of code (e.g. an interrupt handler and user-space code) at the same time. Instead it runs them one at a time (e.g. runs user-space code, then switches to an IRQ handler for very short amount of time, then returns to the user-space code). Because the IRQ handler only runs for a very short amount of time it creates the illusion that everything is happening at the same time (even though it's not).
Of course when you have multiple CPUs, different CPUs can/do run different pieces of code at the same time.

Can the Linux Linked List API be used safely inside of an interrupt handler?

I am writing a device driver for a custom piece of hardware using the Linux kernel 2.6.33. I need am using DMA to transfer data to and from the device. For the output DMA, I was thinking that I would keep track of several output buffers using the Linked List API (struct list_head, list_add(), etc.).
When the device finished the DMA transfer, it raises an interrupt. The interrupt handler would then retrieve item in the linked list to transfer, and remove it from the list.
My question is, is this actually a safe thing to do inside of an interrupt handler? Or are there inherent race conditions in this API that would make it not safe?
The small section in Linux Device Drivers, 3rd Ed. doesn't make mention of this. The section in Essential Linux Device Drivers is more complete but also does not touch on this subject.
Edit:
I am beginning to think that it may very well not be race condition free as msh suggests, due to a note listed in the list_empty_careful() function:
* NOTE: using list_empty_careful() without synchronization
* can only be safe if the only activity that can happen
* to the list entry is list_del_init(). Eg. it cannot be used
* if another CPU could re-list_add() it.
http://lxr.free-electrons.com/source/include/linux/list.h?v=2.6.33;a=powerpc#L202
Note that I plan to add to the queue in process context and remove from the queue in interrupt context. Do you really not need synchronization around the functions for a list?

It is perfectly safe to use kernel linked lists in interrupt context, but but retrieving anything in interrupt handlers is a bad idea. In the interrupt handler you should acknowledge interrupt, schedule "bottom half" and quit. All processing should be done by the "bottom half" (bottom half is just a piece of deferred work - there are several suitable mechanisms - tasklets, work queue, etc).

How do OSes Handle context switching?

As I can understand, every OS need to have some mechanism to periodically check if it should run some tasks and suspend others.
One way would be some kind of timer on whose expiry the OS will check if it should run/suspend some task.
Generally, say on a ARM system that would probably be some kind of ISR.
My real question, is that I've been ABLE to only visualize this and not see it somewhere. Could some one point to some free/open RTOS code where I can actually see the code that handles the preemption/scheduling?

freertos.org. The entire OS is open source, and right there for you to see. And there are dozens of different ports to compare and contrast. For the context switch code, you will want to look in the ports directory, in any one of many files called port.c, port.asm, etc. And yes, in the case of freertos all context switches are performed in interrupts (a tick timer ISR, or any other SysCall interrupt).
A context switch is very-much processor specific, as the list of registers to save and the assembly code to save them varies between processor families, and sometimes within a given family. As a result each port has a separate file for this code.
The scheduling (selection of next task to run), on the other hand, is done in a file called tasks.c, which is common to all ports and references the port-specific code.

It is not the case than an RTOS simply context switches periodically - that is how most GPOS work. In an RTOS the scheduler runs on any scheduling event. These include system-tick, but also message post, event trigger, semaphore give, or mutex unlock for example.
On ARM Cortex-M the CMSIS 3.x includes an RTOS API (intended primarily for RTOS developers rather than a complete RTOS itself), the source for this will include a context switching mechanism.
If you want a detailed description for a simple RTOS you might consider reading µC/OS-II: The Real-Time Kernel or the slightly more sophisticated µC/OS-III: The Real-Time Kernel .
FreeRTOS is increasingly popular, though perhaps a little unconventional architecturally. A more complete (in that it is not just a scheduling kernel but a more complete OS) and very powerful option is eCos.

You can take a look at xv6.
Its not an RTOS, it is just a skeleton OS(based on V6 unix) meant for academic purpose.
In the XV6 book take a look at chapter 4, there is explanation along with the code as to how scheduling is done on a small OS like xv6.XV6 puts a process to sleep when it is waiting for disk or some I/O operation, there is also timer interupt every 100msec to switch a process.
There is also explanation with code on how the context switching takes place, what information is saved( context frame of a process), how the switch from user to kernel mode happens when the scheduler has to run.
The best part is that the amount of reading you have to do to understand these concepts is very less unlike some reference book on OS :) The code is relatively small, you can infact run the XV6 on qemu set breakpoints in the sched , swtch and other functions and actually see the information saved during a context switch.(how to run xv6 in this link)
You dont have to read previous chapters to understand the chapter4. There isnt much dependency,xv6 uses struct proc to identify a process, ptable for all the current running process in the system, proc->conext -refers to the state the process is in (register value etc) , this is saved by the scheduler.
Cheers :)

iOS Threads Wait for Action

I have a processing thread that I use to fill a data buffer. Elsewhere a piece of hardware triggers a callback which reads from this data buffer. The processing thread then kicks in and refills the buffer.
When the buffer fills up I am currently telling the thread to wait by:
while( [self FreeWriteSpace] < mProcessBufferSize && InActive) {
[NSThread sleepForTimeInterval:.0001];
}
However when I profile I am getting a lot of CPU time spent in sleep. Is there a better way to wait? Do I even care if the profiles says time is spent in sleep?

Time spent in sleep is effectively free. In Instruments, look at "running samples" rather than "all samples." But this still isn't an ideal solution.
First, your sleep interval is crazy. Do you really need .1µs granularity? The system almost certainly isn't giving you because the processor isn't that fast. I have to believe you could up this to .1 or .01. But that's still busy-waiting which is not ideal if you can help it.
The better solution is to use an NSCondition. In this thread, wait on the condition, and in your processing thread, trigger the condition when there's room to write.
Do be careful with your naming. Do not name methods with leading caps (that indicates that it's a class name). And avoid accessing ivars directly (InActive) like this. "InActive" is also a very confusing name. Does it mean the system is active (In Active) or not active (inactive). Naming in Objective-C is extremely important. The compiler will not protect you the way it does in C# and C++. Good naming is how you keep your programs working, and many parts of ObjC rely on it.
You may also want to investigate Grand Central Dispatch, which is particularly designed for these kinds of problems. Look at dispatch_async() to run things when new data comes in.

However when I profile I am getting a
lot of CPU time spent in sleep. Is
there a better way to wait? Do I even
care if the profiles says time is
spent in sleep?
Yes -- never, never poll. Polling eats CPU, makes your app less responsive, eats battery, and is an all around waste.
Notify instead.
The easiest way is to use one of the variants of "perform selector on main thread" (see NSThread's documentation). Or dispatch to a queue (including something like dispatch_async(dispatch_get_main_queue(), ^{ ... yo, data be ready ...});).

Interrupt masking: why?

I was reading up on interrupts. It is possible to suspend non-critical interrupts via a special interrupt mask. This is called interrupt masking. What i dont know is when/why you might want to or need to temporarily suspend interrupts? Possibly Semaphores, or programming in a multi-processor environment?

The OS does that when it prepares to run its own "let's orchestrate the world" code.
For example, at some point the OS thread scheduler has control. It prepares the processor registers and everything else that needs to be done before it lets a thread run so that the environment for that process and thread is set up. Then, before letting that thread run, it sets a timer interrupt to be raised after the time it intends to let the thread have on the CPU elapses.
After that time period (quantum) has elapsed, the interrupt is raised and the OS scheduler takes control again. It has to figure out what needs to be done next. To do that, it needs to save the state of the CPU registers so that it knows how to undo the side effects of the code it executes. If another interrupt is raised for any reason (e.g. some async I/O completes) while state is being saved, this would leave the OS in a situation where its world is not in a valid state (in effect, saving the state needs to be an atomic operation).
To avoid being caught in that situation, the OS kernel therefore disables interrupts while any such operations that need to be atomic are performed. After it has done whatever needs doing and the system is in a known state again, it reenables interrupts.

I used to program on an ARM board that had about 10 interrupts that could occur. Each particular program that I wrote was never interested in more than 4 of them. For instance there were 2 timers on the board, but my programs only used 1. I would mask the 2nd timer's interrupt. If I didn't mask that timer, it might have been enabled and continued making interrupts which would slow down my code.
Another example was that I would use the UART receive REGISTER full interrupt and so would never need the UART receive BUFFER full interrupt to occur.
I hope this gives you some insight as to why you might want to disable interrupts.

In addition to answers already given, there's an element of priority to it. There are some interrupts you need or want to be able to respond to as quickly as possible and others you'd like to know about but only when you're not so busy. The most obvious example might be refilling the write buffer on a DVD writer (where, if you don't do so in time, some hardware will simply write the DVD incorrectly) versus processing a new packet from the network. You'd disable the interrupt for the latter upon receiving the interrupt for the former, and keep it disabled for the duration of filling the buffer.
In practise, quite a lot of CPUs have interrupt priority built directly into the hardware. When an interrupt occurs, the disabled flags are set for lesser interrupts and, often, that interrupt at the same time as reading the interrupt vector and jumping to the relevant address. Dictating that receipt of an interrupt also implicitly masks that interrupt until the end of the interrupt handler has the nice side effect of loosening restrictions on interrupting hardware. E.g. you can simply say that signal high triggers the interrupt and leave the external hardware to decide how long it wants to hold the line high for without worrying about inadvertently triggering multiple interrupts.
In many antiquated systems (including the z80 and 6502) there tends to be only two levels of interrupt — maskable and non-maskable, which I think is where the language of enabling or disabling interrupts comes from. But even as far back as the original 68000 you've got eight levels of interrupt and a current priority level in the CPU that dictates which levels of incoming interrupt will actually be allowed to take effect.

Imagine your CPU is in "int3" handler now and at that time "int2" happens and the newly happened "int2" has a lower priority compared with "int3". How would we handle with this situation?
A way is when handling "int3", we are masking out other lower priority interrupters. That is we see the "int2" is signaling to CPU but the CPU would not be interrupted by it. After we finishing handling the "int3", we make a return from "int3" and unmasking the lower priority interrupters.
The place we returned to can be:
Another process(in a preemptive system)
The process that was interrupted by "int3"(in a non-preemptive system or preemptive system)
An int handler that is interrupted by "int3"， say int1's handler.
In cases 1 and 2, because we unmasked the lower priority interrupters and "int2" is still signaling the CPU: "hi, there is a something for you to handle immediately", then the CPU would be interrupted again, when it is executing instructions from a process, to handle "int2"
In case 3, if the priority of “int2” is higher than "int1", then the CPU would be interrupted again, when it is executing instructions from "int1"'s handler, to handle "int2".
Otherwise, "int1"'s handler is executed without interrupting (because we are also masking out the interrupters with priority lower then "int1" ) and the CPU would return to a process after handling the “int1” and unmask. At that time "int2" would be handled.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse