How do interrupts work in multi-core system? - raspberry-pi

I want to write interrupt handlers for the buttons on a Raspberry Pi 2. This board uses the quad-core Broadcom BCM2836 CPU (ARM architecture). That means there is only one CPU on the board (Raspberry Pi 2), with four cores. But I don't know how interrupts work in a multi-core system. I wonder whether the interrupt line is connected to each core or to the CPU as a whole. I found the paragraph below via Google:
Interrupts on multi-core systems
On a multi-core system, each interrupt is directed to one (and only one) CPU, although it doesn't matter which. How this happens is under control of the programmable interrupt controller chip(s) on the board. When you initialize the PICs in your system's startup, you can program them to deliver the interrupts to whichever CPU you want to; on some PICs you can even get the interrupt to rotate between the CPUs each time it goes off.
Does this mean that interrupts happen on each CPU? I can't quite understand the information above. If the interrupt is delivered to every core, I must protect shared data with critical sections in the buttons' interrupt service routines.
If the interrupt is delivered to only one CPU, I don't have to worry about critical sections for shared data. Which is correct?
To sum up: how do interrupts work in a multi-core system? Is the interrupt line connected to each core or to the CPU? And do I need critical sections for the same interrupt?

Your quote from Google looks quite generic, or perhaps even leans toward the x86 side, but it doesn't really matter if that were the case.
I would certainly hope that you can control interrupts per CPU, so that one type goes to one CPU and another type goes to another.
Likewise, I would hope there is a choice to have all of them interrupted in case you want that.
Interrupts are irrelevant to shared resources: you have to handle shared resources whether you are in an ISR or not, so the interrupt doesn't matter; you have to deal with sharing either way. Having the ability to isolate interrupts from one peripheral to one CPU could make the sharing easier, in that you could, for example, have one CPU own a resource and have the other CPUs make requests to the CPU that owns it.
Dual, quad, etc. cores don't matter: treat each core as a single CPU, which it is, and solve the interrupt problems as you would for a single CPU. Again, shared resources are shared resources, during interrupts or not. Solve the problem for one CPU, then deal with any sharing.
Being an ARM, each chip vendor's implementation can vary from another's, so there cannot be one universal answer. You have to read the ARM docs for the ARM core (and if possible for the specific version, as they can and do vary) as well as the chip vendor's docs for whatever they have around the ARM core. Being a Broadcom in this case, good luck with chip vendor docs. They are at best limited, especially with the Raspberry Pi 2. You might have to dig through the Linux sources. No matter what (ARM, x86, MIPS, etc.), you have to just read the documentation and do some experiments. Start off by treating each core as a standalone CPU, then deal with sharing of resources if required.
If I remember right, the default case is to have just the first core running kernel7.img off the SD card; the other three are spinning in a loop, each waiting for an address (each has its own) to be written, at which point it jumps to that address and starts doing something else. So you quite literally can just start off with a single CPU, no sharing, and figure that out. If you choose not to have code on the other CPUs that touches that resource, done. If you do, THEN figure out how to share a resource.
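To make the "each interrupt goes to one CPU, and the controller decides which" idea concrete, here is a minimal toy model in Python. It is only a sketch: ToyInterruptController, its methods, and the IRQ numbers are made up for illustration and do not correspond to the BCM2836's actual registers.

```python
from itertools import cycle


class ToyInterruptController:
    """Hypothetical model: every IRQ occurrence is delivered to exactly one core."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.routes = {}  # IRQ number -> iterator yielding the next target core

    def route_fixed(self, irq, core):
        # Always deliver this IRQ to the same core.
        self.routes[irq] = cycle([core])

    def route_rotating(self, irq):
        # Rotate the target core each time the IRQ fires.
        self.routes[irq] = cycle(range(self.num_cores))

    def raise_irq(self, irq):
        # Return the single core that receives this occurrence.
        return next(self.routes[irq])


pic = ToyInterruptController(num_cores=4)
pic.route_fixed(irq=17, core=0)   # e.g. a button handled only by core 0
pic.route_rotating(irq=42)        # e.g. some other source spread over all cores

button_targets = [pic.raise_irq(17) for _ in range(3)]
other_targets = [pic.raise_irq(42) for _ in range(5)]
print(button_targets)  # [0, 0, 0]
print(other_targets)   # [0, 1, 2, 3, 0]
```

With the fixed route, only core 0 ever runs the button ISR, so the ISR never races against itself on another core. Any data it shares with code on other cores still needs protection, as described above.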

Related

Practical ways of implementing preemptive scheduling without hardware support?

I understand that using hardware support for implementing preemptive scheduling is great for efficiency.
I want to know: what are practical ways to do preemptive scheduling without support from hardware? I think one way is software timers.
Another way, in a multiprocessor system, is to have one processor act as a master that keeps watching the processes on the slave processors.
Assume I'm fine with an inefficient approach.
Please elaborate on all the ways you think or know can work, preferably (but not necessarily) ones that work on a single-processor system.
In order to preempt a process, the operating system has to somehow get control of the CPU without the process's cooperation. Or viewed from the other perspective: The CPU has to somehow decide to stop running the process's code and start running the operating system's code.
Just like processes can't run at the same time as other processes, they can't run at the same time as the OS. The CPU executes instructions in order, that's all it knows. It doesn't run two things at once.
So, here are some reasons for the CPU to switch to executing operating system code instead of process code:
A hardware device sends an interrupt to this CPU - such as a timer, a keypress, a network packet, or a hard drive finishing its operation.
The software running on a different CPU sends an inter-processor interrupt to this CPU.
The running process decides to call a function in the operating system. Depending on the CPU architecture, it could work like a normal call, or it could work like a fake interrupt.
The running process executes an instruction which causes an exception, like accessing unmapped memory, or dividing by zero.
Some kind of hardware debugging interface is used to overwrite the instruction pointer, causing the CPU to suddenly execute different code.
The CPU is actually a simulation and the OS is interpreting the process code, in which case the OS can decide to stop interpreting whenever it wants.
If none of the above things happen, OS code doesn't run. Most OSes will re-evaluate which process should be running, when a hardware event occurs that causes a process to be woken up, and will also use a timer interrupt as a last resort to prevent one program hogging all the CPU time.
Generally, when OS code runs, it has no obligation to return to the same place it was called from. "Preemption" is simply when the OS decides to jump somewhere other than the place it was called from.
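The "CPU is actually a simulation" case is the easiest one to show without any hardware support. Below is a minimal sketch in Python, assuming each process is just a list of opaque instruction names; the function name run and the time_slice parameter are made up for this example. Because the "OS" interprets every instruction itself, it can stop a process after any instruction and resume another one.

```python
from collections import deque


def run(processes, time_slice):
    """Interpret each process's instructions, preempting after time_slice steps."""
    # Each ready-queue entry is (name, instructions, saved program counter).
    ready = deque((name, list(code), 0) for name, code in processes.items())
    trace = []
    while ready:
        name, code, pc = ready.popleft()
        steps = 0
        # The interpreter regains control after every single instruction.
        while pc < len(code) and steps < time_slice:
            trace.append((name, code[pc]))  # "execute" the instruction
            pc += 1
            steps += 1
        if pc < len(code):
            # Preemption: save the program counter and requeue the process.
            ready.append((name, code, pc))
    return trace


trace = run({"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}, time_slice=2)
print(trace)
# [('A', 'a1'), ('A', 'a2'), ('B', 'b1'), ('B', 'b2'), ('A', 'a3')]
```

Process A is stopped after two instructions even though it never asked to be, which is exactly the "jump somewhere other than the place it was called from" decision, made here by the interpreter loop instead of a timer interrupt.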

Hardware supported mutual exclusion

I'm currently taking a class in Operating Systems, and everything has been smooth until I encountered Concurrency and Mutual Exclusion.
Up until this chapter in the text I am currently reading, I was under the impression that the OS handled calls to certain I/O operations such as printers through queues and interrupts, and the OS also handled the scheduling of processes.
But in this section "Mutual exclusion: Hardware support", it states for a process to guarantee mutual exclusion it is sufficient to block all interrupts, and this can be done through interrupt disabling, however the cost is high since the processor is limited in its ability to interleave (Stallings, p. 211).
If this is a capability, what's stopping a programmer from placing his entire program within a critical section by disabling interrupts? And why can't the OS handle calls to critical resources in the way previously stated (I/O queues and interrupts), instead of us relying on programmers to identify their critical sections?
I understand the need to identify critical sections with shared variables and memory space, but I am baffled as to why a program needs to identify its critical section with regard to I/O devices such as printers, and why the OS can't.
This is not [entirely] correct:
But in this section "Mutual exclusion: Hardware support", it states for a process to guarantee mutual exclusion it is sufficient to block all interrupts, and this can be done through interrupt disabling, however the cost is high since the processor is limited in its ability to interleave.
Processors generally support multiple means of synchronization. The simplest is uninterruptible instructions. These will generally be short instructions such as set a bit, or branch if the bit was set already. Such instructions allow synchronization within a single processor.
As you mention, disabling interrupts is another method. Generally, interrupts have priorities. Usually you can disable all interrupts that have a priority lower than a specified level. That allows disabling all or some interrupts.
Disabling interrupts only works when locking resources that are not shared by multiple processors.
That is why the quote, in the context you have it, is not [entirely] correct. Disabling interrupts on one processor does not provide synchronization when there are multiple processors. In theory, an operating system could disable all interrupts on all processors, but such a system would be seriously brain damaged because that would hamper the performance of a multi-processor system. It might work in, say, a quick-and-dirty student project operating system.
If this is a capability, what's stopping a programmer from placing his entire program within a critical section by disabling interrupts?
Disabling interrupts is only possible in kernel mode.
Another method of hardware synchronization is interlocked instructions. These are instructions that lock the memory of the operands and prevent other processors from accessing that memory while the instruction is executing. Sometimes these are simple interlocked integer-add instructions, or interlocked bit-set (or bit-clear) and branch instructions.
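As an illustration of how an interlocked "set the bit and tell me whether it was already set" instruction is used, here is a small Python simulation of a spinlock. It is only a sketch: AtomicWord and SpinLock are made-up names, and a threading.Lock stands in for the hardware memory lock, since Python has no such instruction of its own.

```python
import threading
import time


class AtomicWord:
    """Models one memory word with an interlocked test-and-set instruction."""

    def __init__(self):
        self._value = 0
        self._bus_lock = threading.Lock()  # stand-in for the hardware memory lock

    def test_and_set(self):
        # Atomically return the old value and set the word to 1.
        with self._bus_lock:
            old = self._value
            self._value = 1
            return old

    def clear(self):
        with self._bus_lock:
            self._value = 0


class SpinLock:
    def __init__(self):
        self._word = AtomicWord()

    def acquire(self):
        # Spin until this thread is the one that flipped the word from 0 to 1.
        while self._word.test_and_set() == 1:
            time.sleep(0)  # yield so the current holder can make progress

    def release(self):
        self._word.clear()


counter = 0
lock = SpinLock()


def worker(iterations):
    global counter
    for _ in range(iterations):
        lock.acquire()
        counter += 1  # critical section on shared data
        lock.release()


threads = [threading.Thread(target=worker, args=(500,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 2000
```

Unlike disabling interrupts, this works across processors, because the exclusion comes from the memory word itself rather than from stopping one processor from being interrupted.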

Do CPU and main memory need drivers to work?

Peripheral devices require drivers to work in a computer system (operating system).
Does a CPU need a driver to work?
Same question for a main memory?
The answer is no.
The reason is that the motherboard comes with an (upgradable) BIOS, which takes care of making sure the CPU features function correctly (obviously, an AMD processor won't work on an Intel motherboard). You can upgrade the BIOS, but that should be avoided until, ... reasons of course.
Same goes for memory, it does not require a driver either.
Just so that you know, if you have ever tried overclocking, you may have noticed that you can alter the way the RAM functions (ganged/unganged modes and so on). My point is that there is already an interface, established in code, that allows you to make changes in real time. Isn't that the very purpose of having drivers: to be able to use a peripheral with an expected outcome?
On the other hand, peripheral devices are just extensions, which the motherboard does not know how to handle, hence needing a set of instructions i.e. drivers.
In a modern system both memory and the CPU require kernel mode code — as do devices — to function.
Memory requires management of virtual memory tables. The CPU requires maintenance of process control structures.
In the business, such code is not called a "driver".
Generally, one thinks of a device driver as being kernel mode code that responds to devices through the interrupt vector.
That said, on some systems there are "printer drivers" that do not fit that definition of driver.
In short, do memory and CPU have something called a "driver"? No.
Do they have something analogous to a driver? Yes.

Basics of Real Time OS

I am trying to learn about RTOSes from scratch, and for this I use freeRTOS.org as a reference. I find this site to be the best resource for learning about an RTOS. However, I have some doubts that I have not been able to find exact answers to.
1) How do I find out whether a device has real-time capability? E.g. some controllers have it (TI Hercules) and others don't (MSP430)?
2) Does that depend on the architecture of the core (ARM Cortex-R CPU in the TI Hercules TMS570)?
I know these questions may be a nuisance, but I don't know how to get answers to them.
Thanks in advance
EDIT:
One more query: what is meant by "OS" in RTOS? Does it mean the same kind of OS as others, or does it just contain the source code files for the APIs?
Figuring out whether a device has a "Real-Time" capability is somewhat arbitrary and depends on your project's timing requirements. If you have timing requirements that are very high, you'll want to use a faster microcontroller/processor.
Using an RTOS (e.g. FreeRTOS, eCos, or uC/OS-X) can help ensure that a given task will execute at a predictable time. The FreeRTOS website provides a good discussion of what operating systems are and what it means for an operating system to claim real-time capabilities: http://www.freertos.org/about-RTOS.html
You can also see from the ports pages of uC/OS-X and FreeRTOS that they can run on a variety of target microcontrollers / microprocessors.
Real-time capability is a matter of degree. A 32-bit DSP running at 1 GHz has more real-time capability than an 8-bit microcontroller running at 16 MHz. The more powerful microcontroller could be paired with faster memories and ports and could manage applications requiring large amounts of data and computations (such as real-time video image processing). The less powerful microcontroller would be limited to less demanding applications requiring a relatively small amount of data and computations (perhaps real-time motor control).
The MSP430 has real-time capabilities and it's used in a variety of real-time applications. There are many RTOS that have been ported to the MSP430, including FreeRTOS.
When selecting a microcontroller for a real-time application you need to consider the data bandwidth and computational requirements of the application. How much data needs to be processed in what amount of time? Also consider the range and precision of the data (integer or floating point). Then figure out which microcontroller can support those requirements.
While Cortex-R is optimised for hard real-time, that does not imply that other processors are not suited to real-time applications, or are not even better suited to a specific application. What you need to consider is whether a particular combination of RTOS and processor will meet the real-time constraints of your application; even then, the most critical factor is your software design rather than the platform.
The main goal you want to obtain from an RTOS is determinism, most other features are already available in most other non-RTOS operating systems.
The -OS part in RTOS means Operating System, simply put, and as all other operating systems, RTOSes provide the required infrastructure for managing processor resources so you work on a higher level when designing your application. For accessing those functionalities the OS provides an API. Using that API you can use semaphores, message queues, mutexes, etc.
An RTOS has one requirement to be an RTOS: it must be pre-emptive. This means that it must support task priorities, so that when a higher-priority task becomes ready to run (one of the possible task states), the scheduler switches the current context to that task.
This operation has two implications. One is the requirement for a precise, dedicated timer, the tick timer. The other is that context switching carries considerable memory-operation overhead: the current CPU state (or the CPUs' state, in the case of multi-core SoCs) must be copied into the pre-empted task's context information, and the context of the new ready-to-run task must be restored into the CPU.
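A toy simulation can show what "the scheduler switches to the higher-priority task when it becomes ready" looks like on a tick-by-tick basis. This is a minimal sketch in Python, not how FreeRTOS or any real kernel is implemented; the simulate function, the task names, and the priority numbers are all invented for the example, and the context save/restore is reduced to tracking remaining work.

```python
def simulate(tasks, ticks):
    """tasks maps name -> (priority, release_tick, work_ticks); higher priority wins."""
    remaining = {name: work for name, (_, _, work) in tasks.items()}
    timeline = []
    for now in range(ticks):
        # A task is ready once it has been released and still has work left.
        ready = [name for name, (_, release, _) in tasks.items()
                 if release <= now and remaining[name] > 0]
        if not ready:
            timeline.append(None)  # idle tick
            continue
        # On every tick the scheduler re-picks the highest-priority ready task.
        current = max(ready, key=lambda name: tasks[name][0])
        remaining[current] -= 1
        timeline.append(current)
    return timeline


tasks = {
    "logger": (1, 0, 4),  # low priority, ready at tick 0, needs 4 ticks
    "motor": (5, 2, 2),   # high priority, becomes ready at tick 2
}
timeline = simulate(tasks, ticks=7)
print(timeline)
# ['logger', 'logger', 'motor', 'motor', 'logger', 'logger', None]
```

The logger task is pushed aside at tick 2 as soon as the motor task is ready, and it resumes where it left off afterwards; in a real RTOS that resumption is what the saved context makes possible.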
ARM processors already provide support for the System Timer, which is intended for a dedicated use as an OS tick timer. Not so long ago, the tick timer was required to be implemented with a regular, non-dedicated timer.
One optimization in cores designed for RTOSes with real-time capabilities is the ability to save/restore the CPU context state with minimum code, so it results in much less execution time than that in regular processors.
It is possible to implement an RTOS on nearly any processor, and there are some implementations targeted at resource-constrained cores. You mainly need a timer with interrupt capability, and RAM. If the CPU is very fast you can run the OS tick at high rates (sub-millisecond in some real-time applications with DSPs), or at a lower rate such as 10 to 100 ticks per second for applications with low timing requirements on low-end CPUs.
Some theoretical knowledge would be quite useful too, e.g. figuring out whether a given task set is schedulable under a given scheduling approach (sometimes it may not be), the differences between static-priority and dynamic-priority scheduling, the priority inversion problem, etc.
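As one example of such a schedulability check, the classic Liu and Layland utilization bound for rate-monotonic (static-priority) scheduling says that n periodic tasks are guaranteed schedulable if their total utilization is at most n(2^(1/n) - 1). The sketch below assumes each task is given as an (execution time, period) pair with deadlines equal to periods; the function names and the task sets are made up for illustration.

```python
def utilization(tasks):
    """Total CPU utilization of (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)


def rm_bound(n):
    """Liu and Layland utilization bound for n tasks under rate-monotonic scheduling."""
    return n * (2 ** (1 / n) - 1)


def passes_rm_test(tasks):
    # Sufficient but not necessary: a task set that fails this test
    # may still be schedulable and needs a more exact analysis.
    return utilization(tasks) <= rm_bound(len(tasks))


light = [(1, 4), (1, 5), (2, 10)]   # utilization 0.65
heavy = [(2, 4), (2, 5), (1, 10)]   # utilization 1.0

print(round(rm_bound(3), 4))        # 0.7798
print(passes_rm_test(light))        # True
print(passes_rm_test(heavy))        # False
```

Note that the test is only sufficient: failing it does not prove the task set is unschedulable, only that this simple bound cannot guarantee it.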

How/does DMA handle multiple concurrent transfers?

I am working on implementing a VM and trying to model all the different hardware components as accurately as possible, just for pure learning purposes.
My question is, how does a DMA device handle multiple concurrent transfer requests? From what I understand a DMA device has several registers to set the location in memory, the type of operation (read or write) and the number of bytes, so what happens when the CPU requests an operation from DMA, puts the thread to sleep and then the next thread that runs also requests a DMA operation while the previous one is still in progress? Is this even supported?
Unless you're talking about ancient, ISA-era hardware, DMA nowadays is handled by the device itself taking ownership of the bus and requesting the data directly from the RAM. See the Wikipedia article on Bus Mastering for more information.
Therefore, it is really up to any individual device how to handle DMA, so not much can be said for the general case. However, most simple devices just support a single DMA operation at a time; if the host wants to submit two DMA operations, it would simply wait for the first DMA to complete (being notified by an interrupt) and then instruct the device to do the second one, the OS putting the requesting thread to sleep while the first DMA is in progress. There are certainly variations, however, such as using a command-buffer that can specify multiple (DMA-involving or not) operations for the device to do in sequence without interrupting the CPU between each.
I doubt there are very many devices at all that try to carry out multiple transfers simultaneously, however, seeing as how interleaving DRAM accesses would just hurt performance anyway. But I wouldn't exclude their existence, especially if the operations involve very large transfers.
In the end, you'll just have to read up on the particular device you're trying to emulate.
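For the simple case described above (a device that does one DMA at a time, with the host queueing further requests until the completion interrupt), a VM model could look roughly like the Python sketch below. ToyDmaDevice, ToyDriver, and the one-byte-per-step timing are all assumptions made for illustration, not the behaviour of any real device.

```python
from collections import deque


class ToyDmaDevice:
    """Hypothetical device that performs a single DMA transfer at a time."""

    def __init__(self, memory):
        self.memory = memory
        self.active = None    # [transfer_id, src, dst, remaining] or None
        self.completed = []   # ids of finished transfers, in completion order

    def busy(self):
        return self.active is not None

    def start(self, transfer_id, src, dst, count):
        if self.busy():
            raise RuntimeError("device busy")
        if count <= 0:
            raise ValueError("count must be positive")
        self.active = [transfer_id, src, dst, count]

    def step(self):
        """Copy one byte; return True when a transfer completes (the 'interrupt')."""
        if self.active is None:
            return False
        transfer_id, src, dst, remaining = self.active
        self.memory[dst] = self.memory[src]
        remaining -= 1
        if remaining == 0:
            self.active = None
            self.completed.append(transfer_id)
            return True
        self.active = [transfer_id, src + 1, dst + 1, remaining]
        return False


class ToyDriver:
    """Host side: queues requests and starts the next one on each interrupt."""

    def __init__(self, device):
        self.device = device
        self.pending = deque()

    def submit(self, transfer_id, src, dst, count):
        self.pending.append((transfer_id, src, dst, count))
        self._kick()

    def _kick(self):
        if not self.device.busy() and self.pending:
            self.device.start(*self.pending.popleft())

    def on_interrupt(self):
        self._kick()


memory = bytearray(b"abcdefgh" + bytes(8))
dev = ToyDmaDevice(memory)
drv = ToyDriver(dev)
drv.submit("t1", src=0, dst=8, count=4)   # starts immediately
drv.submit("t2", src=4, dst=12, count=4)  # queued until t1 completes
while dev.busy():
    if dev.step():
        drv.on_interrupt()
print(dev.completed)       # ['t1', 't2']
print(bytes(memory[8:]))   # b'abcdefgh'
```

The second request never reaches the device while the first is in progress; it waits in the driver's queue, which is the software equivalent of putting the requesting thread to sleep until the completion interrupt arrives.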