Polling vs. Interrupts with slow/fast I/O devices - operating-system

I'm learning about the differences between Polling and Interrupts for I/O in my OS class and one of the things my teacher mentioned was that the speed of the I/O device can make a difference in which method would be better. He didn't follow up on it but I've been wracking my brain about it and I can't figure out why. I feel like using Interrupts is almost always better and I just don't see how the speed of the I/O device has anything to do with it.

The only advantage of polling comes when you don't care about every change that occurs.
Assume you have a real-time system that measures the temperature of a vat of molten plastic used for molding. Let's also say that your device can measure to a resolution of 1/1000 of a degree and can take new temperature every 1/10,000 of a second.
However, you only need the temperature every second and you only need to know the temperature within 1/10 of a degree.
In that kind of environment, polling the device might be preferable. Make one polling request every second. If you used interrupts, you could get 10,000 interrupts a second as the temperature moved +/- 1/1000 of a degree.
Polling used to be common with certain I/O devices, such as joysticks and pointing devices.
That said, there is VERY little need for polling and it has pretty much gone away.

Generally you would want to use interrupts, because polling can waste a lot of CPU cycles. However, if the event is frequent, synchronous (and if other factors apply e.g. short polling times...) polling can be a good alternative, especially because interrupts create more overhead than polling cycles.
You might want to take a look at this thread as well for more detail:
Polling or Interrupt based method

Related

How to run a periodic thread in high frequency(> 100kHz) in a Cortex-M3 microcontroller in an RTOS?

I'm implementing a high frequency(>100kHz) Data acquisition system with an STM32F107VC microcontroller. It uses the spi peripheral to communicate with a high frequency ADC chip. I have to use an RTOS. How can I do this?
I have tried FreeRTOS but its maximum tick frequency is 1000Hz so I can't run a thread for example every 1us with FreeRTOS. I also tried Keil RTX5 and its tick frequency can be up to 1MHz but I studied somewhere that it is not recommended to set the tick frequency high because it increases the overall context switching time. So what should I do?
Thanks.
You do not want to run a task at this frequency. As you mentioned, context switches will kill the performance. This is horribly inefficient.
Instead, you want to use buffering, interrupts and DMA. Since it's a high frequency ADC chip, it probably has an internal buffer of its own. Check the datasheet for this. If the chip has a 16 samples buffer, a 100kHz sampling will only need processing at 6.25kHz. Now don't use a task to process the samples at 6.25kHz. Do the receiving in an interrupt (timer or some signal), and the interrupt should only fill a buffer, and wake up a task for processing when the buffer is full (and switch to another buffer until the task has finished). With this you can have a task that runs only every 10ms or so. An interrupt is not a context switch. On a Cortex-M3 it will have a latency of around 12 cycles, which is low enough to be negligible at 6.25kHz.
If your ADC chip doesn't have a buffer (but I doubt that), you may be ok with a 100kHz interrupt, but put as little code as possible inside.
A better solution is to use a DMA if your MCU supports that. For example, you can setup a DMA to receive from the SPI using a timer as a request generator. Depending on your case it may be impossible or tricky to configure, but a working DMA means that you can receive a large buffer of samples without any code running on your MCU.
I have to use an RTOS.
No way. If it's a requirement by your boss or client, run away from the project fast. If that's not possible, communicate your concerns in writing now to save your posterior when the reasons of failure will be discussed. If it's your idea, then reconsider now.
The maximum system clock speed of the STM32F107 is 36 MHz (72 if there is an external HSE quartz), meaning that there are only 360 to 720 system clock cycles between the ticks coming at 100 kHz. The RTX5 warning is right, a significant amount of this time would be required for task switching overhead.
It is possible to have a timer interrupt at 100 kHz, and do some simple processing in the interrupt handler (don't even think about using HAL), but I'd recommend investigating first whether it's really necessary to run code every 10 μs, or is it possible to offload something that it would do to the DMA or timer hardware.
Since you only have a few hundred cycles (instructions) between input, the typical solution is to use an interrupt to be alerted that data is available, and then the interrupt handler put the data somewhere so you can process them at your leisure. Of course if the data comes in continuously at that rate, you maybe in trouble with no time for actual processing. Depending on how much data is coming in and how frequent, a simple round buffer maybe sufficient. If the amount of data is relatively large (how large is large? Consider that it takes more than one CPU cycle to do a memory access, and it takes 2 memory accesses per each datum that comes in), then using DMA as #Elderbug suggested is a great solution as that consumes the minimal amount of CPU cycles.
There is no need to set the RTOS tick to match the data acquisition rate - the two are unrelated. And to do so would be a very poor and ill-advised solution.
The STM32 has DMA capability for most peripherals including SPI. You need to configure the DMA and SPI to transfer a sequence of samples directly to memory. The DMA controller has full and half transfer interrupts, and can cycle a provided buffer so that when it is full, it starts again from the beginning. That can be used to "double buffer" the sample blocks.
So for example if you use a DMA buffer of say 256 samples and sample at 100Ksps, you will get a DMA interrupt every 1.28ms independent of the RTOS tick interrupt and scheduling. On the half-transfer interrupt the first 128 samples are ready for processing, on the full-transfer, the second 128 samples can be processed, and in the 1.28ms interval, the processor is free to do useful work.
In the interrupt handler, rather then processing all the block data in the interrupt handler - which would not in any case be possible if the processing were non-deterministic or blocking, such as writing it to a file system - you might for example send the samples in blocks via a message queue to a task context that performs the less deterministic processing.
Note that none of this relies on the RTOS tick - the scheduler will run after any interrupt if that interrupt calls a scheduling function such as posting to a message queue. Synchronising actions to an RTOS clock running asynchronously to the triggering event (i.e. polling) is not a good way to achieve highly deterministic real-time response and is a particularly poor method for signal acquisition, which requires a jitter free sampling interval to avoid false artefacts in the signal from aperiodic sampling.
Your assumption that you need to solve this problem by an inappropriately high RTOS tick rate is to misunderstand the operation of the RTOS, and will probably only work if your processor is doing no other work beyond sampling data - in which case you might not need an RTOS at all, but it would not be a very efficient use of the processor.

Why do we need to specify the number of flash wait cycles?

Especially when working with "faster" devices like STMF4xx/F7xx we need to specify the number of flash wait cycles, based on the supply voltage and the sys-clock frequency.
When the CPU fetches instructions/or constants this is done over the FLITF. Am I right with the assumption that the FLITF holds a CPU request as long as it can provide the requested data, making it impossible for other Bus-Masters to access flash meanwhile.
If this was true, why should it be important to any interface to know flash wait cycles. Like Cache does preload instructions so or so, independent if it knows how long to wait, no?
Because the flash interface isn't magic.
It has to meet the necessary setup and hold times for addressing and reading out the flash cells, which will vary somewhat depending on voltage. Taking the STM32F411 as an example (because I have that TRM handy), doing some maths with the voltage/frequency/wait-state table implies that a read from flash on one of those takes in the order of ~30ns above 2.7V, down to ~60ns below 2.1V.
Since the flash interface doesn't have its own asynchronous nanosecond-precision timekeeping ability (because that would be needlessly complicated, power-hungry, and silly), that translates to asserting its signals for n clock cycles, after which it can assume the data signals from the cells are stable enough to read back*. How does it know what the clock frequency is, and therefore what n should be? Simple: you, as the programmer who set the clock, tell it. Some hardware things are just infinitely easier to let software deal with.
* and then going through the further shenanigans of extracting the relevant 8, 16 or 32 bits out of the 128-bit line it's read, to finally spit that out the other side onto the AHB bus to the waiting CPU, obviously.

Measure the electricity consumed by a browser to render a webpage

Is there a way to calculate the electricity consumed to load and render a webpage (frontend)? I was thinking of a 'test' made with phantomjs for example:
load a web page
scroll to the bottom
And measure how much electricity was needed. I can perhaps extrapolate from CPU cycle. But phantomjs is headless, rendering in real browser is certainly different. Perhaps it's impossible to do real measurements.. but with an index it may be possible to compare websites.
Do you have other suggestions?
It's pretty much impossible to measure this internally in modern processors (anything more recent than 286). By internally, I mean by counting cycles. This is because different parts of the processor consume different levels of energy per cycle depending upon the instruction.
That said, you can make your measurements. Stick a power meter between the wall and the processor. Here's a procedure:
Measure the baseline energy usage, i.e. nothing running except the OS and the browser, and the browser completely static (i.e. not doing anything). You need to make sure that everything is stead state (SS) meaning start your measurements only after several minutes of idle.
Measure the usage doing the operation you want. Again, you want to avoid any start up and stopping work, so make sure you start measuring at least 15 seconds after you start the operation. Stopping isn't an issue since the browser will execute any termination code after you finish your measurement.
Sounds simple, right? Unfortunately, because of the nature of your measurements, there are some gotchas.
Do you recall your physics classes (or EE classes) that talked about signal to noise ratios? Well, a scroll down uses very little energy, so the signal (scrolling) is well in the noise (normal background processes). This means you have to take a LOT of samples to get anything useful.
Your browser startup energy usage, or anything else that uses a decent amount of processing, is much easier to measure (better signal to noise ratio).
Also, make sure you understand the underlying electronics. For example, power is VA (voltage*amperage) where both V and A are in phase. I don't think this will be an issue since I'm pretty sure they are in phase for computers. Also, any decent power meter understands the difference.
I'm guessing you intend to do this for mobile devices. Your measurements will only be roughly the same from processor to processor. This is due to architectural differences from generation to generation, and from manufacturer to manufacturer.
Good luck.

Difference between Latency and Jitter in Operating-Systems

discussing criterias for Operating-Systems every time I hear Interupt-Latency and OS-Jitter. And now I ask myself, what is the Difference between these two.
In my opinion the Interrupt-Latency is the Delay from occurence of an Interupt until the Interupt-Service-Routine (ISR) is entered.
On the contrary Jitter is the time the moment of entering the ISR differs over time.
Is this the same you think?
Your understanding is basically correct.
Latency = Delay between an event happening in the real world and code responding to the event.
Jitter = Differences in Latencies between two or more events.
In the realm of clustered computing, especially when dealing with massive scale out solutions, there are cases where work distributed across many systems (and many many processor cores) needs to complete in fairly predictable time-frames. An operating system, and the software stack being leveraged, can introduce some variability in the run-times of these "chunks" of work. This variability is often referred to as "OS Jitter". link
Interrupt latency, as you said is the time between interrupt signal and entry into the interrupt handler.
Both the concepts are orthogonal to each other. However, practically, more interrupts generally implies more OS Jitter.

Communication between processor and high speed perihperal

Considering that a processor runs at 100 MHz and the data is coming to the processor from an external device/peripheral at the rate of 1000 Mbit/s (8 Bits/Clockcycle # 125 MHz), which is the best way to handle traffic that comes at a higher speed to the processor ?
First off, you can't do it in software. There would be no way to sample the digital lines at a sufficient rate, or to doing anything useful with it.
You need to use a hardware FIFO buffer or memory cell. When a data burst comes in, it can be buffered in the high speed FIFO and then read out as needed by the processor.
Drop in high speed FIFO chips are surprisingly expensive (though most are dual ported). To cut cost, you would be best off using an SRAM chip, and a hardware adder to increment the address lines on incoming data.
This is not an uncommon situation for software. semaj said the right word. This is a system engineering issue. Other folks have the right answer too. If you want to look at or process that data with the 100MHz processor, it is not going to happen, dont bother trying. You CAN look at snapshots of it or have the hardware filter out a specific percentage of it that you are looking for. At the end of the day though it is a systems issue, what does the hardware provide, where does it put this data, what is the softwares task for this data, does it see X buffers of data come in on the goesinta, and the notify the goesouta hardware that there are X buffers ready to go? Does the hardware examine and align the buffers so that you can look at a header, and then decide where to route the hardware? Once you do your system engineering you will know if you can use that processor or not, and if you can use it what its job is and how to do it.
Your direct question. What is the best way to handle it. The best way to handle it is to have hardware (fpga, asic, etc) move it into and out of some storage device (ram of some sort probably). Not necessarily the same ram the processor runs out of (DMA is a good thing to avoid). The hardware is something the software can talk to but you cannot examine all of that data so dont try. Without knowing what kind of data this is, what form, what the software looks at how much work you are willing to force the hardware to do, etc determines the rest of the answer. If you expect a certain (guaranteed) percentage to be bad or not belong to this processor, etc have the hardware filter that out and then what is left you can process.
Networking is a good example of this, PCs have gige ports but cannot process GigE line rate data. That is why we use switches now instead of hubs, hardware slices out a percentage of the data so the pc can handle it, the protocols take care of the data that cannot be processed by resending it later. And the switches processors dont look at all of the data, the hardware slices it up so the software can examine just the header. Or sometimes the software simply manages tables that drive the hardware and the hardware does all the work of processing the data.
Do your system engineering the answers will simply fall out.
You buffer it. Typically data from a device is written to a memory buffer (circular queue) using DMA (no cpu involved). The cpu reads from the memory buffer at a constant rate. Usually devices send data in bursts. This keeps the buffer from filling up. If there is too much data, buffer overflow.
DMA (direct memory access) is possibly the solution, however, it seems unlikely that the memory bus could run faster than the processor core, so the receiving peripheral would have to accept data into a larger register than 8 bit because 125MHz could not be sustained. For example a 16bit register would allow memory writes at 62.5MHz which may be achievable. Also the receiving device would have to be able to accept an external clock that is both faster and asynchronous to the core clock. Also of course the receiving peripheral must have support for DMA.
Unless you are more specific about your hardware and the communication protocol it is difficult to give anything other than a general answer.