How to calculate interrupt latency in Ubuntu programmatically? - operating-system

Interrupt latency is defined as the time between when an interrupt occurs and when the first instruction of the interrupt service routine (ISR) starts to process it. To calculate the interrupt latency I was following the method described in this paper: https://ieeexplore.ieee.org/document/8441042
Basically, my idea was to create a thread (at time point t1) using the pthread library in C, then wait for 100 microseconds and generate an interrupt (at time point t2). The interrupt latency would then be t2 - t1 - 100 microseconds. But I cannot figure out how to actually generate such an interrupt, nor how to capture time point t2 right at the start of its handler.
Or, if there is some other, better way to calculate interrupt latency programmatically, please let me know! Thanks.
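For reference, here is roughly the kind of measurement I had in mind, with a one-shot POSIX timer signal standing in for the interrupt. This is only a sketch, and it only measures signal delivery/scheduling latency in user space, not true hardware interrupt latency (which would need kernel-side tracing such as ftrace, or a tool like cyclictest):

    /* Sketch: approximate the t2 - t1 - delay idea by timing the delivery of a
     * one-shot POSIX timer signal.  This measures signal/scheduling latency,
     * not a real hardware ISR.  Build with: gcc latency.c -o latency -lrt */
    #include <signal.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define DELAY_NS 100000L                 /* the 100 us wait from above     */

    static struct timespec t1;               /* taken when the timer is armed  */
    static volatile long latency_ns = -1;

    static void handler(int sig)
    {
        struct timespec t2;                  /* "t2": first thing the handler does */
        (void)sig;
        clock_gettime(CLOCK_MONOTONIC, &t2);
        latency_ns = (t2.tv_sec - t1.tv_sec) * 1000000000L
                   + (t2.tv_nsec - t1.tv_nsec) - DELAY_NS;
    }

    int main(void)
    {
        struct sigaction sa = { .sa_handler = handler };
        sigaction(SIGALRM, &sa, NULL);

        timer_t tid;
        struct sigevent sev = { .sigev_notify = SIGEV_SIGNAL,
                                .sigev_signo  = SIGALRM };
        timer_create(CLOCK_MONOTONIC, &sev, &tid);

        struct itimerspec its = { .it_value.tv_nsec = DELAY_NS };
        clock_gettime(CLOCK_MONOTONIC, &t1);     /* "t1": timer armed          */
        timer_settime(tid, 0, &its, NULL);

        while (latency_ns < 0)
            pause();                             /* wait for the signal        */
        printf("latency ~ %ld ns\n", latency_ns);
        return 0;
    }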

Related

How to run a periodic thread at high frequency (>100 kHz) on a Cortex-M3 microcontroller with an RTOS?

I'm implementing a high-frequency (>100 kHz) data acquisition system with an STM32F107VC microcontroller. It uses the SPI peripheral to communicate with a high-frequency ADC chip. I have to use an RTOS. How can I do this?
I have tried FreeRTOS, but its maximum tick frequency is 1000 Hz, so I can't run a thread every 1 µs, for example. I also tried Keil RTX5, whose tick frequency can go up to 1 MHz, but I have read that setting the tick frequency that high is not recommended because it increases the overall context-switching overhead. So what should I do?
Thanks.
You do not want to run a task at this frequency. As you mentioned, context switches will kill the performance. This is horribly inefficient.
Instead, you want to use buffering, interrupts and DMA. Since it's a high-frequency ADC chip, it probably has an internal buffer of its own; check the datasheet. If the chip has a 16-sample buffer, sampling at 100 kHz only needs servicing at 6.25 kHz. Don't use a task to process the samples at 6.25 kHz either: do the receiving in an interrupt (from a timer or some signal), have the interrupt do nothing but fill a buffer, and wake up a task for processing when the buffer is full (switching to another buffer until the task has finished). With this you can have a task that runs only every 10 ms or so. An interrupt is not a context switch: on a Cortex-M3 it has a latency of around 12 cycles, which is low enough to be negligible at 6.25 kHz.
If your ADC chip doesn't have a buffer (but I doubt that), you may be OK with a 100 kHz interrupt, but put as little code as possible inside it.
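A minimal sketch of that fill-a-buffer-then-wake-a-task pattern, assuming FreeRTOS; ADC_ReadSample() and ProcessBlock() are placeholders for however the samples actually come in and get used:

    /* Sketch only: the ISR fills one half of a ping-pong buffer and wakes a
     * task when that half is full.  Assumes FreeRTOS; ADC_ReadSample() and
     * ProcessBlock() are placeholders. */
    #include <stdint.h>
    #include "FreeRTOS.h"
    #include "task.h"

    #define BLOCK_SIZE 128

    extern uint16_t ADC_ReadSample(void);                 /* placeholder       */
    extern void ProcessBlock(uint16_t *buf, uint32_t n);  /* placeholder       */

    static uint16_t buf[2][BLOCK_SIZE];
    static volatile uint32_t idx = 0;       /* position inside the active half */
    static volatile uint32_t active = 0;    /* which half the ISR is filling   */
    static TaskHandle_t processingTask;     /* set by xTaskCreate() elsewhere  */

    void ADC_IRQHandler(void)               /* keep this as short as possible  */
    {
        BaseType_t woken = pdFALSE;

        buf[active][idx++] = ADC_ReadSample();

        if (idx == BLOCK_SIZE) {            /* half full: swap halves and      */
            idx = 0;                        /* hand the full one to the task   */
            active ^= 1u;
            vTaskNotifyGiveFromISR(processingTask, &woken);
        }
        portYIELD_FROM_ISR(woken);
    }

    void vProcessingTask(void *arg)
    {
        (void)arg;
        for (;;) {
            ulTaskNotifyTake(pdTRUE, portMAX_DELAY);   /* block until notified */
            ProcessBlock(buf[active ^ 1u], BLOCK_SIZE);
        }
    }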
A better solution is to use DMA, if your MCU supports it. For example, you can set up a DMA channel to receive from the SPI, using a timer as the request generator. Depending on your case it may be tricky or even impossible to configure, but a working DMA setup means you can receive a large buffer of samples without any code running on your MCU.
I have to use an RTOS.
No way. If it's a requirement from your boss or client, run away from the project fast. If that's not possible, put your concerns in writing now, to cover yourself when the reasons for failure are discussed later. If it's your own idea, then reconsider now.
The maximum system clock speed of the STM32F107 is 36 MHz (72 MHz if there is an external HSE crystal), meaning that there are only 360 to 720 system clock cycles between ticks arriving at 100 kHz. The RTX5 warning is right: a significant share of that time would be eaten up by task-switching overhead.
It is possible to have a timer interrupt at 100 kHz and do some simple processing in the interrupt handler (don't even think about using HAL), but I'd recommend first investigating whether it's really necessary to run code every 10 µs, or whether some of that work can be offloaded to the DMA or timer hardware.
Since you only have a few hundred cycles (instructions) between inputs, the typical solution is to use an interrupt to be alerted that data is available, and have the interrupt handler put the data somewhere so you can process it at your leisure. Of course, if the data comes in continuously at that rate, you may be in trouble with no time left for actual processing. Depending on how much data is coming in and how frequently, a simple ring buffer may be sufficient. If the amount of data is relatively large (how large is large? consider that a memory access takes more than one CPU cycle, and that each incoming datum needs at least two memory accesses), then using DMA as @Elderbug suggested is a great solution, as it consumes the minimum number of CPU cycles.
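For what it's worth, a sketch of the kind of ring buffer meant here, with the ISR as the single producer and the main loop (or a task) as the single consumer; the power-of-two size keeps the index wrap cheap:

    /* Sketch: single-producer (ISR) / single-consumer ring buffer.
     * The size is a power of two so wrapping is a cheap AND. */
    #include <stdbool.h>
    #include <stdint.h>

    #define RB_SIZE 256u                     /* must be a power of two        */

    static volatile uint16_t rb_data[RB_SIZE];
    static volatile uint32_t rb_head = 0;    /* written only by the ISR       */
    static volatile uint32_t rb_tail = 0;    /* written only by the consumer  */

    static inline bool rb_put(uint16_t sample)    /* called from the ISR      */
    {
        uint32_t next = (rb_head + 1u) & (RB_SIZE - 1u);
        if (next == rb_tail)
            return false;                    /* full: the sample is dropped   */
        rb_data[rb_head] = sample;
        rb_head = next;
        return true;
    }

    static inline bool rb_get(uint16_t *sample)   /* called from the consumer */
    {
        if (rb_tail == rb_head)
            return false;                    /* empty                         */
        *sample = rb_data[rb_tail];
        rb_tail = (rb_tail + 1u) & (RB_SIZE - 1u);
        return true;
    }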
There is no need to set the RTOS tick to match the data acquisition rate - the two are unrelated. And to do so would be a very poor and ill-advised solution.
The STM32 has DMA capability for most peripherals including SPI. You need to configure the DMA and SPI to transfer a sequence of samples directly to memory. The DMA controller has full and half transfer interrupts, and can cycle a provided buffer so that when it is full, it starts again from the beginning. That can be used to "double buffer" the sample blocks.
So, for example, if you use a DMA buffer of, say, 256 samples and sample at 100 ksps, you will get a DMA interrupt every 1.28 ms, independent of the RTOS tick interrupt and scheduling. On the half-transfer interrupt the first 128 samples are ready for processing; on the full-transfer interrupt the second 128 samples can be processed; and in the 1.28 ms between them the processor is free to do useful work.
In the interrupt handler, rather than processing all the block data there (which would not in any case be possible if the processing were non-deterministic or blocking, such as writing it to a file system), you might for example send the samples in blocks via a message queue to a task context that performs the less deterministic processing.
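By way of illustration, a sketch of that half/full-transfer scheme using the STM32 HAL's SPI+DMA callbacks and a FreeRTOS queue; it assumes the SPI and DMA handles are initialised elsewhere with the DMA in circular mode, and ProcessBlock() is a placeholder:

    /* Sketch: circular SPI+DMA reception; each completed half-buffer is posted
     * to a queue for a task to process.  Assumes hspi1 and its DMA channel are
     * configured elsewhere (DMA in circular mode). */
    #include <stdint.h>
    #include "stm32f1xx_hal.h"
    #include "FreeRTOS.h"
    #include "queue.h"

    #define BLOCK 128u

    extern SPI_HandleTypeDef hspi1;                        /* set up elsewhere */
    extern void ProcessBlock(uint16_t *buf, uint32_t n);   /* placeholder      */

    static uint16_t dma_buf[2u * BLOCK];     /* DMA cycles over both halves    */
    static QueueHandle_t blockQueue;         /* carries a pointer to one half  */

    void start_acquisition(void)
    {
        blockQueue = xQueueCreate(4, sizeof(uint16_t *));
        HAL_SPI_Receive_DMA(&hspi1, (uint8_t *)dma_buf, 2u * BLOCK);
    }

    void HAL_SPI_RxHalfCpltCallback(SPI_HandleTypeDef *hspi)  /* 1st half done  */
    {
        uint16_t *half = &dma_buf[0];
        BaseType_t woken = pdFALSE;
        (void)hspi;
        xQueueSendFromISR(blockQueue, &half, &woken);
        portYIELD_FROM_ISR(woken);
    }

    void HAL_SPI_RxCpltCallback(SPI_HandleTypeDef *hspi)      /* 2nd half done  */
    {
        uint16_t *half = &dma_buf[BLOCK];
        BaseType_t woken = pdFALSE;
        (void)hspi;
        xQueueSendFromISR(blockQueue, &half, &woken);
        portYIELD_FROM_ISR(woken);
    }

    void vSampleTask(void *arg)              /* the less deterministic part    */
    {
        uint16_t *half;
        (void)arg;
        for (;;) {
            xQueueReceive(blockQueue, &half, portMAX_DELAY);
            ProcessBlock(half, BLOCK);
        }
    }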
Note that none of this relies on the RTOS tick - the scheduler will run after any interrupt if that interrupt calls a scheduling function such as posting to a message queue. Synchronising actions to an RTOS clock running asynchronously to the triggering event (i.e. polling) is not a good way to achieve highly deterministic real-time response and is a particularly poor method for signal acquisition, which requires a jitter free sampling interval to avoid false artefacts in the signal from aperiodic sampling.
Your assumption that this has to be solved with an inappropriately high RTOS tick rate is a misunderstanding of how the RTOS operates; it would probably only work if your processor were doing no other work beyond sampling data, in which case you might not need an RTOS at all, and it would not be a very efficient use of the processor.

How is CR8 register used to prioritize interrupts in an x86-64 CPU?

I'm reading the Intel documentation on control registers, but I'm struggling to understand how CR8 register is used. To quote the docs (2-18 Vol. 3A):
Task Priority Level (bit 3:0 of CR8) — This sets the threshold value corresponding to the highest-priority interrupt to be blocked. A value of 0 means all interrupts are enabled. This field is available in 64-bit mode. A value of 15 means all interrupts will be disabled.
I have 3 quick questions, if you don't mind:
So bits 3 thru 0 of CR8 make up those 16 levels of priority values. But priority of what? A running "thread", I assume, correct?
But what is that priority value in CR8 compared to when an interrupt is received to see if it has to be blocked or not?
When an interrupt is blocked, what does it mean? Is it "delayed" until later time, or is it just discarded, i.e. lost?
CR8 indicates the current priority of the CPU. When an interrupt is pending, bits 7:4 of its vector number are compared with CR8. If that value is greater, the interrupt is serviced; otherwise it is held pending until CR8 is set to a lower value.
Assuming the APIC is in use, it has an IRR (Interrupt Request Register) with one bit per interrupt vector number. When that bit is set, the interrupt is pending. It can stay that way forever.
When an interrupt arrives, it is ORed into the IRR. If the interrupt is already pending (that is, the IRR bit for that vector is already set), the new interrupt is merged with the prior one. (You could say it is dropped, but I don't think of it that way; instead, I say the two are combined into one.) Because of this merging, interrupt service routines must be designed to process all the work that is ready, rather than expecting a distinct interrupt for each unit of work.
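To make the CR8 comparison described above concrete, here is a small sketch of the rule; reading CR8 is of course only possible at ring 0 (kernel code), so the helper is purely illustrative:

    /* Sketch of the CR8/TPR rule: the priority class of a vector is its upper
     * four bits, and a pending vector is only serviced while its class is
     * greater than the task priority held in CR8 (ring 0 only). */
    #include <stdbool.h>
    #include <stdint.h>

    static inline uint64_t read_cr8(void)          /* ring 0 only              */
    {
        uint64_t tpr;
        __asm__ volatile("mov %%cr8, %0" : "=r"(tpr));
        return tpr;
    }

    static bool vector_is_serviceable(uint8_t vector)
    {
        uint8_t priority_class = vector >> 4;       /* bits 7:4 of the vector  */
        return priority_class > (read_cr8() & 0xF); /* greater => serviced now,*/
    }                                               /* otherwise held pending  */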
Another related point is that Windows (and I assume Linux) tries to keep the IRQ level of a CPU as low as possible at all times. Interrupt service routines do as little work as possible at their elevated hardware interrupt level and then queue a deferred procedure call (DPC) to do the rest of their work at DPC IRQ level. The DPC will normally be serviced immediately unless another IRQ has arrived, because DPCs run at a higher priority than normal threads.
Once a CPU starts executing a DPC, it will execute all the DPCs in its per-CPU DPC queue before returning the CPU's IRQL to zero and allowing normal threads to resume.
The advantage of doing it this way is that an incoming hardware IRQ of any priority can interrupt a DPC and get its own DPC onto the queue almost immediately, so it never gets missed.
I should also try to explain (as I understand it 😁) the difference between the IRQ level of a CPU and the priority of an IRQ.
Before Control Register 8 became available with x64, the CPU had no notion of an IRQ level.
The designers of Windows NT decided that every logical processor in a system should have a NOTIONAL IRQ level, stored in a per-CPU data structure called the processor control block. They decided there should be 32 levels, for no reason I know of 😁.
Software and hardware interrupts are also assigned a level, so if they are above the level currently assigned to the CPU, they are allowed to proceed.
Windows does NOT make use of the interrupt priority assigned by the PIC/APIC hardware; instead it uses the interrupt mask bits in them. The various pins are assigned a vector number, and each vector then gets a level.
When Windows raises the IRQL of a CPU in its PCB, it also reprograms the interrupt mask of the PIC/APIC. But not straight away.
Every interrupt that occurs causes the Windows trap dispatcher to execute and compare the interrupt's level with the CPU's IRQL. If the interrupt's level is higher, it goes ahead; if not, only THEN does Windows reprogram the mask and return to the executing thread instead.
The reason for that is that reprogramming the PIC takes time, and if no lower-level IRQ comes in, Windows can save itself the job.
On x64 there IS CR8 and I am still looking at how that works.

Implementational difference between h/w and s/w interrupt

I know the difference between a hardware and a software interrupt.
The question is about the difference in how they are implemented.
I know that we read an interrupt vector and execute the interrupt service routine.
But what is the difference between the two cases?
This question is for a graduate school entrance interview.
The difference is that the timing of a HW interrupt is dictated by the HW and the timing of a SW interrupt is determined programmatically by the operating system.
SW interrupt delivery will take place within some kind of HW interrupt handler (e.g., the timer handler).
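To see the timing difference concretely on x86-64 Linux: a software interrupt is raised by an instruction the program itself executes, so its timing is entirely program-determined, whereas a hardware interrupt arrives asynchronously from a device. A small sketch that raises the breakpoint trap with int3 and catches the resulting SIGTRAP:

    /* Sketch: a software interrupt is raised synchronously by an instruction
     * the program chooses to execute.  int3 (the x86 breakpoint trap) is
     * reflected back to the process by the kernel as SIGTRAP. */
    #include <signal.h>
    #include <stdio.h>

    static void on_trap(int sig)
    {
        /* Runs at a point chosen by the program, unlike a hardware interrupt. */
        (void)sig;
        printf("caught SIGTRAP from the int3 software interrupt\n");
    }

    int main(void)
    {
        signal(SIGTRAP, on_trap);
        __asm__ volatile("int3");    /* explicit software interrupt/trap */
        printf("execution continues after the trap\n");
        return 0;
    }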

STM32F103 Input Capture Too Slow

I have a high-speed 10 MHz clock going to the processor's TIM4 input capture pin (channel 3). I would like to verify that the clock is running at 10 MHz using the processor's input capture. I set up the input capture module, and it works fine at lower frequencies (around 1 kHz or so). Once I climb into the MHz range, the processor starts to miss interrupts and so reports an incorrect frequency. I didn't see anywhere in the datasheet a maximum frequency that input capture can handle. I have an external clock of 8 MHz and a core clock of 72 MHz, so I would imagine that I can read a 10 MHz signal. Any ideas?
Take a look at the TIM_ICInitStructure.TIM_ICPrescaler options. Usually you'll have it set to TIM_ICPSC_DIV1 so that interrupts are generated on every valid transition.
Prescaler values of 1, 2, 4 and 8 are available, which let you effectively reduce the rate of interrupt generation by that factor. For example, for a 10 MHz signal with a prescaler of 8 you'd expect to count a frequency of 10 MHz / 8 = 1.25 MHz.
This is still quite tight for a 72 MHz HCLK, so you'll still need to optimise your IRQ handler carefully.
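For reference, a sketch of that configuration with the Standard Peripheral Library (which the TIM_ICInitStructure name suggests is in use); RCC, GPIO and NVIC setup are assumed to be done elsewhere:

    /* Sketch (Standard Peripheral Library): input capture on TIM4 channel 3
     * with the capture prescaler set to 8, so only every 8th valid transition
     * raises an interrupt.  RCC, GPIO and NVIC setup done elsewhere. */
    #include "stm32f10x_tim.h"

    void ic_init_div8(void)
    {
        TIM_ICInitTypeDef ic;

        ic.TIM_Channel     = TIM_Channel_3;
        ic.TIM_ICPolarity  = TIM_ICPolarity_Rising;
        ic.TIM_ICSelection = TIM_ICSelection_DirectTI;
        ic.TIM_ICPrescaler = TIM_ICPSC_DIV8;    /* capture every 8th edge     */
        ic.TIM_ICFilter    = 0x0;
        TIM_ICInit(TIM4, &ic);

        TIM_ITConfig(TIM4, TIM_IT_CC3, ENABLE); /* interrupt on each capture  */
        TIM_Cmd(TIM4, ENABLE);
    }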
Looks like you're generating an interrupt request for every rising (or falling) edge of the clock.
If that is indeed the case, then think about this for a second: with a 10 MHz input signal, you're generating an interrupt roughly every 7 CPU cycles. In those 7 cycles you need to save registers to RAM, run the IRQ handler prologue, run the actual code you wrote for the handler, run the epilogue, and restore the registers.
Best case, with the compiler set to optimise for speed and very little processing in the handler, you're looking at tens of cycles for all of that. Since you only have 7 cycles in which to do tens of cycles' worth of work, it's no surprise that you're missing interrupts.
You can't use an interrupt routine at that frequency. You need to feed the 10 MHz signal in as an external clock/trigger to the timer. Then you can use the prescaler and the timer itself to divide it down to a suitably lower interrupt frequency.
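If the signal can be routed to an input usable as an external clock source (TI1 or TI2 on this timer), a sketch of that idea with the Standard Peripheral Library might look like this: the counter is clocked by the external edges and only the update event (here every 10000 counts, i.e. at 1 kHz for a 10 MHz input) raises an interrupt:

    /* Sketch (SPL): clock TIM4 from the external signal and interrupt only on
     * the update event, every 10000 external edges.  Assumes the signal is on
     * an input usable as an external clock source (TI1 or TI2). */
    #include "stm32f10x_tim.h"

    void count_external_clock(void)
    {
        TIM_TimeBaseInitTypeDef tb;

        tb.TIM_Prescaler         = 0;
        tb.TIM_CounterMode       = TIM_CounterMode_Up;
        tb.TIM_Period            = 10000 - 1;   /* update every 10000 edges   */
        tb.TIM_ClockDivision     = TIM_CKD_DIV1;
        tb.TIM_RepetitionCounter = 0;
        TIM_TimeBaseInit(TIM4, &tb);

        /* Use the TI2 input as the counter clock instead of the internal clock. */
        TIM_TIxExternalClockConfig(TIM4, TIM_TIxExternalCLK1Source_TI2,
                                   TIM_ICPolarity_Rising, 0);

        TIM_ITConfig(TIM4, TIM_IT_Update, ENABLE);
        TIM_Cmd(TIM4, ENABLE);
    }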

Calculating Interrupt Data Rate

I'm currently learning about interrupts but don't understand how you calculate the data rate for the question below. I have the answers, but I have no idea how you get there. If someone could please explain to me how it is calculated, it would be really appreciated.
Here is the question...
This question concerns the use of interrupts to handle the input and storage in memory of data arriving at an input interface, and the consideration of the data rates that can be achieved using this mechanism. In this particular question, the arrival of each new data item triggers an interrupt request to input and store the data item in a queue in memory. The question is about calculating the maximum data rate achievable in this scenario.
You are first required to calculate the time to respond to an interrupt from the interface, run the interrupt service routine (ISR) and return to the interrupted program. From this, and the number of data bits input on each interrupt, you are required to calculate the maximum data rate, in bits per second, that can be handled. Below you are given: the number of clock cycles the CPU requires to respond to the interrupt and switch to the ISR, the number of instructions executed by the ISR, the average number of clock cycles per instruction in the ISR, the number of bits in the data item input on each interrupt, and the clock frequency. [You can assume that the CPU can be immediately interrupted again as soon as the ISR completes, but not before this.]
clock cycles to respond to interrupt = 15
instructions executed in ISR = 70
average clock cycles per instruction = 5
number of bits per data item = 32
clock frequency = 10 MHz
Questions
a) What is the time in microseconds to respond to an interrupt from the interface, run the interrupt service routine (ISR) and return to the interrupted program?
b) What is the maximum data rate in Kbits/second?
Answers
a) 36.5 - I understand this
b) 876.7 - ????
Because each ISR takes 36.5 µs (from part a), the absolute maximum number of ISRs that can happen in a second is 1 / 36.5 µs ≈ 27,397.26.
In each ISR, 32 bits of data are processed.
Therefore 27,397.26 * 32 bits ≈ 876,712.33 bits per second, i.e. about 876.7 Kbits/second.
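For anyone who wants to reproduce the arithmetic, here is the same calculation as a small C check (the values are exactly the ones given in the question):

    /* Quick check of the arithmetic above. */
    #include <stdio.h>

    int main(void)
    {
        double f_clk  = 10e6;                 /* 10 MHz clock                 */
        double cycles = 15 + 70 * 5;          /* respond + run the ISR = 365  */
        double t_isr  = cycles / f_clk;       /* 36.5e-6 s          (part a)  */
        double rate   = 32.0 / t_isr;         /* bits per second              */

        printf("ISR time = %.1f us\n", t_isr * 1e6);         /* 36.5 us       */
        printf("max rate = %.1f Kbits/s\n", rate / 1000.0);  /* 876.7 Kbits/s */
        return 0;
    }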