A FreeRTOS task suddenly does nothing

I'm developing a real-time system with FreeRTOS on an STM3240G board.
The system contains several different tasks (GUI, KB, ModBus, Ctrl, etc.).
The tasks have different priorities.
The GUI seems to update a little slowly, so I used profiler software to see what goes on between the different tasks during a run. The profiler shows which task was running at each moment (microsecond resolution) and which interrupts arrived.
The profiler also lets me "mark" different locations in the code, so I know when execution was there. So I ran the program and made a recording.
Looking at the recording, I saw that (for example) the Ctrl task sat between two lines of code for 15 milliseconds (the duration varies). There was no task switch and no interrupt arrived, and after this time the system continued normally from that point, according to the recording and my marks.
I tried disabling different interrupts, without any success.
Does anyone have any idea what it could be?

On the eval board, there is a MIPI connector that supports ETM trace - a considerable luxury/advantage over other development boards!
If you also have one of the more expensive debug adapters that also support ETM trace (like for example, uTrace or J-Trace or
ULINKpro or I-jet Trace), you should use it to trace the entire control flow without having to instrument tasks and ISRs.
Otherwise, you should re-check whether really every IRQ handler has been instrumented (credit to #RealtimeRik, who pointed this out) at a low enough level that the profiler can actually spot it.
Especially if you are using third-party libraries, some IRQs may be serviced by handlers you (or the profiler) don't have the source of.
Having run into this once myself, I suggest you review the NVIC settings carefully to check whether there is an ISR you haven't been aware of.
Another question is how the profiler internally works.
If it is based on ETM/TPIU or ITM/SWO tracing, see above.
If it creates and counts a huge number of snapshots, there might be some systematic effects that prevent snapshots from being taken in a particular part of the software:
Could it be that a non-maskable interrupt or exception handler is running in a way that cannot be interrupted by the mechanism that collects the snapshots?
Could it be that the timing of the control task correlates (by some coincidence) with a timing signal used for the snapshots?
What happens if you insert some time-consuming extra code in front of the unexpected "profiling gap" (e.g., some hundreds or thousands of NOPs)?

Related

Need for multi-threading in Systemverilog using fork-join

In most textbooks advocating layered testbench designs, it is recommended that the different layers/blocks run in parallel. I'm currently unable to figure out why. Why can't we follow the following sequence?
    repeat for 1000 tests:
        generate a transaction
        drive the transaction on the DUT
        monitor the transaction on the DUT
        compare output with a reference
Instead, what is recommended is that all four blocks (generator, driver, monitor and scoreboard/checker) run in parallel. My confusion is why we avoid the above-mentioned sequential behavior, in which we go through one test case at a time, and prefer different blocks running in parallel.
Some texts say it is because that is how things are done in hardware, i.e. everything runs in parallel. However, the layered testbench is not needed to model any synthesizable hardware, so why do we have to restrict our verification environment/testbench to this hardware-like behavior?
A sample block diagram that I'm referring to is given below:
Suppose that you have a FIFO which you want to test. Your driver pushes data into it, and the monitor checks the other end. Data gets pushed when it is available and until the FIFO is full; the consumer on the other end reads data when it can. So the pipe is sometimes full, sometimes empty.
When the FIFO is full, the driver must stop. The monitor always works, but its values do not change at the same rate as the stimuli, and it is delayed by the FIFO depth.
In your example, when the FIFO is full, the stopped driver will block the whole loop, so the monitor will not work either. Of course, you can come up with conditional statements that bypass the stopped driver, but then you will need to run the monitor and the scoreboard on every iteration, even when the data is not changing.
With more complicated designs, with multiple FIFOs, pipelines, delays, clock frequencies, etc., your loop will become so complicated that it would be difficult if not impossible to manage.
The problem is that with a simple sequential program it is not possible to express a block/wait condition for one statement without blocking the whole loop. It is much easier to do with parallel threads.
The general approach is to run the driver and the monitor in separate simulation threads. The monitor then waits for data to appear and does not block the driver. The driver pushes data when it is available and can be blocked by a full FIFO or by having nothing to drive; it does not block the monitor.
With a single monitor, you can probably pack the scoreboard into the same thread as the monitor, but with multiple monitors that becomes problematic, in particular when all monitors run in separate threads. So the scoreboard should run as a separate thread as well.
You are mixing two different concepts. The layered approach is a software concept that helps manage different abstraction levels, from software transactions (a frame of data) down to the individual pin wiggles. These layers are very similar to the OSI network model. Layering also helps with maintenance and reusability by defining clear interfaces that enable you to build up a larger system. It's hard to see the benefits of this on a testbench for a small combinational block.
Parallelism comes into play for other reasons. There are relatively few complete designs out there that can be tested as a single stream of inputs followed by comparing the output to a reference model. You might be able to test one small block of a design this way, but not a complete chip, as it typically has many interfaces that need to be driven in parallel.
But let's take the case of two simple blocks that you tested individually with the approach above. Now you want to connect them together, where the output of the first DUT drives the second DUT:
Driver1 -> DUT1 -> DUT2 -> Monitor2
This works best if I originally wrote the drivers and monitors as separate objects running in parallel.

Does each system call create a process?

Does each system call create a process?
Are all functions (e.g. interrupts) of programs and operating systems executed in the form of processes?
I feel that such a large number of process control blocks and so much process scheduling would waste a lot of resources.
Or is the kernel code of a system call regarded as part of the current process?
The short answer is: not exactly. But we have to agree on what we are going to call a "process". A process is more of an abstract idea, which encapsulates multiple instructions, each executed sequentially.
So let's start from the first question.
Does each system call create a process?
No. Each system call is issued by the currently running process, which tells the OS: "Hey OS, I need you to open this file for me, or read these bits here". In this case, the process is a bag of sequentially executed instructions, some of which are system calls and some are not.
Then we have.
Are all functions (e.g. interrupts) of programs and operating systems executed in the form of processes?
Well, this kind of goes back to the first question. We do not consider a system call (an operation that tells the OS to do something, and which works under very strict conditions) to be a separate process. We will NOT see the system call execution having its OWN process id (pid).
Then we have.
I feel that such a large number of process control blocks, a large number of process scheduling waste a lot of resources.
Well, I would say: do not underestimate your OS and the capabilities of your hardware. A modern processor with a modern OS on it is VERY, VERY fast, more than capable of executing billions of instructions per second. We can't really imagine how fast that is. I wouldn't worry about optimizations at such a micro level.
Okay, but let's dig deeper into this. What is a process exactly?
Informally, a process is a program in execution. The status of the current activity of a process is represented by a value, called the program counter, and the contents of the processor’s registers. The memory layout of a process is typically divided into multiple sections.
These sections include:
Text section.
Data section.
Heap section.
Stack section.
As a process executes, it changes state. The state of a process is defined in part by the current activity of that process. Each process is represented in the OS by a process control block (PCB), as you already mentioned.
So we can see that we treat a process as a very complicated structure that is MORE than just something occupying CPU time. It has a state, storage, timing, and so on.
But because you are interested in system calls, then what are they?
For us, system calls provide an interface to the services made available by an OS. They are the way we tell the OS to do things FOR US. We know that systems execute thousands of system calls per second.
No, they don't.
The operating system uses a software interrupt to execute the system call within the same process.
You can imagine them as function calls, but they are executed with kernel privileges.

The reason(s)/benefit(s) to use realtime operating system instead of while-loop on MCU

I'm working on a wheeled-robot platform. My team is implementing some algorithms on an MCU to
keep getting sensor readings (sonar array, IR array, motor encoders, IMU)
receive user commands (via a serial port connected to a tablet)
control the actuators (motors) to execute user commands
keep sending sensor readings to the tablet for more complicated algorithms.
We currently implement everything inside a global while-loop, while I know most other use cases do the very same things with a real-time operating system.
Please tell me the benefits of and reasons for using a real-time OS instead of a simple while-loop.
Thanks.
The RTOS will provide priority based preemption. If your code is off parsing a serial command and executing it, it can't respond to anything else until it returns to your beastly loop. An RTOS will provide the abstractions needed for an instant context switch based on an interrupt event. Otherwise the worst-case latency of an event response is going to be that of the longest possible excursion out of the main loop, and sometimes you really do need long-running processes. Such as, for example, to update an LCD panel or respond to USB device enumeration. Preemption permits you to go off and do these things safe in the knowledge that a 16-bit timer running at the CPU clock isn't going to roll over several times before it's done. While for simple control jobs a loop is sufficient, the problem starting with it is when you get into something like USB device enumeration it's no longer practical and will need a full rewrite. By starting with a preemptive framework like an RTOS provides, you have a lot more future flexibility. But there's definitely a bit more up-front work, and definitely a learning curve.
A "real-time" OS ensures your task periodicity. If you want to read sensor data precisely every 100 msec, a simple while-loop will not guarantee that; an RTOS, on the other hand, can easily take care of it.
An RTOS gives you predictability: an operation will be executed at a given time and will not be missed.
An RTOS gives you semaphores/mutexes, so that your memory will not be corrupted and multiple sources will not access the same buffers concurrently.
An RTOS provides message queues, which are useful for communication between tasks.
Yes, you could implement all these features in a while-loop, but that's exactly the advantage: you get everything ready-made and tested.
If your while loop works (i.e. it fulfills the real-time requirements of your system), and it's robust, maintainable, and somewhat extensible, then there probably is no benefit to using a real-time operating system.
But if your while-loop can't fulfill the real-time requirements or is overly complex or over-extended (i.e., any change requires further tuning and tweaking to restore real-time performance), then you should consider another architecture.
An RTOS architecture is one popular next step beyond the super-loop. An RTOS is basically a tool for managing software complexity. It allows you to divide a complex set of software requirements into multiple threads of execution. When done properly, each individual thread has a relatively simple set of requirements and becomes easier to implement. And thread prioritization can make it easier to fulfill the real-time requirements of the application. These are basically the benefits of employing an RTOS.
An RTOS is not a panacea, however. An RTOS increases the overall system complexity and opens you up to new types of bugs (such as deadlocks). It requires knowledge and experience to design and implement an RTOS based program effectively. Consider alternatives such as Multi-Rate Main Loop Tasking or an event-based state machine architecture (such as QP). These alternatives might be simpler, easier to understand, or more compatible with your way of designing software.
There are a couple of huge advantages that an RTOS multitasker has:
Very good I/O performance: an RTOS can set a waiting thread ready as soon as an I/O action that it requested has completed, so latency in handling reads/writes/whatever can be low. A looped, polled design cannot respond to I/O completions until it gets around to checking the I/O status (either directly or by polling some volatile flag set by an interrupt handler).
Independent functionality: the ease of implementing isolated subsystems for serial comms, actuators etc. may well be one for you, as suggested in the other answers. It's very reassuring to know that any extra delay in, say, some serial exchange will not adversely affect timing elsewhere. You need to wait a second? No problem: sleep(1000) and you're done, with no effect on the other subsystems :) It's the difference between "no, I cannot add a network server, it would change all the timing and I would have to retest everything" and "sure, there's plenty of CPU free, I already have the code from another job and I just need another thread to run it".
There are other advantages that help offset the added annoyance of having to program a preemptive multitasker, with its critical sections, semaphores and condvars.
Obviously, if your hardware has multiple cores, an RTOS helps: it is designed to share out available CPU execution cycles just like any other resource, and adding cores means more cycles.
In the end, though, it's the I/O performance and functional isolation that's the big win.
Some of the suggestions in other answers may help, either instead of, or together with, an RTOS. When controlling multiple I/O hardware, eg. sensors and actuators, an event-driven state-machine is a very good idea indeed. I often implement that by queueing all events into one thread, (producer-consumer queue), that has sole access to the state data and implements the state-machine, so serializing actions.
Whether the advantages are worth the candle is between you and your requirements:)
An RTOS is not a replacement for the while-loop; it is the while-loop plus tools that organize your tasks. How do they organize your tasks? They assign priorities to them and decide how much time each one has for its job and/or at what time it should start/end. An RTOS also layers your software, i.e. hardware-related stuff, application tasks, etc. Aside from that, it gives you data structures, containers, and ready-to-use interfaces to handle common tasks so you don't have to implement your own, e.g. allocating memory for you, locking access to some resources, and so on.

Micro scheduler for real-time kernel in embedded C applications?

I am working with time-critical applications where microseconds count. I am interested in a more convenient way to develop my applications than a bare-metal approach (some kind of framework or base foundation common to all my projects).
Real-time operating systems such as RTX, Xenomai, Micrium or VxWorks are not really real-time by my terms (or by the terms of electronics engineers). So I prefer to distinguish soft-real-time from hard-real-time applications. A hard-real-time application has an acceptable jitter of less than 100 ns and a heartbeat of 100..500 microseconds (tick timer).
After a lot of reading about operating systems, I realized that the typical tick time is 1 to 10 milliseconds and that only one task can be executed per tick. Therefore tasks usually take much more than one tick to complete, and this is the case with most available operating systems or microkernels.
For my applications a typical task lasts 10..100 microseconds, with few exceptions that can last more than one tick. So no real-time operating system can fulfill my requirements. That is the reason why other engineers still do not consider operating systems and micro- or nanokernels: the way they work is too far from their needs. I still want to struggle a bit, and in my case I now realize I have to consider a new category of operating system that I have never heard of (and that may not exist yet). Let's call this category nano-kernel or subtick-scheduler.
In such a dreamed-of kernel I would find:
2 types of tasks:
Preemptive tasks (that run in their own memory space)
Non-preemptive tasks (that run in the kernel space and must complete in less than one tick)
Deterministic kernel scheduler (fixed duration after the ISR, to reach the theoretical zero-second jitter)
Ability to run multiple tasks per tick
For a better understanding of what I am looking for, I made the figure below, which represents the two kinds of kernels. The first representation is the traditional kernel: a task executes at each tick, and it may interrupt the kernel with a system call that invokes a full context switch.
The second diagram shows a sub-tick kernel scheduler where multiple tasks may share the same tick interrupt. Task 1 was started with a maximum execution time value, so it needs 2 ticks to complete. Task 2 is set with low priority, so it consumes the remaining time of each tick until completion. Task 3 is non-preemptive, so it operates in the kernel space, which saves some precious context-switch time.
Available operating systems such as RTOS, RTAI, VxWorks or µC/OS are not fully real-time and are not suitable for embedded hard-real-time applications such as motion control, where a typical cycle lasts no more than 50 to 500 microseconds. By analyzing my needs I land on a different topology for my scheduler, where multiple tasks can be executed under the same tick interrupt. Obviously I am not the only one with this kind of need, and my problem might simply be a kind of X-Y problem; said differently, I may not be asking for what I actually need.
After this (pretty) long introduction I can formulate my question:
What could be a good existing architecture or framework that can fulfill my requirements other than a naive bare-metal approach where everything is written sequentially around one master interrupt? If this kind of framework/design pattern exists what would it be called?
Sorry, but first of all, let me say that your entire post is completely wrong and shows a complete lack of understanding of how a preemptive RTOS works.
After lots of readings about operating systems I realized that typical tick-time is 1 to 10 milliseconds and only one task can be executed each tick.
This is completely wrong.
In reality, a tick frequency in RTOS determines only two things:
the resolution of timeouts, sleeps and so on,
context switches due to round-robin scheduling (where two or more threads with the same priority are "runnable" at the same time for a long period of time).
During a single tick - which typically lasts 1-10 ms, but you can usually configure that to be whatever you like - the scheduler can do hundreds or thousands of context switches. Or none. When an event arrives and wakes up a thread with sufficiently high priority, the context switch happens immediately, not at the next tick. An event can originate from a thread (posting a semaphore, sending a message to another thread, ...), an interrupt (posting a semaphore, sending a message to a queue, ...) or the scheduler itself (an expired timeout or things like that).
There are also RTOSes with no system ticks - these are called "tickless". There you can have resolution of timeouts in the range of nanoseconds.
That is the reason why other engineers still not consider operating system, micro or nano kernels because the way they work is too far from their needs.
Actually this is a reason why these "engineers" should read something instead of pretending to know everything and seeking "innovative" solutions to non-existing problems. This is completely wrong.
The first representation is the traditional kernel. A task executes at each tick and it may interrupt the kernel with a system call that invoke a full context switch.
This is not a feature of an RTOS, but of the way you wrote your application: if a high-priority task is constantly doing something, then lower-priority tasks will NOT get any chance to run. But this is just because you assigned the priorities wrongly.
Unless you use cooperative RTOS, but if you have such high requirements, why would you do that?
The second diagram shows a sub-tick kernel scheduler where multiple tasks may share the same tick interrupt.
This is exactly how EVERY preemptive RTOS works.
Available operating systems such as RTOS, RTAI, VxWorks or µC/OS are not fully real-time and are not suitable for embedded hard real-time applications such as motion-control where a typical cycle would last no more than 50 to 500 microseconds.
Completely wrong. In every known RTOS it is not a problem to get a response time down to single microseconds (1-3 µs) with a chip clocked in the range of 100 MHz. So you actually can run "jobs" as short as 10 µs without too much overhead. You could even have "jobs" as short as 10 ns, but then the overhead would be pretty high...
What could be a good existing architecture or framework that can fulfill my requirements other than a naive bare-metal approach where everything is written sequentially around one master interrupt? If this kind of framework/design pattern exists what would it be called?
This pattern is called a preemptive RTOS. Do note that threads in an RTOS are NOT executed in the "tick interrupt". They are executed in a standard "thread" context, and the tick interrupt is only used to switch the context from one thread to another.
What you described in your post is a "cooperative" RTOS, which does NOT preempt threads. You use that in systems with extremely limited resources and low timing requirements. In every other case you use a preemptive RTOS, which is capable of handling events immediately.

Why is response time important in CPU scheduling?

I'm looking for an example of a job for which response time is important.
One definition of response time is:
The time taken in an interactive program from the issuance of a command to the commencement of a response to that command.
I've read that response time is important for interactivity, but I can't understand why. If the job isn't fully completed, what output could be produced that would be of interest to a user?
Wouldn't the user only care about how soon a job finishes, as that's the first time any output is produced?
For example, consider these two possible schedulings of two jobs:
Case 1: |---B---|---A---|
Case 2: |-A-|---B---|-A-|
Suppose that job A and B are issued at the same time, A being a command typed in by the user and B being some background process.
The response time for job A, as I understand it, would be shorter in case 2. As job A finishes (and produces output) at the same time in both cases, I don't understand how the user benefits from (or even notices) the better response time in case 2.
When writing an operating system, one has to take into consideration what will the intended audience be. In some cases it matters most to finish jobs as quickly as possible (supercomputer systems), in some cases it matters most to be as responsive as possible (regular desktop systems), and in some cases it matters most to be as predictable as possible (real-time systems).
For finishing jobs as fast as possible, tasks should be interrupted the rarest possible (so big intervals between task switches are the best option). Here response time doesn't really matter much. It should be noted that task switches usually take some time (thousands of CPU cycles usually) due to having to save the state (including registers and paging structures) of the old task to memory and restore the state (including registers and paging structures) of the new task from memory. This also causes cache and TLB misses, since the cached information doesn't usually belong to the current process.
For being the most responsive possible, tasks should be interrupted as often as possible so the user doesn't experience the so-called lag. This is where response time is important. Note however that on interrupt-driven architectures (like x86) an interrupt from the keyboard or the mouse would automatically pause execution of the current task and call the interrupt handler, which processes the input and sends it to the appropriate program.
For being the most predictable possible, input should be processed neither too fast nor too slow. This means that response time is constrained from both sides, making it much more important than in "most responsive possible" designs. A misprediction can even be a fatal failure in mission-critical systems.
In a nutshell, importance of response time varies from design to design and can range from nearly unimportant to critical.
I think I have an answer to my own question. The problem was that I was only thinking about simple processes like ls that, once issued, run for some amount of time and then, when finished, deliver their first and only output.
However, suppose job A in the example from the question is a program with multiple print statements. Output will in that case be produced before the process is complete (and some of the printouts may well occur during the first scheduled burst). It would thus make sense for interactivity to want to begin running such a process as soon as possible.