How do missed deadlines and dropped tasks work in EDF and RMS scheduling algorithms?

I am writing a school project about real-time systems and their scheduling algorithms. In particular, I am trying to compare several of these algorithms, namely RMS, EDF, and LLF, under overload conditions. But I am confused about how these algorithms deal with missed deadlines.
For example, consider the following set of tasks. I assume all tasks are periodic and their deadlines are equal to their periods.
Task 1: execution time = 2, period = 5
Task 2: execution time = 2, period = 6
Task 3: execution time = 2, period = 7
Task 4: execution time = 2, period = 8
It is not possible to make a feasible schedule with these tasks because the CPU utilization, 2/5 + 2/6 + 2/7 + 2/8 ≈ 1.27, is over 100%, which means some deadlines will be missed and, more importantly, some jobs will not be completed at all. For the sake of comparison, I want to calculate a penalty (or cost) for each task which increases as more and more deadlines are missed. Here is where the questions and confusion start.
Now I understand that in RMS, for example, the first task never misses its deadline since it has the highest priority, and the second task never misses either. The third task, on the other hand, misses a number of deadlines. Here is the first question:
Do we consider a job to be dropped in RMS if it misses its deadline and a new job (instance) of the same task is dispatched?
1.a) If we do consider it dropped, how would I reflect this in my penalty calculation? Since the job is never completed, it seems pointless to measure how long it took to finish after its deadline passed.
1.b) If we do not consider it dropped and the execution of the job continues even after its deadline has passed by a whole period, what happens to the new job that is dispatched? Do we drop that job instead of the one we already started, or does it just domino onto the next one, and the next, and so on? If that is the case, then in a schedule whose length is the LCM of the tasks' periods, some dispatches of task 3 are never completed at all.
My other confusion is of the same nature, but with EDF. EDF starts missing deadlines on several tasks after some time. I understand that with EDF I must continue executing a job even after its deadline has passed, which means every job is eventually completed even though it does not fit within its deadline, hence the domino effect. Then the question becomes:
Do we drop any jobs at all? What happens to the jobs that are dispatched when the period resets but cannot be executed, because the previous job of the same task is still running after missing its deadline in the previous period?
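To make the two interpretations concrete, here is a minimal sketch (my own illustration; the class and policy names are invented) of a fixed-priority RMS simulator over one hyperperiod. It supports both options: drop a late job at its deadline, or let it carry over and push back later releases (the domino case). The penalty is just a count of missed deadlines and dropped jobs per task; a tardiness-weighted cost could be plugged in instead, and the same harness could pick the pending job with the earliest absolute deadline to explore the EDF case.

import java.util.Arrays;
import java.util.List;

public class RmsOverloadSketch {

    enum Policy { DROP_AT_DEADLINE, CARRY_OVER }

    record Task(String name, int exec, int period) {}

    public static void main(String[] args) {
        List<Task> tasks = List.of(
                new Task("T1", 2, 5), new Task("T2", 2, 6),
                new Task("T3", 2, 7), new Task("T4", 2, 8));
        int hyperperiod = 840;  // lcm(5, 6, 7, 8)
        simulate(tasks, hyperperiod, Policy.DROP_AT_DEADLINE);
        simulate(tasks, hyperperiod, Policy.CARRY_OVER);
    }

    static void simulate(List<Task> tasks, int horizon, Policy policy) {
        int n = tasks.size();
        int[] remaining = new int[n];   // unfinished work of each task's released jobs
        int[] missed = new int[n];      // deadline misses per task
        int[] dropped = new int[n];     // jobs abandoned per task (DROP_AT_DEADLINE only)
        for (int t = 0; t < horizon; t++) {
            for (int i = 0; i < n; i++) {
                if (t % tasks.get(i).period() == 0) {
                    // A release point is also the previous job's deadline (deadline = period).
                    if (remaining[i] > 0) {
                        missed[i]++;
                        if (policy == Policy.DROP_AT_DEADLINE) {
                            remaining[i] = 0;   // abandon the late job
                            dropped[i]++;
                        }
                    }
                    // Release the new job. Under CARRY_OVER its work piles up behind the
                    // late job, which is the "domino" effect described above.
                    remaining[i] += tasks.get(i).exec();
                }
            }
            // RMS: among tasks with pending work, run the one with the shortest period.
            int run = -1;
            for (int i = 0; i < n; i++) {
                if (remaining[i] > 0 && (run == -1 || tasks.get(i).period() < tasks.get(run).period())) {
                    run = i;
                }
            }
            if (run >= 0) {
                remaining[run]--;   // execute one time unit
            }
        }
        System.out.println(policy + ": missed deadlines per task = " + Arrays.toString(missed)
                + ", dropped jobs per task = " + Arrays.toString(dropped));
    }
}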
I know it is a long post, but any help is appreciated. Thank you. If any of the questions are unclear, I can clarify them on request.

Related

Can a process ask for x amount of time but take y amount instead?

Suppose I am running a set of processes that ask for burst times of 3, 5, and 2 respectively, so the total expected execution time is 10 time units.
Is it possible for one of the processes to take more than what it asked for? For example, even though it asked for 3, it takes 11 instead because it was waiting for the user to enter some input, so the total execution time turns out to be 18.
This is all in the context of a non-preemptive CPU scheduler.
The reality is that software has no idea how long anything will take. My CPU runs at a different "nominal speed" to your CPU, both our CPUs keep changing their speed for power management reasons, and the speed of software executed by both our CPUs is affected by things like what other CPUs are doing (especially for SMT/hyper-threading) and what other devices happen to be doing at the time (their effect on caches, shared RAM bandwidth, etc.). Software also can't predict the future (e.g. guess when an IRQ will occur, take some time, and upset the cache contents; guess when a read from memory will take 10 times longer because there was a single-bit error that ECC needed to correct; guess when the CPU will get hot and reduce its speed to avoid melting; etc.). It is possible to record things like "start time, burst time and end time" as they happen (to generate historical data that can be analysed later), but typically these things are only seen in fabricated academic exercises that have nothing to do with reality.
Note: I'm not saying fabricated academic exercises are bad - they're a useful tool to help learn basic theory before moving on to more advanced (and more realistic) theory.
Instead, for a non-preemptive scheduler, tasks don't try to tell the scheduler how much time they think they might take: the task can't know this information, and the scheduler can't do anything with it (e.g. a non-preemptive scheduler can't preempt a task when it takes longer than it guessed it might). With a non-preemptive scheduler, a task simply runs until it calls a kernel function that waits for something (e.g. read() that waits for data from disk or network, sleep() that waits for time to pass, etc.). When that happens, the kernel function that was called tells the scheduler that the task is waiting and doesn't need the CPU, and the scheduler finds a different task to run that can use the CPU. If the task never calls a kernel function that waits for something, then the task runs "forever".
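To illustrate that run-until-it-blocks behaviour, here is a toy sketch (my own illustration; the task and state names are invented) of a non-preemptive run loop: the scheduler hands the CPU to the next ready task and only gets it back when the task voluntarily reports that it blocked or finished.

import java.util.ArrayDeque;
import java.util.Deque;

public class NonPreemptiveLoopSketch {

    enum TaskState { BLOCKED, FINISHED }

    interface Task {
        // Runs until the task blocks (e.g. would call read()/sleep()) or finishes.
        // The scheduler has no way to cut this short.
        TaskState run();
    }

    public static void main(String[] args) {
        Deque<Task> readyQueue = new ArrayDeque<>();
        readyQueue.add(countingTask("A", 3));
        readyQueue.add(countingTask("B", 2));
        while (!readyQueue.isEmpty()) {
            Task task = readyQueue.poll();      // pick the next ready task
            TaskState state = task.run();       // give it the CPU unconditionally
            if (state == TaskState.BLOCKED) {
                readyQueue.add(task);           // a real kernel would park it on a wait list
            }                                   // FINISHED tasks are simply discarded
        }
    }

    // A fake task that "blocks" a fixed number of times before finishing.
    static Task countingTask(String name, int timesItBlocks) {
        return new Task() {
            int remaining = timesItBlocks;
            @Override
            public TaskState run() {
                System.out.println(name + " runs until it blocks or finishes");
                return (--remaining > 0) ? TaskState.BLOCKED : TaskState.FINISHED;
            }
        };
    }
}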
Of course "the task runs forever" can be bad (not just for malicious code that deliberately hogs all CPU time as a denial of service attack, but also for normal tasks that have bugs), which is why (almost?) nobody uses non-preemptive schedulers. For example; if one (lower priority) task is doing a lot of heavy processing (e.g. spending hours generating a photo-realistic picture using ray tracing techniques) and another (higher priority) task stops waiting (e.g. because it was waiting for the user to press a key and the user did press a key) then you want the higher priority task to preempt the lower priority task "immediately" (e.g. because most users don't like it when it takes hours for software to respond to their actions).

How to achieve an uncertain score rule in Optaplanner?

I'm using OptaPlanner to develop a system that is similar to the MeetingScheduling example: it assigns tasks to machines and determines their start times. I created a class, TaskAssignment, as the planning entity, with the fields "machine" and "startTimeGrain" as the planning variables.
But in my use case there is a constraint that doesn't exist in MeetingScheduling, and I don't know how to implement it. In some cases there is a preparation time before a task can start. That is, if TaskA and TaskB are contiguous tasks on the same machine, TaskB cannot start until TaskA has finished (TaskA is the previous task of TaskB), and there may also be a preparation time between them: after TaskA finishes, TaskB has to wait for a while before it can start, and how long it has to wait is not fixed; it depends on its previous task.
For example:
TaskA -> TaskB: TaskB's preparation time is 5 mins.
TaskC -> TaskB: TaskB's preparation time is 15 mins.
TaskC -> TaskA: TaskA's preparation time is 0 min.
So I look up the preparation time for a task based on its previous task (reading it from a list) and calculate the interval between the two tasks. If the interval is less than the preparation time, the interval minus the preparation time is used as the penalty score.
When I run the planner, the rule throws a score corruption exception. I found that the reason is that both the interval and the preparation time are uncertain.
The interval depends on the previous task's end time and the task's own start time; the start time is a planning variable, so it is uncertain.
Each task has a list of preparation times, and which preparation time applies depends on its previous task. Because the start times keep changing during planning, the previous task can change, so the preparation time keeps changing too; it is also uncertain.
In this case, is there any way to implement this constraint?
Many thanks
Here is my rule, but it causes the score corruption exception:
rule "Make sure interval large than preparation time"
salience 1
when
$currentTA : TaskAssignment(
$machine: machine != null,
startingTimeGrain != null,
$lack : getIntervalLack() < 0L // in getIntervalLack(), interval minus preparation time
)
then
scoreHolder.addHardConstraintMatch(kcontext, $lack);
end
The exception message:
Exception in thread "main" java.lang.IllegalStateException: Score corruption: the workingScore (-17hard/0medium/0soft) is not the uncorruptedScore (-20hard/0medium/0soft) after completedAction ([TaskAssignment-5 {Machine-1([023]) -> Machine-1([023])}, TaskAssignment-5 {TimeGrain-2 -> TimeGrain-2}]):
The corrupted scoreDirector has no ConstraintMatch(s) which are in excess.
The corrupted scoreDirector has 1 ConstraintMatch(s) which are missing:
com.esquel.configuration/Make sure interval large than preparation time/[TaskAssignment-4]=-3hard/0medium/0soft
Check your score constraints.
at org.optaplanner.core.impl.score.director.AbstractScoreDirector.assertWorkingScoreFromScratch(AbstractScoreDirector.java:496)
at org.optaplanner.core.impl.solver.scope.DefaultSolverScope.assertWorkingScoreFromScratch(DefaultSolverScope.java:132)
at org.optaplanner.core.impl.phase.scope.AbstractPhaseScope.assertWorkingScoreFromScratch(AbstractPhaseScope.java:167)
at org.optaplanner.core.impl.constructionheuristic.decider.ConstructionHeuristicDecider.processMove(ConstructionHeuristicDecider.java:140)
at org.optaplanner.core.impl.constructionheuristic.decider.ConstructionHeuristicDecider.doMove(ConstructionHeuristicDecider.java:126)
at org.optaplanner.core.impl.constructionheuristic.decider.ConstructionHeuristicDecider.decideNextStep(ConstructionHeuristicDecider.java:99)
at org.optaplanner.core.impl.constructionheuristic.DefaultConstructionHeuristicPhase.solve(DefaultConstructionHeuristicPhase.java:74)
at org.optaplanner.core.impl.solver.AbstractSolver.runPhases(AbstractSolver.java:87)
at org.optaplanner.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:167)
at com.esquel.main.App.startPlan(App.java:94)
at com.esquel.main.App.main(App.java:43)
If it's a hard constraint, I'd make it built-in and do it with a shadow variable:
I'd probably pre-calculate the task dependencies, so taskB has a reference to its potentialPrecedingTasks (taskA and taskC in your example). Then I'd use the "chained through time" pattern (see docs) to determine the order in which the tasks get executed. Based on that order, the starting time is a shadow variable that is actualPrecedingTask.endingTime + lookUpPreparationTime(precedingTask, thisTask). See the arrivalTime listener in VRP, same principle.
If it's a soft constraint, I'd still have that same shadow variable, but call it desiredStartingTime and add a soft constraint to check whether the real startingTime is equal to or higher than the desiredStartingTime.
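To make that concrete, here is a rough sketch of such a shadow-variable listener, modelled on the VRP example's arrival-time listener. It assumes the OptaPlanner 7.x VariableListener signature (newer versions differ) and hypothetical domain accessors (getPreviousTaskAssignment(), getNextTaskAssignment(), getEndingTime(), getPreparationTimeAfter(), getReadyTime()) on a chained TaskAssignment entity, so treat it as a starting point rather than a drop-in implementation:

// Imports are from OptaPlanner 7.x; the package layout changed in later versions.
import org.optaplanner.core.impl.domain.variable.listener.VariableListener;
import org.optaplanner.core.impl.score.director.ScoreDirector;

// Hypothetical listener: keeps the shadow variable "startingTime" of every TaskAssignment
// in a machine chain consistent with its predecessor's ending time plus the preparation
// time that this particular predecessor implies.
public class StartingTimeUpdatingVariableListener implements VariableListener<TaskAssignment> {

    @Override
    public void beforeEntityAdded(ScoreDirector scoreDirector, TaskAssignment taskAssignment) {
        // nothing to do
    }

    @Override
    public void afterEntityAdded(ScoreDirector scoreDirector, TaskAssignment taskAssignment) {
        updateStartingTime(scoreDirector, taskAssignment);
    }

    @Override
    public void beforeVariableChanged(ScoreDirector scoreDirector, TaskAssignment taskAssignment) {
        // nothing to do
    }

    @Override
    public void afterVariableChanged(ScoreDirector scoreDirector, TaskAssignment taskAssignment) {
        updateStartingTime(scoreDirector, taskAssignment);
    }

    @Override
    public void beforeEntityRemoved(ScoreDirector scoreDirector, TaskAssignment taskAssignment) {
        // nothing to do
    }

    @Override
    public void afterEntityRemoved(ScoreDirector scoreDirector, TaskAssignment taskAssignment) {
        // nothing to do
    }

    private void updateStartingTime(ScoreDirector scoreDirector, TaskAssignment sourceTask) {
        // Walk the machine chain from the changed task onwards; every follower's starting
        // time may shift, so each one is updated through the score director.
        TaskAssignment previous = sourceTask.getPreviousTaskAssignment(); // null right after the anchor
        TaskAssignment current = sourceTask;
        while (current != null) {
            Long startingTime = (previous == null)
                    ? current.getMachine().getReadyTime()                 // hypothetical accessor
                    : previous.getEndingTime()
                        + current.getPreparationTimeAfter(previous);      // hypothetical lookup
            scoreDirector.beforeVariableChanged(current, "startingTime");
            current.setStartingTime(startingTime);
            scoreDirector.afterVariableChanged(current, "startingTime");
            previous = current;
            current = current.getNextTaskAssignment();                    // inverse-relation shadow var
        }
    }
}

The point is that every value the score rules read is updated through scoreDirector.beforeVariableChanged()/afterVariableChanged(), which is what keeps the incremental score consistent.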
Background summary: some orders are sent to the production workshop, and each order is split by the process routing into multiple tasks in a sequence. The tasks of an order must be executed in that sequence. Each task can only be executed on a particular machine. There may be a preparation time before a task starts; whether the preparation time exists, and how long it is, depends on which task precedes it on the same machine.
The following are the main constraints that are hard to implement:
Hard constraints:
1. A task must be executed on its particular machine.
2. Tasks in an order must be executed in a particular sequence (the tasks of an order come from the order's processes, and they usually need to be executed on different machines).
3. The first task of an order has an earliest start time, namely the time the order arrived at the production workshop.
4. Some tasks in an order may have a requested start time, meaning that once the previous task finishes, the next task has to start within a certain period. For example, if TaskA is the previous task of TaskB in the same order, TaskB has to start within 16 hours after TaskA finishes.
5. A task may have a preparation time, which depends on its previous task on the same machine (usually the same process from different orders is assigned to the same machine). If a task has a preparation time, it has to start after that preparation time has elapsed; in other words, there is an interval between the two tasks.
Soft constraints:
1. All tasks should be executed as soon as possible.
2. Minimize the total preparation time: different placements of the tasks lead to different predecessor relationships, and therefore to different preparation times.
So there are two "chains" in the solution during planning. OptaPlanner generates a chain for the tasks on the same machine. The other "chain" comes from the order, and in that "chain" the tasks are assigned to different machines. The two "chains" are hung together.
I call the chain within a machine (generated by OptaPlanner) the "Machine chain", and the "chain" within an order the "Order chain".
Now you can see that, because the two "chains" are hung together, a task is a node in both a Machine chain and an Order chain.
I tried the "chained through time" pattern, and an undoMove corruption appeared. I think the reason is that when I update a task in a Machine chain, the following tasks in the same Machine chain are updated too; those tasks are also nodes of Order chains, so a chain reaction breaks out.
I think my case looks like the Project Job Scheduling example, but the difference is that the two "chains" in that example never hang together.
So I tried the simpler pattern, but I can't escape the score corruption exception.

Is this an intelligent use case for optaPlanner?

I'm trying to clean up an enterprise BI system that currently is using a prioritized FIFO scheduling algorithm (so a priority 4 report from Tuesday will be executed before priority 4 reports from Thursday and priority 3 reports from Monday.) Additional details:
The queue is never empty, jobs are always being added
Jobs range in execution time from under a minute to upwards of 24 hours
There are 40-odd identical app servers used to execute jobs
I think I could get optaPlanner up and running for this scenario, with hard rules around priority and some soft rules around average time in the queue. I'm new to scheduling optimization so I guess my question is what should I be looking for in this situation to decide if optaPlanner is going to help me or not?
The problem looks like a form of bin packing (and possibly job shop scheduling), which are NP-complete, so OptaPlanner will do better than a FIFO algorithm.
But is it really NP-complete? If all of these conditions are met, it might not be:
All 40 servers are identical. So running a priority report on server A instead of server B won't deliver a report faster.
All 40 servers are identical. So total duration (for a specific input set) is a constant.
Total makespan doesn't matter. So given 20 small jobs of 1 hour, 1 big job of 20 hours, and 2 machines, it's fine that all the small jobs are done after 10 hours before the big job even starts, giving a total makespan of 30 hours. There's no desire to reduce the makespan to 20 hours.
"the average time in the queue" is debatable: do you care about how long the jobs are in the queue until they are started or until they are finished? If the total duration is a constant, this can be done by merely FIFO'ing the small jobs first or last (while still respecting priority of course).
There are no dependencies between jobs.
If all these conditions are met, OptaPlanner won't be able to do better than a correctly written greedy algorithm (which schedules the highest priority job that is the smallest/largest first). If any of these conditions aren't met (for example you buy 10 new servers which are faster), then OptaPlanner can do better. You just have to evaluate if it's worth spending 1 thread to figure that out.
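For reference, a greedy baseline of the kind mentioned above could look like the following sketch (my own illustration; the class and field names are invented, and a larger priority number is treated as more urgent, as in the question). It orders jobs by priority and then by duration (shortest first here, to cut average queue time; largest-first is the other option mentioned) and always hands the next job to the server that frees up earliest:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class GreedyReportScheduler {

    record Job(String id, int priority, long durationMinutes) {}

    // Returns jobId -> start time (minutes from now), assuming identical servers.
    static Map<String, Long> schedule(List<Job> jobs, int serverCount) {
        List<Job> ordered = new ArrayList<>(jobs);
        ordered.sort(Comparator.comparingInt(Job::priority).reversed()   // higher priority number first
                .thenComparingLong(Job::durationMinutes));               // shortest-first within a priority
        // Each server is {index, minuteAtWhichItBecomesFree}; poll gives the earliest-free one.
        PriorityQueue<long[]> servers = new PriorityQueue<>(Comparator.comparingLong((long[] a) -> a[1]));
        for (int i = 0; i < serverCount; i++) {
            servers.add(new long[]{i, 0L});
        }
        Map<String, Long> startTimes = new LinkedHashMap<>();
        for (Job job : ordered) {
            long[] server = servers.poll();          // server that frees up first
            startTimes.put(job.id(), server[1]);
            server[1] += job.durationMinutes();      // it stays busy for this job's duration
            servers.add(server);
        }
        return startTimes;
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(
                new Job("daily-sales", 3, 20),
                new Job("audit", 4, 1440),
                new Job("dashboard", 3, 5));
        System.out.println(schedule(jobs, 2));
    }
}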
If you use OptaPlanner, definitely take a look at real-time scheduling and daemon mode, to replan as new reports enter the system.

What is scheduler latency?

This seems to be a basic question, but I couldn't find an answer anywhere by googling it.
As far as I understand, scheduler latency is the time incurred in making a task runnable again. I mean, if there are 100 processes, namely 1, 2, etc., and they are executed, let's say, in order starting from 1, then the latency is the time until process 1 is executed again. That would mean the latency is the waiting time of the process, including the time it spends in the run queue ready to execute.
Or
have I misunderstood the whole point, and scheduler latency is nothing but the context-switching time between processes?
Scheduling latency is the time that the system is unproductive because of scheduling tasks; it is the latency incurred because the system has to spend time scheduling.
Specifically, it consists of two elements:
The delay between a task waking up and actually running (the 'context switching time')
Time spent making scheduler decisions (the actual job of the scheduler, which consumes resources that cannot be used by real tasks anymore)
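To put a rough number on the first element, here is a small probe (my own illustration, not a precise benchmark): it asks to sleep for 1 ms many times and records the worst overshoot. The result also includes timer granularity, so treat it as an upper bound on wake-up latency rather than an exact measurement.

public class WakeupLatencyProbe {
    public static void main(String[] args) throws InterruptedException {
        long worstOvershootNanos = 0;
        for (int i = 0; i < 1000; i++) {
            long before = System.nanoTime();
            Thread.sleep(1);                                     // ask to become runnable ~1 ms from now
            long overshoot = System.nanoTime() - before - 1_000_000L;
            worstOvershootNanos = Math.max(worstOvershootNanos, overshoot);
        }
        System.out.printf("worst observed wake-up overshoot: %.3f ms%n", worstOvershootNanos / 1e6);
    }
}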

Task Schedulers

Had an interesting discussion with some colleagues about the best scheduling strategies for realtime tasks, but not everyone had a good understanding of the common or useful scheduling strategies.
For your answer, please choose one strategy and go over it in some detail, rather than giving a little info on several strategies. If you have something to add to someone else's description and it's short, add a comment rather than a new answer (if it's long or useful, or simply a much better description, then please use an answer)
What is the strategy - describe the general case (assume people know what a task queue is, semaphores, locks, and other OS fundamentals outside the scheduler itself)
What is this strategy optimized for (task latency, efficiency, realtime, jitter, resource sharing, etc)
Is it realtime, or can it be made realtime
Current strategies:
Priority Based Preemptive
Lowest power slowest clock
-Adam
In a paper titled Real-Time Task Scheduling for Energy-Aware Embedded Systems, Swaminathan and Chakrabarty describe the challenges of real-time task scheduling in low-power (embedded) devices that offer multiple processor speeds and power-consumption profiles. The scheduling algorithm they outline (and which is shown to be only about 1% worse than an optimal solution in their tests) has an interesting way of scheduling tasks, which they call the LEDF heuristic.
From the paper:
The low-energy earliest deadline first heuristic, or simply LEDF, is an extension of the well-known earliest deadline first (EDF) algorithm. The operation of LEDF is as follows: LEDF maintains a list of all released tasks, called the “ready list”. When tasks are released, the task with the nearest deadline is chosen to be executed. A check is performed to see if the task deadline can be met by executing it at the lower voltage (speed). If the deadline can be met, LEDF assigns the lower voltage to the task and the task begins execution. During the task’s execution, other tasks may enter the system. These tasks are assumed to be placed automatically on the “ready list”. LEDF again selects the task with the nearest deadline to be executed. As long as there are tasks waiting to be executed, LEDF does not keep the processor idle. This process is repeated until all the tasks have been scheduled.
And in pseudo-code:
Repeat forever {
    if tasks are waiting to be scheduled {
        Sort deadlines in ascending order
        Schedule task with earliest deadline
        Check if deadline can be met at lower speed (voltage)
        If deadline can be met,
            schedule task to execute at lower voltage (speed)
        If deadline cannot be met,
            check if deadline can be met at higher speed (voltage)
            If deadline can be met,
                schedule task to execute at higher voltage (speed)
            If deadline cannot be met,
                task cannot be scheduled: run the exception handler!
    }
}
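Here is a rough rendering of that loop in Java (my own sketch, not the paper's code; the speeds, numbers, and field names are invented): pick the released task with the earliest deadline, run it at the low speed if that still meets the deadline, otherwise at the high speed, otherwise report an overload.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LedfSketch {

    record Task(String name, double cyclesRemaining, double deadline) {}

    static final double LOW_SPEED = 1.0e8;   // cycles per second at the low voltage
    static final double HIGH_SPEED = 2.0e8;  // cycles per second at the high voltage

    static void scheduleNext(List<Task> readyList, double now) {
        if (readyList.isEmpty()) {
            return;
        }
        // Earliest deadline first among the released tasks.
        Task task = Collections.min(readyList, Comparator.comparingDouble(Task::deadline));
        double slack = task.deadline() - now;
        if (task.cyclesRemaining() / LOW_SPEED <= slack) {
            System.out.println(task.name() + ": run at low speed (saves energy)");
        } else if (task.cyclesRemaining() / HIGH_SPEED <= slack) {
            System.out.println(task.name() + ": run at high speed to meet the deadline");
        } else {
            System.out.println(task.name() + ": cannot meet deadline, run the exception handler");
        }
        readyList.remove(task);
    }

    public static void main(String[] args) {
        List<Task> ready = new ArrayList<>(List.of(
                new Task("A", 2.0e6, 0.04),
                new Task("B", 6.0e6, 0.05),
                new Task("C", 3.0e7, 0.10)));
        scheduleNext(ready, 0.00);   // low speed suffices
        scheduleNext(ready, 0.02);   // needs the high speed
        scheduleNext(ready, 0.05);   // overload: exception handler
    }
}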
It seems that real-time scheduling is an interesting and evolving problem as small, low-power devices become more ubiquitous. I think this is an area in which we'll see plenty of further research and I look forward to keeping abreast!
One common real-time scheduling scheme is to use priority-based preemptive multitasking.
Each task is assigned a different priority level.
The highest-priority task on the ready queue will be the task that runs. It will run until it either gives up the CPU (e.g. it delays, waits on a semaphore, etc.) or a higher-priority task becomes ready to run.
The advantage of this scheme is that the system designer has full control over what tasks will run at what priority. The scheduling algorithm is also simple and should be deterministic.
On the other hand, low priority tasks might be starved for CPU. This would indicate a design problem.
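As a small illustration of the core decision rule (my own sketch; the class and task names are invented): the ready queue is ordered by priority, a task that becomes ready preempts the running task only if it has a higher priority, and when the running task blocks, the highest-priority ready task takes over.

import java.util.PriorityQueue;

public class FixedPriorityDispatcher {

    record Task(String name, int priority) {}   // higher number = higher priority here

    private final PriorityQueue<Task> readyQueue =
            new PriorityQueue<>((a, b) -> Integer.compare(b.priority(), a.priority()));
    private Task running;

    // Called when a task becomes ready (e.g. the semaphore it waited on is released).
    void onTaskReady(Task task) {
        if (running == null) {
            running = task;
        } else if (task.priority() > running.priority()) {
            readyQueue.add(running);            // preempt: the current task goes back to ready
            running = task;
        } else {
            readyQueue.add(task);
        }
        System.out.println("running: " + running.name());
    }

    // Called when the running task blocks or finishes.
    void onRunningTaskBlocked() {
        running = readyQueue.poll();            // may be null if nothing is ready
        System.out.println("running: " + (running == null ? "idle" : running.name()));
    }

    public static void main(String[] args) {
        FixedPriorityDispatcher d = new FixedPriorityDispatcher();
        d.onTaskReady(new Task("logger", 1));
        d.onTaskReady(new Task("keypress-handler", 5));  // preempts the logger
        d.onRunningTaskBlocked();                        // the logger resumes
    }
}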