What types of applications would benefit the most from an In-Memory Database's predictable latency?

I'm doing some research on In-Memory databases and am wondering what type of applications would benefit the most from the predictable latency characteristic of In-Memory databases.
I can imagine online gaming, such as first-person shooter games. I'm just wondering what other types of applications there are.

Not very surprisingly, the very applications that benefit most are those that depend on predictable latency (be it low or not -- it is the latency jitter that hurts):
Low-latency edge: HPC, where nanosecond and sub-nanosecond delays matter most, due to the immense scale of the computational complexity ( static scales beyond the Peta, Exa, ... prefixes ), and where a guaranteed determinism of all in-RAM data-structure handling latency enables truly PARALLEL code execution, and not just an opportunistic belief in "best-efforts"-based CONCURRENT code execution.
DSP, where you simply cannot afford to "block/wait" at the cost of missing the next part of a unique and hardly repeatable signal flow ( imagine CERN's LHC: experimental signal-sensor readings / data-acquisition recording / data conditioning + sanity checks / experiment control + processing / storage services )
Mid-latency zone: "hard" real-time constrained control systems ( F-35 avionics, which keep an otherwise inherently unstable aircraft somewhere up in the blue skies by a fast-enough, fully coordinated, endless loop of sensor-network-based, pulse-controlled-effector-triggered state transitions between many discrete, still unstable, states that collectively draft an "envelope" illusion of the behaviour we humans are used to calling flying -- while the aircraft is not able to "fly" on its own, i.e. it cannot extend its own state of motion and continue in that motion a few moments further from any current state ( sure, except The Bird standing with engines off on the tarmac, but who would call that "flying"? ), because that would cause an inadvertent nose-down dig, a flat-spin stall, you name it ); "soft" real-time systems such as operating systems, deterministic schedulers, audio/video live-stream processors, and telephone switching ( well, the recent packet-radio latency jitter of mobile access networks skews somewhat the advances of global TELCO network synchronicity developed over the 80s/90s, but all of these were principally built on defined latency thresholds, and for the first time this very feature allowed the Japanese PDH standards, the US PDH hierarchy and the old continent's ISDN / PDH hierarchies, otherwise mutually impossible to connect on their own, to be interconnected seamlessly. Refer to the SDH/SONET architecture for details. )
High-latency zone: ( yes, high latency is nothing adverse, if kept under control ) SoC designs, where the "just-enough" principle rules the constraint-based system design at the very edge of the resources available -- i.e. deploy the system with minimum processor resources, with a minimum DRAM-powering budget, with a minimum Bill-of-Materials / ASIC design, while benefiting from the fact of a known, deterministic latency, which ensures your "just-enough" design will still meet the required stability and reliability of the deployed processing at a minimised cost.
Epilogue:
The author has not slipped, unknowingly or (even less so) intentionally, into any jargon or strange tag juggling. The terms used in the text above are as common in the contemporary IT and TELCO domains as the alphabet is among the general audience. Sure, any professional specialisation adds plenty more tags and abbreviations, which have no choice but to share their appearance with other, similar-looking acronyms from other fields of science, technology or other fields of human activity, but this is the cost of composing acronyms.
Due care with acronym disambiguation is thus a common practice in any scientific and/or engineering domain.
The text above has used a few terms that are pretty common:
DSP: Digital Signal Processing
CERN: Conseil Européen pour la Recherche Nucléaire
LHC: Large Hadron Collider, the largest particle accelerator on Earth ( CERN, CH )
F-35: Lockheed Martin F-35 JSF aircraft
SoC: System-on-Chip -- Xilinx ZynQ, FPGAs, EpiphanyIV MPPA, Kalray Bostan2(R) et al
ASIC: Application Specific Integrated Circuit
HPC: High-Performance Computing is a leading/bleeding edge of all the computation-related sciences -- hardware, software, theoretical foundations behind the computational problems' computability ( Big-O rating ( ref. complexity-theory ) )
nanosecond = 1 / 1.000.000.000 fraction of a second.
Contemporary TV-broadcasting and {CRT|LED}-monitor-refresh rates take about 1 / 24 .. 1 / 60 second ( i.e. about 40.000.000 - 20.000.000 ns ).
This said, the fastest contemporary CPU-clocks are about 5.000.000.000 [Hz].
That means, such a single CPU-core can compute about 200.000.000 single-CLK CPU-instructions before the next visual output ( a picture ) shall be finished and put on screen.
That provides indeed a vast amount of time for underlying gaming-engine to compute / process whatever needed.
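As a back-of-the-envelope sketch of that arithmetic ( assuming, as above, a ~5 GHz clock, a ~40 ms ( 1/25 s ) frame period and an idealised one instruction per clock, so treat the result as an order-of-magnitude figure only ):

    #include <stdio.h>

    int main(void)
    {
        /* Assumptions taken from the figures above -- order-of-magnitude only. */
        const double cpu_clock_hz   = 5.0e9;      /* ~5 GHz core clock           */
        const double frame_period_s = 1.0 / 25.0; /* ~40 ms per displayed frame  */

        /* Single-CLK instructions a core could retire between two frames. */
        double instructions_per_frame = cpu_clock_hz * frame_period_s;

        printf("~%.0f single-clock instructions per frame\n",
               instructions_per_frame); /* prints ~200000000 */
        return 0;
    }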
Such comfort of having that much time is not granted, though. On the very contrary, it is not so common in high-intensity computing and/or high-capacity transport of binary streams ( super-computing and telecommunication networks et al ).
Even less fair is any such assumption in externally triggered processing, where the interleaving of events is not under one's control and is principally non-deterministic. The HFT trading realm is one brief example, where the lowest possible latencies are a must, so in-memory technology is the only feasible approach.
Even low-intensity HFT-trading software does not have plenty of time, as:
10% of events arrive in less than +   100 [ms]
20% of events arrive in less than +   200 [ms]
30% of events arrive in less than + 1 100 [ms]
40% of events arrive in less than + 1 200 .. + 200 000 [ms] -- i.e. the rest arrives in anything between 1.2 and 200 [sec]
( deeper details on controlled-latency software design exceed the format of this S/O post, but a visual demonstration and a quantitative comparison of the ms, us and ns available for any kind of computation hopefully show the message -- the key difference for a latency-aware software design )
To get some idea how many computing steps a CPU / a cluster-of-CPUs / a grid-of-CPUs may undertake: contemporary hardware spends less than about 10 ns for a CPU to read a value from DRAM, and less than about 0.1 ns to fetch a value from on-CPU cache memory.
While on-CPU cache sizes are growing ( today's specifications state, for common consumer-electronics processors, { L2 | L3 }-cache sizes above 20 MB, which is, for your kind consideration, more than my first PC used to have available as its Hard Disk Drive capacity -- and that high-tech piece was in those days under the supervision of COCOM export regulations, requiring approval for re-export and carrying a Cold-War ban to prevent any potential export outside of Western Bloc territories ), cache-allocation algorithms do not provide an a-priori deterministic certainty of having the whole database in-(cache)-memory.
So, the fastest access is about 0.1 ns into local CPU-cache, but it is uncertain.
The next fastest access is about 10 ns into local DRAM memory, and GB .. TB sizes can fit into this memory-type.
The next fastest access is about 800 ns into NUMA distributed memory infrastructures, where capacities above 1 000 TB .. 1 000 000 TB can fit ( Peta Bytes to Exa Bytes ) and can all be served under a uniform and predictable access time of about 800 ns ( the latency becomes the lowest possible for such huge databases while being both uniform and predictable ).
So, if speaking about an indeed low and predictable latency, these are the yardsticks that measure it.
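A minimal sketch that turns those yardsticks into a simple budget, using the approximate figures quoted above ( 0.1 ns cache, 10 ns DRAM, 800 ns NUMA ) as assumed constants; real platforms vary:

    #include <stdio.h>

    int main(void)
    {
        /* Approximate access latencies quoted above (assumed, platform-dependent). */
        const double t_cache_ns = 0.1;   /* on-CPU cache hit            */
        const double t_dram_ns  = 10.0;  /* local DRAM                  */
        const double t_numa_ns  = 800.0; /* remote NUMA / fabric memory */

        const double budget_ns = 1.0e6;  /* an example 1 ms processing deadline */

        printf("Memory accesses that fit in a 1 ms budget:\n");
        printf("  cache : ~%.0f\n", budget_ns / t_cache_ns);
        printf("  DRAM  : ~%.0f\n", budget_ns / t_dram_ns);
        printf("  NUMA  : ~%.0f\n", budget_ns / t_numa_ns);
        return 0;
    }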
Both CAPEX and OPEX costs ( by which any professional purchase of computing technology is assessed ) of such high-capacity + high-performance computing frameworks are very prohibitive, but human civilisation has no better computational engines so far.

Related

Throughput vs latency in computer architecture

I've come across articles on "through-put vs latency" in contexts like networking e.g. https://homepage.cs.uri.edu/~thenry/resources/unix_art/ch12s04.html But in the context of computer architecture / operating systems, I'm not able to understand why would there be a trade-off between latency (response time of a program) and through-put (how many programs we're able to complete in a unit of time, say per hour). Is this solely due to the fact that we can choose to parallelize processing of multiple programs / requests leading to overheads like context switches & sharing of caches which make the start-to-end response time per process to be worse? Or am I missing something here?
In terms of single instructions in a superscalar pipelined out-of-order exec CPU, throughput vs. latency is very important because the CPU is trying to extract parallelism from an instruction stream that has to be executed as if in serial program order. See Assembly - How to score a CPU instruction by latency and throughput and the bottom of my answer on latency vs throughput in intel intrinsics for example.
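As an illustrative sketch ( mine, not taken from the linked answers ) of why per-instruction latency and throughput diverge on such a CPU: the first loop is one long dependency chain, so it runs at the floating-point add latency, while the second keeps four independent accumulators in flight, so it can approach the add throughput:

    #include <stdio.h>

    /* Latency-bound: every add depends on the previous result, so the loop
     * cannot run faster than one (add latency) per element. */
    double sum_latency_bound(const double *a, long n)
    {
        double s = 0.0;
        for (long i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Throughput-bound: four independent chains let an out-of-order core
     * keep several adds in flight at once. */
    double sum_throughput_bound(const double *a, long n)
    {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        long i;
        for (i = 0; i + 4 <= n; i += 4) {
            s0 += a[i];     s1 += a[i + 1];
            s2 += a[i + 2]; s3 += a[i + 3];
        }
        for (; i < n; i++)   /* leftover elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }

( The accumulator count of four is an arbitrary illustrative choice; the optimum depends on the add latency-to-throughput ratio of the particular core. )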
In terms of OS decisions that affect throughput vs. latency on a much longer timescale than a few clock cycles, that's a totally separate question.
One of the major factors there is choosing how to use the available physical RAM, and whether to page out (to a swap file) infrequently used code / data to make more room to cache disk files. (e.g. Linux's vm.swappiness is widely considered a key tunable in terms of setting it differently between servers and desktops. https://unix.stackexchange.com/questions/88693/why-is-swappiness-set-to-60-by-default).
If you alt-tab to a window when many pages of that process have been paged out, it will take some time before the process can redraw its window. (Multiple hard page faults, can be quite slow especially if paging on a rotational disk, not SSD.) So to optimize for latency, you want the kernel to not aggressively swap out pages from running processes, even if they've been idle for a few hours. Those pages, if they'd been free, could have improved throughput for other processes by acting as buffers / cache.
A related factor is I/O scheduling: trying to group IO requests together to minimize HD seek times (for higher throughput and lower average latency), but sometimes at the expense of delaying a few requests for a longer time (higher worst-case latency). Linux for example has many to choose from, including deadline, Completely Fair Queuing (CFQ), and the original elevator (just grouping requests by locality without consideration of fairness or latency). https://wiki.archlinux.org/title/improving_performance#Input/output_schedulers
CPU scheduling is also a factor: a context-switch hurts throughput, as it takes time itself and caches will likely be cold for the new task on this CPU. You also have to run the kernel's schedule() function to decide which task to run next, so that takes away some time from real work.
To minimize latency (for example between a socket message being sent to a process and it waking up when its poll or select system call returns), you want a short timeslice, like Linux HZ=1000. (Timer interrupts every 1 ms to run the scheduler). And you want to be able to pre-empt even the kernel itself, instead of waiting until the kernel is ready to return to the old user-space to consider the possibility of running a different user-space task.
But neither of these helps throughput, and in fact hurt (assuming the workload has enough parallelism to not bottleneck on latency). So HZ=100 was the default for "server" Linux builds, vs. 1000 on "desktop" builds tuned for interactive use. (Modern Linux can be "tickless", not using a fixed timer interrupt on every core at all, instead deciding when to schedule the next interrupt on a case by case basis.)
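A minimal, Linux-specific sketch, in the spirit of tools like cyclictest ( my own simplification, not code from this answer ), that measures how late a periodic 1 ms wakeup actually fires -- exactly the latency the HZ / preemption choices above influence:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>

    /* Measure worst-case wakeup lateness of a periodic 1 ms timer over ~5 s. */
    int main(void)
    {
        const long period_ns = 1000000; /* 1 ms */
        struct timespec next, now;
        long max_late_ns = 0;

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (int i = 0; i < 5000; i++) {
            next.tv_nsec += period_ns;
            if (next.tv_nsec >= 1000000000L) {
                next.tv_nsec -= 1000000000L;
                next.tv_sec  += 1;
            }
            /* Sleep until the absolute deadline, then see how late we woke up. */
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
            clock_gettime(CLOCK_MONOTONIC, &now);

            long late_ns = (now.tv_sec - next.tv_sec) * 1000000000L
                         + (now.tv_nsec - next.tv_nsec);
            if (late_ns > max_late_ns)
                max_late_ns = late_ns;
        }
        printf("worst-case wakeup lateness: %ld ns\n", max_late_ns);
        return 0;
    }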
Real-time kernels take this even further, spending more time on finer-grained locking and stuff like that to enable pausing work and coming back to it later to minimize interrupt latency and other latencies between it being time to do something and actually starting to do that thing. (There are real-time patches for Linux, and there are also totally separate kernels built from the ground up for real-time operation.)
If you have an embedded system controlling a motor or something, you absolutely need hard real-time latency guarantees that it will never take longer than say 1 millisecond from an interrupt pin being asserted to the interrupt handler starting to run.
(Designing the system to make these guarantees possible often comes at the cost of throughput. e.g. obviously you have to pin some memory to make it not swappable, if we're talking about user-space, making it unavailable for cache even if it goes untouched for days.)
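As a hedged sketch of the user-space side of that trade-off on Linux ( the calls are standard POSIX/Linux APIs, but the priority value of 50 and the overall structure are only illustrative assumptions, and both calls typically need elevated privileges ):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sched.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Lock all current and future pages: no page-outs, hence no hard
         * page faults on this process's memory -- at the cost of RAM that
         * could otherwise serve as file cache for other processes. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall (needs CAP_IPC_LOCK or a suitable rlimit)");

        /* Request a real-time FIFO scheduling class so this task preempts
         * normal time-shared tasks as soon as it becomes runnable. */
        struct sched_param sp = { .sched_priority = 50 };
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
            perror("sched_setscheduler (needs CAP_SYS_NICE)");

        /* ... latency-critical work would run here ... */
        return 0;
    }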

Strict consistency vs atomic consistency

I have read a couple of articles and I am confused about the difference between strict consistency (which is defined as "It can be better understood as though a global clock is present in which every write should be reflected in all processor caches by the end of that clock period.") and atomic consistency (or linearizability, which is defined as "sequential consistency with the real-time constraint"). Both definitions come from Wikipedia. The source of my confusion is the fact that the strict model provides that every process see a change immediately and atomic consistency is also said to work in real-time providing the same sequence of writes for every process.
The requirement for maintaining a system-wide distribution of a global clock in the strict-consistency case is hopefully clear enough by itself.
Atomic consistency thus needs some additional guarantees, in exchange for not maintaining the global clock, to still become and stay consistent system-wide.
Here the guarantee from an HRT system becomes useful, as it keeps the sequential consistency within its realm of deterministic, a-priori known, finite time. Thus the planning of state-change propagation is possible and holds throughout the whole life-cycle of the HRT-system operation.
On "sequential consistency with the real-time constraint" :
This option ought to be understood as a technically less strict one, yet sufficient for maintaining the system-wide consistency goal ( see determinism + known deadline below ), without a need to guarantee a system-wide distribution of a uniform clock.
To touch on what the "real-time constraint" is actually useful for, let me borrow ( incl. original typos, accents added ) from a book by Giovanni Di Sirio on Real-Time Operating System (RTOS) disambiguation :
What an RTOS is
An RTOS is an operating system whose internal processes are guaranteed to be compliant with (hard or soft) realtime requirements. The fundamental qualities of an RTOS are:
- Predictable. It is the quality of being predictable in the scheduling behavior.
- Deterministic. It is the quality of being able to consistently produce the same results under the same conditions.
RTOS are often confused with “fast” operating systems. While efficiency is a positive attribute of an RTOS, efficiency alone does not qualifies an OS as RTOS but it could separate a good RTOS from a not so good one.
A deciding factor is the (un-)certainty of completing each work-unit within a(n un-)known deadline :
“A non real time system is a system where the programmed reaction to an event will certainly happen sometime in the future”.
Whereas :
Soft Real Time. A Soft Real Time (SRT) system is a system where not meeting a deadline can have undesirable but not catastrophic effects, a performance degradation for example. Such systems could be described as follow :
“A soft real time system is a system where the programmed reaction to an event is almost always completed within a known finite time”.
Hard Real Time. An Hard Real Time (HRT) system is a system where not meeting a deadline can have catastrophic effects. Hard realtime systems require a much more strict definition and could be described as follow :
“An hard real time system is a system where the programmed reaction to an event is guaranteed to be completed within a known finite time”.

How can communication middleware support soft real-time applications?

Nowadays the concept "real-time" has a lot different interpretations. In this question two definitions are provided:
The hard real-time definition considers any missed deadline to be a system failure. This scheduling is used extensively in mission critical systems where failure to conform to timing constraints results in a loss of life or property.
and
The soft real-time definition allows for frequently missed deadlines, and as long as tasks are timely executed their results continue to have value. Completed tasks may have increasing value up to the deadline and decreasing value past it.
In my research I came to the following conclusions:
The middleware supports hard real-time if it provides predictable and efficient end-to-end control over system resources. Like setting the thread-priority of all the threads created by the middleware.
It appears to me that good performance is the most relevant factor to support soft real-time applications.
Is this true? Are there other relevant features of communication middleware that support soft real-time applications?
First, for precise definitions of real-time principles and terms, based on first principles and mental models, I refer you to real-time.org.
The real-time practitioner computing community uses a variety of inconsistent and incomplete "definitions" of "real-time," "hard real-time," and "soft real-time." The real-time computing research community has a consensus on "hard real-time" but is confused about "soft real-time."
The core of the research community's "hard" real-time computing model is that tasks have hard deadlines, and all these deadlines must not be missed, else the system has failed. Meeting the deadlines is the "timeliness" criterion, and guaranteeing that all deadlines will be met is the "predictability" criterion--that predictability is "deterministic."
(In some of these models, tasks without deadlines are allowed in the background if they do not interfere with the hard real-time tasks; they usually also are prevented from being starved.)
This model requires everything related to the hard real-time tasks to be static (known in advance)--i.e., it requires that the time evolution of the system is known in advance. This requirement is very strong, and in most cases, it is not feasible. There are important hard real-time systems in which this requirement is (at least presumed) to be satisfied. Well-known examples include digital avionics flight control, certain medical devices, power plant control, railroad crossing control, etc. These examples are safety-critical, but not all hard real-time systems are (and we will see below that most safety-critical systems are not and cannot be hard real-time, although some may include simple low level hard real-time components).
Soft real-time refers to a class of real-time systems which are generalizations of hard real-time ones. The generalizations included weaker timeliness criteria and/or weaker predictability criteria.
For example, consider a model with tasks having deadlines as hard real-time ones do. In this particular model, the timeliness criterion is that any number of tasks are allowed to be up to 15% tardy, and the predictability criterion is that this must be guaranteed (i.e., deterministic) just like for hard real-time systems. If one or more of these tasks is more than 15% tardy, the system has failed. This model is not a conventional hard real-time one, although it may be a safety-critical one.
Consider another model: the timeliness criterion is that no more than 20% of the tasks can be more than 5% tardy, and the predictability criterion is that this is guaranteed to be satisfied with at least probability 0.9. Violation of the timeliness and/or predictability criteria means the system has failed. This is not a hard real-time system, although it may be a safety-critical one.
But consider: what if the utility of that system degrades according to not meeting one or any of those criteria--say, 23% were more than 5% tardy, or less than 20% of the tasks were tardy but 10% of those were more than 5% tardy, or all of the criteria were met except that the predictability is only 0.8. There are many real-time systems having such dynamic properties.
We need to specify how that system degradation (say, the system's "utility" or "value") is related to how many and to what degree any of those timeliness and predictability criteria were or were not met. In fact, this model is a notional example of many actual existing real-time systems that are as safety-critical as possible--for example, for doing defense against nuclear armed hostile missiles (and numerous other military combat systems, because they all have various inherent dynamic uncertainties).
Now we return to that need to specify how a real-time system's timeliness and predictability are related to the system's utility. A successfully used solution to that is called "time/utility functions," (or "time/value functions") and is described in great detail on real-time.org. The functions for each task are derived from the physical nature of the system application(s). The system's timeliness and predictability of timeliness are based on those of the tasks--for example, by weighted accrual of their individual utilities.
Soft real-time systems (in the precisely defined sense described on real-time.org) are the general case, and hard real-time systems are a special case that applies to a much smaller domain of real-time problems. All hard and soft real-time systems can be specified and created with time/utility functions.
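As a minimal illustrative sketch of the time/utility-function idea ( my own simplification for this post, not code from real-time.org ): utility is maximal up to the deadline and then decays, and a hard deadline is just the special case where it drops straight to zero:

    #include <stdio.h>

    /* Illustrative time/utility function: full value up to the deadline,
     * then linearly decaying value until a cutoff, zero afterwards.
     * A hard-real-time task is the special case decay_window == 0. */
    double utility(double completion_time, double deadline, double decay_window)
    {
        if (completion_time <= deadline)
            return 1.0;
        if (decay_window <= 0.0 || completion_time >= deadline + decay_window)
            return 0.0;
        return 1.0 - (completion_time - deadline) / decay_window;
    }

    int main(void)
    {
        /* Soft task: deadline 100 ms, value decays over the following 50 ms. */
        printf("done at  90 ms -> utility %.2f\n", utility( 90, 100, 50));
        printf("done at 120 ms -> utility %.2f\n", utility(120, 100, 50));
        /* Hard task: any lateness yields zero utility (system failure). */
        printf("hard, 101 ms   -> utility %.2f\n", utility(101, 100, 0));
        return 0;
    }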
All that clarified, now we can address your question about real-time middleware.
One obvious source for an answer is The Open Group Real-Time CORBA (RTC) standard (Google, there is a GREAT deal of detailed information available).
RTC can be implemented as a fixed priority infrastructure, with a 15-bit system-wide priority that is mapped onto the node priorities. In that case, the minimum requirements are: respecting thread priorities between client and server for resolving resource contention during the processing of CORBA invocations; bounding the duration of thread priority inversions during end-to-end processing; bounding the latencies of operation invocations. It is possible to build hard real-time RTC distributed systems according to those three requirements (and many exist)--but obviously the underlying network QoS affects the real-time behavior. So RTC provides for pluggable application-specific networking, such as those having deterministic QoS (so hard real-time is possible at and below the RTC layers), and those having non-deterministic QoS (but still the RTC layers have the three essential fixed priority real-time properties).
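To make the "15-bit system-wide priority that is mapped onto the node priorities" concrete, here is a hypothetical sketch of such a mapping ( the linear formula and the 1..99 native range are my assumptions, not the RTC standard's; real ORBs ship configurable priority-mapping policies ):

    #include <stdio.h>

    /* Illustrative only: map a 15-bit system-wide priority (0..32767) onto a
     * hypothetical node's native priority range, here assumed to be 1..99
     * (e.g. a POSIX SCHED_FIFO range). */
    int native_priority(int system_wide_priority /* 0 .. 32767 */)
    {
        const int native_min = 1, native_max = 99;
        return native_min
             + (int)((long)system_wide_priority * (native_max - native_min) / 32767);
    }

    int main(void)
    {
        printf("system-wide 0     -> native %d\n", native_priority(0));     /*  1 */
        printf("system-wide 16384 -> native %d\n", native_priority(16384)); /* 50 */
        printf("system-wide 32767 -> native %d\n", native_priority(32767)); /* 99 */
        return 0;
    }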
More generally, RTC provides for soft real-time (in the technical sense defined on real-time.org) at the CORBA layers. It does that by providing a first order scheduling abstraction called "distributed threads." And it provides a scheduling framework that supports not only fixed priorities but also time/utility functions, which are general enough to express a very general class of "utility accrual" soft real-time scheduling algorithms. Such algorithms (or usually heuristics) are needed for distributed systems consisting of application-specific soft real-time system models such as I described above.
What if you don't want to use RTC? The good news is that RTC's principles first appeared publicly in a different distributed real-time system (described on real-time.org), and can be (and have been) transplanted to other real-time middleware for both hard and soft real-time systems.
For soft real-time (again, in the precisely defined sense from real-time.org) middleware, the principles of dynamic timeliness and predictability of timeliness must be applied to resource management at each node of the middleware's system--including being applied to scheduling the middleware's network (e.g., access, routing, etc.). Instances of this approach appear in several Ph.D. theses, and have also been implemented in a number of military combat distributed real-time systems.

RTOS example where GPOS will most likely fail

I want to know a few application examples where one needs to use an RTOS in order to ensure a working system.
I did some Google searching, and whatever examples I found, I feel, could be implemented using a Windows or Linux system.
The primary difference between an RTOS and a GPOS is that an RTOS guarantees deterministic response. That is to say that the worst-case response time to an event is precisely bounded (and usually fast). A GPOS generally schedules processes on a "balanced load" basis - it assumes that all processes and events are of equal importance and will be allotted a "fair" share of processor resources. For that reason, when a process has the CPU, unless it yields "cooperatively" it will have sole use of the CPU for the duration of its time slot (assuming a single core - multi-core processors allow true concurrency, but the GPOS still allots the cores on a balanced-load basis). A time slot may be several tens of milliseconds, and the time taken to service a particular process will depend greatly on the number of processes simultaneously demanding CPU time. Outside perhaps of implementing a kernel-level driver, achieving timing constraints of a few tens of microseconds (or less) is not possible (or desirable) in a GPOS.
If your application is what Microsoft's marketing used to call "soft" real-time (i.e. not real time at all), then a GPOS may suit. Linux can be built with "real-time" scheduling support, but that does not really make Linux suited to a large set of "hard" real-time tasks, and it is still "soft" in the sense that most of the time it will meet deadlines, but in some outlier conditions it may fail. If that is your medical life-support system, you probably don't want to trust to that!
As an example of an attempt to run essentially real-time tasks on a GPOS that fails, years ago when MMX instructions were added to Pentium processors (running typically at 60MHz then), someone had the bright idea of "Host Signal Processing", a method applied to reduce the cost of PSTN modems (dial-up) by performing the signal processing on the PC rather than using a dedicated processor or DSP in the modem hardware - these "modems" were not really modems at all; they were telephone interfaces and digital converters for modem software. At the time I worked for a company producing PSTN modem test equipment, and we tried one of these early HSP modems, and it worked right up until you launched Microsoft Word (or pretty much any large application), when it would instantly drop the connection. Things improved as PCs became faster, but the point is that it was not guaranteed to work - it just mostly did.
Another example I have worked on is a carton-loading machine in food packaging. The product is inserted into the carton, a glue stripe is applied, and the closure is folded. The carton is moving continuously during this process and the timing of the glue gun is critical - for a glue stripe to be accurate to within one millimetre on a carton moving at one metre per second requires timing within one millisecond.
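A back-of-the-envelope check of that figure, using only the numbers quoted above ( 1 mm accuracy at 1 m/s ):

    #include <stdio.h>

    int main(void)
    {
        /* Figures from the carton-loading example above. */
        const double accuracy_m = 0.001; /* glue stripe accuracy: 1 mm       */
        const double speed_m_s  = 1.0;   /* carton speed: 1 metre per second */

        /* Allowed timing error = positional accuracy / speed. */
        printf("glue-gun timing tolerance: %.3f ms\n",
               (accuracy_m / speed_m_s) * 1e3); /* prints 1.000 ms */
        return 0;
    }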
Another example is that of TDMA communication, as used in digital telephony. Such communication allocates a time slot for each station's transmission, and failure to transmit in exactly the correct time slot, or encroaching on the time slot of another station, is unacceptable. Such systems are globally synchronised to atomic-clock accuracy (typically derived from a GPS receiver). A GSM time slot, for example, is 577 microseconds; in this time the transmitter must ramp up the transmitter power, transmit the data and ramp down again.
In short any example that requires 100 percent deterministic timing needs an RTOS. If your timing constraints are say > 100ms, and a small probability of failure to meet timing is tolerable, then a GPOS may work all or most of the time. If timing constraints are sub-millisecond or the cost or consequences of failure unacceptable, then an RTOS is appropriate.

Differences between hard real-time, soft real-time, and firm real-time?

I have read the definitions for the different notions of real-time, and the examples provided for hard and soft real-time systems make sense to me. But, there is no real explanation or example of a firm real-time system. According to the link above:
Firm: Infrequent deadline misses are tolerable, but may degrade the system's quality of service. The usefulness of a result is zero after its deadline.
Is there a clear distinction between firm real-time vs. hard or soft real-time, and is there a good example that illustrates that distinction?
In comments, Charles asked that I submit tag wikis for the new tags. The example of a "firm real-time system" I provided for the firm-real-time tag was a milk serving system. If the system delivers milk after its expiration time, then the milk is considered "not useful". One can tolerate eating cereal without milk, but the quality of the experience is degraded.
This is just the idea I formed in my head when I initially read the definition. I am looking for a much better example, and perhaps a better definition of firm real-time that will improve my notion of it.
Hard Real-Time
The hard real-time definition considers any missed deadline to be a system failure. This scheduling is used extensively in mission critical systems where failure to conform to timing constraints results in a loss of life or property.
Examples:
Air France Flight 447 crashed into the ocean after a sensor malfunction caused a series of system errors. The pilots stalled the aircraft while responding to outdated instrument readings. All 12 crew and 216 passengers were killed.
Mars Pathfinder spacecraft was nearly lost when a priority inversion caused system restarts. A higher priority task was not completed on time due to being blocked by a lower priority task. The problem was corrected and the spacecraft landed successfully.
An Inkjet printer has a print head with control software for depositing the correct amount of ink onto a specific part of the paper. If a deadline is missed then the print job is ruined.
Firm Real-Time
The firm real-time definition allows for infrequently missed deadlines. In these applications the system can survive task failures so long as they are adequately spaced, however the value of the task's completion drops to zero or becomes impossible.
Examples:
Manufacturing systems with robot assembly lines where missing a deadline results in improperly assembling a part. As long as ruined parts are infrequent enough to be caught by quality control and not too costly, then production continues.
A digital cable set-top box decodes time stamps for when frames must appear on the screen. Since the frames are time order sensitive a missed deadline causes jitter, diminishing quality of service. If the missed frame later becomes available it will only cause more jitter to display it, so it's useless. The viewer can still enjoy the program if jitter doesn't occur too often.
Soft Real-Time
The soft real-time definition allows for frequently missed deadlines, and as long as tasks are timely executed their results continue to have value. Completed tasks may have increasing value up to the deadline and decreasing value past it.
Examples:
Weather stations have many sensors for reading temperature, humidity, wind speed, etc. The readings should be taken and transmitted at regular intervals, however the sensors are not synchronized. Even though a sensor reading may be early or late compared with the others it can still be relevant as long as it is close enough.
A video game console runs software for a game engine. There are many resources that must be shared between its tasks. At the same time, tasks need to be completed according to the schedule for the game to play correctly. As long as tasks are being completed relatively on time the game will be enjoyable, and if not it may only lag a little.
Siewert: Real-Time Embedded Systems and Components.
Liu & Layland: Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment.
Marchand & Silly-Chetto: Dynamic Scheduling of Soft Aperiodic Tasks and Periodic Tasks with Skips.
Hard real-time means you must absolutely hit every deadline. Very few systems have this requirement. Some examples are nuclear systems, some medical applications such as pacemakers, a large number of defense applications, avionics, etc.
Firm/soft real time systems can miss some deadlines, but eventually performance will degrade if too many are missed. A good example is the sound system in your computer. If you miss a few bits, no big deal, but miss too many and you're going to eventually degrade the system. Similar would be seismic sensors. If you miss a few datapoints, no big deal, but you have to catch most of them to make sense of the data. More importantly, nobody is going to die if they don't work correctly.
The line is fuzzy, because even a pacemaker can be off by a small amount without killing the patient, but that's the general gist.
It's sort of like the difference between hot and warm. There's not a real divide, but you know it when you feel it.
After reading the Wikipedia page and other pages on real-time computing, I made the following inferences:
1> For a Hard real-time system, if the system fails to meet the deadline even once the system is considered to have Failed.
2> For a Firm real-time system, even if the system fails to meet the deadline, possibly more than once (i.e. for multiple requests), the system is not considered to have failed. Also, the responses for the requests (replies to a query, result of a task, etc.) are worthless once the deadline for that particular request has passed (The usefulness of a result is zero after its deadline). A hypothetical example can be a storm forecast system (if a storm is predicted before arrival, then the system has done its job, prediction after the event has already happened or when it is happening is of no value).
3> For a Soft real-time system, even if the system fails to meet the deadline, possibly more than once (i.e. for multiple requests), the system is not considered to have failed. But, in this case, the results of the requests are not worthless: the value of a result after its deadline is not zero, rather it degrades as time passes after the deadline. E.g.: streaming audio-video.
Here is a link to a resource that was very helpful.
It's popular to associate some great catastrophe with the definition of hard real-time, but this is not relevant. Any failure to meet a hard real-time constraint simply means that the system is broken. The severity of the outcome when something is labelled "broken" isn't material to the definition.
Firm and soft simply fail to be automatically declared broken on failing to meet a single deadline.
For a fair example of hard real-time, from the page you linked:
Early video game systems such as the Atari 2600 and Cinematronics vector graphics had hard real-time requirements because of the nature of the graphics and timing hardware.
If something in the video generation loop missed just a single deadline then the whole display would glitch, which would be intolerable, even if it was rare. That would be a broken system and you'd take it back to the shop for a refund. So it's hard real-time.
Obviously any system can be subject to situations it cannot handle, so it's necessary to restrict the definition to being within the expected operating conditions -- noting that in safety-critical applications people must plan for terrible conditions ("the coolant has evaporated", "the brakes have failed", but rarely "the sun has exploded").
And let's not forget that sometimes there's an implicit "while anybody is watching" operating condition. If nobody sees you break the rules (or if they did, but they died in the fire before telling anyone), and nobody can prove that you broke the rules after the fact, then it's kind of the same as if you never broke the rules!
The simplest way to distinguish between the different kinds of real-time system types is answering the question:
Is a delayed system response (after the deadline) still useful or not?
So depending on the answer you get for this question, your system could be included as one of the following categories:
Hard: No, and delayed answers are considered a system failure
This is the case when missing the deadline will make the system unusable. For example, the system controlling a car's airbag should detect the crash and rapidly inflate the bag. The whole process takes more or less one twenty-fifth of a second. Thus, if the system reacted with, say, 1 second of delay, the consequences could be fatal, and there would be no benefit in having the bag inflated once the car has already crashed.
Firm: No, but delayed answers are not necessarily a system failure
This is the case when missing the deadline is tolerable but it will affect the quality of the service. As a simple example, consider a video encryption system. Normally the encryption password is generated on the server (video head-end) and sent to the customer's set-top box. This process should be synchronised so that the set-top box normally receives the password before it starts receiving the encrypted video frames. In this case a delay may lead to video glitches, since the set-top box is not able to decode the frames because it hasn't received the password yet. In this case the service (a film, an interesting football match, etc.) could be affected by not meeting the deadline. Receiving the password late is not useful, since the frames encrypted with it have already caused the glitches.
Soft: Yes, but the system service is degraded
As per the Wikipedia description, the usefulness of a result degrades after its deadline. That means getting a response from the system past the deadline is still useful for the end user, but its usefulness degrades once the deadline has been reached. A simple example for this case is software that automatically controls the temperature of a room (or a building). In this case, if the system has some delay reading the temperature sensors, it will be a little slow to react to abrupt temperature changes. However, it will eventually react to the change and adjust the temperature accordingly, to keep it constant for example. So in this case the delayed reaction is useful, but it degrades the system's quality of service.
Soft real-time is the easiest to understand: even if the result is obtained after the deadline, it is still considered valid.
Example: a web browser - we request a certain URL, and it takes some time to load the page. If the system takes more than the expected time to provide us with the page, the page obtained is not considered invalid; we just say that the system's performance wasn't up to the mark (the system gave low performance!).
In a hard real-time system, if the result is obtained after the deadline, the system is considered to have failed completely.
Example: in the case of a robot doing some job like line tracing, if an obstacle appears in its path and the robot doesn't process this information within some programmed deadline (almost instant!), the robot is said to have failed in its task (the robot system may also get completely destroyed!).
In a firm real-time system, if the result of a process's execution comes after the deadline, we discard that result, but the system is not considered to have failed.
Example: satellite communication for enemy-position monitoring or some other task. If the ground computer station to which the satellites periodically send frames is overloaded, and the current frame (packet) is not processed in time before the next frame comes up, the current packet (the one that missed the deadline) is dropped/discarded, no matter whether its processing was done (or half done, or almost done). But the ground computer is not considered to have completely failed.
To define "soft real-time," it is easiest to compare it with "hard real-time." Below we will see that the term "firm real-time" constitutes a misunderstanding about "soft real-time."
Speaking casually, most people implicitly have an informal mental model that considers information or an event as being "real-time"
• if, or to the extent that, it is manifest to them with a delay (latency) that can be related to its perceived currency
• i.e., in a time frame that the information or event has acceptably satisfactory value to them.
There are numerous different ad hoc definitions of "hard real-time," but in that mental model, hard real-time is represented by the "if" term. Specifically, assuming that real-time actions (such as tasks) have completion deadlines, acceptably satisfactory value of the event that all tasks complete is limited to the special case that all tasks meet their deadlines.
Hard real-time systems make the very strong assumptions that everything about the application and system and environment is static and known a priori—e.g., which tasks, that they are periodic, their arrival times, their periods, their deadlines, that they won’t have resource conflicts, and overall the time evolution of the system. In an aircraft flight control system or automotive braking system and many other cases those assumptions can usually be satisfied so that all the deadlines will be met.
This mental model is deliberately and very usefully general enough to encompass both hard and soft real-time--soft is accommodated by the "to the extent that" phrase. For example, suppose that the task completions event has suboptimal but acceptable value if
no more than 10% of the tasks miss their deadlines
or no task is more than 20% tardy
or the average tardiness of all tasks is no more than 15%
or the maximum tardiness among all tasks is less than 10%
These are all common examples of soft real-time cases in a great many applications.
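A small illustrative sketch ( hypothetical, just to make those criteria concrete ) that checks a recorded set of task tardiness values against three such criteria; a real system would derive the thresholds from the application:

    #include <stdio.h>

    /* tardiness[i] = (completion - deadline) / deadline, so 0.10 means
     * "10% tardy" and negative values mean the task finished early. */
    struct verdict { int miss_pct_ok, max_tardy_ok, avg_tardy_ok; };

    struct verdict check(const double *tardiness, int n)
    {
        int misses = 0;
        double max_t = 0.0, sum_t = 0.0;
        for (int i = 0; i < n; i++) {
            double t = tardiness[i] > 0.0 ? tardiness[i] : 0.0;
            if (t > 0.0) misses++;
            if (t > max_t) max_t = t;
            sum_t += t;
        }
        struct verdict v = {
            .miss_pct_ok  = (100.0 * misses / n) <= 10.0, /* <=10% of tasks miss  */
            .max_tardy_ok = max_t <= 0.20,                /* no task >20% tardy   */
            .avg_tardy_ok = (sum_t / n) <= 0.15           /* average <=15% tardy  */
        };
        return v;
    }

    int main(void)
    {
        double tardiness[] = { -0.05, 0.00, 0.12, -0.20, 0.08 };
        struct verdict v = check(tardiness, 5);
        printf("miss%% <= 10: %d, max <= 20%%: %d, avg <= 15%%: %d\n",
               v.miss_pct_ok, v.max_tardy_ok, v.avg_tardy_ok);
        return 0;
    }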
Consider the single-task application of picking your child up after school. That probably does not have an actual deadline, instead there is some value to you and your child based on when that event takes place. Too early wastes resources (such as your time) and too late has some negative value because your child might be left alone and potentially in harm's way (or at least inconvenienced).
Unlike the static hard real-time special case, soft real-time makes only the minimum necessary application-specific assumptions about the tasks and system, and uncertainties are expected. To pick up your child, you have to drive to the school, and the time to do that is dynamic depending on weather, traffic conditions, etc. You might be tempted to over-provision your system (i.e., allow what you hope is the worst case driving time) but again this is wasting resources (your time, and occupying the family vehicle, possibly denying use by other family members).
That example may not seem to be costly in terms of wasted resources, but consider other examples. All military combat systems are soft real-time. For example, consider performing an aircraft attack on a hostile ground vehicle using a missile guided with updates to it as the target maneuvers. The maximum satisfaction for completing the course update tasks is achieved by a direct destructive strike on the target. But an attempt to over-provision resources to make certain of this outcome is usually far too expensive and may even be impossible. In this case, you may be less but sufficiently satisfied if the missile strikes close enough to the target to disable it.
Obviously combat scenarios have a great many possible dynamic uncertainties that must be accommodated by the resource management. Soft real-time systems are also very common in many civilian systems, such as industrial automation, although obviously military ones are the most dangerous and urgent ones to achieve acceptably satisfactory value in.
The keystone of real-time systems is "predictability." The hard real-time case is interested in only one special case of predictability--i.e., that the tasks will all meet their deadlines and the maximum possible value will be achieved by that event. That special case is named "deterministic."
There is a spectrum of predictability. Deterministic (determinism) is one end-point (maximum predictability) on the predictability spectrum; the other end-point is minimum predictability (maximum non-determinism). The spectrum's metric and end-points have to be interpreted in terms of a chosen predictability model; everything between those two end-points is degrees of unpredictability (= degrees of non-determinism).
Most real-time systems (namely, soft ones) have non-deterministic predictability, for example, of the tasks' completions times and hence the values gained from those events.
In general (in theory), predictability, and hence acceptably satisfactory value, can be made as close to the deterministic end-point as necessary--but at a price which may be physically impossible or excessively expensive (as in combat or perhaps even in picking up your child from school).
Soft real-time requires an application-specific choice of a probability model (not the common frequentist model) and hence predictability model for reasoning about event latencies and resulting values.
Referring back to the above list of events that provide acceptable value, now we can add non-deterministic cases, such as
the probability that no task will miss its deadline by more than 5% is greater than 0.87. (Note the number of scheduling criteria expressed in there.)
In a missile defense application, given the fact that in combat the offense always has the advantage over the defense, which of these two real-time computing scenarios would you prefer:
because the perfect destruction of all the hostile missiles is very unlikely or impossible, assign your defensive resources to maximize the probability that as many of the most threatening (e.g., based on their targets) hostile missiles will be successfully intercepted (close interception counts because it can move the hostile missile off-course);
complain that this is not a real-time computing problem because it is dynamic instead of static, and traditional real-time concepts and techniques do not apply, and it sounds more difficult than static hard real-time, so you are not interested in it.
Despite the various misunderstandings about soft real-time in the real-time computing community, soft real-time is very general and powerful, albeit potentially complex compared with hard real-time. Soft real-time systems as summarized here have a lengthy successful history of use outside the real-time computing community.
To directly answer the OP question:
A hard real-time system can provide deterministic guarantees—most commonly that all tasks will meet their deadlines, interrupt or system call response time will always be less than x, etc.—IF AND ONLY IF very strong assumptions are made and are correct that everything that matters is static and known a priori (in general, such guarantees for hard real-time systems are an open research problem except for rather simple cases)
A soft real-time system does not make deterministic guarantees, it is intended to provide the best possible analytically specified and accomplished probabilistic timeliness and predictability of timeliness that are feasible under the current dynamic circumstances, according to application-specific criteria.
Obviously hard real-time is a simple special case of soft real-time. Obviously soft real-time's analytical non-deterministic assurances can be very complex to provide, but are mandatory in the most common real-time cases (including the most dangerous safety-critical ones such as combat) since most real-time cases are dynamic not static.
"Firm real-time" is an ill-defined special case of "soft real-time." There is no need for this term if the term "soft real-time" is understood and used properly.
I have a more detailed much more precise discussion of real-time, hard real-time, soft real-time, predictability, determinism, and related topics on my web site real-time.org.
real-time - Pertaining to a system or mode of operation in which computation is performed during the actual time that an external process occurs, in order that the computation results can be used to control, monitor, or respond to the external process in a timely manner. [IEEE Standard 610.12.1990]
I know this definition is old, very old. I can't, however, find a more recent definition by the IEEE (Institute of Electrical and Electronics Engineers).
Consider a task that inputs data from the serial port. When new data arrives, the serial port triggers an event. When the software services that event, it reads and processes the new data. The serial port has a hardware buffer to store incoming data (2 deep on the MSP432, 16 on the TM4C123), such that if the buffer is full and more data arrives, the new data is lost. Is this system hard, firm, or soft real time?
It is hard real time because if the response is late, data may be lost.
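A rough worked example of where that deadline sits, assuming a line speed of 115200 baud with 8N1 framing ( the baud rate is not stated above, so it is purely an assumption ) and the buffer depths just mentioned:

    #include <stdio.h>

    int main(void)
    {
        /* Assumed line speed (not stated in the text): 115200 baud, 8N1,
         * i.e. 10 bit-times per byte on the wire. */
        const double baud          = 115200.0;
        const double bits_per_byte = 10.0;
        const double byte_time_s   = bits_per_byte / baud; /* ~86.8 us */

        /* Hardware buffer depths mentioned above (in received data items). */
        const int buf_msp432  = 2;
        const int buf_tm4c123 = 16;

        /* Hard deadline: the handler must drain the buffer before it
         * overflows and incoming data is lost for good. */
        printf("MSP432  worst-case deadline: ~%.0f us\n",
               buf_msp432  * byte_time_s * 1e6);
        printf("TM4C123 worst-case deadline: ~%.0f us\n",
               buf_tm4c123 * byte_time_s * 1e6);
        return 0;
    }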
Consider a hearing aid that inputs sounds from a microphone, manipulates the sound data, and then outputs the data to a speaker. The system usually has small and bounded jitter, but occasionally other tasks in the hearing aid cause some data to be late, causing a noise pulse on the speaker. Is this system hard, firm or soft real time?
It is firm real time because it causes an error that can be perceived but the effect is harmless and does not significantly alter the quality of the experience.
Consider a task that outputs data to a printer. When the printer is idle the printer triggers an event. When the software services that event, it sends more data to the printer. Is this system hard, firm or soft real time?
It is soft real time because the faster it responds the better, but the value of the system (bandwidth is the amount of data printed per second) diminishes with latency.
UTAustinX: UT.RTBN.12.01x Realtime Bluetooth Networks
Maybe the definition is at fault.
From my experience, I would separate the two as being hardware and software dependent.
If you have 200ms to service a hardware-driven interrupt, that is what you've got. You stick 300ms of code in there and the system isn't broken - it simply hasn't been developed properly. You'll be switched out before you've finished. Your code doesn't work or is not fit for purpose. Many systems have hard-defined processing periods. Video, telecoms etc.
If you're writing an application that's real-time, this could be considered soft. If you run out of time you could hope for less load next time, you could tune the OS, add some memory or even upgrade the hardware. You have options.
To look at it from a UX perspective is not helpful. A Skoda might not be broken if it glitches, but a BMW sure as hell will be.
The definition has expanded over the years to the detriment of the term. What is now called "Hard" real-time is what used to be simply called real-time. So systems in which missing timing windows (rather than single-sided time deadlines) would result in incorrect data or incorrect behavior should be considered real-time. Systems without that characteristic would be considered non-real-time.
That's not to say that time isn't of interest in non-real-time systems, it just means that timing requirements in such systems don't result in fundamentally incorrect results.
Hard real-time systems use a preemptive version of priority scheduling, so that critical tasks get scheduled immediately, whereas soft real-time systems typically use a non-preemptive version of priority scheduling, which allows the present task to finish before control is transferred to the higher-priority task, causing additional delays. Thus task deadlines are followed strictly in hard real-time systems, whereas in soft real-time systems they are not handled as strictly.