Distribution of CPU cycles when multiple processes are running in parallel? - operating-system

My question is: are CPU cycles given to different processes in round-robin fashion?
Context of the question: I have a Windows system and let's say I have opened ten different processes, for example playing music in a media player, typing in WordPad, typing in Notepad, surfing in a browser, etc.
The music plays in the background without interruption while I am typing in WordPad at the same time. I am wondering how the music player is given continuous CPU cycles. My understanding is that the OS rotates the CPU among the different processes in round-robin fashion, but the switching is so fast that the end user cannot spot the interruption in the music (though it actually is interrupted).

Simple round-robin is one way to share the computing resource among a set of processes (threads), but it is not the one used in Windows. Each thread has a static and a dynamic priority. The scheduler picks the ready thread with the highest priority and gives it a time slot to execute. If the thread consumes the time slot completely, it is preempted by the scheduler; alternatively, the thread may give the rest of its time slot back to the system voluntarily if it has nothing more to do (for example, it is waiting for an I/O operation to complete).
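As a rough illustration of the idea (not Windows' actual scheduler), a minimal sketch of "pick the highest-priority ready thread" could look like this; the structure and field names are invented for the example:

    #include <stddef.h>

    /* Hypothetical thread descriptor, for illustration only. */
    struct thread {
        int priority;   /* higher number = more important        */
        int ready;      /* 1 if runnable, 0 if blocked (e.g. I/O) */
    };

    /* Return the runnable thread with the highest priority, or NULL if
     * every thread is blocked.  A real scheduler also handles time slices,
     * dynamic priority boosts, per-CPU queues, and so on.                 */
    struct thread *pick_next(struct thread *threads, size_t count)
    {
        struct thread *best = NULL;
        for (size_t i = 0; i < count; i++) {
            if (threads[i].ready &&
                (best == NULL || threads[i].priority > best->priority))
                best = &threads[i];
        }
        return best;
    }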
In your particular case there is another thing that makes the sound play continuously: buffering. The media player reads data from the media in advance and queues it to the hardware to play. The hardware should always have buffered data available in advance, otherwise the sound will have interruptions. Nowadays our computers are powerful enough to supply the necessary stream to the hardware even under notable load.
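A minimal sketch of that idea: a ring buffer the player keeps topped up while the hardware drains it. The names are invented for the example and no real audio API is used:

    #include <stddef.h>

    #define AUDIO_BUF_SIZE 4096          /* bytes kept ahead of the hardware */

    static unsigned char ring[AUDIO_BUF_SIZE];
    static size_t head, tail;            /* head: player writes, tail: hardware reads */

    static size_t buffered(void)
    {
        return (head - tail + AUDIO_BUF_SIZE) % AUDIO_BUF_SIZE;
    }

    /* Player side: decode ahead of time and keep the buffer as full as possible. */
    void player_fill(const unsigned char *pcm, size_t n)
    {
        for (size_t i = 0; i < n && buffered() < AUDIO_BUF_SIZE - 1; i++) {
            ring[head] = pcm[i];
            head = (head + 1) % AUDIO_BUF_SIZE;
        }
    }

    /* "Hardware" side: drains at a fixed rate; if the buffer runs dry the
     * listener hears a gap, which is exactly what pre-buffering avoids.   */
    size_t hw_drain(unsigned char *out, size_t n)
    {
        size_t i = 0;
        while (i < n && buffered() > 0) {
            out[i++] = ring[tail];
            tail = (tail + 1) % AUDIO_BUF_SIZE;
        }
        return i;   /* fewer than n bytes means an underrun */
    }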
In the old days, if you ran many apps at once and the system started to swap processes in and out from disk (which, from the OS point of view, is more important than giving the media player the opportunity to run), you could hear your music played with gaps of silence.

a) The CPU fills a buffer in the sound card, which plays from the buffer while the CPU is doing other stuff. So the CPU doesn't have to care about the sound card all the time.
b) Switching between processes takes place in the milli- or even microsecond time range, so you just won't notice that as a human.
c) Processes that have nothing to do (like WordPad waiting for a key) tell the OS they're idle, so the OS doesn't give them any CPU time until something happens (a key is pressed, or windows have been moved so they have to be redrawn) - see the sketch after this list.
d) CPUs are fast. Even if you could type 10 keys per second, the CPU won't need more than 100 microseconds per key (actually much less, but this value makes the arithmetic easier), so typing at 10 keys/s gives the CPU 10 x 100 microseconds = 1 ms of work per second. That's about 0.1% of your CPU time while typing - you won't even see CPU usage in Task Manager go up.
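A minimal POSIX sketch of point (c): a process that blocks waiting for input consumes no CPU time until the OS wakes it up. This illustrates the principle, not what WordPad literally does:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char c;
        /* read() blocks inside the kernel until a byte arrives on stdin.
         * While blocked, the process sits on a wait queue and receives no
         * CPU time at all; the scheduler simply skips it.                 */
        while (read(STDIN_FILENO, &c, 1) == 1) {
            printf("got key: %c\n", c);
        }
        return 0;
    }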

CPU cycles are not handed out round-robin, because round robin would mean equal shares, and the shares are not equal.
You can change how the cycles are assigned using process priority (Windows) or nice (Linux); the CPU is assigned dynamically, between cycles.
In your context, giving more priority to the sound would instead make you notice lag in keystroke handling (just like on an old 133 MHz computer).
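For instance, on Linux a process can lower its own priority with the POSIX nice() call (Windows has an equivalent mechanism through Task Manager or its API); a minimal sketch:

    #include <stdio.h>
    #include <unistd.h>
    #include <errno.h>

    int main(void)
    {
        /* Increase our nice value by 10, i.e. make this process *less*
         * favoured by the scheduler.  Negative increments (higher priority)
         * normally require elevated privileges.                            */
        errno = 0;
        int newnice = nice(10);
        if (newnice == -1 && errno != 0) {
            perror("nice");
            return 1;
        }
        printf("running with nice value %d\n", newnice);
        /* ... do low-priority background work here ... */
        return 0;
    }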
A lot of extra info:
It is a common mistake to assume the CPU is the only thing that makes a PC powerful. It is not; it is the core, but not the whole engine.
Your sound is NOT processed by the main processor, as one might think; dedicated audio chips exist for that and take on the actual sound-processing load. So when you press play, the CPU starts the playback, draws the player, loads the data (the MP3 file) into RAM and sends it to the sound chip, and the sound chip does the hard work.
In any case, the computer's workflow is not simply CPU -> keyboard -> CPU.
You also have to consider the RAM, where the sound data is stored before being passed to the sound chip: its latency, current state, current load...
The motherboard is the road between the components (CPU, sound, VGA, and the rest), so the total bus speed (MHz) can matter as much as raw CPU power.
As a final note: if you have problems using sound together with other apps/processes, simply give the player's process a higher priority. The "background state" of the sound player may by itself be one reason it gets a lower priority, but I'm not sure.
I know this is not a polished answer; feel free to re-edit it. I'm just putting the ideas on the table.
I suggest you try at least three sound players with the same file.
If nothing changes, try the same with three text editors.
If there is still no difference, we have fallen into a space-time hole.
I think it is the RAM, and if not, the motherboard.
Since you don't post any data at all about your system, I can't say anything more.
If you have 4 GB of RAM, it is NOT the RAM.
If you have a 133 MHz CPU, we need to talk about it :)

Related

Throughput vs latency in computer architecture

I've come across articles on "throughput vs latency" in contexts like networking, e.g. https://homepage.cs.uri.edu/~thenry/resources/unix_art/ch12s04.html. But in the context of computer architecture / operating systems, I'm not able to understand why there would be a trade-off between latency (the response time of a program) and throughput (how many programs we're able to complete in a unit of time, say per hour). Is this solely due to the fact that we can choose to parallelize the processing of multiple programs / requests, leading to overheads like context switches and sharing of caches, which make the end-to-end response time per process worse? Or am I missing something here?
In terms of single instructions in a superscalar pipelined out-of-order exec CPU, throughput vs. latency is very important because the CPU is trying to extract parallelism from an instruction stream that has to be executed as if in serial program order. See Assembly - How to score a CPU instruction by latency and throughput and the bottom of my answer on latency vs throughput in intel intrinsics for example.
In terms of OS decisions that affect throughput vs. latency on a much longer timescale than a few clock cycles, that's a totally separate question.
One of the major factors there is choosing how to use the available physical RAM, and whether to page out (to a swap file) infrequently used code / data to make more room to cache disk files. (e.g. Linux's vm.swappiness is widely considered a key tunable in terms of setting it differently between servers and desktops. https://unix.stackexchange.com/questions/88693/why-is-swappiness-set-to-60-by-default).
If you alt-tab to a window when many pages of that process have been paged out, it will take some time before the process can redraw its window. (Multiple hard page faults can be quite slow, especially if paging on a rotational disk rather than an SSD.) So to optimize for latency, you want the kernel not to aggressively swap out pages from running processes, even if they've been idle for a few hours. Those pages, had they been freed, could have improved throughput for other processes by acting as buffers / cache.
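If you want to observe this effect yourself, one way (on Linux/POSIX) is to look at the major-fault counter that getrusage() reports before and after touching memory; a minimal sketch:

    #include <stdio.h>
    #include <sys/resource.h>

    static long major_faults(void)
    {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_majflt;   /* faults that required reading from disk/swap */
    }

    int main(void)
    {
        long before = major_faults();
        /* ... touch a lot of memory here, or run this after the process
         *     has been idle long enough for its pages to be evicted ... */
        long after = major_faults();
        printf("hard page faults taken: %ld\n", after - before);
        return 0;
    }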
A related factor is I/O scheduling: trying to group IO requests together to minimize HD seek times (for higher throughput and lower average latency), but sometimes at the expense of delaying a few requests for a longer time (higher worst-case latency). Linux for example has many to choose from, including deadline, Completely Fair Queuing (CFQ), and the original elevator (just grouping requests by locality without consideration of fairness or latency). https://wiki.archlinux.org/title/improving_performance#Input/output_schedulers
CPU scheduling is also a factor: a context-switch hurts throughput, as it takes time itself and caches will likely be cold for the new task on this CPU. You also have to run the kernel's schedule() function to decide which task to run next, so that takes away some time from real work.
To minimize latency (for example between a socket message being sent to a process and it waking up when its poll or select system call returns), you want a short timeslice, like Linux HZ=1000. (Timer interrupts every 1 ms to run the scheduler). And you want to be able to pre-empt even the kernel itself, instead of waiting until the kernel is ready to return to the old user-space to consider the possibility of running a different user-space task.
But neither of these helps throughput; in fact they hurt it (assuming the workload has enough parallelism not to bottleneck on latency). So HZ=100 was the default for "server" Linux builds, vs. 1000 on "desktop" builds tuned for interactive use. (Modern Linux can be "tickless", not using a fixed timer interrupt on every core at all, instead deciding when to schedule the next interrupt on a case-by-case basis.)
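As a rough way to get a feel for scheduler wakeup overhead on your own machine (Linux/POSIX; this is an illustration, not a precise measurement of the kernel tick): ask to sleep 1 ms and see how much later you actually wake up.

    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <time.h>

    static double now_us(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
    }

    int main(void)
    {
        struct timespec req = { 0, 1000000 };   /* ask for 1 ms */
        for (int i = 0; i < 5; i++) {
            double t0 = now_us();
            nanosleep(&req, NULL);
            double t1 = now_us();
            /* The difference between (t1 - t0) and 1000 us is roughly the
             * timer/scheduler wakeup overhead in this unloaded case.      */
            printf("asked 1000 us, slept %.1f us\n", t1 - t0);
        }
        return 0;
    }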
Real-time kernels take this even further, spending more time on finer-grained locking and stuff like that to enable pausing work and coming back to it later to minimize interrupt latency and other latencies between it being time to do something and actually starting to do that thing. (There are real-time patches for Linux, and there are also totally separate kernels built from the ground up for real-time operation.)
If you have an embedded system controlling a motor or something, you absolutely need hard real-time latency guarantees that it will never take longer than say 1 millisecond from an interrupt pin being asserted to the interrupt handler starting to run.
(Designing the system to make these guarantees possible often comes at the cost of throughput. For example, if we're talking about user-space, you obviously have to pin some memory to make it non-swappable, which makes it unavailable for page cache even if it goes untouched for days.)
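On Linux/POSIX that pinning is typically done with mlock()/mlockall(); a minimal sketch (error handling trimmed, and it usually needs appropriate privileges or memlock ulimit settings):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Lock all current and future pages of this process into RAM so a
         * hard page fault can never add unbounded latency in the hot path. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return 1;
        }

        /* From here on, buffers allocated and touched by this process stay
         * resident -- at the cost of RAM the kernel can no longer reuse.   */
        char *buf = malloc(1 << 20);
        if (!buf) return 1;
        for (size_t i = 0; i < (1 << 20); i += 4096)
            buf[i] = 0;   /* touch each page so it is actually allocated */

        puts("memory pinned");
        free(buf);
        return 0;
    }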

How to troubleshoot and reduce communication overhead on Rockwell ControlLogix

Need help. We have a PLC whose CPU keeps getting maxed out. We've already upgraded it once; now we need to work on optimizing it.
We have over 50 outgoing MSG instructions, 60 incoming, and 103 Ethernet devices (flow meters, drives, etc.). I've gone through and tried to make sure everything that can be cached is cached, that only instructions that are currently needed are running, and that communications to the same PLC happen in the same scan, but I haven't made a dent.
I'm having trouble identifying which instructions are significant. It seems the connections will be consolidated, so lots of MSGs shouldn't be too big of a problem. I'm considering Produced & Consumed tags, but our team isn't very familiar with them and I believe you have to do a download to modify them, which is a problem. Our I/O module RPIs are all set to around 200 ms, but that didn't seem to make a difference (raised from 5 ms).
We have a shutdown this weekend and I plan on disabling everything and turning it back on one part at a time to see where the load is really coming from.
Does anyone have any suggestions? The task monitor doesn't have a lot of detail that I can understand, i.e. it's either too summarized or too instantaneous for me to make heads or tails of it. Here are a couple of screens from the Task Monitor to shed some light on what I'm seeing.
The first question that comes to mind: are you using the Continuous Task, or is everything in Periodic tasks?
I had a similar issue many years ago with a CLX. Rockwell suggested increasing the System Overhead Time Slice to around 40 to 50%. The default is 20%.
Some details:
Look at the System Overhead Time Slice (go to Advanced tab under Controller Properties). Default is 20%. This determines the time the controller spends running its background tasks (communications, messaging, ASCII) relative to running your continuous task.
From Rockwell:
For example, at 25%, your continuous task accrues 3 ms of run time. Then the background tasks can accrue up to 1 ms of run time, then the cycle repeats. Note that the allotted time is interrupted, but not reduced, by higher priority tasks (motion, user periodic or event tasks).
Here is a detailed Word Doc from Rockwell:
https://rockwellautomation.custhelp.com/ci/fattach/get/162759/&ved=2ahUKEwiy88qq0IjeAhUO3lQKHf01DYcQFjADegQIAxAB&usg=AOvVaw125pgiSor_bf-BpNSvNVF8
And here is a detailed KB from Rockwell:
https://rockwellautomation.custhelp.com/app/answers/detail/a_id/42964

What makes RTOS behaviour predictable ?

How can you ensure that interrupt latency will not exceed a certain value when there may be other variables and factors involved, like the hardware?
Hardware latency is predictable. It doesn't have to be constant, but it definitely is bounded - for example interrupt entry is usually 12 cycles, but sometimes it may take 15 cycles.
RTOS latency is predictable. It also is not constant, but for example you can be certain, that the RTOS does not block the interrupts for longer than 1000 cycles at any time. Usually it will block them for much shorter periods of time, but never longer than stated.
As long as your application doesn't do something strange (like a while (1); in the thread with the highest possible priority), the latency of the whole system will be the sum of the hardware latency and the RTOS latency.
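A worked example with the (made-up) numbers above: if interrupt entry takes at most 15 cycles and the RTOS never blocks interrupts for more than 1000 cycles, the worst case is bounded by 15 + 1000 = 1015 cycles; on, say, a 100 MHz core that is roughly 10 microseconds, no matter what else the system is doing.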
The important fact here is that using a real-time operating system to write your application is not the only requirement for the application itself to be real-time. In your application you have to ensure that the real-time constraints are not violated. The main job of the RTOS is to NOT get in the way of your doing that, so it must not introduce random/unpredictable delays.
Generally the most important of the "predictable" things in an RTOS is that the highest-priority thread that is not blocked is executing. Period. In a GPOS (like the one on your desktop computer, in tablets or in smartphones), this is not true, because the scheduler actively prevents low-priority threads from starvation by allowing them to run for some time, even if there are more important things to do right now. This makes the behaviour of the application unpredictable, because one day it may react within 10 us, while on another day it may react within 10 s, because the scheduler decided it was a great moment to save the logs to the hard drive or maybe do some garbage collection.
Alternatively you can think that for RTOS the latency is in the range of microseconds, maybe single milliseconds. For a GPOS the max latency would probably be something like dozens of seconds.
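On Linux you can ask for fixed-priority, RTOS-like scheduling with the POSIX SCHED_FIFO policy (this makes Linux behave more predictably, but does not turn it into a hard real-time system); a minimal sketch, normally requiring root or CAP_SYS_NICE:

    #include <stdio.h>
    #include <sched.h>

    int main(void)
    {
        struct sched_param sp = { .sched_priority = 50 };  /* 1..99 for SCHED_FIFO */

        /* SCHED_FIFO: this process preempts anything with lower priority and
         * runs until it blocks or yields -- no fairness-based time slicing.  */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
            perror("sched_setscheduler");
            return 1;
        }
        puts("now running under SCHED_FIFO");
        /* ... latency-sensitive work here; be careful not to spin forever ... */
        return 0;
    }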

RTOS example where GPOS will most likely fail

I want to know a few application examples where one needs to use RTOS in order to ensure a working system.
I did some google search and whatever examples I found, I feel could be implemented using a windows or linux system.
The primary difference between an RTOS and a GPOS is that an RTOS guarantees deterministic response. That is to say, the worst-case response time to an event is precisely bounded (and usually fast). A GPOS generally schedules processes on a "balanced load" basis - it assumes that all processes and events are of equal importance and will be allotted a "fair" share of processor resources. For that reason, when a process has the CPU, unless it yields "cooperatively" it will have sole use of the CPU for the duration of its time slot (assuming a single core - multi-core processors allow true concurrency, but the GPOS still allots the cores on a balanced-load basis). A time slot may be several tens of milliseconds, and the time taken to service a particular process will depend greatly on the number of processes simultaneously demanding CPU time. Outside perhaps of implementing a kernel-level driver, achieving timing constraints of a few tens of microseconds (or less) is not possible (or desirable) in a GPOS.
If your application is what Microsoft's marketing used to call "soft" real-time (i.e. not real time at all) that a GPOS may suit. Linux can be built with "real-time" scheduling support, but it does not really make Linux suited to a large set of "hard" real-time tasks, and it is still "soft" in the sense that most of the time it will meet deadlines, but in some outlier conditions it may fail. If that is your medical life-support system, you probably don't want to trust to that!
As an example of an attempt to run essentially real-time tasks on a GPOS that fails, years ago when MMX instructions were added to Pentium processors (running typically at 60MHz then), someone had the bright idea of "Host Signal Processing", a method applied to reduce the cost of PSTN modems (dial-up) by performing the signal processing on the PC rather than using a dedicated processor or DSP in the modem hardware - these "modems" were not really modems at all; they were telephone interfaces and digital converters for modem software. At the time I worked for a company producing PSTN modem test equipment, and we tried one of these early HSP modems, and it worked right up until you launched Microsoft Word (or pretty much any large application), when it would instantly drop the connection. Things improved as PCs became faster, but the point is that it was not guaranteed to work - it just mostly did.
Another example I have worked on is a carton-loading machine in food packaging. The product is inserted into the carton, a glue stripe applied, and the closure folded. The carton is moving continuously during this process, and the timing of the glue gun is critical - for a glue stripe to be accurate to within one millimetre on a carton moving at one metre per second requires timing within one millisecond.
Another example is that of TDMA communication as used in digital telephony, for example. Such communication allocates a time slot for each station's transmission, and failure to transmit in exactly the correct time slot, or encroaching on the time slot of another station, is unacceptable. Such systems are globally synchronised to atomic-clock accuracy (typically derived from a GPS receiver). A GSM time slot, for example, is 577 microseconds; in this time the transmitter must ramp up the transmitter power, transmit the data and ramp down again.
In short any example that requires 100 percent deterministic timing needs an RTOS. If your timing constraints are say > 100ms, and a small probability of failure to meet timing is tolerable, then a GPOS may work all or most of the time. If timing constraints are sub-millisecond or the cost or consequences of failure unacceptable, then an RTOS is appropriate.

Operating System vs Monitor

Without going into details, how is a Monitor different from an OS?
I read that first there was serial processing in the earlier days, then monitors, and now operating systems.
Monitor in this context means Batch Monitor.
In the 1950s - mid 60s, before we had true operating systems, we had Batch Monitors. You would "program" the job onto punch cards and put them on an input queue that the machine would process one by one.
The programmer would sit in front of a monitor, which would display memory dumps, debugging information, etc - it was an incredibly tedious process.
Of course the major drawback of a Batch Monitor is that the CPU was often idle. Because CPU speeds are so much faster than I/O speed, the machine would spend the majority of the time reading in the cards (I/O) while the CPU waited.
Nowadays, modern operating systems can run several processes at once and optimize CPU utilization. When a process on the run queue needs to do I/O, the OS puts it on another queue, and the CPU starts processing the next job. When the I/O is done, that process is moved back to the run queue. This way, the CPU is always doing something.
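A minimal sketch of that bookkeeping (the structure names are invented, just to illustrate moving a process between the run queue and an I/O wait state):

    #include <stddef.h>

    #define MAXPROC 8

    enum state { RUNNABLE, WAITING_IO };

    /* Hypothetical process table entry, for illustration only. */
    struct proc {
        int pid;
        enum state st;
    };

    static struct proc table[MAXPROC];

    /* Called when a running process issues an I/O request. */
    void block_for_io(struct proc *p)
    {
        p->st = WAITING_IO;          /* off the run queue; CPU picks someone else */
    }

    /* Called when the I/O completes (e.g. from the completion interrupt). */
    void io_done(struct proc *p)
    {
        p->st = RUNNABLE;            /* back on the run queue */
    }

    /* The scheduler simply skips anything that is waiting. */
    struct proc *next_runnable(void)
    {
        for (int i = 0; i < MAXPROC; i++)
            if (table[i].st == RUNNABLE)
                return &table[i];
        return NULL;                 /* everyone is waiting on I/O: the CPU idles */
    }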
Edit:
After looking up "batch monitor" and not finding many references to it, it seems that it is more commonly referred to as a "batch system" - here's a book for reference; should be able to find a pdf version online:
Modern Operating Systems.