Custom DispatchQueue quality of service - swift

Is there a way to create a custom DispatchQueue quality of service with its own custom "speed"? For example, I want a QoS that's twice as slow as .utility.
Ideas on how to solve it
Somehow telling the CPU/GPU that we want to run the task every X operation cycles? Not sure if that's directly possible with iOS.
This is a really bad hack which produces messy code and doesn't really solve the issue if 1 line of code runs for several seconds, but we can introduce a wait after every line of code.
In SpriteKit/SceneKit, it's possible to slow down time. Is there a way to utilize that somehow to slow down an arbitrary piece of code?
Blocking the thread every X seconds so that it slows down - not sure if possible without sacrificing app speed

There is no mechanism in iOS or any other Cocoa platform to control the "speed" (for any meaning of that word) of a work item. The only tool offered us is some control over scheduling. Once your work item is scheduled, it will get 100% (*) of the CPU core until it ends or is preempted. There is no way to be asked to be preempted more often (and it would be expensive to allow that, since context switches are expensive).
The way to manage how much work is done is to directly manage the work, not preemption. The best way is to split up the work into small pieces, and schedule them over time and combine them at the end. If your algorithm doesn't support that kind of input segmentation, then the algorithm's main "loop" needs to limit the number of iterations it performs (or the amount of time it spends iterating), and return at that point to be scheduled later.
If you don't control the algorithm code, and you cannot work with whoever does, and you cannot slice your data into smaller pieces, this may be an unsolvable problem.
(*) With the rise of "performance" cores and other such CPU advances, this isn't completely true, but for this question it's close enough.

Technically you cannot alter the speed on the QoS such as .background or .utility or any other Qos.
The way to handle this is to choose the right QoS based on the task you want to perform.
The higher the QoS is, the more resources the OS will spend on it and descends when you use a lower one.

Related

The reason(s)/benefit(s) to use realtime operating system instead of while-loop on MCU

I'm working on a wheeled-robot platform. My team is implementing some algorithms on MCU to
keep getting sensors reading (sonar array, IR array, motor encoders, IMU)
receive user command (via a serial port connected to a tablet)
control actuators (motors) to execute user commands.
keep sending sensor readings to the tablet for more complicated algorithms.
We currently implement everything inside a global while-loop, while I know most of the other use-cases do the very same things with a real-time operating system.
Please tell me the benefits and reasons to use a real-time os instead of a simple while-loop.
Thanks.
The RTOS will provide priority based preemption. If your code is off parsing a serial command and executing it, it can't respond to anything else until it returns to your beastly loop. An RTOS will provide the abstractions needed for an instant context switch based on an interrupt event. Otherwise the worst-case latency of an event response is going to be that of the longest possible excursion out of the main loop, and sometimes you really do need long-running processes. Such as, for example, to update an LCD panel or respond to USB device enumeration. Preemption permits you to go off and do these things safe in the knowledge that a 16-bit timer running at the CPU clock isn't going to roll over several times before it's done. While for simple control jobs a loop is sufficient, the problem starting with it is when you get into something like USB device enumeration it's no longer practical and will need a full rewrite. By starting with a preemptive framework like an RTOS provides, you have a lot more future flexibility. But there's definitely a bit more up-front work, and definitely a learning curve.
"Real Time" OS ensures your task periodicity. If you want to read sensors data precisely at every 100msec, simple while loop will not guarantee that. On other hand, RTOS can easily take care of that.
RTOS gives you predictibility. An operation will be executed at given time and it will not be missed.
RTOS gives you Semaphores/Mutex so that your memory will not be corrupted or multiple sources will not access buffers.
RTOS provides message queues which can be useful for communication between tasks.
Yes, you can implement all these features in While loop, but then that's the advantage! You get everything ready and tested.
If your while loop works (i.e. it fulfills the real-time requirements of your system), and it's robust, maintainable, and somewhat extensible, then there probably is no benefit to using a real-time operating system.
But if your while-loop can't fulfill the real-time requirements or is overly complex or over-extended (i.e., any change requires further tuning and tweaking to restore real-time performance), then you should consider another architecture.
An RTOS architecture is one popular next step beyond the super-loop. An RTOS is basically a tool for managing software complexity. It allows you to divide a complex set of software requirements into multiple threads of execution. When done properly, each individual thread has a relatively simple set of requirements and becomes easier to implement. And thread prioritization can make it easier to fulfill the real-time requirements of the application. These are basically the benefits of employing an RTOS.
An RTOS is not a panacea, however. An RTOS increases the overall system complexity and opens you up to new types of bugs (such as deadlocks). It requires knowledge and experience to design and implement an RTOS based program effectively. Consider alternatives such as Multi-Rate Main Loop Tasking or an event-based state machine architecture (such as QP). These alternatives might be simpler, easier to understand, or more compatible with your way of designing software.
There are a couple huge advantage that a RTOS multitasker has:
Very good I/O performance: an RTOS can set a waiting thread ready as soon as an I/O action that it requested has completed, so latency in handling reads/writes/whatever can be low. A looped, polled design cannot respond to I/O completions until it gets round to checking the I/O status, (either directly or my polling some volatile flag set by an interrupt-handler).
Indpendent functionality: The ease of implementing isolated subsystems for serial comms, actuators etc. may well be one for you, as suggested in the other answers. It's very reassuring to know that any extra delay in, say some serial exchange, will not adversely affect timing elsewhere. You need to wait a second? No problem: sleep(1000) and you're done - no effect on the other subsystems:) It's the difference between 'no, I cannot add a network server, it would change all the timing and I would have to retest everything' and 'sure, there's plenty of CPU free, I already have the code from another job and I just need another thread to run it'.
There ae other advanatges that help offset the added annoyance of having to program a preemptive multitasker with its critical sections, semaphores and condvars.
Obviously, if your hardware has multiple cores, the RTOS helps - it is designed to share out available CPU execution cycles just like any other resource, and adding cores means more cycles.
In the end, though, it's the I/O performance and functional isolation that's the big win.
Some of the suggestions in other answers may help, either instead of, or together with, an RTOS. When controlling multiple I/O hardware, eg. sensors and actuators, an event-driven state-machine is a very good idea indeed. I often implement that by queueing all events into one thread, (producer-consumer queue), that has sole access to the state data and implements the state-machine, so serializing actions.
Whether the advantages are worth the candle is between you and your requirements:)
RTOS is not instead of while loop - it is while loop + tools which organize your tasks. How do they organize your tasks? Assign priorities to them, decide how much time each one have for a job and/or at what time it should start/end. RTOS also layers your software, i.e harwdare related stuff, application tasks, etc. Aside of that gives you data structures, containers, ready to use interfaces to handle common tasks so you don't have to implement your own i.e allocate some memory for you, lock access for some resources and so on.

NSThreading for speed

I'm working on a game sim and want to speed up the match simulation bit. On a given date there may be 50+ matches that need simulating. Currently I loop through each and tell them to simulate themselves, but this can take forever. I was hoping
1) Overlay a 'busy' screen
2) Launch a thread for each
3) When the last thread exits, remove the overlay.
Now I can do 1 & 2, but I cannot figure out how to tell when the last thread is finished, because the last thread I detach may not be the last thread finished. What's the best way to do that?
Also, usually threads are used so that work can be done in the background while the user does other stuff, I'm using it slightly different. My app is a core-data app and I want to avoid the user touching the store in other ways while i'm simulating the matches. So I want single-threading most of the time, but then multithreading for this situation because of how long the sim engine takes. If someone has other ideas for this approach I'm open.
Rob
Likely you want to use NSOperation and NOT 50 threads - 50 threads is not healthy on an iPhone, and NSOperations are easier to boot. It may be that you are killing performance (it would be my guess) by trying to run 50 at once. NSOperation is designed for exactly this problem. Plus its easy to code.
My only problem with NSOperation is that they don't have a standard way to tell the caller that they are done.
You could periodically poll the NSOperationQueue - when its count is 0 there are none left. You could also make each operation increment some counter - when the count is 50 you are done. Or each operation could post a notification using performSelectorOnMainThread on the main thread that its done.
You should see a boost in performance with even a single core - there are lots of times that the main thread is blocked waiting for user input/graphics drawing/etc. Plus multicore phones and iPads will likely be out within a year (total guess - but they are coming).
Also make sure you look at the operation with Instruments. It may be that you can speed the calculations up be a factor of 2 or even 10x!
You're on a single core, so threading probably won't help much, and the overhead may even slow things down.
The first thing to do is use Instruments to profile your code and see what can be sped up. Once you've done that you can look at some specific optimizations for the bottle necks.
One simple approach (if you can use GCD) is dispatch_apply(), which'll let you loop over your matches, automatically thread them in the best manner for your hardware, and doesn't return until all are complete.
Most straightforward solution would be to have all your threads 'performSelectorOnMainThread' to a particular method that decrements a counter, before they exit. And let the method remove the overlay screen when the counter it decremented reaches zero.
Simulating all the matches concurrently may not necessarily improve performance though.
You may get the solution for your specific question from #drowntoge, but generally, I want to give you advice about multithreading:
1/ It is not always speed up your program like Graham said. Your iPhone only has single core.
2/ If your program has some big IO, database or networking process that takes time, then you may consider multithreading because now data processing does not take all the time, it needs to wait for loading data. In this case, multithreading will significantly boost up your performance. But you still need to be careful because thread switching has overhead.
Maybe you only need 1 thread for IO processing and then has a cache layer to share the images/data. Then, you only need the main thread to loop and do simulation
3/ If you want 50 simulation seems to happen at the same time for user to watch, multithreading is also required:)
If you use threading, you won't know in what order the CPU is doing your tasks, and you could potentially be consuming a lot of thread scheduling resources. Better to use an NSOperationQueue and signal completion of each task using performSelectorOnMainThread. Decrementing a counter has already been mentioned, which may be useful for displaying a progress bar. But you could also maintain an array of 50 busy flags and clear them on completion, which might help debugging whether any particular task is slow or stuck if you mark completion with a time stamp.

NSOperationQueue dispatching threads slowly?

I'm writing my first multithreaded iPhone apps using NSOperationQueue because I've heard it's loads better and potentially faster than managing my own thread dispatching.
I'm calculating the outcome of a Game of Life board by splitting the board into seperate pieces and having seperate threads calculate each board and then splicing them back together, to me this seems like a faster way even with the tremendous overhead of splitting and splicing. I'm creating a NSInvocationOperation object for each board and then sending them to the OperationQueue. After I've sent all the pieces of the board I sit and wait for them all to finish calculating with the waitUntilAllOperationsAreFinished call to the OperationQueue.
This seems like it should work, and it does it works just fine BUT the threads get called out very slooooowwwlllyyyyyy and so it actually ends up taking longer for the multithreaded version to calculate than the single threaded version! OH NOES! I monitored the creation and termination of the NSOperations sent to the NSOperationQueue and found that some just sit in the Operation Queue do-diddly-daddlin for awhile before they get called much later on. At first I thought "Hey maybe the queue can only process so many threads at a time" and then bumped the Queues maxConcurrentOperationCount up to some arbitrary high number (well above the amount of board-pieces) but I experienced the same thing!
I was wondering if maybe someone can tell me how to kick NSOperationQueue into "overdrive" so to say so that it dispatches its queues as quickly as possible, either that or tell me whats going on!
Threads do not magically make your processor run faster.
On a single-processor machine, if your algorithm takes a million instructions to execute, splitting it up into 10 chunks of 100,000 instructions each and running it on 10 threads is still going to take just as long. Actually, it will take longer, because you've added the overhead of splitting, merging, and context switching among the threads.
The queue is still fundamentally limited by the processing power of the phone. If the phone can only run two processes simultaneously, you will get (at most) close to a two-fold increase in speed by splitting up the task. Anything more than that, and you are just adding overhead for no gain.
This is especially true if you are running a processor and memory intensive routine like your board calculation. NSOperationQueue makes sense if you have several operations that need to wait for extended periods of time. User interface loops and network downloads would be excellent examples. In those cases, other operations can complete while the inactive ones are waiting for input.
For something like your board, the operation for each piece of the grid never has any wait condition. It is always churning away at full speed until done.
See also: iPhone Maximum thread limit? and concurrency application design

when do we prefer Round robin over FCFS and vice-versa?

i know this depends on the design, but i was asked this question and no assumptions taken..
what should i answer and
First off, this sounds suspiciously like a homework question. If that's the case, I recommend researching on your own.
By FCFS I assume you mean "First Come First Serve", and if I recall that's a system where processes are executed to completion in the order that they are provided to the scheduler, yes?
If so the basic guideline would be this: Use round robin if it is desirable to allow long running processes to execute while not interfering with shorter ones, with the side effect that order of completion is not guaranteed. Round Robin can suffer if there are many processes in the system, since it will take longer for each process to complete since the round trip is longer.
If you do need a guaranteed order of completion, FCFS is a better choice but long running processes can stall the system. However, each process is given the full attention of the system and can complete in the fastest possible time, so that can be a benefit.
In the end it really does come down to not necessarily design but need: Do I need semi-synchronous execution or do I need in-order execution? Is it to my benefit for processes to take longer but compute in sync or will I be better off if everything executes as fast as possible? The needs of the system dictate the model to use.
Edit: Wikipedia has a pretty good breakdown of these and other simple scheduling methods here

How do Real Time Operating Systems work?

I mean how and why are realtime OSes able to meet deadlines without ever missing them? Or is this just a myth (that they do not miss deadlines)? How are they different from any regular OS and what prevents a regular OS from being an RTOS?
Meeting deadlines is a function of the application you write. The RTOS simply provides facilities that help you with meeting deadlines. You could also program on "bare metal" (w/o a RTOS) in a big main loop and meet you deadlines.
Also keep in mind that unlike a more general purpose OS, an RTOS has a very limited set of tasks and processes running.
Some of the facilities an RTOS provide:
Priority-based Scheduler
System Clock interrupt routine
Deterministic behavior
Priority-based Scheduler
Most RTOS have between 32 and 256 possible priorities for individual tasks/processes. The scheduler will run the task with the highest priority. When a running task gives up the CPU, the next highest priority task runs, and so on...
The highest priority task in the system will have the CPU until:
it runs to completion (i.e. it voluntarily give up the CPU)
a higher priority task is made ready, in which case the original task is pre-empted by the new (higher priority) task.
As a developer, it is your job to assign the task priorities such that your deadlines will be met.
System Clock Interrupt routines
The RTOS will typically provide some sort of system clock (anywhere from 500 uS to 100ms) that allows you to perform time-sensitive operations.
If you have a 1ms system clock, and you need to do a task every 50ms, there is usually an API that allows you to say "In 50ms, wake me up". At that point, the task would be sleeping until the RTOS wakes it up.
Note that just being woken up does not insure you will run exactly at that time. It depends on the priority. If a task with a higher priority is currently running, you could be delayed.
Deterministic Behavior
The RTOS goes to great length to ensure that whether you have 10 tasks, or 100 tasks, it does not take any longer to switch context, determine what the next highest priority task is, etc...
In general, the RTOS operation tries to be O(1).
One of the prime areas for deterministic behavior in an RTOS is the interrupt handling. When an interrupt line is signaled, the RTOS immediately switches to the correct Interrupt Service Routine and handles the interrupt without delay (regardless of the priority of any task currently running).
Note that most hardware-specific ISRs would be written by the developers on the project. The RTOS might already provide ISRs for serial ports, system clock, maybe networking hardware but anything specialized (pacemaker signals, actuators, etc...) would not be part of the RTOS.
This is a gross generalization and as with everything else, there is a large variety of RTOS implementations. Some RTOS do things differently, but the description above should be applicable to a large portion of existing RTOSes.
In RTOSes the most critical parameters which should be taken care of are lower latencies and time determinism. Which it pleasantly does by following certain policies and tricks.
Whereas in GPOSes, along with acceptable latencies the critical parameters is high throughput. you cannot count on GPOS for time determinism.
RTOSes have tasks which are much lighter than processes/threads in GPOS.
It is not that they are able to meet deadlines, it is rather that they have deadlines fixed whereas in a regular OS there is no such deadline.
In a regular OS the task scheduler is not really strict. That is the processor will execute so many instructions per second, but it may occasionally not do so. For example a task might be pre-empted to allow a higher priority one to execute (and may be for longer time). In RTOS the processor will always execute the same number of tasks.
Additionally there is usually a time limit for tasks to completed after which a failure is reported. This does not happen in regular OS.
Obviously there is lot more detail to explain, but the above are two of the important design aspects that are used in RTOS.
Your RTOS is designed in such a way that it can guarantee timings for important events, like hardware interrupt handling and waking up sleeping processes exactly when they need to be.
This exact timing allows the programmer to be sure that his (say) pacemaker is going to output a pulse exactly when it needs to, not a few tens of milliseconds later because the OS was busy with another inefficient task.
It's usually a much simpler OS than a fully-fledged Linux or Windows, simply because it's easier to analyse and predict the behaviour of simple code. There is nothing stopping a fully-fledged OS like Linux being used in a RTOS environment, and it has RTOS extensions. Because of the complexity of the code base it will not be able to guarantee its timings down to as small-a scale as a smaller OS.
The RTOS scheduler is also more strict than a general purpose scheduler. It's important to know the scheduler isn't going to change your task priority because you've been running a long time and don't have any interactive users. Most OS would reduce internal the priority of this type of process to favour short-term interactive programs where the interface should not be seen to lag.
You might find it helpful to read the source of a typical RTOS. There are several open-source examples out there, and the following yielded links in a little bit of quick searching:
FreeRTOS
eCos
A commercial RTOS that is well documented, available in source code form, and easy to work with is µC/OS-II. It has a very permissive license for educational use, and (a mildly out of date version of) its source can be had bound into a book describing its theory of operation using the actual implementation as example code. The book is MicroC OS II: The Real Time Kernel by Jean Labrosse.
I have used µC/OS-II in several projects over the years, and can recommend it.
"Basically, you have to code each "task" in the RTOS such that they will terminate in a finite time."
This is actually correct. The RTOS will have a system tick defined by the architecture, say 10 millisec., with all tasks (threads) both designed and measured to complete within specific times. For example in processing real time audio data, where the audio sample rate is 48kHz, there is a known amount of time (in milliseconds) at which the prebuffer will become empty for any downstream task which is processing the data. Therefore using the RTOS requires correct sizing of the buffers, estimating and measuring how long this takes, and measuring the latencies between all software layers in the system. Then the deadlines can be met. Otherwise the applications will miss the deadlines. This requires analysis of the worst-case data processing throughout the entire stack, and once the worst-case is known, the system can be designed for, say, 95% processing time with 5% idle time (this processing may not ever occur in any real usage, because worst-case data processing may not be an allowed state within all layers at any single moment in time).
Example timing diagrams for the design of a real time operating system network app are in this article at EE Times,
PRODUCT HOW-TO: Improving real-time voice quality in a VoIP-based telephony design
http://www.eetimes.com/design/embedded/4007619/PRODUCT-HOW-TO-Improving-real-time-voice-quality-in-a-VoIP-based-telephony-design
I haven't used an RTOS, but I think this is how they work.
There's a difference between "hard real time" and "soft real time". You can write real-time applications on a non-RTOS like Windows, but they're 'soft' real-time:
As an application, I might have a thread or timer which I ask the O/S to run 10 times per second ... and maybe the O/S will do that, most of the time, but there's no guarantee that it will always be able to ... this lack of guarantee is why it's called 'soft'. The reason why the O/S might not be able to is that a different thread might be keeping the system busy doing something else. As an application, I can boost my thread priority to for example HIGH_PRIORITY_CLASS, but even if I do this the O/S still has no API which I can use to request a guarantee that I'll be run at certain times.
A 'hard' real-time O/S does (I imagine) have APIs which let me request guaranteed execution slices. The reason why the RTOS can make such guarantees is that it's willing to abend threads which take more time than expected / than they're allowed.
What is important is realtime applications, not realtime OS. Usually realtime applications are predictable: many tests, inspections, WCET analysis, proofs, ... have been performed which show that deadlines are met in any specified situations.
It happens that RTOSes help doing this work (building the application and verifying its RT constraints). But I've seen realtime applications running on standard Linux, relying more on hardware horsepower than on OS design.
... well ...
A real-time operating system tries to be deterministic and meet deadlines, but it all depends on the way you write your application. You can make a RTOS very non real-time if you don't know how to write "proper" code.
Even if you know how to write proper code:
It's more about trying to be deterministic than being fast.
When we talk about determinism it's
1) event determinism
For each set of inputs the next states and outputs of a system are known
2) temporal determinism
… also the response time for each set of outputs is known
This means that if you have asynchronous events like interrupts your system is strictly speaking not anymore temporal deterministic. (and most systems use interrupts)
If you really want to be deterministic poll everything.
... but maybe it's not necessary to be 100% deterministic
The textbook/interview answer is "deterministic pre-emption". The system is guaranteed to transfer control within a bounded period of time if a higher priority process is ready to run (in the ready queue) or an interrupt is asserted (typically input external to the CPU/MCU).
They actually don't guarantee meeting deadlines; what they do that makes them truly RTOS is to provide the means to recognize and deal with deadline overruns. 'Hard' RT systems generally are those where missing a deadline is disastrous and some kind of shutdown is required, whereas a 'soft' RT system is one where continuing with degraded functionality makes sense. Either way an RTOS permits you to define responses to such overruns. Non RT OS's don't even detect overruns.
Basically, you have to code each "task" in the RTOS such that they will terminate in a finite time.
Additionally your kernel would allocate specific amounts of time to each task, in an attempt to guarantee that certain things happened at certain times.
Note that this is not an easy task to do however. Imagine things like virtual function calls, in OO it's very difficult to determine these things. Also an RTOS must be carefully coded with regard to priority, it may require that a high priority task is given the CPU within x milliseconds, which may be difficult to do depending on how your scheduler works.