I use a native C library in my iOS app. Calling the code directly runs it on the main thread (depending, of course, on where I call it from), so I have created a new queue to move the call off the main thread.
let cProcessingQueue = DispatchQueue(label: "cProcessingQueue")
cProcessingQueue.async {
// call that launches the C operation
}
This keeps the UI responsive. However, on larger data sets it can get the process killed, producing crash reports with statements like the following:
Action taken: Process killed
CPU: 48 seconds cpu time over 54 seconds (88% cpu average), exceeding limit of 80% cpu over 60 seconds
Is there something I can apply to get some kind of hard(ish) limit, along the lines of "qos: .background will always cap usage at 50%", in order to slow down the C library's execution in the first place? I'm fine with using NSOperation or whatever else there is, but that would probably be the most convenient solution.
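For reference, a QoS class can be attached when the queue is created, though as far as I know it only lowers scheduling priority rather than enforcing a hard percentage cap, so it may merely soften the problem:
// Background QoS deprioritizes the work; it does not guarantee a CPU ceiling.
let cProcessingQueue = DispatchQueue(label: "cProcessingQueue", qos: .background)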
A few other ideas I can think of are:
Feeding smaller chunks of data with some delay in between; however, that introduces arbitrary wait times that go unused (see the sketch after this list)
Watching the CPU time as shown in Get CPU usage IOS Swift and modifying the C library to support halting executions
but all in all, the "thread-related limit" would definitely be the easiest solution out there.
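For what it's worth, here is a minimal sketch of the chunking idea, assuming the C library can be driven piece by piece through a hypothetical processChunk wrapper:
import Foundation

let cProcessingQueue = DispatchQueue(label: "cProcessingQueue", qos: .utility)

// Feeds chunks to the C library one at a time with a pause in between.
// processChunk is a hypothetical wrapper around the C call.
func processNext(_ chunks: [Data], index: Int = 0) {
    guard index < chunks.count else { return }
    cProcessingQueue.async {
        processChunk(chunks[index])
        // The fixed pause is exactly the arbitrary, unused wait time
        // mentioned in the list above.
        cProcessingQueue.asyncAfter(deadline: .now() + 0.2) {
            processNext(chunks, index: index + 1)
        }
    }
}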
Related
Do the sleep family of functions (sleep(), nanosleep()) cause timer interrupts once they complete (i.e., are done sleeping)? If not, how does the OS know exactly when they are done? If so, I understand timers have a high interrupt priority. Does this mean that a program using sleep(), once awoken, will likely cause another program running on one of the CPUs (in a multi-processor) to be removed in favor of the recently awoken program?
Does the sleep() function cause a timer interrupt upon completion?
Maybe.
For keeping track of time delays, there are two common ways it could be implemented:
a) A timer IRQ occurs at a fixed frequency (e.g. maybe every 1 millisecond). When the IRQ occurs the OS checks if any time delays expired and deals with them. In this case there's a compromise between precision and overhead (to get better precision you need to increase the "IRQs per second" which increases the overhead of dealing with all the IRQs).
b) The OS re-configures the timer to generate an IRQ when the soonest delay should expire whenever necessary (when the soonest delay is cancelled, a sooner delay is created, or the soonest delay expires). This has no "precision vs. overhead" compromise, but has more overhead for re-configuring the timer hardware. This is typically called "tickless" (as there's no regular/fixed frequency "tick").
Note that modern 80x86 systems have a local APIC timer per CPU that supports "IRQ on TSC deadline". For "tickless", this means you can normally get better than 1 nanosecond precision without much need for locks (using "per CPU" structures to keep track of time delays); and the cost of re-configuring the timer is very small (as the timer hardware is built directly into the CPU itself).
For "tickless" (which is likely much better for modern systems) you would end up with a timer IRQ when "sleep()" expires most of the time (unless some other delay expires at the same/similar time).
Does this mean a program using sleep() once awoken will likely cause another program running on one of the CPUs (in a multi-processor) to be removed in favor of the recently awoken program?
Whether a recently unblocked task preempts immediately depends on:
a) The scheduler design. For some schedulers (e.g. naive "round robin") it may never happen immediately.
b) The priorities of the unblocked task and the currently running task/s.
c) Optimizations. Task switches cost overhead so attempts to minimize the number of task switches (e.g. postponing/skipping a task switch if some other task switch is likely to happen soon anyway) are practical. There's also complexity involving load balancing, power management, cache efficiency, memory (NUMA, etc) and other things that may be considered.
The Linux man pages notes:
Portability notes
On some systems, sleep() may be implemented using alarm(2) and SIGALRM (POSIX.1 permits this); mixing calls to alarm(2) and sleep() is a bad idea.
To be honest, I don't know whether there is a solution to my question, but I'd like to catch, in Swift, when a context switch is happening.
I was imagining a func which takes a long time to complete, such as a write operation to a remote server, and I was wondering if there might be a way to tell when (at which line, at least) the thread executing that task performs a context switch because another task that has been waiting for a long time has to be executed.
Sorry if this seems a stupid question to you, or if I made mistakes while trying to explain the above.
EDIT:
I'm talking about context switches that are requested automatically by the scheduler. So imagine again that we are in the middle of this long function, which does tons of operations, and the scheduler gave this task a certain amount of time, for example 10 seconds, to complete. If the task runs out of time and doesn't finish, it will be suspended and the thread will, for example, execute another task of another process. When that ends, the scheduler might decide to give the suspended job another try, and execution will resume where it was suspended (so it will read the value from the PC register and keep going).
You can absolutely get what you've asked for, but I have a feeling it's going to be much less useful to you than you may believe. The vast (vast, vast!) majority of context switches are going to occur at points that you probably would think of as "uninteresting." lstat64(), mach_vm_map_trap(), mach_msg_trap(), mach_port_insert_member_trap(), kevent_id(), the list goes on. Most threads spend most of their time deep in the OS stack. "A write operation on a remote server" isn't going to block after it takes some long period of time. It's going to proactively block itself because it knows it's going to take a (mind-bogglingly) long period of time.
Even so, you can certainly explore this with Instruments. Just choose the System Trace profile and it'll show you all the threads, the system cores, how your threads and all the other threads on the device are getting scheduled, every system call, and so on. It's a huge amount of information, so you usually only profile for a few seconds at a time.
This is useful information if you're at the point where context switches are a major bottleneck. This might happen if you're dealing with excessive lock contention, or if you're thrashing your L1 cache because you keep getting interrupted by some other thread. So if you have some thread that you expect to stay running pretty continuously, and it's getting blocked, this is really valuable information. Or if you have two threads that you think should work back and forth smoothly, but they seem to be fighting (switching rapidly), then that's something you could work on. (But this is rarely one of the first places you'd look for performance tuning unless you're working on quite low-level code.)
From your description, I think you may have the wrong idea about the scheduler. Nothing in the scheduler is going to be on the order of 10 seconds. In the scheduler world, milliseconds are a lot of time. You should be thinking about things that take microseconds and even nanoseconds. If you're working on code that assumes fetching data from RAM is free, then you're on the wrong time scale. A network call is so ludicrously slow that you can basically estimate it as "forever." At the context-switch level you're looking at events like:
00:00.770.247 Virtual memory Zero Fill took 10.50 µs small_free_list_add_ptr ← (31 other frames)
I think it's a kind of cool question, but not entirely clear, so this answer is based on my understanding of it. Normally, if you write a C or C++ program around context switching, you write code with mutexes or semaphores. In those sections, processes or threads work in a critical section, and sometimes the context switch is executed manually or via an interrupt. iOS has the same concurrency primitives, such as DispatchSemaphore (a mutex is essentially a semaphore that works with a lock system). You can read the documentation here.
To start, this is the DispatchSemaphore class declaration in Swift:
class DispatchSemaphore : DispatchObject
You can initialize it with an integer value, such as:
let semaphore = DispatchSemaphore(value: intValue)
And if you want the mutex variant, you can simply use a lock variable, such as NSLock:
let lock = NSLock()
When you lock with a correct implementation, you are inside the critical section, and you leave it by unlocking.
It obviously looks similar to the POSIX pthread_mutex_t.
You can handle the context switching within the lock or semaphore critical section like this:
lock.lock()
// critical section
// do the calculation; if it runs long, unlock early, otherwise let it finish
lock.unlock()
semaphore.wait(timeout: DispatchTime.distantFuture)
// critical section
// do the calculation; if it runs too long, signal here to exit early,
// otherwise let it finish (a normal job signals on completion anyway)
semaphore.signal()
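Tying the pieces above together, a self-contained sketch (the closure body stands in for any long-running calculation):
import Foundation

let semaphore = DispatchSemaphore(value: 1)   // value 1 makes it behave like a mutex

DispatchQueue.global().async {
    semaphore.wait()                          // enter the critical section
    defer { semaphore.signal() }              // always leave it on exit
    // ... long-running calculation ...
}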
The answer above already covers context switching between threads.
In addition: when you ask about context switching, its natural meaning is switching between threads or processes. So when you fetch data on a background thread and then apply it to a UITableView, you will probably call reloadData inside DispatchQueue.main.async. That is a common usage of such a switch.
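For illustration, a minimal sketch of that pattern inside a view controller; Item, fetchItems(), and the tableView outlet are hypothetical stand-ins:
import UIKit

final class ItemsViewController: UITableViewController {
    var items: [Item] = []                    // Item and fetchItems() are hypothetical

    func reload() {
        DispatchQueue.global(qos: .userInitiated).async {
            let items = fetchItems()          // long-running work off the main thread
            DispatchQueue.main.async {
                self.items = items            // UI state is only touched on the main thread
                self.tableView.reloadData()
            }
        }
    }
}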
If I am running a set of processes that ask for burst times of 3, 5, and 2 respectively, the total expected execution time is 10 time units.
Is it possible for one of the processes to take up more than what it asked for? For example, even though it asked for 3, it took 11 instead because it was waiting on the user to enter some input. So the total execution time turns out to be 18.
This was all done in a non-preemptive cpu scheduler.
The reality is that software has no idea how long anything will take. My CPU runs at a different "nominal speed" to your CPU, both our CPUs keep changing their speed for power management reasons, and the speed of software executed by both our CPUs is affected by things like what other CPUs are doing (especially for SMT/hyper-threading) and what other devices happen to be doing at the time (their effect on caches, shared RAM bandwidth, etc). Software also can't predict the future (e.g. guess when an IRQ will occur, take some time, and upset the cache contents; guess when a read from memory will take 10 times longer because there was a single-bit error that ECC needed to correct; guess when the CPU will get hot and reduce its speed to avoid melting; etc). It is possible to record things like "start time, burst time and end time" as they happen (to generate historical data from the past that can be analysed), but typically these things are only seen in fabricated academic exercises that have nothing to do with reality.
Note: I'm not saying fabricated academic exercises are bad - it's a useful tool to help learn basic theory before moving on to more advanced (and more realistic) theory.
Instead, for a non-preemptive scheduler, tasks don't try to tell the scheduler how much time they think they might take. The task can't know this information, and the scheduler can't do anything with it (e.g. a non-preemptive scheduler can't preempt the task when it takes longer than it guessed it might). For a non-preemptive scheduler, a task simply runs until it calls a kernel function that waits for something (e.g. read() waiting for data from disk or network, sleep() waiting for time to pass, etc). When that happens, the kernel function that was called ends up telling the scheduler that the task is waiting and doesn't need the CPU, and the scheduler finds a different task that can use the CPU. If the task never calls a kernel function that waits for something, then the task runs "forever".
Of course "the task runs forever" can be bad (not just for malicious code that deliberately hogs all CPU time as a denial of service attack, but also for normal tasks that have bugs), which is why (almost?) nobody uses non-preemptive schedulers. For example; if one (lower priority) task is doing a lot of heavy processing (e.g. spending hours generating a photo-realistic picture using ray tracing techniques) and another (higher priority) task stops waiting (e.g. because it was waiting for the user to press a key and the user did press a key) then you want the higher priority task to preempt the lower priority task "immediately" (e.g. because most users don't like it when it takes hours for software to respond to their actions).
I'm working on a game sim and want to speed up the match simulation bit. On a given date there may be 50+ matches that need simulating. Currently I loop through each and tell them to simulate themselves, but this can take forever. I was hoping to:
1) Overlay a 'busy' screen
2) Launch a thread for each
3) When the last thread exits, remove the overlay.
Now I can do 1 & 2, but I cannot figure out how to tell when the last thread is finished, because the last thread I detach may not be the last thread to finish. What's the best way to do that?
Also, threads are usually used so that work can be done in the background while the user does other stuff; I'm using them slightly differently. My app is a Core Data app and I want to avoid the user touching the store in other ways while I'm simulating the matches. So I want single threading most of the time, but multithreading for this situation because of how long the sim engine takes. If someone has other ideas for this approach, I'm open.
Rob
Likely you want to use NSOperation and NOT 50 threads. 50 threads is not healthy on an iPhone, and NSOperations are easier to use, to boot. It may be that you are killing performance by trying to run 50 at once (that would be my guess). NSOperation is designed for exactly this problem. Plus it's easy to code.
My only problem with NSOperation is that they don't have a standard way to tell the caller that they are done.
You could periodically poll the NSOperationQueue; when its operation count is 0, there are none left. You could also make each operation increment some counter; when the count is 50, you are done. Or each operation could post a notification on the main thread, via performSelectorOnMainThread, that it's done.
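As a rough sketch of the counter idea in modern Swift terms (matches, simulate(), and removeBusyOverlay() are hypothetical stand-ins):
import Foundation

let simulationQueue = OperationQueue()
var remaining = matches.count                 // only ever touched on the main queue

for match in matches {
    simulationQueue.addOperation {
        match.simulate()                      // the long-running work
        OperationQueue.main.addOperation {
            remaining -= 1                    // decremented on the main thread only
            if remaining == 0 {
                removeBusyOverlay()           // the last one out removes the overlay
            }
        }
    }
}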
You should see a boost in performance with even a single core - there are lots of times that the main thread is blocked waiting for user input/graphics drawing/etc. Plus multicore phones and iPads will likely be out within a year (total guess - but they are coming).
Also make sure you look at the operation with Instruments. It may be that you can speed the calculations up by a factor of 2 or even 10x!
You're on a single core, so threading probably won't help much, and the overhead may even slow things down.
The first thing to do is use Instruments to profile your code and see what can be sped up. Once you've done that you can look at some specific optimizations for the bottle necks.
One simple approach (if you can use GCD) is dispatch_apply(), which lets you loop over your matches, automatically threads them in the best manner for your hardware, and doesn't return until all are complete.
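For illustration, the Swift spelling of dispatch_apply is DispatchQueue.concurrentPerform (matches and simulate() are again hypothetical):
import Dispatch

// concurrentPerform does not return until every iteration has finished,
// so whatever follows it can safely assume all matches are simulated.
DispatchQueue.concurrentPerform(iterations: matches.count) { index in
    matches[index].simulate()                 // iterations run in parallel across cores
}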
The most straightforward solution would be to have all your threads performSelectorOnMainThread to a particular method that decrements a counter before they exit, and let that method remove the overlay screen when the counter it decrements reaches zero.
Simulating all the matches concurrently may not necessarily improve performance though.
You may get the solution for your specific question from #drowntoge, but generally, I want to give you advice about multithreading:
1/ It does not always speed up your program, as Graham said. Your iPhone only has a single core.
2/ If your program has some big IO, database, or networking work that takes time, then you may consider multithreading, because the data processing no longer takes all the time; it has to wait for data to load. In this case, multithreading will significantly boost your performance. But you still need to be careful, because thread switching has overhead.
Maybe you only need 1 thread for IO processing, plus a cache layer to share the images/data. Then you only need the main thread to loop and run the simulation.
3/ If you want all 50 simulations to appear to happen at the same time for the user to watch, multithreading is also required :)
If you use threading, you won't know in what order the CPU does your tasks, and you could potentially consume a lot of thread-scheduling resources. Better to use an NSOperationQueue and signal completion of each task using performSelectorOnMainThread. Decrementing a counter has already been mentioned, which may be useful for displaying a progress bar. But you could also maintain an array of 50 busy flags and clear them on completion; if you mark completion with a timestamp, this might help you debug whether any particular task is slow or stuck.
I'm writing my first multithreaded iPhone app using NSOperationQueue because I've heard it's loads better and potentially faster than managing my own thread dispatching.
I'm calculating the outcome of a Game of Life board by splitting the board into separate pieces, having separate threads calculate each piece, and then splicing them back together; even with the tremendous overhead of splitting and splicing, this seems like a faster way to me. I'm creating an NSInvocationOperation object for each piece and then sending them to the operation queue. After I've sent all the pieces of the board, I sit and wait for them all to finish calculating with the waitUntilAllOperationsAreFinished call on the operation queue.
This seems like it should work, and it does; it works just fine. BUT the threads get called out very slooooowwwlllyyyyyy, so the multithreaded version actually ends up taking longer to calculate than the single-threaded version! OH NOES! I monitored the creation and termination of the NSOperations sent to the NSOperationQueue and found that some just sit in the operation queue, do-diddly-daddlin', for a while before they get called much later on. At first I thought "Hey, maybe the queue can only process so many threads at a time," and bumped the queue's maxConcurrentOperationCount up to some arbitrarily high number (well above the number of board pieces), but I experienced the same thing!
I was wondering if someone can tell me how to kick NSOperationQueue into "overdrive", so to speak, so that it dispatches its operations as quickly as possible; either that, or tell me what's going on!
Threads do not magically make your processor run faster.
On a single-processor machine, if your algorithm takes a million instructions to execute, splitting it up into 10 chunks of 100,000 instructions each and running it on 10 threads is still going to take just as long. Actually, it will take longer, because you've added the overhead of splitting, merging, and context switching among the threads.
The queue is still fundamentally limited by the processing power of the phone. If the phone can only run two processes simultaneously, you will get (at most) close to a two-fold increase in speed by splitting up the task. Anything more than that, and you are just adding overhead for no gain.
This is especially true if you are running a processor and memory intensive routine like your board calculation. NSOperationQueue makes sense if you have several operations that need to wait for extended periods of time. User interface loops and network downloads would be excellent examples. In those cases, other operations can complete while the inactive ones are waiting for input.
For something like your board, the operation for each piece of the grid never has any wait condition. It is always churning away at full speed until done.
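As a sketch of that takeaway (in Swift for brevity; boardPieces and advanceGeneration() are hypothetical), cap the queue at the hardware's core count instead of raising maxConcurrentOperationCount arbitrarily high:
import Foundation

let queue = OperationQueue()
// More operations in flight than cores just adds switching overhead.
queue.maxConcurrentOperationCount = ProcessInfo.processInfo.activeProcessorCount

for piece in boardPieces {
    queue.addOperation { piece.advanceGeneration() }
}
queue.waitUntilAllOperationsAreFinished()     // blocks the caller, as in the question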
See also: iPhone Maximum thread limit? and concurrency application design