Figure out when a context switch is happening in Swift

To be honest, I don't know whether there is a solution to my question, but I'd like to detect, in Swift, when a context switch is happening.
I was imagining a function that takes a long time to complete, such as a write operation on a remote server, and I was wondering whether there is a way to find out when (or at least at which line) the thread executing that task performs a context switch because another task, which has been waiting for a long time, has to be executed.
Sorry if this seems like a stupid question, or if I made mistakes while trying to explain it.
EDIT:
I'm talking about context switches that happen automatically, requested by the scheduler. So imagine again that we are in the middle of this long function, which does tons of operations, and the scheduler gave this task some amount of time, for example 10 seconds, to complete it. If the process runs out of time and doesn't finish the task, it will be suspended, and the thread might, for example, execute a task from another process. When that ends, the scheduler might give the suspended job another try, and execution will resume where it was suspended (it reads the value from the PC register and keeps going).

You can absolutely get what you've asked for, but I have a feeling it's going to be much less useful to you than you may believe. The vast (vast, vast!) majority of context switches are going to occur at points that you probably would think of as "uninteresting." lstat64(), mach_vm_map_trap(), mach_msg_trap(), mach_port_insert_member_trap(), kevent_id(), the list goes on. Most threads spend most of their time deep in the OS stack. "A write operation on a remote server" isn't going to block after it takes some long period of time. It's going to proactively block itself because it knows it's going to take a (mind-bogglingly) long period of time.
Even so, you can certainly explore this with Instruments. Just choose the System Trace template and it'll show you all the threads and the system cores, how your threads and all the other threads on the device are getting scheduled, every system call, and so on. It's a huge amount of information, so you usually only profile for a few seconds at a time.
This is useful information if you're at the point where context switches are a major bottleneck. This might happen if you're dealing with excessive lock contention, or if you're thrashing your L1 cache because you keep getting interrupted by some other thread. So if you have some thread that you expect to stay running pretty continuously, and it's getting blocked, this is really valuable information. Or if you have two threads that you think should work back and forth smoothly, but they seem to be fighting (switching rapidly), then that's something you could work on. (But this is rarely one of the first places you'd look for performance tuning unless you're working on quite low-level code.)
From your description, I think you may have the wrong idea about the scheduler. Nothing in the scheduler is going to be on the order of 10 seconds. In the scheduler world, milliseconds are a lot of time. You should be thinking about things that take microseconds and even nanoseconds. If you're working on code that assumes fetching data from RAM is free, then you're on the wrong time scale. A network call is so ludicrously slow that you can basically estimate it as "forever." At the context-switch level you're looking at events like:
00:00.770.247 Virtual memory Zero Fill took 10.50 µs small_free_list_add_ptr ← (31 other frames)
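Instruments shows the individual events. If a rough count is enough, a process can also ask the kernel how many times it has been switched out, via getrusage(2): ru_nvcsw counts voluntary switches (the thread blocked, for example on I/O or a sleep) and ru_nivcsw counts involuntary ones (the scheduler preempted it). This is a sketch under stated assumptions, not part of the answer above: it reports process-wide totals rather than the line of code where a switch happened, `ContextSwitchCounts` and `currentContextSwitchCounts()` are hypothetical names, the glibc branch assumes RUSAGE_SELF is 0 there, and how completely each platform fills in these fields is worth verifying before relying on the numbers.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Cumulative context-switch counts for the whole process, from getrusage(2).
/// These are totals since the process started, not individual events.
struct ContextSwitchCounts {
    let voluntary: Int    // ru_nvcsw: the process gave up the CPU (I/O, sleep, a lock...)
    let involuntary: Int  // ru_nivcsw: the scheduler preempted the process
}

/// Returns nil if getrusage fails.
func currentContextSwitchCounts() -> ContextSwitchCounts? {
    var usage = rusage()
    #if canImport(Darwin)
    let rc = getrusage(RUSAGE_SELF, &usage)
    #else
    // Assumption: RUSAGE_SELF is 0 in glibc, whose parameter type is __rusage_who_t.
    let rc = getrusage(__rusage_who_t(0), &usage)
    #endif
    guard rc == 0 else { return nil }
    return ContextSwitchCounts(voluntary: Int(usage.ru_nvcsw),
                               involuntary: Int(usage.ru_nivcsw))
}

// Usage: compare the counters around a region of code.
if let before = currentContextSwitchCounts() {
    usleep(20_000)  // sleeping blocks the thread, which is typically a voluntary switch
    if let after = currentContextSwitchCounts() {
        print("voluntary +\(after.voluntary - before.voluntary),",
              "involuntary +\(after.involuntary - before.involuntary)")
    }
}
```

As the answer suggests, the numbers mostly confirm that a thread blocks whenever it waits; they will not point at an interesting line in your own code.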

I think it's kind of a cool question, but it's not clear, so this answer is based on my understanding of it. Normally, if you write a C or C++ program that deals with context switching, you write code with mutexes or semaphores. In those sections, processes or threads work in a critical section, and sometimes a context switch happens there, either manually or through an interrupt. iOS has equivalent concurrency primitives, such as DispatchSemaphore. (A mutex is essentially a semaphore used as a lock, except that a mutex is owned by the thread that locked it.) You can read the documentation here.
To start, this is the declaration of the semaphore class in Swift:
class DispatchSemaphore : DispatchObject
You can initialize it with an Int value, such as:
let semaphore = DispatchSemaphore(value: intValue)
And if you want the mutex variant, you can use a lock, such as:
let lock = NSLock()
Once you acquire the lock, you are in the critical section, and you leave it by calling unlock().
It is similar to the POSIX pthread_mutex_t.
You can handle the work inside the lock's or the semaphore's critical section like this:
lock.lock()
// critical section
// do the calculation; if it takes too long, unlock, or let it do its job
lock.unlock()
_ = semaphore.wait(timeout: .distantFuture)
// critical section
// do the calculation; if it takes too long, signal here to leave, or let it do its job
// in the normal case, the signal() below is what leaves the critical section
semaphore.signal()
This answer covers context switching between threads.
In addition to my answer: when you ask about context switching, it basically means switching between threads or processes. So when you get data on a background thread and then use it for a UITableView, you will probably call the reloadData method inside DispatchQueue.main.async. That's a common use of context switching.
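A runnable version of the lock and semaphore pattern from this answer, as a minimal sketch: `LockedCounter` is a hypothetical helper built on Foundation's NSLock, and DispatchQueue.concurrentPerform is used only to create contention. Note that the lock decides which thread may enter the critical section; the scheduler still decides when the actual switches happen.

```swift
import Foundation
import Dispatch

/// A counter whose increments form a critical section guarded by NSLock
/// (Foundation's mutex). Hypothetical helper for illustration.
final class LockedCounter {
    private let lock = NSLock()
    private var value = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }  // always leave the critical section
        value += 1
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

let counter = LockedCounter()

// Many iterations run concurrently; the lock ensures no increment is lost.
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    counter.increment()
}

// A semaphore created with value 0: wait() blocks until another thread calls signal().
let finished = DispatchSemaphore(value: 0)
DispatchQueue.global().async {
    for _ in 0..<10 { counter.increment() }  // stand-in for a long calculation
    finished.signal()
}
finished.wait()
print(counter.current)  // 1010
```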

Related

Serial OperationQueue with Operations synchronizing timer and sleep

I have a serial OperationQueue whose operations call usleep. I do this because the operation execution block synchronizes with a Timer that needs to repeat until a designated time.
For example, 3 operations are added to a queue with maxConcurrentOperationCount set to 1. Each operation has a timer that repeats until 10 seconds into the future. Upon firing this timer of the first operation, the next line of code is usleep(10seconds). 10 seconds later the timer completes and the thread wakes up. The next operation begins. This is done by design and works; however, I'm concerned about the implications of a sleeping thread. Is it possible that the thread was handling some other code, context switches to handle the operation, then sleeps for a long time, pausing other executions? Does Swift know to let the thread execute other blocks while the operation sleeps?
Does swift know to let the thread execute other blocks while the operation sleeps?
Maybe it’s just a wording issue, but the thread is blocked until the sleep (and subsequent tasks) finish, so it’s not going to be made available to do anything else. But, while the thread is sleeping, the core can switch contexts to let some other thread run even though the thread running the operation is tied up.
So using usleep (or I might use Thread.sleep(forTimeInterval: 10)) avoids the problem of blocking the core, but it still blocks the thread. And threads are rather limited (e.g. 64 at this point). So, especially if you might have a lot of these operations going on at any given time, thereby risking exhausting the limited threads, I might advise avoid blocking the thread, too. (Then again, if you’re using a maxConcurrentOperationCount of 1, as long as you aren’t doing other things that might tie up threads, it probably wouldn’t be too serious of a problem.)
For example, I might define an asynchronous Operation subclass, and rather than sleeping, I might just asyncAfter (or use a timer) the finishing of the operation to 10 seconds in the future. That way no thread is blocked, either. Or I might consider other patterns to solve the broader problem. It’s hard to say without knowing the broader problem that you’re trying to solve.
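A minimal sketch of that asynchronous Operation idea, assuming the usual KVO-based state handling for asynchronous operations (isAsynchronous, isExecuting, isFinished). `DelayedFinishOperation` is a hypothetical name, and the 10-second wait from the question is shortened to 0.05 seconds so the demo runs quickly.

```swift
import Foundation
import Dispatch

/// Runs `work`, then reports itself finished `delay` seconds later via asyncAfter,
/// so no thread sleeps while waiting. Hypothetical class, for illustration.
final class DelayedFinishOperation: Operation {
    private let stateLock = NSLock()
    private var _executing = false
    private var _finished = false
    private let delay: TimeInterval
    private let work: () -> Void

    init(delay: TimeInterval, work: @escaping () -> Void) {
        self.delay = delay
        self.work = work
        super.init()
    }

    override var isAsynchronous: Bool { return true }

    override var isExecuting: Bool {
        stateLock.lock(); defer { stateLock.unlock() }
        return _executing
    }

    override var isFinished: Bool {
        stateLock.lock(); defer { stateLock.unlock() }
        return _finished
    }

    override func start() {
        if isCancelled {
            finish()
            return
        }
        willChangeValue(forKey: "isExecuting")
        stateLock.lock(); _executing = true; stateLock.unlock()
        didChangeValue(forKey: "isExecuting")

        work()
        // Instead of usleep/Thread.sleep, schedule the finish for later.
        DispatchQueue.global().asyncAfter(deadline: .now() + delay) {
            self.finish()
        }
    }

    private func finish() {
        // The queue watches these notifications to know it may start the next operation.
        willChangeValue(forKey: "isExecuting")
        willChangeValue(forKey: "isFinished")
        stateLock.lock()
        _executing = false
        _finished = true
        stateLock.unlock()
        didChangeValue(forKey: "isFinished")
        didChangeValue(forKey: "isExecuting")
    }
}

let queue = OperationQueue()
queue.maxConcurrentOperationCount = 1  // serial, as in the question
let logLock = NSLock()
var startedOrder: [Int] = []

let began = Date()
for i in 0..<3 {
    queue.addOperation(DelayedFinishOperation(delay: 0.05) {
        logLock.lock(); startedOrder.append(i); logLock.unlock()
    })
}
queue.waitUntilAllOperationsAreFinished()  // demo only; don't block the main thread in an app
let elapsed = Date().timeIntervalSince(began)
print(startedOrder)  // [0, 1, 2]
```

Because the finish is scheduled with asyncAfter, nothing sleeps between operations; the serial queue simply doesn't start the next operation until the previous one reports isFinished.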

Why is response time important in CPU scheduling?

I'm looking for an example of a job for which response time is important.
One definition of response time is:
The time taken in an interactive program from the issuance of a command to the commencement of a response to that command.
I've read that response time is important for interactivity, but I can't understand why. If the job isn't fully completed, what output could be produced that would be of interest to a user?
Wouldn't the user only care about how soon a job finishes, as that's the first time any output is produced?
For example, consider these two possible schedulings of two jobs:
Case 1: |---B---|---A---|
Case 2: |-A-|---B---|-A-|
Suppose that job A and B are issued at the same time, A being a command typed in by the user and B being some background process.
The response time for job A, as I understand it, would be shorter in case 2. As job A finishes (and produces output) at the same time in both cases, I don't understand how the user benefits from (or even notices) the better response time in case 2.
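To make the distinction concrete, here is a small sketch that computes, for each job, when it first gets the CPU (its response time, since both jobs are issued at t = 0) and when its last slice ends. The diagrams give no lengths, so A and B are assumed to need 4 time units each, switching cost is ignored, and `Slice` and `responseAndCompletion` are hypothetical names.

```swift
/// One contiguous run of a job on the CPU, in abstract time units.
struct Slice {
    let job: String
    let length: Int
}

/// For each job: when it first runs (response) and when its last slice ends (completion).
func responseAndCompletion(_ schedule: [Slice]) -> [String: (response: Int, completion: Int)] {
    var result: [String: (response: Int, completion: Int)] = [:]
    var clock = 0
    for slice in schedule {
        let firstStart = result[slice.job]?.response ?? clock  // keep the earliest start
        clock += slice.length
        result[slice.job] = (response: firstStart, completion: clock)
    }
    return result
}

// Case 1: |---B---|---A---|    Case 2: |-A-|---B---|-A-|
let case1 = responseAndCompletion([Slice(job: "B", length: 4), Slice(job: "A", length: 4)])
let case2 = responseAndCompletion([Slice(job: "A", length: 2), Slice(job: "B", length: 4),
                                   Slice(job: "A", length: 2)])
print(case1["A"]!)  // (response: 4, completion: 8)
print(case2["A"]!)  // (response: 0, completion: 8)
```

A completes at t = 8 either way; only the time until it first runs changes.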
When writing an operating system, one has to take into consideration what will the intended audience be. In some cases it matters most to finish jobs as quickly as possible (supercomputer systems), in some cases it matters most to be as responsive as possible (regular desktop systems), and in some cases it matters most to be as predictable as possible (real-time systems).
For finishing jobs as fast as possible, tasks should be interrupted as rarely as possible (so long intervals between task switches are the best option). Here response time doesn't really matter much. Note that task switches usually take some time (typically thousands of CPU cycles), because the state of the old task (including registers and paging structures) has to be saved to memory and the state of the new task (including registers and paging structures) restored from memory. Switching also causes cache and TLB misses, since the cached information usually doesn't belong to the newly running process.
For being the most responsive possible, tasks should be interrupted as often as possible so the user doesn't experience the so-called lag. This is where response time is important. Note however that on interrupt-driven architectures (like x86) an interrupt from the keyboard or the mouse would automatically pause execution of the current task and call the interrupt handler, which processes the input and sends it to the appropriate program.
For being as predictable as possible, input should be processed neither too fast nor too slow. This means that response time is constrained in both directions, which makes it even more important than in "most responsive possible" designs. A missed timing prediction can even be a fatal failure in mission-critical systems.
In a nutshell, the importance of response time varies from design to design and can range from nearly unimportant to critical.
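The trade-off between the first two designs can be sketched with a toy round-robin scheduler. Everything here is an illustrative assumption rather than how any real OS schedules: two jobs of 8 time units each arrive at t = 0, every switch between different jobs costs 1 unit, and `roundRobin` is a hypothetical name. A shorter quantum lets the second job start sooner but, because of the switch cost, delays when both jobs finish.

```swift
/// Result of a toy round-robin simulation; times are in abstract units.
struct RoundRobinResult {
    let responseTimes: [Int]    // when each job first got the CPU
    let completionTimes: [Int]  // when each job finished
    let switches: Int           // switches between different jobs
}

/// Round-robin over jobs that all arrive at t = 0. `switchCost` is charged
/// whenever the CPU moves from one job to a different one.
func roundRobin(bursts: [Int], quantum: Int, switchCost: Int) -> RoundRobinResult {
    precondition(quantum > 0, "quantum must be positive")
    var remaining = bursts
    var response = [Int?](repeating: nil, count: bursts.count)
    var completion = [Int](repeating: 0, count: bursts.count)
    var readyQueue = Array(bursts.indices)
    var clock = 0
    var switches = 0
    var previous: Int? = nil

    while !readyQueue.isEmpty {
        let job = readyQueue.removeFirst()
        if let p = previous, p != job {
            clock += switchCost  // saving/restoring state, cold caches, etc.
            switches += 1
        }
        if response[job] == nil { response[job] = clock }
        let slice = min(quantum, remaining[job])
        clock += slice
        remaining[job] -= slice
        if remaining[job] > 0 {
            readyQueue.append(job)  // preempted: back of the queue
        } else {
            completion[job] = clock
        }
        previous = job
    }
    return RoundRobinResult(responseTimes: response.map { $0 ?? 0 },
                            completionTimes: completion,
                            switches: switches)
}

let coarse = roundRobin(bursts: [8, 8], quantum: 8, switchCost: 1)
let fine = roundRobin(bursts: [8, 8], quantum: 2, switchCost: 1)
print(coarse.responseTimes, coarse.completionTimes, coarse.switches)  // [0, 9] [8, 17] 1
print(fine.responseTimes, fine.completionTimes, fine.switches)        // [0, 3] [20, 23] 7
```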
I think I have an answer to my own question. The problem was that I was only thinking about simple processes like ls, which, once issued, run for some amount of time and then, when they're finished, deliver their first and only output.
However, suppose job A in the example from the question is a program with multiple print statements. Output will in that case be produced before the process is complete (and some of the printouts may well occur during the first scheduled burst). It would thus make sense for interactivity to want to begin running such a process as soon as possible.

NSRunLoops in Cocoa?

Let's say I have 2 threads: one is the main thread and the other is a secondary thread. The main thread is used the most, but sometimes (rarely) I want the secondary thread to do some work based on calls from the main thread. Most of the time the secondary thread should sleep. After some searching, I understand that the way to do this is to use run loops. So I tried to read Apple's docs (http://developer.apple.com/library/ios/#documentation/Cocoa/Conceptual/Multithreading/RunLoopManagement/RunLoopManagement.html#//apple_ref/doc/uid/10000057i-CH16-SW5)
but they look very complex to me and I'm having a hard time with them. Is there an elegant and simple way to achieve what I described? Are there any similar run loop code examples out there that I can run and play with?
Thanks
Each thread has a run loop.
Each run loop has a list of things that need to be done. These things are said to be “scheduled” on the run loop, although not all of them are scheduled for a specific date and time:
Timers are.
Sources aren't. They generally wait for something to come knocking at a Mach kernel port or a file descriptor.
When the run loop is running, it's usually not running—that is, the thread is sleeping, not consuming any CPU cycles. (If you sample it, you'll find the process appearing to be stuck in mach_msg_trap. This is the “wait-for-something-to-happen” system call.) The kernel wakes up the thread (which thereby returns from mach_msg_trap) when something happens that the thread's run loop needs to take care of.
The way to do exactly what you described is to implement a run loop source. You schedule the source on the secondary thread's run loop, implement it by doing work, and signal it from the primary thread when there's work to be done.
However, NSOperation is almost certainly a better solution, as it's designed for the case you described: Discrete units of work that need to be done serially, up to N (which you choose and is at least 1) at a time.
Note that NSOperationQueue reuses threads, so it does not necessarily create a new thread for every operation. Indeed, not doing that is part of the point: It creates the threads lazily, and uses any that it already has that aren't doing anything.
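A minimal sketch of that recommendation in Swift, where NSOperationQueue is OperationQueue: a queue limited to one concurrent operation stands in for the secondary thread, and the calling thread just adds operations when there is work. The queue name and the squaring work are placeholders.

```swift
import Foundation

// One queue plays the role of the "secondary thread": with at most one
// concurrent operation, submitted work runs serially, off the calling thread.
let worker = OperationQueue()
worker.name = "worker"                  // placeholder name, useful when debugging
worker.maxConcurrentOperationCount = 1  // N = 1: one unit of work at a time

let resultsLock = NSLock()
var results: [Int] = []

// The calling (e.g. main) thread hands off discrete units of work whenever needed.
for n in 1...5 {
    worker.addOperation {
        let square = n * n              // stand-in for the real work
        resultsLock.lock()
        results.append(square)
        resultsLock.unlock()
    }
}

// Only for this demo: block until the queue drains. An app would instead
// report completion back to the main thread rather than waiting.
worker.waitUntilAllOperationsAreFinished()
print(results)  // [1, 4, 9, 16, 25]
```

When the queue has nothing to do it holds no busy thread, which is the "sleeping" behavior the question asks for without managing a run loop by hand.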
This sounds like just the sort of thing NSOperation/NSOperationQueue was made for. If you only have the occasional "units of work", why not make them an operation, then monitor it for completion and update your UI accordingly?
Matt Gallagher has a nice blog article comparing the secondary thread approach with other ways of getting background work done.
http://cocoawithlove.com/2010/09/overhead-of-spawning-threads.html
In your case, you don't have to be concerned with thread-creation overhead. But Matt's code examples might provide some insight into managing the secondary thread's runloop.
All that said, I would go with Joshua's advice and just use an NSOperationQueue and an NSOperation to do the background work. If the work could be encapsulated in an NSInvocation, you can use an NSInvocationOperation and avoid an NSOperation subclass.
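In Swift, NSInvocation and NSInvocationOperation are not available, so the closest equivalent is BlockOperation (or OperationQueue.addOperation with a closure), which likewise avoids writing an NSOperation subclass. A minimal sketch, with `simulateWork(seed:)` as a hypothetical stand-in for the real work:

```swift
import Foundation

// Hypothetical work function that would otherwise be wrapped in an NSInvocation.
func simulateWork(seed: Int) -> Int {
    return (1...seed).reduce(0, +)
}

let queue = OperationQueue()
var answer = 0

// BlockOperation wraps a closure, so no Operation subclass is needed.
let operation = BlockOperation {
    answer = simulateWork(seed: 100)
}
queue.addOperation(operation)
queue.waitUntilAllOperationsAreFinished()  // demo only; don't block the main thread in an app
print(answer)  // 5050
```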

NSThreading for speed

I'm working on a game sim and want to speed up the match simulation part. On a given date there may be 50+ matches that need simulating. Currently I loop through each one and tell it to simulate itself, but this can take forever. I was hoping to:
1) Overlay a 'busy' screen
2) Launch a thread for each
3) When the last thread exits, remove the overlay.
Now I can do 1 & 2, but I cannot figure out how to tell when the last thread is finished, because the last thread I detach may not be the last thread finished. What's the best way to do that?
Also, threads are usually used so that work can be done in the background while the user does other stuff; I'm using them slightly differently. My app is a Core Data app and I want to prevent the user from touching the store in other ways while I'm simulating the matches. So I want single-threading most of the time, but multithreading for this situation because of how long the sim engine takes. If anyone has other ideas for this approach, I'm open to them.
Rob
Likely you want to use NSOperation and NOT 50 threads: 50 threads is not healthy on an iPhone, and NSOperations are easier to boot. My guess is that you are killing performance by trying to run 50 at once. NSOperation is designed for exactly this problem. Plus it's easy to code.
My only problem with NSOperation is that they don't have a standard way to tell the caller that they are done.
You could periodically poll the NSOperationQueue: when its operation count is 0, there are none left. You could also make each operation increment a counter; when the count reaches 50, you are done. Or each operation could use performSelectorOnMainThread to notify the main thread that it's done.
You should see a boost in performance with even a single core - there are lots of times that the main thread is blocked waiting for user input/graphics drawing/etc. Plus multicore phones and iPads will likely be out within a year (total guess - but they are coming).
Also make sure you look at the operation with Instruments. You may be able to speed the calculations up by a factor of 2 or even 10!
You're on a single core, so threading probably won't help much, and the overhead may even slow things down.
The first thing to do is use Instruments to profile your code and see what can be sped up. Once you've done that, you can look at some specific optimizations for the bottlenecks.
One simple approach (if you can use GCD) is dispatch_apply(), which'll let you loop over your matches, automatically thread them in the best manner for your hardware, and doesn't return until all are complete.
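In Swift, dispatch_apply is exposed as DispatchQueue.concurrentPerform(iterations:execute:). A minimal sketch for the 50 matches, assuming each simulation is independent and returns a value; `simulateMatch(_:)` is a hypothetical stand-in that just burns some CPU deterministically. Each iteration writes only its own slot, so the results array needs no lock.

```swift
import Foundation
import Dispatch

// Hypothetical stand-in for simulating one match; returns a score from 0 to 9.
func simulateMatch(_ index: Int) -> Int {
    var state = index + 1
    for _ in 0..<10_000 {
        state = (state &* 1_103_515_245 &+ 12_345) & 0x7FFF_FFFF  // LCG step as busywork
    }
    return state % 10
}

let matchCount = 50
var scores = [Int](repeating: -1, count: matchCount)

scores.withUnsafeMutableBufferPointer { buffer in
    let slots = buffer  // copy of the pointer; writing through it is nonmutating
    // concurrentPerform spreads iterations over the available cores and
    // returns only after every iteration has completed.
    DispatchQueue.concurrentPerform(iterations: matchCount) { i in
        slots[i] = simulateMatch(i)  // each iteration touches a distinct element
    }
}
print(scores.count)  // 50
```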
The most straightforward solution would be to have each of your threads, before it exits, call performSelectorOnMainThread on a particular method that decrements a counter, and have that method remove the overlay screen when the counter reaches zero.
Simulating all the matches concurrently may not necessarily improve performance though.
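The counter approach can be sketched in Swift as follows. performSelectorOnMainThread is replaced here by dispatching to a serial queue that owns the counter (in an app, that would be DispatchQueue.main); `CompletionCounter` is a hypothetical helper, and the semaphore exists only so this standalone demo can wait for the callback.

```swift
import Foundation
import Dispatch

/// Calls `onAllFinished` once, after `taskDidFinish()` has been called `count` times.
/// The counter is only touched on `owner`, a serial queue standing in for the main thread.
/// Hypothetical helper for illustration.
final class CompletionCounter {
    private var remaining: Int
    private let owner: DispatchQueue
    private let onAllFinished: () -> Void

    init(count: Int, owner: DispatchQueue, onAllFinished: @escaping () -> Void) {
        self.remaining = count
        self.owner = owner
        self.onAllFinished = onAllFinished
    }

    /// Safe to call from any thread: the decrement always runs on `owner`.
    func taskDidFinish() {
        owner.async {
            self.remaining -= 1
            if self.remaining == 0 {
                self.onAllFinished()  // e.g. remove the busy overlay
            }
        }
    }
}

let owner = DispatchQueue(label: "counter.owner")  // in an app: DispatchQueue.main
let allDone = DispatchSemaphore(value: 0)          // demo only: lets this script wait
var overlayRemovals = 0

let counter = CompletionCounter(count: 50, owner: owner) {
    overlayRemovals += 1
    print("all 50 simulations finished; remove the overlay")
    allDone.signal()
}

for match in 0..<50 {
    Thread.detachNewThread {
        _ = (0..<1_000).reduce(match, +)  // stand-in for simulating one match
        counter.taskDidFinish()           // report completion just before the thread exits
    }
}
allDone.wait()
```

Because every decrement happens on one serial queue, it doesn't matter which thread finishes last.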
You may get the solution to your specific question from @drowntoge, but in general I want to give you some advice about multithreading:
1/ It does not always speed up your program, as Graham said. Your iPhone has only a single core.
2/ If your program has big IO, database, or networking work that takes time, then you may consider multithreading, because then the time is not all spent processing data; much of it is spent waiting for data to load. In this case, multithreading will significantly boost your performance. But you still need to be careful, because thread switching has overhead.
Maybe you only need 1 thread for IO processing plus a cache layer to share the images/data. Then you only need the main thread to loop and run the simulation.
3/ If you want the 50 simulations to appear to happen at the same time for the user to watch, multithreading is also required :)
If you use threading, you won't know in what order the CPU is doing your tasks, and you could potentially be consuming a lot of thread scheduling resources. Better to use an NSOperationQueue and signal completion of each task using performSelectorOnMainThread. Decrementing a counter has already been mentioned, which may be useful for displaying a progress bar. But you could also maintain an array of 50 busy flags and clear them on completion, which might help debugging whether any particular task is slow or stuck if you mark completion with a time stamp.

NSOperationQueue dispatching threads slowly?

I'm writing my first multithreaded iPhone apps using NSOperationQueue because I've heard it's loads better and potentially faster than managing my own thread dispatching.
I'm calculating the outcome of a Game of Life board by splitting the board into separate pieces, having separate threads calculate each piece, and then splicing them back together; to me this seems like a faster way even with the tremendous overhead of splitting and splicing. I'm creating an NSInvocationOperation object for each piece and then sending them to the OperationQueue. After I've sent all the pieces of the board, I sit and wait for them all to finish calculating with the waitUntilAllOperationsAreFinished call to the OperationQueue.
This seems like it should work, and it does; it works just fine, BUT the threads get called out very slooooowwwlllyyyyyy, and so the multithreaded version actually ends up taking longer to calculate than the single-threaded version! OH NOES! I monitored the creation and termination of the NSOperations sent to the NSOperationQueue and found that some just sit in the queue do-diddly-daddlin for a while before they get called much later on. At first I thought "Hey, maybe the queue can only process so many threads at a time" and bumped the queue's maxConcurrentOperationCount up to some arbitrary high number (well above the number of board pieces), but I experienced the same thing!
I was wondering if someone could tell me how to kick NSOperationQueue into "overdrive", so to speak, so that it dispatches its operations as quickly as possible, or else tell me what's going on!
Threads do not magically make your processor run faster.
On a single-processor machine, if your algorithm takes a million instructions to execute, splitting it up into 10 chunks of 100,000 instructions each and running it on 10 threads is still going to take just as long. Actually, it will take longer, because you've added the overhead of splitting, merging, and context switching among the threads.
The queue is still fundamentally limited by the processing power of the phone. If the phone can only run two processes simultaneously, you will get (at most) close to a two-fold increase in speed by splitting up the task. Anything more than that, and you are just adding overhead for no gain.
This is especially true if you are running a processor and memory intensive routine like your board calculation. NSOperationQueue makes sense if you have several operations that need to wait for extended periods of time. User interface loops and network downloads would be excellent examples. In those cases, other operations can complete while the inactive ones are waiting for input.
For something like your board, the operation for each piece of the grid never has any wait condition. It is always churning away at full speed until done.
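A sketch of the alternative implied above: split the Game of Life board into roughly as many chunks as there are active cores, instead of one operation per piece. It assumes a board stored as rows of Bool with cells outside the edge counting as dead; `nextGeneration(_:)` and `liveNeighbours` are hypothetical names. Whether it beats the single-threaded version still depends on the board being large enough to amortize the overhead described above.

```swift
import Foundation
import Dispatch

typealias Board = [[Bool]]

/// Live neighbours of (row, col); cells outside the board count as dead.
func liveNeighbours(_ board: Board, _ row: Int, _ col: Int) -> Int {
    var count = 0
    for dr in -1...1 {
        for dc in -1...1 where !(dr == 0 && dc == 0) {
            let r = row + dr, c = col + dc
            if r >= 0, r < board.count, c >= 0, c < board[r].count, board[r][c] { count += 1 }
        }
    }
    return count
}

/// Next generation, with rows split into one chunk per active core (at most one per row).
func nextGeneration(_ board: Board) -> Board {
    let rows = board.count
    guard rows > 0 else { return board }
    let chunks = max(1, min(rows, ProcessInfo.processInfo.activeProcessorCount))
    var next = board
    next.withUnsafeMutableBufferPointer { buffer in
        let out = buffer  // each chunk writes only its own rows
        DispatchQueue.concurrentPerform(iterations: chunks) { chunk in
            let start = chunk * rows / chunks
            let end = (chunk + 1) * rows / chunks
            for r in start..<end {
                var newRow = board[r]
                for c in 0..<board[r].count {
                    let n = liveNeighbours(board, r, c)
                    newRow[c] = n == 3 || (board[r][c] && n == 2)  // Conway's rules
                }
                out[r] = newRow
            }
        }
    }
    return next
}

// A vertical "blinker" becomes horizontal after one step.
var blinker = Board(repeating: Array(repeating: false, count: 5), count: 5)
for r in 1...3 { blinker[r][2] = true }
let after = nextGeneration(blinker)
print(after[2])  // [false, true, true, true, false]
```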
See also: iPhone Maximum thread limit? and concurrency application design