In operating systems, how does the OS prevent infinite loops? - operating-system

In an operating system there should be a mechanism to prevent infinite loops. I want to know which steps the OS follows to detect an infinite loop and terminate the offending process.

Let's start with an (over-simplified) example of an infinite loop:
while (true) {
    get_user_input();
    handle_user_input();
}
Almost every application you've ever used is (a more complex) infinite loop like this; and it's not just applications (e.g. a web server might loop forever while checking for new connections on a TCP/IP port).
Infinite loops are often necessary, and processes shouldn't be terminated just because they do something that may be necessary.
With this in mind the question becomes: How does an OS detect the difference between an unwanted and unintended infinite loop, and a wanted, intentional and necessary infinite loop?
The answer is that an OS can't.
What an OS can do is have various rules that have nothing to do with infinite loops; like:
a high priority thread may only use 100 milliseconds of CPU time between calls to potentially blocking IO operations (e.g. reading from a network socket); so that if a high priority thread consumes too much CPU time it can be declared "unresponsive" (regardless of whether it's in an infinite loop or not).
a thread that communicates with the GUI must accept new events (user input, notifications, etc.) within 1 second; so that if a thread takes too long to accept an event from the GUI it can be declared "unresponsive" (regardless of whether it's in an infinite loop or not) - see the heartbeat sketch below.
Of course this kind of thing is OS specific; and there aren't too many rules like this that won't cause problems for correct software.
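As a rough illustration of that second rule, here is a minimal heartbeat sketch in C with pthreads (the names and the 1 second threshold are just assumptions for the example, and this runs in user space rather than inside the OS): the intentional infinite loop keeps bumping a counter, and a watchdog thread declares it unresponsive if the counter stops moving, without ever knowing whether an infinite loop is involved.

/* Build with: cc -pthread watchdog.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_long heartbeat;            /* bumped by the event loop */

static void *watchdog(void *arg) {
    (void)arg;
    long last_seen = atomic_load(&heartbeat);
    for (;;) {
        sleep(1);
        long now = atomic_load(&heartbeat);
        if (now == last_seen)
            fprintf(stderr, "event loop unresponsive for >1 second\n");
        last_seen = now;
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, watchdog, NULL);

    while (1) {                            /* the intentional infinite loop */
        atomic_fetch_add(&heartbeat, 1);   /* "still responsive" */
        usleep(100 * 1000);                /* stand-in for get/handle_user_input() */
    }
}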

Related

Does the sleep() function cause a timer interrupt upon completion?

Do the family of sleep functions (sleep(), nanosleep()) cause timer interrupts once they complete (i.e., are done sleeping)? If not, how does the OS know exactly when they are done? If so, I understand timers have a high interrupt priority. Does this mean a program using sleep() once awoken will likely cause another program running on one of the CPUs (in a multi-processor) to be removed in favor of the recently awoken program?
Does the sleep() function cause a timer interrupt upon completion?
Maybe.
For keeping track of time delays there are two common ways it could be implemented:
a) A timer IRQ occurs at a fixed frequency (e.g. maybe every 1 millisecond). When the IRQ occurs the OS checks if any time delays expired and deals with them. In this case there's a compromise between precision and overhead (to get better precision you need to increase the "IRQs per second" which increases the overhead of dealing with all the IRQs).
b) The OS re-configures the timer to generate an IRQ when the soonest delay should expire whenever necessary (when the soonest delay is cancelled, a sooner delay is created, or the soonest delay expires). This has no "precision vs. overhead" compromise, but has more overhead for re-configuring the timer hardware. This is typically called "tickless" (as there's no regular/fixed frequency "tick").
Note that modern 80x86 systems have a local APIC timer per CPU that supports "IRQ on TSC deadline". For "tickless", this means you can normally get better than 1 nanosecond precision without much need for locks (using "per CPU" structures to keep track of time delays); and the cost of re-configuring the timer is very small (as the timer hardware is built directly into the CPU itself).
For "tickless" (which is likely much better for modern systems) you would end up with a timer IRQ when "sleep()" expires most of the time (unless some other delay expires at the same/similar time).
Does this mean a program using sleep() once awoken will likely cause another program running on one of the CPUs (in a multi-processor) to be removed in favor of the recently awoken program?
Whether a recently unblocked task preempts immediately depends on:
a) The scheduler design. For some schedulers (e.g. naive "round robin") it may never happen immediately.
b) The priorities of the unblocked task and the currently running task/s.
c) Optimizations. Task switches cost overhead so attempts to minimize the number of task switches (e.g. postponing/skipping a task switch if some other task switch is likely to happen soon anyway) are practical. There's also complexity involving load balancing, power management, cache efficiency, memory (NUMA, etc) and other things that may be considered.
The Linux man page notes:
Portability notes
On some systems, sleep() may be implemented using alarm(2) and SIGALRM (POSIX.1 permits this); mixing calls to alarm(2) and sleep() is a bad idea.
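For illustration, here is one way a sleep()-like delay can be built without alarm(2)/SIGALRM at all, by looping on nanosleep() and re-issuing the unslept remainder after a signal interruption; this is only a sketch, not the actual libc implementation.

#include <errno.h>
#include <time.h>

unsigned int sleep_seconds(unsigned int seconds) {
    struct timespec req = { .tv_sec = seconds, .tv_nsec = 0 };
    struct timespec rem;

    /* If a signal interrupts the sleep, nanosleep() reports the unslept
     * remainder in rem; keep sleeping until the full interval has passed. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;

    return 0;   /* 0 = slept the full interval */
}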

bad use cases of scala.concurrent.blocking?

With reference to the third point in this accepted answer, are there any cases for which it would be pointless or bad to use blocking for a long-running computation, whether CPU- or IO-bound, that is being executed 'within' a Future?
It depends on the ExecutionContext your Future is being executed in.
Pointless:
If the ExecutionContext is not a BlockContext, then using blocking will be pointless. That is, it would use the DefaultBlockContext, which simply executes the code without any special handling. It probably wouldn't add that much overhead, but pointless nonetheless.
Bad:
Scala's ExecutionContext.Implicits.global is made to spawn new threads in a ForkJoinPool when the thread pool is about to be exhausted. That is, if it knows that is going to happen via blocking. This can be bad if you're spawning lots of threads. If you're queuing up a lot of work in a short span of time, the global context will happily expand until gridlock. #dk14's answer explains this in more depth, but the gist is that it can be a performance killer as managed blocking can actually become quickly unmanageable.
The main purpose of blocking is to avoid deadlocks within thread pools, so it is tangentially related to performance in the sense that reaching a deadlock would be worse than spawning a few more threads. However, it is definitely not a magical performance enhancer.
I've written more about blocking in particular in this answer.
In my experience, blocking + ForkJoinPool may lead to continuous and uncontrollable creation of threads if you have a lot of messages to process and each one requires long blocking (which also means each one holds some memory while blocked). ForkJoinPool creates a new thread to compensate for the "managed blocked" one, regardless of MaxThreadCount; say hello to hundreds of threads in VisualVM. And it almost kills backpressure, as there is always a place for a task in the pool's queue (if your backpressure is based on ThreadPoolExecutor's policies). Performance ends up being killed by both new-thread allocation and garbage collection.
So:
It's good when the message rate is not much higher than 1/blocking_time, as it allows you to use the full power of threads. Some smart backpressure might help slow down incoming messages.
It's pointless if a task actually uses your CPU inside blocking {} (no locks), as it will just increase the thread count beyond the number of real cores in the system.
And it's bad for any other case - you should use a separate fixed thread pool (and maybe polling) instead.
P.S. blocking is hidden inside Await.result, so it's not always obvious. In our project someone simply called such an Await inside an underlying worker actor.

NSOperationQueue dispatching threads slowly?

I'm writing my first multithreaded iPhone apps using NSOperationQueue because I've heard it's loads better and potentially faster than managing my own thread dispatching.
I'm calculating the outcome of a Game of Life board by splitting the board into separate pieces and having separate threads calculate each piece and then splicing them back together. To me this seems like a faster way even with the tremendous overhead of splitting and splicing. I'm creating an NSInvocationOperation object for each piece and then sending them to the OperationQueue. After I've sent all the pieces of the board I sit and wait for them all to finish calculating with the waitUntilAllOperationsAreFinished call to the OperationQueue.
This seems like it should work, and it does work just fine, BUT the threads get called out very slooooowwwlllyyyyyy and so it actually ends up taking longer for the multithreaded version to calculate than the single threaded version! OH NOES! I monitored the creation and termination of the NSOperations sent to the NSOperationQueue and found that some just sit in the Operation Queue do-diddly-daddlin for a while before they get called much later on. At first I thought "Hey, maybe the queue can only process so many threads at a time" and then bumped the queue's maxConcurrentOperationCount up to some arbitrary high number (well above the number of board pieces) but I experienced the same thing!
I was wondering if maybe someone can tell me how to kick NSOperationQueue into "overdrive", so to speak, so that it dispatches its operations as quickly as possible; either that or tell me what's going on!
Threads do not magically make your processor run faster.
On a single-processor machine, if your algorithm takes a million instructions to execute, splitting it up into 10 chunks of 100,000 instructions each and running it on 10 threads is still going to take just as long. Actually, it will take longer, because you've added the overhead of splitting, merging, and context switching among the threads.
The queue is still fundamentally limited by the processing power of the phone. If the phone can only run two processes simultaneously, you will get (at most) close to a two-fold increase in speed by splitting up the task. Anything more than that, and you are just adding overhead for no gain.
This is especially true if you are running a processor and memory intensive routine like your board calculation. NSOperationQueue makes sense if you have several operations that need to wait for extended periods of time. User interface loops and network downloads would be excellent examples. In those cases, other operations can complete while the inactive ones are waiting for input.
For something like your board, the operation for each piece of the grid never has any wait condition. It is always churning away at full speed until done.
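To see the point outside of the iPhone/NSOperationQueue context, here is a plain C/pthreads sketch (an illustration, not the poster's code) of splitting a CPU-bound job into pieces: one thread per core is the most that can help, and raising the thread count beyond that (the equivalent of raising maxConcurrentOperationCount) only adds splitting, splicing, and context-switch overhead.

/* Build with: cc -pthread sum.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define N (1 << 22)
static double data[N];

struct chunk { int begin, end; double sum; };

static void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    c->sum = 0.0;
    for (int i = c->begin; i < c->end; i++)
        c->sum += data[i];               /* pure CPU work, never blocks */
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1.0;

    /* More threads than cores cannot make this finish sooner. */
    int nthreads = (int)sysconf(_SC_NPROCESSORS_ONLN);
    if (nthreads < 1) nthreads = 1;
    if (nthreads > 64) nthreads = 64;

    pthread_t tid[64];
    struct chunk chunks[64];
    for (int t = 0; t < nthreads; t++) {
        chunks[t].begin = t * (N / nthreads);
        chunks[t].end   = (t == nthreads - 1) ? N : (t + 1) * (N / nthreads);
        pthread_create(&tid[t], NULL, sum_chunk, &chunks[t]);
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {   /* "splice" the results back */
        pthread_join(tid[t], NULL);
        total += chunks[t].sum;
    }
    printf("total = %f using %d threads\n", total, nthreads);
    return 0;
}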
See also: iPhone Maximum thread limit? and concurrency application design

Using multiple sockets, is non-blocking or blocking with select better?

Let's say I have a server program that can accept connections from 10 (or more) different clients. The clients send data at random, which is received by the server, but it is certain that at least one client will be sending data every update. The server cannot wait for information to arrive because it has other processing to do. Aside from using asynchronous sockets, I see two options:
Make all sockets non-blocking. In a loop, call recv() on each socket and allow it to fail with WSAEWOULDBLOCK if there is no data available and if I happen to get some data, then keep it.
Leave the sockets as blocking. Add all sockets to a FD_SET and call select(). If the return value is non-zero (which it will be most of the time), loop through all the sockets to find the appropriate number of readable sockets with FD_ISSET() and only call recv() on the readable sockets.
The first option will create a lot more calls to the recv() function. The second method is a bigger pain from a programming perspective because of all the FD_SET and FD_ISSET looping.
Which method (or another method) is preferred? Is avoiding the overhead of letting recv() fail on a non-blocking socket worth the hassle of calling select()?
I think I understand both methods and I have tried both with success, but I don't know if one way is considered better or optimal.
I would recommend using overlapped IO instead. You can then kick off a WSARecv(), and provide a callback function to be invoked when the operation completes. What's more, since it'll only be invoked when your program is in an alertable wait state, you don't need to worry about locks like you would in a threaded application (assuming you run them on your main thread).
Note, however, that you do need to enter such an alertable wait state frequently. If this is your UI thread, make sure to use MsgWaitForMultipleObjectsEx() in your message loop, with the MWMO_ALERTABLE flag. This will give your callbacks a chance to run. On non-UI threads, call on a regular basis any of the wait functions that put you into an alertable wait state.
Note also that modal dialogs generally will not enter an alertable wait state, as they have their own message loop which doesn't call MsgWaitForMultipleObjectsEx(). If you need to process network IO when showing a dialog box, do all of your network IO on a dedicated thread, which does enter an alertable wait state regularly.
If, for whatever reason, you can't use overlapped IO - definitely use blocking select(). Using non-blocking recv() like that in an infinite loop is an inexcusable waste of CPU time. However, do put the sockets in non-blocking mode - as otherwise, if one byte arrives and you try to read two, you might end up blocking unexpectedly.
You might also want to consider using a library to abstract away the finicky details. For example, libevent or boost::asio.
The IO should be either completely blocking, with one thread per connection (in which case the event loop is essentially the OS scheduler), or completely non-blocking (in which case a select/WaitForMultipleObjects-based event loop lives in your application).
All intermediate variants are not very maintainable and are error-prone.
The completely non-blocking approach scales much better as the number of concurrent connections grows and has no thread context-switch overhead, so it is preferable where the number of concurrent connections is not fixed. This approach has higher implementation complexity compared to the completely blocking one.
For completely non-blocking IO the core of the application is a select/WaitForMultipleObjects-based event loop, all sockets are in non-blocking mode, and all reads/writes are generally done from within the event loop thread (for top performance, writes can first be attempted directly from the thread requesting the write).
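A minimal sketch of that kind of loop (POSIX C; the sockets are assumed to be already connected and passed in, and the accept/error-handling side is omitted): the sockets are put in non-blocking mode, a blocking select() waits until something is readable, and recv() is only called on the descriptors select() reports.

#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <fcntl.h>

void serve(int clients[], int nclients) {
    char buf[4096];

    for (int i = 0; i < nclients; i++)   /* put every socket in non-blocking mode */
        fcntl(clients[i], F_SETFL,
              fcntl(clients[i], F_GETFL, 0) | O_NONBLOCK);

    for (;;) {
        fd_set readable;
        int maxfd = -1;

        FD_ZERO(&readable);
        for (int i = 0; i < nclients; i++) {
            FD_SET(clients[i], &readable);
            if (clients[i] > maxfd) maxfd = clients[i];
        }

        /* Sleep until at least one socket has data (or pass a timeout here
         * if the server has other work to do). */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
            continue;

        for (int i = 0; i < nclients; i++) {
            if (!FD_ISSET(clients[i], &readable))
                continue;
            ssize_t n = recv(clients[i], buf, sizeof buf, 0);
            if (n > 0) {
                /* ... handle n bytes from clients[i] ... */
            }
        }
    }
}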

Relationship between a kernel and a user thread

Is there a relationship between a kernel and a user thread?
Some operating system textbooks say that the OS "maps one (many) user thread to one (many) kernel thread". What does map mean here?
When they say map, they mean that each kernel thread is assigned to a certain number of user mode threads.
Kernel threads are used to provide privileged services to applications (such as system calls). They are also used by the kernel to keep track of everything running on the system, how much of which resources are allocated to which process, and to schedule them.
If your applications make heavy use of system calls, then the more user threads per kernel thread you have, the slower your applications will run. This is because the kernel thread will become a bottleneck, since all system calls will pass through it.
On the flip side though, if your programs rarely use system calls (or other kernel services), you can assign a large number of user threads to a kernel thread without much performance penalty, other than overhead.
You can increase the number of kernel threads, but this adds overhead to the kernel in general, so while individual threads will be more responsive with respect to system calls, the system as a whole will become slower.
That is why it is important to find a good balance between the number of kernel threads and the number of user threads per kernel thread.
http://www.informit.com/articles/printerfriendly.aspx?p=25075
Implementing Threads in User Space
There are two main ways to implement a threads package: in user space and in the kernel. The choice is moderately controversial, and a hybrid implementation is also possible. We will now describe these methods, along with their advantages and disadvantages.
The first method is to put the threads package entirely in user space. The kernel knows nothing about them. As far as the kernel is concerned, it is managing ordinary, single-threaded processes. The first, and most obvious, advantage is that a user-level threads package can be implemented on an operating system that does not support threads. All operating systems used to fall into this category, and even now some still do.
All of these implementations have the same general structure, which is illustrated in Fig. 2-8(a). The threads run on top of a run-time system, which is a collection of procedures that manage threads. We have seen four of these already: thread_create, thread_exit, thread_wait, and thread_yield, but usually there are more.
When threads are managed in user space, each process needs its own private thread table to keep track of the threads in that process. This table is analogous to the kernel's process table, except that it keeps track only of per-thread properties such as each thread's program counter, stack pointer, registers, state, etc. The thread table is managed by the run-time system. When a thread is moved to ready state or blocked state, the information needed to restart it is stored in the thread table, exactly the same way as the kernel stores information about processes in the process table.
When a thread does something that may cause it to become blocked locally, for example, waiting for another thread in its process to complete some work, it calls a run-time system procedure. This procedure checks to see if the thread must be put into blocked state. If so, it stores the thread's registers (i.e., its own) in the thread table, looks in the table for a ready thread to run, and reloads the machine registers with the new thread's saved values. As soon as the stack pointer and program counter have been switched, the new thread comes to life again automatically. If the machine has an instruction to store all the registers and another one to load them all, the entire thread switch can be done in a handful of instructions. Doing thread switching like this is at least an order of magnitude faster than trapping to the kernel and is a strong argument in favor of user-level threads packages.
However, there is one key difference with processes. When a thread is finished running for the moment, for example, when it calls thread_yield, the code of thread_yield can save the thread's information in the thread table itself. Furthermore, it can then call the thread scheduler to pick another thread to run. The procedure that saves the thread's state and the scheduler are just local procedures, so invoking them is much more efficient than making a kernel call. Among other issues, no trap is needed, no context switch is needed, the memory cache need not be flushed, and so on. This makes thread scheduling very fast.
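For a concrete (if simplified) picture of such a user-space switch, here is a sketch using the POSIX ucontext API on Linux; real thread packages use hand-written register save/restore code, but the structure (save my state, load the next thread's state) is the same, and none of it traps into the kernel.

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, t1_ctx, t2_ctx;

static void thread1(void) {
    for (int i = 0; i < 3; i++) {
        printf("thread 1, step %d\n", i);
        swapcontext(&t1_ctx, &t2_ctx);   /* "thread_yield" to thread 2 */
    }
}

static void thread2(void) {
    for (int i = 0; i < 3; i++) {
        printf("thread 2, step %d\n", i);
        swapcontext(&t2_ctx, &t1_ctx);   /* "thread_yield" to thread 1 */
    }
}

int main(void) {
    /* Give each "user thread" a private stack and entry point; this is the
     * kind of information a user-space thread table would record. */
    getcontext(&t1_ctx);
    t1_ctx.uc_stack.ss_sp = malloc(STACK_SIZE);
    t1_ctx.uc_stack.ss_size = STACK_SIZE;
    t1_ctx.uc_link = &main_ctx;          /* where to go when thread1 returns */
    makecontext(&t1_ctx, thread1, 0);

    getcontext(&t2_ctx);
    t2_ctx.uc_stack.ss_sp = malloc(STACK_SIZE);
    t2_ctx.uc_stack.ss_size = STACK_SIZE;
    t2_ctx.uc_link = &main_ctx;
    makecontext(&t2_ctx, thread2, 0);

    swapcontext(&main_ctx, &t1_ctx);     /* start thread 1 */
    return 0;
}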
User-level threads also have other advantages. They allow each process to have its own customized scheduling algorithm. For some applications, for example, those with a garbage collector thread, not having to worry about a thread being stopped at an inconvenient moment is a plus. They also scale better, since kernel threads invariably require some table space and stack space in the kernel, which can be a problem if there are a very large number of threads.
Despite their better performance, user-level threads packages have some major problems. First among these is the problem of how blocking system calls are implemented. Suppose that a thread reads from the keyboard before any keys have been hit. Letting the thread actually make the system call is unacceptable, since this will stop all the threads. One of the main goals of having threads in the first place was to allow each one to use blocking calls, but to prevent one blocked thread from affecting the others. With blocking system calls, it is hard to see how this goal can be achieved readily.
The system calls could all be changed to be nonblocking (e.g., a read on the keyboard would just return 0 bytes if no characters were already buffered), but requiring changes to the operating system is unattractive. Besides, one of the arguments for user-level threads was precisely that they could run with existing operating systems. In addition, changing the semantics of read will require changes to many user programs.
Another alternative is possible in the event that it is possible to tell in advance if a call will block. In some versions of UNIX, a system call, select, exists, which allows the caller to tell whether a prospective read will block. When this call is present, the library procedure read can be replaced with a new one that first does a select call and then only does the read call if it is safe (i.e., will not block). If the read call will block, the call is not made. Instead, another thread is run. The next time the run-time system gets control, it can check again to see if the read is now safe. This approach requires rewriting parts of the system call library, is inefficient and inelegant, but there is little choice. The code placed around the system call to do the checking is called a jacket or wrapper.
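A rough sketch of such a jacket, assuming a hypothetical thread_yield() provided by the run-time system: select() is asked with a zero timeout whether the read would block, and if it would, another user-level thread is run instead.

#include <sys/select.h>
#include <unistd.h>

extern void thread_yield(void);   /* hypothetical run-time system routine */

ssize_t wrapped_read(int fd, void *buf, size_t count) {
    for (;;) {
        fd_set readable;
        struct timeval zero = {0, 0};        /* poll: do not block */

        FD_ZERO(&readable);
        FD_SET(fd, &readable);

        if (select(fd + 1, &readable, NULL, NULL, &zero) > 0 &&
            FD_ISSET(fd, &readable)) {
            return read(fd, buf, count);     /* safe: will not block */
        }
        thread_yield();   /* let another user-level thread run meanwhile */
    }
}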
Somewhat analogous to the problem of blocking system calls is the problem of page faults. We will study these in Chap. 4. For the moment, it is sufficient to say that computers can be set up in such a way that not all of the program is in main memory at once. If the program calls or jumps to an instruction that is not in memory, the operating system must go and get the missing instruction (and its neighbors) from disk. This is called a page fault. The process is blocked while the necessary instruction is being located and read in. If a thread causes a page fault, the kernel, not even knowing about the existence of threads, naturally blocks the entire process until the disk I/O is complete, even though other threads might be runnable.
Another problem with user-level thread packages is that if a thread starts running, no other thread in that process will ever run unless the first thread voluntarily gives up the CPU. Within a single process, there are no clock interrupts, making it impossible to schedule threads in round-robin fashion (taking turns). Unless a thread enters the run-time system of its own free will, the scheduler will never get a chance.
One possible solution to the problem of threads running forever is to have the run-time system request a clock signal (interrupt) once a second to give it control, but this, too, is crude and messy to program. Periodic clock interrupts at a higher frequency are not always possible, and even if they are, the total overhead may be substantial. Furthermore, a thread might also need a clock interrupt, interfering with the run-time system's use of the clock.
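Such a once-a-second clock signal could be requested roughly like this (a sketch using POSIX setitimer; schedule_next_thread() is a hypothetical run-time system routine, and doing real scheduling work from inside a signal handler is exactly the crude and messy part the text refers to):

#include <signal.h>
#include <sys/time.h>

extern void schedule_next_thread(void);   /* hypothetical run-time routine */

static void on_tick(int sig) {
    (void)sig;
    schedule_next_thread();   /* forcibly preempt the running user thread */
}

static void install_runtime_tick(void) {
    struct sigaction sa;
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval every_second;
    every_second.it_interval.tv_sec = 1;   /* repeat every second */
    every_second.it_interval.tv_usec = 0;
    every_second.it_value.tv_sec = 1;      /* first tick after one second */
    every_second.it_value.tv_usec = 0;
    setitimer(ITIMER_REAL, &every_second, NULL);
}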
Another, and probably the most devastating argument against user-level threads, is that programmers generally want threads precisely in applications where the threads block often, as, for example, in a multithreaded Web server. These threads are constantly making system calls. Once a trap has occurred to the kernel to carry out the system call, it is hardly any more work for the kernel to switch threads if the old one has blocked, and having the kernel do this eliminates the need for constantly making select system calls that check to see if read system calls are safe. For applications that are essentially entirely CPU bound and rarely block, what is the point of having threads at all? No one would seriously propose computing the first n prime numbers or playing chess using threads because there is nothing to be gained by doing it that way.
User threads are managed in userspace - that means scheduling, switching, etc. are not from the kernel.
Since, ultimately, the OS kernel is responsible for context switching between "execution units", your user threads must be associated (i.e., "mapped") with a kernel-schedulable object - a kernel thread†1.
So, given N user threads - you could use N kernel threads (a 1:1 map). That allows you to take advantage of the kernel's hardware multi-processing (running on multiple CPUs) and be a pretty simplistic library - basically just deferring most of the work to the kernel. It does, however, make your app portable between OS's as you're not directly calling the kernel thread functions. I believe that POSIX Threads (PThreads) is the preferred *nix implementation, and that it follows the 1:1 map (making it virtually equivalent to a kernel thread). That, however, is not guaranteed as it'd be implementation dependent (a main reason for using PThreads would be portability between kernels).
Or, you could use only 1 kernel thread. That'd allow you to run on non multitasking OS's, or be completely in charge of scheduling. Windows' User Mode Scheduling is an example of this N:1 map.
Or, you could map to an arbitrary number of kernel threads - a N:M map. Windows has Fibers, which would allow you to map N fibers to M kernel threads and cooperatively schedule them. A threadpool could also be an example of this - N workitems for M threads.
†1: A process has at least 1 kernel thread, which is the actual execution unit. Also, a kernel thread must be contained in a process. OS's must schedule the thread to run - not the process.
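As a small illustration of the 1:1 case: with Linux's NPTL implementation of POSIX threads, each pthread_create() produces a user thread backed by its own kernel-schedulable thread, so the two workers below can genuinely run in parallel on different CPUs (a sketch, assuming a 1:1 pthread implementation).

/* Build with: cc -pthread one_to_one.c */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *name) {
    for (int i = 0; i < 3; i++)
        printf("%s: iteration %d\n", (const char *)name, i);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, "thread A");   /* 1 user : 1 kernel thread */
    pthread_create(&b, NULL, worker, "thread B");
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}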
This is a question about thread library implementation.
In Linux, a thread (or task) can be in user space or in kernel space. A process enters kernel space when it asks the kernel to do something via a syscall (read, write or ioctl).
There are also so-called kernel threads that always run in kernel space and do not represent any user process.
According to Wikipedia and Oracle, user-level threads are actually in a layer mounted on the kernel threads; not that kernel threads execute alongside user-level threads but that, generally speaking, the only entities that are actually executed by the processor/OS are kernel threads.
For example, assume that we have a program with 2 user-level threads, both mapped to (i.e. assigned) the same kernel thread. Sometimes, the kernel thread runs the first user-level thread (and it is said that currently this kernel thread is mapped to the first user-level thread) and some other times the kernel thread runs the second user-level thread. So we say that we have two user-level threads mapped to the same kernel thread.
As a clarification:
The core of an OS is called its kernel, so the threads at the kernel level (i.e. the threads that the kernel knows of and manages) are called kernel threads, the calls to the OS core for services can be called kernel calls, and so on. The only definite relation between these "kernel" things is that they are strongly related to the OS core, nothing more.