Does asynchronous I/O consume threads? - scala

I'm trying to grok akka actors and determine their benefits. I understand that many actors can share the same thread, thus gaining huge efficiencies - but in the context of a web application, the web container should do the same between requests, correct?
So the benefit may come down to I/O: blocking I/O ties up a thread that no one else can use.
Does asynchronous I/O consume a thread or not? When I get a future for some I/O result, will a thread be used while that I/O completes?

The Java asynchronous I/O model is described here at an easy enough level to grasp. The basic idea is that there's an internal thread pool which retrieves completed I/O notifications from the kernel and then dispatches to other threads to perform the required actions.
So, in a sense, yes, it uses threads. And here's something else to consider: so does everything. Every piece of software out there requires that a process, at some point, check whether a piece of I/O has completed so that it can perform follow-up tasks (well, it could be fire-and-forget, but that's somewhat limited for practical uses). On nodejs, famous for its asynchronous I/O, that thread is called the "event loop" (though the overall model is very different).
The point here is that there is no one-to-one correspondence between threads and I/O operations. Instead, there's a single internal thread pool that's responsible for receiving all asynchronous I/O completion events and taking whatever actions are required on completion.
Perhaps a better question is: does asynchronous I/O in Java consume threads in proportion to the number of I/O requests being processed? No, it doesn't; it consumes a fixed number of threads. A more useful question: when initiating an asynchronous I/O operation in Java, does that block the thread that initiated it? No, it does not; it returns immediately. And the question most relevant to the topic: does asynchronous I/O in Java use threads from the actor thread pool? No, it doesn't.
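Here's a minimal sketch of that non-blocking initiation, using java.nio's AsynchronousFileChannel from Scala. The file name, buffer size, and the final sleep are illustrative assumptions, not part of the original answer:

    import java.nio.ByteBuffer
    import java.nio.channels.{AsynchronousFileChannel, CompletionHandler}
    import java.nio.file.{Paths, StandardOpenOption}

    object AsyncReadDemo extends App {
      val channel = AsynchronousFileChannel.open(
        Paths.get("data.txt"), StandardOpenOption.READ) // hypothetical file
      val buffer = ByteBuffer.allocate(1024)

      println(s"initiating read on ${Thread.currentThread.getName}")

      // read() returns immediately; no application thread blocks while
      // the kernel performs the I/O. The handler later runs on a thread
      // from the channel's internal default pool, not on this one.
      channel.read(buffer, 0L, buffer,
        new CompletionHandler[Integer, ByteBuffer] {
          def completed(bytesRead: Integer, buf: ByteBuffer): Unit = {
            println(s"read $bytesRead bytes on ${Thread.currentThread.getName}")
            channel.close()
          }
          def failed(exc: Throwable, buf: ByteBuffer): Unit = {
            exc.printStackTrace()
            channel.close()
          }
        })

      Thread.sleep(1000) // keep the JVM alive long enough to see the callback
    }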
Next, to the future returned by asynchronous I/O. Until the I/O completes, no thread is used. However, there is a thread pool assigned to the completion of that future, and, when the I/O completes, one thread from that pool will be used to perform the actions you associated with the future's completion. Once those actions are finished, the thread is returned to that pool. That pool is probably not the same as the thread pool used by the actors (though I suppose there might be a way to make it so).
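The same separation can be seen with plain Scala futures. In this sketch, the pool size is arbitrary and a Promise stands in for the kernel-side completion; no thread is tied up while the future is pending, and the callback only borrows a pool thread once completion arrives:

    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future, Promise}

    object FutureCompletionDemo extends App {
      // A dedicated callback pool, distinct from any actor dispatcher.
      val pool = Executors.newFixedThreadPool(2)
      implicit val callbackPool: ExecutionContext =
        ExecutionContext.fromExecutorService(pool)

      val promise = Promise[String]()
      val result: Future[String] = promise.future

      // Registering the callback consumes no thread; it is just stored.
      result.foreach { value =>
        println(s"completed with '$value' on ${Thread.currentThread.getName}")
      }

      // Stand-in for the internal notification thread completing the I/O.
      promise.success("bytes from the wire")

      Thread.sleep(500) // give the callback time to run
      pool.shutdown()   // non-daemon threads would otherwise keep the JVM alive
    }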

TL;DR on the accepted answer: No, threads are not consumed by asynchronous I/O, but threads are used to retrieve the I/O results from the kernel.
Also, from Play Framework: async I/O without the thread pool and callback hell:
On Evented servers, waiting for I/O is very cheap: idle requests have negligible cost, as they don’t hold up a thread.

Related

Should we use a thread pool for long-running threads?

Should we use a thread pool for long-running threads, or start our own threads? Is there some design pattern for this?
Unfortunately, it depends. There is no hard and fast rule saying that you should always employ thread pools.
Thread pools offer two main things:
Delegated creation/reuse of threads.
Back-pressure
IMO, it's the back-pressure property that's interesting, but often the most poorly understood. Your machine runs on a limited set of resources. If you have (say) 8 CPU cores and they are all busy working, you would like some way to signal that adding more work (submitting more tasks) isn't going to help, at least not in terms of latency.
This is the reason java.util.concurrent.ExecutorService implementations allow you to specify a java.util.concurrent.BlockingQueue of your choice. When this queue fills up, new submissions are rejected by default, or handled by a saturation policy of your choosing; with CallerRunsPolicy, for example, the submitting thread ends up executing the task itself, which propagates the back-pressure to the caller.
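A sketch of such a saturation policy in Scala (assuming Scala 2.12+ for the lambda-as-Runnable; the sizes are arbitrary). With CallerRunsPolicy, a full queue pushes work back onto the submitting thread, which is one concrete form of back-pressure:

    import java.util.concurrent.{ArrayBlockingQueue, ThreadPoolExecutor, TimeUnit}

    object BackPressureDemo extends App {
      // 4 threads, at most 16 queued tasks. When both are saturated, the
      // CallerRunsPolicy makes the submitting thread execute the task
      // itself, which naturally slows the producer down.
      val pool = new ThreadPoolExecutor(
        4, 4, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue[Runnable](16),
        new ThreadPoolExecutor.CallerRunsPolicy)

      for (i <- 1 to 100) {
        pool.execute { () =>
          Thread.sleep(50) // simulated work
          println(s"task $i ran on ${Thread.currentThread.getName}")
        }
      }

      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES)
    }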
Whether or not to have long-running tasks inside the thread pool depends on what they're doing. If a task is constantly busy (meaning it will never complete) then it will permanently occupy a slot in the pool, which rather defeats the purpose of pooling.
Regarding delegated creation/reuse of threads: maybe you could have two pools, one for long-running tasks and one for other tasks. Or perhaps a long-running pool with one single slot; this will prevent two long-running tasks from running at the same time, provided that is what you want.
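For instance, a minimal Scala sketch of that split; the pool sizes and the polling loop are assumptions for illustration:

    import java.util.concurrent.Executors

    object TwoPoolsDemo extends App {
      val generalPool = Executors.newFixedThreadPool(8)        // short tasks
      val longRunningPool = Executors.newSingleThreadExecutor() // one slot only

      generalPool.execute(() => println("short-lived task"))

      // At most one long-running task at a time; a second submission
      // simply queues behind this one.
      longRunningPool.execute { () =>
        try while (true) {
          // poll a queue, tail a log, etc.
          Thread.sleep(1000)
        } catch { case _: InterruptedException => () } // exit on shutdownNow()
      }

      Thread.sleep(3000) // let both run for a moment
      generalPool.shutdown()
      longRunningPool.shutdownNow() // interrupts the polling loop
    }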
As you can see, there is no single good answer. It really boils down to what you are trying to achieve and how you want to use the resources at hand.

Should I use IOCPs or overlapped WSASend/Receive?

I am investigating the options for asynchronous socket I/O on Windows. There is obviously more than one option: I can use WSASend... with an overlapped structure providing either a completion callback or an event, or I could use IOCPs and the (new) thread pool. From what I usually read, the latter option is the recommended one.
However, it is not clear to me, why I should use IOCPs if the completion routine suffices for my goal: tell the socket to send this block of data and inform me if it is done.
I understand that the IOCP machinery, in combination with CreateThreadpoolIo etc., uses the OS thread pool. However, doesn't "normal" overlapped I/O also have to use separate threads? So what is the difference/disadvantage? Is my callback called by an I/O thread, blocking other work?
Thanks in advance,
Christoph
You can use either but, for servers, IOCP with the 'completion queue' will have better performance, in general, because it can use multiple client<>server threads, either with CreateThreadpoolIo or some user-space thread pool. Obviously, in this case, dedicated handler threads are usual.
Overlapped completion-routine I/O is more useful for clients, IMHO. The completion-routine is fired by an Asynchronous Procedure Call that is queued to the thread that initiated the I/O request, (WSASend, WSARecv). This implies that that thread must be in a position to process the APC and typically this means a while(true) loop around some 'blahEx()' call. This can be useful because it's fairly easy to wait on a blocking queue, or other inter-thread signal, that allows the thread to be supplied with data to send and the completion routine is always handled by that thread. This I/O mechanism leaves the 'hEvent' OVL parameter free to use - ideal for passing a comms buffer object pointer into the completion routine.
Overlapped I/O using an actual synchro event/Semaphore/whatever for the overlapped hEvent parameter should be avoided.
Windows IOCP documentation recommends no more than one thread per available core per completion port (hyperthreading doubles the number of logical cores). Since using IOCPs results in what is, for all practical purposes, an event-driven application, adding thread pools on top only burdens the scheduler with unnecessary work.
If you think about it you'll understand why: an event should be serviced in its entirety (or placed in some queue after initial processing) as quickly as possible. Suppose five events are queued to an IOCP on a 4-core computer. If there are eight threads associated with the IOCP, you run the risk of the scheduler interrupting one event to begin servicing another on a different thread, which is inefficient. It can be dangerous too, if the interrupted thread was inside a critical section. With four threads you can process four events simultaneously, and as soon as one event has been completed you can start on the last remaining event in the IOCP queue.
Of course, you may have thread pools for non-IOCP related processing.
EDIT:
The socket (file handles work fine too) is associated with an IOCP. The completion routine waits on the IOCP. As soon as a requested read from or write to the socket completes, the OS - via the IOCP - releases the completion routine waiting on the IOCP and returns the additional information you provided when you called the read or write (I usually pass a pointer to a control block). So the completion routine immediately "knows" where to find the information pertinent to the completion.
If you passed information referring to a control block (or similar) then that control block (probably) needs to keep track of which operation has completed so it knows what to do next. The IOCP itself neither knows nor cares.
If you're writing a server attached to the internet, the server would issue a read to wait for client input. That input may arrive a millisecond or a week later, and when it does the IOCP will release the completion routine, which analyzes the input. Typically it responds with a write containing the data requested and then waits on the IOCP again. When the write completes, the IOCP again releases the completion routine, which sees that the write has completed, (typically) issues a new read, and a new cycle starts.
So an IOCP-based application typically consumes very little (or no) CPU until the moment a completion occurs at which time the completion routine goes full tilt until it has finished processing, sends a new I/O request and again waits on the completion port. Apart from the IOCP timeout (which can be used to signal house-keeping or such) all I/O-related stuff occurs in the OS.
To further complicate (or simplify) things, it is not necessary that sockets be serviced using the WSA routines; the Win32 functions ReadFile and WriteFile work just fine.
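This answer is about the Win32 API, but the same completion-driven cycle can be sketched on the JVM, where java.nio's asynchronous socket channels are implemented on top of IOCP on Windows. In this sketch, `channel` is assumed to be an already-connected socket, and the response-building step is elided:

    import java.nio.ByteBuffer
    import java.nio.channels.{AsynchronousSocketChannel, CompletionHandler}

    object IocpStyleCycle {
      // read -> analyze -> write -> read again, with no thread blocked
      // between completions.
      def serve(channel: AsynchronousSocketChannel): Unit = {
        val buf = ByteBuffer.allocate(4096)
        channel.read(buf, buf, new CompletionHandler[Integer, ByteBuffer] {
          def completed(n: Integer, b: ByteBuffer): Unit = {
            if (n < 0) { channel.close(); return } // peer closed the connection
            b.flip()
            // ... analyze the input and build a response into `b` ...
            channel.write(b, b, new CompletionHandler[Integer, ByteBuffer] {
              def completed(written: Integer, wb: ByteBuffer): Unit =
                serve(channel) // write done: issue the next read
              def failed(exc: Throwable, wb: ByteBuffer): Unit = channel.close()
            })
          }
          def failed(exc: Throwable, b: ByteBuffer): Unit = channel.close()
        })
      }
    }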

Synchronization Objects and Threadpool

This is a Windows server application being written in VC++. I am going to use a thread pool to handle the various request objects coming into the server. Obviously, while one thread is working on a particular request and writing its response to the socket, other threads have to wait until it finishes. I fear this is probably not an efficient way to use the thread pool.
My question therefore is:
If a thread in the thread pool is waiting for a thread synchronization object to be freed, that is not an efficient use of the pool. Is there any way we can avoid this? (Possibly by knowing in advance whether the object is free, before assigning a thread to the work.)
You could have just one thread writing things to the socket. The other threads put their data in a queue, and the output thread reads from the queue and writes data to the socket.
Assuming, of course, that the threads don't need to wait for a response from the socket in order to continue.
Additionally, it really depends on how often and for how long you expect threads to be waiting on the socket. If it only happens occasionally, then there is no problem. Threads will process at their maximum rate and infrequently they'll have to wait for another thread to finish sending before they send. But if the threads spend a lot of time waiting for the socket, then you probably want to find another way to do things. The output queue with a dedicated thread handling the socket works quite well and is pretty easy to set up.
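A minimal Scala sketch of that dedicated-writer pattern; the class and method names here are invented for illustration:

    import java.io.OutputStream
    import java.util.concurrent.LinkedBlockingQueue

    // Worker threads enqueue responses; exactly one thread ever touches
    // the socket's output stream, so no synchronization on the socket
    // itself is needed.
    class SocketWriter(out: OutputStream) {
      private val queue = new LinkedBlockingQueue[Array[Byte]]()

      private val writer = new Thread(() => {
        try while (true) {
          val data = queue.take() // blocks until a response is available
          out.write(data)
          out.flush()
        } catch { case _: InterruptedException => () } // stop() requested
      }, "socket-writer")
      writer.setDaemon(true)
      writer.start()

      // Safe to call from any worker thread; returns immediately since
      // the queue is unbounded (swap in a bounded one for back-pressure).
      def enqueue(response: Array[Byte]): Unit = queue.put(response)

      def stop(): Unit = writer.interrupt()
    }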

If one thread is busy on I/O will the entire process be blocked

In a multi-threaded process, if one thread is busy on I/O, will the entire process be blocked?
AFAIK, it depends entirely on how the programmer manages the threads inside the program.
If another thread has no I/O pending, the processor will not sit idle; it will start executing that thread. However, if the process is split into threads in such a way that one thread waits for the result of another, then the entire process will be blocked.
Please comment if more information needs to be added.
Is there any other explanation?
If the process has only one thread, then yes.
If the process has multiple threads, then normally no, provided the operating system supports multithreading.
This question can also be addressed in terms of the underlying implementation of user threads. There are different multithreading models; to implement user threads, they have to be mapped to kernel threads:
Many-to-One: Many user threads to one kernel thread
One-to-One: Each user thread is assigned to a kernel thread.
Many-to-Many: Many user threads are multiplexed over a (smaller or equal) number of kernel threads.
In the many-to-one case, a single blocking operation (system call) within any one thread can block the whole process. This disadvantage is not present in the one-to-one model.
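A quick way to see the one-to-one behavior on the JVM, where each java.lang.Thread maps to a kernel thread (prior to virtual threads): one thread blocking (simulated here with sleep) does not stop its sibling:

    object OneToOneDemo extends App {
      val blocked = new Thread(() => {
        Thread.sleep(3000) // stand-in for a blocking system call
        println("blocked thread woke up")
      })
      val busy = new Thread(() => {
        for (i <- 1 to 5) {
          println(s"other thread still running: $i")
          Thread.sleep(300)
        }
      })
      blocked.start()
      busy.start()
      blocked.join()
      busy.join()
    }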

Terminology about thread

If a function in a thread is going to return, how can we describe this behavior?
The thread returns.
The thread is dying.
What's the meaning of "thread is dead"?
In my understanding, threads are basically kernel data structures. You can create and destroy threads through the system APIs. If you just create a thread, start it executing, and it runs out of code, the kernel will probably put it into a non-executing state. In unmanaged code you still have to release that resource.
Then there's the thread pool. In that case, you queue up work to be done by the thread pool, and the platform takes care of picking a thread and executing your work. When the work is complete, the thread is returned to the thread pool. The platform takes care of creating and destroying threads to balance the available threads against the workload and the system resources.
As of Java 1.3, the six-state thread model was introduced. It includes the following states:
Ready-to-run: The thread has been created and is waiting to be picked for running by the thread scheduler.
Running: The thread is executing.
Waiting: The thread is in a blocked state while waiting for some external processing to finish (like I/O).
Sleeping: The thread is forced to sleep via .sleep()
Blocked: On I/O: will move into state 1 after finishing (e.g. reading a byte of data). On sync: will move into state 1 after a lock is acquired.
Dead (Terminated): The thread has finished working and cannot be resumed.
The term "Dead" is rarely used today, almost totally changed to "Terminated". These two are equivalent.
Most thread APIs work by asking the operating system to run a particular function, supplied by you, on your behalf. When this function eventually returns (via, for example, a return statement or reaching the end of its code), the operating system ends the thread.
As for "dead" threads - that's not a term I've seen used in thread APIs.