In Scala, does Futures.awaitAll terminate the thread on timeout?

So I'm writing a mini timeout library in Scala; it looks very similar to the code here: How do I get hold of exceptions thrown in a Scala Future?
The function I execute is either going to complete successfully, or block forever, so I need to make sure that on a timeout the executing thread is cancelled.
Thus my question is: On a timeout, does awaitAll terminate the underlying actor, or just let it keep running forever?
One alternative that I'm considering is to use the Java Future API to do this, as there is an explicit cancel() method one can call.

[Disclaimer - I'm new to Scala actors myself]
As I read it, scala.actors.Futures.awaitAll waits until all the futures in the list are resolved OR until the timeout. It will not Future.cancel, Thread.interrupt, or otherwise attempt to terminate a future; you get to come back later and wait some more.
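A quick illustration of that behaviour, as I understand it (a sketch against the old scala.actors.Futures API; the durations are arbitrary):

import scala.actors.Futures._

val fast = future { Thread.sleep(1000); "fast" }
val slow = future { Thread.sleep(60000); "slow" }

// awaitAll yields Some(result) for futures that finished in time and None
// for those that missed the deadline.
println(awaitAll(2000, fast, slow)) // List(Some(fast), None)
// `slow` is still running at this point; nothing was interrupted.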
The Future.cancel (on java.util.concurrent.Future) may be suitable; however, be aware that your code may need to participate in making the cancellation happen - it doesn't necessarily come for free. Future.cancel cancels a task that is scheduled but not yet started. For a task that is already running, it interrupts the executing thread [setting a flag that can be checked]... which may or may not acknowledge the interrupt. Review Thread.interrupt and Thread.isInterrupted(). Your long-running task would normally check whether it is being interrupted (your code), and self-terminate. Various methods (e.g. Thread.sleep, Object.wait and others) respond to the interrupt by throwing InterruptedException. You need to review and understand that mechanism to ensure your code will meet your needs within those constraints.
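A hedged sketch of that cooperative pattern with java.util.concurrent (the timeout and the loop body here are illustrative, not prescriptive):

import java.util.concurrent.{Executors, TimeUnit, TimeoutException}

val pool = Executors.newSingleThreadExecutor()
val task = pool.submit(new Runnable {
  def run(): Unit = {
    // The task must cooperate: check the interrupt flag between units of work.
    while (!Thread.currentThread().isInterrupted) {
      // ... one bounded unit of work ...
    }
  }
})

try task.get(5, TimeUnit.SECONDS) // bounded join
catch {
  case _: TimeoutException => task.cancel(true) // sets the interrupt flag
}
pool.shutdown()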

Related

How does the computer implement callbacks?

I already know the general usage of callbacks: first, I register a "callback function", and when some event occurs, this function is triggered (executed).
What confuses me is how the program knows that the event has occurred. The only solution I can come up with is polling. Is there a better way to check whether the event has occurred, in less than O(n) time?
All right, maybe the above question is too abstract. A more concrete version: does epoll_wait avoid spending O(n) time checking for ready file descriptors?
If so, how does it do it?
Is there a callback mechanism that is essentially different from polling?
Usually, but not exclusively, callbacks get called after some peripheral I/O device signals an operation completion by raising a hardware interrupt. A long chain of stuff involving things like driver interrupt handlers, semaphores, protection ring changes, thread and process context changes, message assembly/enqueueing/dequeueing/handling/dispatching etc. then causes your callback to be called, maybe by some system thread, or from a message-handling or signal-handling thread of your own that has to conform to a specific structure or constraint.
So no, polling is generally unnecessary, and unwanted.
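On the JVM, the readiness-notification mechanism behind epoll is surfaced as java.nio.channels.Selector (typically backed by epoll on Linux). A minimal sketch, with the port number picked arbitrarily:

import java.net.InetSocketAddress
import java.nio.channels.{SelectionKey, Selector, ServerSocketChannel}

val selector = Selector.open()
val server = ServerSocketChannel.open()
server.bind(new InetSocketAddress(8080))
server.configureBlocking(false)
server.register(selector, SelectionKey.OP_ACCEPT)

while (true) {
  selector.select() // blocks until the kernel reports readiness; no user-space polling
  val keys = selector.selectedKeys().iterator()
  while (keys.hasNext) {
    val key = keys.next(); keys.remove()
    if (key.isAcceptable) {
      val client = server.accept() // the "callback": handle the ready event
      client.close()
    }
  }
}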

Clarification about a Scala Future that never completes and its effect on other callbacks

While re-reading scala-lang.org's page detailing Future here, I have stumbled upon the following sentence:
In the event that some of the callbacks never complete (e.g. the callback contains an infinite loop), the other callbacks may not be executed at all. In these cases, a potentially blocking callback must use the blocking construct (see below).
Why may the other callbacks not be executed at all? I may install a number of callbacks on a given Future. The thread that completes the Future may or may not execute the callbacks. But just because one callback is misbehaving, the rest should not be penalized, I think.
One possibility I can think of is the way ExecutionContext is configured. If it is configured with one thread, then this may happen, but that is a specific behaviour and a not generally expected behaviour.
Am I missing something obvious here?
Callbacks are called within an ExecutionContext that has an ultimately limited number of threads - if not limited by the specific context implementation, then by the underlying operating system and/or hardware itself.
Let's say your system's limit is OS_LIMIT threads. You create OS_LIMIT + 1 callbacks. From those, OS_LIMIT callbacks immediately get a thread each - and none ever terminate.
How can you guarantee that the one remaining callback ever gets a thread?
Sure, there could be some detection mechanisms built into the Scala library, but it's not possible in the general case to make an optimal implementation: maybe you want the callback to run for a month.
Instead (and this seems to be the approach in the Scala library), you could provide facilities for handling situations that you, the developer, know are risky. This removes the element of surprise from the system.
Perhaps most importantly - it enables the developer to "bake in" the necessary information about handler/task characteristics directly into his/her program, rather than relying on some obscure piece of language functionality (which may change from version to version).
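For reference, the facility the quoted docs are pointing at is scala.concurrent.blocking; a minimal sketch (the sleep stands in for any long-blocking call):

import scala.concurrent.{Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global

val f = Future {
  blocking {
    // Marking the section lets the global (fork/join-based) context compensate,
    // e.g. by spawning an extra thread, instead of starving other callbacks.
    Thread.sleep(10000)
  }
  "done"
}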

Grand Central Dispatch async vs sync [duplicate]

I'm reading the docs on dispatch queues for GCD, and in them they say the queues are FIFO, so I am wondering what effect this has on async / sync dispatches?
From my understanding, async executes things in the order that it receives them, while sync executes things serially...
But when you write your GCD code, you decide the order in which things happen... so as long as you know what's going on in your code, you should know the order in which things execute...
My questions are: where's the benefit of async here? Am I missing something in my understanding of these two things?
The first answer isn't quite complete, unfortunately. Yes, sync will block and async will not, however there are additional semantics to take into account. Calling dispatch_sync() will also cause your code to wait until each and every pending item on that queue has finished executing, also making it a synchronization point for said work. dispatch_async() will simply submit the work to the queue and return immediately, after which it will be executed "at some point" and you need to track completion of that work in some other way (usually by nesting one dispatch_async inside another dispatch_async - see the man page for example).
sync means the function WILL BLOCK the current thread until it has completed, async means it will be handled in the background and the function WILL NOT BLOCK the current thread.
If you want serial execution of blocks, check out the creation of a serial dispatch queue.
From the man page:
FUNDAMENTALS
Conceptually, dispatch_sync() is a convenient wrapper around dispatch_async() with the addition of a semaphore to wait for completion of the block, and a wrapper around the block to signal its completion.
See dispatch_semaphore_create(3) for more information about dispatch semaphores. The actual implementation of the dispatch_sync() function may be optimized and differ from the above description.
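To make the man page's description concrete, here is a hedged sketch of the same construction in Scala (the main language of this page) rather than C - a "sync" call built from an "async" submission plus a semaphore. dispatchSync is a made-up name, and the SAM conversion assumes Scala 2.12+:

import java.util.concurrent.Semaphore
import scala.concurrent.ExecutionContext

def dispatchSync(ec: ExecutionContext)(block: => Unit): Unit = {
  val done = new Semaphore(0)
  ec.execute { () =>
    try block
    finally done.release() // signal completion, like the man page's wrapper
  }
  done.acquire() // the caller blocks here until the block has run
}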
Tasks can be performed synchronously or asynchronously.
A synchronous function returns control to the current queue only after its task is finished. It blocks the queue and waits until the task is finished.
An asynchronous function returns control to the current queue right after the task has been sent to be performed on a different queue. It doesn't wait until the task is finished, and it doesn't block the queue.
Only with asynchronous dispatch can we add a delay -> asyncAfter(deadline: ...)
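The same blocking vs. non-blocking distinction, sketched in Scala (this page's main language) purely as an analogy to the GCD calls:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val task = Future { Thread.sleep(500); 42 }

// async-style: register a handler and keep going; the caller is not blocked.
task.onComplete(result => println("handled later: " + result))

// sync-style: the caller blocks until the task completes (or times out).
val result = Await.result(task, 2.seconds)
println("got: " + result)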

Scala actors: receive vs react

Let me first say that I have quite a lot of Java experience, but have only recently become interested in functional languages. Recently I've started looking at Scala, which seems like a very nice language.
However, I've been reading about Scala's Actor framework in Programming in Scala, and there's one thing I don't understand. In chapter 30.4 it says that using react instead of receive makes it possible to re-use threads, which is good for performance, since threads are expensive in the JVM.
Does this mean that, as long as I remember to call react instead of receive, I can start as many Actors as I like? Before discovering Scala, I've been playing with Erlang, and the author of Programming Erlang boasts about spawning over 200,000 processes without breaking a sweat. I'd hate to do that with Java threads. What kind of limits am I looking at in Scala as compared to Erlang (and Java)?
Also, how does this thread re-use work in Scala? Let's assume, for simplicity, that I have only one thread. Will all the actors that I start run sequentially in this thread, or will some sort of task-switching take place? For example, if I start two actors that ping-pong messages to each other, will I risk deadlock if they're started in the same thread?
According to Programming in Scala, writing actors to use react is more difficult than with receive. This sounds plausible, since react doesn't return. However, the book goes on to show how you can put a react inside a loop using Actor.loop. As a result, you get
loop {
  react {
    ...
  }
}
which, to me, seems pretty similar to
while (true) {
  receive {
    ...
  }
}
which is used earlier in the book. Still, the book says that "in practice, programs will need at least a few receive's". So what am I missing here? What can receive do that react cannot, besides return? And why do I care?
Finally, coming to the core of what I don't understand: the book keeps mentioning how using react makes it possible to discard the call stack to re-use the thread. How does that work? Why is it necessary to discard the call stack? And why can the call stack be discarded when a function terminates by throwing an exception (react), but not when it terminates by returning (receive)?
I have the impression that Programming in Scala has been glossing over some of the key issues here, which is a shame, because otherwise it's a truly excellent book.
First, each actor waiting on receive is occupying a thread. If it never receives anything, that thread will never do anything. An actor on react does not occupy any thread until it receives something. Once it receives something, a thread gets allocated to it, and the actor is initialized on that thread.
Now, the initialization part is important. A receiving thread is expected to return something, a reacting thread is not. So the previous stack state at the end of the last react can be, and is, wholly discarded. Not needing to either save or restore the stack state makes the thread faster to start.
There are various performance reasons why you might want one or the other. As you know, having too many threads in Java is not a good idea. On the other hand, because you have to attach an actor to a thread before it can react, it is faster to receive a message than react to it. So if you have actors that receive many messages but do very little with them, the additional delay of react might make it too slow for your purposes.
The answer is "yes" - if your actors are not blocking on anything in your code and you are using react, then you can run your "concurrent" program within a single thread (try setting the system property actors.maxPoolSize to find out).
One of the more obvious reasons why it is necessary to discard the call stack is that otherwise the loop method would end in a StackOverflowError. As it is, the framework rather cleverly ends a react by throwing a SuspendActorException, which is caught by the looping code which then runs the react again via the andThen method.
Have a look at the mkBody method in Actor and then the seq method to see how the loop reschedules itself - terribly clever stuff!
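A toy sketch of that control flow (emphatically not the real scala.actors internals - ToyReactor and Suspend are made up): react "returns" by throwing, the catcher discards the stack, and the saved continuation runs later:

object ToyReactor {
  private object Suspend extends RuntimeException
  private var continuation: Any => Unit = _

  def react(handler: Any => Unit): Nothing = {
    continuation = handler // remember what to do with the next message
    throw Suspend          // unwind the whole call stack instead of returning
  }

  def send(msg: Any): Unit =
    try continuation(msg)
    catch { case Suspend => () } // the handler re-armed itself via react

  def main(args: Array[String]): Unit = {
    def body(): Unit = react { msg =>
      println("got: " + msg)
      body() // re-arm, like Actor.loop
    }
    try body() catch { case Suspend => () }
    send("hello")
    send("world")
  }
}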
Those statements about "discarding the stack" confused me for a while too, and I think I get it now; this is my current understanding. In the case of receive there is a dedicated thread blocking on the message (using object.wait() on a monitor), which means that the complete thread stack is available and ready to continue from the point of "waiting" on receiving a message.
For example if you had the following code
val a = 10
while (!done) {
  receive {
    case msg => println("MESSAGE RECEIVED: " + msg)
  }
  println("after receive and printing a " + a)
}
the thread would wait in the receive call until a message is received, and would then continue on and print "after receive and printing a 10", with the value 10 coming from the stack frame that was preserved while the thread was blocked.
In the case of react there is no such dedicated thread; the whole body of the react method is captured as a closure and is executed by some arbitrary thread when the corresponding actor receives a message. This means that only the statements captured in that closure will be executed, and that's where the return type of "Nothing" comes into play. Consider the following code
val a = 10
while (!done) {
  react {
    case msg => println("MESSAGE RECEIVED: " + msg)
  }
  println("after react and printing a " + a)
}
If react had a return type of Unit, it would be legal to have statements after the react call (in the example, the println that prints "after react and printing a 10"), but in reality those would never get executed, as only the body of the react method is captured and sequenced for execution later (on the arrival of a message). Since the contract of react has the return type "Nothing", there cannot be any statements following react, and therefore there is no reason to maintain the stack. In the example above, the variable a would not have to be maintained, as the statements after the react call are not executed at all. Note that all the variables needed by the body of react are already captured in the closure, so it can execute just fine.
The Java actor framework Kilim actually does maintain the stack, saving it and then restoring it when the react gets a message.
Just to have it here:
Event-Based Programming without Inversion of Control
These papers are linked from the scala api for Actor and provide the theoretical framework for the actor implementation. This includes why react may never return.
I haven't done any major work with Scala/Akka, however I understand that there is a very significant difference in the way actors are scheduled.
Akka is just a smart thread pool which is time-slicing the execution of actors...
Every time slice will be one message executed to completion by an actor, unlike in Erlang, which could switch per instruction?!
This leads me to think that react is better, as it hints to the current thread to consider other actors for scheduling, whereas receive "might" engage the current thread to continue executing other messages for the same actor.

Practical use of futures? Ie, how to kill them?

Futures are very convenient, but in practice, you may need some guarantees on their execution. For example, consider:
import scala.actors.Futures._

def slowFn(time: Int) = {
  Thread.sleep(time * 1000)
  println("%d second fn done".format(time))
}

val fs = List(future(slowFn(2)), future(slowFn(10)))
awaitAll(5000, fs: _*)
println("5 second expiration. Continuing.")
Thread.sleep(12000) // i.e. more calculations
println("done with everything")
The idea is to kick off some slow running functions in parallel. But we wouldn't want to hang forever if the functions executed by the futures don't return. So we use awaitAll() to put a timeout on the futures. But if you run the code, you see that the 5 second timer expires, but the 10 second future continues to run and returns later. The timeout doesn't kill the future; it just limits the join wait.
So how do you kill a future after a timeout period? It seems like futures can't be used in practice unless you're certain that they will return in a known amount of time. Otherwise, you run the risk of losing threads in the thread pool to non-terminating futures until there are none left.
So the questions are: How do you kill futures? What are the intended usage patterns for futures given these risks?
Futures are intended to be used in settings where you do need to wait for the computation to complete, no matter what. That's why they are described as being used for slow running functions. You want that function's result, but you have other stuff you can be doing meanwhile. In fact, you might have many futures, all independent of each other that you may want to run in parallel, while you wait until all complete.
The timer just provides a way to get partial results.
I think the reason Future can't simply be "killed" is exactly the same as why java.lang.Thread.stop() is deprecated.
While a Future is running, a Thread is required. In order to stop a Future without calling stop() on the executing Thread, application-specific logic is needed: periodically checking an application-specific flag, or the interrupted status of the executing Thread, is one way to do it.
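A hedged sketch of that flag-based pattern, reusing the scala.actors.Futures API from the question (KillableFuture is a made-up name; the durations are arbitrary):

import scala.actors.Futures._

object KillableFuture {
  @volatile private var cancelled = false

  def main(args: Array[String]): Unit = {
    val f = future {
      // Poll the application-specific flag between bounded units of work.
      while (!cancelled) Thread.sleep(100)
      println("future observed the cancel flag and stopped")
    }
    awaitAll(5000, f) // bounded join, as in the question
    cancelled = true  // after the timeout, ask the future to stop itself
  }
}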