Scala actors: receive vs react

Let me first say that I have quite a lot of Java experience, but have only recently become interested in functional languages. Recently I've started looking at Scala, which seems like a very nice language.
However, I've been reading about Scala's Actor framework in Programming in Scala, and there's one thing I don't understand. In chapter 30.4 it says that using react instead of receive makes it possible to re-use threads, which is good for performance, since threads are expensive in the JVM.
Does this mean that, as long as I remember to call react instead of receive, I can start as many Actors as I like? Before discovering Scala, I had been playing with Erlang, and the author of Programming Erlang boasts about spawning over 200,000 processes without breaking a sweat. I'd hate to do that with Java threads. What kind of limits am I looking at in Scala as compared to Erlang (and Java)?
Also, how does this thread re-use work in Scala? Let's assume, for simplicity, that I have only one thread. Will all the actors that I start run sequentially in this thread, or will some sort of task-switching take place? For example, if I start two actors that ping-pong messages to each other, will I risk deadlock if they're started in the same thread?
According to Programming in Scala, writing actors to use react is more difficult than with receive. This sounds plausible, since react doesn't return. However, the book goes on to show how you can put a react inside a loop using Actor.loop. As a result, you get
loop {
  react {
    ...
  }
}
which, to me, seems pretty similar to
while (true) {
  receive {
    ...
  }
}
which is used earlier in the book. Still, the book says that "in practice, programs will need at least a few receive's". So what am I missing here? What can receive do that react cannot, besides return? And why do I care?
Finally, coming to the core of what I don't understand: the book keeps mentioning how using react makes it possible to discard the call stack to re-use the thread. How does that work? Why is it necessary to discard the call stack? And why can the call stack be discarded when a function terminates by throwing an exception (react), but not when it terminates by returning (receive)?
I have the impression that Programming in Scala has been glossing over some of the key issues here, which is a shame, because otherwise it's a truly excellent book.

First, each actor waiting on receive is occupying a thread. If it never receives anything, that thread will never do anything. An actor on react does not occupy any thread until it receives something; once it does, a thread gets allocated to it and it is initialized in that thread.
Now, the initialization part is important. A receiving thread is expected to return something; a reacting thread is not. So the stack state at the end of the last react can be, and is, wholly discarded. Not having to save or restore the stack state makes the thread faster to start.
There are various performance reasons why you might want one or the other. As you know, having too many threads in Java is not a good idea. On the other hand, because you have to attach an actor to a thread before it can react, it is faster to receive a message than to react to it. So if you have actors that receive many messages but do very little with them, the additional delay of react might make them too slow for your purposes.
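A minimal sketch of the two styles, using the scala.actors API the book describes (the message values here are arbitrary): the first actor pins a thread while it waits, the second only borrows one when a message arrives.

import scala.actors.Actor._

// Thread-based: this actor's thread blocks inside receive between messages.
val receiver = actor {
  while (true) {
    receive {
      case msg => println("received: " + msg)
    }
  }
}

// Event-based: no thread is held between messages; the handler is
// scheduled on a pool thread only when a message arrives.
val reactor = actor {
  loop {
    react {
      case msg => println("reacted to: " + msg)
    }
  }
}

receiver ! "ping"
reactor ! "pong"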

The answer is "yes" - if your actors are not blocking on anything in your code and you are using react, then you can run your "concurrent" program within a single thread (try setting the system property actors.maxPoolSize to find out).
One of the more obvious reasons why it is necessary to discard the call stack is that otherwise the loop method would end in a StackOverflowError. As it is, the framework rather cleverly ends a react by throwing a SuspendActorException, which is caught by the looping code which then runs the react again via the andThen method.
Have a look at the mkBody method in Actor and then the seq method to see how the loop reschedules itself - terribly clever stuff!

Those statements about "discarding the stack" confused me for a while too, but I think I get it now; here is my current understanding. In the case of receive there is a dedicated thread blocking on the message (using Object.wait() on a monitor), which means the complete thread stack is available and ready to continue from the point where it waited for a message.
For example, if you had the following code
val a = 10
while (!done) {
  receive {
    case msg => println("MESSAGE RECEIVED: " + msg)
  }
  println("after receive and printing a " + a)
}
the thread would wait in the receive call until a message was received, then continue on and print "after receive and printing a 10", with the value 10 coming from the stack frame saved before the thread blocked.
In the case of react there is no such dedicated thread; the message handler passed to react is captured as a closure and executed by some arbitrary thread when the corresponding actor receives a message. This means only the statements captured in that closure will be executed, and that is where the return type of Nothing comes into play. Consider the following code:
val a = 10
while (!done) {
  react {
    case msg => println("MESSAGE RECEIVED: " + msg)
  }
  println("after react and printing a " + a)
}
If react had a return type of Unit, it would be legal to have statements after the react call (in the example, the println that prints "after react and printing a 10"), but in reality those would never get executed, as only the body of the react handler is captured and sequenced for execution later (on the arrival of a message). Since react's return type is Nothing, no statements can follow the react call, and therefore there is no reason to maintain the stack. In the example above, the variable a would not have to be maintained, since the statements after the react call are never executed. Note that all the variables needed by the body of react are already captured in the closure, so it can execute just fine.
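A brief sketch of the working pattern: any follow-up work must live inside the handler (or be sequenced explicitly via loop), since nothing after a react call ever runs.

import scala.actors.Actor._

val a = 10
val myActor = actor {
  loop {
    react {
      case msg =>
        println("MESSAGE RECEIVED: " + msg)
        // Follow-up work belongs here, inside the captured closure; the
        // closure still sees `a` because it was captured, not kept on a stack.
        println("after react and printing a " + a)
    }
  }
}

myActor ! "hi"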
The Java actor framework Kilim actually does the stack maintenance: it saves the stack, which then gets unrolled when the react receives a message.

Just to have it here:
Event-Based Programming without Inversion of Control
This paper is linked from the Scala API documentation for Actor and provides the theoretical framework for the actor implementation, including why react may never return.

I haven't done any major work with Scala/Akka; however, I understand that there is a very significant difference in the way actors are scheduled.
Akka is just a smart thread pool that time-slices the execution of actors...
Each time slice is one message executed to completion by an actor, unlike in Erlang, where scheduling can be per instruction.
This leads me to think that react is better, as it hints to the current thread to consider other actors for scheduling, whereas receive "might" engage the current thread in continuing to execute other messages for the same actor.

Related

play - how to wrap a blocking code with futures

I am trying to understand the difference between the 2 methods, in terms of functionality.
class MyService(blockService: BlockService) {

  def doSomething1(): Future[Boolean] = {
    // do
    // some non blocking
    // stuff
    val result = blockService.block()
    Future.successful(result)
  }

  def doSomething2(): Future[Boolean] = {
    Future {
      // do
      // some non blocking
      // stuff
      blockService.block()
    }
  }
}
To my understanding the difference between the 2 is which thread is the actual thread that will be blocked.
So if there is a thread thread_1 that executes doSomething1, thread_1 will be the one that is blocked, while if thread_1 executes doSomething2, a new thread, thread_2, will run it, and thread_2 is the one to be blocked.
Is this true?
If so, then there isn't really a preferred way to write this code? If I don't care which thread eventually gets blocked, the end result will be the same.
doSomething1 seems like a weird way to write this code; I would choose doSomething2.
Does that make sense?
Yes, doSomething1 and doSomething2 block different threads, but depending on your scenario, this is an important decision.
As @AndreasNeumann said, you can have different execution contexts in doSomething2. Imagine that the main execution context is the one receiving HTTP requests from your users. Blocking threads in this context is bad, because you can easily exhaust the execution context and impact requests that have nothing to do with doSomething.
The Play docs have a better explanation of the possible problems with blocking code:
If you plan to write blocking IO code, or code that could potentially do a lot of CPU intensive work, you need to know exactly which thread pool is bearing that workload, and you need to tune it accordingly. Doing blocking IO without taking this into account is likely to result in very poor performance from Play framework, for example, you may see only a few requests per second being handled, while CPU usage sits at 5%. In comparison, benchmarks on typical development hardware (eg, a MacBook Pro) have shown Play to be able to handle workloads in the hundreds or even thousands of requests per second without a sweat when tuned correctly.
In your case, both methods are executed using Play's default thread pool. I suggest you take a look at the recommended best practices and see whether you need a different execution context or not. I also suggest you read the Akka docs about Dispatchers and Futures to gain a better understanding of what it means to execute Futures and to have blocking/non-blocking code.
This approach makes sense if you make use of different execution contexts in the second method: for example, one for answering requests and another for blocking operations.
You would use the normal Play execution context to keep your application running and answering, and isolate blocking operations in a separate one:
import scala.concurrent.{Future, blocking}

def doSomething2(): Future[Boolean] = Future {
  blocking { blockService.block() }
}(mySpecialExecutionContextForBlockingOperations)
For a little more information: http://docs.scala-lang.org/overviews/core/futures.html#blocking
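As an aside, the mySpecialExecutionContextForBlockingOperations above has to come from somewhere. One hand-rolled sketch (the pool size of 16 is an arbitrary assumption; in Play you would normally configure a dispatcher instead):

import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// A dedicated, bounded pool so blocked threads cannot starve the
// default request-handling context.
val mySpecialExecutionContextForBlockingOperations: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(16))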
You are correct. I don't see a point in doSomething1. It simply complicates the interface for the caller while not providing the benefits of an asynchronous API.
Does BlockService perform a blocking operation?
Normally, as @Andreas reminded us, it is meaningful to wrap the blocking operation in blocking so that it is handled on a separate thread.

Clarification about Scala Future that never complete and its effect on other callbacks

While re-reading scala-lang.org's page detailing Future here, I stumbled upon the following sentence:
In the event that some of the callbacks never complete (e.g. the callback contains an infinite loop), the other callbacks may not be executed at all. In these cases, a potentially blocking callback must use the blocking construct (see below).
Why may the other callbacks not be executed at all? I may install a number of callbacks for a given Future. The thread that completes the Future, may or may not execute the callbacks. But, because one callback is not playing footsie, the rest should not be penalized, I think.
One possibility I can think of is the way the ExecutionContext is configured. If it is configured with one thread, then this may happen, but that is a specific behaviour and not a generally expected one.
Am I missing something obvious here?
Callbacks are called within an ExecutionContext that ultimately has a limited number of threads - if not limited by the specific context implementation, then by the underlying operating system and/or hardware itself.
Let's say your system's limit is OS_LIMIT threads. You create OS_LIMIT + 1 callbacks. From those, OS_LIMIT callbacks immediately get a thread each - and none ever terminate.
How can you guarantee that the remaining 1 callback ever gets a thread?
Sure, there could be some detection mechanisms built into the Scala library, but it's not possible in the general case to make an optimal implementation: maybe you want the callback to run for a month.
Instead (and this seems to be the approach in the Scala library), you could provide facilities for handling situations that you, the developer, know are risky. This removes the element of surprise from the system.
Perhaps most importantly - it enables the developer to "bake in" the necessary information about handler/task characteristics directly into his/her program, rather than relying on some obscure piece of language functionality (which may change from version to version).
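A small sketch makes the failure mode concrete (a single-thread executor stands in for OS_LIMIT = 1; callback scheduling order is not guaranteed, hence "may"):

import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Promise}

implicit val singleThread: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())

val p = Promise[Int]()

// If this callback is scheduled first, it never yields the only thread...
p.future.onComplete { _ => while (true) {} }
// ...so this one may never be executed at all.
p.future.onComplete { _ => println("may never be printed") }

p.success(42)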

In Scala, does Futures.awaitAll terminate the thread on timeout?

So I'm writing a mini timeout library in Scala; it looks very similar to the code here: How do I get hold of exceptions thrown in a Scala Future?
The function I execute is either going to complete successfully, or block forever, so I need to make sure that on a timeout the executing thread is cancelled.
Thus my question is: On a timeout, does awaitAll terminate the underlying actor, or just let it keep running forever?
One alternative that I'm considering is to use the java Future library to do this as there is an explicit cancel() method one can call.
[Disclaimer - I'm new to Scala actors myself]
As I read it, scala.actors.Futures.awaitAll waits until all of the futures in the list are resolved OR until the timeout. It will not Future.cancel, Thread.interrupt, or otherwise attempt to terminate a Future; you get to come back later and wait some more.
Future.cancel may be suitable; however, be aware that your code may need to participate in effecting the cancel operation - it doesn't necessarily come for free. Future.cancel cancels a task that is scheduled but not yet started, and it interrupts a running thread [setting a flag that can be checked], which may or may not acknowledge the interrupt. Review Thread.interrupt and Thread.isInterrupted(). Your long-running task would normally check whether it is being interrupted (your code) and self-terminate. Various methods (e.g. Thread.sleep, Object.wait and others) respond to the interrupt by throwing InterruptedException. You need to review and understand that mechanism to ensure your code will meet your needs within those constraints.
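To make the cooperative part concrete, here is a sketch using java.util.concurrent directly (the names and the one-second timeout are arbitrary): cancel(true) only sets the interrupt flag, so the task must check it and stop itself.

import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

val pool = Executors.newSingleThreadExecutor()
val task = pool.submit(new Callable[String] {
  def call(): String = {
    // A cooperative long-running task: poll the interrupt flag and stop.
    while (!Thread.currentThread.isInterrupted) { /* do some work */ }
    "stopped after interrupt"
  }
})

try {
  task.get(1, TimeUnit.SECONDS)
} catch {
  case _: TimeoutException => task.cancel(true) // interrupts the running thread
}
pool.shutdown()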

What is the difference between GCD Dispatch Sources and select()?

I've been writing some code that replaces some existing:
while (runEventLoop) {
    if (select(openSockets, readFDS, writeFDS, errFDS, timeout) > 0) {
        // check file descriptors for activity and dispatch events based on same
    }
}
socket-reading code. I'd like to change this to use a GCD queue, so that I can pop events onto the queue using dispatch_async instead of maintaining a "must be called on next iteration" array. I am also already using a GCD queue to /contain/ this particular action, hence wanting to devolve it to a more natural GCD dispatch form (not a while() loop monopolizing a serial queue).
However, when I tried to refactor this into a form that relied on dispatch sources fired from event handlers tied to DISPATCH_SOURCE_TYPE_READ and DISPATCH_SOURCE_TYPE_WRITE on the socket descriptors, the library code that depended on this scheduling stopped working. My first assumption is that I'm misunderstanding the use of DISPATCH_SOURCE_TYPE_READ and DISPATCH_SOURCE_TYPE_WRITE - I had assumed that they would yield roughly the same behavior as calling select() with those socket descriptors.
Do I misunderstand GCD dispatch sources? Or, regarding the refactor, am I using it in a situation where it is not best suited?
The short answer to your question is: none. There are no differences: both GCD dispatch sources and select() do the same thing - they notify the user that a specific kernel event happened or that a particular condition holds true.
Note that, on a Mac or iOS device, you should not use select(), but rather the more advanced kqueue() and kevent() (or kevent64()).
You may certainly convert the code to use GCD dispatch sources, but you need to be careful not to break other code relying on this. So this requires a complete inspection of all the code handling signals, file descriptors, sockets, and all of the other low-level kernel events.
Maybe a simpler solution would be to keep the original code and simply add GCD code in the part that reacts to events, dispatching events onto different queues depending on the particular type of event.

Scala actors - worst practices? [closed]

I feel a bit insecure about using actors in Scala. I have read documentation about how to do stuff, but I guess I would also need some DON'T rules in order to feel free to use them.
I think I am afraid that I will use them in a wrong way, and I will not even notice it.
Can you think of something that, if applied, would break the benefits that Scala actors bring, or even produce erroneous results?
Avoid !? wherever possible. You will get a locked system!
Always send a message from an Actor-subsystem thread. If this means creating a transient Actor via the Actor.actor method then so be it:
case ButtonClicked(src) => Actor.actor { controller ! SaveTrade(trdFld.text) }
Add an "any other message" handler to your actor's reactions. Otherwise it is impossible to figure out if you are sending a message to the wrong actor:
case other => log.warning(this + " has received unexpected message " + other)
Don't use Actor.actor for your primary actors; subclass Actor instead. The reason for this is that only by subclassing can you provide a sensible toString method. Again, debugging actors is very difficult if your logs are littered with statements like:
12:03 [INFO] Sending RequestTrades(2009-10-12) to scala.actors.Actor$anonfun$1
Document the actors in your system, explicitly stating what messages they will receive and precisely how they should calculate the response. Using actors results in the conversion of a standard procedure (normally encapsulated within a method) to become logic spread across multiple actor's reactions. It is easy to get lost without good documentation.
Always make sure you can communicate with your actor outside of its react loop to find its state. For example, I always declare a method to be invoked via an MBean which looks like the following code snippet. It can otherwise be very difficult to tell if your actor is running, has shut down, has a large queue of messages etc.
def reportState = {
  val _this = this
  synchronized {
    val msg = "%s Received request to report state with %d items in mailbox".format(
      _this, mailboxSize)
    log.info(msg)
  }
  Actor.actor { _this ! ReportState }
}
Link your actors together and use trapExit = true - otherwise they can fail silently, meaning your program is not doing what you think it is and will probably run out of memory as messages remain in the actor's mailbox.
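A minimal sketch of that linking pattern with the scala.actors API (the message and the failure here are contrived):

import scala.actors.Actor._
import scala.actors.Exit

val supervisor = actor {
  self.trapExit = true // receive Exit messages instead of dying with the linked actor
  val worker = actor {
    react {
      case "boom" => throw new RuntimeException("worker failed")
    }
  }
  link(worker)
  worker ! "boom"
  react {
    case Exit(from, reason) => println("linked actor exited: " + reason)
  }
}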
I think some other interesting design decisions to be made when using actors have been highlighted here and here.
I know this doesn't really answer the question, but you should at least take heart in the fact that message-based concurrency is much less prone to weird errors than shared-memory, thread-based concurrency.
I presume you have seen the actor guidelines in Programming in Scala, but for the record:
Actors should not block while processing a message. Where you might want to block, try to arrange to get a message later instead.
Use react {} rather than receive {} when possible.
Communicate with actors only via messages.
Prefer immutable messages.
Make messages self-contained (see the sketch below).
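For the last three guidelines, a short sketch of what self-contained, immutable messages typically look like (the field names are invented for illustration):

// Case classes/objects give you immutability, pattern-matching support,
// and a readable toString in logs for free.
case class SaveTrade(symbol: String, quantity: Int)
case class RequestTrades(date: String)
case object ReportState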