How to create non-blocking methods in Scala? - scala

What is a good way of creating non-blocking methods in Scala? One way I can think of is to create a thread/actor and the method just send a message to the thread and returns. Is there a better way of creating a non-blocking method?

Use scala.actors.Future:
import actors._
def asyncify[A, B](f: A => B): A => Future[B] = (a => Futures.future(f(a)))
// normally blocks when called
def sleepFor(seconds: Int) = {
Thread.sleep(seconds * 1000)
seconds
}
val asyncSleepFor = asyncify(sleepFor)
val future = asyncSleepFor(5) // now it does NOT block
println("waiting...") // prints "waiting..." rightaway
println("future returns %d".format(future())) // prints "future returns 5" after 5 seconds
Overloaded "asyncify" that takes a function with more than one parameter is left as an exercise.
One caveat, however, is exception handling. The function that is being "asyncified" has to handle all exceptions itself by catching them. Behavior for exceptions thrown out of the function is undefined.

Learn about actors.

It depends on your definition of "blocking." Strictly speaking, anything that requires acquisition of a lock is blocking. All operations that are dependent on an actor's internal state acquire a lock on the actor. This includes message sends. If lots of threads try to send a message to an actor all-at-once they have to get in line.
So if you really need non-blocking, there are various options in java.util.concurrent.
That being said, from a practical perspective actors give you something that close enough to non-blocking because none of the synchronized operations do a significant amount of work, so chances are actors meet your need.

Related

Alternative to using Future.sequence inside Akka Actors

We have a fairly complex system developed using Akka HTTP and Actors model. Until now, we extensively used ask pattern and mixed Futures and Actors.
For example, an actor gets message, it needs to execute 3 operations in parallel, combine a result out of that data and returns it to sender. What we used is
declare a new variable in actor receive message callback to store a sender (since we use Future.map it can be another sender).
executed all those 3 futures in parallel using Future.sequence (sometimes its call of function that returns a future and sometimes it is ask to another actor to get something from it)
combine the result of all 3 futures using map or flatMap function of Future.sequence result
pipe a final result to a sender using pipeTo
Here is a code simplified:
case RetrieveData(userId, `type`, id, lang, paging, timeRange, platform) => {
val sen = sender
val result: Future[Seq[Map[String, Any]]] = if (paging.getOrElse(Paging(0, 0)) == Paging(0, 0)) Future.successful(Seq.empty)
else {
val start = System.currentTimeMillis()
val profileF = profileActor ? Get(userId)
Future.sequence(Seq(profileF, getSymbols(`type`, id), getData(paging, timeRange, platform)).map { result =>
logger.info(s"Got ${result.size} news in ${System.currentTimeMillis() - start} ms")
result
}.recover { case ex: Throwable =>
logger.error(s"Failure on getting data: ${ex.getMessage}", ex)
Seq.empty
}
}
result.pipeTo(sen)
}
Function getAndProcessData contains Future.sequence with executing 3 futures in parallel.
Now, as I'm reading more and more on Akka, I see that using ask is creating another actor listener. Questions are:
As we extensively use ask, can it lead to a to many threads used in a system and perhaps a thread starvation sometimes?
Using Future.map much also means different thread often. I read about one thread actor illusion which can be easily broken with mixing Futures.
Also, can this affect performances in a bad way?
Do we need to store sender in temp variable send, since we're using pipeTo? Could we do only pipeTo(sender). Also, does declaring sen in almost each receive callback waste to much resources? I would expect its reference will be removed once operation in complete.
Is there a chance to design such a system in a better way, meadning that we don't use map or ask so much? I looked at examples when you just pass a replyTo reference to some actor and the use tell instead of ask. Also, sending message to self and than replying to original sender can replace working with Future.map in some scenarios. But how it can be designed having in mind we want to perform 3 async operations in parallel and returns a formatted data to a sender? We need to have all those 3 operations completed to be able to format data.
I tried not to include to many examples, I hope you understand our concerns and problems. Many questions, but I would really love to understand how it works, simple and clear
Thanks in advance
If you want to do 3 things in parallel you are going to need to create 3 Future values which will potentially use 3 threads, and that can't be avoided.
I'm not sure what the issue with map is, but there is only one call in this code and that is not necessary.
Here is one way to clean up the code to avoid creating unnecessary Future values (untested!):
case RetrieveData(userId, `type`, id, lang, paging, timeRange, platform) =>
if (paging.forall(_ == Paging(0, 0))) {
sender ! Seq.empty
} else {
val sen = sender
val start = System.currentTimeMillis()
val resF = Seq(
profileActor ? Get(userId),
getSymbols(`type`, id),
getData(paging, timeRange, platform),
)
Future.sequence(resF).onComplete {
case Success(result) =>
val dur = System.currentTimeMillis() - start
logger.info(s"Got ${result.size} news in $dur ms")
sen ! result
case Failure(ex)
logger.error(s"Failure on getting data: ${ex.getMessage}", ex)
sen ! Seq.empty
}
}
You can avoid ask by creating your own worker thread that collects the different results and then sends the result to the sender, but that is probably more complicated than is needed here.
An actor only consumes a thread in the dispatcher when it is processing a message. Since the number of messages the actor spawned to manage the ask will process is one, it's very unlikely that the ask pattern by itself will cause thread starvation. If you're already very close to thread starvation, an ask might be the straw that breaks the camel's back.
Mixing Futures and actors can break the single-thread illusion, if and only if the code executing in the Future accesses actor state (meaning, basically, vars or mutable objects defined outside of a receive handler).
Request-response and at-least-once (between them, they cover at least most of the motivations for the ask pattern) will in general limit throughput compared to at-most-once tells. Implementing request-response or at-least-once without the ask pattern might in some situations (e.g. using a replyTo ActorRef for the ultimate recipient) be less overhead than piping asks, but probably not significantly. Asks as the main entry-point to the actor system (e.g. in the streams handling HTTP requests or processing messages from some message bus) are generally OK, but asks from one actor to another are a good opportunity to streamline.
Note that, especially if your actor imports context.dispatcher as its implicit ExecutionContext, transformations on Futures are basically identical to single-use actors.
Situations where you want multiple things to happen (especially when you need to manage partial failure (Future.sequence.recover is a possible sign of this situation, especially if the recover gets nontrivial)) are potential candidates for a saga actor to organize one particular request/response.
I would suggest instead of using Future.sequence, Souce from Akka can be used which will run all the futures in parallel, in which you can provide the parallelism also.
Here is the sample code:
Source.fromIterator( () => Seq(profileF, getSymbols(`type`, id), getData(paging, timeRange, platform)).iterator )
.mapAsync( parallelism = 1 ) { case (seqIdValue, row) =>
row.map( seqIdValue -> _ )
}.runWith( Sink.seq ).map(_.map(idWithDTO => idWithDTO))
This will return Future[Seq[Map[String, Any]]]

Using future callback inside akka actor

I've found in Akka docs:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback.
So does it mean that i should always use future pipeTo self and then call some functions? Or i can still use callbacks with method, then how should i avoid concurrency bugs?
It means this:
class NotThreadSafeActor extends Actor {
import context.dispatcher
var counter = 0
def receive = {
case any =>
counter = counter + 1
Future {
// do something else on a future
Thread.sleep(2000)
}.onComplete {
_ => counter = counter + 1
}
}
}
In this example, both the actor's receive method, and the Future's onComplete change the mutable variable counter. In this toy example its easier to see, but the Future call might be nested methods that equally capture a mutable variable.
The issue is that the onComplete call might execute on a different thread to the actor itself, so its perfectly possible to have one thread executing receive and another executing onComplete thus giving you a race condition. Which negates the point of an actor in the first place.
Yes, you should send a message to the enclosing actor if the callback mutates internal state of the actor. This is the easiest (and preferred) way to avoid races.
I think I would be remiss if I did not mention here that I've made a small utility for circumventing this limitation. In other words, my answer to your question is No, you shouldn't use such an inconvenient workaround, you should use https://github.com/makoConstruct/RequestResponseActor
how does it work?
Basically, behind the futures and the promises, it transmits every query in a Request(id:Int, content:Any), and when it receives Response(id, result) it completes the future that corresponds to id with the value of result. It's also capable of transmitting failures, and as far as I can tell, akka can only register query timeouts. The RequestResponseActor supplies a special implicit execution context to apply to callbacks attached to the futures waiting for a Response message. This blunt execution context ensures they're executed while the Response message is being processed, thus ensuring the Actor has exclusive access to its state when the future's callbacks fire.
Maybe this can help. It is an experiment I did and the test is quite conclusive... however, it is still an experiment, so do not take that as an expertise.
https://github.com/Adeynack/ScalaLearning/tree/master/ActorThreadingTest/src/main/scala/david/ActorThreadingTest
Open to comments or suggestions, of course.
Futures with actors is a subject I am very interested in.

Clarification needed about futures and promises in Scala

I am trying to get my head around Scala's promise and future constructs.
I've been reading Futures and Promises in Scala Documentation and am a bit confused as I've got a feeling that the concepts of promises and futures are mixed up.
In my understanding a promise is a container that we could populate
value in a later point. And future is some sort of an asynchronous
operation that would complete in a different execution path.
In Scala we can obtain a result using the attached callbacks to future.
Where I'm lost is how promise has a future?
I have read about these concepts in Clojure too, assuming that promise and future have some generic common concept, but it seems like I was wrong.
A promise p completes the future returned by p.future. This future is
specific to the promise p. Depending on the implementation, it may be
the case that p.future eq p.
val p = promise[T]
val f = p.future
You can think of futures and promises as two different sides of a pipe.
On the promise side, data is pushed in, and on the future side, data can be pulled out.
And future is some sort of an asynchronous operation that would complete in a different execution path.
Actually, a future is a placeholder object for a value that may be become available at some point in time, asynchronously. It is not the asynchronous computation itself.
The fact that there is a future constructor called future that returns such a placeholder object and spawns an asynchronous computation that completes this placeholder object does not mean that the asynchronous computation is called a future. There are also other future constructors/factory methods.
But the point I do not get is how promise has a future?
To divide promises and futures into 2 separate interfaces was a design decision. You could have these two under the same interface Future, but that would then allow clients of futures to complete them instead of the intended completer of the future. This would cause unexpected errors, as there could be any number of contending completers.
E.g. for the asynchronous computation spawned by the future construct, it would no longer be clear whether it has to complete the promise, or if the client will do it.
Futures and promises are intended to constrain the flow of data in the program.
The idea is to have a future client that subscribes to the data to act on it once the data arrives.
The role of the promise client is to provide that data.
Mixing these two roles can lead to programs that are harder to understand or reason about.
You might also ask why the Promise trait does not extend Future. This is another design decision to discourage programmers from blindly passing Promises to clients where they should upcast the Promise to Future (this upcast is prone to be left out, whereas having to explicitly call future on the promise ensures you call it every time). In other words, by returning a promise you are giving the right to complete it to somebody else, and by returning the future you are giving the right to subscribe to it.
EDIT:
If you would like to learn more about futures, Chapter 4 in the Learning Concurrent Programming in Scala book describes them in detail. Disclaimer: I'm the author of the book.
The difference between the two is that futures are usually centered around the computation while promises are centered around data.
It seems your understanding matches this, but let me explain what I mean:
In both scala and clojure futures are (unless returned by some other function/method) created with some computation:
// scala
future { do_something() }
;; clojure
(future (do-something))
In both cases the "return-value" of the future can only be read (without blocking) only after the computation has terminated. When this is the case is typically outside the control of the programmer, as the computation gets executed in some thread (pool) in the background.
In contrast in both cases promises are an initially empty container, which can later be filled (exactly once):
// scala
val p = promise[Int]
...
p success 10 // or failure Exception()
;; clojure
(def p (promise))
(deliver p 10)
Once this is the case it can be read.
Reading the futures and promises is done through deref in clojure (and realized? can be used to check if deref will block). In scala reading is done through the methods provided by the Future trait. In order to read the result of a promise we thus have to obtain an object implementing Future, this is done by p.future. Now if the trait Future is implemented by a Promise, then p.future can return this and the two are equal. This is purely a implementation choice and does not change the concepts. So you were not wrong!
In any case Futures are mostly dealt with using callbacks.
At this point it might be worthwhile to reconsider the initial characterization of the two concepts:
Futures represent a computation that will produce a result at some point. Let's look at one possible implementation: We run the code in some thread(pool) and once its done, we arrange use the return value to fulfill a promise. So reading the result of the future is reading a promise; This is clojure's way of thinking (not necessarily of implementation).
On the other hand a promise represents a value that will be filled at some point. When it gets filled this means that some computation produced a result. So in a way this is like a future completing, so we should consume the value in the same way, using callbacks; This is scala's way of thinking.
Note that under the hood Future is implemented in terms of Promise and this Promise is completed with the body you passed to your Future:
def apply[T](body: =>T): Future[T] = impl.Future(body) //here I have omitted the implicit ExecutorContext
impl.Future is an implementation of Future trait:
def apply[T](body: =>T)(implicit executor: ExecutionContext): scala.concurrent.Future[T] =
{
val runnable = new PromiseCompletingRunnable(body)
executor.prepare.execute(runnable)
runnable.promise.future
}
Where PromiseCompletingRunnable looks like this:
class PromiseCompletingRunnable[T](body: => T) extends Runnable {
val promise = new Promise.DefaultPromise[T]()
override def run() = {
promise complete {
try Success(body) catch { case NonFatal(e) => Failure(e) }
}
} }
So you see even though they are seperate concepts that you can make use of independently in reality you can't get Future without using Promise.

When could Futures be more appropriate than Actors (or vice versa) in Scala?

Suppose I need to run a few concurrent tasks.
I can wrap each task in a Future and wait for their completion. Alternatively I can create an Actor for each task. Each Actor would execute its task (e.g. upon receiving a "start" message) and send the result back.
I wonder when I should use the former (with Futures) and the latter (with Actors) approach and why the Future approach is considered better for the case described above.
Because it is syntactically simpler.
val tasks: Seq[() => T] = ???
val futures = tasks map {
t => future { t() }
}
val results: Future[Seq[T]] = Future.sequence(futures)
The results future you can then wait on using Await.result or you can map it further/use it in for-comprehension or install callbacks on it.
Compare that to instantiating all the actors, sending messages to them, coding their receive blocks, receiving responses from them and shutting them down -- that would generally require more boilerplate.
As a general rule, use the simplest concurrency model that fits your application, rather than the most powerful. Ordering from simplest to most complex would be sequential programming->parallel collections->futures->stateless actors->stateful actors->threads with software transactional memory->threads with explicit locking->threads with lock-free algorithms. Pick the first one in this list that solves your problem. The farther down that list you go, the greater the complexities and risks, so you're better off trading simplicity for conceptual power.
I tend to think that actors are useful when you have interacting threads. In your case, it appears to be that all the jobs are independent; I would use futures.

How to safely use reply and !? on a Scala Actor

Depending on a reply from a Scala Actor seems incredibly error-prone to me. Is this truly the idiomatic Scala way to have conversations between actors? Is there an alternative, or a safer use of reply that I'm missing?
(About me: I'm familiar with synchronization in Java, but I've never designed an actor-based system before and don't yet have a full understanding of the paradigm.)
Example mistakes
For a trivial demonstration, let's look at this silly integer-parsing Actor:
import actors._, Actor._
val a = actor {
loop {
react {
case s: String => reply(s.toInt)
}
}
}
We could intend to use this as
scala> a !? "42"
res0: Any = 42
But if the actor fails to reply (in this case because a careless programmer did not think to catch NumberFormatException in the actor), we'll be waiting forever:
scala> a !? "f"
We also make a mistake at the call site. This next example also blocks indefinitely, because the actor does not reply to Int messages:
scala> a !? 42
Timeout
You could use !? (msec: Long, msg: Any) if the expected reply has some known reasonable time bound, but that is not the case in most circumstances I can think of.
Guaranteeing reply
One thought would be to design that actor such that it necessarily replies to every message:
import actors._, Actor._
val a = actor {
loop {
react {
case m => reply {
try {
m match {
case s: String => reply(s.toInt)
case _ => None
}
} catch {
case e => e
}
}
}
}
}
This feels better, although there is still a little fear of accidentally invoking !? on an actor is no longer acting.
I can see your concerns, but I would actually argue that this is not any worse than the synchronization you are used to. Who guarantees that the locks will ever be released again?
Using !? is at your own risk, so no there are no 'safer' uses that I am aware of. Threads can block or die and there is absolutely nothing we can do about it. Except for providing safety-valves that can soften the blow.
The event-based acting actually gives you alternatives to receiving replies synchronously. The timeout is one of them but another thing such as Futures via the !! method. They are designed to handle deadlocks such as that. The method immediately returns a future that can be handled later.
For inspiration and more in-depth design decisions see:
Actors:
http://docs.scala-lang.org/overviews/core/actors.html
Futures (in scala 2.10):
http://docs.scala-lang.org/sips/pending/futures-promises.html
Don't bother with old local actors - learn Akka. Also it's good that you know about synchronized, but personally me - almost never use such a word, even in Java code. Imagine synchronized is deprecated, learn Java memory model, learn CAS.
I am not familiar with the Actor system in the Scala standard library myself, but I highly recommend checking out the Akka toolkit (http://akka.io/) which has "replaced" the Scala Actors and comes with the Scala distribution as of Scala 2.10.
In terms of Actor system design in general, some of the key ideas are asynchronous (non-blocking), isolated mutability, and communication via message passing. Each Actor encapsulates it's own state, nobody else is allowed to touch it. You can send an Actor a message that may "ask" it to change state, but the Actor implementation is free to ignore it. Messages are sent asynchronously (you CAN make it blocking, not recommended). If you want to have some sort of "response" (so that you can associate a message with a previously sent message), the Future API in Scala 2.10 and ask of Akka can help.
Regarding your error format exception and the problem in general, consider looking at the ask and Future API in Scala 2.10 and Akka 2.1. It will handle exceptions and is non-blocking.
Scala 2.10 also has a new Try that is intended as an alternative to the old-fashioned try-catch clauses. The Try has an apply method that you would use like any try (minus the catch and finally). Try has two sub-classes Success and Failure. An instance of Try[T] will have subclasses Success[T] and Failure[Throwable]. It is easier to explain by example:
>>> val x: Try[Int] = Try { "5".toInt } // Success[Int] with encapsulated value 5
>>> val y: Try[Int] = Try { "foo".toInt } // Failure(java.lang.NumberFormatException: For input string: "foo")
Since Try does not throw the actual exception and the subclasses are conveniently case-classes, you could easily use the result as a message to an Actor.