I have an actor which takes the result from another actor and applies some check on it.
class Actor1(actor2: ActorRef) extends Actor {
  import akka.pattern.{ask, pipe}
  import context.dispatcher   // ExecutionContext for map / pipeTo

  def receive = {
    case SomeMessage =>
      val r = actor2 ? NewMessage()           // needs an implicit Timeout in scope
      r.map(someTransform).pipeTo(sender())
  }
}
Now if I make an ask of Actor1, we have two futures generated, which doesn't seem overly efficient. Is there a way to provide a forward with some kind of continuation, or some other approach I could use here?
case SomeMessage => actor2.forward(NewMessage, someTransform)
Futures are executed in an ExecutionContext, which is like a thread pool. Creating a new future is not as expensive as creating a new thread, but it has its cost. The best way to work with futures is to create as many as needed and compose them in a way that things that can be computed in parallel are computed in parallel, if the necessary resources are available. This way you will make the best use of your machine.
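For example (a minimal sketch with made-up computeA/computeB), starting the futures before the for-comprehension lets the two independent computations run in parallel, while declaring them inside it would run them one after the other:

import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

def computeA(): Int = 21   // stand-ins for two independent pieces of work
def computeB(): Int = 21

val fa: Future[Int] = Future(computeA())   // starts running immediately
val fb: Future[Int] = Future(computeB())   // starts immediately, in parallel with fa
val sum: Future[Int] = for { a <- fa; b <- fb } yield a + b   // composition only, no extra work spawned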
You mentioned that akka documentation discourages excessive use of futures. I don't know where you read this, but what I think it means is to prefer transforming futures rather than creating your own. This is exactly what you are doing by using map. Also, it may mean that if you create a future where it is not needed you are adding unnecessary overhead.
In your case you have a call that returns a future and you need to apply someTransform and return the result. Using map is the way to go.
I have some cats IO operations, and a Future among them. Simplified:
IO(getValue())
  .flatMap(v => IO.fromFuture(IO(blockingProcessValue(v)))(myBlockingPoolContextShift))
  .map(moreProcessing)
So I have some value in IO, then I need to do some blocking operation using a library that returns a Future, and then I need to do some more processing on the value returned from the Future.
The Future runs on a dedicated thread pool - so far so good. The problem comes after the Future completes: moreProcessing runs on the same thread the Future was running on.
Is there a way to get back to the thread getValue() was running on?
After the discussion in the chat, the conclusion is that the only thing OP needs is to create a ContextShift in the application entry point using the appropriate (compute) EC and then pass it down to the class containing this method.
// Entry point
val computeEC = ???
val cs = IO.contextShift(computeEC)
val myClass = new MyClass(cs, ...)

// Inside the method on MyClass
IO(getValue())
  .flatMap(v => IO.fromFuture(IO(blockingProcessValue(v)))(myBlockingPoolContextShift))
  .flatTap(_ => cs.shift)
  .map(moreProcessing)
This Scastie showed an approach using Blocker and other techniques common in the Typelevel ecosystem that were not really suitable for OP's use case; anyway, I find it useful for future readers who may have a similar problem.
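For reference, here is a minimal sketch of that Blocker-based style (assuming the cats-effect 2 API; getValue, blockingProcessValue and moreProcessing are the names from the question, the stubs are mine): blockOn runs the wrapped effect on the blocker's pool and then shifts back to the compute pool behind the implicit ContextShift.

import scala.concurrent.Future
import cats.effect.{Blocker, ContextShift, IO}

// stubs standing in for the question's functions
def getValue(): Int = 1
def blockingProcessValue(v: Int): Future[Int] = Future.successful(v * 2)
def moreProcessing(v: Int): String = v.toString

def pipeline(blocker: Blocker)(implicit cs: ContextShift[IO]): IO[String] =
  IO(getValue())
    .flatMap(v => blocker.blockOn(IO.fromFuture(IO(blockingProcessValue(v)))))
    .map(moreProcessing)   // runs on the compute pool again, not on the blocking one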
We have a fairly complex system developed using Akka HTTP and the actor model. Until now, we have extensively used the ask pattern and mixed Futures and Actors.
For example, an actor gets a message, needs to execute 3 operations in parallel, combine a result out of that data and return it to the sender. What we did is:
declare a new variable in the actor's receive callback to store the sender (since we use Future.map, it can be a different sender by then)
execute all those 3 futures in parallel using Future.sequence (sometimes it's a call to a function that returns a future, and sometimes it's an ask to another actor to get something from it)
combine the results of all 3 futures using the map or flatMap function on the Future.sequence result
pipe the final result to the sender using pipeTo
Here is the code, simplified:
case RetrieveData(userId, `type`, id, lang, paging, timeRange, platform) => {
  val sen = sender()
  val result: Future[Seq[Map[String, Any]]] =
    if (paging.getOrElse(Paging(0, 0)) == Paging(0, 0)) Future.successful(Seq.empty)
    else {
      val start = System.currentTimeMillis()
      val profileF = profileActor ? Get(userId)
      Future.sequence(Seq(profileF, getSymbols(`type`, id), getData(paging, timeRange, platform)))
        .map { result =>
          logger.info(s"Got ${result.size} news in ${System.currentTimeMillis() - start} ms")
          result
        }
        .recover { case ex: Throwable =>
          logger.error(s"Failure on getting data: ${ex.getMessage}", ex)
          Seq.empty
        }
    }
  result.pipeTo(sen)
}
The function getAndProcessData contains a Future.sequence that executes 3 futures in parallel.
Now, as I'm reading more and more about Akka, I see that using ask creates another listener actor. My questions are:
As we use ask extensively, can it lead to too many threads being used in the system, and perhaps thread starvation sometimes?
Using Future.map a lot also often means a different thread. I read about the single-threaded actor illusion, which can easily be broken by mixing in Futures.
Also, can this affect performance in a bad way?
Do we need to store the sender in a temp variable sen, since we're using pipeTo? Could we just do pipeTo(sender)? Also, does declaring sen in almost every receive callback waste too many resources? I would expect its reference to be removed once the operation is complete.
Is there a way to design such a system better, meaning that we don't use map or ask so much? I looked at examples where you just pass a replyTo reference to some actor and then use tell instead of ask. Also, sending a message to self and then replying to the original sender can replace working with Future.map in some scenarios. But how can it be designed, keeping in mind that we want to perform 3 async operations in parallel and return formatted data to a sender? We need all 3 operations to complete to be able to format the data.
I tried not to include too many examples; I hope you understand our concerns and problems. Many questions, but I would really love to understand how it works, simply and clearly.
Thanks in advance
If you want to do 3 things in parallel you are going to need to create 3 Future values which will potentially use 3 threads, and that can't be avoided.
I'm not sure what the issue with map is, but there is only one call in this code and that is not necessary.
Here is one way to clean up the code to avoid creating unnecessary Future values (untested!):
case RetrieveData(userId, `type`, id, lang, paging, timeRange, platform) =>
  if (paging.forall(_ == Paging(0, 0))) {
    sender() ! Seq.empty
  } else {
    val sen = sender()
    val start = System.currentTimeMillis()
    val resF = Seq(
      profileActor ? Get(userId),
      getSymbols(`type`, id),
      getData(paging, timeRange, platform)
    )
    Future.sequence(resF).onComplete {
      case Success(result) =>
        val dur = System.currentTimeMillis() - start
        logger.info(s"Got ${result.size} news in $dur ms")
        sen ! result
      case Failure(ex) =>
        logger.error(s"Failure on getting data: ${ex.getMessage}", ex)
        sen ! Seq.empty
    }
  }
You can avoid ask by creating your own worker thread that collects the different results and then sends the result to the sender, but that is probably more complicated than is needed here.
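If you do want to see what that could look like, here is a rough sketch (the protocol is my own assumption, not code from the question): a short-lived collector actor is spawned per request, the three sources reply to it with plain tells, and it answers the original sender once everything has arrived or something fails.

import akka.actor.{Actor, ActorRef, Props}

// hypothetical per-request protocol for the sketch
case class PartialResult(value: Any)
case class PartialFailure(ex: Throwable)

class Collector(replyTo: ActorRef, expected: Int) extends Actor {
  private var results = Vector.empty[Any]

  def receive: Receive = {
    case PartialResult(r) =>
      results :+= r
      if (results.size == expected) {
        replyTo ! results      // all pieces arrived: answer the original sender
        context.stop(self)     // the collector is single-use
      }
    case PartialFailure(ex) =>
      replyTo ! Seq.empty      // same fallback as the recover block above
      context.stop(self)
  }
}

// wiring in the parent, per request (also hypothetical):
//   val collector = context.actorOf(Props(new Collector(sender(), expected = 3)))
//   profileActor.tell(Get(userId), collector)   // each source replies to the collector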
An actor only consumes a thread in the dispatcher when it is processing a message. Since the number of messages the actor spawned to manage the ask will process is one, it's very unlikely that the ask pattern by itself will cause thread starvation. If you're already very close to thread starvation, an ask might be the straw that breaks the camel's back.
Mixing Futures and actors can break the single-thread illusion, if and only if the code executing in the Future accesses actor state (meaning, basically, vars or mutable objects defined outside of a receive handler).
Request-response and at-least-once (between them, they cover at least most of the motivations for the ask pattern) will in general limit throughput compared to at-most-once tells. Implementing request-response or at-least-once without the ask pattern might in some situations (e.g. using a replyTo ActorRef for the ultimate recipient) be less overhead than piping asks, but probably not significantly. Asks as the main entry-point to the actor system (e.g. in the streams handling HTTP requests or processing messages from some message bus) are generally OK, but asks from one actor to another are a good opportunity to streamline.
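As a sketch of that replyTo style (the message shapes here are my own, not the question's actual protocol): the request itself carries the ActorRef that should get the answer, so no intermediate ask actor and no Future are created.

import akka.actor.{Actor, ActorRef}

// hypothetical protocol: the request names its ultimate recipient
case class GetProfile(userId: String, replyTo: ActorRef)
case class ProfileResult(userId: String, profile: Map[String, Any])

class ProfileActor extends Actor {
  def receive: Receive = {
    case GetProfile(userId, replyTo) =>
      // answer whoever the request names, not the direct sender
      replyTo ! ProfileResult(userId, Map("id" -> userId))
  }
}

// caller side, instead of profileActor ? Get(userId):
//   profileActor ! GetProfile(userId, replyTo = self)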
Note that, especially if your actor imports context.dispatcher as its implicit ExecutionContext, transformations on Futures are basically identical to single-use actors.
Situations where you want multiple things to happen, especially when you need to manage partial failure (a Future.sequence followed by recover is a possible sign of this, particularly if the recover gets nontrivial), are potential candidates for a saga actor that organizes one particular request/response.
Instead of using Future.sequence, I would suggest using Source from Akka Streams, which will run all the futures in parallel and also lets you specify the parallelism.
Here is the sample code:
Source.fromIterator(() => Seq(profileF, getSymbols(`type`, id), getData(paging, timeRange, platform)).iterator)
  .mapAsync(parallelism = 3)(identity)   // awaits each future; parallelism bounds how many are awaited at once
  .runWith(Sink.seq)
This will return Future[Seq[Map[String, Any]]]
The use case is actually fairly typical. A lot of web services use authorization tokens that you retrieve at the start of a session and you need to send those back on subsequent requests.
I know I can do it like this:
lazy val myData = {
  val request = ws.url("/some/url")
    .withAuth(user, password, WSAuthScheme.BASIC)
    .withHeaders("Accept" -> "application/json")
  Await.result(request.get().map(_.json), 120.seconds)
}
That just feels wrong, as all the docs say never use Await.
Is there a Future/Promise Scala style way of handling this?
I've found .onComplete, which allows me to run code upon the completion of a Future; however, without using a (mutable) var, I see no way of getting a value from that scope into a lazy val in a different scope. Even with a var there is a possible timing issue -- hence the evils of mutable variables :)
Any other way of doing this?
Unfortunately, there is no way to make this non-blocking - lazy vals are designed to be synchronous and to block any thread accessing them until they are completed with a value (internally a lazy val is represented as a simple synchronized block).
A Future/Promise Scala way would be to use a Future[T] or a Promise[T] instead of a plain val x: T. However, that approach adds overhead: you need ExecutionContexts and a map on every use of the value, and the more optimal resource utilization may not be worth the decreased readability in all cases. So it may be OK to leave the Await there, especially if you use the value in many parts of your application.
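If you do go that way, a hedged sketch (keeping ws, user and password from the question): cache the Future itself in the lazy val, so the request still fires only once, and have each caller map over it. itemCount and the "items" field below are just invented examples of a use site.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global   // or Play's injected execution context
import play.api.libs.json.JsValue
import play.api.libs.ws.WSAuthScheme

// the first access still triggers exactly one request, but no thread blocks on it
lazy val myData: Future[JsValue] = {
  val request = ws.url("/some/url")
    .withAuth(user, password, WSAuthScheme.BASIC)
    .withHeaders("Accept" -> "application/json")
  request.get().map(_.json)
}

// every use site composes on the Future instead of reading a plain value
def itemCount: Future[Int] =
  myData.map(json => (json \ "items").as[Seq[JsValue]].size)   // "items" is a made-up field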
I came across a problem I have not found an answer to yet.
I'm running on Play Framework 2 with Scala.
I was required to write an Action method that performs multiple Future calls.
My question:
1) Is the attached code non-blocking and hence looking the way it should be?
2) Is there a guarantee that both DAO results are caught at any given time?
def index = Action.async {
  val t2: Future[(List[PlayerCol], List[CreatureCol])] = for {
    p <- PlayerDAO.findAll()
    c <- CreatureDAO.findAlive()
  } yield (p, c)
  t2.map(t => Ok(views.html.index(t._1, t._2)))
}
Thanks for your feedback.
Is the attached code non-blocking and hence looking the way it should be?
That depends on a few things. First, I'm going to assume that PlayerDAO.findAll() and CreatureDAO.findAlive() return Future[List[PlayerCol]] and Future[List[CreatureCol]] respectively. What matters most is what these functions are actually calling themselves. Are they making JDBC calls, or using an asynchronous DB driver?
If the answer is JDBC (or some other synchronous db driver), then you're still blocking, and there's no way to make it fully "non-blocking". Why? Because JDBC calls block their current thread, and wrapping them in a Future won't fix that. In this situation, the most you can do is have them block a different ExecutionContext than the one Play is using to handle requests. This is generally a good idea, because if you have several db requests running concurrently, they can block Play's internal thread pool used for handling HTTP requests, and suddenly your server will have to wait to handle other requests (even if they don't require database calls).
For more on different ExecutionContexts see the thread pools documentation and this answer.
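A minimal sketch of that idea (the pool size, runBlockingPlayerQuery and jdbcContext are my own placeholders, not Play's API): wrap the blocking call in a Future that runs on a dedicated fixed-size pool, so Play's default pool keeps serving HTTP requests.

import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// a pool reserved for blocking database work, sized to match the DB connection pool
val jdbcContext: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(10))

def findAll(): Future[List[PlayerCol]] =
  Future {
    runBlockingPlayerQuery()   // hypothetical blocking JDBC call
  }(jdbcContext)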
If your answer is an asynchronous database driver like reactive mongo (there's also scalike-jdbc, and maybe some others), then you're in good shape, and I probably made you read a little more than you had to. In that scenario your index controller function would be fully non-blocking.
Is there a guarantee that both DAO results are caught at any given time?
I'm not quite sure what you mean by this. In your current code, you're actually making these calls in sequence. CreatureDAO.findAlive() isn't executed until PlayerDAO.findAll() has returned. Since they are not dependent on each other, it seems like this isn't intentional. To make them run in parallel, you should instantiate the Futures before mapping them in a for-comprehension:
def index = Action.async {
val players: Future[List[PlayerCol]] = PlayerDAO.findAll()
val creatures: Future[List[CreatureCol]] = CreatureDAO.findAlive()
val t2: Future[(List[PlayerCol], List[CreatureCol])] = for {
p <- players
c <- creatures
} yield (p, c)
t2.map(t => Ok(views.html.index(t._1, t._2)))
}
The only guarantee you have about both results being complete is that the yield isn't executed until both Futures have completed (or never, if either failed), and likewise the body of t2.map(...) isn't executed until t2 has completed.
Further reading:
Are there any benefits in using non-async actions in Play Framework 2.2?
Understanding the Difference Between Non-Blocking Web Service Calls vs Non-Blocking JDBC
Suppose I need to run a few concurrent tasks.
I can wrap each task in a Future and wait for their completion. Alternatively I can create an Actor for each task. Each Actor would execute its task (e.g. upon receiving a "start" message) and send the result back.
I wonder when I should use the former (with Futures) and the latter (with Actors) approach and why the Future approach is considered better for the case described above.
Because it is syntactically simpler.
val tasks: Seq[() => T] = ???
val futures = tasks.map { t =>
  Future { t() }
}
val results: Future[Seq[T]] = Future.sequence(futures)
The results future you can then wait on using Await.result or you can map it further/use it in for-comprehension or install callbacks on it.
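For instance, continuing the snippet above (a small illustration; the 30-second timeout is arbitrary):

import scala.concurrent.Await
import scala.concurrent.duration._

// block only at the very edge of the program, e.g. in a main method or a test
val all: Seq[T] = Await.result(results, 30.seconds)

// or stay asynchronous and keep composing
val report: Future[String] = for (rs <- results) yield s"finished ${rs.size} tasks"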
Compare that to instantiating all the actors, sending messages to them, coding their receive blocks, receiving responses from them and shutting them down -- that would generally require more boilerplate.
As a general rule, use the simplest concurrency model that fits your application, rather than the most powerful. Ordering from simplest to most complex would be: sequential programming -> parallel collections -> futures -> stateless actors -> stateful actors -> threads with software transactional memory -> threads with explicit locking -> threads with lock-free algorithms. Pick the first one in this list that solves your problem. The farther down that list you go, the greater the complexities and risks, so you're better off trading conceptual power for simplicity.
I tend to think that actors are useful when you have interacting threads. In your case, it appears that all the jobs are independent; I would use futures.