React for futures - scala

I am trying to use a divide-and-conquer (aka fork/join) approach for a number crunching problem. Here is the code:
import scala.actors.Futures.future
private def compute( input: Input ):Result = {
if( pairs.size < SIZE_LIMIT ) {
computeSequential()
} else {
val (input1,input2) = input.split
val f1 = future( compute(input1) )
val f2 = future( compute(input2) )
val result1 = f1()
val result2 = f2()
merge(result1,result2)
}
}
It runs (with a nice speed-up), but the future's apply method seems to block a thread, and the thread pool grows tremendously. When too many threads are created, the computation gets stuck.
Is there a kind of react method for futures which releases the thread ? Or any other way to achieve that behavior ?
EDIT: I am using scala 2.8.0.final

Don't claim (apply) your Futures, since this forces them to block and wait for an answer; as you've seen this can lead to deadlocks. Instead, use them monadically to tell them what to do when they complete. Instead of:
val result1 = f1()
val result2 = f2()
merge(result1,result2)
Try this:
for {
result1 <- f1
result2 <- f2
} yield merge(result1, result2)
The result of this will be a Responder[Result] (essentially a Future[Result]) containing the merged results; you can do something effectful with this final value using respond() or foreach(), or you can map() or flatMap() it to another Responder[T]. No blocking necessary, just keep scheduling computations for the future!
Edit 1:
Ok, the signature of the compute function is going to have to change to Responder[Result] now, so how does that affect the recursive calls? Let's try this:
private def compute( input: Input ):Responder[Result] = {
if( pairs.size < SIZE_LIMIT ) {
future(computeSequential())
} else {
val (input1,input2) = input.split
for {
result1 <- compute(input1)
result2 <- compute(input2)
} yield merge(result1, result2)
}
}
Now you no longer need to wrap the calls to compute with future(...) because they're already returning Responder (a superclass of Future).
Edit 2:
One upshot of using this continuation-passing style is that your top-level code--whatever calls compute originally--doesn't block at all any more. If it's being called from main(), and that's all the program does, this will be a problem, because now it will just spawn a bunch of futures and then immediately shut down, having finished everything it was told to do. What you need to do is block on all these futures, but only once, at the top level, and only on the results of all the computations, not any intermediate ones.
Unfortunately, this Responder thing that's being returned by compute() no longer has a blocking apply() method like the Future did. I'm not sure why flatMapping Futures produces a generic Responder instead of a Future; this seems like an API mistake. But in any case, you should be able to make your own:
def claim[A](r:Responder[A]):A = {
import java.util.concurrent.ArrayBlockingQueue
import scala.actors.Actor.actor
val q = new ArrayBlockingQueue[A](1)
// uses of 'respond' need to be wrapped in an actor or future block
actor { r.respond(a => q.put(a)) }
return q.take
}
So now you can create a blocking call to compute in your main method like so:
val finalResult = claim(compute(input))
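For readers on Scala 2.10 or later, where scala.actors was deprecated in favour of scala.concurrent, the same "combine, don't claim" idea can be sketched roughly as below. This is only an illustration under assumptions: the object name ForkJoinSum, the threshold SizeLimit, and the summing workload are hypothetical stand-ins for the compute/merge functions in the question. Both halves are started before they are combined, and the only blocking call is the single Await at the top level.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ForkJoinSum {
  // Hypothetical threshold below which the work is done sequentially.
  val SizeLimit = 1000

  // Recursively splits the input; never blocks on an intermediate result.
  def sumAsync(xs: Vector[Long]): Future[Long] =
    if (xs.size < SizeLimit) Future(xs.sum)
    else {
      val (left, right) = xs.splitAt(xs.size / 2)
      // Start both halves first so they can run concurrently...
      val f1 = sumAsync(left)
      val f2 = sumAsync(right)
      // ...then describe how to merge them once both have completed.
      for { a <- f1; b <- f2 } yield a + b
    }

  def main(args: Array[String]): Unit = {
    // The one place where blocking happens: the top level.
    val total = Await.result(sumAsync((1L to 100000L).toVector), 30.seconds)
    println(total)
  }
}
```

Because no worker thread ever waits on a child future, the pool does not need to grow to keep making progress.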

Related

For Comprehension of Futures - Kicking off new thread inside comprehension and disregarding result

I'm trying to use a for comprehension to run some futures in order and merge the results, but also to kick off a separate thread after one of those futures completes, without caring about its result (basically to fire off some logging info).
I've played around with this using some thread sleeps, and it looks like whatever I put inside the for block ends up blocking the thread.
private def testFunction(): EitherT[Future, Error, Response] =
for {
firstRes <- EitherT(client.getFirst())
secondRes <- EitherT(client.getSecond())
// Future i want to run on a separate async thread outside the comprehension
_ = runSomeLogging(secondRes)
thirdRes <- EitherT(client.getThird(secondRes.value))
} yield thirdRes
def runSomeLogging(res: SecondRes): Future[Either[Error, Response]] = {
Thread.sleep(10000)
Future.successful(Right(Response("123")))
}
So the code above waits the full 10 seconds before returning the thirdRes result. My hope was to kick off the runSomeLogging function on a separate thread after secondRes completes. I thought using = rather than <- would do that, but it doesn't.
The way I am able to get this to work is below. Basically I run my second future outside of the comprehension and use .onComplete on the previous future to run my logging only if certain conditions were met by the comprehension above. In my example here, I only want to run this logging function if the secondRes call is successful.
private def runSomeLogging(response: EitherT[Future, Error, Response]): Unit = {
Thread.sleep(10000)
response.value.onComplete {
case Success(either) =>
either.fold(
_ => { },
_ => { logThing() }
)
case _ =>
}
}
private def testFunction(): EitherT[Future, Error, Response] = {
val res = for {
firstRes <- EitherT(client.getFirst())
secondRes <- EitherT(client.getSecond())
thirdRes <- EitherT(client.getThird(secondRes.value))
} yield thirdRes
runSomeLogging(res)
res
}
This second example works fine and does what I want, it doesn't block the for comprehension for 10 seconds from returning. However, because there are dependencies of this running for certain pieces of the comprehension, but not all of them, I was hoping there was a way to kick off the job from within the for comprehension itself but let it run on its own thread and not block the comprehension from completing.
You need a function that starts the Future but doesn't return it, so the for-comprehension can move on (since Future's map/flatMap functions won't continue to the next step until the current Future resolves). To accomplish a "start and forget", you need to use a function that returns immediately, or a Future that resolves immediately.
// this function will return immediately
def runSomeLogging(res: SomeResult): Unit = {
// since startLoggingFuture uses Future.apply, calling it will start the Future,
// but we ignore the actual Future by returning Unit instead
startLoggingFuture(res)
}
// this function returns a future that takes 10 seconds to resolve
private def startLoggingFuture(res: SomeResult): Future[Unit] = Future {
// note: please don't actually do Thread.sleep in your Future's thread pool
Thread.sleep(10000)
logger.info(s"Got result $res")
}
Then you could put e.g.
_ = runSomeLogging(res)
or
_ <- Future { runSomeLogging(res) }
in your for-comprehension.
Note, Cats-Effect and Monix have a nice abstraction for "start but ignore result", with io.start.void and task.startAndForget respectively. If you were using IO or Task instead of Future, you could use .start.void or .startAndForget on the logging task.
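As a rough, self-contained sketch of the "start and forget" pattern with plain scala.concurrent Futures (no EitherT): the names FireAndForget, logInBackground, pipeline and the logged promise are all made up for illustration. The promise exists only so that a caller can observe that the background task really ran.

```scala
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global

object FireAndForget {
  // Completed by the background task; hypothetical, only here for observability.
  val logged: Promise[String] = Promise[String]()

  // Starts the side task and returns Unit immediately.
  // A failure of the side task is reported but never reaches the pipeline.
  def logInBackground(msg: String): Unit =
    Future {
      Thread.sleep(200) // stand-in for slow logging
      logged.trySuccess(msg)
    }.failed.foreach(e => Console.err.println(s"logging failed: $e"))

  def pipeline(): Future[Int] =
    for {
      a <- Future(1)
      _ = logInBackground(s"got $a") // returns at once, so the next step is not delayed
      b <- Future(a + 1)
    } yield b
}
```

The key design point is that the function used with = hands the work to the execution context instead of doing it on the calling thread; a Thread.sleep placed directly in that function would still hold up the comprehension.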

Mapping Iterator into Iterator[Future] is not behaving as expecting

Using Scala 2.13.0:
implicit val ec = ExecutionContext.global
val arr = (0 until 20).toIterator
.map { x =>
Thread.sleep(500);
println(x);
x
}
val fss = arr.map { slowX =>
Future { blocking { slowX } }
}
Await.result(Future.sequence(fss), Inf)
problem
arr is an iterator where each item needs 500ms processing time. We map the iterator with Future { blocking { ... }} with the purpose of making the processing parallel (using the global execution context). Finally we run Future.sequence to consume the iterator.
Given the definition of Future.apply[T](body: =>T) and blocking[T](body: =>T), body is passed lazily, which means that body will be processed in the Future. If we inject that in the definition of Iterator.map, we get def next() = Future{blocking(self.next())}, so each item of the iterator should be processed in the Future.
But when I try this example, I can see that the iterator is consumed sequentially, which is not what I expected!
Is that a Scala bug?? Or am I missing something?
No it's not a bug, because:
val arr = (0 until 20).toIterator
// This map runs first, and runs sequentially, because it executes on the calling thread.
.map { x =>
Thread.sleep(500);
println(x);
x
}
// This step is sequential too, because the upstream map has already run on the calling thread.
// So "Future { blocking { slowX } }" could be replaced with "Future.successful(slowX)",
// because no computation is left to run inside the Future.
val fss = arr.map { slowX =>
Future { blocking { slowX } }
}
If you want the work to run fully asynchronously, you can do something like:
import scala.concurrent.duration._
def heavyCalculation(x: Int) = {
Thread.sleep(500)
println(x)
x
}
val result = Future.traverse((0 until 20).toList) { x =>
Future(blocking(heavyCalculation(x)))
}
Await.result(result, 1.minute)
Working Scastie example: https://scastie.scala-lang.org/3v06NpypRHKYkqBgzaeVXg
First, this is not a proper benchmark: you haven't actually shown formal proof that this is sequential and not parallel (although it is "obvious" from the source code that it is sequential).
Second, an Iterator of Futures is probably a bad idea; at this point, it may make sense to look into a streaming solution like Akka-Streams, fs2, Monix or ZIO.
Third, what is even the point of having a bunch of blocking futures? You aren't actually gaining much.
Fourth, the problem is that the second map does not receive the block of the first map, just its result. So you actually did the sleep before creating the Future.
Fifth, you probably want to do this instead:
val result = Future.traverse(data) { elem =>
Future {
blocking {
// Process elem here.
}
}
}
Await.result(result, Inf)
The other answers were pointing in the right direction, but the formal answer is the following: the signature of Iterator.map(f: A => B) tells us that the A is computed before f is applied to it (because the parameter is A, not => A). Therefore, next() is computed in the main thread.
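One way to check this yourself is to record which thread evaluates each element. The sketch below is only an illustration: WhereDoesItRun, work, eager and deferred are hypothetical names, and work simply returns the name of the thread it ran on instead of doing anything slow.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object WhereDoesItRun {
  // Hypothetical stand-in for real work: reports the thread that evaluated it.
  def work(x: Int): String = Thread.currentThread.getName

  // The work happens in the first map, on the thread that consumes the iterator;
  // the Future only wraps an already finished value.
  def eager(n: Int): List[String] = {
    val it = (0 until n).iterator.map(work).map(done => Future(done))
    Await.result(Future.sequence(it.toList), 10.seconds)
  }

  // The work sits inside the Future body, so the execution context runs it.
  def deferred(n: Int): List[String] = {
    val it = (0 until n).iterator.map(x => Future(work(x)))
    Await.result(Future.sequence(it.toList), 10.seconds)
  }
}
```

Calling eager from the main thread should report the main thread for every element, while deferred should report worker threads of the global execution context.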

Get actual value in Future Scala [duplicate]

I am a newbie to scala futures and I have a doubt regarding the return value of scala futures.
So, generally syntax for a scala future is
def downloadPage(url: URL) = Future[List[Int]] {
}
I want to know how to access the List[Int] from some other method which calls this method.
In other words,
val result = downloadPage("localhost")
then what should be the approach to get List[Int] out of the future ?
I have tried using the map method, but was not able to do this successfully.
The case of Success(listInt) => I want to return the listInt and I am not able to figure out how to do that.
The best practice is that you don't return the value. Instead you just pass the future (or a version transformed with map, flatMap, etc.) to everyone who needs this value and they can add their own onComplete.
If you really need to return it (e.g. when implementing a legacy method), then the only thing you can do is to block (e.g. with Await.result) and you need to decide how long to await.
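To make the "pass the future along" advice a bit more concrete, here is a small sketch. It assumes a dummy downloadPage that takes a String and returns fixed data; pageTotal and the object name PassTheFuture are hypothetical as well. The caller never extracts the list, it only describes what should happen to it.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object PassTheFuture {
  // Dummy implementation, assumed for illustration only.
  def downloadPage(url: String): Future[List[Int]] = Future(List(1, 2, 3))

  // Callers transform the future instead of unwrapping it.
  def pageTotal(url: String): Future[Int] = downloadPage(url).map(_.sum)

  def main(args: Array[String]): Unit = {
    // Whoever finally needs the value attaches a callback.
    pageTotal("localhost").foreach(total => println(s"total = $total"))
    Thread.sleep(500) // crude way to keep this demo JVM alive until the callback fires
  }
}
```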
You need to wait for the future to complete to get the result given some timespan, here's something that would work:
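import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
def downloadPage(url: String) = Future[List[Int]] {
List(1,2,3)
}
val result = downloadPage("localhost")
// blocks the calling thread for at most 10 seconds
val myListInt = Await.result(result, 10.seconds)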
Ideally, if you're using a Future, you don't want to block the executing thread, so you would move your logic that deals with the result of your Future into the onComplete method, something like this:
result.onComplete({
case Success(listInt) => {
//Do something with my list
}
case Failure(exception) => {
//Do something with my error
}
})
I hope you already solved this since it was asked in 2013 but maybe my answer can help someone else:
If you are using Play Framework, it supports async Actions (actually all Actions are async inside). An easy way to create an async Action is to use Action.async(). You need to provide a Future[Result] to this function.
Now you can just transform your Future[List[Int]] into a Future[Result] using Scala's map, flatMap, for-comprehension or async/await. Here is an example from the Play Framework documentation.
import play.api.libs.concurrent.Execution.Implicits.defaultContext
def index = Action.async {
val futureInt = scala.concurrent.Future { intensiveComputation() }
futureInt.map(i => Ok("Got result: " + i))
}
You can do something like this. If the wait time given to Await.result is shorter than the time the awaitable takes to complete, you will get a TimeoutException, and you need to handle that error (or any other error).
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.util.{Try, Success, Failure}
import scala.concurrent.duration._
object MyObject {
def main(args: Array[String]) {
val myVal: Future[String] = Future { silly() }
// values less than 5 seconds will go to
// Failure case, because silly() will not be done yet
Try(Await.result(myVal, 10 seconds)) match {
case Success(extractedVal) => { println("Success Happened: " + extractedVal) }
case Failure(_) => { println("Failure Happened") }
case _ => { println("Very Strange") }
}
}
def silly(): String = {
Thread.sleep(5000)
"Hello from silly"
}
}
The best way I’ve found to think of a Future is a box that will, at some point, contain the thing that you want. The key thing with a Future is that you never open the box. Trying to force open the box will lead you to blocking and grief. Instead, you put the Future in another, larger box, typically using the map method.
Here’s an example of a Future that contains a String. When the Future completes, then Console.println is called:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
object Main {
def main(args:Array[String]) : Unit = {
val stringFuture: Future[String] = Future.successful("hello world!")
stringFuture.map {
someString =>
// if you use .foreach you avoid creating an extra Future, but we are proving
// the concept here...
Console.println(someString)
}
}
}
Note that in this case, we’re calling the main method and then… finishing. The string’s Future, provided by the global ExecutionContext, does the work of calling Console.println. This is great, because when we give up control over when someString is going to be there and when Console.println is going to be called, we let the system manage itself. In contrast, look what happens when we try to force the box open:
import scala.concurrent.Await
import scala.concurrent.duration.Duration
val stringFuture: Future[String] = Future.successful("hello world!")
val someString = Await.result(stringFuture, Duration.Inf)
In this case, we have to wait — keep a thread twiddling its thumbs — until we get someString back. We’ve opened the box, but we’ve had to commandeer the system’s resources to get at it.
It wasn't yet mentioned, so I want to emphasize the point of using Future with for-comprehension and the difference of sequential and parallel execution.
For example, for sequential execution:
object FuturesSequential extends App {
def job(n: Int) = Future {
Thread.sleep(1000)
println(s"Job $n")
}
val f = for {
f1 <- job(1)
f2 <- job(2)
f3 <- job(3)
f4 <- job(4)
f5 <- job(5)
} yield List(f1, f2, f3, f4, f5)
f.map(res => println(s"Done. ${res.size} jobs run"))
Thread.sleep(6000) // We need to prevent main thread from quitting too early
}
And for parallel execution (note that the Futures are created before the for-comprehension):
object FuturesParallel extends App {
def job(n: Int) = Future {
Thread.sleep(1000)
println(s"Job $n")
}
val j1 = job(1)
val j2 = job(2)
val j3 = job(3)
val j4 = job(4)
val j5 = job(5)
val f = for {
f1 <- j1
f2 <- j2
f3 <- j3
f4 <- j4
f5 <- j5
} yield List(f1, f2, f3, f4, f5)
f.map(res => println(s"Done. ${res.size} jobs run"))
Thread.sleep(6000) // We need to prevent main thread from quitting too early
}
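When the jobs come from a collection, a more compact way to get the parallel behaviour is Future.traverse, which starts one Future per element and gathers the results in the original order. A minimal sketch, assuming a made-up job that squares its input after a short sleep (ParallelJobs, job and runAll are hypothetical names):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelJobs {
  // Hypothetical job: pretends to work for a moment, then returns n squared.
  def job(n: Int): Future[Int] = Future {
    Thread.sleep(100)
    n * n
  }

  // All jobs are started up front; the resulting list keeps the input order.
  def runAll(ns: List[Int]): Future[List[Int]] = Future.traverse(ns)(job)

  def main(args: Array[String]): Unit =
    println(Await.result(runAll(List(1, 2, 3, 4, 5)), 10.seconds))
}
```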

scala's for yield comprehension used with Future. How to wait until future has returned?

I have a function which provides a Context:
def buildContext(s:String)(request:RequestHeader):Future[Granite.Context] = {
.... // returns a Future[Granite.Context]
}
I then have another function which uses a Context to return an Option[Library.Document]:
def getDocument(tag: String):Option[Library.Document] = {
val fakeRequest = play.api.test.FakeRequest().withHeaders(CONTENT_TYPE -> "application/json")
val context = buildContext(tag)(fakeRequest)
val maybeDoc = context.getDocument //getDocument is defined on Granite.Context to return an Option[Library.Document]
}
How would this code take into account whether the Future has returned or not? I have seen for/yield used to wait for the return, but I always assumed that a for/yield just flatmaps things together and has nothing really to do with waiting for Futures to return. I'm kinda stuck here and don't really know the correct question to ask!
The other two answers are misleading. A for/yield in Scala is syntactic sugar that the compiler transforms into map and flatMap chains. Do not use Await if you can avoid it; it's not a simple issue.
You are introducing blocking behaviour and you have yet to realise the systemic damage you are doing when blocking.
When it comes to Future, map and flatMap do different things:
map
is executed when the future completes. It's an asynchronous way to do a type safe mapping.
val f: Future[A] = someFutureProducer
def convertAToB(a: A): B = {..}
f map { a => convertAToB(a) }
flatMap
is what you use to chain things:
someFuture flatMap {
_ => {
someOtherFuture
}
}
The equivalent of the above is:
for {
result1 <- someFuture
result2 <- someOtherFuture
} yield result2
In Play you would use Async to handle the above:
Async {
someFuture.map(i => Ok("Got result: " + i))
}
Update
I misunderstood your usage of Play. Still, it doesn't change anything. You can still make your logic asynchronous.
someFuture onComplete {
case Success(result) => // doSomething
case Failure(err) => // log the error etc
}
The main difference when thinking asynchronously is that you always have to map and flatMap and do everything else inside Futures to get things done. The performance gain is massive.
The bigger your app, the bigger the gain.
When using a for-comprehension on a Future, you're not waiting for it to finish; you're just saying "when it is finished, use it like this", and the for-comprehension returns another Future in this case.
If you want to wait for a future to finish, you should use the Await as follows:
val resultContext = Await.result(context , timeout.duration)
Then run the getDocument method on it as such:
val maybeDoc = resultContext.getDocument
EDIT
The usual way to work with Futures however is to wait until the last moment before you Await. As pointed out by another answer here, Play Framework does the same thing by allowing you to return Future[Result]. So, a good way to do things would be to only use for-comprehensions and make your methods return Futures, etc, until the last moment when you want to finally return your result.
You can use scala.concurrent.Await for that:
import scala.concurrent.duration._
import scala.concurrent.Await
def getDocument(tag: String):Option[Library.Document] = {
val fakeRequest = play.api.test.FakeRequest().withHeaders(CONTENT_TYPE -> "application/json")
val context = Await.result(buildContext(tag)(fakeRequest), 42.seconds)
context.getDocument // last expression, so this Option is the method's result
}
But Await will block the thread until the future completes, so it would be better either to make buildContext a synchronous operation returning Granite.Context, or to make getDocument async too, returning Future[Option[Library.Document]].
Once you are in a future you must stay in the future or you must wait until the future arrives.
Waiting is usually a bad idea because it blocks your execution, so you should work in the future.
Basically you should change your getDocument method to return a Future to something like getDocument(tag: String):Future[Option[Library.Document]]
Then, using map or flatMap, you chain your future calls:
return buildContext(tag)(fakeRequest).map(_.getDocument)
If buildContext fails, the mapping function is not run and the resulting Future fails with the same exception.
Then call
getDocument("blah").onComplete {
case Success(optionalDoc) => ...
case Failure(e) =>...
}
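Putting those pieces together, a self-contained sketch could look like the following. Document, Context and the dummy buildContext are simplified stand-ins invented for this example; they are not the real Granite or Library types, and the real buildContext also takes a request.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object AsyncDocument {
  // Simplified, hypothetical stand-ins for Library.Document and Granite.Context.
  final case class Document(id: String)
  final case class Context(tag: String) {
    def getDocument: Option[Document] =
      if (tag.nonEmpty) Some(Document(tag)) else None
  }

  // Dummy async context builder, assumed for illustration.
  def buildContext(tag: String): Future[Context] = Future(Context(tag))

  // Stays "in the future": nothing here waits for buildContext to finish.
  def getDocument(tag: String): Future[Option[Document]] =
    buildContext(tag).map(_.getDocument)
}
```

The caller then decides what to do with the Future[Option[Document]], for example with onComplete as shown above, or by returning it from an async Play action.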

akka dataflow and side effects

I'm working with akka dataflow and I'd like to know if there is a way to cause a particular block of code to wait for the completion of a future, without explicitly using the value of that future.
The actual use case is that I have a file and I want the file to be deleted when a particular future completes, but not before. Here is a rough example. First imagine I have this service:
trait ASync {
def pull: Future[File]
def process(input : File): Future[File]
def push(input : File): Future[URI]
}
And I have a workflow I want to run in a non-blocking way:
val uriFuture = flow {
val pulledFile = async.pull(uri)
val processedFile = async.process(pulledFile())
val storedUri = async.push(processedFile())
// I'd like the following line executed only after storedUri is completed,
// not as soon as pulled file is ready.
pulledFile().delete()
storedUri()
}
You could try something like this:
val uriFuture = flow {
val pulledFile = async.pull(uri)
val processedFile = async.process(pulledFile())
val storedUri = for(uri <- async.push(processedFile())) yield {
pulledFile().delete()
uri
}
storedUri()
}
In this example, pulledFile.delete will only be called if the Future from push succeeds. If it fails, delete will not be called. The result of the storedUri future will still be the result of the call to push.
Or another way would be:
val uriFuture = flow {
val pulledFile = async.pull(uri)
val processedFile = async.process(pulledFile())
val storedUri = async.push(processedFile()) andThen{
case whatever => pulledFile().delete()
}
storedUri()
}
The difference here is that delete will be called regardless of whether push succeeds or fails. The result of storedUri will still be the result of the call to push.
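The same two behaviours can be sketched with plain scala.concurrent Futures, outside of a flow block. This is only an illustration: CleanupAfterPush and its two methods are hypothetical names, and an AtomicBoolean flag stands in for the actual file deletion so that the effect is easy to observe.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object CleanupAfterPush {
  // Cleanup runs only when push succeeds; the push result is passed through.
  def cleanupOnSuccess(push: Future[String], cleaned: AtomicBoolean): Future[String] =
    push.map { uri =>
      cleaned.set(true) // stand-in for pulledFile.delete()
      uri
    }

  // Cleanup runs whether push succeeds or fails; the outcome of push is kept as is.
  def cleanupAlways(push: Future[String], cleaned: AtomicBoolean): Future[String] =
    push.andThen { case _ => cleaned.set(true) }
}
```

In both variants the returned Future completes only after the cleanup step has run (when it runs at all), so anything chained on it sees the cleanup as already done.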
You can use callbacks for non-blocking workflow:
future onSuccess {
case _ => file.delete() //Deal with cases obviously...
}
Source: http://doc.akka.io/docs/akka/snapshot/scala/futures.html
Alternatively, you can block with Await.result:
val result = Await.result(future, timeout.duration).asInstanceOf[String]
The latter is generally used when you NEED to block, e.g. in test cases. The non-blocking approach performs better because you don't park one thread, spin up another, and then resume the first; blocking is slower than an asynchronous activity because of that resource management overhead.
The Typesafe staff are calling it "Reactive". That's a little bit of a buzzword. I would laugh if you used it in the workplace.